* PadLock processing multiple blocks at a time
       [not found]   ` <20041130222442.7b0f4f67.davem@davemloft.net>
@ 2005-01-11 17:03     ` Michal Ludvig
  2005-01-11 17:08       ` [PATCH 1/2] " Michal Ludvig
  2005-01-11 17:08       ` [PATCH 2/2] PadLock processing multiple blocks " Michal Ludvig
From: Michal Ludvig @ 2005-01-11 17:03 UTC
  To: David S. Miller; +Cc: jmorris, cryptoapi, linux-kernel

Hi all,

I've got some improvements for the VIA PadLock crypto driver.

1. A generic extension to crypto/cipher.c that allows offloading the
   encryption of a whole buffer in a given mode (CBC, ...) to the
   algorithm provider (e.g. PadLock). Basically it extends 'struct
   cipher_alg' with some new fields:

@@ -69,6 +73,18 @@ struct cipher_alg {
                          unsigned int keylen, u32 *flags);
        void (*cia_encrypt)(void *ctx, u8 *dst, const u8 *src);
        void (*cia_decrypt)(void *ctx, u8 *dst, const u8 *src);
+       size_t cia_max_nbytes;
+       size_t cia_req_align;
+       void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+                       size_t nbytes, int encdec, int inplace);
+       void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+                       size_t nbytes, int encdec, int inplace);
+       void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+                       size_t nbytes, int encdec, int inplace);
+       void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+                       size_t nbytes, int encdec, int inplace);
+       void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+                       size_t nbytes, int encdec, int inplace);
 };

  If cia_<mode> is non-NULL, that function is used instead of the
  software <mode>_process() chaining function (e.g. cbc_process()). In
  the case of PadLock this can significantly speed up {en,de}cryption.
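
  For illustration, a provider with a hardware CBC engine could hook in
  roughly like this (a minimal sketch modelled on the PadLock hook-up in
  patch 2/2; mydrv_alg and mydrv_hw_cbc() are hypothetical names):

	static void mydrv_cbc(void *ctx, u8 *dst, const u8 *src, u8 *iv,
			      size_t nbytes, int encdec, int inplace)
	{
		/* The whole nbytes buffer is handed over at once; the
		   engine does the CBC chaining and updates *iv itself. */
		mydrv_hw_cbc(ctx, dst, src, iv, nbytes, encdec);
	}

	int __init mydrv_init(void)
	{
		/* cia_* hooks left NULL fall back to software chaining. */
		mydrv_alg.cra_u.cipher.cia_max_nbytes = (size_t)-1;
		mydrv_alg.cra_u.cipher.cia_req_align  = 16;
		mydrv_alg.cra_u.cipher.cia_cbc        = mydrv_cbc;
		return crypto_register_alg(&mydrv_alg);
	}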

2. On top of this I have an extension of the padlock module to support 
   this scheme.

I will send both patches in separate follow-ups.

The speedup gained by this change is quite significant (measured with 
bonnie on ext2 over dm-crypt with aes128):

			No encryption	2.6.10-bk1	multiblock
Writing with putc()	10454 (100%)	7479  (72%)	9353  (89%)
Rewriting		16510 (100%)	7628  (46%)	10611 (64%)
Writing intelligently	61128 (100%)	21132 (35%)	48103 (79%)
Reading with getc()	9406  (100%)	6916  (74%)	8801  (94%)
Reading intelligently	35885 (100%)	15271 (43%)	23202 (65%)

Numbers are in kB/s; the percentages show throughput relative to the
plaintext run. As can be seen, the multiblock encryption is
significantly faster in comparison to the already committed
single-block-at-a-time processing.

More statistics (e.g. a comparison with aes.ko and aes-i586.ko) are 
available at http://www.logix.cz/michal/devel/padlock/bench.xp

Dave, if you're OK with these changes, please merge them.

Michal Ludvig
-- 
* A mouse is a device used to point at the xterm you want to type in.
* Personal homepage - http://www.logix.cz/michal

* [PATCH 1/2] PadLock processing multiple blocks at a time
  2005-01-11 17:03     ` PadLock processing multiple blocks at a time Michal Ludvig
@ 2005-01-11 17:08       ` Michal Ludvig
  2005-01-14 13:10         ` [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers " Michal Ludvig
  2005-01-11 17:08       ` [PATCH 2/2] PadLock processing multiple blocks " Michal Ludvig
From: Michal Ludvig @ 2005-01-11 17:08 UTC
  To: David S. Miller; +Cc: jmorris, cryptoapi, linux-kernel

# 
# Extends crypto/cipher.c for offloading whole chaining modes
# to e.g. hardware crypto accelerators.
# 
#	Signed-off-by: Michal Ludvig <mludvig@suse.cz>
# 
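
A note on the crypto_aligned_kmalloc() helper added below: it returns a
pointer aligned to 'alignment' and stores the raw kmalloc() pointer in
*index, which is what must later be passed to kfree(). A minimal usage
sketch (buf/idx are illustrative names only):

	void *idx;
	u8 *buf = crypto_aligned_kmalloc(256, GFP_KERNEL, 16, &idx);

	if (!idx)
		return -ENOMEM;
	/* ... use the 16-byte aligned buf ... */
	kfree(idx);		/* free the raw pointer, never buf */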

Index: linux-2.6.10/crypto/api.c
===================================================================
--- linux-2.6.10.orig/crypto/api.c	2004-12-24 22:35:39.000000000 +0100
+++ linux-2.6.10/crypto/api.c	2005-01-10 16:37:11.943356651 +0100
@@ -217,6 +217,19 @@ int crypto_alg_available(const char *nam
 	return ret;
 }
 
+void *crypto_aligned_kmalloc(size_t size, int mode, size_t alignment, void **index)
+{
+	char *ptr;
+
+	ptr = kmalloc(size + alignment, mode);
+	*index = ptr;
+	if (alignment > 1 && ((long)ptr & (alignment - 1))) {
+		ptr += alignment - ((long)ptr & (alignment - 1));
+	}
+
+	return ptr;
+}
+
 static int __init init_crypto(void)
 {
 	printk(KERN_INFO "Initializing Cryptographic API\n");
@@ -231,3 +244,4 @@ EXPORT_SYMBOL_GPL(crypto_unregister_alg)
 EXPORT_SYMBOL_GPL(crypto_alloc_tfm);
 EXPORT_SYMBOL_GPL(crypto_free_tfm);
 EXPORT_SYMBOL_GPL(crypto_alg_available);
+EXPORT_SYMBOL_GPL(crypto_aligned_kmalloc);
Index: linux-2.6.10/include/linux/crypto.h
===================================================================
--- linux-2.6.10.orig/include/linux/crypto.h	2005-01-07 17:26:42.000000000 +0100
+++ linux-2.6.10/include/linux/crypto.h	2005-01-10 16:37:52.157648454 +0100
@@ -42,6 +42,7 @@
 #define CRYPTO_TFM_MODE_CBC		0x00000002
 #define CRYPTO_TFM_MODE_CFB		0x00000004
 #define CRYPTO_TFM_MODE_CTR		0x00000008
+#define CRYPTO_TFM_MODE_OFB		0x00000010
 
 #define CRYPTO_TFM_REQ_WEAK_KEY		0x00000100
 #define CRYPTO_TFM_RES_WEAK_KEY		0x00100000
@@ -72,6 +73,18 @@ struct cipher_alg {
 	                  unsigned int keylen, u32 *flags);
 	void (*cia_encrypt)(void *ctx, u8 *dst, const u8 *src);
 	void (*cia_decrypt)(void *ctx, u8 *dst, const u8 *src);
+	size_t cia_max_nbytes;
+	size_t cia_req_align;
+	void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
 };
 
 struct digest_alg {
@@ -124,6 +137,11 @@ int crypto_unregister_alg(struct crypto_
 int crypto_alg_available(const char *name, u32 flags);
 
 /*
+ * Helper function.
+ */
+void *crypto_aligned_kmalloc (size_t size, int mode, size_t alignment, void **index);
+
+/*
  * Transforms: user-instantiated objects which encapsulate algorithms
  * and core processing logic.  Managed via crypto_alloc_tfm() and
  * crypto_free_tfm(), as well as the various helpers below.
@@ -258,6 +276,18 @@ static inline unsigned int crypto_tfm_al
 	return tfm->__crt_alg->cra_digest.dia_digestsize;
 }
 
+static inline unsigned int crypto_tfm_alg_max_nbytes(struct crypto_tfm *tfm)
+{
+	BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER);
+	return tfm->__crt_alg->cra_cipher.cia_max_nbytes;
+}
+
+static inline unsigned int crypto_tfm_alg_req_align(struct crypto_tfm *tfm)
+{
+	BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER);
+	return tfm->__crt_alg->cra_cipher.cia_req_align;
+}
+
 /*
  * API wrappers.
  */
Index: linux-2.6.10/crypto/cipher.c
===================================================================
--- linux-2.6.10.orig/crypto/cipher.c	2004-12-24 22:34:57.000000000 +0100
+++ linux-2.6.10/crypto/cipher.c	2005-01-10 16:37:11.974350710 +0100
@@ -20,7 +20,31 @@
 #include "internal.h"
 #include "scatterwalk.h"
 
+#define CRA_CIPHER(tfm)	(tfm)->__crt_alg->cra_cipher
+
+#define DEF_TFM_FUNCTION(name,mode,encdec,iv)	\
+static int name(struct crypto_tfm *tfm,		\
+                struct scatterlist *dst,	\
+                struct scatterlist *src,	\
+		unsigned int nbytes)		\
+{						\
+	return crypt(tfm, dst, src, nbytes,	\
+		     mode, encdec, iv);		\
+}
+
+#define DEF_TFM_FUNCTION_IV(name,mode,encdec,iv)	\
+static int name(struct crypto_tfm *tfm,		\
+                struct scatterlist *dst,	\
+                struct scatterlist *src,	\
+		unsigned int nbytes, u8 *iv)	\
+{						\
+	return crypt(tfm, dst, src, nbytes,	\
+		     mode, encdec, iv);		\
+}
+
 typedef void (cryptfn_t)(void *, u8 *, const u8 *);
+typedef void (cryptblkfn_t)(void *, u8 *, const u8 *, u8 *,
+			    size_t, int, int);
 typedef void (procfn_t)(struct crypto_tfm *, u8 *,
                         u8*, cryptfn_t, int enc, void *, int);
 
@@ -38,6 +62,36 @@ static inline void xor_128(u8 *a, const 
 	((u32 *)a)[3] ^= ((u32 *)b)[3];
 }
 
+static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
+			cryptfn_t *fn, int enc, void *info, int in_place)
+{
+	u8 *iv = info;
+	
+	/* Null encryption */
+	if (!iv)
+		return;
+		
+	if (enc) {
+		tfm->crt_u.cipher.cit_xor_block(iv, src);
+		(*fn)(crypto_tfm_ctx(tfm), dst, iv);
+		memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm));
+	} else {
+		u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0];
+		u8 *buf = in_place ? stack : dst;
+
+		(*fn)(crypto_tfm_ctx(tfm), buf, src);
+		tfm->crt_u.cipher.cit_xor_block(buf, iv);
+		memcpy(iv, src, crypto_tfm_alg_blocksize(tfm));
+		if (buf != dst)
+			memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm));
+	}
+}
+
+static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
+			cryptfn_t fn, int enc, void *info, int in_place)
+{
+	(*fn)(crypto_tfm_ctx(tfm), dst, src);
+}
 
 /* 
  * Generic encrypt/decrypt wrapper for ciphers, handles operations across
@@ -47,22 +101,101 @@ static inline void xor_128(u8 *a, const 
 static int crypt(struct crypto_tfm *tfm,
 		 struct scatterlist *dst,
 		 struct scatterlist *src,
-                 unsigned int nbytes, cryptfn_t crfn,
-                 procfn_t prfn, int enc, void *info)
+		 unsigned int nbytes, 
+		 int mode, int enc, void *info)
 {
-	struct scatter_walk walk_in, walk_out;
-	const unsigned int bsize = crypto_tfm_alg_blocksize(tfm);
-	u8 tmp_src[bsize];
-	u8 tmp_dst[bsize];
+ 	cryptfn_t *cryptofn = NULL;
+ 	procfn_t *processfn = NULL;
+ 	cryptblkfn_t *cryptomultiblockfn = NULL;
+ 
+ 	struct scatter_walk walk_in, walk_out;
+ 	size_t max_nbytes = crypto_tfm_alg_max_nbytes(tfm);
+ 	size_t bsize = crypto_tfm_alg_blocksize(tfm);
+ 	int req_align = crypto_tfm_alg_req_align(tfm);
+ 	int ret = 0;
+	int gfp;
+ 	void *index_src = NULL, *index_dst = NULL;
+ 	u8 *iv = info;
+ 	u8 *tmp_src, *tmp_dst;
 
 	if (!nbytes)
-		return 0;
+		return ret;
 
 	if (nbytes % bsize) {
 		tfm->crt_flags |= CRYPTO_TFM_RES_BAD_BLOCK_LEN;
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 
+ 
+ 	switch (mode) {
+ 		case CRYPTO_TFM_MODE_ECB:
+ 			if (CRA_CIPHER(tfm).cia_ecb)
+ 				cryptomultiblockfn = CRA_CIPHER(tfm).cia_ecb;
+ 			else {
+ 				cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ?
+						CRA_CIPHER(tfm).cia_encrypt :
+						CRA_CIPHER(tfm).cia_decrypt;
+ 				processfn = ecb_process;
+ 			}
+ 			break;
+ 
+ 		case CRYPTO_TFM_MODE_CBC:
+ 			if (CRA_CIPHER(tfm).cia_cbc)
+ 				cryptomultiblockfn = CRA_CIPHER(tfm).cia_cbc;
+ 			else {
+ 				cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ?
+						CRA_CIPHER(tfm).cia_encrypt :
+						CRA_CIPHER(tfm).cia_decrypt;
+ 				processfn = cbc_process;
+ 			}
+ 			break;
+ 
+		/* Until we have the appropriate {ofb,cfb,ctr}_process()
+		   functions, the following cases will return -ENOSYS if
+		   there is no HW support for the mode. */
+ 		case CRYPTO_TFM_MODE_OFB:
+ 			if (CRA_CIPHER(tfm).cia_ofb)
+ 				cryptomultiblockfn = CRA_CIPHER(tfm).cia_ofb;
+ 			else
+ 				return -ENOSYS;
+ 			break;
+ 
+ 		case CRYPTO_TFM_MODE_CFB:
+ 			if (CRA_CIPHER(tfm).cia_cfb)
+ 				cryptomultiblockfn = CRA_CIPHER(tfm).cia_cfb;
+ 			else
+ 				return -ENOSYS;
+ 			break;
+ 
+ 		case CRYPTO_TFM_MODE_CTR:
+ 			if (CRA_CIPHER(tfm).cia_ctr)
+ 				cryptomultiblockfn = CRA_CIPHER(tfm).cia_ctr;
+ 			else
+ 				return -ENOSYS;
+ 			break;
+ 
+ 		default:
+ 			BUG();
+ 	}
+ 
+	if (cryptomultiblockfn)
+		bsize = (max_nbytes > nbytes) ? nbytes : max_nbytes;
+ 
+ 	/* Some hardware crypto engines may require a specific 
+ 	   alignment of the buffers. We will align the buffers
+ 	   already here to avoid their reallocating later. */
+	gfp = in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
+	tmp_src = crypto_aligned_kmalloc(bsize, gfp,
+					 req_align, &index_src);
+	tmp_dst = crypto_aligned_kmalloc(bsize, gfp,
+					 req_align, &index_dst);
+ 
+ 	if (!index_src || !index_dst) {
+		ret = -ENOMEM;
+		goto out;
+  	}
+
 	scatterwalk_start(&walk_in, src);
 	scatterwalk_start(&walk_out, dst);
 
@@ -81,7 +214,13 @@ static int crypt(struct crypto_tfm *tfm,
 
 		scatterwalk_copychunks(src_p, &walk_in, bsize, 0);
 
-		prfn(tfm, dst_p, src_p, crfn, enc, info, in_place);
+ 		if (cryptomultiblockfn)
+ 			(*cryptomultiblockfn)(crypto_tfm_ctx(tfm),
+					      dst_p, src_p, iv,
+					      bsize, enc, in_place);
+ 		else
+ 			(*processfn)(tfm, dst_p, src_p, cryptofn,
+				     enc, info, in_place);
 
 		scatterwalk_done(&walk_in, 0, nbytes);
 
@@ -89,46 +228,23 @@ static int crypt(struct crypto_tfm *tfm,
 		scatterwalk_done(&walk_out, 1, nbytes);
 
 		if (!nbytes)
-			return 0;
+			goto out;
 
 		crypto_yield(tfm);
 	}
-}
-
-static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
-			cryptfn_t fn, int enc, void *info, int in_place)
-{
-	u8 *iv = info;
-	
-	/* Null encryption */
-	if (!iv)
-		return;
-		
-	if (enc) {
-		tfm->crt_u.cipher.cit_xor_block(iv, src);
-		fn(crypto_tfm_ctx(tfm), dst, iv);
-		memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm));
-	} else {
-		u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0];
-		u8 *buf = in_place ? stack : dst;
 
-		fn(crypto_tfm_ctx(tfm), buf, src);
-		tfm->crt_u.cipher.cit_xor_block(buf, iv);
-		memcpy(iv, src, crypto_tfm_alg_blocksize(tfm));
-		if (buf != dst)
-			memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm));
-	}
-}
+out:
+	if (index_src)
+		kfree(index_src);
+	if (index_dst)
+		kfree(index_dst);
 
-static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
-			cryptfn_t fn, int enc, void *info, int in_place)
-{
-	fn(crypto_tfm_ctx(tfm), dst, src);
+	return ret;
 }
 
 static int setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen)
 {
-	struct cipher_alg *cia = &tfm->__crt_alg->cra_cipher;
+	struct cipher_alg *cia = &CRA_CIPHER(tfm);
 	
 	if (keylen < cia->cia_min_keysize || keylen > cia->cia_max_keysize) {
 		tfm->crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
@@ -138,80 +254,28 @@ static int setkey(struct crypto_tfm *tfm
 		                       &tfm->crt_flags);
 }
 
-static int ecb_encrypt(struct crypto_tfm *tfm,
-		       struct scatterlist *dst,
-                       struct scatterlist *src, unsigned int nbytes)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_encrypt,
-	             ecb_process, 1, NULL);
-}
+DEF_TFM_FUNCTION(ecb_encrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_ENCRYPT, NULL);
+DEF_TFM_FUNCTION(ecb_decrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_DECRYPT, NULL);
 
-static int ecb_decrypt(struct crypto_tfm *tfm,
-                       struct scatterlist *dst,
-                       struct scatterlist *src,
-		       unsigned int nbytes)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_decrypt,
-	             ecb_process, 1, NULL);
-}
-
-static int cbc_encrypt(struct crypto_tfm *tfm,
-                       struct scatterlist *dst,
-                       struct scatterlist *src,
-		       unsigned int nbytes)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_encrypt,
-	             cbc_process, 1, tfm->crt_cipher.cit_iv);
-}
-
-static int cbc_encrypt_iv(struct crypto_tfm *tfm,
-                          struct scatterlist *dst,
-                          struct scatterlist *src,
-                          unsigned int nbytes, u8 *iv)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_encrypt,
-	             cbc_process, 1, iv);
-}
-
-static int cbc_decrypt(struct crypto_tfm *tfm,
-                       struct scatterlist *dst,
-                       struct scatterlist *src,
-		       unsigned int nbytes)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_decrypt,
-	             cbc_process, 0, tfm->crt_cipher.cit_iv);
-}
-
-static int cbc_decrypt_iv(struct crypto_tfm *tfm,
-                          struct scatterlist *dst,
-                          struct scatterlist *src,
-                          unsigned int nbytes, u8 *iv)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_decrypt,
-	             cbc_process, 0, iv);
-}
-
-static int nocrypt(struct crypto_tfm *tfm,
-                   struct scatterlist *dst,
-                   struct scatterlist *src,
-		   unsigned int nbytes)
-{
-	return -ENOSYS;
-}
-
-static int nocrypt_iv(struct crypto_tfm *tfm,
-                      struct scatterlist *dst,
-                      struct scatterlist *src,
-                      unsigned int nbytes, u8 *iv)
-{
-	return -ENOSYS;
-}
+DEF_TFM_FUNCTION(cbc_encrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cbc_encrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(cbc_decrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cbc_decrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(cfb_encrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cfb_encrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(cfb_decrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cfb_decrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(ofb_encrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ofb_encrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(ofb_decrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ofb_decrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(ctr_encrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ctr_encrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(ctr_decrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ctr_decrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, iv);
 
 int crypto_init_cipher_flags(struct crypto_tfm *tfm, u32 flags)
 {
@@ -245,17 +309,24 @@ int crypto_init_cipher_ops(struct crypto
 		break;
 		
 	case CRYPTO_TFM_MODE_CFB:
-		ops->cit_encrypt = nocrypt;
-		ops->cit_decrypt = nocrypt;
-		ops->cit_encrypt_iv = nocrypt_iv;
-		ops->cit_decrypt_iv = nocrypt_iv;
+		ops->cit_encrypt = cfb_encrypt;
+		ops->cit_decrypt = cfb_decrypt;
+		ops->cit_encrypt_iv = cfb_encrypt_iv;
+		ops->cit_decrypt_iv = cfb_decrypt_iv;
+		break;
+	
+	case CRYPTO_TFM_MODE_OFB:
+		ops->cit_encrypt = ofb_encrypt;
+		ops->cit_decrypt = ofb_decrypt;
+		ops->cit_encrypt_iv = ofb_encrypt_iv;
+		ops->cit_decrypt_iv = ofb_decrypt_iv;
 		break;
 	
 	case CRYPTO_TFM_MODE_CTR:
-		ops->cit_encrypt = nocrypt;
-		ops->cit_decrypt = nocrypt;
-		ops->cit_encrypt_iv = nocrypt_iv;
-		ops->cit_decrypt_iv = nocrypt_iv;
+		ops->cit_encrypt = ctr_encrypt;
+		ops->cit_decrypt = ctr_decrypt;
+		ops->cit_encrypt_iv = ctr_encrypt_iv;
+		ops->cit_decrypt_iv = ctr_decrypt_iv;
 		break;
 
 	default:

* [PATCH 2/2] PadLock processing multiple blocks at a time
  2005-01-11 17:03     ` PadLock processing multiple blocks at a time Michal Ludvig
  2005-01-11 17:08       ` [PATCH 1/2] " Michal Ludvig
@ 2005-01-11 17:08       ` Michal Ludvig
  2005-01-14  3:05         ` Andrew Morton
  2005-01-14 13:15         ` [PATCH 2/2] CryptoAPI: Update PadLock to process multiple blocks at once Michal Ludvig
From: Michal Ludvig @ 2005-01-11 17:08 UTC
  To: David S. Miller; +Cc: jmorris, cryptoapi, linux-kernel

# 
# Update to padlock-aes.c that enables processing of the whole 
# buffer of data at once with the given chaining mode (e.g. CBC).
# 
# Signed-off-by: Michal Ludvig <michal@logix.cz>
# 
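
A note on the IV handling below: the xcrypt instructions take the IV
pointer in EAX and may change it, so the pointer returned by the inline
wrappers designates the new chaining value. The IV is also kept in a
16-byte aligned buffer (data->buf), matching the alignment the engine
expects for its other operands. Condensed, the CBC path in aes_padlock()
does:

	u8 *iv = data->buf;			/* 16-byte aligned copy */
	memcpy(iv, iv_arg, AES_BLOCK_SIZE);
	iv = padlock_xcrypt_cbc(in, out, key, iv, &data->cword,
				nbytes / AES_BLOCK_SIZE);
	memcpy(iv_arg, iv, AES_BLOCK_SIZE);	/* hand back the new IV */
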
Index: linux-2.6.10/drivers/crypto/padlock-aes.c
===================================================================
--- linux-2.6.10.orig/drivers/crypto/padlock-aes.c	2005-01-07 17:26:42.000000000 +0100
+++ linux-2.6.10/drivers/crypto/padlock-aes.c	2005-01-10 17:59:17.000000000 +0100
@@ -369,19 +369,54 @@ aes_set_key(void *ctx_arg, const uint8_t
 
 /* ====== Encryption/decryption routines ====== */
 
-/* This is the real call to PadLock. */
-static inline void
+/* These are the real calls to PadLock. */
+static inline void *
 padlock_xcrypt_ecb(uint8_t *input, uint8_t *output, uint8_t *key,
-		   void *control_word, uint32_t count)
+		   uint8_t *iv, void *control_word, uint32_t count)
 {
 	asm volatile ("pushfl; popfl");		/* enforce key reload. */
 	asm volatile (".byte 0xf3,0x0f,0xa7,0xc8"	/* rep xcryptecb */
 		      : "+S"(input), "+D"(output)
 		      : "d"(control_word), "b"(key), "c"(count));
+	return NULL;
+}
+
+static inline void *
+padlock_xcrypt_cbc(uint8_t *input, uint8_t *output, uint8_t *key,
+		   uint8_t *iv, void *control_word, uint32_t count)
+{
+	asm volatile ("pushfl; popfl");		/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xd0"	/* rep xcryptcbc */
+		      : "=m"(*output), "+S"(input), "+D"(output), "+a"(iv)
+		      : "d"(control_word), "b"(key), "c"(count));
+	return iv;
+}
+
+static inline void *
+padlock_xcrypt_cfb(uint8_t *input, uint8_t *output, uint8_t *key,
+		   uint8_t *iv, void *control_word, uint32_t count)
+{
+	asm volatile ("pushfl; popfl");		/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xe0"	/* rep xcryptcfb */
+		      : "=m"(*output), "+S"(input), "+D"(output), "+a"(iv)
+		      : "d"(control_word), "b"(key), "c"(count));
+	return iv;
+}
+
+static inline void *
+padlock_xcrypt_ofb(uint8_t *input, uint8_t *output, uint8_t *key,
+		   uint8_t *iv, void *control_word, uint32_t count)
+{
+	asm volatile ("pushfl; popfl");		/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xe8"	/* rep xcryptofb */
+		      : "=m"(*output), "+S"(input), "+D"(output), "+a"(iv)
+		      : "d"(control_word), "b"(key), "c"(count));
+	return iv;
 }
 
 static void
-aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg, int encdec)
+aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg,
+	    uint8_t *iv_arg, size_t nbytes, int encdec, int mode)
 {
 	/* Don't blindly modify this structure - the items must 
 	   fit on 16-Bytes boundaries! */
@@ -419,21 +454,126 @@ aes_padlock(void *ctx_arg, uint8_t *out_
 	else
 		key = ctx->D;
 	
-	memcpy(data->buf, in_arg, AES_BLOCK_SIZE);
-	padlock_xcrypt_ecb(data->buf, data->buf, key, &data->cword, 1);
-	memcpy(out_arg, data->buf, AES_BLOCK_SIZE);
+	if (nbytes == AES_BLOCK_SIZE) {
+		/* Processing one block only => ECB is enough */
+		memcpy(data->buf, in_arg, AES_BLOCK_SIZE);
+		padlock_xcrypt_ecb(data->buf, data->buf, key, NULL,
+				   &data->cword, 1);
+		memcpy(out_arg, data->buf, AES_BLOCK_SIZE);
+	} else {
+		/* Processing multiple blocks at once */
+		uint8_t *in, *out, *iv;
+		int gfp = in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
+		void *index = NULL;
+
+		if (unlikely(((long)in_arg) & 0x0F)) {
+			in = crypto_aligned_kmalloc(nbytes, gfp, 16, &index);
+			memcpy(in, in_arg, nbytes);
+		}
+		else
+			in = (uint8_t*)in_arg;
+
+		if (unlikely(((long)out_arg) & 0x0F)) {
+			if (index)
+				out = in;	/* xcrypt can work "in place" */
+			else
+				out = crypto_aligned_kmalloc(nbytes, gfp, 16,
+							     &index);
+		}
+		else
+			out = out_arg;
+
+		/* Always make a local copy of IV - xcrypt may change it! */
+		iv = data->buf;
+		if (iv_arg)
+			memcpy(iv, iv_arg, AES_BLOCK_SIZE);
+
+		switch (mode) {
+			case CRYPTO_TFM_MODE_ECB:
+				iv = padlock_xcrypt_ecb(in, out, key, iv,
+							&data->cword,
+							nbytes/AES_BLOCK_SIZE);
+				break;
+
+			case CRYPTO_TFM_MODE_CBC:
+				iv = padlock_xcrypt_cbc(in, out, key, iv,
+							&data->cword,
+							nbytes/AES_BLOCK_SIZE);
+				break;
+
+			case CRYPTO_TFM_MODE_CFB:
+				iv = padlock_xcrypt_cfb(in, out, key, iv,
+							&data->cword,
+							nbytes/AES_BLOCK_SIZE);
+				break;
+
+			case CRYPTO_TFM_MODE_OFB:
+				iv = padlock_xcrypt_ofb(in, out, key, iv,
+							&data->cword,
+							nbytes/AES_BLOCK_SIZE);
+				break;
+
+			default:
+				BUG();
+		}
+
+		/* Back up IV */
+		if (iv && iv_arg)
+			memcpy(iv_arg, iv, AES_BLOCK_SIZE);
+
+		/* Copy the 16-Byte aligned output to the caller's buffer. */
+		if (out != out_arg)
+			memcpy(out_arg, out, nbytes);
+
+		if (index)
+			kfree(index);
+	}
+}
+
+static void
+aes_padlock_ecb(void *ctx, uint8_t *dst, const uint8_t *src,
+		uint8_t *iv, size_t nbytes, int encdec, int inplace)
+{
+	aes_padlock(ctx, dst, src, NULL, nbytes, encdec,
+		    CRYPTO_TFM_MODE_ECB);
+}
+
+static void
+aes_padlock_cbc(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+		size_t nbytes, int encdec, int inplace)
+{
+	aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+		    CRYPTO_TFM_MODE_CBC);
+}
+
+static void
+aes_padlock_cfb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+		size_t nbytes, int encdec, int inplace)
+{
+	aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+		    CRYPTO_TFM_MODE_CFB);
+}
+
+static void
+aes_padlock_ofb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+		size_t nbytes, int encdec, int inplace)
+{
+	aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+		    CRYPTO_TFM_MODE_OFB);
 }
 
 static void
 aes_encrypt(void *ctx_arg, uint8_t *out, const uint8_t *in)
 {
-	aes_padlock(ctx_arg, out, in, CRYPTO_DIR_ENCRYPT);
+	aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE,
+		    CRYPTO_DIR_ENCRYPT, CRYPTO_TFM_MODE_ECB);
 }
 
 static void
 aes_decrypt(void *ctx_arg, uint8_t *out, const uint8_t *in)
 {
-	aes_padlock(ctx_arg, out, in, CRYPTO_DIR_DECRYPT);
+	aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE,
+		    CRYPTO_DIR_DECRYPT, CRYPTO_TFM_MODE_ECB);
 }
 
 static struct crypto_alg aes_alg = {
@@ -454,9 +594,25 @@ static struct crypto_alg aes_alg = {
 	}
 };
 
+static int disable_multiblock = 0;
+MODULE_PARM(disable_multiblock, "i");
+MODULE_PARM_DESC(disable_multiblock,
+		 "Disable encryption of whole multiblock buffers.");
+
 int __init padlock_init_aes(void)
 {
-	printk(KERN_NOTICE PFX "Using VIA PadLock ACE for AES algorithm.\n");
+	if (!disable_multiblock) {
+		aes_alg.cra_u.cipher.cia_max_nbytes = (size_t)-1;
+		aes_alg.cra_u.cipher.cia_req_align  = 16;
+		aes_alg.cra_u.cipher.cia_ecb        = aes_padlock_ecb;
+		aes_alg.cra_u.cipher.cia_cbc        = aes_padlock_cbc;
+		aes_alg.cra_u.cipher.cia_cfb        = aes_padlock_cfb;
+		aes_alg.cra_u.cipher.cia_ofb        = aes_padlock_ofb;
+	}
+
+	printk(KERN_NOTICE PFX 
+		"Using VIA PadLock ACE for AES algorithm%s.\n", 
+		disable_multiblock ? "" : " (multiblock)");
 
 	gen_tabs();
 	return crypto_register_alg(&aes_alg);

* Re: [PATCH 2/2] PadLock processing multiple blocks at a time
  2005-01-11 17:08       ` [PATCH 2/2] PadLock processing multiple blocks " Michal Ludvig
@ 2005-01-14  3:05         ` Andrew Morton
  2005-01-14 13:15         ` [PATCH 2/2] CryptoAPI: Update PadLock to process multiple blocks at once Michal Ludvig
From: Andrew Morton @ 2005-01-14  3:05 UTC
  To: Michal Ludvig; +Cc: davem, jmorris, cryptoapi, linux-kernel

Michal Ludvig <michal@logix.cz> wrote:
>
> # 
> # Update to padlock-aes.c that enables processing of the whole 
> # buffer of data at once with the given chaining mode (e.g. CBC).
> # 

Please don't email different patches under the same Subject:.  Choose a
Subject: which is meaningful for each patch.

This one kills gcc-2.95.x:

drivers/crypto/padlock-aes.c: In function `aes_padlock':
drivers/crypto/padlock-aes.c:391: impossible register constraint in `asm'
drivers/crypto/padlock-aes.c:402: impossible register constraint in `asm'
drivers/crypto/padlock-aes.c:413: impossible register constraint in `asm'
drivers/crypto/padlock-aes.c:391: `asm' needs too many reloads


* [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-11 17:08       ` [PATCH 1/2] " Michal Ludvig
@ 2005-01-14 13:10         ` Michal Ludvig
  2005-01-14 14:20           ` Fruhwirth Clemens
From: Michal Ludvig @ 2005-01-14 13:10 UTC
  To: Andrew Morton; +Cc: David S. Miller, jmorris, cryptoapi, linux-kernel

Hi all,

I'm resending this patch with trailing spaces removed per Andrew's 
comment.

This patch extends crypto/cipher.c for offloading whole chaining modes
to e.g. hardware crypto accelerators. It is much faster to let the
hardware do all the chaining if it can do so.

Signed-off-by: Michal Ludvig <michal@logix.cz>

---

 crypto/api.c           |   14 ++
 crypto/cipher.c        |  313 ++++++++++++++++++++++++++++++-------------------
 include/linux/crypto.h |   30 ++++
 3 files changed, 236 insertions(+), 121 deletions(-)


Index: linux-2.6.10/crypto/api.c
===================================================================
--- linux-2.6.10.orig/crypto/api.c	2004-12-24 22:35:39.000000000 +0100
+++ linux-2.6.10/crypto/api.c	2005-01-10 16:37:11.943356651 +0100
@@ -217,6 +217,19 @@ int crypto_alg_available(const char *nam
 	return ret;
 }
 
+void *crypto_aligned_kmalloc(size_t size, int mode, size_t alignment, void **index)
+{
+	char *ptr;
+
+	ptr = kmalloc(size + alignment, mode);
+	*index = ptr;
+	if (alignment > 1 && ((long)ptr & (alignment - 1))) {
+		ptr += alignment - ((long)ptr & (alignment - 1));
+	}
+
+	return ptr;
+}
+
 static int __init init_crypto(void)
 {
 	printk(KERN_INFO "Initializing Cryptographic API\n");
@@ -231,3 +244,4 @@ EXPORT_SYMBOL_GPL(crypto_unregister_alg)
 EXPORT_SYMBOL_GPL(crypto_alloc_tfm);
 EXPORT_SYMBOL_GPL(crypto_free_tfm);
 EXPORT_SYMBOL_GPL(crypto_alg_available);
+EXPORT_SYMBOL_GPL(crypto_aligned_kmalloc);
Index: linux-2.6.10/include/linux/crypto.h
===================================================================
--- linux-2.6.10.orig/include/linux/crypto.h	2005-01-07 17:26:42.000000000 +0100
+++ linux-2.6.10/include/linux/crypto.h	2005-01-10 16:37:52.157648454 +0100
@@ -42,6 +42,7 @@
 #define CRYPTO_TFM_MODE_CBC		0x00000002
 #define CRYPTO_TFM_MODE_CFB		0x00000004
 #define CRYPTO_TFM_MODE_CTR		0x00000008
+#define CRYPTO_TFM_MODE_OFB		0x00000010
 
 #define CRYPTO_TFM_REQ_WEAK_KEY		0x00000100
 #define CRYPTO_TFM_RES_WEAK_KEY		0x00100000
@@ -72,6 +73,18 @@ struct cipher_alg {
 	                  unsigned int keylen, u32 *flags);
 	void (*cia_encrypt)(void *ctx, u8 *dst, const u8 *src);
 	void (*cia_decrypt)(void *ctx, u8 *dst, const u8 *src);
+	size_t cia_max_nbytes;
+	size_t cia_req_align;
+	void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
 };
 
 struct digest_alg {
@@ -124,6 +137,11 @@ int crypto_unregister_alg(struct crypto_
 int crypto_alg_available(const char *name, u32 flags);
 
 /*
+ * Helper function.
+ */
+void *crypto_aligned_kmalloc (size_t size, int mode, size_t alignment, void **index);
+
+/*
  * Transforms: user-instantiated objects which encapsulate algorithms
  * and core processing logic.  Managed via crypto_alloc_tfm() and
  * crypto_free_tfm(), as well as the various helpers below.
@@ -258,6 +276,18 @@ static inline unsigned int crypto_tfm_al
 	return tfm->__crt_alg->cra_digest.dia_digestsize;
 }
 
+static inline unsigned int crypto_tfm_alg_max_nbytes(struct crypto_tfm *tfm)
+{
+	BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER);
+	return tfm->__crt_alg->cra_cipher.cia_max_nbytes;
+}
+
+static inline unsigned int crypto_tfm_alg_req_align(struct crypto_tfm *tfm)
+{
+	BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER);
+	return tfm->__crt_alg->cra_cipher.cia_req_align;
+}
+
 /*
  * API wrappers.
  */
Index: linux-2.6.10/crypto/cipher.c
===================================================================
--- linux-2.6.10.orig/crypto/cipher.c	2004-12-24 22:34:57.000000000 +0100
+++ linux-2.6.10/crypto/cipher.c	2005-01-10 16:37:11.974350710 +0100
@@ -20,7 +20,31 @@
 #include "internal.h"
 #include "scatterwalk.h"
 
+#define CRA_CIPHER(tfm)	(tfm)->__crt_alg->cra_cipher
+
+#define DEF_TFM_FUNCTION(name,mode,encdec,iv)	\
+static int name(struct crypto_tfm *tfm,		\
+                struct scatterlist *dst,	\
+                struct scatterlist *src,	\
+		unsigned int nbytes)		\
+{						\
+	return crypt(tfm, dst, src, nbytes,	\
+		     mode, encdec, iv);		\
+}
+
+#define DEF_TFM_FUNCTION_IV(name,mode,encdec,iv)	\
+static int name(struct crypto_tfm *tfm,		\
+                struct scatterlist *dst,	\
+                struct scatterlist *src,	\
+		unsigned int nbytes, u8 *iv)	\
+{						\
+	return crypt(tfm, dst, src, nbytes,	\
+		     mode, encdec, iv);		\
+}
+
 typedef void (cryptfn_t)(void *, u8 *, const u8 *);
+typedef void (cryptblkfn_t)(void *, u8 *, const u8 *, u8 *,
+			    size_t, int, int);
 typedef void (procfn_t)(struct crypto_tfm *, u8 *,
                         u8*, cryptfn_t, int enc, void *, int);
 
@@ -38,6 +62,36 @@ static inline void xor_128(u8 *a, const 
 	((u32 *)a)[3] ^= ((u32 *)b)[3];
 }
 
+static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
+			cryptfn_t *fn, int enc, void *info, int in_place)
+{
+	u8 *iv = info;
+
+	/* Null encryption */
+	if (!iv)
+		return;
+
+	if (enc) {
+		tfm->crt_u.cipher.cit_xor_block(iv, src);
+		(*fn)(crypto_tfm_ctx(tfm), dst, iv);
+		memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm));
+	} else {
+		u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0];
+		u8 *buf = in_place ? stack : dst;
+
+		(*fn)(crypto_tfm_ctx(tfm), buf, src);
+		tfm->crt_u.cipher.cit_xor_block(buf, iv);
+		memcpy(iv, src, crypto_tfm_alg_blocksize(tfm));
+		if (buf != dst)
+			memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm));
+	}
+}
+
+static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
+			cryptfn_t fn, int enc, void *info, int in_place)
+{
+	(*fn)(crypto_tfm_ctx(tfm), dst, src);
+}
 
 /*
  * Generic encrypt/decrypt wrapper for ciphers, handles operations across
@@ -47,22 +101,101 @@ static inline void xor_128(u8 *a, const 
 static int crypt(struct crypto_tfm *tfm,
 		 struct scatterlist *dst,
 		 struct scatterlist *src,
-                 unsigned int nbytes, cryptfn_t crfn,
-                 procfn_t prfn, int enc, void *info)
+		 unsigned int nbytes,
+		 int mode, int enc, void *info)
 {
-	struct scatter_walk walk_in, walk_out;
-	const unsigned int bsize = crypto_tfm_alg_blocksize(tfm);
-	u8 tmp_src[bsize];
-	u8 tmp_dst[bsize];
+ 	cryptfn_t *cryptofn = NULL;
+ 	procfn_t *processfn = NULL;
+ 	cryptblkfn_t *cryptomultiblockfn = NULL;
+
+ 	struct scatter_walk walk_in, walk_out;
+ 	size_t max_nbytes = crypto_tfm_alg_max_nbytes(tfm);
+ 	size_t bsize = crypto_tfm_alg_blocksize(tfm);
+ 	int req_align = crypto_tfm_alg_req_align(tfm);
+ 	int ret = 0;
+	int gfp;
+ 	void *index_src = NULL, *index_dst = NULL;
+ 	u8 *iv = info;
+ 	u8 *tmp_src, *tmp_dst;
 
 	if (!nbytes)
-		return 0;
+		return ret;
 
 	if (nbytes % bsize) {
 		tfm->crt_flags |= CRYPTO_TFM_RES_BAD_BLOCK_LEN;
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 
+
+ 	switch (mode) {
+ 		case CRYPTO_TFM_MODE_ECB:
+ 			if (CRA_CIPHER(tfm).cia_ecb)
+ 				cryptomultiblockfn = CRA_CIPHER(tfm).cia_ecb;
+ 			else {
+ 				cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ?
+						CRA_CIPHER(tfm).cia_encrypt :
+						CRA_CIPHER(tfm).cia_decrypt;
+ 				processfn = ecb_process;
+ 			}
+ 			break;
+
+ 		case CRYPTO_TFM_MODE_CBC:
+ 			if (CRA_CIPHER(tfm).cia_cbc)
+ 				cryptomultiblockfn = CRA_CIPHER(tfm).cia_cbc;
+ 			else {
+ 				cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ?
+						CRA_CIPHER(tfm).cia_encrypt :
+						CRA_CIPHER(tfm).cia_decrypt;
+ 				processfn = cbc_process;
+ 			}
+ 			break;
+
+		/* Until we have the appropriate {ofb,cfb,ctr}_process()
+		   functions, the following cases will return -ENOSYS if
+		   there is no HW support for the mode. */
+ 		case CRYPTO_TFM_MODE_OFB:
+ 			if (CRA_CIPHER(tfm).cia_ofb)
+ 				cryptomultiblockfn = CRA_CIPHER(tfm).cia_ofb;
+ 			else
+ 				return -ENOSYS;
+ 			break;
+
+ 		case CRYPTO_TFM_MODE_CFB:
+ 			if (CRA_CIPHER(tfm).cia_cfb)
+ 				cryptomultiblockfn = CRA_CIPHER(tfm).cia_cfb;
+ 			else
+ 				return -ENOSYS;
+ 			break;
+
+ 		case CRYPTO_TFM_MODE_CTR:
+ 			if (CRA_CIPHER(tfm).cia_ctr)
+ 				cryptomultiblockfn = CRA_CIPHER(tfm).cia_ctr;
+ 			else
+ 				return -ENOSYS;
+ 			break;
+
+ 		default:
+ 			BUG();
+ 	}
+
+	if (cryptomultiblockfn)
+		bsize = (max_nbytes > nbytes) ? nbytes : max_nbytes;
+
+ 	/* Some hardware crypto engines may require a specific
+ 	   alignment of the buffers. We will align the buffers
+ 	   already here to avoid their reallocating later. */
+	gfp = in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
+	tmp_src = crypto_aligned_kmalloc(bsize, gfp,
+					 req_align, &index_src);
+	tmp_dst = crypto_aligned_kmalloc(bsize, gfp,
+					 req_align, &index_dst);
+
+ 	if (!index_src || !index_dst) {
+		ret = -ENOMEM;
+		goto out;
+  	}
+
 	scatterwalk_start(&walk_in, src);
 	scatterwalk_start(&walk_out, dst);
 
@@ -81,7 +214,13 @@ static int crypt(struct crypto_tfm *tfm,
 
 		scatterwalk_copychunks(src_p, &walk_in, bsize, 0);
 
-		prfn(tfm, dst_p, src_p, crfn, enc, info, in_place);
+ 		if (cryptomultiblockfn)
+ 			(*cryptomultiblockfn)(crypto_tfm_ctx(tfm),
+					      dst_p, src_p, iv,
+					      bsize, enc, in_place);
+ 		else
+ 			(*processfn)(tfm, dst_p, src_p, cryptofn,
+				     enc, info, in_place);
 
 		scatterwalk_done(&walk_in, 0, nbytes);
 
@@ -89,46 +228,23 @@ static int crypt(struct crypto_tfm *tfm,
 		scatterwalk_done(&walk_out, 1, nbytes);
 
 		if (!nbytes)
-			return 0;
+			goto out;
 
 		crypto_yield(tfm);
 	}
-}
-
-static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
-			cryptfn_t fn, int enc, void *info, int in_place)
-{
-	u8 *iv = info;
-	
-	/* Null encryption */
-	if (!iv)
-		return;
-		
-	if (enc) {
-		tfm->crt_u.cipher.cit_xor_block(iv, src);
-		fn(crypto_tfm_ctx(tfm), dst, iv);
-		memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm));
-	} else {
-		u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0];
-		u8 *buf = in_place ? stack : dst;
 
-		fn(crypto_tfm_ctx(tfm), buf, src);
-		tfm->crt_u.cipher.cit_xor_block(buf, iv);
-		memcpy(iv, src, crypto_tfm_alg_blocksize(tfm));
-		if (buf != dst)
-			memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm));
-	}
-}
+out:
+	if (index_src)
+		kfree(index_src);
+	if (index_dst)
+		kfree(index_dst);
 
-static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src,
-			cryptfn_t fn, int enc, void *info, int in_place)
-{
-	fn(crypto_tfm_ctx(tfm), dst, src);
+	return ret;
 }
 
 static int setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen)
 {
-	struct cipher_alg *cia = &tfm->__crt_alg->cra_cipher;
+	struct cipher_alg *cia = &CRA_CIPHER(tfm);
 	
 	if (keylen < cia->cia_min_keysize || keylen > cia->cia_max_keysize) {
 		tfm->crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
@@ -138,80 +254,28 @@ static int setkey(struct crypto_tfm *tfm
 		                       &tfm->crt_flags);
 }
 
-static int ecb_encrypt(struct crypto_tfm *tfm,
-		       struct scatterlist *dst,
-                       struct scatterlist *src, unsigned int nbytes)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_encrypt,
-	             ecb_process, 1, NULL);
-}
+DEF_TFM_FUNCTION(ecb_encrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_ENCRYPT, NULL);
+DEF_TFM_FUNCTION(ecb_decrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_DECRYPT, NULL);
 
-static int ecb_decrypt(struct crypto_tfm *tfm,
-                       struct scatterlist *dst,
-                       struct scatterlist *src,
-		       unsigned int nbytes)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_decrypt,
-	             ecb_process, 1, NULL);
-}
-
-static int cbc_encrypt(struct crypto_tfm *tfm,
-                       struct scatterlist *dst,
-                       struct scatterlist *src,
-		       unsigned int nbytes)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_encrypt,
-	             cbc_process, 1, tfm->crt_cipher.cit_iv);
-}
-
-static int cbc_encrypt_iv(struct crypto_tfm *tfm,
-                          struct scatterlist *dst,
-                          struct scatterlist *src,
-                          unsigned int nbytes, u8 *iv)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_encrypt,
-	             cbc_process, 1, iv);
-}
-
-static int cbc_decrypt(struct crypto_tfm *tfm,
-                       struct scatterlist *dst,
-                       struct scatterlist *src,
-		       unsigned int nbytes)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_decrypt,
-	             cbc_process, 0, tfm->crt_cipher.cit_iv);
-}
-
-static int cbc_decrypt_iv(struct crypto_tfm *tfm,
-                          struct scatterlist *dst,
-                          struct scatterlist *src,
-                          unsigned int nbytes, u8 *iv)
-{
-	return crypt(tfm, dst, src, nbytes,
-	             tfm->__crt_alg->cra_cipher.cia_decrypt,
-	             cbc_process, 0, iv);
-}
-
-static int nocrypt(struct crypto_tfm *tfm,
-                   struct scatterlist *dst,
-                   struct scatterlist *src,
-		   unsigned int nbytes)
-{
-	return -ENOSYS;
-}
-
-static int nocrypt_iv(struct crypto_tfm *tfm,
-                      struct scatterlist *dst,
-                      struct scatterlist *src,
-                      unsigned int nbytes, u8 *iv)
-{
-	return -ENOSYS;
-}
+DEF_TFM_FUNCTION(cbc_encrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cbc_encrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(cbc_decrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cbc_decrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(cfb_encrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cfb_encrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(cfb_decrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(cfb_decrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(ofb_encrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ofb_encrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(ofb_decrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ofb_decrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, iv);
+
+DEF_TFM_FUNCTION(ctr_encrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ctr_encrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, iv);
+DEF_TFM_FUNCTION(ctr_decrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv);
+DEF_TFM_FUNCTION_IV(ctr_decrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, iv);
 
 int crypto_init_cipher_flags(struct crypto_tfm *tfm, u32 flags)
 {
@@ -245,17 +309,24 @@ int crypto_init_cipher_ops(struct crypto
 		break;
 		
 	case CRYPTO_TFM_MODE_CFB:
-		ops->cit_encrypt = nocrypt;
-		ops->cit_decrypt = nocrypt;
-		ops->cit_encrypt_iv = nocrypt_iv;
-		ops->cit_decrypt_iv = nocrypt_iv;
+		ops->cit_encrypt = cfb_encrypt;
+		ops->cit_decrypt = cfb_decrypt;
+		ops->cit_encrypt_iv = cfb_encrypt_iv;
+		ops->cit_decrypt_iv = cfb_decrypt_iv;
+		break;
+
+	case CRYPTO_TFM_MODE_OFB:
+		ops->cit_encrypt = ofb_encrypt;
+		ops->cit_decrypt = ofb_decrypt;
+		ops->cit_encrypt_iv = ofb_encrypt_iv;
+		ops->cit_decrypt_iv = ofb_decrypt_iv;
 		break;
 	
 	case CRYPTO_TFM_MODE_CTR:
-		ops->cit_encrypt = nocrypt;
-		ops->cit_decrypt = nocrypt;
-		ops->cit_encrypt_iv = nocrypt_iv;
-		ops->cit_decrypt_iv = nocrypt_iv;
+		ops->cit_encrypt = ctr_encrypt;
+		ops->cit_decrypt = ctr_decrypt;
+		ops->cit_encrypt_iv = ctr_encrypt_iv;
+		ops->cit_decrypt_iv = ctr_decrypt_iv;
 		break;
 
 	default:

* [PATCH 2/2] CryptoAPI: Update PadLock to process multiple blocks at once
  2005-01-11 17:08       ` [PATCH 2/2] PadLock processing multiple blocks " Michal Ludvig
  2005-01-14  3:05         ` Andrew Morton
@ 2005-01-14 13:15         ` Michal Ludvig
From: Michal Ludvig @ 2005-01-14 13:15 UTC
  To: Andrew Morton; +Cc: David S. Miller, jmorris, cryptoapi, linux-kernel

Hi all,

Update to padlock-aes.c that enables processing of the whole buffer of
data at once with the given chaining mode (e.g. CBC). It brings a much
higher speed than when the chaining is done in software by the
CryptoAPI.

This is an updated revision of the patch. Now it compiles even with GCC
2.95.3.
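
Against the previous version, the xcrypt wrappers drop the "=m"(*output)
output operands, which gcc-2.95.x rejected ("impossible register
constraint in `asm'"). Condensed (xcryptcbc shown; cfb/ofb are
analogous):

	/* before, rejected by gcc-2.95.x: */
	asm volatile (".byte 0xf3,0x0f,0xa7,0xd0"	/* rep xcryptcbc */
		      : "=m"(*output), "+S"(input), "+D"(output), "+a"(iv)
		      : "d"(control_word), "b"(key), "c"(count));

	/* after: */
	asm volatile (".byte 0xf3,0x0f,0xa7,0xd0"	/* rep xcryptcbc */
		      : "+S"(input), "+D"(output), "+a"(iv)
		      : "d"(control_word), "b"(key), "c"(count));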

Signed-off-by: Michal Ludvig <michal@logix.cz>

---

 padlock-aes.c |  176 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 166 insertions(+), 10 deletions(-)

Index: linux-2.6.10/drivers/crypto/padlock-aes.c
===================================================================
--- linux-2.6.10.orig/drivers/crypto/padlock-aes.c	2005-01-11 14:01:05.000000000 +0100
+++ linux-2.6.10/drivers/crypto/padlock-aes.c	2005-01-11 23:40:26.000000000 +0100
@@ -369,19 +369,54 @@ aes_set_key(void *ctx_arg, const uint8_t
 
 /* ====== Encryption/decryption routines ====== */
 
-/* This is the real call to PadLock. */
-static inline void
+/* These are the real calls to PadLock. */
+static inline void *
 padlock_xcrypt_ecb(uint8_t *input, uint8_t *output, uint8_t *key,
-		   void *control_word, uint32_t count)
+		   uint8_t *iv, void *control_word, uint32_t count)
 {
 	asm volatile ("pushfl; popfl");		/* enforce key reload. */
 	asm volatile (".byte 0xf3,0x0f,0xa7,0xc8"	/* rep xcryptecb */
 		      : "+S"(input), "+D"(output)
 		      : "d"(control_word), "b"(key), "c"(count));
+	return NULL;
+}
+
+static inline void *
+padlock_xcrypt_cbc(uint8_t *input, uint8_t *output, uint8_t *key,
+		   uint8_t *iv, void *control_word, uint32_t count)
+{
+	asm volatile ("pushfl; popfl");		/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xd0"	/* rep xcryptcbc */
+		      : "+S"(input), "+D"(output), "+a"(iv)
+		      : "d"(control_word), "b"(key), "c"(count));
+	return iv;
+}
+
+static inline void *
+padlock_xcrypt_cfb(uint8_t *input, uint8_t *output, uint8_t *key,
+		   uint8_t *iv, void *control_word, uint32_t count)
+{
+	asm volatile ("pushfl; popfl");		/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xe0"	/* rep xcryptcfb */
+		      : "+S"(input), "+D"(output), "+a"(iv)
+		      : "d"(control_word), "b"(key), "c"(count));
+	return iv;
+}
+
+static inline void *
+padlock_xcrypt_ofb(uint8_t *input, uint8_t *output, uint8_t *key,
+		   uint8_t *iv, void *control_word, uint32_t count)
+{
+	asm volatile ("pushfl; popfl");		/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xe8"	/* rep xcryptofb */
+		      : "+S"(input), "+D"(output), "+a"(iv)
+		      : "d"(control_word), "b"(key), "c"(count));
+	return iv;
 }
 
 static void
-aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg, int encdec)
+aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg,
+	    uint8_t *iv_arg, size_t nbytes, int encdec, int mode)
 {
 	/* Don't blindly modify this structure - the items must 
 	   fit on 16-Bytes boundaries! */
@@ -419,21 +454,126 @@ aes_padlock(void *ctx_arg, uint8_t *out_
 	else
 		key = ctx->D;
 	
-	memcpy(data->buf, in_arg, AES_BLOCK_SIZE);
-	padlock_xcrypt_ecb(data->buf, data->buf, key, &data->cword, 1);
-	memcpy(out_arg, data->buf, AES_BLOCK_SIZE);
+	if (nbytes == AES_BLOCK_SIZE) {
+		/* Processing one block only => ECB is enough */
+		memcpy(data->buf, in_arg, AES_BLOCK_SIZE);
+		padlock_xcrypt_ecb(data->buf, data->buf, key, NULL,
+				   &data->cword, 1);
+		memcpy(out_arg, data->buf, AES_BLOCK_SIZE);
+	} else {
+		/* Processing multiple blocks at once */
+		uint8_t *in, *out, *iv;
+		int gfp = in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
+		void *index = NULL;
+
+		if (unlikely(((long)in_arg) & 0x0F)) {
+			in = crypto_aligned_kmalloc(nbytes, gfp, 16, &index);
+			memcpy(in, in_arg, nbytes);
+		}
+		else
+			in = (uint8_t*)in_arg;
+
+		if (unlikely(((long)out_arg) & 0x0F)) {
+			if (index)
+				out = in;	/* xcrypt can work "in place" */
+			else
+				out = crypto_aligned_kmalloc(nbytes, gfp, 16,
+							     &index);
+		}
+		else
+			out = out_arg;
+
+		/* Always make a local copy of IV - xcrypt may change it! */
+		iv = data->buf;
+		if (iv_arg)
+			memcpy(iv, iv_arg, AES_BLOCK_SIZE);
+
+		switch (mode) {
+			case CRYPTO_TFM_MODE_ECB:
+				iv = padlock_xcrypt_ecb(in, out, key, iv,
+							&data->cword,
+							nbytes/AES_BLOCK_SIZE);
+				break;
+
+			case CRYPTO_TFM_MODE_CBC:
+				iv = padlock_xcrypt_cbc(in, out, key, iv,
+							&data->cword,
+							nbytes/AES_BLOCK_SIZE);
+				break;
+
+			case CRYPTO_TFM_MODE_CFB:
+				iv = padlock_xcrypt_cfb(in, out, key, iv,
+							&data->cword,
+							nbytes/AES_BLOCK_SIZE);
+				break;
+
+			case CRYPTO_TFM_MODE_OFB:
+				iv = padlock_xcrypt_ofb(in, out, key, iv,
+							&data->cword,
+							nbytes/AES_BLOCK_SIZE);
+				break;
+
+			default:
+				BUG();
+		}
+
+		/* Back up IV */
+		if (iv && iv_arg)
+			memcpy(iv_arg, iv, AES_BLOCK_SIZE);
+
+		/* Copy the 16-Byte aligned output to the caller's buffer. */
+		if (out != out_arg)
+			memcpy(out_arg, out, nbytes);
+
+		if (index)
+			kfree(index);
+	}
+}
+
+static void
+aes_padlock_ecb(void *ctx, uint8_t *dst, const uint8_t *src,
+		uint8_t *iv, size_t nbytes, int encdec, int inplace)
+{
+	aes_padlock(ctx, dst, src, NULL, nbytes, encdec,
+		    CRYPTO_TFM_MODE_ECB);
+}
+
+static void
+aes_padlock_cbc(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+		size_t nbytes, int encdec, int inplace)
+{
+	aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+		    CRYPTO_TFM_MODE_CBC);
+}
+
+static void
+aes_padlock_cfb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+		size_t nbytes, int encdec, int inplace)
+{
+	aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+		    CRYPTO_TFM_MODE_CFB);
+}
+
+static void
+aes_padlock_ofb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
+		size_t nbytes, int encdec, int inplace)
+{
+	aes_padlock(ctx, dst, src, iv, nbytes, encdec,
+		    CRYPTO_TFM_MODE_OFB);
 }
 
 static void
 aes_encrypt(void *ctx_arg, uint8_t *out, const uint8_t *in)
 {
-	aes_padlock(ctx_arg, out, in, CRYPTO_DIR_ENCRYPT);
+	aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE,
+		    CRYPTO_DIR_ENCRYPT, CRYPTO_TFM_MODE_ECB);
 }
 
 static void
 aes_decrypt(void *ctx_arg, uint8_t *out, const uint8_t *in)
 {
-	aes_padlock(ctx_arg, out, in, CRYPTO_DIR_DECRYPT);
+	aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE,
+		    CRYPTO_DIR_DECRYPT, CRYPTO_TFM_MODE_ECB);
 }
 
 static struct crypto_alg aes_alg = {
@@ -454,9 +594,25 @@ static struct crypto_alg aes_alg = {
 	}
 };
 
+static int disable_multiblock = 0;
+MODULE_PARM(disable_multiblock, "i");
+MODULE_PARM_DESC(disable_multiblock,
+		 "Disable encryption of whole multiblock buffers.");
+
 int __init padlock_init_aes(void)
 {
-	printk(KERN_NOTICE PFX "Using VIA PadLock ACE for AES algorithm.\n");
+	if (!disable_multiblock) {
+		aes_alg.cra_u.cipher.cia_max_nbytes = (size_t)-1;
+		aes_alg.cra_u.cipher.cia_req_align  = 16;
+		aes_alg.cra_u.cipher.cia_ecb        = aes_padlock_ecb;
+		aes_alg.cra_u.cipher.cia_cbc        = aes_padlock_cbc;
+		aes_alg.cra_u.cipher.cia_cfb        = aes_padlock_cfb;
+		aes_alg.cra_u.cipher.cia_ofb        = aes_padlock_ofb;
+	}
+
+	printk(KERN_NOTICE PFX
+		"Using VIA PadLock ACE for AES algorithm%s.\n",
+		disable_multiblock ? "" : " (multiblock)");
 
 	gen_tabs();
 	return crypto_register_alg(&aes_alg);

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-14 13:10         ` [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers " Michal Ludvig
@ 2005-01-14 14:20           ` Fruhwirth Clemens
  2005-01-14 16:40             ` Michal Ludvig
From: Fruhwirth Clemens @ 2005-01-14 14:20 UTC
  To: Michal Ludvig
  Cc: Andrew Morton, James Morris, cryptoapi, David S. Miller, linux-kernel

On Fri, 2005-01-14 at 14:10 +0100, Michal Ludvig wrote:

> This patch extends crypto/cipher.c for offloading whole chaining modes
> to e.g. hardware crypto accelerators. It is much faster to let the
> hardware do all the chaining if it can do so.

Is there any connection to Evgeniy Polyakov's acrypto work? It appears
that there are two projects with one objective. It would be nice to see
both parties pulling on the same string.

> +	void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> +			size_t nbytes, int encdec, int inplace);
> +	void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> +			size_t nbytes, int encdec, int inplace);
> +	void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> +			size_t nbytes, int encdec, int inplace);
> +	void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> +			size_t nbytes, int encdec, int inplace);
> +	void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> +			size_t nbytes, int encdec, int inplace);

What's the use of adding mode-specific functions to the tfm struct? And
why do they all have the same function type? For instance, the "iv" or
"inplace" argument is meaningless for ECB.

Have a look at
http://clemens.endorphin.org/patches/lrw/2-tweakable-cipher-interface.diff

This patch takes the following approach to handle the 
cipher mode/interface issue:

Every mode is associated with one or more interfaces. This interface is
either cit_encrypt, cit_encrypt_iv, or cit_encrypt_tweaks. How these
interfaces are associated with cipher modes is handled in
crypto_init_cipher_flags. 

Except for CBC, every mode associates with just one interface. In CBC,
the CryptoAPI caller can use the IV interface to supply an IV, or use
the current tfm's IV by using cit_encrypt instead of cit_encrypt_iv.

I don't see the gain in throwing dozens of pointers into the tfm, as a
tfm is always assigned a single mode.
 
>  /*
>   * Generic encrypt/decrypt wrapper for ciphers, handles operations across
> @@ -47,22 +101,101 @@ static inline void xor_128(u8 *a, const 
>  static int crypt(struct crypto_tfm *tfm,
>  		 struct scatterlist *dst,
>  		 struct scatterlist *src,
> -                 unsigned int nbytes, cryptfn_t crfn,
> -                 procfn_t prfn, int enc, void *info)

Your patch heavily interferes with my cleanup patch for crypt(..). To
put it briefly, I consider crypt(..) a mess. The function definitions of
crypt(..) and the procfn_t functions are just a patchwork of stuff,
added as needed.

I've written a generic scatterwalker, a generic replacement for
crypt(..), that can apply any processing function with an arbitrary
argument list to the data associated with a set of scatterlists. I
think this function shouldn't be in crypto/ but in some more generic
location, as it could be useful for many more things.

http://clemens.endorphin.org/patches/lrw/1-generic-scatterwalker.diff
is the generic scatterwalk patch. 

int scatterwalk_walker_generic(void (function)(void *priv, int length,
void **buflist), void *priv, int steps, int nsl, ...) 

"function" is applied to the scatterlist data. 
"priv" is a private data structure for bookkeeping. It's supplied to the
function as the first parameter.
"steps" is the number of times function is called.
"nsl" is the number of scatterlists following.

After "nsl", the scatterlists follow in a tuple of data:
<struct scatterlist *, int steplength, int ioflag>

ECB, for example:
	...
struct ecb_process_priv priv = { 
	.tfm = tfm,
	.crfn = tfm->__crt_alg->cra_cipher.cia_decrypt,
};
int bsize = crypto_tfm_alg_blocksize(tfm);
scatterwalk_walker_generic(ecb_process_gw, 	// processing function
	&priv,		// private data
	nbytes/bsize,	// number of steps
	2, 		// number of scatterlists
	dst, bsize, 1, 	// first, ioflag set to output
	src, bsize, 0);	// second, ioflag set to input

..
static void ecb_process_gw(void *_priv, int nsg, void **buf) 
{
	struct ecb_process_priv *priv = (struct ecb_process_priv *)_priv;
	char *dst = buf[0];	// pointer to correctly kmapped and copied dst
	char *src = buf[1];	// pointer to correctly kmapped and copied src
	priv->crfn(crypto_tfm_ctx(priv->tfm), dst, src);
}

Well, I recognize that I'm somewhat off-topic now. But it demonstrates
clearly why we should get rid of crypt(..) and replace it with
something more generic.

-- 
Fruhwirth Clemens <clemens@endorphin.org>  http://clemens.endorphin.org

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-14 14:20           ` Fruhwirth Clemens
@ 2005-01-14 16:40             ` Michal Ludvig
  2005-01-15 12:45               ` Fruhwirth Clemens
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Ludvig @ 2005-01-14 16:40 UTC (permalink / raw)
  To: Fruhwirth Clemens
  Cc: Andrew Morton, James Morris, cryptoapi, David S. Miller, linux-kernel

On Fri, 14 Jan 2005, Fruhwirth Clemens wrote:

> On Fri, 2005-01-14 at 14:10 +0100, Michal Ludvig wrote:
> 
> > This patch extends crypto/cipher.c for offloading the whole chaining modes
> > to e.g. hardware crypto accelerators. It is much faster to let the 
> > hardware do all the chaining if it can do so.
> 
> Is there any connection to Evgeniy Polyakov's acrypto work? It appears
> that there are two projects with one objective. It would be nice to see both
> parties pulling in the same direction.

These projects do not compete at all. Evgeniy's work is a complete 
replacement for the current cryptoapi, with asynchronous operation as its 
primary goal. My patches are simple, straightforward extensions to the 
current cryptoapi that enable offloading the chaining modes to hardware 
where possible.

> > +	void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > +			size_t nbytes, int encdec, int inplace);
> > +	void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > +			size_t nbytes, int encdec, int inplace);
> > +	void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > +			size_t nbytes, int encdec, int inplace);
> > +	void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > +			size_t nbytes, int encdec, int inplace);
> > +	void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > +			size_t nbytes, int encdec, int inplace);
> 
> What's the use of adding mode specific functions to the tfm struct? And
> why do they all have the same function type? For instance, the "iv" or
> "inplace" argument is meaningless for ECB.

The prototypes must be the same in my implementation, because crypt() 
only takes a pointer to the appropriate mode function and then calls it 
as "(*func)(arg, arg, ...)".

BTW these functions are not added to "struct crypto_tfm", but to "struct 
crypto_alg", which describes what a particular module supports (along 
with the block size, algorithm name, etc.). In this case it can say that, 
e.g., padlock.ko supports encryption in CBC mode in addition to the common 
single-block processing.

BTW I'll look at the tweakable API links over the weekend...

Michal Ludvig
-- 
* A mouse is a device used to point at the xterm you want to type in.
* Personal homepage - http://www.logix.cz/michal

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-14 16:40             ` Michal Ludvig
@ 2005-01-15 12:45               ` Fruhwirth Clemens
  2005-01-18 16:49                 ` James Morris
  0 siblings, 1 reply; 18+ messages in thread
From: Fruhwirth Clemens @ 2005-01-15 12:45 UTC (permalink / raw)
  To: Michal Ludvig
  Cc: Andrew Morton, James Morris, cryptoapi, David S. Miller, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4730 bytes --]

On Fri, 2005-01-14 at 17:40 +0100, Michal Ludvig wrote: 
> > Is there any connection to Evgeniy Polyakov's acrypto work? It appears
> > that there are two projects with one objective. It would be nice to see both
> > parties pulling in the same direction.
> 
> These projects do not compete at all. Evgeniy's work is a complete 
> replacement for the current cryptoapi, with asynchronous operation as its 
> primary goal. My patches are simple, straightforward extensions to the 
> current cryptoapi that enable offloading the chaining modes to hardware 
> where possible.

Fine, I just saw in Evgeniy's reply that he took your padlock
implementation. I thought the two of you had been working on different
implementations.

But actually both aim for the same goal: hardware crypto-offloading.
With PadLock the need for an async interface isn't that big, because it's
not really "off-loading": the work is done on the same chip and in the same
thread.

However, developing two different APIs isn't particularly efficient. I
know that at the moment there isn't much choice, as J. Morris hasn't
committed to acrypto in any way. But I think it would be good to replace
the synchronous CryptoAPI implementation altogether, put the missing
internals of CryptoAPI into acrypto, and back the interfaces of
CryptoAPI with small stubs that do something like:

somereturnvalue synchronized_interface(..) {
	acrypto_kick_some_operation(acrypto_struct);
	wait_for_completion(acrypto_struct);
	return fetch_result(acrypto_struct);
}
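Fleshed out slightly with real kernel primitives (a sketch only; the
->completion field and the acrypto_* calls are assumptions about acrypto's
interface, not actual API):

#include <linux/completion.h>

static int synchronized_interface(struct crypto_session *s)
{
	DECLARE_COMPLETION(done);

	s->completion = &done;		/* assumed field: lets ->callback() signal us */
	acrypto_kick_some_operation(s);	/* queue the asynchronous request */
	wait_for_completion(&done);	/* sleep until the callback fires */
	return fetch_result(s);
}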

The other way round, an asynchronous interface layered on top of a
synchronous one, doesn't seem natural to me.
(That doesn't mean I oppose your patches, merely that we should start to
think in different directions.)

> > > +	void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > +			size_t nbytes, int encdec, int inplace);
> > > +	void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > +			size_t nbytes, int encdec, int inplace);
> > > +	void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > +			size_t nbytes, int encdec, int inplace);
> > > +	void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > +			size_t nbytes, int encdec, int inplace);
> > > +	void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > +			size_t nbytes, int encdec, int inplace);
> > 
> > What's the use of adding mode specific functions to the tfm struct? And
> > why do they all have the same function type? For instance, the "iv" or
> > "inplace" argument is meaningless for ECB.
> 
> The prototypes must be the same in my implementation, because crypt() 
> only takes a pointer to the appropriate mode function and then calls it 
> as "(*func)(arg, arg, ...)".
> 
> BTW these functions are not added to "struct crypto_tfm", but to "struct 
> crypto_alg", which describes what a particular module supports (along 
> with the block size, algorithm name, etc.). In this case it can say that, 
> e.g., padlock.ko supports encryption in CBC mode in addition to the common 
> single-block processing.

Err, right. I overlooked that it's cia and not cit. However, I don't
like the idea of extending the struct whenever a new cipher mode appears.
I think the API should not have to be extended for every addition, but
should be designed for such extensions right from the start.

What about a "selector" function that returns the appropriate
encryption function for a given mode?

typedef void (procfn_t)(struct crypto_tfm *, u8 *,
                        u8 *, cryptfn_t, int enc, void *, int);

put
	procfn_t *(*cia_modesel)(u32 function, int iface, int encdec);
into struct crypto_alg;

then in crypto_init_cipher_ops, instead of

	switch (tfm->crt_cipher.cit_mode) {
	..
	case CRYPTO_TFM_MODE_CFB:
		ops->cit_encrypt = cfb_encrypt;
		ops->cit_decrypt = cfb_decrypt;
	..
	}

we do

	struct cipher_alg *cia = &tfm->__crt_alg->cra_cipher;

	switch (tfm->crt_cipher.cit_mode) {
	..
	case CRYPTO_TFM_MODE_CFB:
		ops->cit_encrypt    = cia->cia_modesel(cit_mode, IFACE_ECB, 0);
		ops->cit_decrypt    = cia->cia_modesel(cit_mode, IFACE_ECB, 1);
		ops->cit_encrypt_iv = cia->cia_modesel(cit_mode, IFACE_IV, 0);
		ops->cit_decrypt_iv = cia->cia_modesel(cit_mode, IFACE_IV, 1);
	..

Alternatively, we could add a lookup table. But I like this approach
better, since it is much easier for people to read, and tfms aren't
allocated that often.

We could probably add a wrapper for cia_modesel that falls back to the
old behaviour when cia_modesel is NULL. That way, we wouldn't have to
patch all algorithm implementations to include cia_modesel.
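Roughly (a sketch; crypto_sw_modesel() is a made-up name standing in for
whatever returns the existing software mode implementations):

static procfn_t *crypto_modesel(struct crypto_tfm *tfm,
				u32 mode, int iface, int encdec)
{
	struct cipher_alg *cia = &tfm->__crt_alg->cra_cipher;

	if (cia->cia_modesel)
		return cia->cia_modesel(mode, iface, encdec);
	/* fall back to the old behaviour */
	return crypto_sw_modesel(mode, iface, encdec);
}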

How do you like that idea?

-- 
Fruhwirth Clemens <clemens@endorphin.org>  http://clemens.endorphin.org

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-15 12:45               ` Fruhwirth Clemens
@ 2005-01-18 16:49                 ` James Morris
  2005-01-20  3:30                   ` David McCullough
  0 siblings, 1 reply; 18+ messages in thread
From: James Morris @ 2005-01-18 16:49 UTC (permalink / raw)
  To: Fruhwirth Clemens
  Cc: Michal Ludvig, Andrew Morton, cryptoapi, David S. Miller, linux-kernel

On Sat, 15 Jan 2005, Fruhwirth Clemens wrote:

> However, developing two different APIs isn't particularly efficient. I
> know that at the moment there isn't much choice, as J. Morris hasn't
> committed to acrypto in any way.

There is also the OCF port (OpenBSD crypto framework) to consider, if 
permission to dual-license from the original authors can be obtained.


- James
-- 
James Morris
<jmorris@redhat.com>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-18 16:49                 ` James Morris
@ 2005-01-20  3:30                   ` David McCullough
  2005-01-20 13:47                     ` James Morris
  0 siblings, 1 reply; 18+ messages in thread
From: David McCullough @ 2005-01-20  3:30 UTC (permalink / raw)
  To: James Morris
  Cc: Fruhwirth Clemens, Andrew Morton, linux-kernel, cryptoapi,
	Michal Ludvig, David S. Miller


Jivin James Morris lays it down ...
> On Sat, 15 Jan 2005, Fruhwirth Clemens wrote:
> 
> > However, developing two different APIs isn't particularly efficient. I
> > know that at the moment there isn't much choice, as J. Morris hasn't
> > committed to acrypto in any way.
> 
> There is also the OCF port (OpenBSD crypto framework) to consider, if 
> permission to dual-license from the original authors can be obtained.

For anyone looking for the OCF port for Linux, you can find the latest
release here:

	http://lists.logix.cz/pipermail/cryptoapi/2004/000261.html

One of the drivers uses the existing kernel crypto API to implement
a SW crypto engine for OCF.

As for permission to use a dual license, I will gladly approach the
authors if others feel it is important to know at this point whether that
is possible.

Cheers,
Davidm

-- 
David McCullough, davidm@snapgear.com  Ph:+61 7 34352815 http://www.SnapGear.com
Custom Embedded Solutions + Security   Fx:+61 7 38913630 http://www.uCdot.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-20  3:30                   ` David McCullough
@ 2005-01-20 13:47                     ` James Morris
  2005-03-03 10:50                       ` David McCullough
  0 siblings, 1 reply; 18+ messages in thread
From: James Morris @ 2005-01-20 13:47 UTC (permalink / raw)
  To: David McCullough
  Cc: Fruhwirth Clemens, Andrew Morton, linux-kernel, cryptoapi,
	Michal Ludvig, David S. Miller

On Thu, 20 Jan 2005, David McCullough wrote:

> As for permission to use a dual license, I will gladly approach the
> authors if others feel it is important to know at this point whether that
> is possible.

Please do so.  It would be useful to have the option of using an already
developed, debugged and analyzed framework.


- James
-- 
James Morris
<jmorris@redhat.com>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-20 13:47                     ` James Morris
@ 2005-03-03 10:50                       ` David McCullough
  0 siblings, 0 replies; 18+ messages in thread
From: David McCullough @ 2005-03-03 10:50 UTC (permalink / raw)
  To: James Morris
  Cc: Fruhwirth Clemens, Andrew Morton, linux-kernel, cryptoapi,
	Michal Ludvig, David S. Miller


Jivin James Morris lays it down ...
> On Thu, 20 Jan 2005, David McCullough wrote:
> 
> > As for permission to use a dual license, I will gladly approach the
> > authors if others feel it is important to know at this point whether that
> > is possible.
> 
> Please do so.  It would be useful to have the option of using an already
> developed, debugged and analyzed framework.

Ok, I finally managed to get responses from all the individual
contributors, though none of the corporations contacted have responded.

While a good number of those contacted were happy to dual-license, most
are concerned that changes made under the GPL will not be available for
use in BSD. A couple were a definite no.

I have had offers to rewrite any portions that cannot be dual-licensed,
but I think that is overkill for now unless there is significant
interest in taking that path.

Fortunately we have been able to obtain some funding to complete a large
amount of work on the project, so it should see some nice progress over
the next couple of weeks as that ramps up :-)

Cheers,
Davidm

-- 
David McCullough, davidm@snapgear.com  Ph:+61 7 34352815 http://www.SnapGear.com
Custom Embedded Solutions + Security   Fx:+61 7 38913630 http://www.uCdot.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov
                   ` (3 preceding siblings ...)
  2005-01-14 22:34 ` Evgeniy Polyakov
@ 2005-01-14 22:41 ` Evgeniy Polyakov
  4 siblings, 0 replies; 18+ messages in thread
From: Evgeniy Polyakov @ 2005-01-14 22:41 UTC (permalink / raw)
  To: johnpol
  Cc: linux-kernel, Michal Ludvig, Fruhwirth Clemens, Andrew Morton,
	James Morris, cryptoapi, David S. Miller

On Sat, 15 Jan 2005 01:31:03 +0300
Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> 
> Crypto routing.
> This feature allows the same session to be processed by several devices/algorithms.
> For example, if you need to encrypt data and then sign it in a TPM device, you can
> create one route to the encryption device and then route the session on to the TPM
> device. (Note: this feature must be discussed, since there is no time slice after
> session allocation, only in the crypto_device->data_ready() method, and there are
> locking issues in the ->callback() method.)

Actually it is already implemented by:

	crypto_session_alloc();
	/* route manipulations */
	crypto_session_add();

And sessions can be (re)routed inside the crypto devices themselves.
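A sketch of the intended sequence (the argument lists and the
crypto_route_add() helper are illustrative placeholders, not the actual
acrypto API):

	struct crypto_session *s;

	s = crypto_session_alloc(&ci, &data);	/* allocate the session */
	crypto_route_add(s, tpm_device);	/* extend the route, e.g. to the TPM */
	crypto_session_add(s);			/* queue it for processing */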

	Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov
                   ` (2 preceding siblings ...)
  2005-01-14 22:33 ` Evgeniy Polyakov
@ 2005-01-14 22:34 ` Evgeniy Polyakov
  2005-01-14 22:41 ` Evgeniy Polyakov
  4 siblings, 0 replies; 18+ messages in thread
From: Evgeniy Polyakov @ 2005-01-14 22:34 UTC (permalink / raw)
  To: johnpol
  Cc: linux-kernel, Michal Ludvig, Fruhwirth Clemens, Andrew Morton,
	James Morris, cryptoapi, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 481 bytes --]

 via-padlock - patch to enable the xcrypt instructions on various VIA CPUs (for example the Nehemiah family).
 	It is entirely Michal's work; I've just ported it to acrypto.
 	Not tested.
 
 fcrypt - driver for the CE-InfoSys FastCrypt PCI card equipped with a SuperCrypt CE99C003B chip that 
 	can offload DES and 3DES encryption from the CPU.
 	It is entirely Michal's work too; I've just ported it to acrypto.
 	Not tested.

	Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

[-- Attachment #2: fcrypt-04_01_2005.tar.gz --]
[-- Type: application/octet-stream, Size: 8502 bytes --]

[-- Attachment #3: via-padlock.patch-04_01_2005 --]
[-- Type: application/octet-stream, Size: 24092 bytes --]

diff -Nru /tmp/empty/Makefile via-padlock/Makefile
--- /tmp/empty/Makefile	1970-01-01 03:00:00.000000000 +0300
+++ via-padlock/Makefile	2004-10-26 07:20:11.000000000 +0400
@@ -0,0 +1,6 @@
+obj-m		+= padlock.o
+padlock-objs	:= padlock-aes.o padlock-generic.o
+
+clean:
+	rm -f *.o *.ko *.mod.* .*.cmd *~
+	rm -rf .tmp_versions
diff -Nru /tmp/empty/padlock-aes.c via-padlock/padlock-aes.c
--- /tmp/empty/padlock-aes.c	1970-01-01 03:00:00.000000000 +0300
+++ via-padlock/padlock-aes.c	2004-12-20 12:49:12.225384528 +0300
@@ -0,0 +1,553 @@
+/* 
+ * Cryptographic API.
+ *
+ * Support for VIA PadLock hardware crypto engine.
+ *
+ * Linux developers:
+ *  Michal Ludvig <mludvig@suse.cz>
+ *
+ * Key expansion routine taken from crypto/aes.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * ---------------------------------------------------------------------------
+ * Copyright (c) 2002, Dr Brian Gladman <brg@gladman.me.uk>, Worcester, UK.
+ * All rights reserved.
+ *
+ * LICENSE TERMS
+ *
+ * The free distribution and use of this software in both source and binary
+ * form is allowed (with or without changes) provided that:
+ *
+ *   1. distributions of this source code include the above copyright
+ *      notice, this list of conditions and the following disclaimer;
+ *
+ *   2. distributions in binary form include the above copyright
+ *      notice, this list of conditions and the following disclaimer
+ *      in the documentation and/or other associated materials;
+ *
+ *   3. the copyright holder's name is not used to endorse products
+ *      built using this software without specific written permission.
+ *
+ * ALTERNATIVELY, provided that this notice is retained in full, this product
+ * may be distributed under the terms of the GNU General Public License (GPL),
+ * in which case the provisions of the GPL apply INSTEAD OF those given above.
+ *
+ * DISCLAIMER
+ *
+ * This software is provided 'as is' with no explicit or implied warranties
+ * in respect of its properties, including, but not limited to, correctness
+ * and/or fitness for purpose.
+ * ---------------------------------------------------------------------------
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/crypto.h>
+#include <asm/byteorder.h>
+#include <linux/mm.h>
+
+#include <asm/scatterlist.h>
+
+#include "padlock.h"
+
+#include "../crypto_def.h"
+#include "../acrypto.h"
+#include "../crypto_stat.h"
+
+static inline int aes_hw_extkey_available (u8 key_len);
+
+static inline 
+u32 generic_rotr32 (const u32 x, const unsigned bits)
+{
+	const unsigned n = bits % 32;
+	return (x >> n) | (x << (32 - n));
+}
+
+static inline 
+u32 generic_rotl32 (const u32 x, const unsigned bits)
+{
+	const unsigned n = bits % 32;
+	return (x << n) | (x >> (32 - n));
+}
+
+#define rotl generic_rotl32
+#define rotr generic_rotr32
+
+/*
+ * #define byte(x, nr) ((unsigned char)((x) >> (nr*8))) 
+ */
+inline static u8
+byte(const u32 x, const unsigned n)
+{
+	return x >> (n << 3);
+}
+
+#define u32_in(x) le32_to_cpu(*(const u32 *)(x))
+#define u32_out(to, from) (*(u32 *)(to) = cpu_to_le32(from))
+
+static u8 pow_tab[256];
+static u8 log_tab[256];
+static u8 sbx_tab[256];
+static u8 isb_tab[256];
+static u32 rco_tab[10];
+static u32 ft_tab[4][256];
+static u32 it_tab[4][256];
+
+static u32 fl_tab[4][256];
+static u32 il_tab[4][256];
+
+static inline u8
+f_mult (u8 a, u8 b)
+{
+	u8 aa = log_tab[a], cc = aa + log_tab[b];
+
+	return pow_tab[cc + (cc < aa ? 1 : 0)];
+}
+
+#define ff_mult(a,b)    (a && b ? f_mult(a, b) : 0)
+
+#define f_rn(bo, bi, n, k)					\
+    bo[n] =  ft_tab[0][byte(bi[n],0)] ^				\
+             ft_tab[1][byte(bi[(n + 1) & 3],1)] ^		\
+             ft_tab[2][byte(bi[(n + 2) & 3],2)] ^		\
+             ft_tab[3][byte(bi[(n + 3) & 3],3)] ^ *(k + n)
+
+#define i_rn(bo, bi, n, k)					\
+    bo[n] =  it_tab[0][byte(bi[n],0)] ^				\
+             it_tab[1][byte(bi[(n + 3) & 3],1)] ^		\
+             it_tab[2][byte(bi[(n + 2) & 3],2)] ^		\
+             it_tab[3][byte(bi[(n + 1) & 3],3)] ^ *(k + n)
+
+#define ls_box(x)				\
+    ( fl_tab[0][byte(x, 0)] ^			\
+      fl_tab[1][byte(x, 1)] ^			\
+      fl_tab[2][byte(x, 2)] ^			\
+      fl_tab[3][byte(x, 3)] )
+
+#define f_rl(bo, bi, n, k)					\
+    bo[n] =  fl_tab[0][byte(bi[n],0)] ^				\
+             fl_tab[1][byte(bi[(n + 1) & 3],1)] ^		\
+             fl_tab[2][byte(bi[(n + 2) & 3],2)] ^		\
+             fl_tab[3][byte(bi[(n + 3) & 3],3)] ^ *(k + n)
+
+#define i_rl(bo, bi, n, k)					\
+    bo[n] =  il_tab[0][byte(bi[n],0)] ^				\
+             il_tab[1][byte(bi[(n + 3) & 3],1)] ^		\
+             il_tab[2][byte(bi[(n + 2) & 3],2)] ^		\
+             il_tab[3][byte(bi[(n + 1) & 3],3)] ^ *(k + n)
+
+static void
+gen_tabs (void)
+{
+	u32 i, t;
+	u8 p, q;
+
+	/* log and power tables for GF(2**8) finite field with
+	   0x011b as modular polynomial - the simplest primitive
+	   root is 0x03, used here to generate the tables */
+
+	for (i = 0, p = 1; i < 256; ++i) {
+		pow_tab[i] = (u8) p;
+		log_tab[p] = (u8) i;
+
+		p ^= (p << 1) ^ (p & 0x80 ? 0x01b : 0);
+	}
+
+	log_tab[1] = 0;
+
+	for (i = 0, p = 1; i < 10; ++i) {
+		rco_tab[i] = p;
+
+		p = (p << 1) ^ (p & 0x80 ? 0x01b : 0);
+	}
+
+	for (i = 0; i < 256; ++i) {
+		p = (i ? pow_tab[255 - log_tab[i]] : 0);
+		q = ((p >> 7) | (p << 1)) ^ ((p >> 6) | (p << 2));
+		p ^= 0x63 ^ q ^ ((q >> 6) | (q << 2));
+		sbx_tab[i] = p;
+		isb_tab[p] = (u8) i;
+	}
+
+	for (i = 0; i < 256; ++i) {
+		p = sbx_tab[i];
+
+		t = p;
+		fl_tab[0][i] = t;
+		fl_tab[1][i] = rotl (t, 8);
+		fl_tab[2][i] = rotl (t, 16);
+		fl_tab[3][i] = rotl (t, 24);
+
+		t = ((u32) ff_mult (2, p)) |
+		    ((u32) p << 8) |
+		    ((u32) p << 16) | ((u32) ff_mult (3, p) << 24);
+
+		ft_tab[0][i] = t;
+		ft_tab[1][i] = rotl (t, 8);
+		ft_tab[2][i] = rotl (t, 16);
+		ft_tab[3][i] = rotl (t, 24);
+
+		p = isb_tab[i];
+
+		t = p;
+		il_tab[0][i] = t;
+		il_tab[1][i] = rotl (t, 8);
+		il_tab[2][i] = rotl (t, 16);
+		il_tab[3][i] = rotl (t, 24);
+
+		t = ((u32) ff_mult (14, p)) |
+		    ((u32) ff_mult (9, p) << 8) |
+		    ((u32) ff_mult (13, p) << 16) |
+		    ((u32) ff_mult (11, p) << 24);
+
+		it_tab[0][i] = t;
+		it_tab[1][i] = rotl (t, 8);
+		it_tab[2][i] = rotl (t, 16);
+		it_tab[3][i] = rotl (t, 24);
+	}
+}
+
+#define star_x(x) (((x) & 0x7f7f7f7f) << 1) ^ ((((x) & 0x80808080) >> 7) * 0x1b)
+
+#define imix_col(y,x)       \
+    u   = star_x(x);        \
+    v   = star_x(u);        \
+    w   = star_x(v);        \
+    t   = w ^ (x);          \
+   (y)  = u ^ v ^ w;        \
+   (y) ^= rotr(u ^ t,  8) ^ \
+          rotr(v ^ t, 16) ^ \
+          rotr(t,24)
+
+/* initialise the key schedule from the user supplied key */
+
+#define loop4(i)                                    \
+{   t = rotr(t,  8); t = ls_box(t) ^ rco_tab[i];    \
+    t ^= E_KEY[4 * i];     E_KEY[4 * i + 4] = t;    \
+    t ^= E_KEY[4 * i + 1]; E_KEY[4 * i + 5] = t;    \
+    t ^= E_KEY[4 * i + 2]; E_KEY[4 * i + 6] = t;    \
+    t ^= E_KEY[4 * i + 3]; E_KEY[4 * i + 7] = t;    \
+}
+
+#define loop6(i)                                    \
+{   t = rotr(t,  8); t = ls_box(t) ^ rco_tab[i];    \
+    t ^= E_KEY[6 * i];     E_KEY[6 * i + 6] = t;    \
+    t ^= E_KEY[6 * i + 1]; E_KEY[6 * i + 7] = t;    \
+    t ^= E_KEY[6 * i + 2]; E_KEY[6 * i + 8] = t;    \
+    t ^= E_KEY[6 * i + 3]; E_KEY[6 * i + 9] = t;    \
+    t ^= E_KEY[6 * i + 4]; E_KEY[6 * i + 10] = t;   \
+    t ^= E_KEY[6 * i + 5]; E_KEY[6 * i + 11] = t;   \
+}
+
+#define loop8(i)                                    \
+{   t = rotr(t,  8); ; t = ls_box(t) ^ rco_tab[i];  \
+    t ^= E_KEY[8 * i];     E_KEY[8 * i + 8] = t;    \
+    t ^= E_KEY[8 * i + 1]; E_KEY[8 * i + 9] = t;    \
+    t ^= E_KEY[8 * i + 2]; E_KEY[8 * i + 10] = t;   \
+    t ^= E_KEY[8 * i + 3]; E_KEY[8 * i + 11] = t;   \
+    t  = E_KEY[8 * i + 4] ^ ls_box(t);    \
+    E_KEY[8 * i + 12] = t;                \
+    t ^= E_KEY[8 * i + 5]; E_KEY[8 * i + 13] = t;   \
+    t ^= E_KEY[8 * i + 6]; E_KEY[8 * i + 14] = t;   \
+    t ^= E_KEY[8 * i + 7]; E_KEY[8 * i + 15] = t;   \
+}
+
+static int
+aes_set_key(void *ctx_arg, const u8 *in_key, unsigned int key_len)
+{
+	struct aes_ctx *ctx = ctx_arg;
+	u32 i, t, u, v, w;
+	u32 P[AES_EXTENDED_KEY_SIZE];
+	u32 rounds;
+
+	if (key_len != 16 && key_len != 24 && key_len != 32) {
+		return -EINVAL;
+	}
+
+	ctx->key_length = key_len;
+
+	ctx->E = ctx->e_data;
+	ctx->D = ctx->d_data;
+
+	/* Ensure 16-byte alignment of the keys for VIA PadLock. */
+	if ((int)(ctx->e_data) & 0x0F)
+		ctx->E += 4 - (((int)(ctx->e_data) & 0x0F) / sizeof (ctx->e_data[0]));
+
+	if ((int)(ctx->d_data) & 0x0F)
+		ctx->D += 4 - (((int)(ctx->d_data) & 0x0F) / sizeof (ctx->d_data[0]));
+
+	E_KEY[0] = u32_in (in_key);
+	E_KEY[1] = u32_in (in_key + 4);
+	E_KEY[2] = u32_in (in_key + 8);
+	E_KEY[3] = u32_in (in_key + 12);
+
+	/* Don't generate extended keys if the hardware can do it. */
+	if (aes_hw_extkey_available(key_len))
+		return 0;
+
+	switch (key_len) {
+	case 16:
+		t = E_KEY[3];
+		for (i = 0; i < 10; ++i)
+			loop4 (i);
+		break;
+
+	case 24:
+		E_KEY[4] = u32_in (in_key + 16);
+		t = E_KEY[5] = u32_in (in_key + 20);
+		for (i = 0; i < 8; ++i)
+			loop6 (i);
+		break;
+
+	case 32:
+		E_KEY[4] = u32_in (in_key + 16);
+		E_KEY[5] = u32_in (in_key + 20);
+		E_KEY[6] = u32_in (in_key + 24);
+		t = E_KEY[7] = u32_in (in_key + 28);
+		for (i = 0; i < 7; ++i)
+			loop8 (i);
+		break;
+	}
+
+	D_KEY[0] = E_KEY[0];
+	D_KEY[1] = E_KEY[1];
+	D_KEY[2] = E_KEY[2];
+	D_KEY[3] = E_KEY[3];
+
+	for (i = 4; i < key_len + 24; ++i) {
+		imix_col (D_KEY[i], E_KEY[i]);
+	}
+
+	/* PadLock needs a different format of the decryption key. */
+	rounds = 10 + (key_len - 16) / 4;
+
+	for (i = 0; i < rounds; i++) {
+		P[((i + 1) * 4) + 0] = D_KEY[((rounds - i - 1) * 4) + 0];
+		P[((i + 1) * 4) + 1] = D_KEY[((rounds - i - 1) * 4) + 1];
+		P[((i + 1) * 4) + 2] = D_KEY[((rounds - i - 1) * 4) + 2];
+		P[((i + 1) * 4) + 3] = D_KEY[((rounds - i - 1) * 4) + 3];
+	}
+
+	P[0] = E_KEY[(rounds * 4) + 0];
+	P[1] = E_KEY[(rounds * 4) + 1];
+	P[2] = E_KEY[(rounds * 4) + 2];
+	P[3] = E_KEY[(rounds * 4) + 3];
+
+	memcpy(D_KEY, P, AES_EXTENDED_KEY_SIZE_B);
+
+	return 0;
+}
+
+/* Tells whether the ACE is capable of generating
+   the extended key for a given key_len. */
+static inline int aes_hw_extkey_available(u8 key_len)
+{
+	/* TODO: We should check the actual CPU model/stepping
+	         as it's likely that the capability will be
+	         added in the next CPU revisions. */
+	if (key_len == 16)
+		return 1;
+	return 0;
+}
+
+static void aes_padlock(void *ctx_arg, u8 *out_arg, const u8 *in_arg,
+			const u8 *iv_arg, size_t nbytes, int encdec,
+			int mode)
+{
+	struct aes_ctx *ctx = ctx_arg;
+	char bigbuf[sizeof(union cword) + 16];
+	union cword *cword;
+	void *key;
+
+	if (((long)bigbuf) & 0x0F)
+		cword = (void*)(bigbuf + 16 - ((long)bigbuf & 0x0F));
+	else
+		cword = (void*)bigbuf;
+
+	/* Prepare Control word. */
+	memset (cword, 0, sizeof(union cword));
+	cword->b.encdec = !encdec;	/* in the rest of cryptoapi ENC=1/DEC=0 */
+	cword->b.rounds = 10 + (ctx->key_length - 16) / 4;
+	cword->b.ksize = (ctx->key_length - 16) / 8;
+
+	/* Is the hardware capable of generating the extended key? */
+	if (!aes_hw_extkey_available(ctx->key_length))
+		cword->b.keygen = 1;
+
+	/* ctx->E starts with a plain key - if the hardware is capable
+	   of generating the extended key itself, we must supply
+	   the plain key for both encryption and decryption. */
+	if (encdec ==  CRYPTO_OP_ENCRYPT || cword->b.keygen == 0)
+		key = ctx->E;
+	else
+		key = ctx->D;
+	
+	padlock_aligner(out_arg, in_arg, iv_arg, key, cword,
+			nbytes, AES_BLOCK_SIZE, encdec, mode);
+}
+
+static void aes_padlock_ecb(void *ctx, u8 *dst, const u8 *src, const u8 *iv,
+			    size_t nbytes, int encdec)
+{
+	aes_padlock(ctx, dst, src, NULL, nbytes, encdec, CRYPTO_MODE_ECB);
+}
+
+static void aes_padlock_cbc(void *ctx, u8 *dst, const u8 *src, const u8 *iv,
+			    size_t nbytes, int encdec)
+{
+	aes_padlock(ctx, dst, src, iv, nbytes, encdec, CRYPTO_MODE_CBC);
+}
+
+static void aes_padlock_cfb(void *ctx, u8 *dst, const u8 *src, const u8 *iv,
+			    size_t nbytes, int encdec)
+{
+	aes_padlock(ctx, dst, src, iv, nbytes, encdec, CRYPTO_MODE_CFB);
+}
+
+static void aes_padlock_ofb(void *ctx, u8 *dst, const u8 *src, const u8 *iv,
+			    size_t nbytes, int encdec)
+{
+	aes_padlock(ctx, dst, src, iv, nbytes, encdec, CRYPTO_MODE_OFB);
+}
+
+static struct crypto_capability padlock_caps[] = 
+{
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_ECB, 1000},
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_CBC, 1000},
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_CFB, 1000},
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_OFB, 1000},
+
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_ECB, 1000},
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_CBC, 1000},
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_CFB, 1000},
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_OFB, 1000},
+	
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_ECB, 1000},
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_CBC, 1000},
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_CFB, 1000},
+	{CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_OFB, 1000},
+	
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_ECB, 1000},
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_CBC, 1000},
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_CFB, 1000},
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_OFB, 1000},
+
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_ECB, 1000},
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_CBC, 1000},
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_CFB, 1000},
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_OFB, 1000},
+	
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_ECB, 1000},
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_CBC, 1000},
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_CFB, 1000},
+	{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_OFB, 1000},
+};
+static int padlock_cap_number = sizeof(padlock_caps)/sizeof(padlock_caps[0]);
+
+static void padlock_data_ready(struct crypto_device *dev);
+static int padlock_data_ready_reentry;
+
+static struct crypto_device padlock_device =
+{
+	.name			= "via-padlock",
+	.data_ready		= padlock_data_ready,
+	.cap			= &padlock_caps[0],
+};
+
+static void process_session(struct crypto_session *s)
+{
+	int err;
+	u8 *key, *dst, *src, *iv;
+	size_t size, keylen;
+	
+	key = ((u8 *)page_address(s->data.sg_key.page)) + s->data.sg_key.offset;
+	keylen = s->data.sg_key.length;
+	dst = ((u8 *)page_address(s->data.sg_dst.page)) + s->data.sg_dst.offset;
+	src = ((u8 *)page_address(s->data.sg_src.page)) + s->data.sg_src.offset;
+	size = s->data.sg_src.length;
+	iv = ((u8 *)page_address(s->data.sg_iv.page)) + s->data.sg_iv.offset;
+	
+	err = aes_set_key(s->data.priv, key, keylen);
+	if (err)
+		return;
+
+	switch (s->ci.mode)
+	{
+		case CRYPTO_MODE_ECB:
+			aes_padlock_ecb(s->data.priv, dst, src, iv, size, s->ci.operation);
+			break;
+		case CRYPTO_MODE_CBC:
+			aes_padlock_cbc(s->data.priv, dst, src, iv, size, s->ci.operation);
+			break;
+		case CRYPTO_MODE_CFB:
+			aes_padlock_cfb(s->data.priv, dst, src, iv, size, s->ci.operation);
+			break;
+		case CRYPTO_MODE_OFB:
+			aes_padlock_ofb(s->data.priv, dst, src, iv, size, s->ci.operation);
+			break;
+	}
+
+	s->data.sg_dst.length = size;
+
+	return;
+}
+
+static void padlock_data_ready(struct crypto_device *dev)
+{
+	struct crypto_session *s, *n;
+
+	if (padlock_data_ready_reentry)
+		return;
+
+	padlock_data_ready_reentry++;
+	list_for_each_entry_safe(s, n, &dev->session_list, dev_queue_entry)
+	{
+		if (!session_completed(s))
+		{
+			start_process_session(s);
+			process_session(s);
+			crypto_stat_complete_inc(s);
+			crypto_session_dequeue_route(s);
+			complete_session(s);
+			stop_process_session(s);
+		}
+	}
+	padlock_data_ready_reentry--;
+}
+
+int padlock_init_aes(void)
+{
+	u32 cpuid, edx;
+	u32 val = 0xC0000000;
+
+	cpuid = cpuid_eax(val);
+	edx = cpuid_edx(val);
+	printk("val=%x, cpuid=%x, edx=%x.\n", val, cpuid, edx);
+	if (cpuid >= val + 1)
+	{
+		printk("Board supports ACE.\n");
+	}
+	else
+	{
+		printk("Board does not support ACE.\n");
+		return -ENODEV;
+	}
+	
+	printk(KERN_NOTICE "Using VIA PadLock ACE for AES algorithm (multiblock).\n");
+
+	padlock_device.cap_number = padlock_cap_number;
+	
+	gen_tabs();
+	return crypto_device_add(&padlock_device);
+}
+
+void padlock_fini_aes(void)
+{
+	crypto_device_remove(&padlock_device);
+}
diff -Nru /tmp/empty/padlock-generic.c via-padlock/padlock-generic.c
--- /tmp/empty/padlock-generic.c	1970-01-01 03:00:00.000000000 +0300
+++ via-padlock/padlock-generic.c	2004-11-01 09:30:41.000000000 +0300
@@ -0,0 +1,191 @@
+/* 
+ * Cryptographic API.
+ *
+ * Support for VIA PadLock hardware crypto engine.
+ *
+ * Linux developers:
+ *  Michal Ludvig <mludvig@suse.cz>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/crypto.h>
+#include <asm/byteorder.h>
+
+#include "padlock.h"
+#include "../acrypto.h"
+#include "../crypto_def.h"
+
+#define PFX	"padlock: "
+
+typedef void (xcrypt_t)(u8 *input, u8 *output, u8 *key, u8 *iv,
+			void *control_word, u32 count);
+
+static inline void padlock_xcrypt_ecb(u8 *input, u8 *output, u8 *key,
+				      u8 *iv, void *control_word, u32 count)
+{
+	asm volatile ("pushfl; popfl");		/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xc8"	/* rep xcryptecb */
+		      : "=m"(*output), "+S"(input), "+D"(output)
+		      : "d"(control_word), "b"(key), "c"(count));
+}
+
+static inline void padlock_xcrypt_cbc(u8 *input, u8 *output, u8 *key,
+				      u8 *iv, void *control_word, u32 count)
+{
+	asm volatile ("pushfl; popfl");		/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xd0"	/* rep xcryptcbc */
+		      : "=m"(*output), "+S"(input), "+D"(output)
+		      : "d"(control_word), "b"(key), "c"(count), "a"(iv));
+}
+
+static inline void padlock_xcrypt_cfb(u8 *input, u8 *output, u8 *key,
+				      u8 *iv, void *control_word, u32 count)
+{
+	asm volatile ("pushfl; popfl");		/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xe0"	/* rep xcryptcfb */
+		      : "=m"(*output), "+S"(input), "+D"(output)
+		      : "d"(control_word), "b"(key), "c"(count), "a"(iv));
+}
+
+static inline void padlock_xcrypt_ofb(u8 *input, u8 *output, u8 *key,
+				      u8 *iv, void *control_word, u32 count)
+{
+	asm volatile ("pushfl; popfl");		/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xe8"	/* rep xcryptofb */
+		      : "=m"(*output), "+S"(input), "+D"(output)
+		      : "d"(control_word), "b"(key), "c"(count), "a"(iv));
+}
+
+void *crypto_aligned_kmalloc(size_t size, int mode, size_t alignment, void **index)
+{
+       char *ptr;
+
+       ptr = kmalloc(size + alignment, mode);
+       *index = ptr;
+       if (alignment > 1 && ((long)ptr & (alignment - 1))) {
+               ptr += alignment - ((long)ptr & (alignment - 1));
+       }
+
+       return ptr;
+}
+
+void padlock_aligner(u8 *out_arg, const u8 *in_arg, const u8 *iv_arg,
+		     void *key, union cword *cword,
+		     size_t nbytes, size_t blocksize,
+		     int encdec, int mode)
+{
+	/* Don't blindly modify this structure - the items must
+	   fit on 16-byte boundaries! */
+	struct padlock_xcrypt_data {
+		u8 iv[blocksize];		/* Initialization vector */
+	};
+
+	u8 *in, *out, *iv;
+	void *index = NULL;
+	char bigbuf[sizeof(struct padlock_xcrypt_data) + 16];
+	struct padlock_xcrypt_data *data;
+
+	/* Place 'data' at the first 16-byte aligned address in 'bigbuf'. */
+	if (((long)bigbuf) & 0x0F)
+		data = (void*)(bigbuf + 16 - ((long)bigbuf & 0x0F));
+	else
+		data = (void*)bigbuf;
+
+	if (((long)in_arg) & 0x0F) {
+		in = crypto_aligned_kmalloc(nbytes, GFP_KERNEL, 16, &index);
+		memcpy(in, in_arg, nbytes);
+	}
+	else
+		in = (u8*)in_arg;
+	
+	if (((long)out_arg) & 0x0F) {
+		if (index)
+			out = in;       /* xcrypt can work "in place" */
+		else
+			out = crypto_aligned_kmalloc(nbytes, GFP_KERNEL, 16, &index);
+	}
+	else
+		out = out_arg;
+
+	/* Always make a local copy of IV - xcrypt may change it! */
+	iv = data->iv;
+	if (iv_arg)
+		memcpy(iv, iv_arg, blocksize);
+	
+
+	dprintk("data=%p\n", data);
+	dprintk("in=%p\n", in);
+	dprintk("out=%p\n", out);
+	dprintk("iv=%p\n", iv);
+	dprintk("nbytes=%d, blocksize=%d.\n", nbytes, blocksize);
+
+	switch (mode) {
+		case CRYPTO_MODE_ECB:
+			padlock_xcrypt_ecb(in, out, key, iv, cword, nbytes/blocksize);
+			break;
+
+		case CRYPTO_MODE_CBC:
+			padlock_xcrypt_cbc(in, out, key, iv, cword, nbytes/blocksize);
+			break;
+
+		case CRYPTO_MODE_CFB:
+			padlock_xcrypt_cfb(in, out, key, iv, cword, nbytes/blocksize);
+			break;
+
+		case CRYPTO_MODE_OFB:
+			padlock_xcrypt_ofb(in, out, key, iv, cword, nbytes/blocksize);
+			break;
+
+		default:
+			BUG();
+	}
+
+	/* Copy the 16-byte aligned output to the caller's buffer. */
+	if (out != out_arg)
+		memcpy(out_arg, out, nbytes);
+
+	if (index)
+		kfree(index);
+}
+
+static int __init padlock_init(void)
+{
+	int ret = -ENOSYS;
+#if 0	
+	if (!cpu_has_xcrypt) {
+		printk(KERN_ERR PFX "VIA PadLock not detected.\n");
+		return -ENODEV;
+	}
+
+	if (!cpu_has_xcrypt_enabled) {
+		printk(KERN_ERR PFX "VIA PadLock detected, but not enabled. Hmm, strange...\n");
+		return -ENODEV;
+	}
+#endif
+	if ((ret = padlock_init_aes())) {
+		printk(KERN_ERR PFX "VIA PadLock AES initialization failed.\n");
+		return ret;
+	}
+
+	return ret;
+}
+
+static void __exit padlock_fini(void)
+{
+	padlock_fini_aes();
+}
+
+module_init(padlock_init);
+module_exit(padlock_fini);
+
+MODULE_DESCRIPTION("VIA PadLock crypto engine support.");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Michal Ludvig");
diff -Nru /tmp/empty/padlock.h via-padlock/padlock.h
--- /tmp/empty/padlock.h	1970-01-01 03:00:00.000000000 +0300
+++ via-padlock/padlock.h	2004-10-28 10:05:50.000000000 +0400
@@ -0,0 +1,71 @@
+/*
+ * Cryptographic API.
+ *
+ * Copyright (c) 2004 Michal Ludvig <mludvig@suse.cz>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option) 
+ * any later version.
+ *
+ */
+
+#ifndef _CRYPTO_PADLOCK_H
+#define _CRYPTO_PADLOCK_H
+
+#define AES_MIN_KEY_SIZE	16	/* in u8 units */
+#define AES_MAX_KEY_SIZE	32	/* ditto */
+#define AES_BLOCK_SIZE		16	/* ditto */
+#define AES_EXTENDED_KEY_SIZE	64	/* in u32 units */
+#define AES_EXTENDED_KEY_SIZE_B	(AES_EXTENDED_KEY_SIZE * sizeof(u32))
+
+struct aes_ctx {
+	u32 e_data[AES_EXTENDED_KEY_SIZE+4];
+	u32 d_data[AES_EXTENDED_KEY_SIZE+4];
+	int key_length;
+	u32 *E;
+	u32 *D;
+};
+
+#define E_KEY ctx->E
+#define D_KEY ctx->D
+
+
+/* Control word. */
+#if 1
+union cword {
+	u32 cword[4];
+	struct {
+		int rounds:4;
+		int algo:3;
+		int keygen:1;
+		int interm:1;
+		int encdec:1;
+		int ksize:2;
+	} b;
+};
+#else
+union cword {
+	u32 cword[4];
+	struct {
+		unsigned	rounds:4,
+				algo:3,
+				keygen:1,
+				interm:1,
+				encdec:1,
+				ksize:2;
+	} b;
+};
+#endif
+
+#define PFX	"padlock: "
+
+void padlock_aligner(u8 *out_arg, const u8 *in_arg, const u8 *iv_arg,
+		     void *key, union cword *cword,
+		     size_t nbytes, size_t blocksize,
+		     int encdec, int mode);
+
+int padlock_init_aes(void);
+void padlock_fini_aes(void);
+
+#endif	/* _CRYPTO_PADLOCK_H */

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov
  2005-01-14 22:31 ` Evgeniy Polyakov
  2005-01-14 22:32 ` Evgeniy Polyakov
@ 2005-01-14 22:33 ` Evgeniy Polyakov
  2005-01-14 22:34 ` Evgeniy Polyakov
  2005-01-14 22:41 ` Evgeniy Polyakov
  4 siblings, 0 replies; 18+ messages in thread
From: Evgeniy Polyakov @ 2005-01-14 22:33 UTC (permalink / raw)
  To: johnpol
  Cc: linux-kernel, Michal Ludvig, Fruhwirth Clemens, Andrew Morton,
	James Morris, cryptoapi, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 292 bytes --]

 hifn archive - driver for HIFN 7955/7956 (the 7956 was not run on Clemens' setup;
 	hopefully the patches sent to him fixed that).
 	This is work in progress and currently works only under low load 
 	(about one session per 10 msec).

	Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

[-- Attachment #2: hifn-14_01_2005.tar.gz --]
[-- Type: application/octet-stream, Size: 29808 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov
  2005-01-14 22:31 ` Evgeniy Polyakov
@ 2005-01-14 22:32 ` Evgeniy Polyakov
  2005-01-14 22:33 ` Evgeniy Polyakov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Evgeniy Polyakov @ 2005-01-14 22:32 UTC (permalink / raw)
  To: johnpol
  Cc: linux-kernel, Michal Ludvig, Fruhwirth Clemens, Andrew Morton,
	James Morris, cryptoapi, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 263 bytes --]

 acrypto archive - asynchronous crypto layer, the latest (third) reincarnation (announcement below).
 	It also has asynchronous and synchronous test crypto providers and a test crypto
 	consumer module.

	Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

[-- Attachment #2: acrypto-14_01_2005.tar.gz --]
[-- Type: application/octet-stream, Size: 38272 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov
@ 2005-01-14 22:31 ` Evgeniy Polyakov
  2005-01-14 22:32 ` Evgeniy Polyakov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Evgeniy Polyakov @ 2005-01-14 22:31 UTC (permalink / raw)
  To: johnpol
  Cc: linux-kernel, Michal Ludvig, Fruhwirth Clemens, Andrew Morton,
	James Morris, cryptoapi, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 671 bytes --]

On Sat, 15 Jan 2005 01:31:03 +0300
Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

 bd archive - simple in-memory block device used for testing. I am currently working 
 	on creating a modular loop device replacement based on bd, which could allow
 	the network block device to be removed (btw, it is broken at least in 2.6.9)
 	and also allow the acrypto module to be used with various tweakable ciphers.
 	I hope that this system will provide more flexible control over dataflow
 	than the loop device currently does.
 	I recommend the following interesting reading about tweakable ciphers: 
 	http://clemens.endorphin.org/cryptography

	Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

[-- Attachment #2: bd-14_01_2005.tar.gz --]
[-- Type: application/octet-stream, Size: 3542 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2005-03-03 12:01 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Xine.LNX.4.44.0411301009560.11945-100000@thoron.boston.redhat.com>
     [not found] ` <Pine.LNX.4.61.0411301722270.4409@maxipes.logix.cz>
     [not found]   ` <20041130222442.7b0f4f67.davem@davemloft.net>
2005-01-11 17:03     ` PadLock processing multiple blocks at a time Michal Ludvig
2005-01-11 17:08       ` [PATCH 1/2] " Michal Ludvig
2005-01-14 13:10         ` [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers " Michal Ludvig
2005-01-14 14:20           ` Fruhwirth Clemens
2005-01-14 16:40             ` Michal Ludvig
2005-01-15 12:45               ` Fruhwirth Clemens
2005-01-18 16:49                 ` James Morris
2005-01-20  3:30                   ` David McCullough
2005-01-20 13:47                     ` James Morris
2005-03-03 10:50                       ` David McCullough
2005-01-11 17:08       ` [PATCH 2/2] PadLock processing multiple blocks " Michal Ludvig
2005-01-14  3:05         ` Andrew Morton
2005-01-14 13:15         ` [PATCH 2/2] CryptoAPI: Update PadLock to process multiple blocks at once Michal Ludvig
2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov
2005-01-14 22:31 ` Evgeniy Polyakov
2005-01-14 22:32 ` Evgeniy Polyakov
2005-01-14 22:33 ` Evgeniy Polyakov
2005-01-14 22:34 ` Evgeniy Polyakov
2005-01-14 22:41 ` Evgeniy Polyakov
