* PadLock processing multiple blocks at a time
  [not found] ` <20041130222442.7b0f4f67.davem@davemloft.net>
@ 2005-01-11 17:03 ` Michal Ludvig
  2005-01-11 17:08   ` [PATCH 1/2] " Michal Ludvig
  2005-01-11 17:08   ` [PATCH 2/2] PadLock processing multiple blocks " Michal Ludvig
  0 siblings, 2 replies; 18+ messages in thread

From: Michal Ludvig @ 2005-01-11 17:03 UTC (permalink / raw)
To: David S. Miller; +Cc: jmorris, cryptoapi, linux-kernel

Hi all,

I have got some improvements for the VIA PadLock crypto driver.

1. A generic extension to crypto/cipher.c that allows offloading the
encryption of a whole buffer in a given mode (CBC, ...) to the
algorithm provider (i.e. PadLock). Basically it extends 'struct
cipher_alg' by some new fields:

@@ -69,6 +73,18 @@ struct cipher_alg {
 	                      unsigned int keylen, u32 *flags);
 	void (*cia_encrypt)(void *ctx, u8 *dst, const u8 *src);
 	void (*cia_decrypt)(void *ctx, u8 *dst, const u8 *src);
+	size_t cia_max_nbytes;
+	size_t cia_req_align;
+	void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
+	void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
+			size_t nbytes, int encdec, int inplace);
 };

If cia_<mode> is non-NULL, that function is used instead of the
software <mode>_process() chaining function (e.g. cbc_process()). In
the case of PadLock this can significantly speed up the
{en,de}cryption.

2. On top of this I have an extension of the padlock module to support
this scheme.

I will send both patches in separate follow-ups.
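To illustrate the contract a cia_<mode> provider implements, here is a
minimal userspace sketch of a multiblock CBC routine with the same shape
as the cia_cbc hook above. The "cipher" is a toy single-byte XOR (purely
hypothetical, standing in for AES), and the block size is 8 rather than
16; the point is only the contract: one call chains all blocks and leaves
the updated IV behind, so the core need not loop per block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK 8  /* toy block size; PadLock/AES would use 16 */

/* Toy stand-in for the block cipher: XOR every byte with a context
 * byte. Hypothetical -- a real provider would run AES here. */
static void toy_encrypt_block(void *ctx, uint8_t *dst, const uint8_t *src)
{
	uint8_t k = *(uint8_t *)ctx;
	for (int i = 0; i < BLK; i++)
		dst[i] = src[i] ^ k;
}

/* Shape of a cia_cbc-style multiblock routine: the provider chains all
 * blocks internally and updates the IV, instead of being called once
 * per block by the CBC software fallback. */
static void toy_cbc(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv,
		    size_t nbytes, int encdec, int inplace)
{
	(void)inplace;
	if (encdec) {				/* encrypt */
		for (size_t off = 0; off < nbytes; off += BLK) {
			uint8_t tmp[BLK];
			for (int i = 0; i < BLK; i++)
				tmp[i] = src[off + i] ^ iv[i];
			toy_encrypt_block(ctx, dst + off, tmp);
			memcpy(iv, dst + off, BLK);	/* chain into next block */
		}
	} else {				/* decrypt (the toy cipher is an involution) */
		for (size_t off = 0; off < nbytes; off += BLK) {
			uint8_t tmp[BLK];
			memcpy(tmp, src + off, BLK);	/* src may alias dst */
			toy_encrypt_block(ctx, dst + off, tmp);
			for (int i = 0; i < BLK; i++)
				dst[off + i] ^= iv[i];
			memcpy(iv, tmp, BLK);
		}
	}
}
```

A roundtrip over two blocks with a zero IV recovers the plaintext, which
is all the fallback cbc_process() loop guarantees as well.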
The speedup gained by this change is quite significant (measured with
bonnie on ext2 over dm-crypt with aes128):

                        No encryption   2.6.10-bk1     multiblock
Writing with putc()      10454 (100%)    7479 (72%)    9353 (89%)
Rewriting                16510 (100%)    7628 (46%)   10611 (64%)
Writing intelligently    61128 (100%)   21132 (35%)   48103 (79%)
Reading with getc()       9406 (100%)    6916 (74%)    8801 (94%)
Reading intelligently    35885 (100%)   15271 (43%)   23202 (65%)

Numbers are in kB/s; the percentages show throughput relative to the
plaintext run. As can be seen, multiblock encryption is significantly
faster in comparison to the already committed single-block-at-a-time
processing. More statistics (e.g. a comparison with aes.ko and
aes-i586.ko) are available at
http://www.logix.cz/michal/devel/padlock/bench.xp

Dave, if you're OK with these changes, please merge them.

Michal Ludvig
--
* A mouse is a device used to point at the xterm you want to type in.
* Personal homepage - http://www.logix.cz/michal

^ permalink raw reply [flat|nested] 18+ messages in thread
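The percentage columns in the table above can be recomputed from the raw
kB/s figures; a small helper makes the rounding convention explicit
(nearest integer percent of the plaintext throughput):

```c
#include <assert.h>
#include <stdio.h>

/* Recompute the relative-throughput percentages from the bonnie
 * table: encrypted throughput as a rounded percentage of the
 * unencrypted run. */
static int pct(double encrypted_kbs, double plaintext_kbs)
{
	return (int)(100.0 * encrypted_kbs / plaintext_kbs + 0.5);
}
```

For the "Writing intelligently" row, pct(21132, 61128) gives the 35% of
the single-block 2.6.10-bk1 code and pct(48103, 61128) the 79% of the
multiblock patch, matching the table.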
* [PATCH 1/2] PadLock processing multiple blocks at a time 2005-01-11 17:03 ` PadLock processing multiple blocks at a time Michal Ludvig @ 2005-01-11 17:08 ` Michal Ludvig 2005-01-14 13:10 ` [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers " Michal Ludvig 2005-01-11 17:08 ` [PATCH 2/2] PadLock processing multiple blocks " Michal Ludvig 1 sibling, 1 reply; 18+ messages in thread From: Michal Ludvig @ 2005-01-11 17:08 UTC (permalink / raw) To: David S. Miller; +Cc: jmorris, cryptoapi, linux-kernel # # Extends crypto/cipher.c for offloading the whole chaining modes # to e.g. hardware crypto accelerators. # # Signed-off-by: Michal Ludvig <mludvig@suse.cz> # Index: linux-2.6.10/crypto/api.c =================================================================== --- linux-2.6.10.orig/crypto/api.c 2004-12-24 22:35:39.000000000 +0100 +++ linux-2.6.10/crypto/api.c 2005-01-10 16:37:11.943356651 +0100 @@ -217,6 +217,19 @@ int crypto_alg_available(const char *nam return ret; } +void *crypto_aligned_kmalloc(size_t size, int mode, size_t alignment, void **index) +{ + char *ptr; + + ptr = kmalloc(size + alignment, mode); + *index = ptr; + if (alignment > 1 && ((long)ptr & (alignment - 1))) { + ptr += alignment - ((long)ptr & (alignment - 1)); + } + + return ptr; +} + static int __init init_crypto(void) { printk(KERN_INFO "Initializing Cryptographic API\n"); @@ -231,3 +244,4 @@ EXPORT_SYMBOL_GPL(crypto_unregister_alg) EXPORT_SYMBOL_GPL(crypto_alloc_tfm); EXPORT_SYMBOL_GPL(crypto_free_tfm); EXPORT_SYMBOL_GPL(crypto_alg_available); +EXPORT_SYMBOL_GPL(crypto_aligned_kmalloc); Index: linux-2.6.10/include/linux/crypto.h =================================================================== --- linux-2.6.10.orig/include/linux/crypto.h 2005-01-07 17:26:42.000000000 +0100 +++ linux-2.6.10/include/linux/crypto.h 2005-01-10 16:37:52.157648454 +0100 @@ -42,6 +42,7 @@ #define CRYPTO_TFM_MODE_CBC 0x00000002 #define CRYPTO_TFM_MODE_CFB 0x00000004 #define CRYPTO_TFM_MODE_CTR 0x00000008 
+#define CRYPTO_TFM_MODE_OFB 0x00000010 #define CRYPTO_TFM_REQ_WEAK_KEY 0x00000100 #define CRYPTO_TFM_RES_WEAK_KEY 0x00100000 @@ -72,6 +73,18 @@ struct cipher_alg { unsigned int keylen, u32 *flags); void (*cia_encrypt)(void *ctx, u8 *dst, const u8 *src); void (*cia_decrypt)(void *ctx, u8 *dst, const u8 *src); + size_t cia_max_nbytes; + size_t cia_req_align; + void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv, + size_t nbytes, int encdec, int inplace); + void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv, + size_t nbytes, int encdec, int inplace); + void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv, + size_t nbytes, int encdec, int inplace); + void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv, + size_t nbytes, int encdec, int inplace); + void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv, + size_t nbytes, int encdec, int inplace); }; struct digest_alg { @@ -124,6 +137,11 @@ int crypto_unregister_alg(struct crypto_ int crypto_alg_available(const char *name, u32 flags); /* + * Helper function. + */ +void *crypto_aligned_kmalloc (size_t size, int mode, size_t alignment, void **index); + +/* * Transforms: user-instantiated objects which encapsulate algorithms * and core processing logic. Managed via crypto_alloc_tfm() and * crypto_free_tfm(), as well as the various helpers below. @@ -258,6 +276,18 @@ static inline unsigned int crypto_tfm_al return tfm->__crt_alg->cra_digest.dia_digestsize; } +static inline unsigned int crypto_tfm_alg_max_nbytes(struct crypto_tfm *tfm) +{ + BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER); + return tfm->__crt_alg->cra_cipher.cia_max_nbytes; +} + +static inline unsigned int crypto_tfm_alg_req_align(struct crypto_tfm *tfm) +{ + BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER); + return tfm->__crt_alg->cra_cipher.cia_req_align; +} + /* * API wrappers. 
*/ Index: linux-2.6.10/crypto/cipher.c =================================================================== --- linux-2.6.10.orig/crypto/cipher.c 2004-12-24 22:34:57.000000000 +0100 +++ linux-2.6.10/crypto/cipher.c 2005-01-10 16:37:11.974350710 +0100 @@ -20,7 +20,31 @@ #include "internal.h" #include "scatterwalk.h" +#define CRA_CIPHER(tfm) (tfm)->__crt_alg->cra_cipher + +#define DEF_TFM_FUNCTION(name,mode,encdec,iv) \ +static int name(struct crypto_tfm *tfm, \ + struct scatterlist *dst, \ + struct scatterlist *src, \ + unsigned int nbytes) \ +{ \ + return crypt(tfm, dst, src, nbytes, \ + mode, encdec, iv); \ +} + +#define DEF_TFM_FUNCTION_IV(name,mode,encdec,iv) \ +static int name(struct crypto_tfm *tfm, \ + struct scatterlist *dst, \ + struct scatterlist *src, \ + unsigned int nbytes, u8 *iv) \ +{ \ + return crypt(tfm, dst, src, nbytes, \ + mode, encdec, iv); \ +} + typedef void (cryptfn_t)(void *, u8 *, const u8 *); +typedef void (cryptblkfn_t)(void *, u8 *, const u8 *, u8 *, + size_t, int, int); typedef void (procfn_t)(struct crypto_tfm *, u8 *, u8*, cryptfn_t, int enc, void *, int); @@ -38,6 +62,36 @@ static inline void xor_128(u8 *a, const ((u32 *)a)[3] ^= ((u32 *)b)[3]; } +static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src, + cryptfn_t *fn, int enc, void *info, int in_place) +{ + u8 *iv = info; + + /* Null encryption */ + if (!iv) + return; + + if (enc) { + tfm->crt_u.cipher.cit_xor_block(iv, src); + (*fn)(crypto_tfm_ctx(tfm), dst, iv); + memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm)); + } else { + u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0]; + u8 *buf = in_place ? 
stack : dst; + + (*fn)(crypto_tfm_ctx(tfm), buf, src); + tfm->crt_u.cipher.cit_xor_block(buf, iv); + memcpy(iv, src, crypto_tfm_alg_blocksize(tfm)); + if (buf != dst) + memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm)); + } +} + +static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src, + cryptfn_t fn, int enc, void *info, int in_place) +{ + (*fn)(crypto_tfm_ctx(tfm), dst, src); +} /* * Generic encrypt/decrypt wrapper for ciphers, handles operations across @@ -47,22 +101,101 @@ static inline void xor_128(u8 *a, const static int crypt(struct crypto_tfm *tfm, struct scatterlist *dst, struct scatterlist *src, - unsigned int nbytes, cryptfn_t crfn, - procfn_t prfn, int enc, void *info) + unsigned int nbytes, + int mode, int enc, void *info) { - struct scatter_walk walk_in, walk_out; - const unsigned int bsize = crypto_tfm_alg_blocksize(tfm); - u8 tmp_src[bsize]; - u8 tmp_dst[bsize]; + cryptfn_t *cryptofn = NULL; + procfn_t *processfn = NULL; + cryptblkfn_t *cryptomultiblockfn = NULL; + + struct scatter_walk walk_in, walk_out; + size_t max_nbytes = crypto_tfm_alg_max_nbytes(tfm); + size_t bsize = crypto_tfm_alg_blocksize(tfm); + int req_align = crypto_tfm_alg_req_align(tfm); + int ret = 0; + int gfp; + void *index_src = NULL, *index_dst = NULL; + u8 *iv = info; + u8 *tmp_src, *tmp_dst; if (!nbytes) - return 0; + return ret; if (nbytes % bsize) { tfm->crt_flags |= CRYPTO_TFM_RES_BAD_BLOCK_LEN; - return -EINVAL; + ret = -EINVAL; + goto out; } + + switch (mode) { + case CRYPTO_TFM_MODE_ECB: + if (CRA_CIPHER(tfm).cia_ecb) + cryptomultiblockfn = CRA_CIPHER(tfm).cia_ecb; + else { + cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ? + CRA_CIPHER(tfm).cia_encrypt : + CRA_CIPHER(tfm).cia_decrypt; + processfn = ecb_process; + } + break; + + case CRYPTO_TFM_MODE_CBC: + if (CRA_CIPHER(tfm).cia_cbc) + cryptomultiblockfn = CRA_CIPHER(tfm).cia_cbc; + else { + cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ? 
+ CRA_CIPHER(tfm).cia_encrypt : + CRA_CIPHER(tfm).cia_decrypt; + processfn = cbc_process; + } + break; + + /* Until we have the appropriate {ofb,cfb,ctr}_process() + functions, the following cases will return -ENOSYS if + there is no HW support for the mode. */ + case CRYPTO_TFM_MODE_OFB: + if (CRA_CIPHER(tfm).cia_ofb) + cryptomultiblockfn = CRA_CIPHER(tfm).cia_ofb; + else + return -ENOSYS; + break; + + case CRYPTO_TFM_MODE_CFB: + if (CRA_CIPHER(tfm).cia_cfb) + cryptomultiblockfn = CRA_CIPHER(tfm).cia_cfb; + else + return -ENOSYS; + break; + + case CRYPTO_TFM_MODE_CTR: + if (CRA_CIPHER(tfm).cia_ctr) + cryptomultiblockfn = CRA_CIPHER(tfm).cia_ctr; + else + return -ENOSYS; + break; + + default: + BUG(); + } + + if (cryptomultiblockfn) + bsize = (max_nbytes > nbytes) ? nbytes : max_nbytes; + + /* Some hardware crypto engines may require a specific + alignment of the buffers. We will align the buffers + already here to avoid their reallocating later. */ + gfp = in_atomic() ? GFP_ATOMIC : GFP_KERNEL; + tmp_src = crypto_aligned_kmalloc(bsize, gfp, + req_align, &index_src); + tmp_dst = crypto_aligned_kmalloc(bsize, gfp, + req_align, &index_dst); + + if (!index_src || !index_dst) { + ret = -ENOMEM; + goto out; + } + scatterwalk_start(&walk_in, src); scatterwalk_start(&walk_out, dst); @@ -81,7 +214,13 @@ static int crypt(struct crypto_tfm *tfm, scatterwalk_copychunks(src_p, &walk_in, bsize, 0); - prfn(tfm, dst_p, src_p, crfn, enc, info, in_place); + if (cryptomultiblockfn) + (*cryptomultiblockfn)(crypto_tfm_ctx(tfm), + dst_p, src_p, iv, + bsize, enc, in_place); + else + (*processfn)(tfm, dst_p, src_p, cryptofn, + enc, info, in_place); scatterwalk_done(&walk_in, 0, nbytes); @@ -89,46 +228,23 @@ static int crypt(struct crypto_tfm *tfm, scatterwalk_done(&walk_out, 1, nbytes); if (!nbytes) - return 0; + goto out; crypto_yield(tfm); } -} - -static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src, - cryptfn_t fn, int enc, void *info, int in_place) -{ - u8 *iv = info; - - 
/* Null encryption */ - if (!iv) - return; - - if (enc) { - tfm->crt_u.cipher.cit_xor_block(iv, src); - fn(crypto_tfm_ctx(tfm), dst, iv); - memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm)); - } else { - u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0]; - u8 *buf = in_place ? stack : dst; - fn(crypto_tfm_ctx(tfm), buf, src); - tfm->crt_u.cipher.cit_xor_block(buf, iv); - memcpy(iv, src, crypto_tfm_alg_blocksize(tfm)); - if (buf != dst) - memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm)); - } -} +out: + if (index_src) + kfree(index_src); + if (index_dst) + kfree(index_dst); -static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src, - cryptfn_t fn, int enc, void *info, int in_place) -{ - fn(crypto_tfm_ctx(tfm), dst, src); + return ret; } static int setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen) { - struct cipher_alg *cia = &tfm->__crt_alg->cra_cipher; + struct cipher_alg *cia = &CRA_CIPHER(tfm); if (keylen < cia->cia_min_keysize || keylen > cia->cia_max_keysize) { tfm->crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN; @@ -138,80 +254,28 @@ static int setkey(struct crypto_tfm *tfm &tfm->crt_flags); } -static int ecb_encrypt(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, unsigned int nbytes) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_encrypt, - ecb_process, 1, NULL); -} +DEF_TFM_FUNCTION(ecb_encrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_ENCRYPT, NULL); +DEF_TFM_FUNCTION(ecb_decrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_DECRYPT, NULL); -static int ecb_decrypt(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_decrypt, - ecb_process, 1, NULL); -} - -static int cbc_encrypt(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_encrypt, - cbc_process, 1, 
tfm->crt_cipher.cit_iv); -} - -static int cbc_encrypt_iv(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes, u8 *iv) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_encrypt, - cbc_process, 1, iv); -} - -static int cbc_decrypt(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_decrypt, - cbc_process, 0, tfm->crt_cipher.cit_iv); -} - -static int cbc_decrypt_iv(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes, u8 *iv) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_decrypt, - cbc_process, 0, iv); -} - -static int nocrypt(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes) -{ - return -ENOSYS; -} - -static int nocrypt_iv(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes, u8 *iv) -{ - return -ENOSYS; -} +DEF_TFM_FUNCTION(cbc_encrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(cbc_encrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, iv); +DEF_TFM_FUNCTION(cbc_decrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(cbc_decrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, iv); + +DEF_TFM_FUNCTION(cfb_encrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(cfb_encrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, iv); +DEF_TFM_FUNCTION(cfb_decrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(cfb_decrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, iv); + +DEF_TFM_FUNCTION(ofb_encrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(ofb_encrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, iv); 
+DEF_TFM_FUNCTION(ofb_decrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(ofb_decrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, iv); + +DEF_TFM_FUNCTION(ctr_encrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(ctr_encrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, iv); +DEF_TFM_FUNCTION(ctr_decrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(ctr_decrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, iv); int crypto_init_cipher_flags(struct crypto_tfm *tfm, u32 flags) { @@ -245,17 +309,24 @@ int crypto_init_cipher_ops(struct crypto break; case CRYPTO_TFM_MODE_CFB: - ops->cit_encrypt = nocrypt; - ops->cit_decrypt = nocrypt; - ops->cit_encrypt_iv = nocrypt_iv; - ops->cit_decrypt_iv = nocrypt_iv; + ops->cit_encrypt = cfb_encrypt; + ops->cit_decrypt = cfb_decrypt; + ops->cit_encrypt_iv = cfb_encrypt_iv; + ops->cit_decrypt_iv = cfb_decrypt_iv; + break; + + case CRYPTO_TFM_MODE_OFB: + ops->cit_encrypt = ofb_encrypt; + ops->cit_decrypt = ofb_decrypt; + ops->cit_encrypt_iv = ofb_encrypt_iv; + ops->cit_decrypt_iv = ofb_decrypt_iv; break; case CRYPTO_TFM_MODE_CTR: - ops->cit_encrypt = nocrypt; - ops->cit_decrypt = nocrypt; - ops->cit_encrypt_iv = nocrypt_iv; - ops->cit_decrypt_iv = nocrypt_iv; + ops->cit_encrypt = ctr_encrypt; + ops->cit_decrypt = ctr_decrypt; + ops->cit_encrypt_iv = ctr_encrypt_iv; + ops->cit_decrypt_iv = ctr_decrypt_iv; break; default: ^ permalink raw reply [flat|nested] 18+ messages in thread
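The crypto_aligned_kmalloc() helper added by the patch above
over-allocates by the alignment, records the raw pointer for the later
kfree(), and rounds the returned pointer up to the next boundary. A
userspace sketch of the same technique (malloc/free instead of
kmalloc/kfree; a NULL check is added here that the kernel patch omits):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace sketch of crypto_aligned_kmalloc(): allocate size +
 * alignment bytes, stash the raw pointer in *index (this is what must
 * later be passed to free()), and bump the returned pointer up to the
 * requested alignment boundary. `alignment` is assumed to be a power
 * of two, as hardware alignment requirements are. */
static void *aligned_alloc_indexed(size_t size, size_t alignment, void **index)
{
	char *ptr = malloc(size + alignment);

	*index = ptr;
	if (ptr && alignment > 1 && ((uintptr_t)ptr & (alignment - 1)))
		ptr += alignment - ((uintptr_t)ptr & (alignment - 1));
	return ptr;
}
```

The caller frees through the index pointer, never through the aligned
one, exactly as the crypt() rewrite frees index_src/index_dst rather
than tmp_src/tmp_dst.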
* [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-11 17:08 ` [PATCH 1/2] " Michal Ludvig @ 2005-01-14 13:10 ` Michal Ludvig 2005-01-14 14:20 ` Fruhwirth Clemens 0 siblings, 1 reply; 18+ messages in thread From: Michal Ludvig @ 2005-01-14 13:10 UTC (permalink / raw) To: Andrew Morton; +Cc: David S. Miller, jmorris, cryptoapi, linux-kernel Hi all, I'm resending this patch with trailing spaces removed per Andrew's comment. This patch extends crypto/cipher.c for offloading the whole chaining modes to e.g. hardware crypto accelerators. It is much faster to let the hardware do all the chaining if it can do so. Signed-off-by: Michal Ludvig <michal@logix.cz> --- crypto/api.c | 14 ++ crypto/cipher.c | 313 ++++++++++++++++++++++++++++++------------------- include/linux/crypto.h | 30 ++++ 3 files changed, 236 insertions(+), 121 deletions(-) Index: linux-2.6.10/crypto/api.c =================================================================== --- linux-2.6.10.orig/crypto/api.c 2004-12-24 22:35:39.000000000 +0100 +++ linux-2.6.10/crypto/api.c 2005-01-10 16:37:11.943356651 +0100 @@ -217,6 +217,19 @@ int crypto_alg_available(const char *nam return ret; } +void *crypto_aligned_kmalloc(size_t size, int mode, size_t alignment, void **index) +{ + char *ptr; + + ptr = kmalloc(size + alignment, mode); + *index = ptr; + if (alignment > 1 && ((long)ptr & (alignment - 1))) { + ptr += alignment - ((long)ptr & (alignment - 1)); + } + + return ptr; +} + static int __init init_crypto(void) { printk(KERN_INFO "Initializing Cryptographic API\n"); @@ -231,3 +244,4 @@ EXPORT_SYMBOL_GPL(crypto_unregister_alg) EXPORT_SYMBOL_GPL(crypto_alloc_tfm); EXPORT_SYMBOL_GPL(crypto_free_tfm); EXPORT_SYMBOL_GPL(crypto_alg_available); +EXPORT_SYMBOL_GPL(crypto_aligned_kmalloc); Index: linux-2.6.10/include/linux/crypto.h =================================================================== --- linux-2.6.10.orig/include/linux/crypto.h 2005-01-07 17:26:42.000000000 +0100 +++ 
linux-2.6.10/include/linux/crypto.h 2005-01-10 16:37:52.157648454 +0100 @@ -42,6 +42,7 @@ #define CRYPTO_TFM_MODE_CBC 0x00000002 #define CRYPTO_TFM_MODE_CFB 0x00000004 #define CRYPTO_TFM_MODE_CTR 0x00000008 +#define CRYPTO_TFM_MODE_OFB 0x00000010 #define CRYPTO_TFM_REQ_WEAK_KEY 0x00000100 #define CRYPTO_TFM_RES_WEAK_KEY 0x00100000 @@ -72,6 +73,18 @@ struct cipher_alg { unsigned int keylen, u32 *flags); void (*cia_encrypt)(void *ctx, u8 *dst, const u8 *src); void (*cia_decrypt)(void *ctx, u8 *dst, const u8 *src); + size_t cia_max_nbytes; + size_t cia_req_align; + void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv, + size_t nbytes, int encdec, int inplace); + void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv, + size_t nbytes, int encdec, int inplace); + void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv, + size_t nbytes, int encdec, int inplace); + void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv, + size_t nbytes, int encdec, int inplace); + void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv, + size_t nbytes, int encdec, int inplace); }; struct digest_alg { @@ -124,6 +137,11 @@ int crypto_unregister_alg(struct crypto_ int crypto_alg_available(const char *name, u32 flags); /* + * Helper function. + */ +void *crypto_aligned_kmalloc (size_t size, int mode, size_t alignment, void **index); + +/* * Transforms: user-instantiated objects which encapsulate algorithms * and core processing logic. Managed via crypto_alloc_tfm() and * crypto_free_tfm(), as well as the various helpers below. 
@@ -258,6 +276,18 @@ static inline unsigned int crypto_tfm_al return tfm->__crt_alg->cra_digest.dia_digestsize; } +static inline unsigned int crypto_tfm_alg_max_nbytes(struct crypto_tfm *tfm) +{ + BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER); + return tfm->__crt_alg->cra_cipher.cia_max_nbytes; +} + +static inline unsigned int crypto_tfm_alg_req_align(struct crypto_tfm *tfm) +{ + BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER); + return tfm->__crt_alg->cra_cipher.cia_req_align; +} + /* * API wrappers. */ Index: linux-2.6.10/crypto/cipher.c =================================================================== --- linux-2.6.10.orig/crypto/cipher.c 2004-12-24 22:34:57.000000000 +0100 +++ linux-2.6.10/crypto/cipher.c 2005-01-10 16:37:11.974350710 +0100 @@ -20,7 +20,31 @@ #include "internal.h" #include "scatterwalk.h" +#define CRA_CIPHER(tfm) (tfm)->__crt_alg->cra_cipher + +#define DEF_TFM_FUNCTION(name,mode,encdec,iv) \ +static int name(struct crypto_tfm *tfm, \ + struct scatterlist *dst, \ + struct scatterlist *src, \ + unsigned int nbytes) \ +{ \ + return crypt(tfm, dst, src, nbytes, \ + mode, encdec, iv); \ +} + +#define DEF_TFM_FUNCTION_IV(name,mode,encdec,iv) \ +static int name(struct crypto_tfm *tfm, \ + struct scatterlist *dst, \ + struct scatterlist *src, \ + unsigned int nbytes, u8 *iv) \ +{ \ + return crypt(tfm, dst, src, nbytes, \ + mode, encdec, iv); \ +} + typedef void (cryptfn_t)(void *, u8 *, const u8 *); +typedef void (cryptblkfn_t)(void *, u8 *, const u8 *, u8 *, + size_t, int, int); typedef void (procfn_t)(struct crypto_tfm *, u8 *, u8*, cryptfn_t, int enc, void *, int); @@ -38,6 +62,36 @@ static inline void xor_128(u8 *a, const ((u32 *)a)[3] ^= ((u32 *)b)[3]; } +static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src, + cryptfn_t *fn, int enc, void *info, int in_place) +{ + u8 *iv = info; + + /* Null encryption */ + if (!iv) + return; + + if (enc) { + tfm->crt_u.cipher.cit_xor_block(iv, src); + 
(*fn)(crypto_tfm_ctx(tfm), dst, iv); + memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm)); + } else { + u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0]; + u8 *buf = in_place ? stack : dst; + + (*fn)(crypto_tfm_ctx(tfm), buf, src); + tfm->crt_u.cipher.cit_xor_block(buf, iv); + memcpy(iv, src, crypto_tfm_alg_blocksize(tfm)); + if (buf != dst) + memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm)); + } +} + +static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src, + cryptfn_t fn, int enc, void *info, int in_place) +{ + (*fn)(crypto_tfm_ctx(tfm), dst, src); +} /* * Generic encrypt/decrypt wrapper for ciphers, handles operations across @@ -47,22 +101,101 @@ static inline void xor_128(u8 *a, const static int crypt(struct crypto_tfm *tfm, struct scatterlist *dst, struct scatterlist *src, - unsigned int nbytes, cryptfn_t crfn, - procfn_t prfn, int enc, void *info) + unsigned int nbytes, + int mode, int enc, void *info) { - struct scatter_walk walk_in, walk_out; - const unsigned int bsize = crypto_tfm_alg_blocksize(tfm); - u8 tmp_src[bsize]; - u8 tmp_dst[bsize]; + cryptfn_t *cryptofn = NULL; + procfn_t *processfn = NULL; + cryptblkfn_t *cryptomultiblockfn = NULL; + + struct scatter_walk walk_in, walk_out; + size_t max_nbytes = crypto_tfm_alg_max_nbytes(tfm); + size_t bsize = crypto_tfm_alg_blocksize(tfm); + int req_align = crypto_tfm_alg_req_align(tfm); + int ret = 0; + int gfp; + void *index_src = NULL, *index_dst = NULL; + u8 *iv = info; + u8 *tmp_src, *tmp_dst; if (!nbytes) - return 0; + return ret; if (nbytes % bsize) { tfm->crt_flags |= CRYPTO_TFM_RES_BAD_BLOCK_LEN; - return -EINVAL; + ret = -EINVAL; + goto out; } + + switch (mode) { + case CRYPTO_TFM_MODE_ECB: + if (CRA_CIPHER(tfm).cia_ecb) + cryptomultiblockfn = CRA_CIPHER(tfm).cia_ecb; + else { + cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ? 
+ CRA_CIPHER(tfm).cia_encrypt : + CRA_CIPHER(tfm).cia_decrypt; + processfn = ecb_process; + } + break; + + case CRYPTO_TFM_MODE_CBC: + if (CRA_CIPHER(tfm).cia_cbc) + cryptomultiblockfn = CRA_CIPHER(tfm).cia_cbc; + else { + cryptofn = (enc == CRYPTO_DIR_ENCRYPT) ? + CRA_CIPHER(tfm).cia_encrypt : + CRA_CIPHER(tfm).cia_decrypt; + processfn = cbc_process; + } + break; + + /* Until we have the appropriate {ofb,cfb,ctr}_process() + functions, the following cases will return -ENOSYS if + there is no HW support for the mode. */ + case CRYPTO_TFM_MODE_OFB: + if (CRA_CIPHER(tfm).cia_ofb) + cryptomultiblockfn = CRA_CIPHER(tfm).cia_ofb; + else + return -ENOSYS; + break; + + case CRYPTO_TFM_MODE_CFB: + if (CRA_CIPHER(tfm).cia_cfb) + cryptomultiblockfn = CRA_CIPHER(tfm).cia_cfb; + else + return -ENOSYS; + break; + + case CRYPTO_TFM_MODE_CTR: + if (CRA_CIPHER(tfm).cia_ctr) + cryptomultiblockfn = CRA_CIPHER(tfm).cia_ctr; + else + return -ENOSYS; + break; + + default: + BUG(); + } + + if (cryptomultiblockfn) + bsize = (max_nbytes > nbytes) ? nbytes : max_nbytes; + + /* Some hardware crypto engines may require a specific + alignment of the buffers. We will align the buffers + already here to avoid their reallocating later. */ + gfp = in_atomic() ? 
GFP_ATOMIC : GFP_KERNEL; + tmp_src = crypto_aligned_kmalloc(bsize, gfp, + req_align, &index_src); + tmp_dst = crypto_aligned_kmalloc(bsize, gfp, + req_align, &index_dst); + + if (!index_src || !index_dst) { + ret = -ENOMEM; + goto out; + } + scatterwalk_start(&walk_in, src); scatterwalk_start(&walk_out, dst); @@ -81,7 +214,13 @@ static int crypt(struct crypto_tfm *tfm, scatterwalk_copychunks(src_p, &walk_in, bsize, 0); - prfn(tfm, dst_p, src_p, crfn, enc, info, in_place); + if (cryptomultiblockfn) + (*cryptomultiblockfn)(crypto_tfm_ctx(tfm), + dst_p, src_p, iv, + bsize, enc, in_place); + else + (*processfn)(tfm, dst_p, src_p, cryptofn, + enc, info, in_place); scatterwalk_done(&walk_in, 0, nbytes); @@ -89,46 +228,23 @@ static int crypt(struct crypto_tfm *tfm, scatterwalk_done(&walk_out, 1, nbytes); if (!nbytes) - return 0; + goto out; crypto_yield(tfm); } -} - -static void cbc_process(struct crypto_tfm *tfm, u8 *dst, u8 *src, - cryptfn_t fn, int enc, void *info, int in_place) -{ - u8 *iv = info; - - /* Null encryption */ - if (!iv) - return; - - if (enc) { - tfm->crt_u.cipher.cit_xor_block(iv, src); - fn(crypto_tfm_ctx(tfm), dst, iv); - memcpy(iv, dst, crypto_tfm_alg_blocksize(tfm)); - } else { - u8 stack[in_place ? crypto_tfm_alg_blocksize(tfm) : 0]; - u8 *buf = in_place ? 
stack : dst; - fn(crypto_tfm_ctx(tfm), buf, src); - tfm->crt_u.cipher.cit_xor_block(buf, iv); - memcpy(iv, src, crypto_tfm_alg_blocksize(tfm)); - if (buf != dst) - memcpy(dst, buf, crypto_tfm_alg_blocksize(tfm)); - } -} +out: + if (index_src) + kfree(index_src); + if (index_dst) + kfree(index_dst); -static void ecb_process(struct crypto_tfm *tfm, u8 *dst, u8 *src, - cryptfn_t fn, int enc, void *info, int in_place) -{ - fn(crypto_tfm_ctx(tfm), dst, src); + return ret; } static int setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen) { - struct cipher_alg *cia = &tfm->__crt_alg->cra_cipher; + struct cipher_alg *cia = &CRA_CIPHER(tfm); if (keylen < cia->cia_min_keysize || keylen > cia->cia_max_keysize) { tfm->crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN; @@ -138,80 +254,28 @@ static int setkey(struct crypto_tfm *tfm &tfm->crt_flags); } -static int ecb_encrypt(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, unsigned int nbytes) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_encrypt, - ecb_process, 1, NULL); -} +DEF_TFM_FUNCTION(ecb_encrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_ENCRYPT, NULL); +DEF_TFM_FUNCTION(ecb_decrypt, CRYPTO_TFM_MODE_ECB, CRYPTO_DIR_DECRYPT, NULL); -static int ecb_decrypt(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_decrypt, - ecb_process, 1, NULL); -} - -static int cbc_encrypt(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_encrypt, - cbc_process, 1, tfm->crt_cipher.cit_iv); -} - -static int cbc_encrypt_iv(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes, u8 *iv) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_encrypt, - cbc_process, 1, iv); -} - -static int 
cbc_decrypt(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_decrypt, - cbc_process, 0, tfm->crt_cipher.cit_iv); -} - -static int cbc_decrypt_iv(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes, u8 *iv) -{ - return crypt(tfm, dst, src, nbytes, - tfm->__crt_alg->cra_cipher.cia_decrypt, - cbc_process, 0, iv); -} - -static int nocrypt(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes) -{ - return -ENOSYS; -} - -static int nocrypt_iv(struct crypto_tfm *tfm, - struct scatterlist *dst, - struct scatterlist *src, - unsigned int nbytes, u8 *iv) -{ - return -ENOSYS; -} +DEF_TFM_FUNCTION(cbc_encrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(cbc_encrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_ENCRYPT, iv); +DEF_TFM_FUNCTION(cbc_decrypt, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(cbc_decrypt_iv, CRYPTO_TFM_MODE_CBC, CRYPTO_DIR_DECRYPT, iv); + +DEF_TFM_FUNCTION(cfb_encrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(cfb_encrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_ENCRYPT, iv); +DEF_TFM_FUNCTION(cfb_decrypt, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(cfb_decrypt_iv, CRYPTO_TFM_MODE_CFB, CRYPTO_DIR_DECRYPT, iv); + +DEF_TFM_FUNCTION(ofb_encrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(ofb_encrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_ENCRYPT, iv); +DEF_TFM_FUNCTION(ofb_decrypt, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(ofb_decrypt_iv, CRYPTO_TFM_MODE_OFB, CRYPTO_DIR_DECRYPT, iv); + +DEF_TFM_FUNCTION(ctr_encrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, tfm->crt_cipher.cit_iv); 
+DEF_TFM_FUNCTION_IV(ctr_encrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_ENCRYPT, iv); +DEF_TFM_FUNCTION(ctr_decrypt, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, tfm->crt_cipher.cit_iv); +DEF_TFM_FUNCTION_IV(ctr_decrypt_iv, CRYPTO_TFM_MODE_CTR, CRYPTO_DIR_DECRYPT, iv); int crypto_init_cipher_flags(struct crypto_tfm *tfm, u32 flags) { @@ -245,17 +309,24 @@ int crypto_init_cipher_ops(struct crypto break; case CRYPTO_TFM_MODE_CFB: - ops->cit_encrypt = nocrypt; - ops->cit_decrypt = nocrypt; - ops->cit_encrypt_iv = nocrypt_iv; - ops->cit_decrypt_iv = nocrypt_iv; + ops->cit_encrypt = cfb_encrypt; + ops->cit_decrypt = cfb_decrypt; + ops->cit_encrypt_iv = cfb_encrypt_iv; + ops->cit_decrypt_iv = cfb_decrypt_iv; + break; + + case CRYPTO_TFM_MODE_OFB: + ops->cit_encrypt = ofb_encrypt; + ops->cit_decrypt = ofb_decrypt; + ops->cit_encrypt_iv = ofb_encrypt_iv; + ops->cit_decrypt_iv = ofb_decrypt_iv; break; case CRYPTO_TFM_MODE_CTR: - ops->cit_encrypt = nocrypt; - ops->cit_decrypt = nocrypt; - ops->cit_encrypt_iv = nocrypt_iv; - ops->cit_decrypt_iv = nocrypt_iv; + ops->cit_encrypt = ctr_encrypt; + ops->cit_decrypt = ctr_decrypt; + ops->cit_encrypt_iv = ctr_encrypt_iv; + ops->cit_decrypt_iv = ctr_decrypt_iv; break; default: ^ permalink raw reply [flat|nested] 18+ messages in thread
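The DEF_TFM_FUNCTION macros in the patch replace a page of hand-written,
near-identical mode wrappers with one-line definitions that all forward
to a single crypt() dispatcher. The technique can be shown in isolation
with toy mode/direction constants and a stub dispatcher (the names and
return encoding below are invented for the example):

```c
#include <assert.h>

/* Toy stand-ins for the CRYPTO_TFM_MODE_* and CRYPTO_DIR_* constants. */
#define MODE_ECB 1
#define MODE_CBC 2
#define DIR_ENCRYPT 1
#define DIR_DECRYPT 0

/* Stub dispatcher standing in for crypt(); it just encodes which
 * (mode, direction) pair it was asked for. */
static int crypt_dispatch(int mode, int encdec)
{
	return mode * 10 + encdec;
}

/* One macro stamps out a thin wrapper per (mode, direction) pair,
 * mirroring DEF_TFM_FUNCTION in the patch. */
#define DEF_TFM_FUNCTION(name, mode, encdec)	\
static int name(void)				\
{						\
	return crypt_dispatch(mode, encdec);	\
}

DEF_TFM_FUNCTION(ecb_encrypt, MODE_ECB, DIR_ENCRYPT)
DEF_TFM_FUNCTION(ecb_decrypt, MODE_ECB, DIR_DECRYPT)
DEF_TFM_FUNCTION(cbc_encrypt, MODE_CBC, DIR_ENCRYPT)
DEF_TFM_FUNCTION(cbc_decrypt, MODE_CBC, DIR_DECRYPT)
```

Each generated wrapper is an ordinary static function, so the
cit_encrypt/cit_decrypt function pointers in crypto_init_cipher_ops()
can be assigned to them directly.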
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-14 13:10 ` [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers " Michal Ludvig @ 2005-01-14 14:20 ` Fruhwirth Clemens 2005-01-14 16:40 ` Michal Ludvig 0 siblings, 1 reply; 18+ messages in thread
From: Fruhwirth Clemens @ 2005-01-14 14:20 UTC (permalink / raw)
To: Michal Ludvig
Cc: Andrew Morton, James Morris, cryptoapi, David S. Miller, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4498 bytes --]
On Fri, 2005-01-14 at 14:10 +0100, Michal Ludvig wrote:
> This patch extends crypto/cipher.c for offloading the whole chaining modes
> to e.g. hardware crypto accelerators. It is much faster to let the
> hardware do all the chaining if it can do so.
Is there any connection to Evgeniy Polyakov's acrypto work? It appears that there are two projects with one objective. It would be nice to see both parties pulling in the same direction.
> + void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> + size_t nbytes, int encdec, int inplace);
> + void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> + size_t nbytes, int encdec, int inplace);
> + void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> + size_t nbytes, int encdec, int inplace);
> + void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> + size_t nbytes, int encdec, int inplace);
> + void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> + size_t nbytes, int encdec, int inplace);
What's the use of adding mode-specific functions to the tfm struct? And why do they all have the same function type? For instance, the "iv" or "inplace" argument is meaningless for ECB.
Have a look at http://clemens.endorphin.org/patches/lrw/2-tweakable-cipher-interface.diff
This patch takes the following approach to handling the cipher mode/interface issue: every mode is associated with one or more interfaces. This interface is either cit_encrypt, cit_encrypt_iv, or cit_encrypt_tweaks.
How these interfaces are associated with cipher modes is handled in crypto_init_cipher_flags. Except for CBC, every mode associates with just one interface. In CBC, the CryptoAPI caller can use the IV interface to supply an IV, or use the current tfm's IV by calling cit_encrypt instead of cit_encrypt_iv. I don't see a gain in throwing dozens of pointers into the tfm, as a tfm is always assigned a single mode.
> /*
> * Generic encrypt/decrypt wrapper for ciphers, handles operations across
> @@ -47,22 +101,101 @@ static inline void xor_128(u8 *a, const
> static int crypt(struct crypto_tfm *tfm,
> struct scatterlist *dst,
> struct scatterlist *src,
> - unsigned int nbytes, cryptfn_t crfn,
> - procfn_t prfn, int enc, void *info)
Your patch heavily interferes with my cleanup patch for crypt(..). To put it briefly, I consider crypt(..) a mess. The function definition of crypt(..) and the procfn_t function is just a patchwork of stuff, added when needed.
I've rewritten a generic scatterwalker, a generic replacement for crypt(..), that can apply any processing function with an arbitrary argument list to the data associated with a set of scatterlists. I think this function shouldn't be in crypto/ but in some more generic location, as I think it could be useful for many more things.
http://clemens.endorphin.org/patches/lrw/1-generic-scatterwalker.diff is the generic scatterwalk patch.
int scatterwalk_walker_generic(void (function)(void *priv, int length, void **buflist), void *priv, int steps, int nsl, ...)
"function" is applied to the scatterlist data. "priv" is a private data structure for bookkeeping; it's supplied to the function as its first parameter. "steps" is the number of times the function is called. "nsl" is the number of scatterlists following. After "nsl", the scatterlists follow in tuples of the form: <struct scatterlist *, int steplength, int ioflag>
ECB, for example: ...
struct ecb_process_priv priv = {
	.tfm = tfm,
	.crfn = tfm->__crt_alg->cra_cipher.cia_decrypt,
};
int bsize = crypto_tfm_alg_blocksize(tfm);
scatterwalk_walker_generic(ecb_process_gw, // processing function
			   &priv, // private data
			   nbytes/bsize, // number of steps
			   2, // number of scatterlists
			   dst, bsize, 1, // first, ioflag set to output
			   src, bsize, 0); // second, ioflag set to input
..
static void ecb_process_gw(void *_priv, int nsg, void **buf)
{
	struct ecb_process_priv *priv = (struct ecb_process_priv *)_priv;
	char *dst = buf[0]; // pointer to correctly kmapped and copied dst
	char *src = buf[1]; // pointer to correctly kmapped and copied src
	priv->crfn(crypto_tfm_ctx(priv->tfm), dst, src);
}
Well, I recognize that I'm somewhat off-topic now. But it demonstrates clearly why we should get rid of crypt(..) and replace it with something more generic.
-- Fruhwirth Clemens <clemens@endorphin.org> http://clemens.endorphin.org
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
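The walker idea above can be modeled in plain userspace C. This is only an illustrative sketch: flat byte buffers stand in for scatterlists, kmap and copy handling are omitted, and all names here (walk_generic, walk_buf, xor_step) are invented for the example rather than taken from any kernel API.

```c
#include <assert.h>

/* Hypothetical userspace model of the generic-walker idea: apply a
 * processing function step by step across several flat buffers. */
typedef void (*walk_fn)(void *priv, int nbuf, void **buf);

struct walk_buf {
	unsigned char *base;	/* stand-in for a scatterlist */
	int steplen;		/* bytes consumed per step */
};

static void walk_generic(walk_fn fn, void *priv, int steps,
			 int nbuf, struct walk_buf *bufs)
{
	void *ptrs[8];
	for (int s = 0; s < steps; s++) {
		/* hand the callback one pointer per buffer, advanced
		 * by the per-buffer step length */
		for (int i = 0; i < nbuf; i++)
			ptrs[i] = bufs[i].base + s * bufs[i].steplen;
		fn(priv, nbuf, ptrs);
	}
}

/* ECB-like step: "encrypt" one 4-byte block by XORing with a fixed pad. */
static void xor_step(void *priv, int nbuf, void **buf)
{
	unsigned char *dst = buf[0], *src = buf[1];
	unsigned char pad = *(unsigned char *)priv;
	for (int i = 0; i < 4; i++)
		dst[i] = src[i] ^ pad;
}

int walk_demo(void)
{
	unsigned char src[8] = {1, 2, 3, 4, 5, 6, 7, 8}, dst[8] = {0};
	unsigned char pad = 0xFF;
	struct walk_buf bufs[2] = { { dst, 4 }, { src, 4 } };
	walk_generic(xor_step, &pad, 2 /* steps */, 2, bufs);
	return dst[0] == (1 ^ 0xFF) && dst[7] == (8 ^ 0xFF);
}
```

The processing function never sees scatterlists or page boundaries; all of that bookkeeping stays inside the walker, which is the separation the patch argues for.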
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-14 14:20 ` Fruhwirth Clemens @ 2005-01-14 16:40 ` Michal Ludvig 2005-01-15 12:45 ` Fruhwirth Clemens 0 siblings, 1 reply; 18+ messages in thread
From: Michal Ludvig @ 2005-01-14 16:40 UTC (permalink / raw)
To: Fruhwirth Clemens
Cc: Andrew Morton, James Morris, cryptoapi, David S. Miller, linux-kernel
On Fri, 14 Jan 2005, Fruhwirth Clemens wrote:
> On Fri, 2005-01-14 at 14:10 +0100, Michal Ludvig wrote:
>
> > This patch extends crypto/cipher.c for offloading the whole chaining modes
> > to e.g. hardware crypto accelerators. It is much faster to let the
> > hardware do all the chaining if it can do so.
>
> Is there any connection to Evgeniy Polyakov's acrypto work? It appears,
> that there are two project for one objective. Would be nice to see both
> parties pulling on one string.
These projects do not compete at all. Evgeniy's work is a complete replacement for the current cryptoapi and brings asynchronous operations in the first place. My patches are simple and straightforward extensions to the current cryptoapi that enable offloading the chaining to hardware where possible.
> > + void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > + size_t nbytes, int encdec, int inplace);
> > + void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > + size_t nbytes, int encdec, int inplace);
> > + void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > + size_t nbytes, int encdec, int inplace);
> > + void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > + size_t nbytes, int encdec, int inplace);
> > + void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > + size_t nbytes, int encdec, int inplace);
>
> What's the use of adding mode specific functions to the tfm struct? And
> why do they all have the same function type? For instance, the "iv" or
> "inplace" argument is meaningless for ECB.
The prototypes must be the same in my implementation, because crypt() takes only a pointer to the appropriate mode function and then calls it as "(func*)(arg, arg, ...)".
BTW these functions are not added to "struct crypto_tfm", but to "struct crypto_alg", which describes what a particular module supports (i.e. along with the block size, algorithm name, etc.). In this case it can say that e.g. padlock.ko supports encryption in CBC mode in addition to common single-block processing.
BTW I'll look at the pointers to the tweakable API over the weekend...
Michal Ludvig
-- * A mouse is a device used to point at the xterm you want to type in. * Personal homepage - http://www.logix.cz/michal
^ permalink raw reply [flat|nested] 18+ messages in thread
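The single-prototype argument above can be sketched in userspace C. Everything here (toy_ecb, toy_cbc, the byte-wide "blocks") is hypothetical and only mirrors the shape of the cia_<mode> hooks: because all modes share one signature, a dispatcher can hold a single function-pointer type and call it blindly, and ECB simply ignores the arguments it does not need.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u8;

/* One shared prototype, as in the patch, so the caller can dispatch any
 * mode through a single pointer type. */
typedef void (*cia_mode_fn)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
			    size_t nbytes, int encdec, int inplace);

static void toy_ecb(void *ctx, u8 *dst, const u8 *src, u8 *iv,
		    size_t nbytes, int encdec, int inplace)
{
	u8 key = *(u8 *)ctx;
	(void)iv; (void)encdec; (void)inplace;	/* meaningless for ECB */
	for (size_t i = 0; i < nbytes; i++)
		dst[i] = src[i] ^ key;
}

static void toy_cbc(void *ctx, u8 *dst, const u8 *src, u8 *iv,
		    size_t nbytes, int encdec, int inplace)
{
	u8 key = *(u8 *)ctx, chain = iv[0];
	(void)encdec; (void)inplace;
	for (size_t i = 0; i < nbytes; i++) {
		dst[i] = (u8)((src[i] ^ chain) ^ key);	/* 1-byte "blocks" */
		chain = dst[i];
	}
	iv[0] = chain;				/* hand the IV back */
}

int dispatch_demo(int use_cbc)
{
	/* the crypt()-style caller picks one hook and calls it uniformly */
	cia_mode_fn fn = use_cbc ? toy_cbc : toy_ecb;
	u8 key = 0x55, iv[1] = {0xAA}, src[4] = {0, 1, 2, 3}, dst[4];
	fn(&key, dst, src, iv, sizeof(src), 1, 0);
	return dst[0];
}
```

The cost of this uniformity is exactly what Clemens points out: ECB receives an iv and an inplace flag it can only discard.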
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-14 16:40 ` Michal Ludvig @ 2005-01-15 12:45 ` Fruhwirth Clemens 2005-01-18 16:49 ` James Morris 0 siblings, 1 reply; 18+ messages in thread
From: Fruhwirth Clemens @ 2005-01-15 12:45 UTC (permalink / raw)
To: Michal Ludvig
Cc: Andrew Morton, James Morris, cryptoapi, David S. Miller, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4730 bytes --]
On Fri, 2005-01-14 at 17:40 +0100, Michal Ludvig wrote:
> > Is there any connection to Evgeniy Polyakov's acrypto work? It appears,
> > that there are two project for one objective. Would be nice to see both
> > parties pulling on one string.
>
> These projects do not compete at all. Evgeniy's work is a complete
> replacement for current cryptoapi and brings the asynchronous
> operations at the first place. My patches are simple and straightforward
> extensions to current cryptoapi that enable offloading the chaining to
> hardware where possible.
Fine, I just saw in Evgeniy's reply that he took your padlock implementation. I had thought the two of you were working on different implementations, but actually both aim for the same goal: hardware crypto-offloading. With padlock the need for an async interface isn't that big, because it's not really "off-loading": the work is done on the same chip and in the same thread.
However, developing two different APIs isn't particularly efficient. I know that at the moment there isn't much choice, as J.Morris hasn't committed to acrypto in any way. But I think it would be good to replace the synchronous CryptoAPI implementation altogether, put the missing internals of CryptoAPI into acrypto, and back the interfaces of CryptoAPI with small stubs that do something like
somereturnvalue synchronized_interface(..)
{
	acrypto_kick_some_operation(acrypto_struct);
	wait_for_completion(acrypto_struct);
	return fetch_result(acrypto_struct);
}
The other way round, an asynchronous interface built on top of a synchronous one, doesn't seem natural to me. (That doesn't mean I oppose your patches, merely that we should start to think in different directions.)
> > > + void (*cia_ecb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > + size_t nbytes, int encdec, int inplace);
> > > + void (*cia_cbc)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > + size_t nbytes, int encdec, int inplace);
> > > + void (*cia_cfb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > + size_t nbytes, int encdec, int inplace);
> > > + void (*cia_ofb)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > + size_t nbytes, int encdec, int inplace);
> > > + void (*cia_ctr)(void *ctx, u8 *dst, const u8 *src, u8 *iv,
> > > + size_t nbytes, int encdec, int inplace);
> >
> > What's the use of adding mode specific functions to the tfm struct? And
> > why do they all have the same function type? For instance, the "iv" or
> > "inplace" argument is meaningless for ECB.
>
> The prototypes must be the same in my implementation, because in crypt()
> only a pointer to the appropriate mode function is taken and further used
> as "(func*)(arg, arg, ...)".
>
> BTW these functions are not added to "struct crypto_tfm", but to "struct
> crypto_alg" which describes what a particular module supports (i.e. along
> with the block size, algorithm name, etc). In this case it can say that
> e.g. padlock.ko supports encryption in CBC mode in addition to a common
> single-block processing.
Err, right. I overlooked that it's cia and not cit. However, I don't like the idea of extending structs whenever there is a new cipher mode. I think the API should not have to be extended for every addition, but should be designed for such extension right from the start. What about a "selector" function, which returns the appropriate encryption function for a mode?
typedef void (procfn_t)(struct crypto_tfm *, u8 *, u8*, cryptfn_t, int enc, void *, int);
put
procfn_t (*cia_modesel)(u32 function, int iface, int encdec);
into struct crypto_alg; then in crypto_init_cipher_ops, instead of
switch (tfm->crt_cipher.cit_mode) {
..
case CRYPTO_TFM_MODE_CFB:
	ops->cit_encrypt = cfb_encrypt;
	ops->cit_decrypt = cfb_decrypt;
..
}
we do,
struct cipher_alg *cia = &tfm->__crt_alg->cra_cipher;
switch (tfm->crt_cipher.cit_mode) {
..
case CRYPTO_TFM_MODE_CFB:
	ops->cit_encrypt = cia->cia_modesel(cit_mode, 0, IFACE_ECB);
	ops->cit_decrypt = cia->cia_modesel(cit_mode, 1, IFACE_ECB);
	ops->cit_encrypt_iv = cia->cia_modesel(cit_mode, 0, IFACE_IV);
	ops->cit_decrypt_iv = cia->cia_modesel(cit_mode, 1, IFACE_IV);
..
Alternatively, we could also add a lookup table. But I like this better, since it is much easier for people to read, and tfms aren't allocated that often.
We can probably add a wrapper for cia_modesel that falls back to the old behaviour when cia_modesel is NULL. This way we don't have to patch all algorithm implementations to include cia_modesel.
How do you like that idea?
-- Fruhwirth Clemens <clemens@endorphin.org> http://clemens.endorphin.org
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
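The selector-with-fallback idea might look like the following userspace sketch. The names (cia_modesel, the MODE_*/DIR_* constants, the int-returning stand-in functions) either follow the proposal above or are invented for the example; the stand-ins just make the dispatch observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the proposed selector: the algorithm exports one
 * cia_modesel() hook, and a wrapper falls back to built-in software
 * defaults when the hook is NULL, so existing modules need no changes. */
enum { MODE_ECB, MODE_CBC };
enum { DIR_ENC, DIR_DEC };

typedef int (*cit_fn)(void);	/* stand-in for the real interface type */

static int sw_ecb_encrypt(void) { return 100; }	/* software fallbacks */
static int sw_ecb_decrypt(void) { return 101; }
static int hw_ecb_encrypt(void) { return 200; }	/* "accelerated" versions */
static int hw_ecb_decrypt(void) { return 201; }

struct cipher_alg_model {
	cit_fn (*cia_modesel)(int mode, int encdec);	/* may be NULL */
};

static cit_fn hw_modesel(int mode, int encdec)
{
	if (mode == MODE_ECB)
		return encdec == DIR_ENC ? hw_ecb_encrypt : hw_ecb_decrypt;
	return NULL;		/* mode not accelerated */
}

/* The suggested wrapper: use cia_modesel if present, else the default. */
static cit_fn modesel(struct cipher_alg_model *alg, int mode, int encdec)
{
	cit_fn fn = NULL;
	if (alg->cia_modesel)
		fn = alg->cia_modesel(mode, encdec);
	if (!fn)
		fn = (encdec == DIR_ENC) ? sw_ecb_encrypt : sw_ecb_decrypt;
	return fn;
}

int modesel_demo(int accelerated)
{
	struct cipher_alg_model alg = { accelerated ? hw_modesel : NULL };
	return modesel(&alg, MODE_ECB, DIR_ENC)();
}
```

Under this shape, adding a new mode means teaching the one selector about it rather than growing struct cipher_alg by another function pointer.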
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-15 12:45 ` Fruhwirth Clemens @ 2005-01-18 16:49 ` James Morris 2005-01-20 3:30 ` David McCullough 0 siblings, 1 reply; 18+ messages in thread From: James Morris @ 2005-01-18 16:49 UTC (permalink / raw) To: Fruhwirth Clemens Cc: Michal Ludvig, Andrew Morton, cryptoapi, David S. Miller, linux-kernel On Sat, 15 Jan 2005, Fruhwirth Clemens wrote: > However, developing two different APIs isn't particular efficient. I > know, at the moment there isn't much choice, as J.Morris hasn't commited > to acrypto in anyway. There is also the OCF port (OpenBSD crypto framework) to consider, if permission to dual license from the original authors can be obtained. - James -- James Morris <jmorris@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-18 16:49 ` James Morris @ 2005-01-20 3:30 ` David McCullough 2005-01-20 13:47 ` James Morris 0 siblings, 1 reply; 18+ messages in thread
From: David McCullough @ 2005-01-20 3:30 UTC (permalink / raw)
To: James Morris
Cc: Fruhwirth Clemens, Andrew Morton, linux-kernel, cryptoapi, Michal Ludvig, David S. Miller
Jivin James Morris lays it down ...
> On Sat, 15 Jan 2005, Fruhwirth Clemens wrote:
>
> > However, developing two different APIs isn't particular efficient. I
> > know, at the moment there isn't much choice, as J.Morris hasn't commited
> > to acrypto in anyway.
>
> There is also the OCF port (OpenBSD crypto framework) to consider, if
> permission to dual license from the original authors can be obtained.
For anyone looking for the OCF port for Linux, you can find the latest release here:
http://lists.logix.cz/pipermail/cryptoapi/2004/000261.html
One of the drivers uses the existing kernel crypto API to implement a software crypto engine for OCF.
As for permission to use a dual license, I will gladly approach the authors if others feel it is important to find out whether it is possible at this point.
Cheers,
Davidm
-- David McCullough, davidm@snapgear.com Ph:+61 7 34352815 http://www.SnapGear.com Custom Embedded Solutions + Security Fx:+61 7 38913630 http://www.uCdot.org
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-20 3:30 ` David McCullough @ 2005-01-20 13:47 ` James Morris 2005-03-03 10:50 ` David McCullough 0 siblings, 1 reply; 18+ messages in thread From: James Morris @ 2005-01-20 13:47 UTC (permalink / raw) To: David McCullough Cc: Fruhwirth Clemens, Andrew Morton, linux-kernel, cryptoapi, Michal Ludvig, David S. Miller On Thu, 20 Jan 2005, David McCullough wrote: > As for permission to use a dual license, I will gladly approach the > authors if others feel it is important to know the possibility of it at this > point, Please do so. It would be useful to have the option of using an already developed, debugged and analyzed framework. - James -- James Morris <jmorris@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-20 13:47 ` James Morris @ 2005-03-03 10:50 ` David McCullough 0 siblings, 0 replies; 18+ messages in thread
From: David McCullough @ 2005-03-03 10:50 UTC (permalink / raw)
To: James Morris
Cc: Fruhwirth Clemens, Andrew Morton, linux-kernel, cryptoapi, Michal Ludvig, David S. Miller
Jivin James Morris lays it down ...
> On Thu, 20 Jan 2005, David McCullough wrote:
>
> > As for permission to use a dual license, I will gladly approach the
> > authors if others feel it is important to know the possibility of it at this
> > point,
>
> Please do so. It would be useful to have the option of using an already
> developed, debugged and analyzed framework.
Ok, I finally managed to get responses from all the individual contributors, though none of the corporations contacted have responded. While a good number of those contacted were happy to dual-license, most are concerned that changes made under the GPL will not be available for use in BSD. A couple were a definite no.
I have had offers to rewrite any portions that cannot be dual-licensed, but I think that is overkill for now unless there is significant interest in taking that path.
Fortunately we have been able to obtain some funding to complete a large amount of work on the project, so it should make some nice progress in the next couple of weeks as that ramps up :-)
Cheers,
Davidm
-- David McCullough, davidm@snapgear.com Ph:+61 7 34352815 http://www.SnapGear.com Custom Embedded Solutions + Security Fx:+61 7 38913630 http://www.uCdot.org
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/2] PadLock processing multiple blocks at a time 2005-01-11 17:03 ` PadLock processing multiple blocks at a time Michal Ludvig 2005-01-11 17:08 ` [PATCH 1/2] " Michal Ludvig @ 2005-01-11 17:08 ` Michal Ludvig 2005-01-14 3:05 ` Andrew Morton 2005-01-14 13:15 ` [PATCH 2/2] CryptoAPI: Update PadLock to process multiple blocks at once Michal Ludvig 1 sibling, 2 replies; 18+ messages in thread From: Michal Ludvig @ 2005-01-11 17:08 UTC (permalink / raw) To: David S. Miller; +Cc: jmorris, cryptoapi, linux-kernel # # Update to padlock-aes.c that enables processing of the whole # buffer of data at once with the given chaining mode (e.g. CBC). # # Signed-off-by: Michal Ludvig <michal@logix.cz> # Index: linux-2.6.10/drivers/crypto/padlock-aes.c =================================================================== --- linux-2.6.10.orig/drivers/crypto/padlock-aes.c 2005-01-07 17:26:42.000000000 +0100 +++ linux-2.6.10/drivers/crypto/padlock-aes.c 2005-01-10 17:59:17.000000000 +0100 @@ -369,19 +369,54 @@ aes_set_key(void *ctx_arg, const uint8_t /* ====== Encryption/decryption routines ====== */ -/* This is the real call to PadLock. */ -static inline void +/* These are the real calls to PadLock. */ +static inline void * padlock_xcrypt_ecb(uint8_t *input, uint8_t *output, uint8_t *key, - void *control_word, uint32_t count) + uint8_t *iv, void *control_word, uint32_t count) { asm volatile ("pushfl; popfl"); /* enforce key reload. */ asm volatile (".byte 0xf3,0x0f,0xa7,0xc8" /* rep xcryptecb */ : "+S"(input), "+D"(output) : "d"(control_word), "b"(key), "c"(count)); + return NULL; +} + +static inline void * +padlock_xcrypt_cbc(uint8_t *input, uint8_t *output, uint8_t *key, + uint8_t *iv, void *control_word, uint32_t count) +{ + asm volatile ("pushfl; popfl"); /* enforce key reload. 
*/ + asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */ + : "=m"(*output), "+S"(input), "+D"(output), "+a"(iv) + : "d"(control_word), "b"(key), "c"(count)); + return iv; +} + +static inline void * +padlock_xcrypt_cfb(uint8_t *input, uint8_t *output, uint8_t *key, + uint8_t *iv, void *control_word, uint32_t count) +{ + asm volatile ("pushfl; popfl"); /* enforce key reload. */ + asm volatile (".byte 0xf3,0x0f,0xa7,0xe0" /* rep xcryptcfb */ + : "=m"(*output), "+S"(input), "+D"(output), "+a"(iv) + : "d"(control_word), "b"(key), "c"(count)); + return iv; +} + +static inline void * +padlock_xcrypt_ofb(uint8_t *input, uint8_t *output, uint8_t *key, + uint8_t *iv, void *control_word, uint32_t count) +{ + asm volatile ("pushfl; popfl"); /* enforce key reload. */ + asm volatile (".byte 0xf3,0x0f,0xa7,0xe8" /* rep xcryptofb */ + : "=m"(*output), "+S"(input), "+D"(output), "+a"(iv) + : "d"(control_word), "b"(key), "c"(count)); + return iv; } static void -aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg, int encdec) +aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg, + uint8_t *iv_arg, size_t nbytes, int encdec, int mode) { /* Don't blindly modify this structure - the items must fit on 16-Bytes boundaries! */ @@ -419,21 +454,126 @@ aes_padlock(void *ctx_arg, uint8_t *out_ else key = ctx->D; - memcpy(data->buf, in_arg, AES_BLOCK_SIZE); - padlock_xcrypt_ecb(data->buf, data->buf, key, &data->cword, 1); - memcpy(out_arg, data->buf, AES_BLOCK_SIZE); + if (nbytes == AES_BLOCK_SIZE) { + /* Processing one block only => ECB is enough */ + memcpy(data->buf, in_arg, AES_BLOCK_SIZE); + padlock_xcrypt_ecb(data->buf, data->buf, key, NULL, + &data->cword, 1); + memcpy(out_arg, data->buf, AES_BLOCK_SIZE); + } else { + /* Processing multiple blocks at once */ + uint8_t *in, *out, *iv; + int gfp = in_atomic() ? 
GFP_ATOMIC : GFP_KERNEL; + void *index = NULL; + + if (unlikely(((long)in_arg) & 0x0F)) { + in = crypto_aligned_kmalloc(nbytes, gfp, 16, &index); + memcpy(in, in_arg, nbytes); + } + else + in = (uint8_t*)in_arg; + + if (unlikely(((long)out_arg) & 0x0F)) { + if (index) + out = in; /* xcrypt can work "in place" */ + else + out = crypto_aligned_kmalloc(nbytes, gfp, 16, + &index); + } + else + out = out_arg; + + /* Always make a local copy of IV - xcrypt may change it! */ + iv = data->buf; + if (iv_arg) + memcpy(iv, iv_arg, AES_BLOCK_SIZE); + + switch (mode) { + case CRYPTO_TFM_MODE_ECB: + iv = padlock_xcrypt_ecb(in, out, key, iv, + &data->cword, + nbytes/AES_BLOCK_SIZE); + break; + + case CRYPTO_TFM_MODE_CBC: + iv = padlock_xcrypt_cbc(in, out, key, iv, + &data->cword, + nbytes/AES_BLOCK_SIZE); + break; + + case CRYPTO_TFM_MODE_CFB: + iv = padlock_xcrypt_cfb(in, out, key, iv, + &data->cword, + nbytes/AES_BLOCK_SIZE); + break; + + case CRYPTO_TFM_MODE_OFB: + iv = padlock_xcrypt_ofb(in, out, key, iv, + &data->cword, + nbytes/AES_BLOCK_SIZE); + break; + + default: + BUG(); + } + + /* Back up IV */ + if (iv && iv_arg) + memcpy(iv_arg, iv, AES_BLOCK_SIZE); + + /* Copy the 16-Byte aligned output to the caller's buffer. 
*/ + if (out != out_arg) + memcpy(out_arg, out, nbytes); + + if (index) + kfree(index); + } +} + +static void +aes_padlock_ecb(void *ctx, uint8_t *dst, const uint8_t *src, + uint8_t *iv, size_t nbytes, int encdec, int inplace) +{ + aes_padlock(ctx, dst, src, NULL, nbytes, encdec, + CRYPTO_TFM_MODE_ECB); +} + +static void +aes_padlock_cbc(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv, + size_t nbytes, int encdec, int inplace) +{ + aes_padlock(ctx, dst, src, iv, nbytes, encdec, + CRYPTO_TFM_MODE_CBC); +} + +static void +aes_padlock_cfb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv, + size_t nbytes, int encdec, int inplace) +{ + aes_padlock(ctx, dst, src, iv, nbytes, encdec, + CRYPTO_TFM_MODE_CFB); +} + +static void +aes_padlock_ofb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv, + size_t nbytes, int encdec, int inplace) +{ + aes_padlock(ctx, dst, src, iv, nbytes, encdec, + CRYPTO_TFM_MODE_OFB); } static void aes_encrypt(void *ctx_arg, uint8_t *out, const uint8_t *in) { - aes_padlock(ctx_arg, out, in, CRYPTO_DIR_ENCRYPT); + aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE, + CRYPTO_DIR_ENCRYPT, CRYPTO_TFM_MODE_ECB); } static void aes_decrypt(void *ctx_arg, uint8_t *out, const uint8_t *in) { - aes_padlock(ctx_arg, out, in, CRYPTO_DIR_DECRYPT); + aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE, + CRYPTO_DIR_DECRYPT, CRYPTO_TFM_MODE_ECB); } static struct crypto_alg aes_alg = { @@ -454,9 +594,25 @@ static struct crypto_alg aes_alg = { } }; +static int disable_multiblock = 0; +MODULE_PARM(disable_multiblock, "i"); +MODULE_PARM_DESC(disable_multiblock, + "Disable encryption of whole multiblock buffers."); + int __init padlock_init_aes(void) { - printk(KERN_NOTICE PFX "Using VIA PadLock ACE for AES algorithm.\n"); + if (!disable_multiblock) { + aes_alg.cra_u.cipher.cia_max_nbytes = (size_t)-1; + aes_alg.cra_u.cipher.cia_req_align = 16; + aes_alg.cra_u.cipher.cia_ecb = aes_padlock_ecb; + aes_alg.cra_u.cipher.cia_cbc = aes_padlock_cbc; + 
aes_alg.cra_u.cipher.cia_cfb = aes_padlock_cfb; + aes_alg.cra_u.cipher.cia_ofb = aes_padlock_ofb; + } + + printk(KERN_NOTICE PFX + "Using VIA PadLock ACE for AES algorithm%s.\n", + disable_multiblock ? "" : " (multiblock)"); gen_tabs(); return crypto_register_alg(&aes_alg); ^ permalink raw reply [flat|nested] 18+ messages in thread
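The patch above calls crypto_aligned_kmalloc(), whose implementation is not shown in this message (it belongs to the companion CryptoAPI patch). A userspace sketch of what such a helper presumably has to do (over-allocate, round the pointer up to the requested alignment, and remember the raw pointer for freeing) could look like this; the names are illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace sketch of a crypto_aligned_kmalloc-style helper: allocate
 * enough slack to realign, return the aligned pointer, and hand the raw
 * allocation back via *index so the caller can free it later (the patch
 * does exactly that with kfree(index)). */
static void *aligned_alloc_sketch(size_t n, size_t align, void **index)
{
	unsigned char *raw = malloc(n + align - 1);
	if (!raw)
		return NULL;
	*index = raw;	/* what the caller must eventually free */
	return (void *)(((uintptr_t)raw + align - 1) &
			~(uintptr_t)(align - 1));
}

int alignment_demo(void)
{
	void *index = NULL;
	unsigned char *buf = aligned_alloc_sketch(64, 16, &index);
	/* same check the patch applies to in_arg/out_arg: low 4 bits clear */
	int ok = buf && ((uintptr_t)buf & 0x0F) == 0 && index != NULL;
	free(index);
	return ok;
}
```

This also explains the "xcrypt can work in place" shortcut in the patch: once the input has been copied into a freshly aligned bounce buffer, that same buffer can serve as the output, saving a second allocation.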
* Re: [PATCH 2/2] PadLock processing multiple blocks at a time 2005-01-11 17:08 ` [PATCH 2/2] PadLock processing multiple blocks " Michal Ludvig @ 2005-01-14 3:05 ` Andrew Morton 2005-01-14 13:15 ` [PATCH 2/2] CryptoAPI: Update PadLock to process multiple blocks at once Michal Ludvig 1 sibling, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2005-01-14 3:05 UTC (permalink / raw)
To: Michal Ludvig; +Cc: davem, jmorris, cryptoapi, linux-kernel
Michal Ludvig <michal@logix.cz> wrote:
>
> #
> # Update to padlock-aes.c that enables processing of the whole
> # buffer of data at once with the given chaining mode (e.g. CBC).
> #
Please don't email different patches under the same Subject:. Choose a Subject: which is meaningful for each patch.
This one kills gcc-2.95.x:
drivers/crypto/padlock-aes.c: In function `aes_padlock':
drivers/crypto/padlock-aes.c:391: impossible register constraint in `asm'
drivers/crypto/padlock-aes.c:402: impossible register constraint in `asm'
drivers/crypto/padlock-aes.c:413: impossible register constraint in `asm'
drivers/crypto/padlock-aes.c:391: `asm' needs too many reloads
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/2] CryptoAPI: Update PadLock to process multiple blocks at once 2005-01-11 17:08 ` [PATCH 2/2] PadLock processing multiple blocks " Michal Ludvig 2005-01-14 3:05 ` Andrew Morton @ 2005-01-14 13:15 ` Michal Ludvig 1 sibling, 0 replies; 18+ messages in thread
From: Michal Ludvig @ 2005-01-14 13:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: David S. Miller, jmorris, cryptoapi, linux-kernel
Hi all,
Update to padlock-aes.c that enables processing of the whole buffer of data at once with the given chaining mode (e.g. CBC). It brings a much higher speed than the case where the chaining is done in software by CryptoAPI.
This is an updated revision of the patch. It now compiles even with GCC 2.95.3.
Signed-off-by: Michal Ludvig <michal@logix.cz>
---
 padlock-aes.c | 176 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 166 insertions(+), 10 deletions(-)
Index: linux-2.6.10/drivers/crypto/padlock-aes.c
===================================================================
--- linux-2.6.10.orig/drivers/crypto/padlock-aes.c 2005-01-11 14:01:05.000000000 +0100
+++ linux-2.6.10/drivers/crypto/padlock-aes.c 2005-01-11 23:40:26.000000000 +0100
@@ -369,19 +369,54 @@ aes_set_key(void *ctx_arg, const uint8_t
 /* ====== Encryption/decryption routines ====== */
-/* This is the real call to PadLock. */
-static inline void
+/* These are the real calls to PadLock. */
+static inline void *
 padlock_xcrypt_ecb(uint8_t *input, uint8_t *output, uint8_t *key,
- void *control_word, uint32_t count)
+ uint8_t *iv, void *control_word, uint32_t count)
 {
 asm volatile ("pushfl; popfl"); /* enforce key reload. */
 asm volatile (".byte 0xf3,0x0f,0xa7,0xc8" /* rep xcryptecb */
 : "+S"(input), "+D"(output)
 : "d"(control_word), "b"(key), "c"(count));
+ return NULL;
+}
+
+static inline void *
+padlock_xcrypt_cbc(uint8_t *input, uint8_t *output, uint8_t *key,
+ uint8_t *iv, void *control_word, uint32_t count)
+{
+ asm volatile ("pushfl; popfl"); /* enforce key reload.
*/ + asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */ + : "+S"(input), "+D"(output), "+a"(iv) + : "d"(control_word), "b"(key), "c"(count)); + return iv; +} + +static inline void * +padlock_xcrypt_cfb(uint8_t *input, uint8_t *output, uint8_t *key, + uint8_t *iv, void *control_word, uint32_t count) +{ + asm volatile ("pushfl; popfl"); /* enforce key reload. */ + asm volatile (".byte 0xf3,0x0f,0xa7,0xe0" /* rep xcryptcfb */ + : "+S"(input), "+D"(output), "+a"(iv) + : "d"(control_word), "b"(key), "c"(count)); + return iv; +} + +static inline void * +padlock_xcrypt_ofb(uint8_t *input, uint8_t *output, uint8_t *key, + uint8_t *iv, void *control_word, uint32_t count) +{ + asm volatile ("pushfl; popfl"); /* enforce key reload. */ + asm volatile (".byte 0xf3,0x0f,0xa7,0xe8" /* rep xcryptofb */ + : "+S"(input), "+D"(output), "+a"(iv) + : "d"(control_word), "b"(key), "c"(count)); + return iv; } static void -aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg, int encdec) +aes_padlock(void *ctx_arg, uint8_t *out_arg, const uint8_t *in_arg, + uint8_t *iv_arg, size_t nbytes, int encdec, int mode) { /* Don't blindly modify this structure - the items must fit on 16-Bytes boundaries! */ @@ -419,21 +454,126 @@ aes_padlock(void *ctx_arg, uint8_t *out_ else key = ctx->D; - memcpy(data->buf, in_arg, AES_BLOCK_SIZE); - padlock_xcrypt_ecb(data->buf, data->buf, key, &data->cword, 1); - memcpy(out_arg, data->buf, AES_BLOCK_SIZE); + if (nbytes == AES_BLOCK_SIZE) { + /* Processing one block only => ECB is enough */ + memcpy(data->buf, in_arg, AES_BLOCK_SIZE); + padlock_xcrypt_ecb(data->buf, data->buf, key, NULL, + &data->cword, 1); + memcpy(out_arg, data->buf, AES_BLOCK_SIZE); + } else { + /* Processing multiple blocks at once */ + uint8_t *in, *out, *iv; + int gfp = in_atomic() ? 
GFP_ATOMIC : GFP_KERNEL; + void *index = NULL; + + if (unlikely(((long)in_arg) & 0x0F)) { + in = crypto_aligned_kmalloc(nbytes, gfp, 16, &index); + memcpy(in, in_arg, nbytes); + } + else + in = (uint8_t*)in_arg; + + if (unlikely(((long)out_arg) & 0x0F)) { + if (index) + out = in; /* xcrypt can work "in place" */ + else + out = crypto_aligned_kmalloc(nbytes, gfp, 16, + &index); + } + else + out = out_arg; + + /* Always make a local copy of IV - xcrypt may change it! */ + iv = data->buf; + if (iv_arg) + memcpy(iv, iv_arg, AES_BLOCK_SIZE); + + switch (mode) { + case CRYPTO_TFM_MODE_ECB: + iv = padlock_xcrypt_ecb(in, out, key, iv, + &data->cword, + nbytes/AES_BLOCK_SIZE); + break; + + case CRYPTO_TFM_MODE_CBC: + iv = padlock_xcrypt_cbc(in, out, key, iv, + &data->cword, + nbytes/AES_BLOCK_SIZE); + break; + + case CRYPTO_TFM_MODE_CFB: + iv = padlock_xcrypt_cfb(in, out, key, iv, + &data->cword, + nbytes/AES_BLOCK_SIZE); + break; + + case CRYPTO_TFM_MODE_OFB: + iv = padlock_xcrypt_ofb(in, out, key, iv, + &data->cword, + nbytes/AES_BLOCK_SIZE); + break; + + default: + BUG(); + } + + /* Back up IV */ + if (iv && iv_arg) + memcpy(iv_arg, iv, AES_BLOCK_SIZE); + + /* Copy the 16-Byte aligned output to the caller's buffer. 
*/ + if (out != out_arg) + memcpy(out_arg, out, nbytes); + + if (index) + kfree(index); + } +} + +static void +aes_padlock_ecb(void *ctx, uint8_t *dst, const uint8_t *src, + uint8_t *iv, size_t nbytes, int encdec, int inplace) +{ + aes_padlock(ctx, dst, src, NULL, nbytes, encdec, + CRYPTO_TFM_MODE_ECB); +} + +static void +aes_padlock_cbc(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv, + size_t nbytes, int encdec, int inplace) +{ + aes_padlock(ctx, dst, src, iv, nbytes, encdec, + CRYPTO_TFM_MODE_CBC); +} + +static void +aes_padlock_cfb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv, + size_t nbytes, int encdec, int inplace) +{ + aes_padlock(ctx, dst, src, iv, nbytes, encdec, + CRYPTO_TFM_MODE_CFB); +} + +static void +aes_padlock_ofb(void *ctx, uint8_t *dst, const uint8_t *src, uint8_t *iv, + size_t nbytes, int encdec, int inplace) +{ + aes_padlock(ctx, dst, src, iv, nbytes, encdec, + CRYPTO_TFM_MODE_OFB); } static void aes_encrypt(void *ctx_arg, uint8_t *out, const uint8_t *in) { - aes_padlock(ctx_arg, out, in, CRYPTO_DIR_ENCRYPT); + aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE, + CRYPTO_DIR_ENCRYPT, CRYPTO_TFM_MODE_ECB); } static void aes_decrypt(void *ctx_arg, uint8_t *out, const uint8_t *in) { - aes_padlock(ctx_arg, out, in, CRYPTO_DIR_DECRYPT); + aes_padlock(ctx_arg, out, in, NULL, AES_BLOCK_SIZE, + CRYPTO_DIR_DECRYPT, CRYPTO_TFM_MODE_ECB); } static struct crypto_alg aes_alg = { @@ -454,9 +594,25 @@ static struct crypto_alg aes_alg = { } }; +static int disable_multiblock = 0; +MODULE_PARM(disable_multiblock, "i"); +MODULE_PARM_DESC(disable_multiblock, + "Disable encryption of whole multiblock buffers."); + int __init padlock_init_aes(void) { - printk(KERN_NOTICE PFX "Using VIA PadLock ACE for AES algorithm.\n"); + if (!disable_multiblock) { + aes_alg.cra_u.cipher.cia_max_nbytes = (size_t)-1; + aes_alg.cra_u.cipher.cia_req_align = 16; + aes_alg.cra_u.cipher.cia_ecb = aes_padlock_ecb; + aes_alg.cra_u.cipher.cia_cbc = aes_padlock_cbc; + 
aes_alg.cra_u.cipher.cia_cfb = aes_padlock_cfb; + aes_alg.cra_u.cipher.cia_ofb = aes_padlock_ofb; + } + + printk(KERN_NOTICE PFX + "Using VIA PadLock ACE for AES algorithm%s.\n", + disable_multiblock ? "" : " (multiblock)"); gen_tabs(); return crypto_register_alg(&aes_alg); ^ permalink raw reply [flat|nested] 18+ messages in thread
* Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
@ 2005-01-14 22:31 Evgeniy Polyakov
2005-01-14 22:31 ` Evgeniy Polyakov
` (4 more replies)
0 siblings, 5 replies; 18+ messages in thread
From: Evgeniy Polyakov @ 2005-01-14 22:31 UTC (permalink / raw)
To: linux-kernel
Cc: Michal Ludvig, Fruhwirth Clemens, Andrew Morton, James Morris,
cryptoapi, David S. Miller, Evgeniy Polyakov
The message was too big for the mailing lists, sorry.
I have split it into several messages.
Begin forwarded message:
Date: Fri, 14 Jan 2005 23:43:56 +0300
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: Michal Ludvig <michal@logix.cz>
Cc: Fruhwirth Clemens <clemens@endorphin.org>, Andrew Morton <akpm@osdl.org>, James Morris <jmorris@redhat.com>, cryptoapi@lists.logix.cz, "David S. Miller" <davem@davemloft.net>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
On Fri, 14 Jan 2005 17:40:39 +0100 (CET)
Michal Ludvig <michal@logix.cz> wrote:
> On Fri, 14 Jan 2005, Fruhwirth Clemens wrote:
>
> > On Fri, 2005-01-14 at 14:10 +0100, Michal Ludvig wrote:
> >
> > > This patch extends crypto/cipher.c for offloading the whole chaining modes
> > > to e.g. hardware crypto accelerators. It is much faster to let the
> > > hardware do all the chaining if it can do so.
> >
> > Is there any connection to Evgeniy Polyakov's acrypto work? It appears
> > that there are two projects with one objective. It would be nice to see
> > both parties pulling in the same direction.
>
> These projects do not compete at all. Evgeniy's work is a complete
> replacement for the current cryptoapi and, above all, brings
> asynchronous operations. My patches are simple and straightforward
> extensions to the current cryptoapi that enable offloading the chaining
> to hardware where possible.
Actually, acrypto inherently allows the use of such hardware acceleration.
I would not call it a feature but a logical consequence of the design.
When the hardware has access to the queue of requests, it can do anything
it needs to properly complete its sessions,
for example use hardware-accelerated block chaining encryption...
It is probably time to show the work.
Attached files:
bd archive - a simple in-memory block device used for testing. I am currently working
on a modular loop-device replacement based on bd, which could allow the
network block device to be removed (btw, it is broken at least in 2.6.9)
and also allow the acrypto module to be used with various tweakable ciphers.
I hope that this system will provide more flexible control over the dataflow
than the loop device currently does.
I recommend the following interesting reading about tweakable ciphers:
http://clemens.endorphin.org/cryptography
acrypto archive - the asynchronous crypto layer, the latest (third) reincarnation (announcement below).
It also contains asynchronous and synchronous test crypto providers and a test
crypto consumer module.
hifn archive - a driver for the HIFN 7955/7956 (the 7956 did not run on Clemens' setup;
hopefully the patches sent to him fixed that).
This is work in progress and currently works only under low load
(about one session per 10 msec).
via-padlock - a patch to enable the xcrypt instructions on various VIA CPUs (for example the Nehemiah family).
It is entirely Michal's work; I've just ported it to acrypto.
Not tested.
fcrypt - a driver for the CE-InfoSys FastCrypt PCI card equipped with a SuperCrypt CE99C003B chip that
can offload DES and 3DES encryption from the CPU.
It is entirely Michal's work too; I've just ported it to acrypto.
Not tested.
fcrypt and via-padlock can be found on Michal Ludvig's page: http://www.logix.cz
Btw, I've made several changes in acrypto for proper multi-scatterlist processing,
so the above drivers will not compile cleanly, but I suspect no one will apply them today,
so they are currently for examination;
I will fix them all after I finish relaxing after vacation.
I've added disk-write emulation into the async and sync crypto providers, and here is what we see:
the actual disk write speed is about 46 KB/msec, and the encryption speed is about 68 KB/msec.
Encrypting one byte takes ~0.014 usec, so writing it to disk will take ~0.014/46*68 = 0.021 usec.
With such a delay I got the following numbers on a 4-way system:
scaled to 4 processors, async_provider: 800 MB in 12.6376 sec
scaled to 1 processor, async_provider: 800 MB in 12.1828 sec
sync_provider: 800 MB in 13.5662 sec
Actually, the first two tests with async_provider show
the same values on average when run several times.
Thus async is about 10% faster even on one CPU (strange).
I have posted acrypto with userspace support (both direct vma/page
access and an ioctl-based one) to netdev@oss.sgi.com several times, but probably
due to the message size (the acrypto patch alone is about 120K) it never appeared there.
Please test and comment.
Here is the announcement:
Acrypto - asynchronous crypto layer for linux kernel 2.6
I'm pleased to announce an asynchronous crypto layer for Linux kernel 2.6.
It supports the following features:
- multiple asynchronous crypto device queues
- crypto session routing
- crypto session binding
- modular load balancing
- crypto session batching, inherent in the design
- crypto session priority
- different kinds of crypto operations (RNG, asymmetric crypto, HMAC and any others)
Some design notes:
acrypto has one main crypto session queue (a doubly linked list; probably it should
be structured like crypto_route or the sk_buff queue), into which each newly allocated
session is inserted, and this is the place where the load balancer looks for its food.
When a new session is being prepared for insertion, it calls the load balancer's
->find_device() method, which should return a suitable device if one exists (the current
simple_lb load balancer returns the device with the lowest load, i.e. the device with the
fewest sessions in its queue).
After a crypto_device is returned, acrypto creates a new crypto routing entry which points
to the returned device and adds it to the crypto session routing queue. The crypto session
is inserted into the device's queue according to its priority, and it is the crypto device
driver that should process its session list according to session priority.
All insertions and deletions are guarded by appropriate locks, but session_list traversal
is not guarded in crypto_lb_thread(), since by design a session can be removed _only_ from
that function; so as long as the crypto device (atomically) marks a session as completed and
not being processed, and uses list_for_each_safe() to traverse its queue, all should be OK.
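As a minimal sketch of the simple_lb policy just described (walk the device list and pick the device with the fewest queued sessions), in plain C; the structure and function names here are illustrative stand-ins, not the real acrypto API:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a registered crypto device. */
struct lb_device {
	const char *name;
	int queue_len;			/* sessions currently in this device's queue */
	struct lb_device *next;
};

/* ->find_device() as described: return the least-loaded device,
 * or NULL when no device is registered. */
static struct lb_device *simple_lb_find_device(struct lb_device *head)
{
	struct lb_device *best = head;
	struct lb_device *d;

	for (d = head; d != NULL; d = d->next)
		if (d->queue_len < best->queue_len)
			best = d;
	return best;
}
```

The session would then be queued on the returned device according to its priority, as described above.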
Each crypto load balancer must implement two methods,
->rehash() and ->find_device(), which will be called from any context and under a spinlock.
The ->rehash() method should be called to redistribute crypto sessions among device queues;
for example, if a driver decides that its device is broken, it marks itself as broken,
and the load balancer (or scheduler, if you like) should move all sessions from that
queue to other devices. If a session cannot be completed, the scheduler must mark
it as broken and complete it (by calling broke_session() first, then complete_session()
and stop_process_session()). The consumer must check whether the operation was successful
(and therefore that the session is not broken).
The ->find_device() method should return an appropriate crypto device.
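The ->rehash() responsibility above can be sketched as draining a broken device's queue onto a healthy one; the queues are modeled here as plain arrays of session IDs, and all names are illustrative only:

```c
#include <assert.h>

/* Move every session (represented here just as an int ID) from the
 * broken device's queue into the healthy device's queue; returns how
 * many sessions were moved. */
static int rehash_sketch(int *broken, int *broken_len,
			 int *healthy, int *healthy_len)
{
	int moved = 0;

	while (*broken_len > 0) {
		healthy[(*healthy_len)++] = broken[--(*broken_len)];
		moved++;
	}
	return moved;
}
```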
For a crypto session to be successfully allocated, the crypto consumer must provide two
structures: struct crypto_session_initializer (hmm, why only one z?) and struct crypto_data.
struct crypto_session_initializer contains the data needed to find an appropriate device:
the type of operation, the mode of operation, some flags (for example SESSION_BINDED, which
means that the session must be bound to the crypto device specified in the bdev field; this
is useful for TCPA/TPM), the session priority, and a callback which will be called after all
routing for the given session is finished.
struct crypto_data contains scatterlists for src, dst, key and iv. It also has a void *priv
field and its size, which is allocated and may be used by any crypto agent (for example, the
VIA PadLock driver uses it to store its aes_ctx; crypto_session can use this field to store
pointers needed in ->callback()).
The callback is actually invoked from queue_work, but I suppose it is better not to assume
a calling context.
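As a rough illustration of the two consumer-provided structures described above (the field names follow the prose, but the real acrypto layouts may differ):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative scatterlist stand-in. */
struct sg_sketch {
	void *page;
	unsigned int offset;
	unsigned int length;
};

/* What to do, on which device, and whom to tell when done. */
struct crypto_session_initializer_sketch {
	int operation;			/* encrypt/decrypt/RNG/... */
	int type;			/* e.g. AES-128 */
	int mode;			/* e.g. CBC */
	unsigned int flags;		/* e.g. SESSION_BINDED */
	int priority;
	void (*callback)(void *priv);	/* called after all routes finish */
};

/* The data the session operates on. */
struct crypto_data_sketch {
	struct sg_sketch sg_src, sg_dst, sg_key, sg_iv;
	void *priv;			/* scratch, e.g. a provider's aes_ctx */
	size_t priv_size;
};
```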
->callback() will be called after all crypto routing for the given session is done, with
the same parameters as were provided at initialization time (if the session has only one
route, the callback is called with the original parameters; but if it has several routes,
the callback is called with the parameters from the last processed one). I believe the
crypto callback should not know about crypto sessions, routings, devices and so on; proper
restriction is always a good idea.
Crypto routing.
This feature allows the same session to be processed by several devices/algorithms.
For example, if you need to encrypt data and then sign it in a TPM device, you can create
one route to the encryption device and then route it to the TPM device. (Note: this feature
must be discussed, since there is no time slice after session allocation, only in the
crypto_device->data_ready() method, and there are locking issues in the ->callback() method.)
Crypto device.
It can be either a software emulator or a hardware accelerator chip (like the HIFN 79*/83*
or VIA PadLock ACE/RNG, or even a TPM device like the one in each IBM ThinkPad and some HP
laptops (gentle hint: _they_ even have _Windows_ software for them; HP, gimme specs :) )).
It can be registered with the asynchronous crypto layer and must provide some data for it:
the ->data_ready() method, which is called each time a new session is added to the device's
queue, and an array of struct crypto_capability together with its size. struct
crypto_capability describes each operation the given device can handle, and has a maximum
session queue length parameter.
Note: this structure can [be extended to] include a "rate" parameter to show the absolute
speed of a given operation in some units, which could then be used by the scheduler (load
balancer) for proper device selection. Actually, the queue length can somehow reflect the
device's "speed".
Acrypto has full userspace support, through ioctl and through direct access to a process'
vmas and pages.
The ioctl path requires two copies of the data, from and back to userspace.
Session processing consists of 3 major parts:
1. Session creation: the CRYPTO_SESSION_ALLOC ioctl.
The user must provide a special structure which holds the src, dst, key and iv data sizes
and the crypto initializer (crypto operation, mode, type and priority).
2. Data filling: the user must issue several CRYPTO_FILL_DATA ioctls.
Each one takes a data size and data type (struct crypto_user_data) plus the data itself.
3. Finish: the user must call the CRYPTO_SESSION_ADD ioctl with a pointer to the area where
the result must be stored.
The latter ioctl sleeps while the session is being processed.
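The three-step flow above can be sketched as follows, with an injectable ioctl-like callback so the sequence can be exercised without the real device node; the request codes, the helper, and the stub driver are all assumptions for illustration, not the real acrypto userspace ABI:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the ioctl request codes named in the text. */
enum {
	SK_CRYPTO_SESSION_ALLOC = 1,
	SK_CRYPTO_FILL_DATA,
	SK_CRYPTO_SESSION_ADD,
};

typedef int (*ioctl_fn)(int fd, int req, void *arg);

/* 1. allocate the session, 2. fill each data buffer, 3. add the session
 * (this last call blocks until the session completes); stop at the first
 * error. */
static int run_session_sketch(int fd, ioctl_fn do_ioctl,
			      void **bufs, int nbufs, void *result)
{
	int i, err;

	err = do_ioctl(fd, SK_CRYPTO_SESSION_ALLOC, NULL);
	if (err)
		return err;
	for (i = 0; i < nbufs; i++) {
		err = do_ioctl(fd, SK_CRYPTO_FILL_DATA, bufs[i]);
		if (err)
			return err;
	}
	return do_ioctl(fd, SK_CRYPTO_SESSION_ADD, result);
}

/* Trivial stub standing in for the driver, for the usage example below. */
static int sk_stub_calls;
static int sk_stub_ioctl(int fd, int req, void *arg)
{
	(void)fd; (void)req; (void)arg;
	sk_stub_calls++;
	return 0;
}
```

With two data buffers this issues four ioctls in total: one ALLOC, two FILL_DATA, one ADD.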
The second userspace communication mechanism is based on direct access to the process'
vmas and pages from acrypto; pointers are transferred using the special kernel connector
structure. Obviously it cannot be used with most hardware or with sizes larger than one
page, but I like the idea itself.
Some discussion can be found at http://marc.theaimsgroup.com/?l=linux-netdev&m=109903101312733&w=2
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov @ 2005-01-14 22:31 ` Evgeniy Polyakov 2005-01-14 22:32 ` Evgeniy Polyakov ` (3 subsequent siblings) 4 siblings, 0 replies; 18+ messages in thread From: Evgeniy Polyakov @ 2005-01-14 22:31 UTC (permalink / raw) To: johnpol Cc: linux-kernel, Michal Ludvig, Fruhwirth Clemens, Andrew Morton, James Morris, cryptoapi, David S. Miller [-- Attachment #1: Type: text/plain, Size: 671 bytes --] On Sat, 15 Jan 2005 01:31:03 +0300 Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote: bd archive - simple in-mamory block device used for test. I currently work on creating modular loop device replacement based on bd, which could allow network block device to be removed(btw, it is broken at least in 2.6.9) and also allow acrypto module to be used with various tweakable ciphers. I hope that system will provide more flexible control over dataflow than loop device currently does. I recomend following interesting reading about tweaking ciphers: http://clemens.endorphin.org/cryptography Evgeniy Polyakov Only failure makes us experts. -- Theo de Raadt [-- Attachment #2: bd-14_01_2005.tar.gz --] [-- Type: application/octet-stream, Size: 3542 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov 2005-01-14 22:31 ` Evgeniy Polyakov @ 2005-01-14 22:32 ` Evgeniy Polyakov 2005-01-14 22:33 ` Evgeniy Polyakov ` (2 subsequent siblings) 4 siblings, 0 replies; 18+ messages in thread From: Evgeniy Polyakov @ 2005-01-14 22:32 UTC (permalink / raw) To: johnpol Cc: linux-kernel, Michal Ludvig, Fruhwirth Clemens, Andrew Morton, James Morris, cryptoapi, David S. Miller [-- Attachment #1: Type: text/plain, Size: 263 bytes --] acrypto archive - asynchronous crypto layer, the latest(third) reincarnation(announce below). It also has asynchronous and synchronous test crypto providers and test crypto consumer module. Evgeniy Polyakov Only failure makes us experts. -- Theo de Raadt [-- Attachment #2: acrypto-14_01_2005.tar.gz --] [-- Type: application/octet-stream, Size: 38272 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov 2005-01-14 22:31 ` Evgeniy Polyakov 2005-01-14 22:32 ` Evgeniy Polyakov @ 2005-01-14 22:33 ` Evgeniy Polyakov 2005-01-14 22:34 ` Evgeniy Polyakov 2005-01-14 22:41 ` Evgeniy Polyakov 4 siblings, 0 replies; 18+ messages in thread From: Evgeniy Polyakov @ 2005-01-14 22:33 UTC (permalink / raw) To: johnpol Cc: linux-kernel, Michal Ludvig, Fruhwirth Clemens, Andrew Morton, James Morris, cryptoapi, David S. Miller [-- Attachment #1: Type: text/plain, Size: 292 bytes --] hifn archive - driver for HIFN 7955/7956 (7956 was not run on Clemens' setup, hopefully patches sent to him fixed that). This is work in progress and currently works only on low load (about one session per 10 msec). Evgeniy Polyakov Only failure makes us experts. -- Theo de Raadt [-- Attachment #2: hifn-14_01_2005.tar.gz --] [-- Type: application/octet-stream, Size: 29808 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time 2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov ` (2 preceding siblings ...) 2005-01-14 22:33 ` Evgeniy Polyakov @ 2005-01-14 22:34 ` Evgeniy Polyakov 2005-01-14 22:41 ` Evgeniy Polyakov 4 siblings, 0 replies; 18+ messages in thread From: Evgeniy Polyakov @ 2005-01-14 22:34 UTC (permalink / raw) To: johnpol Cc: linux-kernel, Michal Ludvig, Fruhwirth Clemens, Andrew Morton, James Morris, cryptoapi, David S. Miller [-- Attachment #1: Type: text/plain, Size: 481 bytes --] via-padlock - patch to enable xcrypt instructions on various VIA CPUs (for example Nehemiah family). It is totally Michal's work, I've just ported it to acrypto. Not tested. fcrypt - driver for CE-InfoSys FastCrypt PCI card equipped with a SuperCrypt CE99C003B chip that can offload DES and 3DES encryption from the CPU. It is totally Michal's work too, I've just ported it to acrypto. Not tested. Evgeniy Polyakov Only failure makes us experts. -- Theo de Raadt [-- Attachment #2: fcrypt-04_01_2005.tar.gz --] [-- Type: application/octet-stream, Size: 8502 bytes --] [-- Attachment #3: via-padlock.patch-04_01_2005 --] [-- Type: application/octet-stream, Size: 24092 bytes --] diff -Nru /tmp/empty/Makefile via-padlock/Makefile --- /tmp/empty/Makefile 1970-01-01 03:00:00.000000000 +0300 +++ via-padlock/Makefile 2004-10-26 07:20:11.000000000 +0400 @@ -0,0 +1,6 @@ +obj-m += padlock.o +padlock-objs := padlock-aes.o padlock-generic.o + +clean: + rm -f *.o *.ko *.mod.* .*.cmd *~ + rm -rf .tmp_versions diff -Nru /tmp/empty/padlock-aes.c via-padlock/padlock-aes.c --- /tmp/empty/padlock-aes.c 1970-01-01 03:00:00.000000000 +0300 +++ via-padlock/padlock-aes.c 2004-12-20 12:49:12.225384528 +0300 @@ -0,0 +1,553 @@ +/* + * Cryptographic API. + * + * Support for VIA PadLock hardware crypto engine. 
+ * + * Linux developers: + * Michal Ludvig <mludvig@suse.cz> + * + * Key expansion routine taken from crypto/aes.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * --------------------------------------------------------------------------- + * Copyright (c) 2002, Dr Brian Gladman <brg@gladman.me.uk>, Worcester, UK. + * All rights reserved. + * + * LICENSE TERMS + * + * The free distribution and use of this software in both source and binary + * form is allowed (with or without changes) provided that: + * + * 1. distributions of this source code include the above copyright + * notice, this list of conditions and the following disclaimer; + * + * 2. distributions in binary form include the above copyright + * notice, this list of conditions and the following disclaimer + * in the documentation and/or other associated materials; + * + * 3. the copyright holder's name is not used to endorse products + * built using this software without specific written permission. + * + * ALTERNATIVELY, provided that this notice is retained in full, this product + * may be distributed under the terms of the GNU General Public License (GPL), + * in which case the provisions of the GPL apply INSTEAD OF those given above. + * + * DISCLAIMER + * + * This software is provided 'as is' with no explicit or implied warranties + * in respect of its properties, including, but not limited to, correctness + * and/or fitness for purpose. 
+ * --------------------------------------------------------------------------- + */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/types.h> +#include <linux/errno.h> +#include <linux/crypto.h> +#include <asm/byteorder.h> +#include <linux/mm.h> + +#include <asm/scatterlist.h> + +#include "padlock.h" + +#include "../crypto_def.h" +#include "../acrypto.h" +#include "../crypto_stat.h" + +static inline int aes_hw_extkey_available (u8 key_len); + +static inline +u32 generic_rotr32 (const u32 x, const unsigned bits) +{ + const unsigned n = bits % 32; + return (x >> n) | (x << (32 - n)); +} + +static inline +u32 generic_rotl32 (const u32 x, const unsigned bits) +{ + const unsigned n = bits % 32; + return (x << n) | (x >> (32 - n)); +} + +#define rotl generic_rotl32 +#define rotr generic_rotr32 + +/* + * #define byte(x, nr) ((unsigned char)((x) >> (nr*8))) + */ +inline static u8 +byte(const u32 x, const unsigned n) +{ + return x >> (n << 3); +} + +#define u32_in(x) le32_to_cpu(*(const u32 *)(x)) +#define u32_out(to, from) (*(u32 *)(to) = cpu_to_le32(from)) + +static u8 pow_tab[256]; +static u8 log_tab[256]; +static u8 sbx_tab[256]; +static u8 isb_tab[256]; +static u32 rco_tab[10]; +static u32 ft_tab[4][256]; +static u32 it_tab[4][256]; + +static u32 fl_tab[4][256]; +static u32 il_tab[4][256]; + +static inline u8 +f_mult (u8 a, u8 b) +{ + u8 aa = log_tab[a], cc = aa + log_tab[b]; + + return pow_tab[cc + (cc < aa ? 1 : 0)]; +} + +#define ff_mult(a,b) (a && b ? 
f_mult(a, b) : 0) + +#define f_rn(bo, bi, n, k) \ + bo[n] = ft_tab[0][byte(bi[n],0)] ^ \ + ft_tab[1][byte(bi[(n + 1) & 3],1)] ^ \ + ft_tab[2][byte(bi[(n + 2) & 3],2)] ^ \ + ft_tab[3][byte(bi[(n + 3) & 3],3)] ^ *(k + n) + +#define i_rn(bo, bi, n, k) \ + bo[n] = it_tab[0][byte(bi[n],0)] ^ \ + it_tab[1][byte(bi[(n + 3) & 3],1)] ^ \ + it_tab[2][byte(bi[(n + 2) & 3],2)] ^ \ + it_tab[3][byte(bi[(n + 1) & 3],3)] ^ *(k + n) + +#define ls_box(x) \ + ( fl_tab[0][byte(x, 0)] ^ \ + fl_tab[1][byte(x, 1)] ^ \ + fl_tab[2][byte(x, 2)] ^ \ + fl_tab[3][byte(x, 3)] ) + +#define f_rl(bo, bi, n, k) \ + bo[n] = fl_tab[0][byte(bi[n],0)] ^ \ + fl_tab[1][byte(bi[(n + 1) & 3],1)] ^ \ + fl_tab[2][byte(bi[(n + 2) & 3],2)] ^ \ + fl_tab[3][byte(bi[(n + 3) & 3],3)] ^ *(k + n) + +#define i_rl(bo, bi, n, k) \ + bo[n] = il_tab[0][byte(bi[n],0)] ^ \ + il_tab[1][byte(bi[(n + 3) & 3],1)] ^ \ + il_tab[2][byte(bi[(n + 2) & 3],2)] ^ \ + il_tab[3][byte(bi[(n + 1) & 3],3)] ^ *(k + n) + +static void +gen_tabs (void) +{ + u32 i, t; + u8 p, q; + + /* log and power tables for GF(2**8) finite field with + 0x011b as modular polynomial - the simplest prmitive + root is 0x03, used here to generate the tables */ + + for (i = 0, p = 1; i < 256; ++i) { + pow_tab[i] = (u8) p; + log_tab[p] = (u8) i; + + p ^= (p << 1) ^ (p & 0x80 ? 0x01b : 0); + } + + log_tab[1] = 0; + + for (i = 0, p = 1; i < 10; ++i) { + rco_tab[i] = p; + + p = (p << 1) ^ (p & 0x80 ? 0x01b : 0); + } + + for (i = 0; i < 256; ++i) { + p = (i ? 
pow_tab[255 - log_tab[i]] : 0); + q = ((p >> 7) | (p << 1)) ^ ((p >> 6) | (p << 2)); + p ^= 0x63 ^ q ^ ((q >> 6) | (q << 2)); + sbx_tab[i] = p; + isb_tab[p] = (u8) i; + } + + for (i = 0; i < 256; ++i) { + p = sbx_tab[i]; + + t = p; + fl_tab[0][i] = t; + fl_tab[1][i] = rotl (t, 8); + fl_tab[2][i] = rotl (t, 16); + fl_tab[3][i] = rotl (t, 24); + + t = ((u32) ff_mult (2, p)) | + ((u32) p << 8) | + ((u32) p << 16) | ((u32) ff_mult (3, p) << 24); + + ft_tab[0][i] = t; + ft_tab[1][i] = rotl (t, 8); + ft_tab[2][i] = rotl (t, 16); + ft_tab[3][i] = rotl (t, 24); + + p = isb_tab[i]; + + t = p; + il_tab[0][i] = t; + il_tab[1][i] = rotl (t, 8); + il_tab[2][i] = rotl (t, 16); + il_tab[3][i] = rotl (t, 24); + + t = ((u32) ff_mult (14, p)) | + ((u32) ff_mult (9, p) << 8) | + ((u32) ff_mult (13, p) << 16) | + ((u32) ff_mult (11, p) << 24); + + it_tab[0][i] = t; + it_tab[1][i] = rotl (t, 8); + it_tab[2][i] = rotl (t, 16); + it_tab[3][i] = rotl (t, 24); + } +} + +#define star_x(x) (((x) & 0x7f7f7f7f) << 1) ^ ((((x) & 0x80808080) >> 7) * 0x1b) + +#define imix_col(y,x) \ + u = star_x(x); \ + v = star_x(u); \ + w = star_x(v); \ + t = w ^ (x); \ + (y) = u ^ v ^ w; \ + (y) ^= rotr(u ^ t, 8) ^ \ + rotr(v ^ t, 16) ^ \ + rotr(t,24) + +/* initialise the key schedule from the user supplied key */ + +#define loop4(i) \ +{ t = rotr(t, 8); t = ls_box(t) ^ rco_tab[i]; \ + t ^= E_KEY[4 * i]; E_KEY[4 * i + 4] = t; \ + t ^= E_KEY[4 * i + 1]; E_KEY[4 * i + 5] = t; \ + t ^= E_KEY[4 * i + 2]; E_KEY[4 * i + 6] = t; \ + t ^= E_KEY[4 * i + 3]; E_KEY[4 * i + 7] = t; \ +} + +#define loop6(i) \ +{ t = rotr(t, 8); t = ls_box(t) ^ rco_tab[i]; \ + t ^= E_KEY[6 * i]; E_KEY[6 * i + 6] = t; \ + t ^= E_KEY[6 * i + 1]; E_KEY[6 * i + 7] = t; \ + t ^= E_KEY[6 * i + 2]; E_KEY[6 * i + 8] = t; \ + t ^= E_KEY[6 * i + 3]; E_KEY[6 * i + 9] = t; \ + t ^= E_KEY[6 * i + 4]; E_KEY[6 * i + 10] = t; \ + t ^= E_KEY[6 * i + 5]; E_KEY[6 * i + 11] = t; \ +} + +#define loop8(i) \ +{ t = rotr(t, 8); ; t = ls_box(t) ^ rco_tab[i]; \ + t 
^= E_KEY[8 * i]; E_KEY[8 * i + 8] = t; \ + t ^= E_KEY[8 * i + 1]; E_KEY[8 * i + 9] = t; \ + t ^= E_KEY[8 * i + 2]; E_KEY[8 * i + 10] = t; \ + t ^= E_KEY[8 * i + 3]; E_KEY[8 * i + 11] = t; \ + t = E_KEY[8 * i + 4] ^ ls_box(t); \ + E_KEY[8 * i + 12] = t; \ + t ^= E_KEY[8 * i + 5]; E_KEY[8 * i + 13] = t; \ + t ^= E_KEY[8 * i + 6]; E_KEY[8 * i + 14] = t; \ + t ^= E_KEY[8 * i + 7]; E_KEY[8 * i + 15] = t; \ +} + +static int +aes_set_key(void *ctx_arg, const u8 *in_key, unsigned int key_len) +{ + struct aes_ctx *ctx = ctx_arg; + u32 i, t, u, v, w; + u32 P[AES_EXTENDED_KEY_SIZE]; + u32 rounds; + + if (key_len != 16 && key_len != 24 && key_len != 32) { + return -EINVAL; + } + + ctx->key_length = key_len; + + ctx->E = ctx->e_data; + ctx->D = ctx->d_data; + + /* Ensure 16-Bytes alignmentation of keys for VIA PadLock. */ + if ((int)(ctx->e_data) & 0x0F) + ctx->E += 4 - (((int)(ctx->e_data) & 0x0F) / sizeof (ctx->e_data[0])); + + if ((int)(ctx->d_data) & 0x0F) + ctx->D += 4 - (((int)(ctx->d_data) & 0x0F) / sizeof (ctx->d_data[0])); + + E_KEY[0] = u32_in (in_key); + E_KEY[1] = u32_in (in_key + 4); + E_KEY[2] = u32_in (in_key + 8); + E_KEY[3] = u32_in (in_key + 12); + + /* Don't generate extended keys if the hardware can do it. */ + if (aes_hw_extkey_available(key_len)) + return 0; + + switch (key_len) { + case 16: + t = E_KEY[3]; + for (i = 0; i < 10; ++i) + loop4 (i); + break; + + case 24: + E_KEY[4] = u32_in (in_key + 16); + t = E_KEY[5] = u32_in (in_key + 20); + for (i = 0; i < 8; ++i) + loop6 (i); + break; + + case 32: + E_KEY[4] = u32_in (in_key + 16); + E_KEY[5] = u32_in (in_key + 20); + E_KEY[6] = u32_in (in_key + 24); + t = E_KEY[7] = u32_in (in_key + 28); + for (i = 0; i < 7; ++i) + loop8 (i); + break; + } + + D_KEY[0] = E_KEY[0]; + D_KEY[1] = E_KEY[1]; + D_KEY[2] = E_KEY[2]; + D_KEY[3] = E_KEY[3]; + + for (i = 4; i < key_len + 24; ++i) { + imix_col (D_KEY[i], E_KEY[i]); + } + + /* PadLock needs a different format of the decryption key. 
*/ + rounds = 10 + (key_len - 16) / 4; + + for (i = 0; i < rounds; i++) { + P[((i + 1) * 4) + 0] = D_KEY[((rounds - i - 1) * 4) + 0]; + P[((i + 1) * 4) + 1] = D_KEY[((rounds - i - 1) * 4) + 1]; + P[((i + 1) * 4) + 2] = D_KEY[((rounds - i - 1) * 4) + 2]; + P[((i + 1) * 4) + 3] = D_KEY[((rounds - i - 1) * 4) + 3]; + } + + P[0] = E_KEY[(rounds * 4) + 0]; + P[1] = E_KEY[(rounds * 4) + 1]; + P[2] = E_KEY[(rounds * 4) + 2]; + P[3] = E_KEY[(rounds * 4) + 3]; + + memcpy(D_KEY, P, AES_EXTENDED_KEY_SIZE_B); + + return 0; +} + +/* Tells whether the ACE is capable to generate + the extended key for a given key_len. */ +static inline int aes_hw_extkey_available(u8 key_len) +{ + /* TODO: We should check the actual CPU model/stepping + as it's likely that the capability will be + added in the next CPU revisions. */ + if (key_len == 16) + return 1; + return 0; +} + +static void aes_padlock(void *ctx_arg, u8 *out_arg, const u8 *in_arg, + const u8 *iv_arg, size_t nbytes, int encdec, + int mode) +{ + struct aes_ctx *ctx = ctx_arg; + char bigbuf[sizeof(union cword) + 16]; + union cword *cword; + void *key; + + if (((long)bigbuf) & 0x0F) + cword = (void*)(bigbuf + 16 - ((long)bigbuf & 0x0F)); + else + cword = (void*)bigbuf; + + /* Prepare Control word. */ + memset (cword, 0, sizeof(union cword)); + cword->b.encdec = !encdec; /* in the rest of cryptoapi ENC=1/DEC=0 */ + cword->b.rounds = 10 + (ctx->key_length - 16) / 4; + cword->b.ksize = (ctx->key_length - 16) / 8; + + /* Is the hardware capable to generate the extended key? */ + if (!aes_hw_extkey_available(ctx->key_length)) + cword->b.keygen = 1; + + /* ctx->E starts with a plain key - if the hardware is capable + to generate the extended key itself we must supply + the plain key for both Encryption and Decryption. 
*/ + if (encdec == CRYPTO_OP_ENCRYPT || cword->b.keygen == 0) + key = ctx->E; + else + key = ctx->D; + + padlock_aligner(out_arg, in_arg, iv_arg, key, cword, + nbytes, AES_BLOCK_SIZE, encdec, mode); +} + +static void aes_padlock_ecb(void *ctx, u8 *dst, const u8 *src, const u8 *iv, + size_t nbytes, int encdec) +{ + aes_padlock(ctx, dst, src, NULL, nbytes, encdec, CRYPTO_MODE_ECB); +} + +static void aes_padlock_cbc(void *ctx, u8 *dst, const u8 *src, const u8 *iv, + size_t nbytes, int encdec) +{ + aes_padlock(ctx, dst, src, iv, nbytes, encdec, CRYPTO_MODE_CBC); +} + +static void aes_padlock_cfb(void *ctx, u8 *dst, const u8 *src, const u8 *iv, + size_t nbytes, int encdec) +{ + aes_padlock(ctx, dst, src, iv, nbytes, encdec, CRYPTO_MODE_CFB); +} + +static void aes_padlock_ofb(void *ctx, u8 *dst, const u8 *src, const u8 *iv, + size_t nbytes, int encdec) +{ + aes_padlock(ctx, dst, src, iv, nbytes, encdec, CRYPTO_MODE_OFB); +} + +static struct crypto_capability padlock_caps[] = +{ + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_ECB, 1000}, + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_CBC, 1000}, + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_CFB, 1000}, + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_OFB, 1000}, + + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_ECB, 1000}, + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_CBC, 1000}, + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_CFB, 1000}, + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_OFB, 1000}, + + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_ECB, 1000}, + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_CBC, 1000}, + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_CFB, 1000}, + {CRYPTO_OP_ENCRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_OFB, 1000}, + + {CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_ECB, 1000}, + {CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_CBC, 1000}, + {CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_CFB, 1000}, + 
{CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_128, CRYPTO_MODE_OFB, 1000}, + + {CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_ECB, 1000}, + {CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_CBC, 1000}, + {CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_CFB, 1000}, + {CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_192, CRYPTO_MODE_OFB, 1000}, + + {CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_ECB, 1000}, + {CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_CBC, 1000}, + {CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_CFB, 1000}, + {CRYPTO_OP_DECRYPT, CRYPTO_TYPE_AES_256, CRYPTO_MODE_OFB, 1000}, +}; +static int padlock_cap_number = sizeof(padlock_caps)/sizeof(padlock_caps[0]); + +static void padlock_data_ready(struct crypto_device *dev); +static int padlock_data_ready_reentry; + +static struct crypto_device padlock_device = +{ + .name = "via-padlock", + .data_ready = padlock_data_ready, + .cap = &padlock_caps[0], +}; + +static void process_session(struct crypto_session *s) +{ + int err; + u8 *key, *dst, *src, *iv; + size_t size, keylen; + + key = ((u8 *)page_address(s->data.sg_key.page)) + s->data.sg_key.offset; + keylen = s->data.sg_key.length; + dst = ((u8 *)page_address(s->data.sg_dst.page)) + s->data.sg_dst.offset; + src = ((u8 *)page_address(s->data.sg_src.page)) + s->data.sg_src.offset; + size = s->data.sg_src.length; + iv = ((u8 *)page_address(s->data.sg_iv.page)) + s->data.sg_iv.offset; + + err = aes_set_key(s->data.priv, key, keylen); + if (err) + return; + + switch (s->ci.mode) + { + case CRYPTO_MODE_ECB: + aes_padlock_ecb(s->data.priv, dst, src, iv, size, s->ci.operation); + break; + case CRYPTO_MODE_CBC: + aes_padlock_cbc(s->data.priv, dst, src, iv, size, s->ci.operation); + break; + case CRYPTO_MODE_CFB: + aes_padlock_cfb(s->data.priv, dst, src, iv, size, s->ci.operation); + break; + case CRYPTO_MODE_OFB: + aes_padlock_ofb(s->data.priv, dst, src, iv, size, s->ci.operation); + break; + } + + s->data.sg_dst.length = size; + + return; +} + +static void 
padlock_data_ready(struct crypto_device *dev) +{ + struct crypto_session *s, *n; + + if (padlock_data_ready_reentry) + return; + + padlock_data_ready_reentry++; + list_for_each_entry_safe(s, n, &dev->session_list, dev_queue_entry) + { + if (!session_completed(s)) + { + start_process_session(s); + process_session(s); + crypto_stat_complete_inc(s); + crypto_session_dequeue_route(s); + complete_session(s); + stop_process_session(s); + } + } + padlock_data_ready_reentry--; +} + +int padlock_init_aes(void) +{ + u32 cpuid, edx; + u32 val = 0xC0000000; + + cpuid = cpuid_eax(val); + edx = cpuid_edx(val); + printk("val=%x, cpuid=%x, edx=%x.\n", val, cpuid, edx); + if (cpuid >= val + 1) + { + printk("Board supports ACE.\n"); + } + else + { + printk("Board does not support ACE.\n"); + return -ENODEV; + } + + printk(KERN_NOTICE "Using VIA PadLock ACE for AES algorithm (multiblock).\n"); + + padlock_device.cap_number = padlock_cap_number; + + gen_tabs(); + return crypto_device_add(&padlock_device); +} + +void padlock_fini_aes(void) +{ + crypto_device_remove(&padlock_device); +} diff -Nru /tmp/empty/padlock-generic.c via-padlock/padlock-generic.c --- /tmp/empty/padlock-generic.c 1970-01-01 03:00:00.000000000 +0300 +++ via-padlock/padlock-generic.c 2004-11-01 09:30:41.000000000 +0300 @@ -0,0 +1,191 @@ +/* + * Cryptographic API. + * + * Support for VIA PadLock hardware crypto engine. + * + * Linux developers: + * Michal Ludvig <mludvig@suse.cz> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/crypto.h>
+#include <asm/byteorder.h>
+
+#include "padlock.h"
+#include "../acrypto.h"
+#include "../crypto_def.h"
+
+#define PFX	"padlock: "
+
+typedef void (xcrypt_t)(u8 *input, u8 *output, u8 *key, u8 *iv,
+			void *control_word, u32 count);
+
+static inline void padlock_xcrypt_ecb(u8 *input, u8 *output, u8 *key,
+				      u8 *iv, void *control_word, u32 count)
+{
+	asm volatile ("pushfl; popfl");			/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xc8"	/* rep xcryptecb */
+		      : "=m"(*output), "+S"(input), "+D"(output)
+		      : "d"(control_word), "b"(key), "c"(count));
+}
+
+static inline void padlock_xcrypt_cbc(u8 *input, u8 *output, u8 *key,
+				      u8 *iv, void *control_word, u32 count)
+{
+	asm volatile ("pushfl; popfl");			/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xd0"	/* rep xcryptcbc */
+		      : "=m"(*output), "+S"(input), "+D"(output)
+		      : "d"(control_word), "b"(key), "c"(count), "a"(iv));
+}
+
+static inline void padlock_xcrypt_cfb(u8 *input, u8 *output, u8 *key,
+				      u8 *iv, void *control_word, u32 count)
+{
+	asm volatile ("pushfl; popfl");			/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xe0"	/* rep xcryptcfb */
+		      : "=m"(*output), "+S"(input), "+D"(output)
+		      : "d"(control_word), "b"(key), "c"(count), "a"(iv));
+}
+
+static inline void padlock_xcrypt_ofb(u8 *input, u8 *output, u8 *key,
+				      u8 *iv, void *control_word, u32 count)
+{
+	asm volatile ("pushfl; popfl");			/* enforce key reload. */
+	asm volatile (".byte 0xf3,0x0f,0xa7,0xe8"	/* rep xcryptofb */
+		      : "=m"(*output), "+S"(input), "+D"(output)
+		      : "d"(control_word), "b"(key), "c"(count), "a"(iv));
+}
+
+void *crypto_aligned_kmalloc(size_t size, int mode, size_t alignment, void **index)
+{
+	char *ptr;
+
+	ptr = kmalloc(size + alignment, mode);
+	*index = ptr;
+	if (alignment > 1 && ((long)ptr & (alignment - 1))) {
+		ptr += alignment - ((long)ptr & (alignment - 1));
+	}
+
+	return ptr;
+}
+
+void padlock_aligner(u8 *out_arg, const u8 *in_arg, const u8 *iv_arg,
+		     void *key, union cword *cword,
+		     size_t nbytes, size_t blocksize,
+		     int encdec, int mode)
+{
+	/* Don't blindly modify this structure - the items must
+	   fit on 16-Bytes boundaries! */
+	struct padlock_xcrypt_data {
+		u8 iv[blocksize];	/* Initialization vector */
+	};
+
+	u8 *in, *out, *iv;
+	void *index = NULL;
+	char bigbuf[sizeof(struct padlock_xcrypt_data) + 16];
+	struct padlock_xcrypt_data *data;
+
+	/* Place 'data' at the first 16-Bytes aligned address in 'bigbuf'. */
+	if (((long)bigbuf) & 0x0F)
+		data = (void*)(bigbuf + 16 - ((long)bigbuf & 0x0F));
+	else
+		data = (void*)bigbuf;
+
+	if (((long)in_arg) & 0x0F) {
+		in = crypto_aligned_kmalloc(nbytes, GFP_KERNEL, 16, &index);
+		memcpy(in, in_arg, nbytes);
+	}
+	else
+		in = (u8*)in_arg;
+
+	if (((long)out_arg) & 0x0F) {
+		if (index)
+			out = in;	/* xcrypt can work "in place" */
+		else
+			out = crypto_aligned_kmalloc(nbytes, GFP_KERNEL, 16, &index);
+	}
+	else
+		out = out_arg;
+
+	/* Always make a local copy of IV - xcrypt may change it! */
+	iv = data->iv;
+	if (iv_arg)
+		memcpy(iv, iv_arg, blocksize);
+
+	dprintk("data=%p\n", data);
+	dprintk("in=%p\n", in);
+	dprintk("out=%p\n", out);
+	dprintk("iv=%p\n", iv);
+	dprintk("nbytes=%d, blocksize=%d.\n", nbytes, blocksize);
+
+	switch (mode) {
+	case CRYPTO_MODE_ECB:
+		padlock_xcrypt_ecb(in, out, key, iv, cword, nbytes/blocksize);
+		break;
+
+	case CRYPTO_MODE_CBC:
+		padlock_xcrypt_cbc(in, out, key, iv, cword, nbytes/blocksize);
+		break;
+
+	case CRYPTO_MODE_CFB:
+		padlock_xcrypt_cfb(in, out, key, iv, cword, nbytes/blocksize);
+		break;
+
+	case CRYPTO_MODE_OFB:
+		padlock_xcrypt_ofb(in, out, key, iv, cword, nbytes/blocksize);
+		break;
+
+	default:
+		BUG();
+	}
+
+	/* Copy the 16-Byte aligned output to the caller's buffer. */
+	if (out != out_arg)
+		memcpy(out_arg, out, nbytes);
+
+	if (index)
+		kfree(index);
+}
+
+static int __init padlock_init(void)
+{
+	int ret = -ENOSYS;
+#if 0
+	if (!cpu_has_xcrypt) {
+		printk(KERN_ERR PFX "VIA PadLock not detected.\n");
+		return -ENODEV;
+	}
+
+	if (!cpu_has_xcrypt_enabled) {
+		printk(KERN_ERR PFX "VIA PadLock detected, but not enabled. Hmm, strange...\n");
+		return -ENODEV;
+	}
+#endif
+	if ((ret = padlock_init_aes())) {
+		printk(KERN_ERR PFX "VIA PadLock AES initialization failed.\n");
+		return ret;
+	}
+
+	return ret;
+}
+
+static void __exit padlock_fini(void)
+{
+	padlock_fini_aes();
+}
+
+module_init(padlock_init);
+module_exit(padlock_fini);
+
+MODULE_DESCRIPTION("VIA PadLock crypto engine support.");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Michal Ludvig");
diff -Nru /tmp/empty/padlock.h via-padlock/padlock.h
--- /tmp/empty/padlock.h	1970-01-01 03:00:00.000000000 +0300
+++ via-padlock/padlock.h	2004-10-28 10:05:50.000000000 +0400
@@ -0,0 +1,71 @@
+/*
+ * Cryptographic API.
+ *
+ * Copyright (c) 2004 Michal Ludvig <mludvig@suse.cz>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#ifndef _CRYPTO_PADLOCK_H
+#define _CRYPTO_PADLOCK_H
+
+#define AES_MIN_KEY_SIZE	16	/* in u8 units */
+#define AES_MAX_KEY_SIZE	32	/* ditto */
+#define AES_BLOCK_SIZE		16	/* ditto */
+#define AES_EXTENDED_KEY_SIZE	64	/* in u32 units */
+#define AES_EXTENDED_KEY_SIZE_B	(AES_EXTENDED_KEY_SIZE * sizeof(u32))
+
+struct aes_ctx {
+	u32 e_data[AES_EXTENDED_KEY_SIZE+4];
+	u32 d_data[AES_EXTENDED_KEY_SIZE+4];
+	int key_length;
+	u32 *E;
+	u32 *D;
+};
+
+#define E_KEY ctx->E
+#define D_KEY ctx->D
+
+/* Control word. */
+#if 1
+union cword {
+	u32 cword[4];
+	struct {
+		int rounds:4;
+		int algo:3;
+		int keygen:1;
+		int interm:1;
+		int encdec:1;
+		int ksize:2;
+	} b;
+};
+#else
+union cword {
+	u32 cword[4];
+	struct {
+		unsigned rounds:4,
+			 algo:3,
+			 keygen:1,
+			 interm:1,
+			 encdec:1,
+			 ksize:2;
+	} b;
+};
+#endif
+
+#define PFX	"padlock: "
+
+void padlock_aligner(u8 *out_arg, const u8 *in_arg, const u8 *iv_arg,
+		     void *key, union cword *cword,
+		     size_t nbytes, size_t blocksize,
+		     int encdec, int mode);
+
+int padlock_init_aes(void);
+void padlock_fini_aes(void);
+
+#endif	/* _CRYPTO_PADLOCK_H */

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time
  2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov
                   ` (3 preceding siblings ...)
  2005-01-14 22:34 ` Evgeniy Polyakov
@ 2005-01-14 22:41 ` Evgeniy Polyakov
  4 siblings, 0 replies; 18+ messages in thread
From: Evgeniy Polyakov @ 2005-01-14 22:41 UTC (permalink / raw)
  To: johnpol
  Cc: linux-kernel, Michal Ludvig, Fruhwirth Clemens, Andrew Morton,
	James Morris, cryptoapi, David S. Miller

On Sat, 15 Jan 2005 01:31:03 +0300
Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

>
> Crypto routing.
> This feature allows the same session to be processed by several devices/algorithms.
> For example if you need to encrypt data and then sign it in TPM device you can create
> one route to encryption device and then route it to TPM device. (Note: this feature
> must be discussed since there is no time slice after session allocation, only in
> crypto_device->data_ready() method and there are locking issues in ->callback() method).

Actually it is already implemented by:

	crypto_session_alloc();
	route manipulations
	crypto_session_add();

And sessions can be (re)routed inside the crypto devices themselves.

	Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

^ permalink raw reply	[flat|nested] 18+ messages in thread
end of thread, other threads:[~2005-03-03 12:01 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <Xine.LNX.4.44.0411301009560.11945-100000@thoron.boston.redhat.com>
[not found] ` <Pine.LNX.4.61.0411301722270.4409@maxipes.logix.cz>
[not found]   ` <20041130222442.7b0f4f67.davem@davemloft.net>
2005-01-11 17:03     ` PadLock processing multiple blocks at a time Michal Ludvig
2005-01-11 17:08       ` [PATCH 1/2] " Michal Ludvig
2005-01-14 13:10         ` [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Michal Ludvig
2005-01-14 14:20           ` Fruhwirth Clemens
2005-01-14 16:40             ` Michal Ludvig
2005-01-15 12:45               ` Fruhwirth Clemens
2005-01-18 16:49                 ` James Morris
2005-01-20  3:30                   ` David McCullough
2005-01-20 13:47                     ` James Morris
2005-03-03 10:50                       ` David McCullough
2005-01-11 17:08       ` [PATCH 2/2] PadLock processing multiple blocks at a time Michal Ludvig
2005-01-14  3:05         ` Andrew Morton
2005-01-14 13:15           ` [PATCH 2/2] CryptoAPI: Update PadLock to process multiple blocks at once Michal Ludvig
2005-01-14 22:31 Fw: [PATCH 1/2] CryptoAPI: prepare for processing multiple buffers at a time Evgeniy Polyakov
2005-01-14 22:31 ` Evgeniy Polyakov
2005-01-14 22:32 ` Evgeniy Polyakov
2005-01-14 22:33 ` Evgeniy Polyakov
2005-01-14 22:34 ` Evgeniy Polyakov
2005-01-14 22:41 ` Evgeniy Polyakov