* comparison of the AF_ALG interface with the /dev/crypto
@ 2011-08-28 13:17 Nikos Mavrogiannopoulos
  2011-08-28 20:35 ` David Miller
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Nikos Mavrogiannopoulos @ 2011-08-28 13:17 UTC
To: cryptodev-linux-devel, linux-crypto, linux-kernel

Hello,
 I've compared the cryptodev [0] and AF_ALG interfaces in terms of
performance [1]. I've put the results, as well as the benchmarks used,
at:
http://home.gna.org/cryptodev-linux/comparison.html

The benchmark idea was to test the speed of initialization, encryption
and deinitialization, as well as the encryption speed alone. These are
the most common use cases of the frameworks (i.e. how they would be used
by a cryptographic library).

The AF_ALG appears to have poor performance compared to cryptodev. Note
that the test with software AES is not really indicative because the
cost of software encryption masks the overhead of the framework. The
difference is clearly seen in the NULL cipher that has no cost (as one
would expect from a hardware cipher accelerator).

Given my benchmarks have no issues, it is not apparent to me why one
should use AF_ALG instead of cryptodev. I do not know though why AF_ALG
performs so poorly. I'd speculate by blaming it on the usage of the
socket API and the number of system calls required.

regards,
Nikos

[0]. http://home.gna.org/cryptodev-linux/
[1]. Both intend to provide user-space with high-bandwidth hardware
accelerated ciphers, thus performance seems a rational basis for
comparison.
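To make the two measurements described above concrete, here is a minimal sketch of a throughput loop in C. It is not the benchmark code from the comparison page: `measure_mbps` and `xor_chunk` are hypothetical names, the XOR routine is only a stand-in for one cipher request (it is not encryption), and MB is assumed to mean 10^6 bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a single cipher request; NOT real encryption. */
static void xor_chunk(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5a;
}

/*
 * Issue `iters` requests of `len` bytes each and return the observed
 * throughput in MB/s (assumption: 1 MB = 10^6 bytes).
 * Returns 0.0 if the clock did not advance.
 */
static double measure_mbps(void (*op)(unsigned char *, size_t),
                           unsigned char *buf, size_t len,
                           unsigned long iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < iters; i++)
        op(buf, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return 0.0;
    return (double)len * (double)iters / secs / 1e6;
}
```

If `op` also performs session setup and teardown on every call, the result corresponds to the "initialization, encryption and deinitialization" case; if the session is created once outside the timed loop, it corresponds to the "encryption speed alone" case.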
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: David Miller @ 2011-08-28 20:35 UTC
To: nmav; +Cc: cryptodev-linux-devel, linux-crypto, linux-kernel

From: Nikos Mavrogiannopoulos <nmav@gnutls.org>
Date: Sun, 28 Aug 2011 15:17:00 +0200

> The benchmark idea was to test the speed of initialization, encryption
> and deinitialization, as well as the encryption speed alone. These are
> the most common use cases of the frameworks (i.e. how they would be used
> by a cryptographic library).

Be sure to use splice() with AF_ALG for maximum performance.

For example, see the test program below. You'll need to replace
"8192" with whatever the page size is on your cpu.

--------------------
#define _GNU_SOURCE		/* for splice() and vmsplice() */
#include <fcntl.h>
#include <openssl/aes.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <linux/types.h>

/* Fallbacks for libc headers that do not define these yet */
#ifndef AF_ALG
#define AF_ALG 38
#endif
#ifndef SOL_ALG
#define SOL_ALG 279
#endif

#ifndef SPLICE_F_GIFT
#define SPLICE_F_GIFT	(0x08)	/* pages passed in are a gift */
#endif

struct sockaddr_alg {
	__u16	salg_family;
	__u8	salg_type[14];
	__u32	salg_feat;
	__u32	salg_mask;
	__u8	salg_name[64];
};

struct af_alg_iv {
	__u32	ivlen;
	__u8	iv[0];
};

/* Socket options */
#define ALG_SET_KEY			1
#define ALG_SET_IV			2
#define ALG_SET_OP			3

/* Operations */
#define ALG_OP_DECRYPT			0
#define ALG_OP_ENCRYPT			1

static char buf[8192] __attribute__((__aligned__(8192)));

/* User-space reference path: AES-128-CBC via OpenSSL */
static void crypt_ssl(const char *key, char *iv, int i)
{
	AES_KEY akey;

	AES_set_encrypt_key(key, 128, &akey);

	while (i--)
		AES_cbc_encrypt(buf, buf, 8192, &akey, iv, 1);
}

/* Kernel path: AF_ALG "cbc(aes)", data fed in with vmsplice()/splice() */
static void crypt_kernel(const char *key, char *oiv, int i)
{
	int opfd;
	int tfmfd;
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type = "skcipher",
		.salg_name = "cbc(aes)"
	};
	struct msghdr msg = {};
	struct cmsghdr *cmsg;
	char cbuf[CMSG_SPACE(4) + CMSG_SPACE(20)] = {};
	struct aes_iv {
		__u32 len;
		__u8 iv[16];
	} *iv;
	struct iovec iov;
	int pipes[2];

	pipe(pipes);

	/* Transform socket: select the algorithm and set the key */
	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);

	bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa));

	setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, key, 16);

	/* Operation socket: carries the actual requests */
	opfd = accept(tfmfd, NULL, 0);

	/* Control messages: operation type and IV */
	msg.msg_control = cbuf;
	msg.msg_controllen = sizeof(cbuf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_ALG;
	cmsg->cmsg_type = ALG_SET_OP;
	cmsg->cmsg_len = CMSG_LEN(4);
	*(__u32 *)CMSG_DATA(cmsg) = ALG_OP_ENCRYPT;

	cmsg = CMSG_NXTHDR(&msg, cmsg);
	cmsg->cmsg_level = SOL_ALG;
	cmsg->cmsg_type = ALG_SET_IV;
	cmsg->cmsg_len = CMSG_LEN(20);
	iv = (void *)CMSG_DATA(cmsg);
	iv->len = 16;
	memcpy(iv->iv, oiv, 16);

	iov.iov_base = buf;
	iov.iov_len = 8192;

	msg.msg_iovlen = 0;
	msg.msg_flags = MSG_MORE;

	while (i--) {
		sendmsg(opfd, &msg, 0);
		vmsplice(pipes[1], &iov, 1, SPLICE_F_GIFT);
		splice(pipes[0], NULL, opfd, NULL, 8192, 0);
		read(opfd, buf, 8192);
	}

	close(opfd);
	close(tfmfd);
	close(pipes[0]);
	close(pipes[1]);
}

int main(int argc, char **argv)
{
	int i;

	const char key[16] =
		"\x06\xa9\x21\x40\x36\xb8\xa1\x5b"
		"\x51\x2e\x03\xd5\x34\x12\x00\x06";
	char iv[16] =
		"\x3d\xaf\xba\x42\x9d\x9e\xb4\x30"
		"\xb4\x22\xda\x80\x2c\x9f\xac\x41";

	memcpy(buf, "Single block msg", 16);

	/* Any argument selects the OpenSSL path; none selects AF_ALG */
	if (argc > 1)
		crypt_ssl(key, iv, 1024 * 1024);
	else
		crypt_kernel(key, iv, 1024 * 1024);

	for (i = 0; i < 8192; i++) {
		printf("%02x", (unsigned char)buf[i]);
	}
	printf("\n");

	return 0;
}
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Nikos Mavrogiannopoulos @ 2011-08-29 7:32 UTC
To: David Miller; +Cc: cryptodev-linux-devel, linux-crypto, linux-kernel

On 08/28/2011 10:35 PM, David Miller wrote:

>> The benchmark idea was to test the speed of initialization, encryption
>> and deinitialization, as well as the encryption speed alone. These are
>> the most common use cases of the frameworks (i.e. how they would be used
>> by a cryptographic library).
> Be sure to use splice() with AF_ALG for maximum performance.
> For example, see the test program below. You'll need to replace
> "8192" with whatever the page size is on your cpu.

As I understand it, with splice you can encrypt only page-aligned data
that spans a whole number of pages. This is a very uncommon case. My
benchmark targets the generic case, i.e., the way this interface will
be used in crypto libraries like gnutls.

However, I'll update the comparison page to include the splice version
as well.

regards,
Nikos
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: David Miller @ 2011-08-29 16:09 UTC
To: nmav; +Cc: cryptodev-linux-devel, linux-crypto, linux-kernel

From: Nikos Mavrogiannopoulos <nmav@gnutls.org>
Date: Mon, 29 Aug 2011 09:32:19 +0200

> On 08/28/2011 10:35 PM, David Miller wrote:
>
>>> The benchmark idea was to test the speed of initialization, encryption
>>> and deinitialization, as well as the encryption speed alone. These are
>>> the most common use cases of the frameworks (i.e. how they would be used
>>> by a cryptographic library).
>> Be sure to use splice() with AF_ALG for maximum performance.
>> For example, see the test program below. You'll need to replace
>> "8192" with whatever the page size is on your cpu.
>
> As I understand it, with splice you can encrypt only page-aligned data
> that spans a whole number of pages. This is a very uncommon case. My
> benchmark targets the generic case, i.e., the way this interface will
> be used in crypto libraries like gnutls.

Only the buffer you use must have these properties; you can use
whatever lengths you like.
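As an illustration of the point about the buffer (as opposed to the data length), the following C sketch allocates a page-aligned buffer for an arbitrary payload length and queries the page size at run time instead of hard-coding 8192. The helper name `alloc_page_aligned` is hypothetical, and the sketch assumes that page alignment of the buffer is the only requirement, which is how the reply above reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a zeroed, page-aligned buffer that can hold
 * `len` bytes. The capacity is rounded up to a whole number of pages and
 * stored in *capacity; `len` itself may be any non-zero value.
 * Returns NULL on failure. The caller releases the buffer with free().
 */
static void *alloc_page_aligned(size_t len, size_t *capacity)
{
    long page = sysconf(_SC_PAGESIZE);      /* page size of this machine */
    if (page <= 0 || len == 0)
        return NULL;

    size_t psz = (size_t)page;
    if (len > SIZE_MAX - psz)                /* rounding up would overflow */
        return NULL;
    size_t cap = (len + psz - 1) / psz * psz;

    void *p = NULL;
    if (posix_memalign(&p, psz, cap) != 0)
        return NULL;

    memset(p, 0, cap);
    if (capacity)
        *capacity = cap;
    return p;
}
```

A caller would copy, say, a 100-byte record into the returned buffer and pass the real length (100, not the rounded capacity) as the `iov_len` and `splice()` length.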
* Re: [Cryptodev-linux-devel] comparison of the AF_ALG interface with the /dev/crypto
From: Phil Sutter @ 2011-08-30 16:33 UTC
To: Nikos Mavrogiannopoulos; +Cc: cryptodev-linux-devel, linux-crypto, linux-kernel

Hi,

On Sun, Aug 28, 2011 at 03:17:00PM +0200, Nikos Mavrogiannopoulos wrote:
> I've compared the cryptodev [0] and AF_ALG interfaces in terms of
> performance [1]. I've put the results, as well as the benchmarks used,
> at:
> http://home.gna.org/cryptodev-linux/comparison.html

Well done, Nikos! I did a short verification of your results on a
(somewhat older) Via Eden running at 1GHz (with padlock enabled, of
course). I just ran the cryptodev "fulltest" and af_alg "aes", so this
should relate to the overall-test using splice. Here are the numbers:

chunksize       cryptodev       af_alg
-------------------------------------------
512             15.34 MB/s      12.32 MB/s
1024            30.01 MB/s      24.22 MB/s
2048            57.29 MB/s      46.85 MB/s
4096            103.13 MB/s     87.29 MB/s
8192            174.08 MB/s     150.04 MB/s
16384           0.27 GB/s       0.23 GB/s
32768           0.35 GB/s       0.32 GB/s
65536           0.42 GB/s       0.38 GB/s

So at its best (512-byte chunks), cryptodev is about 25% faster. The
worst case is with 32-kbyte chunks, where cryptodev is only 9% faster.

> The AF_ALG appears to have poor performance compared to cryptodev. Note
> that the test with software AES is not really indicative because the
> cost of software encryption masks the overhead of the framework. The
> difference is clearly seen in the NULL cipher that has no cost (as one
> would expect from a hardware cipher accelerator).

Not really. Indeed, a crypto engine accelerates the actual encryption.
But another important benefit of CPU-separate (unlike padlock) engines
is the offloading of that work, so the CPU can do other things in the
meantime. E.g. handling the less efficient userspace interface. ;)

OK, just kidding - in reality you always need to do init and fini stuff
before and after the actual crypto operation to get any result at all.
Skipping the middle should allow for measuring the rest.

> Given my benchmarks have no issues, it is not apparent to me why one
> should use AF_ALG instead of cryptodev. I do not know though why AF_ALG
> performs so poorly. I'd speculate by blaming it on the usage of the
> socket API and the number of system calls required.

Interestingly, the splice variant is outrun by regular AF_ALG on small
buffers. I don't know if there is something wrong with the code, but
according to some old benchmarks I found, cryptodev with zero-copy
enabled got faster in every situation (even with 16-byte buffers).

Greetings, Phil
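The "about 25%" and "9%" figures can be rechecked from the Via Eden numbers in this message. The C sketch below copies the eight rows as posted and computes the relative advantage per row; the names `struct row`, `eden`, `advantage_pct` and `extreme_row` are illustrative only. Both columns of a row share one unit (MB/s or GB/s), so the ratio is unit-free.

```c
#include <stddef.h>

/* One benchmark row; both rates are in the same unit within a row. */
struct row {
    unsigned chunk;     /* chunk size in bytes */
    double cryptodev;
    double af_alg;
};

/* Via Eden figures as posted in the message above. */
static const struct row eden[] = {
    {   512,  15.34,  12.32 },
    {  1024,  30.01,  24.22 },
    {  2048,  57.29,  46.85 },
    {  4096, 103.13,  87.29 },
    {  8192, 174.08, 150.04 },
    { 16384,   0.27,   0.23 },
    { 32768,   0.35,   0.32 },
    { 65536,   0.42,   0.38 },
};

/* How much faster cryptodev is than af_alg, in percent. */
static double advantage_pct(double cryptodev, double af_alg)
{
    return (cryptodev / af_alg - 1.0) * 100.0;
}

/* Index of the row with the largest (want_max != 0) or smallest advantage. */
static size_t extreme_row(const struct row *t, size_t n, int want_max)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        double a = advantage_pct(t[i].cryptodev, t[i].af_alg);
        double b = advantage_pct(t[best].cryptodev, t[best].af_alg);
        if (want_max ? a > b : a < b)
            best = i;
    }
    return best;
}
```

Note that the GB/s rows carry only two significant digits, so the percentages derived from them are coarse.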
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Herbert Xu @ 2011-09-01 2:15 UTC
To: Nikos Mavrogiannopoulos; +Cc: cryptodev-linux-devel, linux-crypto, linux-kernel

Nikos Mavrogiannopoulos <nmav@gnutls.org> wrote:
>
> Given my benchmarks have no issues, it is not apparent to me why one
> should use AF_ALG instead of cryptodev. I do not know though why AF_ALG
> performs so poorly. I'd speculate by blaming it on the usage of the
> socket API and the number of system calls required.

The target usage of AF_ALG is hardware offload devices that cannot
be directly used in user-space, not software crypto on implementations
such as AESNI/Padlock.

Going through the kernel to use something like AESNI/Padlock or
software crypto is insane.

Given the intended target case, your numbers are pretty much
meaningless as cryptodev's performance can be easily beaten
by a pure user-space implementation.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Nikos Mavrogiannopoulos @ 2011-09-01 6:26 UTC
To: Herbert Xu; +Cc: cryptodev-linux-devel, linux-crypto, linux-kernel

On 09/01/2011 04:15 AM, Herbert Xu wrote:

> Nikos Mavrogiannopoulos <nmav@gnutls.org> wrote:
>>
>> Given my benchmarks have no issues, it is not apparent to me why one
>> should use AF_ALG instead of cryptodev. I do not know though why AF_ALG
>> performs so poorly. I'd speculate by blaming it on the usage of the
>> socket API and the number of system calls required.
> The target usage of AF_ALG is hardware offload devices that cannot
> be directly used in user-space, not software crypto on implementations
> such as AESNI/Padlock.
> Going through the kernel to use something like AESNI/Padlock or
> software crypto is insane.
> Given the intended target case, your numbers are pretty much
> meaningless as cryptodev's performance can be easily beaten
> by a pure user-space implementation.

Actually, this is the reason for the ecb(cipher-null) comparison: to
emulate the case of a hardware offload device. I tried to make that
clear in the text, but maybe I did not. As you can see, AF_ALG performs
really badly in that case. It performs better when a software or a
padlock implementation of AES is involved (which, as you say, is a
useless use-case).

Of course, I don't own such an offloading device and cannot test it
directly. If you have different values from a benchmark with an actual
hardware accelerator, I'll be happy to include them.

regards,
Nikos
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Herbert Xu @ 2011-09-01 6:43 UTC
To: Nikos Mavrogiannopoulos; +Cc: cryptodev-linux-devel, linux-crypto, linux-kernel

On Thu, Sep 01, 2011 at 08:26:07AM +0200, Nikos Mavrogiannopoulos wrote:
>
> Actually, this is the reason for the ecb(cipher-null) comparison: to
> emulate the case of a hardware offload device. I tried to make that
> clear in the text, but maybe I did not. As you can see, AF_ALG performs
> really badly in that case. It performs better when a software or a
> padlock implementation of AES is involved (which, as you say, is a
> useless use-case).

It's meaningless because such devices operate at a rate much
lower than the figures you give.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Nikos Mavrogiannopoulos @ 2011-09-01 6:54 UTC
To: Herbert Xu; +Cc: cryptodev-linux-devel, linux-crypto, linux-kernel

On 09/01/2011 08:43 AM, Herbert Xu wrote:

> On Thu, Sep 01, 2011 at 08:26:07AM +0200, Nikos Mavrogiannopoulos wrote:
>>
>> Actually, this is the reason for the ecb(cipher-null) comparison: to
>> emulate the case of a hardware offload device. I tried to make that
>> clear in the text, but maybe I did not. As you can see, AF_ALG performs
>> really badly in that case. It performs better when a software or a
>> padlock implementation of AES is involved (which, as you say, is a
>> useless use-case).
> It's meaningless because such devices operate at a rate much
> lower than the figures you give.

Have you actually measured that?
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Herbert Xu @ 2011-09-01 6:56 UTC
To: Nikos Mavrogiannopoulos; +Cc: cryptodev-linux-devel, linux-crypto, linux-kernel

On Thu, Sep 01, 2011 at 08:54:19AM +0200, Nikos Mavrogiannopoulos wrote:
>
> Have you actually measured that?

Not against your cryptodev code-base.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Phil Sutter @ 2011-09-01 13:39 UTC
To: Herbert Xu
Cc: Nikos Mavrogiannopoulos, cryptodev-linux-devel, linux-crypto, linux-kernel

Herbert,

On Thu, Sep 01, 2011 at 12:15:34PM +1000, Herbert Xu wrote:
> Nikos Mavrogiannopoulos <nmav@gnutls.org> wrote:
> >
> > Given my benchmarks have no issues, it is not apparent to me why one
> > should use AF_ALG instead of cryptodev. I do not know though why AF_ALG
> > performs so poorly. I'd speculate by blaming it on the usage of the
> > socket API and the number of system calls required.
>
> The target usage of AF_ALG is hardware offload devices that cannot
> be directly used in user-space, not software crypto on implementations
> such as AESNI/Padlock.
>
> Going through the kernel to use something like AESNI/Padlock or
> software crypto is insane.
>
> Given the intended target case, your numbers are pretty much
> meaningless as cryptodev's performance can be easily beaten
> by a pure user-space implementation.

I ran the benchmarks on my OpenRD Ultimate, an embedded device equipped
with the Marvell Kirkwood SoC, which also contains the CESA crypto
engine. Hopefully a less "insane" use-case, also from your point of
view. Here are the results of the "fulltest", i.e. init, AES128 and
deinit measured as a whole:

chunksize    af_alg          cryptodev       (100 * cryptodev / af_alg)
--------------------------------------------------------------------------
512           4.169 MB/s      7.113 MB/s     171 %
1024          7.904 MB/s     12.957 MB/s     164 %
2048         13.163 MB/s     19.683 MB/s     150 %
4096         20.218 MB/s     26.960 MB/s     133 %
8192         27.539 MB/s     34.373 MB/s     125 %
16384        33.730 MB/s     39.997 MB/s     119 %
32768        37.399 MB/s     42.727 MB/s     114 %
65536        40.004 MB/s     44.660 MB/s     112 %

Although I'm quite sure there's a reason why these values are
meaningless as well, I would like to point out that cryptodev-linux has
outperformed AF_ALG in every situation in which they have been compared
so far.

Greetings, Phil
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Herbert Xu @ 2011-09-01 14:14 UTC
To: Phil Sutter; +Cc: nmav, cryptodev-linux-devel, linux-crypto, linux-kernel

Phil Sutter <phil@nwl.cc> wrote:
>
> chunksize    af_alg          cryptodev       (100 * cryptodev / af_alg)
> --------------------------------------------------------------------------
> 512           4.169 MB/s      7.113 MB/s     171 %
> 1024          7.904 MB/s     12.957 MB/s     164 %
> 2048         13.163 MB/s     19.683 MB/s     150 %
> 4096         20.218 MB/s     26.960 MB/s     133 %
> 8192         27.539 MB/s     34.373 MB/s     125 %
> 16384        33.730 MB/s     39.997 MB/s     119 %
> 32768        37.399 MB/s     42.727 MB/s     114 %
> 65536        40.004 MB/s     44.660 MB/s     112 %

Are you maxing out your submission CPU? If not then you're testing
the latency of the interface, as opposed to the throughput.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
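One way to read the latency remark: if the benchmark keeps only one request in flight at a time, throughput is simply chunk size divided by the time per request, so the table can be turned into per-request times. The C sketch below does that conversion. It rests on two assumptions the thread does not confirm: requests are issued strictly one after another, and MB means 10^6 bytes. The function names are illustrative.

```c
/*
 * Time spent per request, in microseconds, implied by a throughput figure.
 * Assumptions (not confirmed by the thread): one request in flight at a
 * time, and 1 MB = 10^6 bytes. Since bytes / (10^6 bytes/s) is a number of
 * microseconds, the division needs no further scaling.
 */
static double usec_per_request(double chunk_bytes, double mb_per_s)
{
    return chunk_bytes / mb_per_s;
}

/* Extra time per request for interface `a` compared with interface `b`. */
static double extra_usec(double chunk_bytes, double a_mb_per_s,
                         double b_mb_per_s)
{
    return usec_per_request(chunk_bytes, a_mb_per_s)
         - usec_per_request(chunk_bytes, b_mb_per_s);
}
```

Under those assumptions, the 512-byte row of the Kirkwood table works out to roughly 123 microseconds per request for af_alg and roughly 72 for cryptodev, a gap of about 51 microseconds per request.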
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Nikos Mavrogiannopoulos @ 2011-09-01 14:56 UTC
To: Herbert Xu; +Cc: Phil Sutter, cryptodev-linux-devel, linux-crypto, linux-kernel

On Thu, Sep 1, 2011 at 4:14 PM, Herbert Xu <herbert@gondor.hengli.com.au> wrote:

> Are you maxing out your submission CPU? If not then you're testing
> the latency of the interface, as opposed to the throughput.

I think it is obvious that a benchmark of throughput measures
throughput. If, however, you think that AF_ALG is at a disadvantage in
this benchmark because it is a high-latency interface, you're free to
propose and perform another one. I haven't seen anywhere how this
interface is supposed to be used, nor anything about its qualities
(high latency, maybe(?) high throughput or so). Thus, I designed this
benchmark with a use-case in mind, i.e., a TLS or DTLS tunnel
executing in a system with such an accelerator. There might be other
benchmarks with other use cases in mind, but I haven't seen any.

regards,
Nikos
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Herbert Xu @ 2011-09-01 14:59 UTC
To: Nikos Mavrogiannopoulos
Cc: Phil Sutter, cryptodev-linux-devel, linux-crypto, linux-kernel

On Thu, Sep 01, 2011 at 04:56:56PM +0200, Nikos Mavrogiannopoulos wrote:
> On Thu, Sep 1, 2011 at 4:14 PM, Herbert Xu <herbert@gondor.hengli.com.au> wrote:
>
> > Are you maxing out your submission CPU? If not then you're testing
> > the latency of the interface, as opposed to the throughput.
>
> I think it is obvious that a benchmark of throughput measures
> throughput. If, however, you think that AF_ALG is at a disadvantage in
> this benchmark because it is a high-latency interface, you're free to
> propose and perform another one. I haven't seen anywhere how this
> interface is supposed to be used, nor anything about its qualities
> (high latency, maybe(?) high throughput or so). Thus, I designed this
> benchmark with a use-case in mind, i.e., a TLS or DTLS tunnel
> executing in a system with such an accelerator. There might be other
> benchmarks with other use cases in mind, but I haven't seen any.

Putting TLS data-path in user-space is always going to be less
than optimal, especially with hardware crypto offload, since you'll
be crossing the user-space/kernel boundary multiple times.

The data-path should reside in the kernel so as to avoid that.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Nikos Mavrogiannopoulos @ 2011-09-01 15:06 UTC
To: Herbert Xu; +Cc: Phil Sutter, cryptodev-linux-devel, linux-crypto, linux-kernel

On Thu, Sep 1, 2011 at 4:59 PM, Herbert Xu <herbert@gondor.hengli.com.au> wrote:

>> (high latency, maybe(?) high throughput or so). Thus, I designed this
>> benchmark with a use-case in mind, i.e., a TLS or DTLS tunnel
>> executing in a system with such an accelerator. There might be other
>> benchmarks with other use cases in mind, but I haven't seen any.
> Putting TLS data-path in user-space is always going to be less
> than optimal, especially with hardware crypto offload, since you'll
> be crossing the user-space/kernel boundary multiple times.

Indeed, but today that's what we have in some systems: user-space TLS
implementations (GnuTLS and OpenSSL) and kernel-space crypto
offloading. The purpose of the /dev/crypto and AF_ALG interfaces is to
connect those together. It would be interesting to have a partial
kernel-space TLS implementation, but I don't know whether such a thing
could ever make it into the kernel.

regards,
Nikos
* Re: comparison of the AF_ALG interface with the /dev/crypto
From: Herbert Xu @ 2011-09-01 15:08 UTC
To: Nikos Mavrogiannopoulos
Cc: Phil Sutter, cryptodev-linux-devel, linux-crypto, linux-kernel, netdev

On Thu, Sep 01, 2011 at 05:06:06PM +0200, Nikos Mavrogiannopoulos wrote:
>
> Indeed, but today that's what we have in some systems: user-space TLS
> implementations (GnuTLS and OpenSSL) and kernel-space crypto
> offloading. The purpose of the /dev/crypto and AF_ALG interfaces is to
> connect those together. It would be interesting to have a partial
> kernel-space TLS implementation, but I don't know whether such a thing
> could ever make it into the kernel.

Well, we've talked about a kernel implementation of the data path
previously and I don't think there is any opposition to the idea.
The only thing missing is an implementation.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: comparison of the AF_ALG interface with the /dev/crypto 2011-09-01 15:06 ` Nikos Mavrogiannopoulos 2011-09-01 15:08 ` Herbert Xu @ 2011-09-01 15:32 ` David Miller 2011-09-01 16:19 ` Nikos Mavrogiannopoulos 1 sibling, 1 reply; 29+ messages in thread From: David Miller @ 2011-09-01 15:32 UTC (permalink / raw) To: nmav; +Cc: herbert, phil, cryptodev-linux-devel, linux-crypto, linux-kernel From: Nikos Mavrogiannopoulos <nmav@gnutls.org> Date: Thu, 1 Sep 2011 17:06:06 +0200 > It would be interesting to have a partial kernel-space TLS > implementation but I don't know whether such a thing could ever make > it to kernel. Herbert and I have discussed this several times and we plan on implementing this at some point. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: comparison of the AF_ALG interface with the /dev/crypto 2011-09-01 15:32 ` David Miller @ 2011-09-01 16:19 ` Nikos Mavrogiannopoulos 0 siblings, 0 replies; 29+ messages in thread From: Nikos Mavrogiannopoulos @ 2011-09-01 16:19 UTC (permalink / raw) To: David Miller Cc: herbert, phil, cryptodev-linux-devel, linux-crypto, linux-kernel On 09/01/2011 05:32 PM, David Miller wrote: > From: Nikos Mavrogiannopoulos<nmav@gnutls.org> > Date: Thu, 1 Sep 2011 17:06:06 +0200 > >> It would be interesting to have a partial kernel-space TLS >> implementation but I don't know whether such a thing could ever make >> it to kernel. > Herbert and I have discussed this several times and we plan on > implementing this at some point. The problem is that TLS is not a universal thing. There are still SSH, Kerberos, OpenVPN (as far as I remember it is a custom protocol), etc. It makes sense to have something that applies broadly, especially when it is in the Linux kernel. Currently, having a device such as /dev/crypto looks like a good compromise. regards, Nikos ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: comparison of the AF_ALG interface with the /dev/crypto 2011-09-01 14:14 ` Herbert Xu @ 2011-09-01 15:09 ` Phil Sutter -1 siblings, 0 replies; 29+ messages in thread
From: Phil Sutter @ 2011-09-01 15:09 UTC (permalink / raw)
To: Herbert Xu; +Cc: nmav, cryptodev-linux-devel, linux-crypto, linux-kernel

Herbert,

On Thu, Sep 01, 2011 at 10:14:45PM +0800, Herbert Xu wrote:
> Phil Sutter <phil@nwl.cc> wrote:
> >
> > chunksize    af_alg         cryptodev      (100 * cryptodev / af_alg)
> > --------------------------------------------------------------------------
> > 512           4.169 MB/s     7.113 MB/s    171 %
> > 1024          7.904 MB/s    12.957 MB/s    164 %
> > 2048         13.163 MB/s    19.683 MB/s    150 %
> > 4096         20.218 MB/s    26.960 MB/s    133 %
> > 8192         27.539 MB/s    34.373 MB/s    125 %
> > 16384        33.730 MB/s    39.997 MB/s    119 %
> > 32768        37.399 MB/s    42.727 MB/s    114 %
> > 65536        40.004 MB/s    44.660 MB/s    112 %
>
> Are you maxing out your submission CPU? If not then you're testing
> the latency of the interface, as opposed to the throughput.

Good point. So in order to also test the throughput, I've put my OpenRD
under load:

| stress -c 2 -i 2 -m 2 --vm-bytes 64MB

and ran the tests again:

chunksize    af_alg         cryptodev      (100 * cryptodev / af_alg)
--------------------------------------------------------------------------
512           0.618 MB/s     1.14 MB/s     184 %
1024          1.258 MB/s     2.28 MB/s     181 %
2048          2.453 MB/s     4.39 MB/s     179 %
4096          4.540 MB/s     7.76 MB/s     171 %
8192          7.981 MB/s    11.67 MB/s     146 %
16384        12.543 MB/s    14.08 MB/s     112 %
32768        13.139 MB/s    14.46 MB/s     110 %
65536        14.254 MB/s    15.55 MB/s     109 %

So that means cryptodev-linux is superior in throughput as well as
latency, right? Or is it the lower latency of the interface causing the
higher throughput?

Greetings, Phil

^ permalink raw reply [flat|nested] 29+ messages in thread
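For readers who want to re-check the last column of Phil's tables, here is a minimal sketch of the stated formula, 100 * cryptodev / af_alg. The function name ratio_percent and the round-to-nearest step are assumptions made for illustration; the message gives only the formula and the resulting figures, which happen to agree with rounding to the nearest whole percent.

```c
#include <assert.h>

/*
 * Ratio column of the tables: 100 * cryptodev / af_alg.
 * Both arguments are throughputs in MB/s. Rounding to the nearest whole
 * percent is an assumption; it reproduces the figures quoted in the mail.
 * ratio_percent is a hypothetical helper, not part of either benchmark.
 */
static int ratio_percent(double af_alg_mbs, double cryptodev_mbs)
{
    assert(af_alg_mbs > 0.0); /* a zero throughput would divide by zero */
    return (int)(100.0 * cryptodev_mbs / af_alg_mbs + 0.5);
}
```

Calling ratio_percent(4.169, 7.113) gives the value for the first row of the unloaded run, and ratio_percent(14.254, 15.55) the value for the last row of the loaded run.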
* Re: comparison of the AF_ALG interface with the /dev/crypto 2011-09-01 15:09 ` Phil Sutter @ 2011-09-01 15:13 ` Herbert Xu -1 siblings, 0 replies; 29+ messages in thread From: Herbert Xu @ 2011-09-01 15:13 UTC (permalink / raw) To: nmav, cryptodev-linux-devel, linux-crypto, linux-kernel On Thu, Sep 01, 2011 at 05:09:28PM +0200, Phil Sutter wrote: > > Good point. So in order to also test the throughput, I've put my OpenRD > under load: No, that's not what I meant. You're pushing a request to an async device and waiting for a response to come back before pushing the next request. In order to maximise throughput, you need to issue your requests without waiting for the responses synchronously. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 29+ messages in thread
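Herbert's distinction between submit-and-wait and keeping several requests in flight can be illustrated with a toy model. This is only a sketch under stated assumptions, not code from the thread and not a measurement: it assumes a fixed per-request overhead (system calls, wakeups), a device that processes data at a fixed rate, and a hypothetical queue depth. The names sync_throughput, pipelined_throughput, overhead_s, device_rate and depth are all made up for illustration.

```c
#include <assert.h>

/*
 * Submit one request, wait for its completion, then submit the next.
 * Each request costs the fixed overhead plus the device's processing
 * time, so the result is in bytes per second and always stays below
 * device_rate when overhead_s > 0.
 */
static double sync_throughput(double chunk_bytes, double overhead_s,
                              double device_rate)
{
    assert(chunk_bytes > 0.0 && overhead_s >= 0.0 && device_rate > 0.0);
    return chunk_bytes / (overhead_s + chunk_bytes / device_rate);
}

/*
 * Keep `depth` requests in flight at once (hypothetical queue depth).
 * In this toy model the overhead of one request overlaps with the
 * processing of the others, so throughput scales with depth until the
 * device itself becomes the limit.
 */
static double pipelined_throughput(double chunk_bytes, double overhead_s,
                                   double device_rate, int depth)
{
    assert(depth >= 1);
    double t = depth * sync_throughput(chunk_bytes, overhead_s, device_rate);
    return t < device_rate ? t : device_rate;
}
```

With a depth of 1 the two functions agree, which corresponds to the benchmark as Herbert describes it: the result is dominated by the round-trip latency of the interface. With a larger depth the model saturates at the device rate, which is the throughput figure Herbert is asking about.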