Hi,

On 8 March 2017 at 13:49, Binoy Jayan wrote:
> Hi Gilad,
>
>> I gave it a spin on a x86_64 with 8 CPUs with AES-NI using cryptd and
>> on Arm using CryptoCell hardware accelerator.
>>
>> There was no difference in performance between 512 and 4096 bytes
>> cluster size on the x86_64 (800 MB loop file system)
>>
>> There was an improvement in latency of 3.2% between 512 and 4096 bytes
>> cluster size on the Arm. I expect the performance benefits for this
>> test for Binoy's patch to be the same.
>>
>> In both cases the very naive test was a simple dd with block size of
>> 4096 bytes or the raw block device.
>>
>> I do not know what effect having a bigger cluster size would
>> have on other more complex file system operations.
>> Is there any specific benchmark worth testing with?

The multiple instances issue in /proc/crypto is fixed. It was caused by
the IV code itself: while splitting "plain(cbc(aes))" into "plain" and
"cbc(aes)" in order to invoke the child algorithm, it inadvertently
modified the algorithm name in the global crypto algorithm lookup table.

I ran a few tests with dd, bonnie and FIO under QEMU (x86) using the
automated script [1] that I wrote to make the testing easy. The tests
were run against software implementations of the algorithms, as I did
not have the real hardware available.

In these tests, sequential reads and writes show a good improvement
(5.7%) in data rate with the proposed solution, while random reads show
very little improvement. When tested with FIO, random writes also show
a small improvement (2.2%), but random reads show a slight deterioration
in performance (4%).

When tested on Arm hardware, only the sequential writes with bonnie show
an improvement (5.6%). All other tests show degraded performance in the
absence of crypto hardware.

[1] https://github.com/binoyjayan/utilities/blob/master/utils/dmtest
Dependencies: dd [Full version], bonnie, fio

Thanks,
Binoy
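
For illustration only, the /proc/crypto fix described above comes down to
splitting the algorithm name into caller-owned buffers rather than editing
the shared string in place. The sketch below is plain userspace C, not the
code from the patch; the function name split_iv_name and its buffer
parameters are hypothetical and chosen just for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "iv(cipher)" into "iv" and "cipher" without modifying `name`.
 * Hypothetical helper for illustration; not taken from the patch.
 * Returns 0 on success, -1 if the name is malformed or a buffer is
 * too small.
 */
static int split_iv_name(const char *name,
                         char *iv, size_t iv_len,
                         char *child, size_t child_len)
{
    assert(name && iv && child);

    const char *open = strchr(name, '(');
    size_t n = strlen(name);

    /* Need a non-empty prefix, an opening '(' and a trailing ')'. */
    if (!open || open == name || n < 2 || name[n - 1] != ')')
        return -1;

    size_t iv_sz = (size_t)(open - name);
    size_t child_sz = (size_t)((name + n - 1) - (open + 1));

    /* Reject an empty child name and outputs that would not fit. */
    if (child_sz == 0 || iv_sz >= iv_len || child_sz >= child_len)
        return -1;

    /* Copy both parts out; the source string is only ever read. */
    memcpy(iv, name, iv_sz);
    iv[iv_sz] = '\0';
    memcpy(child, open + 1, child_sz);
    child[child_sz] = '\0';
    return 0;
}
```

Called with "plain(cbc(aes))", this fills the two output buffers with
"plain" and "cbc(aes)" and leaves the input untouched, so a name that is
also held in a shared lookup table cannot be altered as a side effect.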