Linux-Crypto Archive on lore.kernel.org
* [PATCH v5 0/3] crypto: qce driver fixes for gcm
@ 2020-02-07 15:02 Eneas U de Queiroz
  2020-02-07 15:02 ` [PATCH v5 1/3] crypto: qce - use cryptlen when adding extra sgl Eneas U de Queiroz
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Eneas U de Queiroz @ 2020-02-07 15:02 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu, David S. Miller
  Cc: Ard Biesheuvel, Eneas U de Queiroz

I've made enough mistakes in this series that I'll just start over.
It has been hard not being able to test this on master, having to go
back and forth between it and 4.19; that's why I have messed up so many
times.  I apologize for the noise again.

If you've read the cover letter from v1 and v2, there's not anything too
relevant that I'm changing here.

---

I've finally managed to get gcm(aes) working with the qce crypto engine.

The first patch fixes a bug where the gcm authentication tag was being
overwritten during gcm decryption, because it was passed in the same sgl
buffer as the crypto payload.  The qce driver appends some private state
buffer to the request destination sgl, but it was not checking the
length of the sgl being passed.
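
As a rough illustration of the length check described above, here is a
simplified userspace model in C.  It is not the driver code: struct seg
and copy_limited() are hypothetical stand-ins for a scatterlist entry and
for a length-limited append, and only segment lengths are tracked.

```c
#include <assert.h>

/* Hypothetical stand-in for a scatterlist entry: only the length is kept. */
struct seg {
	unsigned int length;
};

/*
 * Copy segment lengths from src to dst until max_len bytes are covered,
 * truncating the last copied segment if it would exceed the limit.
 * Returns the number of dst entries that were filled.
 */
static unsigned int copy_limited(struct seg *dst, unsigned int dst_cap,
				 const struct seg *src, unsigned int src_cnt,
				 unsigned int max_len)
{
	unsigned int i = 0;

	assert(dst && src);

	while (i < src_cnt && i < dst_cap && max_len) {
		unsigned int len = src[i].length > max_len ?
				   max_len : src[i].length;

		dst[i].length = len;
		max_len -= len;
		i++;
	}

	return i;
}
```

With a single 80-byte source entry (64 bytes of payload followed by a
16-byte tag) and max_len = 64, the copied entry is cut to 64 bytes, so
whatever is appended next cannot land on the tag.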

The second patch works around a problem whose exact cause I frankly
can't pinpoint, but after some help from Ard Biesheuvel, I think it is
related to DMA.  When gcm sends a request in crypto_gcm_setkey, it
stores the hash (the crypto payload) and the iv in the same data struct.
When the driver updates the IV, the payload gets overwritten with what
looks like the unencrypted data, or all zeroes, so it may be a
coincidence.

However, it works if I pass the request down to the fallback cipher,
which the driver already uses to accept 192-bit-key requests.  All I had
to do was set up the fallback regardless of key size, and then check the
payload length along with the key size when deciding whether to pass the
request to the fallback.  This turns out to improve performance as well,
because it avoids the latency that comes with using the hardware.
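
The decision described above can be sketched as a small predicate.  This
is a userspace model under stated assumptions, not the driver's function:
use_sw_fallback() and its sw_max_len argument are hypothetical names, and
the key-size constants are redefined locally so the snippet stands alone.

```c
#include <assert.h>
#include <stdbool.h>

/* AES key sizes in bytes, defined locally for this sketch. */
#define AES_KEYSIZE_128	16
#define AES_KEYSIZE_192	24
#define AES_KEYSIZE_256	32

/*
 * Hypothetical helper: returns true when an AES request should be handed
 * to the software fallback instead of the hardware engine.
 */
static bool use_sw_fallback(unsigned int keylen, unsigned int cryptlen,
			    unsigned int sw_max_len)
{
	assert(keylen == AES_KEYSIZE_128 || keylen == AES_KEYSIZE_192 ||
	       keylen == AES_KEYSIZE_256);

	/* Key sizes the engine does not handle always go to software. */
	if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256)
		return true;

	/* Small requests go to software as well. */
	return cryptlen <= sw_max_len;
}
```

With a threshold of 512, a 16-byte request with a 128-bit key goes to
software, while a 513-byte request with the same key goes to hardware.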

I started by checking for a single 16-byte AES block, and that is
enough to make gcm work.  The next thing I did was tune the request
size for performance.  What got me looking at the qce driver in the
first place was reports of it being detrimental to VPN speed, by the
way.  I've tested this on an Asus RT-AC58U, but the slow VPN reports[1]
mention more affected devices.  Access to the device was kindly provided
by @simsasss.

I've added a 768-byte block size to tcrypt to get some measurements to
come up with an optimal threshold to transition from software to
hardware, and encountered another bug in the qce driver: it apparently
cannot handle aes-xts requests that are greater than 512 bytes, but not
a multiple of it.  It failed with 768, 1280; XTS is usually used with a
512-byte sector (or a multiple of it), so I'm concluding that is the
cause of failure.
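
To make the failing lengths concrete, here is a minimal sketch of the
condition in userspace C.  xts_len_needs_fallback() is a hypothetical
name, and the 512-byte sector size is redefined locally for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* XTS data unit size assumed by this sketch, in bytes. */
#define QCE_SECTOR_SIZE	512

/*
 * Hypothetical helper: returns true for AES-XTS request lengths that are
 * larger than one sector but not a whole number of sectors.
 */
static bool xts_len_needs_fallback(unsigned int cryptlen)
{
	/* XTS needs at least one 16-byte AES block. */
	assert(cryptlen >= 16);

	return cryptlen > QCE_SECTOR_SIZE &&
	       cryptlen % QCE_SECTOR_SIZE != 0;
}
```

Under this check, 768 and 1280 bytes are flagged, while 256, 512 and
1024 bytes are not.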

With that fixed, I added a module parameter to set the maximum request
size that will be handled by the software fallback cipher and made some
speed measurements using tcrypt to come up with an optimum value.

I've documented this briefly in the parameter description, pointing out
that gcm will not work if you set it to 0, and in better detail in the
Kconfig help.

TLDR: In the worst case (where the hardware is slowest), hardware and
software speed match at around 768 bytes, but I lowered the threshold to
512 to benefit the CPU offload.

Here's a sample comparing three runs, using the proposed driver, varying
the aes_sw_max_len parameter: 1st run will always use fallback, second
run will use the default fallback for len <= 512, and third run will
never use the fallback.

testing speed of async cbc(aes) (cbc-aes-qce) encryption
------------------      ----------   ----------    ----------
aes_sw_max_len              32,768          512             0
------------------      ----------   ----------    ----------
128 bit   16 bytes       8,081,136    5,614,448       430,416
128 bit   64 bytes      13,152,768   13,205,952     1,745,088
128 bit  256 bytes      16,094,464   16,101,120     6,969,600
128 bit  512 bytes      16,701,440   16,705,024    12,866,048
128 bit  768 bytes      16,883,712   13,192,704    15,186,432
128 bit 1024 bytes      17,036,288   17,149,952    19,716,096
128 bit 2048 bytes      17,108,992   30,842,880    32,868,352
128 bit 4096 bytes      17,203,200   44,929,024    49,655,808
128 bit 8192 bytes      17,219,584   58,966,016    74,186,752
256 bit   16 bytes       6,962,432    1,943,616       419,088
256 bit   64 bytes      10,485,568   10,421,952     1,681,536
256 bit  256 bytes      12,211,712   12,160,000     6,701,312
256 bit  512 bytes      12,499,456   12,584,448     9,882,112
256 bit  768 bytes      12,622,080   12,550,656    14,701,824
256 bit 1024 bytes      12,750,848   16,079,872    19,585,024
256 bit 2048 bytes      12,812,288   28,293,120    27,693,056
256 bit 4096 bytes      12,939,264   34,234,368    44,142,592
256 bit 8192 bytes      12,845,056   50,274,304    63,520,768

The numbers vary from run to run, sometimes greatly.
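
For reference, the break-even sizes can be read off the table
mechanically.  The sketch below copies the always-fallback column
(aes_sw_max_len=32,768) and the never-fallback column (aes_sw_max_len=0)
from this single sample run and reports the smallest measured size at
which hardware is ahead.  The struct and function names are hypothetical,
and given the run-to-run variation the result is only indicative.

```c
#include <assert.h>
#include <stddef.h>

struct sample {
	unsigned int size;	/* request size in bytes */
	unsigned long sw;	/* aes_sw_max_len=32,768 column */
	unsigned long hw;	/* aes_sw_max_len=0 column */
};

static const struct sample aes128[] = {
	{   16,  8081136,   430416 }, {   64, 13152768,  1745088 },
	{  256, 16094464,  6969600 }, {  512, 16701440, 12866048 },
	{  768, 16883712, 15186432 }, { 1024, 17036288, 19716096 },
	{ 2048, 17108992, 32868352 }, { 4096, 17203200, 49655808 },
	{ 8192, 17219584, 74186752 },
};

static const struct sample aes256[] = {
	{   16,  6962432,   419088 }, {   64, 10485568,  1681536 },
	{  256, 12211712,  6701312 }, {  512, 12499456,  9882112 },
	{  768, 12622080, 14701824 }, { 1024, 12750848, 19585024 },
	{ 2048, 12812288, 27693056 }, { 4096, 12939264, 44142592 },
	{ 8192, 12845056, 63520768 },
};

/* Smallest measured size where hardware beats software, or 0 if none. */
static unsigned int first_hw_win(const struct sample *t, size_t n)
{
	size_t i;

	assert(t != NULL);

	for (i = 0; i < n; i++)
		if (t[i].hw > t[i].sw)
			return t[i].size;

	return 0;
}
```

On these figures the hardware first comes out ahead at 1024 bytes for
128-bit keys and at 768 bytes for 256-bit keys.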

I've tried running the same tests with the arm-neon drivers, but the
results don't change with any cipher mode, so I'm assuming the fallback
is always aes-generic.

I've made the measurements using an Asus RT-AC58U only, so I don't know
how other hardware performs, but the user can always override the
parameter, or even its default value.

[1] https://forum.openwrt.org/t/ipsec-performance-issue/39690

Eneas U de Queiroz (3):
  crypto: qce - use cryptlen when adding extra sgl
  crypto: qce - use AES fallback for small requests
  crypto: qce - handle AES-XTS cases that qce fails

 drivers/crypto/Kconfig        | 23 +++++++++++++++++++++++
 drivers/crypto/qce/common.c   |  2 --
 drivers/crypto/qce/common.h   |  3 +++
 drivers/crypto/qce/dma.c      | 11 ++++++-----
 drivers/crypto/qce/dma.h      |  2 +-
 drivers/crypto/qce/skcipher.c | 30 ++++++++++++++++++++----------
 6 files changed, 53 insertions(+), 18 deletions(-)



* [PATCH v5 1/3] crypto: qce - use cryptlen when adding extra sgl
  2020-02-07 15:02 [PATCH v5 0/3] crypto: qce driver fixes for gcm Eneas U de Queiroz
@ 2020-02-07 15:02 ` Eneas U de Queiroz
  2020-02-07 15:02 ` [PATCH v5 2/3] crypto: qce - use AES fallback for small requests Eneas U de Queiroz
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Eneas U de Queiroz @ 2020-02-07 15:02 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu, David S. Miller
  Cc: Ard Biesheuvel, Eneas U de Queiroz

The qce crypto driver appends an extra entry to the dst sgl, to maintain
private state information.

When the gcm driver sends requests to the ctr skcipher, it passes the
authentication tag after the actual crypto payload, but it must not be
touched.

Commit 1336c2221bee ("crypto: qce - save a sg table slot for result
buf") limited the destination sgl to avoid overwriting the
authentication tag but it assumed the tag would be in a separate sgl
entry.

This is not always the case, so it is better to limit the length of the
destination buffer to req->cryptlen before appending the result buf.

Signed-off-by: Eneas U de Queiroz <cotequeiroz@gmail.com>

--
v1 -> v5: no change

diff --git a/drivers/crypto/qce/dma.c b/drivers/crypto/qce/dma.c
index 7da893dc00e7..46db5bf366b4 100644
--- a/drivers/crypto/qce/dma.c
+++ b/drivers/crypto/qce/dma.c
@@ -48,9 +48,10 @@ void qce_dma_release(struct qce_dma_data *dma)
 
 struct scatterlist *
 qce_sgtable_add(struct sg_table *sgt, struct scatterlist *new_sgl,
-		int max_ents)
+		unsigned int max_len)
 {
 	struct scatterlist *sg = sgt->sgl, *sg_last = NULL;
+	unsigned int new_len;
 
 	while (sg) {
 		if (!sg_page(sg))
@@ -61,13 +62,13 @@ qce_sgtable_add(struct sg_table *sgt, struct scatterlist *new_sgl,
 	if (!sg)
 		return ERR_PTR(-EINVAL);
 
-	while (new_sgl && sg && max_ents) {
-		sg_set_page(sg, sg_page(new_sgl), new_sgl->length,
-			    new_sgl->offset);
+	while (new_sgl && sg && max_len) {
+		new_len = new_sgl->length > max_len ? max_len : new_sgl->length;
+		sg_set_page(sg, sg_page(new_sgl), new_len, new_sgl->offset);
 		sg_last = sg;
 		sg = sg_next(sg);
 		new_sgl = sg_next(new_sgl);
-		max_ents--;
+		max_len -= new_len;
 	}
 
 	return sg_last;
diff --git a/drivers/crypto/qce/dma.h b/drivers/crypto/qce/dma.h
index ed25a0d9829e..786402169360 100644
--- a/drivers/crypto/qce/dma.h
+++ b/drivers/crypto/qce/dma.h
@@ -43,6 +43,6 @@ void qce_dma_issue_pending(struct qce_dma_data *dma);
 int qce_dma_terminate_all(struct qce_dma_data *dma);
 struct scatterlist *
 qce_sgtable_add(struct sg_table *sgt, struct scatterlist *sg_add,
-		int max_ents);
+		unsigned int max_len);
 
 #endif /* _DMA_H_ */
diff --git a/drivers/crypto/qce/skcipher.c b/drivers/crypto/qce/skcipher.c
index 4217b745f124..63ae75809cb7 100644
--- a/drivers/crypto/qce/skcipher.c
+++ b/drivers/crypto/qce/skcipher.c
@@ -97,13 +97,14 @@ qce_skcipher_async_req_handle(struct crypto_async_request *async_req)
 
 	sg_init_one(&rctx->result_sg, qce->dma.result_buf, QCE_RESULT_BUF_SZ);
 
-	sg = qce_sgtable_add(&rctx->dst_tbl, req->dst, rctx->dst_nents - 1);
+	sg = qce_sgtable_add(&rctx->dst_tbl, req->dst, req->cryptlen);
 	if (IS_ERR(sg)) {
 		ret = PTR_ERR(sg);
 		goto error_free;
 	}
 
-	sg = qce_sgtable_add(&rctx->dst_tbl, &rctx->result_sg, 1);
+	sg = qce_sgtable_add(&rctx->dst_tbl, &rctx->result_sg,
+			     QCE_RESULT_BUF_SZ);
 	if (IS_ERR(sg)) {
 		ret = PTR_ERR(sg);
 		goto error_free;


* [PATCH v5 2/3] crypto: qce - use AES fallback for small requests
  2020-02-07 15:02 [PATCH v5 0/3] crypto: qce driver fixes for gcm Eneas U de Queiroz
  2020-02-07 15:02 ` [PATCH v5 1/3] crypto: qce - use cryptlen when adding extra sgl Eneas U de Queiroz
@ 2020-02-07 15:02 ` Eneas U de Queiroz
  2020-02-07 15:02 ` [PATCH v5 3/3] crypto: qce - handle AES-XTS cases that qce fails Eneas U de Queiroz
  2020-02-13  9:26 ` [PATCH v5 0/3] crypto: qce driver fixes for gcm Herbert Xu
  3 siblings, 0 replies; 5+ messages in thread
From: Eneas U de Queiroz @ 2020-02-07 15:02 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu, David S. Miller
  Cc: Ard Biesheuvel, Eneas U de Queiroz, kbuild test robot

Process small blocks using the fallback cipher, as a workaround for an
observed failure (DMA-related, apparently) when computing the GCM ghash
key.  This brings a speed gain as well, since it avoids the latency of
using the hardware engine to process small blocks.

Using software for all 16-byte requests would be enough to make GCM
work, but to increase performance, a larger threshold would be better.
Measuring the performance of supported ciphers with openssl speed,
software matches hardware at around 768-1024 bytes.

Considering the 256-bit ciphers, software is 2-3 times faster than qce
at 256 bytes, 30% faster at 512, and about even at 768 bytes.  With
128-bit keys, the break-even point would be around 1024 bytes.

This adds the 'aes_sw_max_len' parameter, to set the largest request
length processed by the software fallback.  Its default is being set to
512 bytes, a little lower than the break-even point, to balance the cost
in CPU usage.

Signed-off-by: Eneas U de Queiroz <cotequeiroz@gmail.com>

--
v4 -> v5:
Fixed parentheses around '&&' within '||'
Reported-by: kbuild test robot <lkp@intel.com>

v3 -> v4:
Corrected a missing 'static' declaration of aes_sw_max_len

v2 -> v3:
Corrected style issues pointed out by checkpatch.pl

v1 -> v2:
Changed the threshold from a fixed number to a module parameter

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index c2767ed54dfe..052d3ff7fb20 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -685,6 +685,29 @@ choice
 
 endchoice
 
+config CRYPTO_DEV_QCE_SW_MAX_LEN
+	int "Default maximum request size to use software for AES"
+	depends on CRYPTO_DEV_QCE && CRYPTO_DEV_QCE_SKCIPHER
+	default 512
+	help
+	  This sets the default maximum request size to perform AES requests
+	  using software instead of the crypto engine.  It can be changed by
+	  setting the aes_sw_max_len parameter.
+
+	  Small blocks are processed faster in software than hardware.
+	  Considering the 256-bit ciphers, software is 2-3 times faster than
+	  qce at 256-bytes, 30% faster at 512, and about even at 768-bytes.
+	  With 128-bit keys, the break-even point would be around 1024-bytes.
+
+	  The default is set a little lower, to 512 bytes, to balance the
+	  cost in CPU usage.  The minimum recommended setting is 16-bytes
+	  (1 AES block), since AES-GCM will fail if you set it lower.
+	  Setting this to zero will send all requests to the hardware.
+
+	  Note that 192-bit keys are not supported by the hardware and are
+	  always processed by the software fallback, and all DES requests
+	  are done by the hardware.
+
 config CRYPTO_DEV_QCOM_RNG
 	tristate "Qualcomm Random Number Generator Driver"
 	depends on ARCH_QCOM || COMPILE_TEST
diff --git a/drivers/crypto/qce/skcipher.c b/drivers/crypto/qce/skcipher.c
index 63ae75809cb7..fc7c940b5a43 100644
--- a/drivers/crypto/qce/skcipher.c
+++ b/drivers/crypto/qce/skcipher.c
@@ -5,6 +5,7 @@
 
 #include <linux/device.h>
 #include <linux/interrupt.h>
+#include <linux/moduleparam.h>
 #include <linux/types.h>
 #include <crypto/aes.h>
 #include <crypto/internal/des.h>
@@ -12,6 +13,13 @@
 
 #include "cipher.h"
 
+static unsigned int aes_sw_max_len = CONFIG_CRYPTO_DEV_QCE_SW_MAX_LEN;
+module_param(aes_sw_max_len, uint, 0644);
+MODULE_PARM_DESC(aes_sw_max_len,
+		 "Only use hardware for AES requests larger than this "
+		 "[0=always use hardware; anything <16 breaks AES-GCM; default="
+		 __stringify(CONFIG_CRYPTO_DEV_QCE_SW_MAX_LEN)"]");
+
 static LIST_HEAD(skcipher_algs);
 
 static void qce_skcipher_done(void *data)
@@ -166,15 +174,10 @@ static int qce_skcipher_setkey(struct crypto_skcipher *ablk, const u8 *key,
 	switch (IS_XTS(flags) ? keylen >> 1 : keylen) {
 	case AES_KEYSIZE_128:
 	case AES_KEYSIZE_256:
+		memcpy(ctx->enc_key, key, keylen);
 		break;
-	default:
-		goto fallback;
 	}
 
-	ctx->enc_keylen = keylen;
-	memcpy(ctx->enc_key, key, keylen);
-	return 0;
-fallback:
 	ret = crypto_sync_skcipher_setkey(ctx->fallback, key, keylen);
 	if (!ret)
 		ctx->enc_keylen = keylen;
@@ -224,8 +227,9 @@ static int qce_skcipher_crypt(struct skcipher_request *req, int encrypt)
 	rctx->flags |= encrypt ? QCE_ENCRYPT : QCE_DECRYPT;
 	keylen = IS_XTS(rctx->flags) ? ctx->enc_keylen >> 1 : ctx->enc_keylen;
 
-	if (IS_AES(rctx->flags) && keylen != AES_KEYSIZE_128 &&
-	    keylen != AES_KEYSIZE_256) {
+	if (IS_AES(rctx->flags) &&
+	    ((keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256) ||
+	     req->cryptlen <= aes_sw_max_len)) {
 		SYNC_SKCIPHER_REQUEST_ON_STACK(subreq, ctx->fallback);
 
 		skcipher_request_set_sync_tfm(subreq, ctx->fallback);


* [PATCH v5 3/3] crypto: qce - handle AES-XTS cases that qce fails
  2020-02-07 15:02 [PATCH v5 0/3] crypto: qce driver fixes for gcm Eneas U de Queiroz
  2020-02-07 15:02 ` [PATCH v5 1/3] crypto: qce - use cryptlen when adding extra sgl Eneas U de Queiroz
  2020-02-07 15:02 ` [PATCH v5 2/3] crypto: qce - use AES fallback for small requests Eneas U de Queiroz
@ 2020-02-07 15:02 ` Eneas U de Queiroz
  2020-02-13  9:26 ` [PATCH v5 0/3] crypto: qce driver fixes for gcm Herbert Xu
  3 siblings, 0 replies; 5+ messages in thread
From: Eneas U de Queiroz @ 2020-02-07 15:02 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu, David S. Miller
  Cc: Ard Biesheuvel, Eneas U de Queiroz

QCE hangs when presented with an AES-XTS request whose length is larger
than QCE_SECTOR_SIZE (512-bytes), and is not a multiple of it.  Let the
fallback cipher handle them.

Signed-off-by: Eneas U de Queiroz <cotequeiroz@gmail.com>

--
v4 -> v5
Adapted to [v5 2/3] parentheses change

v3 -> v4
No change

v2 -> v3
Corrected style issues pointed out by checkpatch.pl

v1 -> v2
Patch was first added to the series

diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c
index 629e7f34dc09..5006e74c40cd 100644
--- a/drivers/crypto/qce/common.c
+++ b/drivers/crypto/qce/common.c
@@ -15,8 +15,6 @@
 #include "regs-v5.h"
 #include "sha.h"
 
-#define QCE_SECTOR_SIZE		512
-
 static inline u32 qce_read(struct qce_device *qce, u32 offset)
 {
 	return readl(qce->base + offset);
diff --git a/drivers/crypto/qce/common.h b/drivers/crypto/qce/common.h
index 282d4317470d..9f989cba0f1b 100644
--- a/drivers/crypto/qce/common.h
+++ b/drivers/crypto/qce/common.h
@@ -12,6 +12,9 @@
 #include <crypto/hash.h>
 #include <crypto/internal/skcipher.h>
 
+/* xts du size */
+#define QCE_SECTOR_SIZE			512
+
 /* key size in bytes */
 #define QCE_SHA_HMAC_KEY_SIZE		64
 #define QCE_MAX_CIPHER_KEY_SIZE		AES_KEYSIZE_256
diff --git a/drivers/crypto/qce/skcipher.c b/drivers/crypto/qce/skcipher.c
index fc7c940b5a43..a4f6ec1b64c7 100644
--- a/drivers/crypto/qce/skcipher.c
+++ b/drivers/crypto/qce/skcipher.c
@@ -227,9 +227,14 @@ static int qce_skcipher_crypt(struct skcipher_request *req, int encrypt)
 	rctx->flags |= encrypt ? QCE_ENCRYPT : QCE_DECRYPT;
 	keylen = IS_XTS(rctx->flags) ? ctx->enc_keylen >> 1 : ctx->enc_keylen;
 
+	/* qce is hanging when AES-XTS request len > QCE_SECTOR_SIZE and
+	 * is not a multiple of it; pass such requests to the fallback
+	 */
 	if (IS_AES(rctx->flags) &&
-	    ((keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256) ||
-	     req->cryptlen <= aes_sw_max_len)) {
+	    (((keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256) ||
+	      req->cryptlen <= aes_sw_max_len) ||
+	     (IS_XTS(rctx->flags) && req->cryptlen > QCE_SECTOR_SIZE &&
+	      req->cryptlen % QCE_SECTOR_SIZE))) {
 		SYNC_SKCIPHER_REQUEST_ON_STACK(subreq, ctx->fallback);
 
 		skcipher_request_set_sync_tfm(subreq, ctx->fallback);


* Re: [PATCH v5 0/3] crypto: qce driver fixes for gcm
  2020-02-07 15:02 [PATCH v5 0/3] crypto: qce driver fixes for gcm Eneas U de Queiroz
                   ` (2 preceding siblings ...)
  2020-02-07 15:02 ` [PATCH v5 3/3] crypto: qce - handle AES-XTS cases that qce fails Eneas U de Queiroz
@ 2020-02-13  9:26 ` Herbert Xu
  3 siblings, 0 replies; 5+ messages in thread
From: Herbert Xu @ 2020-02-13  9:26 UTC (permalink / raw)
  To: Eneas U de Queiroz; +Cc: linux-crypto, David S. Miller, Ard Biesheuvel

On Fri, Feb 07, 2020 at 12:02:24PM -0300, Eneas U de Queiroz wrote:
> Eneas U de Queiroz (3):
>   crypto: qce - use cryptlen when adding extra sgl
>   crypto: qce - use AES fallback for small requests
>   crypto: qce - handle AES-XTS cases that qce fails
> 
>  drivers/crypto/Kconfig        | 23 +++++++++++++++++++++++
>  drivers/crypto/qce/common.c   |  2 --
>  drivers/crypto/qce/common.h   |  3 +++
>  drivers/crypto/qce/dma.c      | 11 ++++++-----
>  drivers/crypto/qce/dma.h      |  2 +-
>  drivers/crypto/qce/skcipher.c | 30 ++++++++++++++++++++----------
>  6 files changed, 53 insertions(+), 18 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
