linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes
@ 2019-12-20 19:02 Eneas U de Queiroz
  2019-12-20 19:02 ` [PATCH 1/6] crypto: qce - fix ctr-aes-qce block, chunk sizes Eneas U de Queiroz
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Eneas U de Queiroz @ 2019-12-20 19:02 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto; +Cc: Eneas U de Queiroz

I've been trying to make the Qualcomm Crypto Engine work with GCM-mode
AES.  I fixed some bugs, and added an option to build only hashes or
skciphers, as the VPN performance increases if you leave some of that to
the CPU.

A discussion about this can be found here:
https://github.com/openwrt/openwrt/pull/2518

I'm using openwrt to test this, and there's no support for kernel 5.x
yet.  So I have backported the recent skcipher updates, and tested this
with 4.19. I don't have the hardware with me, but I have run-tested
everything, working remotely.

All of the skciphers directly implemented by the driver work.  They pass
the tcrypt tests, and also some tests from userspace using AF_ALG:
https://github.com/cotequeiroz/afalg_tests

However, I can't get gcm(aes) to work.  When setting the gcm-mode key,
it sets the ctr(aes) key, then encrypt a block of zeroes, and uses that
as the ghash key.  The driver fails to perform that encryption.  I've
dumped the input and output data, and they apparently are not touched by
the QCE.  The IV, which written to a buffer appended to the results sg
list gets updated, but the results themselves are not.  I'm not sure
what goes wrong, if it is a DMA/cache problem, memory alignment, or
whatever.

If I take 'be128 hash' out of the 'data' struct, and kzalloc them
separately in crypto_gcm_setkey (crypto/gcm.c), it encrypts the data
just fine--perhaps the payload and the request struct can't be in the
same page?

However, it still fails during decryption of the very first tcrypt test
vector (I'm testing with the AF_ALG program, using the same vectors as
the kernel), in the final encryption to compute the authentication tag,
in the same fashion as it did in 'crypto_gcm_setkey'.  What this case
has in common with the ghash key above is the input data, a single block
of zeroes, so this may be a hardware bug.  However, if I perform the
same encryption using the AF_ALG test, it completes OK.

I am not experienced enough with the Linux Kernel, or with the ARM
architecture to wrap this up on my own, so I need some pointers to what
to try next.

To come up a working setup, I am passing any AES requests whose length
is less than or equal to one AES block to the fallback skcipher.  This
hack is not a part of this series, but I can send it if there's any
interest in it.

Anyway, the patches in this series are complete enough on their own.
With the exception of the last patch, they're all bugfixes.

Cheers,

Eneas

Eneas U de Queiroz (6):
  crypto: qce - fix ctr-aes-qce block, chunk sizes
  crypto: qce - fix xts-aes-qce key sizes
  crypto: qce - save a sg table slot for result buf
  crypto: qce - update the skcipher IV
  crypto: qce - initialize fallback only for AES
  crypto: qce - allow building only hashes/ciphers

 drivers/crypto/Kconfig        |  63 ++++++++-
 drivers/crypto/qce/Makefile   |   7 +-
 drivers/crypto/qce/common.c   | 244 ++++++++++++++++++----------------
 drivers/crypto/qce/core.c     |   4 +
 drivers/crypto/qce/dma.c      |   6 +-
 drivers/crypto/qce/dma.h      |   3 +-
 drivers/crypto/qce/skcipher.c |  41 ++++--
 7 files changed, 229 insertions(+), 139 deletions(-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/6] crypto: qce - fix ctr-aes-qce block, chunk sizes
  2019-12-20 19:02 Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Eneas U de Queiroz
@ 2019-12-20 19:02 ` Eneas U de Queiroz
  2019-12-20 19:02 ` [PATCH 2/6] crypto: qce - fix xts-aes-qce key sizes Eneas U de Queiroz
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Eneas U de Queiroz @ 2019-12-20 19:02 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto; +Cc: Eneas U de Queiroz

Set blocksize of ctr-aes-qce to 1, so it can operate as a stream cipher,
adding the definition for chucksize instead, where the underlying block
size belongs.

Signed-off-by: Eneas U de Queiroz <cotequeiroz@gmail.com>
---
 drivers/crypto/qce/skcipher.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/qce/skcipher.c b/drivers/crypto/qce/skcipher.c
index fee07323f8f9..1f1f40a761fa 100644
--- a/drivers/crypto/qce/skcipher.c
+++ b/drivers/crypto/qce/skcipher.c
@@ -270,6 +270,7 @@ struct qce_skcipher_def {
 	const char *name;
 	const char *drv_name;
 	unsigned int blocksize;
+	unsigned int chunksize;
 	unsigned int ivsize;
 	unsigned int min_keysize;
 	unsigned int max_keysize;
@@ -298,7 +299,8 @@ static const struct qce_skcipher_def skcipher_def[] = {
 		.flags		= QCE_ALG_AES | QCE_MODE_CTR,
 		.name		= "ctr(aes)",
 		.drv_name	= "ctr-aes-qce",
-		.blocksize	= AES_BLOCK_SIZE,
+		.blocksize	= 1,
+		.chunksize	= AES_BLOCK_SIZE,
 		.ivsize		= AES_BLOCK_SIZE,
 		.min_keysize	= AES_MIN_KEY_SIZE,
 		.max_keysize	= AES_MAX_KEY_SIZE,
@@ -368,6 +370,7 @@ static int qce_skcipher_register_one(const struct qce_skcipher_def *def,
 		 def->drv_name);
 
 	alg->base.cra_blocksize		= def->blocksize;
+	alg->chunksize			= def->chunksize;
 	alg->ivsize			= def->ivsize;
 	alg->min_keysize		= def->min_keysize;
 	alg->max_keysize		= def->max_keysize;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/6] crypto: qce - fix xts-aes-qce key sizes
  2019-12-20 19:02 Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Eneas U de Queiroz
  2019-12-20 19:02 ` [PATCH 1/6] crypto: qce - fix ctr-aes-qce block, chunk sizes Eneas U de Queiroz
@ 2019-12-20 19:02 ` Eneas U de Queiroz
  2019-12-20 19:02 ` [PATCH 3/6] crypto: qce - save a sg table slot for result buf Eneas U de Queiroz
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Eneas U de Queiroz @ 2019-12-20 19:02 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto; +Cc: Eneas U de Queiroz

XTS-mode uses two keys, so the keysizes should be doubled in
skcipher_def, and halved when checking if it is AES-128/192/256.

Signed-off-by: Eneas U de Queiroz <cotequeiroz@gmail.com>
---
 drivers/crypto/qce/skcipher.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/qce/skcipher.c b/drivers/crypto/qce/skcipher.c
index 1f1f40a761fa..e4f6d87ba51d 100644
--- a/drivers/crypto/qce/skcipher.c
+++ b/drivers/crypto/qce/skcipher.c
@@ -154,12 +154,13 @@ static int qce_skcipher_setkey(struct crypto_skcipher *ablk, const u8 *key,
 {
 	struct crypto_tfm *tfm = crypto_skcipher_tfm(ablk);
 	struct qce_cipher_ctx *ctx = crypto_tfm_ctx(tfm);
+	unsigned long flags = to_cipher_tmpl(ablk)->alg_flags;
 	int ret;
 
 	if (!key || !keylen)
 		return -EINVAL;
 
-	switch (keylen) {
+	switch (IS_XTS(flags) ? keylen >> 1 : keylen) {
 	case AES_KEYSIZE_128:
 	case AES_KEYSIZE_256:
 		break;
@@ -213,13 +214,15 @@ static int qce_skcipher_crypt(struct skcipher_request *req, int encrypt)
 	struct qce_cipher_ctx *ctx = crypto_skcipher_ctx(tfm);
 	struct qce_cipher_reqctx *rctx = skcipher_request_ctx(req);
 	struct qce_alg_template *tmpl = to_cipher_tmpl(tfm);
+	int keylen;
 	int ret;
 
 	rctx->flags = tmpl->alg_flags;
 	rctx->flags |= encrypt ? QCE_ENCRYPT : QCE_DECRYPT;
+	keylen = IS_XTS(rctx->flags) ? ctx->enc_keylen >> 1 : ctx->enc_keylen;
 
-	if (IS_AES(rctx->flags) && ctx->enc_keylen != AES_KEYSIZE_128 &&
-	    ctx->enc_keylen != AES_KEYSIZE_256) {
+	if (IS_AES(rctx->flags) && keylen != AES_KEYSIZE_128 &&
+	    keylen != AES_KEYSIZE_256) {
 		SYNC_SKCIPHER_REQUEST_ON_STACK(subreq, ctx->fallback);
 
 		skcipher_request_set_sync_tfm(subreq, ctx->fallback);
@@ -311,8 +314,8 @@ static const struct qce_skcipher_def skcipher_def[] = {
 		.drv_name	= "xts-aes-qce",
 		.blocksize	= AES_BLOCK_SIZE,
 		.ivsize		= AES_BLOCK_SIZE,
-		.min_keysize	= AES_MIN_KEY_SIZE,
-		.max_keysize	= AES_MAX_KEY_SIZE,
+		.min_keysize	= AES_MIN_KEY_SIZE * 2,
+		.max_keysize	= AES_MAX_KEY_SIZE * 2,
 	},
 	{
 		.flags		= QCE_ALG_DES | QCE_MODE_ECB,

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/6] crypto: qce - save a sg table slot for result buf
  2019-12-20 19:02 Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Eneas U de Queiroz
  2019-12-20 19:02 ` [PATCH 1/6] crypto: qce - fix ctr-aes-qce block, chunk sizes Eneas U de Queiroz
  2019-12-20 19:02 ` [PATCH 2/6] crypto: qce - fix xts-aes-qce key sizes Eneas U de Queiroz
@ 2019-12-20 19:02 ` Eneas U de Queiroz
  2019-12-20 19:02 ` [PATCH 4/6] crypto: qce - update the skcipher IV Eneas U de Queiroz
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Eneas U de Queiroz @ 2019-12-20 19:02 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto; +Cc: Eneas U de Queiroz

When ctr-aes-qce is used for gcm-mode, an extra sg entry for the
authentication tag is present, causing trouble when the qce driver
prepares the dst-results sg table for dma.

It computes the number of entries needed with sg_nents_for_len, leaving
out the tag entry.  Then it creates a sg table with that number plus
one, used to store a result buffer.

When copying the sg table, there's no limit to the number of entries
copied, so the extra slot is filled with the authentication tag sg.
When the driver tries to add the result sg, the list is full, and it
returns EINVAL.

By limiting the number of sg entries copied to the dest table, the slot
for the result buffer is guaranteed to be unused.

Signed-off-by: Eneas U de Queiroz <cotequeiroz@gmail.com>
---
 drivers/crypto/qce/dma.c      | 6 ++++--
 drivers/crypto/qce/dma.h      | 3 ++-
 drivers/crypto/qce/skcipher.c | 4 ++--
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/qce/dma.c b/drivers/crypto/qce/dma.c
index 40a59214d2e1..7da893dc00e7 100644
--- a/drivers/crypto/qce/dma.c
+++ b/drivers/crypto/qce/dma.c
@@ -47,7 +47,8 @@ void qce_dma_release(struct qce_dma_data *dma)
 }
 
 struct scatterlist *
-qce_sgtable_add(struct sg_table *sgt, struct scatterlist *new_sgl)
+qce_sgtable_add(struct sg_table *sgt, struct scatterlist *new_sgl,
+		int max_ents)
 {
 	struct scatterlist *sg = sgt->sgl, *sg_last = NULL;
 
@@ -60,12 +61,13 @@ qce_sgtable_add(struct sg_table *sgt, struct scatterlist *new_sgl)
 	if (!sg)
 		return ERR_PTR(-EINVAL);
 
-	while (new_sgl && sg) {
+	while (new_sgl && sg && max_ents) {
 		sg_set_page(sg, sg_page(new_sgl), new_sgl->length,
 			    new_sgl->offset);
 		sg_last = sg;
 		sg = sg_next(sg);
 		new_sgl = sg_next(new_sgl);
+		max_ents--;
 	}
 
 	return sg_last;
diff --git a/drivers/crypto/qce/dma.h b/drivers/crypto/qce/dma.h
index 1e25a9e0e6f8..ed25a0d9829e 100644
--- a/drivers/crypto/qce/dma.h
+++ b/drivers/crypto/qce/dma.h
@@ -42,6 +42,7 @@ int qce_dma_prep_sgs(struct qce_dma_data *dma, struct scatterlist *sg_in,
 void qce_dma_issue_pending(struct qce_dma_data *dma);
 int qce_dma_terminate_all(struct qce_dma_data *dma);
 struct scatterlist *
-qce_sgtable_add(struct sg_table *sgt, struct scatterlist *sg_add);
+qce_sgtable_add(struct sg_table *sgt, struct scatterlist *sg_add,
+		int max_ents);
 
 #endif /* _DMA_H_ */
diff --git a/drivers/crypto/qce/skcipher.c b/drivers/crypto/qce/skcipher.c
index e4f6d87ba51d..a9ae356bc2a7 100644
--- a/drivers/crypto/qce/skcipher.c
+++ b/drivers/crypto/qce/skcipher.c
@@ -95,13 +95,13 @@ qce_skcipher_async_req_handle(struct crypto_async_request *async_req)
 
 	sg_init_one(&rctx->result_sg, qce->dma.result_buf, QCE_RESULT_BUF_SZ);
 
-	sg = qce_sgtable_add(&rctx->dst_tbl, req->dst);
+	sg = qce_sgtable_add(&rctx->dst_tbl, req->dst, rctx->dst_nents - 1);
 	if (IS_ERR(sg)) {
 		ret = PTR_ERR(sg);
 		goto error_free;
 	}
 
-	sg = qce_sgtable_add(&rctx->dst_tbl, &rctx->result_sg);
+	sg = qce_sgtable_add(&rctx->dst_tbl, &rctx->result_sg, 1);
 	if (IS_ERR(sg)) {
 		ret = PTR_ERR(sg);
 		goto error_free;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/6] crypto: qce - update the skcipher IV
  2019-12-20 19:02 Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Eneas U de Queiroz
                   ` (2 preceding siblings ...)
  2019-12-20 19:02 ` [PATCH 3/6] crypto: qce - save a sg table slot for result buf Eneas U de Queiroz
@ 2019-12-20 19:02 ` Eneas U de Queiroz
  2019-12-20 19:02 ` [PATCH 5/6] crypto: qce - initialize fallback only for AES Eneas U de Queiroz
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Eneas U de Queiroz @ 2019-12-20 19:02 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto; +Cc: Eneas U de Queiroz

Update the IV after the completion of each cipher operation.

Signed-off-by: Eneas U de Queiroz <cotequeiroz@gmail.com>
---
 drivers/crypto/qce/skcipher.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/crypto/qce/skcipher.c b/drivers/crypto/qce/skcipher.c
index a9ae356bc2a7..d3852a61cb1d 100644
--- a/drivers/crypto/qce/skcipher.c
+++ b/drivers/crypto/qce/skcipher.c
@@ -21,6 +21,7 @@ static void qce_skcipher_done(void *data)
 	struct qce_cipher_reqctx *rctx = skcipher_request_ctx(req);
 	struct qce_alg_template *tmpl = to_cipher_tmpl(crypto_skcipher_reqtfm(req));
 	struct qce_device *qce = tmpl->qce;
+	struct qce_result_dump *result_buf = qce->dma.result_buf;
 	enum dma_data_direction dir_src, dir_dst;
 	u32 status;
 	int error;
@@ -45,6 +46,7 @@ static void qce_skcipher_done(void *data)
 	if (error < 0)
 		dev_dbg(qce->dev, "skcipher operation error (%x)\n", status);
 
+	memcpy(rctx->iv, result_buf->encr_cntr_iv, rctx->ivsize);
 	qce->async_req_done(tmpl->qce, error);
 }
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/6] crypto: qce - initialize fallback only for AES
  2019-12-20 19:02 Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Eneas U de Queiroz
                   ` (3 preceding siblings ...)
  2019-12-20 19:02 ` [PATCH 4/6] crypto: qce - update the skcipher IV Eneas U de Queiroz
@ 2019-12-20 19:02 ` Eneas U de Queiroz
  2019-12-20 19:02 ` [PATCH 6/6] crypto: qce - allow building only hashes/ciphers Eneas U de Queiroz
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Eneas U de Queiroz @ 2019-12-20 19:02 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto; +Cc: Eneas U de Queiroz

Adjust cra_flags to add CRYPTO_NEED_FALLBACK only for AES ciphers, where
AES-192 is not handled by the qce hardware, and don't allocate & free
the fallback skcipher for other algorithms.

Signed-off-by: Eneas U de Queiroz <cotequeiroz@gmail.com>
---
 drivers/crypto/qce/skcipher.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/qce/skcipher.c b/drivers/crypto/qce/skcipher.c
index d3852a61cb1d..4217b745f124 100644
--- a/drivers/crypto/qce/skcipher.c
+++ b/drivers/crypto/qce/skcipher.c
@@ -257,7 +257,14 @@ static int qce_skcipher_init(struct crypto_skcipher *tfm)
 
 	memset(ctx, 0, sizeof(*ctx));
 	crypto_skcipher_set_reqsize(tfm, sizeof(struct qce_cipher_reqctx));
+	return 0;
+}
 
+static int qce_skcipher_init_fallback(struct crypto_skcipher *tfm)
+{
+	struct qce_cipher_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+	qce_skcipher_init(tfm);
 	ctx->fallback = crypto_alloc_sync_skcipher(crypto_tfm_alg_name(&tfm->base),
 						   0, CRYPTO_ALG_NEED_FALLBACK);
 	return PTR_ERR_OR_ZERO(ctx->fallback);
@@ -387,14 +394,18 @@ static int qce_skcipher_register_one(const struct qce_skcipher_def *def,
 
 	alg->base.cra_priority		= 300;
 	alg->base.cra_flags		= CRYPTO_ALG_ASYNC |
-					  CRYPTO_ALG_NEED_FALLBACK |
 					  CRYPTO_ALG_KERN_DRIVER_ONLY;
 	alg->base.cra_ctxsize		= sizeof(struct qce_cipher_ctx);
 	alg->base.cra_alignmask		= 0;
 	alg->base.cra_module		= THIS_MODULE;
 
-	alg->init			= qce_skcipher_init;
-	alg->exit			= qce_skcipher_exit;
+	if (IS_AES(def->flags)) {
+		alg->base.cra_flags    |= CRYPTO_ALG_NEED_FALLBACK;
+		alg->init		= qce_skcipher_init_fallback;
+		alg->exit		= qce_skcipher_exit;
+	} else {
+		alg->init		= qce_skcipher_init;
+	}
 
 	INIT_LIST_HEAD(&tmpl->entry);
 	tmpl->crypto_alg_type = CRYPTO_ALG_TYPE_SKCIPHER;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 6/6] crypto: qce - allow building only hashes/ciphers
  2019-12-20 19:02 Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Eneas U de Queiroz
                   ` (4 preceding siblings ...)
  2019-12-20 19:02 ` [PATCH 5/6] crypto: qce - initialize fallback only for AES Eneas U de Queiroz
@ 2019-12-20 19:02 ` Eneas U de Queiroz
  2019-12-23  9:46 ` Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Ard Biesheuvel
  2019-12-27 10:39 ` Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Herbert Xu
  7 siblings, 0 replies; 12+ messages in thread
From: Eneas U de Queiroz @ 2019-12-20 19:02 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto; +Cc: Eneas U de Queiroz

Allow the user to choose whether to build support for all algorithms
(default), hashes-only, or skciphers-only.

The QCE engine does not appear to scale as well as the CPU to handle
multiple crypto requests.  While the ipq40xx chips have 4-core CPUs, the
QCE handles only 2 requests in parallel.

Ipsec throughput seems to improve when disabling either family of
algorithms, sharing the load with the CPU.  Enabling skciphers-only
appears to work best.

Signed-off-by: Eneas U de Queiroz <cotequeiroz@gmail.com>
---
 drivers/crypto/Kconfig      |  63 +++++++++-
 drivers/crypto/qce/Makefile |   7 +-
 drivers/crypto/qce/common.c | 244 +++++++++++++++++++-----------------
 drivers/crypto/qce/core.c   |   4 +
 4 files changed, 193 insertions(+), 125 deletions(-)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index d02e79ac81c0..73a80232e69e 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -618,6 +618,14 @@ config CRYPTO_DEV_QCE
 	tristate "Qualcomm crypto engine accelerator"
 	depends on ARCH_QCOM || COMPILE_TEST
 	depends on HAS_IOMEM
+	help
+	  This driver supports Qualcomm crypto engine accelerator
+	  hardware. To compile this driver as a module, choose M here. The
+	  module will be called qcrypto.
+
+config CRYPTO_DEV_QCE_SKCIPHER
+	bool
+	depends on CRYPTO_DEV_QCE
 	select CRYPTO_AES
 	select CRYPTO_LIB_DES
 	select CRYPTO_ECB
@@ -625,10 +633,57 @@ config CRYPTO_DEV_QCE
 	select CRYPTO_XTS
 	select CRYPTO_CTR
 	select CRYPTO_SKCIPHER
-	help
-	  This driver supports Qualcomm crypto engine accelerator
-	  hardware. To compile this driver as a module, choose M here. The
-	  module will be called qcrypto.
+
+config CRYPTO_DEV_QCE_SHA
+	bool
+	depends on CRYPTO_DEV_QCE
+
+choice
+	prompt "Algorithms enabled for QCE acceleration"
+	default CRYPTO_DEV_QCE_ENABLE_ALL
+	depends on CRYPTO_DEV_QCE
+	help
+	  This option allows to choose whether to build support for all algorihtms
+	  (default), hashes-only, or skciphers-only.
+
+	  The QCE engine does not appear to scale as well as the CPU to handle
+	  multiple crypto requests.  While the ipq40xx chips have 4-core CPUs, the
+	  QCE handles only 2 requests in parallel.
+
+	  Ipsec throughput seems to improve when disabling either family of
+	  algorithms, sharing the load with the CPU.  Enabling skciphers-only
+	  appears to work best.
+
+	config CRYPTO_DEV_QCE_ENABLE_ALL
+		bool "All supported algorithms"
+		select CRYPTO_DEV_QCE_SKCIPHER
+		select CRYPTO_DEV_QCE_SHA
+		help
+		  Enable all supported algorithms:
+			- AES (CBC, CTR, ECB, XTS)
+			- 3DES (CBC, ECB)
+			- DES (CBC, ECB)
+			- SHA1, HMAC-SHA1
+			- SHA256, HMAC-SHA256
+
+	config CRYPTO_DEV_QCE_ENABLE_SKCIPHER
+		bool "Symmetric-key ciphers only"
+		select CRYPTO_DEV_QCE_SKCIPHER
+		help
+		  Enable symmetric-key ciphers only:
+			- AES (CBC, CTR, ECB, XTS)
+			- 3DES (ECB, CBC)
+			- DES (ECB, CBC)
+
+	config CRYPTO_DEV_QCE_ENABLE_SHA
+		bool "Hash/HMAC only"
+		select CRYPTO_DEV_QCE_SHA
+		help
+		  Enable hashes/HMAC algorithms only:
+			- SHA1, HMAC-SHA1
+			- SHA256, HMAC-SHA256
+
+endchoice
 
 config CRYPTO_DEV_QCOM_RNG
 	tristate "Qualcomm Random Number Generator Driver"
diff --git a/drivers/crypto/qce/Makefile b/drivers/crypto/qce/Makefile
index 8caa04e1ec43..14ade8a7d664 100644
--- a/drivers/crypto/qce/Makefile
+++ b/drivers/crypto/qce/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_CRYPTO_DEV_QCE) += qcrypto.o
 qcrypto-objs := core.o \
 		common.o \
-		dma.o \
-		sha.o \
-		skcipher.o
+		dma.o
+
+qcrypto-$(CONFIG_CRYPTO_DEV_QCE_SHA) += sha.o
+qcrypto-$(CONFIG_CRYPTO_DEV_QCE_SKCIPHER) += skcipher.o
diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c
index da1188abc9ba..629e7f34dc09 100644
--- a/drivers/crypto/qce/common.c
+++ b/drivers/crypto/qce/common.c
@@ -45,52 +45,56 @@ qce_clear_array(struct qce_device *qce, u32 offset, unsigned int len)
 		qce_write(qce, offset + i * sizeof(u32), 0);
 }
 
-static u32 qce_encr_cfg(unsigned long flags, u32 aes_key_size)
+static u32 qce_config_reg(struct qce_device *qce, int little)
 {
-	u32 cfg = 0;
+	u32 beats = (qce->burst_size >> 3) - 1;
+	u32 pipe_pair = qce->pipe_pair_id;
+	u32 config;
 
-	if (IS_AES(flags)) {
-		if (aes_key_size == AES_KEYSIZE_128)
-			cfg |= ENCR_KEY_SZ_AES128 << ENCR_KEY_SZ_SHIFT;
-		else if (aes_key_size == AES_KEYSIZE_256)
-			cfg |= ENCR_KEY_SZ_AES256 << ENCR_KEY_SZ_SHIFT;
-	}
+	config = (beats << REQ_SIZE_SHIFT) & REQ_SIZE_MASK;
+	config |= BIT(MASK_DOUT_INTR_SHIFT) | BIT(MASK_DIN_INTR_SHIFT) |
+		  BIT(MASK_OP_DONE_INTR_SHIFT) | BIT(MASK_ERR_INTR_SHIFT);
+	config |= (pipe_pair << PIPE_SET_SELECT_SHIFT) & PIPE_SET_SELECT_MASK;
+	config &= ~HIGH_SPD_EN_N_SHIFT;
 
-	if (IS_AES(flags))
-		cfg |= ENCR_ALG_AES << ENCR_ALG_SHIFT;
-	else if (IS_DES(flags) || IS_3DES(flags))
-		cfg |= ENCR_ALG_DES << ENCR_ALG_SHIFT;
+	if (little)
+		config |= BIT(LITTLE_ENDIAN_MODE_SHIFT);
 
-	if (IS_DES(flags))
-		cfg |= ENCR_KEY_SZ_DES << ENCR_KEY_SZ_SHIFT;
+	return config;
+}
 
-	if (IS_3DES(flags))
-		cfg |= ENCR_KEY_SZ_3DES << ENCR_KEY_SZ_SHIFT;
+void qce_cpu_to_be32p_array(__be32 *dst, const u8 *src, unsigned int len)
+{
+	__be32 *d = dst;
+	const u8 *s = src;
+	unsigned int n;
 
-	switch (flags & QCE_MODE_MASK) {
-	case QCE_MODE_ECB:
-		cfg |= ENCR_MODE_ECB << ENCR_MODE_SHIFT;
-		break;
-	case QCE_MODE_CBC:
-		cfg |= ENCR_MODE_CBC << ENCR_MODE_SHIFT;
-		break;
-	case QCE_MODE_CTR:
-		cfg |= ENCR_MODE_CTR << ENCR_MODE_SHIFT;
-		break;
-	case QCE_MODE_XTS:
-		cfg |= ENCR_MODE_XTS << ENCR_MODE_SHIFT;
-		break;
-	case QCE_MODE_CCM:
-		cfg |= ENCR_MODE_CCM << ENCR_MODE_SHIFT;
-		cfg |= LAST_CCM_XFR << LAST_CCM_SHIFT;
-		break;
-	default:
-		return ~0;
+	n = len / sizeof(u32);
+	for (; n > 0; n--) {
+		*d = cpu_to_be32p((const __u32 *) s);
+		s += sizeof(__u32);
+		d++;
 	}
+}
 
-	return cfg;
+static void qce_setup_config(struct qce_device *qce)
+{
+	u32 config;
+
+	/* get big endianness */
+	config = qce_config_reg(qce, 0);
+
+	/* clear status */
+	qce_write(qce, REG_STATUS, 0);
+	qce_write(qce, REG_CONFIG, config);
+}
+
+static inline void qce_crypto_go(struct qce_device *qce)
+{
+	qce_write(qce, REG_GOPROC, BIT(GO_SHIFT) | BIT(RESULTS_DUMP_SHIFT));
 }
 
+#ifdef CONFIG_CRYPTO_DEV_QCE_SHA
 static u32 qce_auth_cfg(unsigned long flags, u32 key_size)
 {
 	u32 cfg = 0;
@@ -137,88 +141,6 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size)
 	return cfg;
 }
 
-static u32 qce_config_reg(struct qce_device *qce, int little)
-{
-	u32 beats = (qce->burst_size >> 3) - 1;
-	u32 pipe_pair = qce->pipe_pair_id;
-	u32 config;
-
-	config = (beats << REQ_SIZE_SHIFT) & REQ_SIZE_MASK;
-	config |= BIT(MASK_DOUT_INTR_SHIFT) | BIT(MASK_DIN_INTR_SHIFT) |
-		  BIT(MASK_OP_DONE_INTR_SHIFT) | BIT(MASK_ERR_INTR_SHIFT);
-	config |= (pipe_pair << PIPE_SET_SELECT_SHIFT) & PIPE_SET_SELECT_MASK;
-	config &= ~HIGH_SPD_EN_N_SHIFT;
-
-	if (little)
-		config |= BIT(LITTLE_ENDIAN_MODE_SHIFT);
-
-	return config;
-}
-
-void qce_cpu_to_be32p_array(__be32 *dst, const u8 *src, unsigned int len)
-{
-	__be32 *d = dst;
-	const u8 *s = src;
-	unsigned int n;
-
-	n = len / sizeof(u32);
-	for (; n > 0; n--) {
-		*d = cpu_to_be32p((const __u32 *) s);
-		s += sizeof(__u32);
-		d++;
-	}
-}
-
-static void qce_xts_swapiv(__be32 *dst, const u8 *src, unsigned int ivsize)
-{
-	u8 swap[QCE_AES_IV_LENGTH];
-	u32 i, j;
-
-	if (ivsize > QCE_AES_IV_LENGTH)
-		return;
-
-	memset(swap, 0, QCE_AES_IV_LENGTH);
-
-	for (i = (QCE_AES_IV_LENGTH - ivsize), j = ivsize - 1;
-	     i < QCE_AES_IV_LENGTH; i++, j--)
-		swap[i] = src[j];
-
-	qce_cpu_to_be32p_array(dst, swap, QCE_AES_IV_LENGTH);
-}
-
-static void qce_xtskey(struct qce_device *qce, const u8 *enckey,
-		       unsigned int enckeylen, unsigned int cryptlen)
-{
-	u32 xtskey[QCE_MAX_CIPHER_KEY_SIZE / sizeof(u32)] = {0};
-	unsigned int xtsklen = enckeylen / (2 * sizeof(u32));
-	unsigned int xtsdusize;
-
-	qce_cpu_to_be32p_array((__be32 *)xtskey, enckey + enckeylen / 2,
-			       enckeylen / 2);
-	qce_write_array(qce, REG_ENCR_XTS_KEY0, xtskey, xtsklen);
-
-	/* xts du size 512B */
-	xtsdusize = min_t(u32, QCE_SECTOR_SIZE, cryptlen);
-	qce_write(qce, REG_ENCR_XTS_DU_SIZE, xtsdusize);
-}
-
-static void qce_setup_config(struct qce_device *qce)
-{
-	u32 config;
-
-	/* get big endianness */
-	config = qce_config_reg(qce, 0);
-
-	/* clear status */
-	qce_write(qce, REG_STATUS, 0);
-	qce_write(qce, REG_CONFIG, config);
-}
-
-static inline void qce_crypto_go(struct qce_device *qce)
-{
-	qce_write(qce, REG_GOPROC, BIT(GO_SHIFT) | BIT(RESULTS_DUMP_SHIFT));
-}
-
 static int qce_setup_regs_ahash(struct crypto_async_request *async_req,
 				u32 totallen, u32 offset)
 {
@@ -303,6 +225,87 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req,
 
 	return 0;
 }
+#endif
+
+#ifdef CONFIG_CRYPTO_DEV_QCE_SKCIPHER
+static u32 qce_encr_cfg(unsigned long flags, u32 aes_key_size)
+{
+	u32 cfg = 0;
+
+	if (IS_AES(flags)) {
+		if (aes_key_size == AES_KEYSIZE_128)
+			cfg |= ENCR_KEY_SZ_AES128 << ENCR_KEY_SZ_SHIFT;
+		else if (aes_key_size == AES_KEYSIZE_256)
+			cfg |= ENCR_KEY_SZ_AES256 << ENCR_KEY_SZ_SHIFT;
+	}
+
+	if (IS_AES(flags))
+		cfg |= ENCR_ALG_AES << ENCR_ALG_SHIFT;
+	else if (IS_DES(flags) || IS_3DES(flags))
+		cfg |= ENCR_ALG_DES << ENCR_ALG_SHIFT;
+
+	if (IS_DES(flags))
+		cfg |= ENCR_KEY_SZ_DES << ENCR_KEY_SZ_SHIFT;
+
+	if (IS_3DES(flags))
+		cfg |= ENCR_KEY_SZ_3DES << ENCR_KEY_SZ_SHIFT;
+
+	switch (flags & QCE_MODE_MASK) {
+	case QCE_MODE_ECB:
+		cfg |= ENCR_MODE_ECB << ENCR_MODE_SHIFT;
+		break;
+	case QCE_MODE_CBC:
+		cfg |= ENCR_MODE_CBC << ENCR_MODE_SHIFT;
+		break;
+	case QCE_MODE_CTR:
+		cfg |= ENCR_MODE_CTR << ENCR_MODE_SHIFT;
+		break;
+	case QCE_MODE_XTS:
+		cfg |= ENCR_MODE_XTS << ENCR_MODE_SHIFT;
+		break;
+	case QCE_MODE_CCM:
+		cfg |= ENCR_MODE_CCM << ENCR_MODE_SHIFT;
+		cfg |= LAST_CCM_XFR << LAST_CCM_SHIFT;
+		break;
+	default:
+		return ~0;
+	}
+
+	return cfg;
+}
+
+static void qce_xts_swapiv(__be32 *dst, const u8 *src, unsigned int ivsize)
+{
+	u8 swap[QCE_AES_IV_LENGTH];
+	u32 i, j;
+
+	if (ivsize > QCE_AES_IV_LENGTH)
+		return;
+
+	memset(swap, 0, QCE_AES_IV_LENGTH);
+
+	for (i = (QCE_AES_IV_LENGTH - ivsize), j = ivsize - 1;
+	     i < QCE_AES_IV_LENGTH; i++, j--)
+		swap[i] = src[j];
+
+	qce_cpu_to_be32p_array(dst, swap, QCE_AES_IV_LENGTH);
+}
+
+static void qce_xtskey(struct qce_device *qce, const u8 *enckey,
+		       unsigned int enckeylen, unsigned int cryptlen)
+{
+	u32 xtskey[QCE_MAX_CIPHER_KEY_SIZE / sizeof(u32)] = {0};
+	unsigned int xtsklen = enckeylen / (2 * sizeof(u32));
+	unsigned int xtsdusize;
+
+	qce_cpu_to_be32p_array((__be32 *)xtskey, enckey + enckeylen / 2,
+			       enckeylen / 2);
+	qce_write_array(qce, REG_ENCR_XTS_KEY0, xtskey, xtsklen);
+
+	/* xts du size 512B */
+	xtsdusize = min_t(u32, QCE_SECTOR_SIZE, cryptlen);
+	qce_write(qce, REG_ENCR_XTS_DU_SIZE, xtsdusize);
+}
 
 static int qce_setup_regs_skcipher(struct crypto_async_request *async_req,
 				     u32 totallen, u32 offset)
@@ -384,15 +387,20 @@ static int qce_setup_regs_skcipher(struct crypto_async_request *async_req,
 
 	return 0;
 }
+#endif
 
 int qce_start(struct crypto_async_request *async_req, u32 type, u32 totallen,
 	      u32 offset)
 {
 	switch (type) {
+#ifdef CONFIG_CRYPTO_DEV_QCE_SKCIPHER
 	case CRYPTO_ALG_TYPE_SKCIPHER:
 		return qce_setup_regs_skcipher(async_req, totallen, offset);
+#endif
+#ifdef CONFIG_CRYPTO_DEV_QCE_SHA
 	case CRYPTO_ALG_TYPE_AHASH:
 		return qce_setup_regs_ahash(async_req, totallen, offset);
+#endif
 	default:
 		return -EINVAL;
 	}
diff --git a/drivers/crypto/qce/core.c b/drivers/crypto/qce/core.c
index 0a44a6eeacf5..cb6d61eb7302 100644
--- a/drivers/crypto/qce/core.c
+++ b/drivers/crypto/qce/core.c
@@ -22,8 +22,12 @@
 #define QCE_QUEUE_LENGTH	1
 
 static const struct qce_algo_ops *qce_ops[] = {
+#ifdef CONFIG_CRYPTO_DEV_QCE_SKCIPHER
 	&skcipher_ops,
+#endif
+#ifdef CONFIG_CRYPTO_DEV_QCE_SHA
 	&ahash_ops,
+#endif
 };
 
 static void qce_unregister_algs(struct qce_device *qce)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes
  2019-12-20 19:02 Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Eneas U de Queiroz
                   ` (5 preceding siblings ...)
  2019-12-20 19:02 ` [PATCH 6/6] crypto: qce - allow building only hashes/ciphers Eneas U de Queiroz
@ 2019-12-23  9:46 ` Ard Biesheuvel
       [not found]   ` <CAPxccB2LGANG8DcmF4nwUDOzDzf2RHX4S-4w9z6TcO9csu4xSw@mail.gmail.com>
  2019-12-27 10:39 ` Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Herbert Xu
  7 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2019-12-23  9:46 UTC (permalink / raw)
  To: Eneas U de Queiroz
  Cc: Herbert Xu, David S. Miller,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE

On Fri, 20 Dec 2019 at 20:02, Eneas U de Queiroz <cotequeiroz@gmail.com> wrote:
>
> I've been trying to make the Qualcomm Crypto Engine work with GCM-mode
> AES.  I fixed some bugs, and added an option to build only hashes or
> skciphers, as the VPN performance increases if you leave some of that to
> the CPU.
>
> A discussion about this can be found here:
> https://github.com/openwrt/openwrt/pull/2518
>
> I'm using openwrt to test this, and there's no support for kernel 5.x
> yet.  So I have backported the recent skcipher updates, and tested this
> with 4.19. I don't have the hardware with me, but I have run-tested
> everything, working remotely.
>
> All of the skciphers directly implemented by the driver work.  They pass
> the tcrypt tests, and also some tests from userspace using AF_ALG:
> https://github.com/cotequeiroz/afalg_tests
>
> However, I can't get gcm(aes) to work.  When setting the gcm-mode key,
> it sets the ctr(aes) key, then encrypt a block of zeroes, and uses that
> as the ghash key.  The driver fails to perform that encryption.  I've
> dumped the input and output data, and they apparently are not touched by
> the QCE.  The IV, which written to a buffer appended to the results sg
> list gets updated, but the results themselves are not.  I'm not sure
> what goes wrong, if it is a DMA/cache problem, memory alignment, or
> whatever.
>

This does sound like a DMA problem. I assume the accelerator is not
cache coherent?

In any case, it is dubious whether the round trip to the accelerator
is worth it when encrypting the GHASH key. Just call aes_encrypt()
instead, and do it in software.

> If I take 'be128 hash' out of the 'data' struct, and kzalloc them
> separately in crypto_gcm_setkey (crypto/gcm.c), it encrypts the data
> just fine--perhaps the payload and the request struct can't be in the
> same page?
>

Non-cache coherent DMA involves cache invalidation on inbound data. So
if both the device and the CPU write to the same cacheline while the
buffer is mapped for DMA from device to memory, one of the updates
gets lost.


> However, it still fails during decryption of the very first tcrypt test
> vector (I'm testing with the AF_ALG program, using the same vectors as
> the kernel), in the final encryption to compute the authentication tag,
> in the same fashion as it did in 'crypto_gcm_setkey'.  What this case
> has in common with the ghash key above is the input data, a single block
> of zeroes, so this may be a hardware bug.  However, if I perform the
> same encryption using the AF_ALG test, it completes OK.
>
> I am not experienced enough with the Linux Kernel, or with the ARM
> architecture to wrap this up on my own, so I need some pointers to what
> to try next.
>
> To come up a working setup, I am passing any AES requests whose length
> is less than or equal to one AES block to the fallback skcipher.  This
> hack is not a part of this series, but I can send it if there's any
> interest in it.
>
> Anyway, the patches in this series are complete enough on their own.
> With the exception of the last patch, they're all bugfixes.
>
> Cheers,
>
> Eneas
>
> Eneas U de Queiroz (6):
>   crypto: qce - fix ctr-aes-qce block, chunk sizes
>   crypto: qce - fix xts-aes-qce key sizes
>   crypto: qce - save a sg table slot for result buf
>   crypto: qce - update the skcipher IV
>   crypto: qce - initialize fallback only for AES
>   crypto: qce - allow building only hashes/ciphers
>
>  drivers/crypto/Kconfig        |  63 ++++++++-
>  drivers/crypto/qce/Makefile   |   7 +-
>  drivers/crypto/qce/common.c   | 244 ++++++++++++++++++----------------
>  drivers/crypto/qce/core.c     |   4 +
>  drivers/crypto/qce/dma.c      |   6 +-
>  drivers/crypto/qce/dma.h      |   3 +-
>  drivers/crypto/qce/skcipher.c |  41 ++++--
>  7 files changed, 229 insertions(+), 139 deletions(-)
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes
  2019-12-20 19:02 Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Eneas U de Queiroz
                   ` (6 preceding siblings ...)
  2019-12-23  9:46 ` Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Ard Biesheuvel
@ 2019-12-27 10:39 ` Herbert Xu
  7 siblings, 0 replies; 12+ messages in thread
From: Herbert Xu @ 2019-12-27 10:39 UTC (permalink / raw)
  To: Eneas U de Queiroz; +Cc: David S. Miller, linux-crypto

On Fri, Dec 20, 2019 at 04:02:12PM -0300, Eneas U de Queiroz wrote:
> I've been trying to make the Qualcomm Crypto Engine work with GCM-mode
> AES.  I fixed some bugs, and added an option to build only hashes or
> skciphers, as the VPN performance increases if you leave some of that to
> the CPU.
> 
> A discussion about this can be found here:
> https://github.com/openwrt/openwrt/pull/2518
> 
> I'm using openwrt to test this, and there's no support for kernel 5.x
> yet.  So I have backported the recent skcipher updates, and tested this
> with 4.19. I don't have the hardware with me, but I have run-tested
> everything, working remotely.
> 
> All of the skciphers directly implemented by the driver work.  They pass
> the tcrypt tests, and also some tests from userspace using AF_ALG:
> https://github.com/cotequeiroz/afalg_tests
> 
> However, I can't get gcm(aes) to work.  When setting the gcm-mode key,
> it sets the ctr(aes) key, then encrypt a block of zeroes, and uses that
> as the ghash key.  The driver fails to perform that encryption.  I've
> dumped the input and output data, and they apparently are not touched by
> the QCE.  The IV, which written to a buffer appended to the results sg
> list gets updated, but the results themselves are not.  I'm not sure
> what goes wrong, if it is a DMA/cache problem, memory alignment, or
> whatever.
> 
> If I take 'be128 hash' out of the 'data' struct, and kzalloc them
> separately in crypto_gcm_setkey (crypto/gcm.c), it encrypts the data
> just fine--perhaps the payload and the request struct can't be in the
> same page?
> 
> However, it still fails during decryption of the very first tcrypt test
> vector (I'm testing with the AF_ALG program, using the same vectors as
> the kernel), in the final encryption to compute the authentication tag,
> in the same fashion as it did in 'crypto_gcm_setkey'.  What this case
> has in common with the ghash key above is the input data, a single block
> of zeroes, so this may be a hardware bug.  However, if I perform the
> same encryption using the AF_ALG test, it completes OK.
> 
> I am not experienced enough with the Linux Kernel, or with the ARM
> architecture to wrap this up on my own, so I need some pointers to what
> to try next.
> 
> To come up a working setup, I am passing any AES requests whose length
> is less than or equal to one AES block to the fallback skcipher.  This
> hack is not a part of this series, but I can send it if there's any
> interest in it.
> 
> Anyway, the patches in this series are complete enough on their own.
> With the exception of the last patch, they're all bugfixes.
> 
> Cheers,
> 
> Eneas
> 
> Eneas U de Queiroz (6):
>   crypto: qce - fix ctr-aes-qce block, chunk sizes
>   crypto: qce - fix xts-aes-qce key sizes
>   crypto: qce - save a sg table slot for result buf
>   crypto: qce - update the skcipher IV
>   crypto: qce - initialize fallback only for AES
>   crypto: qce - allow building only hashes/ciphers
> 
>  drivers/crypto/Kconfig        |  63 ++++++++-
>  drivers/crypto/qce/Makefile   |   7 +-
>  drivers/crypto/qce/common.c   | 244 ++++++++++++++++++----------------
>  drivers/crypto/qce/core.c     |   4 +
>  drivers/crypto/qce/dma.c      |   6 +-
>  drivers/crypto/qce/dma.h      |   3 +-
>  drivers/crypto/qce/skcipher.c |  41 ++++--
>  7 files changed, 229 insertions(+), 139 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Fwd: QCE hw-crypto DMA issues
       [not found]   ` <CAPxccB2LGANG8DcmF4nwUDOzDzf2RHX4S-4w9z6TcO9csu4xSw@mail.gmail.com>
@ 2020-01-02 21:14     ` Eneas Queiroz
  2020-01-03 14:33     ` Ard Biesheuvel
  1 sibling, 0 replies; 12+ messages in thread
From: Eneas Queiroz @ 2020-01-02 21:14 UTC (permalink / raw)
  To: open list:HARDWARE RANDOM NUMBER GENERATOR CORE

I'm changing the subject title, as the original series has been merged.

On Mon, Dec 23, 2019 at 6:46 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On Fri, 20 Dec 2019 at 20:02, Eneas U de Queiroz <cotequeiroz@gmail.com> wrote:
> >
> > I've been trying to make the Qualcomm Crypto Engine work with GCM-mode
> > AES.  I fixed some bugs, and added an option to build only hashes or
> > skciphers, as the VPN performance increases if you leave some of that to
> > the CPU.
> >
> > A discussion about this can be found here:
> > https://github.com/openwrt/openwrt/pull/2518
> >
> > I'm using openwrt to test this, and there's no support for kernel 5.x
> > yet.  So I have backported the recent skcipher updates, and tested this
> > with 4.19. I don't have the hardware with me, but I have run-tested
> > everything, working remotely.
> >
> > All of the skciphers directly implemented by the driver work.  They pass
> > the tcrypt tests, and also some tests from userspace using AF_ALG:
> > https://github.com/cotequeiroz/afalg_tests
> >
> > However, I can't get gcm(aes) to work.  When setting the gcm-mode key,
> > it sets the ctr(aes) key, then encrypt a block of zeroes, and uses that
> > as the ghash key.  The driver fails to perform that encryption.  I've
> > dumped the input and output data, and they apparently are not touched by
> > the QCE.  The IV, which written to a buffer appended to the results sg
> > list gets updated, but the results themselves are not.  I'm not sure
> > what goes wrong, if it is a DMA/cache problem, memory alignment, or
> > whatever.
> >
>
> This does sound like a DMA problem. I assume the accelerator is not
> cache coherent?
>
> In any case, it is dubious whether the round trip to the accelerator
> is worth it when encrypting the GHASH key. Just call aes_encrypt()
> instead, and do it in software.

ipsec still fails, even if I use software for every single-block
operation. I can perhaps leave that as an optimization, but it won't
fix the main issue.

> > If I take 'be128 hash' out of the 'data' struct, and kzalloc them
> > separately in crypto_gcm_setkey (crypto/gcm.c), it encrypts the data
> > just fine--perhaps the payload and the request struct can't be in the
> > same page?
> >
>
> Non-cache coherent DMA involves cache invalidation on inbound data. So
> if both the device and the CPU write to the same cacheline while the
> buffer is mapped for DMA from device to memory, one of the updates
> gets lost.

Can you give me any pointers/examples of how I can make this work?

Thanks,

Eneas

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: QCE hw-crypto DMA issues
       [not found]   ` <CAPxccB2LGANG8DcmF4nwUDOzDzf2RHX4S-4w9z6TcO9csu4xSw@mail.gmail.com>
  2020-01-02 21:14     ` Fwd: QCE hw-crypto DMA issues Eneas Queiroz
@ 2020-01-03 14:33     ` Ard Biesheuvel
  2020-01-03 14:37       ` Eneas Queiroz
  1 sibling, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2020-01-03 14:33 UTC (permalink / raw)
  To: Eneas Queiroz; +Cc: open list:HARDWARE RANDOM NUMBER GENERATOR CORE

On Thu, 2 Jan 2020 at 22:09, Eneas Queiroz <cotequeiroz@gmail.com> wrote:
>
> I'm changing the subject title, as the original series has been merged.
>
> On Mon, Dec 23, 2019 at 6:46 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>
>> On Fri, 20 Dec 2019 at 20:02, Eneas U de Queiroz <cotequeiroz@gmail.com> wrote:
>> >
>> > I've been trying to make the Qualcomm Crypto Engine work with GCM-mode
>> > AES.  I fixed some bugs, and added an option to build only hashes or
>> > skciphers, as the VPN performance increases if you leave some of that to
>> > the CPU.
>> >
>> > A discussion about this can be found here:
>> > https://github.com/openwrt/openwrt/pull/2518
>> >
>> > I'm using openwrt to test this, and there's no support for kernel 5.x
>> > yet.  So I have backported the recent skcipher updates, and tested this
>> > with 4.19. I don't have the hardware with me, but I have run-tested
>> > everything, working remotely.
>> >
>> > All of the skciphers directly implemented by the driver work.  They pass
>> > the tcrypt tests, and also some tests from userspace using AF_ALG:
>> > https://github.com/cotequeiroz/afalg_tests
>> >
>> > However, I can't get gcm(aes) to work.  When setting the gcm-mode key,
>> > it sets the ctr(aes) key, then encrypt a block of zeroes, and uses that
>> > as the ghash key.  The driver fails to perform that encryption.  I've
>> > dumped the input and output data, and they apparently are not touched by
>> > the QCE.  The IV, which written to a buffer appended to the results sg
>> > list gets updated, but the results themselves are not.  I'm not sure
>> > what goes wrong, if it is a DMA/cache problem, memory alignment, or
>> > whatever.
>> >
>>
>> This does sound like a DMA problem. I assume the accelerator is not
>> cache coherent?
>>
>> In any case, it is dubious whether the round trip to the accelerator
>> is worth it when encrypting the GHASH key. Just call aes_encrypt()
>> instead, and do it in software.
>
>
> ipsec still fails, even if I use software for every single-block operation. I can perhaps leave that as an optimization, but it won't fix the main issue.
>
>> > If I take 'be128 hash' out of the 'data' struct, and kzalloc them
>> > separately in crypto_gcm_setkey (crypto/gcm.c), it encrypts the data
>> > just fine--perhaps the payload and the request struct can't be in the
>> > same page?
>> >
>>
>> Non-cache coherent DMA involves cache invalidation on inbound data. So
>> if both the device and the CPU write to the same cacheline while the
>> buffer is mapped for DMA from device to memory, one of the updates
>> gets lost.
>
>
>  Can you give me any pointers/examples of how I can make this work?
>

You could have a look at commit ed527b13d800dd515a9e6c582f0a73eca65b2e1b

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: QCE hw-crypto DMA issues
  2020-01-03 14:33     ` Ard Biesheuvel
@ 2020-01-03 14:37       ` Eneas Queiroz
  0 siblings, 0 replies; 12+ messages in thread
From: Eneas Queiroz @ 2020-01-03 14:37 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: open list:HARDWARE RANDOM NUMBER GENERATOR CORE

On Fri, Jan 3, 2020 at 11:33 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On Thu, 2 Jan 2020 at 22:09, Eneas Queiroz <cotequeiroz@gmail.com> wrote:
> >>
> >> Non-cache coherent DMA involves cache invalidation on inbound data. So
> >> if both the device and the CPU write to the same cacheline while the
> >> buffer is mapped for DMA from device to memory, one of the updates
> >> gets lost.
> >
> >
> >  Can you give me any pointers/examples of how I can make this work?
> >
>
> You could have a look at commit ed527b13d800dd515a9e6c582f0a73eca65b2e1b

Thanks, I'll check it out.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-01-03 14:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-20 19:02 Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Eneas U de Queiroz
2019-12-20 19:02 ` [PATCH 1/6] crypto: qce - fix ctr-aes-qce block, chunk sizes Eneas U de Queiroz
2019-12-20 19:02 ` [PATCH 2/6] crypto: qce - fix xts-aes-qce key sizes Eneas U de Queiroz
2019-12-20 19:02 ` [PATCH 3/6] crypto: qce - save a sg table slot for result buf Eneas U de Queiroz
2019-12-20 19:02 ` [PATCH 4/6] crypto: qce - update the skcipher IV Eneas U de Queiroz
2019-12-20 19:02 ` [PATCH 5/6] crypto: qce - initialize fallback only for AES Eneas U de Queiroz
2019-12-20 19:02 ` [PATCH 6/6] crypto: qce - allow building only hashes/ciphers Eneas U de Queiroz
2019-12-23  9:46 ` Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Ard Biesheuvel
     [not found]   ` <CAPxccB2LGANG8DcmF4nwUDOzDzf2RHX4S-4w9z6TcO9csu4xSw@mail.gmail.com>
2020-01-02 21:14     ` Fwd: QCE hw-crypto DMA issues Eneas Queiroz
2020-01-03 14:33     ` Ard Biesheuvel
2020-01-03 14:37       ` Eneas Queiroz
2019-12-27 10:39 ` Subject: [PATCH 0/6] crypto: QCE hw-crypto fixes Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).