All of lore.kernel.org
* [PATCH v3 0/2] Improve DMA chaining for ahash requests
@ 2016-10-04 12:57 ` Romain Perier
  0 siblings, 0 replies; 10+ messages in thread
From: Romain Perier @ 2016-10-04 12:57 UTC (permalink / raw)
  To: Boris Brezillon, Arnaud Ebalard
  Cc: David S. Miller, Herbert Xu, Thomas Petazzoni, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Nadav Haklai, Ofer Heifetz, linux-crypto, linux-arm-kernel

This series contains performance improvements for ahash requests.
So far, ahash requests were systematically not chained at the DMA level.
However, in some cases, as when using IPsec, ahash requests can be
processed directly by the engine and don't have intermediate partial
update states.

This series first reworks the way outer IVs are copied from the SRAM
into the DMA pool. To do so, we introduce a common DMA pool for all types
of requests that contains the outer results (such as the IV or digest).
Then, for ahash requests that can be processed directly by the engine,
the outer results are copied from the SRAM into the common DMA pool.
These requests can then be chained at the DMA level.


Benchmarking results with iperf through IPsec
=============================================
		ESP			AH

Before		343 Mbits/s		492 Mbits/s
After		422 Mbits/s		577 Mbits/s
Improvement	+23%			+17%

Romain Perier (2):
  crypto: marvell - Use an unique pool to copy results of requests
  crypto: marvell - Don't break chain for computable last ahash requests

 drivers/crypto/marvell/cesa.c   |  4 ---
 drivers/crypto/marvell/cesa.h   |  5 ++-
 drivers/crypto/marvell/cipher.c |  8 +++--
 drivers/crypto/marvell/hash.c   | 79 +++++++++++++++++++++++++++++++++--------
 drivers/crypto/marvell/tdma.c   | 28 +++++++--------
 5 files changed, 85 insertions(+), 39 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/2] crypto: marvell - Use an unique pool to copy results of requests
  2016-10-04 12:57 ` Romain Perier
@ 2016-10-04 12:57   ` Romain Perier
  -1 siblings, 0 replies; 10+ messages in thread
From: Romain Perier @ 2016-10-04 12:57 UTC (permalink / raw)
  To: Boris Brezillon, Arnaud Ebalard
  Cc: David S. Miller, Herbert Xu, Thomas Petazzoni, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Nadav Haklai, Ofer Heifetz, linux-crypto, linux-arm-kernel

So far, we used a dedicated DMA pool to copy the outer IV result for
cipher requests. Instead of using one DMA pool per type of outer data, we
prefer to reuse the op DMA pool, which already contains all parts of the
request copied from the SRAM. The outer data that is likely to be used by
the 'complete' operation is then copied later. This way, any type of
result can be retrieved by DMA for cipher or ahash requests.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---

Changes in v3:
  - Don't allocate a new op ctx for the last tdma descriptor. Instead
    we point to the last op ctx in the tdma chain, and copy the context
    of the current request to this location.

Changes in v2:
  - Use the dma pool "op" to retrieve outer data instead of introducing
    a new one.

 drivers/crypto/marvell/cesa.c   |  4 ----
 drivers/crypto/marvell/cesa.h   |  5 ++---
 drivers/crypto/marvell/cipher.c |  8 +++++---
 drivers/crypto/marvell/tdma.c   | 28 ++++++++++++++--------------
 4 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.c b/drivers/crypto/marvell/cesa.c
index 37dadb2..6e7a5c7 100644
--- a/drivers/crypto/marvell/cesa.c
+++ b/drivers/crypto/marvell/cesa.c
@@ -375,10 +375,6 @@ static int mv_cesa_dev_dma_init(struct mv_cesa_dev *cesa)
 	if (!dma->padding_pool)
 		return -ENOMEM;
 
-	dma->iv_pool = dmam_pool_create("cesa_iv", dev, 16, 1, 0);
-	if (!dma->iv_pool)
-		return -ENOMEM;
-
 	cesa->dma = dma;
 
 	return 0;
diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index e423d33..a768da7 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -277,7 +277,7 @@ struct mv_cesa_op_ctx {
 #define CESA_TDMA_DUMMY				0
 #define CESA_TDMA_DATA				1
 #define CESA_TDMA_OP				2
-#define CESA_TDMA_IV				3
+#define CESA_TDMA_RESULT			3
 
 /**
  * struct mv_cesa_tdma_desc - TDMA descriptor
@@ -393,7 +393,6 @@ struct mv_cesa_dev_dma {
 	struct dma_pool *op_pool;
 	struct dma_pool *cache_pool;
 	struct dma_pool *padding_pool;
-	struct dma_pool *iv_pool;
 };
 
 /**
@@ -839,7 +838,7 @@ mv_cesa_tdma_desc_iter_init(struct mv_cesa_tdma_chain *chain)
 	memset(chain, 0, sizeof(*chain));
 }
 
-int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
+int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
 			  u32 size, u32 flags, gfp_t gfp_flags);
 
 struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain,
diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c
index d19dc96..098871a 100644
--- a/drivers/crypto/marvell/cipher.c
+++ b/drivers/crypto/marvell/cipher.c
@@ -212,7 +212,8 @@ mv_cesa_ablkcipher_complete(struct crypto_async_request *req)
 		struct mv_cesa_req *basereq;
 
 		basereq = &creq->base;
-		memcpy(ablkreq->info, basereq->chain.last->data, ivsize);
+		memcpy(ablkreq->info, basereq->chain.last->op->ctx.blkcipher.iv,
+		       ivsize);
 	} else {
 		memcpy_fromio(ablkreq->info,
 			      engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET,
@@ -373,8 +374,9 @@ static int mv_cesa_ablkcipher_dma_req_init(struct ablkcipher_request *req,
 
 	/* Add output data for IV */
 	ivsize = crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(req));
-	ret = mv_cesa_dma_add_iv_op(&basereq->chain, CESA_SA_CRYPT_IV_SRAM_OFFSET,
-				    ivsize, CESA_TDMA_SRC_IN_SRAM, flags);
+	ret = mv_cesa_dma_add_result_op(&basereq->chain, CESA_SA_CFG_SRAM_OFFSET,
+				    CESA_SA_DATA_SRAM_OFFSET,
+				    CESA_TDMA_SRC_IN_SRAM, flags);
 
 	if (ret)
 		goto err_free_tdma;
diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index 9fd7a5f..991dc3f 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -69,9 +69,6 @@ void mv_cesa_dma_cleanup(struct mv_cesa_req *dreq)
 		if (type == CESA_TDMA_OP)
 			dma_pool_free(cesa_dev->dma->op_pool, tdma->op,
 				      le32_to_cpu(tdma->src));
-		else if (type == CESA_TDMA_IV)
-			dma_pool_free(cesa_dev->dma->iv_pool, tdma->data,
-				      le32_to_cpu(tdma->dst));
 
 		tdma = tdma->next;
 		dma_pool_free(cesa_dev->dma->tdma_desc_pool, old_tdma,
@@ -209,29 +206,32 @@ mv_cesa_dma_add_desc(struct mv_cesa_tdma_chain *chain, gfp_t flags)
 	return new_tdma;
 }
 
-int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
+int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
 			  u32 size, u32 flags, gfp_t gfp_flags)
 {
-
-	struct mv_cesa_tdma_desc *tdma;
-	u8 *iv;
-	dma_addr_t dma_handle;
+	struct mv_cesa_tdma_desc *tdma, *op_desc;
 
 	tdma = mv_cesa_dma_add_desc(chain, gfp_flags);
 	if (IS_ERR(tdma))
 		return PTR_ERR(tdma);
 
-	iv = dma_pool_alloc(cesa_dev->dma->iv_pool, gfp_flags, &dma_handle);
-	if (!iv)
-		return -ENOMEM;
+	for (op_desc = chain->first; op_desc; op_desc = op_desc->next) {
+		u32 type = op_desc->flags & CESA_TDMA_TYPE_MSK;
+
+		if (type == CESA_TDMA_OP)
+			break;
+	}
+
+	if (!op_desc)
+		return -EIO;
 
 	tdma->byte_cnt = cpu_to_le32(size | BIT(31));
 	tdma->src = src;
-	tdma->dst = cpu_to_le32(dma_handle);
-	tdma->data = iv;
+	tdma->dst = op_desc->src;
+	tdma->op = op_desc->op;
 
 	flags &= (CESA_TDMA_DST_IN_SRAM | CESA_TDMA_SRC_IN_SRAM);
-	tdma->flags = flags | CESA_TDMA_IV;
+	tdma->flags = flags | CESA_TDMA_RESULT;
 	return 0;
 }
 
-- 
2.9.3


* [PATCH v3 2/2] crypto: marvell - Don't break chain for computable last ahash requests
  2016-10-04 12:57 ` Romain Perier
@ 2016-10-04 12:57   ` Romain Perier
  -1 siblings, 0 replies; 10+ messages in thread
From: Romain Perier @ 2016-10-04 12:57 UTC (permalink / raw)
  To: Boris Brezillon, Arnaud Ebalard
  Cc: David S. Miller, Herbert Xu, Thomas Petazzoni, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Nadav Haklai, Ofer Heifetz, linux-crypto, linux-arm-kernel

Currently, the driver breaks the chain for all kinds of hash requests, in
order not to override the intermediate states of partial ahash updates.
However, some final ahash requests can be processed directly by the
engine, and so don't need an intermediate state. This is typically the
case for most of the HMAC requests processed via IPsec.

This commit adds a TDMA descriptor that copies the context of these
requests into the "op" dma pool, which then allows chaining these
requests at the DMA level. The 'complete' operation is also updated to
retrieve the MAC digest from the right location.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---

Changes in v3:
 - Copy the whole context back to RAM and not just the digest. Also
   fixed a rebase issue ^^ (whoops)

Changes in v2:
 - Replaced BUG_ON by an error
 - Add a variable "break_chain", with "type", to break the chain
   with ahash requests. It improves code readability.

 drivers/crypto/marvell/hash.c | 79 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 64 insertions(+), 15 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 9f28468..b36f196 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -312,24 +312,53 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req)
 	int i;
 
 	digsize = crypto_ahash_digestsize(crypto_ahash_reqtfm(ahashreq));
-	for (i = 0; i < digsize / 4; i++)
-		creq->state[i] = readl_relaxed(engine->regs + CESA_IVDIG(i));
 
-	if (creq->last_req) {
+	if (mv_cesa_req_get_type(&creq->base) == CESA_DMA_REQ &&
+	    !(creq->base.chain.last->flags & CESA_TDMA_BREAK_CHAIN)) {
+		struct mv_cesa_tdma_desc *tdma = NULL;
+		__le32 *data = NULL;
+
+		for (tdma = creq->base.chain.first; tdma; tdma = tdma->next) {
+			u32 type = tdma->flags & CESA_TDMA_TYPE_MSK;
+			if (type ==  CESA_TDMA_RESULT)
+				break;
+		}
+
+		if (!tdma) {
+			dev_err(cesa_dev->dev, "Failed to retrieve tdma "
+					       "descriptor for outer data\n");
+			return;
+		}
+
 		/*
-		 * Hardware's MD5 digest is in little endian format, but
-		 * SHA in big endian format
+		 * Result is already in the correct endianness when the SA is
+		 * used
 		 */
-		if (creq->algo_le) {
-			__le32 *result = (void *)ahashreq->result;
+		data = tdma->op->ctx.hash.hash;
+		for (i = 0; i < digsize / 4; i++)
+			creq->state[i] = cpu_to_le32(data[i]);
 
-			for (i = 0; i < digsize / 4; i++)
-				result[i] = cpu_to_le32(creq->state[i]);
-		} else {
-			__be32 *result = (void *)ahashreq->result;
+		memcpy(ahashreq->result, data, digsize);
+	} else {
+		for (i = 0; i < digsize / 4; i++)
+			creq->state[i] = readl_relaxed(engine->regs +
+						       CESA_IVDIG(i));
+		if (creq->last_req) {
+			/*
+			 * Hardware's MD5 digest is in little endian format, but
+			 * SHA in big endian format
+			 */
+			if (creq->algo_le) {
+				__le32 *result = (void *)ahashreq->result;
+
+				for (i = 0; i < digsize / 4; i++)
+					result[i] = cpu_to_le32(creq->state[i]);
+			} else {
+				__be32 *result = (void *)ahashreq->result;
 
-			for (i = 0; i < digsize / 4; i++)
-				result[i] = cpu_to_be32(creq->state[i]);
+				for (i = 0; i < digsize / 4; i++)
+					result[i] = cpu_to_be32(creq->state[i]);
+			}
 		}
 	}
 
@@ -504,6 +533,12 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
 						CESA_SA_DESC_CFG_LAST_FRAG,
 				      CESA_SA_DESC_CFG_FRAG_MSK);
 
+		ret = mv_cesa_dma_add_result_op(chain,
+						CESA_SA_CFG_SRAM_OFFSET,
+						CESA_SA_DATA_SRAM_OFFSET,
+						CESA_TDMA_SRC_IN_SRAM, flags);
+		if (ret)
+			return ERR_PTR(-ENOMEM);
 		return op;
 	}
 
@@ -564,6 +599,8 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
 	struct mv_cesa_op_ctx *op = NULL;
 	unsigned int frag_len;
 	int ret;
+	u32 type;
+	bool break_chain = true;
 
 	basereq->chain.first = NULL;
 	basereq->chain.last = NULL;
@@ -635,6 +672,16 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
 		goto err_free_tdma;
 	}
 
+	/*
+	 * If results are copied via DMA, this means that this
+	 * request can be directly processed by the engine,
+	 * without partial updates. So we can chain it at the
+	 * DMA level with other requests.
+	 */
+	type = basereq->chain.last->flags & CESA_TDMA_TYPE_MSK;
+	if (type == CESA_TDMA_RESULT)
+		break_chain = false;
+
 	if (op) {
 		/* Add dummy desc to wait for crypto operation end */
 		ret = mv_cesa_dma_add_dummy_end(&basereq->chain, flags);
@@ -648,8 +695,10 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
 	else
 		creq->cache_ptr = 0;
 
-	basereq->chain.last->flags |= (CESA_TDMA_END_OF_REQ |
-				       CESA_TDMA_BREAK_CHAIN);
+	basereq->chain.last->flags |= CESA_TDMA_END_OF_REQ;
+
+	if (break_chain)
+		basereq->chain.last->flags |= CESA_TDMA_BREAK_CHAIN;
 
 	return 0;
 
-- 
2.9.3


* Re: [PATCH v3 1/2] crypto: marvell - Use an unique pool to copy results of requests
  2016-10-04 12:57   ` Romain Perier
@ 2016-10-04 13:17     ` Boris Brezillon
  -1 siblings, 0 replies; 10+ messages in thread
From: Boris Brezillon @ 2016-10-04 13:17 UTC (permalink / raw)
  To: Romain Perier
  Cc: Arnaud Ebalard, David S. Miller, Herbert Xu, Thomas Petazzoni,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth,
	Gregory Clement, Nadav Haklai, Ofer Heifetz, linux-crypto,
	linux-arm-kernel

On Tue,  4 Oct 2016 14:57:19 +0200
Romain Perier <romain.perier@free-electrons.com> wrote:

> So far, we used a dedicated dma pool to copy the result of outer IV for
> cipher requests. Instead of using a dma pool per outer data, we prefer
> use the op dma pool that contains all part of the request from the SRAM.
> Then, the outer data that is likely to be used by the 'complete'
> operation, is copied later. In this way, any type of result can be
> retrieved by DMA for cipher or ahash requests.
> 
> Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
> ---
> 
> Changes in v3:
>   - Don't allocate a new op ctx for the last tdma descriptor. Instead
>     we point to the last op ctx in the tdma chain, and copy the context
>     of the current request to this location.
> 
> Changes in v2:
>   - Use the dma pool "op" to retrieve outer data intead of introducing
>     a new one.
> 
>  drivers/crypto/marvell/cesa.c   |  4 ----
>  drivers/crypto/marvell/cesa.h   |  5 ++---
>  drivers/crypto/marvell/cipher.c |  8 +++++---
>  drivers/crypto/marvell/tdma.c   | 28 ++++++++++++++--------------
>  4 files changed, 21 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/crypto/marvell/cesa.c b/drivers/crypto/marvell/cesa.c
> index 37dadb2..6e7a5c7 100644
> --- a/drivers/crypto/marvell/cesa.c
> +++ b/drivers/crypto/marvell/cesa.c
> @@ -375,10 +375,6 @@ static int mv_cesa_dev_dma_init(struct mv_cesa_dev *cesa)
>  	if (!dma->padding_pool)
>  		return -ENOMEM;
>  
> -	dma->iv_pool = dmam_pool_create("cesa_iv", dev, 16, 1, 0);
> -	if (!dma->iv_pool)
> -		return -ENOMEM;
> -
>  	cesa->dma = dma;
>  
>  	return 0;
> diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
> index e423d33..a768da7 100644
> --- a/drivers/crypto/marvell/cesa.h
> +++ b/drivers/crypto/marvell/cesa.h
> @@ -277,7 +277,7 @@ struct mv_cesa_op_ctx {
>  #define CESA_TDMA_DUMMY				0
>  #define CESA_TDMA_DATA				1
>  #define CESA_TDMA_OP				2
> -#define CESA_TDMA_IV				3
> +#define CESA_TDMA_RESULT			3
>  
>  /**
>   * struct mv_cesa_tdma_desc - TDMA descriptor
> @@ -393,7 +393,6 @@ struct mv_cesa_dev_dma {
>  	struct dma_pool *op_pool;
>  	struct dma_pool *cache_pool;
>  	struct dma_pool *padding_pool;
> -	struct dma_pool *iv_pool;
>  };
>  
>  /**
> @@ -839,7 +838,7 @@ mv_cesa_tdma_desc_iter_init(struct mv_cesa_tdma_chain *chain)
>  	memset(chain, 0, sizeof(*chain));
>  }
>  
> -int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
> +int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
>  			  u32 size, u32 flags, gfp_t gfp_flags);
>  
>  struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain,
> diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c
> index d19dc96..098871a 100644
> --- a/drivers/crypto/marvell/cipher.c
> +++ b/drivers/crypto/marvell/cipher.c
> @@ -212,7 +212,8 @@ mv_cesa_ablkcipher_complete(struct crypto_async_request *req)
>  		struct mv_cesa_req *basereq;
>  
>  		basereq = &creq->base;
> -		memcpy(ablkreq->info, basereq->chain.last->data, ivsize);
> +		memcpy(ablkreq->info, basereq->chain.last->op->ctx.blkcipher.iv,
> +		       ivsize);
>  	} else {
>  		memcpy_fromio(ablkreq->info,
>  			      engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET,
> @@ -373,8 +374,9 @@ static int mv_cesa_ablkcipher_dma_req_init(struct ablkcipher_request *req,
>  
>  	/* Add output data for IV */
>  	ivsize = crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(req));
> -	ret = mv_cesa_dma_add_iv_op(&basereq->chain, CESA_SA_CRYPT_IV_SRAM_OFFSET,
> -				    ivsize, CESA_TDMA_SRC_IN_SRAM, flags);
> +	ret = mv_cesa_dma_add_result_op(&basereq->chain, CESA_SA_CFG_SRAM_OFFSET,
> +				    CESA_SA_DATA_SRAM_OFFSET,
> +				    CESA_TDMA_SRC_IN_SRAM, flags);
>  
>  	if (ret)
>  		goto err_free_tdma;
> diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
> index 9fd7a5f..991dc3f 100644
> --- a/drivers/crypto/marvell/tdma.c
> +++ b/drivers/crypto/marvell/tdma.c
> @@ -69,9 +69,6 @@ void mv_cesa_dma_cleanup(struct mv_cesa_req *dreq)
>  		if (type == CESA_TDMA_OP)
>  			dma_pool_free(cesa_dev->dma->op_pool, tdma->op,
>  				      le32_to_cpu(tdma->src));
> -		else if (type == CESA_TDMA_IV)
> -			dma_pool_free(cesa_dev->dma->iv_pool, tdma->data,
> -				      le32_to_cpu(tdma->dst));
>  
>  		tdma = tdma->next;
>  		dma_pool_free(cesa_dev->dma->tdma_desc_pool, old_tdma,
> @@ -209,29 +206,32 @@ mv_cesa_dma_add_desc(struct mv_cesa_tdma_chain *chain, gfp_t flags)
>  	return new_tdma;
>  }
>  
> -int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
> +int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
>  			  u32 size, u32 flags, gfp_t gfp_flags)
>  {
> -
> -	struct mv_cesa_tdma_desc *tdma;
> -	u8 *iv;
> -	dma_addr_t dma_handle;
> +	struct mv_cesa_tdma_desc *tdma, *op_desc;
>  
>  	tdma = mv_cesa_dma_add_desc(chain, gfp_flags);
>  	if (IS_ERR(tdma))
>  		return PTR_ERR(tdma);
>  
> -	iv = dma_pool_alloc(cesa_dev->dma->iv_pool, gfp_flags, &dma_handle);
> -	if (!iv)
> -		return -ENOMEM;

Can you add a comment explaining what you're doing here?

	/* We re-use an existing op_desc object to retrieve the context
	 * and result instead of allocating a new one.
	 * There is at least one object of this type in a CESA crypto
	 * req, just pick the first one in the chain.
	 */

Once this is addressed, you can add my

Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>

> +	for (op_desc = chain->first; op_desc; op_desc = op_desc->next) {
> +		u32 type = op_desc->flags & CESA_TDMA_TYPE_MSK;
> +
> +		if (type == CESA_TDMA_OP)
> +			break;
> +	}
> +
> +	if (!op_desc)
> +		return -EIO;
>  
>  	tdma->byte_cnt = cpu_to_le32(size | BIT(31));
>  	tdma->src = src;
> -	tdma->dst = cpu_to_le32(dma_handle);
> -	tdma->data = iv;
> +	tdma->dst = op_desc->src;
> +	tdma->op = op_desc->op;
>  
>  	flags &= (CESA_TDMA_DST_IN_SRAM | CESA_TDMA_SRC_IN_SRAM);
> -	tdma->flags = flags | CESA_TDMA_IV;
> +	tdma->flags = flags | CESA_TDMA_RESULT;
>  	return 0;
>  }
>  
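As a side note, the op-descriptor lookup this hunk introduces is a plain first-match walk over a singly linked list. The sketch below models it in userspace C with a simplified stand-in for struct mv_cesa_tdma_desc (the real layout and most flag values live in cesa.h and differ from this); only CESA_TDMA_OP == 2 matches the driver, everything else is illustrative:

```c
#include <stddef.h>
#include <stdint.h>

#define CESA_TDMA_TYPE_MSK 0x3
#define CESA_TDMA_OP       2

/* Simplified stand-in for struct mv_cesa_tdma_desc. */
struct tdma_desc {
	uint32_t flags;            /* low bits encode the descriptor type */
	uint32_t src;              /* DMA address of the op context */
	struct tdma_desc *next;
};

/* Return the first descriptor of type CESA_TDMA_OP, mirroring the loop
 * added to mv_cesa_dma_add_result_op(). Returns NULL when the chain has
 * no op descriptor, which corresponds to the -EIO path in the patch. */
static struct tdma_desc *find_op_desc(struct tdma_desc *first)
{
	struct tdma_desc *d;

	for (d = first; d; d = d->next) {
		if ((d->flags & CESA_TDMA_TYPE_MSK) == CESA_TDMA_OP)
			return d;
	}
	return NULL;
}
```

Once found, the RESULT descriptor simply borrows that descriptor's DMA address (tdma->dst = op_desc->src), which is why no separate pool allocation is needed anymore.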


* Re: [PATCH v3 2/2] crypto: marvell - Don't break chain for computable last ahash requests
  2016-10-04 12:57   ` Romain Perier
@ 2016-10-04 14:14     ` Boris Brezillon
  -1 siblings, 0 replies; 10+ messages in thread
From: Boris Brezillon @ 2016-10-04 14:14 UTC (permalink / raw)
  To: Romain Perier
  Cc: Arnaud Ebalard, David S. Miller, Herbert Xu, Thomas Petazzoni,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth,
	Gregory Clement, Nadav Haklai, Ofer Heifetz, linux-crypto,
	linux-arm-kernel

On Tue,  4 Oct 2016 14:57:20 +0200
Romain Perier <romain.perier@free-electrons.com> wrote:

> Currently, the driver breaks the chain for all kinds of hash requests so that
> the intermediate states of partial ahash updates are not overridden. However,
> some final ahash requests can be processed directly by the engine, without any
> intermediate state. This is typically the case for most of the HMAC requests
> processed via IPsec.
> 
> This commit adds a TDMA descriptor to copy the context of such requests
> into the "op" dma pool, which then allows these requests to be chained at
> the DMA level. The 'complete' operation is also updated to retrieve the
> MAC digest from the right location.
> 
> Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
> ---
> 
> Changes in v3:
>  - Copy the whole context back to RAM and not just the digest. Also
>    fixed a rebase issue ^^ (whoops)
> 
> Changes in v2:
>  - Replaced the BUG_ON with an error code
>  - Added a variable "break_chain", along with "type", to break the chain
>    with ahash requests. It improves code readability.
> 
>  drivers/crypto/marvell/hash.c | 79 +++++++++++++++++++++++++++++++++++--------
>  1 file changed, 64 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> index 9f28468..b36f196 100644
> --- a/drivers/crypto/marvell/hash.c
> +++ b/drivers/crypto/marvell/hash.c
> @@ -312,24 +312,53 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req)
>  	int i;
>  
>  	digsize = crypto_ahash_digestsize(crypto_ahash_reqtfm(ahashreq));
> -	for (i = 0; i < digsize / 4; i++)
> -		creq->state[i] = readl_relaxed(engine->regs + CESA_IVDIG(i));
>  
> -	if (creq->last_req) {
> +	if (mv_cesa_req_get_type(&creq->base) == CESA_DMA_REQ &&
> +	    !(creq->base.chain.last->flags & CESA_TDMA_BREAK_CHAIN)) {
> +		struct mv_cesa_tdma_desc *tdma = NULL;
> +		__le32 *data = NULL;
> +
> +		for (tdma = creq->base.chain.first; tdma; tdma = tdma->next) {
> +			u32 type = tdma->flags & CESA_TDMA_TYPE_MSK;
> +			if (type ==  CESA_TDMA_RESULT)
> +				break;
> +		}

You should be able to drop the DUMMY desc at the end of the chain and
replace it with the RESULT desc. This way, you won't have to iterate over
the chain to find the TDMA_RESULT element: it should always be the last
desc in the chain.

> +
> +		if (!tdma) {
> +			dev_err(cesa_dev->dev, "Failed to retrieve tdma "
> +					       "descriptor for outer data\n");
> +			return;
> +		}
> +
>  		/*
> -		 * Hardware's MD5 digest is in little endian format, but
> -		 * SHA in big endian format
> +		 * Result is already in the correct endianness when the SA is
> +		 * used
>  		 */
> -		if (creq->algo_le) {
> -			__le32 *result = (void *)ahashreq->result;
> +		data = tdma->op->ctx.hash.hash;
> +		for (i = 0; i < digsize / 4; i++)
> +			creq->state[i] = cpu_to_le32(data[i]);
>  
> -			for (i = 0; i < digsize / 4; i++)
> -				result[i] = cpu_to_le32(creq->state[i]);
> -		} else {
> -			__be32 *result = (void *)ahashreq->result;
> +		memcpy(ahashreq->result, data, digsize);
> +	} else {
> +		for (i = 0; i < digsize / 4; i++)
> +			creq->state[i] = readl_relaxed(engine->regs +
> +						       CESA_IVDIG(i));
> +		if (creq->last_req) {
> +			/*
> +			* Hardware's MD5 digest is in little endian format, but
> +			* SHA in big endian format
> +			*/
> +			if (creq->algo_le) {
> +				__le32 *result = (void *)ahashreq->result;
> +
> +				for (i = 0; i < digsize / 4; i++)
> +					result[i] = cpu_to_le32(creq->state[i]);
> +			} else {
> +				__be32 *result = (void *)ahashreq->result;
>  
> -			for (i = 0; i < digsize / 4; i++)
> -				result[i] = cpu_to_be32(creq->state[i]);
> +				for (i = 0; i < digsize / 4; i++)
> +					result[i] = cpu_to_be32(creq->state[i]);
> +			}
>  		}
>  	}
>  
> @@ -504,6 +533,12 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
>  						CESA_SA_DESC_CFG_LAST_FRAG,
>  				      CESA_SA_DESC_CFG_FRAG_MSK);
>  
> +		ret = mv_cesa_dma_add_result_op(chain,
> +						CESA_SA_CFG_SRAM_OFFSET,
> +						CESA_SA_DATA_SRAM_OFFSET,
> +						CESA_TDMA_SRC_IN_SRAM, flags);
> +		if (ret)
> +			return ERR_PTR(-ENOMEM);
>  		return op;
>  	}
>  
> @@ -564,6 +599,8 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
>  	struct mv_cesa_op_ctx *op = NULL;
>  	unsigned int frag_len;
>  	int ret;
> +	u32 type;
> +	bool break_chain = true;
>  
>  	basereq->chain.first = NULL;
>  	basereq->chain.last = NULL;
> @@ -635,6 +672,16 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
>  		goto err_free_tdma;
>  	}
>  
> +	/*
> +	 * If results are copied via DMA, this means that this
> +	 * request can be directly processed by the engine,
> +	 * without partial updates. So we can chain it at the
> +	 * DMA level with other requests.
> +	 */
> +	type = basereq->chain.last->flags & CESA_TDMA_TYPE_MSK;
> +	if (type == CESA_TDMA_RESULT)
> +		break_chain = false;
> +
>  	if (op) {
>  		/* Add dummy desc to wait for crypto operation end */
>  		ret = mv_cesa_dma_add_dummy_end(&basereq->chain, flags);
> @@ -648,8 +695,10 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
>  	else
>  		creq->cache_ptr = 0;
>  
> -	basereq->chain.last->flags |= (CESA_TDMA_END_OF_REQ |
> -				       CESA_TDMA_BREAK_CHAIN);
> +	basereq->chain.last->flags |= CESA_TDMA_END_OF_REQ;
> +
> +	if (break_chain)
> +		basereq->chain.last->flags |= CESA_TDMA_BREAK_CHAIN;

I'm not sure this break_chain variable is really needed. You can directly
test the type of the last element in the TDMA chain here, and if it's
!= CESA_TDMA_RESULT, set the CESA_TDMA_BREAK_CHAIN flag.
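The simplification suggested here boils down to deriving the flags from the type of the last descriptor alone. A minimal sketch, with illustrative bit positions (not the driver's real ones) for the two flags:

```c
#include <stdint.h>

#define CESA_TDMA_TYPE_MSK     0x3
#define CESA_TDMA_RESULT       3
#define CESA_TDMA_END_OF_REQ   (1u << 8)   /* illustrative bit position */
#define CESA_TDMA_BREAK_CHAIN  (1u << 9)   /* illustrative bit position */

/* Compute the final flags for the last descriptor in the chain: always
 * mark the end of the request, and break the chain only when the last
 * descriptor is not a RESULT descriptor, i.e. when the request still
 * needs partial-update handling and must not be chained. */
static uint32_t finish_chain_flags(uint32_t last_flags)
{
	last_flags |= CESA_TDMA_END_OF_REQ;
	if ((last_flags & CESA_TDMA_TYPE_MSK) != CESA_TDMA_RESULT)
		last_flags |= CESA_TDMA_BREAK_CHAIN;
	return last_flags;
}
```

This removes both the break_chain variable and the intermediate type check from mv_cesa_ahash_dma_req_init(), at the cost of relying on the RESULT descriptor always being last in the chain.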

>  
>  	return 0;
>  


end of thread, other threads:[~2016-10-04 14:14 UTC | newest]

Thread overview: 10+ messages
2016-10-04 12:57 [PATCH v3 0/2] Improve DMA chaining for ahash requests Romain Perier
2016-10-04 12:57 ` [PATCH v3 1/2] crypto: marvell - Use an unique pool to copy results of requests Romain Perier
2016-10-04 13:17   ` Boris Brezillon
2016-10-04 12:57 ` [PATCH v3 2/2] crypto: marvell - Don't break chain for computable last ahash requests Romain Perier
2016-10-04 14:14   ` Boris Brezillon
