linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Improve DMA chaining for ahash requests
@ 2016-10-03 15:17 Romain Perier
  2016-10-03 15:17 ` [PATCH v2 1/2] crypto: marvell - Use an unique pool to copy results of requests Romain Perier
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Romain Perier @ 2016-10-03 15:17 UTC (permalink / raw)
  To: Boris Brezillon, Arnaud Ebalard
  Cc: David S. Miller, Herbert Xu, Thomas Petazzoni, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	linux-crypto, linux-arm-kernel

This series contain performance improvement regarding ahash requests.
So far, ahash requests were systematically not chained at the DMA level.
However, in some case, like this is the case by using IPSec, some ahash
requests can be processed directly by the engine, and don't have
intermediaire partial update states.

This series firstly re-work the way outer IVs are copied from the SRAM
into the dma pool. To do so, we introduce a common dma pool for all type
of requests that contains outer results (like IV or digest). Then, for
ahash requests that can be processed directly by the engine, outer
results are copied from the SRAM into the common dma pool. These requests
are then allowed to be chained at the DMA level.


Benchmarking results with iperf throught IPSec
==============================================
		ESP			AH

Before		343 Mbits/s		492 Mbits/s
After		392 Mbits/s		557 Mbits/s
Improvement	+14%			+13%

Romain Perier (2):
  crypto: marvell - Use an unique pool to copy results of requests
  crypto: marvell - Don't break chain for computable last ahash requests

 drivers/crypto/marvell/cesa.c   |  4 ---
 drivers/crypto/marvell/cesa.h   |  5 ++-
 drivers/crypto/marvell/cipher.c |  6 ++--
 drivers/crypto/marvell/hash.c   | 79 +++++++++++++++++++++++++++++++++--------
 drivers/crypto/marvell/tdma.c   | 17 +++++----
 5 files changed, 78 insertions(+), 33 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2 1/2] crypto: marvell - Use an unique pool to copy results of requests
  2016-10-03 15:17 [PATCH v2 0/2] Improve DMA chaining for ahash requests Romain Perier
@ 2016-10-03 15:17 ` Romain Perier
  2016-10-03 15:17 ` [PATCH v2 2/2] crypto: marvell - Don't break chain for computable last ahash requests Romain Perier
  2016-10-03 15:48 ` [PATCH v2 0/2] Improve DMA chaining for " Romain Perier
  2 siblings, 0 replies; 4+ messages in thread
From: Romain Perier @ 2016-10-03 15:17 UTC (permalink / raw)
  To: Boris Brezillon, Arnaud Ebalard
  Cc: David S. Miller, Herbert Xu, Thomas Petazzoni, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	linux-crypto, linux-arm-kernel

So far, we used a dedicated dma pool to copy the result of outer IV for
cipher requests. Instead of using a dma pool per outer data, we prefer
use the op dma pool that contains all part of the request from the SRAM.
Then, the outer data that is likely to be used by the 'complete'
operation, is copied later. In this way, any type of result can be
retrieved by DMA for cipher or ahash requests.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---

Changes in v2:
  - Use the dma pool "op" to retrieve outer data intead of introducing
    a new one.

 drivers/crypto/marvell/cesa.c   |  4 ----
 drivers/crypto/marvell/cesa.h   |  5 ++---
 drivers/crypto/marvell/cipher.c |  8 +++++---
 drivers/crypto/marvell/tdma.c   | 17 ++++++++---------
 4 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.c b/drivers/crypto/marvell/cesa.c
index 37dadb2..6e7a5c7 100644
--- a/drivers/crypto/marvell/cesa.c
+++ b/drivers/crypto/marvell/cesa.c
@@ -375,10 +375,6 @@ static int mv_cesa_dev_dma_init(struct mv_cesa_dev *cesa)
 	if (!dma->padding_pool)
 		return -ENOMEM;
 
-	dma->iv_pool = dmam_pool_create("cesa_iv", dev, 16, 1, 0);
-	if (!dma->iv_pool)
-		return -ENOMEM;
-
 	cesa->dma = dma;
 
 	return 0;
diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index e423d33..a768da7 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -277,7 +277,7 @@ struct mv_cesa_op_ctx {
 #define CESA_TDMA_DUMMY				0
 #define CESA_TDMA_DATA				1
 #define CESA_TDMA_OP				2
-#define CESA_TDMA_IV				3
+#define CESA_TDMA_RESULT			3
 
 /**
  * struct mv_cesa_tdma_desc - TDMA descriptor
@@ -393,7 +393,6 @@ struct mv_cesa_dev_dma {
 	struct dma_pool *op_pool;
 	struct dma_pool *cache_pool;
 	struct dma_pool *padding_pool;
-	struct dma_pool *iv_pool;
 };
 
 /**
@@ -839,7 +838,7 @@ mv_cesa_tdma_desc_iter_init(struct mv_cesa_tdma_chain *chain)
 	memset(chain, 0, sizeof(*chain));
 }
 
-int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
+int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
 			  u32 size, u32 flags, gfp_t gfp_flags);
 
 struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain,
diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c
index d19dc96..098871a 100644
--- a/drivers/crypto/marvell/cipher.c
+++ b/drivers/crypto/marvell/cipher.c
@@ -212,7 +212,8 @@ mv_cesa_ablkcipher_complete(struct crypto_async_request *req)
 		struct mv_cesa_req *basereq;
 
 		basereq = &creq->base;
-		memcpy(ablkreq->info, basereq->chain.last->data, ivsize);
+		memcpy(ablkreq->info, basereq->chain.last->op->ctx.blkcipher.iv,
+		       ivsize);
 	} else {
 		memcpy_fromio(ablkreq->info,
 			      engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET,
@@ -373,8 +374,9 @@ static int mv_cesa_ablkcipher_dma_req_init(struct ablkcipher_request *req,
 
 	/* Add output data for IV */
 	ivsize = crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(req));
-	ret = mv_cesa_dma_add_iv_op(&basereq->chain, CESA_SA_CRYPT_IV_SRAM_OFFSET,
-				    ivsize, CESA_TDMA_SRC_IN_SRAM, flags);
+	ret = mv_cesa_dma_add_result_op(&basereq->chain, CESA_SA_CFG_SRAM_OFFSET,
+				    CESA_SA_DATA_SRAM_OFFSET,
+				    CESA_TDMA_SRC_IN_SRAM, flags);
 
 	if (ret)
 		goto err_free_tdma;
diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index 9fd7a5f..2e15f19 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -69,8 +69,8 @@ void mv_cesa_dma_cleanup(struct mv_cesa_req *dreq)
 		if (type == CESA_TDMA_OP)
 			dma_pool_free(cesa_dev->dma->op_pool, tdma->op,
 				      le32_to_cpu(tdma->src));
-		else if (type == CESA_TDMA_IV)
-			dma_pool_free(cesa_dev->dma->iv_pool, tdma->data,
+		else if (type == CESA_TDMA_RESULT)
+			dma_pool_free(cesa_dev->dma->op_pool, tdma->op,
 				      le32_to_cpu(tdma->dst));
 
 		tdma = tdma->next;
@@ -209,29 +209,28 @@ mv_cesa_dma_add_desc(struct mv_cesa_tdma_chain *chain, gfp_t flags)
 	return new_tdma;
 }
 
-int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
+int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
 			  u32 size, u32 flags, gfp_t gfp_flags)
 {
-
 	struct mv_cesa_tdma_desc *tdma;
-	u8 *iv;
+	struct mv_cesa_op_ctx *result;
 	dma_addr_t dma_handle;
 
 	tdma = mv_cesa_dma_add_desc(chain, gfp_flags);
 	if (IS_ERR(tdma))
 		return PTR_ERR(tdma);
 
-	iv = dma_pool_alloc(cesa_dev->dma->iv_pool, gfp_flags, &dma_handle);
-	if (!iv)
+	result = dma_pool_alloc(cesa_dev->dma->op_pool, gfp_flags, &dma_handle);
+	if (!result)
 		return -ENOMEM;
 
 	tdma->byte_cnt = cpu_to_le32(size | BIT(31));
 	tdma->src = src;
 	tdma->dst = cpu_to_le32(dma_handle);
-	tdma->data = iv;
+	tdma->op = result;
 
 	flags &= (CESA_TDMA_DST_IN_SRAM | CESA_TDMA_SRC_IN_SRAM);
-	tdma->flags = flags | CESA_TDMA_IV;
+	tdma->flags = flags | CESA_TDMA_RESULT;
 	return 0;
 }
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 2/2] crypto: marvell - Don't break chain for computable last ahash requests
  2016-10-03 15:17 [PATCH v2 0/2] Improve DMA chaining for ahash requests Romain Perier
  2016-10-03 15:17 ` [PATCH v2 1/2] crypto: marvell - Use an unique pool to copy results of requests Romain Perier
@ 2016-10-03 15:17 ` Romain Perier
  2016-10-03 15:48 ` [PATCH v2 0/2] Improve DMA chaining for " Romain Perier
  2 siblings, 0 replies; 4+ messages in thread
From: Romain Perier @ 2016-10-03 15:17 UTC (permalink / raw)
  To: Boris Brezillon, Arnaud Ebalard
  Cc: David S. Miller, Herbert Xu, Thomas Petazzoni, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	linux-crypto, linux-arm-kernel

Currently, the driver breaks chain for all kind of hash requests in order to
don't override intermediate states of partial ahash updates. However, some final
ahash requests can be directly processed by the engine, and so without
intermediate state. This is typically the case for most for the HMAC requests
processed via IPSec.

This commits adds a TDMA descriptor to copy outer data for these of requests
into the "op" dma pool, then it allow to chain these requests at the DMA level.
The 'complete' operation is also updated to retrieve the MAC digest from the
right location.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---

Changes in v2:
 - Replaced BUG_ON by an error
 - Add a variable "break_chain", with "type" to break the chain
   with ahash requests. It improves code readability.

 drivers/crypto/marvell/cipher.c | 10 +++---
 drivers/crypto/marvell/hash.c   | 79 +++++++++++++++++++++++++++++++++--------
 drivers/crypto/marvell/tdma.c   |  4 +--
 3 files changed, 71 insertions(+), 22 deletions(-)

diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c
index 098871a..8bc52bf 100644
--- a/drivers/crypto/marvell/cipher.c
+++ b/drivers/crypto/marvell/cipher.c
@@ -212,8 +212,7 @@ mv_cesa_ablkcipher_complete(struct crypto_async_request *req)
 		struct mv_cesa_req *basereq;
 
 		basereq = &creq->base;
-		memcpy(ablkreq->info, basereq->chain.last->op->ctx.blkcipher.iv,
-		       ivsize);
+		memcpy(ablkreq->info, basereq->chain.last->data, ivsize);
 	} else {
 		memcpy_fromio(ablkreq->info,
 			      engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET,
@@ -374,9 +373,10 @@ static int mv_cesa_ablkcipher_dma_req_init(struct ablkcipher_request *req,
 
 	/* Add output data for IV */
 	ivsize = crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(req));
-	ret = mv_cesa_dma_add_result_op(&basereq->chain, CESA_SA_CFG_SRAM_OFFSET,
-				    CESA_SA_DATA_SRAM_OFFSET,
-				    CESA_TDMA_SRC_IN_SRAM, flags);
+	ret = mv_cesa_dma_add_result_op(&basereq->chain,
+					CESA_SA_CRYPT_IV_SRAM_OFFSET,
+					ivsize,
+					CESA_TDMA_SRC_IN_SRAM, flags);
 
 	if (ret)
 		goto err_free_tdma;
diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 9f28468..35baf4f 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -312,24 +312,53 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req)
 	int i;
 
 	digsize = crypto_ahash_digestsize(crypto_ahash_reqtfm(ahashreq));
-	for (i = 0; i < digsize / 4; i++)
-		creq->state[i] = readl_relaxed(engine->regs + CESA_IVDIG(i));
 
-	if (creq->last_req) {
+	if (mv_cesa_req_get_type(&creq->base) == CESA_DMA_REQ &&
+	    !(creq->base.chain.last->flags & CESA_TDMA_BREAK_CHAIN)) {
+		struct mv_cesa_tdma_desc *tdma = NULL;
+		__le32 *data = NULL;
+
+		for (tdma = creq->base.chain.first; tdma; tdma = tdma->next) {
+			u32 type = tdma->flags & CESA_TDMA_TYPE_MSK;
+			if (type ==  CESA_TDMA_RESULT)
+				break;
+		}
+
+		if (!tdma) {
+			dev_err(cesa_dev->dev, "Failed to retrieve tdma "
+					       "descriptor for outer data\n");
+			return;
+		}
+
 		/*
-		 * Hardware's MD5 digest is in little endian format, but
-		 * SHA in big endian format
+		 * Result is already in the correct endianess when the SA is
+		 * used
 		 */
-		if (creq->algo_le) {
-			__le32 *result = (void *)ahashreq->result;
+		data = tdma->data;
+		for (i = 0; i < digsize / 4; i++)
+			creq->state[i] = cpu_to_le32(data[i]);
 
-			for (i = 0; i < digsize / 4; i++)
-				result[i] = cpu_to_le32(creq->state[i]);
-		} else {
-			__be32 *result = (void *)ahashreq->result;
+		memcpy(ahashreq->result, data, digsize);
+	} else {
+		for (i = 0; i < digsize / 4; i++)
+			creq->state[i] = readl_relaxed(engine->regs +
+						       CESA_IVDIG(i));
+		if (creq->last_req) {
+			/*
+			* Hardware's MD5 digest is in little endian format, but
+			* SHA in big endian format
+			*/
+			if (creq->algo_le) {
+				__le32 *result = (void *)ahashreq->result;
+
+				for (i = 0; i < digsize / 4; i++)
+					result[i] = cpu_to_le32(creq->state[i]);
+			} else {
+				__be32 *result = (void *)ahashreq->result;
 
-			for (i = 0; i < digsize / 4; i++)
-				result[i] = cpu_to_be32(creq->state[i]);
+				for (i = 0; i < digsize / 4; i++)
+					result[i] = cpu_to_be32(creq->state[i]);
+			}
 		}
 	}
 
@@ -504,6 +533,12 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
 						CESA_SA_DESC_CFG_LAST_FRAG,
 				      CESA_SA_DESC_CFG_FRAG_MSK);
 
+		ret = mv_cesa_dma_add_result_op(chain,
+						CESA_SA_MAC_DIG_SRAM_OFFSET,
+						32,
+						CESA_TDMA_SRC_IN_SRAM, flags);
+		if (ret)
+			return ERR_PTR(-ENOMEM);
 		return op;
 	}
 
@@ -564,6 +599,8 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
 	struct mv_cesa_op_ctx *op = NULL;
 	unsigned int frag_len;
 	int ret;
+	u32 type;
+	bool break_chain = true;
 
 	basereq->chain.first = NULL;
 	basereq->chain.last = NULL;
@@ -635,6 +672,16 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
 		goto err_free_tdma;
 	}
 
+	/*
+	 * If results are copied via DMA, this means that this
+	 * request can be directly processed by the engine,
+	 * without partial updates. So we can chain it at the
+	 * DMA level with other requests.
+	 */
+	type = basereq->chain.last->flags & CESA_TDMA_TYPE_MSK;
+	if (type == CESA_TDMA_RESULT)
+		break_chain = false;
+
 	if (op) {
 		/* Add dummy desc to wait for crypto operation end */
 		ret = mv_cesa_dma_add_dummy_end(&basereq->chain, flags);
@@ -648,8 +695,10 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
 	else
 		creq->cache_ptr = 0;
 
-	basereq->chain.last->flags |= (CESA_TDMA_END_OF_REQ |
-				       CESA_TDMA_BREAK_CHAIN);
+	basereq->chain.last->flags |= CESA_TDMA_END_OF_REQ;
+
+	if (break_chain)
+		basereq->chain.last->flags |= CESA_TDMA_BREAK_CHAIN;
 
 	return 0;
 
diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index 2e15f19..8d33003 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -70,7 +70,7 @@ void mv_cesa_dma_cleanup(struct mv_cesa_req *dreq)
 			dma_pool_free(cesa_dev->dma->op_pool, tdma->op,
 				      le32_to_cpu(tdma->src));
 		else if (type == CESA_TDMA_RESULT)
-			dma_pool_free(cesa_dev->dma->op_pool, tdma->op,
+			dma_pool_free(cesa_dev->dma->op_pool, tdma->data,
 				      le32_to_cpu(tdma->dst));
 
 		tdma = tdma->next;
@@ -227,7 +227,7 @@ int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src,
 	tdma->byte_cnt = cpu_to_le32(size | BIT(31));
 	tdma->src = src;
 	tdma->dst = cpu_to_le32(dma_handle);
-	tdma->op = result;
+	tdma->data = result;
 
 	flags &= (CESA_TDMA_DST_IN_SRAM | CESA_TDMA_SRC_IN_SRAM);
 	tdma->flags = flags | CESA_TDMA_RESULT;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 0/2] Improve DMA chaining for ahash requests
  2016-10-03 15:17 [PATCH v2 0/2] Improve DMA chaining for ahash requests Romain Perier
  2016-10-03 15:17 ` [PATCH v2 1/2] crypto: marvell - Use an unique pool to copy results of requests Romain Perier
  2016-10-03 15:17 ` [PATCH v2 2/2] crypto: marvell - Don't break chain for computable last ahash requests Romain Perier
@ 2016-10-03 15:48 ` Romain Perier
  2 siblings, 0 replies; 4+ messages in thread
From: Romain Perier @ 2016-10-03 15:48 UTC (permalink / raw)
  To: Boris Brezillon, Arnaud Ebalard
  Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper, linux-crypto,
	Gregory Clement, Herbert Xu, David S. Miller, linux-arm-kernel,
	Sebastian Hesselbarth

Hello,

Le 03/10/2016 17:17, Romain Perier a écrit :
> This series contain performance improvement regarding ahash requests.
> So far, ahash requests were systematically not chained at the DMA level.
> However, in some case, like this is the case by using IPSec, some ahash
> requests can be processed directly by the engine, and don't have
> intermediaire partial update states.
>
> This series firstly re-work the way outer IVs are copied from the SRAM
> into the dma pool. To do so, we introduce a common dma pool for all type
> of requests that contains outer results (like IV or digest). Then, for
> ahash requests that can be processed directly by the engine, outer
> results are copied from the SRAM into the common dma pool. These requests
> are then allowed to be chained at the DMA level.
>
>
> Benchmarking results with iperf throught IPSec
> ==============================================
> 		ESP			AH
>
> Before		343 Mbits/s		492 Mbits/s
> After		392 Mbits/s		557 Mbits/s
> Improvement	+14%			+13%
>
> Romain Perier (2):
>    crypto: marvell - Use an unique pool to copy results of requests
>    crypto: marvell - Don't break chain for computable last ahash requests
>
>   drivers/crypto/marvell/cesa.c   |  4 ---
>   drivers/crypto/marvell/cesa.h   |  5 ++-
>   drivers/crypto/marvell/cipher.c |  6 ++--
>   drivers/crypto/marvell/hash.c   | 79 +++++++++++++++++++++++++++++++++--------
>   drivers/crypto/marvell/tdma.c   | 17 +++++----
>   5 files changed, 78 insertions(+), 33 deletions(-)
>

After an internal discussion, we can handle things differently. Instead 
of allocating a new mv_cesa_op_ctx in the op_pool, we can just allocate 
a new descriptor which points to the first op ctx of the chain, and then 
copy outer data.

Please ignore this series.

Regards,

-- 
Romain Perier, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-10-03 15:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-03 15:17 [PATCH v2 0/2] Improve DMA chaining for ahash requests Romain Perier
2016-10-03 15:17 ` [PATCH v2 1/2] crypto: marvell - Use an unique pool to copy results of requests Romain Perier
2016-10-03 15:17 ` [PATCH v2 2/2] crypto: marvell - Don't break chain for computable last ahash requests Romain Perier
2016-10-03 15:48 ` [PATCH v2 0/2] Improve DMA chaining for " Romain Perier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).