* [PATCH v3 0/2] Improve DMA chaining for ahash requests @ 2016-10-04 12:57 ` Romain Perier 0 siblings, 0 replies; 10+ messages in thread From: Romain Perier @ 2016-10-04 12:57 UTC (permalink / raw) To: Boris Brezillon, Arnaud Ebalard Cc: David S. Miller, Herbert Xu, Thomas Petazzoni, Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement, Nadav Haklai, Ofer Heifetz, linux-crypto, linux-arm-kernel This series contain performance improvement regarding ahash requests. So far, ahash requests were systematically not chained at the DMA level. However, in some case, like this is the case by using IPSec, some ahash requests can be processed directly by the engine, and don't have intermediaire partial update states. This series firstly re-work the way outer IVs are copied from the SRAM into the dma pool. To do so, we introduce a common dma pool for all type of requests that contains outer results (like IV or digest). Then, for ahash requests that can be processed directly by the engine, outer results are copied from the SRAM into the common dma pool. These requests are then allowed to be chained at the DMA level. Benchmarking results with iperf throught IPSec ============================================== ESP AH Before 343 Mbits/s 492 Mbits/s After 422 Mbits/s 577 Mbits/s Improvement +23% +17% Romain Perier (2): crypto: marvell - Use an unique pool to copy results of requests crypto: marvell - Don't break chain for computable last ahash requests drivers/crypto/marvell/cesa.c | 4 --- drivers/crypto/marvell/cesa.h | 5 ++- drivers/crypto/marvell/cipher.c | 8 +++-- drivers/crypto/marvell/hash.c | 79 +++++++++++++++++++++++++++++++++-------- drivers/crypto/marvell/tdma.c | 28 +++++++-------- 5 files changed, 85 insertions(+), 39 deletions(-) -- 2.9.3 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 0/2] Improve DMA chaining for ahash requests @ 2016-10-04 12:57 ` Romain Perier 0 siblings, 0 replies; 10+ messages in thread From: Romain Perier @ 2016-10-04 12:57 UTC (permalink / raw) To: linux-arm-kernel This series contain performance improvement regarding ahash requests. So far, ahash requests were systematically not chained at the DMA level. However, in some case, like this is the case by using IPSec, some ahash requests can be processed directly by the engine, and don't have intermediaire partial update states. This series firstly re-work the way outer IVs are copied from the SRAM into the dma pool. To do so, we introduce a common dma pool for all type of requests that contains outer results (like IV or digest). Then, for ahash requests that can be processed directly by the engine, outer results are copied from the SRAM into the common dma pool. These requests are then allowed to be chained at the DMA level. Benchmarking results with iperf throught IPSec ============================================== ESP AH Before 343 Mbits/s 492 Mbits/s After 422 Mbits/s 577 Mbits/s Improvement +23% +17% Romain Perier (2): crypto: marvell - Use an unique pool to copy results of requests crypto: marvell - Don't break chain for computable last ahash requests drivers/crypto/marvell/cesa.c | 4 --- drivers/crypto/marvell/cesa.h | 5 ++- drivers/crypto/marvell/cipher.c | 8 +++-- drivers/crypto/marvell/hash.c | 79 +++++++++++++++++++++++++++++++++-------- drivers/crypto/marvell/tdma.c | 28 +++++++-------- 5 files changed, 85 insertions(+), 39 deletions(-) -- 2.9.3 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 1/2] crypto: marvell - Use an unique pool to copy results of requests 2016-10-04 12:57 ` Romain Perier @ 2016-10-04 12:57 ` Romain Perier -1 siblings, 0 replies; 10+ messages in thread From: Romain Perier @ 2016-10-04 12:57 UTC (permalink / raw) To: Boris Brezillon, Arnaud Ebalard Cc: David S. Miller, Herbert Xu, Thomas Petazzoni, Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement, Nadav Haklai, Ofer Heifetz, linux-crypto, linux-arm-kernel So far, we used a dedicated dma pool to copy the result of outer IV for cipher requests. Instead of using a dma pool per outer data, we prefer use the op dma pool that contains all part of the request from the SRAM. Then, the outer data that is likely to be used by the 'complete' operation, is copied later. In this way, any type of result can be retrieved by DMA for cipher or ahash requests. Signed-off-by: Romain Perier <romain.perier@free-electrons.com> --- Changes in v3: - Don't allocate a new op ctx for the last tdma descriptor. Instead we point to the last op ctx in the tdma chain, and copy the context of the current request to this location. Changes in v2: - Use the dma pool "op" to retrieve outer data intead of introducing a new one. 
drivers/crypto/marvell/cesa.c | 4 ---- drivers/crypto/marvell/cesa.h | 5 ++--- drivers/crypto/marvell/cipher.c | 8 +++++--- drivers/crypto/marvell/tdma.c | 28 ++++++++++++++-------------- 4 files changed, 21 insertions(+), 24 deletions(-) diff --git a/drivers/crypto/marvell/cesa.c b/drivers/crypto/marvell/cesa.c index 37dadb2..6e7a5c7 100644 --- a/drivers/crypto/marvell/cesa.c +++ b/drivers/crypto/marvell/cesa.c @@ -375,10 +375,6 @@ static int mv_cesa_dev_dma_init(struct mv_cesa_dev *cesa) if (!dma->padding_pool) return -ENOMEM; - dma->iv_pool = dmam_pool_create("cesa_iv", dev, 16, 1, 0); - if (!dma->iv_pool) - return -ENOMEM; - cesa->dma = dma; return 0; diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h index e423d33..a768da7 100644 --- a/drivers/crypto/marvell/cesa.h +++ b/drivers/crypto/marvell/cesa.h @@ -277,7 +277,7 @@ struct mv_cesa_op_ctx { #define CESA_TDMA_DUMMY 0 #define CESA_TDMA_DATA 1 #define CESA_TDMA_OP 2 -#define CESA_TDMA_IV 3 +#define CESA_TDMA_RESULT 3 /** * struct mv_cesa_tdma_desc - TDMA descriptor @@ -393,7 +393,6 @@ struct mv_cesa_dev_dma { struct dma_pool *op_pool; struct dma_pool *cache_pool; struct dma_pool *padding_pool; - struct dma_pool *iv_pool; }; /** @@ -839,7 +838,7 @@ mv_cesa_tdma_desc_iter_init(struct mv_cesa_tdma_chain *chain) memset(chain, 0, sizeof(*chain)); } -int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, +int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, u32 size, u32 flags, gfp_t gfp_flags); struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain, diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c index d19dc96..098871a 100644 --- a/drivers/crypto/marvell/cipher.c +++ b/drivers/crypto/marvell/cipher.c @@ -212,7 +212,8 @@ mv_cesa_ablkcipher_complete(struct crypto_async_request *req) struct mv_cesa_req *basereq; basereq = &creq->base; - memcpy(ablkreq->info, basereq->chain.last->data, ivsize); + 
memcpy(ablkreq->info, basereq->chain.last->op->ctx.blkcipher.iv, + ivsize); } else { memcpy_fromio(ablkreq->info, engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET, @@ -373,8 +374,9 @@ static int mv_cesa_ablkcipher_dma_req_init(struct ablkcipher_request *req, /* Add output data for IV */ ivsize = crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(req)); - ret = mv_cesa_dma_add_iv_op(&basereq->chain, CESA_SA_CRYPT_IV_SRAM_OFFSET, - ivsize, CESA_TDMA_SRC_IN_SRAM, flags); + ret = mv_cesa_dma_add_result_op(&basereq->chain, CESA_SA_CFG_SRAM_OFFSET, + CESA_SA_DATA_SRAM_OFFSET, + CESA_TDMA_SRC_IN_SRAM, flags); if (ret) goto err_free_tdma; diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c index 9fd7a5f..991dc3f 100644 --- a/drivers/crypto/marvell/tdma.c +++ b/drivers/crypto/marvell/tdma.c @@ -69,9 +69,6 @@ void mv_cesa_dma_cleanup(struct mv_cesa_req *dreq) if (type == CESA_TDMA_OP) dma_pool_free(cesa_dev->dma->op_pool, tdma->op, le32_to_cpu(tdma->src)); - else if (type == CESA_TDMA_IV) - dma_pool_free(cesa_dev->dma->iv_pool, tdma->data, - le32_to_cpu(tdma->dst)); tdma = tdma->next; dma_pool_free(cesa_dev->dma->tdma_desc_pool, old_tdma, @@ -209,29 +206,32 @@ mv_cesa_dma_add_desc(struct mv_cesa_tdma_chain *chain, gfp_t flags) return new_tdma; } -int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, +int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, u32 size, u32 flags, gfp_t gfp_flags) { - - struct mv_cesa_tdma_desc *tdma; - u8 *iv; - dma_addr_t dma_handle; + struct mv_cesa_tdma_desc *tdma, *op_desc; tdma = mv_cesa_dma_add_desc(chain, gfp_flags); if (IS_ERR(tdma)) return PTR_ERR(tdma); - iv = dma_pool_alloc(cesa_dev->dma->iv_pool, gfp_flags, &dma_handle); - if (!iv) - return -ENOMEM; + for (op_desc = chain->first; op_desc; op_desc = op_desc->next) { + u32 type = op_desc->flags & CESA_TDMA_TYPE_MSK; + + if (type == CESA_TDMA_OP) + break; + } + + if (!op_desc) + return -EIO; tdma->byte_cnt = cpu_to_le32(size 
| BIT(31)); tdma->src = src; - tdma->dst = cpu_to_le32(dma_handle); - tdma->data = iv; + tdma->dst = op_desc->src; + tdma->op = op_desc->op; flags &= (CESA_TDMA_DST_IN_SRAM | CESA_TDMA_SRC_IN_SRAM); - tdma->flags = flags | CESA_TDMA_IV; + tdma->flags = flags | CESA_TDMA_RESULT; return 0; } -- 2.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 1/2] crypto: marvell - Use an unique pool to copy results of requests @ 2016-10-04 12:57 ` Romain Perier 0 siblings, 0 replies; 10+ messages in thread From: Romain Perier @ 2016-10-04 12:57 UTC (permalink / raw) To: linux-arm-kernel So far, we used a dedicated dma pool to copy the result of outer IV for cipher requests. Instead of using a dma pool per outer data, we prefer use the op dma pool that contains all part of the request from the SRAM. Then, the outer data that is likely to be used by the 'complete' operation, is copied later. In this way, any type of result can be retrieved by DMA for cipher or ahash requests. Signed-off-by: Romain Perier <romain.perier@free-electrons.com> --- Changes in v3: - Don't allocate a new op ctx for the last tdma descriptor. Instead we point to the last op ctx in the tdma chain, and copy the context of the current request to this location. Changes in v2: - Use the dma pool "op" to retrieve outer data intead of introducing a new one. drivers/crypto/marvell/cesa.c | 4 ---- drivers/crypto/marvell/cesa.h | 5 ++--- drivers/crypto/marvell/cipher.c | 8 +++++--- drivers/crypto/marvell/tdma.c | 28 ++++++++++++++-------------- 4 files changed, 21 insertions(+), 24 deletions(-) diff --git a/drivers/crypto/marvell/cesa.c b/drivers/crypto/marvell/cesa.c index 37dadb2..6e7a5c7 100644 --- a/drivers/crypto/marvell/cesa.c +++ b/drivers/crypto/marvell/cesa.c @@ -375,10 +375,6 @@ static int mv_cesa_dev_dma_init(struct mv_cesa_dev *cesa) if (!dma->padding_pool) return -ENOMEM; - dma->iv_pool = dmam_pool_create("cesa_iv", dev, 16, 1, 0); - if (!dma->iv_pool) - return -ENOMEM; - cesa->dma = dma; return 0; diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h index e423d33..a768da7 100644 --- a/drivers/crypto/marvell/cesa.h +++ b/drivers/crypto/marvell/cesa.h @@ -277,7 +277,7 @@ struct mv_cesa_op_ctx { #define CESA_TDMA_DUMMY 0 #define CESA_TDMA_DATA 1 #define CESA_TDMA_OP 2 -#define CESA_TDMA_IV 3 +#define 
CESA_TDMA_RESULT 3 /** * struct mv_cesa_tdma_desc - TDMA descriptor @@ -393,7 +393,6 @@ struct mv_cesa_dev_dma { struct dma_pool *op_pool; struct dma_pool *cache_pool; struct dma_pool *padding_pool; - struct dma_pool *iv_pool; }; /** @@ -839,7 +838,7 @@ mv_cesa_tdma_desc_iter_init(struct mv_cesa_tdma_chain *chain) memset(chain, 0, sizeof(*chain)); } -int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, +int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, u32 size, u32 flags, gfp_t gfp_flags); struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain, diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c index d19dc96..098871a 100644 --- a/drivers/crypto/marvell/cipher.c +++ b/drivers/crypto/marvell/cipher.c @@ -212,7 +212,8 @@ mv_cesa_ablkcipher_complete(struct crypto_async_request *req) struct mv_cesa_req *basereq; basereq = &creq->base; - memcpy(ablkreq->info, basereq->chain.last->data, ivsize); + memcpy(ablkreq->info, basereq->chain.last->op->ctx.blkcipher.iv, + ivsize); } else { memcpy_fromio(ablkreq->info, engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET, @@ -373,8 +374,9 @@ static int mv_cesa_ablkcipher_dma_req_init(struct ablkcipher_request *req, /* Add output data for IV */ ivsize = crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(req)); - ret = mv_cesa_dma_add_iv_op(&basereq->chain, CESA_SA_CRYPT_IV_SRAM_OFFSET, - ivsize, CESA_TDMA_SRC_IN_SRAM, flags); + ret = mv_cesa_dma_add_result_op(&basereq->chain, CESA_SA_CFG_SRAM_OFFSET, + CESA_SA_DATA_SRAM_OFFSET, + CESA_TDMA_SRC_IN_SRAM, flags); if (ret) goto err_free_tdma; diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c index 9fd7a5f..991dc3f 100644 --- a/drivers/crypto/marvell/tdma.c +++ b/drivers/crypto/marvell/tdma.c @@ -69,9 +69,6 @@ void mv_cesa_dma_cleanup(struct mv_cesa_req *dreq) if (type == CESA_TDMA_OP) dma_pool_free(cesa_dev->dma->op_pool, tdma->op, le32_to_cpu(tdma->src)); - else if (type 
== CESA_TDMA_IV) - dma_pool_free(cesa_dev->dma->iv_pool, tdma->data, - le32_to_cpu(tdma->dst)); tdma = tdma->next; dma_pool_free(cesa_dev->dma->tdma_desc_pool, old_tdma, @@ -209,29 +206,32 @@ mv_cesa_dma_add_desc(struct mv_cesa_tdma_chain *chain, gfp_t flags) return new_tdma; } -int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, +int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, u32 size, u32 flags, gfp_t gfp_flags) { - - struct mv_cesa_tdma_desc *tdma; - u8 *iv; - dma_addr_t dma_handle; + struct mv_cesa_tdma_desc *tdma, *op_desc; tdma = mv_cesa_dma_add_desc(chain, gfp_flags); if (IS_ERR(tdma)) return PTR_ERR(tdma); - iv = dma_pool_alloc(cesa_dev->dma->iv_pool, gfp_flags, &dma_handle); - if (!iv) - return -ENOMEM; + for (op_desc = chain->first; op_desc; op_desc = op_desc->next) { + u32 type = op_desc->flags & CESA_TDMA_TYPE_MSK; + + if (type == CESA_TDMA_OP) + break; + } + + if (!op_desc) + return -EIO; tdma->byte_cnt = cpu_to_le32(size | BIT(31)); tdma->src = src; - tdma->dst = cpu_to_le32(dma_handle); - tdma->data = iv; + tdma->dst = op_desc->src; + tdma->op = op_desc->op; flags &= (CESA_TDMA_DST_IN_SRAM | CESA_TDMA_SRC_IN_SRAM); - tdma->flags = flags | CESA_TDMA_IV; + tdma->flags = flags | CESA_TDMA_RESULT; return 0; } -- 2.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/2] crypto: marvell - Use an unique pool to copy results of requests 2016-10-04 12:57 ` Romain Perier @ 2016-10-04 13:17 ` Boris Brezillon -1 siblings, 0 replies; 10+ messages in thread From: Boris Brezillon @ 2016-10-04 13:17 UTC (permalink / raw) To: Romain Perier Cc: Arnaud Ebalard, David S. Miller, Herbert Xu, Thomas Petazzoni, Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement, Nadav Haklai, Ofer Heifetz, linux-crypto, linux-arm-kernel On Tue, 4 Oct 2016 14:57:19 +0200 Romain Perier <romain.perier@free-electrons.com> wrote: > So far, we used a dedicated dma pool to copy the result of outer IV for > cipher requests. Instead of using a dma pool per outer data, we prefer > use the op dma pool that contains all part of the request from the SRAM. > Then, the outer data that is likely to be used by the 'complete' > operation, is copied later. In this way, any type of result can be > retrieved by DMA for cipher or ahash requests. > > Signed-off-by: Romain Perier <romain.perier@free-electrons.com> > --- > > Changes in v3: > - Don't allocate a new op ctx for the last tdma descriptor. Instead > we point to the last op ctx in the tdma chain, and copy the context > of the current request to this location. > > Changes in v2: > - Use the dma pool "op" to retrieve outer data intead of introducing > a new one. 
> > drivers/crypto/marvell/cesa.c | 4 ---- > drivers/crypto/marvell/cesa.h | 5 ++--- > drivers/crypto/marvell/cipher.c | 8 +++++--- > drivers/crypto/marvell/tdma.c | 28 ++++++++++++++-------------- > 4 files changed, 21 insertions(+), 24 deletions(-) > > diff --git a/drivers/crypto/marvell/cesa.c b/drivers/crypto/marvell/cesa.c > index 37dadb2..6e7a5c7 100644 > --- a/drivers/crypto/marvell/cesa.c > +++ b/drivers/crypto/marvell/cesa.c > @@ -375,10 +375,6 @@ static int mv_cesa_dev_dma_init(struct mv_cesa_dev *cesa) > if (!dma->padding_pool) > return -ENOMEM; > > - dma->iv_pool = dmam_pool_create("cesa_iv", dev, 16, 1, 0); > - if (!dma->iv_pool) > - return -ENOMEM; > - > cesa->dma = dma; > > return 0; > diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h > index e423d33..a768da7 100644 > --- a/drivers/crypto/marvell/cesa.h > +++ b/drivers/crypto/marvell/cesa.h > @@ -277,7 +277,7 @@ struct mv_cesa_op_ctx { > #define CESA_TDMA_DUMMY 0 > #define CESA_TDMA_DATA 1 > #define CESA_TDMA_OP 2 > -#define CESA_TDMA_IV 3 > +#define CESA_TDMA_RESULT 3 > > /** > * struct mv_cesa_tdma_desc - TDMA descriptor > @@ -393,7 +393,6 @@ struct mv_cesa_dev_dma { > struct dma_pool *op_pool; > struct dma_pool *cache_pool; > struct dma_pool *padding_pool; > - struct dma_pool *iv_pool; > }; > > /** > @@ -839,7 +838,7 @@ mv_cesa_tdma_desc_iter_init(struct mv_cesa_tdma_chain *chain) > memset(chain, 0, sizeof(*chain)); > } > > -int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, > +int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, > u32 size, u32 flags, gfp_t gfp_flags); > > struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain, > diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c > index d19dc96..098871a 100644 > --- a/drivers/crypto/marvell/cipher.c > +++ b/drivers/crypto/marvell/cipher.c > @@ -212,7 +212,8 @@ mv_cesa_ablkcipher_complete(struct crypto_async_request *req) > 
struct mv_cesa_req *basereq; > > basereq = &creq->base; > - memcpy(ablkreq->info, basereq->chain.last->data, ivsize); > + memcpy(ablkreq->info, basereq->chain.last->op->ctx.blkcipher.iv, > + ivsize); > } else { > memcpy_fromio(ablkreq->info, > engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET, > @@ -373,8 +374,9 @@ static int mv_cesa_ablkcipher_dma_req_init(struct ablkcipher_request *req, > > /* Add output data for IV */ > ivsize = crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(req)); > - ret = mv_cesa_dma_add_iv_op(&basereq->chain, CESA_SA_CRYPT_IV_SRAM_OFFSET, > - ivsize, CESA_TDMA_SRC_IN_SRAM, flags); > + ret = mv_cesa_dma_add_result_op(&basereq->chain, CESA_SA_CFG_SRAM_OFFSET, > + CESA_SA_DATA_SRAM_OFFSET, > + CESA_TDMA_SRC_IN_SRAM, flags); > > if (ret) > goto err_free_tdma; > diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c > index 9fd7a5f..991dc3f 100644 > --- a/drivers/crypto/marvell/tdma.c > +++ b/drivers/crypto/marvell/tdma.c > @@ -69,9 +69,6 @@ void mv_cesa_dma_cleanup(struct mv_cesa_req *dreq) > if (type == CESA_TDMA_OP) > dma_pool_free(cesa_dev->dma->op_pool, tdma->op, > le32_to_cpu(tdma->src)); > - else if (type == CESA_TDMA_IV) > - dma_pool_free(cesa_dev->dma->iv_pool, tdma->data, > - le32_to_cpu(tdma->dst)); > > tdma = tdma->next; > dma_pool_free(cesa_dev->dma->tdma_desc_pool, old_tdma, > @@ -209,29 +206,32 @@ mv_cesa_dma_add_desc(struct mv_cesa_tdma_chain *chain, gfp_t flags) > return new_tdma; > } > > -int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, > +int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, > u32 size, u32 flags, gfp_t gfp_flags) > { > - > - struct mv_cesa_tdma_desc *tdma; > - u8 *iv; > - dma_addr_t dma_handle; > + struct mv_cesa_tdma_desc *tdma, *op_desc; > > tdma = mv_cesa_dma_add_desc(chain, gfp_flags); > if (IS_ERR(tdma)) > return PTR_ERR(tdma); > > - iv = dma_pool_alloc(cesa_dev->dma->iv_pool, gfp_flags, &dma_handle); > - if (!iv) > - return -ENOMEM; 
Can you add a comment explaining what you're doing here? /* We re-use an existing op_desc object to retrieve the context * and result instead of allocating a new one. * There is at least one object of this type in a CESA crypto * req, just pick the first one in the chain. */ Once this is addressed, you can add my Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> > + for (op_desc = chain->first; op_desc; op_desc = op_desc->next) { > + u32 type = op_desc->flags & CESA_TDMA_TYPE_MSK; > + > + if (type == CESA_TDMA_OP) > + break; > + } > + > + if (!op_desc) > + return -EIO; > > tdma->byte_cnt = cpu_to_le32(size | BIT(31)); > tdma->src = src; > - tdma->dst = cpu_to_le32(dma_handle); > - tdma->data = iv; > + tdma->dst = op_desc->src; > + tdma->op = op_desc->op; > > flags &= (CESA_TDMA_DST_IN_SRAM | CESA_TDMA_SRC_IN_SRAM); > - tdma->flags = flags | CESA_TDMA_IV; > + tdma->flags = flags | CESA_TDMA_RESULT; > return 0; > } > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 1/2] crypto: marvell - Use an unique pool to copy results of requests @ 2016-10-04 13:17 ` Boris Brezillon 0 siblings, 0 replies; 10+ messages in thread From: Boris Brezillon @ 2016-10-04 13:17 UTC (permalink / raw) To: linux-arm-kernel On Tue, 4 Oct 2016 14:57:19 +0200 Romain Perier <romain.perier@free-electrons.com> wrote: > So far, we used a dedicated dma pool to copy the result of outer IV for > cipher requests. Instead of using a dma pool per outer data, we prefer > use the op dma pool that contains all part of the request from the SRAM. > Then, the outer data that is likely to be used by the 'complete' > operation, is copied later. In this way, any type of result can be > retrieved by DMA for cipher or ahash requests. > > Signed-off-by: Romain Perier <romain.perier@free-electrons.com> > --- > > Changes in v3: > - Don't allocate a new op ctx for the last tdma descriptor. Instead > we point to the last op ctx in the tdma chain, and copy the context > of the current request to this location. > > Changes in v2: > - Use the dma pool "op" to retrieve outer data intead of introducing > a new one. 
> > drivers/crypto/marvell/cesa.c | 4 ---- > drivers/crypto/marvell/cesa.h | 5 ++--- > drivers/crypto/marvell/cipher.c | 8 +++++--- > drivers/crypto/marvell/tdma.c | 28 ++++++++++++++-------------- > 4 files changed, 21 insertions(+), 24 deletions(-) > > diff --git a/drivers/crypto/marvell/cesa.c b/drivers/crypto/marvell/cesa.c > index 37dadb2..6e7a5c7 100644 > --- a/drivers/crypto/marvell/cesa.c > +++ b/drivers/crypto/marvell/cesa.c > @@ -375,10 +375,6 @@ static int mv_cesa_dev_dma_init(struct mv_cesa_dev *cesa) > if (!dma->padding_pool) > return -ENOMEM; > > - dma->iv_pool = dmam_pool_create("cesa_iv", dev, 16, 1, 0); > - if (!dma->iv_pool) > - return -ENOMEM; > - > cesa->dma = dma; > > return 0; > diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h > index e423d33..a768da7 100644 > --- a/drivers/crypto/marvell/cesa.h > +++ b/drivers/crypto/marvell/cesa.h > @@ -277,7 +277,7 @@ struct mv_cesa_op_ctx { > #define CESA_TDMA_DUMMY 0 > #define CESA_TDMA_DATA 1 > #define CESA_TDMA_OP 2 > -#define CESA_TDMA_IV 3 > +#define CESA_TDMA_RESULT 3 > > /** > * struct mv_cesa_tdma_desc - TDMA descriptor > @@ -393,7 +393,6 @@ struct mv_cesa_dev_dma { > struct dma_pool *op_pool; > struct dma_pool *cache_pool; > struct dma_pool *padding_pool; > - struct dma_pool *iv_pool; > }; > > /** > @@ -839,7 +838,7 @@ mv_cesa_tdma_desc_iter_init(struct mv_cesa_tdma_chain *chain) > memset(chain, 0, sizeof(*chain)); > } > > -int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, > +int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, > u32 size, u32 flags, gfp_t gfp_flags); > > struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain, > diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c > index d19dc96..098871a 100644 > --- a/drivers/crypto/marvell/cipher.c > +++ b/drivers/crypto/marvell/cipher.c > @@ -212,7 +212,8 @@ mv_cesa_ablkcipher_complete(struct crypto_async_request *req) > 
struct mv_cesa_req *basereq; > > basereq = &creq->base; > - memcpy(ablkreq->info, basereq->chain.last->data, ivsize); > + memcpy(ablkreq->info, basereq->chain.last->op->ctx.blkcipher.iv, > + ivsize); > } else { > memcpy_fromio(ablkreq->info, > engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET, > @@ -373,8 +374,9 @@ static int mv_cesa_ablkcipher_dma_req_init(struct ablkcipher_request *req, > > /* Add output data for IV */ > ivsize = crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(req)); > - ret = mv_cesa_dma_add_iv_op(&basereq->chain, CESA_SA_CRYPT_IV_SRAM_OFFSET, > - ivsize, CESA_TDMA_SRC_IN_SRAM, flags); > + ret = mv_cesa_dma_add_result_op(&basereq->chain, CESA_SA_CFG_SRAM_OFFSET, > + CESA_SA_DATA_SRAM_OFFSET, > + CESA_TDMA_SRC_IN_SRAM, flags); > > if (ret) > goto err_free_tdma; > diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c > index 9fd7a5f..991dc3f 100644 > --- a/drivers/crypto/marvell/tdma.c > +++ b/drivers/crypto/marvell/tdma.c > @@ -69,9 +69,6 @@ void mv_cesa_dma_cleanup(struct mv_cesa_req *dreq) > if (type == CESA_TDMA_OP) > dma_pool_free(cesa_dev->dma->op_pool, tdma->op, > le32_to_cpu(tdma->src)); > - else if (type == CESA_TDMA_IV) > - dma_pool_free(cesa_dev->dma->iv_pool, tdma->data, > - le32_to_cpu(tdma->dst)); > > tdma = tdma->next; > dma_pool_free(cesa_dev->dma->tdma_desc_pool, old_tdma, > @@ -209,29 +206,32 @@ mv_cesa_dma_add_desc(struct mv_cesa_tdma_chain *chain, gfp_t flags) > return new_tdma; > } > > -int mv_cesa_dma_add_iv_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, > +int mv_cesa_dma_add_result_op(struct mv_cesa_tdma_chain *chain, dma_addr_t src, > u32 size, u32 flags, gfp_t gfp_flags) > { > - > - struct mv_cesa_tdma_desc *tdma; > - u8 *iv; > - dma_addr_t dma_handle; > + struct mv_cesa_tdma_desc *tdma, *op_desc; > > tdma = mv_cesa_dma_add_desc(chain, gfp_flags); > if (IS_ERR(tdma)) > return PTR_ERR(tdma); > > - iv = dma_pool_alloc(cesa_dev->dma->iv_pool, gfp_flags, &dma_handle); > - if (!iv) > - return -ENOMEM; 
Can you add a comment explaining what you're doing here? /* We re-use an existing op_desc object to retrieve the context * and result instead of allocating a new one. * There is at least one object of this type in a CESA crypto * req, just pick the first one in the chain. */ Once this is addressed, you can add my Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> > + for (op_desc = chain->first; op_desc; op_desc = op_desc->next) { > + u32 type = op_desc->flags & CESA_TDMA_TYPE_MSK; > + > + if (type == CESA_TDMA_OP) > + break; > + } > + > + if (!op_desc) > + return -EIO; > > tdma->byte_cnt = cpu_to_le32(size | BIT(31)); > tdma->src = src; > - tdma->dst = cpu_to_le32(dma_handle); > - tdma->data = iv; > + tdma->dst = op_desc->src; > + tdma->op = op_desc->op; > > flags &= (CESA_TDMA_DST_IN_SRAM | CESA_TDMA_SRC_IN_SRAM); > - tdma->flags = flags | CESA_TDMA_IV; > + tdma->flags = flags | CESA_TDMA_RESULT; > return 0; > } > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 2/2] crypto: marvell - Don't break chain for computable last ahash requests 2016-10-04 12:57 ` Romain Perier @ 2016-10-04 12:57 ` Romain Perier -1 siblings, 0 replies; 10+ messages in thread From: Romain Perier @ 2016-10-04 12:57 UTC (permalink / raw) To: Boris Brezillon, Arnaud Ebalard Cc: David S. Miller, Herbert Xu, Thomas Petazzoni, Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement, Nadav Haklai, Ofer Heifetz, linux-crypto, linux-arm-kernel Currently, the driver breaks chain for all kind of hash requests in order to don't override intermediate states of partial ahash updates. However, some final ahash requests can be directly processed by the engine, and so without intermediate state. This is typically the case for most for the HMAC requests processed via IPSec. This commits adds a TDMA descriptor to copy context for these of requests into the "op" dma pool, then it allow to chain these requests at the DMA level. The 'complete' operation is also updated to retrieve the MAC digest from the right location. Signed-off-by: Romain Perier <romain.perier@free-electrons.com> --- Changes in v3: - Copy the whole context back to RAM and not just the digest. Also fixed a rebase issue ^^ (whoops) Changes in v2: - Replaced BUG_ON by an error - Add a variable "break_chain", with "type" to break the chain with ahash requests. It improves code readability. 
drivers/crypto/marvell/hash.c | 79 +++++++++++++++++++++++++++++++++++-------- 1 file changed, 64 insertions(+), 15 deletions(-) diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c index 9f28468..b36f196 100644 --- a/drivers/crypto/marvell/hash.c +++ b/drivers/crypto/marvell/hash.c @@ -312,24 +312,53 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req) int i; digsize = crypto_ahash_digestsize(crypto_ahash_reqtfm(ahashreq)); - for (i = 0; i < digsize / 4; i++) - creq->state[i] = readl_relaxed(engine->regs + CESA_IVDIG(i)); - if (creq->last_req) { + if (mv_cesa_req_get_type(&creq->base) == CESA_DMA_REQ && + !(creq->base.chain.last->flags & CESA_TDMA_BREAK_CHAIN)) { + struct mv_cesa_tdma_desc *tdma = NULL; + __le32 *data = NULL; + + for (tdma = creq->base.chain.first; tdma; tdma = tdma->next) { + u32 type = tdma->flags & CESA_TDMA_TYPE_MSK; + if (type == CESA_TDMA_RESULT) + break; + } + + if (!tdma) { + dev_err(cesa_dev->dev, "Failed to retrieve tdma " + "descriptor for outer data\n"); + return; + } + /* - * Hardware's MD5 digest is in little endian format, but - * SHA in big endian format + * Result is already in the correct endianess when the SA is + * used */ - if (creq->algo_le) { - __le32 *result = (void *)ahashreq->result; + data = tdma->op->ctx.hash.hash; + for (i = 0; i < digsize / 4; i++) + creq->state[i] = cpu_to_le32(data[i]); - for (i = 0; i < digsize / 4; i++) - result[i] = cpu_to_le32(creq->state[i]); - } else { - __be32 *result = (void *)ahashreq->result; + memcpy(ahashreq->result, data, digsize); + } else { + for (i = 0; i < digsize / 4; i++) + creq->state[i] = readl_relaxed(engine->regs + + CESA_IVDIG(i)); + if (creq->last_req) { + /* + * Hardware's MD5 digest is in little endian format, but + * SHA in big endian format + */ + if (creq->algo_le) { + __le32 *result = (void *)ahashreq->result; + + for (i = 0; i < digsize / 4; i++) + result[i] = cpu_to_le32(creq->state[i]); + } else { + __be32 *result = (void 
*)ahashreq->result; - for (i = 0; i < digsize / 4; i++) - result[i] = cpu_to_be32(creq->state[i]); + for (i = 0; i < digsize / 4; i++) + result[i] = cpu_to_be32(creq->state[i]); + } } } @@ -504,6 +533,12 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain, CESA_SA_DESC_CFG_LAST_FRAG, CESA_SA_DESC_CFG_FRAG_MSK); + ret = mv_cesa_dma_add_result_op(chain, + CESA_SA_CFG_SRAM_OFFSET, + CESA_SA_DATA_SRAM_OFFSET, + CESA_TDMA_SRC_IN_SRAM, flags); + if (ret) + return ERR_PTR(-ENOMEM); return op; } @@ -564,6 +599,8 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req) struct mv_cesa_op_ctx *op = NULL; unsigned int frag_len; int ret; + u32 type; + bool break_chain = true; basereq->chain.first = NULL; basereq->chain.last = NULL; @@ -635,6 +672,16 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req) goto err_free_tdma; } + /* + * If results are copied via DMA, this means that this + * request can be directly processed by the engine, + * without partial updates. So we can chain it at the + * DMA level with other requests. + */ + type = basereq->chain.last->flags & CESA_TDMA_TYPE_MSK; + if (type == CESA_TDMA_RESULT) + break_chain = false; + if (op) { /* Add dummy desc to wait for crypto operation end */ ret = mv_cesa_dma_add_dummy_end(&basereq->chain, flags); @@ -648,8 +695,10 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req) else creq->cache_ptr = 0; - basereq->chain.last->flags |= (CESA_TDMA_END_OF_REQ | - CESA_TDMA_BREAK_CHAIN); + basereq->chain.last->flags |= CESA_TDMA_END_OF_REQ; + + if (break_chain) + basereq->chain.last->flags |= CESA_TDMA_BREAK_CHAIN; return 0; -- 2.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 2/2] crypto: marvell - Don't break chain for computable last ahash requests
  2016-10-04 12:57 ` Romain Perier
@ 2016-10-04 14:14 ` Boris Brezillon
  -1 siblings, 0 replies; 10+ messages in thread
From: Boris Brezillon @ 2016-10-04 14:14 UTC (permalink / raw)
To: Romain Perier
Cc: Arnaud Ebalard, David S. Miller, Herbert Xu, Thomas Petazzoni,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory Clement,
	Nadav Haklai, Ofer Heifetz, linux-crypto, linux-arm-kernel

On Tue, 4 Oct 2016 14:57:20 +0200
Romain Perier <romain.perier@free-electrons.com> wrote:

> Currently, the driver breaks chain for all kind of hash requests in order to
> don't override intermediate states of partial ahash updates. However, some final
> ahash requests can be directly processed by the engine, and so without
> intermediate state. This is typically the case for most for the HMAC requests
> processed via IPSec.
> 
> This commits adds a TDMA descriptor to copy context for these of requests
> into the "op" dma pool, then it allow to chain these requests at the DMA level.
> The 'complete' operation is also updated to retrieve the MAC digest from the
> right location.
> 
> Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
> ---
> 
> Changes in v3:
>  - Copy the whole context back to RAM and not just the digest. Also
>    fixed a rebase issue ^^ (whoops)
> 
> Changes in v2:
>  - Replaced BUG_ON by an error
>  - Add a variable "break_chain", with "type" to break the chain
>    with ahash requests. It improves code readability.
> 
>  drivers/crypto/marvell/hash.c | 79 +++++++++++++++++++++++++++++++++++--------
>  1 file changed, 64 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> index 9f28468..b36f196 100644
> --- a/drivers/crypto/marvell/hash.c
> +++ b/drivers/crypto/marvell/hash.c
> @@ -312,24 +312,53 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req)
>  	int i;
>  
>  	digsize = crypto_ahash_digestsize(crypto_ahash_reqtfm(ahashreq));
> -	for (i = 0; i < digsize / 4; i++)
> -		creq->state[i] = readl_relaxed(engine->regs + CESA_IVDIG(i));
>  
> -	if (creq->last_req) {
> +	if (mv_cesa_req_get_type(&creq->base) == CESA_DMA_REQ &&
> +	    !(creq->base.chain.last->flags & CESA_TDMA_BREAK_CHAIN)) {
> +		struct mv_cesa_tdma_desc *tdma = NULL;
> +		__le32 *data = NULL;
> +
> +		for (tdma = creq->base.chain.first; tdma; tdma = tdma->next) {
> +			u32 type = tdma->flags & CESA_TDMA_TYPE_MSK;
> +			if (type == CESA_TDMA_RESULT)
> +				break;
> +		}

You should be able to drop the DUMMY desc at the end of the chain and
replace it by the RESULT desc. This way, you won't have to iterate over
the chain to find the TDMA_RESULT element: it should always be the last
desc in the chain.

> +
> +		if (!tdma) {
> +			dev_err(cesa_dev->dev, "Failed to retrieve tdma "
> +				"descriptor for outer data\n");
> +			return;
> +		}
> +
>  		/*
> -		 * Hardware's MD5 digest is in little endian format, but
> -		 * SHA in big endian format
> +		 * Result is already in the correct endianess when the SA is
> +		 * used
>  		 */
> -		if (creq->algo_le) {
> -			__le32 *result = (void *)ahashreq->result;
> +		data = tdma->op->ctx.hash.hash;
> +		for (i = 0; i < digsize / 4; i++)
> +			creq->state[i] = cpu_to_le32(data[i]);
>  
> -			for (i = 0; i < digsize / 4; i++)
> -				result[i] = cpu_to_le32(creq->state[i]);
> -		} else {
> -			__be32 *result = (void *)ahashreq->result;
> +		memcpy(ahashreq->result, data, digsize);
> +	} else {
> +		for (i = 0; i < digsize / 4; i++)
> +			creq->state[i] = readl_relaxed(engine->regs +
> +						       CESA_IVDIG(i));
> +		if (creq->last_req) {
> +			/*
> +			 * Hardware's MD5 digest is in little endian format, but
> +			 * SHA in big endian format
> +			 */
> +			if (creq->algo_le) {
> +				__le32 *result = (void *)ahashreq->result;
> +
> +				for (i = 0; i < digsize / 4; i++)
> +					result[i] = cpu_to_le32(creq->state[i]);
> +			} else {
> +				__be32 *result = (void *)ahashreq->result;
>  
> -			for (i = 0; i < digsize / 4; i++)
> -				result[i] = cpu_to_be32(creq->state[i]);
> +				for (i = 0; i < digsize / 4; i++)
> +					result[i] = cpu_to_be32(creq->state[i]);
> +			}
>  		}
>  	}
>  
> @@ -504,6 +533,12 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
>  					  CESA_SA_DESC_CFG_LAST_FRAG,
>  					  CESA_SA_DESC_CFG_FRAG_MSK);
>  
> +	ret = mv_cesa_dma_add_result_op(chain,
> +					CESA_SA_CFG_SRAM_OFFSET,
> +					CESA_SA_DATA_SRAM_OFFSET,
> +					CESA_TDMA_SRC_IN_SRAM, flags);
> +	if (ret)
> +		return ERR_PTR(-ENOMEM);
>  	return op;
>  }
>  
> @@ -564,6 +599,8 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
>  	struct mv_cesa_op_ctx *op = NULL;
>  	unsigned int frag_len;
>  	int ret;
> +	u32 type;
> +	bool break_chain = true;
>  
>  	basereq->chain.first = NULL;
>  	basereq->chain.last = NULL;
> @@ -635,6 +672,16 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
>  		goto err_free_tdma;
>  	}
>  
> +	/*
> +	 * If results are copied via DMA, this means that this
> +	 * request can be directly processed by the engine,
> +	 * without partial updates. So we can chain it at the
> +	 * DMA level with other requests.
> +	 */
> +	type = basereq->chain.last->flags & CESA_TDMA_TYPE_MSK;
> +	if (type == CESA_TDMA_RESULT)
> +		break_chain = false;
> +
>  	if (op) {
>  		/* Add dummy desc to wait for crypto operation end */
>  		ret = mv_cesa_dma_add_dummy_end(&basereq->chain, flags);
> @@ -648,8 +695,10 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
>  	else
>  		creq->cache_ptr = 0;
>  
> -	basereq->chain.last->flags |= (CESA_TDMA_END_OF_REQ |
> -				       CESA_TDMA_BREAK_CHAIN);
> +	basereq->chain.last->flags |= CESA_TDMA_END_OF_REQ;
> +
> +	if (break_chain)
> +		basereq->chain.last->flags |= CESA_TDMA_BREAK_CHAIN;

Not sure this break_chain variable is really needed. You can directly
test the type of the last element in the TDMA chain here, and if it's
!= CESA_TDMA_RESULT, pass the CESA_TDMA_BREAK_CHAIN flag.

> 
>  	return 0;
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread
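Boris's second remark amounts to checking the type of the last TDMA descriptor at finalization time instead of carrying a break_chain boolean through the function. A minimal standalone sketch of that shape is below; the flag values and the struct layout are invented stand-ins for illustration, not the driver's actual CESA_TDMA_* definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's TDMA flag bits. */
#define TDMA_TYPE_MSK		0x7u
#define TDMA_RESULT		0x6u
#define TDMA_END_OF_REQ		(1u << 7)
#define TDMA_BREAK_CHAIN	(1u << 8)

struct tdma_desc {
	uint32_t flags;
	struct tdma_desc *next;
};

/*
 * Finalize a request's TDMA chain. A last descriptor of type RESULT
 * means the engine copied the outer result via DMA, so the request
 * carries no partial-update state and may be chained with the next
 * request; any other type forces a chain break to protect the
 * intermediate state, with no separate break_chain flag to track.
 */
static void finalize_chain(struct tdma_desc *last)
{
	last->flags |= TDMA_END_OF_REQ;
	if ((last->flags & TDMA_TYPE_MSK) != TDMA_RESULT)
		last->flags |= TDMA_BREAK_CHAIN;
}
```

The decision then lives in one place, next to the flag it controls, which is the readability gain the review is after.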
end of thread, other threads:[~2016-10-04 14:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-04 12:57 [PATCH v3 0/2] Improve DMA chaining for ahash requests Romain Perier
2016-10-04 12:57 ` Romain Perier
2016-10-04 12:57 ` [PATCH v3 1/2] crypto: marvell - Use an unique pool to copy results of requests Romain Perier
2016-10-04 12:57   ` Romain Perier
2016-10-04 13:17   ` Boris Brezillon
2016-10-04 13:17     ` Boris Brezillon
2016-10-04 12:57 ` [PATCH v3 2/2] crypto: marvell - Don't break chain for computable last ahash requests Romain Perier
2016-10-04 12:57   ` Romain Perier
2016-10-04 14:14   ` Boris Brezillon
2016-10-04 14:14     ` Boris Brezillon