linux-crypto.vger.kernel.org archive mirror
* [PATCH] crypto: caam/jr - optimize job ring enqueue and dequeue operations
@ 2019-03-05  7:00 Vakul Garg
  2019-03-05 16:43 ` Horia Geanta
  0 siblings, 1 reply; 4+ messages in thread
From: Vakul Garg @ 2019-03-05  7:00 UTC
  To: linux-crypto; +Cc: Horia Geanta, Aymen Sghaier, herbert, davem, Vakul Garg

Instead of reading the job ring's occupancy registers for every request
enqueued and every response dequeued, read these registers once and store
their values in memory. After each job enqueue/dequeue, decrement the
stored value. When a stored value reaches zero, refresh it from the
corresponding occupancy register. This eliminates the need for an
expensive device register read on every job enqueued and dequeued, and
hence makes caam_jr_enqueue() and caam_jr_dequeue() faster.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
---
 drivers/crypto/caam/intern.h |  1 +
 drivers/crypto/caam/jr.c     | 12 ++++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/caam/intern.h b/drivers/crypto/caam/intern.h
index 5869ad58d497..b6d96e2ecf4c 100644
--- a/drivers/crypto/caam/intern.h
+++ b/drivers/crypto/caam/intern.h
@@ -59,6 +59,7 @@ struct caam_drv_private_jr {
 	int out_ring_read_index;	/* Output index "tail" */
 	int tail;			/* entinfo (s/w ring) tail index */
 	struct jr_outentry *outring;	/* Base of output ring, DMA-safe */
+	u32 inpring_avail;		/* Number of free entries in i/p ring*/
 };
 
 /*
diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c
index d50085a03597..b9caa95755d1 100644
--- a/drivers/crypto/caam/jr.c
+++ b/drivers/crypto/caam/jr.c
@@ -170,8 +170,10 @@ static void caam_jr_dequeue(unsigned long devarg)
 	void (*usercall)(struct device *dev, u32 *desc, u32 status, void *arg);
 	u32 *userdesc, userstatus;
 	void *userarg;
+	u32 outring_used = 0;
 
-	while (rd_reg32(&jrp->rregs->outring_used)) {
+	while (outring_used ||
+	       (outring_used = rd_reg32(&jrp->rregs->outring_used))) {
 
 		head = READ_ONCE(jrp->head);
 
@@ -236,6 +238,7 @@ static void caam_jr_dequeue(unsigned long devarg)
 
 		/* Finally, execute user's callback */
 		usercall(dev, userdesc, userstatus, userarg);
+		outring_used--;
 	}
 
 	/* reenable / unmask IRQs */
@@ -345,7 +348,7 @@ int caam_jr_enqueue(struct device *dev, u32 *desc,
 	head = jrp->head;
 	tail = READ_ONCE(jrp->tail);
 
-	if (!rd_reg32(&jrp->rregs->inpring_avail) ||
+	if (!jrp->inpring_avail ||
 	    CIRC_SPACE(head, tail, JOBR_DEPTH) <= 0) {
 		spin_unlock_bh(&jrp->inplock);
 		dma_unmap_single(dev, desc_dma, desc_size, DMA_TO_DEVICE);
@@ -380,6 +383,10 @@ int caam_jr_enqueue(struct device *dev, u32 *desc,
 
 	wr_reg32(&jrp->rregs->inpring_jobadd, 1);
 
+	jrp->inpring_avail--;
+	if (!jrp->inpring_avail)
+		jrp->inpring_avail = rd_reg32(&jrp->rregs->inpring_avail);
+
 	spin_unlock_bh(&jrp->inplock);
 
 	return 0;
@@ -442,6 +449,7 @@ static int caam_jr_init(struct device *dev)
 	wr_reg32(&jrp->rregs->outring_size, JOBR_DEPTH);
 
 	jrp->ringsize = JOBR_DEPTH;
+	jrp->inpring_avail = JOBR_DEPTH;
 
 	spin_lock_init(&jrp->inplock);
 	spin_lock_init(&jrp->outlock);
-- 
2.13.6
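
The snapshot-and-decrement scheme described in the commit message can be
sketched in user space as follows. This is a minimal simulation, not the
driver code: struct fake_ring, read_reg(), hw_complete() and RING_DEPTH are
hypothetical names, and the hw_* fields merely stand in for the
inpring_avail and outring_used registers so that register reads can be
counted.

```c
#include <assert.h>
#include <stdint.h>

#define RING_DEPTH 8u /* hypothetical; the driver uses JOBR_DEPTH */

/* Simulated job ring. The hw_* fields stand in for device registers. */
struct fake_ring {
	uint32_t hw_avail;     /* models the inpring_avail register */
	uint32_t hw_used;      /* models the outring_used register */
	uint32_t reg_reads;    /* counts simulated (slow) register reads */
	uint32_t cached_avail; /* software snapshot of hw_avail */
};

/* Every "register" read goes through here so that it can be counted. */
static uint32_t read_reg(struct fake_ring *r, const uint32_t *reg)
{
	r->reg_reads++;
	return *reg;
}

static void ring_init(struct fake_ring *r)
{
	r->hw_avail = RING_DEPTH;
	r->hw_used = 0;
	r->reg_reads = 0;
	r->cached_avail = RING_DEPTH; /* ring starts empty: no read needed */
}

/* Enqueue one job; returns 0 on success, -1 if the snapshot says "full". */
static int ring_enqueue(struct fake_ring *r)
{
	if (!r->cached_avail)
		return -1;

	assert(r->hw_avail > 0);
	r->hw_avail--; /* the device now holds one more input entry */

	/* Consume one unit of the snapshot; refresh only when it hits zero. */
	r->cached_avail--;
	if (!r->cached_avail)
		r->cached_avail = read_reg(r, &r->hw_avail);
	return 0;
}

/* Simulated hardware: finish one job, moving it to the output ring. */
static void hw_complete(struct fake_ring *r)
{
	assert(r->hw_avail < RING_DEPTH);
	r->hw_avail++;
	r->hw_used++;
}

/* Drain completions; returns the number of jobs processed. */
static uint32_t ring_drain(struct fake_ring *r)
{
	uint32_t used = 0, done = 0;

	/* Re-read the register only when the local snapshot is used up. */
	while (used || (used = read_reg(r, &r->hw_used))) {
		r->hw_used--; /* models telling the device an entry was removed */
		used--;
		done++;
	}
	return done;
}
```

In this model, enqueueing five jobs, completing them with hw_complete() and
then calling ring_drain() costs two simulated register reads, both on the
dequeue side. Reading a register for every enqueue and for every loop
condition check would cost eleven in the same scenario.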



* Re: [PATCH] crypto: caam/jr - optimize job ring enqueue and dequeue operations
  2019-03-05  7:00 [PATCH] crypto: caam/jr - optimize job ring enqueue and dequeue operations Vakul Garg
@ 2019-03-05 16:43 ` Horia Geanta
  2019-03-05 17:35   ` Vakul Garg
  0 siblings, 1 reply; 4+ messages in thread
From: Horia Geanta @ 2019-03-05 16:43 UTC
  To: Vakul Garg, linux-crypto; +Cc: Aymen Sghaier, herbert, davem

On 3/5/2019 9:00 AM, Vakul Garg wrote:
> Instead of reading the job ring's occupancy registers for every request
> enqueued and every response dequeued, read these registers once and store
> their values in memory. After each job enqueue/dequeue, decrement the
> stored value. When a stored value reaches zero, refresh it from the
> corresponding occupancy register. This eliminates the need for an
> expensive device register read on every job enqueued and dequeued, and
> hence makes caam_jr_enqueue() and caam_jr_dequeue() faster.
> 
How expensive?
Please share the case you benchmarked and performance improvement you noticed.

Somewhat related: it seems that after commit a0ca6ca022ac ("crypto: caam - one
tasklet per job ring") the "outlock" spinlock could be removed, this being a
good candidate for further improvement.

> Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
> ---
>  drivers/crypto/caam/intern.h |  1 +
>  drivers/crypto/caam/jr.c     | 12 ++++++++++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/crypto/caam/intern.h b/drivers/crypto/caam/intern.h
> index 5869ad58d497..b6d96e2ecf4c 100644
> --- a/drivers/crypto/caam/intern.h
> +++ b/drivers/crypto/caam/intern.h
> @@ -59,6 +59,7 @@ struct caam_drv_private_jr {
>  	int out_ring_read_index;	/* Output index "tail" */
>  	int tail;			/* entinfo (s/w ring) tail index */
>  	struct jr_outentry *outring;	/* Base of output ring, DMA-safe */
> +	u32 inpring_avail;		/* Number of free entries in i/p ring*/
Locality: this should be near the other enqueue-related structure members.

Nitpick: use "input" instead of "i/p".

Thanks,
Horia
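
The locality remark above can be illustrated with a small layout sketch.
It assumes a 64-byte cache line and uses C11 alignas in place of the
kernel's own alignment annotations; struct jr_layout_sketch, CACHELINE and
LINE_OF are hypothetical, and field names not visible in the patch
(inp_ring_write_index) are assumptions. The idea is only that a field
touched on the enqueue path sits on the same line as the other enqueue
fields, away from the fields the dequeue path writes.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64 /* assumed line size, for illustration only */

struct jr_layout_sketch {
	/* Enqueue side: fields written while enqueueing a job. */
	alignas(CACHELINE) int inp_ring_write_index;
	int head;
	uint32_t inpring_avail; /* grouped with the other enqueue fields */

	/* Dequeue side: fields written while processing completions. */
	alignas(CACHELINE) int out_ring_read_index;
	int tail;
};

/* Index of the cache line, relative to the struct start, holding a field. */
#define LINE_OF(field) (offsetof(struct jr_layout_sketch, field) / CACHELINE)

static_assert(LINE_OF(inpring_avail) == LINE_OF(head),
	      "inpring_avail shares a line with the enqueue fields");
static_assert(LINE_OF(inpring_avail) != LINE_OF(tail),
	      "inpring_avail stays off the dequeue line");
```

With this grouping, updating inpring_avail during enqueue does not dirty the
line that the dequeue path is working on.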



* RE: [PATCH] crypto: caam/jr - optimize job ring enqueue and dequeue operations
  2019-03-05 16:43 ` Horia Geanta
@ 2019-03-05 17:35   ` Vakul Garg
  2019-03-05 17:55     ` Horia Geanta
  0 siblings, 1 reply; 4+ messages in thread
From: Vakul Garg @ 2019-03-05 17:35 UTC
  To: Horia Geanta, linux-crypto; +Cc: Aymen Sghaier, herbert, davem



> -----Original Message-----
> From: Horia Geanta
> Sent: Tuesday, March 5, 2019 10:14 PM
> To: Vakul Garg <vakul.garg@nxp.com>; linux-crypto@vger.kernel.org
> Cc: Aymen Sghaier <aymen.sghaier@nxp.com>;
> herbert@gondor.apana.org.au; davem@davemloft.net
> Subject: Re: [PATCH] crypto: caam/jr - optimize job ring enqueue and
> dequeue operations
> 
> On 3/5/2019 9:00 AM, Vakul Garg wrote:
> > Instead of reading the job ring's occupancy registers for every
> > request enqueued and every response dequeued, read these registers
> > once and store their values in memory. After each job enqueue/dequeue,
> > decrement the stored value. When a stored value reaches zero, refresh
> > it from the corresponding occupancy register. This eliminates the need
> > for an expensive device register read on every job enqueued and
> > dequeued, and hence makes caam_jr_enqueue() and caam_jr_dequeue() faster.
> >
> How expensive?
> Please share the case you benchmarked and performance improvement you
> noticed.

The performance of kernel ipsec improved by about 6% on ls1028.

> 
> Somewhat related: it seems that after commit a0ca6ca022ac ("crypto: caam
> - one tasklet per job ring") the "outlock" spinlock could be removed, this
> being a good candidate for further improvement.
> 
Yes, I remember discussing it before.
There are other inefficiencies as well.
I will submit patches for them.

> > Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
> > ---
> >  drivers/crypto/caam/intern.h |  1 +
> >  drivers/crypto/caam/jr.c     | 12 ++++++++++--
> >  2 files changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/crypto/caam/intern.h b/drivers/crypto/caam/intern.h
> > index 5869ad58d497..b6d96e2ecf4c 100644
> > --- a/drivers/crypto/caam/intern.h
> > +++ b/drivers/crypto/caam/intern.h
> > @@ -59,6 +59,7 @@ struct caam_drv_private_jr {
> >  	int out_ring_read_index;	/* Output index "tail" */
> >  	int tail;			/* entinfo (s/w ring) tail index */
> >  	struct jr_outentry *outring;	/* Base of output ring, DMA-safe */
> > +	u32 inpring_avail;		/* Number of free entries in i/p ring*/
> Locality: this should be near the other enqueue-related structure members.
> 
> Nitpick: use "input" instead of "i/p".
> 

Sending v2. 

> Thanks,
> Horia



* Re: [PATCH] crypto: caam/jr - optimize job ring enqueue and dequeue operations
  2019-03-05 17:35   ` Vakul Garg
@ 2019-03-05 17:55     ` Horia Geanta
  0 siblings, 0 replies; 4+ messages in thread
From: Horia Geanta @ 2019-03-05 17:55 UTC
  To: Vakul Garg, linux-crypto; +Cc: Aymen Sghaier, herbert, davem

On 3/5/2019 7:35 PM, Vakul Garg wrote:
> The performance of kernel ipsec improved by about 6% on ls1028.
> 
This information should be part of the commit message.

Thanks,
Horia

