Backlog support for CAAM?

Message ID 839258138.49105.1564003328543.JavaMail.zimbra@nod.at
State New

Commit Message

Richard Weinberger July 24, 2019, 9:22 p.m. UTC
Hi!

Recently I had the pleasure of debugging a lockup on an imx6-based platform.
It turned out that the lockup was caused by the CAAM driver because it
just returns -EBUSY upon a full job ring.

Then I found commits:
0618764cb25f ("dm crypt: fix deadlock when async crypto algorithm returns -EBUSY")
c0403ec0bb5a ("Revert "dm crypt: fix deadlock when async crypto algorithm returns -EBUSY"")

Is there a reason why the driver still has no proper backlog support?

If it is just a matter of -ENOPATCH, I have some cycles left and can help.
But before working on this topic I'd like to figure out what the current state
or plans are. :-)

So far I work around the issue with disgusting hacks like this one:


Thanks,
//richard

Comments

Horia Geanta July 25, 2019, 5:57 a.m. UTC | #1
On 7/25/2019 12:22 AM, Richard Weinberger wrote:
> Hi!
> 
> Recently I had the pleasure of debugging a lockup on an imx6-based platform.
> It turned out that the lockup was caused by the CAAM driver because it
> just returns -EBUSY upon a full job ring.
> 
> Then I found commits:
> 0618764cb25f ("dm crypt: fix deadlock when async crypto algorithm returns -EBUSY")
> c0403ec0bb5a ("Revert "dm crypt: fix deadlock when async crypto algorithm returns -EBUSY"")
> 
Truly sorry for the inconvenience.
Indeed this is a caam driver issue, and not a dm-crypt one.

> Is there a reason why the driver still has no proper backlog support?
> 
Our patches were rejected a few times, or the implementation had performance issues:
v1: https://patchwork.kernel.org/patch/7144701
v2: https://patchwork.kernel.org/patch/7199241
v3: https://patchwork.kernel.org/patch/7221941
v4: https://patchwork.kernel.org/patch/7230241
v5: https://patchwork.kernel.org/patch/9033121

and we haven't been persistent enough.

> If it is just a matter of -ENOPATCH, I have some cycles left and can help.
> But before working on this topic I'd like to figure out what the current state
> or plans are. :-)
> 
Right now we're evaluating two options:
-reworking v5 above
-using crypto engine (crypto/crypto_engine.c)

Ideally crypto engine should be the way to go.
However we need to make sure performance degradation is negligible,
which unfortunately is not the case.

Currently it seems that crypto engine has an issue with sending
multiple crypto requests from (SW) engine queue -> (HW) caam queue.

More exactly, crypto_pump_requests() performs this check:
        /* Make sure we are not already running a request */
        if (engine->cur_req)
                goto out;

thus it's not possible to add more crypto requests to the caam queue
until HW finishes the work on the current crypto request and
calls crypto_finalize_request():
        if (finalize_cur_req) {
		[...]
                engine->cur_req = NULL;
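
To illustrate the effect of that gate, here is a minimal userspace model (not kernel code). The names toy_engine, toy_pump_requests, toy_finalize_request, toy_run and HW_DEPTH are hypothetical and only mimic the cur_req check quoted above; the real crypto engine does considerably more. Under these assumptions, hardware occupancy never exceeds one request, regardless of how deep the hardware ring is.

```c
#include <assert.h>

#define HW_DEPTH 4	/* hypothetical hardware ring depth */

struct toy_engine {
	int sw_pending;		/* requests waiting in the software queue */
	int hw_inflight;	/* requests currently handed to hardware */
	int hw_peak;		/* highest hw_inflight observed so far */
	int cur_req;		/* non-zero while a request is being processed */
};

/* Mimics the quoted gate: do nothing while a request is still running. */
static void toy_pump_requests(struct toy_engine *e)
{
	if (e->cur_req)
		return;
	if (e->sw_pending == 0)
		return;

	e->sw_pending--;
	e->cur_req = 1;
	e->hw_inflight++;
	assert(e->hw_inflight <= HW_DEPTH);
	if (e->hw_inflight > e->hw_peak)
		e->hw_peak = e->hw_inflight;
}

/* Mimics the finalize step: clear cur_req, then pump the next request. */
static void toy_finalize_request(struct toy_engine *e)
{
	e->hw_inflight--;
	e->cur_req = 0;
	toy_pump_requests(e);
}

/* Queue n requests, run them to completion, return peak HW occupancy. */
static int toy_run(int n)
{
	struct toy_engine e = { .sw_pending = n };

	toy_pump_requests(&e);
	while (e.hw_inflight > 0)
		toy_finalize_request(&e);
	return e.hw_peak;
}
```

In this model toy_run() reports a peak of one in-flight request for any non-empty workload, so the remaining HW_DEPTH - 1 slots stay idle.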

Horia
Richard Weinberger July 25, 2019, 8:12 a.m. UTC | #2
----- Original Message -----
> From: "horia geanta" <horia.geanta@nxp.com>
> To: "richard" <richard@nod.at>, "Linux Crypto Mailing List" <linux-crypto@vger.kernel.org>, "linux-kernel"
> <linux-kernel@vger.kernel.org>
> CC: "aymen sghaier" <aymen.sghaier@nxp.com>, "david" <david@sigma-star.at>, "Baolin Wang" <baolin.wang@linaro.org>
> Sent: Thursday, July 25, 2019 07:57:28
> Subject: Re: Backlog support for CAAM?

> On 7/25/2019 12:22 AM, Richard Weinberger wrote:
>> Hi!
>> 
>> Recently I had the pleasure of debugging a lockup on an imx6-based platform.
>> It turned out that the lockup was caused by the CAAM driver because it
>> just returns -EBUSY upon a full job ring.
>> 
>> Then I found commits:
>> 0618764cb25f ("dm crypt: fix deadlock when async crypto algorithm returns
>> -EBUSY")
>> c0403ec0bb5a ("Revert "dm crypt: fix deadlock when async crypto algorithm
>> returns -EBUSY"")
>> 
> Truly sorry for the inconvenience.

No need to worry. Nobody got hurt. :-)

> Indeed this is a caam driver issue, and not a dm-crypt one.
> 
>> Is there a reason why the driver still has no proper backlog support?
>> 
> Our patches were rejected a few times, or the implementation had performance issues:
> v1: https://patchwork.kernel.org/patch/7144701
> v2: https://patchwork.kernel.org/patch/7199241
> v3: https://patchwork.kernel.org/patch/7221941
> v4: https://patchwork.kernel.org/patch/7230241
> v5: https://patchwork.kernel.org/patch/9033121
> 
> and we haven't been persistent enough.
> 
>> If it is just a matter of -ENOPATCH, I have some cycles left and can help.
>> But before working on this topic I'd like to figure out what the current state
>> or plans are. :-)
>> 
> Right now we're evaluating two options:
> -reworking v5 above
> -using crypto engine (crypto/crypto_engine.c)

I'll look into that to get a better understanding.

> Ideally crypto engine should be the way to go.
> However we need to make sure performance degradation is negligible,
> which unfortunately is not the case.
> 
> Currently it seems that crypto engine has an issue with sending
> multiple crypto requests from (SW) engine queue -> (HW) caam queue.
> 
> More exactly, crypto_pump_requests() performs this check:
>        /* Make sure we are not already running a request */
>        if (engine->cur_req)
>                goto out;
> 
> thus it's not possible to add more crypto requests to the caam queue
> until HW finishes the work on the current crypto request and
> calls crypto_finalize_request():
>        if (finalize_cur_req) {
>		[...]
>                engine->cur_req = NULL;

Let me also dig into this.
Thanks for all the pointers!

Thanks,
//richard
Richard Weinberger July 28, 2019, 8:50 p.m. UTC | #3
----- Original Message -----
> Right now we're evaluating two options:
> -reworking v5 above
> -using crypto engine (crypto/crypto_engine.c)
> 
> Ideally crypto engine should be the way to go.
> However we need to make sure performance degradation is negligible,
> which unfortunately is not the case.
> 
> Currently it seems that crypto engine has an issue with sending
> multiple crypto requests from (SW) engine queue -> (HW) caam queue.
> 
> More exactly, crypto_pump_requests() performs this check:
>        /* Make sure we are not already running a request */
>        if (engine->cur_req)
>                goto out;
> 
> thus it's not possible to add more crypto requests to the caam queue
> until HW finishes the work on the current crypto request and
> calls crypto_finalize_request():
>        if (finalize_cur_req) {
>		[...]
>                engine->cur_req = NULL;

Did you consider using a hybrid approach?

Please let me sketch my idea:

- Let's have a worker thread which serves a software queue.
- The software queue is a linked list of requests.
- Upon job submission the driver checks whether the software queue is empty.
- If the software queue is empty the regular submission continues.
- If the hardware queue is full at this point, the request is put on the software
  queue and we return -EBUSY.
- If upon job submission the software queue is not empty, the new job is also put
  on the software queue.
- The worker thread is woken up every time a new job is put on the software
  queue and every time CAAM processed a job.

That way we can keep the fast path fast. If the hardware queue is not full, the
software queue can be bypassed completely.
Once the software queue has been used, it will drain as soon as jobs are
submitted at a slower rate, and the fast path will be used again.
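
The scheme above can be sketched in plain userspace C. This is only a model under stated assumptions: struct job, struct hybrid_queue, hybrid_submit(), hybrid_worker(), hybrid_complete() and HW_DEPTH are hypothetical names, the hardware ring is reduced to a counter, the worker thread is a plain function call, and locking is omitted entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define HW_DEPTH 2	/* hypothetical job ring depth, tiny for illustration */

struct job {
	int id;
	struct job *next;	/* link for the software queue */
};

struct hybrid_queue {
	int hw_used;		/* jobs currently in the hardware ring */
	struct job *sw_head;	/* software backlog, FIFO order */
	struct job *sw_tail;
	int sw_len;
};

static void sw_push(struct hybrid_queue *q, struct job *j)
{
	j->next = NULL;
	if (q->sw_tail)
		q->sw_tail->next = j;
	else
		q->sw_head = j;
	q->sw_tail = j;
	q->sw_len++;
}

static struct job *sw_pop(struct hybrid_queue *q)
{
	struct job *j = q->sw_head;

	if (!j)
		return NULL;
	q->sw_head = j->next;
	if (!q->sw_head)
		q->sw_tail = NULL;
	q->sw_len--;
	return j;
}

/*
 * Fast path: the software queue is empty and the hardware has room, so the
 * job goes straight to the hardware. Otherwise the job is backlogged and
 * -EBUSY tells the caller to wait for a later notification.
 */
static int hybrid_submit(struct hybrid_queue *q, struct job *j)
{
	if (!q->sw_head && q->hw_used < HW_DEPTH) {
		q->hw_used++;
		return 0;
	}
	sw_push(q, j);
	return -EBUSY;
}

/* Stand-in for the worker thread: move backlog into free hardware slots. */
static void hybrid_worker(struct hybrid_queue *q)
{
	while (q->sw_head && q->hw_used < HW_DEPTH) {
		sw_pop(q);
		q->hw_used++;
	}
}

/* Called when the hardware finishes one job; this wakes the worker. */
static void hybrid_complete(struct hybrid_queue *q)
{
	assert(q->hw_used > 0);
	q->hw_used--;
	hybrid_worker(q);
}
```

Because hybrid_submit() refuses the fast path while the backlog is non-empty, jobs keep their submission order, and the fast path resumes by itself once the backlog has drained.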

What do you think?

Thanks,
//richard
Horia Geanta July 30, 2019, 10:28 a.m. UTC | #4
On 7/28/2019 11:50 PM, Richard Weinberger wrote:
> ----- Original Message -----
>> Right now we're evaluating two options:
>> -reworking v5 above
>> -using crypto engine (crypto/crypto_engine.c)
>>
>> Ideally crypto engine should be the way to go.
>> However we need to make sure performance degradation is negligible,
>> which unfortunately is not the case.
>>
>> Currently it seems that crypto engine has an issue with sending
>> multiple crypto requests from (SW) engine queue -> (HW) caam queue.
>>
>> More exactly, crypto_pump_requests() performs this check:
>>        /* Make sure we are not already running a request */
>>        if (engine->cur_req)
>>                goto out;
>>
>> thus it's not possible to add more crypto requests to the caam queue
>> until HW finishes the work on the current crypto request and
>> calls crypto_finalize_request():
>>        if (finalize_cur_req) {
>> 		[...]
>>                engine->cur_req = NULL;
> 
> Did you consider using a hybrid approach?
> 
Yes, this is on our plate, though we haven't tried it yet.

> Please let me sketch my idea:
> 
> - Let's have a worker thread which serves a software queue.
> - The software queue is a linked list of requests.
> - Upon job submission the driver checks whether the software queue is empty.
> - If the software queue is empty the regular submission continues.
> - If the hardware queue is full at this point, the request is put on the software
>   queue and we return -EBUSY.
> - If upon job submission the software queue is not empty, the new job is also put
>   on the software queue.
> - The worker thread is woken up every time a new job is put on the software
>   queue and every time CAAM processed a job.
> 
> That way we can keep the fast path fast. If the hardware queue is not full, the
> software queue can be bypassed completely.
> Once the software queue has been used, it will drain as soon as jobs are
> submitted at a slower rate, and the fast path will be used again.
> 
> What do you think?
> 
The optimization mentioned above, bypassing the SW queue (i.e. trying to enqueue
to the HW queue if the SW queue is empty), should probably be added to the crypto
engine implementation itself, e.g. in crypto_transfer_request().

Thanks,
Horia

Patch

--- a/drivers/crypto/caam/jr.c
+++ b/drivers/crypto/caam/jr.c
@@ -339,6 +339,7 @@  int caam_jr_enqueue(struct device *dev, u32 *desc,
                return -EIO;
        }
 
+again:
        spin_lock_bh(&jrp->inplock);
 
        head = jrp->head;
@@ -347,8 +348,8 @@  int caam_jr_enqueue(struct device *dev, u32 *desc,
        if (!rd_reg32(&jrp->rregs->inpring_avail) ||
            CIRC_SPACE(head, tail, JOBR_DEPTH) <= 0) {
                spin_unlock_bh(&jrp->inplock);
-               dma_unmap_single(dev, desc_dma, desc_size, DMA_TO_DEVICE);
-               return -EBUSY;
+               msleep(100);
+               goto again;
        }
 
        head_entry = &jrp->entinfo[head];