linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* Re: CAAM: kernel BUG at drivers/crypto/caam/jr.c:230! (and dma-coherent query)
       [not found] <20210301152231.GC5549@pengutronix.de>
@ 2021-03-03 10:26 ` Horia Geantă
  2021-03-03 12:07   ` Robin Murphy
  2021-03-03 14:56   ` Sascha Hauer
  0 siblings, 2 replies; 6+ messages in thread
From: Horia Geantă @ 2021-03-03 10:26 UTC (permalink / raw)
  To: Sascha Hauer, linux-crypto
  Cc: Aymen Sghaier, kernel, Greg Ungerer, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, iommu, Russell King,
	linux-arm-kernel

Adding some people in the loop, maybe they could help in understanding
why lack of "dma-coherent" property for a HW-coherent device could lead to
unexpected / strange side effects.

On 3/1/2021 5:22 PM, Sascha Hauer wrote:
> Hi All,
> 
> I am on a Layerscape LS1046a using Linux-5.11. The CAAM driver sometimes
> crashes during the run-time self tests with:
> 
>> kernel BUG at drivers/crypto/caam/jr.c:247!
>> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-20210225-3-00039-g434215968816-dirty #12
>> Hardware name: TQ TQMLS1046A SoM on Arkona AT1130 (C300) board (DT)
>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
>> pc : caam_jr_dequeue+0x98/0x57c
>> lr : caam_jr_dequeue+0x98/0x57c
>> sp : ffff800010003d50
>> x29: ffff800010003d50 x28: ffff8000118d4000
>> x27: ffff8000118d4328 x26: 00000000000001f0
>> x25: ffff0008022be480 x24: ffff0008022c6410
>> x23: 00000000000001f1 x22: ffff8000118d4329
>> x21: 0000000000004d80 x20: 00000000000001f1
>> x19: 0000000000000001 x18: 0000000000000020
>> x17: 0000000000000000 x16: 0000000000000015
>> x15: ffff800011690230 x14: 2e2e2e2e2e2e2e2e
>> x13: 2e2e2e2e2e2e2020 x12: 3030303030303030
>> x11: ffff800011700a38 x10: 00000000fffff000
>> x9 : ffff8000100ada30 x8 : ffff8000116a8a38
>> x7 : 0000000000000001 x6 : 0000000000000000
>> x5 : 0000000000000000 x4 : 0000000000000000
>> x3 : 00000000ffffffff x2 : 0000000000000000
>> x1 : 0000000000000000 x0 : 0000000000001800
>> Call trace:
>>  caam_jr_dequeue+0x98/0x57c
>>  tasklet_action_common.constprop.0+0x164/0x18c
>>  tasklet_action+0x44/0x54
>>  __do_softirq+0x160/0x454
>>  __irq_exit_rcu+0x164/0x16c
>>  irq_exit+0x1c/0x30
>>  __handle_domain_irq+0xc0/0x13c
>>  gic_handle_irq+0x5c/0xf0
>>  el1_irq+0xb4/0x180
>>  arch_cpu_idle+0x18/0x30
>>  default_idle_call+0x3c/0x1c0
>>  do_idle+0x23c/0x274
>>  cpu_startup_entry+0x34/0x70
>>  rest_init+0xdc/0xec
>>  arch_call_rest_init+0x1c/0x28
>>  start_kernel+0x4ac/0x4e4
>> Code: 91392021 912c2000 d377d8c6 97f24d96 (d4210000)
> 
> The driver iterates over the descriptors in the output ring and matches them
> with the ones it has previously queued. If it doesn't find a matching
> descriptor it complains with the BUG_ON() seen above. What I see sometimes is
> that the address in the output ring is 0x0, the job status in this case is
> 0x40000006 (meaning DECO Invalid KEY command). It seems that the CAAM doesn't
> write the descriptor address to the output ring at least in some error cases.
> When we don't have the descriptor address of the failed descriptor we have no
> way to find it in the list of queued descriptors, thus we also can't find the
> callback for that descriptor. This looks very unfortunate, anyone else seen
> this or has an idea what to do about it?
> 
> I haven't investigated yet which job actually fails and why. Of course that would
> be my ultimate goal to find that out.
> 
This looks very similar to an earlier report from Greg.
He confirmed that adding "dma-coherent" property to the "crypto" DT node
fixes the issue:
https://lore.kernel.org/linux-crypto/74f664f5-5433-d322-4789-3c78bdb814d8@kernel.org
Patch rebased on v5.11 is at the bottom. Does it work for you too?

What I don't understand (and the reason I've postponed upstreaming it) is
_why_ exactly this patch is working.
I would have expected that a HW-coherent device to work fine even without
the "dma-coherent" DT property in the corresponding node.
I've found what seems related discussions involving eSDHC, but still I am trying
to figure out what's happening. I'd really appreciate a clarification on what
could go wrong (e.g. interactions with SW-based cache management etc.):
https://lore.kernel.org/linux-mmc/20190916171509.GG25745@shell.armlinux.org.uk
https://lore.kernel.org/lkml/20191010083503.250941866@linuxfoundation.org
https://lore.kernel.org/linux-mmc/AM7PR04MB688507B5B4D84EB266738891F8320@AM7PR04MB6885.eurprd04.prod.outlook.com

Thanks,
Horia

-- >8 --

Subject: [PATCH] arm64: dts: ls1046a: mark crypto engine dma coherent

Crypto engine (CAAM) on LS1046A platform has support for HW coherency,
mark accordingly the DT node.

Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
---
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 025e1f587662..6d4db3e021e8 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -325,6 +325,7 @@
                        ranges = <0x0 0x00 0x1700000 0x100000>;
                        reg = <0x00 0x1700000 0x0 0x100000>;
                        interrupts = <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH>;
+                       dma-coherent;

                        sec_jr0: jr@10000 {
                                compatible = "fsl,sec-v5.4-job-ring",

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: CAAM: kernel BUG at drivers/crypto/caam/jr.c:230! (and dma-coherent query)
  2021-03-03 10:26 ` CAAM: kernel BUG at drivers/crypto/caam/jr.c:230! (and dma-coherent query) Horia Geantă
@ 2021-03-03 12:07   ` Robin Murphy
  2021-03-03 16:04     ` Horia Geantă
  2021-03-03 14:56   ` Sascha Hauer
  1 sibling, 1 reply; 6+ messages in thread
From: Robin Murphy @ 2021-03-03 12:07 UTC (permalink / raw)
  To: Horia Geantă, Sascha Hauer, linux-crypto
  Cc: Aymen Sghaier, kernel, Greg Ungerer, Christoph Hellwig,
	Marek Szyprowski, iommu, Russell King, linux-arm-kernel

On 2021-03-03 10:26, Horia Geantă wrote:
> Adding some people in the loop, maybe they could help in understanding
> why lack of "dma-coherent" property for a HW-coherent device could lead to
> unexpected / strange side effects.
> 
> On 3/1/2021 5:22 PM, Sascha Hauer wrote:
>> Hi All,
>>
>> I am on a Layerscape LS1046a using Linux-5.11. The CAAM driver sometimes
>> crashes during the run-time self tests with:
>>
>>> kernel BUG at drivers/crypto/caam/jr.c:247!
>>> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>> Modules linked in:
>>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-20210225-3-00039-g434215968816-dirty #12
>>> Hardware name: TQ TQMLS1046A SoM on Arkona AT1130 (C300) board (DT)
>>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
>>> pc : caam_jr_dequeue+0x98/0x57c
>>> lr : caam_jr_dequeue+0x98/0x57c
>>> sp : ffff800010003d50
>>> x29: ffff800010003d50 x28: ffff8000118d4000
>>> x27: ffff8000118d4328 x26: 00000000000001f0
>>> x25: ffff0008022be480 x24: ffff0008022c6410
>>> x23: 00000000000001f1 x22: ffff8000118d4329
>>> x21: 0000000000004d80 x20: 00000000000001f1
>>> x19: 0000000000000001 x18: 0000000000000020
>>> x17: 0000000000000000 x16: 0000000000000015
>>> x15: ffff800011690230 x14: 2e2e2e2e2e2e2e2e
>>> x13: 2e2e2e2e2e2e2020 x12: 3030303030303030
>>> x11: ffff800011700a38 x10: 00000000fffff000
>>> x9 : ffff8000100ada30 x8 : ffff8000116a8a38
>>> x7 : 0000000000000001 x6 : 0000000000000000
>>> x5 : 0000000000000000 x4 : 0000000000000000
>>> x3 : 00000000ffffffff x2 : 0000000000000000
>>> x1 : 0000000000000000 x0 : 0000000000001800
>>> Call trace:
>>>   caam_jr_dequeue+0x98/0x57c
>>>   tasklet_action_common.constprop.0+0x164/0x18c
>>>   tasklet_action+0x44/0x54
>>>   __do_softirq+0x160/0x454
>>>   __irq_exit_rcu+0x164/0x16c
>>>   irq_exit+0x1c/0x30
>>>   __handle_domain_irq+0xc0/0x13c
>>>   gic_handle_irq+0x5c/0xf0
>>>   el1_irq+0xb4/0x180
>>>   arch_cpu_idle+0x18/0x30
>>>   default_idle_call+0x3c/0x1c0
>>>   do_idle+0x23c/0x274
>>>   cpu_startup_entry+0x34/0x70
>>>   rest_init+0xdc/0xec
>>>   arch_call_rest_init+0x1c/0x28
>>>   start_kernel+0x4ac/0x4e4
>>> Code: 91392021 912c2000 d377d8c6 97f24d96 (d4210000)
>>
>> The driver iterates over the descriptors in the output ring and matches them
>> with the ones it has previously queued. If it doesn't find a matching
>> descriptor it complains with the BUG_ON() seen above. What I see sometimes is
>> that the address in the output ring is 0x0, the job status in this case is
>> 0x40000006 (meaning DECO Invalid KEY command). It seems that the CAAM doesn't
>> write the descriptor address to the output ring at least in some error cases.
>> When we don't have the descriptor address of the failed descriptor we have no
>> way to find it in the list of queued descriptors, thus we also can't find the
>> callback for that descriptor. This looks very unfortunate, anyone else seen
>> this or has an idea what to do about it?
>>
>> I haven't investigated yet which job actually fails and why. Of course that would
>> be my ultimate goal to find that out.
>>
> This looks very similar to an earlier report from Greg.
> He confirmed that adding "dma-coherent" property to the "crypto" DT node
> fixes the issue:
> https://lore.kernel.org/linux-crypto/74f664f5-5433-d322-4789-3c78bdb814d8@kernel.org
> Patch rebased on v5.11 is at the bottom. Does it work for you too?
> 
> What I don't understand (and the reason I've postponed upstreaming it) is
> _why_ exactly this patch is working.
> I would have expected that a HW-coherent device to work fine even without
> the "dma-coherent" DT property in the corresponding node.
> I've found what seems related discussions involving eSDHC, but still I am trying
> to figure out what's happening. I'd really appreciate a clarification on what
> could go wrong (e.g. interactions with SW-based cache management etc.):
> https://lore.kernel.org/linux-mmc/20190916171509.GG25745@shell.armlinux.org.uk
> https://lore.kernel.org/lkml/20191010083503.250941866@linuxfoundation.org
> https://lore.kernel.org/linux-mmc/AM7PR04MB688507B5B4D84EB266738891F8320@AM7PR04MB6885.eurprd04.prod.outlook.com

Consider the flow for a non-coherent DMA_FROM_DEVICE transfer:

1: dma_map_page() - cleans and invalidates caches to prevent any dirty 
lines being written back during the transfer
2: CPU cache may prefetch the buffer back in at any time from now on 
(e.g. if other threads access nearby memory), but that's OK since the 
CPU must not actually access it until after step 4, and clean lines 
don't get written back
3: device writes to buffer - non-coherent so goes straight to DRAM
4: dma_unmap_page() - invalidates caches to discard any clean lines 
speculatively fetched since step 1
5: CPU reads from buffer - fetches new data into cache, all is well

Now consider what can happen if the device is secretly coherent, but the 
DMA API still uses the same non-coherent flow:

1: dma_map_page() - cleans and invalidates caches to prevent any dirty 
lines being written back during the transfer
2: CPU cache *does* happen to prefetch the buffer back in
3: device writes to buffer - write snoop hits in cache so data goes 
there instead of DRAM
4: dma_unmap_page() - invalidates caches, unknowingly destroying new data
5: CPU reads from page - fetches whatever old data was cleaned to DRAM 
in step 1, hilarity ensues.

Note that it still *can* work out OK in the (likely) case that the 
prefetch at step 2 doesn't happen, so in step 3 the snoop doesn't hit 
and the data does end up going to DRAM, or (less likely) the updated 
dirty lines are naturally evicted and written back between steps 3 and 4.

Similarly, if a buffer is mmap'ed to userspace (or remapped for coherent 
DMA) with non-cacheable attributes on the assumption that the device is 
non-coherent - the cacheable alias from the kernel linear map can still 
be present in caches, so coherent device accesses can unexpectedly hit 
that and fail to observe CPU reads and writes going straight to/from 
DRAM via the non-cacheable alias. We hit this case with Panfrost on some 
Amlogic platforms not too long ago.

Hope that helps clarify things.

Robin.

> 
> Thanks,
> Horia
> 
> -- >8 --
> 
> Subject: [PATCH] arm64: dts: ls1046a: mark crypto engine dma coherent
> 
> Crypto engine (CAAM) on LS1046A platform has support for HW coherency,
> mark accordingly the DT node.
> 
> Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
> ---
>   arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> index 025e1f587662..6d4db3e021e8 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> @@ -325,6 +325,7 @@
>                          ranges = <0x0 0x00 0x1700000 0x100000>;
>                          reg = <0x00 0x1700000 0x0 0x100000>;
>                          interrupts = <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH>;
> +                       dma-coherent;
> 
>                          sec_jr0: jr@10000 {
>                                  compatible = "fsl,sec-v5.4-job-ring",
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: CAAM: kernel BUG at drivers/crypto/caam/jr.c:230! (and dma-coherent query)
  2021-03-03 10:26 ` CAAM: kernel BUG at drivers/crypto/caam/jr.c:230! (and dma-coherent query) Horia Geantă
  2021-03-03 12:07   ` Robin Murphy
@ 2021-03-03 14:56   ` Sascha Hauer
  2021-03-03 16:40     ` Horia Geantă
  1 sibling, 1 reply; 6+ messages in thread
From: Sascha Hauer @ 2021-03-03 14:56 UTC (permalink / raw)
  To: Horia Geantă
  Cc: linux-crypto, Aymen Sghaier, Russell King, Christoph Hellwig,
	iommu, kernel, Robin Murphy, Greg Ungerer, linux-arm-kernel,
	Marek Szyprowski

On Wed, Mar 03, 2021 at 12:26:32PM +0200, Horia Geantă wrote:
> Adding some people in the loop, maybe they could help in understanding
> why lack of "dma-coherent" property for a HW-coherent device could lead to
> unexpected / strange side effects.
> 
> On 3/1/2021 5:22 PM, Sascha Hauer wrote:
> > Hi All,
> > 
> > I am on a Layerscape LS1046a using Linux-5.11. The CAAM driver sometimes
> > crashes during the run-time self tests with:
> > 
> >> kernel BUG at drivers/crypto/caam/jr.c:247!
> >> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> >> Modules linked in:
> >> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-20210225-3-00039-g434215968816-dirty #12
> >> Hardware name: TQ TQMLS1046A SoM on Arkona AT1130 (C300) board (DT)
> >> pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> >> pc : caam_jr_dequeue+0x98/0x57c
> >> lr : caam_jr_dequeue+0x98/0x57c
> >> sp : ffff800010003d50
> >> x29: ffff800010003d50 x28: ffff8000118d4000
> >> x27: ffff8000118d4328 x26: 00000000000001f0
> >> x25: ffff0008022be480 x24: ffff0008022c6410
> >> x23: 00000000000001f1 x22: ffff8000118d4329
> >> x21: 0000000000004d80 x20: 00000000000001f1
> >> x19: 0000000000000001 x18: 0000000000000020
> >> x17: 0000000000000000 x16: 0000000000000015
> >> x15: ffff800011690230 x14: 2e2e2e2e2e2e2e2e
> >> x13: 2e2e2e2e2e2e2020 x12: 3030303030303030
> >> x11: ffff800011700a38 x10: 00000000fffff000
> >> x9 : ffff8000100ada30 x8 : ffff8000116a8a38
> >> x7 : 0000000000000001 x6 : 0000000000000000
> >> x5 : 0000000000000000 x4 : 0000000000000000
> >> x3 : 00000000ffffffff x2 : 0000000000000000
> >> x1 : 0000000000000000 x0 : 0000000000001800
> >> Call trace:
> >>  caam_jr_dequeue+0x98/0x57c
> >>  tasklet_action_common.constprop.0+0x164/0x18c
> >>  tasklet_action+0x44/0x54
> >>  __do_softirq+0x160/0x454
> >>  __irq_exit_rcu+0x164/0x16c
> >>  irq_exit+0x1c/0x30
> >>  __handle_domain_irq+0xc0/0x13c
> >>  gic_handle_irq+0x5c/0xf0
> >>  el1_irq+0xb4/0x180
> >>  arch_cpu_idle+0x18/0x30
> >>  default_idle_call+0x3c/0x1c0
> >>  do_idle+0x23c/0x274
> >>  cpu_startup_entry+0x34/0x70
> >>  rest_init+0xdc/0xec
> >>  arch_call_rest_init+0x1c/0x28
> >>  start_kernel+0x4ac/0x4e4
> >> Code: 91392021 912c2000 d377d8c6 97f24d96 (d4210000)
> > 
> > The driver iterates over the descriptors in the output ring and matches them
> > with the ones it has previously queued. If it doesn't find a matching
> > descriptor it complains with the BUG_ON() seen above. What I see sometimes is
> > that the address in the output ring is 0x0, the job status in this case is
> > 0x40000006 (meaning DECO Invalid KEY command). It seems that the CAAM doesn't
> > write the descriptor address to the output ring at least in some error cases.
> > When we don't have the descriptor address of the failed descriptor we have no
> > way to find it in the list of queued descriptors, thus we also can't find the
> > callback for that descriptor. This looks very unfortunate, anyone else seen
> > this or has an idea what to do about it?
> > 
> > I haven't investigated yet which job actually fails and why. Of course that would
> > be my ultimate goal to find that out.
> > 
> This looks very similar to an earlier report from Greg.
> He confirmed that adding "dma-coherent" property to the "crypto" DT node
> fixes the issue:
> https://lore.kernel.org/linux-crypto/74f664f5-5433-d322-4789-3c78bdb814d8@kernel.org
> Patch rebased on v5.11 is at the bottom. Does it work for you too?

Indeed this seems to solve it for me as well, you can add my

Tested-by: Sascha Hauer <s.hauer@pengutronix.de>

However, there seem to be two problems: First that "DECO Invalid KEY
command" actually occurs and second that the deqeueue code currently
can't handle a NULL pointer in the output ring.
Do you think that the occurence of a NULL pointer is also a coherency
issue?

Sascha

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: CAAM: kernel BUG at drivers/crypto/caam/jr.c:230! (and dma-coherent query)
  2021-03-03 12:07   ` Robin Murphy
@ 2021-03-03 16:04     ` Horia Geantă
  0 siblings, 0 replies; 6+ messages in thread
From: Horia Geantă @ 2021-03-03 16:04 UTC (permalink / raw)
  To: Robin Murphy, Sascha Hauer, linux-crypto
  Cc: Aymen Sghaier, kernel, Greg Ungerer, Christoph Hellwig,
	Marek Szyprowski, iommu, Russell King, linux-arm-kernel

On 3/3/2021 2:07 PM, Robin Murphy wrote:
> On 2021-03-03 10:26, Horia Geantă wrote:
>> Adding some people in the loop, maybe they could help in understanding
>> why lack of "dma-coherent" property for a HW-coherent device could lead to
>> unexpected / strange side effects.
>>
>> On 3/1/2021 5:22 PM, Sascha Hauer wrote:
>>> Hi All,
>>>
>>> I am on a Layerscape LS1046a using Linux-5.11. The CAAM driver sometimes
>>> crashes during the run-time self tests with:
>>>
>>>> kernel BUG at drivers/crypto/caam/jr.c:247!
>>>> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>> Modules linked in:
>>>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-20210225-3-00039-g434215968816-dirty #12
>>>> Hardware name: TQ TQMLS1046A SoM on Arkona AT1130 (C300) board (DT)
>>>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
>>>> pc : caam_jr_dequeue+0x98/0x57c
>>>> lr : caam_jr_dequeue+0x98/0x57c
>>>> sp : ffff800010003d50
>>>> x29: ffff800010003d50 x28: ffff8000118d4000
>>>> x27: ffff8000118d4328 x26: 00000000000001f0
>>>> x25: ffff0008022be480 x24: ffff0008022c6410
>>>> x23: 00000000000001f1 x22: ffff8000118d4329
>>>> x21: 0000000000004d80 x20: 00000000000001f1
>>>> x19: 0000000000000001 x18: 0000000000000020
>>>> x17: 0000000000000000 x16: 0000000000000015
>>>> x15: ffff800011690230 x14: 2e2e2e2e2e2e2e2e
>>>> x13: 2e2e2e2e2e2e2020 x12: 3030303030303030
>>>> x11: ffff800011700a38 x10: 00000000fffff000
>>>> x9 : ffff8000100ada30 x8 : ffff8000116a8a38
>>>> x7 : 0000000000000001 x6 : 0000000000000000
>>>> x5 : 0000000000000000 x4 : 0000000000000000
>>>> x3 : 00000000ffffffff x2 : 0000000000000000
>>>> x1 : 0000000000000000 x0 : 0000000000001800
>>>> Call trace:
>>>>   caam_jr_dequeue+0x98/0x57c
>>>>   tasklet_action_common.constprop.0+0x164/0x18c
>>>>   tasklet_action+0x44/0x54
>>>>   __do_softirq+0x160/0x454
>>>>   __irq_exit_rcu+0x164/0x16c
>>>>   irq_exit+0x1c/0x30
>>>>   __handle_domain_irq+0xc0/0x13c
>>>>   gic_handle_irq+0x5c/0xf0
>>>>   el1_irq+0xb4/0x180
>>>>   arch_cpu_idle+0x18/0x30
>>>>   default_idle_call+0x3c/0x1c0
>>>>   do_idle+0x23c/0x274
>>>>   cpu_startup_entry+0x34/0x70
>>>>   rest_init+0xdc/0xec
>>>>   arch_call_rest_init+0x1c/0x28
>>>>   start_kernel+0x4ac/0x4e4
>>>> Code: 91392021 912c2000 d377d8c6 97f24d96 (d4210000)
>>>
>>> The driver iterates over the descriptors in the output ring and matches them
>>> with the ones it has previously queued. If it doesn't find a matching
>>> descriptor it complains with the BUG_ON() seen above. What I see sometimes is
>>> that the address in the output ring is 0x0, the job status in this case is
>>> 0x40000006 (meaning DECO Invalid KEY command). It seems that the CAAM doesn't
>>> write the descriptor address to the output ring at least in some error cases.
>>> When we don't have the descriptor address of the failed descriptor we have no
>>> way to find it in the list of queued descriptors, thus we also can't find the
>>> callback for that descriptor. This looks very unfortunate, anyone else seen
>>> this or has an idea what to do about it?
>>>
>>> I haven't investigated yet which job actually fails and why. Of course that would
>>> be my ultimate goal to find that out.
>>>
>> This looks very similar to an earlier report from Greg.
>> He confirmed that adding "dma-coherent" property to the "crypto" DT node
>> fixes the issue:
>> https://lore.kernel.org/linux-crypto/74f664f5-5433-d322-4789-3c78bdb814d8@kernel.org
>> Patch rebased on v5.11 is at the bottom. Does it work for you too?
>>
>> What I don't understand (and the reason I've postponed upstreaming it) is
>> _why_ exactly this patch is working.
>> I would have expected that a HW-coherent device to work fine even without
>> the "dma-coherent" DT property in the corresponding node.
>> I've found what seems related discussions involving eSDHC, but still I am trying
>> to figure out what's happening. I'd really appreciate a clarification on what
>> could go wrong (e.g. interactions with SW-based cache management etc.):
>> https://lore.kernel.org/linux-mmc/20190916171509.GG25745@shell.armlinux.org.uk
>> https://lore.kernel.org/lkml/20191010083503.250941866@linuxfoundation.org
>> https://lore.kernel.org/linux-mmc/AM7PR04MB688507B5B4D84EB266738891F8320@AM7PR04MB6885.eurprd04.prod.outlook.com
> 
> Consider the flow for a non-coherent DMA_FROM_DEVICE transfer:
> 
> 1: dma_map_page() - cleans and invalidates caches to prevent any dirty 
> lines being written back during the transfer
> 2: CPU cache may prefetch the buffer back in at any time from now on 
> (e.g. if other threads access nearby memory), but that's OK since the 
> CPU must not actually access it until after step 4, and clean lines 
> don't get written back
> 3: device writes to buffer - non-coherent so goes straight to DRAM
> 4: dma_unmap_page() - invalidates caches to discard any clean lines 
> speculatively fetched since step 1
> 5: CPU reads from buffer - fetches new data into cache, all is well
> 
> Now consider what can happen if the device is secretly coherent, but the 
> DMA API still uses the same non-coherent flow:
> 
> 1: dma_map_page() - cleans and invalidates caches to prevent any dirty 
> lines being written back during the transfer
> 2: CPU cache *does* happen to prefetch the buffer back in
> 3: device writes to buffer - write snoop hits in cache so data goes 
> there instead of DRAM
> 4: dma_unmap_page() - invalidates caches, unknowingly destroying new data
> 5: CPU reads from page - fetches whatever old data was cleaned to DRAM 
> in step 1, hilarity ensues.
> 
> Note that it still *can* work out OK in the (likely) case that the 
> prefetch at step 2 doesn't happen, so in step 3 the snoop doesn't hit 
> and the data does end up going to DRAM, or (less likely) the updated 
> dirty lines are naturally evicted and written back between steps 3 and 4.
> 
> Similarly, if a buffer is mmap'ed to userspace (or remapped for coherent 
> DMA) with non-cacheable attributes on the assumption that the device is 
> non-coherent - the cacheable alias from the kernel linear map can still 
> be present in caches, so coherent device accesses can unexpectedly hit 
> that and fail to observe CPU reads and writes going straight to/from 
> DRAM via the non-cacheable alias. We hit this case with Panfrost on some 
> Amlogic platforms not too long ago.
> 
> Hope that helps clarify things.
> 
Thanks Robin.
Indeed this example shows how things can go awry.

Horia

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: CAAM: kernel BUG at drivers/crypto/caam/jr.c:230! (and dma-coherent query)
  2021-03-03 14:56   ` Sascha Hauer
@ 2021-03-03 16:40     ` Horia Geantă
  2021-03-04 17:02       ` Robin Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Horia Geantă @ 2021-03-03 16:40 UTC (permalink / raw)
  To: Sascha Hauer
  Cc: linux-crypto, Aymen Sghaier, Russell King, Christoph Hellwig,
	iommu, kernel, Robin Murphy, Greg Ungerer, linux-arm-kernel,
	Marek Szyprowski

On 3/3/2021 4:57 PM, Sascha Hauer wrote:
> On Wed, Mar 03, 2021 at 12:26:32PM +0200, Horia Geantă wrote:
>> Adding some people in the loop, maybe they could help in understanding
>> why lack of "dma-coherent" property for a HW-coherent device could lead to
>> unexpected / strange side effects.
>>
>> On 3/1/2021 5:22 PM, Sascha Hauer wrote:
>>> Hi All,
>>>
>>> I am on a Layerscape LS1046a using Linux-5.11. The CAAM driver sometimes
>>> crashes during the run-time self tests with:
>>>
>>>> kernel BUG at drivers/crypto/caam/jr.c:247!
>>>> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>> Modules linked in:
>>>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-20210225-3-00039-g434215968816-dirty #12
>>>> Hardware name: TQ TQMLS1046A SoM on Arkona AT1130 (C300) board (DT)
>>>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
>>>> pc : caam_jr_dequeue+0x98/0x57c
>>>> lr : caam_jr_dequeue+0x98/0x57c
>>>> sp : ffff800010003d50
>>>> x29: ffff800010003d50 x28: ffff8000118d4000
>>>> x27: ffff8000118d4328 x26: 00000000000001f0
>>>> x25: ffff0008022be480 x24: ffff0008022c6410
>>>> x23: 00000000000001f1 x22: ffff8000118d4329
>>>> x21: 0000000000004d80 x20: 00000000000001f1
>>>> x19: 0000000000000001 x18: 0000000000000020
>>>> x17: 0000000000000000 x16: 0000000000000015
>>>> x15: ffff800011690230 x14: 2e2e2e2e2e2e2e2e
>>>> x13: 2e2e2e2e2e2e2020 x12: 3030303030303030
>>>> x11: ffff800011700a38 x10: 00000000fffff000
>>>> x9 : ffff8000100ada30 x8 : ffff8000116a8a38
>>>> x7 : 0000000000000001 x6 : 0000000000000000
>>>> x5 : 0000000000000000 x4 : 0000000000000000
>>>> x3 : 00000000ffffffff x2 : 0000000000000000
>>>> x1 : 0000000000000000 x0 : 0000000000001800
>>>> Call trace:
>>>>  caam_jr_dequeue+0x98/0x57c
>>>>  tasklet_action_common.constprop.0+0x164/0x18c
>>>>  tasklet_action+0x44/0x54
>>>>  __do_softirq+0x160/0x454
>>>>  __irq_exit_rcu+0x164/0x16c
>>>>  irq_exit+0x1c/0x30
>>>>  __handle_domain_irq+0xc0/0x13c
>>>>  gic_handle_irq+0x5c/0xf0
>>>>  el1_irq+0xb4/0x180
>>>>  arch_cpu_idle+0x18/0x30
>>>>  default_idle_call+0x3c/0x1c0
>>>>  do_idle+0x23c/0x274
>>>>  cpu_startup_entry+0x34/0x70
>>>>  rest_init+0xdc/0xec
>>>>  arch_call_rest_init+0x1c/0x28
>>>>  start_kernel+0x4ac/0x4e4
>>>> Code: 91392021 912c2000 d377d8c6 97f24d96 (d4210000)
>>>
>>> The driver iterates over the descriptors in the output ring and matches them
>>> with the ones it has previously queued. If it doesn't find a matching
>>> descriptor it complains with the BUG_ON() seen above. What I see sometimes is
>>> that the address in the output ring is 0x0, the job status in this case is
>>> 0x40000006 (meaning DECO Invalid KEY command). It seems that the CAAM doesn't
>>> write the descriptor address to the output ring at least in some error cases.
>>> When we don't have the descriptor address of the failed descriptor we have no
>>> way to find it in the list of queued descriptors, thus we also can't find the
>>> callback for that descriptor. This looks very unfortunate, anyone else seen
>>> this or has an idea what to do about it?
>>>
>>> I haven't investigated yet which job actually fails and why. Of course that would
>>> be my ultimate goal to find that out.
>>>
>> This looks very similar to an earlier report from Greg.
>> He confirmed that adding "dma-coherent" property to the "crypto" DT node
>> fixes the issue:
>> https://lore.kernel.org/linux-crypto/74f664f5-5433-d322-4789-3c78bdb814d8@kernel.org
>> Patch rebased on v5.11 is at the bottom. Does it work for you too?
> 
> Indeed this seems to solve it for me as well, you can add my
> 
> Tested-by: Sascha Hauer <s.hauer@pengutronix.de>
> 
Thanks!
I'll append the tag to the formally submitted patch.

> However, there seem to be two problems: First that "DECO Invalid KEY
> command" actually occurs and second that the deqeueue code currently
> can't handle a NULL pointer in the output ring.
Currently the dequeue code BUGs not only for "NULL pointer", but for any
IOVA in the output ring that is not matched with an entry in the "shadow"
(SW) ring.
Here the BUG_ON() should be replaced with WARN_ON since not finding a match
means driver can't go to the "SW context" and eventually call complete()
to wake up the crypto API user. In many cases the user relies on
crypto_wait_req(), which does not time out and is not killable.

> Do you think that the occurence of a NULL pointer is also a coherency
> issue?
> 
I strongly believe there's a single problem because the issue goes away
when the patch is applied, even though I haven't figured out what is
the exact place / data structure that gets corrupted.

One theory is that corruption occurs in the input ring:
-CPU sets up correctly the input ring entry
-device doesn't see the "fresh" data, reading 0x0 for the descriptor address
-device reads the descriptor commands from address 0x0 and issues
"DECO invalid KEY command" (note that KEY command opcode is b'00000, so reading
all zeros from address 0x0 would lead to this error)

But then the input & output rings are allocated using dma_alloc_coherent(),
so I'll need to check if lack of "dma-coherent" DT property has the same
effect on consistent DMA mappings as on streaming DMA mappings.

Horia

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: CAAM: kernel BUG at drivers/crypto/caam/jr.c:230! (and dma-coherent query)
  2021-03-03 16:40     ` Horia Geantă
@ 2021-03-04 17:02       ` Robin Murphy
  0 siblings, 0 replies; 6+ messages in thread
From: Robin Murphy @ 2021-03-04 17:02 UTC (permalink / raw)
  To: Horia Geantă, Sascha Hauer
  Cc: linux-crypto, Aymen Sghaier, Russell King, Christoph Hellwig,
	iommu, kernel, Greg Ungerer, linux-arm-kernel, Marek Szyprowski

On 2021-03-03 16:40, Horia Geantă wrote:
> On 3/3/2021 4:57 PM, Sascha Hauer wrote:
>> On Wed, Mar 03, 2021 at 12:26:32PM +0200, Horia Geantă wrote:
>>> Adding some people in the loop, maybe they could help in understanding
>>> why lack of "dma-coherent" property for a HW-coherent device could lead to
>>> unexpected / strange side effects.
>>>
>>> On 3/1/2021 5:22 PM, Sascha Hauer wrote:
>>>> Hi All,
>>>>
>>>> I am on a Layerscape LS1046a using Linux-5.11. The CAAM driver sometimes
>>>> crashes during the run-time self tests with:
>>>>
>>>>> kernel BUG at drivers/crypto/caam/jr.c:247!
>>>>> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>>> Modules linked in:
>>>>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-20210225-3-00039-g434215968816-dirty #12
>>>>> Hardware name: TQ TQMLS1046A SoM on Arkona AT1130 (C300) board (DT)
>>>>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
>>>>> pc : caam_jr_dequeue+0x98/0x57c
>>>>> lr : caam_jr_dequeue+0x98/0x57c
>>>>> sp : ffff800010003d50
>>>>> x29: ffff800010003d50 x28: ffff8000118d4000
>>>>> x27: ffff8000118d4328 x26: 00000000000001f0
>>>>> x25: ffff0008022be480 x24: ffff0008022c6410
>>>>> x23: 00000000000001f1 x22: ffff8000118d4329
>>>>> x21: 0000000000004d80 x20: 00000000000001f1
>>>>> x19: 0000000000000001 x18: 0000000000000020
>>>>> x17: 0000000000000000 x16: 0000000000000015
>>>>> x15: ffff800011690230 x14: 2e2e2e2e2e2e2e2e
>>>>> x13: 2e2e2e2e2e2e2020 x12: 3030303030303030
>>>>> x11: ffff800011700a38 x10: 00000000fffff000
>>>>> x9 : ffff8000100ada30 x8 : ffff8000116a8a38
>>>>> x7 : 0000000000000001 x6 : 0000000000000000
>>>>> x5 : 0000000000000000 x4 : 0000000000000000
>>>>> x3 : 00000000ffffffff x2 : 0000000000000000
>>>>> x1 : 0000000000000000 x0 : 0000000000001800
>>>>> Call trace:
>>>>>   caam_jr_dequeue+0x98/0x57c
>>>>>   tasklet_action_common.constprop.0+0x164/0x18c
>>>>>   tasklet_action+0x44/0x54
>>>>>   __do_softirq+0x160/0x454
>>>>>   __irq_exit_rcu+0x164/0x16c
>>>>>   irq_exit+0x1c/0x30
>>>>>   __handle_domain_irq+0xc0/0x13c
>>>>>   gic_handle_irq+0x5c/0xf0
>>>>>   el1_irq+0xb4/0x180
>>>>>   arch_cpu_idle+0x18/0x30
>>>>>   default_idle_call+0x3c/0x1c0
>>>>>   do_idle+0x23c/0x274
>>>>>   cpu_startup_entry+0x34/0x70
>>>>>   rest_init+0xdc/0xec
>>>>>   arch_call_rest_init+0x1c/0x28
>>>>>   start_kernel+0x4ac/0x4e4
>>>>> Code: 91392021 912c2000 d377d8c6 97f24d96 (d4210000)
>>>>
>>>> The driver iterates over the descriptors in the output ring and matches them
>>>> with the ones it has previously queued. If it doesn't find a matching
>>>> descriptor it complains with the BUG_ON() seen above. What I see sometimes is
>>>> that the address in the output ring is 0x0, the job status in this case is
>>>> 0x40000006 (meaning DECO Invalid KEY command). It seems that the CAAM doesn't
>>>> write the descriptor address to the output ring at least in some error cases.
>>>> When we don't have the descriptor address of the failed descriptor we have no
>>>> way to find it in the list of queued descriptors, thus we also can't find the
>>>> callback for that descriptor. This looks very unfortunate, anyone else seen
>>>> this or has an idea what to do about it?
>>>>
>>>> I haven't investigated yet which job actually fails and why. Of course that would
>>>> be my ultimate goal to find that out.
>>>>
>>> This looks very similar to an earlier report from Greg.
>>> He confirmed that adding "dma-coherent" property to the "crypto" DT node
>>> fixes the issue:
>>> https://lore.kernel.org/linux-crypto/74f664f5-5433-d322-4789-3c78bdb814d8@kernel.org
>>> Patch rebased on v5.11 is at the bottom. Does it work for you too?
>>
>> Indeed this seems to solve it for me as well, you can add my
>>
>> Tested-by: Sascha Hauer <s.hauer@pengutronix.de>
>>
> Thanks!
> I'll append the tag to the formally submitted patch.
> 
>> However, there seem to be two problems: First that "DECO Invalid KEY
>> command" actually occurs and second that the deqeueue code currently
>> can't handle a NULL pointer in the output ring.
> Currently the dequeue code BUGs not only for "NULL pointer", but for any
> IOVA in the output ring that is not matched with an entry in the "shadow"
> (SW) ring.
> Here the BUG_ON() should be replaced with WARN_ON since not finding a match
> means driver can't go to the "SW context" and eventually call complete()
> to wake up the crypto API user. In many cases the user relies on
> crypto_wait_req(), which does not time out and is not killable.
> 
>> Do you think that the occurence of a NULL pointer is also a coherency
>> issue?
>>
> I strongly believe there's a single problem because the issue goes away
> when the patch is applied, even though I haven't figured out what is
> the exact place / data structure that gets corrupted.
> 
> One theory is that corruption occurs in the input ring:
> -CPU sets up correctly the input ring entry
> -device doesn't see the "fresh" data, reading 0x0 for the descriptor address
> -device reads the descriptor commands from address 0x0 and issues
> "DECO invalid KEY command" (note that KEY command opcode is b'00000, so reading
> all zeros from address 0x0 would lead to this error)
> 
> But then the input & output rings are allocated using dma_alloc_coherent(),
> so I'll need to check if lack of "dma-coherent" DT property has the same
> effect on consistent DMA mappings as on streaming DMA mappings.

It certainly can, at least on arm64 where coherent buffers are remapped 
in vmalloc rather than changing the linear map attributes in-place. In 
that case the dma_alloc_coherent() flow looks like this:

1: clean and invalidate pages by (cacheable) linear map address
2: set up non-cacheable remap
3: write zeros via non-cacheable mapping
...
4: CPU writes descriptor via non-cacheable mapping
...
5: device reads descriptor

If the cacheable alias is prefetched back in between steps 1 and 4 (e.g. 
from another thread accessing an adjacent page by linear map address), 
the CPU writes will (usually) still bypass the caches and go straight to 
DRAM, so if the device read unexpectedly snoops it can return the older 
data from the cache. This will normally be the zeros from step 3, unless 
you're extremely unlucky and the prefetch happened even before that. As 
I mentioned, this is exactly what we were hitting with Panfrost where 
GPU coherency wasn't described.

Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-03-04 17:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210301152231.GC5549@pengutronix.de>
2021-03-03 10:26 ` CAAM: kernel BUG at drivers/crypto/caam/jr.c:230! (and dma-coherent query) Horia Geantă
2021-03-03 12:07   ` Robin Murphy
2021-03-03 16:04     ` Horia Geantă
2021-03-03 14:56   ` Sascha Hauer
2021-03-03 16:40     ` Horia Geantă
2021-03-04 17:02       ` Robin Murphy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).