* 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
@ 2013-12-06 21:36 Sebastian Reichel
  2013-12-06 22:27 ` Tony Lindgren
  0 siblings, 1 reply; 87+ messages in thread
From: Sebastian Reichel @ 2013-12-06 21:36 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: linux-omap

[-- Attachment #1: Type: text/plain, Size: 8058 bytes --]

Hi Tony,

Nokia N900 DT boot breaks for me using 3.13-rc3. You can see the
relevant kernel output below. Disabling the AES module in the
omap3-n900.dts with status = "disabled" fixed the boot for me.

-- Sebastian

[    2.082427] Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0c5048
[    2.090484] Internal error: : 1028 [#1] SMP ARM
[    2.095245] Modules linked in:
[    2.098480] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0-rc2+ #446
[    2.106353] task: ce057b40 ti: ce058000 task.ti: ce058000
[    2.112030] PC is at omap_aes_dma_stop+0x24/0x3c
[    2.116882] LR is at omap_aes_probe+0x1cc/0x584
[    2.121643] pc : [<c039cd54>]    lr : [<c039d3ac>]    psr: 60000113
[    2.121643] sp : ce059e20  ip : ce0b4ee0  fp : 00000000
[    2.133728] r10: c0573ae8  r9 : c0749508  r8 : 00000000
[    2.139221] r7 : ce0b4e00  r6 : 00000000  r5 : ce0b4e10  r4 : ce274890
[    2.146057] r3 : fa0c5048  r2 : 00000048  r1 : 0000002c  r0 : ce274890
[    2.152923] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[    2.160614] Control: 10c5387d  Table: 80004019  DAC: 00000015
[    2.166687] Process swapper/0 (pid: 1, stack limit = 0xce058248)
[    2.173004] Stack: (0xce059e20 to 0xce05a000)
[    2.177612] 9e20: c0749508 0000a1ff 00000000 c016cd8c c06b5a06 ce2a45f0 ce2a4570 ce0b5fb0
[    2.186218] 9e40: 00000000 480c5000 480c504f c0abe4e4 00000200 00000000 00000000 00000000
[    2.194824] 9e60: ce0b4e10 ce0b4e10 c082da3c c082da3c c02b8c70 c077c610 c0749508 00000000
[    2.203430] 9e80: 00000000 c02b9e7c c02b9e64 ce0b4e10 00000000 c02b8b20 ce0b4e10 ce0b4e44
[    2.212036] 9ea0: c082da3c c02b8cd8 00000000 ce059eb8 c082da3c c02b7408 ce079edc ce0b1a34
[    2.220642] 9ec0: c082da3c c082da3c ce2a0280 00000000 c08158d8 c02b8358 c0663405 c0663405
[    2.229248] 9ee0: 00000073 c082da3c c079e4e8 c07ab3bc c0844340 c02b9334 00000000 00000006
[    2.237823] 9f00: c079e4e8 c0008920 c067f6bf c0ac7c6b 00000000 c0712e28 00000000 00000000
[    2.246429] 9f20: c0712e38 ce059f38 00000093 c0ac7c82 00000000 c0058994 00000000 c07130e8
[    2.255035] 9f40: c07127b8 00000093 00000006 00000006 00000001 00000006 00000006 c079e4e8
[    2.263610] 9f60: c07ab3bc c0844340 00000093 c0749508 c079e4f4 c0749c64 00000006 00000006
[    2.272247] 9f80: c0749508 00000000 00000000 c0517e2c 00000000 00000000 00000000 00000000
[    2.280853] 9fa0: 00000000 c0517e34 00000000 c000dfb8 00000000 00000000 00000000 00000000
[    2.289459] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    2.298065] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
[    2.306671] [<c039cd54>] (omap_aes_dma_stop+0x24/0x3c) from [<c039d3ac>] (omap_aes_probe+0x1cc/0x584)
[    2.316406] [<c039d3ac>] (omap_aes_probe+0x1cc/0x584) from [<c02b9e7c>] (platform_drv_probe+0x18/0x48)
[    2.326232] [<c02b9e7c>] (platform_drv_probe+0x18/0x48) from [<c02b8b20>] (driver_probe_device+0xb0/0x200)
[    2.336425] [<c02b8b20>] (driver_probe_device+0xb0/0x200) from [<c02b8cd8>] (__driver_attach+0x68/0x8c)
[    2.346343] [<c02b8cd8>] (__driver_attach+0x68/0x8c) from [<c02b7408>] (bus_for_each_dev+0x50/0x88)
[    2.355865] [<c02b7408>] (bus_for_each_dev+0x50/0x88) from [<c02b8358>] (bus_add_driver+0xcc/0x1c8)
[    2.365356] [<c02b8358>] (bus_add_driver+0xcc/0x1c8) from [<c02b9334>] (driver_register+0x9c/0xe0)
[    2.374816] [<c02b9334>] (driver_register+0x9c/0xe0) from [<c0008920>] (do_one_initcall+0x98/0x140)
[    2.384368] [<c0008920>] (do_one_initcall+0x98/0x140) from [<c0749c64>] (kernel_init_freeable+0x16c/0x23c)
[    2.394592] [<c0749c64>] (kernel_init_freeable+0x16c/0x23c) from [<c0517e34>] (kernel_init+0x8/0x100)
[    2.404296] [<c0517e34>] (kernel_init+0x8/0x100) from [<c000dfb8>] (ret_from_fork+0x14/0x3c)
[    2.413177] Code: e1811002 e5932020 e590300c e0833002 (e593c000) 
[    2.419586] ---[ end trace 12268ed9c6cdcae7 ]---
[    2.424468] In-band Error seen by MPU  at address 0
[    2.429626] ------------[ cut here ]------------
[    2.434509] WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_smx.c:162 omap3_l3_app_irq+0xdc/0x124()
[    2.443817] Modules linked in:
[    2.447052] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G      D W    3.13.0-rc2+ #446
[    2.454956] [<c00158b4>] (unwind_backtrace+0x0/0xe0) from [<c0011b24>] (show_stack+0x10/0x14)
[    2.463958] [<c0011b24>] (show_stack+0x10/0x14) from [<c051c30c>] (dump_stack+0x68/0x84)
[    2.472473] [<c051c30c>] (dump_stack+0x68/0x84) from [<c003db64>] (warn_slowpath_common+0x64/0x88)
[    2.481933] [<c003db64>] (warn_slowpath_common+0x64/0x88) from [<c003dba0>] (warn_slowpath_null+0x18/0x1c)
[    2.492095] [<c003dba0>] (warn_slowpath_null+0x18/0x1c) from [<c023d2c0>] (omap3_l3_app_irq+0xdc/0x124)
[    2.502014] [<c023d2c0>] (omap3_l3_app_irq+0xdc/0x124) from [<c007c718>] (handle_irq_event_percpu+0x60/0x218)
[    2.512420] [<c007c718>] (handle_irq_event_percpu+0x60/0x218) from [<c007c914>] (handle_irq_event+0x44/0x64)
[    2.522796] [<c007c914>] (handle_irq_event+0x44/0x64) from [<c007f4e0>] (handle_level_irq+0xd4/0xfc)
[    2.532409] [<c007f4e0>] (handle_level_irq+0xd4/0xfc) from [<c007c14c>] (generic_handle_irq+0x20/0x30)
[    2.542236] [<c007c14c>] (generic_handle_irq+0x20/0x30) from [<c000ee3c>] (handle_IRQ+0x64/0x8c)
[    2.551513] [<c000ee3c>] (handle_IRQ+0x64/0x8c) from [<c00085dc>] (omap3_intc_handle_irq+0x60/0x74)
[    2.561035] [<c00085dc>] (omap3_intc_handle_irq+0x60/0x74) from [<c0012640>] (__irq_svc+0x40/0x50)
[    2.570404] Exception stack(0xce059c28 to 0xce059c70)
[    2.575714] 9c20:                   00000000 00000005 c07bc608 00000504 ce05bbc0 ce058000
[    2.584289] 9c40: 00000000 00000001 0000000b 00000000 00000000 c0660359 00000000 ce059c70
[    2.592895] 9c60: c009a964 c009aa6c 40000113 ffffffff
[    2.598236] [<c0012640>] (__irq_svc+0x40/0x50) from [<c009aa6c>] (acct_collect+0x1b4/0x1b8)
[    2.607025] [<c009aa6c>] (acct_collect+0x1b4/0x1b8) from [<c003fce4>] (do_exit+0x1fc/0x954)
[    2.615844] [<c003fce4>] (do_exit+0x1fc/0x954) from [<c0011ebc>] (die+0x394/0x410)
[    2.623809] [<c0011ebc>] (die+0x394/0x410) from [<c000845c>] (do_DataAbort+0x84/0x98)
[    2.632049] [<c000845c>] (do_DataAbort+0x84/0x98) from [<c00125d8>] (__dabt_svc+0x38/0x60)
[    2.640747] Exception stack(0xce059dd8 to 0xce059e20)
[    2.646087] 9dc0:                                                       ce274890 0000002c
[    2.654663] 9de0: 00000048 fa0c5048 ce274890 ce0b4e10 00000000 ce0b4e00 00000000 c0749508
[    2.663299] 9e00: c0573ae8 00000000 ce0b4ee0 ce059e20 c039d3ac c039cd54 60000113 ffffffff
[    2.671874] [<c00125d8>] (__dabt_svc+0x38/0x60) from [<c039cd54>] (omap_aes_dma_stop+0x24/0x3c)
[    2.681060] [<c039cd54>] (omap_aes_dma_stop+0x24/0x3c) from [<c039d3ac>] (omap_aes_probe+0x1cc/0x584)
[    2.690734] [<c039d3ac>] (omap_aes_probe+0x1cc/0x584) from [<c02b9e7c>] (platform_drv_probe+0x18/0x48)
[    2.700500] [<c02b9e7c>] (platform_drv_probe+0x18/0x48) from [<c02b8b20>] (driver_probe_device+0xb0/0x200)
[    2.710662] [<c02b8b20>] (driver_probe_device+0xb0/0x200) from [<c02b8cd8>] (__driver_attach+0x68/0x8c)
[    2.720581] [<c02b8cd8>] (__driver_attach+0x68/0x8c) from [<c02b7408>] (bus_for_each_dev+0x50/0x88)
[    2.730133] [<c02b7408>] (bus_for_each_dev+0x50/0x88) from [<c02b8358>] (bus_add_driver+0xcc/0x1c8)
[    2.739654] [<c02b8358>] (bus_add_driver+0xcc/0x1c8) from [<c02b9334>] (driver_register+0x9c/0xe0)
[    2.749084] [<c02b9334>] (driver_register+0x9c/0xe0) from [<c0008920>] (do_one_initcall+0x98/0x140)
[    2.758605] [<c0008920>] (do_one_initcall+0x98/0x140) from [<c0749c64>] (kernel_init_freeable+0x16c/0x23c)
[    2.768798] [<c0749c64>] (kernel_init_freeable+0x16c/0x23c) from [<c0517e34>] (kernel_init+0x8/0x100)
[    2.778472] [<c0517e34>] (kernel_init+0x8/0x100) from [<c000dfb8>] (ret_from_fork+0x14/0x3c)
[    2.787353] ---[ end trace 12268ed9c6cdcae8 ]---
[    2.792449] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-06 21:36 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot Sebastian Reichel
@ 2013-12-06 22:27 ` Tony Lindgren
  2013-12-07  0:00   ` Sebastian Reichel
  2013-12-08 14:13   ` Aaro Koskinen
  0 siblings, 2 replies; 87+ messages in thread
From: Tony Lindgren @ 2013-12-06 22:27 UTC (permalink / raw)
  To: linux-omap; +Cc: Aaro Koskinen

* Sebastian Reichel <sre@debian.org> [131206 13:37]:
> Hi Tony,
> 
> Nokia N900 DT boot breaks for me using 3.13-rc3. You can see the
> relevant kernel output below. Disabling the AES module in the
> omap3-n900.dts with status = "disabled" fixed the boot for me.

OK thanks for letting me know. How about the following patch to
fix it?

Aaro, does this work for you on n9(50)? I'd assume n9(50) needs a
similar patch too.

Regards,

Tony

8< ------------------------------
From: Tony Lindgren <tony@atomide.com>
Date: Fri, 6 Dec 2013 14:20:17 -0800
Subject: [PATCH] ARM: dts: Fix booting for secure omaps

Commit 7ce93f3 (ARM: OMAP2+: Fix more missing data for omap3.dtsi file)
fixed missing device tree data for omaps, but did not account for some of the
hardware modules being inaccessible for secure omaps. This causes the
following error on secure omaps:

Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0c5048
SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0-rc2+ #446
task: ce057b40 ti: ce058000 task.ti: ce058000
PC is at omap_aes_dma_stop+0x24/0x3c
LR is at omap_aes_probe+0x1cc/0x584
   psr: 60000113
sp : ce059e20  ip : ce0b4ee0  fp : 00000000
r10: c0573ae8  r9 : c0749508  r8 : 00000000
r7 : ce0b4e00  r6 : 00000000  r5 : ce0b4e10  r4 : ce274890
r3 : fa0c5048  r2 : 00000048  r1 : 0000002c  r0 : ce274890
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 80004019  DAC: 00000015
Process swapper/0 (pid: 1, stack limit = 0xce058248)
Stack: (0xce059e20 to 0xce05a000)
9e20: c0749508 0000a1ff 00000000 c016cd8c c06b5a06 ce2a45f0 ce2a4570 ce0b5fb0
9e40: 00000000 480c5000 480c504f c0abe4e4 00000200 00000000 00000000 00000000
9e60: ce0b4e10 ce0b4e10 c082da3c c082da3c c02b8c70 c077c610 c0749508 00000000
9e80: 00000000 c02b9e7c c02b9e64 ce0b4e10 00000000 c02b8b20 ce0b4e10 ce0b4e44
9ea0: c082da3c c02b8cd8 00000000 ce059eb8 c082da3c c02b7408 ce079edc ce0b1a34
9ec0: c082da3c c082da3c ce2a0280 00000000 c08158d8 c02b8358 c0663405 c0663405
9ee0: 00000073 c082da3c c079e4e8 c07ab3bc c0844340 c02b9334 00000000 00000006
9f00: c079e4e8 c0008920 c067f6bf c0ac7c6b 00000000 c0712e28 00000000 00000000
9f20: c0712e38 ce059f38 00000093 c0ac7c82 00000000 c0058994 00000000 c07130e8
9f40: c07127b8 00000093 00000006 00000006 00000001 00000006 00000006 c079e4e8
9f60: c07ab3bc c0844340 00000093 c0749508 c079e4f4 c0749c64 00000006 00000006
9f80: c0749508 00000000 00000000 c0517e2c 00000000 00000000 00000000 00000000
9fa0: 00000000 c0517e34 00000000 c000dfb8 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
(omap_aes_probe+0x1cc/0x584)
(platform_drv_probe+0x18/0x48)
(driver_probe_device+0xb0/0x200)
(__driver_attach+0x68/0x8c)
(bus_for_each_dev+0x50/0x88)
(bus_add_driver+0xcc/0x1c8)
(driver_register+0x9c/0xe0)
(do_one_initcall+0x98/0x140)
(kernel_init_freeable+0x16c/0x23c)
(kernel_init+0x8/0x100)
(ret_from_fork+0x14/0x3c)
Code: e1811002 e5932020 e590300c e0833002 (e593c000)

Let's fix the issue by adding omap34xx-hs.dtsi and omap36xx-hs.dtsi and make
n900, n9 and n950 use them.

Reported-by: Sebastian Reichel <sre@debian.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>

--- a/arch/arm/boot/dts/omap3-n900.dts
+++ b/arch/arm/boot/dts/omap3-n900.dts
@@ -9,7 +9,7 @@
 
 /dts-v1/;
 
-#include "omap34xx.dtsi"
+#include "omap34xx-hs.dtsi"
 
 / {
 	model = "Nokia N900";
--- a/arch/arm/boot/dts/omap3-n950-n9.dtsi
+++ b/arch/arm/boot/dts/omap3-n950-n9.dtsi
@@ -8,7 +8,7 @@
  * published by the Free Software Foundation.
  */
 
-#include "omap36xx.dtsi"
+#include "omap36xx-hs.dtsi"
 
 / {
 	cpus {
--- /dev/null
+++ b/arch/arm/boot/dts/omap34xx-hs.dtsi
@@ -0,0 +1,8 @@
+/* Disabled modules for secure omaps */
+
+#include "omap34xx.dtsi"
+
+/* Secure omaps have some devices inaccessible depending on the firmware */
+&aes {
+	status = "disabled";
+};
--- /dev/null
+++ b/arch/arm/boot/dts/omap36xx-hs.dtsi
@@ -0,0 +1,8 @@
+/* Disabled modules for secure omaps */
+
+#include "omap36xx.dtsi"
+
+/* Secure omaps have some devices inaccessible depending on the firmware */
+&aes {
+	status = "disabled";
+};


* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-06 22:27 ` Tony Lindgren
@ 2013-12-07  0:00   ` Sebastian Reichel
  2013-12-07  0:38     ` Tony Lindgren
                       ` (2 more replies)
  2013-12-08 14:13   ` Aaro Koskinen
  1 sibling, 3 replies; 87+ messages in thread
From: Sebastian Reichel @ 2013-12-07  0:00 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: linux-omap, Aaro Koskinen, Pali Rohár

[-- Attachment #1: Type: text/plain, Size: 3202 bytes --]

On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > Nokia N900 DT boot breaks for me using 3.13-rc3. You can see the
> > relevant kernel output below. Disabling the AES module in the
> > omap3-n900.dts with status = "disabled" fixed the boot for me.
> 
> OK thanks for letting me know. How about the following patch to
> fix it?

That's basically what I did to fix the problem.

I guess the proper fix would be a runtime check if the device can be
accessed (if that's possible). AFAIK it is possible to use the AES
module on the N900 if the bootloader is slightly patched.

Pali, can you elaborate on this? I've seen that you added
a section about this on [0].

[0] http://elinux.org/N900#M-Shield

-- Sebastian

> Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0c5048
> SMP ARM
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0-rc2+ #446
> task: ce057b40 ti: ce058000 task.ti: ce058000
> PC is at omap_aes_dma_stop+0x24/0x3c
> LR is at omap_aes_probe+0x1cc/0x584
>    psr: 60000113
> sp : ce059e20  ip : ce0b4ee0  fp : 00000000
> r10: c0573ae8  r9 : c0749508  r8 : 00000000
> r7 : ce0b4e00  r6 : 00000000  r5 : ce0b4e10  r4 : ce274890
> r3 : fa0c5048  r2 : 00000048  r1 : 0000002c  r0 : ce274890
> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> Control: 10c5387d  Table: 80004019  DAC: 00000015
> Process swapper/0 (pid: 1, stack limit = 0xce058248)
> Stack: (0xce059e20 to 0xce05a000)
> 9e20: c0749508 0000a1ff 00000000 c016cd8c c06b5a06 ce2a45f0 ce2a4570 ce0b5fb0
> 9e40: 00000000 480c5000 480c504f c0abe4e4 00000200 00000000 00000000 00000000
> 9e60: ce0b4e10 ce0b4e10 c082da3c c082da3c c02b8c70 c077c610 c0749508 00000000
> 9e80: 00000000 c02b9e7c c02b9e64 ce0b4e10 00000000 c02b8b20 ce0b4e10 ce0b4e44
> 9ea0: c082da3c c02b8cd8 00000000 ce059eb8 c082da3c c02b7408 ce079edc ce0b1a34
> 9ec0: c082da3c c082da3c ce2a0280 00000000 c08158d8 c02b8358 c0663405 c0663405
> 9ee0: 00000073 c082da3c c079e4e8 c07ab3bc c0844340 c02b9334 00000000 00000006
> 9f00: c079e4e8 c0008920 c067f6bf c0ac7c6b 00000000 c0712e28 00000000 00000000
> 9f20: c0712e38 ce059f38 00000093 c0ac7c82 00000000 c0058994 00000000 c07130e8
> 9f40: c07127b8 00000093 00000006 00000006 00000001 00000006 00000006 c079e4e8
> 9f60: c07ab3bc c0844340 00000093 c0749508 c079e4f4 c0749c64 00000006 00000006
> 9f80: c0749508 00000000 00000000 c0517e2c 00000000 00000000 00000000 00000000
> 9fa0: 00000000 c0517e34 00000000 c000dfb8 00000000 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
> (omap_aes_probe+0x1cc/0x584)
> (platform_drv_probe+0x18/0x48)
> (driver_probe_device+0xb0/0x200)
> (__driver_attach+0x68/0x8c)
> (bus_for_each_dev+0x50/0x88)
> (bus_add_driver+0xcc/0x1c8)
> (driver_register+0x9c/0xe0)
> (do_one_initcall+0x98/0x140)
> (kernel_init_freeable+0x16c/0x23c)
> (kernel_init+0x8/0x100)
> (ret_from_fork+0x14/0x3c)
> Code: e1811002 e5932020 e590300c e0833002 (e593c000)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-07  0:00   ` Sebastian Reichel
@ 2013-12-07  0:38     ` Tony Lindgren
  2013-12-07  8:18     ` Pali Rohár
  2015-02-09 11:55     ` 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot Pali Rohár
  2 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2013-12-07  0:38 UTC (permalink / raw)
  To: linux-omap, Aaro Koskinen, Pali Rohár

* Sebastian Reichel <sre@ring0.de> [131206 16:01]:
> On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > Nokia N900 DT boot breaks for me using 3.13-rc3. You can see the
> > > relevant kernel output below. Disabling the AES module in the
> > > omap3-n900.dts with status = "disabled" fixed the boot for me.
> > 
> > OK thanks for letting me know. How about the following patch to
> > fix it?
> 
> That's basically what I did to fix the problem.

Looks like omap_hwmod_3xxx_data.c enables omap34xx_gp_hwmod_ocp_ifs
and omap36xx_gp_hwmod_ocp_ifs for timer12, sham and aes only for GP
devices. When booted in the legacy mode those won't get initialized
at all for HS devices.

Sounds like the updated patch below, enhanced to also cover timer12
and sham, is the way to go for this fix, so that things behave the
same way for legacy booting and DT based booting. Or have I missed
something?
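
As a rough illustration of the legacy behaviour described above, here is a
small userspace sketch of "register timer12, sham and aes only on GP
devices". It is a simplified model, not the code in omap_hwmod_3xxx_data.c:
the mock_* names, the two-value device type and the "uart1" example module
are assumptions made up for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Simplified device types (assumption: only GP vs HS matters here). */
enum mock_omap_type { MOCK_TYPE_GP, MOCK_TYPE_HS };

/* Modules that legacy boot sets up on GP devices only, per the thread. */
static const char *const gp_only_modules[] = { "timer12", "sham", "aes" };

/* Returns 1 if the named module should be set up for this device type. */
static int mock_module_enabled(enum mock_omap_type type, const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(gp_only_modules) / sizeof(gp_only_modules[0]); i++) {
		if (strcmp(gp_only_modules[i], name) == 0)
			return type == MOCK_TYPE_GP;
	}

	/* Everything else is set up regardless of the device type. */
	return 1;
}
```

The -hs.dtsi files in the patch express the same rule statically for DT
boot: boards known to be HS include a file that disables those three nodes.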

> I guess the proper fix would be a runtime check if the device can be
> accessed (if that's possible). AFAIK it is possible to use the AES
> module on the N900 if the bootloader is slightly patched.
>
> Pali, can you elaborate more about this? I've seen, that you added
> a section about this on [0].

Hmm yeah I don't know about that either so more info would be good to
have. If they're somehow accessible then some kind of runtime checking
of the bus firewall configuration might be doable.

Regards,

Tony

 
> [0] http://elinux.org/N900#M-Shield


8< ---------------------------------
From: Tony Lindgren <tony@atomide.com>
Date: Fri, 6 Dec 2013 14:20:17 -0800
Subject: [PATCH] ARM: dts: Fix booting for secure omaps

Commit 7ce93f3 (ARM: OMAP2+: Fix more missing data for omap3.dtsi file)
fixed missing device tree data for omaps, but did not account for some of the
hardware modules being inaccessible for secure omaps. This causes the
following error on secure omaps:

Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0c5048
SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0-rc2+ #446
task: ce057b40 ti: ce058000 task.ti: ce058000
PC is at omap_aes_dma_stop+0x24/0x3c
LR is at omap_aes_probe+0x1cc/0x584
   psr: 60000113
sp : ce059e20  ip : ce0b4ee0  fp : 00000000
r10: c0573ae8  r9 : c0749508  r8 : 00000000
r7 : ce0b4e00  r6 : 00000000  r5 : ce0b4e10  r4 : ce274890
r3 : fa0c5048  r2 : 00000048  r1 : 0000002c  r0 : ce274890
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 80004019  DAC: 00000015
Process swapper/0 (pid: 1, stack limit = 0xce058248)
Stack: (0xce059e20 to 0xce05a000)
9e20: c0749508 0000a1ff 00000000 c016cd8c c06b5a06 ce2a45f0 ce2a4570 ce0b5fb0
9e40: 00000000 480c5000 480c504f c0abe4e4 00000200 00000000 00000000 00000000
9e60: ce0b4e10 ce0b4e10 c082da3c c082da3c c02b8c70 c077c610 c0749508 00000000
9e80: 00000000 c02b9e7c c02b9e64 ce0b4e10 00000000 c02b8b20 ce0b4e10 ce0b4e44
9ea0: c082da3c c02b8cd8 00000000 ce059eb8 c082da3c c02b7408 ce079edc ce0b1a34
9ec0: c082da3c c082da3c ce2a0280 00000000 c08158d8 c02b8358 c0663405 c0663405
9ee0: 00000073 c082da3c c079e4e8 c07ab3bc c0844340 c02b9334 00000000 00000006
9f00: c079e4e8 c0008920 c067f6bf c0ac7c6b 00000000 c0712e28 00000000 00000000
9f20: c0712e38 ce059f38 00000093 c0ac7c82 00000000 c0058994 00000000 c07130e8
9f40: c07127b8 00000093 00000006 00000006 00000001 00000006 00000006 c079e4e8
9f60: c07ab3bc c0844340 00000093 c0749508 c079e4f4 c0749c64 00000006 00000006
9f80: c0749508 00000000 00000000 c0517e2c 00000000 00000000 00000000 00000000
9fa0: 00000000 c0517e34 00000000 c000dfb8 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
(omap_aes_probe+0x1cc/0x584)
(platform_drv_probe+0x18/0x48)
(driver_probe_device+0xb0/0x200)
(__driver_attach+0x68/0x8c)
(bus_for_each_dev+0x50/0x88)
(bus_add_driver+0xcc/0x1c8)
(driver_register+0x9c/0xe0)
(do_one_initcall+0x98/0x140)
(kernel_init_freeable+0x16c/0x23c)
(kernel_init+0x8/0x100)
(ret_from_fork+0x14/0x3c)
Code: e1811002 e5932020 e590300c e0833002 (e593c000)

Let's fix the issue by adding omap34xx-hs.dtsi and omap36xx-hs.dtsi and make
n900, n9 and n950 use them. This way we have the aes, sham and timer12
disabled for secure devices the same way legacy booting does based on the
omap34xx_gp_hwmod_ocp_ifs and omap36xx_gp_hwmod_ocp_ifs arrays in
omap_hwmod_3xxx_data.c.

Reported-by: Sebastian Reichel <sre@debian.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>

--- a/arch/arm/boot/dts/omap3-n900.dts
+++ b/arch/arm/boot/dts/omap3-n900.dts
@@ -9,7 +9,7 @@
 
 /dts-v1/;
 
-#include "omap34xx.dtsi"
+#include "omap34xx-hs.dtsi"
 
 / {
 	model = "Nokia N900";
--- a/arch/arm/boot/dts/omap3-n950-n9.dtsi
+++ b/arch/arm/boot/dts/omap3-n950-n9.dtsi
@@ -8,7 +8,7 @@
  * published by the Free Software Foundation.
  */
 
-#include "omap36xx.dtsi"
+#include "omap36xx-hs.dtsi"
 
 / {
 	cpus {
--- /dev/null
+++ b/arch/arm/boot/dts/omap34xx-hs.dtsi
@@ -0,0 +1,16 @@
+/* Disabled modules for secure omaps */
+
+#include "omap34xx.dtsi"
+
+/* Secure omaps have some devices inaccessible depending on the firmware */
+&aes {
+	status = "disabled";
+};
+
+&sham {
+	status = "disabled";
+};
+
+&timer12 {
+	status = "disabled";
+};
--- /dev/null
+++ b/arch/arm/boot/dts/omap36xx-hs.dtsi
@@ -0,0 +1,16 @@
+/* Disabled modules for secure omaps */
+
+#include "omap36xx.dtsi"
+
+/* Secure omaps have some devices inaccessible depending on the firmware */
+&aes {
+	status = "disabled";
+};
+
+&sham {
+	status = "disabled";
+};
+
+&timer12 {
+	status = "disabled";
+};


* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-07  0:00   ` Sebastian Reichel
  2013-12-07  0:38     ` Tony Lindgren
@ 2013-12-07  8:18     ` Pali Rohár
  2013-12-07 13:48       ` Sebastian Reichel
  2015-02-09 11:55     ` 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot Pali Rohár
  2 siblings, 1 reply; 87+ messages in thread
From: Pali Rohár @ 2013-12-07  8:18 UTC (permalink / raw)
  To: Sebastian Reichel; +Cc: Tony Lindgren, linux-omap, Aaro Koskinen

[-- Attachment #1: Type: Text/Plain, Size: 3853 bytes --]

On Saturday 07 December 2013 01:00:27 Sebastian Reichel wrote:
> On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > Nokia N900 DT boot breaks for me using 3.13-rc3. You can
> > > see the relevant kernel output below. Disabling the AES
> > > module in the omap3-n900.dts with status = "disabled"
> > > fixed the boot for me.
> > 
> > OK thanks for letting me know. How about the following patch
> > to fix it?
> 
> That's basically what I did to fix the problem.
> 
> I guess the proper fix would be a runtime check if the device
> can be accessed (if that's possible). AFAIK it is possible to
> use the AES module on the N900 if the bootloader is slightly
> patched.
> 
> Pali, can you elaborate more about this? I've seen, that you
> added a section about this on [0].
> 
> [0] http://elinux.org/N900#M-Shield
> 
> -- Sebastian
> 

Yes, if you want to use M-Shield on the Nokia N900, you need a new
version of the signed Nokia X-Loader, which enables M-Shield usage
outside the secure world.

Because this updated X-Loader is not official, and I think nobody
has it on their N900, please disable these M-Shield crypto modules
on the N900.

Without the patched, updated X-Loader the kernel will crash.

> > Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0c5048
> > SMP ARM
> > Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0-rc2+ #446
> > task: ce057b40 ti: ce058000 task.ti: ce058000
> > PC is at omap_aes_dma_stop+0x24/0x3c
> > LR is at omap_aes_probe+0x1cc/0x584
> >    psr: 60000113
> > sp : ce059e20  ip : ce0b4ee0  fp : 00000000
> > r10: c0573ae8  r9 : c0749508  r8 : 00000000
> > r7 : ce0b4e00  r6 : 00000000  r5 : ce0b4e10  r4 : ce274890
> > r3 : fa0c5048  r2 : 00000048  r1 : 0000002c  r0 : ce274890
> > Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> > Control: 10c5387d  Table: 80004019  DAC: 00000015
> > Process swapper/0 (pid: 1, stack limit = 0xce058248)
> > Stack: (0xce059e20 to 0xce05a000)
> > 9e20: c0749508 0000a1ff 00000000 c016cd8c c06b5a06 ce2a45f0 ce2a4570 ce0b5fb0
> > 9e40: 00000000 480c5000 480c504f c0abe4e4 00000200 00000000 00000000 00000000
> > 9e60: ce0b4e10 ce0b4e10 c082da3c c082da3c c02b8c70 c077c610 c0749508 00000000
> > 9e80: 00000000 c02b9e7c c02b9e64 ce0b4e10 00000000 c02b8b20 ce0b4e10 ce0b4e44
> > 9ea0: c082da3c c02b8cd8 00000000 ce059eb8 c082da3c c02b7408 ce079edc ce0b1a34
> > 9ec0: c082da3c c082da3c ce2a0280 00000000 c08158d8 c02b8358 c0663405 c0663405
> > 9ee0: 00000073 c082da3c c079e4e8 c07ab3bc c0844340 c02b9334 00000000 00000006
> > 9f00: c079e4e8 c0008920 c067f6bf c0ac7c6b 00000000 c0712e28 00000000 00000000
> > 9f20: c0712e38 ce059f38 00000093 c0ac7c82 00000000 c0058994 00000000 c07130e8
> > 9f40: c07127b8 00000093 00000006 00000006 00000001 00000006 00000006 c079e4e8
> > 9f60: c07ab3bc c0844340 00000093 c0749508 c079e4f4 c0749c64 00000006 00000006
> > 9f80: c0749508 00000000 00000000 c0517e2c 00000000 00000000 00000000 00000000
> > 9fa0: 00000000 c0517e34 00000000 c000dfb8 00000000 00000000 00000000 00000000
> > 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
> > (omap_aes_probe+0x1cc/0x584)
> > (platform_drv_probe+0x18/0x48)
> > (driver_probe_device+0xb0/0x200)
> > (__driver_attach+0x68/0x8c)
> > (bus_for_each_dev+0x50/0x88)
> > (bus_add_driver+0xcc/0x1c8)
> > (driver_register+0x9c/0xe0)
> > (do_one_initcall+0x98/0x140)
> > (kernel_init_freeable+0x16c/0x23c)
> > (kernel_init+0x8/0x100)
> > (ret_from_fork+0x14/0x3c)
> > Code: e1811002 e5932020 e590300c e0833002 (e593c000)

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-07  8:18     ` Pali Rohár
@ 2013-12-07 13:48       ` Sebastian Reichel
  2013-12-07 13:57         ` Pali Rohár
  0 siblings, 1 reply; 87+ messages in thread
From: Sebastian Reichel @ 2013-12-07 13:48 UTC (permalink / raw)
  To: Pali Rohár; +Cc: Tony Lindgren, linux-omap, Aaro Koskinen

[-- Attachment #1: Type: text/plain, Size: 1980 bytes --]

On Sat, Dec 07, 2013 at 09:18:32AM +0100, Pali Rohár wrote:
> On Saturday 07 December 2013 01:00:27 Sebastian Reichel wrote:
> > On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > > Nokia N900 DT boot breaks for me using 3.13-rc3. You can
> > > > see the relevant kernel output below. Disabling the AES
> > > > module in the omap3-n900.dts with status = "disabled"
> > > > fixed the boot for me.
> > > 
> > > OK thanks for letting me know. How about the following patch
> > > to fix it?
> > 
> > That's basically what I did to fix the problem.
> > 
> > I guess the proper fix would be a runtime check if the device
> > can be accessed (if that's possible). AFAIK it is possible to
> > use the AES module on the N900 if the bootloader is slightly
> > patched.
> > 
> > Pali, can you elaborate more about this? I've seen, that you
> > added a section about this on [0].
> > 
> > [0] http://elinux.org/N900#M-Shield
> > 
> > -- Sebastian
> > 
> 
> Yes, if you want to use M-Shield on Nokia N900, you need to use 
> new version of signed Nokia X-Loader which enable M-Shield usage 
> outside secure world.
> 
> Because this updated X-Loader is not official and I think nobody 
> has it in n900, please disable using these M-Shield crypto 
> modules on n900.

Is the updated X-Loader available somewhere?

> Without patched updated X-Loader kernel will crash.

That's what this thread is about :) It might be better to have a
runtime check for the crypto hw's accessibility by adding something
like this to the driver's probe function:

if (!aes_module_is_accessible()) {
    dev_err(&pdev->dev, "Usage of OMAP's AES module is blocked\n");
    return -ENODEV;
}

That would require that aes_module_is_accessible() can actually be
implemented.
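
To make the intended control flow concrete, here is a userspace sketch of
that probe-time check. Everything in it is hypothetical: struct
mock_aes_device and its firewall_open flag stand in for whatever the
hardware or firmware would really expose, and it is an open question
whether such a query exists on a secure OMAP at all.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the platform device; not a kernel structure. */
struct mock_aes_device {
	const char *name;
	bool firewall_open;	/* assumed flag: public access to the module allowed */
};

/* Hypothetical helper; a real version would have to query the hardware. */
static bool aes_module_is_accessible(const struct mock_aes_device *dev)
{
	return dev->firewall_open;
}

/* Refuse to bind before any register access instead of taking the abort. */
static int mock_aes_probe(const struct mock_aes_device *dev)
{
	if (!aes_module_is_accessible(dev)) {
		fprintf(stderr, "%s: Usage of OMAP's AES module is blocked\n",
			dev->name);
		return -ENODEV;
	}

	/* A real driver would start touching the module's registers here. */
	return 0;
}
```

The point of the ordering is that the check has to run before the first
register read, since that read is what triggers the external abort.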

P.S.: I fixed the mail address of Aaro Koskinen. His nokia mail
address doesn't work anymore.

-- Sebastian

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-07 13:48       ` Sebastian Reichel
@ 2013-12-07 13:57         ` Pali Rohár
  2013-12-07 16:51           ` Tony Lindgren
  0 siblings, 1 reply; 87+ messages in thread
From: Pali Rohár @ 2013-12-07 13:57 UTC (permalink / raw)
  To: Sebastian Reichel; +Cc: Tony Lindgren, linux-omap, Aaro Koskinen

[-- Attachment #1: Type: Text/Plain, Size: 2578 bytes --]

On Saturday 07 December 2013 14:48:20 Sebastian Reichel wrote:
> On Sat, Dec 07, 2013 at 09:18:32AM +0100, Pali Rohár wrote:
> > On Saturday 07 December 2013 01:00:27 Sebastian Reichel wrote:
> > > On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > > > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > > > Nokia N900 DT boot breaks for me using 3.13-rc3. You
> > > > > can see the relevant kernel output below. Disabling
> > > > > the AES module in the omap3-n900.dts with status =
> > > > > "disabled" fixed the boot for me.
> > > > 
> > > > OK thanks for letting me know. How about the following
> > > > patch to fix it?
> > > 
> > > That's basically what I did to fix the problem.
> > > 
> > > I guess the proper fix would be a runtime check if the
> > > device can be accessed (if that's possible). AFAIK it is
> > > possible to use the AES module on the N900 if the
> > > bootloader is slightly patched.
> > > 
> > > Pali, can you elaborate more about this? I've seen, that
> > > you added a section about this on [0].
> > > 
> > > [0] http://elinux.org/N900#M-Shield
> > > 
> > > -- Sebastian
> > 
> > Yes, if you want to use M-Shield on Nokia N900, you need to
> > use new version of signed Nokia X-Loader which enable
> > M-Shield usage outside secure world.
> > 
> > Because this updated X-Loader is not official and I think
> > nobody has it in n900, please disable using these M-Shield
> > crypto modules on n900.
> 
> Is the updated X-Loader available somewhere?
> 

It was on a MediaFire server linked from this thread: 
http://maemo.org/community/maemo-developers/n900_aes_and_sha1-md5_hw_acceleration_drivers/

It has since been deleted from that server, but I have a copy on my HDD.

> > Without patched updated X-Loader kernel will crash.
> 
> That's what this thread is about :) It might be better to have
> a runtime check for the crypto hw's accessibility by adding
> something like this to the driver's probe function:
> 
> if (!aes_module_is_accessible()) {
>     dev_err(&pdev->dev, "Usage of OMAP's AES module is blocked\n");
>     return -ENODEV;
> }
> 
> That would require that aes_module_is_accessible() can
> actually be implemented.
> 

You would need to ask somebody who either 1) has access to the
secure M-Shield documentation or 2) wrote that M-Shield driver
whether such a check is possible...

> P.S.: I fixed the mail address of Aaro Koskinen. His nokia
> mail address doesn't work anymore.
> 
> -- Sebastian

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-07 13:57         ` Pali Rohár
@ 2013-12-07 16:51           ` Tony Lindgren
  2013-12-07 17:53             ` Tony Lindgren
  2013-12-07 18:49             ` runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot) Sebastian Reichel
  0 siblings, 2 replies; 87+ messages in thread
From: Tony Lindgren @ 2013-12-07 16:51 UTC (permalink / raw)
  To: Pali Rohár; +Cc: Sebastian Reichel, linux-omap, Aaro Koskinen

* Pali Rohár <pali.rohar@gmail.com> [131207 05:58]:
> On Saturday 07 December 2013 14:48:20 Sebastian Reichel wrote:
> > On Sat, Dec 07, 2013 at 09:18:32AM +0100, Pali Rohár wrote:
> > > On Saturday 07 December 2013 01:00:27 Sebastian Reichel wrote:
> > > > On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > > > > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > > > > Nokia N900 DT boot breaks for me using 3.13-rc3. You
> > > > > > can see the relevant kernel output below. Disabling
> > > > > > the AES module in the omap3-n900.dts with status =
> > > > > > "disabled" fixed the boot for me.
> > > > > 
> > > > > OK thanks for letting me know. How about the following
> > > > > patch to fix it?
> > > > 
> > > > That's basically what I did to fix the problem.
> > > > 
> > > > I guess the proper fix would be a runtime check if the
> > > > device can be accessed (if that's possible). AFAIK it is
> > > > possible to use the AES module on the N900 if the
> > > > bootloader is slightly patched.
> > > > 
> > > > Pali, can you elaborate more about this? I've seen, that
> > > > you added a section about this on [0].
> > > > 
> > > > [0] http://elinux.org/N900#M-Shield
> > > > 
> > > > -- Sebastian
> > > 
> > > Yes, if you want to use M-Shield on Nokia N900, you need to
> > > use new version of signed Nokia X-Loader which enable
> > > M-Shield usage outside secure world.
> > > 
> > > Because this updated X-Loader is not official and I think
> > > nobody has it in n900, please disable using these M-Shield
> > > crypto modules on n900.
> > 
> > Is the updated X-Loader available somewhere?
> > 
> 
> It was on mediafire server linked from this thread: 
> http://maemo.org/community/maemo-developers/n900_aes_and_sha1-md5_hw_acceleration_drivers/
>
> Now it is deleted from that server, but I have copy on my HDD.

Hmm OK let's hope there's some working link still around for those.
It seems like we should eventually cover both options, but for
the -rc cycle, we need to just disable those hardware modules in
the .dtsi files.
 
> > > Without patched updated X-Loader kernel will crash.
> > 
> > That's what this thread is about :) It might be better to have
> > a runtime check for the crypto hw's accessibility by adding
> > something like this to the driver's probe function:
> > 
> > if (!aes_module_is_accessible()) {
> >     dev_err(&pdev->dev, "Usage of OMAP's AES module is blocked\n");
> >     return -ENODEV;
> > }
> > 
> > That would require that aes_module_is_accessible() can
> > actually be implemented.
> > 
> 
> You would need to ask somebody who either 1) has access to the
> secure M-Shield documentation or 2) wrote that M-Shield driver
> whether such a check is possible...

I think we can check the configuration from the L3 registers.
Looks like we no longer have those defined after purging the
unused defines a while back, but I think the registers for the
configuration are L3_PM_READ_PERMISSION etc registers.
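
A minimal sketch of what such a check might look like, assuming the
permission registers hold one bit per initiator. The bit position
HYP_MPU_INITIATOR_BIT and both helper names are hypothetical and are
not taken from the TRM or the kernel; in a driver the two values would
come from reading the L3 protection registers, which is not shown here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: the bit that represents the MPU initiator in a permission
 * register. This value is made up for illustration only.
 */
#define HYP_MPU_INITIATOR_BIT 1u

/* Return true if the given initiator bit is set in a permission value. */
static bool perm_allows(uint32_t perm, unsigned int initiator_bit)
{
	assert(initiator_bit < 32);
	return ((perm >> initiator_bit) & 1u) != 0;
}

/*
 * Hypothetical helper: treat a region as usable only if the initiator
 * has both read and write permission for it.
 */
static bool region_is_accessible(uint32_t read_perm, uint32_t write_perm,
				 unsigned int initiator_bit)
{
	return perm_allows(read_perm, initiator_bit) &&
	       perm_allows(write_perm, initiator_bit);
}
```

A probe function could then bail out with -ENODEV when
region_is_accessible() returns false, instead of touching the module
and taking the external abort.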
 
> > P.S.: I fixed the mail address of Aaro Koskinen. His nokia
> > mail address doesn't work anymore.

Oops sorry about that, I need to update my address book.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-07 16:51           ` Tony Lindgren
@ 2013-12-07 17:53             ` Tony Lindgren
  2013-12-07 18:49             ` runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot) Sebastian Reichel
  1 sibling, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2013-12-07 17:53 UTC (permalink / raw)
  To: Pali Rohár; +Cc: Sebastian Reichel, linux-omap, Aaro Koskinen

* Tony Lindgren <tony@atomide.com> [131207 08:52]:
> 
> I think we can check the configuration from the L3 registers.
> Looks like we no longer have those defined after purging the
> unused defines a while back, but I think the registers for the
> configuration are L3_PM_READ_PERMISSION etc registers.

Hmm I must have grepped in the wrong directory. We do have those
defined in the mainline kernel.

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2013-12-07 16:51           ` Tony Lindgren
  2013-12-07 17:53             ` Tony Lindgren
@ 2013-12-07 18:49             ` Sebastian Reichel
  2013-12-07 21:11               ` Tony Lindgren
  1 sibling, 1 reply; 87+ messages in thread
From: Sebastian Reichel @ 2013-12-07 18:49 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: Pali Rohár, linux-omap, Aaro Koskinen, Joel Fernandes

[-- Attachment #1: Type: text/plain, Size: 3135 bytes --]

On Sat, Dec 07, 2013 at 08:51:04AM -0800, Tony Lindgren wrote:
> * Pali Rohár <pali.rohar@gmail.com> [131207 05:58]:
> > On Saturday 07 December 2013 14:48:20 Sebastian Reichel wrote:
> > > On Sat, Dec 07, 2013 at 09:18:32AM +0100, Pali Rohár wrote:
> > > > On Saturday 07 December 2013 01:00:27 Sebastian Reichel wrote:
> > > > > On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > > > > > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > > > > > Nokia N900 DT boot breaks for me using 3.13-rc3. You
> > > > > > > can see the relevant kernel output below. Disabling
> > > > > > > the AES module in the omap3-n900.dts with status =
> > > > > > > "disabled" fixed the boot for me.
> > > > > > 
> > > > > > OK thanks for letting me know. How about the following
> > > > > > patch to fix it?
> > > > > 
> > > > > That's basically what I did to fix the problem.
> > > > > 
> > > > > I guess the proper fix would be a runtime check if the
> > > > > device can be accessed (if that's possible). AFAIK it is
> > > > > possible to use the AES module on the N900 if the
> > > > > bootloader is slightly patched.
> > > > > 
> > > > > Pali, can you elaborate more about this? I've seen, that
> > > > > you added a section about this on [0].
> > > > > 
> > > > > [0] http://elinux.org/N900#M-Shield
> > > > > 
> > > > > -- Sebastian
> > > > 
> > > > Yes, if you want to use M-Shield on Nokia N900, you need to
> > > > use new version of signed Nokia X-Loader which enable
> > > > M-Shield usage outside secure world.
> > > > 
> > > > Because this updated X-Loader is not official and I think
> > > > nobody has it in n900, please disable using these M-Shield
> > > > crypto modules on n900.
> > > 
> > > Is the updated X-Loader available somewhere?
> > > 
> > 
> > It was on mediafire server linked from this thread: 
> > http://maemo.org/community/maemo-developers/n900_aes_and_sha1-md5_hw_acceleration_drivers/
> >
> > Now it is deleted from that server, but I have copy on my HDD.
> 
> Hmm OK let's hope there's some working link still around for those.
> It seems like we should eventually cover both options, but for
> the -rc cycle, we need to just disable those hardware modules in
> the .dtsi files.

Yes. I think for the -rc cycle disabling the AES module is the right
way. It has been disabled before anyway. I will test your patch
later.

> I think we can check the configuration from the L3 registers.
> Looks like we no longer have those defined after purging the
> unused defines a while back, but I think the registers for the
> configuration are L3_PM_READ_PERMISSION etc registers.

I asked Pali to send me his copy of the updated NOLO bootloader,
so that I can test this. I just checked the omap documentation
(I only have access to the public one) and crypto related stuff
is not documented for the L3_PM_READ_PERMISSION register. There
are a couple of reserved bits, which may be used for this, though.

I also CC'd Joel Fernandes, since he worked on the driver before and
may have access to the documentation.

-- Sebastian

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2013-12-07 18:49             ` runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot) Sebastian Reichel
@ 2013-12-07 21:11               ` Tony Lindgren
  2013-12-07 23:03                 ` Sebastian Reichel
  0 siblings, 1 reply; 87+ messages in thread
From: Tony Lindgren @ 2013-12-07 21:11 UTC (permalink / raw)
  To: Pali Rohár, linux-omap, Aaro Koskinen, Joel Fernandes

* Sebastian Reichel <sre@ring0.de> [131207 10:51]:
> On Sat, Dec 07, 2013 at 08:51:04AM -0800, Tony Lindgren wrote:
> > * Pali Rohár <pali.rohar@gmail.com> [131207 05:58]:
> > > On Saturday 07 December 2013 14:48:20 Sebastian Reichel wrote:
> > > > On Sat, Dec 07, 2013 at 09:18:32AM +0100, Pali Rohár wrote:
> > > > > On Saturday 07 December 2013 01:00:27 Sebastian Reichel wrote:
> > > > > > On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > > > > > > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > > > > > > Nokia N900 DT boot breaks for me using 3.13-rc3. You
> > > > > > > > can see the relevant kernel output below. Disabling
> > > > > > > > the AES module in the omap3-n900.dts with status =
> > > > > > > > "disabled" fixed the boot for me.
> > > > > > > 
> > > > > > > OK thanks for letting me know. How about the following
> > > > > > > patch to fix it?
> > > > > > 
> > > > > > That's basically what I did to fix the problem.
> > > > > > 
> > > > > > I guess the proper fix would be a runtime check if the
> > > > > > device can be accessed (if that's possible). AFAIK it is
> > > > > > possible to use the AES module on the N900 if the
> > > > > > bootloader is slightly patched.
> > > > > > 
> > > > > > Pali, can you elaborate more about this? I've seen, that
> > > > > > you added a section about this on [0].
> > > > > > 
> > > > > > [0] http://elinux.org/N900#M-Shield
> > > > > > 
> > > > > > -- Sebastian
> > > > > 
> > > > > Yes, if you want to use M-Shield on Nokia N900, you need to
> > > > > use new version of signed Nokia X-Loader which enable
> > > > > M-Shield usage outside secure world.
> > > > > 
> > > > > Because this updated X-Loader is not official and I think
> > > > > nobody has it in n900, please disable using these M-Shield
> > > > > crypto modules on n900.
> > > > 
> > > > Is the updated X-Loader available somewhere?
> > > > 
> > > 
> > > It was on mediafire server linked from this thread: 
> > > http://maemo.org/community/maemo-developers/n900_aes_and_sha1-md5_hw_acceleration_drivers/
> > >
> > > Now it is deleted from that server, but I have copy on my HDD.
> > 
> > Hmm OK let's hope there's some working link still around for those.
> > It seems like we should eventually cover both options, but for
> > the -rc cycle, we need to just disable those hardware modules in
> > the .dtsi files.
> 
> Yes. I think for -rc cycle disabling the AES module is the right
> way. It has been disabled before anyway. I will test your patch
> later.

OK thanks.
 
> > I think we can check the configuration from the L3 registers.
> > Looks like we no longer have those defined after purging the
> > unused defines a while back, but I think the registers for the
> > configuration are L3_PM_READ_PERMISSION etc registers.
> 
> I asked Pali to send me his copy of the updated NOLO bootloader,
> so that I can test this. I just checked the omap documentation
> (I only have access to the public one) and crypto related stuff
> is not documented for the L3_PM_READ_PERMISSION register. There
> are a couple of reserved bits, which may be used for this, though.
> 
> I also CC'd Joel Fernandes, since he worked on the driver before and
> may have access to the documentation.

Looks like at least the 36xx public version referenced here has them:

http://www.spinics.net/lists/linux-omap/msg21857.html

I'd assume the registers are the same for 34xx since we don't have
them defined separately in the kernel.

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2013-12-07 21:11               ` Tony Lindgren
@ 2013-12-07 23:03                 ` Sebastian Reichel
  2013-12-07 23:22                   ` Tony Lindgren
  0 siblings, 1 reply; 87+ messages in thread
From: Sebastian Reichel @ 2013-12-07 23:03 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: Pali Rohár, linux-omap, Aaro Koskinen, Joel Fernandes

[-- Attachment #1: Type: text/plain, Size: 984 bytes --]

On Sat, Dec 07, 2013 at 01:11:37PM -0800, Tony Lindgren wrote:
> > I asked Pali to send me his copy of the updated NOLO bootloader,
> > so that I can test this. I just checked the omap documentation
> > (I only have access to the public one) and crypto related stuff
> > is not documented for the L3_PM_READ_PERMISSION register. There
> > are a couple of reserved bits, which may be used for this, though.
> > 
> > I also CC'd Joel Fernandes, since he worked on the driver before and
> > may have access to the documentation.
> 
> Looks like at least the 36xx public version referenced here has them:
> 
> http://www.spinics.net/lists/linux-omap/msg21857.html
> 
> I'd assume the registers are the same for 34xx since we don't have
> them defined separately in the kernel.

I can't find it in the omap36xx documentation either. Maybe I am
searching in the wrong place. I tried to find something crypto related in

Table 9-89. L3_PM_READ_PERMISSION_i

-- Sebastian

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2013-12-07 23:03                 ` Sebastian Reichel
@ 2013-12-07 23:22                   ` Tony Lindgren
  2014-09-08 23:45                     ` Pali Rohár
                                       ` (2 more replies)
  0 siblings, 3 replies; 87+ messages in thread
From: Tony Lindgren @ 2013-12-07 23:22 UTC (permalink / raw)
  To: Pali Rohár, linux-omap, Aaro Koskinen, Joel Fernandes

* Sebastian Reichel <sre@ring0.de> [131207 15:04]:
> On Sat, Dec 07, 2013 at 01:11:37PM -0800, Tony Lindgren wrote:
> > > I asked Pali to send me his copy of the updated NOLO bootloader,
> > > so that I can test this. I just checked the omap documentation
> > > (I only have access to the public one) and crypto related stuff
> > > is not documented for the L3_PM_READ_PERMISSION register. There
> > > are a couple of reserved bits, which may be used for this, though.
> > > 
> > > I also CC'd Joel Fernandes, since he worked on the driver before and
> > > may have access to the documentation.
> > 
> > Looks like at least the 36xx public version referenced here has them:
> > 
> > http://www.spinics.net/lists/linux-omap/msg21857.html
> > 
> > I'd assume the registers are the same for 34xx since we don't have
> > them defined separately in the kernel.
> 
> I can't find it in the omap36xx documentation either. Maybe I search
> at the wrong position. I tried to find something crypto related in
> 
> Table 9-89. L3_PM_READ_PERMISSION_i

Hmm maybe it's done based on the address in L3_PM_ADDR_MATCH_k?

I guess the thing to do would be to compare the register output between
the two different firmware versions.
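
The comparison could be done with a small helper like the one below, a
sketch assuming both dumps were captured as (offset, value) pairs in
the same order. The struct and function names are made up for
illustration, and no real register values are implied:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One captured register: offset from the (assumed) base, and its value. */
struct reg_sample {
	uint32_t offset;
	uint32_t value;
};

/*
 * Print every register whose value differs between two dumps and return
 * how many differed. Both dumps must list the same offsets in the same
 * order; this is checked with an assertion.
 */
static size_t diff_reg_dumps(const struct reg_sample *a,
			     const struct reg_sample *b, size_t n)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		assert(a[i].offset == b[i].offset);
		if (a[i].value != b[i].value) {
			/* The XOR shows exactly which bits flipped. */
			printf("0x%03x: 0x%08x -> 0x%08x (bits 0x%08x)\n",
			       (unsigned int)a[i].offset,
			       (unsigned int)a[i].value,
			       (unsigned int)b[i].value,
			       (unsigned int)(a[i].value ^ b[i].value));
			changed++;
		}
	}
	return changed;
}
```

Any bit that differs between the stock and the patched firmware would
be a candidate for the access control in question.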

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-06 22:27 ` Tony Lindgren
  2013-12-07  0:00   ` Sebastian Reichel
@ 2013-12-08 14:13   ` Aaro Koskinen
  2013-12-08 16:40     ` Tony Lindgren
  1 sibling, 1 reply; 87+ messages in thread
From: Aaro Koskinen @ 2013-12-08 14:13 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: linux-omap

Hi,

On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > Nokia N900 DT boot breaks for me using 3.13-rc3. You can see the
> > relevant kernel output below. Disabling the AES module in the
> > omap3-n900.dts with status = "disabled" fixed the boot for me.
> 
> OK thanks for letting me know. How about the following patch to
> fix it?
> 
> Aaro, does this work for you on n9(50)? I'd assume n9(50) needs a
> similar patch too.

Yes, also N9/N950 crashes and your patch fixes that.

Thanks,

A.

> 8< ------------------------------
> From: Tony Lindgren <tony@atomide.com>
> Date: Fri, 6 Dec 2013 14:20:17 -0800
> Subject: [PATCH] ARM: dts: Fix booting for secure omaps
> 
> Commit 7ce93f3 (ARM: OMAP2+: Fix more missing data for omap3.dtsi file)
> fixed missing device tree data for omaps, but did not account for some of the
> hardware modules being inaccessible for secure omaps. This causes the
> following error on secure omaps:
> 
> Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0c5048
> SMP ARM
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0-rc2+ #446
> task: ce057b40 ti: ce058000 task.ti: ce058000
> PC is at omap_aes_dma_stop+0x24/0x3c
> LR is at omap_aes_probe+0x1cc/0x584
>    psr: 60000113
> sp : ce059e20  ip : ce0b4ee0  fp : 00000000
> r10: c0573ae8  r9 : c0749508  r8 : 00000000
> r7 : ce0b4e00  r6 : 00000000  r5 : ce0b4e10  r4 : ce274890
> r3 : fa0c5048  r2 : 00000048  r1 : 0000002c  r0 : ce274890
> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> Control: 10c5387d  Table: 80004019  DAC: 00000015
> Process swapper/0 (pid: 1, stack limit = 0xce058248)
> Stack: (0xce059e20 to 0xce05a000)
> 9e20: c0749508 0000a1ff 00000000 c016cd8c c06b5a06 ce2a45f0 ce2a4570 ce0b5fb0
> 9e40: 00000000 480c5000 480c504f c0abe4e4 00000200 00000000 00000000 00000000
> 9e60: ce0b4e10 ce0b4e10 c082da3c c082da3c c02b8c70 c077c610 c0749508 00000000
> 9e80: 00000000 c02b9e7c c02b9e64 ce0b4e10 00000000 c02b8b20 ce0b4e10 ce0b4e44
> 9ea0: c082da3c c02b8cd8 00000000 ce059eb8 c082da3c c02b7408 ce079edc ce0b1a34
> 9ec0: c082da3c c082da3c ce2a0280 00000000 c08158d8 c02b8358 c0663405 c0663405
> 9ee0: 00000073 c082da3c c079e4e8 c07ab3bc c0844340 c02b9334 00000000 00000006
> 9f00: c079e4e8 c0008920 c067f6bf c0ac7c6b 00000000 c0712e28 00000000 00000000
> 9f20: c0712e38 ce059f38 00000093 c0ac7c82 00000000 c0058994 00000000 c07130e8
> 9f40: c07127b8 00000093 00000006 00000006 00000001 00000006 00000006 c079e4e8
> 9f60: c07ab3bc c0844340 00000093 c0749508 c079e4f4 c0749c64 00000006 00000006
> 9f80: c0749508 00000000 00000000 c0517e2c 00000000 00000000 00000000 00000000
> 9fa0: 00000000 c0517e34 00000000 c000dfb8 00000000 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
> (omap_aes_probe+0x1cc/0x584)
> (platform_drv_probe+0x18/0x48)
> (driver_probe_device+0xb0/0x200)
> (__driver_attach+0x68/0x8c)
> (bus_for_each_dev+0x50/0x88)
> (bus_add_driver+0xcc/0x1c8)
> (driver_register+0x9c/0xe0)
> (do_one_initcall+0x98/0x140)
> (kernel_init_freeable+0x16c/0x23c)
> (kernel_init+0x8/0x100)
> (ret_from_fork+0x14/0x3c)
> Code: e1811002 e5932020 e590300c e0833002 (e593c000)
> 
> Let's fix the issue by adding omap34xx-hs.dtsi and omap36xx-hs.dtsi and make
> n900, n9 and n950 to use them.
> 
> Reported-by: Sebastian Reichel <sre@debian.org>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> 
> --- a/arch/arm/boot/dts/omap3-n900.dts
> +++ b/arch/arm/boot/dts/omap3-n900.dts
> @@ -9,7 +9,7 @@
>  
>  /dts-v1/;
>  
> -#include "omap34xx.dtsi"
> +#include "omap34xx-hs.dtsi"
>  
>  / {
>  	model = "Nokia N900";
> --- a/arch/arm/boot/dts/omap3-n950-n9.dtsi
> +++ b/arch/arm/boot/dts/omap3-n950-n9.dtsi
> @@ -8,7 +8,7 @@
>   * published by the Free Software Foundation.
>   */
>  
> -#include "omap36xx.dtsi"
> +#include "omap36xx-hs.dtsi"
>  
>  / {
>  	cpus {
> --- /dev/null
> +++ b/arch/arm/boot/dts/omap34xx-hs.dtsi
> @@ -0,0 +1,8 @@
> +/* Disabled modules for secure omaps */
> +
> +#include "omap34xx.dtsi"
> +
> +/* Secure omaps have some devices inaccessible depending on the firmware */
> +&aes {
> +	status = "disabled";
> +};
> --- /dev/null
> +++ b/arch/arm/boot/dts/omap36xx-hs.dtsi
> @@ -0,0 +1,8 @@
> +/* Disabled modules for secure omaps */
> +
> +#include "omap36xx.dtsi"
> +
> +/* Secure omaps have some devices inaccessible depending on the firmware */
> +&aes {
> +	status = "disabled";
> +};

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-08 14:13   ` Aaro Koskinen
@ 2013-12-08 16:40     ` Tony Lindgren
  2013-12-08 17:10       ` Sebastian Reichel
  0 siblings, 1 reply; 87+ messages in thread
From: Tony Lindgren @ 2013-12-08 16:40 UTC (permalink / raw)
  To: Aaro Koskinen; +Cc: linux-omap

* Aaro Koskinen <aaro.koskinen@iki.fi> [131208 06:15]:
> Hi,
> 
> On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > Nokia N900 DT boot breaks for me using 3.13-rc3. You can see the
> > > relevant kernel output below. Disabling the AES module in the
> > > omap3-n900.dts with status = "disabled" fixed the boot for me.
> > 
> > OK thanks for letting me know. How about the following patch to
> > fix it?
> > 
> > Aaro, does this work for you on n9(50)? I'd assume n9(50) needs a
> > similar patch too.
> 
> Yes, also N9/N950 crashes and your patch fixes that.

OK thanks, there's the updated version of this patch if you guys
care to ack, I'd like to commit it today.

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-08 16:40     ` Tony Lindgren
@ 2013-12-08 17:10       ` Sebastian Reichel
  2013-12-08 17:43         ` Tony Lindgren
  0 siblings, 1 reply; 87+ messages in thread
From: Sebastian Reichel @ 2013-12-08 17:10 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: Aaro Koskinen, linux-omap

[-- Attachment #1: Type: text/plain, Size: 948 bytes --]

On Sun, Dec 08, 2013 at 08:40:29AM -0800, Tony Lindgren wrote:
> * Aaro Koskinen <aaro.koskinen@iki.fi> [131208 06:15]:
> > Hi,
> > 
> > On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > > Nokia N900 DT boot breaks for me using 3.13-rc3. You can see the
> > > > relevant kernel output below. Disabling the AES module in the
> > > > omap3-n900.dts with status = "disabled" fixed the boot for me.
> > > 
> > > OK thanks for letting me know. How about the following patch to
> > > fix it?
> > > 
> > > Aaro, does this work for you on n9(50)? I'd assume n9(50) needs a
> > > similar patch too.
> > 
> > Yes, also N9/N950 crashes and your patch fixes that.
> 
> OK thanks, there's the updated version of this patch if you guys
> care to ack, I'd like to commit it today.

Where is the updated version? I couldn't find it in this thread.

-- Sebastian

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-08 17:10       ` Sebastian Reichel
@ 2013-12-08 17:43         ` Tony Lindgren
  2013-12-08 17:59           ` Aaro Koskinen
  2013-12-08 18:09           ` Sebastian Reichel
  0 siblings, 2 replies; 87+ messages in thread
From: Tony Lindgren @ 2013-12-08 17:43 UTC (permalink / raw)
  To: Aaro Koskinen, linux-omap

* Sebastian Reichel <sre@debian.org> [131208 09:11]:
> On Sun, Dec 08, 2013 at 08:40:29AM -0800, Tony Lindgren wrote:
> > * Aaro Koskinen <aaro.koskinen@iki.fi> [131208 06:15]:
> > > Hi,
> > > 
> > > On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > > > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > > > Nokia N900 DT boot breaks for me using 3.13-rc3. You can see the
> > > > > relevant kernel output below. Disabling the AES module in the
> > > > > omap3-n900.dts with status = "disabled" fixed the boot for me.
> > > > 
> > > > OK thanks for letting me know. How about the following patch to
> > > > fix it?
> > > > 
> > > > Aaro, does this work for you on n9(50)? I'd assume n9(50) needs a
> > > > similar patch too.
> > > 
> > > Yes, also N9/N950 crashes and your patch fixes that.
> > 
> > OK thanks, there's the updated version of this patch if you guys
> > care to ack, I'd like to commit it today.
> 
> Where is the updated version? I couldn't find it in this thread.

It's Message-ID: <20131207003824.GO26766@atomide.com> in this thread,
the version that disables aes, sham and timer12 like we currently do
in the hwmod code for legacy booting:

http://www.spinics.net/lists/linux-omap/msg101045.html

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-08 17:43         ` Tony Lindgren
@ 2013-12-08 17:59           ` Aaro Koskinen
  2013-12-08 18:09           ` Sebastian Reichel
  1 sibling, 0 replies; 87+ messages in thread
From: Aaro Koskinen @ 2013-12-08 17:59 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: linux-omap

On Sun, Dec 08, 2013 at 09:43:51AM -0800, Tony Lindgren wrote:
> * Sebastian Reichel <sre@debian.org> [131208 09:11]:
> > On Sun, Dec 08, 2013 at 08:40:29AM -0800, Tony Lindgren wrote:
> > > * Aaro Koskinen <aaro.koskinen@iki.fi> [131208 06:15]:
> > > > Hi,
> > > > 
> > > > On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > > > > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > > > > Nokia N900 DT boot breaks for me using 3.13-rc3. You can see the
> > > > > > relevant kernel output below. Disabling the AES module in the
> > > > > > omap3-n900.dts with status = "disabled" fixed the boot for me.
> > > > > 
> > > > > OK thanks for letting me know. How about the following patch to
> > > > > fix it?
> > > > > 
> > > > > Aaro, does this work for you on n9(50)? I'd assume n9(50) needs a
> > > > > similar patch too.
> > > > 
> > > > Yes, also N9/N950 crashes and your patch fixes that.
> > > 
> > > OK thanks, there's the updated version of this patch if you guys
> > > care to ack, I'd like to commit it today.
> > 
> > Where is the updated version? I couldn't find it in this thread.
> 
> It's Message-ID: <20131207003824.GO26766@atomide.com> in this thread,
> the version that disables aes, sham and timer12 like we currently do
> in the hwmod code for legacy booting:
> 
> http://www.spinics.net/lists/linux-omap/msg101045.html

Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>

A.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-08 17:43         ` Tony Lindgren
  2013-12-08 17:59           ` Aaro Koskinen
@ 2013-12-08 18:09           ` Sebastian Reichel
  1 sibling, 0 replies; 87+ messages in thread
From: Sebastian Reichel @ 2013-12-08 18:09 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: Aaro Koskinen, linux-omap

[-- Attachment #1: Type: text/plain, Size: 1477 bytes --]

On Sun, Dec 08, 2013 at 09:43:51AM -0800, Tony Lindgren wrote:
> * Sebastian Reichel <sre@debian.org> [131208 09:11]:
> > On Sun, Dec 08, 2013 at 08:40:29AM -0800, Tony Lindgren wrote:
> > > * Aaro Koskinen <aaro.koskinen@iki.fi> [131208 06:15]:
> > > > Hi,
> > > > 
> > > > On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > > > > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > > > > Nokia N900 DT boot breaks for me using 3.13-rc3. You can see the
> > > > > > relevant kernel output below. Disabling the AES module in the
> > > > > > omap3-n900.dts with status = "disabled" fixed the boot for me.
> > > > > 
> > > > > OK thanks for letting me know. How about the following patch to
> > > > > fix it?
> > > > > 
> > > > > Aaro, does this work for you on n9(50)? I'd assume n9(50) needs a
> > > > > similar patch too.
> > > > 
> > > > Yes, also N9/N950 crashes and your patch fixes that.
> > > 
> > > OK thanks, there's the updated version of this patch if you guys
> > > care to ack, I'd like to commit it today.
> > 
> > Where is the updated version? I couldn't find it in this thread.
> 
> It's Message-ID: <20131207003824.GO26766@atomide.com> in this thread,
> the version that disables aes, sham and timer12 like we currently do
> in the hwmod code for legacy booting:
> 
> http://www.spinics.net/lists/linux-omap/msg101045.html

boots on my N900:

Acked-By: Sebastian Reichel <sre@debian.org>

-- Sebastian

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2013-12-07 23:22                   ` Tony Lindgren
@ 2014-09-08 23:45                     ` Pali Rohár
  2014-11-25 21:08                     ` Pali Rohár
  2015-01-24 10:40                     ` Pali Rohár
  2 siblings, 0 replies; 87+ messages in thread
From: Pali Rohár @ 2014-09-08 23:45 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: linux-omap, Aaro Koskinen, Joel Fernandes

[-- Attachment #1: Type: Text/Plain, Size: 1585 bytes --]

On Sunday 08 December 2013 00:22:06 Tony Lindgren wrote:
> * Sebastian Reichel <sre@ring0.de> [131207 15:04]:
> > On Sat, Dec 07, 2013 at 01:11:37PM -0800, Tony Lindgren wrote:
> > > > I asked Pali to send me his copy of the updated NOLO
> > > > bootloader, so that I can test this. I just checked the
> > > > omap documentation (I only have access to the public
> > > > one) and crypto related stuff is not documented for the
> > > > L3_PM_READ_PERMISSION register. There are a couple of
> > > > reserved bits, which may be used for this, though.
> > > > 
> > > > I also CC'd Joel Fernandes, since he worked on the
> > > > driver before and may have access to the documentation.
> > > 
> > > Looks like at least the 36xx public version referenced
> > > here has them:
> > > 
> > > http://www.spinics.net/lists/linux-omap/msg21857.html
> > > 
> > > I'd assume the registers are the same for 34xx since we
> > > don't have them defined separately in the kernel.
> > 
> > I can't find it in the omap36xx documentation either. Maybe
> > I search at the wrong position. I tried to find something
> > crypto related in
> > 
> > Table 9-89. L3_PM_READ_PERMISSION_i
> 
> Hmm maybe it's done based on the address in
> L3_PM_ADDR_MATCH_k?
> 
> I guess the thing to do would be to compare the register
> output between the two different firmware versions.
> 
> Regards,
> 
> Tony

Hello, I found some info about the OMAP firewall:
http://web.archive.org/web/20140703081349/http://droid-developers.org/wiki/L3_firewall

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2013-12-07 23:22                   ` Tony Lindgren
  2014-09-08 23:45                     ` Pali Rohár
@ 2014-11-25 21:08                     ` Pali Rohár
  2014-11-25 21:31                       ` Pali Rohár
  2015-01-24 10:40                     ` Pali Rohár
  2 siblings, 1 reply; 87+ messages in thread
From: Pali Rohár @ 2014-11-25 21:08 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: linux-omap, Aaro Koskinen, Joel Fernandes, Sebastian Reichel,
	Pavel Machek

[-- Attachment #1: Type: Text/Plain, Size: 1704 bytes --]

On Sunday 08 December 2013 00:22:06 Tony Lindgren wrote:
> * Sebastian Reichel <sre@ring0.de> [131207 15:04]:
> > On Sat, Dec 07, 2013 at 01:11:37PM -0800, Tony Lindgren wrote:
> > > > I asked Pali to send me his copy of the updated NOLO
> > > > bootloader, so that I can test this. I just checked the
> > > > omap documentation (I only have access to the public
> > > > one) and crypto related stuff is not documented for the
> > > > L3_PM_READ_PERMISSION register. There are a couple of
> > > > reserved bits, which may be used for this, though.
> > > > 
> > > > I also CC'd Joel Fernandes, since he worked on the
> > > > driver before and may have access to the documentation.
> > > 
> > > Looks like at least the 36xx public version referenced
> > > here has them:
> > > 
> > > http://www.spinics.net/lists/linux-omap/msg21857.html
> > > 
> > > I'd assume the registers are the same for 34xx since we
> > > don't have them defined separately in the kernel.
> > 
> > I can't find it in the omap36xx documentation either. Maybe
> > I search at the wrong position. I tried to find something
> > crypto related in
> > 
> > Table 9-89. L3_PM_READ_PERMISSION_i
> 
> Hmm maybe it's done based on the address in
> L3_PM_ADDR_MATCH_k?
> 
> I guess the thing to do would be to compare the register
> output between the two different firmware versions.
> 
> Regards,
> 
> Tony

Tony, if you can tell me how to read those registers, I can
provide output from both bootloaders (one that enables AES support
in the L3 firewall and one that does not).

I can also test other patches or provide other logs if you need
anything more...

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2014-11-25 21:08                     ` Pali Rohár
@ 2014-11-25 21:31                       ` Pali Rohár
  2014-11-26 17:54                         ` Tony Lindgren
  0 siblings, 1 reply; 87+ messages in thread
From: Pali Rohár @ 2014-11-25 21:31 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: linux-omap, Aaro Koskinen, Joel Fernandes, Sebastian Reichel,
	Pavel Machek, Nishanth Menon

[-- Attachment #1: Type: Text/Plain, Size: 2234 bytes --]

On Tuesday 25 November 2014 22:08:17 Pali Rohár wrote:
> On Sunday 08 December 2013 00:22:06 Tony Lindgren wrote:
> > * Sebastian Reichel <sre@ring0.de> [131207 15:04]:
> > > On Sat, Dec 07, 2013 at 01:11:37PM -0800, Tony Lindgren wrote:
> > > > > I asked Pali to send me his copy of the updated NOLO
> > > > > bootloader, so that I can test this. I just checked
> > > > > the omap documentation (I only have access to the
> > > > > public one) and crypto related stuff is not
> > > > > documented for the L3_PM_READ_PERMISSION register.
> > > > > There are a couple of reserved bits, which may be
> > > > > used for this, though.
> > > > > 
> > > > > I also CC'd Joel Fernandes, since he worked on the
> > > > > driver before and may have access to the
> > > > > documentation.
> > > > 
> > > > Looks like at least the 36xx public version referenced
> > > > here has them:
> > > > 
> > > > http://www.spinics.net/lists/linux-omap/msg21857.html
> > > > 
> > > > I'd assume the registers are the same for 34xx since we
> > > > don't have them defined separately in the kernel.
> > > 
> > > I can't find it in the omap36xx documentation either.
> > > Maybe I search at the wrong position. I tried to find
> > > something crypto related in
> > > 
> > > Table 9-89. L3_PM_READ_PERMISSION_i
> > 
> > Hmm maybe it's done based on the address in
> > L3_PM_ADDR_MATCH_k?
> > 
> > I guess the thing to do would be to compare the register
> > output between the two different firmware versions.
> > 
> > Regards,
> > 
> > Tony
> 
> Tony, if you can tell me how to read those registers I can
> provide output from both bootloaders (one that enable aes
> support in L3 firewall and one which not).
> 
> Also I can test other patches or provide other logs if you
> need something more...

CCing Nishanth Menon

The problem is that when the L3 firewall is not configured by the signed
bootloader, loading the omap-aes driver crashes the kernel. We
need some code in the kernel that can check at runtime whether
omap-aes is enabled...

More details, with the full email thread, are here:
http://thread.gmane.org/gmane.linux.ports.arm.omap/108397/

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2014-11-25 21:31                       ` Pali Rohár
@ 2014-11-26 17:54                         ` Tony Lindgren
  2015-01-17  9:18                           ` Pali Rohár
  0 siblings, 1 reply; 87+ messages in thread
From: Tony Lindgren @ 2014-11-26 17:54 UTC (permalink / raw)
  To: Pali Rohár
  Cc: linux-omap, Aaro Koskinen, Joel Fernandes, Sebastian Reichel,
	Pavel Machek, Nishanth Menon

* Pali Rohár <pali.rohar@gmail.com> [141125 13:33]:
> On Tuesday 25 November 2014 22:08:17 Pali Rohár wrote:
> > On Sunday 08 December 2013 00:22:06 Tony Lindgren wrote:
> > > * Sebastian Reichel <sre@ring0.de> [131207 15:04]:
> > > > On Sat, Dec 07, 2013 at 01:11:37PM -0800, Tony Lindgren wrote:
> > > > > > I asked Pali to send me his copy of the updated NOLO
> > > > > > bootloader, so that I can test this. I just checked
> > > > > > the omap documentation (I only have access to the
> > > > > > public one) and crypto related stuff is not
> > > > > > documented for the L3_PM_READ_PERMISSION register.
> > > > > > There are a couple of reserved bits, which may be
> > > > > > used for this, though.
> > > > > > 
> > > > > > I also CC'd Joel Fernandes, since he worked on the
> > > > > > driver before and may have access to the
> > > > > > documentation.
> > > > > 
> > > > > Looks like at least the 36xx public version referenced
> > > > > here has them:
> > > > > 
> > > > > http://www.spinics.net/lists/linux-omap/msg21857.html
> > > > > 
> > > > > I'd assume the registers are the same for 34xx since we
> > > > > don't have them defined separately in the kernel.
> > > > 
> > > > I can't find it in the omap36xx documentation either.
> > > > Maybe I search at the wrong position. I tried to find
> > > > something crypto related in
> > > > 
> > > > Table 9-89. L3_PM_READ_PERMISSION_i
> > > 
> > > Hmm maybe it's done based on the address in
> > > L3_PM_ADDR_MATCH_k?
> > > 
> > > I guess the thing to do would be to compare the register
> > > output between the two different firmware versions.
> > > 
> > > Regards,
> > > 
> > > Tony
> > 
> > Tony, if you can tell me how to read those registers I can
> > provide output from both bootloaders (one that enable aes
> > support in L3 firewall and one which not).
> > 
> > Also I can test other patches or provide other logs if you
> > need something more...

> CCing Nishanth Menon

Maybe just try dumping out the L3_PM_ADDR_MATCH_* registers
during the boot?

> Problem is that when L3 firewall is not configured by signed 
> bootloader then loading omap aes driver cause kernel crash. We 
> need some code which can check if omap aes is enabled or not at 
> runtime in kernel...
> 
> More details with full email tread conversation is there:
> http://thread.gmane.org/gmane.linux.ports.arm.omap/108397/

AFAIK we can't change the configuration, but it should be
readable somewhere. Sorry don't know which registers would
show the configuration other than the L3_PM_ADDR_MATCH_*
registers.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2014-11-26 17:54                         ` Tony Lindgren
@ 2015-01-17  9:18                           ` Pali Rohár
  2015-01-17 17:04                             ` Tony Lindgren
  0 siblings, 1 reply; 87+ messages in thread
From: Pali Rohár @ 2015-01-17  9:18 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: linux-omap, Aaro Koskinen, Joel Fernandes, Sebastian Reichel,
	Pavel Machek, Nishanth Menon

[-- Attachment #1: Type: Text/Plain, Size: 3446 bytes --]

On Wednesday 26 November 2014 18:54:12 Tony Lindgren wrote:
> * Pali Rohár <pali.rohar@gmail.com> [141125 13:33]:
> > On Tuesday 25 November 2014 22:08:17 Pali Rohár wrote:
> > > On Sunday 08 December 2013 00:22:06 Tony Lindgren wrote:
> > > > * Sebastian Reichel <sre@ring0.de> [131207 15:04]:
> > > > > On Sat, Dec 07, 2013 at 01:11:37PM -0800, Tony Lindgren wrote:
> > > > > > > I asked Pali to send me his copy of the updated
> > > > > > > NOLO bootloader, so that I can test this. I just
> > > > > > > checked the omap documentation (I only have
> > > > > > > access to the public one) and crypto related
> > > > > > > stuff is not documented for the
> > > > > > > L3_PM_READ_PERMISSION register. There are a
> > > > > > > couple of reserved bits, which may be used for
> > > > > > > this, though.
> > > > > > > 
> > > > > > > I also CC'd Joel Fernandes, since he worked on the
> > > > > > > driver before and may have access to the
> > > > > > > documentation.
> > > > > > 
> > > > > > Looks like at least the 36xx public version
> > > > > > referenced here has them:
> > > > > > 
> > > > > > http://www.spinics.net/lists/linux-omap/msg21857.html
> > > > > > 
> > > > > > I'd assume the registers are the same for 34xx since
> > > > > > we don't have them defined separately in the
> > > > > > kernel.
> > > > > 
> > > > > I can't find it in the omap36xx documentation either.
> > > > > Maybe I search at the wrong position. I tried to find
> > > > > something crypto related in
> > > > > 
> > > > > Table 9-89. L3_PM_READ_PERMISSION_i
> > > > 
> > > > Hmm maybe it's done based on the address in
> > > > L3_PM_ADDR_MATCH_k?
> > > > 
> > > > I guess the thing to do would be to compare the register
> > > > output between the two different firmware versions.
> > > > 
> > > > Regards,
> > > > 
> > > > Tony
> > > 
> > > Tony, if you can tell me how to read those registers I can
> > > provide output from both bootloaders (one that enable aes
> > > support in L3 firewall and one which not).
> > > 
> > > Also I can test other patches or provide other logs if you
> > > need something more...
> > 
> > CCing Nishanth Menon
> 
> Maybe just try dumping out the L3_PM_ADDR_MATCH_* registers
> during the boot?
> 
> > Problem is that when L3 firewall is not configured by signed
> > bootloader then loading omap aes driver cause kernel crash.
> > We need some code which can check if omap aes is enabled or
> > not at runtime in kernel...
> > 
> > More details with full email tread conversation is there:
> > http://thread.gmane.org/gmane.linux.ports.arm.omap/108397/
> 
> AFAIK we can't change the configuration, but it should be
> readable somewhere. Sorry don't know which registers would
> show the configuration other than the L3_PM_ADDR_MATCH_*
> registers.
> 
> Regards,
> 
> Tony

Hello, after playing with the AES and non-AES versions of the bootloader,
I found out that the code which configures the L3 firewall must be in the
signed X-Loader part. So we probably cannot change the configuration
at runtime...

Tony,
any idea how to dump the L3_PM_ADDR_MATCH_* registers from userspace
or from the kernel? I now have one N900 configured with the original
Nokia X-Loader (where loading the omap-aes driver crashes the kernel)
and one N900 with the AES X-Loader (where loading the omap-aes driver
works).
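Once both devices can be dumped, the comparison suggested earlier in the
thread could be sketched roughly as below. This is only an illustration:
the register names and values are placeholders, not real readings from
either X-Loader, and diff_register_dumps is a hypothetical helper.

```python
def diff_register_dumps(dump_a, dump_b):
    """Return {name: (a, b, a ^ b)} for registers whose values differ.

    A register missing from one dump is reported with None in its place
    and None for the XOR column.
    """
    changed = {}
    for name in sorted(set(dump_a) | set(dump_b)):
        a = dump_a.get(name)
        b = dump_b.get(name)
        if a != b:
            xor = a ^ b if a is not None and b is not None else None
            changed[name] = (a, b, xor)
    return changed


# Placeholder data only: hypothetical names and values, not real dumps.
stock_xloader = {"REG_A": 0x00000000, "REG_B": 0x00001234}
aes_xloader = {"REG_A": 0x00000000, "REG_B": 0x00001634}

for name, (a, b, xor) in diff_register_dumps(stock_xloader, aes_xloader).items():
    print(f"{name}: {a:#010x} -> {b:#010x} (changed bits {xor:#010x})")
```

The XOR column points directly at the bits that differ between the two
firmware versions, which is the part worth looking up in the TRM.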

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-01-17  9:18                           ` Pali Rohár
@ 2015-01-17 17:04                             ` Tony Lindgren
  2015-01-17 17:29                               ` Pali Rohár
  0 siblings, 1 reply; 87+ messages in thread
From: Tony Lindgren @ 2015-01-17 17:04 UTC (permalink / raw)
  To: Pali Rohár
  Cc: linux-omap, Aaro Koskinen, Joel Fernandes, Sebastian Reichel,
	Pavel Machek, Nishanth Menon

* Pali Rohár <pali.rohar@gmail.com> [150117 01:21]:
> 
> Hello, after playing with aes and non-aes version of bootloaders 
> I found out that code which configure L3 firewall must be in 
> signed X-Loader part. So probably we cannot change configuration 
> at runtime...
> 
> Tony,
> any idea how to dump L3_PM_ADDR_MATCH_* registers from userspace 
> or from kernel? Now I have configured one N900 with original 
> Nokia X-Loader (where loading omap aes driver cause kernel crash) 
> and one N900 with aes X-Loader (where loading omap aes driver is 
> working).

Hmm maybe give the omapconf tool a try? It's at:

https://github.com/omapconf/omapconf

That uses /dev/mem from userspace, then you can just ioremap
them in the kernel code if you need to do something during
runtime. Probably best to access them with a syscon mapping
from the aes driver.

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-01-17 17:04                             ` Tony Lindgren
@ 2015-01-17 17:29                               ` Pali Rohár
  2015-01-17 17:41                                 ` Tony Lindgren
  0 siblings, 1 reply; 87+ messages in thread
From: Pali Rohár @ 2015-01-17 17:29 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: linux-omap, Aaro Koskinen, Joel Fernandes, Sebastian Reichel,
	Pavel Machek, Nishanth Menon

[-- Attachment #1: Type: Text/Plain, Size: 1141 bytes --]

On Saturday 17 January 2015 18:04:11 Tony Lindgren wrote:
> * Pali Rohár <pali.rohar@gmail.com> [150117 01:21]:
> > Hello, after playing with aes and non-aes version of
> > bootloaders I found out that code which configure L3
> > firewall must be in signed X-Loader part. So probably we
> > cannot change configuration at runtime...
> > 
> > Tony,
> > any idea how to dump L3_PM_ADDR_MATCH_* registers from
> > userspace or from kernel? Now I have configured one N900
> > with original Nokia X-Loader (where loading omap aes driver
> > cause kernel crash) and one N900 with aes X-Loader (where
> > loading omap aes driver is working).
> 
> Hmm maybe give the omapconf too a try? It's at:
> 
> https://github.com/omapconf/omapconf
> 
> That uses /dev/mem from userspace, then you can just ioremap
> them in the kernel code if you need to do something during
> runtime. Probably best to access them with a syscon mapping
> from the aes driver.
> 
> Regards,
> 
> Tony

https://github.com/omapconf/omapconf/wiki

Legacy TI OMAP platforms (OMAP[1-2-3]) are not supported.

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-01-17 17:29                               ` Pali Rohár
@ 2015-01-17 17:41                                 ` Tony Lindgren
  2015-01-31 11:34                                   ` Pali Rohár
  0 siblings, 1 reply; 87+ messages in thread
From: Tony Lindgren @ 2015-01-17 17:41 UTC (permalink / raw)
  To: Pali Rohár
  Cc: linux-omap, Aaro Koskinen, Joel Fernandes, Sebastian Reichel,
	Pavel Machek, Nishanth Menon

* Pali Rohár <pali.rohar@gmail.com> [150117 09:32]:
> On Saturday 17 January 2015 18:04:11 Tony Lindgren wrote:
> > * Pali Rohár <pali.rohar@gmail.com> [150117 01:21]:
> > > Hello, after playing with aes and non-aes version of
> > > bootloaders I found out that code which configure L3
> > > firewall must be in signed X-Loader part. So probably we
> > > cannot change configuration at runtime...
> > > 
> > > Tony,
> > > any idea how to dump L3_PM_ADDR_MATCH_* registers from
> > > userspace or from kernel? Now I have configured one N900
> > > with original Nokia X-Loader (where loading omap aes driver
> > > cause kernel crash) and one N900 with aes X-Loader (where
> > > loading omap aes driver is working).
> > 
> > Hmm maybe give the omapconf too a try? It's at:
> > 
> > https://github.com/omapconf/omapconf
> > 
> > That uses /dev/mem from userspace, then you can just ioremap
> > them in the kernel code if you need to do something during
> > runtime. Probably best to access them with a syscon mapping
> > from the aes driver.
> > 
> > Regards,
> > 
> > Tony
> 
> https://github.com/omapconf/omapconf/wiki
> 
> Legacy TI OMAP platforms (OMAP[1-2-3]) are not supported.

Oh OK. Well at least you can look at the code if you want to
do it from the user space :)

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2013-12-07 23:22                   ` Tony Lindgren
  2014-09-08 23:45                     ` Pali Rohár
  2014-11-25 21:08                     ` Pali Rohár
@ 2015-01-24 10:40                     ` Pali Rohár
  2015-01-31 14:38                       ` Matthijs van Duin
  2 siblings, 1 reply; 87+ messages in thread
From: Pali Rohár @ 2015-01-24 10:40 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Tony Lindgren, linux-omap, Aaro Koskinen, Nishanth Menon,
	Pavel Machek, Sebastian Reichel

[-- Attachment #1: Type: Text/Plain, Size: 1842 bytes --]

On Sunday 08 December 2013 00:22:06 Tony Lindgren wrote:
> * Sebastian Reichel <sre@ring0.de> [131207 15:04]:
> > On Sat, Dec 07, 2013 at 01:11:37PM -0800, Tony Lindgren wrote:
> > > > I asked Pali to send me his copy of the updated NOLO
> > > > bootloader, so that I can test this. I just checked the
> > > > omap documentation (I only have access to the public
> > > > one) and crypto related stuff is not documented for the
> > > > L3_PM_READ_PERMISSION register. There are a couple of
> > > > reserved bits, which may be used for this, though.
> > > > 
> > > > I also CC'd Joel Fernandes, since he worked on the
> > > > driver before and may have access to the documentation.
> > > 
> > > Looks like at least the 36xx public version referenced
> > > here has them:
> > > 
> > > http://www.spinics.net/lists/linux-omap/msg21857.html
> > > 
> > > I'd assume the registers are the same for 34xx since we
> > > don't have them defined separately in the kernel.
> > 
> > I can't find it in the omap36xx documentation either. Maybe
> > I search at the wrong position. I tried to find something
> > crypto related in
> > 
> > Table 9-89. L3_PM_READ_PERMISSION_i
> 
> Hmm maybe it's done based on the address in
> L3_PM_ADDR_MATCH_k?
> 
> I guess the thing to do would be to compare the register
> output between the two different firmware versions.
> 
> Regards,
> 
> Tony

Joel Fernandes, you are the author of the omap-des driver. Do you know
anything about the kernel crashing when OMAP crypto support (via the L3
firewall) is not enabled by the bootloader? Do you know anything
about runtime detection of crypto support to prevent the kernel/board
crash?

The problem, with the omap-aes kernel log, is reported here:
http://thread.gmane.org/gmane.linux.ports.arm.omap/108397/

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-01-17 17:41                                 ` Tony Lindgren
@ 2015-01-31 11:34                                   ` Pali Rohár
  2015-01-31 15:13                                     ` Matthijs van Duin
  0 siblings, 1 reply; 87+ messages in thread
From: Pali Rohár @ 2015-01-31 11:34 UTC (permalink / raw)
  To: Tony Lindgren, Sebastian Reichel
  Cc: linux-omap, Aaro Koskinen, Pavel Machek, Nishanth Menon

[-- Attachment #1: Type: Text/Plain, Size: 4435 bytes --]

On Saturday 17 January 2015 18:41:26 Tony Lindgren wrote:
> * Pali Rohár <pali.rohar@gmail.com> [150117 09:32]:
> > On Saturday 17 January 2015 18:04:11 Tony Lindgren wrote:
> > > * Pali Rohár <pali.rohar@gmail.com> [150117 01:21]:
> > > > Hello, after playing with aes and non-aes version of
> > > > bootloaders I found out that code which configure L3
> > > > firewall must be in signed X-Loader part. So probably we
> > > > cannot change configuration at runtime...
> > > > 
> > > > Tony,
> > > > any idea how to dump L3_PM_ADDR_MATCH_* registers from
> > > > userspace or from kernel? Now I have configured one N900
> > > > with original Nokia X-Loader (where loading omap aes
> > > > driver cause kernel crash) and one N900 with aes
> > > > X-Loader (where loading omap aes driver is working).
> > > 
> > > Hmm maybe give the omapconf too a try? It's at:
> > > 
> > > https://github.com/omapconf/omapconf
> > > 
> > > That uses /dev/mem from userspace, then you can just
> > > ioremap them in the kernel code if you need to do
> > > something during runtime. Probably best to access them
> > > with a syscon mapping from the aes driver.
> > > 
> > > Regards,
> > > 
> > > Tony
> > 
> > https://github.com/omapconf/omapconf/wiki
> > 
> > Legacy TI OMAP platforms (OMAP[1-2-3]) are not supported.
> 
> Oh OK. Well at least you can look at the code if you want to
> do it from the user space :)
> 
> Regards,
> 
> Tony

I played with the devmem2.c program, but trying to read any of those registers
caused what is probably an L3 violation error...

registers:
L3_PM_ERROR_LOG
L3_PM_CONTROL
L3_PM_REQ_INFO_PERMISSION_i
L3_PM_READ_PERMISSION_i
L3_PM_WRITE_PERMISSION_i
L3_PM_ADDR_MATCH_k

protection mechanisms:
PM_RT
PM_GPMC
PM_OCM_RAM
PM_OCM_ROM
PM_IVA2.2

Here is example output for address 0x68010028 (PM_RT L3_PM_CONTROL):

$ devmem2 0x68010028 w
/dev/mem opened.
Memory mapped at address 0xb6f87000.
Bus error

dmesg output:
[  172.923553] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6f87028
[  172.930664] In-band Error seen by MPU  at address 0
[  172.937408] ------------[ cut here ]------------
[  172.944061] WARNING: CPU: 0 PID: 612 at drivers/bus/omap_l3_smx.c:166 omap3_l3_app_irq+0xd4/0x118()
[  172.957733] Modules linked in: wl1251_spi wl1251 crc7 mac80211 cfg80211 isp1704_charger lis3lv02d_i2c lis3lv02d input_polldev bq2415x_charger si4713 bq27x00_battery v4l2_common videodev media leds_lp5523 omap_ssi hsi leds_lp55xx_common tsl2563 rtc_twl rx51_battery twl4030_vibra ff_memless tsc2005 twl4030_wdt omap_wdt
[  172.987762] CPU: 0 PID: 612 Comm: devmem2 Tainted: G        W      3.19.0-rc5+ #297
[  172.995483] Hardware name: Nokia RX-51 board
[  173.002960] [<c0012678>] (unwind_backtrace) from [<c0010d44>] (show_stack+0x10/0x14)
[  173.010589] [<c0010d44>] (show_stack) from [<c0032194>] (warn_slowpath_common+0x84/0xac)
[  173.018280] [<c0032194>] (warn_slowpath_common) from [<c00321d4>] (warn_slowpath_null+0x18/0x1c)
[  173.026123] [<c00321d4>] (warn_slowpath_null) from [<c01f30e0>] (omap3_l3_app_irq+0xd4/0x118)
[  173.034057] [<c01f30e0>] (omap3_l3_app_irq) from [<c0061fd0>] (handle_irq_event_percpu+0xcc/0x294)
[  173.049987] [<c0061fd0>] (handle_irq_event_percpu) from [<c00621fc>] (handle_irq_event+0x64/0x8c)
[  173.058441] [<c00621fc>] (handle_irq_event) from [<c0064b9c>] (handle_level_irq+0xcc/0x130)
[  173.066925] [<c0064b9c>] (handle_level_irq) from [<c0061850>] (generic_handle_irq+0x20/0x30)
[  173.075592] [<c0061850>] (generic_handle_irq) from [<c0061a38>] (__handle_domain_irq+0x80/0xa4)
[  173.084411] [<c0061a38>] (__handle_domain_irq) from [<c0008520>] (omap_intc_handle_irq+0x78/0xa4)
[  173.093383] [<c0008520>] (omap_intc_handle_irq) from [<c0409380>] (__irq_svc+0x40/0x74)
[  173.102508] Exception stack(0xc6f2ff58 to 0xc6f2ffa0)
[  173.111755] ff40:                                                       00000000 00000001
[  173.121154] ff60: 60000010 00000001 c6f2ffb0 00000000 ffffffff 10c5387d 00000000 c6f2e000
[  173.130554] ff80: 00000000 bea65b44 00000000 c6f2ffa0 c000dc20 c00104a0 20000113 ffffffff
[  173.140136] [<c0409380>] (__irq_svc) from [<c00104a0>] (do_work_pending+0x34/0xb4)
[  173.149902] [<c00104a0>] (do_work_pending) from [<c000dc20>] (work_pending+0xc/0x20)
[  173.159759] ---[ end trace 124b6fccc1bf5a71 ]---
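
For reference, the devmem2 read above boils down to mapping the page that
contains the physical address and reading one word at the offset inside
it. A minimal sketch, assuming a little-endian 32-bit register;
split_address and read_word are hypothetical helper names, and Python
cannot guarantee a single 32-bit bus access the way devmem2's volatile
read does:

```python
import mmap
import os
import struct

PAGE_SIZE = mmap.PAGESIZE


def split_address(phys_addr, page_size=PAGE_SIZE):
    """Split a physical address into (page-aligned base, offset in page)."""
    base = phys_addr & ~(page_size - 1)
    return base, phys_addr - base


def read_word(path, phys_addr):
    """Map the page containing phys_addr from `path` and read 32 bits."""
    base, offset = split_address(phys_addr)
    fd = os.open(path, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, PAGE_SIZE, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ, offset=base) as window:
            # Assumes a little-endian 32-bit value at that offset.
            return struct.unpack_from("<I", window, offset)[0]
    finally:
        os.close(fd)


# 0x68010028 is the PM_RT L3_PM_CONTROL address used in the test above.
base, offset = split_address(0x68010028, 4096)
print(hex(base), hex(offset))
# On the device, as root: read_word("/dev/mem", 0x68010028)
```

With 4 KiB pages the split gives offset 0x28 into the mapped page, which
matches the fault being reported at 0xb6f87028 for a mapping at
0xb6f87000.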

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-01-24 10:40                     ` Pali Rohár
@ 2015-01-31 14:38                       ` Matthijs van Duin
  2015-01-31 19:09                         ` Pali Rohár
  0 siblings, 1 reply; 87+ messages in thread
From: Matthijs van Duin @ 2015-01-31 14:38 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Joel Fernandes, Tony Lindgren, linux-omap, Aaro Koskinen,
	Nishanth Menon, Pavel Machek, Sebastian Reichel

On Sunday 08 December 2013 00:22:06 Tony Lindgren wrote:
> * Sebastian Reichel <sre@ring0.de> [131207 15:04]:
> > On Sat, Dec 07, 2013 at 01:11:37PM -0800, Tony Lindgren wrote:
> > > > I asked Pali to send me his copy of the updated NOLO
> > > > bootloader, so that I can test this. I just checked the
> > > > omap documentation (I only have access to the public
> > > > one) and crypto related stuff is not documented for the
> > > > L3_PM_READ_PERMISSION register. There are a couple of
> > > > reserved bits, which may be used for this, though.
> > > >
> > > > I also CC'd Joel Fernandes, since he worked on the
> > > > driver before and may have access to the documentation.
> > >
> > > Looks like at least the 36xx public version referenced
> > > here has them:
> > >
> > > http://www.spinics.net/lists/linux-omap/msg21857.html
> > >
> > > I'd assume the registers are the same for 34xx since we
> > > don't have them defined separately in the kernel.
> >
> > I can't find it in the omap36xx documentation either. Maybe
> > I search at the wrong position.

I've checked a few of the oldest (in case it was later removed) and
newest (in case it was later added) omap3-series public TRMs I have,
none of them list the aes module or associated interconnect info. The
region is either "reserved" or just silently skipped over. The
practice of pretending something doesn't exist in the TRM while
simultaneously releasing a linux driver continues to puzzle me.

> > I tried to find something crypto related in
> >
> > Table 9-89. L3_PM_READ_PERMISSION_i
>
> Hmm maybe it's done based on the address in
> L3_PM_ADDR_MATCH_k?

According to the address (aes@480c5000) it's attached to the L4-Core
interconnect, so why would an L3 firewall be involved?  Its access
control would be configured in the L4-Core AP (2KB @ 0x48040000), and
since they have an integrated memory map you'd automatically know
which entry is responsible, assuming you can access the AP at all.
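
To make the address argument concrete, here is a rough sketch that sorts
the addresses mentioned in this thread into interconnect windows. The
window bases and sizes are assumptions written down from memory of the
kernel's OMAP3 I/O map, not taken from this thread, and should be checked
against the TRM; interconnect_for is a hypothetical helper.

```python
# Assumed windows: name -> (base, size). Verify against the TRM.
WINDOWS = {
    "L4-Core": (0x48000000, 0x00400000),
    "L3 registers": (0x68000000, 0x00100000),
}


def interconnect_for(phys_addr):
    """Return (window name, offset inside window), or (None, None)."""
    for name, (base, size) in WINDOWS.items():
        if base <= phys_addr < base + size:
            return name, phys_addr - base
    return None, None


for label, addr in [
    ("aes module", 0x480C5000),
    ("L4-Core AP", 0x48040000),
    ("PM_RT L3_PM_CONTROL", 0x68010028),
]:
    name, offset = interconnect_for(addr)
    print(f"{label}: {addr:#010x} -> {name} + {offset:#x}")
```

Under those assumptions the aes module and the L4-Core AP land in the
same window, while the L3_PM_* registers tried elsewhere in the thread
sit in a different one.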

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-01-31 11:34                                   ` Pali Rohár
@ 2015-01-31 15:13                                     ` Matthijs van Duin
  2015-01-31 19:06                                       ` Pali Rohár
  2015-02-11 12:39                                       ` Pali Rohár
  0 siblings, 2 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-01-31 15:13 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Tony Lindgren, Sebastian Reichel, linux-omap, Aaro Koskinen,
	Pavel Machek, Nishanth Menon

On 31 January 2015 at 12:34, Pali Rohár <pali.rohar@gmail.com> wrote:
> [  172.923553] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6f87028
> [  172.930664] In-band Error seen by MPU  at address 0

Also, why is this error so uninformative?  A synchronous abort should
at least mention the _physical_ address, type of access, etc.

Anyhow, since checking the firewalls/APs to see if you have permission
will probably only get you yet another fault if things are walled off,
the robust way of dealing with this sort of situation is by probing
the device with a read while trapping bus faults. This also handles
modules that are unreachable for other reasons, e.g. being disabled by
eFuse.
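
A userspace analogue of that idea, as a sketch only: do the read in a
throwaway child process and treat death by SIGBUS as "unreachable". In
the kernel the same thing would need a hook in the abort handling path,
which this does not show. probe and fake_bus_fault are hypothetical
names, and the fault here is simulated rather than coming from real
hardware.

```python
import os
import signal


def probe(read_fn):
    """Run read_fn in a child process.

    Returns True only if read_fn completed normally; returns False if the
    child was killed by SIGBUS or failed in any other way.
    """
    pid = os.fork()
    if pid == 0:
        # Child: make sure a bus error kills us instead of being handled.
        signal.signal(signal.SIGBUS, signal.SIG_DFL)
        try:
            read_fn()
        except BaseException:
            os._exit(1)
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGBUS:
        return False
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def fake_bus_fault():
    """Stand-in for a /dev/mem read that hits a walled-off region."""
    os.kill(os.getpid(), signal.SIGBUS)


print("reachable read:", probe(lambda: None))
print("faulting read:", probe(fake_bus_fault))
```

The parent survives either way, so a tool built like this can walk a list
of candidate addresses and simply record which ones fault.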

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-01-31 15:13                                     ` Matthijs van Duin
@ 2015-01-31 19:06                                       ` Pali Rohár
  2015-02-11 12:39                                       ` Pali Rohár
  1 sibling, 0 replies; 87+ messages in thread
From: Pali Rohár @ 2015-01-31 19:06 UTC (permalink / raw)
  To: Matthijs van Duin
  Cc: Tony Lindgren, Sebastian Reichel, linux-omap, Aaro Koskinen,
	Pavel Machek, Nishanth Menon

[-- Attachment #1: Type: Text/Plain, Size: 1107 bytes --]

On Saturday 31 January 2015 16:13:39 Matthijs van Duin wrote:
> On 31 January 2015 at 12:34, Pali Rohár <pali.rohar@gmail.com> wrote:
> > [  172.923553] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6f87028
> > [  172.930664] In-band Error seen by MPU  at address 0
> 
> Also, why is this error so uninformative?  A synchronous abort
> should at least mention the _physical_ address, type of
> access, etc.
> 
> Anyhow, since checking the firewalls/APs to see if you have
> permission will probably only get you yet another fault if
> things are walled off, the robust way of dealing with this
> sort of situation is by probing the device with a read while
> trapping bus faults. This also handles modules that are
> unreachable for other reasons, e.g. being disabled by eFuse.

Just to note that the above error output is from a device with the
signed X-Loader which *enables* omap aes support.

So it looks like it is not possible to dump the registers which
should tell you whether the kernel has permission or not (in the L3
firewall).

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-01-31 14:38                       ` Matthijs van Duin
@ 2015-01-31 19:09                         ` Pali Rohár
  2015-02-01  1:36                           ` Matthijs van Duin
  0 siblings, 1 reply; 87+ messages in thread
From: Pali Rohár @ 2015-01-31 19:09 UTC (permalink / raw)
  To: Matthijs van Duin
  Cc: Joel Fernandes, Tony Lindgren, linux-omap, Aaro Koskinen,
	Nishanth Menon, Pavel Machek, Sebastian Reichel

[-- Attachment #1: Type: Text/Plain, Size: 2688 bytes --]

On Saturday 31 January 2015 15:38:28 Matthijs van Duin wrote:
> On Sunday 08 December 2013 00:22:06 Tony Lindgren wrote:
> > * Sebastian Reichel <sre@ring0.de> [131207 15:04]:
> > > On Sat, Dec 07, 2013 at 01:11:37PM -0800, Tony Lindgren 
> > > wrote:
> > > > > I asked Pali to send me his copy of the updated NOLO
> > > > > bootloader, so that I can test this. I just checked
> > > > > the omap documentation (I only have access to the
> > > > > public one) and crypto related stuff is not
> > > > > documented for the L3_PM_READ_PERMISSION register.
> > > > > There are a couple of reserved bits, which may be
> > > > > used for this, though.
> > > > > 
> > > > > I also CC'd Joel Fernandes, since he worked on the
> > > > > driver before and may have access to the
> > > > > documentation.
> > > > 
> > > > Looks like at least the 36xx public version referenced
> > > > here has them:
> > > > 
> > > > http://www.spinics.net/lists/linux-omap/msg21857.html
> > > > 
> > > > I'd assume the registers are the same for 34xx since we
> > > > don't have them defined separately in the kernel.
> > > 
> > > I can't find it in the omap36xx documentation either.
> > > Maybe I search at the wrong position.
> 
> I've checked a few of the oldest (in case it was later
> removed) and newest (in case it was later added) omap3-series
> public TRMs I have, none of them list the aes module or
> associated interconnect info. The region is either "reserved"
> or just silently skipped over. The practice of pretending
> something doesn't exist in the TRM while simultaneously
> releasing a linux driver continues to puzzle me.
> 
> > > I tried to find something crypto related in
> > > 
> > > Table 9-89. L3_PM_READ_PERMISSION_i
> > 
> > Hmm maybe it's done based on the address in
> > L3_PM_ADDR_MATCH_k?
> 
> According to the address (aes@480c5000) it's attached to the
> L4-Core interconnect, so why would an L3 firewall be
> involved?  Its access control would be configured in the
> L4-Core AP (2KB @ 0x48040000), and since they have an
> integrated memory map you'd automatically know which entry is
> responsible, assuming you can access the AP at all.

Do you have any idea whether it is possible to write such a check in
the kernel, to test if the address (aes@480c5000) is readable or not?

I have configured two N900 test devices: one with a signed bootloader
which enables omap aes support, and one with a signed bootloader
which does not.

So I can run any code/kernel patch and compare the results/dumps
between those two devices.

I just do not know what to do, or what to test...

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-01-31 19:09                         ` Pali Rohár
@ 2015-02-01  1:36                           ` Matthijs van Duin
  2015-02-01  8:56                             ` Pali Rohár
  0 siblings, 1 reply; 87+ messages in thread
From: Matthijs van Duin @ 2015-02-01  1:36 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Joel Fernandes, Tony Lindgren, linux-omap, Aaro Koskinen,
	Nishanth Menon, Pavel Machek, Sebastian Reichel

On 31 January 2015 at 20:06, Pali Rohár <pali.rohar@gmail.com> wrote:
> Just to note that the above error output is from a device with the
> signed X-Loader which *enables* omap aes support.
>
> So it looks like it is not possible to dump the registers which
> should tell you whether the kernel has permission or not (in the L3
> firewall).

This is hardly surprising on a HS device.  If the firewalls are
actively being used to enforce access control rather than being left
wide open like they are by default on GP devices, the firewalls
themselves will *certainly* be firewalled off.

(They could have opted to grant read-only access to the L3 firewalls
for inspection. But as I mentioned, it's the L4-Core firewall which
matters here, and that one does not even have the option of providing
read-only access.)


On 31 January 2015 at 20:09, Pali Rohár <pali.rohar@gmail.com> wrote:
> Do you have any idea whether it is possible to write such a check in
> the kernel, to test if the address (aes@480c5000) is readable or not?

Most of my experience with TI SoCs so far has actually been baremetal
programming, so I have to admit I'm not yet as familiar as I'd like
with how things are done in the linux kernel.

But if I'm reading arch/arm/mm/fault.c right, kernel level exception
handling is currently only invoked for translation faults and not for
synchronous external aborts. If that were fixed, then something like a
privileged version of __get_user_asm_word
(arch/arm/include/asm/uaccess.h) could be used.  Hmm, in fact, it
appears __get_user actually performs a privileged load unless
CONFIG_CPU_USE_DOMAINS is enabled... odd.

My intuition is that this shouldn't be very difficult, but invoking
fixup_exception for synchronous external aborts in addition to
translation faults would be a semantic change with kernel-wide scope,
so I don't know if this might be a big no-no for non-obvious reasons.
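
For readers unfamiliar with the mechanism: fixup_exception works from a table that pairs the address of each instruction which is allowed to fault with an address to resume at. Below is a simplified, self-contained model of that lookup. The names ex_entry, ex_search and ex_fixup are made up for this sketch and are not the kernel's definitions, and the table is assumed to be sorted by instruction address.

```c
#include <stddef.h>

/* Hypothetical model of one exception-table entry. */
struct ex_entry {
	unsigned long insn;	/* address of an instruction that may fault */
	unsigned long fixup;	/* address to continue at if it does */
};

/* Binary search over a table sorted by insn; NULL if pc is not listed. */
static const struct ex_entry *ex_search(const struct ex_entry *tbl,
					size_t n, unsigned long pc)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tbl[mid].insn == pc)
			return &tbl[mid];
		if (tbl[mid].insn < pc)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * Model of the fixup step: if the faulting pc has an entry, redirect
 * execution to its fixup address and return 1, otherwise return 0 and
 * leave pc untouched.
 */
static int ex_fixup(const struct ex_entry *tbl, size_t n, unsigned long *pc)
{
	const struct ex_entry *e = ex_search(tbl, n, *pc);

	if (!e)
		return 0;
	*pc = e->fixup;
	return 1;
}
```

Only instructions that were explicitly registered get a second chance; a fault anywhere else still ends in an oops, which is why the scope of the change is in the handler and not in every driver.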


> I have configured two N900 test devices: one with a signed bootloader
> which enables omap aes support, and one with a signed bootloader
> which does not.

I'm probably missing some context here, but why not just use the one
with aes support? Alternatively, one may argue that it's the
bootloader's job to provide the kernel with an accurate device tree.
(Though one may equally well argue that it would be nice to avoid
having to customize the device tree for every feature-flavor of a
processor, especially if this depends on how it's initialized.)

BTW, I noticed they actually do list the A/P regions and default
protection groups in the TRMs, including the "secret" ones. It's not
presented in a very readable form though, so I've converted the one
for L4-Core into yet another addition to my spreadsheet collection:
https://docs.google.com/spreadsheets/d/1HhK3bIoaDzJoEGW1zDO5yTRQMdqC8NftWHHUCS5QVmA/view
I started filling in the descriptions but then got lazy :P  If anyone
wants edit permissions just mail me.  I already noticed one
inconsistency though (it claims 48308800-FFF is valid and 48308000-7FF
is not, while the TRM elsewhere claims the opposite) so it would still
be better if someone with access to a GP omap34xx/omap35xx dumps the
actual tables from hardware to avoid human error.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-01  1:36                           ` Matthijs van Duin
@ 2015-02-01  8:56                             ` Pali Rohár
  2015-02-11 20:43                               ` Pavel Machek
  0 siblings, 1 reply; 87+ messages in thread
From: Pali Rohár @ 2015-02-01  8:56 UTC (permalink / raw)
  To: Matthijs van Duin
  Cc: Joel Fernandes, Tony Lindgren, linux-omap, Aaro Koskinen,
	Nishanth Menon, Pavel Machek, Sebastian Reichel

[-- Attachment #1: Type: Text/Plain, Size: 1614 bytes --]

On Sunday 01 February 2015 02:36:06 Matthijs van Duin wrote:
> On 31 January 2015 at 20:06, Pali Rohár <pali.rohar@gmail.com> 
> wrote:
> > I have configured two testing N900 devices. One with signed
> > bootloader which enable omap aes support and one device with
> > signed bootloader which does not enable omap aes support.
> 
> I'm probably missing some context here, but why not just use
> the one with aes support? Alternatively, one may argue that
> it's the bootloader's job to provide the kernel with an
> accurate device tree. (Though one may equally well argue that
> it would be nice to avoid having to customize the device tree
> for every feature-flavor of a processor, especially if this
> depends on how it's initialized.)
> 

Nokia X-Loader is closed source and signed, so we cannot modify it,
and it is responsible for configuring the L3/L4 firewalls.

Some time ago it was possible to find on the internet a signed
X-Loader for the N900 which enables omap aes support (for testing
purposes, together with the open source linux kernel modules), but it
is unofficial and I think only very few people flashed it into the
N900 NAND. If somebody needs the binaries, I have backups of all of
them.

More info about that aes enabled X-Loader:
http://maemo.org/community/maemo-developers/n900_aes_and_sha1-md5_hw_acceleration_drivers/

The majority of users use only the official X-Loader, which does not
enable aes support, so we cannot enable the kernel modules (they cause
crashes). We also cannot force users to flash some unofficial binary
into their device...

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot
  2013-12-07  0:00   ` Sebastian Reichel
  2013-12-07  0:38     ` Tony Lindgren
  2013-12-07  8:18     ` Pali Rohár
@ 2015-02-09 11:55     ` Pali Rohár
  2 siblings, 0 replies; 87+ messages in thread
From: Pali Rohár @ 2015-02-09 11:55 UTC (permalink / raw)
  To: Sebastian Reichel
  Cc: Tony Lindgren, linux-omap, Aaro Koskinen, Pavel Machek,
	Nishanth Menon, linux-kernel, Matthijs van Duin, Ivaylo Dimitrov

[-- Attachment #1: Type: Text/Plain, Size: 3848 bytes --]

On Saturday 07 December 2013 01:00:27 Sebastian Reichel wrote:
> On Fri, Dec 06, 2013 at 02:27:25PM -0800, Tony Lindgren wrote:
> > * Sebastian Reichel <sre@debian.org> [131206 13:37]:
> > > Nokia N900 DT boot breaks for me using 3.13-rc3. You can
> > > see the relevant kernel output below. Disabling the AES
> > > module in the omap3-n900.dts with status = "disabled"
> > > fixed the boot for me.
> > 
> > OK thanks for letting me know. How about the following patch
> > to fix it?
> 
> That's basically what I did to fix the problem.
> 
> I guess the proper fix would be a runtime check if the device
> can be accessed (if that's possible). AFAIK it is possible to
> use the AES module on the N900 if the bootloader is slightly
> patched.
> 
> Pali, can you elaborate more about this? I've seen, that you
> added a section about this on [0].
> 
> [0] http://elinux.org/N900#M-Shield
> 
> -- Sebastian
> 
> > > Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0c5048
> > > SMP ARM
> > > Modules linked in:
> > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0-rc2+ #446
> > > task: ce057b40 ti: ce058000 task.ti: ce058000
> > > PC is at omap_aes_dma_stop+0x24/0x3c
> > > LR is at omap_aes_probe+0x1cc/0x584
> > >
> > >    psr: 60000113
> > >
> > > sp : ce059e20  ip : ce0b4ee0  fp : 00000000
> > > r10: c0573ae8  r9 : c0749508  r8 : 00000000
> > > r7 : ce0b4e00  r6 : 00000000  r5 : ce0b4e10  r4 : ce274890
> > > r3 : fa0c5048  r2 : 00000048  r1 : 0000002c  r0 : ce274890
> > > Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> > > Control: 10c5387d  Table: 80004019  DAC: 00000015
> > > Process swapper/0 (pid: 1, stack limit = 0xce058248)
> > > Stack: (0xce059e20 to 0xce05a000)
> > > 9e20: c0749508 0000a1ff 00000000 c016cd8c c06b5a06 ce2a45f0 ce2a4570 ce0b5fb0
> > > 9e40: 00000000 480c5000 480c504f c0abe4e4 00000200 00000000 00000000 00000000
> > > 9e60: ce0b4e10 ce0b4e10 c082da3c c082da3c c02b8c70 c077c610 c0749508 00000000
> > > 9e80: 00000000 c02b9e7c c02b9e64 ce0b4e10 00000000 c02b8b20 ce0b4e10 ce0b4e44
> > > 9ea0: c082da3c c02b8cd8 00000000 ce059eb8 c082da3c c02b7408 ce079edc ce0b1a34
> > > 9ec0: c082da3c c082da3c ce2a0280 00000000 c08158d8 c02b8358 c0663405 c0663405
> > > 9ee0: 00000073 c082da3c c079e4e8 c07ab3bc c0844340 c02b9334 00000000 00000006
> > > 9f00: c079e4e8 c0008920 c067f6bf c0ac7c6b 00000000 c0712e28 00000000 00000000
> > > 9f20: c0712e38 ce059f38 00000093 c0ac7c82 00000000 c0058994 00000000 c07130e8
> > > 9f40: c07127b8 00000093 00000006 00000006 00000001 00000006 00000006 c079e4e8
> > > 9f60: c07ab3bc c0844340 00000093 c0749508 c079e4f4 c0749c64 00000006 00000006
> > > 9f80: c0749508 00000000 00000000 c0517e2c 00000000 00000000 00000000 00000000
> > > 9fa0: 00000000 c0517e34 00000000 c000dfb8 00000000 00000000 00000000 00000000
> > > 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > > 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
> > (omap_aes_probe+0x1cc/0x584)
> > (platform_drv_probe+0x18/0x48)
> > (driver_probe_device+0xb0/0x200)
> > (__driver_attach+0x68/0x8c)
> > (bus_for_each_dev+0x50/0x88)
> > (bus_add_driver+0xcc/0x1c8)
> > (driver_register+0x9c/0xe0)
> > (do_one_initcall+0x98/0x140)
> > (kernel_init_freeable+0x16c/0x23c)
> > (kernel_init+0x8/0x100)
> > (ret_from_fork+0x14/0x3c)
> > Code: e1811002 e5932020 e590300c e0833002 (e593c000)

Sebastian,

can you try adding #define DEBUG to drivers/crypto/omap-aes.c,
enabling AES in the n900 DT, and providing the full dmesg output
(with the kernel crash) which occurs on your device? I think you got
the previous output from the serial console, right? Your compiled
omap-aes.ko module (with debug symbols) and your modified source code
could also help.

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-01-31 15:13                                     ` Matthijs van Duin
  2015-01-31 19:06                                       ` Pali Rohár
@ 2015-02-11 12:39                                       ` Pali Rohár
  2015-02-11 15:22                                         ` Matthijs van Duin
  1 sibling, 1 reply; 87+ messages in thread
From: Pali Rohár @ 2015-02-11 12:39 UTC (permalink / raw)
  To: Matthijs van Duin
  Cc: Tony Lindgren, Sebastian Reichel, linux-omap, Aaro Koskinen,
	Pavel Machek, Nishanth Menon

[-- Attachment #1: Type: Text/Plain, Size: 974 bytes --]

On Saturday 31 January 2015 16:13:39 Matthijs van Duin wrote:
> On 31 January 2015 at 12:34, Pali Rohár <pali.rohar@gmail.com> wrote:
> > [  172.923553] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6f87028
> > [  172.930664] In-band Error seen by MPU  at address 0
> 
> Also, why is this error so uninformative?  A synchronous abort
> should at least mention the _physical_ address, type of
> access, etc.
> 
> Anyhow, since checking the firewalls/APs to see if you have
> permission will probably only get you yet another fault if
> things are walled off, the robust way of dealing with this
> sort of situation is by probing the device with a read while
> trapping bus faults. This also handles modules that are
> unreachable for other reasons, e.g. being disabled by eFuse.

Is it possible to patch the kernel code to mask or ignore that fault?
Can you help me with something like that?

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-11 12:39                                       ` Pali Rohár
@ 2015-02-11 15:22                                         ` Matthijs van Duin
  2015-02-11 20:28                                           ` Pali Rohár
  2015-02-19 18:20                                             ` Pali Rohár
  0 siblings, 2 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-02-11 15:22 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Tony Lindgren, Sebastian Reichel, linux-omap, Aaro Koskinen,
	Pavel Machek, Nishanth Menon

On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com> wrote:
>> Anyhow, since checking the firewalls/APs to see if you have
>> permission will probably only get you yet another fault if
>> things are walled off, the robust way of dealing with this
>> sort of situation is by probing the device with a read while
>> trapping bus faults. This also handles modules that are
>> unreachable for other reasons, e.g. being disabled by eFuse.
>
> Is it possible to patch the kernel code to mask or ignore that fault?
> Can you help me with something like that?

As I mentioned, I'm still learning my way around the kernel, so I
don't feel very comfortable suggesting a concrete patch just yet. I've
been browsing arch/arm/mm/ however and my impression is that all that
would be required is editing fault.c by making a copy of do_bad but
containing
    return user_mode(regs) || !fixup_exception(regs);
and hook it onto the appropriate fault codes.  However, this really
needs the opinion of someone more familiar with this code.
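
To make the intended semantics of that one-liner concrete, here is a stand-alone model of the decision it encodes. In fault.c a handler returning 0 means "handled" and a non-zero return falls through to the "Unhandled fault" path. Everything below (model_regs, model_fixup_exception, do_bus_error_model and the single table entry) is an invented stand-in for illustration, not kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Invented stand-in for the saved register state. */
struct model_regs {
	bool from_user;		/* plays the role of user_mode(regs) */
	unsigned long pc;	/* faulting instruction address */
};

struct model_fixup {
	unsigned long insn;
	unsigned long fixup;
};

/* One registered "this load may fault" site, with made-up addresses. */
static const struct model_fixup model_table[] = {
	{ 0xc0100000UL, 0xc0100040UL },
};

/* Returns 1 and redirects pc if the faulting site is registered. */
static int model_fixup_exception(struct model_regs *regs)
{
	size_t i;

	for (i = 0; i < sizeof(model_table) / sizeof(model_table[0]); i++) {
		if (model_table[i].insn == regs->pc) {
			regs->pc = model_table[i].fixup;
			return 1;
		}
	}
	return 0;
}

/*
 * Model of the proposed handler: 0 = handled (kernel code with a fixup),
 * non-zero = unhandled (user mode, or kernel code without a fixup).
 */
static int do_bus_error_model(struct model_regs *regs)
{
	return regs->from_user || !model_fixup_exception(regs);
}
```

Note that user-mode faults short-circuit before any fixup lookup, so user space keeps getting its signal exactly as before.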

I do have an observation to make on the issue of fault decoding: the
list in fsr-2level.c may be "standard ARMv3 and ARMv4 aborts" but they
are quite wrong for ARMv7 which has:

[ 0] -
[ 1] alignment fault
[ 2] debug event
[ 3] section access flag fault
[ 4] instruction cache maintenance fault (reported via data abort)
[ 5] section translation fault
[ 6] page access flag fault
[ 7] page translation fault
[ 8] bus error on access
[ 9] section domain fault
[10] -
[11] page domain fault
[12] bus error on section table walk
[13] section permission fault
[14] bus error on page table walk
[15] page permission fault
[16] (TLB conflict abort)
[17] -
[18] -
[19] -
[20] (lockdown abort)
[21] -
[22] async bus error (reported via data abort)
[23] -
[24] async parity/ECC error (reported via data abort)
[25] parity/ECC error on access
[26] (coprocessor abort)
[27] -
[28] parity/ECC error on section table walk
[29] -
[30] parity/ECC error on page table walk
[31] -
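
The index in the list above is the 5-bit fault status, which in the ARMv7 short-descriptor DFSR layout is split across bits [3:0] and bit 10. A small sketch of pulling the fields apart follows; the struct and function names are made up for illustration. Applied to the 0x1028 and 0x1018 codes from the oopses in this thread, both decode to index 8, "bus error on access", on a read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up container for the decoded short-descriptor DFSR fields. */
struct dfsr_fields {
	unsigned fs;		/* fault status: FS[4] is bit 10, FS[3:0] is bits 3..0 */
	unsigned domain;	/* bits 7..4 */
	bool write;		/* WnR, bit 11: true if the access was a write */
	bool ext;		/* ExT, bit 12: external abort qualifier */
};

static struct dfsr_fields dfsr_decode(uint32_t dfsr)
{
	struct dfsr_fields f;

	/* Move bit 10 down to bit 4 and merge it with the low nibble. */
	f.fs = (dfsr & 0xfu) | ((dfsr >> 6) & 0x10u);
	f.domain = (dfsr >> 4) & 0xfu;
	f.write = (dfsr >> 11) & 1u;
	f.ext = (dfsr >> 12) & 1u;
	return f;
}
```

With bit 10 folded in like this, a value such as 0x406 lands on index 22 (async bus error) rather than on index 6.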

Some entries are patched up near the bottom of fault.c but many bogus
messages remain, for example the "on linefetch" vs "on non-linefetch"
is misleading since no such thing can be inferred from the fault
status on v7.  Also, the i-cache maintenance fault handling looks
wrong to me: it should fetch the actual fault status from IFSR (even
though the address still comes from DFAR) and dispatch based on that.

Async external aborts (async bus error and async parity/ECC error)
give you basically no info. DFAR will contain garbage, hence displaying
it will confuse rather than enlighten; a traceback is pointless since
the instruction that caused the access is long retired; likewise
user_mode() doesn't matter since a transition to kernel space may have
happened after the access that caused the abort. Basically they should
be treated more as an IRQ than as a fault (note they can also be
masked just like irqs). In case of a bus error, it may be appropriate
to just warn about it, or perhaps send a signal to the current
process, although in the latter case it should have some means to
distinguish it from a synchronous bus error.

At least on the cortex-a8, a parity/ECC error (whether async or not)
is to be regarded as absolutely fatal.  Quoth the TRM: "No recovery is
possible. The abort handler must disable the caches, communicate the
fail directly with the external system, request a reboot."

Bit 10 no longer indicates an asynchronous (let alone imprecise)
fault.  Apart from the debug events and async aborts (and possibly
some implementation-defined aborts), all aborts listed are
synchronous, and DFAR/IFAR is valid. There's no technical obstacle
to making these trappable via the kernel exception handling mechanism.
(Though at least in case of parity/ECC errors one shouldn't.)

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-11 15:22                                         ` Matthijs van Duin
@ 2015-02-11 20:28                                           ` Pali Rohár
  2015-02-11 20:33                                             ` Tony Lindgren
  2015-02-11 20:40                                               ` Nishanth Menon
  2015-02-19 18:20                                             ` Pali Rohár
  1 sibling, 2 replies; 87+ messages in thread
From: Pali Rohár @ 2015-02-11 20:28 UTC (permalink / raw)
  To: Tony Lindgren, Nishanth Menon
  Cc: Matthijs van Duin, Sebastian Reichel, linux-omap, Aaro Koskinen,
	Pavel Machek, linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 4314 bytes --]

On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
> On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com> wrote:
> >> Anyhow, since checking the firewalls/APs to see if you have
> >> permission will probably only get you yet another fault if
> >> things are walled off, the robust way of dealing with this
> >> sort of situation is by probing the device with a read
> >> while trapping bus faults. This also handles modules that
> >> are unreachable for other reasons, e.g. being disabled by
> >> eFuse.
> > 
> > It is possible to patch kernel code to mask or ignore that
> > fault? Can you help me with something like that?
> 
> As I mentioned, I'm still learning my way around the kernel,
> so I don't feel very comfortable suggesting a concrete patch
> just yet. I've been browsing arch/arm/mm/ however and my
> impression is that all that would be required is editing
> fault.c by making a copy of do_bad but containing
>     return user_mode(regs) || !fixup_exception(regs);
> and hook it onto the appropriate fault codes.  However, this
> really needs the opinion of someone more familiar with this
> code.
> 
> I do have an observation to make on the issue of fault
> decoding: the list in fsr-2level.c may be "standard ARMv3 and
> ARMv4 aborts" but they are quite wrong for ARMv7 which has:
> 
> [ 0] -
> [ 1] alignment fault
> [ 2] debug event
> [ 3] section access flag fault
> [ 4] instruction cache maintenance fault (reported via data abort)
> [ 5] section translation fault
> [ 6] page access flag fault
> [ 7] page translation fault
> [ 8] bus error on access
> [ 9] section domain fault
> [10] -
> [11] page domain fault
> [12] bus error on section table walk
> [13] section permission fault
> [14] bus error on page table walk
> [15] page permission fault
> [16] (TLB conflict abort)
> [17] -
> [18] -
> [19] -
> [20] (lockdown abort)
> [21] -
> [22] async bus error (reported via data abort)
> [23] -
> [24] async parity/ECC error (reported via data abort)
> [25] parity/ECC error on access
> [26] (coprocessor abort)
> [27] -
> [28] parity/ECC error on section table walk
> [29] -
> [30] parity/ECC error on page table walk
> [31] -
> 
> Some entries are patched up near the bottom of fault.c but
> many bogus messages remain, for example the "on linefetch" vs
> "on non-linefetch" is misleading since no such thing can be
> inferred from the fault status on v7.  Also, the i-cache
> maintenance fault handling looks wrong to me: it should fetch
> the actual fault status from IFSR (even though the address
> still comes from DFSR) and dispatch based on that.
> 
> Async external aborts (async bus error and async parity/ECC
> error) give you basically no info. DFAR will contain garbage
> hence displaying it will confuse rather than enlighten, a
> traceback is pointless since the instruction that caused the
> access is long retired, likewise user_mode() doesn't matter
> since a transition to kernel space may have happened after
> the access that cause the abort. Basically they should be
> treated more as an IRQ than as a fault (note they can also be
> masked just like irqs). In case of a bus error, it may be
> appropriate to just warn about it, or perhaps send a signal
> to the current process, although in the latter case it should
> have some means to distinguish it from a synchronous bus
> error.
> 
> At least on the cortex-a8, a parity/ECC error (whether async
> or not) is to be regarded as absolutely fatal.  Quoth the
> TRM: "No recovery is possible. The abort handler must disable
> the caches, communicate the fail directly with the external
> system, request a reboot."
> 
> Bit 10 no longer indicates an asynchronous (let alone
> imprecise) fault.  Apart from the debug events and async
> aborts (and possibly some implementation-defined aborts), all
> aborts listed are synchronous, and DFAR/IFAR is valid.
> There's no technical obstruction to make these trappable via
> the kernel exception handling mechanism. (Though at least in
> case of parity/ECC errors one shouldn't.)

Tony, Nishanth, or somebody else... can you help with memory 
management? Or do you know some expert for arch/arm/mm/ code?

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-11 20:28                                           ` Pali Rohár
@ 2015-02-11 20:33                                             ` Tony Lindgren
  2015-02-11 20:40                                               ` Nishanth Menon
  1 sibling, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-02-11 20:33 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Nishanth Menon, Matthijs van Duin, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, linux-kernel

* Pali Rohár <pali.rohar@gmail.com> [150211 12:32]:
> On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
> > On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com> wrote:
> > >> Anyhow, since checking the firewalls/APs to see if you have
> > >> permission will probably only get you yet another fault if
> > >> things are walled off, the robust way of dealing with this
> > >> sort of situation is by probing the device with a read
> > >> while trapping bus faults. This also handles modules that
> > >> are unreachable for other reasons, e.g. being disabled by
> > >> eFuse.
> > > 
> > > It is possible to patch kernel code to mask or ignore that
> > > fault? Can you help me with something like that?
> > 
> > As I mentioned, I'm still learning my way around the kernel,
> > so I don't feel very comfortable suggesting a concrete patch
> > just yet. I've been browsing arch/arm/mm/ however and my
> > impression is that all that would be required is editing
> > fault.c by making a copy of do_bad but containing
> >     return user_mode(regs) || !fixup_exception(regs);
> > and hook it onto the appropriate fault codes.  However, this
> > really needs the opinion of someone more familiar with this
> > code.
> > 
> > I do have an observation to make on the issue of fault
> > decoding: the list in fsr-2level.c may be "standard ARMv3 and
> > ARMv4 aborts" but they are quite wrong for ARMv7 which has:
> > 
> > [ 0] -
> > [ 1] alignment fault
> > [ 2] debug event
> > [ 3] section access flag fault
> > [ 4] instruction cache maintenance fault (reported via data abort)
> > [ 5] section translation fault
> > [ 6] page access flag fault
> > [ 7] page translation fault
> > [ 8] bus error on access
> > [ 9] section domain fault
> > [10] -
> > [11] page domain fault
> > [12] bus error on section table walk
> > [13] section permission fault
> > [14] bus error on page table walk
> > [15] page permission fault
> > [16] (TLB conflict abort)
> > [17] -
> > [18] -
> > [19] -
> > [20] (lockdown abort)
> > [21] -
> > [22] async bus error (reported via data abort)
> > [23] -
> > [24] async parity/ECC error (reported via data abort)
> > [25] parity/ECC error on access
> > [26] (coprocessor abort)
> > [27] -
> > [28] parity/ECC error on section table walk
> > [29] -
> > [30] parity/ECC error on page table walk
> > [31] -
> > 
> > Some entries are patched up near the bottom of fault.c but
> > many bogus messages remain, for example the "on linefetch" vs
> > "on non-linefetch" is misleading since no such thing can be
> > inferred from the fault status on v7.  Also, the i-cache
> > maintenance fault handling looks wrong to me: it should fetch
> > the actual fault status from IFSR (even though the address
> > still comes from DFSR) and dispatch based on that.
> > 
> > Async external aborts (async bus error and async parity/ECC
> > error) give you basically no info. DFAR will contain garbage
> > hence displaying it will confuse rather than enlighten, a
> > traceback is pointless since the instruction that caused the
> > access is long retired, likewise user_mode() doesn't matter
> > since a transition to kernel space may have happened after
> > the access that cause the abort. Basically they should be
> > treated more as an IRQ than as a fault (note they can also be
> > masked just like irqs). In case of a bus error, it may be
> > appropriate to just warn about it, or perhaps send a signal
> > to the current process, although in the latter case it should
> > have some means to distinguish it from a synchronous bus
> > error.
> > 
> > At least on the cortex-a8, a parity/ECC error (whether async
> > or not) is to be regarded as absolutely fatal.  Quoth the
> > TRM: "No recovery is possible. The abort handler must disable
> > the caches, communicate the fail directly with the external
> > system, request a reboot."
> > 
> > Bit 10 no longer indicates an asynchronous (let alone
> > imprecise) fault.  Apart from the debug events and async
> > aborts (and possibly some implementation-defined aborts), all
> > aborts listed are synchronous, and DFAR/IFAR is valid.
> > There's no technical obstruction to make these trappable via
> > the kernel exception handling mechanism. (Though at least in
> > case of parity/ECC errors one shouldn't.)
> 
> Tony, Nishanth, or somebody else... can you help with memory 
> management? Or do you know some expert for arch/arm/mm/ code?

Changing the abort handling should be discussed on the
linux-arm-kernel list. Probably best to play with that first
for a proof of concept patch :)
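
As a starting point for such experiments, here is a minimal userspace
sketch of how the fault-status index in the quoted list is formed on
ARMv7 (short-descriptor format): FS[3:0] comes from FSR bits 3:0 and
FS[4] from FSR bit 10. The function and table names are made up for
illustration and are not the kernel's; the strings are copied from the
list quoted above. With the value 0x1028 from the oops at the top of
this thread, the index comes out as 8, "bus error on access".

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fault-status index for the ARMv7 short-descriptor FSR format:
 * FS[3:0] is FSR[3:0] and FS[4] is FSR[10].
 */
static unsigned int fsr_fs_index(uint32_t fsr)
{
	return (fsr & 0xfu) | ((fsr >> 6) & 0x10u);
}

/* Names taken from the list quoted in this thread; NULL marks "-" entries. */
static const char *const v7_fault_names[32] = {
	[1]  = "alignment fault",
	[2]  = "debug event",
	[3]  = "section access flag fault",
	[4]  = "instruction cache maintenance fault",
	[5]  = "section translation fault",
	[6]  = "page access flag fault",
	[7]  = "page translation fault",
	[8]  = "bus error on access",
	[9]  = "section domain fault",
	[11] = "page domain fault",
	[12] = "bus error on section table walk",
	[13] = "section permission fault",
	[14] = "bus error on page table walk",
	[15] = "page permission fault",
	[16] = "TLB conflict abort",
	[20] = "lockdown abort",
	[22] = "async bus error",
	[24] = "async parity/ECC error",
	[25] = "parity/ECC error on access",
	[26] = "coprocessor abort",
	[28] = "parity/ECC error on section table walk",
	[30] = "parity/ECC error on page table walk",
};

/* Returns the name for a raw FSR value, or "-" for unassigned codes. */
static const char *fsr_fault_name(uint32_t fsr)
{
	const char *name = v7_fault_names[fsr_fs_index(fsr)];

	return name ? name : "-";
}
```

Note that an FSR with bit 10 set and low bits 0110 (for example 0x406)
maps to index 22, the async bus error, which is why bit 10 alone cannot
be read as an "imprecise" flag.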

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-11 20:28                                           ` Pali Rohár
@ 2015-02-11 20:40                                               ` Nishanth Menon
  2015-02-11 20:40                                               ` Nishanth Menon
  1 sibling, 0 replies; 87+ messages in thread
From: Nishanth Menon @ 2015-02-11 20:40 UTC (permalink / raw)
  To: Pali Rohár, linux-arm-kernel
  Cc: Tony Lindgren, Matthijs van Duin, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml

On Wed, Feb 11, 2015 at 2:28 PM, Pali Rohár <pali.rohar@gmail.com> wrote:
> On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
>> On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com> wrote:
>> >> Anyhow, since checking the firewalls/APs to see if you have
>> >> permission will probably only get you yet another fault if
>> >> things are walled off, the robust way of dealing with this
>> >> sort of situation is by probing the device with a read
>> >> while trapping bus faults. This also handles modules that
>> >> are unreachable for other reasons, e.g. being disabled by
>> >> eFuse.
>> >
>> > Is it possible to patch kernel code to mask or ignore that
>> > fault? Can you help me with something like that?
>>
>> As I mentioned, I'm still learning my way around the kernel,
>> so I don't feel very comfortable suggesting a concrete patch
>> just yet. I've been browsing arch/arm/mm/ however and my
>> impression is that all that would be required is editing
>> fault.c by making a copy of do_bad but containing
>>     return user_mode(regs) || !fixup_exception(regs);
>> and hook it onto the appropriate fault codes.  However, this
>> really needs the opinion of someone more familiar with this
>> code.
>>
>> I do have an observation to make on the issue of fault
>> decoding: the list in fsr-2level.c may be "standard ARMv3 and
>> ARMv4 aborts" but they are quite wrong for ARMv7 which has:
>>
>> [ 0] -
>> [ 1] alignment fault
>> [ 2] debug event
>> [ 3] section access flag fault
>> [ 4] instruction cache maintenance fault (reported via data abort)
>> [ 5] section translation fault
>> [ 6] page access flag fault
>> [ 7] page translation fault
>> [ 8] bus error on access
>> [ 9] section domain fault
>> [10] -
>> [11] page domain fault
>> [12] bus error on section table walk
>> [13] section permission fault
>> [14] bus error on page table walk
>> [15] page permission fault
>> [16] (TLB conflict abort)
>> [17] -
>> [18] -
>> [19] -
>> [20] (lockdown abort)
>> [21] -
>> [22] async bus error (reported via data abort)
>> [23] -
>> [24] async parity/ECC error (reported via data abort)
>> [25] parity/ECC error on access
>> [26] (coprocessor abort)
>> [27] -
>> [28] parity/ECC error on section table walk
>> [29] -
>> [30] parity/ECC error on page table walk
>> [31] -
>>
>> Some entries are patched up near the bottom of fault.c but
>> many bogus messages remain, for example the "on linefetch" vs
>> "on non-linefetch" is misleading since no such thing can be
>> inferred from the fault status on v7.  Also, the i-cache
>> maintenance fault handling looks wrong to me: it should fetch
>> the actual fault status from IFSR (even though the address
>> still comes from DFAR) and dispatch based on that.
>>
>> Async external aborts (async bus error and async parity/ECC
>> error) give you basically no info. DFAR will contain garbage
>> hence displaying it will confuse rather than enlighten, a
>> traceback is pointless since the instruction that caused the
>> access is long retired, likewise user_mode() doesn't matter
>> since a transition to kernel space may have happened after
>> the access that caused the abort. Basically they should be
>> treated more as an IRQ than as a fault (note they can also be
>> masked just like irqs). In case of a bus error, it may be
>> appropriate to just warn about it, or perhaps send a signal
>> to the current process, although in the latter case it should
>> have some means to distinguish it from a synchronous bus
>> error.
>>
>> At least on the cortex-a8, a parity/ECC error (whether async
>> or not) is to be regarded as absolutely fatal.  Quoth the
>> TRM: "No recovery is possible. The abort handler must disable
>> the caches, communicate the fail directly with the external
>> system, request a reboot."
>>
>> Bit 10 no longer indicates an asynchronous (let alone
>> imprecise) fault.  Apart from the debug events and async
>> aborts (and possibly some implementation-defined aborts), all
>> aborts listed are synchronous, and DFAR/IFAR is valid.
>> There's no technical obstruction to make these trappable via
>> the kernel exception handling mechanism. (Though at least in
>> case of parity/ECC errors one shouldn't.)
>
> Tony, Nishanth, or somebody else... can you help with memory
> management? Or do you know some expert for arch/arm/mm/ code?

Folks in linux-arm-kernel are probably the right people, I suppose.
Looping them in.
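
For anyone picking this up: the remark above that async aborts "can
also be masked just like irqs" refers to the A bit in the CPSR. Below
is a small userspace sketch that decodes the mask bits of a PSR value
(A is bit 8, I is bit 7, F is bit 6, the mode is in bits 4:0). The
helper names are made up for illustration. Applied to the "psr:
60000113" line of the oops at the top of this thread, it reports IRQs
and FIQs unmasked, SVC mode (0x13), and the A bit set.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PSR_A_BIT     (1u << 8)	/* asynchronous abort mask */
#define SKETCH_PSR_I_BIT     (1u << 7)	/* IRQ mask */
#define SKETCH_PSR_F_BIT     (1u << 6)	/* FIQ mask */
#define SKETCH_PSR_MODE_MASK 0x1fu	/* processor mode, bits 4:0 */

/* True if asynchronous aborts are masked in this PSR value. */
static bool psr_async_aborts_masked(uint32_t psr)
{
	return (psr & SKETCH_PSR_A_BIT) != 0;
}

/* True if IRQs are masked in this PSR value. */
static bool psr_irqs_masked(uint32_t psr)
{
	return (psr & SKETCH_PSR_I_BIT) != 0;
}

/* True if FIQs are masked in this PSR value. */
static bool psr_fiqs_masked(uint32_t psr)
{
	return (psr & SKETCH_PSR_F_BIT) != 0;
}

/* Processor mode field, e.g. 0x13 for Supervisor (SVC). */
static unsigned int psr_mode(uint32_t psr)
{
	return psr & SKETCH_PSR_MODE_MASK;
}
```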

-- 
---
Regards,
Nishanth Menon

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-01  8:56                             ` Pali Rohár
@ 2015-02-11 20:43                               ` Pavel Machek
  2015-02-11 21:14                                 ` Pali Rohár
  0 siblings, 1 reply; 87+ messages in thread
From: Pavel Machek @ 2015-02-11 20:43 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Matthijs van Duin, Joel Fernandes, Tony Lindgren, linux-omap,
	Aaro Koskinen, Nishanth Menon, Sebastian Reichel

Hi!

On Sun 2015-02-01 09:56:28, Pali Rohár wrote:
> On Sunday 01 February 2015 02:36:06 Matthijs van Duin wrote:
> > On 31 January 2015 at 20:06, Pali Rohár <pali.rohar@gmail.com> 
> > wrote:
> > > I have configured two testing N900 devices. One with signed
> > > bootloader which enable omap aes support and one device with
> > > signed bootloader which does not enable omap aes support.
> > 
> > I'm probably missing some context here, but why not just use
> > the one with aes support? Alternatively, one may argue that
> > it's the bootloader's job to provide the kernel with an
> > accurate device tree. (Though one may equally well argue that
> > it would be nice to avoid having to customize the device tree
> > for every feature-flavor of a processor, especially if this
> > depends on how it's initialized.)
> > 
> 
> Nokia X-Loader is closed source and signed. So we cannot modify 
> it. And it is responsible for configuring L3/L4 firewall.
> 
> A year ago it was possible to find on the internet a signed X-Loader
> for the N900 which enables omap aes support (for testing purposes,
> together with open source Linux kernel modules), but it is unofficial
> and I think only a few people flashed it into the N900 NAND. If
> somebody needs the binaries, I have backups of all of them.
> 
> More info about that aes enabled X-Loader:
> http://maemo.org/community/maemo-developers/n900_aes_and_sha1-md5_hw_acceleration_drivers/
> 
> The majority of users use only the official X-Loader, which does not
> enable aes support, so we cannot enable the kernel modules (they
> cause crashes). We also cannot force users to flash some unofficial
> binary into their device...

BTW... it would be interesting to know... are you doing some
heavy crypto processing on N900? Are the accelerated drivers faster
than non-accelerated ones? Because the link above says they are
slower...

# Eric Wheeler
# ...
# Hey guys, I have omap-aes compiled for my kernel and appears to be
# working.
# 
# I do not understand why non-accelerated software-crypto is faster than
# the omap-aes hardware acceleration:
# 
# mmcblk0 onboard 32GB: read=21.76MB/s write=12.97MB/s
# aes_generic crypto: read= 8.47 write= 5.54
# omap-aes hw crypto: read= 6.31 write= 5.45
# 
# mmcblk1 uSD 16GB Class 10: read=16.05MB/s write=16.47MB/s
# aes_generic crypto: read= 7.96 write= 7.43
# omap-aes hw crypto: read= 6.67 write= 7.18

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-11 20:43                               ` Pavel Machek
@ 2015-02-11 21:14                                 ` Pali Rohár
  0 siblings, 0 replies; 87+ messages in thread
From: Pali Rohár @ 2015-02-11 21:14 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Matthijs van Duin, Joel Fernandes, Tony Lindgren, linux-omap,
	Aaro Koskinen, Nishanth Menon, Sebastian Reichel

[-- Attachment #1: Type: Text/Plain, Size: 2999 bytes --]

On Wednesday 11 February 2015 21:43:42 Pavel Machek wrote:
> Hi!
> 
> On Sun 2015-02-01 09:56:28, Pali Rohár wrote:
> > On Sunday 01 February 2015 02:36:06 Matthijs van Duin wrote:
> > > On 31 January 2015 at 20:06, Pali Rohár <pali.rohar@gmail.com> wrote:
> > > > I have configured two testing N900 devices. One with
> > > > signed bootloader which enable omap aes support and one
> > > > device with signed bootloader which does not enable
> > > > omap aes support.
> > > 
> > > I'm probably missing some context here, but why not just
> > > use the one with aes support? Alternatively, one may
> > > argue that it's the bootloader's job to provide the
> > > kernel with an accurate device tree. (Though one may
> > > equally well argue that it would be nice to avoid having
> > > to customize the device tree for every feature-flavor of
> > > a processor, especially if this depends on how it's
> > > initialized.)
> > 
> > Nokia X-Loader is closed source and signed. So we cannot
> > modify it. And it is responsible for configuring L3/L4
> > firewall.
> > 
> > A year ago it was possible to find on the internet a signed
> > X-Loader for the N900 which enables omap aes support (for testing
> > purposes, together with open source Linux kernel modules), but it
> > is unofficial and I think only a few people flashed it into the
> > N900 NAND. If somebody needs the binaries, I have backups of all
> > of them.
> > 
> > More info about that aes enabled X-Loader:
> > http://maemo.org/community/maemo-developers/n900_aes_and_sha1-md5_hw_acceleration_drivers/
> > 
> > The majority of users use only the official X-Loader, which does
> > not enable aes support, so we cannot enable the kernel modules
> > (they cause crashes). We also cannot force users to flash some
> > unofficial binary into their device...
> 
> BTW... it would be interesting to know... are you doing some
> heavy crypto processing on N900? Are the accelerated drivers
> faster than non-accelerated ones? Because the link above says
> they are slower...
> 
> # Eric Wheeler
> # ...
> # Hey guys, I have omap-aes compiled for my kernel and appears to be
> # working.
> #
> # I do not understand why non-accelerated software-crypto is faster than
> # the omap-aes hardware acceleration:
> #
> # mmcblk0 onboard 32GB: read=21.76MB/s write=12.97MB/s
> # aes_generic crypto: read= 8.47 write= 5.54
> # omap-aes hw crypto: read= 6.31 write= 5.45
> #
> # mmcblk1 uSD 16GB Class 10: read=16.05MB/s write=16.47MB/s
> # aes_generic crypto: read= 7.96 write= 7.43
> # omap-aes hw crypto: read= 6.67 write= 7.18
> 
> Best regards,
> 									Pavel

It also depends on CPU usage. It was the same with the preview of the
Theora decoder: the NEON-optimized implementation running on the ARM
core provided better fps but used 99% of the CPU. Another
implementation, which used the DSP core, provided lower fps, but it was
enough for watching videos and did not use the full CPU...

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-11 20:40                                               ` Nishanth Menon
  (?)
@ 2015-02-18 21:14                                                 ` Pali Rohár
  -1 siblings, 0 replies; 87+ messages in thread
From: Pali Rohár @ 2015-02-18 21:14 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Nishanth Menon, Tony Lindgren, Matthijs van Duin,
	Sebastian Reichel, linux-omap, Aaro Koskinen, Pavel Machek, lkml

[-- Attachment #1: Type: Text/Plain, Size: 5318 bytes --]

On Wednesday 11 February 2015 21:40:33 Nishanth Menon wrote:
> On Wed, Feb 11, 2015 at 2:28 PM, Pali Rohár <pali.rohar@gmail.com> wrote:
> > On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
> >> On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com> wrote:
> >> >> Anyhow, since checking the firewalls/APs to see if you
> >> >> have permission will probably only get you yet another
> >> >> fault if things are walled off, the robust way of
> >> >> dealing with this sort of situation is by probing the
> >> >> device with a read while trapping bus faults. This also
> >> >> handles modules that are unreachable for other reasons,
> >> >> e.g. being disabled by eFuse.
> >> > 
> >> > Is it possible to patch kernel code to mask or ignore
> >> > that fault? Can you help me with something like that?
> >> 
> >> As I mentioned, I'm still learning my way around the
> >> kernel, so I don't feel very comfortable suggesting a
> >> concrete patch just yet. I've been browsing arch/arm/mm/
> >> however and my impression is that all that would be
> >> required is editing fault.c by making a copy of do_bad but
> >> containing
> >> 
> >>     return user_mode(regs) || !fixup_exception(regs);
> >> 
> >> and hook it onto the appropriate fault codes.  However,
> >> this really needs the opinion of someone more familiar
> >> with this code.
> >> 
> >> I do have an observation to make on the issue of fault
> >> decoding: the list in fsr-2level.c may be "standard ARMv3
> >> and ARMv4 aborts" but they are quite wrong for ARMv7 which
> >> has:
> >> 
> >> [ 0] -
> >> [ 1] alignment fault
> >> [ 2] debug event
> >> [ 3] section access flag fault
> >> [ 4] instruction cache maintenance fault (reported via data abort)
> >> [ 5] section translation fault
> >> [ 6] page access flag fault
> >> [ 7] page translation fault
> >> [ 8] bus error on access
> >> [ 9] section domain fault
> >> [10] -
> >> [11] page domain fault
> >> [12] bus error on section table walk
> >> [13] section permission fault
> >> [14] bus error on page table walk
> >> [15] page permission fault
> >> [16] (TLB conflict abort)
> >> [17] -
> >> [18] -
> >> [19] -
> >> [20] (lockdown abort)
> >> [21] -
> >> [22] async bus error (reported via data abort)
> >> [23] -
> >> [24] async parity/ECC error (reported via data abort)
> >> [25] parity/ECC error on access
> >> [26] (coprocessor abort)
> >> [27] -
> >> [28] parity/ECC error on section table walk
> >> [29] -
> >> [30] parity/ECC error on page table walk
> >> [31] -
> >> 
> >> Some entries are patched up near the bottom of fault.c but
> >> many bogus messages remain, for example the "on linefetch"
> >> vs "on non-linefetch" is misleading since no such thing
> >> can be inferred from the fault status on v7.  Also, the
> >> i-cache maintenance fault handling looks wrong to me: it
> >> should fetch the actual fault status from IFSR (even
> >> though the address still comes from DFAR) and dispatch
> >> based on that.
> >> 
> >> Async external aborts (async bus error and async parity/ECC
> >> error) give you basically no info. DFAR will contain
> >> garbage hence displaying it will confuse rather than
> >> enlighten, a traceback is pointless since the instruction
> >> that caused the access is long retired, likewise
> >> user_mode() doesn't matter since a transition to kernel
> >> space may have happened after the access that caused the
> >> abort. Basically they should be treated more as an IRQ
> >> than as a fault (note they can also be masked just like
> >> irqs). In case of a bus error, it may be appropriate to
> >> just warn about it, or perhaps send a signal to the
> >> current process, although in the latter case it should
> >> have some means to distinguish it from a synchronous bus
> >> error.
> >> 
> >> At least on the cortex-a8, a parity/ECC error (whether
> >> async or not) is to be regarded as absolutely fatal. 
> >> Quoth the TRM: "No recovery is possible. The abort handler
> >> must disable the caches, communicate the fail directly
> >> with the external system, request a reboot."
> >> 
> >> Bit 10 no longer indicates an asynchronous (let alone
> >> imprecise) fault.  Apart from the debug events and async
> >> aborts (and possibly some implementation-defined aborts),
> >> all aborts listed are synchronous, and DFAR/IFAR is valid.
> >> There's no technical obstruction to make these trappable
> >> via the kernel exception handling mechanism. (Though at
> >> least in case of parity/ECC errors one shouldn't.)
> > 
> > Tony, Nishanth, or somebody else... can you help with memory
> > management? Or do you know some expert for arch/arm/mm/
> > code?
> 
> Folks in linux-arm-kernel are probably the right people, I
> suppose. Looping them in.

Hi folks in linux-arm-kernel!

Can you help us with the above problem? How can we catch an external
abort on non-linefetch in a kernel driver and prevent a kernel panic?

Here is the kernel panic log:
http://thread.gmane.org/gmane.linux.ports.arm.omap/108397/

We want to check for "Unhandled fault: external abort on
non-linefetch" and, if it happens, disable some functionality in the
omap-aes.ko kernel driver.
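
To make the idea quoted above more concrete (a copy of do_bad that
returns user_mode(regs) || !fixup_exception(regs)), here is a
userspace model of the control flow only. It is not kernel code: the
model_* types, the one-entry table and the addresses 0x1000/0x2000 are
hypothetical stand-ins. The model assumes the usual convention that a
fault handler returning 0 means "handled" and nonzero means the fault
is still reported as unhandled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: these types mimic, but are not, the kernel's. */
struct model_regs {
	uint32_t pc;		/* address of the faulting instruction */
	bool user_mode;		/* true if the fault came from user space */
};

struct model_extable_entry {
	uint32_t insn;		/* instruction that is allowed to fault */
	uint32_t fixup;		/* where to resume if it does fault */
};

/* Hypothetical table: a probing load at 0x1000, recovery code at 0x2000. */
static const struct model_extable_entry model_extable[] = {
	{ 0x1000u, 0x2000u },
};

/* Models fixup_exception(): redirect pc and return true on a match. */
static bool model_fixup_exception(struct model_regs *regs)
{
	size_t n = sizeof(model_extable) / sizeof(model_extable[0]);

	for (size_t i = 0; i < n; i++) {
		if (model_extable[i].insn == regs->pc) {
			regs->pc = model_extable[i].fixup;
			return true;
		}
	}
	return false;
}

/* Models the proposed do_bad variant: 0 = handled, nonzero = still fatal. */
static int model_do_bus_error(struct model_regs *regs)
{
	return regs->user_mode || !model_fixup_exception(regs);
}
```

In this model a kernel-mode fault at the registered probing instruction
resumes at the fixup address, where the driver could note that the
module is unreachable, while any other fault stays fatal as before.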

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
@ 2015-02-18 21:14                                                 ` Pali Rohár
  0 siblings, 0 replies; 87+ messages in thread
From: Pali Rohár @ 2015-02-18 21:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday 11 February 2015 21:40:33 Nishanth Menon wrote:
> On Wed, Feb 11, 2015 at 2:28 PM, Pali Rohár <pali.rohar@gmail.com> wrote:
> > On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
> >> On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com> wrote:
> >> >> Anyhow, since checking the firewalls/APs to see if you
> >> >> have permission will probably only get you yet another
> >> >> fault if things are walled off, the robust way of
> >> >> dealing with this sort of situation is by probing the
> >> >> device with a read while trapping bus faults. This also
> >> >> handles modules that are unreachable for other reasons,
> >> >> e.g. being disabled by eFuse.
> >> > 
> >> > Is it possible to patch kernel code to mask or ignore
> >> > that fault? Can you help me with something like that?
> >> 
> >> As I mentioned, I'm still learning my way around the
> >> kernel, so I don't feel very comfortable suggesting a
> >> concrete patch just yet. I've been browsing arch/arm/mm/
> >> however and my impression is that all that would be
> >> required is editing fault.c by making a copy of do_bad but
> >> containing
> >> 
> >>     return user_mode(regs) || !fixup_exception(regs);
> >> 
> >> and hook it onto the appropriate fault codes.  However,
> >> this really needs the opinion of someone more familiar
> >> with this code.
> >> 
> >> I do have an observation to make on the issue of fault
> >> decoding: the list in fsr-2level.c may be "standard ARMv3
> >> and ARMv4 aborts" but they are quite wrong for ARMv7 which
> >> has:
> >> 
> >> [ 0] -
> >> [ 1] alignment fault
> >> [ 2] debug event
> >> [ 3] section access flag fault
> >> [ 4] instruction cache maintenance fault (reported via data abort)
> >> [ 5] section translation fault
> >> [ 6] page access flag fault
> >> [ 7] page translation fault
> >> [ 8] bus error on access
> >> [ 9] section domain fault
> >> [10] -
> >> [11] page domain fault
> >> [12] bus error on section table walk
> >> [13] section permission fault
> >> [14] bus error on page table walk
> >> [15] page permission fault
> >> [16] (TLB conflict abort)
> >> [17] -
> >> [18] -
> >> [19] -
> >> [20] (lockdown abort)
> >> [21] -
> >> [22] async bus error (reported via data abort)
> >> [23] -
> >> [24] async parity/ECC error (reported via data abort)
> >> [25] parity/ECC error on access
> >> [26] (coprocessor abort)
> >> [27] -
> >> [28] parity/ECC error on section table walk
> >> [29] -
> >> [30] parity/ECC error on page table walk
> >> [31] -
> >> 
> >> Some entries are patched up near the bottom of fault.c but
> >> many bogus messages remain, for example the "on linefetch"
> >> vs "on non-linefetch" is misleading since no such thing
> >> can be inferred from the fault status on v7.  Also, the
> >> i-cache maintenance fault handling looks wrong to me: it
> >> should fetch the actual fault status from IFSR (even
> >> though the address still comes from DFAR) and dispatch
> >> based on that.
> >> 
> >> Async external aborts (async bus error and async parity/ECC
> >> error) give you basically no info. DFAR will contain
> >> garbage hence displaying it will confuse rather than
> >> enlighten, a traceback is pointless since the instruction
> >> that caused the access is long retired, likewise
> >> user_mode() doesn't matter since a transition to kernel
> >> space may have happened after the access that caused the
> >> abort. Basically they should be treated more as an IRQ
> >> than as a fault (note they can also be masked just like
> >> irqs). In case of a bus error, it may be appropriate to
> >> just warn about it, or perhaps send a signal to the
> >> current process, although in the latter case it should
> >> have some means to distinguish it from a synchronous bus
> >> error.
> >> 
> >> At least on the cortex-a8, a parity/ECC error (whether
> >> async or not) is to be regarded as absolutely fatal. 
> >> Quoth the TRM: "No recovery is possible. The abort handler
> >> must disable the caches, communicate the fail directly
> >> with the external system, request a reboot."
> >> 
> >> Bit 10 no longer indicates an asynchronous (let alone
> >> imprecise) fault.  Apart from the debug events and async
> >> aborts (and possibly some implementation-defined aborts),
> >> all aborts listed are synchronous, and DFAR/IFAR is valid.
> >> There's no technical obstruction to make these trappable
> >> via the kernel exception handling mechanism. (Though at
> >> least in case of parity/ECC errors one shouldn't.)
> > 
> > Tony, Nishanth, or somebody else... can you help with memory
> > management? Or do you know some expert for arch/arm/mm/
> > code?
> 
> Folks in linux-arm-kernel are probably the right people, I
> suppose. Looping them in.

Hi folks in linux-arm-kernel!

Can you help us with the above problem? How can a kernel driver
catch an "external abort on non-linefetch" and prevent a kernel panic?

Here is the kernel panic log:
http://thread.gmane.org/gmane.linux.ports.arm.omap/108397/

We want to check for "Unhandled fault: external abort on
non-linefetch" and, if it happens, disable some functionality in the
omap-aes.ko kernel driver.

-- 
Pali Rohár
pali.rohar@gmail.com

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-11 15:22                                         ` Matthijs van Duin
@ 2015-02-19 18:20                                             ` Pali Rohár
  2015-02-19 18:20                                             ` Pali Rohár
  1 sibling, 0 replies; 87+ messages in thread
From: Pali Rohár @ 2015-02-19 18:20 UTC (permalink / raw)
  To: Matthijs van Duin
  Cc: Tony Lindgren, Sebastian Reichel, linux-omap, Aaro Koskinen,
	Pavel Machek, Nishanth Menon, linux-arm-kernel

[-- Attachment #1: Type: Text/Plain, Size: 5123 bytes --]

On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
> On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com> wrote:
> >> Anyhow, since checking the firewalls/APs to see if you have
> >> permission will probably only get you yet another fault if
> >> things are walled off, the robust way of dealing with this
> >> sort of situation is by probing the device with a read
> >> while trapping bus faults. This also handles modules that
> >> are unreachable for other reasons, e.g. being disabled by
> >> eFuse.
> > 
> > Is it possible to patch kernel code to mask or ignore that
> > fault? Can you help me with something like that?
> 
> As I mentioned, I'm still learning my way around the kernel,
> so I don't feel very comfortable suggesting a concrete patch
> just yet. I've been browsing arch/arm/mm/ however and my
> impression is that all that would be required is editing
> fault.c by making a copy of do_bad but containing
>     return user_mode(regs) || !fixup_exception(regs);
> and hook it onto the appropriate fault codes.  However, this
> really needs the opinion of someone more familiar with this
> code.
> 
> I do have an observation to make on the issue of fault
> decoding: the list in fsr-2level.c may be "standard ARMv3 and
> ARMv4 aborts" but they are quite wrong for ARMv7 which has:
> 
> [ 0] -
> [ 1] alignment fault
> [ 2] debug event
> [ 3] section access flag fault
> [ 4] instruction cache maintenance fault (reported via data abort)
> [ 5] section translation fault
> [ 6] page access flag fault
> [ 7] page translation fault
> [ 8] bus error on access
> [ 9] section domain fault
> [10] -
> [11] page domain fault
> [12] bus error on section table walk
> [13] section permission fault
> [14] bus error on page table walk
> [15] page permission fault
> [16] (TLB conflict abort)
> [17] -
> [18] -
> [19] -
> [20] (lockdown abort)
> [21] -
> [22] async bus error (reported via data abort)
> [23] -
> [24] async parity/ECC error (reported via data abort)
> [25] parity/ECC error on access
> [26] (coprocessor abort)
> [27] -
> [28] parity/ECC error on section table walk
> [29] -
> [30] parity/ECC error on page table walk
> [31] -
> 
> Some entries are patched up near the bottom of fault.c but
> many bogus messages remain, for example the "on linefetch" vs
> "on non-linefetch" is misleading since no such thing can be
> inferred from the fault status on v7.  Also, the i-cache
> maintenance fault handling looks wrong to me: it should fetch
> the actual fault status from IFSR (even though the address
> still comes from DFAR) and dispatch based on that.
> 
> Async external aborts (async bus error and async parity/ECC
> error) give you basically no info. DFAR will contain garbage
> hence displaying it will confuse rather than enlighten, a
> traceback is pointless since the instruction that caused the
> access is long retired, likewise user_mode() doesn't matter
> since a transition to kernel space may have happened after
> the access that caused the abort. Basically they should be
> treated more as an IRQ than as a fault (note they can also be
> masked just like irqs). In case of a bus error, it may be
> appropriate to just warn about it, or perhaps send a signal
> to the current process, although in the latter case it should
> have some means to distinguish it from a synchronous bus
> error.
> 
> At least on the cortex-a8, a parity/ECC error (whether async
> or not) is to be regarded as absolutely fatal.  Quoth the
> TRM: "No recovery is possible. The abort handler must disable
> the caches, communicate the fail directly with the external
> system, request a reboot."
> 
> Bit 10 no longer indicates an asynchronous (let alone
> imprecise) fault.  Apart from the debug events and async
> aborts (and possibly some implementation-defined aborts), all
> aborts listed are synchronous, and DFAR/IFAR is valid.
> There's no technical obstruction to make these trappable via
> the kernel exception handling mechanism. (Though at least in
> case of parity/ECC errors one shouldn't.)

Anyway, the Nokia Harmattan N9/N950 2.6.32 kernel contains this patch:

diff --git a/arch/arm/mm/fsr-2level.c b/arch/arm/mm/fsr-2level.c
index 18ca74c..d530d55 100644
--- a/arch/arm/mm/fsr-2level.c
+++ b/arch/arm/mm/fsr-2level.c
@@ -7,7 +7,12 @@ static struct fsr_info fsr_info[] = {
 	{ do_bad,		SIGBUS,	 BUS_ADRALN,	"alignment exception"		   },
 	{ do_bad,		SIGKILL, 0,		"terminal exception"		   },
 	{ do_bad,		SIGBUS,	 BUS_ADRALN,	"alignment exception"		   },
+/* Do we need runtime check ? */
+#if __LINUX_ARM_ARCH__ < 6
 	{ do_bad,		SIGBUS,	 0,		"external abort on linefetch"	   },
+#else
+	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"I-cache maintenance fault"	   },
+#endif
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"section translation fault"	   },
 	{ do_bad,		SIGBUS,	 0,		"external abort on linefetch"	   },
 	{ do_page_fault,	SIGSEGV, SEGV_MAPERR,	"page translation fault"	   },

Maybe it is related?

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
@ 2015-02-19 18:20                                             ` Pali Rohár
  0 siblings, 0 replies; 87+ messages in thread
From: Pali Rohár @ 2015-02-19 18:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
> On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com> wrote:
> >> Anyhow, since checking the firewalls/APs to see if you have
> >> permission will probably only get you yet another fault if
> >> things are walled off, the robust way of dealing with this
> >> sort of situation is by probing the device with a read
> >> while trapping bus faults. This also handles modules that
> >> are unreachable for other reasons, e.g. being disabled by
> >> eFuse.
> > 
> > Is it possible to patch kernel code to mask or ignore that
> > fault? Can you help me with something like that?
> 
> As I mentioned, I'm still learning my way around the kernel,
> so I don't feel very comfortable suggesting a concrete patch
> just yet. I've been browsing arch/arm/mm/ however and my
> impression is that all that would be required is editing
> fault.c by making a copy of do_bad but containing
>     return user_mode(regs) || !fixup_exception(regs);
> and hook it onto the appropriate fault codes.  However, this
> really needs the opinion of someone more familiar with this
> code.
> 
> I do have an observation to make on the issue of fault
> decoding: the list in fsr-2level.c may be "standard ARMv3 and
> ARMv4 aborts" but they are quite wrong for ARMv7 which has:
> 
> [ 0] -
> [ 1] alignment fault
> [ 2] debug event
> [ 3] section access flag fault
> [ 4] instruction cache maintenance fault (reported via data abort)
> [ 5] section translation fault
> [ 6] page access flag fault
> [ 7] page translation fault
> [ 8] bus error on access
> [ 9] section domain fault
> [10] -
> [11] page domain fault
> [12] bus error on section table walk
> [13] section permission fault
> [14] bus error on page table walk
> [15] page permission fault
> [16] (TLB conflict abort)
> [17] -
> [18] -
> [19] -
> [20] (lockdown abort)
> [21] -
> [22] async bus error (reported via data abort)
> [23] -
> [24] async parity/ECC error (reported via data abort)
> [25] parity/ECC error on access
> [26] (coprocessor abort)
> [27] -
> [28] parity/ECC error on section table walk
> [29] -
> [30] parity/ECC error on page table walk
> [31] -
> 
> Some entries are patched up near the bottom of fault.c but
> many bogus messages remain, for example the "on linefetch" vs
> "on non-linefetch" is misleading since no such thing can be
> inferred from the fault status on v7.  Also, the i-cache
> maintenance fault handling looks wrong to me: it should fetch
> the actual fault status from IFSR (even though the address
> still comes from DFAR) and dispatch based on that.
> 
> Async external aborts (async bus error and async parity/ECC
> error) give you basically no info. DFAR will contain garbage
> hence displaying it will confuse rather than enlighten, a
> traceback is pointless since the instruction that caused the
> access is long retired, likewise user_mode() doesn't matter
> since a transition to kernel space may have happened after
> the access that caused the abort. Basically they should be
> treated more as an IRQ than as a fault (note they can also be
> masked just like irqs). In case of a bus error, it may be
> appropriate to just warn about it, or perhaps send a signal
> to the current process, although in the latter case it should
> have some means to distinguish it from a synchronous bus
> error.
> 
> At least on the cortex-a8, a parity/ECC error (whether async
> or not) is to be regarded as absolutely fatal.  Quoth the
> TRM: "No recovery is possible. The abort handler must disable
> the caches, communicate the fail directly with the external
> system, request a reboot."
> 
> Bit 10 no longer indicates an asynchronous (let alone
> imprecise) fault.  Apart from the debug events and async
> aborts (and possibly some implementation-defined aborts), all
> aborts listed are synchronous, and DFAR/IFAR is valid.
> There's no technical obstruction to make these trappable via
> the kernel exception handling mechanism. (Though at least in
> case of parity/ECC errors one shouldn't.)

Anyway, the Nokia Harmattan N9/N950 2.6.32 kernel contains this patch:

diff --git a/arch/arm/mm/fsr-2level.c b/arch/arm/mm/fsr-2level.c
index 18ca74c..d530d55 100644
--- a/arch/arm/mm/fsr-2level.c
+++ b/arch/arm/mm/fsr-2level.c
@@ -7,7 +7,12 @@ static struct fsr_info fsr_info[] = {
 	{ do_bad,		SIGBUS,	 BUS_ADRALN,	"alignment exception"		   },
 	{ do_bad,		SIGKILL, 0,		"terminal exception"		   },
 	{ do_bad,		SIGBUS,	 BUS_ADRALN,	"alignment exception"		   },
+/* Do we need runtime check ? */
+#if __LINUX_ARM_ARCH__ < 6
 	{ do_bad,		SIGBUS,	 0,		"external abort on linefetch"	   },
+#else
+	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"I-cache maintenance fault"	   },
+#endif
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"section translation fault"	   },
 	{ do_bad,		SIGBUS,	 0,		"external abort on linefetch"	   },
 	{ do_page_fault,	SIGSEGV, SEGV_MAPERR,	"page translation fault"	   },

Maybe it is related?

-- 
Pali Rohár
pali.rohar@gmail.com

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-19 18:20                                             ` Pali Rohár
@ 2015-02-19 20:25                                               ` Matthijs van Duin
  -1 siblings, 0 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-02-19 20:25 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Tony Lindgren, Sebastian Reichel, linux-omap, Aaro Koskinen,
	Pavel Machek, Nishanth Menon, linux-arm-kernel

On 18 February 2015 at 22:14, Pali Rohár <pali.rohar@gmail.com> wrote:
> Can you help us with above problem? How to catch external abort
> on non-linefetch in kernel driver and prevent kernel panic?

Actually it's a synchronous bus error that you want to catch, which
however is misreported by linux as "external abort on non-linefetch"
(... but a bus error on a linefetch would produce exactly the same
error).  Also, ARM apparently uses the term "external abort" as
umbrella term for aborts triggered outside the MMU, which includes not
just bus errors but also (uncorrectable) parity/ECC errors.
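
To illustrate the decoding, here is a minimal user-space sketch. It
assumes the short-descriptor fault status layout (FS[3:0] in bits 3..0,
FS[4] in bit 10, WnR in bit 11) and uses the ARMv7 status list quoted
earlier in this thread. The names v7_fault_name and v7_decode are
hypothetical and exist only for this example; fsr_fs merely mirrors the
idea of extracting the 5-bit status and is not copied from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define FSR_FS3_0	0x0fu		/* FS[3:0], bits 3..0 */
#define FSR_FS4		(1u << 10)	/* FS[4], bit 10 */
#define FSR_WNR		(1u << 11)	/* set when the aborted access was a write */

/* Combine FS[4] and FS[3:0] into a 5-bit index (0..31). */
static unsigned int fsr_fs(uint32_t fsr)
{
	return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}

/* Hypothetical table, filled in from the ARMv7 list in this thread. */
static const char *const v7_fault_name[32] = {
	[1]  = "alignment fault",
	[2]  = "debug event",
	[3]  = "section access flag fault",
	[4]  = "instruction cache maintenance fault",
	[5]  = "section translation fault",
	[6]  = "page access flag fault",
	[7]  = "page translation fault",
	[8]  = "bus error on access",
	[9]  = "section domain fault",
	[11] = "page domain fault",
	[12] = "bus error on section table walk",
	[13] = "section permission fault",
	[14] = "bus error on page table walk",
	[15] = "page permission fault",
	[16] = "TLB conflict abort",
	[20] = "lockdown abort",
	[22] = "async bus error",
	[24] = "async parity/ECC error",
	[25] = "parity/ECC error on access",
	[26] = "coprocessor abort",
	[28] = "parity/ECC error on section table walk",
	[30] = "parity/ECC error on page table walk",
};

/* Return a description for a raw fault status register value. */
static const char *v7_decode(uint32_t fsr)
{
	const char *name = v7_fault_name[fsr_fs(fsr)];

	return name ? name : "reserved";
}
```

Fed the value 0x1028 from the crash log at the top of this thread,
fsr_fs() gives index 8, which v7_decode() reports as "bus error on
access", and FSR_WNR is clear, so the aborted access was a read.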

Anyhow, the core question you mean to ask is: can the "exception"
mechanism currently in place to trap MMU faults in e.g.
put_user() easily be extended to allow drivers to trap synchronous bus
errors?  My impression is that this would in fact be quite easy and I
even outlined a suggested patch, but I'm still a kernel newbie so I
may be way off course.

Although its main use would be for auto-probing, it's maybe worth
mentioning I've met at least one peripheral which also reports bus
errors when writing inappropriate/unsupported *values* to a register.
(Of course when using posted writes you won't get an abort anyhow in
that case, it's only reported via interconnect error logs.)


On 19 February 2015 at 19:20, Pali Rohár <pali.rohar@gmail.com> wrote:
> Anyway, in Nokia Harmattan N9/N950 2.6.32 kernel is this patch

In mainline linux the same fix-up is done at runtime rather than
compile time (in exceptions_init() at the bottom of fault.c). Either
way, in my post of the 11th I also mentioned that it looks wrong to
me. I-cache maintenance fault is really a special case in the fault
decoding logic since it means "although you got here via DAbort and
the relevant address is in DFAR, the exception happened on the
instruction side so you need to fetch the fault status from IFSR
instead."
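
A minimal sketch of that special case, written as plain C rather than
as a change to fault.c. It assumes the 5-bit status is encoded the
same way in DFSR and IFSR (FS[3:0] in bits 3..0, FS[4] in bit 10);
struct abort_info and resolve_data_abort are hypothetical names used
only for this illustration.

```c
#include <stdint.h>

#define FS_ICACHE_MAINT	4u	/* "instruction cache maintenance fault" */

/* Extract the 5-bit fault status from a DFSR or IFSR value. */
static unsigned int fsr_fs(uint32_t fsr)
{
	return (fsr & 0x0fu) | ((fsr & (1u << 10)) >> 6);
}

struct abort_info {
	unsigned int status;	/* status code to dispatch on */
	int insn_side;		/* non-zero if the status was taken from IFSR */
};

/*
 * Decide which status a data abort handler should dispatch on.
 * In both cases the faulting address is the one reported in DFAR.
 */
static struct abort_info resolve_data_abort(uint32_t dfsr, uint32_t ifsr)
{
	struct abort_info info = { fsr_fs(dfsr), 0 };

	if (info.status == FS_ICACHE_MAINT) {
		/* The real cause is recorded on the instruction side. */
		info.status = fsr_fs(ifsr);
		info.insn_side = 1;
	}
	return info;
}
```

For example, a DFSR of 0x004 combined with an IFSR of 0x007 resolves
to status 7 (page translation fault) with insn_side set, while a DFSR
of 0x1028 is dispatched directly as status 8.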
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 87+ messages in thread

* runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
@ 2015-02-19 20:25                                               ` Matthijs van Duin
  0 siblings, 0 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-02-19 20:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 18 February 2015 at 22:14, Pali Rohár <pali.rohar@gmail.com> wrote:
> Can you help us with above problem? How to catch external abort
> on non-linefetch in kernel driver and prevent kernel panic?

Actually it's a synchronous bus error that you want to catch, which
however is misreported by linux as "external abort on non-linefetch"
(... but a bus error on a linefetch would produce exactly the same
error).  Also, ARM apparently uses the term "external abort" as
umbrella term for aborts triggered outside the MMU, which includes not
just bus errors but also (uncorrectable) parity/ECC errors.

Anyhow, the core question you mean to ask is: can the "exception"
mechanism currently in place to trap MMU faults in e.g.
put_user() easily be extended to allow drivers to trap synchronous bus
errors?  My impression is that this would in fact be quite easy and I
even outlined a suggested patch, but I'm still a kernel newbie so I
may be way off course.

Although its main use would be for auto-probing, it's maybe worth
mentioning I've met at least one peripheral which also reports bus
errors when writing inappropriate/unsupported *values* to a register.
(Of course when using posted writes you won't get an abort anyhow in
that case, it's only reported via interconnect error logs.)


On 19 February 2015 at 19:20, Pali Rohár <pali.rohar@gmail.com> wrote:
> Anyway, in Nokia Harmattan N9/N950 2.6.32 kernel is this patch

In mainline linux the same fix-up is done at runtime rather than
compile time (in exceptions_init() at the bottom of fault.c). Either
way, in my post of the 11th I also mentioned that it looks wrong to
me. I-cache maintenance fault is really a special case in the fault
decoding logic since it means "although you got here via DAbort and
the relevant address is in DFAR, the exception happened on the
instruction side so you need to fetch the fault status from IFSR
instead."

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-19 18:20                                             ` Pali Rohár
@ 2015-02-19 21:10                                               ` Aaro Koskinen
  -1 siblings, 0 replies; 87+ messages in thread
From: Aaro Koskinen @ 2015-02-19 21:10 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Matthijs van Duin, Tony Lindgren, Sebastian Reichel, linux-omap,
	Pavel Machek, Nishanth Menon, linux-arm-kernel

Hi,

On Thu, Feb 19, 2015 at 07:20:41PM +0100, Pali Rohár wrote:
> Anyway, in Nokia Harmattan N9/N950 2.6.32 kernel is this patch:

> +/* Do we need runtime check ? */
> +#if __LINUX_ARM_ARCH__ < 6
>  	{ do_bad,		SIGBUS,	 0,		"external abort on linefetch"	   },
> +#else
> +	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"I-cache maintenance fault"	   },
> +#endif

> Maybe it is related?

That was unrelated. The patch is also in mainline,
see 8c0b742ca7a7d21de0ddc87eda6ef0b282e4de18 (ARM: 6134/1: Handle
instruction cache maintenance fault properly).

A.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
@ 2015-02-19 21:10                                               ` Aaro Koskinen
  0 siblings, 0 replies; 87+ messages in thread
From: Aaro Koskinen @ 2015-02-19 21:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, Feb 19, 2015 at 07:20:41PM +0100, Pali Rohár wrote:
> Anyway, in Nokia Harmattan N9/N950 2.6.32 kernel is this patch:

> +/* Do we need runtime check ? */
> +#if __LINUX_ARM_ARCH__ < 6
>  	{ do_bad,		SIGBUS,	 0,		"external abort on linefetch"	   },
> +#else
> +	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"I-cache maintenance fault"	   },
> +#endif

> Maybe it is related?

That was unrelated. The patch is also in mainline,
see 8c0b742ca7a7d21de0ddc87eda6ef0b282e4de18 (ARM: 6134/1: Handle
instruction cache maintenance fault properly).

A.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-02-11 20:40                                               ` Nishanth Menon
  (?)
@ 2015-05-28  7:37                                                 ` Pali Rohár
  -1 siblings, 0 replies; 87+ messages in thread
From: Pali Rohár @ 2015-05-28  7:37 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Tony Lindgren, Matthijs van Duin, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

On Wednesday 11 February 2015 14:40:33 Nishanth Menon wrote:
> On Wed, Feb 11, 2015 at 2:28 PM, Pali Rohár <pali.rohar@gmail.com> wrote:
> > On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
> >> On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com>
> > wrote:
> >> >> Anyhow, since checking the firewalls/APs to see if you have
> >> >> permission will probably only get you yet another fault if
> >> >> things are walled off, the robust way of dealing with this
> >> >> sort of situation is by probing the device with a read
> >> >> while trapping bus faults. This also handles modules that
> >> >> are unreachable for other reasons, e.g. being disabled by
> >> >> eFuse.
> >> >
> >> > Is it possible to patch kernel code to mask or ignore that
> >> > fault? Can you help me with something like that?
> >>
> >> As I mentioned, I'm still learning my way around the kernel,
> >> so I don't feel very comfortable suggesting a concrete patch
> >> just yet. I've been browsing arch/arm/mm/ however and my
> >> impression is that all that would be required is editing
> >> fault.c by making a copy of do_bad but containing
> >>     return user_mode(regs) || !fixup_exception(regs);
> >> and hook it onto the appropriate fault codes.  However, this
> >> really needs the opinion of someone more familiar with this
> >> code.
> >>
> >> I do have an observation to make on the issue of fault
> >> decoding: the list in fsr-2level.c may be "standard ARMv3 and
> >> ARMv4 aborts" but they are quite wrong for ARMv7 which has:
> >>
> >> [ 0] -
> >> [ 1] alignment fault
> >> [ 2] debug event
> >> [ 3] section access flag fault
> >> [ 4] instruction cache maintenance fault (reported via data abort)
> >> [ 5] section translation fault
> >> [ 6] page access flag fault
> >> [ 7] page translation fault
> >> [ 8] bus error on access
> >> [ 9] section domain fault
> >> [10] -
> >> [11] page domain fault
> >> [12] bus error on section table walk
> >> [13] section permission fault
> >> [14] bus error on page table walk
> >> [15] page permission fault
> >> [16] (TLB conflict abort)
> >> [17] -
> >> [18] -
> >> [19] -
> >> [20] (lockdown abort)
> >> [21] -
> >> [22] async bus error (reported via data abort)
> >> [23] -
> >> [24] async parity/ECC error (reported via data abort)
> >> [25] parity/ECC error on access
> >> [26] (coprocessor abort)
> >> [27] -
> >> [28] parity/ECC error on section table walk
> >> [29] -
> >> [30] parity/ECC error on page table walk
> >> [31] -
> >>
> >> Some entries are patched up near the bottom of fault.c but
> >> many bogus messages remain, for example the "on linefetch" vs
> >> "on non-linefetch" is misleading since no such thing can be
> >> inferred from the fault status on v7.  Also, the i-cache
> >> maintenance fault handling looks wrong to me: it should fetch
> >> the actual fault status from IFSR (even though the address
> >> still comes from DFAR) and dispatch based on that.
> >>
> >> Async external aborts (async bus error and async parity/ECC
> >> error) give you basically no info. DFAR will contain garbage
> >> hence displaying it will confuse rather than enlighten, a
> >> traceback is pointless since the instruction that caused the
> >> access is long retired, likewise user_mode() doesn't matter
> >> since a transition to kernel space may have happened after
> >> the access that caused the abort. Basically they should be
> >> treated more as an IRQ than as a fault (note they can also be
> >> masked just like irqs). In case of a bus error, it may be
> >> appropriate to just warn about it, or perhaps send a signal
> >> to the current process, although in the latter case it should
> >> have some means to distinguish it from a synchronous bus
> >> error.
> >>
> >> At least on the cortex-a8, a parity/ECC error (whether async
> >> or not) is to be regarded as absolutely fatal.  Quoth the
> >> TRM: "No recovery is possible. The abort handler must disable
> >> the caches, communicate the fail directly with the external
> >> system, request a reboot."
> >>
> >> Bit 10 no longer indicates an asynchronous (let alone
> >> imprecise) fault.  Apart from the debug events and async
> >> aborts (and possibly some implementation-defined aborts), all
> >> aborts listed are synchronous, and DFAR/IFAR is valid.
> >> There's no technical obstruction to make these trappable via
> >> the kernel exception handling mechanism. (Though at least in
> >> case of parity/ECC errors one shouldn't.)
> >
> > Tony, Nishanth, or somebody else... can you help with memory
> > management? Or do you know some expert for arch/arm/mm/ code?
> 
> Folks in linux-arm-kernel are probably the right people, I suppose.
> Looping them in.
> 

So pinging linux-arm-kernel again. Any idea how to handle that fault?

-- 
Pali Rohár
pali.rohar@gmail.com

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
@ 2015-05-28  7:37                                                 ` Pali Rohár
  0 siblings, 0 replies; 87+ messages in thread
From: Pali Rohár @ 2015-05-28  7:37 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Tony Lindgren, Matthijs van Duin, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

On Wednesday 11 February 2015 14:40:33 Nishanth Menon wrote:
> On Wed, Feb 11, 2015 at 2:28 PM, Pali Rohár <pali.rohar@gmail.com> wrote:
> > On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
> >> On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com>
> > wrote:
> >> >> Anyhow, since checking the firewalls/APs to see if you have
> >> >> permission will probably only get you yet another fault if
> >> >> things are walled off, the robust way of dealing with this
> >> >> sort of situation is by probing the device with a read
> >> >> while trapping bus faults. This also handles modules that
> >> >> are unreachable for other reasons, e.g. being disabled by
> >> >> eFuse.
> >> >
> >> > Is it possible to patch kernel code to mask or ignore that
> >> > fault? Can you help me with something like that?
> >>
> >> As I mentioned, I'm still learning my way around the kernel,
> >> so I don't feel very comfortable suggesting a concrete patch
> >> just yet. I've been browsing arch/arm/mm/ however and my
> >> impression is that all that would be required is editing
> >> fault.c by making a copy of do_bad but containing
> >>     return user_mode(regs) || !fixup_exception(regs);
> >> and hook it onto the appropriate fault codes.  However, this
> >> really needs the opinion of someone more familiar with this
> >> code.
> >>
> >> I do have an observation to make on the issue of fault
> >> decoding: the list in fsr-2level.c may be "standard ARMv3 and
> >> ARMv4 aborts" but they are quite wrong for ARMv7 which has:
> >>
> >> [ 0] -
> >> [ 1] alignment fault
> >> [ 2] debug event
> >> [ 3] section access flag fault
> >> [ 4] instruction cache maintenance fault (reported via data abort)
> >> [ 5] section translation fault
> >> [ 6] page access flag fault
> >> [ 7] page translation fault
> >> [ 8] bus error on access
> >> [ 9] section domain fault
> >> [10] -
> >> [11] page domain fault
> >> [12] bus error on section table walk
> >> [13] section permission fault
> >> [14] bus error on page table walk
> >> [15] page permission fault
> >> [16] (TLB conflict abort)
> >> [17] -
> >> [18] -
> >> [19] -
> >> [20] (lockdown abort)
> >> [21] -
> >> [22] async bus error (reported via data abort)
> >> [23] -
> >> [24] async parity/ECC error (reported via data abort)
> >> [25] parity/ECC error on access
> >> [26] (coprocessor abort)
> >> [27] -
> >> [28] parity/ECC error on section table walk
> >> [29] -
> >> [30] parity/ECC error on page table walk
> >> [31] -
> >>
> >> Some entries are patched up near the bottom of fault.c but
> >> many bogus messages remain, for example the "on linefetch" vs
> >> "on non-linefetch" is misleading since no such thing can be
> >> inferred from the fault status on v7.  Also, the i-cache
> >> maintenance fault handling looks wrong to me: it should fetch
> >> the actual fault status from IFSR (even though the address
> >> still comes from DFAR) and dispatch based on that.
> >>
> >> Async external aborts (async bus error and async parity/ECC
> >> error) give you basically no info. DFAR will contain garbage
> >> hence displaying it will confuse rather than enlighten, a
> >> traceback is pointless since the instruction that caused the
> >> access is long retired, likewise user_mode() doesn't matter
> >> since a transition to kernel space may have happened after
> >> the access that caused the abort. Basically they should be
> >> treated more as an IRQ than as a fault (note they can also be
> >> masked just like irqs). In case of a bus error, it may be
> >> appropriate to just warn about it, or perhaps send a signal
> >> to the current process, although in the latter case it should
> >> have some means to distinguish it from a synchronous bus
> >> error.
> >>
> >> At least on the cortex-a8, a parity/ECC error (whether async
> >> or not) is to be regarded as absolutely fatal.  Quoth the
> >> TRM: "No recovery is possible. The abort handler must disable
> >> the caches, communicate the fail directly with the external
> >> system, request a reboot."
> >>
> >> Bit 10 no longer indicates an asynchronous (let alone
> >> imprecise) fault.  Apart from the debug events and async
> >> aborts (and possibly some implementation-defined aborts), all
> >> aborts listed are synchronous, and DFAR/IFAR is valid.
> >> There's no technical obstruction to make these trappable via
> >> the kernel exception handling mechanism. (Though at least in
> >> case of parity/ECC errors one shouldn't.)
> >
> > Tony, Nishanth, or somebody else... can you help with memory
> > management? Or do you know some expert for arch/arm/mm/ code?
> 
> Folks in linux-arm-kernel are probably the right people, I suppose.
> Looping them in.
> 

So pinging linux-arm-kernel again. Any idea how to handle that fault?

-- 
Pali Rohár
pali.rohar@gmail.com
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-05-28  7:37                                                 ` Pali Rohár
  (?)
@ 2015-05-28 16:01                                                   ` Tony Lindgren
  -1 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-05-28 16:01 UTC (permalink / raw)
  To: Pali Rohár
  Cc: linux-arm-kernel, Matthijs van Duin, Sebastian Reichel,
	linux-omap, Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

* Pali Rohár <pali.rohar@gmail.com> [150528 00:39]:
> On Wednesday 11 February 2015 14:40:33 Nishanth Menon wrote:
> > On Wed, Feb 11, 2015 at 2:28 PM, Pali Rohár <pali.rohar@gmail.com> wrote:
> > > On Wednesday 11 February 2015 16:22:51 Matthijs van Duin wrote:
> > >> On 11 February 2015 at 13:39, Pali Rohár <pali.rohar@gmail.com>
> > > wrote:
> > >> >> Anyhow, since checking the firewalls/APs to see if you have
> > >> >> permission will probably only get you yet another fault if
> > >> >> things are walled off, the robust way of dealing with this
> > >> >> sort of situation is by probing the device with a read
> > >> >> while trapping bus faults. This also handles modules that
> > >> >> are unreachable for other reasons, e.g. being disabled by
> > >> >> eFuse.
> > >> >
> > > >> > Is it possible to patch kernel code to mask or ignore that
> > >> > fault? Can you help me with something like that?
> > >>
> > >> As I mentioned, I'm still learning my way around the kernel,
> > >> so I don't feel very comfortable suggesting a concrete patch
> > >> just yet. I've been browsing arch/arm/mm/ however and my
> > >> impression is that all that would be required is editing
> > >> fault.c by making a copy of do_bad but containing
> > >>     return user_mode(regs) || !fixup_exception(regs);
> > >> and hook it onto the appropriate fault codes.  However, this
> > >> really needs the opinion of someone more familiar with this
> > >> code.
> > >>
> > >> I do have an observation to make on the issue of fault
> > >> decoding: the list in fsr-2level.c may be "standard ARMv3 and
> > >> ARMv4 aborts" but they are quite wrong for ARMv7 which has:
> > >>
> > >> [ 0] -
> > >> [ 1] alignment fault
> > >> [ 2] debug event
> > >> [ 3] section access flag fault
> > > >> [ 4] instruction cache maintenance fault (reported via data abort)
> > > >> [ 5] section translation fault
> > >> [ 6] page access flag fault
> > >> [ 7] page translation fault
> > >> [ 8] bus error on access
> > >> [ 9] section domain fault
> > >> [10] -
> > >> [11] page domain fault
> > >> [12] bus error on section table walk
> > >> [13] section permission fault
> > >> [14] bus error on page table walk
> > >> [15] page permission fault
> > >> [16] (TLB conflict abort)
> > >> [17] -
> > >> [18] -
> > >> [19] -
> > >> [20] (lockdown abort)
> > >> [21] -
> > >> [22] async bus error (reported via data abort)
> > >> [23] -
> > >> [24] async parity/ECC error (reported via data abort)
> > >> [25] parity/ECC error on access
> > >> [26] (coprocessor abort)
> > >> [27] -
> > >> [28] parity/ECC error on section table walk
> > >> [29] -
> > >> [30] parity/ECC error on page table walk
> > >> [31] -
> > >>
> > >> Some entries are patched up near the bottom of fault.c but
> > >> many bogus messages remain, for example the "on linefetch" vs
> > >> "on non-linefetch" is misleading since no such thing can be
> > >> inferred from the fault status on v7.  Also, the i-cache
> > >> maintenance fault handling looks wrong to me: it should fetch
> > >> the actual fault status from IFSR (even though the address
> > > >> still comes from DFAR) and dispatch based on that.
> > >>
> > >> Async external aborts (async bus error and async parity/ECC
> > >> error) give you basically no info. DFAR will contain garbage
> > >> hence displaying it will confuse rather than enlighten, a
> > >> traceback is pointless since the instruction that caused the
> > >> access is long retired, likewise user_mode() doesn't matter
> > >> since a transition to kernel space may have happened after
> > > >> the access that caused the abort. Basically they should be
> > >> treated more as an IRQ than as a fault (note they can also be
> > >> masked just like irqs). In case of a bus error, it may be
> > >> appropriate to just warn about it, or perhaps send a signal
> > >> to the current process, although in the latter case it should
> > >> have some means to distinguish it from a synchronous bus
> > >> error.
> > >>
> > >> At least on the cortex-a8, a parity/ECC error (whether async
> > >> or not) is to be regarded as absolutely fatal.  Quoth the
> > >> TRM: "No recovery is possible. The abort handler must disable
> > >> the caches, communicate the fail directly with the external
> > >> system, request a reboot."
> > >>
> > >> Bit 10 no longer indicates an asynchronous (let alone
> > >> imprecise) fault.  Apart from the debug events and async
> > >> aborts (and possibly some implementation-defined aborts), all
> > >> aborts listed are synchronous, and DFAR/IFAR is valid.
> > >> There's no technical obstruction to make these trappable via
> > >> the kernel exception handling mechanism. (Though at least in
> > >> case of parity/ECC errors one shouldn't.)
> > >
> > > Tony, Nishanth, or somebody else... can you help with memory
> > > management? Or do you know some expert for arch/arm/mm/ code?
> > 
> > Folks in linux-arm-kernel are probably the right people, I suppose.
> > Looping them in.
> > 
> 
> So pinging linux-arm-kernel again. Any idea how to handle that fault?
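
For reference, the ARMv7 status decoding discussed in the quoted text
can be sketched in plain C. This is only an illustration and not a
patch for arch/arm/mm/: the table is copied from the list quoted
above, the FS[3:0]/FS[4] split assumes the short-descriptor DFSR
layout, and the names v7_fault_name, fsr_fs_index and fsr_describe are
made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Short-descriptor DFSR fields (assumed layout for this sketch). */
#define FSR_FS3_0 0x0000000fu /* FS[3:0] lives in bits 3:0 */
#define FSR_FS4   (1u << 10)  /* FS[4] lives in bit 10     */
#define FSR_WNR   (1u << 11)  /* set for a write access    */
#define FSR_EXT   (1u << 12)  /* external abort qualifier  */

/* Fault names as listed in the quoted mail; "-" marks unused codes. */
static const char *const v7_fault_name[32] = {
	"-", "alignment fault", "debug event", "section access flag fault",
	"instruction cache maintenance fault", "section translation fault",
	"page access flag fault", "page translation fault",
	"bus error on access", "section domain fault", "-",
	"page domain fault", "bus error on section table walk",
	"section permission fault", "bus error on page table walk",
	"page permission fault", "(TLB conflict abort)", "-", "-", "-",
	"(lockdown abort)", "-", "async bus error", "-",
	"async parity/ECC error", "parity/ECC error on access",
	"(coprocessor abort)", "-", "parity/ECC error on section table walk",
	"-", "parity/ECC error on page table walk", "-",
};

/* Combine FS[4] (bit 10) and FS[3:0] into a 5-bit table index. */
static unsigned int fsr_fs_index(uint32_t fsr)
{
	return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}

/* Look up the fault name for a raw fault status value. */
static const char *fsr_describe(uint32_t fsr)
{
	unsigned int fs = fsr_fs_index(fsr);

	assert(fs < 32);
	return v7_fault_name[fs];
}
```

With the status value 0x1028 from the oops at the start of this
thread, fsr_fs_index() gives 8, i.e. "bus error on access", with the
external abort qualifier bit set and the write bit clear.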

Here's what might work. You could patch drivers/bus/omap_l3*.c
code to probe the devices after the omap_l3 driver interrupts
are enabled.

For failed device access you get an interrupt so you know to not
create the struct device entry for that device. For the working
devices you can do the struct device entry and let it probe.

So basically we could make the omap_l3* drivers the managers for
the omap bus code, instead of probing the devices with "simple-bus"
and omap_device_build_from_dt().

There is no need to have these devices probe early, and they are
all internal devices, so as long as we know the type and address
for each SoC, the omap_l3 driver code could probe them.

It seems that trying to do this early just makes things more
complicated; if it is needed early, it should be done in the
bootloader instead of the kernel.

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-05-28 16:01                                                   ` Tony Lindgren
@ 2015-05-28 20:26                                                     ` Matthijs van Duin
  -1 siblings, 0 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-05-28 20:26 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Pali Rohár, linux-arm-kernel, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

On 28 May 2015 at 18:01, Tony Lindgren <tony@atomide.com> wrote:
> For failed device access you get an interrupt

Well for failed reads you get a bus error, and "catching" those (e.g.
using the existing exception mechanism used to catch MMU faults) is
the whole issue.
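
To make the "probe with a read and catch the fault" idea concrete,
here is a userspace analogue in C. It is not kernel code and says
nothing about how fault.c would have to be changed: a PROT_NONE page
stands in for a firewalled register, SIGSEGV/SIGBUS stand in for the
external abort, and probe_read32() and map_inaccessible_page() are
hypothetical names invented for this sketch. It uses one global jump
buffer, so it is not thread-safe.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf probe_env;

/* Fault handler: abandon the faulting access and return to the probe. */
static void probe_fault_handler(int sig)
{
	(void)sig;
	siglongjmp(probe_env, 1);
}

/* Read *addr; return 0 and store the value, or -1 if the read faulted. */
static int probe_read32(const volatile uint32_t *addr, uint32_t *val)
{
	struct sigaction sa, old_segv, old_bus;
	int ret;

	assert(addr != NULL && val != NULL);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = probe_fault_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old_segv);
	sigaction(SIGBUS, &sa, &old_bus);

	/* Saving the signal mask lets siglongjmp() unblock the signal. */
	if (sigsetjmp(probe_env, 1) == 0) {
		*val = *addr;	/* may fault */
		ret = 0;
	} else {
		ret = -1;	/* we got here via the fault handler */
	}

	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	return ret;
}

/* Stand-in for an unreachable module: a page with no access rights. */
static void *map_inaccessible_page(void)
{
	void *p = mmap(NULL, 4096, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

A caller would try probe_read32() on the module's first register and
skip setting up the device when it returns -1.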

Though now that you mention it, it is true that for writes you won't
get any fault (at least on the DM814x and AM335x the posting point
appears to be the async bridge from MPUSS to the L3 interconnect) but
an interconnect error irq instead. It may be easier to make some kind
of harmless write (e.g. to the version register), wait a bit, and
check if the write triggered an interconnect error.

Feels hackish though: you'd need to be sure you waited long enough
(though using a read from another device on the same L4 interconnect
should be a reliable barrier in this case), and drivers for
receiving/interpreting interconnect errors are not implemented yet on
all SoCs (for some, like the AM335x, TI didn't even bother publishing
the relevant data in its TRM). Interconnect errors can also be lost in
some cases (multiple errors involving the same target in a short time
window) though that problem shouldn't arise in this particular case.
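
The write-then-check sequence can be shown with a toy model in C.
Nothing here touches real hardware or matches any actual L3/L4
register layout: struct toy_bus, the single-entry posted-write slot
and all function names are assumptions made up to illustrate the
ordering (posted write, barrier read, then look at the error flag).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUM_TARGETS 4
#define TOY_NO_WRITE    (-1)

/* Toy interconnect: one posted write may be in flight at a time. */
struct toy_bus {
	bool reachable[TOY_NUM_TARGETS]; /* false = walled off/disabled */
	int  posted_target;              /* TOY_NO_WRITE when idle      */
	bool error_irq_latched;          /* set when a write is refused */
};

static void toy_bus_init(struct toy_bus *bus)
{
	for (int i = 0; i < TOY_NUM_TARGETS; i++)
		bus->reachable[i] = true;
	bus->posted_target = TOY_NO_WRITE;
	bus->error_irq_latched = false;
}

/* Deliver the in-flight write; an unreachable target latches an error. */
static void toy_bus_drain(struct toy_bus *bus)
{
	if (bus->posted_target == TOY_NO_WRITE)
		return;
	if (!bus->reachable[bus->posted_target])
		bus->error_irq_latched = true;
	bus->posted_target = TOY_NO_WRITE;
}

/* A posted write returns at once; the CPU sees no fault. */
static void toy_posted_write(struct toy_bus *bus, int target)
{
	assert(target >= 0 && target < TOY_NUM_TARGETS);
	toy_bus_drain(bus);	/* an earlier write completes first */
	bus->posted_target = target;
}

/* A read from a known-good device completes only after earlier writes. */
static void toy_barrier_read(struct toy_bus *bus)
{
	toy_bus_drain(bus);
}

/* Probe a target: true if the harmless write raised no error. */
static bool toy_probe_by_write(struct toy_bus *bus, int target)
{
	toy_barrier_read(bus);		/* flush unrelated traffic */
	bus->error_irq_latched = false;
	toy_posted_write(bus, target);
	toy_barrier_read(bus);		/* wait for the write to land */
	return !bus->error_irq_latched;
}
```

In this model the error flag is still clear right after the posted
write and only becomes visible once the barrier read has completed,
which is why the probe cannot skip the barrier.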

Also, presumably interconnect error reporting is unavailable on HS
devices given the fact that all interconnect registers seemed to be
inaccessible?

Matthijs

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-05-28 20:26                                                     ` Matthijs van Duin
@ 2015-05-28 22:24                                                       ` Tony Lindgren
  -1 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-05-28 22:24 UTC (permalink / raw)
  To: Matthijs van Duin
  Cc: Pali Rohár, linux-arm-kernel, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

* Matthijs van Duin <matthijsvanduin@gmail.com> [150528 13:28]:
> On 28 May 2015 at 18:01, Tony Lindgren <tony@atomide.com> wrote:
> > For failed device access you get an interrupt
> 
> Well for failed reads you get a bus error, and "catching" those (e.g.
> using the existing exception mechanism used to catch MMU faults) is
> the whole issue.
> 
> Though now that you mention it, it is true that for writes you won't
> get any fault (at least on the DM814x and AM335x the posting point
> appears to be the async bridge from MPUSS to the L3 interconnect) but
> an interconnect error irq instead. It may be easier to make some kind
> of harmless write (e.g. to the version register), wait a bit, and
> check if the write triggered an interconnect error.
> 
> Feels hackish though: you'd need to be sure you waited long enough
> (though using a read from another device on the same L4 interconnect
> should be a reliable barrier in this case), and drivers for
> receiving/interpreting interconnect errors are not implemented yet on
> all SoCs (for some, like the AM335x, TI didn't even bother publishing
> the relevant data in its TRM). Interconnect errors can also be lost in
> some cases (multiple errors involving the same target in a short time
> window) though that problem shouldn't arise in this particular case.

Hmm, I believe the interrupt happens immediately when trying to
access an invalid device. But maybe I'm only thinking about the
errors you get if a device is not powered or clocked. So obviously
some experiments need to be done :)

The advantage here would be that the l3 driver actually already knows
quite a bit about the devices on the bus.
 
> Also, presumably interconnect error reporting is unavailable on HS
> devices given the fact that all interconnect registers seemed to be
> inaccessible?

Oh OK yeah then that would not work for Pali's case. I guess it just
needs to be tested.

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-05-28 22:24                                                       ` Tony Lindgren
  (?)
@ 2015-05-28 22:27                                                         ` Pali Rohár
  -1 siblings, 0 replies; 87+ messages in thread
From: Pali Rohár @ 2015-05-28 22:27 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Matthijs van Duin, linux-arm-kernel, Sebastian Reichel,
	linux-omap, Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

On Friday 29 May 2015 00:24:13 Tony Lindgren wrote:
> * Matthijs van Duin <matthijsvanduin@gmail.com> [150528 13:28]:
> > On 28 May 2015 at 18:01, Tony Lindgren <tony@atomide.com> wrote:
> > > For failed device access you get an interrupt
> > 
> > Well for failed reads you get a bus error, and "catching" those
> > (e.g. using the existing exception mechanism used to catch MMU
> > faults) is the whole issue.
> > 
> > Though now that you mention it, it is true that for writes you
> > won't get any fault (at least on the DM814x and AM335x the posting
> > point appears to be the async bridge from MPUSS to the L3
> > interconnect) but an interconnect error irq instead. It may be
> > easier to make some kind of harmless write (e.g. to the version
> > register), wait a bit, and check if the write triggered an
> > interconnect error.
> > 
> > Feels hackish though: you'd need to be sure you waited long enough
> > (though using a read from another device on the same L4
> > interconnect should be a reliable barrier in this case), and
> > drivers for receiving/interpreting interconnect errors are not
> > implemented yet on all SoCs (for some, like the AM335x, TI didn't
> > even bother publishing the relevant data in its TRM). Interconnect
> > errors can also be lost in some cases (multiple errors involving
> > the same target in a short time window) though that problem
> > shouldn't arise in this particular case.
> 
> Hmm I believe the interrupt happens immediately trying to access an
> invalid device. But maybe I'm thinking about just errors if a device
> is not powered or clocked. So obviously some experiments need to be
> done :)
> 
> The advantage here would be that the l3 driver actually already knows
> quite a bit about the devices on the bus.
> 
> > Also, presumably interconnect error reporting is unavailable on HS
> > devices given the fact that all interconnect registers seemed to be
> > inaccessible?
> 
> Oh OK yeah then that would not work for Pali's case. I guess it just
> needs to be tested.
> 
> Regards,
> 
> Tony

OK, thanks for the info. Do you have some quick, small patches for
testing? Or some pointers on what needs to be modified and how?

-- 
Pali Rohár
pali.rohar@gmail.com

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-05-28 22:27                                                         ` Pali Rohár
  (?)
@ 2015-05-29  0:15                                                           ` Tony Lindgren
  -1 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-05-29  0:15 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Matthijs van Duin, linux-arm-kernel, Sebastian Reichel,
	linux-omap, Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

* Pali Rohár <pali.rohar@gmail.com> [150528 15:29]:
> On Friday 29 May 2015 00:24:13 Tony Lindgren wrote:
> > * Matthijs van Duin <matthijsvanduin@gmail.com> [150528 13:28]:
> > > On 28 May 2015 at 18:01, Tony Lindgren <tony@atomide.com> wrote:
> > > > For failed device access you get an interrupt
> > > 
> > > Well for failed reads you get a bus error, and "catching" those
> > > (e.g. using the existing exception mechanism used to catch MMU
> > > faults) is the whole issue.
> > > 
> > > Though now that you mention it, it is true that for writes you
> > > won't get any fault (at least on the DM814x and AM335x the posting
> > > point appears to be the async bridge from MPUSS to the L3
> > > interconnect) but an interconnect error irq instead. It may be
> > > easier to make some kind of harmless write (e.g. to the version
> > > register), wait a bit, and check if the write triggered an
> > > interconnect error.
> > > 
> > > Feels hackish though: you'd need to be sure you waited long enough
> > > (though using a read from another device on the same L4
> > > interconnect should be a reliable barrier in this case), and
> > > drivers for receiving/interpreting interconnect errors are not
> > > implemented yet on all SoCs (for some, like the AM335x, TI didn't
> > > even bother publishing the relevant data in its TRM). Interconnect
> > > errors can also be lost in some cases (multiple errors involving
> > > the same target in a short time window) though that problem
> > > shouldn't arise in this particular case.
> > 
> > Hmm I believe the interrupt happens immediately trying to access an
> > invalid device. But maybe I'm thinking about just errors if a device
> > is not powered or clocked. So obviously some experiments need to be
> > done :)
> > 
> > The advantage here would be that the l3 driver actually already knows
> > quite a bit about the devices on the bus.
> > 
> > > Also, presumably interconnect error reporting is unavailable on HS
> > > devices given the fact that all interconnect registers seemed to be
> > > inaccessible?
> > 
> > Oh OK yeah then that would not work for Pali's case. I guess it just
> > needs to be tested.
> > 
> > Regards,
> > 
> > Tony
> 
> Ok, thanks for info. Do you have some quick small patches for testing? 
> Or some pointers what is needed to modify and how?

Well, I guess the initial test would be:

1. Make sure you have CONFIG_OMAP_INTERCONNECT=y.
2. Comment out status = "disabled" for aes in omap3-n900.dts.
3. Patch in the aes hwmod data.
4. Check that you have CONFIG_CRYPTO_DEV_OMAP_AES=y.
5. Boot the kernel.

Do you get just the l3_smx interrupt instead of the "Unhandled fault"?

If so, then we can use the interrupt handler to make the probe fail.
Not sure yet what the best way to do that would be, though :)

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-05-28 22:24                                                       ` Tony Lindgren
@ 2015-05-29  0:58                                                         ` Matthijs van Duin
  -1 siblings, 0 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-05-29  0:58 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Pali Rohár, linux-arm-kernel, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

On 29 May 2015 at 00:24, Tony Lindgren <tony@atomide.com> wrote:
> Hmm I believe the interrupt happens immediately trying to access an
> invalid device. But maybe I'm thinking about just errors if a device
> is not powered or clocked.

It is only guaranteed to happen immediately (before the next
instruction is executed) if the error occurs before the posting-point
of the write. However, in that case the error is reported in-band to
the cpu, resulting in a (synchronous) bus error which takes precedence
over the out-of-band error irq (if any is signalled). Once the write
is posted however, the cpu will receive an ack on the write and
continue execution, and there's no reason to assume that an error irq
will happen *immediately* after the write.

Of course it typically will happen soon afterwards, possibly even
before the next instruction is executed, depending a bit on how soon
after the posting-point the error occurs versus how long it takes for
the write-ack to reach the cpu. On the other hand, it's also possible
the write, after becoming posted, gets stuck for a while due to a
burst of higher-priority traffic. (I also recall reading about some
situation where a request needs to wait for something to be
dynamically powered up before an error response could be generated,
but I think that was on the OMAP 4.)

So that's the icky part: it will very likely happen almost
immediately. There's however no *guarantee* that it will, and in fact
it's quite tricky to absolutely make sure a write is no longer in
transit. The usual solution is an "OCP barrier": a read that is known
to follow the same path as the write. Normally that means a read from
the same peripheral, but that would defeat the purpose in this case.
Fortunately, the L4 interconnects (unlike the L3) detect firewall
violations in the initiator agent rather than the target agents, hence
a read from any peripheral on the same L4 interconnect suffices.
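
The ordering argument above can be sketched as a toy userspace model in
C. This is only an illustration under stated assumptions, not driver
code: the struct, the function names and the queue depth are all
hypothetical, and the "bus" is just an array. It shows that the error
flag is not yet set right after a posted write, and that a read on the
same path forces the write to be delivered first.

```c
/* Toy model of a posted write on an OCP-style interconnect.
 * Hypothetical names throughout; models ordering only, not real hardware. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 4

struct l4_model {
    uint32_t pending[QUEUE_DEPTH]; /* addresses of posted writes in transit */
    size_t   npending;
    uint32_t blocked_base;         /* firewalled range [base, base + size) */
    uint32_t blocked_size;
    bool     error_irq;            /* latched "interconnect error" flag */
};

static bool is_blocked(const struct l4_model *bus, uint32_t addr)
{
    /* Unsigned wrap-around makes addresses below the base compare as large. */
    return (uint32_t)(addr - bus->blocked_base) < bus->blocked_size;
}

/* Deliver every write still in transit; a firewall hit latches the flag. */
static void drain(struct l4_model *bus)
{
    for (size_t i = 0; i < bus->npending; i++)
        if (is_blocked(bus, bus->pending[i]))
            bus->error_irq = true;
    bus->npending = 0;
}

/* Posted write: acknowledged to the "CPU" at once, delivered later. */
static void posted_write(struct l4_model *bus, uint32_t addr)
{
    if (bus->npending == QUEUE_DEPTH)
        drain(bus);                /* a full queue stalls until it drains */
    assert(bus->npending < QUEUE_DEPTH);
    bus->pending[bus->npending++] = addr;
}

/* Read on the same path: cannot complete before earlier writes do.
 * The address is assumed to belong to a known-accessible peripheral. */
static uint32_t barrier_read(struct l4_model *bus, uint32_t addr)
{
    (void)addr;
    drain(bus);
    return 0;
}

/* Probe: harmless write, barrier read elsewhere on the same L4, check flag. */
static bool probe_accessible(struct l4_model *bus, uint32_t target,
                             uint32_t other)
{
    bus->error_irq = false;
    posted_write(bus, target);
    barrier_read(bus, other);
    return !bus->error_irq;
}
```

In this model, checking error_irq straight after posted_write() would
report no error even for a firewalled target; only after barrier_read()
is the result meaningful. Whether real hardware behaves this way on a
given SoC is exactly what would need testing.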

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-05-29  0:58                                                         ` Matthijs van Duin
@ 2015-05-29  1:35                                                           ` Matthijs van Duin
  -1 siblings, 0 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-05-29  1:35 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Pali Rohár, linux-arm-kernel, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

On 29 May 2015 at 02:58, Matthijs van Duin <matthijsvanduin@gmail.com> wrote:
> It is only guaranteed to happen immediately (before the next
> instruction is executed) if the error occurs before the posting-point
> of the write. However, in that case the error is reported in-band to
> the cpu, resulting in a (synchronous) bus error which takes precedence
> over the out-of-band error irq (if any is signalled).

OK, all this was actually assuming linux uses device-type mappings for
device mappings, which was also the impression I got from
build_mem_type_table() in arch/arm/mm/mmu.c (although it's a bit of a
maze). A quick test however seems to imply otherwise:

~# ./bogus-dev-write
Bus error

So... linux actually uses strongly-ordered mappings? I really didn't
expect that, given the performance implications (especially on a
strictly in-order cpu like the Cortex-A8 which will really just sit
there picking its nose until the write completes) and I think I recall
having seen an OCP barrier being used somewhere in driver code...

Well, in that case everything I said is technically still true, except
the posting point is the peripheral itself. That also means the
interconnect error reporting mechanism is not really useful for
probing since you'll get a bus error before any error irq is
delivered.

So I'd say you're back at having to trap that bus error using the
exception handling mechanism, which I still suspect shouldn't be hard
to do.
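
As a rough userspace analogy of "trap the fault, then carry on", the
sketch below catches the signal raised by a bad access and turns it into
a return value. It is not the kernel mechanism (in-kernel code would go
through the kernel's own fault handling rather than signals), and
probe_read32() and probe_fault() are hypothetical names. SIGSEGV is
trapped alongside SIGBUS because an unmapped address in userspace
produces the former.

```c
/* Userspace analogy only; hypothetical names, not kernel code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf probe_env;

/* Signal handler: abandon the faulting access and resume in the prober. */
static void probe_fault(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Try a 32-bit read at addr. Returns true and stores the value on success,
 * false if the access raised SIGBUS or SIGSEGV. Not thread-safe. */
static bool probe_read32(const volatile uint32_t *addr, uint32_t *val)
{
    struct sigaction sa, old_bus, old_segv;
    volatile bool ok = false;

    assert(val != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    /* savemask = 1 so the signal is unblocked again after the jump. */
    if (sigsetjmp(probe_env, 1) == 0) {
        *val = *addr;              /* may fault and jump back with 1 */
        ok = true;
    }

    /* Restore the previous handlers whether or not the read faulted. */
    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return ok;
}
```

A caller would treat a false return as "device not accessible" and make
the probe fail cleanly instead of crashing.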

Or perhaps you could probe the device using a DMA access and combine
that with the interconnect error reporting irq... ;-)

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-05-29  1:35                                                           ` Matthijs van Duin
@ 2015-05-29 15:50                                                             ` Tony Lindgren
  -1 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-05-29 15:50 UTC (permalink / raw)
  To: Matthijs van Duin
  Cc: Pali Rohár, linux-arm-kernel, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

* Matthijs van Duin <matthijsvanduin@gmail.com> [150528 18:37]:
> On 29 May 2015 at 02:58, Matthijs van Duin <matthijsvanduin@gmail.com> wrote:
> > It is only guaranteed to happen immediately (before the next
> > instruction is executed) if the error occurs before the posting-point
> > of the write. However, in that case the error is reported in-band to
> > the cpu, resulting in a (synchronous) bus error which takes precedence
> > over the out-of-band error irq (if any is signalled).
> 
> OK, all this was actually assuming linux uses device-type mappings for
> device mappings, which was also the impression I got from
> build_mem_type_table() in arch/arm/mm/mmu.c (although it's a bit of a
> maze). A quick test however seems to imply otherwise:
> 
> ~# ./bogus-dev-write
> Bus error
> 
> So... linux actually uses strongly-ordered mappings? I really didn't
> expect that, given the performance implications (especially on a
> strictly in-order cpu like the Cortex-A8 which will really just sit
> there picking its nose until the write completes) and I think I recall
> having seen an OCP barrier being used somewhere in driver code...

I believe some TI kernels use strongly-ordered mappings; the mainline
kernel does not. Which kernel version are you using?
 
> Well, in that case everything I said is technically still true, except
> the posting point is the peripheral itself. That also means the
> interconnect error reporting mechanism is not really useful for
> probing since you'll get a bus error before any error irq is
> delivered.

Hmm if that's the case then yes we can't use the error irq. However,
what I've seen so far is that we only get the bus error if the
l3_* drivers are configured. I guess some more testing is needed.
 
> So I'd say you're back at having to trap that bus error using the
> exception handling mechanism, which I still suspect shouldn't be hard
> to do.

And in that case it makes sense to do that in the bootloader to
avoid adding any custom early boot code to Linux kernel.
 
> Or perhaps you could probe the device using a DMA access and combine
> that with the interconnect error reporting irq... ;-)

Heh too many dependencies :)

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
@ 2015-05-29 15:50                                                             ` Tony Lindgren
  0 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-05-29 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

* Matthijs van Duin <matthijsvanduin@gmail.com> [150528 18:37]:
> On 29 May 2015 at 02:58, Matthijs van Duin <matthijsvanduin@gmail.com> wrote:
> > It is only guaranteed to happen immediately (before the next
> > instruction is executed) if the error occurs before the posting-point
> > of the write. However, in that case the error is reported in-band to
> > the cpu, resulting in a (synchronous) bus error which takes precedence
> > over the out-of-band error irq (if any is signalled).
> 
> OK, all this was actually assuming linux uses device-type mappings for
> device mappings, which was also the impression I got from
> build_mem_type_table() in arch/arm/mm/mmu.c (although it's a bit of a
> maze). A quick test however seems to imply otherwise:
> 
> ~# ./bogus-dev-write
> Bus error
> 
> So... linux actually uses strongly-ordered mappings? I really didn't
> expect that, given the performance implications (especially on a
> strictly in-order cpu like the Cortex-A8 which will really just sit
> there picking its nose until the write completes) and I think I recall
> having seen an OCP barrier being used somewhere in driver code...

I believe some TI kernels use strongly-ordered mappings, mainline
kernel does not. Which kernel version are you using?
 
> Well, in that case everything I said is technically still true, except
> the posting point is the peripheral itself. That also means the
> interconnect error reporting mechanism is not really useful for
> probing since you'll get a bus error before any error irq is
> delivered.

Hmm if that's the case then yes we can't use the error irq. However,
what I've seen so far is that we only get the bus error if the
l3_* drivers are configured. I guess some more testing is needed.
 
> So I'd say you're back at having to trap that bus error using the
> exception handling mechanism, which I still suspect shouldn't be hard
> to do.

And in that case it makes sense to do that in the bootloader to
avoid adding any custom early boot code to Linux kernel.
 
> Or perhaps you could probe the device using a DMA access and combine
> that with the interconnect error reporting irq... ;-)

Heh too many dependencies :)

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-05-29 15:50                                                             ` Tony Lindgren
@ 2015-05-29 18:16                                                               ` Tony Lindgren
  -1 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-05-29 18:16 UTC (permalink / raw)
  To: Matthijs van Duin
  Cc: Pali Rohár, linux-arm-kernel, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

* Tony Lindgren <tony@atomide.com> [150529 08:52]:
> * Matthijs van Duin <matthijsvanduin@gmail.com> [150528 18:37]:
> > On 29 May 2015 at 02:58, Matthijs van Duin <matthijsvanduin@gmail.com> wrote:
> > > It is only guaranteed to happen immediately (before the next
> > > instruction is executed) if the error occurs before the posting-point
> > > of the write. However, in that case the error is reported in-band to
> > > the cpu, resulting in a (synchronous) bus error which takes precedence
> > > over the out-of-band error irq (if any is signalled).
> > 
> > OK, all this was actually assuming linux uses device-type mappings for
> > device mappings, which was also the impression I got from
> > build_mem_type_table() in arch/arm/mm/mmu.c (although it's a bit of a
> > maze). A quick test however seems to imply otherwise:
> > 
> > ~# ./bogus-dev-write
> > Bus error
> > 
> > So... linux actually uses strongly-ordered mappings? I really didn't
> > expect that, given the performance implications (especially on a
> > strictly in-order cpu like the Cortex-A8 which will really just sit
> > there picking its nose until the write completes) and I think I recall
> > having seen an OCP barrier being used somewhere in driver code...
> 
> I believe some TI kernels use strongly-ordered mappings, mainline
> kernel does not. Which kernel version are you using?
>  
> > Well, in that case everything I said is technically still true, except
> > the posting point is the peripheral itself. That also means the
> > interconnect error reporting mechanism is not really useful for
> > probing since you'll get a bus error before any error irq is
> > delivered.
> 
> Hmm if that's the case then yes we can't use the error irq. However,
> what I've seen so far is that we only get the bus error if the
> l3_* drivers are configured. I guess some more testing is needed.
>  
> > So I'd say you're back at having to trap that bus error using the
> > exception handling mechanism, which I still suspect shouldn't be hard
> > to do.
> 
> And in that case it makes sense to do that in the bootloader to
> avoid adding any custom early boot code to Linux kernel.
>  
> > Or perhaps you could probe the device using a DMA access and combine
> > that with the interconnect error reporting irq... ;-)
> 
> Heh too many dependencies :)

If we can't use the l3 interrupts, then something similar to commit
fdf4850cb5b2 ("ARM: BCM5301X: workaround suppress fault") might be
doable too.

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
@ 2015-05-29 18:16                                                               ` Tony Lindgren
  0 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-05-29 18:16 UTC (permalink / raw)
  To: linux-arm-kernel

* Tony Lindgren <tony@atomide.com> [150529 08:52]:
> * Matthijs van Duin <matthijsvanduin@gmail.com> [150528 18:37]:
> > On 29 May 2015 at 02:58, Matthijs van Duin <matthijsvanduin@gmail.com> wrote:
> > > It is only guaranteed to happen immediately (before the next
> > > instruction is executed) if the error occurs before the posting-point
> > > of the write. However, in that case the error is reported in-band to
> > > the cpu, resulting in a (synchronous) bus error which takes precedence
> > > over the out-of-band error irq (if any is signalled).
> > 
> > OK, all this was actually assuming linux uses device-type mappings for
> > device mappings, which was also the impression I got from
> > build_mem_type_table() in arch/arm/mm/mmu.c (although it's a bit of a
> > maze). A quick test however seems to imply otherwise:
> > 
> > ~# ./bogus-dev-write
> > Bus error
> > 
> > So... linux actually uses strongly-ordered mappings? I really didn't
> > expect that, given the performance implications (especially on a
> > strictly in-order cpu like the Cortex-A8 which will really just sit
> > there picking its nose until the write completes) and I think I recall
> > having seen an OCP barrier being used somewhere in driver code...
> 
> I believe some TI kernels use strongly-ordered mappings, mainline
> kernel does not. Which kernel version are you using?
>  
> > Well, in that case everything I said is technically still true, except
> > the posting point is the peripheral itself. That also means the
> > interconnect error reporting mechanism is not really useful for
> > probing since you'll get a bus error before any error irq is
> > delivered.
> 
> Hmm if that's the case then yes we can't use the error irq. However,
> what I've seen so far is that we only get the bus error if the
> l3_* drivers are configured. I guess some more testing is needed.
>  
> > So I'd say you're back at having to trap that bus error using the
> > exception handling mechanism, which I still suspect shouldn't be hard
> > to do.
> 
> And in that case it makes sense to do that in the bootloader to
> avoid adding any custom early boot code to Linux kernel.
>  
> > Or perhaps you could probe the device using a DMA access and combine
> > that with the interconnect error reporting irq... ;-)
> 
> Heh too many dependencies :)

If we can't use the l3 interrupts, then something similar to commit
fdf4850cb5b2 ("ARM: BCM5301X: workaround suppress fault") might be
doable too.

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-05-29 15:50                                                             ` Tony Lindgren
@ 2015-05-30 15:22                                                               ` Matthijs van Duin
  -1 siblings, 0 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-05-30 15:22 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Pali Rohár, linux-arm-kernel, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

On 29 May 2015 at 17:50, Tony Lindgren <tony@atomide.com> wrote:
> I believe some TI kernels use strongly-ordered mappings, mainline
> kernel does not. Which kernel version are you using?

Normally I periodically rebuild based on Robert C Nelson's -bone
kernel (but with heavily customized config). I also tried a plain
4.1.0-rc5-bone3, the generic 4.1.0-rc5-armv7-x0 (the most
vanilla-looking kernel I could find in my debian package list), and
for the heck of it also the classic 3.14.43-ti-r66.

In all cases I observed a synchronous bus error (dubiously reported as
"external abort on non-linefetch (0x1818)") on an AM335x with this
trivial test:

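#include <fcntl.h>      /* open(), O_RDWR, O_DSYNC */
#include <stddef.h>     /* NULL */
#include <sys/mman.h>   /* mmap(), PROT_WRITE, MAP_SHARED, MAP_FAILED */

int main() {
        int fd = open( "/dev/mem", O_RDWR | O_DSYNC );
        if( fd < 0 ) return 1;
        /* map one page of physical address space at 0x42000000 */
        void *ptr = mmap( NULL, 4096, PROT_WRITE, MAP_SHARED, fd, 0x42000000 );
        if( ptr == MAP_FAILED ) return 1;
        *(volatile int *)ptr = 0;       /* the store that triggers the bus error */
        return 0;
}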

I even considered for a moment that maybe the AM335x has some "all
writes non-posted" thing enabled (which I think is available as a
switch on OMAP 4/5?). It seemed unlikely, but since most of my
exploration of interconnect behaviour was done on a DM814x, I
double-checked by performing the same write in a baremetal test
program (with that region configured Device-type in the MMU). As
expected, no data abort occurred, so writes most certainly are posted.

So I have trouble coming up with any explanation for this other than
the use of strongly-ordered mappings.

(Curiously BTW, omitting O_DSYNC made no difference.)
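
For anyone who wants to repeat the test above without the process dying, here is a minimal userspace sketch that traps the bus error instead. It is not from the thread: probe_write() and probe_fault() are hypothetical names, and it assumes the abort is synchronous and delivered to the process as SIGBUS, which matches the "Bus error" output seen here. An asynchronous abort would not be caught reliably this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper state: jump target used by the SIGBUS handler. */
static sigjmp_buf probe_env;

static void probe_fault(int sig)
{
        (void)sig;
        /* Returning normally would re-execute the faulting store, so jump out. */
        siglongjmp(probe_env, 1);
}

/*
 * Attempt a single 32-bit store to addr.
 * Returns 0 if the store completed, 1 if it raised SIGBUS,
 * and -1 if the handler could not be installed.
 */
int probe_write(volatile int *addr, int value)
{
        struct sigaction sa, old;
        int faulted;

        sa.sa_handler = probe_fault;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        if (sigaction(SIGBUS, &sa, &old) != 0)
                return -1;

        /* savemask = 1 so the signal mask is restored after siglongjmp() */
        if (sigsetjmp(probe_env, 1) == 0) {
                *addr = value;
                faulted = 0;
        } else {
                faulted = 1;
        }

        /* Put the previous SIGBUS disposition back. */
        sigaction(SIGBUS, &old, NULL);
        return faulted;
}
```

In the test program this would replace the bare store, e.g. probe_write( (volatile int *)ptr, 0 ), with a return value of 1 indicating that the target rejected the access.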

^ permalink raw reply	[flat|nested] 87+ messages in thread

* runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
@ 2015-05-30 15:22                                                               ` Matthijs van Duin
  0 siblings, 0 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-05-30 15:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 29 May 2015 at 17:50, Tony Lindgren <tony@atomide.com> wrote:
> I believe some TI kernels use strongly-ordered mappings, mainline
> kernel does not. Which kernel version are you using?

Normally I periodically rebuild based on Robert C Nelson's -bone
kernel (but with heavily customized config). I also tried a plain
4.1.0-rc5-bone3, the generic 4.1.0-rc5-armv7-x0 (the most
vanilla-looking kernel I could find in my debian package list), and
for the heck of it also the classic 3.14.43-ti-r66.

In all cases I observed a synchronous bus error (dubiously reported as
"external abort on non-linefetch (0x1818)") on an AM335x with this
trivial test:

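#include <fcntl.h>      /* open(), O_RDWR, O_DSYNC */
#include <stddef.h>     /* NULL */
#include <sys/mman.h>   /* mmap(), PROT_WRITE, MAP_SHARED, MAP_FAILED */

int main() {
        int fd = open( "/dev/mem", O_RDWR | O_DSYNC );
        if( fd < 0 ) return 1;
        /* map one page of physical address space at 0x42000000 */
        void *ptr = mmap( NULL, 4096, PROT_WRITE, MAP_SHARED, fd, 0x42000000 );
        if( ptr == MAP_FAILED ) return 1;
        *(volatile int *)ptr = 0;       /* the store that triggers the bus error */
        return 0;
}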

I even considered for a moment that maybe the AM335x has some "all
writes non-posted" thing enabled (which I think is available as a
switch on OMAP 4/5?). It seemed unlikely, but since most of my
exploration of interconnect behaviour was done on a DM814x, I
double-checked by performing the same write in a baremetal test
program (with that region configured Device-type in the MMU). As
expected, no data abort occurred, so writes most certainly are posted.

So I have trouble coming up with any explanation for this other than
the use of strongly-ordered mappings.

(Curiously BTW, omitting O_DSYNC made no difference.)

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-05-30 15:22                                                               ` Matthijs van Duin
@ 2015-06-01 17:58                                                                 ` Tony Lindgren
  -1 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-06-01 17:58 UTC (permalink / raw)
  To: Matthijs van Duin
  Cc: Pali Rohár, linux-arm-kernel, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

* Matthijs van Duin <matthijsvanduin@gmail.com> [150530 08:24]:
> On 29 May 2015 at 17:50, Tony Lindgren <tony@atomide.com> wrote:
> > I believe some TI kernels use strongly-ordered mappings, mainline
> > kernel does not. Which kernel version are you using?
> 
> Normally I periodically rebuild based on Robert C Nelson's -bone
> kernel (but with heavily customized config). I also tried a plain
> 4.1.0-rc5-bone3, the generic 4.1.0-rc5-armv7-x0 (the most
> vanilla-looking kernel I could find in my debian package list), and
> for the heck of it also the classic 3.14.43-ti-r66.
> 
> In all cases I observed a synchronous bus error (dubiously reported as
> "external abort on non-linefetch (0x1818)") on an AM335x with this
> trivial test:
> 
> int main() {
>         int fd = open( "/dev/mem", O_RDWR | O_DSYNC );
>         if( fd < 0 ) return 1;
>         void *ptr = mmap( NULL, 4096, PROT_WRITE, MAP_SHARED, fd, 0x42000000 );
>         if( ptr == MAP_FAILED ) return 1;
>         *(volatile int *)ptr = 0;
>         return 0;
> }
> 
> I even considered for a moment that maybe the AM335x has some "all
> writes non-posted" thing enabled (which I think is available as a
> switch on OMAP 4/5?). It seemed unlikely, but since most of my
> exploration of interconnect behaviour was done on a DM814x, I
> double-checked by performing the same write in a baremetal test
> program (with that region configured Device-type in the MMU). As
> expected, no data abort occurred, so writes most certainly are posted.
> 
> So I have trouble coming up with any explanation for this other than
> the use of strongly-ordered mappings.
> 
> (Curiously BTW, omitting O_DSYNC made no difference.)

I think these kernels are missing the configuration for l3-noc
driver?

I tried it on omap4 that has l3-noc configured, and it first produces
"Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6fd7000",
and the L3 interrupt only after that. So yeah, you're right, we can't
use the interrupts here. I somehow remembered we'd get only the L3
interrupt if configured.

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
@ 2015-06-01 17:58                                                                 ` Tony Lindgren
  0 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-06-01 17:58 UTC (permalink / raw)
  To: linux-arm-kernel

* Matthijs van Duin <matthijsvanduin@gmail.com> [150530 08:24]:
> On 29 May 2015 at 17:50, Tony Lindgren <tony@atomide.com> wrote:
> > I believe some TI kernels use strongly-ordered mappings, mainline
> > kernel does not. Which kernel version are you using?
> 
> Normally I periodically rebuild based on Robert C Nelson's -bone
> kernel (but with heavily customized config). I also tried a plain
> 4.1.0-rc5-bone3, the generic 4.1.0-rc5-armv7-x0 (the most
> vanilla-looking kernel I could find in my debian package list), and
> for the heck of it also the classic 3.14.43-ti-r66.
> 
> In all cases I observed a synchronous bus error (dubiously reported as
> "external abort on non-linefetch (0x1818)") on an AM335x with this
> trivial test:
> 
> int main() {
>         int fd = open( "/dev/mem", O_RDWR | O_DSYNC );
>         if( fd < 0 ) return 1;
>         void *ptr = mmap( NULL, 4096, PROT_WRITE, MAP_SHARED, fd, 0x42000000 );
>         if( ptr == MAP_FAILED ) return 1;
>         *(volatile int *)ptr = 0;
>         return 0;
> }
> 
> I even considered for a moment that maybe the AM335x has some "all
> writes non-posted" thing enabled (which I think is available as a
> switch on OMAP 4/5?). It seemed unlikely, but since most of my
> exploration of interconnect behaviour was done on a DM814x, I
> double-checked by performing the same write in a baremetal test
> program (with that region configured Device-type in the MMU). As
> expected, no data abort occurred, so writes most certainly are posted.
> 
> So I have trouble coming up with any explanation for this other than
> the use of strongly-ordered mappings.
> 
> (Curiously BTW, omitting O_DSYNC made no difference.)

I think these kernels are missing the configuration for l3-noc
driver?

I tried it on omap4 that has l3-noc configured, and it first produces
"Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6fd7000",
and the L3 interrupt only after that. So yeah, you're right, we can't
use the interrupts here. I somehow remembered we'd get only the L3
interrupt if configured.

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-06-01 17:58                                                                 ` Tony Lindgren
@ 2015-06-01 20:32                                                                   ` Matthijs van Duin
  -1 siblings, 0 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-06-01 20:32 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Pali Rohár, linux-arm-kernel, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

On 1 June 2015 at 19:58, Tony Lindgren <tony@atomide.com> wrote:
> I think these kernels are missing the configuration for l3-noc
> driver?

Yup. Since I'm pretty sure I have all the necessary info I was hoping
to look into that... somewhere in my copious spare time...

> I tried it on omap4 that has l3-noc configured, and it first produces
> "Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6fd7000",

(Though making a patch to fix that annoyingly wrong and useless
message is higher on my list of priorities)
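
As a reference for reading that message, the number in parentheses can be decoded by hand. The sketch below assumes the value is the raw ARMv7 DFSR in the short-descriptor (non-LPAE) layout; the struct and function names are made up for illustration and are not kernel APIs. Only the two external-abort status codes are named, everything else is left to the fault status table in the ARM ARM.

```c
#include <stdint.h>

/* Fields of the ARMv7 DFSR, short-descriptor format (assumed layout). */
struct dfsr_fields {
        unsigned status;   /* FS[4:0], assembled from bit 10 and bits 3:0 */
        unsigned domain;   /* bits 7:4 */
        unsigned write;    /* WnR, bit 11: 1 = abort caused by a write */
        unsigned ext;      /* ExT, bit 12: implementation-defined qualifier */
};

/* Split a raw fault status value into its fields. */
struct dfsr_fields dfsr_decode(uint32_t fsr)
{
        struct dfsr_fields f;

        f.status = ((fsr >> 6) & 0x10) | (fsr & 0xf);  /* bit 10 becomes FS[4] */
        f.domain = (fsr >> 4) & 0xf;
        f.write  = (fsr >> 11) & 1;
        f.ext    = (fsr >> 12) & 1;
        return f;
}

/* Name only the external-abort encodings relevant to this thread. */
const char *dfsr_status_name(unsigned status)
{
        switch (status) {
        case 0x08: return "synchronous external abort";
        case 0x16: return "asynchronous external abort";
        default:   return "other (see the ARM ARM fault status table)";
        }
}
```

With this, 0x1818 comes out as status 0x08 (synchronous external abort) with the write bit set, and the 0x1028 from the original N900 report as the same status on a read.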

> and the L3 interrupt only after that. So yeah, you're right, we can't
> use the interrupts here. I somehow remembered we'd get only the L3
> interrupt if configured.

The bus error is not influenced by L3 error reporting config afaik,
and it will always win over the irq: even though the irq is almost
certainly asserted first, it can't be taken until the load/store
instruction completes, and then the fault will take precedence.

While implementing L3 error reporting in my forth system I ran into a
tricky scenario though: it turns out that if an irq occurs while the
cpu is waiting for instruction fetch, it does allow the irq to be
taken. The interrupted fetch is abandoned and any bus error it may
have produced is ignored since exception entry/exit is an implicit
instruction sync barrier. On return it is simply refetched...

Hence, the result from attempting to execute code from an invalid address:
  fetching from [invalid]
    irq entry (LR=[invalid])
    L3 error displayed
    irq exit
  fetching from [invalid]
    irq entry (LR=[invalid])
    L3 error displayed
    irq exit
  fetching from [invalid]
    ...
(repeat until watchdog expires)


Anyhow, so we still have the puzzling fact that apparently neither of
us was expecting device memory to use a strongly-ordered mapping, yet
getting a bus error on a write (outside MPUSS itself) shows that it
does.

I've tried to read arch/arm/mm/mmu.c to find out why, but so far I'm
feeling hopelessly lost there... (the multitude of ARM architecture
versions/flavors supported aren't helping.)

^ permalink raw reply	[flat|nested] 87+ messages in thread

* runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
@ 2015-06-01 20:32                                                                   ` Matthijs van Duin
  0 siblings, 0 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-06-01 20:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 June 2015 at 19:58, Tony Lindgren <tony@atomide.com> wrote:
> I think these kernels are missing the configuration for l3-noc
> driver?

Yup. Since I'm pretty sure I have all the necessary info I was hoping
to look into that... somewhere in my copious spare time...

> I tried it on omap4 that has l3-noc configured, and it first produces
> "Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6fd7000",

(Though making a patch to fix that annoyingly wrong and useless
message is higher on my list of priorities)

> and the L3 interrupt only after that. So yeah, you're right, we can't
> use the interrupts here. I somehow remembered we'd get only the L3
> interrupt if configured.

The bus error is not influenced by L3 error reporting config afaik,
and it will always win over the irq: even though the irq is almost
certainly asserted first, it can't be taken until the load/store
instruction completes, and then the fault will take precedence.

While implementing L3 error reporting in my forth system I ran into a
tricky scenario though: it turns out that if an irq occurs while the
cpu is waiting for instruction fetch, it does allow the irq to be
taken. The interrupted fetch is abandoned and any bus error it may
have produced is ignored since exception entry/exit is an implicit
instruction sync barrier. On return it is simply refetched...

Hence, the result from attempting to execute code from an invalid address:
  fetching from [invalid]
    irq entry (LR=[invalid])
    L3 error displayed
    irq exit
  fetching from [invalid]
    irq entry (LR=[invalid])
    L3 error displayed
    irq exit
  fetching from [invalid]
    ...
(repeat until watchdog expires)


Anyhow, so we still have the puzzling fact that apparently neither of
us was expecting device memory to use a strongly-ordered mapping, yet
getting a bus error on a write (outside MPUSS itself) shows that it
does.

I've tried to read arch/arm/mm/mmu.c to find out why, but so far I'm
feeling hopelessly lost there... (the multitude of ARM architecture
versions/flavors supported aren't helping.)

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-06-01 20:32                                                                   ` Matthijs van Duin
@ 2015-06-01 20:52                                                                     ` Tony Lindgren
  -1 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-06-01 20:52 UTC (permalink / raw)
  To: Matthijs van Duin
  Cc: Pali Rohár, linux-arm-kernel, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

* Matthijs van Duin <matthijsvanduin@gmail.com> [150601 13:34]:
> On 1 June 2015 at 19:58, Tony Lindgren <tony@atomide.com> wrote:
> > I think these kernels are missing the configuration for l3-noc
> > driver?
> 
> Yup. Since I'm pretty sure I have all the necessary info I was hoping
> to look into that... somewhere in my copious spare time...
> 
> > I tried it on omap4 that has l3-noc configured, and it first produces
> > "Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6fd7000",
> 
> (Though making a patch to fix that annoyingly wrong and useless
> message is higher on my list of priorities)
> 
> > and the L3 interrupt only after that. So yeah, you're right, we can't
> > use the interrupts here. I somehow remembered we'd get only the L3
> > interrupt if configured.
> 
> The bus error is not influenced by L3 error reporting config afaik,
> and it will always win from the irq: even though the irq is almost
> certainly asserted first, it can't be taken until the load/store
> instruction completes, and then the fault will take precedence.
> 
> While implementing L3 error reporting in my forth system I ran into a
> tricky scenario though: it turns out that if an irq occurs while the
> cpu is waiting for instruction fetch, it does allow the irq to be
> taken. The interrupted fetch is abandoned and any bus error it may
> have produced is ignored since exception entry/exit is an implicit
> instruction sync barrier. On return it is simply refetched...
> 
> Hence, the result from attempting to execute code from an invalid address:
>   fetching from [invalid]
>     irq entry (LR=[invalid])
>     L3 error displayed
>     irq exit
>   fetching from [invalid]
>     irq entry (LR=[invalid])
>     L3 error displayed
>     irq exit
>   fetching from [invalid]
>     ...
> (repeat until watchdog expires)

OK that must be the case I've seen then. Probably that happens
when a device is not clocked. 
 
> Anyhow, so we still have the puzzling fact that apparently neither of
> us was expecting device memory to use a strongly-ordered mapping,
> getting a bus error on a write (outside MPUSS itself) shows that it
> does.

Hmm well it should be just MT_DEVICE for anything Linux ioremaps..
Care to verify that from a device driver that does ioremap on it
first?
 
> I've tried to read arch/arm/mm/mmu.c to find out why, but so far I'm
> feeling hopelessly lost there... (the multitude of ARM architecture
> versions/flavors supported aren't helping.)

Heh yeah too much hardware churn going on :)

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
@ 2015-06-01 20:52                                                                     ` Tony Lindgren
  0 siblings, 0 replies; 87+ messages in thread
From: Tony Lindgren @ 2015-06-01 20:52 UTC (permalink / raw)
  To: linux-arm-kernel

* Matthijs van Duin <matthijsvanduin@gmail.com> [150601 13:34]:
> On 1 June 2015 at 19:58, Tony Lindgren <tony@atomide.com> wrote:
> > I think these kernels are missing the configuration for l3-noc
> > driver?
> 
> Yup. Since I'm pretty sure I have all the necessary info I was hoping
> to look into that... somewhere in my copious spare time...
> 
> > I tried it on omap4 that has l3-noc configured, and it first produces
> > "Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6fd7000",
> 
> (Though making a patch to fix that annoyingly wrong and useless
> message is higher on my list of priorities)
> 
> > and the L3 interrupt only after that. So yeah, you're right, we can't
> > use the interrupts here. I somehow remembered we'd get only the L3
> > interrupt if configured.
> 
> The bus error is not influenced by L3 error reporting config afaik,
> and it will always win from the irq: even though the irq is almost
> certainly asserted first, it can't be taken until the load/store
> instruction completes, and then the fault will take precedence.
> 
> While implementing L3 error reporting in my forth system I ran into a
> tricky scenario though: it turns out that if an irq occurs while the
> cpu is waiting for instruction fetch, it does allow the irq to be
> taken. The interrupted fetch is abandoned and any bus error it may
> have produced is ignored since exception entry/exit is an implicit
> instruction sync barrier. On return it is simply refetched...
> 
> Hence, the result from attempting to execute code from an invalid address:
>   fetching from [invalid]
>     irq entry (LR=[invalid])
>     L3 error displayed
>     irq exit
>   fetching from [invalid]
>     irq entry (LR=[invalid])
>     L3 error displayed
>     irq exit
>   fetching from [invalid]
>     ...
> (repeat until watchdog expires)

OK that must be the case I've seen then. Probably that happens
when a device is not clocked. 
 
> Anyhow, so we still have the puzzling fact that apparently neither of
> us was expecting device memory to use a strongly-ordered mapping,
> getting a bus error on a write (outside MPUSS itself) shows that it
> does.

Hmm well it should be just MT_DEVICE for anything Linux ioremaps..
Care to verify that from a device driver that does ioremap on it
first?
 
> I've tried to read arch/arm/mm/mmu.c to find out why, but so far I'm
> feeling hopelessly lost there... (the multitude of ARM architecture
> versions/flavors supported aren't helping.)

Heh yeah too much hardware churn going on :)

Regards,

Tony

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot)
  2015-06-01 20:52                                                                     ` Tony Lindgren
@ 2015-06-02  4:21                                                                       ` Matthijs van Duin
  -1 siblings, 0 replies; 87+ messages in thread
From: Matthijs van Duin @ 2015-06-02  4:21 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Pali Rohár, linux-arm-kernel, Sebastian Reichel, linux-omap,
	Aaro Koskinen, Pavel Machek, lkml, Nishanth Menon

On 1 June 2015 at 22:52, Tony Lindgren <tony@atomide.com> wrote:
> OK that must be the case I've seen then. Probably that happens
> when a device is not clocked.

It happens for any interconnect error reported as a result of
instruction fetch, but that is itself not a very common occurrence and
obviously doesn't apply to device memory.

Another case where the L3 error irq may be taken first is if the bus
error is asynchronous. I don't think this combo can be produced on
a dm81xx or am335x, but that's mainly due to the strictly in-order
Cortex-A8 making almost every abort synchronous. I'd expect async
aborts to be more common on an A9.

> Hmm well it should be just MT_DEVICE for anything Linux ioremaps..

Yikes, so both /dev/mem and uio are behaving unlike any device driver:
both use remap_pfn_range() after running the vm_page_prot through
pgprot_noncached() to set the memory type to L_PTE_MT_UNCACHED, which
counterintuitively turns out to mean strongly-ordered. o.O  Especially
uio is acting inappropriately here imho.

But this is problematic... these ranges are already mapped by the
kernel, and ARM explicitly forbids mapping the same physical range
twice with different memory attributes (and it's not the only
architecture to do so). Hmmz...

Anyhow, drifting a bit off-topic here. I'm going to do some more reading
and thinking about this.


end of thread, other threads:[~2015-06-02  4:21 UTC | newest]

Thread overview: 87+ messages
2013-12-06 21:36 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot Sebastian Reichel
2013-12-06 22:27 ` Tony Lindgren
2013-12-07  0:00   ` Sebastian Reichel
2013-12-07  0:38     ` Tony Lindgren
2013-12-07  8:18     ` Pali Rohár
2013-12-07 13:48       ` Sebastian Reichel
2013-12-07 13:57         ` Pali Rohár
2013-12-07 16:51           ` Tony Lindgren
2013-12-07 17:53             ` Tony Lindgren
2013-12-07 18:49             ` runtime check for omap-aes bus access permission (was: Re: 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot) Sebastian Reichel
2013-12-07 21:11               ` Tony Lindgren
2013-12-07 23:03                 ` Sebastian Reichel
2013-12-07 23:22                   ` Tony Lindgren
2014-09-08 23:45                     ` Pali Rohár
2014-11-25 21:08                     ` Pali Rohár
2014-11-25 21:31                       ` Pali Rohár
2014-11-26 17:54                         ` Tony Lindgren
2015-01-17  9:18                           ` Pali Rohár
2015-01-17 17:04                             ` Tony Lindgren
2015-01-17 17:29                               ` Pali Rohár
2015-01-17 17:41                                 ` Tony Lindgren
2015-01-31 11:34                                   ` Pali Rohár
2015-01-31 15:13                                     ` Matthijs van Duin
2015-01-31 19:06                                       ` Pali Rohár
2015-02-11 12:39                                       ` Pali Rohár
2015-02-11 15:22                                         ` Matthijs van Duin
2015-02-11 20:28                                           ` Pali Rohár
2015-02-11 20:33                                             ` Tony Lindgren
2015-02-11 20:40                                             ` Nishanth Menon
2015-02-11 20:40                                               ` Nishanth Menon
2015-02-18 21:14                                               ` Pali Rohár
2015-02-18 21:14                                                 ` Pali Rohár
2015-02-18 21:14                                                 ` Pali Rohár
2015-05-28  7:37                                               ` Pali Rohár
2015-05-28  7:37                                                 ` Pali Rohár
2015-05-28  7:37                                                 ` Pali Rohár
2015-05-28 16:01                                                 ` Tony Lindgren
2015-05-28 16:01                                                   ` Tony Lindgren
2015-05-28 16:01                                                   ` Tony Lindgren
2015-05-28 20:26                                                   ` Matthijs van Duin
2015-05-28 20:26                                                     ` Matthijs van Duin
2015-05-28 22:24                                                     ` Tony Lindgren
2015-05-28 22:24                                                       ` Tony Lindgren
2015-05-28 22:27                                                       ` Pali Rohár
2015-05-28 22:27                                                         ` Pali Rohár
2015-05-28 22:27                                                         ` Pali Rohár
2015-05-29  0:15                                                         ` Tony Lindgren
2015-05-29  0:15                                                           ` Tony Lindgren
2015-05-29  0:15                                                           ` Tony Lindgren
2015-05-29  0:58                                                       ` Matthijs van Duin
2015-05-29  0:58                                                         ` Matthijs van Duin
2015-05-29  1:35                                                         ` Matthijs van Duin
2015-05-29  1:35                                                           ` Matthijs van Duin
2015-05-29 15:50                                                           ` Tony Lindgren
2015-05-29 15:50                                                             ` Tony Lindgren
2015-05-29 18:16                                                             ` Tony Lindgren
2015-05-29 18:16                                                               ` Tony Lindgren
2015-05-30 15:22                                                             ` Matthijs van Duin
2015-05-30 15:22                                                               ` Matthijs van Duin
2015-06-01 17:58                                                               ` Tony Lindgren
2015-06-01 17:58                                                                 ` Tony Lindgren
2015-06-01 20:32                                                                 ` Matthijs van Duin
2015-06-01 20:32                                                                   ` Matthijs van Duin
2015-06-01 20:52                                                                   ` Tony Lindgren
2015-06-01 20:52                                                                     ` Tony Lindgren
2015-06-02  4:21                                                                     ` Matthijs van Duin
2015-06-02  4:21                                                                       ` Matthijs van Duin
2015-02-19 18:20                                           ` Pali Rohár
2015-02-19 18:20                                             ` Pali Rohár
2015-02-19 20:25                                             ` Matthijs van Duin
2015-02-19 20:25                                               ` Matthijs van Duin
2015-02-19 21:10                                             ` Aaro Koskinen
2015-02-19 21:10                                               ` Aaro Koskinen
2015-01-24 10:40                     ` Pali Rohár
2015-01-31 14:38                       ` Matthijs van Duin
2015-01-31 19:09                         ` Pali Rohár
2015-02-01  1:36                           ` Matthijs van Duin
2015-02-01  8:56                             ` Pali Rohár
2015-02-11 20:43                               ` Pavel Machek
2015-02-11 21:14                                 ` Pali Rohár
2015-02-09 11:55     ` 3.13-rc3 (commit 7ce93f3) breaks Nokia N900 DT boot Pali Rohár
2013-12-08 14:13   ` Aaro Koskinen
2013-12-08 16:40     ` Tony Lindgren
2013-12-08 17:10       ` Sebastian Reichel
2013-12-08 17:43         ` Tony Lindgren
2013-12-08 17:59           ` Aaro Koskinen
2013-12-08 18:09           ` Sebastian Reichel
