* Regression: memory corruption on Atmel SAMA5D31
@ 2022-03-03 0:29 Peter Rosin
2022-03-03 3:02 ` Saravana Kannan
2022-03-04 8:00 ` Thorsten Leemhuis
0 siblings, 2 replies; 39+ messages in thread
From: Peter Rosin @ 2022-03-03 0:29 UTC
To: linux-kernel, linux-arm-kernel, Nicolas Ferre, Alexandre Belloni,
Ludovic Desroches
Cc: Saravana Kannan, Daniels Umanovskis, Greg Kroah-Hartman
[-- Attachment #1: Type: text/plain, Size: 11087 bytes --]
Hi!
I'm seeing a weird problem, and I'd like some help with further
things to try in order to track down what's going on. I have
bisected the issue to
f9aa460672c9 ("driver core: Refactor fw_devlink feature")
The symptoms are that I get (seemingly) random memory corruption
when processing large amounts of data (compared to system size).
I have two known reproducers, but I'm sure there are more if I
keep digging. One is to do this:
$ dd if=/dev/urandom of=testfile bs=1024 count=40000
40000+0 records in
40000+0 records out
40960000 bytes (41 MB, 39 MiB) copied, 19.7759 s, 2.1 MB/s
$ for i in 1 2 3 4; do cat testfile | sha256sum; done
d8c85f816e08baa5ad27050bf0413e11a09f325fb0a8843b7b2b45b9333ab542 -
f223c1cbb6dbecb02d1741e7991dc98cd8d5b40ffee05bb32dc2c15eb73d6b1f -
d6f3e7f3d325c67e83a6104934dd8a7c891ebfd9a2cf59633dbe97fb2cbb9c81 -
cf8ada47e7e2fee299314440b225ba83fca3cef1f6286adc160a5d4f207caccd -
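The same check can be scripted so that it can be left running unattended. This is a minimal sketch in Python, assuming Python 3 is available on the target; the helper names `sha256_of` and `distinct_digests` are made up for illustration. It reads the file directly instead of through a `cat` pipe, so it may trigger the problem less readily than the loop above.

```python
import hashlib


def sha256_of(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def distinct_digests(path, runs=4):
    """Hash the same file `runs` times and return the set of digests seen.

    On a healthy system the returned set has exactly one element.
    """
    return {sha256_of(path) for _ in range(runs)}
```

Calling `distinct_digests("testfile")` and checking that the result has a single element is equivalent to comparing the four lines printed by the shell loop by eye.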
It is harder to tickle the problem if I redirect the testfile to
sha256sum w/o involving cat or give the file as an argument to
sha256sum. I can also get things to behave better by getting rid
of a bunch of USB interrupts by doing the following:
$ echo 100 > /sys/bus/usb-serial/devices/ttyUSB0/latency_timer
$ echo 100 > /sys/bus/usb-serial/devices/ttyUSB1/latency_timer
$ echo 100 > /sys/bus/usb-serial/devices/ttyUSB2/latency_timer
$ echo 100 > /sys/bus/usb-serial/devices/ttyUSB3/latency_timer
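The four writes above can also be done in one go for every attached port. This is a rough sketch, assuming the sysfs layout shown in the commands above; the function name `set_latency_timers` and its `base` parameter are hypothetical and exist only so the helper can be pointed at a different directory.

```python
import glob
import os


def set_latency_timers(value_ms, base="/sys/bus/usb-serial/devices"):
    """Write value_ms to the latency_timer attribute of every ttyUSB* device.

    Returns the list of attribute paths that were written.
    """
    written = []
    pattern = os.path.join(base, "ttyUSB*", "latency_timer")
    for attr in sorted(glob.glob(pattern)):
        with open(attr, "w") as f:
            f.write("%d\n" % value_ms)
        written.append(attr)
    return written
```

`set_latency_timers(100)` corresponds to the four `echo` commands. Writing to these attributes typically requires root.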
With the lower interrupt pressure I get this:
$ for i in 1 2 3 4; do cat testfile | sha256sum; done
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
Nice. However, I need the latency to be lower than the default
16ms. 3ms could perhaps work in theory, but 1ms is preferable, so
the above 100ms is far off. The initial hash run was with latency
set to 1ms, which makes it easy to trigger the issue. The latency
timer setting is for this driver: drivers/usb/serial/ftdi_sio.c
And also, that does not help with the other reproducer, namely
to copy that same random testfile with scp to a working system...
$ scp testfile peda@xyzzy:testfile1
testfile 100% 39MB 2.0MB/s 00:19
$ scp testfile peda@xyzzy:testfile2
testfile 100% 39MB 2.1MB/s 00:18
$ scp testfile peda@xyzzy:testfile3
testfile 100% 39MB 2.1MB/s 00:18
$ scp testfile peda@xyzzy:testfile4
testfile 100% 39MB 2.1MB/s 00:19
...and then perform the sha256sum on that xyzzy host instead:
$ sha256sum testfile?
39dc3a7d05483ae7a2c64c5ed2e8e6108287bf4ddf124a2f0c1a9d0221f9ac66 testfile1
9597ef542e7cce879872a027d9ec591feb5fc766aeaec47d58eff6e8c6ab3206 testfile2
c6104a700b1d6f13eb1de84b5a91a1846a3e1576e052d51a664d2e2711a3869d testfile3
60b9c240cb331bad530c3c1d766f50d53a24e01831bfc04e48f329b738521310 testfile4
$ sha256sum testfile?
39dc3a7d05483ae7a2c64c5ed2e8e6108287bf4ddf124a2f0c1a9d0221f9ac66 testfile1
9597ef542e7cce879872a027d9ec591feb5fc766aeaec47d58eff6e8c6ab3206 testfile2
c6104a700b1d6f13eb1de84b5a91a1846a3e1576e052d51a664d2e2711a3869d testfile3
60b9c240cb331bad530c3c1d766f50d53a24e01831bfc04e48f329b738521310 testfile4
Same output every time. Of course. xyzzy is a working system...
Converting these files to hex (hexdump -C) and diffing yields this:
$ diff -u0 testfile1.hex testfile2.hex
--- testfile1.hex 2022-03-02 23:56:38.273149516 +0100
+++ testfile2.hex 2022-03-03 00:00:57.912747033 +0100
@@ -8658,2 +8658,2 @@
-00021d10 08 2a dd c6 c8 0f 0d e2 4c 1e 46 21 f9 89 a2 54 |.*......L.F!...T|
-00021d20 23 8c 4f f1 46 f1 61 05 ee f2 d2 ee 56 79 4f 28 |#.O.F.a.....VyO(|
+00021d10 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|
+00021d20 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|
@@ -20592,2 +20592,2 @@
-000506f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|
-00050700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|
+000506f0 ad e6 d5 65 e6 dc c1 a3 e2 ba c9 e2 61 39 5f 5f |...e........a9__|
+00050700 bf eb 8e 5c 08 f1 f2 89 3c 57 c5 07 b9 f4 91 fc |...\....<W......|
@@ -461019,2 +461019,2 @@
-00708da0 0d 49 c3 e8 57 06 20 5a c1 27 74 29 f8 83 af 69 |.I..W. Z.'t)...i|
-00708db0 94 4d 5b 71 9f 3e e5 d2 91 cc cb cd aa ff 44 8b |.M[q.>........D.|
+00708da0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|
+00708db0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|
@@ -902788,2 +902788,2 @@
-00dc6830 f2 41 23 1b ec 54 d5 fe f0 33 51 f7 d2 fc bf bd |.A#..T...3Q.....|
-00dc6840 e5 1f 58 df 24 2f e3 dc 65 87 b2 27 12 86 d1 9a |..X.$/..e..'....|
+00dc6830 44 82 94 b5 c9 26 08 42 bd 89 e1 96 41 66 8a b5 |D....&.B....Af..|
+00dc6840 a5 34 46 5e fd 1b c1 73 86 33 24 fd 4d e1 e1 68 |.4F^...s.3$.M..h|
@@ -931900,2 +931900,2 @@
-00e383b0 ee 64 c5 6f 38 44 5b 31 41 e1 2c 64 49 d5 f8 ad |.d.o8D[1A.,dI...|
-00e383c0 fb 85 52 4f 00 1f 80 7a f3 de ee 8e db ac d5 bb |..RO...z........|
+00e383b0 4b 4d 29 a1 0a 99 8f f7 32 71 8c de 23 ca a0 f1 |KM).....2q..#...|
+00e383c0 e2 af e3 c4 a0 95 d3 1c ed 58 c4 c5 30 da 56 b9 |.........X..0.V.|
@@ -1170109,2 +1170109,2 @@
-011dabc0 6a 7c 0c 3c 86 1a b6 48 50 d7 98 68 0c 01 e3 1c |j|.<...HP..h....|
-011dabd0 a3 a8 b0 f2 62 21 86 b9 d1 52 9d 74 9e 26 42 51 |....b!...R.t.&BQ|
+011dabc0 5b 1a 9e 23 ae 58 42 68 83 58 df d6 c1 57 6b b0 |[..#.XBh.X...Wk.|
+011dabd0 ec d5 50 8b 76 5e 96 b4 49 21 f7 e4 b7 8f a3 45 |..P.v^..I!.....E|
@@ -1880164,2 +1880164,2 @@
-01cb0630 1c 74 74 16 75 b4 de f7 ce 4b 5e 4d 97 d6 36 d4 |.tt.u....K^M..6.|
-01cb0640 44 d9 fd 69 c5 d0 f0 a6 c6 44 26 53 7f 91 f3 62 |D..i.....D&S...b|
+01cb0630 73 bc 40 ce f8 9d 99 91 1b 14 8b a8 52 2a 7b 39 |s.@.........R*{9|
+01cb0640 6b ff f5 c5 02 b9 ab c2 c2 08 5e e7 3a 5e 69 c4 |k.........^.:^i.|
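The hexdump-and-diff step can be replaced by a direct binary comparison that lists the differing ranges. This is a minimal sketch meant for the host where the copies were collected, since it reads both files fully into memory; the function name `differing_ranges` is made up for illustration, and the 16-byte unit is chosen only to mirror the rows of `hexdump -C`.

```python
def differing_ranges(path_a, path_b, row=16):
    """Return merged (start, end) byte ranges where two equal-sized files differ.

    The files are compared in `row`-byte units; `end` is exclusive.
    Adjacent differing rows are merged into one range.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    if len(a) != len(b):
        raise ValueError("files differ in size")
    ranges = []
    for off in range(0, len(a), row):
        if a[off:off + row] != b[off:off + row]:
            end = min(off + row, len(a))
            if ranges and ranges[-1][1] == off:
                ranges[-1] = (ranges[-1][0], end)  # extend the previous range
            else:
                ranges.append((off, end))
    return ranges
```

Each returned pair can be printed with `"%08x..%08x" % (start, end - 1)` to get offsets in the same form as the diff output above.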
Grepping (some of the above) for duplicates yields this:
$ egrep "0 (08 2a dd|23 8c 4f|7b c8 d2|2a f8 fb)" testfile1.hex
00020d40 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|
00020d50 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|
00021d10 08 2a dd c6 c8 0f 0d e2 4c 1e 46 21 f9 89 a2 54 |.*......L.F!...T|
00021d20 23 8c 4f f1 46 f1 61 05 ee f2 d2 ee 56 79 4f 28 |#.O.F.a.....VyO(|
$ egrep "0 (08 2a dd|23 8c 4f|7b c8 d2|2a f8 fb)" testfile2.hex
00020d40 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|
00020d50 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|
00021d10 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|*
00021d20 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|*
$ egrep "0 (1f 6c ca|8b 6d 86|ad e6 d5|bf eb 8e)" testfile1.hex
0004f6f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|
0004f700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|
000506f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|*
00050700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|*
$ egrep "0 (1f 6c ca|8b 6d 86|ad e6 d5|bf eb 8e)" testfile2.hex
0004f6f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|
0004f700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|
000506f0 ad e6 d5 65 e6 dc c1 a3 e2 ba c9 e2 61 39 5f 5f |...e........a9__|
00050700 bf eb 8e 5c 08 f1 f2 89 3c 57 c5 07 b9 f4 91 fc |...\....<W......|
$ egrep "0 (0d 49 c3|94 4d 5b|d3 b4 96|5f f4 10 92)" testfile1.hex
00707dd0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|
00707de0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|
00708da0 0d 49 c3 e8 57 06 20 5a c1 27 74 29 f8 83 af 69 |.I..W. Z.'t)...i|
00708db0 94 4d 5b 71 9f 3e e5 d2 91 cc cb cd aa ff 44 8b |.M[q.>........D.|
$ egrep "0 (0d 49 c3|94 4d 5b|d3 b4 96|5f f4 10 92)" testfile2.hex
00707dd0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|
00707de0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|
00708da0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|*
00708db0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|*
I.e. testfile1 is (probably) corrupted at 000506f0..70f while
testfile2 is (probably) corrupted at 00021d10..2f and 00708da0..bf
(corrupted lines marked with hand-made asterisks above)
If I keep grepping like this, the pattern is similar both within
these files and within testfile3 and testfile4. I.e. with
corruptions in 32-byte blocks at (seemingly) random positions
in the files. The corruption is always 16-byte-aligned and the bad
data seems to be a copy from exactly one page up in the file.
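The grep step can be automated by searching a reference copy for the bytes found in a corrupted block and reporting how far away they occur. This is a rough sketch under stated assumptions: `data` is the content of a copy believed to be good at the source location, `bad` is the block read from the corrupted copy, and the name `find_source_distances` and the default two-page search window are hypothetical. The distance is measured rather than assumed to be one page.

```python
def find_source_distances(data, bad, bad_offset, window=2 * 4096):
    """Find where the byte string `bad` occurs in `data` near bad_offset.

    Returns the signed distances (match offset minus bad_offset) of all
    matches within `window` bytes of bad_offset, excluding distance 0.
    A negative distance means the bytes come from earlier in the file.
    """
    lo = max(0, bad_offset - window)
    hi = min(len(data), bad_offset + window + len(bad))
    region = data[lo:hi]
    distances = []
    pos = region.find(bad)
    while pos != -1:
        dist = lo + pos - bad_offset
        if dist != 0:
            distances.append(dist)
        pos = region.find(bad, pos + 1)
    return distances
```

A result of `[-4096]` for a block would mean that its contents were found exactly one 4 KiB page earlier in the file.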
As stated above, I have bisected the issue to patch
f9aa460672c9 ("driver core: Refactor fw_devlink feature")
which was added between v5.10-rc3 and v5.10-rc4. Every kernel I have
tried with that patch applied has exhibited the issue, and I have
had no trouble like this with any kernel without that patch. Apart
from a whole bunch of kernels prior to v5.10-rc3, that includes some
later kernels with the patch reverted (along with the dependent
followup 2d09e6eb4a6f). The latest I have tried is 5.11.22. Those
two patches do not revert cleanly in 5.12 (and thereafter) so I
have not tried anything beyond 5.11 with the patch reverted.
I fail to understand how that patch might cause this issue. I have
compared boot messages before and after the patch and there is no
(significant) difference. Everything seems to happen in the same
order with the same result. But that comparison is of course limited
to what is logged.
In some random attempt I tried to disable the D-Cache bit, and that
makes it all very slow but it also (seemingly) fixes the issue. But
that may of course be due to vastly different timings.
Some background:
We have a "Linea" CPU module, with a design based on the Atmel (now
Microchip) SAMA5D31 evaluation board. This CPU module is used on e.g.
our TSE-850 for which there is a device tree in
arch/arm/boot/dts/at91-tse850-3.dts
It has a nand flash for the rootfs and 64 MB RAM. The 40 MB random
testfile is thus big enough to cause page cache churn.
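As a back-of-the-envelope check of that claim, the file size can be compared with the memory the kernel reports as available. The numbers are taken from the `dd` output above and from the attached dmesg; the 4 KiB page size is an assumption, although it is consistent with the dmesg reporting 16384 total pages for 64 MiB.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

file_bytes = 40960000   # size reported by dd
total_pages = 16384     # "On node 0 totalpages: 16384" in the dmesg
available_kib = 54036   # "Memory: 54036K/65536K available" in the dmesg

file_pages = file_bytes // PAGE_SIZE
available_pages = available_kib * 1024 // PAGE_SIZE
fraction = file_pages / available_pages

print("file: %d pages, available: %d of %d pages, ratio: %.2f"
      % (file_pages, available_pages, total_pages, fraction))
```

With the file taking up roughly three quarters of the available pages before anything else is counted, it is plausible that parts of it are evicted and re-read between runs.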
We have used this module in thousands of delivered units (however,
not that many TSE-850) and have never observed anything like this
before. But that has been with older kernels. 4.13.<something> and
4.15.<something> were what we were on until this recent activity.
We're now developing a new product (preliminary device tree included)
and the trusty old CPU module was used again and a fresh new kernel
was built for it. I then started to notice this issue and have tried
to include as much relevant data as possible. If you need more data
or would like me to test something, please ask.
I'm stumped.
Cheers,
Peter
[-- Attachment #2: .config --]
[-- Type: application/xml, Size: 106948 bytes --]
[-- Attachment #3: dmesg --]
[-- Type: text/plain, Size: 17899 bytes --]
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 5.10.0-rc3+ (peda@orc) (arm-buildroot-linux-gnueabihf-gcc.br_real (Buildroot 2021.08) 10.3.0, GNU ld (GNU Binutils) 2.36.1) #65 Wed Mar 2 16:33:37 CET 2022
[ 0.000000] CPU: ARMv7 Processor [410fc051] revision 1 (ARMv7), cr=10c53c7d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[ 0.000000] OF: fdt: Machine model: Axentia ME20 1.0
[ 0.000000] Memory policy: Data cache writeback
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000020000000-0x0000000023ffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000020000000-0x0000000023ffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000020000000-0x0000000023ffffff]
[ 0.000000] On node 0 totalpages: 16384
[ 0.000000] Normal zone: 128 pages used for memmap
[ 0.000000] Normal zone: 0 pages reserved
[ 0.000000] Normal zone: 16384 pages, LIFO batch:3
[ 0.000000] CPU: All CPU(s) started in SVC mode.
[ 0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[ 0.000000] pcpu-alloc: [0] 0
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 16256
[ 0.000000] Kernel command line: console=ttyS0,115200 rw oops=panic panic=30 ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
[ 0.000000] Dentry cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 4096 (order: 2, 16384 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 54036K/65536K available (7168K kernel code, 336K rwdata, 1280K rodata, 1024K init, 105K bss, 11500K reserved, 0K cma-reserved)
[ 0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[ 0.000000] random: get_random_bytes called from start_kernel+0x288/0x3b8 with crng_init=0
[ 0.000000] clocksource: timer@f0010000: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 115833966437 ns
[ 0.000028] sched_clock: 32 bits at 16MHz, resolution 60ns, wraps every 130150523873ns
[ 0.000066] Switching to timer-based delay loop, resolution 60ns
[ 0.064143] clocksource: pit: mask: 0xfffffff max_cycles: 0xfffffff, max_idle_ns: 14479245754 ns
[ 0.064900] Console: colour dummy device 80x30
[ 0.064981] Calibrating delay loop (skipped), value calculated using timer frequency.. 33.00 BogoMIPS (lpj=165000)
[ 0.065032] pid_max: default: 32768 minimum: 301
[ 0.065342] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.065395] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.066572] CPU: Testing write buffer coherency: ok
[ 0.067881] Setting up static identity map for 0x20100000 - 0x20100060
[ 0.069481] devtmpfs: initialized
[ 0.087396] VFP support v0.3: implementor 41 architecture 2 part 30 variant 5 rev 1
[ 0.088032] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.088098] futex hash table entries: 256 (order: -1, 3072 bytes, linear)
[ 0.088292] pinctrl core: initialized pinctrl subsystem
[ 0.090363] NET: Registered protocol family 16
[ 0.091498] DMA: preallocated 256 KiB pool for atomic coherent allocations
[ 0.131639] AT91: PM: standby: standby, suspend: ulp0
[ 0.131689] No ATAGs?
[ 0.133101] gpio-at91 fffff200.gpio: at address (ptrval)
[ 0.134559] gpio-at91 fffff400.gpio: at address (ptrval)
[ 0.136118] gpio-at91 fffff600.gpio: at address (ptrval)
[ 0.137794] gpio-at91 fffff800.gpio: at address (ptrval)
[ 0.139548] gpio-at91 fffffa00.gpio: at address (ptrval)
[ 0.141556] pinctrl-at91 ahb:apb:pinctrl@fffff200: initialized AT91 pinctrl driver
[ 0.161208] at_hdmac ffffe600.dma-controller: Atmel AHB DMA Controller ( cpy set slave ), 8 channels
[ 0.163591] at_hdmac ffffe800.dma-controller: Atmel AHB DMA Controller ( cpy set slave ), 8 channels
[ 0.166563] AT91: Detected SoC family: sama5d3
[ 0.166604] AT91: Detected SoC: sama5d31, revision 2
[ 0.169066] SCSI subsystem initialized
[ 0.169697] usbcore: registered new interface driver usbfs
[ 0.169858] usbcore: registered new interface driver hub
[ 0.170116] usbcore: registered new device driver usb
[ 0.171359] at91_i2c f0014000.i2c: using dma0chan0 (tx) and dma0chan1 (rx) for DMA transfers
[ 0.171919] i2c i2c-0: Not using recovery: no recover_bus() found
[ 0.172383] at91_i2c f0014000.i2c: AT91 i2c bus driver (hw version: 0x402).
[ 0.173759] at91_i2c f801c000.i2c: using dma1chan0 (tx) and dma1chan1 (rx) for DMA transfers
[ 0.174316] i2c i2c-2: Not using recovery: no recover_bus() found
[ 0.176517] pca953x 2-0038: supply vcc not found, using dummy regulator
[ 0.177002] pca953x 2-0038: using no AI
[ 0.179527] pca953x 2-0039: supply vcc not found, using dummy regulator
[ 0.179968] pca953x 2-0039: using no AI
[ 0.183828] at91_i2c f801c000.i2c: AT91 i2c bus driver (hw version: 0x402).
[ 0.185716] Advanced Linux Sound Architecture Driver Initialized.
[ 0.187582] clocksource: Switched to clocksource timer@f0010000
[ 0.214089] NET: Registered protocol family 2
[ 0.215420] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[ 0.215495] TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.215550] TCP bind hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.215598] TCP: Hash tables configured (established 1024 bind 1024)
[ 0.215866] UDP hash table entries: 256 (order: 0, 4096 bytes, linear)
[ 0.215937] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes, linear)
[ 0.216304] NET: Registered protocol family 1
[ 0.218398] workingset: timestamp_bits=30 max_order=14 bucket_order=0
[ 0.219442] io scheduler mq-deadline registered
[ 0.219491] io scheduler kyber registered
[ 0.232691] brd: module loaded
[ 0.253479] loop: module loaded
[ 0.254262] at24 0-0051: supply vcc not found, using dummy regulator
[ 0.255672] at24 0-0051: 8192 byte 24c64 EEPROM, writable, 32 bytes/write
[ 0.256292] at24 2-0050: supply vcc not found, using dummy regulator
[ 0.257759] at24 2-0050: 8192 byte 24c64 EEPROM, writable, 32 bytes/write
[ 0.258668] atmel_usart_serial: Failed to locate of_node [id: -2]
[ 0.259328] atmel_usart_serial.0.auto: ttyS1 at MMIO 0xf001c000 (irq = 20, base_baud = 4125000) is a ATMEL_SERIAL
[ 0.261597] atmel_usart_serial: Failed to locate of_node [id: -2]
[ 0.262198] atmel_usart_serial.1.auto: ttyS2 at MMIO 0xf0020000 (irq = 21, base_baud = 4125000) is a ATMEL_SERIAL
[ 0.264248] atmel_usart_serial: Failed to locate of_node [id: -2]
[ 0.264825] atmel_usart_serial.2.auto: ttyS0 at MMIO 0xffffee00 (irq = 30, base_baud = 8250000) is a ATMEL_SERIAL
[ 0.888475] printk: console [ttyS0] enabled
[ 0.897801] libphy: Fixed MDIO Bus: probed
[ 0.904936] macb f802c000.ethernet: invalid hw address, using random
[ 0.912773] libphy: MACB_mii_bus: probed
[ 0.951592] macb f802c000.ethernet eth0: Cadence MACB rev 0x0001010c at 0xf802c000 irq 38 (da:cc:3e:24:ad:a2)
[ 0.962277] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 0.968841] ehci-atmel: EHCI Atmel driver
[ 0.977104] atmel-ehci 700000.ehci: EHCI Host Controller
[ 0.982537] atmel-ehci 700000.ehci: new USB bus registered, assigned bus number 1
[ 0.990358] atmel-ehci 700000.ehci: irq 40, io mem 0x00700000
[ 1.020063] atmel-ehci 700000.ehci: USB 2.0 started, EHCI 1.00
[ 1.026421] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.10
[ 1.034749] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 1.041987] usb usb1: Product: EHCI Host Controller
[ 1.046843] usb usb1: Manufacturer: Linux 5.10.0-rc3+ ehci_hcd
[ 1.052716] usb usb1: SerialNumber: 700000.ehci
[ 1.058752] hub 1-0:1.0: USB hub found
[ 1.062735] hub 1-0:1.0: 3 ports detected
[ 1.068424] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 1.074708] ohci-atmel: OHCI Atmel driver
[ 1.080595] at91_ohci 600000.ohci: USB Host Controller
[ 1.085802] at91_ohci 600000.ohci: new USB bus registered, assigned bus number 2
[ 1.093467] at91_ohci 600000.ohci: irq 40, io mem 0x00600000
[ 1.164436] usb usb2: New USB device found, idVendor=1d6b, idProduct=0001, bcdDevice= 5.10
[ 1.172776] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 1.179987] usb usb2: Product: USB Host Controller
[ 1.184818] usb usb2: Manufacturer: Linux 5.10.0-rc3+ ohci_hcd
[ 1.190659] usb usb2: SerialNumber: at91
[ 1.196357] hub 2-0:1.0: USB hub found
[ 1.200420] hub 2-0:1.0: 2 ports detected
[ 1.206887] usbcore: registered new interface driver uas
[ 1.212449] usbcore: registered new interface driver usb-storage
[ 1.218529] usbcore: registered new interface driver ums-alauda
[ 1.224563] usbcore: registered new interface driver ums-cypress
[ 1.230688] usbcore: registered new interface driver ums-datafab
[ 1.236764] usbcore: registered new interface driver ums_eneub6250
[ 1.243051] usbcore: registered new interface driver ums-freecom
[ 1.249126] usbcore: registered new interface driver ums-isd200
[ 1.255157] usbcore: registered new interface driver ums-jumpshot
[ 1.261358] usbcore: registered new interface driver ums-karma
[ 1.267262] usbcore: registered new interface driver ums-onetouch
[ 1.273470] usbcore: registered new interface driver ums-realtek
[ 1.279568] usbcore: registered new interface driver ums-sddr09
[ 1.285600] usbcore: registered new interface driver ums-sddr55
[ 1.291634] usbcore: registered new interface driver ums-usbat
[ 1.297718] usbcore: registered new interface driver ftdi_sio
[ 1.303598] usbserial: USB Serial support registered for FTDI USB Serial Device
[ 1.311990] atmel_usba_udc 500000.gadget: MMIO registers at [mem 0xf8030000-0xf8033fff] mapped at (ptrval)
[ 1.321837] atmel_usba_udc 500000.gadget: FIFO at [mem 0x00500000-0x005fffff] mapped at (ptrval)
[ 1.334189] g_serial gadget: Gadget Serial v2.4
[ 1.338709] g_serial gadget: g_serial ready
[ 1.345530] at91_rtc fffffeb0.rtc: registered as rtc0
[ 1.350683] at91_rtc fffffeb0.rtc: setting system clock to 2007-01-01T02:26:23 UTC (1167618383)
[ 1.359411] at91_rtc fffffeb0.rtc: AT91 Real Time Clock driver.
[ 1.365635] i2c /dev entries driver
[ 1.371400] at91-reset fffffe00.rstc: Starting after user reset
[ 1.378397] lm75 2-004a: supply vs not found, using dummy regulator
[ 1.386116] lm75 2-004a: hwmon0: sensor 'at30ts74'
[ 1.392489] at91sam9_wdt: enabled (heartbeat=15 sec, nowayout=0)
[ 1.399385] atmel_aes f8038000.aes: version: 0x135
[ 1.406967] atmel_aes f8038000.aes: Atmel AES - Using dma1chan2, dma1chan3 for DMA transfers
[ 1.416412] atmel_sha f8034000.sha: version: 0x410
[ 1.421501] atmel_sha f8034000.sha: using dma1chan4 for DMA transfers
[ 1.429768] atmel_sha f8034000.sha: Atmel SHA1/SHA256/SHA224/SHA384/SHA512
[ 1.437640] atmel_tdes f803c000.tdes: version: 0x701
[ 1.443072] atmel_tdes f803c000.tdes: using dma1chan5, dma1chan6 for DMA transfers
[ 1.453314] atmel_tdes f803c000.tdes: Atmel DES/TDES
[ 1.459147] usbcore: registered new interface driver usbhid
[ 1.464762] usbhid: USB HID core driver
[ 1.473121] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xac
[ 1.479468] nand: Micron MT29F4G08ABBDAHC
[ 1.483557] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[ 1.492011] usb 1-2: new high-speed USB device number 2 using atmel-ehci
[ 1.499339] Bad block table found at page 262080, version 0x01
[ 1.505707] Bad block table found at page 262016, version 0x01
[ 1.511975] 8 cmdlinepart partitions found on MTD device atmel_nand
[ 1.518226] Creating 8 MTD partitions on "atmel_nand":
[ 1.523440] 0x000000000000-0x000000040000 : "at91bootstrap"
[ 1.532455] 0x000000040000-0x0000000a0000 : "barebox"
[ 1.541055] 0x0000000c0000-0x000000100000 : "bareboxenv"
[ 1.549748] 0x000000100000-0x000000140000 : "bareboxenv2"
[ 1.558604] 0x000000180000-0x0000001a0000 : "oftree"
[ 1.567271] 0x000000200000-0x000000700000 : "kernel"
[ 1.576183] 0x000000800000-0x000010000000 : "rootfs"
[ 1.586451] 0x000010000000-0x000020000000 : "ovlfs"
[ 1.607590] xt_time: kernel timezone is -0000
[ 1.612412] gre: GRE over IPv4 demultiplexor driver
[ 1.617662] Initializing XFRM netlink socket
[ 1.622226] NET: Registered protocol family 10
[ 1.628589] Segment Routing with IPv6
[ 1.632861] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[ 1.640410] NET: Registered protocol family 17
[ 1.660691] ubi0: attaching mtd6
[ 1.664091] random: fast init done
[ 1.691575] usb 1-2: New USB device found, idVendor=0403, idProduct=6011, bcdDevice= 8.00
[ 1.699766] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 1.706991] usb 1-2: Product: Quad RS232-HS
[ 1.711201] usb 1-2: Manufacturer: FTDI
[ 1.717343] ftdi_sio 1-2:1.0: FTDI USB Serial Device converter detected
[ 1.724413] usb 1-2: Detected FT4232H
[ 1.729268] usb 1-2: FTDI USB Serial Device converter now attached to ttyUSB0
[ 1.737880] ftdi_sio 1-2:1.1: FTDI USB Serial Device converter detected
[ 1.744956] usb 1-2: Detected FT4232H
[ 1.749712] usb 1-2: FTDI USB Serial Device converter now attached to ttyUSB1
[ 1.758271] ftdi_sio 1-2:1.2: FTDI USB Serial Device converter detected
[ 1.765348] usb 1-2: Detected FT4232H
[ 1.770134] usb 1-2: FTDI USB Serial Device converter now attached to ttyUSB2
[ 1.778663] ftdi_sio 1-2:1.3: FTDI USB Serial Device converter detected
[ 1.785720] usb 1-2: Detected FT4232H
[ 1.790602] usb 1-2: FTDI USB Serial Device converter now attached to ttyUSB3
[ 2.863734] ubi0: scanning is finished
[ 2.878739] ubi0: attached mtd6 (name "rootfs", size 248 MiB)
[ 2.884544] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[ 2.891432] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[ 2.898209] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[ 2.905245] ubi0: good PEBs: 1984, bad PEBs: 0, corrupted PEBs: 0
[ 2.911348] ubi0: user volume: 1, internal volumes: 1, max. volumes count: 128
[ 2.918566] ubi0: max/mean erase counter: 2/0, WL threshold: 4096, image sequence number: 1631681466
[ 2.927754] ubi0: available PEBs: 0, total reserved PEBs: 1984, PEBs reserved for bad PEB handling: 80
[ 2.937532] ubi0: background thread "ubi_bgt0d" started, PID 63
[ 2.948405] input: keys as /devices/platform/keys/input/input0
[ 2.956048] ALSA device list:
[ 2.958985] No soundcards found.
[ 2.964228] UBIFS (ubi0:0): Mounting in unauthenticated mode
[ 2.970305] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 64
[ 3.052921] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0, name "rootfs"
[ 3.060411] UBIFS (ubi0:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[ 3.070371] UBIFS (ubi0:0): FS size: 239857664 bytes (228 MiB, 1889 LEBs), journal size 9023488 bytes (8 MiB, 72 LEBs)
[ 3.081093] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
[ 3.086922] UBIFS (ubi0:0): media format: w4/r0 (latest is w5/r0), UUID 67B154A3-BF3E-4742-918A-569AB897CAFD, small LPT model
[ 3.099888] VFS: Mounted root (ubifs filesystem) on device 0:13.
[ 3.107321] devtmpfs: mounted
[ 3.113200] Freeing unused kernel memory: 1024K
[ 3.117921] Run /sbin/init as init process
[ 3.122033] with arguments:
[ 3.124956] /sbin/init
[ 3.127659] noinitrd
[ 3.130214] with environment:
[ 3.133333] HOME=/
[ 3.135688] TERM=linux
[ 3.161134] random: crng init done
[ 3.818451] ubi1: attaching mtd7
[ 4.953973] ubi1: scanning is finished
[ 4.969163] ubi1: attached mtd7 (name "ovlfs", size 256 MiB)
[ 4.974881] ubi1: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[ 4.981769] ubi1: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[ 4.988549] ubi1: VID header offset: 2048 (aligned 2048), data offset: 4096
[ 4.995554] ubi1: good PEBs: 2044, bad PEBs: 4, corrupted PEBs: 0
[ 5.001662] ubi1: user volume: 1, internal volumes: 1, max. volumes count: 128
[ 5.008886] ubi1: max/mean erase counter: 3/1, WL threshold: 4096, image sequence number: 1283539683
[ 5.018071] ubi1: available PEBs: 0, total reserved PEBs: 2044, PEBs reserved for bad PEB handling: 76
[ 5.027815] ubi1: background thread "ubi_bgt1d" started, PID 79
UBI device number 1, total 2044 LEBs (259538944 bytes, 247.5 MiB), available 0 LEBs (0 bytes), LEB size 126976 bytes (124.0 KiB)
ubimkvol: error!: UBI device does not have free logical eraseblocks
[ 5.143406] UBIFS (ubi1:0): Mounting in unauthenticated mode
[ 5.149436] UBIFS (ubi1:0): background thread "ubifs_bgt1_0" started, PID 85
[ 5.184097] UBIFS (ubi1:0): recovery needed
[ 5.289691] UBIFS (ubi1:0): recovery completed
[ 5.294361] UBIFS (ubi1:0): UBIFS: mounted UBI device 1, volume 0, name "ovl"
[ 5.301564] UBIFS (ubi1:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[ 5.311518] UBIFS (ubi1:0): FS size: 247730176 bytes (236 MiB, 1951 LEBs), journal size 12443648 bytes (11 MiB, 98 LEBs)
[ 5.322421] UBIFS (ubi1:0): reserved for root: 4952683 bytes (4836 KiB)
[ 5.329036] UBIFS (ubi1:0): media format: w5/r0 (latest is w5/r0), UUID AF203193-3C1C-433E-95D2-2F69D45F490A, small LPT model
[ 6.502071] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" stops
[-- Attachment #4: at91-me20.dts --]
[-- Type: text/plain, Size: 6633 bytes --]
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* at91-me20.dts - Device Tree file for the Axentia ME20 1.0 board
*
* Copyright (C) 2022 Axentia Technologies AB
*
* Author: Peter Rosin <peda@axentia.se>
*/
/dts-v1/;
#include "at91-linea.dtsi"
#include <dt-bindings/input/input.h>
#include <dt-bindings/pwm/pwm.h>
/ {
model = "Axentia ME20 1.0";
compatible = "axentia,me20", "axentia,linea",
"atmel,sama5d31", "atmel,sama5d3", "atmel,sama5";
phyxtal: oscillator {
compatible = "fixed-clock";
#clock-cells = <0>;
clock-frequency = <25000000>;
clock-output-names = "phyxtal";
};
phyclk: clock {
compatible = "fixed-factor-clock";
clocks = <&phyxtal>;
#clock-cells = <0>;
clock-div = <1>;
clock-mult = <2>;
clock-output-names = "phyclk";
};
reg_5v: att-regulator {
compatible = "regulator-fixed";
regulator-name = "5v-supply";
regulator-min-microvolt = <5000000>;
regulator-max-microvolt = <5000000>;
};
reg_30v: tune-regulator {
compatible = "regulator-fixed";
regulator-name = "30v-supply";
regulator-min-microvolt = <30000000>;
regulator-max-microvolt = <30000000>;
};
keys {
compatible = "gpio-keys";
pinctrl-0 = <&pinctrl_keys>;
pinctrl-names = "default";
back {
label = "BACK";
gpios = <&pioA 5 GPIO_ACTIVE_LOW>;
linux,code = <KEY_BACK>;
};
sel {
label = "SEL";
gpios = <&pioA 6 GPIO_ACTIVE_LOW>;
linux,code = <KEY_SELECT>;
};
up {
label = "UP";
gpios = <&pioA 7 GPIO_ACTIVE_LOW>;
linux,code = <KEY_UP>;
};
down {
label = "DOWN";
gpios = <&pioA 8 GPIO_ACTIVE_LOW>;
linux,code = <KEY_DOWN>;
};
};
};
&pinctrl {
me20 {
pinctrl_keys: keys {
atmel,pins = <AT91_PIOA 5 AT91_PERIPH_GPIO // BACK
AT91_PINCTRL_DEGLITCH
AT91_PIOA 6 AT91_PERIPH_GPIO // SEL
AT91_PINCTRL_DEGLITCH
AT91_PIOA 7 AT91_PERIPH_GPIO // UP
AT91_PINCTRL_DEGLITCH
AT91_PIOA 8 AT91_PERIPH_GPIO // DOWN
AT91_PINCTRL_DEGLITCH>;
};
pinctrl_usba_vbus: usba-vbus {
atmel,pins = <AT91_PIOD 30 AT91_PERIPH_GPIO // UC-ID
AT91_PINCTRL_DEGLITCH>;
};
};
};
&nand {
partitions {
compatible = "fixed-partitions";
#address-cells = <1>;
#size-cells = <1>;
at91bootstrap@0 {
label = "at91bootstrap";
reg = <0x0 0x40000>;
};
barebox@40000 {
label = "bootloader";
reg = <0x40000 0x60000>;
};
bareboxenv@c0000 {
label = "bareboxenv";
reg = <0xc0000 0x40000>;
};
bareboxenv2@100000 {
label = "bareboxenv2";
reg = <0x100000 0x40000>;
};
oftree@180000 {
label = "oftree";
reg = <0x180000 0x20000>;
};
kernel@200000 {
label = "kernel";
reg = <0x200000 0x500000>;
};
rootfs@800000 {
label = "rootfs";
reg = <0x800000 0x0f800000>;
};
ovlfs@10000000 {
label = "ovlfs";
reg = <0x10000000 0x10000000>;
};
};
};
&i2c2 {
status = "okay";
tune-b@c { /* ti,dac121c081 */
/* Clobbered by address 0x46. */
/* Broadcast address 0x48. */
compatible = "ti,dac7571";
reg = <0xc>;
vref-supply = <&reg_30v>;
label = "tune-b";
};
tune-a@d {
/* Clobbered by address 0x46. */
/* Broadcast address 0x48. */
compatible = "ti,dac7571";
reg = <0xd>;
vref-supply = <&reg_30v>;
label = "tune-a";
};
att@e {
/* Clobbered by address 0x47. */
/* Broadcast address 0x48. */
compatible = "ti,dac7571";
reg = <0xe>;
vref-supply = <&reg_5v>;
label = "att";
};
gpio-a@38 {
compatible = "nxp,pca9554";
reg = <0x38>;
gpio-controller;
#gpio-cells = <2>;
gpio-line-names =
"AF1", "AF2", "AF3", "AF4",
"HP", "PASS", "HP2", "/AMPA";
};
gpio-b@39 {
compatible = "nxp,pca9554";
reg = <0x39>;
gpio-controller;
#gpio-cells = <2>;
gpio-line-names =
"BF1", "BF2", "BF3", "BF4",
"", "", "", "/AMPB";
};
oled@3c {
compatible = "solomon,ssd1311";
reg = <0x3c>;
reset-gpios = <&pioD 29 GPIO_ACTIVE_LOW>;
solomon,opr-gpios = <&pioD 26 GPIO_ACTIVE_HIGH
&pioD 28 GPIO_ACTIVE_HIGH>;
};
temp@4a {
compatible = "atmel,at30ts74", "dallas,ds7505";
reg = <0x4a>;
};
eeprom@50 {
compatible = "st,24c64", "atmel,24c64";
reg = <0x50>;
pagesize = <32>;
wp-gpios = <&pioA 3 GPIO_ACTIVE_HIGH>;
};
tusb320@60 {
compatible = "ti,tusb320";
reg = <0x60>;
interrupt-parent = <&pioA>;
interrupts = <9 IRQ_TYPE_LEVEL_LOW>;
};
};
&watchdog {
status = "okay";
};
/* GNSS, ttyS1 */
&usart0 {
status = "okay";
atmel,use-dma-rx;
};
/* RS232 port, ttyS2 */
&usart1 {
status = "okay";
pinctrl-0 = <&pinctrl_usart1 &pinctrl_usart1_rts_cts>;
atmel,use-dma-rx;
};
&macb1 {
status = "okay";
phy-mode = "rmii";
#address-cells = <1>;
#size-cells = <0>;
phy0: ethernet-phy@1 {
reg = <1>;
clocks = <&phyclk>;
reset-gpios = <&pioC 17 GPIO_ACTIVE_LOW>;
reset-assert-us = <100>;
smsc,disable-energy-detect;
};
};
&usb0 { /* gadget */
status = "okay";
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_usba_vbus>;
atmel,vbus-gpio = <&pioD 30 GPIO_ACTIVE_HIGH>;
};
&usb1 { /* ohci */
status = "okay";
num-ports = <2>;
};
&usb2 { /* ehci */
status = "okay";
};
/* ttyS0 */
&dbgu {
status = "okay";
dmas = <0>, <0>; /* Do not use DMA for dbgu */
};
&pioA {
gpio-line-names =
/* 0 */ "232-PS", "232-MODE", "UB-RST", "E2-WP",
/* 4 */ "GPSO/O", "B-BACK", "B-SEL", "B-UP",
/* 8 */ "B-DOWN", "/UC-INT", "", "",
/* 12 */ "", "", "", "",
/* 16 */ "", "", "SDA", "SCL",
/* 20 */ "", "ALE", "CLE", "",
/* 24 */ "", "", "", "",
/* 28 */ "", "", "LINSDA", "LINSCL";
};
&pioB {
gpio-line-names =
/* 0 */ "", "", "", "",
/* 4 */ "", "", "", "",
/* 8 */ "", "", "", "",
/* 12 */ "", "", "", "",
/* 16 */ "", "", "", "",
/* 20 */ "", "", "", "",
/* 24 */ "", "", "CTS-232", "RTS-232",
/* 28 */ "RX-232", "TX-232", "LINBRX", "LINBTX";
};
&pioC {
gpio-line-names =
/* 0 */ "ETX0", "ETX1", "ERX0", "ERX1",
/* 4 */ "ETXEN", "ECRSDV", "ERXER", "EREFCK",
/* 8 */ "EMDC", "EMDIO", "", "",
/* 12 */ "", "", "", "",
/* 16 */ "CGB-PDB", "/ETH-RST", "CGA-PDB", "1PPS",
/* 20 */ "", "", "", "",
/* 24 */ "", "", "", "",
/* 28 */ "", "", "", "";
};
&pioD {
gpio-line-names =
/* 0 */ "", "", "", "",
/* 4 */ "", "", "", "RFB-SIG>",
/* 8 */ "", "RFB-SIG<", "", "RFA-SIG>",
/* 12 */ "", "RFA-SIG<", "", "",
/* 16 */ "", "GNSS-TX", "GNSS-RX", "",
/* 20 */ "VREFEN", "", "", "",
/* 24 */ "", "", "D-OPR0", "",
/* 28 */ "D-OPR1", "/D-RES", "UC-ID", "";
};
&pioE {
gpio-line-names =
/* 0 */ "", "", "", "",
/* 4 */ "", "", "", "",
/* 8 */ "", "", "", "",
/* 12 */ "", "", "", "",
/* 16 */ "", "", "", "",
/* 20 */ "", "", "", "",
/* 24 */ "", "", "", "",
/* 28 */ "", "", "", "";
};
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-03 0:29 Regression: memory corruption on Atmel SAMA5D31 Peter Rosin
@ 2022-03-03 3:02 ` Saravana Kannan
2022-03-03 9:17 ` Peter Rosin
2022-03-04 8:00 ` Thorsten Leemhuis
1 sibling, 1 reply; 39+ messages in thread
From: Saravana Kannan @ 2022-03-03 3:02 UTC (permalink / raw)
To: Peter Rosin
Cc: linux-kernel, linux-arm-kernel, Nicolas Ferre, Alexandre Belloni,
Ludovic Desroches, Daniels Umanovskis, Greg Kroah-Hartman
On Wed, Mar 2, 2022 at 4:29 PM Peter Rosin <peda@axentia.se> wrote:
>
> Hi!
>
> I'm seeing a weird problem, and I'd like some help with further
> things to try in order to track down what's going on. I have
> bisected the issue to
>
> f9aa460672c9 ("driver core: Refactor fw_devlink feature")

I skimmed through your email and I'll read it more closely tomorrow,
but it wasn't clear if you see this on Linus's tip of the tree too.
Asking because of:
https://lore.kernel.org/lkml/20210930085714.2057460-1-yangyingliang@huawei.com/

Also, a couple of other data points that _might_ help. Try kernel
command line option fw_devlink=permissive vs fw_devlink=on (I forget
if this was the default by 5.10) vs fw_devlink=off.

I'm expecting "off" to fix the issue for you. But if permissive vs on
shows a difference driver issues would start becoming a real
possibility.

-Saravana

>
> The symptoms are that I get (seemingly) random memory corruption
> when processing large amounts of data (compared to system size).
> I have two known reproducers, but I'm sure there are more if I
> keep digging. One is to do this:
>
> $ dd if=/dev/urandom of=testfile bs=1024 count=40000
> 40000+0 records in
> 40000+0 records out
> 40960000 bytes (41 MB, 39 MiB) copied, 19.7759 s, 2.1 MB/s
> $ for i in 1 2 3 4; do cat testfile | sha256sum; done
> d8c85f816e08baa5ad27050bf0413e11a09f325fb0a8843b7b2b45b9333ab542 -
> f223c1cbb6dbecb02d1741e7991dc98cd8d5b40ffee05bb32dc2c15eb73d6b1f -
> d6f3e7f3d325c67e83a6104934dd8a7c891ebfd9a2cf59633dbe97fb2cbb9c81 -
> cf8ada47e7e2fee299314440b225ba83fca3cef1f6286adc160a5d4f207caccd -
>
> It is harder to tickle the problem if I redirect the testfile to
> sha256sum w/o involving cat or give the file as an argument to
> sha256sum. I can also get things to behave better by getting rid
> of a bunch of USB interrupts by doing the following:
>
> $ echo 100 > /sys/bus/usb-serial/devices/ttyUSB0/latency_timer
> $ echo 100 > /sys/bus/usb-serial/devices/ttyUSB1/latency_timer
> $ echo 100 > /sys/bus/usb-serial/devices/ttyUSB2/latency_timer
> $ echo 100 > /sys/bus/usb-serial/devices/ttyUSB3/latency_timer
>
> With the lower interrupt pressure I get this:
>
> $ for i in 1 2 3 4; do cat testfile | sha256sum; done
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>
> Nice. However, I need the latency to be lower than the default
> 16ms, 3ms could perhaps work in theory, but preferably 1ms, so
> the above 100ms is far off. The initial hash run was with latency
> set to 1ms, which makes it easy to trigger the issue. The latency
> timer setting is for this driver: drivers/usb/serial/ftdi_sio.c
>
> And also, that does not help with the other reproducer, namely
> to copy that same random testfile with scp to a working system...
>
> $ scp testfile peda@xyzzy:testfile1
> testfile 100% 39MB 2.0MB/s 00:19
> $ scp testfile peda@xyzzy:testfile2
> testfile 100% 39MB 2.1MB/s 00:18
> $ scp testfile peda@xyzzy:testfile3
> testfile 100% 39MB 2.1MB/s 00:18
> $ scp testfile peda@xyzzy:testfile4
> testfile 100% 39MB 2.1MB/s 00:19
>
> ...and then perform the sha256sum on that xyzzy host instead:
>
> $ sha256sum testfile?
> 39dc3a7d05483ae7a2c64c5ed2e8e6108287bf4ddf124a2f0c1a9d0221f9ac66 testfile1
> 9597ef542e7cce879872a027d9ec591feb5fc766aeaec47d58eff6e8c6ab3206 testfile2
> c6104a700b1d6f13eb1de84b5a91a1846a3e1576e052d51a664d2e2711a3869d testfile3
> 60b9c240cb331bad530c3c1d766f50d53a24e01831bfc04e48f329b738521310 testfile4
> $ sha256sum testfile?
> 39dc3a7d05483ae7a2c64c5ed2e8e6108287bf4ddf124a2f0c1a9d0221f9ac66 testfile1
> 9597ef542e7cce879872a027d9ec591feb5fc766aeaec47d58eff6e8c6ab3206 testfile2
> c6104a700b1d6f13eb1de84b5a91a1846a3e1576e052d51a664d2e2711a3869d testfile3
> 60b9c240cb331bad530c3c1d766f50d53a24e01831bfc04e48f329b738521310 testfile4
>
> Same output every time. Of course. xyzzy is a working system...
> Converting these files to hex (hexdump -C) and diffing yields this:
>
> $ diff -u0 testfile1.hex testfile2.hex
> --- testfile1.hex 2022-03-02 23:56:38.273149516 +0100
> +++ testfile2.hex 2022-03-03 00:00:57.912747033 +0100
> @@ -8658,2 +8658,2 @@
> -00021d10 08 2a dd c6 c8 0f 0d e2 4c 1e 46 21 f9 89 a2 54 |.*......L.F!...T|
> -00021d20 23 8c 4f f1 46 f1 61 05 ee f2 d2 ee 56 79 4f 28 |#.O.F.a.....VyO(|
> +00021d10 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|
> +00021d20 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|
> @@ -20592,2 +20592,2 @@
> -000506f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|
> -00050700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|
> +000506f0 ad e6 d5 65 e6 dc c1 a3 e2 ba c9 e2 61 39 5f 5f |...e........a9__|
> +00050700 bf eb 8e 5c 08 f1 f2 89 3c 57 c5 07 b9 f4 91 fc |...\....<W......|
> @@ -461019,2 +461019,2 @@
> -00708da0 0d 49 c3 e8 57 06 20 5a c1 27 74 29 f8 83 af 69 |.I..W. Z.'t)...i|
> -00708db0 94 4d 5b 71 9f 3e e5 d2 91 cc cb cd aa ff 44 8b |.M[q.>........D.|
> +00708da0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|
> +00708db0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|
> @@ -902788,2 +902788,2 @@
> -00dc6830 f2 41 23 1b ec 54 d5 fe f0 33 51 f7 d2 fc bf bd |.A#..T...3Q.....|
> -00dc6840 e5 1f 58 df 24 2f e3 dc 65 87 b2 27 12 86 d1 9a |..X.$/..e..'....|
> +00dc6830 44 82 94 b5 c9 26 08 42 bd 89 e1 96 41 66 8a b5 |D....&.B....Af..|
> +00dc6840 a5 34 46 5e fd 1b c1 73 86 33 24 fd 4d e1 e1 68 |.4F^...s.3$.M..h|
> @@ -931900,2 +931900,2 @@
> -00e383b0 ee 64 c5 6f 38 44 5b 31 41 e1 2c 64 49 d5 f8 ad |.d.o8D[1A.,dI...|
> -00e383c0 fb 85 52 4f 00 1f 80 7a f3 de ee 8e db ac d5 bb |..RO...z........|
> +00e383b0 4b 4d 29 a1 0a 99 8f f7 32 71 8c de 23 ca a0 f1 |KM).....2q..#...|
> +00e383c0 e2 af e3 c4 a0 95 d3 1c ed 58 c4 c5 30 da 56 b9 |.........X..0.V.|
> @@ -1170109,2 +1170109,2 @@
> -011dabc0 6a 7c 0c 3c 86 1a b6 48 50 d7 98 68 0c 01 e3 1c |j|.<...HP..h....|
> -011dabd0 a3 a8 b0 f2 62 21 86 b9 d1 52 9d 74 9e 26 42 51 |....b!...R.t.&BQ|
> +011dabc0 5b 1a 9e 23 ae 58 42 68 83 58 df d6 c1 57 6b b0 |[..#.XBh.X...Wk.|
> +011dabd0 ec d5 50 8b 76 5e 96 b4 49 21 f7 e4 b7 8f a3 45 |..P.v^..I!.....E|
> @@ -1880164,2 +1880164,2 @@
> -01cb0630 1c 74 74 16 75 b4 de f7 ce 4b 5e 4d 97 d6 36 d4 |.tt.u....K^M..6.|
> -01cb0640 44 d9 fd 69 c5 d0 f0 a6 c6 44 26 53 7f 91 f3 62 |D..i.....D&S...b|
> +01cb0630 73 bc 40 ce f8 9d 99 91 1b 14 8b a8 52 2a 7b 39 |s.@.........R*{9|
> +01cb0640 6b ff f5 c5 02 b9 ab c2 c2 08 5e e7 3a 5e 69 c4 |k.........^.:^i.|
>
> Grepping (some of the above) for duplicates yields this:
>
> $ egrep "0 (08 2a dd|23 8c 4f|7b c8 d2|2a f8 fb)" testfile1.hex
> 00020d40 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|
> 00020d50 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|
> 00021d10 08 2a dd c6 c8 0f 0d e2 4c 1e 46 21 f9 89 a2 54 |.*......L.F!...T|
> 00021d20 23 8c 4f f1 46 f1 61 05 ee f2 d2 ee 56 79 4f 28 |#.O.F.a.....VyO(|
> $ egrep "0 (08 2a dd|23 8c 4f|7b c8 d2|2a f8 fb)" testfile2.hex
> 00020d40 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|
> 00020d50 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|
> 00021d10 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|*
> 00021d20 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|*
>
> $ egrep "0 (1f 6c ca|8b 6d 86|ad e6 d5|bf eb 8e)" testfile1.hex
> 0004f6f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|
> 0004f700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|
> 000506f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|*
> 00050700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|*
> $ egrep "0 (1f 6c ca|8b 6d 86|ad e6 d5|bf eb 8e)" testfile2.hex
> 0004f6f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|
> 0004f700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|
> 000506f0 ad e6 d5 65 e6 dc c1 a3 e2 ba c9 e2 61 39 5f 5f |...e........a9__|
> 00050700 bf eb 8e 5c 08 f1 f2 89 3c 57 c5 07 b9 f4 91 fc |...\....<W......|
>
> $ egrep "0 (0d 49 c3|94 4d 5b|d3 b4 96|5f f4 10 92)" testfile1.hex
> 00707dd0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|
> 00707de0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|
> 00708da0 0d 49 c3 e8 57 06 20 5a c1 27 74 29 f8 83 af 69 |.I..W. Z.'t)...i|
> 00708db0 94 4d 5b 71 9f 3e e5 d2 91 cc cb cd aa ff 44 8b |.M[q.>........D.|
> $ egrep "0 (0d 49 c3|94 4d 5b|d3 b4 96|5f f4 10 92)" testfile2.hex
> 00707dd0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|
> 00707de0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|
> 00708da0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|*
> 00708db0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|*
>
> I.e. testfile1 is (probably) corrupted at 000506f0..70f while
> testfile2 is (probably) corrupted at 00021d10..2f and 00708da0..bf
> (corrupted lines marked with hand-made asterisks above)
>
> If I keep grepping like this, the pattern is similar both within
> these files and within testfile3 and testfile4. I.e. with
> corruptions in 32-byte blocks at (seemingly) random positions
> in the files. The corruption is always 16-byte-aligned and the bad
> data seems to be a copy from exactly one page up in the file.
>
> As stated above, I have bisected the issue to patch
>
> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>
> which was added between v5.10-rc3 and v5.10-rc4. Every kernel I have
> tried with that patch applied has exhibited the issue, and I have
> had no trouble like this with any kernel without that patch. Apart
> from a whole bunch of kernels prior to v5.10-rc3, that includes some
> later kernels with the patch reverted (along with the dependent
> followup 2d09e6eb4a6f). The latest I have tried is 5.11.22. Those
> two patches do not revert cleanly in 5.12 (and thereafter) so I
> have not tried anything beyond 5.11 with the patch reverted.
>
> I fail to understand how that patch might cause this issue. I have
> compared boot messages before and after the patch and there is no
> (significant) difference. Everything seems to happen in the same
> order with the same result. But that comparison is of course limited
> to what is logged.
>
> In some random attempt I tried to disable the D-Cache bit, and that
> makes it all very slow but it also (seemingly) fixes the issue. But
> that may of course be due to vastly different timings.
>
> Some background:
>
> We have a "Linea" CPU module, with a design based on the Atmel (now
> Microchip) SAMA5D31 evaluation board. This CPU module is used on e.g.
> our TSE-850 for which there is a device tree in
> arch/arm/boot/dts/at91-tse850-3.dts
> It has a nand flash for the rootfs and 64 MB RAM. The 40 MB random
> testfile is thus big enough to cause page cache churn.
>
> We have used this module in thousands of delivered units (however,
> not that many TSE-850) and have never observed anything like this
> before. But that has been with older kernels. 4.13.<something> and
> 4.15.<something> was what we were on until this recent activity.
>
> We're now developing a new product (preliminary device tree included)
> and the trusty old CPU module was used again and a fresh new kernel
> was built for it. I then started to notice this issue and have tried
> to include as much relevant data as possible. If you need more data
> or would like me to test something, please ask.
>
> I'm stumped.
>
> Cheers,
> Peter
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-03 3:02 ` Saravana Kannan
@ 2022-03-03 9:17 ` Peter Rosin
2022-03-04 3:55 ` Saravana Kannan
0 siblings, 1 reply; 39+ messages in thread
From: Peter Rosin @ 2022-03-03 9:17 UTC (permalink / raw)
To: Saravana Kannan
Cc: linux-kernel, linux-arm-kernel, Nicolas Ferre, Alexandre Belloni,
Ludovic Desroches, Daniels Umanovskis, Greg Kroah-Hartman
On 2022-03-03 04:02, Saravana Kannan wrote:
> On Wed, Mar 2, 2022 at 4:29 PM Peter Rosin <peda@axentia.se> wrote:
>>
>> Hi!
>>
>> I'm seeing a weird problem, and I'd like some help with further
>> things to try in order to track down what's going on. I have
>> bisected the issue to
>>
>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>
> I skimmed through your email and I'll read it more closely tomorrow,
> but it wasn't clear if you see this on Linus's tip of the tree too.
> Asking because of:
> https://lore.kernel.org/lkml/20210930085714.2057460-1-yangyingliang@huawei.com/
>
> Also, a couple of other data points that _might_ help. Try kernel
> command line option fw_devlink=permissive vs fw_devlink=on (I forget
> if this was the default by 5.10) vs fw_devlink=off.
>
> I'm expecting "off" to fix the issue for you. But if permissive vs on
> shows a difference driver issues would start becoming a real
> possibility.
>
> -Saravana

Thanks for the quick reply! I don't think I tested the very tip of
Linus tree before, only latest rc or something like that, but now I
have. I.e.

5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")

It would have been typical if an issue that existed for a couple of
years had been fixed the last few weeks, but alas, no.

On that kernel, and with whatever the default fw_devlink value is, the
issue is there. It's a bit hard to tell if the incident probability
is the same when trying fw_devlink arguments, but roughly so, and I
do not have to wait for long to get a bad hash with the first
reproducer

while :; do cat testfile | sha256sum; done

The output is typical:
78464c59faa203413aceb5f75de85bbf4cde64f21b2d0449a2d72cd2aadac2a3 -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
e03c5524ac6d16622b6c43f917aae730bc0793643f461253c4646b860c1a7215 -
1b8db6218f481cb8e4316c26118918359e764cc2c29393fd9ef4f2730274bb00 -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
7d60bf848911d3b919d26941be33c928c666e9e5666f392d905af2d62d400570 -
212e1fe02c24134857ffb098f1834a2d87c655e0e5b9e08d4929f49a070be97c -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
7e33e751eb99a0f63b4f7d64b0a24f3306ffaf7c4bc4b27b82e5886c8ea31bc3 -
d7a1f08aa9d0374d46d828fc3582f5927e076ff229b38c28089007cd0599c645 -
4fc963b7c7b14df9d669500f7c062bf378ff2751f705bb91eecd20d2f896f6fe -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
9360d886046c12d983b8bc73dd22302c57b0aafe58215700604fa977b4715fbe -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -

Setting fw_devlink=off makes no difference, AFAICT.

So, just to double-check I went back to 5.11.22 with the two
mentioned patches reverted [1], plus an added backport of

c73960bb0a43 ("gpiolib: allow line names from device props to override driver names")

in order to make userspace behave as similarly as possible.
I left that running for an hour or so with 350-ish hashes
calculated correctly. Which is no proof that there is no latent
issue of course, but at the very least a great deal more stable
than later kernels.

Cheers,
Peter

[1]
f9aa460672c9 ("driver core: Refactor fw_devlink feature")
2d09e6eb4a6f ("driver core: Delete pointless parameter in fwnode_operations.add_links")
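
The reproducer loop above only prints hashes, so counting good and bad runs over an hour is manual work. A minimal sketch of a variant that tallies runs against a known-good reference hash could look like the following. The function name count_hashes and its three arguments are made up for illustration and were not used in this thread; the reference hash is assumed to come from a system that hashes the file correctly.

```shell
#!/bin/sh
# count_hashes FILE REF RUNS  (hypothetical helper, not from the thread)
# Pipes FILE through cat into sha256sum RUNS times, like the reproducer,
# and prints how many runs matched the known-good hash REF.
count_hashes() {
    file=$1
    ref=$2
    runs=$3
    good=0
    bad=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Keep the cat pipe on purpose: the report says the problem is
        # harder to trigger when sha256sum reads the file directly.
        sum=$(cat "$file" | sha256sum | cut -d' ' -f1)
        if [ "$sum" = "$ref" ]; then
            good=$((good + 1))
        else
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "good=$good bad=$bad"
}
```

With the hash that dominates the output above, a run would be started as count_hashes testfile 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d 350. A single summary line makes it easier to compare kernels, although a clean run is still no proof that the issue is absent.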
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-03 9:17 ` Peter Rosin
@ 2022-03-04 3:55 ` Saravana Kannan
2022-03-04 6:57 ` Peter Rosin
0 siblings, 1 reply; 39+ messages in thread
From: Saravana Kannan @ 2022-03-04 3:55 UTC (permalink / raw)
To: Peter Rosin
Cc: linux-kernel, linux-arm-kernel, Nicolas Ferre, Alexandre Belloni,
Ludovic Desroches, Daniels Umanovskis, Greg Kroah-Hartman
On Thu, Mar 3, 2022 at 1:17 AM Peter Rosin <peda@axentia.se> wrote:
>
> On 2022-03-03 04:02, Saravana Kannan wrote:
> > On Wed, Mar 2, 2022 at 4:29 PM Peter Rosin <peda@axentia.se> wrote:
> >>
> >> Hi!
> >>
> >> I'm seeing a weird problem, and I'd like some help with further
> >> things to try in order to track down what's going on. I have
> >> bisected the issue to
> >>
> >> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
> >
> > I skimmed through your email and I'll read it more closely tomorrow,
> > but it wasn't clear if you see this on Linus's tip of the tree too.
> > Asking because of:
> > https://lore.kernel.org/lkml/20210930085714.2057460-1-yangyingliang@huawei.com/
> >
> > Also, a couple of other data points that _might_ help. Try kernel
> > command line option fw_devlink=permissive vs fw_devlink=on (I forget
> > if this was the default by 5.10) vs fw_devlink=off.
> >
> > I'm expecting "off" to fix the issue for you. But if permissive vs on
> > shows a difference driver issues would start becoming a real
> > possibility.
> >
> > -Saravana
>
> Thanks for the quick reply! I don't think I tested the very tip of
> Linus tree before, only latest rc or something like that, but now I
> have. I.e.
>
> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
>
> It would have been typical if an issue that existed for a couple of
> years had been fixed the last few weeks, but alas, no.
>
> On that kernel, and with whatever the default fw_devlink value is, the

It's fw_devlink=on by default from at least 5.12-rc4 or so.

> issue is there. It's a bit hard to tell if the incident probability
> is the same when trying fw_devlink arguments, but roughly so, and I
> do not have to wait for long to get a bad hash with the first
> reproducer
>
> while :; do cat testfile | sha256sum; done
>
> The output is typical:
> 78464c59faa203413aceb5f75de85bbf4cde64f21b2d0449a2d72cd2aadac2a3 -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> e03c5524ac6d16622b6c43f917aae730bc0793643f461253c4646b860c1a7215 -
> 1b8db6218f481cb8e4316c26118918359e764cc2c29393fd9ef4f2730274bb00 -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 7d60bf848911d3b919d26941be33c928c666e9e5666f392d905af2d62d400570 -
> 212e1fe02c24134857ffb098f1834a2d87c655e0e5b9e08d4929f49a070be97c -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 7e33e751eb99a0f63b4f7d64b0a24f3306ffaf7c4bc4b27b82e5886c8ea31bc3 -
> d7a1f08aa9d0374d46d828fc3582f5927e076ff229b38c28089007cd0599c645 -
> 4fc963b7c7b14df9d669500f7c062bf378ff2751f705bb91eecd20d2f896f6fe -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 9360d886046c12d983b8bc73dd22302c57b0aafe58215700604fa977b4715fbe -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>
> Setting fw_devlink=off makes no difference, AFAICT.

By this, I'm assuming you set fw_devlink=off in the kernel command
line and you still saw the corruption.

If that's the case, I can't see how this could possibly have anything
to do with:
f9aa460672c9 ("driver core: Refactor fw_devlink feature")

If you look at fw_devlink_link_device(), you'll see that the function
is a NOP if fw_devlink=off (the !fw_devlink_flags check). And from
there, the rest of the code in the series doesn't run because more
fields wouldn't get set, etc. That pretty much disables ALL the code
in the entire series. The only remaining diff would be header file
changes where I add/remove fields. But that's unlikely to cause any
issues here because I'm either deleting fields that aren't used or
adding fields that won't be used (with fw_devlink=off). I think the
patch was just causing enough timing changes that it's masking the
real issue.

IIRC (it's been more than a year), the series [1] that brings in this
patch has a few reverts. Those reverts undo subtle device probe
ordering changes brought in by a bunch of earlier patches. You could
go back to before those patches were added and see if you still see
this corruption and then start bisecting from there. Basically try
going to a point before:
42926ac3cd50 ("driver core: Move code to the right part of the file")

TL;DR: since you are reproducing this with fw_devlink=off, I'm
pretty sure the problem is not actually because of my changes or any
changes related to fw_devlink.

-Saravana

[1] - https://lore.kernel.org/all/20201121020232.908850-1-saravanak@google.com/

>
> So, just to double-check I went back to 5.11.22 with the two
> mentioned patches reverted [1], plus an added backport of
>
> c73960bb0a43 ("gpiolib: allow line names from device props to override driver names")
>
> in order to make userspace behave as similarly as possible.
> I left that running for an hour or so with 350-ish hashes
> calculated correctly. Which is no proof that there is no latent
> issue of course, but at the very least a great deal more stable
> than later kernels.
>
> Cheers,
> Peter
>
> [1]
> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
> 2d09e6eb4a6f ("driver core: Delete pointless parameter in fwnode_operations.add_links")
>
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-04 3:55 ` Saravana Kannan
@ 2022-03-04 6:57 ` Peter Rosin
2022-03-04 10:57 ` Peter Rosin
0 siblings, 1 reply; 39+ messages in thread
From: Peter Rosin @ 2022-03-04 6:57 UTC (permalink / raw)
To: Saravana Kannan
Cc: linux-kernel, linux-arm-kernel, Nicolas Ferre, Alexandre Belloni,
Ludovic Desroches, Daniels Umanovskis, Greg Kroah-Hartman
On 2022-03-04 04:55, Saravana Kannan wrote:
> On Thu, Mar 3, 2022 at 1:17 AM Peter Rosin <peda@axentia.se> wrote:
>>
>> On 2022-03-03 04:02, Saravana Kannan wrote:
>>> On Wed, Mar 2, 2022 at 4:29 PM Peter Rosin <peda@axentia.se> wrote:
>>>>
>>>> Hi!
>>>>
>>>> I'm seeing a weird problem, and I'd like some help with further
>>>> things to try in order to track down what's going on. I have
>>>> bisected the issue to
>>>>
>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>
>>> I skimmed through your email and I'll read it more closely tomorrow,
>>> but it wasn't clear if you see this on Linus's tip of the tree too.
>>> Asking because of:
>>> https://lore.kernel.org/lkml/20210930085714.2057460-1-yangyingliang@huawei.com/
>>>
>>> Also, a couple of other data points that _might_ help. Try kernel
>>> command line option fw_devlink=permissive vs fw_devlink=on (I forget
>>> if this was the default by 5.10) vs fw_devlink=off.
>>>
>>> I'm expecting "off" to fix the issue for you. But if permissive vs on
>>> shows a difference driver issues would start becoming a real
>>> possibility.
>>>
>>> -Saravana
>>
>> Thanks for the quick reply! I don't think I tested the very tip of
>> Linus tree before, only latest rc or something like that, but now I
>> have. I.e.
>>
>> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
>>
>> It would have been typical if an issue that existed for a couple of
>> years had been fixed the last few weeks, but alas, no.
>>
>> On that kernel, and with whatever the default fw_devlink value is, the
>
> It's fw_devlink=on by default from at least 5.12-rc4 or so.
>
>> issue is there. It's a bit hard to tell if the incident probability
>> is the same when trying fw_devlink arguments, but roughly so, and I
>> do not have to wait for long to get a bad hash with the first
>> reproducer
>>
>> while :; do cat testfile | sha256sum; done
>>
>> The output is typical:
>> 78464c59faa203413aceb5f75de85bbf4cde64f21b2d0449a2d72cd2aadac2a3 -
>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>> e03c5524ac6d16622b6c43f917aae730bc0793643f461253c4646b860c1a7215 -
>> 1b8db6218f481cb8e4316c26118918359e764cc2c29393fd9ef4f2730274bb00 -
>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>> 7d60bf848911d3b919d26941be33c928c666e9e5666f392d905af2d62d400570 -
>> 212e1fe02c24134857ffb098f1834a2d87c655e0e5b9e08d4929f49a070be97c -
>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>> 7e33e751eb99a0f63b4f7d64b0a24f3306ffaf7c4bc4b27b82e5886c8ea31bc3 -
>> d7a1f08aa9d0374d46d828fc3582f5927e076ff229b38c28089007cd0599c645 -
>> 4fc963b7c7b14df9d669500f7c062bf378ff2751f705bb91eecd20d2f896f6fe -
>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>> 9360d886046c12d983b8bc73dd22302c57b0aafe58215700604fa977b4715fbe -
>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>
>> Setting fw_devlink=off makes no difference, AFAICT.
>
> By this, I'm assuming you set fw_devlink=off in the kernel command
> line and you still saw the corruption.

Yes. On a bad kernel it's the same with all of the following kernel
command lines.

console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=on ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=off ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=permissive ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
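
Since the three command lines differ only in the fw_devlink argument, it can be worth confirming on the running system which value was actually passed. The following is a rough sketch only: the helper name fw_devlink_mode is hypothetical, and it assumes that the last fw_devlink= occurrence on the command line is the one that takes effect.

```shell
#!/bin/sh
# fw_devlink_mode CMDLINE  (hypothetical helper, not from the thread)
# Prints the value of the last fw_devlink= argument found in CMDLINE,
# or "unset" if the argument is not present at all.
# The body runs in a subshell so that "set -f" (no globbing while the
# command line is split into words) does not leak into the caller.
fw_devlink_mode() (
    set -f
    mode=unset
    for arg in $1; do
        case $arg in
            fw_devlink=*) mode=${arg#fw_devlink=} ;;
        esac
    done
    printf '%s\n' "$mode"
)
```

On the target this would be called as fw_devlink_mode "$(cat /proc/cmdline)". It only reports what was passed on the command line, not what the driver core ended up doing with it.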
> If that's the case, I can't see how this could possibly have anything
> to do with:
> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>
> If you look at fw_devlink_link_device(), you'll see that the function
> is NOP if fw_devlink=off (the !fw_devlink_flags check). And from
> there, the rest of the code in the series doesn't run because more
> fields wouldn't get set, etc. That pretty much disables ALL the code
> in the entire series. The only remaining diff would be header file
> changes where I add/remove fields. But that's unlikely to cause any
> issues here because I'm either deleting fields that aren't used or
> adding fields that won't be used (with fw_devlink=off). I think the
> patch was just causing enough timing changes that it's masking the
> real issue.

When I compare fw_devlink_link_device() from before and after

f9aa460672c9 ("driver core: Refactor fw_devlink feature")

I notice that you also removed an unconditional call to
device_link_add_missing_supplier_links() that was live before,
regardless of any fw_devlink parameter.

I don't know if that's relevant. Is it?

Not knowing this code at all, and without any serious attempt
at reading it, from here the comment of that removed function
sure looks like it might cause a different ordering before and
after the patch that is not restored with any fw_devlink
argument.

> IIRC (it's been more than a year), the series [1] that brings in this
> patch has a few reverts. Those reverts undo subtle device probe
> ordering changes brought in by a bunch of earlier patches. You could
> go back to before those patches were added and see if you still see
> this corruption and then start bisecting from there. Basically try
> going to a point before:
> 42926ac3cd50 ("driver core: Move code to the right part of the file")

That patch was added after 5.7-rc5, so just to make sure, I have now
also tested 5.6. As expected, it looks like a good kernel from here.
It's been running while I have written this mail and has consistently
produced good hashes.

I arrived at the bad patch by first noticing that 5.15.6 was bad and
that 4.14 was good. I then did a manual preliminary bisect-like
thing and concluded that 5.1 was good, 5.8 was good, 5.11 was bad,
and that 5.10 was good (I think that was the order anyway, not that
it matters all that much). I then did a "proper" bisect between 5.10
and 5.11.

$ git bisect log
git bisect start
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [f40ddce88593482919761f74910f42f4b84c004b] Linux 5.11
git bisect bad f40ddce88593482919761f74910f42f4b84c004b
# bad: [538fcf57aaee6ad78a05f52b69a99baa22b33418] Merge branches 'acpi-scan', 'acpi-pnp' and 'acpi-sleep'
git bisect bad 538fcf57aaee6ad78a05f52b69a99baa22b33418
# good: [15b447361794271f4d03c04d82276a841fe06328] mm/lru: revise the comments of lru_lock
git bisect good 15b447361794271f4d03c04d82276a841fe06328
# good: [d635a69dd4981cc51f90293f5f64268620ed1565] Merge tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect good d635a69dd4981cc51f90293f5f64268620ed1565
# bad: [2911ed9f47b47cb5ab87d03314b3b9fe008e607f] Merge tag 'char-misc-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect bad 2911ed9f47b47cb5ab87d03314b3b9fe008e607f
# good: [c367caf1a38b6f0a1aababafd88b00fefa625f9e] Merge tag 'sound-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good c367caf1a38b6f0a1aababafd88b00fefa625f9e
# good: [93f998879cd95b3e4f2836e7b17d6d5ae035cf90] Merge tag 'extcon-next-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-next
git bisect good 93f998879cd95b3e4f2836e7b17d6d5ae035cf90
# good: [b5206275b46c30a8236feb34a1dc247fa3683d83] usb: typec: tcpm: convert comma to semicolon
git bisect good b5206275b46c30a8236feb34a1dc247fa3683d83
# good: [9e1792727ead477f49958578d0dbd466a7deea48] tty: use const parameters in port-flag accessors
git bisect good 9e1792727ead477f49958578d0dbd466a7deea48
# good: [157f809894f3cf8e62b4011915a00398603215c9] Merge tag 'tty-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good 157f809894f3cf8e62b4011915a00398603215c9
# good: [25ac86c6dbe62fba9b97e997fa648cdbe2d40173] driver core: Use device's fwnode to check if it is waiting for suppliers
git bisect good 25ac86c6dbe62fba9b97e997fa648cdbe2d40173
# bad: [9c30921fe7994907e0b3e0637b2c8c0fc4b5171f] driver core: platform: use bus_type functions
git bisect bad 9c30921fe7994907e0b3e0637b2c8c0fc4b5171f
# bad: [5b6164d3465fcc13b5679c860c452963443172a7] driver core: Reorder devices on successful probe
git bisect bad 5b6164d3465fcc13b5679c860c452963443172a7
# good: [e82a840cb1c1c83d01a9b81bb63b6cf1c09239d7] efi: Update implementation of add_links() to create fwnode links
git bisect good e82a840cb1c1c83d01a9b81bb63b6cf1c09239d7
# bad: [2d09e6eb4a6f20273959f4905ccf009da8c64c7a] driver core: Delete pointless parameter in fwnode_operations.add_links
git bisect bad 2d09e6eb4a6f20273959f4905ccf009da8c64c7a
# bad: [f9aa460672c9c56896cdc12a521159e3e67000ba] driver core: Refactor fw_devlink feature
git bisect bad f9aa460672c9c56896cdc12a521159e3e67000ba
# first bad commit: [f9aa460672c9c56896cdc12a521159e3e67000ba] driver core: Refactor fw_devlink feature
Since I need drivers that were added for 5.11, and it was easy
to revert there, I landed at 5.11.22. And while that seems
workable at the moment, it's of course not at all where I want
to be.
Since then, I have tried a fair few kernels after 5.11, and
they have all been bad. I'm sad to say that I have not kept a
log of exactly which ones though.
> TL;DR: is that since you are reproducing this with fw_devlink=off, I'm
> pretty sure the problem is not actually because of my changes or any
> changes related to fw_devlink.
I too don't get it, but it's a little bit too consistent with
everything pointing at this one patch across so many changes.
Nothing is good after this patch, and it all behaves a little
bit too similar across the bad kernels for it to be some subtle
timing issue. Methinks. But maybe I just need to stumble on
to some later good kernel. Not holding my breath though...
But it does seem related to interrupts, as I mentioned in the
original mail, I can take a bad kernel and reduce the interrupt
pressure from USB from slightly more than 1kHz down to a
trickle and things behave much better when it comes to sha256sum.
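To put a number on the interrupt pressure, a rough sketch along these lines could sample /proc/interrupts. The helper names sum_irqs and irq_rate are hypothetical, and the pattern to match ("usb" below) is an assumption about how the controller shows up on a given board.

```shell
#!/bin/sh
# Sum the per-CPU counters of every /proc/interrupts-style line on stdin
# that matches PATTERN. Counting stops at the first non-numeric field,
# so the chip name and anything after it are ignored.
sum_irqs() {
    awk -v pat="$1" '
        $0 ~ pat {
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
        }
        END { print total + 0 }
    '
}

# Approximate interrupts per second for PATTERN, sampled over one second.
irq_rate() {
    before=$(sum_irqs "$1" < /proc/interrupts)
    sleep 1
    after=$(sum_irqs "$1" < /proc/interrupts)
    echo $((after - before))
}
```

Calling "irq_rate usb" before and after changing latency_timer would show whether the rate really drops from about 1kHz to a trickle.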
Copying with scp might cause network interrupts, so the two
reproducers I have are perhaps quite similar? If that's the
case, then the trigger would be page cache churn, interrupts and a
fair bit of CPU usage (calculating hashes or encrypting).
Cheers,
Peter
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-04 6:57 ` Peter Rosin
@ 2022-03-04 10:57 ` Peter Rosin
2022-03-04 11:12 ` Tudor.Ambarus
2022-03-04 20:06 ` Saravana Kannan
0 siblings, 2 replies; 39+ messages in thread
From: Peter Rosin @ 2022-03-04 10:57 UTC (permalink / raw)
To: Saravana Kannan
Cc: linux-kernel, linux-arm-kernel, Nicolas Ferre, Alexandre Belloni,
Ludovic Desroches, Daniels Umanovskis, Greg Kroah-Hartman
On 2022-03-04 07:57, Peter Rosin wrote:
> On 2022-03-04 04:55, Saravana Kannan wrote:
>> On Thu, Mar 3, 2022 at 1:17 AM Peter Rosin <peda@axentia.se> wrote:
>>>
>>> On 2022-03-03 04:02, Saravana Kannan wrote:
>>>> On Wed, Mar 2, 2022 at 4:29 PM Peter Rosin <peda@axentia.se> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> I'm seeing a weird problem, and I'd like some help with further
>>>>> things to try in order to track down what's going on. I have
>>>>> bisected the issue to
>>>>>
>>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>>
>>>> I skimmed through your email and I'll read it more closely tomorrow,
>>>> but it wasn't clear if you see this on Linus's tip of the tree too.
>>>> Asking because of:
>>>> https://lore.kernel.org/lkml/20210930085714.2057460-1-yangyingliang@huawei.com/
>>>>
>>>> Also, a couple of other data points that _might_ help. Try kernel
>>>> command line option fw_devlink=permissive vs fw_devlink=on (I forget
>>>> if this was the default by 5.10) vs fw_devlink=off.
>>>>
>>>> I'm expecting "off" to fix the issue for you. But if permissive vs on
>>>> shows a difference driver issues would start becoming a real
>>>> possibility.
>>>>
>>>> -Saravana
>>>
>>> Thanks for the quick reply! I don't think I tested the very tip of
>>> Linus tree before, only latest rc or something like that, but now I
>>> have. I.e.
>>>
>>> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
>>>
>>> It would have been typical if an issue that existed for a couple of
>>> years had been fixed the last few weeks, but alas, no.
>>>
>>> On that kernel, and with whatever the default fw_devlink value is, the
>>
>> It's fw_devlink=on by default from at least 5.12-rc4 or so.
>>
>>> issue is there. It's a bit hard to tell if the incident probability
>>> is the same when trying fw_devlink arguments, but roughly so, and I
>>> do not have to wait for long to get a bad hash with the first
>>> reproducer
>>>
>>> while :; do cat testfile | sha256sum; done
>>>
>>> The output is typical:
>>> 78464c59faa203413aceb5f75de85bbf4cde64f21b2d0449a2d72cd2aadac2a3 -
>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>> e03c5524ac6d16622b6c43f917aae730bc0793643f461253c4646b860c1a7215 -
>>> 1b8db6218f481cb8e4316c26118918359e764cc2c29393fd9ef4f2730274bb00 -
>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>> 7d60bf848911d3b919d26941be33c928c666e9e5666f392d905af2d62d400570 -
>>> 212e1fe02c24134857ffb098f1834a2d87c655e0e5b9e08d4929f49a070be97c -
>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>> 7e33e751eb99a0f63b4f7d64b0a24f3306ffaf7c4bc4b27b82e5886c8ea31bc3 -
>>> d7a1f08aa9d0374d46d828fc3582f5927e076ff229b38c28089007cd0599c645 -
>>> 4fc963b7c7b14df9d669500f7c062bf378ff2751f705bb91eecd20d2f896f6fe -
>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>> 9360d886046c12d983b8bc73dd22302c57b0aafe58215700604fa977b4715fbe -
>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>
>>> Setting fw_devlink=off makes no difference, AFAICT.
>>
>> By this, I'm assuming you set fw_devlink=off in the kernel command
>> line and you still saw the corruption.
>
> Yes. On a bad kernel it's the same with all of the following kernel
> command lines.
>
> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=on ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>
> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=off ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>
> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=permissive ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>
>> If that's the case, I can't see how this could possibly have anything
>> to do with:
>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>
>> If you look at fw_devlink_link_device(), you'll see that the function
>> is NOP if fw_devlink=off (the !fw_devlink_flags check). And from
>> there, the rest of the code in the series doesn't run because more
>> fields wouldn't get set, etc. That pretty much disables ALL the code
>> in the entire series. The only remaining diff would be header file
>> changes where I add/remove fields. But that's unlikely to cause any
>> issues here because I'm either deleting fields that aren't used or
>> adding fields that won't be used (with fw_devlink=off). I think the
>> patch was just causing enough timing changes that it's masking the
>> real issue.
>
> When I compare fw_devlink_link_device() from before and after
> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
> I notice that you also removed an unconditional call to
> device_link_add_missing_supplier_links() that was live before,
> regardless of any fw_devlink parameter.
>
> I don't know if that's relevant. Is it?
>
> Not knowing this code at all, and without any serious attempt
> at reading it, from here the comment of that removed function
> sure looks like it might cause a different ordering before and
> after the patch that is not restored with any fw_devlink
> argument.
It appears that the device_link_add_missing_supplier_links() difference
is not relevant after all. What actually happened in the header file in
the "bad" commit was that two fields were removed (none added). Like so:
struct dev_links_info {
struct list_head suppliers;
struct list_head consumers;
- struct list_head needs_suppliers;
struct list_head defer_sync;
- bool need_for_probe;
enum dl_dev_state status;
};
If I restore those fields on a bad kernel, the issue is no longer
visible. That is true for the first bad kernel, i.e.
f9aa460672c9 ("driver core: Refactor fw_devlink feature")
and for tip of Linus as of recently, i.e.
5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
Which is of course insane and a whole different level of bad. WTF!?!
I wonder if I can dig out the old SAMA5D31 evaluation kit and reproduce
there? I think that's next on the list...
Cheers,
Peter
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-04 10:57 ` Peter Rosin
@ 2022-03-04 11:12 ` Tudor.Ambarus
2022-03-04 12:38 ` Peter Rosin
2022-03-04 20:06 ` Saravana Kannan
1 sibling, 1 reply; 39+ messages in thread
From: Tudor.Ambarus @ 2022-03-04 11:12 UTC (permalink / raw)
To: peda, saravanak
Cc: alexandre.belloni, gregkh, linux-kernel, du, Ludovic.Desroches,
linux-arm-kernel, Nicolas.Ferre
Hi, Peter!
On 3/4/22 12:57, Peter Rosin wrote:
> On 2022-03-04 07:57, Peter Rosin wrote:
>> On 2022-03-04 04:55, Saravana Kannan wrote:
>>> On Thu, Mar 3, 2022 at 1:17 AM Peter Rosin <peda@axentia.se> wrote:
>>>>
>>>> On 2022-03-03 04:02, Saravana Kannan wrote:
>>>>> On Wed, Mar 2, 2022 at 4:29 PM Peter Rosin <peda@axentia.se> wrote:
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I'm seeing a weird problem, and I'd like some help with further
>>>>>> things to try in order to track down what's going on. I have
>>>>>> bisected the issue to
>>>>>>
>>>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>>>
>>>>> I skimmed through your email and I'll read it more closely tomorrow,
>>>>> but it wasn't clear if you see this on Linus's tip of the tree too.
>>>>> Asking because of:
>>>>> https://lore.kernel.org/lkml/20210930085714.2057460-1-yangyingliang@huawei.com/
>>>>>
>>>>> Also, a couple of other data points that _might_ help. Try kernel
>>>>> command line option fw_devlink=permissive vs fw_devlink=on (I forget
>>>>> if this was the default by 5.10) vs fw_devlink=off.
>>>>>
>>>>> I'm expecting "off" to fix the issue for you. But if permissive vs on
>>>>> shows a difference driver issues would start becoming a real
>>>>> possibility.
>>>>>
>>>>> -Saravana
>>>>
>>>> Thanks for the quick reply! I don't think I tested the very tip of
>>>> Linus tree before, only latest rc or something like that, but now I
>>>> have. I.e.
>>>>
>>>> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
>>>>
>>>> It would have been typical if an issue that existed for a couple of
>>>> years had been fixed the last few weeks, but alas, no.
>>>>
>>>> On that kernel, and with whatever the default fw_devlink value is, the
>>>
>>> It's fw_devlink=on by default from at least 5.12-rc4 or so.
>>>
>>>> issue is there. It's a bit hard to tell if the incident probability
>>>> is the same when trying fw_devlink arguments, but roughly so, and I
>>>> do not have to wait for long to get a bad hash with the first
>>>> reproducer
>>>>
>>>> while :; do cat testfile | sha256sum; done
>>>>
>>>> The output is typical:
>>>> 78464c59faa203413aceb5f75de85bbf4cde64f21b2d0449a2d72cd2aadac2a3 -
>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>> e03c5524ac6d16622b6c43f917aae730bc0793643f461253c4646b860c1a7215 -
>>>> 1b8db6218f481cb8e4316c26118918359e764cc2c29393fd9ef4f2730274bb00 -
>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>> 7d60bf848911d3b919d26941be33c928c666e9e5666f392d905af2d62d400570 -
>>>> 212e1fe02c24134857ffb098f1834a2d87c655e0e5b9e08d4929f49a070be97c -
>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>> 7e33e751eb99a0f63b4f7d64b0a24f3306ffaf7c4bc4b27b82e5886c8ea31bc3 -
>>>> d7a1f08aa9d0374d46d828fc3582f5927e076ff229b38c28089007cd0599c645 -
>>>> 4fc963b7c7b14df9d669500f7c062bf378ff2751f705bb91eecd20d2f896f6fe -
>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>> 9360d886046c12d983b8bc73dd22302c57b0aafe58215700604fa977b4715fbe -
>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>
>>>> Setting fw_devlink=off makes no difference, AFAICT.
>>>
>>> By this, I'm assuming you set fw_devlink=off in the kernel command
>>> line and you still saw the corruption.
>>
>> Yes. On a bad kernel it's the same with all of the following kernel
>> command lines.
>>
>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=on ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>
>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=off ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>
>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=permissive ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>
>>> If that's the case, I can't see how this could possibly have anything
>>> to do with:
>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>
>>> If you look at fw_devlink_link_device(), you'll see that the function
>>> is NOP if fw_devlink=off (the !fw_devlink_flags check). And from
>>> there, the rest of the code in the series doesn't run because more
>>> fields wouldn't get set, etc. That pretty much disables ALL the code
>>> in the entire series. The only remaining diff would be header file
>>> changes where I add/remove fields. But that's unlikely to cause any
>>> issues here because I'm either deleting fields that aren't used or
>>> adding fields that won't be used (with fw_devlink=off). I think the
>>> patch was just causing enough timing changes that it's masking the
>>> real issue.
>>
>> When I compare fw_devlink_link_device() from before and after
>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>> I notice that you also removed an unconditional call to
>> device_link_add_missing_supplier_links() that was live before,
>> regardless of any fw_devlink parameter.
>>
>> I don't know if that's relevant. Is it?
>>
>> Not knowing this code at all, and without any serious attempt
>> at reading it, from here the comment of that removed function
>> sure looks like it might cause a different ordering before and
>> after the patch that is not restored with any fw_devlink
>> argument.
>
> It appears that the device_link_add_missing_supplier_links() difference
> is not relevant after all. What actually happened in the header file in
> the "bad" commit was that two fields were removed (none added). Like so:
>
> struct dev_links_info {
> struct list_head suppliers;
> struct list_head consumers;
> - struct list_head needs_suppliers;
> struct list_head defer_sync;
> - bool need_for_probe;
> enum dl_dev_state status;
> };
>
> If I restore those fields on a bad kernel, the issue is no longer
> visible. That is true for the first bad kernel, i.e.
>
> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>
> and for tip of Linus as of recently, i.e.
>
> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
>
> Which is of course insane and a whole different level of bad. WTF!?!
>
> I wonder if I can dig out the old SAMA5D31 evaluation kit and reproduce
> there? I think that's next on the list...
>
I have a sama5d3_xplained that uses a SAMA5D36 and has a 256MBytes DDR2 and a
256MBytes NAND Flash. I tried a test with a 200MB file, rootfs on sdcard and
I couldn't reproduce the bug. I'm using Linus's latest kernel:
38f80f42147f (HEAD, origin/master, origin/HEAD) MAINTAINERS: Remove dead patchwork link
root@sama5d3-xplained-sd:~# dd if=/dev/urandom of=testfile bs=1024 count=200000
200000+0 records in
200000+0 records out
204800000 bytes (205 MB, 195 MiB) copied, 37.6424 s, 5.4 MB/s
root@sama5d3-xplained-sd:~# for i in 1 2 3 4 5 6 7 8; do cat testfile | sha256sum; done
2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
root@sama5d3-xplained-sd:~#
I'll put the rootfs on NAND and retest. I may also run some other tests
in parallel to have more interrupts on the system. Will let you know if I can
reproduce the bug on sama5d3_xplained.
Cheers,
ta
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-04 11:12 ` Tudor.Ambarus
@ 2022-03-04 12:38 ` Peter Rosin
2022-03-04 16:48 ` Tudor.Ambarus
0 siblings, 1 reply; 39+ messages in thread
From: Peter Rosin @ 2022-03-04 12:38 UTC (permalink / raw)
To: Tudor.Ambarus, saravanak
Cc: alexandre.belloni, gregkh, linux-kernel, du, Ludovic.Desroches,
linux-arm-kernel, Nicolas.Ferre
Hi!
On 2022-03-04 12:12, Tudor.Ambarus@microchip.com wrote:
> Hi, Peter!
>
> On 3/4/22 12:57, Peter Rosin wrote:
>> On 2022-03-04 07:57, Peter Rosin wrote:
>>> On 2022-03-04 04:55, Saravana Kannan wrote:
>>>> On Thu, Mar 3, 2022 at 1:17 AM Peter Rosin <peda@axentia.se> wrote:
>>>>>
>>>>> On 2022-03-03 04:02, Saravana Kannan wrote:
>>>>>> On Wed, Mar 2, 2022 at 4:29 PM Peter Rosin <peda@axentia.se> wrote:
>>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> I'm seeing a weird problem, and I'd like some help with further
>>>>>>> things to try in order to track down what's going on. I have
>>>>>>> bisected the issue to
>>>>>>>
>>>>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>>>>
>>>>>> I skimmed through your email and I'll read it more closely tomorrow,
>>>>>> but it wasn't clear if you see this on Linus's tip of the tree too.
>>>>>> Asking because of:
>>>>>> https://lore.kernel.org/lkml/20210930085714.2057460-1-yangyingliang@huawei.com/
>>>>>>
>>>>>> Also, a couple of other data points that _might_ help. Try kernel
>>>>>> command line option fw_devlink=permissive vs fw_devlink=on (I forget
>>>>>> if this was the default by 5.10) vs fw_devlink=off.
>>>>>>
>>>>>> I'm expecting "off" to fix the issue for you. But if permissive vs on
>>>>>> shows a difference driver issues would start becoming a real
>>>>>> possibility.
>>>>>>
>>>>>> -Saravana
>>>>>
>>>>> Thanks for the quick reply! I don't think I tested the very tip of
>>>>> Linus tree before, only latest rc or something like that, but now I
>>>>> have. I.e.
>>>>>
>>>>> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
>>>>>
>>>>> It would have been typical if an issue that existed for a couple of
>>>>> years had been fixed the last few weeks, but alas, no.
>>>>>
>>>>> On that kernel, and with whatever the default fw_devlink value is, the
>>>>
>>>> It's fw_devlink=on by default from at least 5.12-rc4 or so.
>>>>
>>>>> issue is there. It's a bit hard to tell if the incident probability
>>>>> is the same when trying fw_devlink arguments, but roughly so, and I
>>>>> do not have to wait for long to get a bad hash with the first
>>>>> reproducer
>>>>>
>>>>> while :; do cat testfile | sha256sum; done
>>>>>
>>>>> The output is typical:
>>>>> 78464c59faa203413aceb5f75de85bbf4cde64f21b2d0449a2d72cd2aadac2a3 -
>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>> e03c5524ac6d16622b6c43f917aae730bc0793643f461253c4646b860c1a7215 -
>>>>> 1b8db6218f481cb8e4316c26118918359e764cc2c29393fd9ef4f2730274bb00 -
>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>> 7d60bf848911d3b919d26941be33c928c666e9e5666f392d905af2d62d400570 -
>>>>> 212e1fe02c24134857ffb098f1834a2d87c655e0e5b9e08d4929f49a070be97c -
>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>> 7e33e751eb99a0f63b4f7d64b0a24f3306ffaf7c4bc4b27b82e5886c8ea31bc3 -
>>>>> d7a1f08aa9d0374d46d828fc3582f5927e076ff229b38c28089007cd0599c645 -
>>>>> 4fc963b7c7b14df9d669500f7c062bf378ff2751f705bb91eecd20d2f896f6fe -
>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>> 9360d886046c12d983b8bc73dd22302c57b0aafe58215700604fa977b4715fbe -
>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>
>>>>> Setting fw_devlink=off makes no difference, AFAICT.
>>>>
>>>> By this, I'm assuming you set fw_devlink=off in the kernel command
>>>> line and you still saw the corruption.
>>>
>>> Yes. On a bad kernel it's the same with all of the following kernel
>>> command lines.
>>>
>>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=on ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>>
>>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=off ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>>
>>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=permissive ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>>
>>>> If that's the case, I can't see how this could possibly have anything
>>>> to do with:
>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>>
>>>> If you look at fw_devlink_link_device(), you'll see that the function
>>>> is NOP if fw_devlink=off (the !fw_devlink_flags check). And from
>>>> there, the rest of the code in the series doesn't run because more
>>>> fields wouldn't get set, etc. That pretty much disables ALL the code
>>>> in the entire series. The only remaining diff would be header file
>>>> changes where I add/remove fields. But that's unlikely to cause any
>>>> issues here because I'm either deleting fields that aren't used or
>>>> adding fields that won't be used (with fw_devlink=off). I think the
>>>> patch was just causing enough timing changes that it's masking the
>>>> real issue.
>>>
>>> When I compare fw_devlink_link_device() from before and after
>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>> I notice that you also removed an unconditional call to
>>> device_link_add_missing_supplier_links() that was live before,
>>> regardless of any fw_devlink parameter.
>>>
>>> I don't know if that's relevant. Is it?
>>>
>>> Not knowing this code at all, and without any serious attempt
>>> at reading it, from here the comment of that removed function
>>> sure looks like it might cause a different ordering before and
>>> after the patch that is not restored with any fw_devlink
>>> argument.
>>
>> It appears that the device_link_add_missing_supplier_links() difference
>> is not relevant after all. What actually happened in the header file in
>> the "bad" commit was that two fields were removed (none added). Like so:
>>
>> struct dev_links_info {
>> struct list_head suppliers;
>> struct list_head consumers;
>> - struct list_head needs_suppliers;
>> struct list_head defer_sync;
>> - bool need_for_probe;
>> enum dl_dev_state status;
>> };
>>
>> If I restore those fields on a bad kernel, the issue is no longer
>> visible. That is true for the first bad kernel, i.e.
>>
>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>
>> and for tip of Linus as of recently, i.e.
>>
>> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
>>
>> Which is of course insane and a whole different level of bad. WTF!?!
>>
>> I wonder if I can dig out the old SAMA5D31 evaluation kit and reproduce
>> there? I think that's next on the list...
>>
>
> I have a sama5d3_xplained that uses a SAMA5D36 and has a 256MBytes DDR2 and a
> 256MBytes NAND Flash. I tried a test with a 200MB file, rootfs on sdcard and
> I couldn't reproduce the bug. I'm using Linus's latest kernel:
> 38f80f42147f (HEAD, origin/master, origin/HEAD) MAINTAINERS: Remove dead patchwork link
>
> root@sama5d3-xplained-sd:~# dd if=/dev/urandom of=testfile bs=1024 count=200000
> 200000+0 records in
> 200000+0 records out
> 204800000 bytes (205 MB, 195 MiB) copied, 37.6424 s, 5.4 MB/s
> root@sama5d3-xplained-sd:~# for i in 1 2 3 4 5 6 7 8; do cat testfile | sha256sum; done
> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
> root@sama5d3-xplained-sd:~#
>
> I'll put the rootfs on NAND and try to retest. Maybe to do some other tests
> in parallel to have more interrupts on the system. Will let you know if I can
> reproduce the bug on sama5d3_xplained.
Thanks for testing!
Since you (probably) don't have the interrupt source from the USB
serial chip that I have, that is not completely unexpected.
$ lsusb
Bus 001 Device 002: ID 0403:6011 Future Technology Devices International, Ltd FT4232H Quad HS USB-UART/FIFO IC
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
$ cat /sys/bus/usb-serial/devices/ttyUSB?/latency_timer
1
1
1
1
Also, your file is perhaps too small? You leave approx 50MB for the
system, so it might be the case that the page cache can hold the whole
file?
So, can you please try that again with a slightly bigger file, or
restrict how much RAM you allow the kernel to see?
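As a quick sanity check of the "does the file fit in the page cache" question, a sketch like the following compares the file size with the RAM the kernel sees. headroom_kb and mem_total_kb are illustrative names only. One way to shrink the visible RAM would presumably be the mem= kernel command line parameter, but the value to pick is board specific and not something established in this thread.

```shell
#!/bin/sh
# Print how many kB of RAM would be left if the whole file (FILE_BYTES)
# sat in the page cache of a system with MEM_KB kB of memory.
# A small or negative result means the file cannot stay fully cached.
headroom_kb() {
    file_bytes=$1
    mem_kb=$2
    echo $((mem_kb - file_bytes / 1024))
}

# MemTotal as reported by the running kernel, in kB.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}
```

On the target this could be run as 'headroom_kb "$(wc -c < testfile)" "$(mem_total_kb)"'. With the 204800000 byte file and a nominal 256 MiB (262144 kB, MemTotal will be somewhat lower) it prints 62144, so the whole file fitting in the page cache looks plausible.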
And if you don't have the FTDI usb-serial chip, you should probably go
with the other reproducer, namely to simply copy the random file to a
different host using scp.
Thanks again!
Cheers,
Peter
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-04 12:38 ` Peter Rosin
@ 2022-03-04 16:48 ` Tudor.Ambarus
2022-03-07 9:45 ` Tudor.Ambarus
0 siblings, 1 reply; 39+ messages in thread
From: Tudor.Ambarus @ 2022-03-04 16:48 UTC (permalink / raw)
To: peda, saravanak
Cc: alexandre.belloni, gregkh, linux-kernel, du, Ludovic.Desroches,
linux-arm-kernel, Nicolas.Ferre
On 3/4/22 14:38, Peter Rosin wrote:
> Hi!
Hi, Peter!
>
> On 2022-03-04 12:12, Tudor.Ambarus@microchip.com wrote:
>> Hi, Peter!
>>
>> On 3/4/22 12:57, Peter Rosin wrote:
>>> On 2022-03-04 07:57, Peter Rosin wrote:
>>>> On 2022-03-04 04:55, Saravana Kannan wrote:
>>>>> On Thu, Mar 3, 2022 at 1:17 AM Peter Rosin <peda@axentia.se> wrote:
>>>>>>
>>>>>> On 2022-03-03 04:02, Saravana Kannan wrote:
>>>>>>> On Wed, Mar 2, 2022 at 4:29 PM Peter Rosin <peda@axentia.se> wrote:
>>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> I'm seeing a weird problem, and I'd like some help with further
>>>>>>>> things to try in order to track down what's going on. I have
>>>>>>>> bisected the issue to
>>>>>>>>
>>>>>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>>>>>
>>>>>>> I skimmed through your email and I'll read it more closely tomorrow,
>>>>>>> but it wasn't clear if you see this on Linus's tip of the tree too.
>>>>>>> Asking because of:
>>>>>>> https://lore.kernel.org/lkml/20210930085714.2057460-1-yangyingliang@huawei.com/
>>>>>>>
>>>>>>> Also, a couple of other data points that _might_ help. Try kernel
>>>>>>> command line option fw_devlink=permissive vs fw_devlink=on (I forget
>>>>>>> if this was the default by 5.10) vs fw_devlink=off.
>>>>>>>
>>>>>>> I'm expecting "off" to fix the issue for you. But if permissive vs on
>>>>>>> shows a difference driver issues would start becoming a real
>>>>>>> possibility.
>>>>>>>
>>>>>>> -Saravana
>>>>>>
>>>>>> Thanks for the quick reply! I don't think I tested the very tip of
>>>>>> Linus tree before, only latest rc or something like that, but now I
>>>>>> have. I.e.
>>>>>>
>>>>>> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
>>>>>>
>>>>>> It would have been typical if an issue that existed for a couple of
>>>>>> years had been fixed the last few weeks, but alas, no.
>>>>>>
>>>>>> On that kernel, and with whatever the default fw_devlink value is, the
>>>>>
>>>>> It's fw_devlink=on by default from at least 5.12-rc4 or so.
>>>>>
>>>>>> issue is there. It's a bit hard to tell if the incident probability
>>>>>> is the same when trying fw_devlink arguments, but roughly so, and I
>>>>>> do not have to wait for long to get a bad hash with the first
>>>>>> reproducer
>>>>>>
>>>>>> while :; do cat testfile | sha256sum; done
>>>>>>
>>>>>> The output is typical:
>>>>>> 78464c59faa203413aceb5f75de85bbf4cde64f21b2d0449a2d72cd2aadac2a3 -
>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>> e03c5524ac6d16622b6c43f917aae730bc0793643f461253c4646b860c1a7215 -
>>>>>> 1b8db6218f481cb8e4316c26118918359e764cc2c29393fd9ef4f2730274bb00 -
>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>> 7d60bf848911d3b919d26941be33c928c666e9e5666f392d905af2d62d400570 -
>>>>>> 212e1fe02c24134857ffb098f1834a2d87c655e0e5b9e08d4929f49a070be97c -
>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>> 7e33e751eb99a0f63b4f7d64b0a24f3306ffaf7c4bc4b27b82e5886c8ea31bc3 -
>>>>>> d7a1f08aa9d0374d46d828fc3582f5927e076ff229b38c28089007cd0599c645 -
>>>>>> 4fc963b7c7b14df9d669500f7c062bf378ff2751f705bb91eecd20d2f896f6fe -
>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>> 9360d886046c12d983b8bc73dd22302c57b0aafe58215700604fa977b4715fbe -
>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>>
>>>>>> Setting fw_devlink=off makes no difference, AFAICT.
>>>>>
>>>>> By this, I'm assuming you set fw_devlink=off in the kernel command
>>>>> line and you still saw the corruption.
>>>>
>>>> Yes. On a bad kernel it's the same with all of the following kernel
>>>> command lines.
>>>>
>>>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=on ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>>>
>>>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=off ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>>>
>>>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=permissive ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>>>
>>>>> If that's the case, I can't see how this could possibly have anything
>>>>> to do with:
>>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>>>
>>>>> If you look at fw_devlink_link_device(), you'll see that the function
>>>>> is NOP if fw_devlink=off (the !fw_devlink_flags check). And from
>>>>> there, the rest of the code in the series doesn't run because more
>>>>> fields wouldn't get set, etc. That pretty much disables ALL the code
>>>>> in the entire series. The only remaining diff would be header file
>>>>> changes where I add/remove fields. But that's unlikely to cause any
>>>>> issues here because I'm either deleting fields that aren't used or
>>>>> adding fields that won't be used (with fw_devlink=off). I think the
>>>>> patch was just causing enough timing changes that it's masking the
>>>>> real issue.
>>>>
>>>> When I compare fw_devlink_link_device() from before and after
>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>> I notice that you also removed an unconditional call to
>>>> device_link_add_missing_supplier_links() that was live before,
>>>> regardless of any fw_devlink parameter.
>>>>
>>>> I don't know if that's relevant. Is it?
>>>>
>>>> Not knowing this code at all, and without any serious attempt
>>>> at reading it, from here the comment of that removed function
>>>> sure looks like it might cause a different ordering before and
>>>> after the patch that is not restored with any fw_devlink
>>>> argument.
>>>
>>> It appears that the device_link_add_missing_supplier_links() difference
>>> is not relevant after all. What actually happened in the header file in
>>> the "bad" commit was that two fields were removed (none added). Like so:
>>>
>>> struct dev_links_info {
>>> struct list_head suppliers;
>>> struct list_head consumers;
>>> - struct list_head needs_suppliers;
>>> struct list_head defer_sync;
>>> - bool need_for_probe;
>>> enum dl_dev_state status;
>>> };
>>>
>>> If I restore those fields on a bad kernel, the issue is no longer
>>> visible. That is true for the first bad kernel, i.e.
>>>
>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>
>>> and for tip of Linus as of recently, i.e.
>>>
>>> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
>>>
>>> Which is of course insane and a whole different level of bad. WTF!?!
>>>
>>> I wonder if I can dig out the old SAMA5D31 evaluation kit and reproduce
>>> there? I think that's next on the list...
>>>
>>
>> I have a sama5d3_xplained that uses a SAMA5D36 and has a 256MBytes DDR2 and a
>> 256MBytes NAND Flash. I tried a test with a 200MB file, rootfs on sdcard and
>> I couldn't reproduce the bug. I'm using Linus's latest kernel:
>> 38f80f42147f (HEAD, origin/master, origin/HEAD) MAINTAINERS: Remove dead patchwork link
>>
>> root@sama5d3-xplained-sd:~# dd if=/dev/urandom of=testfile bs=1024 count=200000
>> 200000+0 records in
>> 200000+0 records out
>> 204800000 bytes (205 MB, 195 MiB) copied, 37.6424 s, 5.4 MB/s
>> root@sama5d3-xplained-sd:~# for i in 1 2 3 4 5 6 7 8; do cat testfile | sha256sum; done
>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>> root@sama5d3-xplained-sd:~#
>>
>> I'll put the rootfs on NAND and try to retest. Maybe to do some other tests
>> in parallel to have more interrupts on the system. Will let you know if I can
>> reproduce the bug on sama5d3_xplained.
>
> Thanks for testing!
you're welcome, no worries.
>
> Since you (probably) don't have the interrupt source from the USB
> serial chip that I have, that is not completely unexpected.
>
> $ lsusb
> Bus 001 Device 002: ID 0403:6011 Future Technology Devices International, Ltd FT4232H Quad HS USB-UART/FIFO IC
> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
> Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
> $ cat /sys/bus/usb-serial/devices/ttyUSB?/latency_timer
> 1
> 1
> 1
> 1
>
> Also, your file is perhaps too small? You leave approx 50MB for the
> system, so it might be the case that the page cache can hold the whole
> file?
>
> So, can you please try that again with a slightly bigger file, or with a
> restriction on how much RAM the kernel is allowed to see?
>
> And if you don't have the FTDI usb-serial chip, you should probably go
> with the other reproducer, namely to simply copy the random file to a
> different host using scp.
I kept the rootfs on sdcard but this time I generated a 300MB random file.
I ran a mtd_stresstest on the NAND flash while doing the sha256sum or scp
tests. All went fine.
Here's the mtd_stresstest being successful https://pastebin.com/eWQNHAsE
While the stresstest was running I did the following sha256 and scp tests:
https://pastebin.com/wjutw63C
On my laptop the sha256sum is matching the one on the board:
$ sha256sum /tmp/testfile?
d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile1
d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile2
d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile3
d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile4
d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile5
d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile6
d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile7
d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile8
Here's what "top" cmd was showing when doing the scp and the mtd_stresstest:
top - 14:40:13 up 39 min, 3 users, load average: 1.95, 1.88, 1.80
Tasks: 54 total, 3 running, 51 sleeping, 0 stopped, 0 zombie
%Cpu(s): 35.1 us, 48.1 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 16.9 si, 0.0 st
MiB Mem : 242.3 total, 2.5 free, 15.2 used, 224.6 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 220.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
464 root 20 0 4296 3292 2940 R 46.6 1.3 0:17.53 ssh
401 root 20 0 1668 760 676 R 45.0 0.3 17:57.11 modprobe
463 root 20 0 3456 2232 2000 S 5.2 0.9 0:02.04 scp
Here's what "top" cmd was showing when doing the sha256sum and the mtd_stresstest:
top - 14:12:47 up 12 min, 3 users, load average: 2.14, 1.92, 1.08
Tasks: 54 total, 3 running, 51 sleeping, 0 stopped, 0 zombie
%Cpu(s): 37.4 us, 58.4 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 4.2 si, 0.0 st
MiB Mem : 242.3 total, 3.0 free, 14.8 used, 224.5 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 220.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
420 root 20 0 1396 784 692 R 47.2 0.3 0:06.42 sha256sum
401 root 20 0 1668 1208 1124 R 43.0 0.5 4:50.34 modprobe
419 root 20 0 1520 868 680 S 6.5 0.3 0:00.92 cat
Peter, do you think it is worth doing some other tests on the sama5d3_xplained?
I'll try to find a SAMA5D31 evaluation kit meanwhile.
Cheers,
ta
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-04 16:48 ` Tudor.Ambarus
@ 2022-03-07 9:45 ` Tudor.Ambarus
2022-03-07 11:32 ` Peter Rosin
0 siblings, 1 reply; 39+ messages in thread
From: Tudor.Ambarus @ 2022-03-07 9:45 UTC (permalink / raw)
To: peda, saravanak
Cc: alexandre.belloni, gregkh, linux-kernel, du, Ludovic.Desroches,
linux-arm-kernel
On 3/4/22 18:48, Tudor.Ambarus@microchip.com wrote:
> On 3/4/22 14:38, Peter Rosin wrote:
>> Hi!
>
> Hi, Peter!
>
>>
>> On 2022-03-04 12:12, Tudor.Ambarus@microchip.com wrote:
>>> Hi, Peter!
>>>
>>> On 3/4/22 12:57, Peter Rosin wrote:
>>>> On 2022-03-04 07:57, Peter Rosin wrote:
>>>>> On 2022-03-04 04:55, Saravana Kannan wrote:
>>>>>> On Thu, Mar 3, 2022 at 1:17 AM Peter Rosin <peda@axentia.se> wrote:
>>>>>>>
>>>>>>> On 2022-03-03 04:02, Saravana Kannan wrote:
>>>>>>>> On Wed, Mar 2, 2022 at 4:29 PM Peter Rosin <peda@axentia.se> wrote:
>>>>>>>>>
>>>>>>>>> Hi!
>>>>>>>>>
>>>>>>>>> I'm seeing a weird problem, and I'd like some help with further
>>>>>>>>> things to try in order to track down what's going on. I have
>>>>>>>>> bisected the issue to
>>>>>>>>>
>>>>>>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>>>>>>
>>>>>>>> I skimmed through your email and I'll read it more closely tomorrow,
>>>>>>>> but it wasn't clear if you see this on Linus's tip of the tree too.
>>>>>>>> Asking because of:
>>>>>>>> https://lore.kernel.org/lkml/20210930085714.2057460-1-yangyingliang@huawei.com/
>>>>>>>>
>>>>>>>> Also, a couple of other data points that _might_ help. Try kernel
>>>>>>>> command line option fw_devlink=permissive vs fw_devlink=on (I forget
>>>>>>>> if this was the default by 5.10) vs fw_devlink=off.
>>>>>>>>
>>>>>>>> I'm expecting "off" to fix the issue for you. But if permissive vs on
>>>>>>>> shows a difference driver issues would start becoming a real
>>>>>>>> possibility.
>>>>>>>>
>>>>>>>> -Saravana
>>>>>>>
>>>>>>> Thanks for the quick reply! I don't think I tested the very tip of
>>>>>>> Linus tree before, only latest rc or something like that, but now I
>>>>>>> have. I.e.
>>>>>>>
>>>>>>> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
>>>>>>>
>>>>>>> It would have been typical if an issue that existed for a couple of
>>>>>>> years had been fixed the last few weeks, but alas, no.
>>>>>>>
>>>>>>> On that kernel, and with whatever the default fw_devlink value is, the
>>>>>>
>>>>>> It's fw_devlink=on by default from at least 5.12-rc4 or so.
>>>>>>
>>>>>>> issue is there. It's a bit hard to tell if the incident probability
>>>>>>> is the same when trying fw_devlink arguments, but roughly so, and I
>>>>>>> do not have to wait for long to get a bad hash with the first
>>>>>>> reproducer
>>>>>>>
>>>>>>> while :; do cat testfile | sha256sum; done
>>>>>>>
>>>>>>> The output is typical:
>>>>>>> 78464c59faa203413aceb5f75de85bbf4cde64f21b2d0449a2d72cd2aadac2a3 -
>>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>>> e03c5524ac6d16622b6c43f917aae730bc0793643f461253c4646b860c1a7215 -
>>>>>>> 1b8db6218f481cb8e4316c26118918359e764cc2c29393fd9ef4f2730274bb00 -
>>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>>> 7d60bf848911d3b919d26941be33c928c666e9e5666f392d905af2d62d400570 -
>>>>>>> 212e1fe02c24134857ffb098f1834a2d87c655e0e5b9e08d4929f49a070be97c -
>>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>>> 7e33e751eb99a0f63b4f7d64b0a24f3306ffaf7c4bc4b27b82e5886c8ea31bc3 -
>>>>>>> d7a1f08aa9d0374d46d828fc3582f5927e076ff229b38c28089007cd0599c645 -
>>>>>>> 4fc963b7c7b14df9d669500f7c062bf378ff2751f705bb91eecd20d2f896f6fe -
>>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>>> 9360d886046c12d983b8bc73dd22302c57b0aafe58215700604fa977b4715fbe -
>>>>>>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>>>>>>>
>>>>>>> Setting fw_devlink=off makes no difference, AFAICT.
>>>>>>
>>>>>> By this, I'm assuming you set fw_devlink=off in the kernel command
>>>>>> line and you still saw the corruption.
>>>>>
>>>>> Yes. On a bad kernel it's the same with all of the following kernel
>>>>> command lines.
>>>>>
>>>>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=on ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>>>>
>>>>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=off ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>>>>
>>>>> console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=permissive ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
>>>>>
>>>>>> If that's the case, I can't see how this could possibly have anything
>>>>>> to do with:
>>>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>>>>
>>>>>> If you look at fw_devlink_link_device(), you'll see that the function
>>>>>> is NOP if fw_devlink=off (the !fw_devlink_flags check). And from
>>>>>> there, the rest of the code in the series doesn't run because more
>>>>>> fields wouldn't get set, etc. That pretty much disables ALL the code
>>>>>> in the entire series. The only remaining diff would be header file
>>>>>> changes where I add/remove fields. But that's unlikely to cause any
>>>>>> issues here because I'm either deleting fields that aren't used or
>>>>>> adding fields that won't be used (with fw_devlink=off). I think the
>>>>>> patch was just causing enough timing changes that it's masking the
>>>>>> real issue.
>>>>>
>>>>> When I compare fw_devlink_link_device() from before and after
>>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>>> I notice that you also removed an unconditional call to
>>>>> device_link_add_missing_supplier_links() that was live before,
>>>>> regardless of any fw_devlink parameter.
>>>>>
>>>>> I don't know if that's relevant. Is it?
>>>>>
>>>>> Not knowing this code at all, and without any serious attempt
>>>>> at reading it, from here the comment of that removed function
>>>>> sure looks like it might cause a different ordering before and
>>>>> after the patch that is not restored with any fw_devlink
>>>>> argument.
>>>>
>>>> It appears that the device_link_add_missing_supplier_links() difference
>>>> is not relevant after all. What actually happened in the header file in
>>>> the "bad" commit was that two fields were removed (none added). Like so:
>>>>
>>>> struct dev_links_info {
>>>> struct list_head suppliers;
>>>> struct list_head consumers;
>>>> - struct list_head needs_suppliers;
>>>> struct list_head defer_sync;
>>>> - bool need_for_probe;
>>>> enum dl_dev_state status;
>>>> };
>>>>
>>>> If I restore those fields on a bad kernel, the issue is no longer
>>>> visible. That is true for the first bad kernel, i.e.
>>>>
>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>>>>
>>>> and for tip of Linus as of recently, i.e.
>>>>
>>>> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
>>>>
>>>> Which is of course insane and a whole different level of bad. WTF!?!
>>>>
>>>> I wonder if I can dig out the old SAMA5D31 evaluation kit and reproduce
>>>> there? I think that's next on the list...
>>>>
>>>
>>> I have a sama5d3_xplained that uses a SAMA5D36 and has a 256MBytes DDR2 and a
>>> 256MBytes NAND Flash. I tried a test with a 200MB file, rootfs on sdcard and
>>> I couldn't reproduce the bug. I'm using Linus's latest kernel:
>>> 38f80f42147f (HEAD, origin/master, origin/HEAD) MAINTAINERS: Remove dead patchwork link
>>>
>>> root@sama5d3-xplained-sd:~# dd if=/dev/urandom of=testfile bs=1024 count=200000
>>> 200000+0 records in
>>> 200000+0 records out
>>> 204800000 bytes (205 MB, 195 MiB) copied, 37.6424 s, 5.4 MB/s
>>> root@sama5d3-xplained-sd:~# for i in 1 2 3 4 5 6 7 8; do cat testfile | sha256sum; done
>>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>>> 2a4f1534aec6ace9d68f2f42fa28c1f1fe7bd281f960f2218797557aa41fe8de -
>>> root@sama5d3-xplained-sd:~#
>>>
>>> I'll put the rootfs on NAND and try to retest. Maybe to do some other tests
>>> in parallel to have more interrupts on the system. Will let you know if I can
>>> reproduce the bug on sama5d3_xplained.
>>
>> Thanks for testing!
>
> you're welcome, no worries.
>>
>> Since you (probably) don't have the interrupt source from the USB
>> serial chip that I have, that is not completely unexpected.
>>
>> $ lsusb
>> Bus 001 Device 002: ID 0403:6011 Future Technology Devices International, Ltd FT4232H Quad HS USB-UART/FIFO IC
>> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
>> Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
>> $ cat /sys/bus/usb-serial/devices/ttyUSB?/latency_timer
>> 1
>> 1
>> 1
>> 1
>>
>> Also, your file is perhaps too small? You leave approx 50MB for the
>> system, so it might be the case that the page cache can hold the whole
>> file?
>>
>> So, can you please try that again with a slightly bigger file, or with a
>> restriction on how much RAM the kernel is allowed to see?
>>
>> And if you don't have the FTDI usb-serial chip, you should probably go
>> with the other reproducer, namely to simply copy the random file to a
>> different host using scp.
>
> I kept the rootfs on sdcard but this time I generated a 300MB random file.
> I ran a mtd_stresstest on the NAND flash while doing the sha256sum or scp
> tests. All went fine.
>
> Here's the mtd_stresstest being successful https://pastebin.com/eWQNHAsE
> While the stresstest was running I did the following sha256 and scp tests:
> https://pastebin.com/wjutw63C
>
> On my laptop the sha256sum is matching the one on the board:
> $ sha256sum /tmp/testfile?
> d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile1
> d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile2
> d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile3
> d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile4
> d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile5
> d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile6
> d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile7
> d9232cee3ac29c3a9aaff8b23b4cb2914edd54e21550a555656988596fbd0b58 /tmp/testfile8
>
> Here's what "top" cmd was showing when doing the scp and the mtd_stresstest:
> top - 14:40:13 up 39 min, 3 users, load average: 1.95, 1.88, 1.80
> Tasks: 54 total, 3 running, 51 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 35.1 us, 48.1 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 16.9 si, 0.0 st
> MiB Mem : 242.3 total, 2.5 free, 15.2 used, 224.6 buff/cache
> MiB Swap: 0.0 total, 0.0 free, 0.0 used. 220.1 avail Mem
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 464 root 20 0 4296 3292 2940 R 46.6 1.3 0:17.53 ssh
> 401 root 20 0 1668 760 676 R 45.0 0.3 17:57.11 modprobe
> 463 root 20 0 3456 2232 2000 S 5.2 0.9 0:02.04 scp
>
> Here's what "top" cmd was showing when doing the sha256sum and the mtd_stresstest:
> top - 14:12:47 up 12 min, 3 users, load average: 2.14, 1.92, 1.08
> Tasks: 54 total, 3 running, 51 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 37.4 us, 58.4 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 4.2 si, 0.0 st
> MiB Mem : 242.3 total, 3.0 free, 14.8 used, 224.5 buff/cache
> MiB Swap: 0.0 total, 0.0 free, 0.0 used. 220.6 avail Mem
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 420 root 20 0 1396 784 692 R 47.2 0.3 0:06.42 sha256sum
> 401 root 20 0 1668 1208 1124 R 43.0 0.5 4:50.34 modprobe
> 419 root 20 0 1520 868 680 S 6.5 0.3 0:00.92 cat
>
> Peter, do you think it is worth doing some other tests on the sama5d3_xplained?
> I'll try to find a SAMA5D31 evaluation kit meanwhile.
>
Peter, would it be worth running a test on your board similar to what I did?
I'm wondering whether the source of interrupts matters or not. So can you
disable your USB and use an mtd NAND stress test as a source of interrupts?
That is, mtd_stresstest together with scp or hexdump.
Cheers,
ta
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-07 9:45 ` Tudor.Ambarus
@ 2022-03-07 11:32 ` Peter Rosin
2022-03-07 20:32 ` Peter Rosin
0 siblings, 1 reply; 39+ messages in thread
From: Peter Rosin @ 2022-03-07 11:32 UTC (permalink / raw)
To: Tudor.Ambarus, saravanak
Cc: alexandre.belloni, gregkh, linux-kernel, du, Ludovic.Desroches,
linux-arm-kernel
On 2022-03-07 10:45, Tudor.Ambarus@microchip.com wrote:
> Peter, would it be worth running a test on your board similar to what I did?
> I'm wondering whether the source of interrupts matters or not. So can you
> disable your USB and use an mtd NAND stress test as a source of interrupts?
> That is, mtd_stresstest together with scp or hexdump.
That's not a quick test for me, since I don't have modules enabled.
I have located my SAMA5D31 evaluation kit, and I think I will try
to get that running instead.
Meanwhile, over the weekend I ran tests with a slightly permuted
"old style" struct dev_links_info, i.e. with the defer_sync and
needs_suppliers list heads swapped, giving this layout:
struct dev_links_info {
struct list_head suppliers;
struct list_head consumers;
struct list_head defer_sync;
struct list_head needs_suppliers;
bool need_for_probe;
enum dl_dev_state status;
};
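For comparing the three layouts discussed in this thread, a minimal
userspace sketch (not kernel code) can print where defer_sync and status
land in each variant. The struct list_head and enum dl_dev_state definitions
below are stand-ins assumed for illustration, and the names links_old,
links_new, links_permuted and report_layouts are hypothetical. Offsets
printed on a build host may differ from those on the 32-bit ARM target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the kernel's doubly linked list head (assumed here). */
struct list_head { struct list_head *next, *prev; };

/* Stand-in for the kernel enum; only its size matters in this sketch. */
enum dl_dev_state {
	DL_DEV_NO_DRIVER, DL_DEV_PROBING, DL_DEV_DRIVER_BOUND, DL_DEV_UNBINDING
};

/* Layout before f9aa460672c9 (the "old style" struct). */
struct links_old {
	struct list_head suppliers;
	struct list_head consumers;
	struct list_head needs_suppliers;
	struct list_head defer_sync;
	bool need_for_probe;
	enum dl_dev_state status;
};

/* Layout after f9aa460672c9 (two fields removed). */
struct links_new {
	struct list_head suppliers;
	struct list_head consumers;
	struct list_head defer_sync;
	enum dl_dev_state status;
};

/* Old fields restored, with defer_sync and needs_suppliers swapped. */
struct links_permuted {
	struct list_head suppliers;
	struct list_head consumers;
	struct list_head defer_sync;
	struct list_head needs_suppliers;
	bool need_for_probe;
	enum dl_dev_state status;
};

/* Print the offsets of defer_sync and status plus the total size. */
#define REPORT(type) \
	printf("%-16s defer_sync@%zu status@%zu size=%zu\n", #type, \
	       offsetof(struct type, defer_sync), \
	       offsetof(struct type, status), sizeof(struct type))

/* Print all three layouts and check two relations between them. */
void report_layouts(void)
{
	REPORT(links_old);
	REPORT(links_new);
	REPORT(links_permuted);

	/* The permuted layout has defer_sync where the new layout has it... */
	assert(offsetof(struct links_permuted, defer_sync) ==
	       offsetof(struct links_new, defer_sync));
	/* ...while keeping the total size of the old layout. */
	assert(sizeof(struct links_permuted) == sizeof(struct links_old));
}
```

Calling report_layouts() from a main() built with the target's cross
compiler shows the offsets for the board's ABI. The sketch only describes
the layouts; it says nothing about why the layout affects the corruption.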
This produces a new failure mode and hits a BUG. Maybe that's a hint
for someone? I have several more of these reports if someone is
interested, but they all look very similar to me.
$ while :; do cat testfile | sha256sum; done
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
[ 690.196564] ------------[ cut here ]------------
[ 690.201193] kernel BUG at drivers/dma/dmaengine.h:54!
[ 690.206249] Internal error: Oops - BUG: 0 [#1] ARM
[ 690.211057] CPU: 0 PID: 1753 Comm: cat Not tainted 5.17.0-rc6+ #72
[ 690.217245] Hardware name: Atmel SAMA5
[ 690.220998] PC is at atc_chain_complete+0x114/0x174
[ 690.225885] LR is at atc_advance_work+0x7c/0x190
[ 690.230510] pc : [<c03de48c>] lr : [<c03de6d4>] psr: 600f0193
[ 690.236793] sp : c0a718e8 ip : 00000000 fp : c0a71a74
[ 690.242030] r10: c0f28000 r9 : c03dd624 r8 : 00000002
[ 690.247267] r7 : 600f0113 r6 : c0d5bae8 r5 : c0d5ba78 r4 : c0d5bacc
[ 690.253811] r3 : 00000000 r2 : c0d5b800 r1 : 00000000 r0 : c0d5ba78
[ 690.260358] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
[ 690.267605] Control: 10c53c7d Table: 20b6c059 DAC: 00000051
[ 690.273361] Register r0 information: slab kmalloc-2k start c0d5b800 pointer offset 632 size 2048
[ 690.282193] Register r1 information: NULL pointer
[ 690.286906] Register r2 information: slab kmalloc-2k start c0d5b800 pointer offset 0 size 2048
[ 690.295545] Register r3 information: NULL pointer
[ 690.300258] Register r4 information: slab kmalloc-2k start c0d5b800 pointer offset 716 size 2048
[ 690.309073] Register r5 information: slab kmalloc-2k start c0d5b800 pointer offset 632 size 2048
[ 690.317887] Register r6 information: slab kmalloc-2k start c0d5b800 pointer offset 744 size 2048
[ 690.326702] Register r7 information: non-paged memory
[ 690.331764] Register r8 information: non-paged memory
[ 690.336825] Register r9 information: non-slab/vmalloc memory
[ 690.342498] Register r10 information: slab kmalloc-4k start c0f28000 pointer offset 0 size 4096
[ 690.351225] Register r11 information: non-slab/vmalloc memory
[ 690.356985] Register r12 information: NULL pointer
[ 690.361786] Process cat (pid: 1753, stack limit = 0x2b6a6c18)
[ 690.367547] Stack: (0xc0a718e8 to 0xc0a72000)
[ 690.371921] 18e0: c0f28000 c0a71a74 00000000 47a25045 c0478b9c c0d5ba78
[ 690.380128] 1900: c0d5bb18 20f28000 00000800 c03de6d4 c0f28000 c0a71a74 00000000 47a25045
[ 690.388330] 1920: c0f15a00 c0a71940 20f28000 00000800 00000002 c0478b9c 00000003 00000000
[ 690.396534] 1940: 00000001 c0a71944 c0a71944 47a25045 0002eb42 c0f1d050 c0f15a00 00000000
[ 690.404737] 1960: 00000000 c0f28000 c0f15a00 c047a23c 00000002 00000000 00000000 c0f1d050
[ 690.412942] 1980: 000005f0 00000000 00000000 00000000 c047a350 c047a36c 00000000 00000000
[ 690.421145] 19a0: 00000000 c0468570 00001030 00000000 00000004 c0f94000 00000000 00000800
[ 690.429348] 19c0: 0002dd98 00000001 c0f1d12c 00000000 c1dec000 c0f28210 00000210 00001030
[ 690.437552] 19e0: 0002dd98 00000000 00000000 c0f1d50c 00000000 00000040 00000000 c0b438e0
[ 690.445757] 1a00: 00000000 00000000 00000000 00000000 00001540 c0f1d050 c0f94000 00000000
[ 690.453959] 1a20: 00000000 00000000 c0a71a74 00000000 06ecc210 c045b078 c0a71a74 c01bc8b8
[ 690.462164] 1a40: 000000bc c0f94000 00000000 06ecc210 c0a71ad0 c1dec000 00000000 c091b928
[ 690.470367] 1a60: 06ecc210 c045b19c c0a71a74 c0484388 000000bc 00000000 00001030 00000000
[ 690.478570] 1a80: 00000000 00000000 00000000 c1dec000 00000000 47a25045 00000004 c16a6000
[ 690.486773] 1aa0: 0000c210 00000376 00001030 c04865e4 00001030 c0a71ad0 c1dec000 0000000a
[ 690.494977] 1ac0: c1dec000 c08e52c4 c091b8c0 c0b44228 00000000 47a25045 c16a6000 00000000
[ 690.503182] 1ae0: c16a6000 c16b9000 00000018 c1dec000 c16a6000 00000000 00000376 c0484250
[ 690.511385] 1b00: 00001030 47a25045 00000540 c1d33480 a0000013 c02f1604 c0c00100 00000000
[ 690.519589] 1b20: 00000018 0000b210 c16b9000 c1dec000 c16a6000 00000000 00001030 c0483144
[ 690.527792] 1b40: 0000b210 00001030 00000000 00000540 c0b62734 c16b7000 c16b7000 0000b210
[ 690.535995] 1b60: 00000018 00001030 c0a71c98 00000018 0000b210 c02cfd38 00001030 00000000
[ 690.544199] 1b80: c16b7000 c0a71c28 c1dec000 c0b40028 c16b7000 c02d2e6c 00001030 00000001
[ 690.552403] 1ba0: 00000000 c02d53a4 60000013 00001030 00000001 c0a71c24 c0a17d00 00000018
[ 690.560607] 1bc0: c16b7000 c0100b14 c16b70e4 c0a71c98 c1dec000 c0a70000 c16b7000 c0a71c98
[ 690.568810] 1be0: 00000000 00000000 c1dec000 47a25045 00000018 c16b7000 c0a71c98 00000000
[ 690.577014] 1c00: 00000000 c1dec000 c16b70e4 00000018 00000000 c02d57ec c0b52718 00000000
[ 690.585218] 1c20: 00000000 c0a17d00 0000007d 20001082 00000000 00000018 0000b210 00001030
[ 690.593423] 1c40: 01140cca 47a25045 70586723 c3fd4ee0 c2cf7000 c12b4b30 c1dec000 0000007d
[ 690.601625] 1c60: 00001082 c0b3f800 c16b7000 c02c5870 00000000 c0182df8 c12b4c20 c0803744
[ 690.609830] 1c80: c0a71cd4 c12b4c18 02710000 00002710 c16c6180 00000000 0000007d 20001082
[ 690.618033] 1ca0: c0a71c9c 47a25045 c3fd4ee0 c3fd4ee0 c16b7000 00001082 00001082 c12b4b30
[ 690.626236] 1cc0: 00001081 00000000 c12b4c24 c02c5f80 00000000 c0b3dd0c 60000013 c0b5b900
[ 690.634440] 1ce0: c0b5b8d8 c0b1d6f4 c01995ac c3fd4ee0 00000000 00000cc0 00001082 47a25045
[ 690.642644] 1d00: c3fd4ee0 c3fd4ee0 c16c6180 c0a71dc4 00001082 c12b4c18 c3fd4ee0 00000000
[ 690.650847] 1d20: c12b4c24 c0176090 000010a0 c0a71e30 c0a71dc4 c0177d40 00000002 c0a70000
[ 690.659050] 1d40: c16c6180 c16c61d8 c0a71f18 61c88647 c0a71d84 c16c6180 c12b4c18 c16c61d8
[ 690.667254] 1d60: 00001082 00000000 00000000 47a25045 00001000 c12b4b30 c0a71dc8 00000000
[ 690.675457] 1d80: 00001000 c0a71f18 c0a71e30 00002000 c16c6180 c017a064 c0b52718 c0a71dc8
[ 690.683660] 1da0: 00000000 c0d8b268 c0b0eb40 00000000 02710000 00000000 c12b4b30 c12b4c18
[ 690.691864] 1dc0: 200f0193 00000000 c3fd4ec0 00000006 c0a71de4 c01381cc 00000001 c0c24040
[ 690.700069] 1de0: c0a71e04 c01382ac 00000040 c0b52730 40000006 c0a70000 00000100 c0b52718
[ 690.708272] 1e00: c0b52734 47a25045 00000000 c16c6180 00000000 c0a71f38 00000000 c0a71f18
[ 690.716475] 1e20: 00000000 00000000 00004004 c01c1f3c c16c6180 00000000 01082000 00000000
[ 690.724680] 1e40: 00000000 00000000 00000000 40040000 00000000 00000000 c12b4b30 47a25045
[ 690.732883] 1e60: 00008000 c0a71f18 c0a71f18 c1dd8600 c16c6180 c0a71f38 00000000 00000001
[ 690.741089] 1e80: c0a71f30 c01c205c 00000000 c0a71ea8 c02fee04 c02fbe10 c0a71f18 c0a71f80
[ 690.749290] 1ea0: c1dd8600 c1c54e80 00000000 00000000 c0a71ecc c02ff91c c07aa990 c0136e98
[ 690.757494] 1ec0: 60000013 ffffffff 00000051 c16c6180 00000000 47a25045 00000002 c1dd8600
[ 690.765697] 1ee0: c0a71f80 00000001 00020000 00000000 00000000 00004004 00020000 c01c37f0
[ 690.773903] 1f00: 00020000 c0d8b240 c0a70000 c0100264 b6c7c000 00020000 00000000 00002000
[ 690.782105] 1f20: 0001e000 c0a71f10 00000001 00000000 c1dd8600 00000000 01080000 00000000
[ 690.790308] 1f40: 00000000 00000000 00000000 40040000 00000000 00000000 b6c7c000 47a25045
[ 690.798512] 1f60: c1dd8600 c1dd8600 01080000 00000000 c0100264 c0a70000 00000003 c01c419c
[ 690.806716] 1f80: 01080000 00000000 00000000 47a25045 00020000 b6c7c000 00020000 b6fdc550
[ 690.814920] 1fa0: 00000003 c0100060 b6c7c000 00020000 00000003 b6c7c000 00020000 00000000
[ 690.823122] 1fc0: b6c7c000 00020000 b6fdc550 00000003 00000003 00000000 0000005e 00020000
[ 690.831327] 1fe0: 00000003 bed35b58 b6dce1db b6dcffc6 600f0030 00000003 00000000 00000000
[ 690.839537] atc_chain_complete from atc_advance_work+0x7c/0x190
[ 690.845562] atc_advance_work from atmel_nand_dma_transfer+0x118/0x234
[ 690.852109] atmel_nand_dma_transfer from atmel_hsmc_nand_pmecc_read_pg+0xd8/0x1c8
[ 690.859698] atmel_hsmc_nand_pmecc_read_pg from atmel_hsmc_nand_pmecc_read_page+0x1c/0x24
[ 690.867901] atmel_hsmc_nand_pmecc_read_page from nand_read_oob+0x268/0x7f8
[ 690.874883] nand_read_oob from mtd_read_oob+0x84/0x148
[ 690.880121] mtd_read_oob from mtd_read+0x60/0x90
[ 690.884832] mtd_read from ubi_io_read+0xf0/0x3fc
[ 690.889545] ubi_io_read from ubi_eba_read_leb+0xb0/0x468
[ 690.894956] ubi_eba_read_leb from ubi_leb_read+0x90/0x104
[ 690.900454] ubi_leb_read from ubifs_leb_read+0x2c/0x78
[ 690.905693] ubifs_leb_read from fallible_read_node+0x84/0x2b0
[ 690.911537] fallible_read_node from ubifs_tnc_locate+0x140/0x1dc
[ 690.917647] ubifs_tnc_locate from do_readpage+0x10c/0x4c4
[ 690.923146] do_readpage from ubifs_readpage+0x4c/0x4e0
[ 690.928381] ubifs_readpage from filemap_read_folio+0x34/0xac
[ 690.934144] filemap_read_folio from filemap_get_pages+0x4c0/0x670
[ 690.940337] filemap_get_pages from filemap_read+0xc4/0x390
[ 690.945922] filemap_read from do_iter_readv_writev+0x128/0x1c0
[ 690.951859] do_iter_readv_writev from do_iter_read+0x88/0x1f0
[ 690.957704] do_iter_read from ovl_read_iter+0x1f4/0x248
[ 690.963030] ovl_read_iter from vfs_read+0x204/0x314
[ 690.968003] vfs_read from ksys_read+0x60/0xe4
[ 690.972454] ksys_read from ret_fast_syscall+0x0/0x58
[ 690.977513] Exception stack(0xc0a71fa8 to 0xc0a71ff0)
[ 690.982586] 1fa0: b6c7c000 00020000 00000003 b6c7c000 00020000 00000000
[ 690.990791] 1fc0: b6c7c000 00020000 b6fdc550 00000003 00000003 00000000 0000005e 00020000
[ 690.998989] 1fe0: 00000003 bed35b58 b6dce1db b6dcffc6
[ 691.004061] Code: c5940028 c580100c c584301c caffffca (e7f001f2)
[ 691.010166] ---[ end trace 0000000000000000 ]---
$ while :; do cat testfile | sha256sum; done
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
[ 1928.214666] ------------[ cut here ]------------
[ 1928.219293] kernel BUG at drivers/dma/dmaengine.h:54!
[ 1928.224350] Internal error: Oops - BUG: 0 [#1] ARM
[ 1928.229157] CPU: 0 PID: 4427 Comm: cat Not tainted 5.17.0-rc6+ #72
[ 1928.235346] Hardware name: Atmel SAMA5
[ 1928.239100] PC is at atc_chain_complete+0x114/0x174
[ 1928.243988] LR is at atc_advance_work+0x7c/0x190
[ 1928.248612] pc : [<c03de48c>] lr : [<c03de6d4>] psr: 600f0193
[ 1928.254895] sp : c17358e8 ip : 00000000 fp : c1735a74
[ 1928.260131] r10: c0f28000 r9 : c03dd624 r8 : 00000002
[ 1928.265367] r7 : 600f0113 r6 : c0d5bae8 r5 : c0d5ba78 r4 : c0d5bacc
[ 1928.271913] r3 : 00000000 r2 : c0d5b800 r1 : 00000000 r0 : c0d5ba78
[ 1928.278460] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
[ 1928.285707] Control: 10c53c7d Table: 20044059 DAC: 00000051
[ 1928.291463] Register r0 information: slab kmalloc-2k start c0d5b800 pointer offset 632 size 2048
[ 1928.300295] Register r1 information: NULL pointer
[ 1928.305007] Register r2 information: slab kmalloc-2k start c0d5b800 pointer offset 0 size 2048
[ 1928.313647] Register r3 information: NULL pointer
[ 1928.318360] Register r4 information: slab kmalloc-2k start c0d5b800 pointer offset 716 size 2048
[ 1928.327174] Register r5 information: slab kmalloc-2k start c0d5b800 pointer offset 632 size 2048
[ 1928.335989] Register r6 information: slab kmalloc-2k start c0d5b800 pointer offset 744 size 2048
[ 1928.344803] Register r7 information: non-paged memory
[ 1928.349865] Register r8 information: non-paged memory
[ 1928.354927] Register r9 information: non-slab/vmalloc memory
[ 1928.360600] Register r10 information: slab kmalloc-4k start c0f28000 pointer offset 0 size 4096
[ 1928.369327] Register r11 information: non-slab/vmalloc memory
[ 1928.375087] Register r12 information: NULL pointer
[ 1928.379887] Process cat (pid: 4427, stack limit = 0x41e59390)
[ 1928.385648] Stack: (0xc17358e8 to 0xc1736000)
[ 1928.390027] 58e0: c0f28000 c1735a74 00000000 f09b5186 c0478b9c c0d5ba78
[ 1928.398228] 5900: c0d5bb18 20f28000 00000800 c03de6d4 00000051 c03dcdc8 c0f15a00 f09b5186
[ 1928.406433] 5920: c0f15a00 c1735940 20f28000 00000800 00000002 c0478b9c 00000003 00000000
[ 1928.414636] 5940: 00000001 c1735944 c1735944 f09b5186 00028218 c0f1d050 c0f15a00 00000000
[ 1928.422839] 5960: 00000000 c0f28000 c0f15a00 c047a23c 00000002 00000000 00000000 c0f1d050
[ 1928.431042] 5980: 000007d0 00000000 00000000 00000000 c047a350 c047a36c 00000000 00000000
[ 1928.439247] 59a0: 00000000 c0468570 00001030 00000000 c159d300 c0f94000 00000000 00000800
[ 1928.447449] 59c0: 0002c404 00000001 c0f1d12c 00000000 c1c78000 c0f28030 00000030 00001030
[ 1928.455654] 59e0: 0002c404 00000000 00000000 c0f1d50c 00000000 00000040 00000000 c0b438e0
[ 1928.463858] 5a00: 00000000 00000000 00000000 00000000 0000c1c0 c0f1d050 c0f94000 00000000
[ 1928.472061] 5a20: 00000000 00000000 c1735a74 00000000 06202030 c045b078 c1735a74 c01bc8b8
[ 1928.480264] 5a40: 000000bc c0f94000 00000000 06202030 c1735ad0 c1c78000 00000000 c091b928
[ 1928.488468] 5a60: 06202030 c045b19c c1735a74 c0484388 000000bc 00000000 00001030 00000000
[ 1928.496671] 5a80: 00000000 00000000 00000000 c1c78000 00000000 f09b5186 00000004 c16b7000
[ 1928.504876] 5aa0: 00002030 00000310 00001030 c04865e4 00001030 c1735ad0 c1c78000 a0000113
[ 1928.513080] 5ac0: c1c78000 c08e52c4 c091b8c0 c0b44228 00000000 f09b5186 c16b7000 00000000
[ 1928.521283] 5ae0: c16b7000 c16be000 0000005a c1c78000 c16b7000 00000000 00000310 c0484250
[ 1928.529487] 5b00: 00001030 f09b5186 0000b1c0 c1d37300 a0000013 c02f1604 c0c00100 00000000
[ 1928.537690] 5b20: 0000005a 00001030 c16be000 c1c78000 c16b7000 00000000 00001030 c0483144
[ 1928.545894] 5b40: 00001030 00001030 00000000 0000b1c0 c0c06018 c16b8000 c16b8000 00001030
[ 1928.554098] 5b60: 0000005a 00001030 c1735c98 0000005a 00001030 c02cfd38 00001030 00000000
[ 1928.562302] 5b80: c16b8000 c1735c28 c1c78000 c0b40028 c16b8000 c02d2e6c 00001030 00000001
[ 1928.570505] 5ba0: 00000000 c02d53a4 00000041 00001030 00000001 c1735c24 c17b5400 f09b5186
[ 1928.578709] 5bc0: c0b52718 00000a20 00000000 c159f400 c159f4f8 00000040 00000000 00000006
[ 1928.586912] 5be0: c0b52718 c04eb804 c159d300 f09b5186 00000000 c16b8000 c1735c98 00000000
[ 1928.595116] 5c00: 00000000 c1c78000 c16b80e4 00000018 00000000 c02d57ec 20000193 00000000
[ 1928.603319] 5c20: 00000002 c17b5400 0000007d 200004fc 00000000 0000005a 00001030 00001030
[ 1928.611523] 5c40: 01140cca f09b5186 70586723 c3fee7c0 c39be000 c12b4b30 c1c78000 0000007d
[ 1928.619727] 5c60: 000004fc c0b3f800 c16b8000 c02c5870 00000000 c0b5b94c c12b4c20 c0b3dd0c
[ 1928.627931] 5c80: c1735cd4 c12b4c18 02710000 00002710 c16e3000 00000000 0000007d 200004fc
[ 1928.636134] 5ca0: c1734000 f09b5186 c3fee7c0 c3fee7c0 c16b8000 000004fc 000004fc c12b4b30
[ 1928.644338] 5cc0: 000004f1 00000000 c12b4c24 c02c5f80 c0b3e164 c12b4c1c 000004fc 003c0000
[ 1928.652542] 5ce0: c12b6e40 00000000 c01995ac f09b5186 00000013 c3fee7c0 c1735e30 f09b5186
[ 1928.660746] 5d00: c3fee7c0 c3fee7c0 c16e3000 c1735dc4 000004fc c12b4c18 c3fee7c0 00000000
[ 1928.668948] 5d20: c12b4c24 c0176090 00000500 c1735e30 c1735dc4 c0177d40 00000002 c1734000
[ 1928.677153] 5d40: c16e3000 c16e3058 c1735f18 61c88647 c1735d84 c16e3000 c12b4c18 c16e3058
[ 1928.685356] 5d60: 000004fc 00000000 00000000 f09b5186 00001000 c12b4b30 c1735dec 00000000
[ 1928.693560] 5d80: 00001000 c1735f18 c1735e30 0001c000 c16e3000 c017a064 c014965c c1735dec
[ 1928.701762] 5da0: 00000000 c3fb0120 00000000 00000000 02710000 00000000 c12b4b30 c12b4c18
[ 1928.709968] 5dc0: 70586723 c1730000 c3fc38e0 c3fae3c0 c3fae140 c3fae2a0 c3fae300 c3fae360
[ 1928.718170] 5de0: c3fadfa0 c3faee60 c3faf480 c3fb0120 00000010 00000000 00000000 c0b032f4
[ 1928.726375] 5e00: c1c00000 f09b5186 c1735e20 c16e3000 00000000 c1735f38 00000000 c1735f18
[ 1928.734578] 5e20: 00000000 00000000 00004004 c01c1f3c c16e3000 00000000 004fc000 00000000
[ 1928.742779] 5e40: 00000000 00000000 00000000 40040000 00000000 00000000 00000006 f09b5186
[ 1928.750985] 5e60: c1735f18 c1735f18 c1735f18 c16e3840 c16e3000 c1735f38 00000000 00000001
[ 1928.759191] 5e80: c1735f30 c01c205c 00000000 70729076 c159d000 f09b5186 c1735f18 c1735f80
[ 1928.767391] 5ea0: c16e3840 c0c22f80 00000000 00000000 c1735ecc c02ff91c c0803c00 00500cc2
[ 1928.775595] 5ec0: 00000001 c0c25240 c013f468 c16e3000 00000000 f09b5186 00000002 c16e3840
[ 1928.783799] 5ee0: c1735f80 00000001 00020000 00000000 00000000 00004004 00020000 c01c37f0
[ 1928.792003] 5f00: 00020000 c0c25240 c1734000 c0100264 b6c2a000 00020000 00000000 0001c000
[ 1928.800206] 5f20: 00004000 c1735f10 00000001 00000000 c16e3840 00000000 004e0000 00000000
[ 1928.808409] 5f40: 00000000 00000000 00000000 40040000 00000000 00000000 b6c2a000 f09b5186
[ 1928.816614] 5f60: c16e3840 c16e3840 004e0000 00000000 c0100264 c1734000 00000003 c01c419c
[ 1928.824817] 5f80: 004e0000 00000000 10c53c7d f09b5186 00020000 b6c2a000 00020000 b6f8a550
[ 1928.833022] 5fa0: 00000003 c0100060 b6c2a000 00020000 00000003 b6c2a000 00020000 00000000
[ 1928.841224] 5fc0: b6c2a000 00020000 b6f8a550 00000003 00000003 00000000 0000005e 00020000
[ 1928.849428] 5fe0: 00000003 bebf4b58 b6d7c1db b6d7dfc6 600f0030 00000003 00000000 00000000
[ 1928.857638] atc_chain_complete from atc_advance_work+0x7c/0x190
[ 1928.863664] atc_advance_work from atmel_nand_dma_transfer+0x118/0x234
[ 1928.870209] atmel_nand_dma_transfer from atmel_hsmc_nand_pmecc_read_pg+0xd8/0x1c8
[ 1928.877799] atmel_hsmc_nand_pmecc_read_pg from atmel_hsmc_nand_pmecc_read_page+0x1c/0x24
[ 1928.886003] atmel_hsmc_nand_pmecc_read_page from nand_read_oob+0x268/0x7f8
[ 1928.892985] nand_read_oob from mtd_read_oob+0x84/0x148
[ 1928.898222] mtd_read_oob from mtd_read+0x60/0x90
[ 1928.902933] mtd_read from ubi_io_read+0xf0/0x3fc
[ 1928.907647] ubi_io_read from ubi_eba_read_leb+0xb0/0x468
[ 1928.913057] ubi_eba_read_leb from ubi_leb_read+0x90/0x104
[ 1928.918555] ubi_leb_read from ubifs_leb_read+0x2c/0x78
[ 1928.923792] ubifs_leb_read from fallible_read_node+0x84/0x2b0
[ 1928.929639] fallible_read_node from ubifs_tnc_locate+0x140/0x1dc
[ 1928.935748] ubifs_tnc_locate from do_readpage+0x10c/0x4c4
[ 1928.941246] do_readpage from ubifs_readpage+0x4c/0x4e0
[ 1928.946482] ubifs_readpage from filemap_read_folio+0x34/0xac
[ 1928.952244] filemap_read_folio from filemap_get_pages+0x4c0/0x670
[ 1928.958439] filemap_get_pages from filemap_read+0xc4/0x390
[ 1928.964023] filemap_read from do_iter_readv_writev+0x128/0x1c0
[ 1928.969961] do_iter_readv_writev from do_iter_read+0x88/0x1f0
[ 1928.975805] do_iter_read from ovl_read_iter+0x1f4/0x248
[ 1928.981131] ovl_read_iter from vfs_read+0x204/0x314
[ 1928.986104] vfs_read from ksys_read+0x60/0xe4
[ 1928.990555] ksys_read from ret_fast_syscall+0x0/0x58
[ 1928.995614] Exception stack(0xc1735fa8 to 0xc1735ff0)
[ 1929.000687] 5fa0: b6c2a000 00020000 00000003 b6c2a000 00020000 00000000
[ 1929.008892] 5fc0: b6c2a000 00020000 b6f8a550 00000003 00000003 00000000 0000005e 00020000
[ 1929.017091] 5fe0: 00000003 bebf4b58 b6d7c1db b6d7dfc6
[ 1929.022161] Code: c5940028 c580100c c584301c caffffca (e7f001f2)
[ 1929.028267] ---[ end trace 0000000000000000 ]---
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-07 11:32 ` Peter Rosin
@ 2022-03-07 20:32 ` Peter Rosin
2022-03-08 7:55 ` Nicolas Ferre
0 siblings, 1 reply; 39+ messages in thread
From: Peter Rosin @ 2022-03-07 20:32 UTC (permalink / raw)
To: Tudor.Ambarus, saravanak
Cc: alexandre.belloni, gregkh, linux-kernel, du, Ludovic.Desroches,
linux-arm-kernel
On 2022-03-07 12:32, Peter Rosin wrote:
> On 2022-03-07 10:45, Tudor.Ambarus@microchip.com wrote:
> >> Peter, would it be worth doing a test on your board similar to what I did?
>> I'm thinking whether the source of interrupts matters or not. So can you
>> disable your USB and use a mtd NAND stress test as a source of interrupts?
>> mtd_stresstest together with scp or hexdump.
>
> That's not a quick test for me, since I don't have modules enabled.
> I have located my SAMA5D31 evaluation kit, and I think I will try
> to get that running instead.
Hi again!
I got my SAMA5D31EK board running, using a freshly built at91bootstrap
and u-boot according to linux4sam.org and using the cross compiler I
have used from buildroot 2021.08, i.e. gcc 10.3.0, then using the
dtb for the ME20 from the original post and the same kernel and userspace
as I have used previously. Now, that dtb describes things that may not
actually be there etc etc, and I will try with a proper dtb as well
tomorrow, so this was just a quick-n-dirty test. I also added mem=64MB
to the kernel command line, to mimic our "Linea" CPU module and get a
bit quicker turnaround in the page cache.
Anyway, with that setup I can reproduce the problem on the EK board.
$ while :; do cat testfile | sha256sum; done
5a939c69dd60a1f991e43d278d2e824a0e7376600a6b20c8e8b347871c546f9b -
7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
250556db0a6ac3c3e101ae2845da48ebb39a0c12d4c9b9eec5ea229c426bcce9 -
874c694ed002b04b67bf354a95ee521effd07e198f363e02cd63069a94fd4df8 -
7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
c3a918a923ff2d504a45ffa51289e69fb6d8aa1140cca3fd9f30703b18d9e509 -
1577ed72d2f296f9adc50707e0e56547ecb311fa21ad875a3d55ca908c440307 -
7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
But apparently only if I have an FTDI usb-serial adapter attached
while I boot. I also start to get good hashes if I remove and
reinsert the FTDI adapter, which is interesting.
$ while :; do cat testfile | sha256sum; done
7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
...
*snip many dozens of lines*
...
7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
It's of course hard to prove the absence of trouble, but it feels
like it is working from both of those latter cases...
(for the "real" case the FTDI usb-serial adapter is soldered in,
with no easy way to make it go away, so it is not as easy to do the
same test there.)
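
One way to put a number on "hard to prove the absence of trouble" is
to count distinct digests over a fixed number of runs. The following
is only a sketch and not part of the original message; the function
name count_distinct_hashes is hypothetical, and it assumes a POSIX
shell with sha256sum, sort and wc available. As in the reproducer,
the file has to be large compared to the available page cache so that
the data is actually read again on each run.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): hash FILE through a pipe
# ITERATIONS times and print how many distinct digests were seen.
# A healthy system prints 1; anything larger means some read differed.
count_distinct_hashes() {
    file=$1
    iterations=$2
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Same load as the reproducer: cat piped into sha256sum.
        cat "$file" | sha256sum
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}
```

For example, "count_distinct_hashes testfile 100" would replace
eyeballing a hundred lines of output with a single number.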
I'll try to reduce the number of local parts of the setup further
tomorrow, such as the dtb mentioned above and the rootfs. I was
hoping for a binary download of prebuilt parts, but some links on
https://www.linux4sam.org/bin/view/Linux4SAM/Sama5d3xekMainPage
are bogus. E.g.
ftp://twiki.lnx4mchp_backend/pub/demo/linux4sam_4.7/linux4sam-poky-sama5d3xek-4.7.zip
What's up with that twiki.lnx4mchp_backend "host"?
Cheers,
Peter
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-07 20:32 ` Peter Rosin
@ 2022-03-08 7:55 ` Nicolas Ferre
2022-03-09 8:30 ` Peter Rosin
0 siblings, 1 reply; 39+ messages in thread
From: Nicolas Ferre @ 2022-03-08 7:55 UTC (permalink / raw)
To: Peter Rosin, Tudor.Ambarus, saravanak
Cc: alexandre.belloni, gregkh, linux-kernel, du, Ludovic.Desroches,
linux-arm-kernel
On 07/03/2022 at 21:32, Peter Rosin wrote:
> On 2022-03-07 12:32, Peter Rosin wrote:
>> On 2022-03-07 10:45, Tudor.Ambarus@microchip.com wrote:
>>> Peter, would it be worth doing a test on your board similar to what I did?
>>> I'm thinking whether the source of interrupts matters or not. So can you
>>> disable your USB and use a mtd NAND stress test as a source of interrupts?
>>> mtd_stresstest together with scp or hexdump.
>>
>> That's not a quick test for me, since I don't have modules enabled.
>> I have located my SAMA5D31 evaluation kit, and I think I will try
>> to get that running instead.
>
> Hi again!
>
> I got my SAMA5D31EK board running, using a freshly built at91bootstrap
> and u-boot according to linux4sam.org and using the cross compiler I
> have used from buildroot 2021.08, i.e. gcc 10.3.0, then using the
> dtb for the ME20 from the original post and the same kernel and userspace
> as I have used previously. Now, that dtb describes things that may not
> actually be there etc etc, and I will try with a proper dtb as well
> tomorrow, so this was just a quick-n-dirty test. I also added mem=64MB
> to the kernel command line, to mimic our "Linea" CPU module and get a
> bit quicker turnaround in the page cache.
>
> Anyway, with that setup I can reproduce the problem on the EK board.
>
> $ while :; do cat testfile | sha256sum; done
> 5a939c69dd60a1f991e43d278d2e824a0e7376600a6b20c8e8b347871c546f9b -
> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
> 250556db0a6ac3c3e101ae2845da48ebb39a0c12d4c9b9eec5ea229c426bcce9 -
> 874c694ed002b04b67bf354a95ee521effd07e198f363e02cd63069a94fd4df8 -
> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
> c3a918a923ff2d504a45ffa51289e69fb6d8aa1140cca3fd9f30703b18d9e509 -
> 1577ed72d2f296f9adc50707e0e56547ecb311fa21ad875a3d55ca908c440307 -
> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
>
>
> But apparently only if I have an FTDI usb-serial adapter attached
> while I boot. I also start to get good hashes if I remove and
> reinsert the FTDI adapter, which is interesting.
>
> $ while :; do cat testfile | sha256sum; done
> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
> ...
> *snip many dozens of lines*
> ...
> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
>
> It's of course hard to prove the absence of trouble, but it feels
> like it is working from both of those latter cases...
>
> (for the "real" case the FTDI usb-serial adapter is soldered in,
> with no easy way to make it go away, so it is not as easy to do the
> same test there.)
>
> I'll try to reduce the number of local parts of the setup further
> tomorrow, such as the dtb mentioned above and the rootfs. I was
> hoping for a binary download of prebuilt parts, but some links on
>
> https://www.linux4sam.org/bin/view/Linux4SAM/Sama5d3xekMainPage
>
> are bogus. E.g.
>
> ftp://twiki.lnx4mchp_backend/pub/demo/linux4sam_4.7/linux4sam-poky-sama5d3xek-4.7.zip
Okay, that's a bug in the TWiki page.
> What's up with that twiki.lnx4mchp_backend "host"?
URL must be:
https://files.linux4sam.org/pub/demo/linux4sam_4.7/linux4sam-poky-sama5d3xek-4.7.zip
Regards,
Nicolas
--
Nicolas Ferre
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-08 7:55 ` Nicolas Ferre
@ 2022-03-09 8:30 ` Peter Rosin
[not found] ` <6d9561a4-39e4-3dbe-5fe2-c6f88ee2a4c6@axentia.se>
0 siblings, 1 reply; 39+ messages in thread
From: Peter Rosin @ 2022-03-09 8:30 UTC (permalink / raw)
To: Nicolas Ferre, Tudor.Ambarus, saravanak
Cc: alexandre.belloni, gregkh, linux-kernel, du, Ludovic.Desroches,
linux-arm-kernel
On 2022-03-08 08:55, Nicolas Ferre wrote:
> On 07/03/2022 at 21:32, Peter Rosin wrote:
>> On 2022-03-07 12:32, Peter Rosin wrote:
>>> On 2022-03-07 10:45, Tudor.Ambarus@microchip.com wrote:
>>>> Peter, would it be worth doing a test on your board similar to what I did?
>>>> I'm thinking whether the source of interrupts matters or not. So can you
>>>> disable your USB and use a mtd NAND stress test as a source of interrupts?
>>>> mtd_stresstest together with scp or hexdump.
>>>
>>> That's not a quick test for me, since I don't have modules enabled.
>>> I have located my SAMA5D31 evaluation kit, and I think I will try
>>> to get that running instead.
>>
>> Hi again!
>>
>> I got my SAMA5D31EK board running, using a freshly built at91bootstrap
>> and u-boot according to linux4sam.org and using the cross compiler I
>> have used from buildroot 2021.08, i.e. gcc 10.3.0, then using the
>> dtb for the ME20 from the original post and the same kernel and userspace
>> as I have used previously. Now, that dtb describes things that may not
>> actually be there etc etc, and I will try with a proper dtb as well
>> tomorrow, so this was just a quick-n-dirty test. I also added mem=64MB
>> to the kernel command line, to mimic our "Linea" CPU module and get a
>> bit quicker turnaround in the page cache.
>>
>> Anyway, with that setup I can reproduce the problem on the EK board.
>>
>> $ while :; do cat testfile | sha256sum; done
>> 5a939c69dd60a1f991e43d278d2e824a0e7376600a6b20c8e8b347871c546f9b -
>> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
>> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
>> 250556db0a6ac3c3e101ae2845da48ebb39a0c12d4c9b9eec5ea229c426bcce9 -
>> 874c694ed002b04b67bf354a95ee521effd07e198f363e02cd63069a94fd4df8 -
>> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
>> c3a918a923ff2d504a45ffa51289e69fb6d8aa1140cca3fd9f30703b18d9e509 -
>> 1577ed72d2f296f9adc50707e0e56547ecb311fa21ad875a3d55ca908c440307 -
>> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
>> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
>>
>>
>> But apparently only if I have an FTDI usb-serial adapter attached
>> while I boot. I also start to get good hashes if I remove and
>> reinsert the FTDI adapter, which is interesting.
>>
>> $ while :; do cat testfile | sha256sum; done
>> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
>> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
>> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
>> ...
>> *snip many dozens of lines*
>> ...
>> 7bf74cf37c8bf81ad4f8e86da8eb129a8ae0ee0f5a22ac584ad39233b97acb4d -
>>
>> It's of course hard to prove the absence of trouble, but it feels
>> like it is working from both of those latter cases...
>>
>> (for the "real" case the FTDI usb-serial adapter is soldered in,
>> with no easy way to make it go away, so it is not as easy to do the
>> same test there.)
>>
>> I'll try to reduce the number of local parts of the setup further
>> tomorrow, such as the dtb mentioned above and the rootfs. I was
>> hoping for a binary download of prebuilt parts, but some links on
>>
>> https://www.linux4sam.org/bin/view/Linux4SAM/Sama5d3xekMainPage
>>
>> are bogus. E.g.
>>
>> ftp://twiki.lnx4mchp_backend/pub/demo/linux4sam_4.7/linux4sam-poky-sama5d3xek-4.7.zip
>
> Okay, that's a bug in the TWiki page.
>> What's up with that twiki.lnx4mchp_backend "host"?
>
> URL must be:
> https://files.linux4sam.org/pub/demo/linux4sam_4.7/linux4sam-poky-sama5d3xek-4.7.zip
Thanks,
I ended up not using that anyway since it didn't reproduce right
away. So, I went back to something I knew was workable and built
a smaller reproducer that isn't depending on any of our code. I
uploaded it to github.
https://github.com/peda-r/sama5d31
I make that, then flash it from the output sam-ba dir with sam-ba 3.2.
$ make
... *snip* *snip* *snip* *snip* ...
$ cd sam-ba
$ .../sam-ba_3.2.1/sam-ba -x prog-sama5d31ek.qml ttyACM0
... *snip* ...
Then on first boot, I append mem=64MB to the kernel command line.
Also, since I no longer have anything else that accesses the serial
ports I need something to make them fire USB interrupts, hence the
"cat /dev/ttyUSB0 &" etc commands in the below transcript. I have
also bumped the testfile to 50MB since there are fewer things going
on, and thus more memory available for the page cache.
I have the FTDI serial adapter in the top USB slot since the udev
rule that sets the latency_timer to 1ms is written as it is; it
is based on what we use for the soldered in case on the "real"
hardware. It shouldn't really matter, I can connect the FTDI serial
adapter to the other USB port and set the latency_timers to 1ms
manually and still reproduce.
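For reference, setting the latency timers by hand can be sketched roughly
as below. The sysfs path is the one used earlier in this thread; the
function name set_latency_timers and its optional base-directory argument
are made up for illustration (the second argument exists only so the loop
can be tried against a scratch directory instead of the real sysfs).

```sh
#!/bin/sh
# Write the given latency (in ms) to every ttyUSB* latency_timer file.
# $1: latency in milliseconds
# $2: base directory (hypothetical parameter; defaults to the real sysfs path)
set_latency_timers() {
    ms=$1
    base=${2:-/sys/bus/usb-serial/devices}
    for t in "$base"/ttyUSB*/latency_timer; do
        # Skip the unexpanded glob when no adapter is present.
        [ -e "$t" ] || continue
        echo "$ms" > "$t"
    done
}
```

Calling "set_latency_timers 1" as root would then correspond to what the
udev rule does, and "set_latency_timers 100" to the lower interrupt
pressure case from the original report.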
I have some trouble getting the network going on the EK board,
and I plan to dig into that next and check if I can also reproduce
with the scp load. I'm not too hopeful though, since I fail to
reproduce even with the "cat testfile | sha256sum" load when the FTDI
serial adapter has not been connected all the time since boot. That
makes me think that the issue is there for the scp load only because
the FTDI serial adapter is always there on the "real" board, and
that it will be hard to reproduce without that adapter in place.
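To compare runs without eyeballing the hashes, the bad ones can be
counted. This is only a minimal sketch: count_bad_hashes is a hypothetical
helper (it is not part of the repository above), and it assumes that the
most frequent hash is the correct one. That assumption looks reasonable
for the transcript below, where the bad hashes all differ from each other,
but it would break down if almost every run were corrupted.

```sh
#!/bin/sh
# Hash FILE through a pipe RUNS times and print how many of the results
# differ from the most frequent hash (assumed to be the good one).
# $1: file to hash
# $2: number of runs
count_bad_hashes() {
    file=$1
    runs=$2
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Same load as the reproducer: cat into sha256sum via a pipe.
        cat "$file" | sha256sum
        i=$((i + 1))
    done |
        sort | uniq -c | sort -rn |
        # The first line is the most frequent hash; sum the counts of the rest.
        awk 'NR > 1 { bad += $1 } END { print bad + 0 }'
}
```

Something like "count_bad_hashes testfile 20" then gives a single number
per configuration, which may make it easier to judge whether a change
(FTDI adapter reinserted, different latency_timer) really helps or just
makes the problem less frequent.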
Cheers,
Peter
-------------- transcript --------------
RomBOOT
AT91Bootstrap 3.10.3 (2022-03-08 17:40:20)
1-Wire: Loading 1-Wire information ...
1-Wire: ROM Searching ... Done, 2 1-Wire chips found
1-Wire: BoardName | [Revid] | VendorName
#0 SAMA5D31-CM [DD4] EMBEST
#1 SAMA5D3x-MB [CC3] FLEX
1-Wire: Board sn: 0x480002a revision: 0x620803
NAND: ONFI flash detected
NAND: Manufacturer ID: 0x2c Chip ID: 0xda
NAND: Page Bytes: 2048, Spare Bytes: 64
NAND: ECC Correctability Bits: 4, ECC Sector Bytes: 512
NAND: Disable On-Die ECC
NAND: Initialize PMECC params, cap: 4, sector: 512
NAND: Image: Copy 0xa0000 bytes from 0x40000 to 0x26f00000
NAND: Done to load image
<debug_uart>
U-Boot 2017.03-linux4sam_5.8 (Mar 08 2022 - 17:40:32 +0100)
CPU: SAMA5D31
Crystal frequency: 12 MHz
CPU clock : 528 MHz
Master clock : 132 MHz
DRAM: 512 MiB
Flash: 16 MiB
NAND: 256 MiB
MMC: Atmel mci: 0, Atmel mci: 1
*** Warning - bad CRC, using default environment
In: serial
Out: serial
Err: serial
Net:
Error: ethernet@f0028000 address not set.
No ethernet found.
Hit any key to stop autoboot: 0
=> printenv bootargs
bootargs=console=ttyS0,115200 earlyprintk mtdparts=atmel_nand:256k(bootstrap)ro,768k(uboot)ro,256K(env_redundant),256k(env),512k(dtb),6M(kernel)ro,-(rootfs) rootfstype=ubifs ubi.mtd=6 root=ubi0:rootfs
=> setenv bootargs console=ttyS0,115200 earlyprintk mtdparts=atmel_nand:256k(bootstrap)ro,768k(uboot)ro,256K(env_redundant),256k(env),512k(dtb),6M(kernel)ro,-(rootfs) rootfstype=ubifs ubi.mtd=6 root=ubi0:rootfs mem=64MB
=> saveenv
Saving Environment to NAND...
Erasing redundant NAND...
Erasing at 0x100000 -- 100% complete.
Writing to redundant NAND... OK
=> boot
NAND read: device 0 offset 0x180000, size 0x80000
524288 bytes read: OK
NAND read: device 0 offset 0x200000, size 0x600000
6291456 bytes read: OK
## Flattened Device Tree blob at 21000000
Booting using the fdt blob at 0x21000000
Loading Device Tree to 3fb42000, end 3fb4b8cf ... OK
Starting kernel ...
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 5.17.0-rc7 (peda@orc) (arm-buildroot-linux-gnueabihf-gcc.br_real (Buildroot 2021.08.3) 10.3.0, GNU ld (GNU Binutils) 2.36.1) #1 Tue Mar 8 17:48:36 CET 2022
[ 0.000000] CPU: ARMv7 Processor [410fc051] revision 1 (ARMv7), cr=10c53c7d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[ 0.000000] OF: fdt: Machine model: Atmel SAMA5D31-EK
[ 0.000000] Memory policy: Data cache writeback
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000020000000-0x0000000023ffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000020000000-0x0000000023ffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000020000000-0x0000000023ffffff]
[ 0.000000] CPU: All CPU(s) started in SVC mode.
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 16256
[ 0.000000] Kernel command line: console=ttyS0,115200 earlyprintk mtdparts=atmel_nand:256k(bootstrap)ro,768k(uboot)ro,256K(env_redundant),256k(env),512k(dtb),6M(kernel)ro,-(rootfs) rootfstype=ubifs ubi.mtd=6 root=ubi0:rootfs mem=64MB
[ 0.000000] Unknown kernel command line parameters "earlyprintk", will be passed to user space.
[ 0.000000] Dentry cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 4096 (order: 2, 16384 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 54160K/65536K available (7168K kernel code, 325K rwdata, 1344K rodata, 1024K init, 104K bss, 11376K reserved, 0K cma-reserved)
[ 0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[ 0.000000] random: get_random_bytes called from start_kernel+0x3ec/0x524 with crng_init=0
[ 0.000000] clocksource: timer@f0010000: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 115833966437 ns
[ 0.000004] sched_clock: 32 bits at 16MHz, resolution 60ns, wraps every 130150523873ns
[ 0.000056] Switching to timer-based delay loop, resolution 60ns
[ 0.000477] clocksource: pit: mask: 0xfffffff max_cycles: 0xfffffff, max_idle_ns: 14479245754 ns
[ 0.001100] Console: colour dummy device 80x30
[ 0.001189] Calibrating delay loop (skipped), value calculated using timer frequency.. 33.00 BogoMIPS (lpj=165000)
[ 0.001241] pid_max: default: 32768 minimum: 301
[ 0.001504] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.001565] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.002635] CPU: Testing write buffer coherency: ok
[ 0.003882] Setting up static identity map for 0x20100000 - 0x20100060
[ 0.005538] devtmpfs: initialized
[ 0.016983] VFP support v0.3: implementor 41 architecture 2 part 30 variant 5 rev 1
[ 0.017461] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.017533] futex hash table entries: 256 (order: -1, 3072 bytes, linear)
[ 0.017699] pinctrl core: initialized pinctrl subsystem
[ 0.019515] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 0.020668] DMA: preallocated 256 KiB pool for atomic coherent allocations
[ 0.057473] AT91: PM: standby: standby, suspend: ulp0
[ 0.057529] No ATAGs?
[ 0.058784] gpio-at91 fffff200.gpio: at address (ptrval)
[ 0.060184] gpio-at91 fffff400.gpio: at address (ptrval)
[ 0.061631] gpio-at91 fffff600.gpio: at address (ptrval)
[ 0.063128] gpio-at91 fffff800.gpio: at address (ptrval)
[ 0.064739] gpio-at91 fffffa00.gpio: at address (ptrval)
[ 0.066585] pinctrl-at91 ahb:apb:pinctrl@fffff200: initialized AT91 pinctrl driver
[ 0.080562] at_hdmac ffffe600.dma-controller: Atmel AHB DMA Controller ( cpy set slave ), 8 channels
[ 0.082434] at_hdmac ffffe800.dma-controller: Atmel AHB DMA Controller ( cpy set slave ), 8 channels
[ 0.084762] AT91: Detected SoC family: sama5d3
[ 0.084805] AT91: Detected SoC: sama5d31, revision 2
[ 0.085672] SCSI subsystem initialized
[ 0.086186] usbcore: registered new interface driver usbfs
[ 0.086329] usbcore: registered new interface driver hub
[ 0.086466] usbcore: registered new device driver usb
[ 0.087663] at91_i2c f0014000.i2c: using dma0chan0 (tx) and dma0chan1 (rx) for DMA transfers
[ 0.088083] i2c i2c-0: using pinctrl states for GPIO recovery
[ 0.088224] i2c i2c-0: using generic GPIOs for recovery
[ 0.088698] at91_i2c f0014000.i2c: AT91 i2c bus driver (hw version: 0x402).
[ 0.089833] at91_i2c f0018000.i2c: using dma0chan2 (tx) and dma0chan3 (rx) for DMA transfers
[ 0.090295] i2c i2c-1: using pinctrl states for GPIO recovery
[ 0.090433] i2c i2c-1: using generic GPIOs for recovery
[ 0.092266] at91_i2c f0018000.i2c: AT91 i2c bus driver (hw version: 0x402).
[ 0.093647] Advanced Linux Sound Architecture Driver Initialized.
[ 0.095756] clocksource: Switched to clocksource timer@f0010000
[ 0.118209] NET: Registered PF_INET protocol family
[ 0.118510] IP idents hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.119613] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[ 0.119704] TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.119761] TCP bind hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.119809] TCP: Hash tables configured (established 1024 bind 1024)
[ 0.120102] UDP hash table entries: 256 (order: 0, 4096 bytes, linear)
[ 0.120186] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes, linear)
[ 0.120498] NET: Registered PF_UNIX/PF_LOCAL protocol family
[ 0.122255] workingset: timestamp_bits=30 max_order=14 bucket_order=0
[ 0.123405] io scheduler mq-deadline registered
[ 0.123457] io scheduler kyber registered
[ 0.136138] brd: module loaded
[ 0.149150] loop: module loaded
[ 0.149846] ssc f0008000.ssc: Atmel SSC device at 0x(ptrval) (irq 21)
[ 0.151507] atmel_usart_serial.0.auto: ttyS2 at MMIO 0xf0020000 (irq = 24, base_baud = 4125000) is a ATMEL_SERIAL
[ 0.153485] atmel_usart_serial.1.auto: ttyS0 at MMIO 0xffffee00 (irq = 34, base_baud = 8250000) is a ATMEL_SERIAL
[ 0.705038] printk: console [ttyS0] enabled
[ 0.716182] macb f802c000.ethernet: invalid hw address, using random
[ 0.751175] macb f802c000.ethernet eth0: Cadence MACB rev 0x0001010c at 0xf802c000 irq 42 (d2:e4:fe:11:9c:b2)
[ 0.761741] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 0.768332] ehci-atmel: EHCI Atmel driver
[ 0.776581] atmel-ehci 700000.ehci: EHCI Host Controller
[ 0.781999] atmel-ehci 700000.ehci: new USB bus registered, assigned bus number 1
[ 0.789730] atmel-ehci 700000.ehci: irq 44, io mem 0x00700000
[ 0.820024] atmel-ehci 700000.ehci: USB 2.0 started, EHCI 1.00
[ 0.826275] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.17
[ 0.834597] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 0.841838] usb usb1: Product: EHCI Host Controller
[ 0.846697] usb usb1: Manufacturer: Linux 5.17.0-rc7 ehci_hcd
[ 0.852481] usb usb1: SerialNumber: 700000.ehci
[ 0.858200] hub 1-0:1.0: USB hub found
[ 0.862177] hub 1-0:1.0: 3 ports detected
[ 0.867560] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 0.873824] ohci-atmel: OHCI Atmel driver
[ 0.879375] at91_ohci 600000.ohci: USB Host Controller
[ 0.884635] at91_ohci 600000.ohci: new USB bus registered, assigned bus number 2
[ 0.892287] at91_ohci 600000.ohci: irq 44, io mem 0x00600000
[ 0.964328] usb usb2: New USB device found, idVendor=1d6b, idProduct=0001, bcdDevice= 5.17
[ 0.972650] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 0.979859] usb usb2: Product: USB Host Controller
[ 0.984686] usb usb2: Manufacturer: Linux 5.17.0-rc7 ohci_hcd
[ 0.990440] usb usb2: SerialNumber: at91
[ 0.995541] hub 2-0:1.0: USB hub found
[ 0.999369] hub 2-0:1.0: 3 ports detected
[ 1.005896] usbcore: registered new interface driver uas
[ 1.011436] usbcore: registered new interface driver usb-storage
[ 1.017508] usbcore: registered new interface driver ums-alauda
[ 1.023564] usbcore: registered new interface driver ums-cypress
[ 1.029629] usbcore: registered new interface driver ums-datafab
[ 1.035731] usbcore: registered new interface driver ums_eneub6250
[ 1.042038] usbcore: registered new interface driver ums-freecom
[ 1.048095] usbcore: registered new interface driver ums-isd200
[ 1.054109] usbcore: registered new interface driver ums-jumpshot
[ 1.060291] usbcore: registered new interface driver ums-karma
[ 1.066174] usbcore: registered new interface driver ums-onetouch
[ 1.072367] usbcore: registered new interface driver ums-realtek
[ 1.078434] usbcore: registered new interface driver ums-sddr09
[ 1.084463] usbcore: registered new interface driver ums-sddr55
[ 1.090482] usbcore: registered new interface driver ums-usbat
[ 1.096513] usbcore: registered new interface driver ftdi_sio
[ 1.102376] usbserial: USB Serial support registered for FTDI USB Serial Device
[ 1.110721] atmel_usba_udc 500000.gadget: MMIO registers at [mem 0xf8030000-0xf8033fff] mapped at (ptrval)
[ 1.120587] atmel_usba_udc 500000.gadget: FIFO at [mem 0x00500000-0x005fffff] mapped at (ptrval)
[ 1.132265] g_serial gadget: Gadget Serial v2.4
[ 1.136785] g_serial gadget: g_serial ready
[ 1.143169] at91_rtc fffffeb0.rtc: registered as rtc0
[ 1.148247] at91_rtc fffffeb0.rtc: setting system clock to 2015-05-16T14:19:33 UTC (1431785973)
[ 1.157038] at91_rtc fffffeb0.rtc: AT91 Real Time Clock driver.
[ 1.163257] i2c_dev: i2c /dev entries driver
[ 1.169663] at91-reset fffffe00.rstc: Starting after user reset
[ 1.176794] at91_wdt fffffe40.watchdog: watchdog is disabled
[ 1.182495] at91_wdt: probe of fffffe40.watchdog failed with error -22
[ 1.190832] atmel_aes f8038000.aes: version: 0x135
[ 1.196145] atmel_aes f8038000.aes: Atmel AES - Using dma1chan0, dma1chan1 for DMA transfers
[ 1.205373] atmel_sha f8034000.sha: version: 0x410
[ 1.210437] atmel_sha f8034000.sha: using dma1chan2 for DMA transfers
[ 1.216976] atmel_sha f8034000.sha: Atmel SHA1/SHA256/SHA224/SHA384/SHA512
[ 1.224567] atmel_tdes f803c000.tdes: version: 0x701
[ 1.229943] atmel_tdes f803c000.tdes: using dma1chan3, dma1chan4 for DMA transfers
[ 1.237747] atmel_tdes f803c000.tdes: Atmel DES/TDES
[ 1.243284] usbcore: registered new interface driver usbhid
[ 1.248839] usbhid: USB HID core driver
[ 1.258067] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[ 1.264472] nand: Micron MT29F2G08ABAEAWP
[ 1.268447] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[ 1.276874] usb 1-2: new high-speed USB device number 2 using atmel-ehci
[ 1.286322] Bad block table not found for chip 0
[ 1.293124] Bad block table not found for chip 0
[ 1.297723] Scanning device for bad blocks
[ 1.500885] Bad block table written to 0x00000ffe0000, version 0x01
[ 1.508187] Bad block table written to 0x00000ffc0000, version 0x01
[ 1.514569] 7 cmdlinepart partitions found on MTD device atmel_nand
[ 1.520869] Creating 7 MTD partitions on "atmel_nand":
[ 1.525983] 0x000000000000-0x000000040000 : "bootstrap"
[ 1.532203] mtdblock: MTD device 'bootstrap' is NAND, please consider using UBI block devices instead.
[ 1.543914] 0x000000040000-0x000000100000 : "uboot"
[ 1.549845] mtdblock: MTD device 'uboot' is NAND, please consider using UBI block devices instead.
[ 1.560705] 0x000000100000-0x000000140000 : "env_redundant"
[ 1.567241] mtdblock: MTD device 'env_redundant' is NAND, please consider using UBI block devices instead.
[ 1.579007] 0x000000140000-0x000000180000 : "env"
[ 1.584771] mtdblock: MTD device 'env' is NAND, please consider using UBI block devices instead.
[ 1.595462] 0x000000180000-0x000000200000 : "dtb"
[ 1.601194] mtdblock: MTD device 'dtb' is NAND, please consider using UBI block devices instead.
[ 1.611800] 0x000000200000-0x000000800000 : "kernel"
[ 1.617732] mtdblock: MTD device 'kernel' is NAND, please consider using UBI block devices instead.
[ 1.629150] 0x000000800000-0x000010000000 : "rootfs"
[ 1.637058] usb 1-2: New USB device found, idVendor=0403, idProduct=6011, bcdDevice= 8.00
[ 1.645340] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 1.652514] usb 1-2: Product: Quad RS232-HS
[ 1.656687] usb 1-2: Manufacturer: FTDI
[ 1.661074] mtdblock: MTD device 'rootfs' is NAND, please consider using UBI block devices instead.
[ 1.673740] iio iio:device0: Resolution used: 12 bits
[ 1.679427] input: at91_adc as /devices/platform/ahb/ahb:apb/f8018000.adc/input/input0
[ 1.687403] random: fast init done
[ 1.694695] ftdi_sio 1-2:1.0: FTDI USB Serial Device converter detected
[ 1.701699] usb 1-2: Detected FT4232H
[ 1.707813] xt_time: kernel timezone is -0000
[ 1.712672] gre: GRE over IPv4 demultiplexor driver
[ 1.717681] Initializing XFRM netlink socket
[ 1.722278] NET: Registered PF_INET6 protocol family
[ 1.729303] usb 1-2: FTDI USB Serial Device converter now attached to ttyUSB0
[ 1.738202] Segment Routing with IPv6
[ 1.741969] In-situ OAM (IOAM) with IPv6
[ 1.746187] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[ 1.753545] NET: Registered PF_PACKET protocol family
[ 1.759611] ftdi_sio 1-2:1.1: FTDI USB Serial Device converter detected
[ 1.766611] usb 1-2: Detected FT4232H
[ 1.771399] usb 1-2: FTDI USB Serial Device converter now attached to ttyUSB1
[ 1.783894] ftdi_sio 1-2:1.2: FTDI USB Serial Device converter detected
[ 1.790948] usb 1-2: Detected FT4232H
[ 1.798470] usb 1-2: FTDI USB Serial Device converter now attached to ttyUSB2
[ 1.807567] ftdi_sio 1-2:1.3: FTDI USB Serial Device converter detected
[ 1.814557] usb 1-2: Detected FT4232H
[ 1.820308] usb 1-2: FTDI USB Serial Device converter now attached to ttyUSB3
[ 1.843413] ubi0: attaching mtd6
[ 2.648198] ubi0: scanning is finished
[ 2.674329] gluebi (pid 1): gluebi_resized: got update notification for unknown UBI device 0 volume 0
[ 2.683623] ubi0: volume 0 ("rootfs") re-sized from 90 to 1940 LEBs
[ 2.691016] ubi0: attached mtd6 (name "rootfs", size 248 MiB)
[ 2.696764] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[ 2.703703] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[ 2.710494] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[ 2.717454] ubi0: good PEBs: 1980, bad PEBs: 4, corrupted PEBs: 0
[ 2.723582] ubi0: user volume: 1, internal volumes: 1, max. volumes count: 128
[ 2.730822] ubi0: max/mean erase counter: 1/0, WL threshold: 4096, image sequence number: 1391204677
[ 2.739970] ubi0: available PEBs: 0, total reserved PEBs: 1980, PEBs reserved for bad PEB handling: 36
[ 2.749603] ubi0: background thread "ubi_bgt0d" started, PID 67
[ 2.758960] ALSA device list:
[ 2.761952] No soundcards found.
[ 2.769813] UBIFS (ubi0:0): Mounting in unauthenticated mode
[ 2.882936] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0, name "rootfs", R/O mode
[ 2.891290] UBIFS (ubi0:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[ 2.901241] UBIFS (ubi0:0): FS size: 244936704 bytes (233 MiB, 1929 LEBs), max 2048 LEBs, journal size 9023488 bytes (8 MiB, 72 LEBs)
[ 2.913292] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
[ 2.919115] UBIFS (ubi0:0): media format: w4/r0 (latest is w5/r0), UUID 6AAC8EC5-1B1E-4E71-9F6F-EEB719E02AFC, small LPT model
[ 2.935358] VFS: Mounted root (ubifs filesystem) readonly on device 0:13.
[ 2.945679] devtmpfs: mounted
[ 2.951449] Freeing unused kernel image (initmem) memory: 1024K
[ 2.958144] Run /sbin/init as init process
[ 3.303513] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 70
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Populating /dev using udev: [ 4.159275] udevd[97]: starting version 3.2.10
[ 4.193870] random: udevd: uninitialized urandom read (16 bytes read)
[ 4.226640] random: udevd: uninitialized urandom read (16 bytes read)
[ 4.236763] random: udevd: uninitialized urandom read (16 bytes read)
[ 4.325369] random: crng init done
[ 4.353551] udevd[98]: starting eudev-3.2.10
[ 6.162090] ubi0 error: ubi_open_volume: cannot open device 0, volume 0, error -16
[ 6.214815] ubi0 error: ubi_open_volume: cannot open device 0, volume 0, error -16
done
Saving random seed: OK
Starting network: [ 7.090546] macb f802c000.ethernet eth0: PHY [f802c000.ethernet-ffffffff:01] driver [Micrel KSZ8031] (irq=45)
[ 7.150491] macb f802c000.ethernet eth0: configuring for phy/rmii link mode
udhcpc: started, v1.33.1
udhcpc: sending discover
udhcpc: sending discover
udhcpc: sending discover
udhcpc: no lease, failing
FAIL
ssh-keygen: generating new host keys: RSA DSA ECDSA ED25519
Starting sshd: OK
Welcome to Buildroot
buildroot login: root
root@buildroot:~# cat /sys/bus/usb-serial/devices/ttyUSB?/latency_timer
1
1
1
1
root@buildroot:~# cat inittest.sh
#! /bin/sh
echo "generating random file"
dd if=/dev/urandom of=testfile bs=1024 count=50000
root@buildroot:~# ./inittest.sh
generating random file
50000+0 records in
50000+0 records out
root@buildroot:~# cat /dev/ttyUSB0 &
root@buildroot:~# cat /dev/ttyUSB1 &
root@buildroot:~# cat /dev/ttyUSB2 &
root@buildroot:~# cat /dev/ttyUSB3 &
root@buildroot:~# cat runtest.sh
#! /bin/sh
while :; do cat testfile | sha256sum; done
root@buildroot:~# ./runtest.sh
abd6ded5a6eb1467e4b48909bfae35cea2191d417c3f27022954cee103c334ca -
98d03c79185168cbff6dc8db32e931061aa9e7c35025b7507f89faa208e12b6f -
1464940fc3cc527f89f153ec79ae7c8c892948ae013e6f54fba64664930e9ec4 -
98d03c79185168cbff6dc8db32e931061aa9e7c35025b7507f89faa208e12b6f -
326320e5a50777f8db772b6d06ac1beab246c32c66c75cefc0ace12f73394d68 -
d79664b5e2d461ce6617be24c1fbeab551b8fed0501e596ba09f1977b0fd70ee -
c362e254b14024fc46c4f18d7d10dc9424688c4d842ba6672361da12420a58fa -
be35c862a57e8a751af8517f3dc6f257ba1f18157b643ca3e8919f827e37e241 -
98d03c79185168cbff6dc8db32e931061aa9e7c35025b7507f89faa208e12b6f -
087eba1b603365320c9379391521791c5cd2ddce9a77e230ccb5bd67b2e856d0 -
22b6b0eb1d9428360fcd930c47bc41e566337a824bd66c5a468bfdf8adf89b36 -
^C
root@buildroot:~#
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-04 10:57 ` Peter Rosin
2022-03-04 11:12 ` Tudor.Ambarus
@ 2022-03-04 20:06 ` Saravana Kannan
1 sibling, 0 replies; 39+ messages in thread
From: Saravana Kannan @ 2022-03-04 20:06 UTC (permalink / raw)
To: Peter Rosin
Cc: linux-kernel, linux-arm-kernel, Nicolas Ferre, Alexandre Belloni,
Ludovic Desroches, Daniels Umanovskis, Greg Kroah-Hartman
On Fri, Mar 4, 2022 at 2:57 AM Peter Rosin <peda@axentia.se> wrote:
>
> On 2022-03-04 07:57, Peter Rosin wrote:
> > On 2022-03-04 04:55, Saravana Kannan wrote:
> >> On Thu, Mar 3, 2022 at 1:17 AM Peter Rosin <peda@axentia.se> wrote:
> >>>
> >>> On 2022-03-03 04:02, Saravana Kannan wrote:
> >>>> On Wed, Mar 2, 2022 at 4:29 PM Peter Rosin <peda@axentia.se> wrote:
> >>>>>
> >>>>> Hi!
> >>>>>
> >>>>> I'm seeing a weird problem, and I'd like some help with further
> >>>>> things to try in order to track down what's going on. I have
> >>>>> bisected the issue to
> >>>>>
> >>>>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
> >>>>
> >>>> I skimmed through your email and I'll read it more closely tomorrow,
> >>>> but it wasn't clear if you see this on Linus's tip of the tree too.
> >>>> Asking because of:
> >>>> https://lore.kernel.org/lkml/20210930085714.2057460-1-yangyingliang@huawei.com/
> >>>>
> >>>> Also, a couple of other data points that _might_ help. Try kernel
> >>>> command line option fw_devlink=permissive vs fw_devlink=on (I forget
> >>>> if this was the default by 5.10) vs fw_devlink=off.
> >>>>
> >>>> I'm expecting "off" to fix the issue for you. But if permissive vs on
> >>>> shows a difference driver issues would start becoming a real
> >>>> possibility.
> >>>>
> >>>> -Saravana
> >>>
> >>> Thanks for the quick reply! I don't think I tested the very tip of
> >>> Linus tree before, only latest rc or something like that, but now I
> >>> have. I.e.
> >>>
> >>> 5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
> >>>
> >>> It would have been typical if an issue that existed for a couple of
> >>> years had been fixed the last few weeks, but alas, no.
> >>>
> >>> On that kernel, and with whatever the default fw_devlink value is, the
> >>
> >> It's fw_devlink=on by default from at least 5.12-rc4 or so.
> >>
> >>> issue is there. It's a bit hard to tell if the incident probability
> >>> is the same when trying fw_devlink arguments, but roughly so, and I
> >>> do not have to wait for long to get a bad hash with the first
> >>> reproducer
> >>>
> >>> while :; do cat testfile | sha256sum; done
> >>>
> >>> The output is typical:
> >>> 78464c59faa203413aceb5f75de85bbf4cde64f21b2d0449a2d72cd2aadac2a3 -
> >>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> >>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> >>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> >>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> >>> e03c5524ac6d16622b6c43f917aae730bc0793643f461253c4646b860c1a7215 -
> >>> 1b8db6218f481cb8e4316c26118918359e764cc2c29393fd9ef4f2730274bb00 -
> >>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> >>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> >>> 7d60bf848911d3b919d26941be33c928c666e9e5666f392d905af2d62d400570 -
> >>> 212e1fe02c24134857ffb098f1834a2d87c655e0e5b9e08d4929f49a070be97c -
> >>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> >>> 7e33e751eb99a0f63b4f7d64b0a24f3306ffaf7c4bc4b27b82e5886c8ea31bc3 -
> >>> d7a1f08aa9d0374d46d828fc3582f5927e076ff229b38c28089007cd0599c645 -
> >>> 4fc963b7c7b14df9d669500f7c062bf378ff2751f705bb91eecd20d2f896f6fe -
> >>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> >>> 9360d886046c12d983b8bc73dd22302c57b0aafe58215700604fa977b4715fbe -
> >>> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> >>>
> >>> Setting fw_devlink=off makes no difference, AFAICT.
> >>
> >> By this, I'm assuming you set fw_devlink=off in the kernel command
> >> line and you still saw the corruption.
> >
> > Yes. On a bad kernel it's the same with all of the following kernel
> > command lines.
> >
> > console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=on ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
> >
> > console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=off ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
> >
> > console=ttyS0,115200 rw oops=panic panic=30 fw_devlink=permissive ip=none root=ubi0:rootfs ubi.mtd=6 rootfstype=ubifs noinitrd mtdparts=atmel_nand:256k(at91bootstrap),384k(barebox),256k@768k(bareboxenv),256k(bareboxenv2),128k@1536k(oftree),5M@2M(kernel),248M@8M(rootfs),-@256M(ovlfs)
> >
> >> If that's the case, I can't see how this could possibly have anything
> >> to do with:
> >> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
> >>
> >> If you look at fw_devlink_link_device(), you'll see that the function
> >> is NOP if fw_devlink=off (the !fw_devlink_flags check). And from
> >> there, the rest of the code in the series doesn't run because more
> >> fields wouldn't get set, etc. That pretty much disables ALL the code
> >> in the entire series. The only remaining diff would be header file
> >> changes where I add/remove fields. But that's unlikely to cause any
> >> issues here because I'm either deleting fields that aren't used or
> >> adding fields that won't be used (with fw_devlink=off). I think the
> >> patch was just causing enough timing changes that it's masking the
> >> real issue.
> >
> > When I compare fw_devlink_link_device() from before and after
> > f9aa460672c9 ("driver core: Refactor fw_devlink feature")
> > I notice that you also removed an unconditional call to
> > device_link_add_missing_supplier_links() that was live before,
> > regardless of any fw_devlink parameter.
> >
> > I don't know if that's relevant. Is it?
> >
> > Not knowing this code at all, and without any serious attempt
> > at reading it, from here the comment of that removed function
> > sure looks like it might cause a different ordering before and
> > after the patch that is not restored with any fw_devlink
> > argument.
>
> It appears that the device_link_add_missing_supplier_links() difference
> is not relevant after all. What actually happened in the header file in
> the "bad" commit was that two fields were removed (none added). Like so:
>
> struct dev_links_info {
> struct list_head suppliers;
> struct list_head consumers;
> - struct list_head needs_suppliers;
> struct list_head defer_sync;
> - bool need_for_probe;
> enum dl_dev_state status;
> };
>
> If I restore those fields on a bad kernel, the issue is no longer
> visible. That is true for the first bad kernel, i.e.
Ha... I thought this might be a possibility but I wasn't sure. Which
is why I kinda left it at:
"The only remaining diff would be header file
changes where I add/remove fields. But that's unlikely to cause any
issues here because I'm either deleting fields that aren't used or
adding fields that won't be used (with fw_devlink=off)."
Ok, at this point I'm going to ignore this thread. Call me out
explicitly if you want me to pay attention :)
-Saravana
* Re: Regression: memory corruption on Atmel SAMA5D31
2022-03-03 0:29 Regression: memory corruption on Atmel SAMA5D31 Peter Rosin
2022-03-03 3:02 ` Saravana Kannan
@ 2022-03-04 8:00 ` Thorsten Leemhuis
1 sibling, 0 replies; 39+ messages in thread
From: Thorsten Leemhuis @ 2022-03-04 8:00 UTC (permalink / raw)
To: Peter Rosin, linux-kernel, linux-arm-kernel, Nicolas Ferre,
Alexandre Belloni, Ludovic Desroches, regressions
Cc: Saravana Kannan, Daniels Umanovskis, Greg Kroah-Hartman
[TLDR: I'm adding the regression report below to regzbot, the Linux
kernel regression tracking bot; all text you find below is compiled from
a few template paragraphs you might have encountered already
from similar mails.]
Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.
CCing the regression mailing list, as it should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
Thanks for the report.
To be sure the issue below doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:
#regzbot ^introduced f9aa460672c9
#regzbot title memory corruption on Atmel SAMA5D31
#regzbot ignore-activity
Reminder for developers: when fixing the issue, please add a 'Link:'
tag pointing to the report (the mail quoted above) using
lore.kernel.org/r/, as explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'. This allows the bot to connect
the report with any patches posted or committed to fix the issue; this
again allows the bot to show the current status of regressions and
automatically resolve the issue when the fix hits the right tree.
I'm sending this to everyone that got the initial report, to make them
aware of the tracking. I also hope that messages like this motivate
people to directly get at least the regression mailing list and ideally
even regzbot involved when dealing with regressions, as messages like
this wouldn't be needed then. And don't worry, if I need to send other
mails regarding this regression only relevant for regzbot I'll send them
to the regressions lists only (with a tag in the subject so people can
filter them away). With a bit of luck no such messages will be needed
anyway.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.
On 03.03.22 01:29, Peter Rosin wrote:
> Hi!
>
> I'm seeing a weird problem, and I'd like some help with further
> things to try in order to track down what's going on. I have
> bisected the issue to
>
> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>
> The symptoms are that I get (seemingly) random memory corruption
> when processing large amounts of data (compared to system size).
> I have two known reproducers, but I'm sure there are more if I
> keep digging. One is to do this:
>
> $ dd if=/dev/urandom of=testfile bs=1024 count=40000
> 40000+0 records in
> 40000+0 records out
> 40960000 bytes (41 MB, 39 MiB) copied, 19.7759 s, 2.1 MB/s
> $ for i in 1 2 3 4; do cat testfile | sha256sum; done
> d8c85f816e08baa5ad27050bf0413e11a09f325fb0a8843b7b2b45b9333ab542 -
> f223c1cbb6dbecb02d1741e7991dc98cd8d5b40ffee05bb32dc2c15eb73d6b1f -
> d6f3e7f3d325c67e83a6104934dd8a7c891ebfd9a2cf59633dbe97fb2cbb9c81 -
> cf8ada47e7e2fee299314440b225ba83fca3cef1f6286adc160a5d4f207caccd -
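A minimal sketch for repeating the check above in a scripted way: it runs the same cat-into-sha256sum pipeline several times and prints how many distinct digests came out. The helper name count_distinct_hashes and its arguments are made up for illustration and are not part of the report; it assumes a POSIX shell with coreutils sha256sum.

```shell
#!/bin/sh
# Hash FILE through a pipe RUNS times (default 4) and print the number of
# distinct digests seen. On a healthy system the result is 1; anything
# larger means the data read differed between runs.
count_distinct_hashes() {
    file=$1
    runs=${2:-4}
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Same pipeline as in the report: cat feeding sha256sum.
        cat "$file" | sha256sum
        i=$((i + 1))
    done | cut -d' ' -f1 | sort -u | wc -l | tr -d ' '
}
```

Usage would be something like "count_distinct_hashes testfile 4"; a result other than 1 corresponds to the differing hashes shown above.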
>
> It is harder to tickle the problem if I redirect the testfile to
> sha256sum w/o involving cat or give the file as an argument to
> sha256sum. I can also get things to behave better by getting rid
> of a bunch of USB interrupts by doing the following:
>
> $ echo 100 > /sys/bus/usb-serial/devices/ttyUSB0/latency_timer
> $ echo 100 > /sys/bus/usb-serial/devices/ttyUSB1/latency_timer
> $ echo 100 > /sys/bus/usb-serial/devices/ttyUSB2/latency_timer
> $ echo 100 > /sys/bus/usb-serial/devices/ttyUSB3/latency_timer
>
> With the lower interrupt pressure I get this:
>
> $ for i in 1 2 3 4; do cat testfile | sha256sum; done
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
> 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
>
> Nice. However, I need the latency to be lower than the default
> 16ms, 3ms could perhaps work in theory, but preferably 1ms, so
> the above 100ms is far off. The initial hash run was with latency
> set to 1ms, which makes it easy to trigger the issue. The latency
> timer setting is for this driver: drivers/usb/serial/ftdi_sio.c
>
> And also, that does not help with the other reproducer, namely
> to copy that same random testfile with scp to a working system...
>
> $ scp testfile peda@xyzzy:testfile1
> testfile 100% 39MB 2.0MB/s 00:19
> $ scp testfile peda@xyzzy:testfile2
> testfile 100% 39MB 2.1MB/s 00:18
> $ scp testfile peda@xyzzy:testfile3
> testfile 100% 39MB 2.1MB/s 00:18
> $ scp testfile peda@xyzzy:testfile4
> testfile 100% 39MB 2.1MB/s 00:19
>
> ...and then perform the sha256sum on that xyzzy host instead:
>
> $ sha256sum testfile?
> 39dc3a7d05483ae7a2c64c5ed2e8e6108287bf4ddf124a2f0c1a9d0221f9ac66 testfile1
> 9597ef542e7cce879872a027d9ec591feb5fc766aeaec47d58eff6e8c6ab3206 testfile2
> c6104a700b1d6f13eb1de84b5a91a1846a3e1576e052d51a664d2e2711a3869d testfile3
> 60b9c240cb331bad530c3c1d766f50d53a24e01831bfc04e48f329b738521310 testfile4
> $ sha256sum testfile?
> 39dc3a7d05483ae7a2c64c5ed2e8e6108287bf4ddf124a2f0c1a9d0221f9ac66 testfile1
> 9597ef542e7cce879872a027d9ec591feb5fc766aeaec47d58eff6e8c6ab3206 testfile2
> c6104a700b1d6f13eb1de84b5a91a1846a3e1576e052d51a664d2e2711a3869d testfile3
> 60b9c240cb331bad530c3c1d766f50d53a24e01831bfc04e48f329b738521310 testfile4
>
> Same output every time. Of course. xyzzy is a working system...
> Converting these files to hex (hexdump -C) and diffing yields this:
>
> $ diff -u0 testfile1.hex testfile2.hex
> --- testfile1.hex 2022-03-02 23:56:38.273149516 +0100
> +++ testfile2.hex 2022-03-03 00:00:57.912747033 +0100
> @@ -8658,2 +8658,2 @@
> -00021d10 08 2a dd c6 c8 0f 0d e2 4c 1e 46 21 f9 89 a2 54 |.*......L.F!...T|
> -00021d20 23 8c 4f f1 46 f1 61 05 ee f2 d2 ee 56 79 4f 28 |#.O.F.a.....VyO(|
> +00021d10 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|
> +00021d20 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|
> @@ -20592,2 +20592,2 @@
> -000506f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|
> -00050700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|
> +000506f0 ad e6 d5 65 e6 dc c1 a3 e2 ba c9 e2 61 39 5f 5f |...e........a9__|
> +00050700 bf eb 8e 5c 08 f1 f2 89 3c 57 c5 07 b9 f4 91 fc |...\....<W......|
> @@ -461019,2 +461019,2 @@
> -00708da0 0d 49 c3 e8 57 06 20 5a c1 27 74 29 f8 83 af 69 |.I..W. Z.'t)...i|
> -00708db0 94 4d 5b 71 9f 3e e5 d2 91 cc cb cd aa ff 44 8b |.M[q.>........D.|
> +00708da0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|
> +00708db0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|
> @@ -902788,2 +902788,2 @@
> -00dc6830 f2 41 23 1b ec 54 d5 fe f0 33 51 f7 d2 fc bf bd |.A#..T...3Q.....|
> -00dc6840 e5 1f 58 df 24 2f e3 dc 65 87 b2 27 12 86 d1 9a |..X.$/..e..'....|
> +00dc6830 44 82 94 b5 c9 26 08 42 bd 89 e1 96 41 66 8a b5 |D....&.B....Af..|
> +00dc6840 a5 34 46 5e fd 1b c1 73 86 33 24 fd 4d e1 e1 68 |.4F^...s.3$.M..h|
> @@ -931900,2 +931900,2 @@
> -00e383b0 ee 64 c5 6f 38 44 5b 31 41 e1 2c 64 49 d5 f8 ad |.d.o8D[1A.,dI...|
> -00e383c0 fb 85 52 4f 00 1f 80 7a f3 de ee 8e db ac d5 bb |..RO...z........|
> +00e383b0 4b 4d 29 a1 0a 99 8f f7 32 71 8c de 23 ca a0 f1 |KM).....2q..#...|
> +00e383c0 e2 af e3 c4 a0 95 d3 1c ed 58 c4 c5 30 da 56 b9 |.........X..0.V.|
> @@ -1170109,2 +1170109,2 @@
> -011dabc0 6a 7c 0c 3c 86 1a b6 48 50 d7 98 68 0c 01 e3 1c |j|.<...HP..h....|
> -011dabd0 a3 a8 b0 f2 62 21 86 b9 d1 52 9d 74 9e 26 42 51 |....b!...R.t.&BQ|
> +011dabc0 5b 1a 9e 23 ae 58 42 68 83 58 df d6 c1 57 6b b0 |[..#.XBh.X...Wk.|
> +011dabd0 ec d5 50 8b 76 5e 96 b4 49 21 f7 e4 b7 8f a3 45 |..P.v^..I!.....E|
> @@ -1880164,2 +1880164,2 @@
> -01cb0630 1c 74 74 16 75 b4 de f7 ce 4b 5e 4d 97 d6 36 d4 |.tt.u....K^M..6.|
> -01cb0640 44 d9 fd 69 c5 d0 f0 a6 c6 44 26 53 7f 91 f3 62 |D..i.....D&S...b|
> +01cb0630 73 bc 40 ce f8 9d 99 91 1b 14 8b a8 52 2a 7b 39 |s.@.........R*{9|
> +01cb0640 6b ff f5 c5 02 b9 ab c2 c2 08 5e e7 3a 5e 69 c4 |k.........^.:^i.|
>
> Grepping (some of the above) for duplicates yields this:
>
> $ egrep "0 (08 2a dd|23 8c 4f|7b c8 d2|2a f8 fb)" testfile1.hex
> 00020d40 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|
> 00020d50 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|
> 00021d10 08 2a dd c6 c8 0f 0d e2 4c 1e 46 21 f9 89 a2 54 |.*......L.F!...T|
> 00021d20 23 8c 4f f1 46 f1 61 05 ee f2 d2 ee 56 79 4f 28 |#.O.F.a.....VyO(|
> $ egrep "0 (08 2a dd|23 8c 4f|7b c8 d2|2a f8 fb)" testfile2.hex
> 00020d40 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|
> 00020d50 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|
> 00021d10 7b c8 d2 0b f4 ca 5f ba 61 b3 93 04 59 8f ed bf |{....._.a...Y...|*
> 00021d20 2a f8 fb 0c ad 0e 23 2a 3e cf d3 10 02 ef 04 b9 |*.....#*>.......|*
>
> $ egrep "0 (1f 6c ca|8b 6d 86|ad e6 d5|bf eb 8e)" testfile1.hex
> 0004f6f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|
> 0004f700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|
> 000506f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|*
> 00050700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|*
> $ egrep "0 (1f 6c ca|8b 6d 86|ad e6 d5|bf eb 8e)" testfile2.hex
> 0004f6f0 1f 6c ca 6b a6 2a 39 a6 1f bd b0 67 5b 22 1a dd |.l.k.*9....g["..|
> 0004f700 8b 6d 86 7c 87 37 ee a8 46 4d e5 79 0e 3e 96 e6 |.m.|.7..FM.y.>..|
> 000506f0 ad e6 d5 65 e6 dc c1 a3 e2 ba c9 e2 61 39 5f 5f |...e........a9__|
> 00050700 bf eb 8e 5c 08 f1 f2 89 3c 57 c5 07 b9 f4 91 fc |...\....<W......|
>
> $ egrep "0 (0d 49 c3|94 4d 5b|d3 b4 96|5f f4 10 92)" testfile1.hex
> 00707dd0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|
> 00707de0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|
> 00708da0 0d 49 c3 e8 57 06 20 5a c1 27 74 29 f8 83 af 69 |.I..W. Z.'t)...i|
> 00708db0 94 4d 5b 71 9f 3e e5 d2 91 cc cb cd aa ff 44 8b |.M[q.>........D.|
> $ egrep "0 (0d 49 c3|94 4d 5b|d3 b4 96|5f f4 10 92)" testfile2.hex
> 00707dd0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|
> 00707de0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|
> 00708da0 d3 b4 96 d6 40 8d 79 67 69 68 fd 10 b4 15 82 e6 |....@.ygih......|*
> 00708db0 5f f4 10 92 ae 39 9d 92 42 88 44 3b be 35 38 33 |_....9..B.D;.583|*
>
> I.e. testfile1 is (probably) corrupted at 000506f0..70f while
> testfile2 is (probably) corrupted at 00021d10..2f and 00708da0..bf
> (corrupted lines marked with hand-made asterisks above)
>
> If I keep grepping like this, the pattern is similar both within
> these files and within testfile3 and testfile4. I.e. with
> corruptions in 32-byte blocks at (seemingly) random positions
> in the files. The corruption is always 16-byte-aligned and the bad
> data seems to be a copy from exactly one page up in the file.
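The manual diff-and-grep procedure above could be partly automated. The following is only a sketch under stated assumptions: diff_rows is a hypothetical helper (not from the report) that uses the POSIX "cmp -l" byte listing to print the offset of every 16-byte row that differs between two equally sized files, in the same hex notation that "hexdump -C" uses. It does not check where the bad data was copied from.

```shell
#!/bin/sh
# Print the offset (8 hex digits, as in "hexdump -C") of every 16-byte row
# that differs between two files of equal length.
diff_rows() {
    # cmp -l lists each differing byte as:
    #   <1-based decimal byte number> <octal value in file 1> <octal value in file 2>
    cmp -l "$1" "$2" | awk '{
        row = int(($1 - 1) / 16) * 16      # start offset of the 16-byte row
        if (NR == 1 || row != last) printf "%08x\n", row
        last = row
    }'
}
```

Run against two of the copied files (for example "diff_rows testfile1 testfile2"), this should list the same row offsets that appear in the diff hunks above, two consecutive rows per 32-byte corrupted block.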
>
> As stated above, I have bisected the issue to patch
>
> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>
> which was added between v5.10-rc3 and v5.10-rc4. Every kernel I have
> tried with that patch applied has exhibited the issue, and I have
> had no trouble like this with any kernel without that patch. Apart
> from a whole bunch of kernels prior to v5.10-rc3, that includes some
> later kernels with the patch reverted (along with the dependent
> followup 2d09e6eb4a6f). The latest I have tried is 5.11.22. Those
> two patches do not revert cleanly in 5.12 (and thereafter) so I
> have not tried anything beyond 5.11 with the patch reverted.
>
> I fail to understand how that patch might cause this issue. I have
> compared boot messages before and after the patch and there is no
> (significant) difference. Everything seems to happen in the same
> order with the same result. But that comparison is of course limited
> to what is logged.
>
> In some random attempt I tried to disable the D-Cache bit, and that
> makes it all very slow but it also (seemingly) fixes the issue. But
> that may of course be due to vastly different timings.
>
> Some background:
>
> We have a "Linea" CPU module, with a design based on the Atmel (now
> Microchip) SAMA5D31 evaluation board. This CPU module is used on e.g.
> our TSE-850 for which there is a device tree in
> arch/arm/boot/dts/at91-tse850-3.dts
> It has a nand flash for the rootfs and 64 MB RAM. The 40 MB random
> testfile is thus big enough to cause page cache churn.
>
> We have used this module in thousands of delivered units (however,
> not that many TSE-850) and have never observed anything like this
> before. But that has been with older kernels. 4.13.<something> and
> 4.15.<something> was what we were on until this recent activity.
>
> We're now developing a new product (preliminary device tree included)
> and the trusty old CPU module was used again and a fresh new kernel
> was built for it. I then started to notice this issue and have tried
> to include as much relevant data as possible. If you need more data
> or would like me to test something, please ask.
>
> I'm stumped.
>
> Cheers,
> Peter
--
Additional information about regzbot:
If you want to know more about regzbot, check out its web interface, the
getting started guide, and the reference documentation:
https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
The last two documents will explain how you can interact with regzbot
yourself if you want to.
Hint for reporters: when reporting a regression it's in your interest to
CC the regression list and tell regzbot about the issue, as that ensures
the regression makes it onto the radar of the Linux kernel's regression
tracker -- that's in your interest, as it ensures your report won't fall
through the cracks unnoticed.
Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would, just remember to
include 'Link:' tags in the patch description pointing to all reports
about the issue. This has been expected from developers even before
regzbot showed up for reasons explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'.
Thread overview: 39+ messages
2022-03-03 0:29 Regression: memory corruption on Atmel SAMA5D31 Peter Rosin
2022-03-03 3:02 ` Saravana Kannan
2022-03-03 9:17 ` Peter Rosin
2022-03-04 3:55 ` Saravana Kannan
2022-03-04 6:57 ` Peter Rosin
2022-03-04 10:57 ` Peter Rosin
2022-03-04 11:12 ` Tudor.Ambarus
2022-03-04 12:38 ` Peter Rosin
2022-03-04 16:48 ` Tudor.Ambarus
2022-03-07 9:45 ` Tudor.Ambarus
2022-03-07 11:32 ` Peter Rosin
2022-03-07 20:32 ` Peter Rosin
2022-03-08 7:55 ` Nicolas Ferre
2022-03-09 8:30 ` Peter Rosin
[not found] ` <6d9561a4-39e4-3dbe-5fe2-c6f88ee2a4c6@axentia.se>
[not found] ` <ed24a281-1790-8e24-5f5a-25b66527044b@microchip.com>
[not found] ` <d563c7ba-6431-2639-9f2a-2e2c6788e625@axentia.se>
[not found] ` <e5a715c5-ad9f-6fd4-071e-084ab950603e@microchip.com>
2022-03-10 9:58 ` Peter Rosin
2022-03-10 10:40 ` Peter Rosin
2022-04-09 13:02 ` Thorsten Leemhuis
2022-04-11 6:21 ` Tudor.Ambarus
2022-05-17 14:50 ` Peter Rosin
2022-05-18 6:21 ` Tudor.Ambarus
2022-05-18 7:51 ` Peter Rosin
2022-06-20 7:04 ` Thorsten Leemhuis
2022-06-20 8:43 ` Tudor.Ambarus
2022-06-20 14:22 ` Tudor.Ambarus
2022-06-21 7:00 ` Peter Rosin
2022-06-21 10:46 ` Peter Rosin
2022-06-27 12:26 ` Tudor.Ambarus
2022-06-27 16:53 ` Tudor.Ambarus
2022-06-30 5:20 ` Peter Rosin
2022-06-30 9:23 ` Tudor.Ambarus
2022-06-30 10:20 ` Tudor.Ambarus
2022-07-13 16:01 ` Tudor.Ambarus
2022-07-28 7:45 ` Tudor.Ambarus
2022-07-28 8:39 ` Tudor.Ambarus
2022-07-29 20:09 ` Peter Rosin
2022-07-30 11:37 ` Peter Rosin
2022-07-31 3:44 ` Tudor.Ambarus
2022-03-04 20:06 ` Saravana Kannan
2022-03-04 8:00 ` Thorsten Leemhuis