> Would be interesting if you only attache 1 mcp2518fd to the board and then re-run the test.

The CM4 is the "Raspberry Pi 4 _Compute_ Module". So it is the same "hardware" as the standard Pi 4, but with only one interface attached (on a professional PCB). The error also happens there.

> You said you made some modifications to the kernel, also it would be good to use the _extact_ version you using to reproduce the error.

Does this answer the question?

    root@raspberrypi:~/linux-6.0# git remote -v
    origin https://github.com/raspberrypi/linux (fetch)
    origin https://github.com/raspberrypi/linux (push)
    root@raspberrypi:~/linux-6.0# git status
    On branch rpi-6.0.y
    Your branch is up to date with 'origin/rpi-6.0.y'.

    Changes not staged for commit:
      (use "git add ..." to update what will be committed)
      (use "git restore ..." to discard changes in working directory)
            modified: drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c

    no changes added to commit (use "git add" and/or "git commit -a")
    root@raspberrypi:~/linux-6.0# git diff
    diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
    index 68df6d464..5eab9dd86 100644
    --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
    +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
    @@ -648,7 +648,7 @@ static u8 mcp251xfd_get_normal_mode(const struct mcp251xfd_priv *priv)
     	u8 mode;
     
     	if (priv->can.ctrlmode & CAN_CTRLMODE_LOOPBACK)
    -		mode = MCP251XFD_REG_CON_MODE_INT_LOOPBACK;
    +		mode = MCP251XFD_REG_CON_MODE_EXT_LOOPBACK;
     	else if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
     		mode = MCP251XFD_REG_CON_MODE_LISTENONLY;
     	else if (priv->can.ctrlmode & CAN_CTRLMODE_FD)

Regarding my test program: I used to cross-compile it. I have now moved it to the Pi and compiled it natively with the Makefile.standalone that I sent you. Surprisingly, only 1 of my 4 instances failed after 72 hours.
When I added the -g -O2 flags, 3 out of 4 of the natively built programs failed within 26 hours (similar to the cross-built binaries: gcc-6.4 vs gcc-10.2).

> Can you run this in parallel to the test. Abort with Ctrl+c after the test fails and send me the log file. The last 256 lines should be enough.

Sure, but logging for 24 hours would be too much for my disk space. I found no utility to handle this, so I quick-hacked rbuflog: https://gist.github.com/stefanalt/dd4e68490d1a4e2a343b0beaa1b0d230

The result was so bizarre that I finally also added a normal log, which I continuously truncated by running the following commands in parallel:

    candump -D can0,0:0,#FFFFFFFF | tee candump0_tee | rbuflog -n 200 -d 10000 candump0
    loopit -d 10 bash -c 'if [ $( du -k candump0_tee | cut -f1 ) -gt 7000 ] ; then cat /dev/null > candump0_tee ; fi'

After the failure I saved all logs/registers and sent a test message through the interface (cansend can0 555#55555555). After that I re-saved everything and attached it all as a zip.

The funny part is marked with ** below: some messages appear 4 times, whereas all other messages appear only twice. I have seen this happen several times.

    can0 2A5 [16] 00 02 67 A6 C8 CE 0D 37 F9 63 56 62 F6 2B E4 02
    can0 2A5 [16] 00 02 67 A6 C8 CE 0D 37 F9 63 56 62 F6 2B E4 02
    can0 2A5 [16] 00 03 C5 50 54 BE E1 16 7A 3F 70 B8 EE 5A 09 67
    can0 2A5 [16] 00 03 C5 50 54 BE E1 16 7A 3F 70 B8 EE 5A 09 67
    can0 2A5 [16] 00 00 C1 CA 4C 28 70 15 F6 7E 4C EF E1 A2 52 D8 **
    can0 2A5 [16] 00 00 C1 CA 4C 28 70 15 F6 7E 4C EF E1 A2 52 D8 **
    can0 2A5 [16] 00 00 C1 CA 4C 28 70 15 F6 7E 4C EF E1 A2 52 D8 **
    can0 2A5 [16] 00 00 C1 CA 4C 28 70 15 F6 7E 4C EF E1 A2 52 D8 **
    can0 2A5 [16] 00 01 CD 36 DA 92 86 2E 50 67 44 CB A6 B4 83 94 **
    can0 2A5 [16] 00 01 CD 36 DA 92 86 2E 50 67 44 CB A6 B4 83 94 **
    can0 2A5 [16] 00 01 CD 36 DA 92 86 2E 50 67 44 CB A6 B4 83 94 **
    can0 2A5 [16] 00 01 CD 36 DA 92 86 2E 50 67 44 CB A6 B4 83 94 **
    can0 2A5 [16] 00 02 0F 8D FC D0 57 48 F9 C8 5D EF 46 A9 DF 27
    can0 2A5 [16] 00 02 0F 8D FC D0 57 48 F9 C8 5D EF 46 A9 DF 27
    can0 2A5 [16] 00 03 4B 31 FF 18 67 D9 AA ED 07 FB 55 4B C6 FB
    can0 2A5 [16] 00 03 4B 31 FF 18 67 D9 AA ED 07 FB 55 4B C6 FB
    can0 555 [4] 55 55 55 55
    can0 555 [4] 55 55 55 55

Besides that, the messages match what my application sent (rx-ed messages are removed from the following lines; see the zip for those):

    0: TX72 (003/002) 2A5#00 03 4B 31 FF 18 67 D9 AA ED 07 FB 55 4B C6 FB
    0: TX72 (002/001) 2A5#00 02 0F 8D FC D0 57 48 F9 C8 5D EF 46 A9 DF 27
    0: TX72 (001/000) 2A5#00 01 CD 36 DA 92 86 2E 50 67 44 CB A6 B4 83 94
    0: TX72 (000/000) 2A5#00 00 C1 CA 4C 28 70 15 F6 7E 4C EF E1 A2 52 D8

And one more observation: once or twice I have seen my application fail because the socketcan read did not return the MTU size:

    + ./sctestself -b -n 4 -l 999 -t 2 -v cmperr,logmsg -F refilldata,leastdots,allowintloopb,stoponerror -d 16 can1
    CAN selftest can1
    .
    ERROR: recvfrom: ret=16, errno=0

Is there any obvious explanation for this? I tried to add more output for this case, but it has not happened again since then.

--Stefan
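
For context on the `ret=16` case: on a CAN_RAW socket with FD frames enabled, a read can return either the classic frame size or the FD frame size, and 16 happens to be the classic one. Below is a minimal sketch of how the extra output could tell these cases apart. The helper names are hypothetical and not taken from sctestself, and the two sizes are hard-coded on the assumption that they mirror `CAN_MTU` and `CANFD_MTU` from `<linux/can.h>`. Whether a classic frame really arrived in the failing run is not something the logs above show.

```c
/* Assumption: these mirror CAN_MTU and CANFD_MTU from <linux/can.h>,
 * i.e. sizeof(struct can_frame) and sizeof(struct canfd_frame). */
enum {
    CLASSIC_FRAME_SIZE = 16,
    FD_FRAME_SIZE = 72
};

/* Hypothetical result type for the size check. */
enum frame_kind {
    FRAME_ERROR,      /* read()/recvfrom() returned a negative value */
    FRAME_CLASSIC,    /* a classic CAN frame was delivered */
    FRAME_FD,         /* a CAN FD frame was delivered */
    FRAME_UNEXPECTED  /* any other byte count */
};

/* Classify the byte count returned by read()/recvfrom() on the socket. */
enum frame_kind classify_read_size(long nbytes)
{
    if (nbytes < 0)
        return FRAME_ERROR;
    if (nbytes == CLASSIC_FRAME_SIZE)
        return FRAME_CLASSIC;
    if (nbytes == FD_FRAME_SIZE)
        return FRAME_FD;
    return FRAME_UNEXPECTED;
}

/* Human-readable name for log output. */
const char *frame_kind_name(enum frame_kind kind)
{
    switch (kind) {
    case FRAME_ERROR:   return "error";
    case FRAME_CLASSIC: return "classic CAN frame";
    case FRAME_FD:      return "CAN FD frame";
    default:            return "unexpected size";
    }
}
```

Logging `frame_kind_name(classify_read_size(ret))` together with the received bytes at the point of the error would show whether the short read was a well-formed classic frame or something else.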