stmmac on Banana PI CPU stalls since Linux 6.6

From: Marc Haber @ 2024-01-21 20:17 UTC
  To: alexandre.torgue, Jose Abreu, Chen-Yu Tsai, Jernej Skrabec,
	Samuel Holland, Jisheng Zhang, netdev

Hi,

I run a number of Banana Pis on Debian stable and unstable, each with a
bleeding-edge kernel. Since kernel 6.6, these systems, and especially
the test system running Debian unstable, have been plagued by
self-detected CPU stalls. An affected system appears to keep running
normally locally but no longer answers on the network. Sometimes, after
a few hours, it recovers on its own.

Here is an example log output:
[73929.363030] rcu: INFO: rcu_sched self-detected stall on CPU
[73929.368653] rcu:     1-....: (5249 ticks this GP) idle=d15c/1/0x40000002 softirq=471343/471343 fqs=2625
[73929.377796] rcu:     (t=5250 jiffies g=851349 q=113 ncpus=2)
[73929.383205] CPU: 1 PID: 14512 Comm: atop Tainted: G             L     6.6.0-zgbpi-armmp-lpae+ #1
[73929.383222] Hardware name: Allwinner sun7i (A20) Family
[73929.383233] PC is at stmmac_get_stats64+0x64/0x20c [stmmac]
[73929.383363] LR is at dev_get_stats+0x44/0x144
[73929.383389] pc : [<bf126db0>]    lr : [<c09525e8>]    psr: 200f0013
[73929.383401] sp : f0c59c78  ip : f0c59df8  fp : c2bb8000
[73929.383412] r10: 00800001  r9 : c3443dd8  r8 : 00000143
[73929.383423] r7 : 00000001  r6 : 00000000  r5 : c2bbb000  r4 : 00000001
[73929.383434] r3 : 0004c891  r2 : c2bbae48  r1 : f0c59d30  r0 : c2bb8000
[73929.383447] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[73929.383463] Control: 30c5387d  Table: 49b553c0  DAC: a7f66f60
[73929.383486]  stmmac_get_stats64 [stmmac] from dev_get_stats+0x44/0x144
[73929.383564]  dev_get_stats from dev_seq_printf_stats+0x40/0x194
[73929.383593]  dev_seq_printf_stats from dev_seq_show+0x18/0x4c
[73929.383617]  dev_seq_show from seq_read_iter+0x3c4/0x57c
[73929.383647]  seq_read_iter from seq_read+0x9c/0xdc
[73929.383674]  seq_read from proc_reg_read+0xb0/0xe4
[73929.383706]  proc_reg_read from vfs_read+0xa8/0x2f4
[73929.383735]  vfs_read from ksys_read+0x78/0x10c
[73929.383757]  ksys_read from ret_fast_syscall+0x0/0x4c
[73929.383781] Exception stack(0xf0c59fa8 to 0xf0c59ff0)
[73929.383800] 9fa0:                   024b7190 00000498 00000003 024cac10 00000400 00000001
[73929.383817] 9fc0: 024b7190 00000498 b6ef6d20 00000003 0000000a be9eb15c 00000000 00000000
[73929.383831] 9fe0: 00000003 be9eb030 b6e90eeb b6e0ab06
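
To spot such reports without reading the log by hand, something like the
following could be used. It is only a minimal sketch: the function name
find_stalls and both regular expressions are my own assumptions, derived
from the two log lines shown above ("self-detected stall on CPU" and
"PC is at ..."), and they may not match the format of other kernel
versions or architectures.

```python
import re

# Patterns are assumptions based on the sample log above, not a stable format.
STALL_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] rcu: INFO: (\w+) self-detected stall on CPU"
)
PC_RE = re.compile(
    r"^\[\s*\d+\.\d+\] PC is at (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def find_stalls(log_text):
    """Return one dict per stall report found in log_text.

    Each dict holds the kernel timestamp, the RCU flavor, and (if a
    "PC is at" line follows) the symbol and module the CPU was stuck in.
    """
    stalls = []
    current = None
    for line in log_text.splitlines():
        m = STALL_RE.match(line)
        if m:
            current = {
                "time": float(m.group(1)),
                "flavor": m.group(2),
                "pc_symbol": None,
                "module": None,
            }
            stalls.append(current)
            continue
        m = PC_RE.match(line)
        # Attach only the first PC line after a stall header.
        if m and current is not None and current["pc_symbol"] is None:
            current["pc_symbol"] = m.group(1)
            current["module"] = m.group(2)
    return stalls
```

Fed with the output of "dmesg" or a saved kernel log, it returns an
empty list for a kernel that has not stalled yet.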

The issue is still present in Linux 6.7. I tried transplanting the stmmac
subdirectory from Linux 6.5 into Linux 6.6, but the changes were too big;
the result doesn't even build.

I have been running a bisect since before Christmas, but since it takes
up to a day for the issue to show itself on a "bad" kernel, I let
"good" kernels run for four days before I declare them good. That takes
a lot of wall clock (or better, wall calendar) time.
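
To give a feeling for the calendar time involved, here is a rough
back-of-the-envelope sketch. It assumes that a bisect over N commits
needs about ceil(log2(N)) steps (the real count depends on the shape of
the history), that a "bad" step costs up to one day and a "good" step
four days. The function name and the commit count in the usage note are
hypothetical placeholders, not measured values.

```python
import math


def bisect_days(n_commits, days_per_bad=1.0, days_per_good=4.0):
    """Estimate bisect steps and the calendar-time range in days.

    Returns (steps, fastest, slowest): fastest assumes every step turns
    out "bad", slowest assumes every step turns out "good".
    """
    steps = math.ceil(math.log2(n_commits)) if n_commits > 1 else 0
    return steps, steps * days_per_bad, steps * days_per_good
```

For a hypothetical range of 1024 commits, bisect_days(1024) gives 10
steps, so somewhere between 10 and 40 days.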

If you have any ideas why this is happening on my Banana Pis,
I'm open to suggestions. Tentative patches against 6.6.$HIGH or
6.7.$CURRENT would be appreciated as well.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
