* about rcu_preempt detected stalls on CPUs/tasks
@ 2023-02-15 10:03 Yao,Yongxian
  2023-02-15 10:49 ` Jan Kiszka
  0 siblings, 1 reply; 8+ messages in thread
From: Yao,Yongxian @ 2023-02-15 10:03 UTC (permalink / raw)
  To: xenomai


Hi,
 
The following exceptions occurred on Debian 10.3 running a Linux+Xenomai kernel (4.19.89-ipipe-9-xenomai-3.1, per the log below).
Please give me some hints. 

Thanks.

--
Yao Yongxian

Feb 13 09:24:55 debian kernel: [215115.095276] sched: RT throttling activated
Feb 14 12:33:43 debian kernel: [312842.843747] CIFS VFS: Server 192.168.14.1 has not responded in 6 seconds. Reconnecting...
Feb 14 12:33:54 debian kernel: [312854.111845] CIFS VFS: Server 192.168.14.1 has not responded in 6 seconds. Reconnecting...
Feb 14 12:34:47 debian kernel: [312906.193897] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Feb 14 12:34:47 debian kernel: [312906.200105] rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P1626 P1760
Feb 14 12:34:47 debian kernel: [312906.207246] rcu: 	(detected by 0, t=5255 jiffies, g=13178013, q=508)
Feb 14 12:34:47 debian kernel: [312906.213739] McsfContainerBa R  running task        0  1626   1376 0x00000000
Feb 14 12:34:47 debian kernel: [312906.221054] Call Trace:
Feb 14 12:34:47 debian kernel: [312906.223599]  ? __schedule+0x2ac/0x800
Feb 14 12:34:47 debian kernel: [312906.227365]  ? ___preempt_schedule+0x16/0x18
Feb 14 12:34:47 debian kernel: [312906.231992]  preempt_schedule_common+0x7b/0x90
Feb 14 12:34:47 debian kernel: [312906.236793]  ___preempt_schedule+0x16/0x18
Feb 14 12:34:47 debian kernel: [312906.241084]  __local_bh_enable_ip+0x45/0x80
Feb 14 12:34:47 debian kernel: [312906.245356]  ip_finish_output2+0x19b/0x3f0
Feb 14 12:34:47 debian kernel: [312906.249544]  ? update_curr+0xeb/0x1d0
Feb 14 12:34:47 debian kernel: [312906.253452]  ? __update_load_avg_se+0x1f1/0x270
Feb 14 12:34:47 debian kernel: [312906.258069]  ? ip_finish_output+0xcd/0x1b0
Feb 14 12:34:47 debian kernel: [312906.262378]  ip_output+0x69/0x100
Feb 14 12:34:47 debian kernel: [312906.266051]  __ip_queue_xmit+0x14b/0x3c0
Feb 14 12:34:47 debian kernel: [312906.270056]  __tcp_transmit_skb+0x507/0xab0
Feb 14 12:34:47 debian kernel: [312906.274493]  tcp_write_xmit+0x371/0xfb0
Feb 14 12:34:47 debian kernel: [312906.278411]  __tcp_push_pending_frames+0x28/0x90
Feb 14 12:34:47 debian kernel: [312906.283108]  tcp_sendmsg_locked+0xbfe/0xd10
Feb 14 12:34:47 debian kernel: [312906.287374]  tcp_sendmsg+0x22/0x40
Feb 14 12:34:47 debian kernel: [312906.290997]  sock_sendmsg+0x2b/0x40
Feb 14 12:34:47 debian kernel: [312906.294652]  __sys_sendto+0xe9/0x150
Feb 14 12:34:47 debian kernel: [312906.298312]  ? hrtimer_nanosleep+0xc5/0x1e0
Feb 14 12:34:47 debian kernel: [312906.302931]  ? kern_select+0xc2/0x100
Feb 14 12:34:47 debian kernel: [312906.306777]  __x64_sys_sendto+0x1f/0x30
Feb 14 12:34:47 debian kernel: [312906.310694]  do_syscall_64+0x5b/0x170
Feb 14 12:34:47 debian kernel: [312906.314457]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 14 12:34:47 debian kernel: [312906.319595] RIP: 0033:0x7f771aba493e
Feb 14 12:34:47 debian kernel: [312906.323298] Code: Bad RIP value.
Feb 14 12:34:47 debian kernel: [312906.326615] RSP: 002b:00007f76084a5990 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
Feb 14 12:34:47 debian kernel: [312906.334501] RAX: ffffffffffffffda RBX: 000000000000001c RCX: 00007f771aba493e
Feb 14 12:34:47 debian kernel: [312906.341712] RDX: 0000000000000024 RSI: 0000558ac4f3f880 RDI: 000000000000001c
Feb 14 12:34:47 debian kernel: [312906.349018] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Feb 14 12:34:47 debian kernel: [312906.356506] R10: 0000000000004000 R11: 0000000000000246 R12: 0000558ac4f3f880
Feb 14 12:34:47 debian kernel: [312906.363717] R13: 0000000000000024 R14: 0000000000004000 R15: 0000000010000000
Feb 14 12:34:47 debian kernel: [312906.370929] McsfContainerBa R  running task        0  1760   1376 0x00000000
Feb 14 12:34:47 debian kernel: [312906.378095] Call Trace:
Feb 14 12:34:47 debian kernel: [312906.380679]  ? __schedule+0x2ac/0x800
Feb 14 12:34:47 debian kernel: [312906.384555]  ? reschedule_interrupt+0xd/0x40
Feb 14 12:34:47 debian kernel: [312906.389103]  ? ___preempt_schedule+0x16/0x18
Feb 14 12:34:47 debian kernel: [312906.393453]  preempt_schedule_common+0x7b/0x90
Feb 14 12:34:47 debian kernel: [312906.398064]  ___preempt_schedule+0x16/0x18
Feb 14 12:34:47 debian kernel: [312906.402244]  get_mem_cgroup_from_mm.part.40+0xbd/0xd0
Feb 14 12:34:47 debian kernel: [312906.407383]  mem_cgroup_try_charge+0x51/0x1e0
Feb 14 12:34:47 debian kernel: [312906.411829]  mem_cgroup_try_charge_delay+0x17/0x40
Feb 14 12:34:47 debian kernel: [312906.416812]  __handle_mm_fault+0xa35/0xf30
Feb 14 12:34:47 debian kernel: [312906.421000]  handle_mm_fault+0x110/0x240
Feb 14 12:34:47 debian kernel: [312906.425009]  __get_user_pages+0x233/0x670
Feb 14 12:34:47 debian kernel: [312906.429274]  populate_vma_page_range+0x68/0x70
Feb 14 12:34:47 debian kernel: [312906.433816]  __ipipe_pin_vma+0x85/0xa0
Feb 14 12:34:47 debian kernel: [312906.437648]  change_protection+0x75d/0x790
Feb 14 12:34:47 debian kernel: [312906.441833]  mprotect_fixup+0x1ac/0x2f0
Feb 14 12:34:47 debian kernel: [312906.445924]  ? vm_mmap_pgoff+0x10b/0x120
Feb 14 12:34:47 debian kernel: [312906.449937]  do_mprotect_pkey+0x1af/0x2f0
Feb 14 12:34:47 debian kernel: [312906.454070]  __x64_sys_mprotect+0x16/0x20
Feb 14 12:34:47 debian kernel: [312906.458283]  do_syscall_64+0x5b/0x170
Feb 14 12:34:47 debian kernel: [312906.462277]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 14 12:34:47 debian kernel: [312906.467582] RIP: 0033:0x7f7716ab9207
Feb 14 12:34:47 debian kernel: [312906.471248] Code: Bad RIP value.
Feb 14 12:34:47 debian kernel: [312906.474568] RSP: 002b:00007f75d4d23e08 EFLAGS: 00000206 ORIG_RAX: 000000000000000a
Feb 14 12:34:47 debian kernel: [312906.482497] RAX: ffffffffffffffda RBX: 00007f75c94d5700 RCX: 00007f7716ab9207
Feb 14 12:34:47 debian kernel: [312906.490028] RDX: 0000000000000007 RSI: 0000000000800000 RDI: 00007f75c8cd6000
Feb 14 12:34:47 debian kernel: [312906.497239] RBP: 00007f75d4d23ee0 R08: 00000000ffffffff R09: 0000000000000000
Feb 14 12:34:47 debian kernel: [312906.504553] R10: 0000000000000004 R11: 0000000000000206 R12: 0000000000801000
Feb 14 12:34:47 debian kernel: [312906.511903] R13: 00007f75d4d23e70 R14: 0000000000001000 R15: 00007f761c002cd0
Feb 14 12:34:47 debian kernel: [312906.519454] McsfContainerBa R  running task        0  1626   1376 0x00000000
Feb 14 12:34:47 debian kernel: [312906.526654] Call Trace:
Feb 14 12:34:47 debian kernel: [312906.529186]  ? __schedule+0x2ac/0x800
Feb 14 12:34:47 debian kernel: [312906.533085]  ? ___preempt_schedule+0x16/0x18
Feb 14 12:34:47 debian kernel: [312906.537557]  preempt_schedule_common+0x7b/0x90
Feb 14 12:34:47 debian kernel: [312906.542176]  ___preempt_schedule+0x16/0x18
Feb 14 12:34:47 debian kernel: [312906.546423]  __local_bh_enable_ip+0x45/0x80
Feb 14 12:34:47 debian kernel: [312906.550732]  ip_finish_output2+0x19b/0x3f0
Feb 14 12:34:47 debian kernel: [312906.554918]  ? update_curr+0xeb/0x1d0
Feb 14 12:34:47 debian kernel: [312906.558669]  ? __update_load_avg_se+0x1f1/0x270
Feb 14 12:34:47 debian kernel: [312906.563340]  ? ip_finish_output+0xcd/0x1b0
Feb 14 12:34:47 debian kernel: [312906.567527]  ip_output+0x69/0x100
Feb 14 12:34:47 debian kernel: [312906.570926]  __ip_queue_xmit+0x14b/0x3c0
Feb 14 12:34:47 debian kernel: [312906.574937]  __tcp_transmit_skb+0x507/0xab0
Feb 14 12:34:47 debian kernel: [312906.579228]  tcp_write_xmit+0x371/0xfb0
Feb 14 12:34:47 debian kernel: [312906.583440]  __tcp_push_pending_frames+0x28/0x90
Feb 14 12:34:47 debian kernel: [312906.588147]  tcp_sendmsg_locked+0xbfe/0xd10
Feb 14 12:34:47 debian kernel: [312906.592462]  tcp_sendmsg+0x22/0x40
Feb 14 12:34:47 debian kernel: [312906.596076]  sock_sendmsg+0x2b/0x40
Feb 14 12:34:47 debian kernel: [312906.599906]  __sys_sendto+0xe9/0x150
Feb 14 12:34:47 debian kernel: [312906.603565]  ? hrtimer_nanosleep+0xc5/0x1e0
Feb 14 12:34:47 debian kernel: [312906.608218]  ? kern_select+0xc2/0x100
Feb 14 12:34:47 debian kernel: [312906.612258]  __x64_sys_sendto+0x1f/0x30
Feb 14 12:34:47 debian kernel: [312906.616183]  do_syscall_64+0x5b/0x170
Feb 14 12:34:47 debian kernel: [312906.620169]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 14 12:34:47 debian kernel: [312906.625301] RIP: 0033:0x7f771aba493e
Feb 14 12:34:47 debian kernel: [312906.628967] Code: Bad RIP value.
Feb 14 12:34:47 debian kernel: [312906.632278] RSP: 002b:00007f76084a5990 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
Feb 14 12:34:47 debian kernel: [312906.639931] RAX: ffffffffffffffda RBX: 000000000000001c RCX: 00007f771aba493e
Feb 14 12:34:47 debian kernel: [312906.647245] RDX: 0000000000000024 RSI: 0000558ac4f3f880 RDI: 000000000000001c
Feb 14 12:34:47 debian kernel: [312906.654455] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Feb 14 12:34:47 debian kernel: [312906.661813] R10: 0000000000004000 R11: 0000000000000246 R12: 0000558ac4f3f880
Feb 14 12:34:47 debian kernel: [312906.669111] R13: 0000000000000024 R14: 0000000000004000 R15: 0000000010000000
Feb 14 12:34:47 debian kernel: [312906.676339] McsfContainerBa R  running task        0  1760   1376 0x00000000
Feb 14 12:34:47 debian kernel: [312906.683680] Call Trace:
Feb 14 12:34:47 debian kernel: [312906.686212]  ? __schedule+0x2ac/0x800
Feb 14 12:34:47 debian kernel: [312906.689965]  ? reschedule_interrupt+0xd/0x40
Feb 14 12:34:47 debian kernel: [312906.694323]  ? ___preempt_schedule+0x16/0x18
Feb 14 12:34:47 debian kernel: [312906.698854]  preempt_schedule_common+0x7b/0x90
Feb 14 12:34:47 debian kernel: [312906.703378]  ___preempt_schedule+0x16/0x18
Feb 14 12:34:47 debian kernel: [312906.707557]  get_mem_cgroup_from_mm.part.40+0xbd/0xd0
Feb 14 12:34:47 debian kernel: [312906.712706]  mem_cgroup_try_charge+0x51/0x1e0
Feb 14 12:34:47 debian kernel: [312906.717143]  mem_cgroup_try_charge_delay+0x17/0x40
Feb 14 12:34:47 debian kernel: [312906.722135]  __handle_mm_fault+0xa35/0xf30
Feb 14 12:34:47 debian kernel: [312906.726391]  handle_mm_fault+0x110/0x240
Feb 14 12:34:47 debian kernel: [312906.730663]  __get_user_pages+0x233/0x670
Feb 14 12:34:47 debian kernel: [312906.734876]  populate_vma_page_range+0x68/0x70
Feb 14 12:34:47 debian kernel: [312906.739409]  __ipipe_pin_vma+0x85/0xa0
Feb 14 12:34:47 debian kernel: [312906.743247]  change_protection+0x75d/0x790
Feb 14 12:34:47 debian kernel: [312906.747427]  mprotect_fixup+0x1ac/0x2f0
Feb 14 12:34:47 debian kernel: [312906.751352]  ? vm_mmap_pgoff+0x10b/0x120
Feb 14 12:34:47 debian kernel: [312906.755424]  do_mprotect_pkey+0x1af/0x2f0
Feb 14 12:34:47 debian kernel: [312906.759619]  __x64_sys_mprotect+0x16/0x20
Feb 14 12:34:47 debian kernel: [312906.763848]  do_syscall_64+0x5b/0x170
Feb 14 12:34:47 debian kernel: [312906.767593]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 14 12:34:47 debian kernel: [312906.772836] RIP: 0033:0x7f7716ab9207
Feb 14 12:34:47 debian kernel: [312906.776518] Code: Bad RIP value.
Feb 14 12:34:47 debian kernel: [312906.779829] RSP: 002b:00007f75d4d23e08 EFLAGS: 00000206 ORIG_RAX: 000000000000000a
Feb 14 12:34:47 debian kernel: [312906.787778] RAX: ffffffffffffffda RBX: 00007f75c94d5700 RCX: 00007f7716ab9207
Feb 14 12:34:47 debian kernel: [312906.795301] RDX: 0000000000000007 RSI: 0000000000800000 RDI: 00007f75c8cd6000
Feb 14 12:34:47 debian kernel: [312906.802623] RBP: 00007f75d4d23ee0 R08: 00000000ffffffff R09: 0000000000000000
Feb 14 12:34:47 debian kernel: [312906.810283] R10: 0000000000000004 R11: 0000000000000206 R12: 0000000000801000
Feb 14 12:34:47 debian kernel: [312906.817659] R13: 00007f75d4d23e70 R14: 0000000000001000 R15: 00007f761c002cd0
Feb 14 12:35:24 debian kernel: [312944.109229] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
Feb 14 12:35:24 debian kernel: [312944.120609] clocksource:                       'hpet' wd_now: b44cf4bf wd_last: 1e5c728f mask: ffffffff
Feb 14 12:35:24 debian kernel: [312944.130454] clocksource:                       'tsc' cs_now: 400e384aa8504 cs_last: 4008ba9be40cc mask: ffffffffffffffff
Feb 14 12:35:24 debian kernel: [312944.141397] tsc: Marking TSC unstable due to clocksource watchdog
Feb 14 12:37:25 debian kernel: [313065.064451] SCC_WaitSysEvent[1450]: The calling task has been unblocked by a signal
Feb 14 12:37:25 debian kernel: [313065.072255] SCC_IoctlBoard[1210]: Process IO command[0x80085328] failed, ret[-4]
Feb 14 12:37:25 debian kernel: [313065.079923] DACO_WaitSysEvent[1335]: The calling task has been unblocked by a signal
Feb 14 12:37:25 debian kernel: [313065.087794] DACO_IoctlBoard[1834]: Process IO command[0x80084453] Failed, ret[-4]
Feb 14 12:37:25 debian kernel: [313065.095385] TSMU_WaitSysEvent[2110]: The calling task has been unblocked by a signal
Feb 14 12:37:25 debian kernel: [313065.103394] TSMU_IoctlBoard[2741]: Process IO command[0x80085467] Failed, ret[-4]
Feb 14 12:37:38 debian kernel: [313078.365928] INFO: task McsfContainerBa:1545 blocked for more than 120 seconds.
Feb 14 12:37:38 debian kernel: [313078.373418]       Tainted: P           O      4.19.89-ipipe-9-xenomai-3.1-20201027 #1
Feb 14 12:37:38 debian kernel: [313078.381671] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 14 12:37:38 debian kernel: [313078.389716] McsfContainerBa D    0  1545   1375 0x00000000
Feb 14 12:37:38 debian kernel: [313078.395360] Call Trace:
Feb 14 12:37:38 debian kernel: [313078.398017]  ? __schedule+0x2ac/0x800
Feb 14 12:37:38 debian kernel: [313078.401808]  schedule+0x67/0x90
Feb 14 12:37:38 debian kernel: [313078.405141]  rwsem_down_read_failed+0xe0/0x150
Feb 14 12:37:38 debian kernel: [313078.409732]  ? __do_softirq+0x159/0x330
Feb 14 12:37:38 debian kernel: [313078.414019]  call_rwsem_down_read_failed+0x14/0x30
Feb 14 12:37:38 debian kernel: [313078.418959]  down_read+0xe/0x30
Feb 14 12:37:38 debian kernel: [313078.422273]  __do_page_fault+0x481/0x570
Feb 14 12:37:38 debian kernel: [313078.426391]  ? __ipipe_trap_prologue+0xef/0x200
Feb 14 12:37:38 debian kernel: [313078.431075]  page_fault+0x43/0x5b
Feb 14 12:37:38 debian kernel: [313078.434551] RIP: 0033:0x7fe8656c14ac
Feb 14 12:37:38 debian kernel: [313078.438281] Code: Bad RIP value.
Feb 14 12:37:38 debian kernel: [313078.441640] RSP: 002b:00007fe81a532670 EFLAGS: 00010202
Feb 14 12:37:38 debian kernel: [313078.447060] RAX: ffffffffffffff92 RBX: 00007fe81c096e90 RCX: 00007fe8656c135b
Feb 14 12:37:38 debian kernel: [313078.454396] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 0000000000000000
Feb 14 12:37:38 debian kernel: [313078.462027] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
Feb 14 12:37:38 debian kernel: [313078.469416] R10: 00007fe81a532d28 R11: 0000000000000246 R12: 00007fe81c096eb8
Feb 14 12:37:38 debian kernel: [313078.476699] R13: 00007fe81a532d28 R14: 00007fe81c096e68 R15: 0000000000000000
Feb 14 12:37:38 debian kernel: [313078.484041] INFO: task McsfContainerBa:2113 blocked for more than 120 seconds.
Feb 14 12:37:38 debian kernel: [313078.491556]       Tainted: P           O      4.19.89-ipipe-9-xenomai-3.1-20201027 #1
Feb 14 12:37:38 debian kernel: [313078.499557] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 14 12:37:38 debian kernel: [313078.507656] McsfContainerBa D    0  2113   1375 0x00000000
Feb 14 12:37:38 debian kernel: [313078.513343] Call Trace:
Feb 14 12:37:38 debian kernel: [313078.515945]  ? __schedule+0x2ac/0x800
Feb 14 12:37:38 debian kernel: [313078.519790]  ? preempt_count_add+0x74/0xa0
Feb 14 12:37:38 debian kernel: [313078.524144]  schedule+0x67/0x90
Feb 14 12:37:38 debian kernel: [313078.527466]  schedule_timeout+0x1df/0x3b0
Feb 14 12:37:38 debian kernel: [313078.531789]  ? mem_cgroup_iter+0x258/0x3b0
Feb 14 12:37:38 debian kernel: [313078.536109]  wait_for_completion+0xa0/0x120
Feb 14 12:37:38 debian kernel: [313078.540436]  ? wake_up_q+0x70/0x70
Feb 14 12:37:38 debian kernel: [313078.543995]  __flush_work+0x11b/0x1c0
Feb 14 12:37:38 debian kernel: [313078.547806]  ? flush_workqueue_prep_pwqs+0x120/0x120
Feb 14 12:37:38 debian kernel: [313078.552982]  drain_all_pages+0x13b/0x190
Feb 14 12:37:38 debian kernel: [313078.557108]  __alloc_pages_slowpath+0x3a9/0xd90
Feb 14 12:37:38 debian kernel: [313078.561781]  ? mem_cgroup_event_ratelimit.isra.50+0x31/0x90
Feb 14 12:37:38 debian kernel: [313078.567513]  __alloc_pages_nodemask+0x218/0x2a0
Feb 14 12:37:38 debian kernel: [313078.572220]  alloc_pages_vma+0xc0/0x170
Feb 14 12:37:38 debian kernel: [313078.576203]  __handle_mm_fault+0xa10/0xf30
Feb 14 12:37:38 debian kernel: [313078.580450]  handle_mm_fault+0x110/0x240
Feb 14 12:37:38 debian kernel: [313078.584543]  __do_page_fault+0x269/0x570
Feb 14 12:37:38 debian kernel: [313078.588616]  page_fault+0x43/0x5b
Feb 14 12:37:38 debian kernel: [313078.592252] RIP: 0033:0x7fe8616416ff
Feb 14 12:37:38 debian kernel: [313078.595972] Code: Bad RIP value.
Feb 14 12:37:38 debian kernel: [313078.599387] RSP: 002b:00007fe813ffccc8 EFLAGS: 00010202
Feb 14 12:37:38 debian kernel: [313078.604868] RAX: 00007fe801627c80 RBX: 00007fe81c09b600 RCX: 000000000001d480
Feb 14 12:37:38 debian kernel: [313078.612177] RDX: 00000000001d8800 RSI: 00007fe800395750 RDI: 00007fe8017e3000
Feb 14 12:37:38 debian kernel: [313078.619481] RBP: 00007fe813ffcf00 R08: 0000000000000000 R09: 00007fe8003b2bd0
Feb 14 12:37:38 debian kernel: [313078.626756] R10: fffffffffffff000 R11: 0000000001628000 R12: 00000000001d8800
Feb 14 12:37:38 debian kernel: [313078.634087] R13: 00007fe801627c80 R14: 00007fe81c09b610 R15: 0000000000000000
Feb 14 12:37:54 debian kernel: [313094.250218] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Feb 14 12:37:54 debian kernel: [313094.293904] sched_clock: Marking unstable (313092844842787, 1405370436)<-(313094649326888, -399109818)
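
For triaging a report like the one above, here is a minimal Python sketch that pulls the key fields out of the RCU stall lines. The function name `parse_rcu_stall` and the `hz=250` default are assumptions for illustration; the log does not state CONFIG_HZ, so the seconds value is only as good as that guess.

```python
import re

# Two lines copied from the report above (syslog prefix included).
SAMPLE = (
    "Feb 14 12:34:47 debian kernel: [312906.200105] rcu: \tTasks blocked on "
    "level-0 rcu_node (CPUs 0-3): P1626 P1760\n"
    "Feb 14 12:34:47 debian kernel: [312906.207246] rcu: \t(detected by 0, "
    "t=5255 jiffies, g=13178013, q=508)\n"
)

BLOCKED_RE = re.compile(
    r"Tasks blocked on level-\d+ rcu_node \(CPUs ([\d-]+)\):((?: P\d+)+)"
)
DETECTED_RE = re.compile(
    r"\(detected by (\d+), t=(\d+) jiffies, g=(-?\d+), q=(\d+)\)"
)


def parse_rcu_stall(log_text, hz=250):
    """Extract blocked PIDs and stall duration from an RCU stall report.

    hz is an assumption: the log does not state the kernel's CONFIG_HZ.
    """
    report = {}
    for line in log_text.splitlines():
        m = BLOCKED_RE.search(line)
        if m:
            report["cpus"] = m.group(1)
            # Each entry looks like "P1626"; drop the leading "P".
            report["blocked_pids"] = [int(p[1:]) for p in m.group(2).split()]
        m = DETECTED_RE.search(line)
        if m:
            report["detected_by_cpu"] = int(m.group(1))
            report["stall_jiffies"] = int(m.group(2))
            report["stall_seconds"] = int(m.group(2)) / hz
    return report


print(parse_rcu_stall(SAMPLE))
```

If HZ really is 250, the 5255 jiffies correspond to roughly 21 seconds, which would line up with the kernel's default RCU stall timeout; that is an inference, not something the log confirms.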


* Re: about rcu_preempt detected stalls on CPUs/tasks
  2023-02-15 10:03 about rcu_preempt detected stalls on CPUs/tasks Yao,Yongxian
@ 2023-02-15 10:49 ` Jan Kiszka
  2023-02-15 11:07   ` Yao,Yongxian
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kiszka @ 2023-02-15 10:49 UTC (permalink / raw)
  To: Yao,Yongxian, xenomai

On 15.02.23 11:03, Yao,Yongxian wrote:
> 
> Hi,
>  
> Following exceptions occur on the Debian10.3 using Linux+xenomai kernel. 
> Please give me some hints. 

Is your Xenomai application dominating a CPU for more than a few dozen
milliseconds? Do you have the Xenomai watchdog enabled (it would
catch such cases, after some seconds)? If you do have such a workload,
Linux complaining about not having the time to "breathe" on such a CPU is
no surprise.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re:Re: about rcu_preempt detected stalls on CPUs/tasks
  2023-02-15 10:49 ` Jan Kiszka
@ 2023-02-15 11:07   ` Yao,Yongxian
  2023-02-15 13:04     ` Jan Kiszka
  0 siblings, 1 reply; 8+ messages in thread
From: Yao,Yongxian @ 2023-02-15 11:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai


Hi Jan,

on 2023-02-15 18:49:11,"Jan Kiszka" <jan.kiszka@siemens.com> wrote:
>On 15.02.23 11:03, Yao,Yongxian wrote:
>>  
>> Following exceptions occur on the Debian10.3 using Linux+xenomai kernel. 
>> Please give me some hints. 
>
>Is your Xenomai application dominating a CPU for more than a few dozens
>of milliseconds? Do you have the Xenomai watchdog enabled (it would
>catch such cases, after some seconds)? If you do have such a workload,
>Linux complaining about not having the time to "breath" on such a CPU is
>no surprise.

Real-time applications will not occupy CPU continuously for a long time.
And the xenomai watchdog has been turned on.

CONFIG_XENO_OPT_WATCHDOG=y
CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=1
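
Such options can be read back from the running kernel rather than from the build tree. A minimal Python sketch follows, assuming the kernel exposes its config at /proc/config.gz or /boot/config-<release>; neither is guaranteed to exist, and the helper names `parse_kconfig` and `read_running_kconfig` are hypothetical.

```python
import gzip
import os
import platform


def parse_kconfig(text, prefix="CONFIG_XENO_OPT_WATCHDOG"):
    """Return {option: value} for set options whose name starts with prefix."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Unset options appear as "# CONFIG_FOO is not set" and are skipped.
        if line.startswith(prefix) and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def read_running_kconfig():
    """Return the running kernel's config text, or None if it is not exposed."""
    candidates = ["/proc/config.gz", "/boot/config-" + platform.release()]
    for path in candidates:
        if not os.path.exists(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        try:
            with opener(path, "rt") as f:
                return f.read()
        except OSError:
            continue  # unreadable here; try the next candidate
    return None


text = read_running_kconfig()
if text is None:
    print("kernel config not exposed on this system")
else:
    print(parse_kconfig(text))
```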

--
Yao Yongxian



* Re: about rcu_preempt detected stalls on CPUs/tasks
  2023-02-15 11:07   ` Yao,Yongxian
@ 2023-02-15 13:04     ` Jan Kiszka
  2023-02-16  1:24       ` Yao,Yongxian
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kiszka @ 2023-02-15 13:04 UTC (permalink / raw)
  To: Yao,Yongxian; +Cc: xenomai

On 15.02.23 12:07, Yao,Yongxian wrote:
> 
> Hi Jan,
> 
> on 2023-02-15 18:49:11,"Jan Kiszka" <jan.kiszka@siemens.com> wrote:
>> On 15.02.23 11:03, Yao,Yongxian wrote:
>>>  
>>> Following exceptions occur on the Debian10.3 using Linux+xenomai kernel. 
>>> Please give me some hints. 
>>
>> Is your Xenomai application dominating a CPU for more than a few dozens
>> of milliseconds? Do you have the Xenomai watchdog enabled (it would
>> catch such cases, after some seconds)? If you do have such a workload,
>> Linux complaining about not having the time to "breath" on such a CPU is
>> no surprise.
> 
> Real-time applications will not occupy CPU continuously for a long time.
> And the xenomai watchdog has been turned on.
> 
> CONFIG_XENO_OPT_WATCHDOG=y
> CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=1
> 

Ok, then:

 - Any other error messages before the one you shared?
 - Already tried if the issue persists with a (more) up-to-date kernel?
   ipipe-core-4.19.266-cip86-x86-25 is latest on 4.19, last I-pipe is
   5.4, and we have plenty of dovetail kernels
 - Any details on how this can be reproduced?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re:Re: about rcu_preempt detected stalls on CPUs/tasks
  2023-02-15 13:04     ` Jan Kiszka
@ 2023-02-16  1:24       ` Yao,Yongxian
  2023-02-16  9:31         ` Jan Kiszka
  0 siblings, 1 reply; 8+ messages in thread
From: Yao,Yongxian @ 2023-02-16  1:24 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai


Hi Jan,

At 2023-02-15 21:04:34, "Jan Kiszka" <jan.kiszka@siemens.com> wrote:
>On 15.02.23 12:07, Yao,Yongxian wrote:
>> 
>> on 2023-02-15 18:49:11,"Jan Kiszka" <jan.kiszka@siemens.com> wrote:
>>> On 15.02.23 11:03, Yao,Yongxian wrote:
>>>>  
>>>> Following exceptions occur on the Debian10.3 using Linux+xenomai kernel. 
>>>> Please give me some hints. 
>>>
>>> Is your Xenomai application dominating a CPU for more than a few dozens
>>> of milliseconds? Do you have the Xenomai watchdog enabled (it would
>>> catch such cases, after some seconds)? If you do have such a workload,
>>> Linux complaining about not having the time to "breath" on such a CPU is
>>> no surprise.
>> 
>> Real-time applications will not occupy CPU continuously for a long time.
>> And the xenomai watchdog has been turned on.
>> 
>> CONFIG_XENO_OPT_WATCHDOG=y
>> CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=1
>> 
>
>Ok, then:
>
> - Any other error messages before the one you shared?

That's all the error messages.

> - Already tried if the issue persists with a (more) up-to-date kernel?
>   ipipe-core-4.19.266-cip86-x86-25 is latest on 4.19, last I-pipe is
>   5.4, and we have plenty of dovetail kernels

The customer sites currently run ipipe-core-4.19.89-x86-9; it is difficult for us to keep pace with the releases of the Xenomai community.
We are planning to update to a newer kernel.

> - Any details on how this can be reproduced?
>

Many sites have deployed this version of the kernel. This problem has only occurred once, and there is no way to reproduce it.

In your opinion, what are the reasons for this problem?

Thanks.

--
Yao Yongxian


* Re: about rcu_preempt detected stalls on CPUs/tasks
  2023-02-16  1:24       ` Yao,Yongxian
@ 2023-02-16  9:31         ` Jan Kiszka
  2023-02-16 22:16           ` Florian Bezdeka
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kiszka @ 2023-02-16  9:31 UTC (permalink / raw)
  To: Yao,Yongxian, Florian Bezdeka; +Cc: xenomai

On 16.02.23 02:24, Yao,Yongxian wrote:
> 
> Hi Jan,
> 
> At 2023-02-15 21:04:34, "Jan Kiszka" <jan.kiszka@siemens.com> wrote:
>> On 15.02.23 12:07, Yao,Yongxian wrote:
>>>
>>> on 2023-02-15 18:49:11,"Jan Kiszka" <jan.kiszka@siemens.com> wrote:
>>>> On 15.02.23 11:03, Yao,Yongxian wrote:
>>>>>  
>>>>> Following exceptions occur on the Debian10.3 using Linux+xenomai kernel. 
>>>>> Please give me some hints. 
>>>>
>>>> Is your Xenomai application dominating a CPU for more than a few dozens
>>>> of milliseconds? Do you have the Xenomai watchdog enabled (it would
>>>> catch such cases, after some seconds)? If you do have such a workload,
>>>> Linux complaining about not having the time to "breath" on such a CPU is
>>>> no surprise.
>>>
>>> Real-time applications will not occupy CPU continuously for a long time.
>>> And the xenomai watchdog has been turned on.
>>>
>>> CONFIG_XENO_OPT_WATCHDOG=y
>>> CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=1
>>>
>>
>> Ok, then:
>>
>> - Any other error messages before the one you shared?
> 
> That's all the error messages.
> 
>> - Already tried if the issue persists with a (more) up-to-date kernel?
>>   ipipe-core-4.19.266-cip86-x86-25 is latest on 4.19, last I-pipe is
>>   5.4, and we have plenty of dovetail kernels
> 
> The customer sites now use the ipipe-core-4.19.89-x86-9, which is difficult to keep pace with the release of the xenomai community. 
> We are planning to update the new kernel.
> 
>> - Any details on how this can be reproduced?
>>
> 
> Many sites have deployed this version of the kernel. This problem has only occurred once, and there is no way to reproduce it.
> 

That sounds indeed unpleasant.

> In your opinion, what are the reasons for this problem?
> 

No idea yet.

Florian, from our own deployments of older 4.19 in our MRT scenarios, do
you recall any similar reports?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re: about rcu_preempt detected stalls on CPUs/tasks
  2023-02-16  9:31         ` Jan Kiszka
@ 2023-02-16 22:16           ` Florian Bezdeka
  2023-02-17  1:47             ` Yao Yongxian
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Bezdeka @ 2023-02-16 22:16 UTC (permalink / raw)
  To: Jan Kiszka, Yao,Yongxian; +Cc: xenomai

On Thu, 2023-02-16 at 10:31 +0100, Jan Kiszka wrote:
> On 16.02.23 02:24, Yao,Yongxian wrote:
> > 
> > Hi Jan,
> > 
> > At 2023-02-15 21:04:34, "Jan Kiszka" <jan.kiszka@siemens.com> wrote:
> > > On 15.02.23 12:07, Yao,Yongxian wrote:
> > > > 
> > > > on 2023-02-15 18:49:11,"Jan Kiszka" <jan.kiszka@siemens.com> wrote:
> > > > > On 15.02.23 11:03, Yao,Yongxian wrote:
> > > > > >  
> > > > > > Following exceptions occur on the Debian10.3 using Linux+xenomai kernel. 
> > > > > > Please give me some hints. 
> > > > > 
> > > > > Is your Xenomai application dominating a CPU for more than a few dozens
> > > > > of milliseconds? Do you have the Xenomai watchdog enabled (it would
> > > > > catch such cases, after some seconds)? If you do have such a workload,
> > > > > Linux complaining about not having the time to "breath" on such a CPU is
> > > > > no surprise.
> > > > 
> > > > Real-time applications will not occupy CPU continuously for a long time.
> > > > And the xenomai watchdog has been turned on.
> > > > 
> > > > CONFIG_XENO_OPT_WATCHDOG=y
> > > > CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=1
> > > > 
> > > 
> > > Ok, then:
> > > 
> > > - Any other error messages before the one you shared?
> > 
> > That's all the error messages.
> > 
> > > - Already tried if the issue persists with a (more) up-to-date kernel?
> > >   ipipe-core-4.19.266-cip86-x86-25 is latest on 4.19, last I-pipe is
> > >   5.4, and we have plenty of dovetail kernels
> > 
> > The customer sites now use the ipipe-core-4.19.89-x86-9, which is difficult to keep pace with the release of the xenomai community. 
> > We are planning to update the new kernel.
> > 
> > > - Any details on how this can be reproduced?
> > > 
> > 
> > Many sites have deployed this version of the kernel. This problem has only occurred once, and there is no way to reproduce it.
> > 
> 
> That sounds indeed unpleasant.
> 
> > In your opinion, what are the reasons for this problem?
> > 
> 
> No idea yet.
> 
> Florian, from our own deployments of older 4.19 in our MRT scenarios, do
> you recall any similar reports?

Tried to find something similar in our internal issue tracker. We had
something similar in the past, but with dovetail underneath - not
ipipe.

The first report in the log is about network inactivity (CIFS timing
out). It's likely not the root cause, but 6 seconds is quite a long time.
Something similar was detected in the early dovetail days. The root cause
there was a LAPIC stall. I'm not aware of any related/similar ipipe issue.

Any special kernel cmdline arguments deployed?

It might help to put significant load and network traffic on the system
(stress-ng, iperf). Still needs some luck, but maybe you're able to
reproduce it.
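
As a starting point only, here is a minimal Python sketch that assembles one possible pair of load commands. The worker counts, the 256M allocation size, the duration, and the peer address 192.0.2.10 are all placeholder assumptions, not tuned recommendations; `build_load_commands` is a hypothetical helper, and the iperf peer is assumed to be running `iperf -s` already.

```python
import shlex


def build_load_commands(peer, cpus=4, seconds=600):
    """Return argv lists for a CPU/memory stressor and a TCP stream to peer."""
    stress = [
        "stress-ng",
        "--cpu", str(cpus),        # one CPU worker per core
        "--vm", "2",               # two workers exercising page faults
        "--vm-bytes", "256M",      # memory touched per vm worker
        "--timeout", f"{seconds}s",
        "--metrics-brief",
    ]
    # Client side only; the peer must already be running: iperf -s
    net = ["iperf", "-c", peer, "-t", str(seconds), "-P", "4"]
    return [stress, net]


# Print the commands instead of running them, so they can be reviewed first.
for argv in build_load_commands("192.0.2.10"):
    print(shlex.join(argv))
```

Running both for hours while the real-time application is active would be closer to the field conditions than a short burst.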

Best regards,
Florian

> 
> Jan
> 



* Re:Re: about rcu_preempt detected stalls on CPUs/tasks
  2023-02-16 22:16           ` Florian Bezdeka
@ 2023-02-17  1:47             ` Yao Yongxian
  0 siblings, 0 replies; 8+ messages in thread
From: Yao Yongxian @ 2023-02-17  1:47 UTC (permalink / raw)
  To: Florian Bezdeka; +Cc: Jan Kiszka, xenomai


At 2023-02-17 06:16:07, "Florian Bezdeka" <florian.bezdeka@siemens.com> wrote:
>On Thu, 2023-02-16 at 10:31 +0100, Jan Kiszka wrote:
>> On 16.02.23 02:24, Yao,Yongxian wrote:
>> > 
>> > At 2023-02-15 21:04:34, "Jan Kiszka" <jan.kiszka@siemens.com> wrote:
>> > > On 15.02.23 12:07, Yao,Yongxian wrote:
>> > > > 
>> > > > on 2023-02-15 18:49:11,"Jan Kiszka" <jan.kiszka@siemens.com> wrote:
>> > > > > On 15.02.23 11:03, Yao,Yongxian wrote:
>> > > > > >  
>> > > > > > Following exceptions occur on the Debian10.3 using Linux+xenomai kernel. 
>> > > > > > Please give me some hints. 
>> > > > > 
>> > > > > Is your Xenomai application dominating a CPU for more than a few dozens
>> > > > > of milliseconds? Do you have the Xenomai watchdog enabled (it would
>> > > > > catch such cases, after some seconds)? If you do have such a workload,
>> > > > > Linux complaining about not having the time to "breath" on such a CPU is
>> > > > > no surprise.
>> > > > 
>> > > > Real-time applications will not occupy CPU continuously for a long time.
>> > > > And the xenomai watchdog has been turned on.
>> > > > 
>> > > > CONFIG_XENO_OPT_WATCHDOG=y
>> > > > CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=1
>> > > > 
>> > > 
>> > > Ok, then:
>> > > 
>> > > - Any other error messages before the one you shared?
>> > 
>> > That's all the error messages.
>> > 
>> > > - Already tried if the issue persists with a (more) up-to-date kernel?
>> > >   ipipe-core-4.19.266-cip86-x86-25 is latest on 4.19, last I-pipe is
>> > >   5.4, and we have plenty of dovetail kernels
>> > 
>> > The customer sites now use the ipipe-core-4.19.89-x86-9, which is difficult to keep pace with the release of the xenomai community. 
>> > We are planning to update the new kernel.
>> > 
>> > > - Any details on how this can be reproduced?
>> > > 
>> > 
>> > Many sites have deployed this version of the kernel. This problem has only occurred once, and there is no way to reproduce it.
>> > 
>> 
>> That sounds indeed unpleasant.
>> 
>> > In your opinion, what are the reasons for this problem?
>> > 
>> 
>> No idea yet.
>> 
>> Florian, from our own deployments of older 4.19 in our MRT scenarios, do
>> you recall any similar reports?
>
>Tried to find something similar in our internal issue tracker. We had
>something similar in the past, but with dovetail underneath - not
>ipipe.
>
>The first report is about a network inactivity (CIFS timing out). It's
>likely not the root cause but 6sec is a quite long time. Something
>similar was detected in the early dovetail days. Root cause was a LAPIC
>stall. I'm not aware of any related/similar ipipe issue.
>
>Any special kernel cmdline arguments deployed?

net.ifnames=0 crashkernel=512M xenomai.supported_cpus=0xf idle=halt 
nosmap nmi_watchdog=panic console=ttyS0,115200n8 panic=1
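
To make the CPU mask explicit, here is a minimal Python sketch that splits this command line and decodes `xenomai.supported_cpus`. The helpers `parse_cmdline` and `mask_to_cpus` are hypothetical, and the splitting is simplified: it ignores quoted values, which the real kernel parser handles.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def mask_to_cpus(mask):
    """Decode a CPU bitmask such as 0xf into a list of CPU numbers."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


CMDLINE = (
    "net.ifnames=0 crashkernel=512M xenomai.supported_cpus=0xf idle=halt "
    "nosmap nmi_watchdog=panic console=ttyS0,115200n8 panic=1"
)

params = parse_cmdline(CMDLINE)
print(mask_to_cpus(int(params["xenomai.supported_cpus"], 16)))  # [0, 1, 2, 3]
```

The mask 0xf covers CPUs 0 to 3, the same range as the "CPUs 0-3" shown in the RCU stall report.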

>It might help to put significant load and network traffic on the system
>(stress-ng, iperf). Still needs some luck, but maybe you're able to
>reproduce it.

I am looking forward to your results and am truly grateful for your help.

In addition, which specific commands (including parameters and parameter
values) do you recommend for these two tools?
I will then try to reproduce the issue on my system according to your suggestion.

Best regards.
Yao Yongxian
