linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
@ 2021-08-28  4:52 syzbot
  2021-08-30 10:58 ` Dmitry Vyukov
       [not found] ` <20210831074532.2255-1-hdanton@sina.com>
  0 siblings, 2 replies; 11+ messages in thread
From: syzbot @ 2021-08-28  4:52 UTC (permalink / raw)
  To: bp, hpa, jirislaby, jpoimboe, jthierry, linux-kernel, mingo,
	peterz, syzkaller-bugs, tglx, x86

Hello,

syzbot found the following issue on:

HEAD commit:    d5ae8d7f85b7 Revert "media: dvb header files: move some he..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1138c9ad300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=765eea9a273a8879
dashboard link: https://syzkaller.appspot.com/bug?extid=0e964fad69a9c462bc1e
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 	1-...!: (1 GPs behind) idle=70e/1/0x4000000000000000 softirq=60586/60587 fqs=2492 
	(t=10501 jiffies g=90689 q=3510)
rcu: rcu_preempt kthread starved for 5495 jiffies! g90689 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:28864 pid:   14 ppid:     2 flags:0x00004000
Call Trace:
 context_switch kernel/sched/core.c:4681 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:5938
 schedule+0xd3/0x270 kernel/sched/core.c:6017
 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
 rcu_gp_fqs_loop kernel/rcu/tree.c:1996 [inline]
 rcu_gp_kthread+0xd34/0x1980 kernel/rcu/tree.c:2169
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 10491 Comm: syz-executor.3 Not tainted 5.14.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:unwind_next_frame+0x829/0x1ce0 arch/x86/kernel/unwind_orc.c:466
Code: 83 e0 07 40 38 c6 40 0f 9e c7 40 84 f6 0f 95 c0 40 84 c7 0f 85 bc 0f 00 00 49 0f bf 30 48 01 d6 48 89 74 24 60 e9 d8 fd ff ff <48> b8 00 00 00 00 00 fc ff df 48 8b 54 24 08 48 c1 ea 03 80 3c 02
RSP: 0018:ffffc90000007698 EFLAGS: 00000297
RAX: 0000000000000005 RBX: 1ffff92000000edb RCX: ffffffff8e5a56dd
RDX: 0000000000000005 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 0000000000000002 R08: ffffffff8e5a56d8 R09: ffffffff8e5a56a8
R10: fffff52000000ef9 R11: 0000000000086088 R12: ffffc900000077b8
R13: ffffc900000077a5 R14: ffffc90000007770 R15: ffffffff8e5a56dc
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c01b7ba1b0 CR3: 0000000014fb2000 CR4: 00000000001526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>
 arch_stack_walk+0x7d/0xe0 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x8c/0xc0 kernel/stacktrace.c:121
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:434 [inline]
 __kasan_slab_alloc+0x83/0xb0 mm/kasan/common.c:467
 kasan_slab_alloc include/linux/kasan.h:254 [inline]
 slab_post_alloc_hook mm/slab.h:519 [inline]
 slab_alloc_node mm/slub.c:2959 [inline]
 kmem_cache_alloc_node+0x266/0x3e0 mm/slub.c:2995
 __alloc_skb+0x20b/0x340 net/core/skbuff.c:414
 skb_copy+0x137/0x2f0 net/core/skbuff.c:1581
 mac80211_hwsim_tx_frame_no_nl.isra.0+0xb17/0x1330 drivers/net/wireless/mac80211_hwsim.c:1565
 mac80211_hwsim_tx_frame+0x1ee/0x2a0 drivers/net/wireless/mac80211_hwsim.c:1784
 mac80211_hwsim_beacon_tx+0x49b/0x930 drivers/net/wireless/mac80211_hwsim.c:1838
 __iterate_interfaces+0x1e5/0x520 net/mac80211/util.c:793
 ieee80211_iterate_active_interfaces_atomic+0x70/0x180 net/mac80211/util.c:829
 mac80211_hwsim_beacon+0xd5/0x1a0 drivers/net/wireless/mac80211_hwsim.c:1861
 __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
 __hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
 hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
 invoke_softirq kernel/softirq.c:432 [inline]
 __irq_exit_rcu+0x16e/0x1c0 kernel/softirq.c:636
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:648
 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
 </IRQ>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:__page_mapcount+0x2/0x350 mm/util.c:716
Code: 89 e7 e8 31 cf 17 00 e9 10 fe ff ff e8 27 cf 17 00 e9 74 fe ff ff e8 1d cf 17 00 e9 3c fe ff ff 0f 1f 84 00 00 00 00 00 41 56 <41> 55 41 54 55 48 89 fd 4c 8d 65 30 53 e8 ec 94 d1 ff be 04 00 00
RSP: 0018:ffffc900180177a0 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff88807d149c40 RSI: ffffffff81aa23f5 RDI: ffffea0001deaa80
RBP: ffffea0001deaa80 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81aa28ae R11: 0000000000000000 R12: ffffea0001deaa88
R13: 0000000000000001 R14: dffffc0000000000 R15: 0000000000aab000
 page_mapcount include/linux/mm.h:875 [inline]
 zap_pte_range mm/memory.c:1363 [inline]
 zap_pmd_range mm/memory.c:1481 [inline]
 zap_pud_range mm/memory.c:1510 [inline]
 zap_p4d_range mm/memory.c:1531 [inline]
 unmap_page_range+0xf1d/0x2a10 mm/memory.c:1552
 unmap_single_vma+0x198/0x300 mm/memory.c:1597
 unmap_vmas+0x16d/0x2f0 mm/memory.c:1629
 exit_mmap+0x1d0/0x620 mm/mmap.c:3201
 __mmput+0x122/0x470 kernel/fork.c:1101
 mmput+0x58/0x60 kernel/fork.c:1122
 exit_mm kernel/exit.c:501 [inline]
 do_exit+0xae2/0x2a60 kernel/exit.c:812
 do_group_exit+0x125/0x310 kernel/exit.c:922
 get_signal+0x47f/0x2160 kernel/signal.c:2808
 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:865
 handle_signal_work kernel/entry/common.c:148 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
 exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:209
 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
 syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302
 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665e9
Code: Unable to access opcode bytes at RIP 0x4665bf.
RSP: 002b:00007f1283537218 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 000000000056bf88 RCX: 00000000004665e9
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000000000056bf88
RBP: 000000000056bf80 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf8c
R13: 0000000000a9fb1f R14: 00007f1283537300 R15: 0000000000022000
NMI backtrace for cpu 1
CPU: 1 PID: 4075 Comm: syz-executor.4 Not tainted 5.14.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:342
 print_cpu_stall kernel/rcu/tree_stall.h:625 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:700 [inline]
 rcu_pending kernel/rcu/tree.c:3922 [inline]
 rcu_sched_clock_irq.cold+0x9f/0x747 kernel/rcu/tree.c:2641
 update_process_times+0x16d/0x200 kernel/time/timer.c:1785
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1421
 __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
 __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1601
 hrtimer_interrupt+0x330/0xa00 kernel/time/hrtimer.c:1663
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
 __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1106
 sysvec_apic_timer_interrupt+0x40/0xc0 arch/x86/kernel/apic/apic.c:1100
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70 kernel/locking/spinlock.c:191
Code: 74 24 10 e8 9a fa 2d f8 48 89 ef e8 42 70 2e f8 81 e3 00 02 00 00 75 25 9c 58 f6 c4 02 75 2d 48 85 db 74 01 fb bf 01 00 00 00 <e8> 03 4e 22 f8 65 8b 05 ec c8 d4 76 85 c0 74 0a 5b 5d c3 e8 e0 aa
RSP: 0018:ffffc90000dc0c08 EFLAGS: 00000206
RAX: 0000000000000002 RBX: 0000000000000200 RCX: 1ffffffff1fa4e6a
RDX: 0000000000000000 RSI: 0000000000000102 RDI: 0000000000000001
RBP: ffff8880b9d40280 R08: 0000000000000001 R09: ffffffff8fcd59bf
R10: 0000000000000001 R11: 0000000000000000 R12: 000000010000c10b
R13: ffff8880b9d40280 R14: 0000000000000000 R15: 00000000ffffffff
 __mod_timer+0x837/0xe30 kernel/time/timer.c:1065
 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421
 expire_timers kernel/time/timer.c:1466 [inline]
 __run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734
 __run_timers kernel/time/timer.c:1715 [inline]
 run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
 invoke_softirq kernel/softirq.c:432 [inline]
 __irq_exit_rcu+0x16e/0x1c0 kernel/softirq.c:636
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:648
 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
 </IRQ>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:finish_task_switch.isra.0+0x23c/0xa50 kernel/sched/core.c:4555
Code: 8b 3a 4c 89 e7 48 c7 02 00 00 00 00 ff d1 4d 85 ff 75 bf 4c 89 e7 e8 23 3c dd 07 e8 fe e7 2b 00 fb 65 48 8b 1c 25 40 f0 01 00 <48> 8d bb 10 15 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1
RSP: 0018:ffffc90003ae7810 EFLAGS: 00000202
RAX: 00000000000003cd RBX: ffff888030a13880 RCX: 1ffffffff1f9c5a2
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffffc90003ae7850 R08: 0000000000000001 R09: ffffffff8fcd5907
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8880b9d51b00
R13: ffff88801f1c1c40 R14: 0000000000000000 R15: ffff8880b9d51b18
 context_switch kernel/sched/core.c:4684 [inline]
 __schedule+0x942/0x26f0 kernel/sched/core.c:5938
 preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6098
 preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35
 put_cpu_partial+0x1a1/0x230 mm/slub.c:2482
 qlink_free mm/kasan/quarantine.c:146 [inline]
 qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
 __kasan_slab_alloc+0x95/0xb0 mm/kasan/common.c:444
 kasan_slab_alloc include/linux/kasan.h:254 [inline]
 slab_post_alloc_hook mm/slab.h:519 [inline]
 slab_alloc_node mm/slub.c:2959 [inline]
 slab_alloc mm/slub.c:2967 [inline]
 __kmalloc+0x1f4/0x330 mm/slub.c:4111
 kmalloc include/linux/slab.h:596 [inline]
 tomoyo_realpath_from_path+0xc3/0x620 security/tomoyo/realpath.c:254
 tomoyo_realpath_nofollow+0xc8/0xe0 security/tomoyo/realpath.c:309
 tomoyo_find_next_domain+0x280/0x1f80 security/tomoyo/domain.c:727
 tomoyo_bprm_check_security security/tomoyo/tomoyo.c:101 [inline]
 tomoyo_bprm_check_security+0x121/0x1a0 security/tomoyo/tomoyo.c:91
 security_bprm_check+0x45/0xa0 security/security.c:866
 search_binary_handler fs/exec.c:1709 [inline]
 exec_binprm fs/exec.c:1762 [inline]
 bprm_execve fs/exec.c:1831 [inline]
 bprm_execve+0x732/0x19b0 fs/exec.c:1793
 do_execveat_common+0x5e3/0x780 fs/exec.c:1920
 do_execve fs/exec.c:1988 [inline]
 __do_sys_execve fs/exec.c:2064 [inline]
 __se_sys_execve fs/exec.c:2059 [inline]
 __x64_sys_execve+0x8f/0xc0 fs/exec.c:2059
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665e9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fc2fc6a3188 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665e9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000180
RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
R13: 0000000000a9fb1f R14: 00007fc2fc6a3300 R15: 0000000000022000
----------------
Code disassembly (best guess):
   0:	83 e0 07             	and    $0x7,%eax
   3:	40 38 c6             	cmp    %al,%sil
   6:	40 0f 9e c7          	setle  %dil
   a:	40 84 f6             	test   %sil,%sil
   d:	0f 95 c0             	setne  %al
  10:	40 84 c7             	test   %al,%dil
  13:	0f 85 bc 0f 00 00    	jne    0xfd5
  19:	49 0f bf 30          	movswq (%r8),%rsi
  1d:	48 01 d6             	add    %rdx,%rsi
  20:	48 89 74 24 60       	mov    %rsi,0x60(%rsp)
  25:	e9 d8 fd ff ff       	jmpq   0xfffffe02
* 2a:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax <-- trapping instruction
  31:	fc ff df
  34:	48 8b 54 24 08       	mov    0x8(%rsp),%rdx
  39:	48 c1 ea 03          	shr    $0x3,%rdx
  3d:	80                   	.byte 0x80
  3e:	3c 02                	cmp    $0x2,%al


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
  2021-08-28  4:52 [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode syzbot
@ 2021-08-30 10:58 ` Dmitry Vyukov
       [not found] ` <20210831074532.2255-1-hdanton@sina.com>
  1 sibling, 0 replies; 11+ messages in thread
From: Dmitry Vyukov @ 2021-08-30 10:58 UTC (permalink / raw)
  To: syzbot, Johannes Berg, Aleksandr Nogikh, Kalle Valo, linux-wireless
  Cc: bp, hpa, jirislaby, jpoimboe, jthierry, linux-kernel, mingo,
	peterz, syzkaller-bugs, tglx, x86

On Sat, 28 Aug 2021 at 06:52, syzbot
<syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    d5ae8d7f85b7 Revert "media: dvb header files: move some he..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1138c9ad300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=765eea9a273a8879
> dashboard link: https://syzkaller.appspot.com/bug?extid=0e964fad69a9c462bc1e
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com

It seems that lots of these stalls actually happen in invoke_softirq()
-> mac80211_hwsim_beacon().
See https://github.com/google/syzkaller/issues/2732 for examples, also
the issue to attribute it properly in future.
+drivers/net/wireless/mac80211_hwsim.c maintainers


> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu:    1-...!: (1 GPs behind) idle=70e/1/0x4000000000000000 softirq=60586/60587 fqs=2492
>         (t=10501 jiffies g=90689 q=3510)
> rcu: rcu_preempt kthread starved for 5495 jiffies! g90689 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> rcu: RCU grace-period kthread stack dump:
> task:rcu_preempt     state:R  running task     stack:28864 pid:   14 ppid:     2 flags:0x00004000
> Call Trace:
>  context_switch kernel/sched/core.c:4681 [inline]
>  __schedule+0x93a/0x26f0 kernel/sched/core.c:5938
>  schedule+0xd3/0x270 kernel/sched/core.c:6017
>  schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
>  rcu_gp_fqs_loop kernel/rcu/tree.c:1996 [inline]
>  rcu_gp_kthread+0xd34/0x1980 kernel/rcu/tree.c:2169
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> rcu: Stack dump where RCU GP kthread last ran:
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 10491 Comm: syz-executor.3 Not tainted 5.14.0-rc7-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:unwind_next_frame+0x829/0x1ce0 arch/x86/kernel/unwind_orc.c:466
> Code: 83 e0 07 40 38 c6 40 0f 9e c7 40 84 f6 0f 95 c0 40 84 c7 0f 85 bc 0f 00 00 49 0f bf 30 48 01 d6 48 89 74 24 60 e9 d8 fd ff ff <48> b8 00 00 00 00 00 fc ff df 48 8b 54 24 08 48 c1 ea 03 80 3c 02
> RSP: 0018:ffffc90000007698 EFLAGS: 00000297
> RAX: 0000000000000005 RBX: 1ffff92000000edb RCX: ffffffff8e5a56dd
> RDX: 0000000000000005 RSI: 0000000000000001 RDI: 0000000000000001
> RBP: 0000000000000002 R08: ffffffff8e5a56d8 R09: ffffffff8e5a56a8
> R10: fffff52000000ef9 R11: 0000000000086088 R12: ffffc900000077b8
> R13: ffffc900000077a5 R14: ffffc90000007770 R15: ffffffff8e5a56dc
> FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000c01b7ba1b0 CR3: 0000000014fb2000 CR4: 00000000001526f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <IRQ>
>  arch_stack_walk+0x7d/0xe0 arch/x86/kernel/stacktrace.c:25
>  stack_trace_save+0x8c/0xc0 kernel/stacktrace.c:121
>  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
>  kasan_set_track mm/kasan/common.c:46 [inline]
>  set_alloc_info mm/kasan/common.c:434 [inline]
>  __kasan_slab_alloc+0x83/0xb0 mm/kasan/common.c:467
>  kasan_slab_alloc include/linux/kasan.h:254 [inline]
>  slab_post_alloc_hook mm/slab.h:519 [inline]
>  slab_alloc_node mm/slub.c:2959 [inline]
>  kmem_cache_alloc_node+0x266/0x3e0 mm/slub.c:2995
>  __alloc_skb+0x20b/0x340 net/core/skbuff.c:414
>  skb_copy+0x137/0x2f0 net/core/skbuff.c:1581
>  mac80211_hwsim_tx_frame_no_nl.isra.0+0xb17/0x1330 drivers/net/wireless/mac80211_hwsim.c:1565
>  mac80211_hwsim_tx_frame+0x1ee/0x2a0 drivers/net/wireless/mac80211_hwsim.c:1784
>  mac80211_hwsim_beacon_tx+0x49b/0x930 drivers/net/wireless/mac80211_hwsim.c:1838
>  __iterate_interfaces+0x1e5/0x520 net/mac80211/util.c:793
>  ieee80211_iterate_active_interfaces_atomic+0x70/0x180 net/mac80211/util.c:829
>  mac80211_hwsim_beacon+0xd5/0x1a0 drivers/net/wireless/mac80211_hwsim.c:1861
>  __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
>  __hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
>  hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
>  invoke_softirq kernel/softirq.c:432 [inline]
>  __irq_exit_rcu+0x16e/0x1c0 kernel/softirq.c:636
>  irq_exit_rcu+0x5/0x20 kernel/softirq.c:648
>  sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
>  </IRQ>
>  asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
> RIP: 0010:__page_mapcount+0x2/0x350 mm/util.c:716
> Code: 89 e7 e8 31 cf 17 00 e9 10 fe ff ff e8 27 cf 17 00 e9 74 fe ff ff e8 1d cf 17 00 e9 3c fe ff ff 0f 1f 84 00 00 00 00 00 41 56 <41> 55 41 54 55 48 89 fd 4c 8d 65 30 53 e8 ec 94 d1 ff be 04 00 00
> RSP: 0018:ffffc900180177a0 EFLAGS: 00000293
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> RDX: ffff88807d149c40 RSI: ffffffff81aa23f5 RDI: ffffea0001deaa80
> RBP: ffffea0001deaa80 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffffff81aa28ae R11: 0000000000000000 R12: ffffea0001deaa88
> R13: 0000000000000001 R14: dffffc0000000000 R15: 0000000000aab000
>  page_mapcount include/linux/mm.h:875 [inline]
>  zap_pte_range mm/memory.c:1363 [inline]
>  zap_pmd_range mm/memory.c:1481 [inline]
>  zap_pud_range mm/memory.c:1510 [inline]
>  zap_p4d_range mm/memory.c:1531 [inline]
>  unmap_page_range+0xf1d/0x2a10 mm/memory.c:1552
>  unmap_single_vma+0x198/0x300 mm/memory.c:1597
>  unmap_vmas+0x16d/0x2f0 mm/memory.c:1629
>  exit_mmap+0x1d0/0x620 mm/mmap.c:3201
>  __mmput+0x122/0x470 kernel/fork.c:1101
>  mmput+0x58/0x60 kernel/fork.c:1122
>  exit_mm kernel/exit.c:501 [inline]
>  do_exit+0xae2/0x2a60 kernel/exit.c:812
>  do_group_exit+0x125/0x310 kernel/exit.c:922
>  get_signal+0x47f/0x2160 kernel/signal.c:2808
>  arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:865
>  handle_signal_work kernel/entry/common.c:148 [inline]
>  exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
>  exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:209
>  __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
>  syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302
>  do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x4665e9
> Code: Unable to access opcode bytes at RIP 0x4665bf.
> RSP: 002b:00007f1283537218 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: fffffffffffffe00 RBX: 000000000056bf88 RCX: 00000000004665e9
> RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000000000056bf88
> RBP: 000000000056bf80 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf8c
> R13: 0000000000a9fb1f R14: 00007f1283537300 R15: 0000000000022000
> NMI backtrace for cpu 1
> CPU: 1 PID: 4075 Comm: syz-executor.4 Not tainted 5.14.0-rc7-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
>  nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
>  nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
>  trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
>  rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:342
>  print_cpu_stall kernel/rcu/tree_stall.h:625 [inline]
>  check_cpu_stall kernel/rcu/tree_stall.h:700 [inline]
>  rcu_pending kernel/rcu/tree.c:3922 [inline]
>  rcu_sched_clock_irq.cold+0x9f/0x747 kernel/rcu/tree.c:2641
>  update_process_times+0x16d/0x200 kernel/time/timer.c:1785
>  tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
>  tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1421
>  __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
>  __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1601
>  hrtimer_interrupt+0x330/0xa00 kernel/time/hrtimer.c:1663
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
>  __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1106
>  sysvec_apic_timer_interrupt+0x40/0xc0 arch/x86/kernel/apic/apic.c:1100
>  asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
> RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
> RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70 kernel/locking/spinlock.c:191
> Code: 74 24 10 e8 9a fa 2d f8 48 89 ef e8 42 70 2e f8 81 e3 00 02 00 00 75 25 9c 58 f6 c4 02 75 2d 48 85 db 74 01 fb bf 01 00 00 00 <e8> 03 4e 22 f8 65 8b 05 ec c8 d4 76 85 c0 74 0a 5b 5d c3 e8 e0 aa
> RSP: 0018:ffffc90000dc0c08 EFLAGS: 00000206
> RAX: 0000000000000002 RBX: 0000000000000200 RCX: 1ffffffff1fa4e6a
> RDX: 0000000000000000 RSI: 0000000000000102 RDI: 0000000000000001
> RBP: ffff8880b9d40280 R08: 0000000000000001 R09: ffffffff8fcd59bf
> R10: 0000000000000001 R11: 0000000000000000 R12: 000000010000c10b
> R13: ffff8880b9d40280 R14: 0000000000000000 R15: 00000000ffffffff
>  __mod_timer+0x837/0xe30 kernel/time/timer.c:1065
>  call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421
>  expire_timers kernel/time/timer.c:1466 [inline]
>  __run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734
>  __run_timers kernel/time/timer.c:1715 [inline]
>  run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747
>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
>  invoke_softirq kernel/softirq.c:432 [inline]
>  __irq_exit_rcu+0x16e/0x1c0 kernel/softirq.c:636
>  irq_exit_rcu+0x5/0x20 kernel/softirq.c:648
>  sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
>  </IRQ>
>  asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
> RIP: 0010:finish_task_switch.isra.0+0x23c/0xa50 kernel/sched/core.c:4555
> Code: 8b 3a 4c 89 e7 48 c7 02 00 00 00 00 ff d1 4d 85 ff 75 bf 4c 89 e7 e8 23 3c dd 07 e8 fe e7 2b 00 fb 65 48 8b 1c 25 40 f0 01 00 <48> 8d bb 10 15 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1
> RSP: 0018:ffffc90003ae7810 EFLAGS: 00000202
> RAX: 00000000000003cd RBX: ffff888030a13880 RCX: 1ffffffff1f9c5a2
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
> RBP: ffffc90003ae7850 R08: 0000000000000001 R09: ffffffff8fcd5907
> R10: 0000000000000001 R11: 0000000000000000 R12: ffff8880b9d51b00
> R13: ffff88801f1c1c40 R14: 0000000000000000 R15: ffff8880b9d51b18
>  context_switch kernel/sched/core.c:4684 [inline]
>  __schedule+0x942/0x26f0 kernel/sched/core.c:5938
>  preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6098
>  preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35
>  put_cpu_partial+0x1a1/0x230 mm/slub.c:2482
>  qlink_free mm/kasan/quarantine.c:146 [inline]
>  qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
>  kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
>  __kasan_slab_alloc+0x95/0xb0 mm/kasan/common.c:444
>  kasan_slab_alloc include/linux/kasan.h:254 [inline]
>  slab_post_alloc_hook mm/slab.h:519 [inline]
>  slab_alloc_node mm/slub.c:2959 [inline]
>  slab_alloc mm/slub.c:2967 [inline]
>  __kmalloc+0x1f4/0x330 mm/slub.c:4111
>  kmalloc include/linux/slab.h:596 [inline]
>  tomoyo_realpath_from_path+0xc3/0x620 security/tomoyo/realpath.c:254
>  tomoyo_realpath_nofollow+0xc8/0xe0 security/tomoyo/realpath.c:309
>  tomoyo_find_next_domain+0x280/0x1f80 security/tomoyo/domain.c:727
>  tomoyo_bprm_check_security security/tomoyo/tomoyo.c:101 [inline]
>  tomoyo_bprm_check_security+0x121/0x1a0 security/tomoyo/tomoyo.c:91
>  security_bprm_check+0x45/0xa0 security/security.c:866
>  search_binary_handler fs/exec.c:1709 [inline]
>  exec_binprm fs/exec.c:1762 [inline]
>  bprm_execve fs/exec.c:1831 [inline]
>  bprm_execve+0x732/0x19b0 fs/exec.c:1793
>  do_execveat_common+0x5e3/0x780 fs/exec.c:1920
>  do_execve fs/exec.c:1988 [inline]
>  __do_sys_execve fs/exec.c:2064 [inline]
>  __se_sys_execve fs/exec.c:2059 [inline]
>  __x64_sys_execve+0x8f/0xc0 fs/exec.c:2059
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x4665e9
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fc2fc6a3188 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
> RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665e9
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000180
> RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
> R13: 0000000000a9fb1f R14: 00007fc2fc6a3300 R15: 0000000000022000
> ----------------
> Code disassembly (best guess):
>    0:   83 e0 07                and    $0x7,%eax
>    3:   40 38 c6                cmp    %al,%sil
>    6:   40 0f 9e c7             setle  %dil
>    a:   40 84 f6                test   %sil,%sil
>    d:   0f 95 c0                setne  %al
>   10:   40 84 c7                test   %al,%dil
>   13:   0f 85 bc 0f 00 00       jne    0xfd5
>   19:   49 0f bf 30             movswq (%r8),%rsi
>   1d:   48 01 d6                add    %rdx,%rsi
>   20:   48 89 74 24 60          mov    %rsi,0x60(%rsp)
>   25:   e9 d8 fd ff ff          jmpq   0xfffffe02
> * 2a:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax <-- trapping instruction
>   31:   fc ff df
>   34:   48 8b 54 24 08          mov    0x8(%rsp),%rdx
>   39:   48 c1 ea 03             shr    $0x3,%rdx
>   3d:   80                      .byte 0x80
>   3e:   3c 02                   cmp    $0x2,%al
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/000000000000eaacf005ca975d1a%40google.com.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
       [not found] ` <20210831074532.2255-1-hdanton@sina.com>
@ 2021-09-13 10:28   ` Thomas Gleixner
       [not found]   ` <20210914123726.4219-1-hdanton@sina.com>
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2021-09-13 10:28 UTC (permalink / raw)
  To: Hillf Danton, Dmitry Vyukov; +Cc: syzbot, linux-kernel, paulmck, syzkaller-bugs

On Tue, Aug 31 2021 at 15:45, Hillf Danton wrote:
> On Mon, 30 Aug 2021 12:58:58 +0200 Dmitry Vyukov wrote:
>>>  ieee80211_iterate_active_interfaces_atomic+0x70/0x180 net/mac80211/util.c:829
>>>  mac80211_hwsim_beacon+0xd5/0x1a0 drivers/net/wireless/mac80211_hwsim.c:1861
>>>  __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
>>>  __hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
>>>  hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
>>>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
>
> Add debug info only to help kasan catch the timer running longer than 2 ticks.
>
> Is it anything in the right direction, tglx?

Not really. As Dmitry pointed out this seems to be related to
mac80211_hwsim and if you look at the above stacktrace then how is
adding something to the timer wheel helpful?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
       [not found]   ` <20210914123726.4219-1-hdanton@sina.com>
@ 2021-09-14 14:58     ` Thomas Gleixner
  2021-09-14 18:00       ` Dmitry Vyukov
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2021-09-14 14:58 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Dmitry Vyukov, syzbot, linux-kernel, paulmck, syzkaller-bugs,
	Peter Zijlstra

On Tue, Sep 14 2021 at 20:37, Hillf Danton wrote:

> On Mon, 13 Sep 2021 12:28:14 +0200 Thomas Gleixner wrote:
>>On Tue, Aug 31 2021 at 15:45, Hillf Danton wrote:
>>> On Mon, 30 Aug 2021 12:58:58 +0200 Dmitry Vyukov wrote:
>>>>>  ieee80211_iterate_active_interfaces_atomic+0x70/0x180 net/mac80211/util.c:829
>>>>>  mac80211_hwsim_beacon+0xd5/0x1a0 drivers/net/wireless/mac80211_hwsim.c:1861
>>>>>  __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
>>>>>  __hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
>>>>>  hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
>>>>>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
>>>
>>> Add debug info only to help kasan catch the timer running longer than 2 ticks.
>>>
>>> Is it anything in the right direction, tglx?
>>
>>Not really. As Dmitry pointed out this seems to be related to
>
> Thanks for taking a look.
>
>>mac80211_hwsim and if you look at the above stacktrace then how is
>>adding something to the timer wheel helpful?
>
> Given the stall was printed on CPU1 while the supposedly offending timer was
> expiring on CPU0, what was proposed is the lame debug info only for kasan to
> catch the timer red handed.
>
> It is more appreciated if the tglx dude would likely spend a couple of minutes
> giving us a lesson on the expertises needed for collecting evidence that any
> timer runs longer than two ticks. It helps beyond the extent of kasan.

That tglx dude already picked the relevant part of the stack trace (see
also above):

>>>>>  ieee80211_iterate_active_interfaces_atomic+0x70/0x180 net/mac80211/util.c:829
>>>>>  mac80211_hwsim_beacon+0xd5/0x1a0 drivers/net/wireless/mac80211_hwsim.c:1861
>>>>>  __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
>>>>>  __hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
>>>>>  hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
>>>>>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558

and then asked the question how a timer wheel timer runtime check
helps. He just omitted the appendix "if the timer in question is a
hrtimer" as he assumed that this is pretty obvious from the stack trace.

Aside of that if the wireless timer callback runs in an endless loop,
what is a runtime detection of that in the hrtimer softirq invocation
helping to decode the problem if the stall detector catches it when it
hangs there?

Now that mac80211 hrtimer callback might actually be not the real
problem. It's certainly containing a bunch of loops, but I couldn't find
an endless loop there during a cursory inspection.

But that callback does rearm the hrtimer and that made me look at
hrtimer_run_queues() which might be the reason for the endless loop as
it only terminates when there is no timer to expire anymore.

Now what happens when the mac80211 callback rearms the timer so it
expires immediately again:

        hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
                        ns_to_ktime(bcn_int * NSEC_PER_USEC));

bcn is a user space controlled value. Now lets assume that bcn_int is <=1,
which would certainly cause the loop in hrtimer_run_queues() to keeping
looping forever.

That should be easy to verify by implementing a simple test which
reschedules a hrtimer from the callback with a expiry time close to now.

Not today as I'm about to head home to fire up the pizza oven.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
  2021-09-14 14:58     ` Thomas Gleixner
@ 2021-09-14 18:00       ` Dmitry Vyukov
  2021-09-14 18:31         ` Paul E. McKenney
  2021-09-15  8:57         ` Thomas Gleixner
  0 siblings, 2 replies; 11+ messages in thread
From: Dmitry Vyukov @ 2021-09-14 18:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Hillf Danton, syzbot, linux-kernel, paulmck, syzkaller-bugs,
	Peter Zijlstra, kasan-dev

On Tue, 14 Sept 2021 at 16:58, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, Sep 14 2021 at 20:37, Hillf Danton wrote:
>
> > On Mon, 13 Sep 2021 12:28:14 +0200 Thomas Gleixner wrote:
> >>On Tue, Aug 31 2021 at 15:45, Hillf Danton wrote:
> >>> On Mon, 30 Aug 2021 12:58:58 +0200 Dmitry Vyukov wrote:
> >>>>>  ieee80211_iterate_active_interfaces_atomic+0x70/0x180 net/mac80211/util.c:829
> >>>>>  mac80211_hwsim_beacon+0xd5/0x1a0 drivers/net/wireless/mac80211_hwsim.c:1861
> >>>>>  __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
> >>>>>  __hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
> >>>>>  hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
> >>>>>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
> >>>
> >>> Add debug info only to help kasan catch the timer running longer than 2 ticks.
> >>>
> >>> Is it anything in the right direction, tglx?
> >>
> >>Not really. As Dmitry pointed out this seems to be related to
> >
> > Thanks for taking a look.
> >
> >>mac80211_hwsim and if you look at the above stacktrace then how is
> >>adding something to the timer wheel helpful?
> >
> > Given the stall was printed on CPU1 while the supposedly offending timer was
> > expiring on CPU0, what was proposed is the lame debug info only for kasan to
> > catch the timer red handed.
> >
> > It is more appreciated if the tglx dude would likely spend a couple of minutes
> > giving us a lesson on the expertises needed for collecting evidence that any
> > timer runs longer than two ticks. It helps beyond the extent of kasan.
>
> That tglx dude already picked the relevant part of the stack trace (see
> also above):
>
> >>>>>  ieee80211_iterate_active_interfaces_atomic+0x70/0x180 net/mac80211/util.c:829
> >>>>>  mac80211_hwsim_beacon+0xd5/0x1a0 drivers/net/wireless/mac80211_hwsim.c:1861
> >>>>>  __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
> >>>>>  __hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
> >>>>>  hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
> >>>>>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
>
> and then asked the question how a timer wheel timer runtime check
> helps. He just omitted the appendix "if the timer in question is a
> hrtimer" as he assumed that this is pretty obvious from the stack trace.
>
> Aside of that if the wireless timer callback runs in an endless loop,
> what is a runtime detection of that in the hrtimer softirq invocation
> helping to decode the problem if the stall detector catches it when it
> hangs there?
>
> Now that mac80211 hrtimer callback might actually be not the real
> problem. It's certainly containing a bunch of loops, but I couldn't find
> an endless loop there during a cursory inspection.
>
> But that callback does rearm the hrtimer and that made me look at
> hrtimer_run_queues() which might be the reason for the endless loop as
> it only terminates when there is no timer to expire anymore.
>
> Now what happens when the mac80211 callback rearms the timer so it
> expires immediately again:
>
>         hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
>                         ns_to_ktime(bcn_int * NSEC_PER_USEC));
>
> bcn is a user space controlled value. Now lets assume that bcn_int is <=1,
> which would certainly cause the loop in hrtimer_run_queues() to keeping
> looping forever.
>
> That should be easy to verify by implementing a simple test which
> reschedules a hrtimer from the callback with a expiry time close to now.
>
> Not today as I'm about to head home to fire up the pizza oven.

This question definitely shouldn't take priority over the pizza. But I
think I saw this "rearm a timer with a user-controlled value without
any checks" pattern lots of times and hangs are inherently harder to
localize and reproduce. So I wonder if it makes sense to add a debug
config that would catch such cases right when the timer is set up
(issue a WARNING)?
However, for automated testing there is the usual question of
balancing between false positives and false negatives. The check
should not produce false positives, but at the same time it should
catch [almost] all actual stalls so that they don't manifest as
duplicate stall reports.

If I understand it correctly the timer is not actually set up as
periodic, but rather each callback invocation arms it again. Setting
up a timer for 1 ns _once_ (or few times) is probably fine (right?),
so the check needs to be somewhat more elaborate and detect "infinite"
rearming.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
  2021-09-14 18:00       ` Dmitry Vyukov
@ 2021-09-14 18:31         ` Paul E. McKenney
  2021-09-15  9:36           ` Thomas Gleixner
  2021-09-15  8:57         ` Thomas Gleixner
  1 sibling, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2021-09-14 18:31 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Thomas Gleixner, Hillf Danton, syzbot, linux-kernel,
	syzkaller-bugs, Peter Zijlstra, kasan-dev

On Tue, Sep 14, 2021 at 08:00:04PM +0200, Dmitry Vyukov wrote:
> On Tue, 14 Sept 2021 at 16:58, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Tue, Sep 14 2021 at 20:37, Hillf Danton wrote:
> >
> > > On Mon, 13 Sep 2021 12:28:14 +0200 Thomas Gleixner wrote:
> > >>On Tue, Aug 31 2021 at 15:45, Hillf Danton wrote:
> > >>> On Mon, 30 Aug 2021 12:58:58 +0200 Dmitry Vyukov wrote:
> > >>>>>  ieee80211_iterate_active_interfaces_atomic+0x70/0x180 net/mac80211/util.c:829
> > >>>>>  mac80211_hwsim_beacon+0xd5/0x1a0 drivers/net/wireless/mac80211_hwsim.c:1861
> > >>>>>  __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
> > >>>>>  __hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
> > >>>>>  hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
> > >>>>>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
> > >>>
> > >>> Add debug info only to help kasan catch the timer running longer than 2 ticks.
> > >>>
> > >>> Is it anything in the right direction, tglx?
> > >>
> > >>Not really. As Dmitry pointed out this seems to be related to
> > >
> > > Thanks for taking a look.
> > >
> > >>mac80211_hwsim and if you look at the above stacktrace then how is
> > >>adding something to the timer wheel helpful?
> > >
> > > Given the stall was printed on CPU1 while the supposedly offending timer was
> > > expiring on CPU0, what was proposed is the lame debug info only for kasan to
> > > catch the timer red handed.
> > >
> > > It is more appreciated if the tglx dude would likely spend a couple of minutes
> > > giving us a lesson on the expertises needed for collecting evidence that any
> > > timer runs longer than two ticks. It helps beyond the extent of kasan.
> >
> > That tglx dude already picked the relevant part of the stack trace (see
> > also above):
> >
> > >>>>>  ieee80211_iterate_active_interfaces_atomic+0x70/0x180 net/mac80211/util.c:829
> > >>>>>  mac80211_hwsim_beacon+0xd5/0x1a0 drivers/net/wireless/mac80211_hwsim.c:1861
> > >>>>>  __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
> > >>>>>  __hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
> > >>>>>  hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
> > >>>>>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
> >
> > and then asked the question how a timer wheel timer runtime check
> > helps. He just omitted the appendix "if the timer in question is a
> > hrtimer" as he assumed that this is pretty obvious from the stack trace.
> >
> > Aside of that if the wireless timer callback runs in an endless loop,
> > what is a runtime detection of that in the hrtimer softirq invocation
> > helping to decode the problem if the stall detector catches it when it
> > hangs there?
> >
> > Now that mac80211 hrtimer callback might actually be not the real
> > problem. It's certainly containing a bunch of loops, but I couldn't find
> > an endless loop there during a cursory inspection.
> >
> > But that callback does rearm the hrtimer and that made me look at
> > hrtimer_run_queues() which might be the reason for the endless loop as
> > it only terminates when there is no timer to expire anymore.
> >
> > Now what happens when the mac80211 callback rearms the timer so it
> > expires immediately again:
> >
> >         hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
> >                         ns_to_ktime(bcn_int * NSEC_PER_USEC));
> >
> > bcn is a user space controlled value. Now lets assume that bcn_int is <=1,
> > which would certainly cause the loop in hrtimer_run_queues() to keeping
> > looping forever.
> >
> > That should be easy to verify by implementing a simple test which
> > reschedules a hrtimer from the callback with a expiry time close to now.
> >
> > Not today as I'm about to head home to fire up the pizza oven.
> 
> This question definitely shouldn't take priority over the pizza. But I
> think I saw this "rearm a timer with a user-controlled value without
> any checks" pattern lots of times and hangs are inherently harder to
> localize and reproduce. So I wonder if it makes sense to add a debug
> config that would catch such cases right when the timer is set up
> (issue a WARNING)?
> However, for automated testing there is the usual question of
> balancing between false positives and false negatives. The check
> should not produce false positives, but at the same time it should
> catch [almost] all actual stalls so that they don't manifest as
> duplicate stall reports.
> 
> If I understand it correctly the timer is not actually set up as
> periodic, but rather each callback invocation arms it again. Setting
> up a timer for 1 ns _once_ (or few times) is probably fine (right?),
> so the check needs to be somewhat more elaborate and detect "infinite"
> rearming.

If it were practical, I would suggest checking for a CPU never actually
executing any instructions in the interrupted context.  The old-school
way of doing this was to check the amount of time spent interrupted,
perhaps adding some guess at interrupt entry/exit overhead.  Is there
a better new-school way?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
  2021-09-14 18:00       ` Dmitry Vyukov
  2021-09-14 18:31         ` Paul E. McKenney
@ 2021-09-15  8:57         ` Thomas Gleixner
  2021-09-15  9:14           ` Dmitry Vyukov
  1 sibling, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2021-09-15  8:57 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Hillf Danton, syzbot, linux-kernel, paulmck, syzkaller-bugs,
	Peter Zijlstra, kasan-dev, Johannes Berg, Kalle Valo,
	linux-wireless

On Tue, Sep 14 2021 at 20:00, Dmitry Vyukov wrote:
> On Tue, 14 Sept 2021 at 16:58, Thomas Gleixner <tglx@linutronix.de> wrote:
>> Now what happens when the mac80211 callback rearms the timer so it
>> expires immediately again:
>>
>>         hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
>>                         ns_to_ktime(bcn_int * NSEC_PER_USEC));
>>
>> bcn is a user space controlled value. Now lets assume that bcn_int is <=1,
>> which would certainly cause the loop in hrtimer_run_queues() to keeping
>> looping forever.
>>
>> That should be easy to verify by implementing a simple test which
>> reschedules a hrtimer from the callback with a expiry time close to now.
>>
>> Not today as I'm about to head home to fire up the pizza oven.
>
> This question definitely shouldn't take priority over the pizza. But I
> think I saw this "rearm a timer with a user-controlled value without
> any checks" pattern lots of times and hangs are inherently harder to
> localize and reproduce. So I wonder if it makes sense to add a debug
> config that would catch such cases right when the timer is set up
> (issue a WARNING)?

Yes and no. It's hard to differentiate between a valid short expiry
rearm and something which is caused by unchecked values. I have some
ideas but all of them are expensive and therefore probably debug
only. Which is actually better than nothing :)

> However, for automated testing there is the usual question of
> balancing between false positives and false negatives. The check
> should not produce false positives, but at the same time it should
> catch [almost] all actual stalls so that they don't manifest as
> duplicate stall reports.

Right. The problem could be even there with checked values:

       start_timer(1ms)
       timer_expires()
         callback()
           forward_timer(timer, now, period(1ms));

which might be perfectly fine with a production kernel as it leaves
enough time to make overall progress.

Now with a full debug kernel with all bells and whistels that callback
might just run into this situation:

      start_timer(1ms) T0
       timer_expires() T1
         callback()
           do_stuff()
           forward_timer(timer, TNOW, period(1ms));


T1 - T0   = 1.001ms
TNOW - T1 = 0.998 ms

So the forward will just rearm it to T0 + 2ms which means it expires in
1us.

> If I understand it correctly the timer is not actually set up as
> periodic, but rather each callback invocation arms it again. Setting
> up a timer for 1 ns _once_ (or few times) is probably fine (right?),
> so the check needs to be somewhat more elaborate and detect "infinite"
> rearming.

Yes.

That made me actually look at that mac80211_hwsim callback again.

	hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
			ns_to_ktime(bcn_int * NSEC_PER_USEC));

So what this does is really wrong because it tries to schedule the timer
on the theoretical periodic timeline. Which goes really south once the
timer is late or the callback execution took longer than the
period. Hypervisors scheduling out a VCPU at the wrong place will do
that for you nicely.

What this actually should use is hrtimer_forward_now() which prevents
that problem because it will forward the timer in the periodic schedule
beyond now. That won't prevent the above corner case, but I doubt you
can create an endless loop with that scenario as easy as you can with
trying to catch up on your theoretical timeline by using the previous
expiry time as a base for the forward. Patch below.

/me goes off to audit hrtimer_forward() usage. Sigh...

After that figure out ways to debug or even prevent this. More sigh...

Thanks,

        tglx
---
 drivers/net/wireless/mac80211_hwsim.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -1867,8 +1867,8 @@ mac80211_hwsim_beacon(struct hrtimer *ti
 		bcn_int -= data->bcn_delta;
 		data->bcn_delta = 0;
 	}
-	hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
-			ns_to_ktime(bcn_int * NSEC_PER_USEC));
+	hrtimer_forward_now(&data->beacon_timer,
+			    ns_to_ktime(bcn_int * NSEC_PER_USEC));
 	return HRTIMER_RESTART;
 }
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
  2021-09-15  8:57         ` Thomas Gleixner
@ 2021-09-15  9:14           ` Dmitry Vyukov
  2021-09-15  9:32             ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Dmitry Vyukov @ 2021-09-15  9:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Hillf Danton, syzbot, linux-kernel, paulmck, syzkaller-bugs,
	Peter Zijlstra, kasan-dev, Johannes Berg, Kalle Valo,
	linux-wireless

On Wed, 15 Sept 2021 at 10:57, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, Sep 14 2021 at 20:00, Dmitry Vyukov wrote:
> > On Tue, 14 Sept 2021 at 16:58, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> Now what happens when the mac80211 callback rearms the timer so it
> >> expires immediately again:
> >>
> >>         hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
> >>                         ns_to_ktime(bcn_int * NSEC_PER_USEC));
> >>
> >> bcn is a user space controlled value. Now lets assume that bcn_int is <=1,
> >> which would certainly cause the loop in hrtimer_run_queues() to keeping
> >> looping forever.
> >>
> >> That should be easy to verify by implementing a simple test which
> >> reschedules a hrtimer from the callback with a expiry time close to now.
> >>
> >> Not today as I'm about to head home to fire up the pizza oven.
> >
> > This question definitely shouldn't take priority over the pizza. But I
> > think I saw this "rearm a timer with a user-controlled value without
> > any checks" pattern lots of times and hangs are inherently harder to
> > localize and reproduce. So I wonder if it makes sense to add a debug
> > config that would catch such cases right when the timer is set up
> > (issue a WARNING)?
>
> Yes and no. It's hard to differentiate between a valid short expiry
> rearm and something which is caused by unchecked values. I have some
> ideas but all of them are expensive and therefore probably debug
> only. Which is actually better than nothing :)
>
> > However, for automated testing there is the usual question of
> > balancing between false positives and false negatives. The check
> > should not produce false positives, but at the same time it should
> > catch [almost] all actual stalls so that they don't manifest as
> > duplicate stall reports.
>
> Right. The problem could be even there with checked values:
>
>        start_timer(1ms)
>        timer_expires()
>          callback()
>            forward_timer(timer, now, period(1ms));
>
> which might be perfectly fine with a production kernel as it leaves
> enough time to make overall progress.
>
> Now with a full debug kernel with all bells and whistels that callback
> might just run into this situation:
>
>       start_timer(1ms) T0
>        timer_expires() T1
>          callback()
>            do_stuff()
>            forward_timer(timer, TNOW, period(1ms));
>
>
> T1 - T0   = 1.001ms
> TNOW - T1 = 0.998 ms
>
> So the forward will just rearm it to T0 + 2ms which means it expires in
> 1us.
>
> > If I understand it correctly the timer is not actually set up as
> > periodic, but rather each callback invocation arms it again. Setting
> > up a timer for 1 ns _once_ (or few times) is probably fine (right?),
> > so the check needs to be somewhat more elaborate and detect "infinite"
> > rearming.
>
> Yes.
>
> That made me actually look at that mac80211_hwsim callback again.
>
>         hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
>                         ns_to_ktime(bcn_int * NSEC_PER_USEC));
>
> So what this does is really wrong because it tries to schedule the timer
> on the theoretical periodic timeline. Which goes really south once the
> timer is late or the callback execution took longer than the
> period. Hypervisors scheduling out a VCPU at the wrong place will do
> that for you nicely.

Nice!

You mentioned that hrtimer_run_queues() may not return. Does it mean
that it can just loop executing the same re-armed callback again and
again? Maybe then the debug check condition should be that
hrtimer_run_queues() runs the same callback more than N times w/o
returning?


> What this actually should use is hrtimer_forward_now() which prevents
> that problem because it will forward the timer in the periodic schedule
> beyond now. That won't prevent the above corner case, but I doubt you
> can create an endless loop with that scenario as easy as you can with
> trying to catch up on your theoretical timeline by using the previous
> expiry time as a base for the forward. Patch below.
>
> /me goes off to audit hrtimer_forward() usage. Sigh...
>
> After that figure out ways to debug or even prevent this. More sigh...
>
> Thanks,
>
>         tglx
> ---
>  drivers/net/wireless/mac80211_hwsim.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/drivers/net/wireless/mac80211_hwsim.c
> +++ b/drivers/net/wireless/mac80211_hwsim.c
> @@ -1867,8 +1867,8 @@ mac80211_hwsim_beacon(struct hrtimer *ti
>                 bcn_int -= data->bcn_delta;
>                 data->bcn_delta = 0;
>         }
> -       hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
> -                       ns_to_ktime(bcn_int * NSEC_PER_USEC));
> +       hrtimer_forward_now(&data->beacon_timer,
> +                           ns_to_ktime(bcn_int * NSEC_PER_USEC));
>         return HRTIMER_RESTART;
>  }
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
  2021-09-15  9:14           ` Dmitry Vyukov
@ 2021-09-15  9:32             ` Thomas Gleixner
  2021-09-16  9:24               ` Dmitry Vyukov
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2021-09-15  9:32 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Hillf Danton, syzbot, linux-kernel, paulmck, syzkaller-bugs,
	Peter Zijlstra, kasan-dev, Johannes Berg, Kalle Valo,
	linux-wireless

On Wed, Sep 15 2021 at 11:14, Dmitry Vyukov wrote:
> On Wed, 15 Sept 2021 at 10:57, Thomas Gleixner <tglx@linutronix.de> wrote:
>> That made me actually look at that mac80211_hwsim callback again.
>>
>>         hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
>>                         ns_to_ktime(bcn_int * NSEC_PER_USEC));
>>
>> So what this does is really wrong because it tries to schedule the timer
>> on the theoretical periodic timeline. Which goes really south once the
>> timer is late or the callback execution took longer than the
>> period. Hypervisors scheduling out a VCPU at the wrong place will do
>> that for you nicely.
>
> Nice!
>
> You mentioned that hrtimer_run_queues() may not return. Does it mean
> that it can just loop executing the same re-armed callback again and
> again? Maybe then the debug check condition should be that
> hrtimer_run_queues() runs the same callback more than N times w/o
> returning?

Something like that.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
  2021-09-14 18:31         ` Paul E. McKenney
@ 2021-09-15  9:36           ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2021-09-15  9:36 UTC (permalink / raw)
  To: paulmck, Dmitry Vyukov
  Cc: Hillf Danton, syzbot, linux-kernel, syzkaller-bugs,
	Peter Zijlstra, kasan-dev

On Tue, Sep 14 2021 at 11:31, Paul E. McKenney wrote:
> On Tue, Sep 14, 2021 at 08:00:04PM +0200, Dmitry Vyukov wrote:
>> If I understand it correctly the timer is not actually set up as
>> periodic, but rather each callback invocation arms it again. Setting
>> up a timer for 1 ns _once_ (or few times) is probably fine (right?),
>> so the check needs to be somewhat more elaborate and detect "infinite"
>> rearming.
>
> If it were practical, I would suggest checking for a CPU never actually
> executing any instructions in the interrupted context.  The old-school
> way of doing this was to check the amount of time spent interrupted,
> perhaps adding some guess at interrupt entry/exit overhead.  Is there
> a better new-school way?

Set NR_CPUS=0 and if then any executed instruction is observed the bug
is pretty obvious, isn't it?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
  2021-09-15  9:32             ` Thomas Gleixner
@ 2021-09-16  9:24               ` Dmitry Vyukov
  0 siblings, 0 replies; 11+ messages in thread
From: Dmitry Vyukov @ 2021-09-16  9:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Hillf Danton, syzbot, linux-kernel, paulmck, syzkaller-bugs,
	Peter Zijlstra, kasan-dev, Johannes Berg, Kalle Valo,
	linux-wireless

On Wed, 15 Sept 2021 at 11:32, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Wed, Sep 15 2021 at 11:14, Dmitry Vyukov wrote:
> > On Wed, 15 Sept 2021 at 10:57, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> That made me actually look at that mac80211_hwsim callback again.
> >>
> >>         hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
> >>                         ns_to_ktime(bcn_int * NSEC_PER_USEC));
> >>
> >> So what this does is really wrong because it tries to schedule the timer
> >> on the theoretical periodic timeline. Which goes really south once the
> >> timer is late or the callback execution took longer than the
> >> period. Hypervisors scheduling out a VCPU at the wrong place will do
> >> that for you nicely.
> >
> > Nice!
> >
> > You mentioned that hrtimer_run_queues() may not return. Does it mean
> > that it can just loop executing the same re-armed callback again and
> > again? Maybe then the debug check condition should be that
> > hrtimer_run_queues() runs the same callback more than N times w/o
> > returning?
>
> Something like that.

I've filed https://bugzilla.kernel.org/show_bug.cgi?id=214429 so that
it's not lost. Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-09-16  9:25 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-28  4:52 [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode syzbot
2021-08-30 10:58 ` Dmitry Vyukov
     [not found] ` <20210831074532.2255-1-hdanton@sina.com>
2021-09-13 10:28   ` Thomas Gleixner
     [not found]   ` <20210914123726.4219-1-hdanton@sina.com>
2021-09-14 14:58     ` Thomas Gleixner
2021-09-14 18:00       ` Dmitry Vyukov
2021-09-14 18:31         ` Paul E. McKenney
2021-09-15  9:36           ` Thomas Gleixner
2021-09-15  8:57         ` Thomas Gleixner
2021-09-15  9:14           ` Dmitry Vyukov
2021-09-15  9:32             ` Thomas Gleixner
2021-09-16  9:24               ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).