* INFO: rcu detected stall in sys_kill @ 2019-12-03 8:27 syzbot 2019-12-03 8:38 ` Dmitry Vyukov 0 siblings, 1 reply; 22+ messages in thread From: syzbot @ 2019-12-03 8:27 UTC (permalink / raw) To: aarcange, akpm, christian, christian, cyphar, elena.reshetova, jgg, keescook, ldv, linux-kernel, luto, mingo, peterz, syzkaller-bugs, tglx, viro, wad Hello, syzbot found the following crash on: HEAD commit: 596cf45c Merge branch 'akpm' (patches from Andrew) git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=15f11c2ae00000 kernel config: https://syzkaller.appspot.com/x/.config?x=9bbcda576154a4b4 dashboard link: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb compiler: clang version 9.0.0 (/home/glider/llvm/clang 80fee25776c2fb61e74c1ecb1a523375c2500b69) Unfortunately, I don't have any reproducer for this crash yet. IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: (detected by 1, t=10502 jiffies, g=6629, q=331) rcu: All QSes seen, last rcu_preempt kthread activity 10503 (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0 syz-executor.0 R running task 24648 8293 8292 0x0000400a Call Trace: <IRQ> sched_show_task+0x40f/0x560 kernel/sched/core.c:5954 print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline] check_cpu_stall kernel/rcu/tree_stall.h:538 [inline] rcu_pending kernel/rcu/tree.c:2827 [inline] rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271 update_process_times+0x12d/0x180 kernel/time/timer.c:1726 tick_sched_handle kernel/time/tick-sched.c:167 [inline] tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310 __run_hrtimer kernel/time/hrtimer.c:1514 [inline] __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576 hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline] 
smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline] RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline] RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102 Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25 c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00 00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48 RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100 RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240 RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025 R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000 R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428 free_thread_stack+0x168/0x590 kernel/fork.c:280 release_task_stack kernel/fork.c:440 [inline] put_task_stack+0xa3/0x130 kernel/fork.c:451 finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256 context_switch kernel/sched/core.c:3388 [inline] __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081 preempt_schedule_common kernel/sched/core.c:4236 [inline] preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261 ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50 __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline] _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255 kill_something_info kernel/signal.c:1586 [inline] __do_sys_kill kernel/signal.c:3640 [inline] __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634 __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634 do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x422a17 Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00 
RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17 RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940 R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008 R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580 rcu: rcu_preempt kthread starved for 10533 jiffies! g6629 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 rcu: RCU grace-period kthread stack dump: rcu_preempt R running task 29032 10 2 0x80004008 Call Trace: context_switch kernel/sched/core.c:3388 [inline] __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081 schedule+0x181/0x210 kernel/sched/core.c:4155 schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895 rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline] rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821 kthread+0x332/0x350 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 --- This bug is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this bug report. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: INFO: rcu detected stall in sys_kill 2019-12-03 8:27 INFO: rcu detected stall in sys_kill syzbot @ 2019-12-03 8:38 ` Dmitry Vyukov 2019-12-04 13:58 ` Dmitry Vyukov 0 siblings, 1 reply; 22+ messages in thread From: Dmitry Vyukov @ 2019-12-03 8:38 UTC (permalink / raw) To: syzbot, Casey Schaufler, linux-security-module, Daniel Axtens, Andrey Ryabinin, kasan-dev Cc: Andrea Arcangeli, Andrew Morton, Christian Brauner, christian, cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook, ldv, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry On Tue, Dec 3, 2019 at 9:27 AM syzbot <syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com> wrote: > > Hello, > > syzbot found the following crash on: > > HEAD commit: 596cf45c Merge branch 'akpm' (patches from Andrew) > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=15f11c2ae00000 > kernel config: https://syzkaller.appspot.com/x/.config?x=9bbcda576154a4b4 > dashboard link: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb > compiler: clang version 9.0.0 (/home/glider/llvm/clang > 80fee25776c2fb61e74c1ecb1a523375c2500b69) > > Unfortunately, I don't have any reproducer for this crash yet. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com Something is seriously broken in smack+kasan+vmap stacks; we now have 60 rcu stalls all over the place and counting. This is one of the samples. I've duped 2 other samples to this one; you can see them on the dashboard: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb I see two common things across all stalls: 1. They all happen on the instance that uses smack (which is now effectively dead), see smack instance here: https://syzkaller.appspot.com/upstream 2.
They all contain this frame in the stack trace: free_thread_stack+0x168/0x590 kernel/fork.c:280 The last commit that touches this file is "fork: support VMAP_STACK with KASAN_VMALLOC". That may be very likely the root cause. +Daniel > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: > (detected by 1, t=10502 jiffies, g=6629, q=331) > rcu: All QSes seen, last rcu_preempt kthread activity 10503 > (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0 > syz-executor.0 R running task 24648 8293 8292 0x0000400a > Call Trace: > <IRQ> > sched_show_task+0x40f/0x560 kernel/sched/core.c:5954 > print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline] > check_cpu_stall kernel/rcu/tree_stall.h:538 [inline] > rcu_pending kernel/rcu/tree.c:2827 [inline] > rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271 > update_process_times+0x12d/0x180 kernel/time/timer.c:1726 > tick_sched_handle kernel/time/tick-sched.c:167 [inline] > tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310 > __run_hrtimer kernel/time/hrtimer.c:1514 [inline] > __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576 > hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638 > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline] > smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135 > apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 > </IRQ> > RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline] > RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline] > RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102 > Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25 > c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00 > 00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48 > RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 > RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100 > RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240 > 
RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025 > R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000 > R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428 > free_thread_stack+0x168/0x590 kernel/fork.c:280 > release_task_stack kernel/fork.c:440 [inline] > put_task_stack+0xa3/0x130 kernel/fork.c:451 > finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256 > context_switch kernel/sched/core.c:3388 [inline] > __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081 > preempt_schedule_common kernel/sched/core.c:4236 [inline] > preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261 > ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50 > __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline] > _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255 > kill_something_info kernel/signal.c:1586 [inline] > __do_sys_kill kernel/signal.c:3640 [inline] > __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634 > __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634 > do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x422a17 > Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e > 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff > ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e > RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17 > RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940 > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008 > R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580 > rcu: rcu_preempt kthread starved for 10533 jiffies! 
g6629 f0x2 > RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 > rcu: RCU grace-period kthread stack dump: > rcu_preempt R running task 29032 10 2 0x80004008 > Call Trace: > context_switch kernel/sched/core.c:3388 [inline] > __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081 > schedule+0x181/0x210 kernel/sched/core.c:4155 > schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895 > rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline] > rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821 > kthread+0x332/0x350 kernel/kthread.c:255 > ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 > > > --- > This bug is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this bug report. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > -- > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/00000000000036decf0598c8762e%40google.com.
* Re: INFO: rcu detected stall in sys_kill 2019-12-03 8:38 ` Dmitry Vyukov @ 2019-12-04 13:58 ` Dmitry Vyukov 2019-12-04 16:05 ` Casey Schaufler 0 siblings, 1 reply; 22+ messages in thread From: Dmitry Vyukov @ 2019-12-04 13:58 UTC (permalink / raw) To: syzbot, Casey Schaufler, linux-security-module, Daniel Axtens, Andrey Ryabinin, kasan-dev Cc: Andrea Arcangeli, Andrew Morton, Christian Brauner, christian, cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook, ldv, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry On Tue, Dec 3, 2019 at 9:38 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Tue, Dec 3, 2019 at 9:27 AM syzbot > <syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com> wrote: > > > > Hello, > > > > syzbot found the following crash on: > > > > HEAD commit: 596cf45c Merge branch 'akpm' (patches from Andrew) > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f11c2ae00000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=9bbcda576154a4b4 > > dashboard link: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb > > compiler: clang version 9.0.0 (/home/glider/llvm/clang > > 80fee25776c2fb61e74c1ecb1a523375c2500b69) > > > > Unfortunately, I don't have any reproducer for this crash yet. > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > Reported-by: syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com > > Something seriously broken in smack+kasan+vmap stacks, we now have 60 > rcu stalls all over the place and counting. This is one of the > samples. I've duped 2 other samples to this one, you can see them on > the dashboard: > https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb > > I see 2 common this across all stalls: > 1. They all happen on the instance that uses smack (which is now > effectively dead), see smack instance here: > https://syzkaller.appspot.com/upstream > 2. 
They all contain this frame in the stack trace: > free_thread_stack+0x168/0x590 kernel/fork.c:280 > The last commit that touches this file is "fork: support VMAP_STACK > with KASAN_VMALLOC". > That may be very likely the root cause. +Daniel I've stopped smack syzbot instance b/c it produces infinite stream of assorted crashes due to this. Please ping syzkaller@googlegroups.com when this is fixed, I will re-enable the instance. > > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: > > (detected by 1, t=10502 jiffies, g=6629, q=331) > > rcu: All QSes seen, last rcu_preempt kthread activity 10503 > > (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0 > > syz-executor.0 R running task 24648 8293 8292 0x0000400a > > Call Trace: > > <IRQ> > > sched_show_task+0x40f/0x560 kernel/sched/core.c:5954 > > print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline] > > check_cpu_stall kernel/rcu/tree_stall.h:538 [inline] > > rcu_pending kernel/rcu/tree.c:2827 [inline] > > rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271 > > update_process_times+0x12d/0x180 kernel/time/timer.c:1726 > > tick_sched_handle kernel/time/tick-sched.c:167 [inline] > > tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310 > > __run_hrtimer kernel/time/hrtimer.c:1514 [inline] > > __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576 > > hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638 > > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline] > > smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135 > > apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 > > </IRQ> > > RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline] > > RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline] > > RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102 > > Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25 > > c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00 > > 00 83 fa 
02 75 21 48 8b 91 88 13 00 00 48 8b 32 48 > > RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 > > RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100 > > RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240 > > RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025 > > R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000 > > R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428 > > free_thread_stack+0x168/0x590 kernel/fork.c:280 > > release_task_stack kernel/fork.c:440 [inline] > > put_task_stack+0xa3/0x130 kernel/fork.c:451 > > finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256 > > context_switch kernel/sched/core.c:3388 [inline] > > __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081 > > preempt_schedule_common kernel/sched/core.c:4236 [inline] > > preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261 > > ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50 > > __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline] > > _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255 > > kill_something_info kernel/signal.c:1586 [inline] > > __do_sys_kill kernel/signal.c:3640 [inline] > > __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634 > > __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634 > > do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294 > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > RIP: 0033:0x422a17 > > Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e > > 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff > > ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00 > > RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e > > RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17 > > RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940 > > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008 > > R13: 
00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580 > > rcu: rcu_preempt kthread starved for 10533 jiffies! g6629 f0x2 > > RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 > > rcu: RCU grace-period kthread stack dump: > > rcu_preempt R running task 29032 10 2 0x80004008 > > Call Trace: > > context_switch kernel/sched/core.c:3388 [inline] > > __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081 > > schedule+0x181/0x210 kernel/sched/core.c:4155 > > schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895 > > rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline] > > rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821 > > kthread+0x332/0x350 kernel/kthread.c:255 > > ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 > > > > > > --- > > This bug is generated by a bot. It may contain errors. > > See https://goo.gl/tpsmEJ for more information about syzbot. > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > syzbot will keep track of this bug report. See: > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > > > -- > > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. > > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. > > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/00000000000036decf0598c8762e%40google.com.
* Re: INFO: rcu detected stall in sys_kill 2019-12-04 13:58 ` Dmitry Vyukov @ 2019-12-04 16:05 ` Casey Schaufler 2019-12-04 23:34 ` Daniel Axtens 0 siblings, 1 reply; 22+ messages in thread From: Casey Schaufler @ 2019-12-04 16:05 UTC (permalink / raw) To: Dmitry Vyukov, syzbot, linux-security-module, Daniel Axtens, Andrey Ryabinin, kasan-dev Cc: Andrea Arcangeli, Andrew Morton, Christian Brauner, christian, cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook, ldv, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry, Casey Schaufler On 12/4/2019 5:58 AM, Dmitry Vyukov wrote: > On Tue, Dec 3, 2019 at 9:38 AM Dmitry Vyukov <dvyukov@google.com> wrote: >> On Tue, Dec 3, 2019 at 9:27 AM syzbot >> <syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com> wrote: >>> Hello, >>> >>> syzbot found the following crash on: >>> >>> HEAD commit: 596cf45c Merge branch 'akpm' (patches from Andrew) >>> git tree: upstream >>> console output: https://syzkaller.appspot.com/x/log.txt?x=15f11c2ae00000 >>> kernel config: https://syzkaller.appspot.com/x/.config?x=9bbcda576154a4b4 >>> dashboard link: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb >>> compiler: clang version 9.0.0 (/home/glider/llvm/clang >>> 80fee25776c2fb61e74c1ecb1a523375c2500b69) >>> >>> Unfortunately, I don't have any reproducer for this crash yet. >>> >>> IMPORTANT: if you fix the bug, please add the following tag to the commit: >>> Reported-by: syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com >> Something seriously broken in smack+kasan+vmap stacks, we now have 60 >> rcu stalls all over the place and counting. This is one of the >> samples. I've duped 2 other samples to this one, you can see them on >> the dashboard: >> https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb There haven't been Smack changes recently, so this is going to have been introduced elsewhere. 
I'm perfectly willing to accept that Smack is doing something horribly wrong WRT rcu, and that it needs repair, but it's going to be tough for me to track down. I hope someone else is looking into this, as my chances of finding the problem are pretty slim. >> >> I see 2 common this across all stalls: >> 1. They all happen on the instance that uses smack (which is now >> effectively dead), see smack instance here: >> https://syzkaller.appspot.com/upstream >> 2. They all contain this frame in the stack trace: >> free_thread_stack+0x168/0x590 kernel/fork.c:280 >> The last commit that touches this file is "fork: support VMAP_STACK >> with KASAN_VMALLOC". >> That may be very likely the root cause. +Daniel > I've stopped smack syzbot instance b/c it produces infinite stream of > assorted crashes due to this. > Please ping syzkaller@googlegroups.com when this is fixed, I will > re-enable the instance. > >>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: >>> (detected by 1, t=10502 jiffies, g=6629, q=331) >>> rcu: All QSes seen, last rcu_preempt kthread activity 10503 >>> (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0 >>> syz-executor.0 R running task 24648 8293 8292 0x0000400a >>> Call Trace: >>> <IRQ> >>> sched_show_task+0x40f/0x560 kernel/sched/core.c:5954 >>> print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline] >>> check_cpu_stall kernel/rcu/tree_stall.h:538 [inline] >>> rcu_pending kernel/rcu/tree.c:2827 [inline] >>> rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271 >>> update_process_times+0x12d/0x180 kernel/time/timer.c:1726 >>> tick_sched_handle kernel/time/tick-sched.c:167 [inline] >>> tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310 >>> __run_hrtimer kernel/time/hrtimer.c:1514 [inline] >>> __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576 >>> hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638 >>> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline] >>>
smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135 >>> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 >>> </IRQ> >>> RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline] >>> RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline] >>> RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102 >>> Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25 >>> c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00 >>> 00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48 >>> RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 >>> RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100 >>> RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240 >>> RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025 >>> R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000 >>> R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428 >>> free_thread_stack+0x168/0x590 kernel/fork.c:280 >>> release_task_stack kernel/fork.c:440 [inline] >>> put_task_stack+0xa3/0x130 kernel/fork.c:451 >>> finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256 >>> context_switch kernel/sched/core.c:3388 [inline] >>> __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081 >>> preempt_schedule_common kernel/sched/core.c:4236 [inline] >>> preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261 >>> ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50 >>> __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline] >>> _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255 >>> kill_something_info kernel/signal.c:1586 [inline] >>> __do_sys_kill kernel/signal.c:3640 [inline] >>> __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634 >>> __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634 >>> do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294 >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>> RIP: 0033:0x422a17 >>> Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 
66 2e >>> 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff >>> ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00 >>> RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e >>> RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17 >>> RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe >>> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940 >>> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008 >>> R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580 >>> rcu: rcu_preempt kthread starved for 10533 jiffies! g6629 f0x2 >>> RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 >>> rcu: RCU grace-period kthread stack dump: >>> rcu_preempt R running task 29032 10 2 0x80004008 >>> Call Trace: >>> context_switch kernel/sched/core.c:3388 [inline] >>> __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081 >>> schedule+0x181/0x210 kernel/sched/core.c:4155 >>> schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895 >>> rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline] >>> rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821 >>> kthread+0x332/0x350 kernel/kthread.c:255 >>> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 >>> >>> >>> --- >>> This bug is generated by a bot. It may contain errors. >>> See https://goo.gl/tpsmEJ for more information about syzbot. >>> syzbot engineers can be reached at syzkaller@googlegroups.com. >>> >>> syzbot will keep track of this bug report. See: >>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot. >>> >>> -- >>> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. >>> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. >>> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/00000000000036decf0598c8762e%40google.com.
* Re: INFO: rcu detected stall in sys_kill 2019-12-04 16:05 ` Casey Schaufler @ 2019-12-04 23:34 ` Daniel Axtens 2019-12-17 13:38 ` Daniel Axtens 0 siblings, 1 reply; 22+ messages in thread From: Daniel Axtens @ 2019-12-04 23:34 UTC (permalink / raw) To: Casey Schaufler, Dmitry Vyukov, syzbot, linux-security-module, Andrey Ryabinin, kasan-dev Cc: Andrea Arcangeli, Andrew Morton, Christian Brauner, christian, cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook, ldv, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry, Casey Schaufler Hi Casey, > There haven't been Smack changes recently, so this is > going to have been introduced elsewhere. I'm perfectly > willing to accept that Smack is doing something horribly > wrong WRT rcu, and that it needs repair, but its going to > be tough for me to track down. I hope someone else is looking > into this, as my chances of finding the problem are pretty > slim. Yeah, I'm having a look; it's probably related to my kasan-vmalloc stuff. It's currently in a bit of flux as syzkaller finds a bunch of other bugs with it; once that stabilises a bit I'll come back to Smack. Regards, Daniel > >>> >>> I see 2 common this across all stalls: >>> 1. They all happen on the instance that uses smack (which is now >>> effectively dead), see smack instance here: >>> https://syzkaller.appspot.com/upstream >>> 2. They all contain this frame in the stack trace: >>> free_thread_stack+0x168/0x590 kernel/fork.c:280 >>> The last commit that touches this file is "fork: support VMAP_STACK >>> with KASAN_VMALLOC". >>> That may be very likely the root cause. +Daniel >> I've stopped smack syzbot instance b/c it produces infinite stream of >> assorted crashes due to this. >> Please ping syzkaller@googlegroups.com when this is fixed, I will >> re-enable the instance.
>> >>>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: >>>> (detected by 1, t=10502 jiffies, g=6629, q=331) >>>> rcu: All QSes seen, last rcu_preempt kthread activity 10503 >>>> (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0 >>>> syz-executor.0 R running task 24648 8293 8292 0x0000400a >>>> Call Trace: >>>> <IRQ> >>>> sched_show_task+0x40f/0x560 kernel/sched/core.c:5954 >>>> print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline] >>>> check_cpu_stall kernel/rcu/tree_stall.h:538 [inline] >>>> rcu_pending kernel/rcu/tree.c:2827 [inline] >>>> rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271 >>>> update_process_times+0x12d/0x180 kernel/time/timer.c:1726 >>>> tick_sched_handle kernel/time/tick-sched.c:167 [inline] >>>> tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310 >>>> __run_hrtimer kernel/time/hrtimer.c:1514 [inline] >>>> __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576 >>>> hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638 >>>> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline] >>>> smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135 >>>> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 >>>> </IRQ> >>>> RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline] >>>> RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline] >>>> RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102 >>>> Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25 >>>> c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00 >>>> 00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48 >>>> RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 >>>> RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100 >>>> RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240 >>>> RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025 >>>> R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000 
>>>> R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428 >>>> free_thread_stack+0x168/0x590 kernel/fork.c:280 >>>> release_task_stack kernel/fork.c:440 [inline] >>>> put_task_stack+0xa3/0x130 kernel/fork.c:451 >>>> finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256 >>>> context_switch kernel/sched/core.c:3388 [inline] >>>> __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081 >>>> preempt_schedule_common kernel/sched/core.c:4236 [inline] >>>> preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261 >>>> ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50 >>>> __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline] >>>> _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255 >>>> kill_something_info kernel/signal.c:1586 [inline] >>>> __do_sys_kill kernel/signal.c:3640 [inline] >>>> __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634 >>>> __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634 >>>> do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294 >>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>> RIP: 0033:0x422a17 >>>> Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e >>>> 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff >>>> ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00 >>>> RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e >>>> RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17 >>>> RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe >>>> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940 >>>> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008 >>>> R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580 >>>> rcu: rcu_preempt kthread starved for 10533 jiffies! 
g6629 f0x2 >>>> RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 >>>> rcu: RCU grace-period kthread stack dump: >>>> rcu_preempt R running task 29032 10 2 0x80004008 >>>> Call Trace: >>>> context_switch kernel/sched/core.c:3388 [inline] >>>> __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081 >>>> schedule+0x181/0x210 kernel/sched/core.c:4155 >>>> schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895 >>>> rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline] >>>> rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821 >>>> kthread+0x332/0x350 kernel/kthread.c:255 >>>> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 >>>> >>>> >>>> --- >>>> This bug is generated by a bot. It may contain errors. >>>> See https://goo.gl/tpsmEJ for more information about syzbot. >>>> syzbot engineers can be reached at syzkaller@googlegroups.com. >>>> >>>> syzbot will keep track of this bug report. See: >>>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot. >>>> >>>> -- >>>> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. >>>> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. >>>> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/00000000000036decf0598c8762e%40google.com. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: INFO: rcu detected stall in sys_kill 2019-12-04 23:34 ` Daniel Axtens @ 2019-12-17 13:38 ` Daniel Axtens 2020-01-08 6:20 ` Dmitry Vyukov 0 siblings, 1 reply; 22+ messages in thread From: Daniel Axtens @ 2019-12-17 13:38 UTC (permalink / raw) To: Casey Schaufler, Dmitry Vyukov, syzbot, linux-security-module, Andrey Ryabinin, kasan-dev Cc: Andrea Arcangeli, Andrew Morton, Christian Brauner, christian, cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook, ldv, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry, Casey Schaufler Daniel Axtens <dja@axtens.net> writes: > Hi Casey, > >> There haven't been Smack changes recently, so this is >> going to have been introduced elsewhere. I'm perfectly >> willing to accept that Smack is doing something horribly >> wrong WRT rcu, and that it needs repair, but it's going to >> be tough for me to track down. I hope someone else is looking >> into this, as my chances of finding the problem are pretty >> slim. > > Yeah, I'm having a look, it's probably related to my kasan-vmalloc > stuff. It's currently in a bit of flux as syzkaller finds a bunch of > other bugs with it, once that stabilises a bit I'll come back to Smack. I have had a brief and wildly unsuccessful look at this. I'm happy to come back to it and go over it with a finer-toothed comb, but it will almost certainly have to wait until next year. I don't think it's related to RCU, we also have a plain lockup: https://syzkaller.appspot.com/bug?id=be03729d17bb3b2df1754a7486a8f8628f6ff1ec Dmitry, I've been really struggling to repro this locally, even with your config. Is there an easy way to see the kernel command line you booted with and anything else that makes this image special? I have zero experience with smack so this is a steep learning curve. Regards, Daniel > > Regards, > Daniel > >> >>>> >>>> I see 2 common things across all stalls: >>>> 1.
They all happen on the instance that uses smack (which is now >>>> effectively dead), see smack instance here: >>>> https://syzkaller.appspot.com/upstream >>>> 2. They all contain this frame in the stack trace: >>>> free_thread_stack+0x168/0x590 kernel/fork.c:280 >>>> The last commit that touches this file is "fork: support VMAP_STACK >>>> with KASAN_VMALLOC". >>>> That may very likely be the root cause. +Daniel >>> I've stopped the smack syzbot instance b/c it produces an infinite stream of >>> assorted crashes due to this. >>> Please ping syzkaller@googlegroups.com when this is fixed, I will >>> re-enable the instance. >>> >>>>> [quoted syzbot crash report trimmed; identical to the trace earlier in the thread] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: INFO: rcu detected stall in sys_kill 2019-12-17 13:38 ` Daniel Axtens @ 2020-01-08 6:20 ` Dmitry Vyukov 2020-01-08 10:25 ` Tetsuo Handa 0 siblings, 1 reply; 22+ messages in thread From: Dmitry Vyukov @ 2020-01-08 6:20 UTC (permalink / raw) To: Daniel Axtens Cc: Casey Schaufler, syzbot, linux-security-module, Andrey Ryabinin, kasan-dev, Andrea Arcangeli, Andrew Morton, Christian Brauner, christian, cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook, ldv, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry On Tue, Dec 17, 2019 at 2:39 PM Daniel Axtens <dja@axtens.net> wrote: > > Daniel Axtens <dja@axtens.net> writes: > > > Hi Casey, > > > >> There haven't been Smack changes recently, so this is > >> going to have been introduced elsewhere. I'm perfectly > >> willing to accept that Smack is doing something horribly > >> wrong WRT rcu, and that it needs repair, but its going to > >> be tough for me to track down. I hope someone else is looking > >> into this, as my chances of finding the problem are pretty > >> slim. > > > > Yeah, I'm having a look, it's probably related to my kasan-vmalloc > > stuff. It's currently in a bit of flux as syzkaller finds a bunch of > > other bugs with it, once that stablises a bit I'll come back to Smack. > > I have had a brief and wildly unsuccessful look at this. I'm happy to > come back to it and go over it with a finer toothed comb, but it will > almost certainly have to wait until next year. > > I don't think it's related to RCU, we also have a plain lockup: > https://syzkaller.appspot.com/bug?id=be03729d17bb3b2df1754a7486a8f8628f6ff1ec > > Dmitry, I've been really struggling to repro this locally, even with > your config. Is there an easy way to see the kernel command line you > booted with and anything else that makes this image special? I have zero > experience with smack so this is a steep learning curve. 
I temporarily re-enabled the smack instance and it produced another 50 stalls all over the kernel, and now keeps spewing a dozen every hour. I've mailed 3 new samples, you can see them here: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb The config is provided, command line args are here: https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-smack.cmdline Some non-default sysctls that syzbot sets are here: https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl Image can be downloaded from here: https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce syzbot uses GCE VMs with 2 CPUs and 7.5GB memory, but this does not look to be virtualization-related (?) so probably should reproduce in qemu too. > Regards, > Daniel > > > > > Regards, > > Daniel > > > >> > >>>> > >>>> I see 2 common this across all stalls: > >>>> 1. They all happen on the instance that uses smack (which is now > >>>> effectively dead), see smack instance here: > >>>> https://syzkaller.appspot.com/upstream > >>>> 2. They all contain this frame in the stack trace: > >>>> free_thread_stack+0x168/0x590 kernel/fork.c:280 > >>>> The last commit that touches this file is "fork: support VMAP_STACK > >>>> with KASAN_VMALLOC". > >>>> That may be very likely the root cause. +Daniel > >>> I've stopped smack syzbot instance b/c it produces infinite stream of > >>> assorted crashes due to this. > >>> Please ping syzkaller@googlegroups.com when this is fixed, I will > >>> re-enable the instance.
> >>> > >>>>> [quoted syzbot crash report trimmed; identical to the trace earlier in the thread] > > -- > You received this message because you are subscribed to the Google Groups "kasan-dev" group. > To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/87h81zax74.fsf%40dja-thinkpad.axtens.net. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: INFO: rcu detected stall in sys_kill 2020-01-08 6:20 ` Dmitry Vyukov @ 2020-01-08 10:25 ` Tetsuo Handa 2020-01-08 17:19 ` Casey Schaufler 0 siblings, 1 reply; 22+ messages in thread From: Tetsuo Handa @ 2020-01-08 10:25 UTC (permalink / raw) To: Dmitry Vyukov Cc: Casey Schaufler, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs On 2020/01/08 15:20, Dmitry Vyukov wrote: > I temporarily re-enabled smack instance and it produced another 50 > stalls all over the kernel, and now keeps spewing a dozen every hour. Since we can get stall reports rather easily, can we try modifying kernel command line (e.g. lsm=smack) and/or kernel config (e.g. no kasan) ? > > I've mailed 3 new samples, you can see them here: > https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb > > The config is provided, command line args are here: > https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-smack.cmdline > Some non-default sysctls that syzbot sets are here: > https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl > Image can be downloaded from here: > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce > syzbot uses GCE VMs with 2 CPUs and 7.5GB memory, but this does not > look to be virtualization-related (?) so probably should reproduce in > qemu too. Is it possible to add instance for linux-next.git that uses these configs? If yes, we could try adding some debug printk() under CONFIG_DEBUG_AID_FOR_SYZBOT=y . ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: INFO: rcu detected stall in sys_kill 2020-01-08 10:25 ` Tetsuo Handa @ 2020-01-08 17:19 ` Casey Schaufler 2020-01-09 8:19 ` Dmitry Vyukov 0 siblings, 1 reply; 22+ messages in thread From: Casey Schaufler @ 2020-01-08 17:19 UTC (permalink / raw) To: Tetsuo Handa, Dmitry Vyukov Cc: syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs, Casey Schaufler On 1/8/2020 2:25 AM, Tetsuo Handa wrote: > On 2020/01/08 15:20, Dmitry Vyukov wrote: >> I temporarily re-enabled smack instance and it produced another 50 >> stalls all over the kernel, and now keeps spewing a dozen every hour. Do I have to be using clang to test this? I'm setting up to work on this, and don't want to waste time using my current tool chain if the problem is clang specific. > Since we can get stall reports rather easily, can we try modifying > kernel command line (e.g. lsm=smack) and/or kernel config (e.g. no kasan) ? > >> I've mailed 3 new samples, you can see them here: >> https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb >> >> The config is provided, command line args are here: >> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-smack.cmdline >> Some non-default sysctls that syzbot sets are here: >> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl >> Image can be downloaded from here: >> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce >> syzbot uses GCE VMs with 2 CPUs and 7.5GB memory, but this does not >> look to be virtualization-related (?) so probably should reproduce in >> qemu too. > Is it possible to add instance for linux-next.git that uses these configs? > If yes, we could try adding some debug printk() under CONFIG_DEBUG_AID_FOR_SYZBOT=y . ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: INFO: rcu detected stall in sys_kill 2020-01-08 17:19 ` Casey Schaufler @ 2020-01-09 8:19 ` Dmitry Vyukov 2020-01-09 8:50 ` Dmitry Vyukov 0 siblings, 1 reply; 22+ messages in thread From: Dmitry Vyukov @ 2020-01-09 8:19 UTC (permalink / raw) To: Casey Schaufler Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs On Wed, Jan 8, 2020 at 6:19 PM Casey Schaufler <casey@schaufler-ca.com> wrote: > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote: > > On 2020/01/08 15:20, Dmitry Vyukov wrote: > >> I temporarily re-enabled smack instance and it produced another 50 > >> stalls all over the kernel, and now keeps spewing a dozen every hour. > > Do I have to be using clang to test this? I'm setting up to work on this, > and don't want to waste time using my current tool chain if the problem > is clang specific. Humm, interesting. Initially I was going to say that most likely it's not clang-related. But the smack instance is actually the only one that uses clang as well (except for KMSAN of course). So maybe it's indeed clang-related rather than smack-related. Let me try to build a kernel with clang. > > Since we can get stall reports rather easily, can we try modifying > > kernel command line (e.g. lsm=smack) and/or kernel config (e.g. no kasan) ? > > > >> I've mailed 3 new samples, you can see them here: > >> https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb > >> > >> The config is provided, command line args are here: > >> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-smack.cmdline > >> Some non-default sysctls that syzbot sets are here: > >> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl > >> Image can be downloaded from here: > >> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce > >> syzbot uses GCE VMs with 2 CPUs and 7.5GB memory, but this does not > >> look to be virtualization-related (?) so probably should reproduce in > >> qemu too.
> > Is it possible to add instance for linux-next.git that uses these configs? > > If yes, we could try adding some debug printk() under CONFIG_DEBUG_AID_FOR_SYZBOT=y . ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: INFO: rcu detected stall in sys_kill 2020-01-09 8:19 ` Dmitry Vyukov @ 2020-01-09 8:50 ` Dmitry Vyukov 2020-01-09 9:29 ` Dmitry Vyukov 2020-01-09 15:43 ` Casey Schaufler 0 siblings, 2 replies; 22+ messages in thread From: Dmitry Vyukov @ 2020-01-09 8:50 UTC (permalink / raw) To: Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs On Thu, Jan 9, 2020 at 9:19 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Wed, Jan 8, 2020 at 6:19 PM Casey Schaufler <casey@schaufler-ca.com> wrote: > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote: > > > On 2020/01/08 15:20, Dmitry Vyukov wrote: > > >> I temporarily re-enabled smack instance and it produced another 50 > > >> stalls all over the kernel, and now keeps spewing a dozen every hour. > > > > Do I have to be using clang to test this? I'm setting up to work on this, > > and don't want to waste time using my current tool chain if the problem > > is clang specific. > > Humm, interesting. Initially I was going to say that most likely it's > not clang-related. But the smack instance is actually the only one that > uses clang as well (except for KMSAN of course). So maybe it's indeed > clang-related rather than smack-related. Let me try to build a kernel > with clang. +clang-built-linux, glider [clang-built linux is severely broken since early Dec] Building the kernel with clang I can immediately reproduce this locally: $ syz-manager 2020/01/09 09:27:15 loading corpus... 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851 2020/01/09 09:27:17 booting test machines... 2020/01/09 09:27:17 wait for the connection from test machine...
2020/01/09 09:29:23 machine check: 2020/01/09 09:29:23 syscalls : 2961/3195 2020/01/09 09:29:23 code coverage : enabled 2020/01/09 09:29:23 comparison tracing : enabled 2020/01/09 09:29:23 extra coverage : enabled 2020/01/09 09:29:23 setuid sandbox : enabled 2020/01/09 09:29:23 namespace sandbox : enabled 2020/01/09 09:29:23 Android sandbox : /sys/fs/selinux/policy does not exist 2020/01/09 09:29:23 fault injection : enabled 2020/01/09 09:29:23 leak checking : CONFIG_DEBUG_KMEMLEAK is not enabled 2020/01/09 09:29:23 net packet injection : enabled 2020/01/09 09:29:23 net device setup : enabled 2020/01/09 09:29:23 concurrency sanitizer : /sys/kernel/debug/kcsan does not exist 2020/01/09 09:29:23 devlink PCI setup : PCI device 0000:00:10.0 is not available 2020/01/09 09:29:27 corpus : 50226 (0 deleted) 2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0 2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0 2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0 2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0 2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0 2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0 2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0 2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0 2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0 2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0 2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0 2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0 2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle 2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0 2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex 2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex 2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt 
2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3 2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex 2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle 2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0 2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail 2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail 2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail 2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex 2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep 2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0 2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex 2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule 2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail 2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail 2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex 2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0 2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex 2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput 2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep Then I switched LSM to selinux and I _still_ can reproduce this. So, Casey, you may relax, this is not smack-specific :) Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it started working normally. So this is somehow related to both clang and KASAN/VMAP_STACK. The clang I used is: https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz (the one we use on syzbot). ^ permalink raw reply [flat|nested] 22+ messages in thread
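[Editorial note] In `.config` terms, the narrowing described above amounts to toggling two options (a sketch; symbol names as in mainline Kconfig):

```
# Configuration that reproduces the stalls (when built with clang):
CONFIG_KASAN=y
CONFIG_KASAN_VMALLOC=y
CONFIG_VMAP_STACK=y

# With these two disabled, the instance behaves normally:
# CONFIG_KASAN_VMALLOC is not set
# CONFIG_VMAP_STACK is not set
```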
* Re: INFO: rcu detected stall in sys_kill 2020-01-09 8:50 ` Dmitry Vyukov @ 2020-01-09 9:29 ` Dmitry Vyukov 2020-01-09 10:05 ` Dmitry Vyukov 0 siblings, 1 reply; 22+ messages in thread From: Dmitry Vyukov @ 2020-01-09 9:29 UTC (permalink / raw) To: Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs On Thu, Jan 9, 2020 at 9:50 AM Dmitry Vyukov <dvyukov@google.com> wrote: > [earlier quoted context and syz-manager log trimmed; see the previous message] > > Then I switched LSM to selinux and I _still_ can reproduce this. So, > Casey, you may relax, this is not smack-specific :) > > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it > started working normally. > > So this is somehow related to both clang and KASAN/VMAP_STACK. > > The clang I used is: > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz > (the one we use on syzbot).
Clustering the hangs, they all happen within a very limited section of the code:

  1 free_thread_stack+0x124/0x590 kernel/fork.c:284
  5 free_thread_stack+0x12e/0x590 kernel/fork.c:280
 39 free_thread_stack+0x12e/0x590 kernel/fork.c:284
  6 free_thread_stack+0x133/0x590 kernel/fork.c:280
  5 free_thread_stack+0x13d/0x590 kernel/fork.c:280
  2 free_thread_stack+0x141/0x590 kernel/fork.c:280
  6 free_thread_stack+0x14c/0x590 kernel/fork.c:280
  9 free_thread_stack+0x151/0x590 kernel/fork.c:280
  3 free_thread_stack+0x15b/0x590 kernel/fork.c:280
 67 free_thread_stack+0x168/0x590 kernel/fork.c:280
  6 free_thread_stack+0x16d/0x590 kernel/fork.c:284
  2 free_thread_stack+0x177/0x590 kernel/fork.c:284
  1 free_thread_stack+0x182/0x590 kernel/fork.c:284
  1 free_thread_stack+0x186/0x590 kernel/fork.c:284
 16 free_thread_stack+0x18b/0x590 kernel/fork.c:284
  4 free_thread_stack+0x195/0x590 kernel/fork.c:284

Here is a disassembly of the function:
https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt

But if I am not mistaken, the function only ever jumps down. So how can it loop?...

^ permalink raw reply	[flat|nested] 22+ messages in thread
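The per-frame counts in the clustering above have the shape produced by `sort | uniq -c`. A self-contained sketch of deriving such counts from a pile of stall reports; the `printf` input below is a stand-in for something like `grep -h 'free_thread_stack' crashes/*/report` (the `crashes/` file layout is an assumption, not taken from the thread):

```shell
# Count how often each exact top frame appears across stall reports.
# The sample lines stand in for the grep output described above.
printf '%s\n' \
  'free_thread_stack+0x168/0x590 kernel/fork.c:280' \
  'free_thread_stack+0x168/0x590 kernel/fork.c:280' \
  'free_thread_stack+0x168/0x590 kernel/fork.c:280' \
  'free_thread_stack+0x12e/0x590 kernel/fork.c:284' |
  sort | uniq -c | sort -rn
# The 0x168 frame comes out on top with count 3.
```

Sorting numerically in reverse puts the hottest frame first, which is how the dominant `+0x168` site (67 hits in the real data) stands out.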
* Re: INFO: rcu detected stall in sys_kill 2020-01-09 9:29 ` Dmitry Vyukov @ 2020-01-09 10:05 ` Dmitry Vyukov 2020-01-09 10:39 ` Dmitry Vyukov 0 siblings, 1 reply; 22+ messages in thread From: Dmitry Vyukov @ 2020-01-09 10:05 UTC (permalink / raw) To: Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs On Thu, Jan 9, 2020 at 10:29 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote: > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote: > > > > >> I temporarily re-enabled smack instance and it produced another 50 > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour. > > > > > > > > Do I have to be using clang to test this? I'm setting up to work on this, > > > > and don't want to waste time using my current tool chain if the problem > > > > is clang specific. > > > > > > Humm, interesting. Initially I was going to say that most likely it's > > > not clang-related. Bug smack instance is actually the only one that > > > uses clang as well (except for KMSAN of course). So maybe it's indeed > > > clang-related rather than smack-related. Let me try to build a kernel > > > with clang. > > > > +clang-built-linux, glider > > > > [clang-built linux is severe broken since early Dec] > > > > Building kernel with clang I can immediately reproduce this locally: > > > > $ syz-manager > > 2020/01/09 09:27:15 loading corpus... > > 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001 > > 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851 > > 2020/01/09 09:27:17 booting test machines... > > 2020/01/09 09:27:17 wait for the connection from test machine... 
> > [... machine-check and VM-crash log trimmed; it is quoted in full in the parent message above ...]
> >
> > Then I switched LSM to selinux and I _still_ can reproduce this. So,
> > Casey, you may relax, this is not smack-specific :)
> >
> > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> > started working normally.
> >
> > So this is somehow related to both clang and KASAN/VMAP_STACK.
> >
> > The clang I used is:
> > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
> > (the one we use on syzbot).
>
> Clustering hangs, they all happen within very limited section of the code:
>
>  1 free_thread_stack+0x124/0x590 kernel/fork.c:284
>  5 free_thread_stack+0x12e/0x590 kernel/fork.c:280
> 39 free_thread_stack+0x12e/0x590 kernel/fork.c:284
>  6 free_thread_stack+0x133/0x590 kernel/fork.c:280
>  5 free_thread_stack+0x13d/0x590 kernel/fork.c:280
>  2 free_thread_stack+0x141/0x590 kernel/fork.c:280
>  6 free_thread_stack+0x14c/0x590 kernel/fork.c:280
>  9 free_thread_stack+0x151/0x590 kernel/fork.c:280
>  3 free_thread_stack+0x15b/0x590 kernel/fork.c:280
> 67 free_thread_stack+0x168/0x590 kernel/fork.c:280
>  6 free_thread_stack+0x16d/0x590 kernel/fork.c:284
>  2 free_thread_stack+0x177/0x590 kernel/fork.c:284
>  1 free_thread_stack+0x182/0x590 kernel/fork.c:284
>  1 free_thread_stack+0x186/0x590 kernel/fork.c:284
> 16 free_thread_stack+0x18b/0x590 kernel/fork.c:284
>  4 free_thread_stack+0x195/0x590 kernel/fork.c:284
>
> Here is disass of the function:
> https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt
>
> But if I am not mistaken, the function only ever jumps down. So how
> can it loop?...

This is a miscompilation related to static branches.

objdump shows:

ffffffff814878f8:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
./arch/x86/include/asm/jump_label.h:25
                asm_volatile_goto("1:"

However, the actual instruction in memory at the time is:

   0xffffffff814878f8 <+408>:   jmpq   0xffffffff8148787f <free_thread_stack+287>

Which jumps to a wrong location in free_thread_stack and makes it loop.
The static branch is this:

static inline bool memcg_kmem_enabled(void)
{
	return static_branch_unlikely(&memcg_kmem_enabled_key);
}

static inline void memcg_kmem_uncharge(struct page *page, int order)
{
	if (memcg_kmem_enabled())
		__memcg_kmem_uncharge(page, order);
}

I suspect it may have something to do with loop unrolling. It may jump
to the right location, but in the wrong unrolled iteration.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: INFO: rcu detected stall in sys_kill 2020-01-09 10:05 ` Dmitry Vyukov @ 2020-01-09 10:39 ` Dmitry Vyukov 2020-01-09 16:23 ` Alexander Potapenko 2020-01-09 23:25 ` Daniel Axtens 0 siblings, 2 replies; 22+ messages in thread From: Dmitry Vyukov @ 2020-01-09 10:39 UTC (permalink / raw) To: Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs On Thu, Jan 9, 2020 at 11:05 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote: > > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote: > > > > > >> I temporarily re-enabled smack instance and it produced another 50 > > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour. > > > > > > > > > > Do I have to be using clang to test this? I'm setting up to work on this, > > > > > and don't want to waste time using my current tool chain if the problem > > > > > is clang specific. > > > > > > > > Humm, interesting. Initially I was going to say that most likely it's > > > > not clang-related. Bug smack instance is actually the only one that > > > > uses clang as well (except for KMSAN of course). So maybe it's indeed > > > > clang-related rather than smack-related. Let me try to build a kernel > > > > with clang. > > > > > > +clang-built-linux, glider > > > > > > [clang-built linux is severe broken since early Dec] > > > > > > Building kernel with clang I can immediately reproduce this locally: > > > > > > $ syz-manager > > > 2020/01/09 09:27:15 loading corpus... > > > 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001 > > > 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851 > > > 2020/01/09 09:27:17 booting test machines... > > > 2020/01/09 09:27:17 wait for the connection from test machine... 
> > > [... machine-check and VM-crash log trimmed; quoted in full earlier in the thread ...]
> > >
> > > Then I switched LSM to selinux and I _still_ can reproduce this. So,
> > > Casey, you may relax, this is not smack-specific :)
> > >
> > > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> > > started working normally.
> > > > > > So this is somehow related to both clang and KASAN/VMAP_STACK. > > > > > > The clang I used is: > > > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz > > > (the one we use on syzbot). > > > > > > Clustering hangs, they all happen within very limited section of the code: > > > > 1 free_thread_stack+0x124/0x590 kernel/fork.c:284 > > 5 free_thread_stack+0x12e/0x590 kernel/fork.c:280 > > 39 free_thread_stack+0x12e/0x590 kernel/fork.c:284 > > 6 free_thread_stack+0x133/0x590 kernel/fork.c:280 > > 5 free_thread_stack+0x13d/0x590 kernel/fork.c:280 > > 2 free_thread_stack+0x141/0x590 kernel/fork.c:280 > > 6 free_thread_stack+0x14c/0x590 kernel/fork.c:280 > > 9 free_thread_stack+0x151/0x590 kernel/fork.c:280 > > 3 free_thread_stack+0x15b/0x590 kernel/fork.c:280 > > 67 free_thread_stack+0x168/0x590 kernel/fork.c:280 > > 6 free_thread_stack+0x16d/0x590 kernel/fork.c:284 > > 2 free_thread_stack+0x177/0x590 kernel/fork.c:284 > > 1 free_thread_stack+0x182/0x590 kernel/fork.c:284 > > 1 free_thread_stack+0x186/0x590 kernel/fork.c:284 > > 16 free_thread_stack+0x18b/0x590 kernel/fork.c:284 > > 4 free_thread_stack+0x195/0x590 kernel/fork.c:284 > > > > Here is disass of the function: > > https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt > > > > But if I am not mistaken, the function only ever jumps down. So how > > can it loop?... > > > This is a miscompilation related to static branches. > > objdump shows: > > ffffffff814878f8: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > ./arch/x86/include/asm/jump_label.h:25 > asm_volatile_goto("1:" > > However, the actual instruction in memory at the time is: > > 0xffffffff814878f8 <+408>: jmpq 0xffffffff8148787f <free_thread_stack+287> > > Which jumps to a wrong location in free_thread_stack and makes it loop. 
>
> The static branch is this:
>
> static inline bool memcg_kmem_enabled(void)
> {
>         return static_branch_unlikely(&memcg_kmem_enabled_key);
> }
>
> static inline void memcg_kmem_uncharge(struct page *page, int order)
> {
>         if (memcg_kmem_enabled())
>                 __memcg_kmem_uncharge(page, order);
> }
>
> I suspect it may have something to do with loop unrolling. It may jump
> to the right location, but in the wrong unrolled iteration.

Kernel built with clang version 10.0.0
(https://github.com/llvm/llvm-project.git
c2443155a0fb245c8f17f2c1c72b6ea391e86e81) works fine.

Alex, please update clang.

^ permalink raw reply	[flat|nested] 22+ messages in thread
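A quick way to confirm which compiler produced a given kernel image: the version string is embedded in the binary and exposed via `/proc/version`. A hedged sketch (the `vmlinux` path and the sample banner are assumptions for illustration):

```shell
# On a live machine:      grep -o 'clang version [0-9.]*' /proc/version
# Against a built image:  strings vmlinux | grep -m1 'clang version'
# The same extraction, demonstrated on a sample banner string so the
# snippet is self-contained (the banner contents are made up):
banner='Linux version 5.5.0-rc5 (a@b) (clang version 10.0.0 (https://github.com/llvm/llvm-project.git c2443155a0fb245c8f17f2c1c72b6ea391e86e81))'
echo "$banner" | grep -o 'clang version [0-9.]*'
```

This makes it easy to check that a rebuilt syzbot kernel really picked up the updated compiler rather than a stale toolchain from the cache.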
* Re: INFO: rcu detected stall in sys_kill 2020-01-09 10:39 ` Dmitry Vyukov @ 2020-01-09 16:23 ` Alexander Potapenko 2020-01-09 17:16 ` Nick Desaulniers 2020-01-09 23:25 ` Daniel Axtens 1 sibling, 1 reply; 22+ messages in thread From: Alexander Potapenko @ 2020-01-09 16:23 UTC (permalink / raw) To: Dmitry Vyukov Cc: Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs On Thu, Jan 9, 2020 at 11:39 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Thu, Jan 9, 2020 at 11:05 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote: > > > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote: > > > > > > >> I temporarily re-enabled smack instance and it produced another 50 > > > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour. > > > > > > > > > > > > Do I have to be using clang to test this? I'm setting up to work on this, > > > > > > and don't want to waste time using my current tool chain if the problem > > > > > > is clang specific. > > > > > > > > > > Humm, interesting. Initially I was going to say that most likely it's > > > > > not clang-related. Bug smack instance is actually the only one that > > > > > uses clang as well (except for KMSAN of course). So maybe it's indeed > > > > > clang-related rather than smack-related. Let me try to build a kernel > > > > > with clang. > > > > > > > > +clang-built-linux, glider > > > > > > > > [clang-built linux is severe broken since early Dec] > > > > > > > > Building kernel with clang I can immediately reproduce this locally: > > > > > > > > $ syz-manager > > > > 2020/01/09 09:27:15 loading corpus... > > > > 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001 > > > > 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851 > > > > 2020/01/09 09:27:17 booting test machines... > > > > 2020/01/09 09:27:17 wait for the connection from test machine... 
> > > > [... machine-check and VM-crash log trimmed; quoted in full earlier in the thread ...]
> > > >
> > > > Then I switched LSM to selinux and I _still_ can reproduce this.
So, > > > > Casey, you may relax, this is not smack-specific :) > > > > > > > > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it > > > > started working normally. > > > > > > > > So this is somehow related to both clang and KASAN/VMAP_STACK. > > > > > > > > The clang I used is: > > > > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz > > > > (the one we use on syzbot). > > > > > > > > > Clustering hangs, they all happen within very limited section of the code: > > > > > > 1 free_thread_stack+0x124/0x590 kernel/fork.c:284 > > > 5 free_thread_stack+0x12e/0x590 kernel/fork.c:280 > > > 39 free_thread_stack+0x12e/0x590 kernel/fork.c:284 > > > 6 free_thread_stack+0x133/0x590 kernel/fork.c:280 > > > 5 free_thread_stack+0x13d/0x590 kernel/fork.c:280 > > > 2 free_thread_stack+0x141/0x590 kernel/fork.c:280 > > > 6 free_thread_stack+0x14c/0x590 kernel/fork.c:280 > > > 9 free_thread_stack+0x151/0x590 kernel/fork.c:280 > > > 3 free_thread_stack+0x15b/0x590 kernel/fork.c:280 > > > 67 free_thread_stack+0x168/0x590 kernel/fork.c:280 > > > 6 free_thread_stack+0x16d/0x590 kernel/fork.c:284 > > > 2 free_thread_stack+0x177/0x590 kernel/fork.c:284 > > > 1 free_thread_stack+0x182/0x590 kernel/fork.c:284 > > > 1 free_thread_stack+0x186/0x590 kernel/fork.c:284 > > > 16 free_thread_stack+0x18b/0x590 kernel/fork.c:284 > > > 4 free_thread_stack+0x195/0x590 kernel/fork.c:284 > > > > > > Here is disass of the function: > > > https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt > > > > > > But if I am not mistaken, the function only ever jumps down. So how > > > can it loop?... > > > > > > This is a miscompilation related to static branches. 
> > > > objdump shows: > > > > ffffffff814878f8: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > ./arch/x86/include/asm/jump_label.h:25 > > asm_volatile_goto("1:" > > > > However, the actual instruction in memory at the time is: > > > > 0xffffffff814878f8 <+408>: jmpq 0xffffffff8148787f <free_thread_stack+287> > > > > Which jumps to a wrong location in free_thread_stack and makes it loop. > > > > The static branch is this: > > > > static inline bool memcg_kmem_enabled(void) > > { > > return static_branch_unlikely(&memcg_kmem_enabled_key); > > } > > > > static inline void memcg_kmem_uncharge(struct page *page, int order) > > { > > if (memcg_kmem_enabled()) > > __memcg_kmem_uncharge(page, order); > > } > > > > I suspect it may have something to do with loop unrolling. It may jump > > to the right location, but in the wrong unrolled iteration. > > > Kernel built with clang version 10.0.0 > (https://github.com/llvm/llvm-project.git > c2443155a0fb245c8f17f2c1c72b6ea391e86e81) works fine. > > Alex, please update clang on syzbot machines. Done ~3 hours ago, guess we'll see the results within a day. -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Halimah DeLaine Prado Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: INFO: rcu detected stall in sys_kill 2020-01-09 16:23 ` Alexander Potapenko @ 2020-01-09 17:16 ` Nick Desaulniers 2020-01-09 17:23 ` Dmitry Vyukov 0 siblings, 1 reply; 22+ messages in thread From: Nick Desaulniers @ 2020-01-09 17:16 UTC (permalink / raw) To: Alexander Potapenko Cc: Dmitry Vyukov, Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs On Thu, Jan 9, 2020 at 8:23 AM 'Alexander Potapenko' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > On Thu, Jan 9, 2020 at 11:39 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > On Thu, Jan 9, 2020 at 11:05 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote: > > > > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote: > > > > > > > >> I temporarily re-enabled smack instance and it produced another 50 > > > > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour. > > > > > > > > > > > > > > Do I have to be using clang to test this? I'm setting up to work on this, > > > > > > > and don't want to waste time using my current tool chain if the problem > > > > > > > is clang specific. > > > > > > > > > > > > Humm, interesting. Initially I was going to say that most likely it's > > > > > > not clang-related. Bug smack instance is actually the only one that > > > > > > uses clang as well (except for KMSAN of course). So maybe it's indeed > > > > > > clang-related rather than smack-related. Let me try to build a kernel > > > > > > with clang. > > > > > > > > > > +clang-built-linux, glider > > > > > > > > > > [clang-built linux is severe broken since early Dec] Is there automated reporting? Consider adding our mailing list for Clang specific failures. 
clang-built-linux <clang-built-linux@googlegroups.com> Our CI looks green, but there's a very long tail of combinations of configs that we don't have coverage of, so bug reports are appreciated: https://github.com/ClangBuiltLinux/linux/issues > > > > > > > > > > Building kernel with clang I can immediately reproduce this locally: > > > > > > > > > > $ syz-manager > > > > > 2020/01/09 09:27:15 loading corpus... > > > > > 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001 > > > > > 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851 > > > > > 2020/01/09 09:27:17 booting test machines... > > > > > 2020/01/09 09:27:17 wait for the connection from test machine... > > > > > 2020/01/09 09:29:23 machine check: > > > > > 2020/01/09 09:29:23 syscalls : 2961/3195 > > > > > 2020/01/09 09:29:23 code coverage : enabled > > > > > 2020/01/09 09:29:23 comparison tracing : enabled > > > > > 2020/01/09 09:29:23 extra coverage : enabled > > > > > 2020/01/09 09:29:23 setuid sandbox : enabled > > > > > 2020/01/09 09:29:23 namespace sandbox : enabled > > > > > 2020/01/09 09:29:23 Android sandbox : /sys/fs/selinux/policy > > > > > does not exist > > > > > 2020/01/09 09:29:23 fault injection : enabled > > > > > 2020/01/09 09:29:23 leak checking : CONFIG_DEBUG_KMEMLEAK is > > > > > not enabled > > > > > 2020/01/09 09:29:23 net packet injection : enabled > > > > > 2020/01/09 09:29:23 net device setup : enabled > > > > > 2020/01/09 09:29:23 concurrency sanitizer : /sys/kernel/debug/kcsan > > > > > does not exist > > > > > 2020/01/09 09:29:23 devlink PCI setup : PCI device 0000:00:10.0 > > > > > is not available > > > > > 2020/01/09 09:29:27 corpus : 50226 (0 deleted) > > > > > 2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0 > > > > > 2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0 > > > > > 2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0 > > > > > 2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0 > > > > > 
> > > > > [... remainder of the VM-status and crash log trimmed; quoted in full earlier in the thread ...]
> > > > >
> > > > > Then I switched LSM to selinux and I _still_ can reproduce this. So,
> > > > > Casey, you may relax, this is not smack-specific :)
> > > > >
> > > > > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> > > > > started working normally.
> > > > >
> > > > > So this is somehow related to both clang and KASAN/VMAP_STACK.
> > > > >
> > > > > The clang I used is:
> > > > > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
> > > > > (the one we use on syzbot).
> > > > Clustering hangs, they all happen within a very limited section of the code:
> > > >
> > > >    1 free_thread_stack+0x124/0x590 kernel/fork.c:284
> > > >    5 free_thread_stack+0x12e/0x590 kernel/fork.c:280
> > > >   39 free_thread_stack+0x12e/0x590 kernel/fork.c:284
> > > >    6 free_thread_stack+0x133/0x590 kernel/fork.c:280
> > > >    5 free_thread_stack+0x13d/0x590 kernel/fork.c:280
> > > >    2 free_thread_stack+0x141/0x590 kernel/fork.c:280
> > > >    6 free_thread_stack+0x14c/0x590 kernel/fork.c:280
> > > >    9 free_thread_stack+0x151/0x590 kernel/fork.c:280
> > > >    3 free_thread_stack+0x15b/0x590 kernel/fork.c:280
> > > >   67 free_thread_stack+0x168/0x590 kernel/fork.c:280
> > > >    6 free_thread_stack+0x16d/0x590 kernel/fork.c:284
> > > >    2 free_thread_stack+0x177/0x590 kernel/fork.c:284
> > > >    1 free_thread_stack+0x182/0x590 kernel/fork.c:284
> > > >    1 free_thread_stack+0x186/0x590 kernel/fork.c:284
> > > >   16 free_thread_stack+0x18b/0x590 kernel/fork.c:284
> > > >    4 free_thread_stack+0x195/0x590 kernel/fork.c:284
> > > >
> > > > Here is a disassembly of the function:
> > > > https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt
> > > >
> > > > But if I am not mistaken, the function only ever jumps down. So how
> > > > can it loop?...
> > >
> > > This is a miscompilation related to static branches.
> > >
> > > objdump shows:
> > >
> > > ffffffff814878f8: 0f 1f 44 00 00  nopl 0x0(%rax,%rax,1)
> > > ./arch/x86/include/asm/jump_label.h:25
> > >     asm_volatile_goto("1:"
> > >
> > > However, the actual instruction in memory at the time is:
> > >
> > > 0xffffffff814878f8 <+408>: jmpq 0xffffffff8148787f <free_thread_stack+287>
> > >
> > > Which jumps to a wrong location in free_thread_stack and makes it loop.
> > >
> > > The static branch is this:
> > >
> > > static inline bool memcg_kmem_enabled(void)
> > > {
> > >     return static_branch_unlikely(&memcg_kmem_enabled_key);
> > > }
> > >
> > > static inline void memcg_kmem_uncharge(struct page *page, int order)
> > > {
> > >     if (memcg_kmem_enabled())
> > >         __memcg_kmem_uncharge(page, order);
> > > }
> > >
> > > I suspect it may have something to do with loop unrolling. It may jump
> > > to the right location, but in the wrong unrolled iteration.

I disabled loop unrolling and loop unswitching in LLVM when the loop contained asm goto in:
https://github.com/llvm/llvm-project/commit/c4f245b40aad7e8627b37a8bf1bdcdbcd541e665
I have a fix for loop unrolling in:
https://reviews.llvm.org/D64101
that I should dust off. I haven't looked into loop unswitching yet.

> > Kernel built with clang version 10.0.0
> > (https://github.com/llvm/llvm-project.git
> > c2443155a0fb245c8f17f2c1c72b6ea391e86e81) works fine.
> >
> > Alex, please update clang on syzbot machines.

> Done ~3 hours ago, guess we'll see the results within a day.

Please let me know if you otherwise encounter any miscompiles with Clang; `asm goto` issues in particular I treat as P0.

--
Thanks,
~Nick Desaulniers

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: INFO: rcu detected stall in sys_kill
  2020-01-09 17:16 ` Nick Desaulniers
@ 2020-01-09 17:23   ` Dmitry Vyukov
  2020-01-09 17:38   ` Nick Desaulniers
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2020-01-09 17:23 UTC (permalink / raw)
To: Nick Desaulniers
Cc: Alexander Potapenko, Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

On Thu, Jan 9, 2020 at 6:17 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> > > > > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > > > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > > > > > > > >> I temporarily re-enabled smack instance and it produced another 50
> > > > > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> > > > > > > >
> > > > > > > > Do I have to be using clang to test this? I'm setting up to work on this,
> > > > > > > > and don't want to waste time using my current tool chain if the problem
> > > > > > > > is clang specific.
> > > > > > >
> > > > > > > Humm, interesting. Initially I was going to say that most likely it's
> > > > > > > not clang-related. But the smack instance is actually the only one that
> > > > > > > uses clang as well (except for KMSAN of course). So maybe it's indeed
> > > > > > > clang-related rather than smack-related. Let me try to build a kernel
> > > > > > > with clang.
> > > > > >
> > > > > > +clang-built-linux, glider
> > > > > >
> > > > > > [clang-built linux is severely broken since early Dec]
>
> Is there automated reporting? Consider adding our mailing list for
> Clang specific failures:
> clang-built-linux <clang-built-linux@googlegroups.com>
> Our CI looks green, but there's a very long tail of combinations of
> configs that we don't have coverage of, so bug reports are
> appreciated:
> https://github.com/ClangBuiltLinux/linux/issues

syzbot does automatic reporting, but it does not automatically classify bugs as clang-specific.
FTR, this combination is clang+KASAN+VMAP_STACK (relatively recent changes, and that's what triggered the infinite loop). But note that the kernel booted, you can ssh and do some basic things.

[...]
> > I disabled loop unrolling and loop unswitching in LLVM when the loop
> > contained asm goto in:
> > https://github.com/llvm/llvm-project/commit/c4f245b40aad7e8627b37a8bf1bdcdbcd541e665
> > I have a fix for loop unrolling in:
> > https://reviews.llvm.org/D64101
> > that I should dust off. I haven't looked into loop unswitching yet.

c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is in the range between the broken compiler and the newer compiler that seems to work, so I would assume that that commit fixes this. We will get the final stamp from syzbot hopefully by tomorrow.
* Re: INFO: rcu detected stall in sys_kill
  2020-01-09 17:23 ` Dmitry Vyukov
@ 2020-01-09 17:38   ` Nick Desaulniers
  2020-01-10  8:37   ` Alexander Potapenko
  0 siblings, 1 reply; 22+ messages in thread
From: Nick Desaulniers @ 2020-01-09 17:38 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Alexander Potapenko, Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

On Thu, Jan 9, 2020 at 9:23 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Thu, Jan 9, 2020 at 6:17 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> > I disabled loop unrolling and loop unswitching in LLVM when the loop
> > contained asm goto in:
> > https://github.com/llvm/llvm-project/commit/c4f245b40aad7e8627b37a8bf1bdcdbcd541e665
> > I have a fix for loop unrolling in:
> > https://reviews.llvm.org/D64101
> > that I should dust off. I haven't looked into loop unswitching yet.
>
> c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is in the range between the
> broken compiler and the newer compiler that seems to work, so I would
> assume that that commit fixes this.
> We will get the final stamp from syzbot hopefully by tomorrow.

How often do you refresh the build of Clang in syzbot? Is it manual? I understand the tradeoffs of living on the tip of the spear, but c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is 6 months old. So upstream LLVM could be regressing more often, and you wouldn't notice for 1/2 a year or more. :-/

--
Thanks,
~Nick Desaulniers
* Re: INFO: rcu detected stall in sys_kill
  2020-01-09 17:38 ` Nick Desaulniers
@ 2020-01-10  8:37   ` Alexander Potapenko
  2020-01-14 10:15   ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Potapenko @ 2020-01-10 8:37 UTC (permalink / raw)
To: Nick Desaulniers
Cc: Dmitry Vyukov, Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

On Thu, Jan 9, 2020 at 6:39 PM 'Nick Desaulniers' via kasan-dev <kasan-dev@googlegroups.com> wrote:
[...]
> How often do you refresh the build of Clang in syzbot? Is it manual? I
> understand the tradeoffs of living on the tip of the spear, but
> c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is 6 months old. So upstream
> LLVM could be regressing more often, and you wouldn't notice for 1/2 a
> year or more. :-/

KMSAN used to be the only user of Clang on syzbot, so I didn't bother updating it too often. Now that there are other users, we'll need a better strategy.

The Clang revisions I've been picking previously came from Chromium's Clang distributions. This is nice, because Chromium folks usually pick a revision that has been extensively tested at Google already, plus they make sure Chromium tests also pass. They don't roll the compiler often, however (typically once a month or two, but this time there were holidays, plus some nasty breakages).

> --
> Thanks,
> ~Nick Desaulniers
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/CAKwvOdkh8CV0pgqqHXknv8%2BgE2ovoKEV_m%2BqiEmWutmLnra3%3Dg%40mail.gmail.com.

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
* Re: INFO: rcu detected stall in sys_kill
  2020-01-10  8:37 ` Alexander Potapenko
@ 2020-01-14 10:15   ` Dmitry Vyukov
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2020-01-14 10:15 UTC (permalink / raw)
To: Alexander Potapenko
Cc: Nick Desaulniers, Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

The clang instances are back to life (incl. smack).

#syz invalid

On Fri, Jan 10, 2020 at 9:37 AM 'Alexander Potapenko' via kasan-dev <kasan-dev@googlegroups.com> wrote:
[...]
* Re: INFO: rcu detected stall in sys_kill
  2020-01-09 10:39 ` Dmitry Vyukov
  2020-01-09 16:23   ` Alexander Potapenko
@ 2020-01-09 23:25   ` Daniel Axtens
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Axtens @ 2020-01-09 23:25 UTC (permalink / raw)
To: Dmitry Vyukov, Casey Schaufler, Alexander Potapenko, clang-built-linux
Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

Dmitry Vyukov <dvyukov@google.com> writes:

> On Thu, Jan 9, 2020 at 11:05 AM Dmitry Vyukov <dvyukov@google.com> wrote:
[...]
>> This is a miscompilation related to static branches.
[...]
> Kernel built with clang version 10.0.0
> (https://github.com/llvm/llvm-project.git
> c2443155a0fb245c8f17f2c1c72b6ea391e86e81) works fine.
>
> Alex, please update clang on syzbot machines.

Wow, what a bug. Very happy to be off the hook for causing it, and feeling a lot better about my inability to reproduce it with a GCC-built kernel!

Regards,
Daniel
* Re: INFO: rcu detected stall in sys_kill
  2020-01-09  8:50 ` Dmitry Vyukov
  2020-01-09  9:29   ` Dmitry Vyukov
@ 2020-01-09 15:43   ` Casey Schaufler
  1 sibling, 0 replies; 22+ messages in thread
From: Casey Schaufler @ 2020-01-09 15:43 UTC (permalink / raw)
To: Dmitry Vyukov, Daniel Axtens, Alexander Potapenko, clang-built-linux
Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs, Casey Schaufler

On 1/9/2020 12:50 AM, Dmitry Vyukov wrote:
> On Thu, Jan 9, 2020 at 9:19 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Wed, Jan 8, 2020 at 6:19 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
[...]
> Then I switched LSM to selinux and I _still_ can reproduce this. So,
> Casey, you may relax, this is not smack-specific :)

Wow. I wasn't expecting clang to be the problem, just a possible required condition. I am, of course, quite relieved.

> Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> started working normally.
>
> So this is somehow related to both clang and KASAN/VMAP_STACK.
> > The clang I used is: > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz > (the one we use on syzbot). ^ permalink raw reply [flat|nested] 22+ messages in thread
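[Editor's note: the bisection step above — disabling CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK to confirm they are part of the trigger — can be scripted with the kernel tree's `scripts/config` helper. A rough sketch, assuming a kernel source tree with the syzbot `.config` already in place and the downloaded clang on `$PATH`:]

```shell
# Disable the two options implicated in the stalls, then rebuild
# with clang (the same toolchain used by syzbot).
scripts/config --file .config \
    --disable KASAN_VMALLOC \
    --disable VMAP_STACK
make CC=clang olddefconfig   # resolve any options dependent on the two above
make CC=clang -j"$(nproc)"   # rebuild the kernel
```

Re-enabling the options one at a time the same way narrows down which of the two (or their combination with clang's codegen) is required to reproduce the stalls.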
end of thread, other threads:[~2020-01-14 10:15 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-03 8:27 INFO: rcu detected stall in sys_kill syzbot 2019-12-03 8:38 ` Dmitry Vyukov 2019-12-04 13:58 ` Dmitry Vyukov 2019-12-04 16:05 ` Casey Schaufler 2019-12-04 23:34 ` Daniel Axtens 2019-12-17 13:38 ` Daniel Axtens 2020-01-08 6:20 ` Dmitry Vyukov 2020-01-08 10:25 ` Tetsuo Handa 2020-01-08 17:19 ` Casey Schaufler 2020-01-09 8:19 ` Dmitry Vyukov 2020-01-09 8:50 ` Dmitry Vyukov 2020-01-09 9:29 ` Dmitry Vyukov 2020-01-09 10:05 ` Dmitry Vyukov 2020-01-09 10:39 ` Dmitry Vyukov 2020-01-09 16:23 ` Alexander Potapenko 2020-01-09 17:16 ` Nick Desaulniers 2020-01-09 17:23 ` Dmitry Vyukov 2020-01-09 17:38 ` Nick Desaulniers 2020-01-10 8:37 ` Alexander Potapenko 2020-01-14 10:15 ` Dmitry Vyukov 2020-01-09 23:25 ` Daniel Axtens 2020-01-09 15:43 ` Casey Schaufler