linux-kernel.vger.kernel.org archive mirror
* INFO: rcu detected stall in sys_kill
@ 2019-12-03  8:27 syzbot
  2019-12-03  8:38 ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: syzbot @ 2019-12-03  8:27 UTC (permalink / raw)
  To: aarcange, akpm, christian, christian, cyphar, elena.reshetova,
	jgg, keescook, ldv, linux-kernel, luto, mingo, peterz,
	syzkaller-bugs, tglx, viro, wad

Hello,

syzbot found the following crash on:

HEAD commit:    596cf45c Merge branch 'akpm' (patches from Andrew)
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15f11c2ae00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=9bbcda576154a4b4
dashboard link: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb
compiler:       clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 1, t=10502 jiffies, g=6629, q=331)
rcu: All QSes seen, last rcu_preempt kthread activity 10503  
(4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0
syz-executor.0  R  running task    24648  8293   8292 0x0000400a
Call Trace:
  <IRQ>
  sched_show_task+0x40f/0x560 kernel/sched/core.c:5954
  print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline]
  check_cpu_stall kernel/rcu/tree_stall.h:538 [inline]
  rcu_pending kernel/rcu/tree.c:2827 [inline]
  rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271
  update_process_times+0x12d/0x180 kernel/time/timer.c:1726
  tick_sched_handle kernel/time/tick-sched.c:167 [inline]
  tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310
  __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
  __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576
  hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638
  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
  smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
  </IRQ>
RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline]
RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102
Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25  
c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00  
00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48
RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100
RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240
RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025
R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000
R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428
  free_thread_stack+0x168/0x590 kernel/fork.c:280
  release_task_stack kernel/fork.c:440 [inline]
  put_task_stack+0xa3/0x130 kernel/fork.c:451
  finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256
  context_switch kernel/sched/core.c:3388 [inline]
  __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
  preempt_schedule_common kernel/sched/core.c:4236 [inline]
  preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261
  ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50
  __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline]
  _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255
  kill_something_info kernel/signal.c:1586 [inline]
  __do_sys_kill kernel/signal.c:3640 [inline]
  __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634
  __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634
  do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x422a17
Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e  
0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e
RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17
RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe
RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580
rcu: rcu_preempt kthread starved for 10533 jiffies! g6629 f0x2  
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: RCU grace-period kthread stack dump:
rcu_preempt     R  running task    29032    10      2 0x80004008
Call Trace:
  context_switch kernel/sched/core.c:3388 [inline]
  __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
  schedule+0x181/0x210 kernel/sched/core.c:4155
  schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895
  rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline]
  rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821
  kthread+0x332/0x350 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


* Re: INFO: rcu detected stall in sys_kill
  2019-12-03  8:27 INFO: rcu detected stall in sys_kill syzbot
@ 2019-12-03  8:38 ` Dmitry Vyukov
  2019-12-04 13:58   ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2019-12-03  8:38 UTC (permalink / raw)
  To: syzbot, Casey Schaufler, linux-security-module, Daniel Axtens,
	Andrey Ryabinin, kasan-dev
  Cc: Andrea Arcangeli, Andrew Morton, Christian Brauner, christian,
	cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook, ldv, LKML,
	Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs,
	Thomas Gleixner, Al Viro, Will Drewry

On Tue, Dec 3, 2019 at 9:27 AM syzbot
<syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    596cf45c Merge branch 'akpm' (patches from Andrew)
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15f11c2ae00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=9bbcda576154a4b4
> dashboard link: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb
> compiler:       clang version 9.0.0 (/home/glider/llvm/clang
> 80fee25776c2fb61e74c1ecb1a523375c2500b69)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com

Something is seriously broken in smack+kasan+vmap stacks; we now have 60
rcu stalls all over the place and counting. This is one of the
samples. I've duped 2 other samples to this one; you can see them on
the dashboard:
https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb

I see 2 common things across all stalls:
1. They all happen on the instance that uses smack (which is now
effectively dead), see the smack instance here:
https://syzkaller.appspot.com/upstream
2. They all contain this frame in the stack trace:
free_thread_stack+0x168/0x590 kernel/fork.c:280
The last commit that touches this file is "fork: support VMAP_STACK
with KASAN_VMALLOC".
That may very well be the root cause. +Daniel
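For context, the CONFIG_VMAP_STACK side of free_thread_stack() is shaped
roughly like this (a from-memory sketch, not the exact kernel/fork.c
source; memcg accounting omitted). As far as I understand, the
KASAN_VMALLOC support additionally has to manage the vmalloc shadow of
the stack around this caching/reuse, which is the new work on this path:

static inline void free_thread_stack(struct task_struct *tsk)
{
#ifdef CONFIG_VMAP_STACK
        if (task_stack_vm_area(tsk)) {
                int i;

                /* Try to park the vmalloc'ed stack in a small per-CPU cache. */
                for (i = 0; i < NR_CACHED_STACKS; i++) {
                        if (this_cpu_cmpxchg(cached_stacks[i], NULL,
                                             tsk->stack_vm_area) != NULL)
                                continue;
                        return;
                }

                /* Cache is full: give the area back to vmalloc. */
                vfree_atomic(tsk->stack);
                return;
        }
#endif
        __free_pages(virt_to_page(tsk->stack), THREAD_SIZE_ORDER);
}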


> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>         (detected by 1, t=10502 jiffies, g=6629, q=331)
> rcu: All QSes seen, last rcu_preempt kthread activity 10503
> (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0
> syz-executor.0  R  running task    24648  8293   8292 0x0000400a
> Call Trace:
>   <IRQ>
>   sched_show_task+0x40f/0x560 kernel/sched/core.c:5954
>   print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline]
>   check_cpu_stall kernel/rcu/tree_stall.h:538 [inline]
>   rcu_pending kernel/rcu/tree.c:2827 [inline]
>   rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271
>   update_process_times+0x12d/0x180 kernel/time/timer.c:1726
>   tick_sched_handle kernel/time/tick-sched.c:167 [inline]
>   tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310
>   __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
>   __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576
>   hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638
>   local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
>   smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135
>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
>   </IRQ>
> RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline]
> RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline]
> RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102
> Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25
> c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00
> 00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48
> RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100
> RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240
> RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025
> R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000
> R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428
>   free_thread_stack+0x168/0x590 kernel/fork.c:280
>   release_task_stack kernel/fork.c:440 [inline]
>   put_task_stack+0xa3/0x130 kernel/fork.c:451
>   finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256
>   context_switch kernel/sched/core.c:3388 [inline]
>   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
>   preempt_schedule_common kernel/sched/core.c:4236 [inline]
>   preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261
>   ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50
>   __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline]
>   _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255
>   kill_something_info kernel/signal.c:1586 [inline]
>   __do_sys_kill kernel/signal.c:3640 [inline]
>   __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634
>   __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634
>   do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x422a17
> Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e
> 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff
> ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e
> RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17
> RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe
> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
> R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580
> rcu: rcu_preempt kthread starved for 10533 jiffies! g6629 f0x2
> RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> rcu: RCU grace-period kthread stack dump:
> rcu_preempt     R  running task    29032    10      2 0x80004008
> Call Trace:
>   context_switch kernel/sched/core.c:3388 [inline]
>   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
>   schedule+0x181/0x210 kernel/sched/core.c:4155
>   schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895
>   rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline]
>   rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821
>   kthread+0x332/0x350 kernel/kthread.c:255
>   ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>


* Re: INFO: rcu detected stall in sys_kill
  2019-12-03  8:38 ` Dmitry Vyukov
@ 2019-12-04 13:58   ` Dmitry Vyukov
  2019-12-04 16:05     ` Casey Schaufler
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2019-12-04 13:58 UTC (permalink / raw)
  To: syzbot, Casey Schaufler, linux-security-module, Daniel Axtens,
	Andrey Ryabinin, kasan-dev
  Cc: Andrea Arcangeli, Andrew Morton, Christian Brauner, christian,
	cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook, ldv, LKML,
	Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs,
	Thomas Gleixner, Al Viro, Will Drewry

On Tue, Dec 3, 2019 at 9:38 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Tue, Dec 3, 2019 at 9:27 AM syzbot
> <syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    596cf45c Merge branch 'akpm' (patches from Andrew)
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15f11c2ae00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=9bbcda576154a4b4
> > dashboard link: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb
> > compiler:       clang version 9.0.0 (/home/glider/llvm/clang
> > 80fee25776c2fb61e74c1ecb1a523375c2500b69)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com
>
> Something seriously broken in smack+kasan+vmap stacks, we now have 60
> rcu stalls all over the place and counting. This is one of the
> samples. I've duped 2 other samples to this one, you can see them on
> the dashboard:
> https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb
>
> I see 2 common this across all stalls:
> 1. They all happen on the instance that uses smack (which is now
> effectively dead), see smack instance here:
> https://syzkaller.appspot.com/upstream
> 2. They all contain this frame in the stack trace:
> free_thread_stack+0x168/0x590 kernel/fork.c:280
> The last commit that touches this file is "fork: support VMAP_STACK
> with KASAN_VMALLOC".
> That may be very likely the root cause. +Daniel

I've stopped the smack syzbot instance because it produces an infinite
stream of assorted crashes due to this.
Please ping syzkaller@googlegroups.com when this is fixed and I will
re-enable the instance.

> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> >         (detected by 1, t=10502 jiffies, g=6629, q=331)
> > rcu: All QSes seen, last rcu_preempt kthread activity 10503
> > (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > syz-executor.0  R  running task    24648  8293   8292 0x0000400a
> > Call Trace:
> >   <IRQ>
> >   sched_show_task+0x40f/0x560 kernel/sched/core.c:5954
> >   print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline]
> >   check_cpu_stall kernel/rcu/tree_stall.h:538 [inline]
> >   rcu_pending kernel/rcu/tree.c:2827 [inline]
> >   rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271
> >   update_process_times+0x12d/0x180 kernel/time/timer.c:1726
> >   tick_sched_handle kernel/time/tick-sched.c:167 [inline]
> >   tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310
> >   __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
> >   __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576
> >   hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638
> >   local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
> >   smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135
> >   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
> >   </IRQ>
> > RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline]
> > RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline]
> > RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102
> > Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25
> > c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00
> > 00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48
> > RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100
> > RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240
> > RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025
> > R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000
> > R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428
> >   free_thread_stack+0x168/0x590 kernel/fork.c:280
> >   release_task_stack kernel/fork.c:440 [inline]
> >   put_task_stack+0xa3/0x130 kernel/fork.c:451
> >   finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256
> >   context_switch kernel/sched/core.c:3388 [inline]
> >   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
> >   preempt_schedule_common kernel/sched/core.c:4236 [inline]
> >   preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261
> >   ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50
> >   __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline]
> >   _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255
> >   kill_something_info kernel/signal.c:1586 [inline]
> >   __do_sys_kill kernel/signal.c:3640 [inline]
> >   __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634
> >   __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634
> >   do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x422a17
> > Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e
> > 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff
> > ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e
> > RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17
> > RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe
> > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940
> > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
> > R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580
> > rcu: rcu_preempt kthread starved for 10533 jiffies! g6629 f0x2
> > RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > rcu: RCU grace-period kthread stack dump:
> > rcu_preempt     R  running task    29032    10      2 0x80004008
> > Call Trace:
> >   context_switch kernel/sched/core.c:3388 [inline]
> >   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
> >   schedule+0x181/0x210 kernel/sched/core.c:4155
> >   schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895
> >   rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline]
> >   rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821
> >   kthread+0x332/0x350 kernel/kthread.c:255
> >   ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> >
> >
> > ---
> > This bug is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@googlegroups.com.
> >
> > syzbot will keep track of this bug report. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >


* Re: INFO: rcu detected stall in sys_kill
  2019-12-04 13:58   ` Dmitry Vyukov
@ 2019-12-04 16:05     ` Casey Schaufler
  2019-12-04 23:34       ` Daniel Axtens
  0 siblings, 1 reply; 22+ messages in thread
From: Casey Schaufler @ 2019-12-04 16:05 UTC (permalink / raw)
  To: Dmitry Vyukov, syzbot, linux-security-module, Daniel Axtens,
	Andrey Ryabinin, kasan-dev
  Cc: Andrea Arcangeli, Andrew Morton, Christian Brauner, christian,
	cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook, ldv, LKML,
	Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs,
	Thomas Gleixner, Al Viro, Will Drewry, Casey Schaufler

On 12/4/2019 5:58 AM, Dmitry Vyukov wrote:
> On Tue, Dec 3, 2019 at 9:38 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Tue, Dec 3, 2019 at 9:27 AM syzbot
>> <syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com> wrote:
>>> Hello,
>>>
>>> syzbot found the following crash on:
>>>
>>> HEAD commit:    596cf45c Merge branch 'akpm' (patches from Andrew)
>>> git tree:       upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=15f11c2ae00000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=9bbcda576154a4b4
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb
>>> compiler:       clang version 9.0.0 (/home/glider/llvm/clang
>>> 80fee25776c2fb61e74c1ecb1a523375c2500b69)
>>>
>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+de8d933e7d153aa0c1bb@syzkaller.appspotmail.com
>> Something seriously broken in smack+kasan+vmap stacks, we now have 60
>> rcu stalls all over the place and counting. This is one of the
>> samples. I've duped 2 other samples to this one, you can see them on
>> the dashboard:
>> https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb

There haven't been Smack changes recently, so this is
going to have been introduced elsewhere. I'm perfectly
willing to accept that Smack is doing something horribly
wrong WRT rcu, and that it needs repair, but it's going to
be tough for me to track down. I hope someone else is looking
into this, as my chances of finding the problem are pretty
slim.

>>
>> I see 2 common this across all stalls:
>> 1. They all happen on the instance that uses smack (which is now
>> effectively dead), see smack instance here:
>> https://syzkaller.appspot.com/upstream
>> 2. They all contain this frame in the stack trace:
>> free_thread_stack+0x168/0x590 kernel/fork.c:280
>> The last commit that touches this file is "fork: support VMAP_STACK
>> with KASAN_VMALLOC".
>> That may be very likely the root cause. +Daniel
> I've stopped smack syzbot instance b/c it produces infinite stream of
> assorted crashes due to this.
> Please ping syzkaller@googlegroups.com when this is fixed, I will
> re-enable the instance.
>
>>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>         (detected by 1, t=10502 jiffies, g=6629, q=331)
>>> rcu: All QSes seen, last rcu_preempt kthread activity 10503
>>> (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0
>>> syz-executor.0  R  running task    24648  8293   8292 0x0000400a
>>> Call Trace:
>>>   <IRQ>
>>>   sched_show_task+0x40f/0x560 kernel/sched/core.c:5954
>>>   print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline]
>>>   check_cpu_stall kernel/rcu/tree_stall.h:538 [inline]
>>>   rcu_pending kernel/rcu/tree.c:2827 [inline]
>>>   rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271
>>>   update_process_times+0x12d/0x180 kernel/time/timer.c:1726
>>>   tick_sched_handle kernel/time/tick-sched.c:167 [inline]
>>>   tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310
>>>   __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
>>>   __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576
>>>   hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638
>>>   local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
>>>   smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135
>>>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
>>>   </IRQ>
>>> RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline]
>>> RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline]
>>> RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102
>>> Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25
>>> c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00
>>> 00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48
>>> RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>>> RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100
>>> RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240
>>> RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025
>>> R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000
>>> R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428
>>>   free_thread_stack+0x168/0x590 kernel/fork.c:280
>>>   release_task_stack kernel/fork.c:440 [inline]
>>>   put_task_stack+0xa3/0x130 kernel/fork.c:451
>>>   finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256
>>>   context_switch kernel/sched/core.c:3388 [inline]
>>>   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
>>>   preempt_schedule_common kernel/sched/core.c:4236 [inline]
>>>   preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261
>>>   ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50
>>>   __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline]
>>>   _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255
>>>   kill_something_info kernel/signal.c:1586 [inline]
>>>   __do_sys_kill kernel/signal.c:3640 [inline]
>>>   __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634
>>>   __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634
>>>   do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
>>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>> RIP: 0033:0x422a17
>>> Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e
>>> 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff
>>> ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00
>>> RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e
>>> RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17
>>> RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe
>>> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940
>>> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
>>> R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580
>>> rcu: rcu_preempt kthread starved for 10533 jiffies! g6629 f0x2
>>> RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
>>> rcu: RCU grace-period kthread stack dump:
>>> rcu_preempt     R  running task    29032    10      2 0x80004008
>>> Call Trace:
>>>   context_switch kernel/sched/core.c:3388 [inline]
>>>   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
>>>   schedule+0x181/0x210 kernel/sched/core.c:4155
>>>   schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895
>>>   rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline]
>>>   rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821
>>>   kthread+0x332/0x350 kernel/kthread.c:255
>>>   ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
>>>
>>>
>>> ---
>>> This bug is generated by a bot. It may contain errors.
>>> See https://goo.gl/tpsmEJ for more information about syzbot.
>>> syzbot engineers can be reached at syzkaller@googlegroups.com.
>>>
>>> syzbot will keep track of this bug report. See:
>>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>>>



* Re: INFO: rcu detected stall in sys_kill
  2019-12-04 16:05     ` Casey Schaufler
@ 2019-12-04 23:34       ` Daniel Axtens
  2019-12-17 13:38         ` Daniel Axtens
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Axtens @ 2019-12-04 23:34 UTC (permalink / raw)
  To: Casey Schaufler, Dmitry Vyukov, syzbot, linux-security-module,
	Andrey Ryabinin, kasan-dev
  Cc: Andrea Arcangeli, Andrew Morton, Christian Brauner, christian,
	cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook, ldv, LKML,
	Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs,
	Thomas Gleixner, Al Viro, Will Drewry, Casey Schaufler

Hi Casey,

> There haven't been Smack changes recently, so this is
> going to have been introduced elsewhere. I'm perfectly
> willing to accept that Smack is doing something horribly
> wrong WRT rcu, and that it needs repair, but its going to
> be tough for me to track down. I hope someone else is looking
> into this, as my chances of finding the problem are pretty
> slim.

Yeah, I'm having a look; it's probably related to my kasan-vmalloc
stuff. It's currently in a bit of flux as syzkaller finds a bunch of
other bugs with it; once that stabilises a bit I'll come back to Smack.

Regards,
Daniel

>
>>>
>>> I see 2 common this across all stalls:
>>> 1. They all happen on the instance that uses smack (which is now
>>> effectively dead), see smack instance here:
>>> https://syzkaller.appspot.com/upstream
>>> 2. They all contain this frame in the stack trace:
>>> free_thread_stack+0x168/0x590 kernel/fork.c:280
>>> The last commit that touches this file is "fork: support VMAP_STACK
>>> with KASAN_VMALLOC".
>>> That may be very likely the root cause. +Daniel
>> I've stopped smack syzbot instance b/c it produces infinite stream of
>> assorted crashes due to this.
>> Please ping syzkaller@googlegroups.com when this is fixed, I will
>> re-enable the instance.
>>
>>>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>>         (detected by 1, t=10502 jiffies, g=6629, q=331)
>>>> rcu: All QSes seen, last rcu_preempt kthread activity 10503
>>>> (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0
>>>> syz-executor.0  R  running task    24648  8293   8292 0x0000400a
>>>> Call Trace:
>>>>   <IRQ>
>>>>   sched_show_task+0x40f/0x560 kernel/sched/core.c:5954
>>>>   print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline]
>>>>   check_cpu_stall kernel/rcu/tree_stall.h:538 [inline]
>>>>   rcu_pending kernel/rcu/tree.c:2827 [inline]
>>>>   rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271
>>>>   update_process_times+0x12d/0x180 kernel/time/timer.c:1726
>>>>   tick_sched_handle kernel/time/tick-sched.c:167 [inline]
>>>>   tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310
>>>>   __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
>>>>   __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576
>>>>   hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638
>>>>   local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
>>>>   smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135
>>>>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
>>>>   </IRQ>
>>>> RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline]
>>>> RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline]
>>>> RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102
>>>> Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25
>>>> c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00
>>>> 00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48
>>>> RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>>>> RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100
>>>> RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240
>>>> RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025
>>>> R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000
>>>> R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428
>>>>   free_thread_stack+0x168/0x590 kernel/fork.c:280
>>>>   release_task_stack kernel/fork.c:440 [inline]
>>>>   put_task_stack+0xa3/0x130 kernel/fork.c:451
>>>>   finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256
>>>>   context_switch kernel/sched/core.c:3388 [inline]
>>>>   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
>>>>   preempt_schedule_common kernel/sched/core.c:4236 [inline]
>>>>   preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261
>>>>   ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50
>>>>   __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline]
>>>>   _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255
>>>>   kill_something_info kernel/signal.c:1586 [inline]
>>>>   __do_sys_kill kernel/signal.c:3640 [inline]
>>>>   __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634
>>>>   __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634
>>>>   do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
>>>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>> RIP: 0033:0x422a17
>>>> Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e
>>>> 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff
>>>> ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00
>>>> RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e
>>>> RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17
>>>> RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe
>>>> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940
>>>> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
>>>> R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580
>>>> rcu: rcu_preempt kthread starved for 10533 jiffies! g6629 f0x2
>>>> RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
>>>> rcu: RCU grace-period kthread stack dump:
>>>> rcu_preempt     R  running task    29032    10      2 0x80004008
>>>> Call Trace:
>>>>   context_switch kernel/sched/core.c:3388 [inline]
>>>>   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
>>>>   schedule+0x181/0x210 kernel/sched/core.c:4155
>>>>   schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895
>>>>   rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline]
>>>>   rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821
>>>>   kthread+0x332/0x350 kernel/kthread.c:255
>>>>   ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
>>>>
>>>>
>>>> ---
>>>> This bug is generated by a bot. It may contain errors.
>>>> See https://goo.gl/tpsmEJ for more information about syzbot.
>>>> syzbot engineers can be reached at syzkaller@googlegroups.com.
>>>>
>>>> syzbot will keep track of this bug report. See:
>>>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>>>>


* Re: INFO: rcu detected stall in sys_kill
  2019-12-04 23:34       ` Daniel Axtens
@ 2019-12-17 13:38         ` Daniel Axtens
  2020-01-08  6:20           ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Axtens @ 2019-12-17 13:38 UTC (permalink / raw)
  To: Casey Schaufler, Dmitry Vyukov, syzbot, linux-security-module,
	Andrey Ryabinin, kasan-dev
  Cc: Andrea Arcangeli, Andrew Morton, Christian Brauner, christian,
	cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook, ldv, LKML,
	Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs,
	Thomas Gleixner, Al Viro, Will Drewry, Casey Schaufler

Daniel Axtens <dja@axtens.net> writes:

> Hi Casey,
>
>> There haven't been Smack changes recently, so this is
>> going to have been introduced elsewhere. I'm perfectly
>> willing to accept that Smack is doing something horribly
>> wrong WRT rcu, and that it needs repair, but its going to
>> be tough for me to track down. I hope someone else is looking
>> into this, as my chances of finding the problem are pretty
>> slim.
>
> Yeah, I'm having a look, it's probably related to my kasan-vmalloc
> stuff. It's currently in a bit of flux as syzkaller finds a bunch of
> other bugs with it, once that stablises a bit I'll come back to Smack.

I have had a brief and wildly unsuccessful look at this. I'm happy to
come back to it and go over it with a finer-toothed comb, but it will
almost certainly have to wait until next year.

I don't think it's related to RCU; we also have a plain lockup:
https://syzkaller.appspot.com/bug?id=be03729d17bb3b2df1754a7486a8f8628f6ff1ec

Dmitry, I've been really struggling to repro this locally, even with
your config. Is there an easy way to see the kernel command line you
booted with and anything else that makes this image special? I have zero
experience with smack so this is a steep learning curve.

Regards,
Daniel

>
> Regards,
> Daniel
>
>>
>>>>
>>>> I see 2 common this across all stalls:
>>>> 1. They all happen on the instance that uses smack (which is now
>>>> effectively dead), see smack instance here:
>>>> https://syzkaller.appspot.com/upstream
>>>> 2. They all contain this frame in the stack trace:
>>>> free_thread_stack+0x168/0x590 kernel/fork.c:280
>>>> The last commit that touches this file is "fork: support VMAP_STACK
>>>> with KASAN_VMALLOC".
>>>> That may be very likely the root cause. +Daniel
>>> I've stopped smack syzbot instance b/c it produces infinite stream of
>>> assorted crashes due to this.
>>> Please ping syzkaller@googlegroups.com when this is fixed, I will
>>> re-enable the instance.
>>>
>>>>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>>>         (detected by 1, t=10502 jiffies, g=6629, q=331)
>>>>> rcu: All QSes seen, last rcu_preempt kthread activity 10503
>>>>> (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0
>>>>> syz-executor.0  R  running task    24648  8293   8292 0x0000400a
>>>>> Call Trace:
>>>>>   <IRQ>
>>>>>   sched_show_task+0x40f/0x560 kernel/sched/core.c:5954
>>>>>   print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline]
>>>>>   check_cpu_stall kernel/rcu/tree_stall.h:538 [inline]
>>>>>   rcu_pending kernel/rcu/tree.c:2827 [inline]
>>>>>   rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271
>>>>>   update_process_times+0x12d/0x180 kernel/time/timer.c:1726
>>>>>   tick_sched_handle kernel/time/tick-sched.c:167 [inline]
>>>>>   tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310
>>>>>   __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
>>>>>   __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576
>>>>>   hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638
>>>>>   local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
>>>>>   smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135
>>>>>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
>>>>>   </IRQ>
>>>>> RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline]
>>>>> RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline]
>>>>> RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102
>>>>> Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25
>>>>> c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00
>>>>> 00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48
>>>>> RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>>>>> RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100
>>>>> RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240
>>>>> RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025
>>>>> R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000
>>>>> R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428
>>>>>   free_thread_stack+0x168/0x590 kernel/fork.c:280
>>>>>   release_task_stack kernel/fork.c:440 [inline]
>>>>>   put_task_stack+0xa3/0x130 kernel/fork.c:451
>>>>>   finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256
>>>>>   context_switch kernel/sched/core.c:3388 [inline]
>>>>>   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
>>>>>   preempt_schedule_common kernel/sched/core.c:4236 [inline]
>>>>>   preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261
>>>>>   ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50
>>>>>   __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline]
>>>>>   _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255
>>>>>   kill_something_info kernel/signal.c:1586 [inline]
>>>>>   __do_sys_kill kernel/signal.c:3640 [inline]
>>>>>   __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634
>>>>>   __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634
>>>>>   do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
>>>>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>> RIP: 0033:0x422a17
>>>>> Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e
>>>>> 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff
>>>>> ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00
>>>>> RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e
>>>>> RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17
>>>>> RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe
>>>>> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940
>>>>> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
>>>>> R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580
>>>>> rcu: rcu_preempt kthread starved for 10533 jiffies! g6629 f0x2
>>>>> RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
>>>>> rcu: RCU grace-period kthread stack dump:
>>>>> rcu_preempt     R  running task    29032    10      2 0x80004008
>>>>> Call Trace:
>>>>>   context_switch kernel/sched/core.c:3388 [inline]
>>>>>   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
>>>>>   schedule+0x181/0x210 kernel/sched/core.c:4155
>>>>>   schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895
>>>>>   rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline]
>>>>>   rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821
>>>>>   kthread+0x332/0x350 kernel/kthread.c:255
>>>>>   ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
>>>>>
>>>>>
>>>>> ---
>>>>> This bug is generated by a bot. It may contain errors.
>>>>> See https://goo.gl/tpsmEJ for more information about syzbot.
>>>>> syzbot engineers can be reached at syzkaller@googlegroups.com.
>>>>>
>>>>> syzbot will keep track of this bug report. See:
>>>>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>>>>>


* Re: INFO: rcu detected stall in sys_kill
  2019-12-17 13:38         ` Daniel Axtens
@ 2020-01-08  6:20           ` Dmitry Vyukov
  2020-01-08 10:25             ` Tetsuo Handa
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2020-01-08  6:20 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Casey Schaufler, syzbot, linux-security-module, Andrey Ryabinin,
	kasan-dev, Andrea Arcangeli, Andrew Morton, Christian Brauner,
	christian, cyphar, Reshetova, Elena, Jason Gunthorpe, Kees Cook,
	ldv, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra,
	syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry

On Tue, Dec 17, 2019 at 2:39 PM Daniel Axtens <dja@axtens.net> wrote:
>
> Daniel Axtens <dja@axtens.net> writes:
>
> > Hi Casey,
> >
> >> There haven't been Smack changes recently, so this is
> >> going to have been introduced elsewhere. I'm perfectly
> >> willing to accept that Smack is doing something horribly
> >> wrong WRT rcu, and that it needs repair, but its going to
> >> be tough for me to track down. I hope someone else is looking
> >> into this, as my chances of finding the problem are pretty
> >> slim.
> >
> > Yeah, I'm having a look, it's probably related to my kasan-vmalloc
> > stuff. It's currently in a bit of flux as syzkaller finds a bunch of
> > other bugs with it, once that stablises a bit I'll come back to Smack.
>
> I have had a brief and wildly unsuccessful look at this. I'm happy to
> come back to it and go over it with a finer toothed comb, but it will
> almost certainly have to wait until next year.
>
> I don't think it's related to RCU, we also have a plain lockup:
> https://syzkaller.appspot.com/bug?id=be03729d17bb3b2df1754a7486a8f8628f6ff1ec
>
> Dmitry, I've been really struggling to repro this locally, even with
> your config. Is there an easy way to see the kernel command line you
> booted with and anything else that makes this image special? I have zero
> experience with smack so this is a steep learning curve.

I temporarily re-enabled the smack instance and it produced another 50
stalls all over the kernel, and it now keeps spewing a dozen every hour.

I've mailed 3 new samples; you can see them here:
https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb

The config is provided, and the command line args are here:
https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-smack.cmdline
Some non-default sysctls that syzbot sets are here:
https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl
The image can be downloaded from here:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce
syzbot uses GCE VMs with 2 CPUs and 7.5GB of memory, but this does not
look to be virtualization-related (?), so it should probably reproduce in
qemu too.
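FWIW, an untested qemu invocation along these lines should be close
(the bzImage and disk image paths are placeholders, -m 7680 is ~7.5GB,
and the -append string should come from the cmdline file above):

  qemu-system-x86_64 -enable-kvm -smp 2 -m 7680 \
      -kernel /path/to/bzImage \
      -append "$(cat upstream-smack.cmdline)" \
      -drive file=/path/to/syzbot-disk.img,format=raw \
      -nographic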



> Regards,
> Daniel
>
> >
> > Regards,
> > Daniel
> >
> >>
> >>>>
> >>>> I see 2 common this across all stalls:
> >>>> 1. They all happen on the instance that uses smack (which is now
> >>>> effectively dead), see smack instance here:
> >>>> https://syzkaller.appspot.com/upstream
> >>>> 2. They all contain this frame in the stack trace:
> >>>> free_thread_stack+0x168/0x590 kernel/fork.c:280
> >>>> The last commit that touches this file is "fork: support VMAP_STACK
> >>>> with KASAN_VMALLOC".
> >>>> That may be very likely the root cause. +Daniel
> >>> I've stopped smack syzbot instance b/c it produces infinite stream of
> >>> assorted crashes due to this.
> >>> Please ping syzkaller@googlegroups.com when this is fixed, I will
> >>> re-enable the instance.
> >>>
> >>>>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> >>>>>         (detected by 1, t=10502 jiffies, g=6629, q=331)
> >>>>> rcu: All QSes seen, last rcu_preempt kthread activity 10503
> >>>>> (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0
> >>>>> syz-executor.0  R  running task    24648  8293   8292 0x0000400a
> >>>>> Call Trace:
> >>>>>   <IRQ>
> >>>>>   sched_show_task+0x40f/0x560 kernel/sched/core.c:5954
> >>>>>   print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline]
> >>>>>   check_cpu_stall kernel/rcu/tree_stall.h:538 [inline]
> >>>>>   rcu_pending kernel/rcu/tree.c:2827 [inline]
> >>>>>   rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271
> >>>>>   update_process_times+0x12d/0x180 kernel/time/timer.c:1726
> >>>>>   tick_sched_handle kernel/time/tick-sched.c:167 [inline]
> >>>>>   tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310
> >>>>>   __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
> >>>>>   __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576
> >>>>>   hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638
> >>>>>   local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
> >>>>>   smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135
> >>>>>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
> >>>>>   </IRQ>
> >>>>> RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline]
> >>>>> RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline]
> >>>>> RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102
> >>>>> Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25
> >>>>> c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00
> >>>>> 00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48
> >>>>> RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> >>>>> RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100
> >>>>> RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240
> >>>>> RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025
> >>>>> R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000
> >>>>> R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428
> >>>>>   free_thread_stack+0x168/0x590 kernel/fork.c:280
> >>>>>   release_task_stack kernel/fork.c:440 [inline]
> >>>>>   put_task_stack+0xa3/0x130 kernel/fork.c:451
> >>>>>   finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256
> >>>>>   context_switch kernel/sched/core.c:3388 [inline]
> >>>>>   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
> >>>>>   preempt_schedule_common kernel/sched/core.c:4236 [inline]
> >>>>>   preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261
> >>>>>   ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50
> >>>>>   __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline]
> >>>>>   _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255
> >>>>>   kill_something_info kernel/signal.c:1586 [inline]
> >>>>>   __do_sys_kill kernel/signal.c:3640 [inline]
> >>>>>   __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634
> >>>>>   __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634
> >>>>>   do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
> >>>>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>>>> RIP: 0033:0x422a17
> >>>>> Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e
> >>>>> 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff
> >>>>> ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00
> >>>>> RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e
> >>>>> RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17
> >>>>> RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe
> >>>>> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940
> >>>>> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
> >>>>> R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580
> >>>>> rcu: rcu_preempt kthread starved for 10533 jiffies! g6629 f0x2
> >>>>> RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> >>>>> rcu: RCU grace-period kthread stack dump:
> >>>>> rcu_preempt     R  running task    29032    10      2 0x80004008
> >>>>> Call Trace:
> >>>>>   context_switch kernel/sched/core.c:3388 [inline]
> >>>>>   __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
> >>>>>   schedule+0x181/0x210 kernel/sched/core.c:4155
> >>>>>   schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895
> >>>>>   rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline]
> >>>>>   rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821
> >>>>>   kthread+0x332/0x350 kernel/kthread.c:255
> >>>>>   ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> >>>>>
> >>>>>
> >>>>> ---
> >>>>> This bug is generated by a bot. It may contain errors.
> >>>>> See https://goo.gl/tpsmEJ for more information about syzbot.
> >>>>> syzbot engineers can be reached at syzkaller@googlegroups.com.
> >>>>>
> >>>>> syzbot will keep track of this bug report. See:
> >>>>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >>>>>


* Re: INFO: rcu detected stall in sys_kill
  2020-01-08  6:20           ` Dmitry Vyukov
@ 2020-01-08 10:25             ` Tetsuo Handa
  2020-01-08 17:19               ` Casey Schaufler
  0 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2020-01-08 10:25 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Casey Schaufler, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

On 2020/01/08 15:20, Dmitry Vyukov wrote:
> I temporarily re-enabled smack instance and it produced another 50
> stalls all over the kernel, and now keeps spewing a dozen every hour.

Since we can get stall reports rather easily, can we try modifying the
kernel command line (e.g. lsm=smack) and/or the kernel config (e.g. no KASAN)?
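Something like this (untested) should cover the config side, with the
command line part just being an edit of the boot args:

  # rebuild the syzbot config with KASAN (and KASAN_VMALLOC) turned off
  ./scripts/config --file .config -d KASAN -d KASAN_VMALLOC
  make olddefconfig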

> 
> I've mailed 3 new samples, you can see them here:
> https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb
> 
> The config is provided, command line args are here:
> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-smack.cmdline
> Some non-default sysctls that syzbot sets are here:
> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl
> Image can be downloaded from here:
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce
> syzbot uses GCE VMs with 2 CPUs and 7.5GB memory, but this does not
> look to be virtualization-related (?) so probably should reproduce in
> qemu too.

Is it possible to add an instance for linux-next.git that uses these configs?
If yes, we could try adding some debug printk()s under CONFIG_DEBUG_AID_FOR_SYZBOT=y.
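For example, something like this around the free_thread_stack() call in
release_task_stack() (illustrative only, not a tested patch, and it
assumes the time really is going into the stack-freeing path):

#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
        {
                /* Complain if freeing a task stack takes unreasonably long. */
                u64 t0 = ktime_get_ns();

                free_thread_stack(tsk);
                if (ktime_get_ns() - t0 > 10 * NSEC_PER_MSEC)
                        pr_err("free_thread_stack took %llu ns\n",
                               ktime_get_ns() - t0);
        }
#else
        free_thread_stack(tsk);
#endif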


* Re: INFO: rcu detected stall in sys_kill
  2020-01-08 10:25             ` Tetsuo Handa
@ 2020-01-08 17:19               ` Casey Schaufler
  2020-01-09  8:19                 ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: Casey Schaufler @ 2020-01-08 17:19 UTC (permalink / raw)
  To: Tetsuo Handa, Dmitry Vyukov
  Cc: syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs, Casey Schaufler

On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> On 2020/01/08 15:20, Dmitry Vyukov wrote:
>> I temporarily re-enabled smack instance and it produced another 50
>> stalls all over the kernel, and now keeps spewing a dozen every hour.

Do I have to be using clang to test this? I'm setting up to work on this,
and don't want to waste time using my current tool chain if the problem
is clang-specific.

> Since we can get stall reports rather easily, can we try modifying
> kernel command line (e.g. lsm=smack) and/or kernel config (e.g. no kasan) ?
>
>> I've mailed 3 new samples, you can see them here:
>> https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb
>>
>> The config is provided, command line args are here:
>> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-smack.cmdline
>> Some non-default sysctls that syzbot sets are here:
>> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl
>> Image can be downloaded from here:
>> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce
>> syzbot uses GCE VMs with 2 CPUs and 7.5GB memory, but this does not
>> look to be virtualization-related (?) so probably should reproduce in
>> qemu too.
> Is it possible to add instance for linux-next.git that uses these configs?
> If yes, we could try adding some debug printk() under CONFIG_DEBUG_AID_FOR_SYZBOT=y .



* Re: INFO: rcu detected stall in sys_kill
  2020-01-08 17:19               ` Casey Schaufler
@ 2020-01-09  8:19                 ` Dmitry Vyukov
  2020-01-09  8:50                   ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2020-01-09  8:19 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

On Wed, Jan 8, 2020 at 6:19 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> >> I temporarily re-enabled smack instance and it produced another 50
> >> stalls all over the kernel, and now keeps spewing a dozen every hour.
>
> Do I have to be using clang to test this? I'm setting up to work on this,
> and don't want to waste time using my current tool chain if the problem
> is clang specific.

Humm, interesting. Initially I was going to say that most likely it's
not clang-related. But the smack instance is actually the only one that
uses clang as well (except for KMSAN, of course). So maybe it's indeed
clang-related rather than smack-related. Let me try to build a kernel
with clang.

> > Since we can get stall reports rather easily, can we try modifying
> > kernel command line (e.g. lsm=smack) and/or kernel config (e.g. no kasan) ?
> >
> >> I've mailed 3 new samples, you can see them here:
> >> https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb
> >>
> >> The config is provided, command line args are here:
> >> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-smack.cmdline
> >> Some non-default sysctls that syzbot sets are here:
> >> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl
> >> Image can be downloaded from here:
> >> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce
> >> syzbot uses GCE VMs with 2 CPUs and 7.5GB memory, but this does not
> >> look to be virtualization-related (?) so probably should reproduce in
> >> qemu too.
> > Is it possible to add instance for linux-next.git that uses these configs?
> > If yes, we could try adding some debug printk() under CONFIG_DEBUG_AID_FOR_SYZBOT=y .

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-09  8:19                 ` Dmitry Vyukov
@ 2020-01-09  8:50                   ` Dmitry Vyukov
  2020-01-09  9:29                     ` Dmitry Vyukov
  2020-01-09 15:43                     ` Casey Schaufler
  0 siblings, 2 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2020-01-09  8:50 UTC (permalink / raw)
  To: Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux
  Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

On Thu, Jan 9, 2020 at 9:19 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Jan 8, 2020 at 6:19 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >
> > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > >> I temporarily re-enabled smack instance and it produced another 50
> > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> >
> > Do I have to be using clang to test this? I'm setting up to work on this,
> > and don't want to waste time using my current tool chain if the problem
> > is clang specific.
>
> Humm, interesting. Initially I was going to say that most likely it's
> not clang-related. Bug smack instance is actually the only one that
> uses clang as well (except for KMSAN of course). So maybe it's indeed
> clang-related rather than smack-related. Let me try to build a kernel
> with clang.

+clang-built-linux, glider

[clang-built linux is severely broken since early Dec]

Building the kernel with clang, I can immediately reproduce this locally:

$ syz-manager
2020/01/09 09:27:15 loading corpus...
2020/01/09 09:27:17 serving http on http://0.0.0.0:50001
2020/01/09 09:27:17 serving rpc on tcp://[::]:45851
2020/01/09 09:27:17 booting test machines...
2020/01/09 09:27:17 wait for the connection from test machine...
2020/01/09 09:29:23 machine check:
2020/01/09 09:29:23 syscalls                : 2961/3195
2020/01/09 09:29:23 code coverage           : enabled
2020/01/09 09:29:23 comparison tracing      : enabled
2020/01/09 09:29:23 extra coverage          : enabled
2020/01/09 09:29:23 setuid sandbox          : enabled
2020/01/09 09:29:23 namespace sandbox       : enabled
2020/01/09 09:29:23 Android sandbox         : /sys/fs/selinux/policy
does not exist
2020/01/09 09:29:23 fault injection         : enabled
2020/01/09 09:29:23 leak checking           : CONFIG_DEBUG_KMEMLEAK is
not enabled
2020/01/09 09:29:23 net packet injection    : enabled
2020/01/09 09:29:23 net device setup        : enabled
2020/01/09 09:29:23 concurrency sanitizer   : /sys/kernel/debug/kcsan
does not exist
2020/01/09 09:29:23 devlink PCI setup       : PCI device 0000:00:10.0
is not available
2020/01/09 09:29:27 corpus                  : 50226 (0 deleted)
2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0
2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0
2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0
2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle
2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt
2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3
2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle
2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0
2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail
2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail
2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail
2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep
2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0
2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule
2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail
2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail
2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0
2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput
2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep


Then I switched LSM to selinux and I _still_ can reproduce this. So,
Casey, you may relax, this is not smack-specific :)

Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
started working normally.

So this is somehow related to both clang and KASAN/VMAP_STACK.

The clang I used is:
https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
(the one we use on syzbot).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-09  8:50                   ` Dmitry Vyukov
@ 2020-01-09  9:29                     ` Dmitry Vyukov
  2020-01-09 10:05                       ` Dmitry Vyukov
  2020-01-09 15:43                     ` Casey Schaufler
  1 sibling, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2020-01-09  9:29 UTC (permalink / raw)
  To: Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux
  Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

On Thu, Jan 9, 2020 at 9:50 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Thu, Jan 9, 2020 at 9:19 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Wed, Jan 8, 2020 at 6:19 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> > >
> > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > > >> I temporarily re-enabled smack instance and it produced another 50
> > > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> > >
> > > Do I have to be using clang to test this? I'm setting up to work on this,
> > > and don't want to waste time using my current tool chain if the problem
> > > is clang specific.
> >
> > Humm, interesting. Initially I was going to say that most likely it's
> > not clang-related. Bug smack instance is actually the only one that
> > uses clang as well (except for KMSAN of course). So maybe it's indeed
> > clang-related rather than smack-related. Let me try to build a kernel
> > with clang.
>
> +clang-built-linux, glider
>
> [clang-built linux is severe broken since early Dec]
>
> Building kernel with clang I can immediately reproduce this locally:
>
> $ syz-manager
> 2020/01/09 09:27:15 loading corpus...
> 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001
> 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851
> 2020/01/09 09:27:17 booting test machines...
> 2020/01/09 09:27:17 wait for the connection from test machine...
> 2020/01/09 09:29:23 machine check:
> 2020/01/09 09:29:23 syscalls                : 2961/3195
> 2020/01/09 09:29:23 code coverage           : enabled
> 2020/01/09 09:29:23 comparison tracing      : enabled
> 2020/01/09 09:29:23 extra coverage          : enabled
> 2020/01/09 09:29:23 setuid sandbox          : enabled
> 2020/01/09 09:29:23 namespace sandbox       : enabled
> 2020/01/09 09:29:23 Android sandbox         : /sys/fs/selinux/policy
> does not exist
> 2020/01/09 09:29:23 fault injection         : enabled
> 2020/01/09 09:29:23 leak checking           : CONFIG_DEBUG_KMEMLEAK is
> not enabled
> 2020/01/09 09:29:23 net packet injection    : enabled
> 2020/01/09 09:29:23 net device setup        : enabled
> 2020/01/09 09:29:23 concurrency sanitizer   : /sys/kernel/debug/kcsan
> does not exist
> 2020/01/09 09:29:23 devlink PCI setup       : PCI device 0000:00:10.0
> is not available
> 2020/01/09 09:29:27 corpus                  : 50226 (0 deleted)
> 2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0
> 2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0
> 2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0
> 2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle
> 2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt
> 2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3
> 2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle
> 2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0
> 2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail
> 2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail
> 2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail
> 2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep
> 2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0
> 2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule
> 2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail
> 2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail
> 2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0
> 2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput
> 2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep
>
>
> Then I switched LSM to selinux and I _still_ can reproduce this. So,
> Casey, you may relax, this is not smack-specific :)
>
> Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> started working normally.
>
> So this is somehow related to both clang and KASAN/VMAP_STACK.
>
> The clang I used is:
> https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
> (the one we use on syzbot).


Clustering the hangs, they all happen within a very limited section of the code:

      1  free_thread_stack+0x124/0x590 kernel/fork.c:284
      5  free_thread_stack+0x12e/0x590 kernel/fork.c:280
     39  free_thread_stack+0x12e/0x590 kernel/fork.c:284
      6  free_thread_stack+0x133/0x590 kernel/fork.c:280
      5  free_thread_stack+0x13d/0x590 kernel/fork.c:280
      2  free_thread_stack+0x141/0x590 kernel/fork.c:280
      6  free_thread_stack+0x14c/0x590 kernel/fork.c:280
      9  free_thread_stack+0x151/0x590 kernel/fork.c:280
      3  free_thread_stack+0x15b/0x590 kernel/fork.c:280
     67  free_thread_stack+0x168/0x590 kernel/fork.c:280
      6  free_thread_stack+0x16d/0x590 kernel/fork.c:284
      2  free_thread_stack+0x177/0x590 kernel/fork.c:284
      1  free_thread_stack+0x182/0x590 kernel/fork.c:284
      1  free_thread_stack+0x186/0x590 kernel/fork.c:284
     16  free_thread_stack+0x18b/0x590 kernel/fork.c:284
      4  free_thread_stack+0x195/0x590 kernel/fork.c:284

Here is disass of the function:
https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt

But if I am not mistaken, the function only ever jumps down. So how
can it loop?...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-09  9:29                     ` Dmitry Vyukov
@ 2020-01-09 10:05                       ` Dmitry Vyukov
  2020-01-09 10:39                         ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2020-01-09 10:05 UTC (permalink / raw)
  To: Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux
  Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

On Thu, Jan 9, 2020 at 10:29 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > >
> > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > > > >> I temporarily re-enabled smack instance and it produced another 50
> > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> > > >
> > > > Do I have to be using clang to test this? I'm setting up to work on this,
> > > > and don't want to waste time using my current tool chain if the problem
> > > > is clang specific.
> > >
> > > Humm, interesting. Initially I was going to say that most likely it's
> > > not clang-related. Bug smack instance is actually the only one that
> > > uses clang as well (except for KMSAN of course). So maybe it's indeed
> > > clang-related rather than smack-related. Let me try to build a kernel
> > > with clang.
> >
> > +clang-built-linux, glider
> >
> > [clang-built linux is severe broken since early Dec]
> >
> > Building kernel with clang I can immediately reproduce this locally:
> >
> > $ syz-manager
> > 2020/01/09 09:27:15 loading corpus...
> > 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001
> > 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851
> > 2020/01/09 09:27:17 booting test machines...
> > 2020/01/09 09:27:17 wait for the connection from test machine...
> > 2020/01/09 09:29:23 machine check:
> > 2020/01/09 09:29:23 syscalls                : 2961/3195
> > 2020/01/09 09:29:23 code coverage           : enabled
> > 2020/01/09 09:29:23 comparison tracing      : enabled
> > 2020/01/09 09:29:23 extra coverage          : enabled
> > 2020/01/09 09:29:23 setuid sandbox          : enabled
> > 2020/01/09 09:29:23 namespace sandbox       : enabled
> > 2020/01/09 09:29:23 Android sandbox         : /sys/fs/selinux/policy
> > does not exist
> > 2020/01/09 09:29:23 fault injection         : enabled
> > 2020/01/09 09:29:23 leak checking           : CONFIG_DEBUG_KMEMLEAK is
> > not enabled
> > 2020/01/09 09:29:23 net packet injection    : enabled
> > 2020/01/09 09:29:23 net device setup        : enabled
> > 2020/01/09 09:29:23 concurrency sanitizer   : /sys/kernel/debug/kcsan
> > does not exist
> > 2020/01/09 09:29:23 devlink PCI setup       : PCI device 0000:00:10.0
> > is not available
> > 2020/01/09 09:29:27 corpus                  : 50226 (0 deleted)
> > 2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0
> > 2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0
> > 2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0
> > 2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle
> > 2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt
> > 2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3
> > 2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle
> > 2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0
> > 2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail
> > 2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail
> > 2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail
> > 2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep
> > 2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0
> > 2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule
> > 2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail
> > 2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail
> > 2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0
> > 2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput
> > 2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep
> >
> >
> > Then I switched LSM to selinux and I _still_ can reproduce this. So,
> > Casey, you may relax, this is not smack-specific :)
> >
> > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> > started working normally.
> >
> > So this is somehow related to both clang and KASAN/VMAP_STACK.
> >
> > The clang I used is:
> > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
> > (the one we use on syzbot).
>
>
> Clustering hangs, they all happen within very limited section of the code:
>
>       1  free_thread_stack+0x124/0x590 kernel/fork.c:284
>       5  free_thread_stack+0x12e/0x590 kernel/fork.c:280
>      39  free_thread_stack+0x12e/0x590 kernel/fork.c:284
>       6  free_thread_stack+0x133/0x590 kernel/fork.c:280
>       5  free_thread_stack+0x13d/0x590 kernel/fork.c:280
>       2  free_thread_stack+0x141/0x590 kernel/fork.c:280
>       6  free_thread_stack+0x14c/0x590 kernel/fork.c:280
>       9  free_thread_stack+0x151/0x590 kernel/fork.c:280
>       3  free_thread_stack+0x15b/0x590 kernel/fork.c:280
>      67  free_thread_stack+0x168/0x590 kernel/fork.c:280
>       6  free_thread_stack+0x16d/0x590 kernel/fork.c:284
>       2  free_thread_stack+0x177/0x590 kernel/fork.c:284
>       1  free_thread_stack+0x182/0x590 kernel/fork.c:284
>       1  free_thread_stack+0x186/0x590 kernel/fork.c:284
>      16  free_thread_stack+0x18b/0x590 kernel/fork.c:284
>       4  free_thread_stack+0x195/0x590 kernel/fork.c:284
>
> Here is disass of the function:
> https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt
>
> But if I am not mistaken, the function only ever jumps down. So how
> can it loop?...


This is a miscompilation related to static branches.

objdump shows:

ffffffff814878f8: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
 ./arch/x86/include/asm/jump_label.h:25
asm_volatile_goto("1:"

However, the actual instruction in memory at the time is:

   0xffffffff814878f8 <+408>: jmpq   0xffffffff8148787f <free_thread_stack+287>

This jumps to a wrong location in free_thread_stack and makes it loop.

The static branch is this:

static inline bool memcg_kmem_enabled(void)
{
  return static_branch_unlikely(&memcg_kmem_enabled_key);
}

static inline void memcg_kmem_uncharge(struct page *page, int order)
{
  if (memcg_kmem_enabled())
    __memcg_kmem_uncharge(page, order);
}

I suspect it may have something to do with loop unrolling. It may jump
to the right location, but in the wrong unrolled iteration.
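
For reference, a minimal userspace sketch of the jump-label pattern
involved (x86-64 only, using the GCC/Clang "asm goto" extension; this is
an illustration of the mechanism, not the kernel's actual jump_label.h
code): the inline asm emits a 5-byte NOP and records the site plus the
target in a table, and enabling the key later patches that NOP into a
jmp to the recorded target.

#include <stdbool.h>

static inline bool my_static_branch_unlikely(void)
{
        asm goto("1: .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t" /* 5-byte NOP */
                 ".pushsection .fake_jump_table, \"aw\"\n\t"
                 ".quad 1b, %l[l_yes]\n\t"                   /* site, target */
                 ".popsection"
                 : : : : l_yes);
        return false;   /* NOP still in place: key disabled, fall through */
l_yes:
        return true;    /* reached only after the NOP is patched to a jmp */
}

int main(void)
{
        /* Nothing patches the NOP here, so the branch stays "disabled". */
        return my_static_branch_unlikely() ? 1 : 0;
}

If unrolling duplicates or moves such a site but the patching ends up
targeting the wrong copy, the jmp can land in an earlier unrolled
iteration, which would look exactly like the backwards jump seen here.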

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-09 10:05                       ` Dmitry Vyukov
@ 2020-01-09 10:39                         ` Dmitry Vyukov
  2020-01-09 16:23                           ` Alexander Potapenko
  2020-01-09 23:25                           ` Daniel Axtens
  0 siblings, 2 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2020-01-09 10:39 UTC (permalink / raw)
  To: Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux
  Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

On Thu, Jan 9, 2020 at 11:05 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > > > > >> I temporarily re-enabled smack instance and it produced another 50
> > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> > > > >
> > > > > Do I have to be using clang to test this? I'm setting up to work on this,
> > > > > and don't want to waste time using my current tool chain if the problem
> > > > > is clang specific.
> > > >
> > > > Humm, interesting. Initially I was going to say that most likely it's
> > > > not clang-related. Bug smack instance is actually the only one that
> > > > uses clang as well (except for KMSAN of course). So maybe it's indeed
> > > > clang-related rather than smack-related. Let me try to build a kernel
> > > > with clang.
> > >
> > > +clang-built-linux, glider
> > >
> > > [clang-built linux is severe broken since early Dec]
> > >
> > > Building kernel with clang I can immediately reproduce this locally:
> > >
> > > $ syz-manager
> > > 2020/01/09 09:27:15 loading corpus...
> > > 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001
> > > 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851
> > > 2020/01/09 09:27:17 booting test machines...
> > > 2020/01/09 09:27:17 wait for the connection from test machine...
> > > 2020/01/09 09:29:23 machine check:
> > > 2020/01/09 09:29:23 syscalls                : 2961/3195
> > > 2020/01/09 09:29:23 code coverage           : enabled
> > > 2020/01/09 09:29:23 comparison tracing      : enabled
> > > 2020/01/09 09:29:23 extra coverage          : enabled
> > > 2020/01/09 09:29:23 setuid sandbox          : enabled
> > > 2020/01/09 09:29:23 namespace sandbox       : enabled
> > > 2020/01/09 09:29:23 Android sandbox         : /sys/fs/selinux/policy
> > > does not exist
> > > 2020/01/09 09:29:23 fault injection         : enabled
> > > 2020/01/09 09:29:23 leak checking           : CONFIG_DEBUG_KMEMLEAK is
> > > not enabled
> > > 2020/01/09 09:29:23 net packet injection    : enabled
> > > 2020/01/09 09:29:23 net device setup        : enabled
> > > 2020/01/09 09:29:23 concurrency sanitizer   : /sys/kernel/debug/kcsan
> > > does not exist
> > > 2020/01/09 09:29:23 devlink PCI setup       : PCI device 0000:00:10.0
> > > is not available
> > > 2020/01/09 09:29:27 corpus                  : 50226 (0 deleted)
> > > 2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle
> > > 2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0
> > > 2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex
> > > 2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex
> > > 2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt
> > > 2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3
> > > 2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex
> > > 2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle
> > > 2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0
> > > 2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail
> > > 2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail
> > > 2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail
> > > 2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex
> > > 2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep
> > > 2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0
> > > 2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex
> > > 2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule
> > > 2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail
> > > 2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail
> > > 2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex
> > > 2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0
> > > 2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex
> > > 2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput
> > > 2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep
> > >
> > >
> > > Then I switched LSM to selinux and I _still_ can reproduce this. So,
> > > Casey, you may relax, this is not smack-specific :)
> > >
> > > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> > > started working normally.
> > >
> > > So this is somehow related to both clang and KASAN/VMAP_STACK.
> > >
> > > The clang I used is:
> > > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
> > > (the one we use on syzbot).
> >
> >
> > Clustering hangs, they all happen within very limited section of the code:
> >
> >       1  free_thread_stack+0x124/0x590 kernel/fork.c:284
> >       5  free_thread_stack+0x12e/0x590 kernel/fork.c:280
> >      39  free_thread_stack+0x12e/0x590 kernel/fork.c:284
> >       6  free_thread_stack+0x133/0x590 kernel/fork.c:280
> >       5  free_thread_stack+0x13d/0x590 kernel/fork.c:280
> >       2  free_thread_stack+0x141/0x590 kernel/fork.c:280
> >       6  free_thread_stack+0x14c/0x590 kernel/fork.c:280
> >       9  free_thread_stack+0x151/0x590 kernel/fork.c:280
> >       3  free_thread_stack+0x15b/0x590 kernel/fork.c:280
> >      67  free_thread_stack+0x168/0x590 kernel/fork.c:280
> >       6  free_thread_stack+0x16d/0x590 kernel/fork.c:284
> >       2  free_thread_stack+0x177/0x590 kernel/fork.c:284
> >       1  free_thread_stack+0x182/0x590 kernel/fork.c:284
> >       1  free_thread_stack+0x186/0x590 kernel/fork.c:284
> >      16  free_thread_stack+0x18b/0x590 kernel/fork.c:284
> >       4  free_thread_stack+0x195/0x590 kernel/fork.c:284
> >
> > Here is disass of the function:
> > https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt
> >
> > But if I am not mistaken, the function only ever jumps down. So how
> > can it loop?...
>
>
> This is a miscompilation related to static branches.
>
> objdump shows:
>
> ffffffff814878f8: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
>  ./arch/x86/include/asm/jump_label.h:25
> asm_volatile_goto("1:"
>
> However, the actual instruction in memory at the time is:
>
>    0xffffffff814878f8 <+408>: jmpq   0xffffffff8148787f <free_thread_stack+287>
>
> Which jumps to a wrong location in free_thread_stack and makes it loop.
>
> The static branch is this:
>
> static inline bool memcg_kmem_enabled(void)
> {
>   return static_branch_unlikely(&memcg_kmem_enabled_key);
> }
>
> static inline void memcg_kmem_uncharge(struct page *page, int order)
> {
>   if (memcg_kmem_enabled())
>     __memcg_kmem_uncharge(page, order);
> }
>
> I suspect it may have something to do with loop unrolling. It may jump
> to the right location, but in the wrong unrolled iteration.


Kernel built with clang version 10.0.0
(https://github.com/llvm/llvm-project.git
c2443155a0fb245c8f17f2c1c72b6ea391e86e81) works fine.

Alex, please update clang on syzbot machines.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-09  8:50                   ` Dmitry Vyukov
  2020-01-09  9:29                     ` Dmitry Vyukov
@ 2020-01-09 15:43                     ` Casey Schaufler
  1 sibling, 0 replies; 22+ messages in thread
From: Casey Schaufler @ 2020-01-09 15:43 UTC (permalink / raw)
  To: Dmitry Vyukov, Daniel Axtens, Alexander Potapenko, clang-built-linux
  Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML,
	syzkaller-bugs, Casey Schaufler

On 1/9/2020 12:50 AM, Dmitry Vyukov wrote:
> On Thu, Jan 9, 2020 at 9:19 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Wed, Jan 8, 2020 at 6:19 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>> On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
>>>> On 2020/01/08 15:20, Dmitry Vyukov wrote:
>>>>> I temporarily re-enabled smack instance and it produced another 50
>>>>> stalls all over the kernel, and now keeps spewing a dozen every hour.
>>> Do I have to be using clang to test this? I'm setting up to work on this,
>>> and don't want to waste time using my current tool chain if the problem
>>> is clang specific.
>> Humm, interesting. Initially I was going to say that most likely it's
>> not clang-related. Bug smack instance is actually the only one that
>> uses clang as well (except for KMSAN of course). So maybe it's indeed
>> clang-related rather than smack-related. Let me try to build a kernel
>> with clang.
> +clang-built-linux, glider
>
> [clang-built linux is severe broken since early Dec]
>
> Building kernel with clang I can immediately reproduce this locally:
>
> $ syz-manager
> 2020/01/09 09:27:15 loading corpus...
> 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001
> 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851
> 2020/01/09 09:27:17 booting test machines...
> 2020/01/09 09:27:17 wait for the connection from test machine...
> 2020/01/09 09:29:23 machine check:
> 2020/01/09 09:29:23 syscalls                : 2961/3195
> 2020/01/09 09:29:23 code coverage           : enabled
> 2020/01/09 09:29:23 comparison tracing      : enabled
> 2020/01/09 09:29:23 extra coverage          : enabled
> 2020/01/09 09:29:23 setuid sandbox          : enabled
> 2020/01/09 09:29:23 namespace sandbox       : enabled
> 2020/01/09 09:29:23 Android sandbox         : /sys/fs/selinux/policy
> does not exist
> 2020/01/09 09:29:23 fault injection         : enabled
> 2020/01/09 09:29:23 leak checking           : CONFIG_DEBUG_KMEMLEAK is
> not enabled
> 2020/01/09 09:29:23 net packet injection    : enabled
> 2020/01/09 09:29:23 net device setup        : enabled
> 2020/01/09 09:29:23 concurrency sanitizer   : /sys/kernel/debug/kcsan
> does not exist
> 2020/01/09 09:29:23 devlink PCI setup       : PCI device 0000:00:10.0
> is not available
> 2020/01/09 09:29:27 corpus                  : 50226 (0 deleted)
> 2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0
> 2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0
> 2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0
> 2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle
> 2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0
> 2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt
> 2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3
> 2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle
> 2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0
> 2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail
> 2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail
> 2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail
> 2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep
> 2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0
> 2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule
> 2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail
> 2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail
> 2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0
> 2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex
> 2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput
> 2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep
>
>
> Then I switched LSM to selinux and I _still_ can reproduce this. So,
> Casey, you may relax, this is not smack-specific :)

Wow. I wasn't expecting clang to be the problem, just a possible
required condition. I am, of course, quite relieved.

>
> Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> started working normally.
>
> So this is somehow related to both clang and KASAN/VMAP_STACK.
>
> The clang I used is:
> https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
> (the one we use on syzbot).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-09 10:39                         ` Dmitry Vyukov
@ 2020-01-09 16:23                           ` Alexander Potapenko
  2020-01-09 17:16                             ` Nick Desaulniers
  2020-01-09 23:25                           ` Daniel Axtens
  1 sibling, 1 reply; 22+ messages in thread
From: Alexander Potapenko @ 2020-01-09 16:23 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa,
	syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

On Thu, Jan 9, 2020 at 11:39 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Thu, Jan 9, 2020 at 11:05 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > > > > > >> I temporarily re-enabled smack instance and it produced another 50
> > > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> > > > > >
> > > > > > Do I have to be using clang to test this? I'm setting up to work on this,
> > > > > > and don't want to waste time using my current tool chain if the problem
> > > > > > is clang specific.
> > > > >
> > > > > Humm, interesting. Initially I was going to say that most likely it's
> > > > > not clang-related. Bug smack instance is actually the only one that
> > > > > uses clang as well (except for KMSAN of course). So maybe it's indeed
> > > > > clang-related rather than smack-related. Let me try to build a kernel
> > > > > with clang.
> > > >
> > > > +clang-built-linux, glider
> > > >
> > > > [clang-built linux is severe broken since early Dec]
> > > >
> > > > Building kernel with clang I can immediately reproduce this locally:
> > > >
> > > > $ syz-manager
> > > > 2020/01/09 09:27:15 loading corpus...
> > > > 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001
> > > > 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851
> > > > 2020/01/09 09:27:17 booting test machines...
> > > > 2020/01/09 09:27:17 wait for the connection from test machine...
> > > > 2020/01/09 09:29:23 machine check:
> > > > 2020/01/09 09:29:23 syscalls                : 2961/3195
> > > > 2020/01/09 09:29:23 code coverage           : enabled
> > > > 2020/01/09 09:29:23 comparison tracing      : enabled
> > > > 2020/01/09 09:29:23 extra coverage          : enabled
> > > > 2020/01/09 09:29:23 setuid sandbox          : enabled
> > > > 2020/01/09 09:29:23 namespace sandbox       : enabled
> > > > 2020/01/09 09:29:23 Android sandbox         : /sys/fs/selinux/policy
> > > > does not exist
> > > > 2020/01/09 09:29:23 fault injection         : enabled
> > > > 2020/01/09 09:29:23 leak checking           : CONFIG_DEBUG_KMEMLEAK is
> > > > not enabled
> > > > 2020/01/09 09:29:23 net packet injection    : enabled
> > > > 2020/01/09 09:29:23 net device setup        : enabled
> > > > 2020/01/09 09:29:23 concurrency sanitizer   : /sys/kernel/debug/kcsan
> > > > does not exist
> > > > 2020/01/09 09:29:23 devlink PCI setup       : PCI device 0000:00:10.0
> > > > is not available
> > > > 2020/01/09 09:29:27 corpus                  : 50226 (0 deleted)
> > > > 2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle
> > > > 2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0
> > > > 2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex
> > > > 2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex
> > > > 2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt
> > > > 2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3
> > > > 2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex
> > > > 2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle
> > > > 2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0
> > > > 2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail
> > > > 2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail
> > > > 2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail
> > > > 2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex
> > > > 2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep
> > > > 2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0
> > > > 2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex
> > > > 2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule
> > > > 2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail
> > > > 2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail
> > > > 2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex
> > > > 2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0
> > > > 2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex
> > > > 2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput
> > > > 2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep
> > > >
> > > >
> > > > Then I switched LSM to selinux and I _still_ can reproduce this. So,
> > > > Casey, you may relax, this is not smack-specific :)
> > > >
> > > > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> > > > started working normally.
> > > >
> > > > So this is somehow related to both clang and KASAN/VMAP_STACK.
> > > >
> > > > The clang I used is:
> > > > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
> > > > (the one we use on syzbot).
> > >
> > >
> > > Clustering hangs, they all happen within very limited section of the code:
> > >
> > >       1  free_thread_stack+0x124/0x590 kernel/fork.c:284
> > >       5  free_thread_stack+0x12e/0x590 kernel/fork.c:280
> > >      39  free_thread_stack+0x12e/0x590 kernel/fork.c:284
> > >       6  free_thread_stack+0x133/0x590 kernel/fork.c:280
> > >       5  free_thread_stack+0x13d/0x590 kernel/fork.c:280
> > >       2  free_thread_stack+0x141/0x590 kernel/fork.c:280
> > >       6  free_thread_stack+0x14c/0x590 kernel/fork.c:280
> > >       9  free_thread_stack+0x151/0x590 kernel/fork.c:280
> > >       3  free_thread_stack+0x15b/0x590 kernel/fork.c:280
> > >      67  free_thread_stack+0x168/0x590 kernel/fork.c:280
> > >       6  free_thread_stack+0x16d/0x590 kernel/fork.c:284
> > >       2  free_thread_stack+0x177/0x590 kernel/fork.c:284
> > >       1  free_thread_stack+0x182/0x590 kernel/fork.c:284
> > >       1  free_thread_stack+0x186/0x590 kernel/fork.c:284
> > >      16  free_thread_stack+0x18b/0x590 kernel/fork.c:284
> > >       4  free_thread_stack+0x195/0x590 kernel/fork.c:284
> > >
> > > Here is disass of the function:
> > > https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt
> > >
> > > But if I am not mistaken, the function only ever jumps down. So how
> > > can it loop?...
> >
> >
> > This is a miscompilation related to static branches.
> >
> > objdump shows:
> >
> > ffffffff814878f8: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
> >  ./arch/x86/include/asm/jump_label.h:25
> > asm_volatile_goto("1:"
> >
> > However, the actual instruction in memory at the time is:
> >
> >    0xffffffff814878f8 <+408>: jmpq   0xffffffff8148787f <free_thread_stack+287>
> >
> > Which jumps to a wrong location in free_thread_stack and makes it loop.
> >
> > The static branch is this:
> >
> > static inline bool memcg_kmem_enabled(void)
> > {
> >   return static_branch_unlikely(&memcg_kmem_enabled_key);
> > }
> >
> > static inline void memcg_kmem_uncharge(struct page *page, int order)
> > {
> >   if (memcg_kmem_enabled())
> >     __memcg_kmem_uncharge(page, order);
> > }
> >
> > I suspect it may have something to do with loop unrolling. It may jump
> > to the right location, but in the wrong unrolled iteration.
>
>
> Kernel built with clang version 10.0.0
> (https://github.com/llvm/llvm-project.git
> c2443155a0fb245c8f17f2c1c72b6ea391e86e81) works fine.
>
> Alex, please update clang on syzbot machines.

Done ~3 hours ago, guess we'll see the results within a day.

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-09 16:23                           ` Alexander Potapenko
@ 2020-01-09 17:16                             ` Nick Desaulniers
  2020-01-09 17:23                               ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: Nick Desaulniers @ 2020-01-09 17:16 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Dmitry Vyukov, Casey Schaufler, Daniel Axtens, clang-built-linux,
	Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML,
	syzkaller-bugs

On Thu, Jan 9, 2020 at 8:23 AM 'Alexander Potapenko' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
>
> On Thu, Jan 9, 2020 at 11:39 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Thu, Jan 9, 2020 at 11:05 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > > > > > > >> I temporarily re-enabled smack instance and it produced another 50
> > > > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> > > > > > >
> > > > > > > Do I have to be using clang to test this? I'm setting up to work on this,
> > > > > > > and don't want to waste time using my current tool chain if the problem
> > > > > > > is clang specific.
> > > > > >
> > > > > > Humm, interesting. Initially I was going to say that most likely it's
> > > > > > not clang-related. Bug smack instance is actually the only one that
> > > > > > uses clang as well (except for KMSAN of course). So maybe it's indeed
> > > > > > clang-related rather than smack-related. Let me try to build a kernel
> > > > > > with clang.
> > > > >
> > > > > +clang-built-linux, glider
> > > > >
> > > > > [clang-built linux is severe broken since early Dec]

Is there automated reporting? Consider adding our mailing list for
Clang-specific failures:
clang-built-linux <clang-built-linux@googlegroups.com>
Our CI looks green, but there's a very long tail of combinations of
configs that we don't have coverage of, so bug reports are
appreciated:
https://github.com/ClangBuiltLinux/linux/issues

> > > > >
> > > > > Building kernel with clang I can immediately reproduce this locally:
> > > > >
> > > > > $ syz-manager
> > > > > 2020/01/09 09:27:15 loading corpus...
> > > > > 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001
> > > > > 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851
> > > > > 2020/01/09 09:27:17 booting test machines...
> > > > > 2020/01/09 09:27:17 wait for the connection from test machine...
> > > > > 2020/01/09 09:29:23 machine check:
> > > > > 2020/01/09 09:29:23 syscalls                : 2961/3195
> > > > > 2020/01/09 09:29:23 code coverage           : enabled
> > > > > 2020/01/09 09:29:23 comparison tracing      : enabled
> > > > > 2020/01/09 09:29:23 extra coverage          : enabled
> > > > > 2020/01/09 09:29:23 setuid sandbox          : enabled
> > > > > 2020/01/09 09:29:23 namespace sandbox       : enabled
> > > > > 2020/01/09 09:29:23 Android sandbox         : /sys/fs/selinux/policy
> > > > > does not exist
> > > > > 2020/01/09 09:29:23 fault injection         : enabled
> > > > > 2020/01/09 09:29:23 leak checking           : CONFIG_DEBUG_KMEMLEAK is
> > > > > not enabled
> > > > > 2020/01/09 09:29:23 net packet injection    : enabled
> > > > > 2020/01/09 09:29:23 net device setup        : enabled
> > > > > 2020/01/09 09:29:23 concurrency sanitizer   : /sys/kernel/debug/kcsan
> > > > > does not exist
> > > > > 2020/01/09 09:29:23 devlink PCI setup       : PCI device 0000:00:10.0
> > > > > is not available
> > > > > 2020/01/09 09:29:27 corpus                  : 50226 (0 deleted)
> > > > > 2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle
> > > > > 2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0
> > > > > 2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex
> > > > > 2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex
> > > > > 2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt
> > > > > 2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3
> > > > > 2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex
> > > > > 2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle
> > > > > 2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0
> > > > > 2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail
> > > > > 2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail
> > > > > 2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail
> > > > > 2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex
> > > > > 2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep
> > > > > 2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0
> > > > > 2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex
> > > > > 2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule
> > > > > 2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail
> > > > > 2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail
> > > > > 2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex
> > > > > 2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0
> > > > > 2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex
> > > > > 2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput
> > > > > 2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep
> > > > >
> > > > >
> > > > > Then I switched LSM to selinux and I _still_ can reproduce this. So,
> > > > > Casey, you may relax, this is not smack-specific :)
> > > > >
> > > > > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> > > > > started working normally.
> > > > >
> > > > > So this is somehow related to both clang and KASAN/VMAP_STACK.
> > > > >
> > > > > The clang I used is:
> > > > > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
> > > > > (the one we use on syzbot).
> > > >
> > > >
> > > > Clustering hangs, they all happen within very limited section of the code:
> > > >
> > > >       1  free_thread_stack+0x124/0x590 kernel/fork.c:284
> > > >       5  free_thread_stack+0x12e/0x590 kernel/fork.c:280
> > > >      39  free_thread_stack+0x12e/0x590 kernel/fork.c:284
> > > >       6  free_thread_stack+0x133/0x590 kernel/fork.c:280
> > > >       5  free_thread_stack+0x13d/0x590 kernel/fork.c:280
> > > >       2  free_thread_stack+0x141/0x590 kernel/fork.c:280
> > > >       6  free_thread_stack+0x14c/0x590 kernel/fork.c:280
> > > >       9  free_thread_stack+0x151/0x590 kernel/fork.c:280
> > > >       3  free_thread_stack+0x15b/0x590 kernel/fork.c:280
> > > >      67  free_thread_stack+0x168/0x590 kernel/fork.c:280
> > > >       6  free_thread_stack+0x16d/0x590 kernel/fork.c:284
> > > >       2  free_thread_stack+0x177/0x590 kernel/fork.c:284
> > > >       1  free_thread_stack+0x182/0x590 kernel/fork.c:284
> > > >       1  free_thread_stack+0x186/0x590 kernel/fork.c:284
> > > >      16  free_thread_stack+0x18b/0x590 kernel/fork.c:284
> > > >       4  free_thread_stack+0x195/0x590 kernel/fork.c:284
> > > >
> > > > Here is disass of the function:
> > > > https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt
> > > >
> > > > But if I am not mistaken, the function only ever jumps down. So how
> > > > can it loop?...
> > >
> > >
> > > This is a miscompilation related to static branches.
> > >
> > > objdump shows:
> > >
> > > ffffffff814878f8: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
> > >  ./arch/x86/include/asm/jump_label.h:25
> > > asm_volatile_goto("1:"
> > >
> > > However, the actual instruction in memory at the time is:
> > >
> > >    0xffffffff814878f8 <+408>: jmpq   0xffffffff8148787f <free_thread_stack+287>
> > >
> > > Which jumps to a wrong location in free_thread_stack and makes it loop.
> > >
> > > The static branch is this:
> > >
> > > static inline bool memcg_kmem_enabled(void)
> > > {
> > >   return static_branch_unlikely(&memcg_kmem_enabled_key);
> > > }
> > >
> > > static inline void memcg_kmem_uncharge(struct page *page, int order)
> > > {
> > >   if (memcg_kmem_enabled())
> > >     __memcg_kmem_uncharge(page, order);
> > > }
> > >
> > > I suspect it may have something to do with loop unrolling. It may jump
> > > to the right location, but in the wrong unrolled iteration.

I disabled loop unrolling and loop unswitching in LLVM when the loop
contained asm goto in:
https://github.com/llvm/llvm-project/commit/c4f245b40aad7e8627b37a8bf1bdcdbcd541e665
I have a fix for loop unrolling in:
https://reviews.llvm.org/D64101
that I should dust off. I haven't looked into loop unswitching yet.
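
For context, a hedged reduction (not the kernel's actual code) of the
kind of construct involved: a static-branch-style asm goto inside a
small constant-trip-count loop, which the optimizer wants to unroll, so
every unrolled copy of the asm goto must keep its own correct label
target.

#include <stdbool.h>

#define NR_PAGES 4                      /* small constant: a prime unrolling target */

static inline bool key_enabled(void)
{
        /* Stand-in for static_branch_unlikely(): a patchable NOP site. */
        asm goto("1: nop" : : : : l_yes);
        return false;
l_yes:
        return true;
}

void uncharge_pages(void **pages)
{
        for (int i = 0; i < NR_PAGES; i++) {
                if (key_enabled())      /* duplicated per unrolled iteration */
                        pages[i] = 0;   /* stand-in for __memcg_kmem_uncharge() */
        }
}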

> >
> >
> > Kernel built with clang version 10.0.0
> > (https://github.com/llvm/llvm-project.git
> > c2443155a0fb245c8f17f2c1c72b6ea391e86e81) works fine.
> >
> > Alex, please update clang on syzbot machines.
>
> Done ~3 hours ago, guess we'll see the results within a day.

Please let me know if you otherwise encounter any miscompiles with
Clang; I treat `asm goto` miscompiles in particular as P0.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-09 17:16                             ` Nick Desaulniers
@ 2020-01-09 17:23                               ` Dmitry Vyukov
  2020-01-09 17:38                                 ` Nick Desaulniers
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2020-01-09 17:23 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Alexander Potapenko, Casey Schaufler, Daniel Axtens,
	clang-built-linux, Tetsuo Handa, syzbot, kasan-dev,
	Andrew Morton, LKML, syzkaller-bugs

On Thu, Jan 9, 2020 at 6:17 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> > > > > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > > > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > > > > > > > >> I temporarily re-enabled smack instance and it produced another 50
> > > > > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> > > > > > > >
> > > > > > > > Do I have to be using clang to test this? I'm setting up to work on this,
> > > > > > > > and don't want to waste time using my current tool chain if the problem
> > > > > > > > is clang specific.
> > > > > > >
> > > > > > > Hmm, interesting. Initially I was going to say that most likely it's
> > > > > > > not clang-related. But the smack instance is actually the only one that
> > > > > > > also uses clang (except for KMSAN, of course). So maybe it's indeed
> > > > > > > clang-related rather than smack-related. Let me try to build a kernel
> > > > > > > with clang.
> > > > > >
> > > > > > +clang-built-linux, glider
> > > > > >
> > > > > > [clang-built linux has been severely broken since early Dec]
>
> Is there automated reporting? Consider adding our mailing list for
> Clang specific failures.
> clang-built-linux <clang-built-linux@googlegroups.com>
> Our CI looks green, but there's a very long tail of combinations of
> configs that we don't have coverage of, so bug reports are
> appreciated:
> https://github.com/ClangBuiltLinux/linux/issues

syzbot does automatic reporting, but it does not automatically
classify bugs as clang-specific.
FTR, this combination is clang+KASAN+VMAP_STACK (relatively recent
changes, and that's what triggered the infinite loop). But note that
the kernel still boots; you can ssh in and do some basic things.

> > > > > >
> > > > > > Building kernel with clang I can immediately reproduce this locally:
> > > > > >
> > > > > > $ syz-manager
> > > > > > 2020/01/09 09:27:15 loading corpus...
> > > > > > 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001
> > > > > > 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851
> > > > > > 2020/01/09 09:27:17 booting test machines...
> > > > > > 2020/01/09 09:27:17 wait for the connection from test machine...
> > > > > > 2020/01/09 09:29:23 machine check:
> > > > > > 2020/01/09 09:29:23 syscalls                : 2961/3195
> > > > > > 2020/01/09 09:29:23 code coverage           : enabled
> > > > > > 2020/01/09 09:29:23 comparison tracing      : enabled
> > > > > > 2020/01/09 09:29:23 extra coverage          : enabled
> > > > > > 2020/01/09 09:29:23 setuid sandbox          : enabled
> > > > > > 2020/01/09 09:29:23 namespace sandbox       : enabled
> > > > > > 2020/01/09 09:29:23 Android sandbox         : /sys/fs/selinux/policy
> > > > > > does not exist
> > > > > > 2020/01/09 09:29:23 fault injection         : enabled
> > > > > > 2020/01/09 09:29:23 leak checking           : CONFIG_DEBUG_KMEMLEAK is
> > > > > > not enabled
> > > > > > 2020/01/09 09:29:23 net packet injection    : enabled
> > > > > > 2020/01/09 09:29:23 net device setup        : enabled
> > > > > > 2020/01/09 09:29:23 concurrency sanitizer   : /sys/kernel/debug/kcsan
> > > > > > does not exist
> > > > > > 2020/01/09 09:29:23 devlink PCI setup       : PCI device 0000:00:10.0
> > > > > > is not available
> > > > > > 2020/01/09 09:29:27 corpus                  : 50226 (0 deleted)
> > > > > > 2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle
> > > > > > 2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0
> > > > > > 2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex
> > > > > > 2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex
> > > > > > 2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt
> > > > > > 2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3
> > > > > > 2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex
> > > > > > 2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle
> > > > > > 2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0
> > > > > > 2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail
> > > > > > 2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail
> > > > > > 2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail
> > > > > > 2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex
> > > > > > 2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep
> > > > > > 2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0
> > > > > > 2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex
> > > > > > 2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule
> > > > > > 2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail
> > > > > > 2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail
> > > > > > 2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex
> > > > > > 2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0
> > > > > > 2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex
> > > > > > 2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput
> > > > > > 2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep
> > > > > >
> > > > > >
> > > > > > Then I switched LSM to selinux and I _still_ can reproduce this. So,
> > > > > > Casey, you may relax, this is not smack-specific :)
> > > > > >
> > > > > > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> > > > > > started working normally.
> > > > > >
> > > > > > So this is somehow related to both clang and KASAN/VMAP_STACK.
> > > > > >
> > > > > > The clang I used is:
> > > > > > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
> > > > > > (the one we use on syzbot).
> > > > >
> > > > >
> > > > > Clustering the hangs shows they all happen within a very limited section of the code:
> > > > >
> > > > >       1  free_thread_stack+0x124/0x590 kernel/fork.c:284
> > > > >       5  free_thread_stack+0x12e/0x590 kernel/fork.c:280
> > > > >      39  free_thread_stack+0x12e/0x590 kernel/fork.c:284
> > > > >       6  free_thread_stack+0x133/0x590 kernel/fork.c:280
> > > > >       5  free_thread_stack+0x13d/0x590 kernel/fork.c:280
> > > > >       2  free_thread_stack+0x141/0x590 kernel/fork.c:280
> > > > >       6  free_thread_stack+0x14c/0x590 kernel/fork.c:280
> > > > >       9  free_thread_stack+0x151/0x590 kernel/fork.c:280
> > > > >       3  free_thread_stack+0x15b/0x590 kernel/fork.c:280
> > > > >      67  free_thread_stack+0x168/0x590 kernel/fork.c:280
> > > > >       6  free_thread_stack+0x16d/0x590 kernel/fork.c:284
> > > > >       2  free_thread_stack+0x177/0x590 kernel/fork.c:284
> > > > >       1  free_thread_stack+0x182/0x590 kernel/fork.c:284
> > > > >       1  free_thread_stack+0x186/0x590 kernel/fork.c:284
> > > > >      16  free_thread_stack+0x18b/0x590 kernel/fork.c:284
> > > > >       4  free_thread_stack+0x195/0x590 kernel/fork.c:284
> > > > >
> > > > > Here is a disassembly of the function:
> > > > > https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt
> > > > >
> > > > > But if I am not mistaken, the function only ever jumps down. So how
> > > > > can it loop?...
> > > >
> > > >
> > > > This is a miscompilation related to static branches.
> > > >
> > > > objdump shows:
> > > >
> > > > ffffffff814878f8: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
> > > >  ./arch/x86/include/asm/jump_label.h:25
> > > > asm_volatile_goto("1:"
> > > >
> > > > However, the actual instruction in memory at the time is:
> > > >
> > > >    0xffffffff814878f8 <+408>: jmpq   0xffffffff8148787f <free_thread_stack+287>
> > > >
> > > > Which jumps to a wrong location in free_thread_stack and makes it loop.
> > > >
> > > > The static branch is this:
> > > >
> > > > static inline bool memcg_kmem_enabled(void)
> > > > {
> > > >   return static_branch_unlikely(&memcg_kmem_enabled_key);
> > > > }
> > > >
> > > > static inline void memcg_kmem_uncharge(struct page *page, int order)
> > > > {
> > > >   if (memcg_kmem_enabled())
> > > >     __memcg_kmem_uncharge(page, order);
> > > > }
> > > >
> > > > I suspect it may have something to do with loop unrolling. It may jump
> > > > to the right location, but in the wrong unrolled iteration.
>
> I disabled loop unrolling and loop unswitching in LLVM when the loop
> contained asm goto in:
> https://github.com/llvm/llvm-project/commit/c4f245b40aad7e8627b37a8bf1bdcdbcd541e665
> I have a fix for loop unrolling in:
> https://reviews.llvm.org/D64101
> that I should dust off. I haven't looked into loop unswitching yet.

c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is in the range between the
broken compiler and the newer compiler that seems to work, so I would
assume that that commit fixes this.
We will get the final stamp from syzbot hopefully by tomorrow.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-09 17:23                               ` Dmitry Vyukov
@ 2020-01-09 17:38                                 ` Nick Desaulniers
  2020-01-10  8:37                                   ` Alexander Potapenko
  0 siblings, 1 reply; 22+ messages in thread
From: Nick Desaulniers @ 2020-01-09 17:38 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Alexander Potapenko, Casey Schaufler, Daniel Axtens,
	clang-built-linux, Tetsuo Handa, syzbot, kasan-dev,
	Andrew Morton, LKML, syzkaller-bugs

On Thu, Jan 9, 2020 at 9:23 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Thu, Jan 9, 2020 at 6:17 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> > I disabled loop unrolling and loop unswitching in LLVM when the loop
> > contained asm goto in:
> > https://github.com/llvm/llvm-project/commit/c4f245b40aad7e8627b37a8bf1bdcdbcd541e665
> > I have a fix for loop unrolling in:
> > https://reviews.llvm.org/D64101
> > that I should dust off. I haven't looked into loop unswitching yet.
>
> c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is in the range between the
> broken compiler and the newer compiler that seems to work, so I would
> assume that that commit fixes this.
> We will get the final stamp from syzbot hopefully by tomorrow.

How often do you refresh the build of Clang in syzbot? Is it manual? I
understand the tradeoffs of living on the tip of the spear, but
c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is 6 months old.  So upstream
LLVM could be regressing more often, and you wouldn't notice for 1/2 a
year or more. :-/

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-09 10:39                         ` Dmitry Vyukov
  2020-01-09 16:23                           ` Alexander Potapenko
@ 2020-01-09 23:25                           ` Daniel Axtens
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Axtens @ 2020-01-09 23:25 UTC (permalink / raw)
  To: Dmitry Vyukov, Casey Schaufler, Alexander Potapenko, clang-built-linux
  Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs

Dmitry Vyukov <dvyukov@google.com> writes:

> On Thu, Jan 9, 2020 at 11:05 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>> > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
>> > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
>> > > > > >> I temporarily re-enabled smack instance and it produced another 50
>> > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
>> > > > >
>> > > > > Do I have to be using clang to test this? I'm setting up to work on this,
>> > > > > and don't want to waste time using my current tool chain if the problem
>> > > > > is clang specific.
>> > > >
>> > > > Hmm, interesting. Initially I was going to say that most likely it's
>> > > > not clang-related. But the smack instance is actually the only one that
>> > > > also uses clang (except for KMSAN, of course). So maybe it's indeed
>> > > > clang-related rather than smack-related. Let me try to build a kernel
>> > > > with clang.
>> > >
>> > > +clang-built-linux, glider
>> > >
>> > > [clang-built linux has been severely broken since early Dec]
>> > >
>> > > Building kernel with clang I can immediately reproduce this locally:
>> > >
>> > > $ syz-manager
>> > > 2020/01/09 09:27:15 loading corpus...
>> > > 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001
>> > > 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851
>> > > 2020/01/09 09:27:17 booting test machines...
>> > > 2020/01/09 09:27:17 wait for the connection from test machine...
>> > > 2020/01/09 09:29:23 machine check:
>> > > 2020/01/09 09:29:23 syscalls                : 2961/3195
>> > > 2020/01/09 09:29:23 code coverage           : enabled
>> > > 2020/01/09 09:29:23 comparison tracing      : enabled
>> > > 2020/01/09 09:29:23 extra coverage          : enabled
>> > > 2020/01/09 09:29:23 setuid sandbox          : enabled
>> > > 2020/01/09 09:29:23 namespace sandbox       : enabled
>> > > 2020/01/09 09:29:23 Android sandbox         : /sys/fs/selinux/policy
>> > > does not exist
>> > > 2020/01/09 09:29:23 fault injection         : enabled
>> > > 2020/01/09 09:29:23 leak checking           : CONFIG_DEBUG_KMEMLEAK is
>> > > not enabled
>> > > 2020/01/09 09:29:23 net packet injection    : enabled
>> > > 2020/01/09 09:29:23 net device setup        : enabled
>> > > 2020/01/09 09:29:23 concurrency sanitizer   : /sys/kernel/debug/kcsan
>> > > does not exist
>> > > 2020/01/09 09:29:23 devlink PCI setup       : PCI device 0000:00:10.0
>> > > is not available
>> > > 2020/01/09 09:29:27 corpus                  : 50226 (0 deleted)
>> > > 2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle
>> > > 2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0
>> > > 2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex
>> > > 2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex
>> > > 2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt
>> > > 2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3
>> > > 2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex
>> > > 2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle
>> > > 2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0
>> > > 2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail
>> > > 2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail
>> > > 2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail
>> > > 2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex
>> > > 2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep
>> > > 2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0
>> > > 2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex
>> > > 2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule
>> > > 2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail
>> > > 2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail
>> > > 2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex
>> > > 2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0
>> > > 2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex
>> > > 2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput
>> > > 2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep
>> > >
>> > >
>> > > Then I switched LSM to selinux and I _still_ can reproduce this. So,
>> > > Casey, you may relax, this is not smack-specific :)
>> > >
>> > > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
>> > > started working normally.
>> > >
>> > > So this is somehow related to both clang and KASAN/VMAP_STACK.
>> > >
>> > > The clang I used is:
>> > > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
>> > > (the one we use on syzbot).
>> >
>> >
>> > Clustering the hangs shows they all happen within a very limited section of the code:
>> >
>> >       1  free_thread_stack+0x124/0x590 kernel/fork.c:284
>> >       5  free_thread_stack+0x12e/0x590 kernel/fork.c:280
>> >      39  free_thread_stack+0x12e/0x590 kernel/fork.c:284
>> >       6  free_thread_stack+0x133/0x590 kernel/fork.c:280
>> >       5  free_thread_stack+0x13d/0x590 kernel/fork.c:280
>> >       2  free_thread_stack+0x141/0x590 kernel/fork.c:280
>> >       6  free_thread_stack+0x14c/0x590 kernel/fork.c:280
>> >       9  free_thread_stack+0x151/0x590 kernel/fork.c:280
>> >       3  free_thread_stack+0x15b/0x590 kernel/fork.c:280
>> >      67  free_thread_stack+0x168/0x590 kernel/fork.c:280
>> >       6  free_thread_stack+0x16d/0x590 kernel/fork.c:284
>> >       2  free_thread_stack+0x177/0x590 kernel/fork.c:284
>> >       1  free_thread_stack+0x182/0x590 kernel/fork.c:284
>> >       1  free_thread_stack+0x186/0x590 kernel/fork.c:284
>> >      16  free_thread_stack+0x18b/0x590 kernel/fork.c:284
>> >       4  free_thread_stack+0x195/0x590 kernel/fork.c:284
>> >
>> > Here is a disassembly of the function:
>> > https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt
>> >
>> > But if I am not mistaken, the function only ever jumps down. So how
>> > can it loop?...
>>
>>
>> This is a miscompilation related to static branches.
>>
>> objdump shows:
>>
>> ffffffff814878f8: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
>>  ./arch/x86/include/asm/jump_label.h:25
>> asm_volatile_goto("1:"
>>
>> However, the actual instruction in memory at the time is:
>>
>>    0xffffffff814878f8 <+408>: jmpq   0xffffffff8148787f <free_thread_stack+287>
>>
>> Which jumps to a wrong location in free_thread_stack and makes it loop.
>>
>> The static branch is this:
>>
>> static inline bool memcg_kmem_enabled(void)
>> {
>>   return static_branch_unlikely(&memcg_kmem_enabled_key);
>> }
>>
>> static inline void memcg_kmem_uncharge(struct page *page, int order)
>> {
>>   if (memcg_kmem_enabled())
>>     __memcg_kmem_uncharge(page, order);
>> }
>>
>> I suspect it may have something to do with loop unrolling. It may jump
>> to the right location, but in the wrong unrolled iteration.
>
>
> Kernel built with clang version 10.0.0
> (https://github.com/llvm/llvm-project.git
> c2443155a0fb245c8f17f2c1c72b6ea391e86e81) works fine.
>
> Alex, please update clang on syzbot machines.

Wow, what a bug. Very happy to be off the hook for causing it, and
feeling a lot better about my inability to reproduce it with a GCC-built
kernel!

Regards,
Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-09 17:38                                 ` Nick Desaulniers
@ 2020-01-10  8:37                                   ` Alexander Potapenko
  2020-01-14 10:15                                     ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Potapenko @ 2020-01-10  8:37 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Dmitry Vyukov, Casey Schaufler, Daniel Axtens, clang-built-linux,
	Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML,
	syzkaller-bugs

On Thu, Jan 9, 2020 at 6:39 PM 'Nick Desaulniers' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> On Thu, Jan 9, 2020 at 9:23 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Thu, Jan 9, 2020 at 6:17 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> > > I disabled loop unrolling and loop unswitching in LLVM when the loop
> > > contained asm goto in:
> > > https://github.com/llvm/llvm-project/commit/c4f245b40aad7e8627b37a8bf1bdcdbcd541e665
> > > I have a fix for loop unrolling in:
> > > https://reviews.llvm.org/D64101
> > > that I should dust off. I haven't looked into loop unswitching yet.
> >
> > c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is in the range between the
> > broken compiler and the newer compiler that seems to work, so I would
> > assume that that commit fixes this.
> > We will get the final stamp from syzbot hopefully by tomorrow.
>
> How often do you refresh the build of Clang in syzbot? Is it manual? I
> understand the tradeoffs of living on the tip of the spear, but
> c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is 6 months old.  So upstream
> LLVM could be regressing more often, and you wouldn't notice for 1/2 a
> year or more. :-/
KMSAN used to be the only user of Clang on syzbot, so I didn't bother updating it too often.
Now that there are other users, we'll need a better strategy.
Clang revisions I've been picking previously came from Chromium's
Clang distributions. This is nice, because Chromium folks usually pick
a revision that has been extensively tested at Google already, plus
they make sure Chromium tests also pass.
They don't roll the compiler often, however (typically once a month or
two, but this time there were holidays, plus some nasty breakages).
> --
> Thanks,
> ~Nick Desaulniers
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/CAKwvOdkh8CV0pgqqHXknv8%2BgE2ovoKEV_m%2BqiEmWutmLnra3%3Dg%40mail.gmail.com.



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: INFO: rcu detected stall in sys_kill
  2020-01-10  8:37                                   ` Alexander Potapenko
@ 2020-01-14 10:15                                     ` Dmitry Vyukov
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2020-01-14 10:15 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Nick Desaulniers, Casey Schaufler, Daniel Axtens,
	clang-built-linux, Tetsuo Handa, syzbot, kasan-dev,
	Andrew Morton, LKML, syzkaller-bugs

The clang instances are back to life (incl. smack).

#syz invalid

On Fri, Jan 10, 2020 at 9:37 AM 'Alexander Potapenko' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> On Thu, Jan 9, 2020 at 6:39 PM 'Nick Desaulniers' via kasan-dev
> <kasan-dev@googlegroups.com> wrote:
> >
> > On Thu, Jan 9, 2020 at 9:23 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > >
> > > On Thu, Jan 9, 2020 at 6:17 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> > > > I disabled loop unrolling and loop unswitching in LLVM when the loop
> > > > contained asm goto in:
> > > > https://github.com/llvm/llvm-project/commit/c4f245b40aad7e8627b37a8bf1bdcdbcd541e665
> > > > I have a fix for loop unrolling in:
> > > > https://reviews.llvm.org/D64101
> > > > that I should dust off. I haven't looked into loop unswitching yet.
> > >
> > > c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is in the range between the
> > > broken compiler and the newer compiler that seems to work, so I would
> > > assume that that commit fixes this.
> > > We will get the final stamp from syzbot hopefully by tomorrow.
> >
> > How often do you refresh the build of Clang in syzbot? Is it manual? I
> > understand the tradeoffs of living on the tip of the spear, but
> > c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is 6 months old.  So upstream
> > LLVM could be regressing more often, and you wouldn't notice for 1/2 a
> > year or more. :-/
> KMSAN used to be the only user of Clang on syzbot, so I didn't bother updating it too often.
> Now that there are other users, we'll need a better strategy.
> Clang revisions I've been picking previously came from Chromium's
> Clang distributions. This is nice, because Chromium folks usually pick
> a revision that has been extensively tested at Google already, plus
> they make sure Chromium tests also pass.
> They don't roll the compiler often, however (typically once a month or
> two, but this time there were holidays, plus some nasty breakages).
> > --
> > Thanks,
> > ~Nick Desaulniers
> >
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-01-14 10:15 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-03  8:27 INFO: rcu detected stall in sys_kill syzbot
2019-12-03  8:38 ` Dmitry Vyukov
2019-12-04 13:58   ` Dmitry Vyukov
2019-12-04 16:05     ` Casey Schaufler
2019-12-04 23:34       ` Daniel Axtens
2019-12-17 13:38         ` Daniel Axtens
2020-01-08  6:20           ` Dmitry Vyukov
2020-01-08 10:25             ` Tetsuo Handa
2020-01-08 17:19               ` Casey Schaufler
2020-01-09  8:19                 ` Dmitry Vyukov
2020-01-09  8:50                   ` Dmitry Vyukov
2020-01-09  9:29                     ` Dmitry Vyukov
2020-01-09 10:05                       ` Dmitry Vyukov
2020-01-09 10:39                         ` Dmitry Vyukov
2020-01-09 16:23                           ` Alexander Potapenko
2020-01-09 17:16                             ` Nick Desaulniers
2020-01-09 17:23                               ` Dmitry Vyukov
2020-01-09 17:38                                 ` Nick Desaulniers
2020-01-10  8:37                                   ` Alexander Potapenko
2020-01-14 10:15                                     ` Dmitry Vyukov
2020-01-09 23:25                           ` Daniel Axtens
2020-01-09 15:43                     ` Casey Schaufler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).