linux-kernel.vger.kernel.org archive mirror
* INFO: task hung in perf_trace_event_unreg
@ 2018-04-02  9:20 syzbot
  2018-04-02 13:40 ` Steven Rostedt
  0 siblings, 1 reply; 19+ messages in thread
From: syzbot @ 2018-04-02  9:20 UTC (permalink / raw)
  To: linux-kernel, mingo, rostedt, syzkaller-bugs

Hello,

syzbot hit the following crash on upstream commit
0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
Linux 4.16
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount  
option "g\a�;e�K�׫>pquota"
INFO: task syz-executor3:10803 blocked for more than 120 seconds.
       Not tainted 4.16.0+ #10
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor3   D20944 10803   4492 0x80000002
Call Trace:
  context_switch kernel/sched/core.c:2862 [inline]
  __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
  schedule+0xf5/0x430 kernel/sched/core.c:3499
  schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
  do_wait_for_common kernel/sched/completion.c:86 [inline]
  __wait_for_common kernel/sched/completion.c:107 [inline]
  wait_for_common kernel/sched/completion.c:118 [inline]
  wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
  __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
  synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
  synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
  tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
  perf_trace_event_unreg.isra.2+0xb7/0x1f0  
kernel/trace/trace_event_perf.c:161
  perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
  tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
  _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
  put_event+0x24/0x30 kernel/events/core.c:4204
  perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
  perf_release+0x37/0x50 kernel/events/core.c:4320
  __fput+0x327/0x7e0 fs/file_table.c:209
  ____fput+0x15/0x20 fs/file_table.c:243
  task_work_run+0x199/0x270 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x9bb/0x1ad0 kernel/exit.c:865
  do_group_exit+0x149/0x400 kernel/exit.c:968
  get_signal+0x73a/0x16d0 kernel/signal.c:2469
  do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
  exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
  do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455269
RSP: 002b:00007f8976371ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000000 RBX: 000000000072bec8 RCX: 0000000000455269
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bec8
RBP: 000000000072bec8 R08: 0000000000000000 R09: 000000000072bea0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe793f79cf R14: 00007f89763729c0 R15: 0000000000000000

Showing all locks held in the system:
2 locks held by khungtaskd/876:
  #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>]  
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
  #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>] watchdog+0x1c5/0xd60  
kernel/hung_task.c:249
  #1:  (tasklist_lock){.+.+}, at: [<0000000006b3009f>]  
debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
2 locks held by getty/4414:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4415:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4416:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4417:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4418:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4419:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4420:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
1 lock held by syz-executor3/10803:
  #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]  
perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
4 locks held by syz-executor5/10816:
  #0:  (&tty->legacy_mutex){+.+.}, at: [<00000000567b7b94>]  
tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
  #1:  (&tty->legacy_mutex/1){+.+.}, at: [<00000000567b7b94>]  
tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
  #2:  (&tty->ldisc_sem){++++}, at: [<000000002b6b6a29>]  
tty_ldisc_ref+0x1b/0x80 drivers/tty/tty_ldisc.c:298
  #3:  (&o_tty->termios_rwsem/1){++++}, at: [<0000000007d9a7a4>]  
n_tty_flush_buffer+0x21/0x320 drivers/tty/n_tty.c:357
1 lock held by syz-executor2/10827:
  #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]  
perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
1 lock held by blkid/10832:
  #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]  
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
1 lock held by syz-executor4/10835:
  #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]  
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
1 lock held by syz-executor4/10845:
  #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]  
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 876 Comm: khungtaskd Not tainted 4.16.0+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x24d lib/dump_stack.c:53
  nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
  check_hung_task kernel/hung_task.c:132 [inline]
  check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
  watchdog+0x90c/0xd60 kernel/hung_task.c:249
INFO: rcu_sched self-detected stall on CPU
	0-....: (124996 ticks this GP) idle=75e/1/4611686018427387906  
softirq=33205/33205 fqs=30980
	
  (t=125000 jiffies g=17618 c=17617 q=921)
  kthread+0x33c/0x400 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 7457 Comm: kworker/u4:5 Not tainted 4.16.0+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: events_unbound flush_to_ldisc
RIP: 0010:__process_echoes+0x641/0x770 drivers/tty/n_tty.c:733
RSP: 0018:ffff8801af4ff078 EFLAGS: 00000217
RAX: 0000000000000000 RBX: ffffc90003673000 RCX: ffffffff8352d4c2
RDX: 0000000000000006 RSI: 1ffff10039602994 RDI: ffffc9000367515e
RBP: ffff8801af4ff0e0 R08: 1ffff10035e9fdb5 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000625628efd
R13: dffffc0000000000 R14: 0000000000000efe R15: 0000000000001b15
FS:  0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd5bfa4ca8 CR3: 000000000846a005 CR4: 00000000001606f0
DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Call Trace:
  commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
  n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
  n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
  __receive_buf drivers/tty/n_tty.c:1611 [inline]
  n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
  n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
  tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
  tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
  receive_buf drivers/tty/tty_buffer.c:475 [inline]
  flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
  process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
  kthread+0x33c/0x400 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Code: 60 12 00 00 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 42 0f b6 04 28 38  
d0 7f 08 84 c0 0f 85 21 01 00 00 42 80 bc 33 60 12 00 00 82 <74> 0f e8 48  
90 1e fe 4d 8d 74 24 02 e9 58 ff ff ff e8 39 90 1e


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-02  9:20 INFO: task hung in perf_trace_event_unreg syzbot
@ 2018-04-02 13:40 ` Steven Rostedt
  2018-04-02 15:33   ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2018-04-02 13:40 UTC (permalink / raw)
  To: syzbot
  Cc: linux-kernel, mingo, syzkaller-bugs, Peter Zijlstra, Paul E. McKenney

On Mon, 02 Apr 2018 02:20:02 -0700
syzbot <syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com> wrote:

> Hello,
> 
> syzbot hit the following crash on upstream commit
> 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
> Linux 4.16
> syzbot dashboard link:  
> https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:  
> https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> Kernel config:  
> https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> compiler: gcc (GCC) 7.1.1 20170620
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for  
> details.
> If you forward the report, please keep this part and the footer.
> 
> REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount  
> option "g\a�;e�K�׫>pquota"
> INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>        Not tainted 4.16.0+ #10
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor3   D20944 10803   4492 0x80000002
> Call Trace:
>   context_switch kernel/sched/core.c:2862 [inline]
>   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>   schedule+0xf5/0x430 kernel/sched/core.c:3499
>   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>   do_wait_for_common kernel/sched/completion.c:86 [inline]
>   __wait_for_common kernel/sched/completion.c:107 [inline]
>   wait_for_common kernel/sched/completion.c:118 [inline]
>   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213

I don't think this is a perf issue. Looks like something is preventing
rcu_sched from completing. If there's a CPU that is running in kernel
space and never scheduling, that can cause this issue. Or if RCU
somehow missed a transition into idle or user space.
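
A minimal, hypothetical sketch of that failure mode (not taken from the
report; it assumes a !CONFIG_PREEMPT kernel, spinner_fn/unregister_side
are made-up names, and the spinner just stands in for whatever keeps a
CPU busy in kernel space):

#include <linux/kthread.h>
#include <linux/rcupdate.h>

/* CPU-bound kthread that never schedules: on a !CONFIG_PREEMPT kernel
 * this CPU never reaches a quiescent state, so rcu_sched grace periods
 * cannot complete. */
static int spinner_fn(void *unused)
{
        while (!kthread_should_stop())
                cpu_relax();
        return 0;
}

/* Mirrors the blocked task above: perf_trace_event_unreg() ->
 * tracepoint_synchronize_unregister() -> synchronize_sched(), which
 * cannot return until every CPU goes quiescent, so the caller sits in
 * D state until the hung-task watchdog fires at 120 seconds. */
static void unregister_side(void)
{
        synchronize_sched();
}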

-- Steve

>   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
>   perf_trace_event_unreg.isra.2+0xb7/0x1f0  
> kernel/trace/trace_event_perf.c:161
>   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
>   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
>   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
>   put_event+0x24/0x30 kernel/events/core.c:4204
>   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
>   perf_release+0x37/0x50 kernel/events/core.c:4320
>   __fput+0x327/0x7e0 fs/file_table.c:209
>   ____fput+0x15/0x20 fs/file_table.c:243
>   task_work_run+0x199/0x270 kernel/task_work.c:113
>   exit_task_work include/linux/task_work.h:22 [inline]
>   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>   do_group_exit+0x149/0x400 kernel/exit.c:968
>   get_signal+0x73a/0x16d0 kernel/signal.c:2469
>   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
>   entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x455269
> RSP: 002b:00007f8976371ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: 0000000000000000 RBX: 000000000072bec8 RCX: 0000000000455269
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bec8
> RBP: 000000000072bec8 R08: 0000000000000000 R09: 000000000072bea0
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffe793f79cf R14: 00007f89763729c0 R15: 0000000000000000
> 
> Showing all locks held in the system:
> 2 locks held by khungtaskd/876:
>   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>]  
> check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
>   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>] watchdog+0x1c5/0xd60  
> kernel/hung_task.c:249
>   #1:  (tasklist_lock){.+.+}, at: [<0000000006b3009f>]  
> debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> 2 locks held by getty/4414:
>   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4415:
>   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4416:
>   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4417:
>   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4418:
>   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4419:
>   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4420:
>   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 1 lock held by syz-executor3/10803:
>   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]  
> perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
> 4 locks held by syz-executor5/10816:
>   #0:  (&tty->legacy_mutex){+.+.}, at: [<00000000567b7b94>]  
> tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
>   #1:  (&tty->legacy_mutex/1){+.+.}, at: [<00000000567b7b94>]  
> tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
>   #2:  (&tty->ldisc_sem){++++}, at: [<000000002b6b6a29>]  
> tty_ldisc_ref+0x1b/0x80 drivers/tty/tty_ldisc.c:298
>   #3:  (&o_tty->termios_rwsem/1){++++}, at: [<0000000007d9a7a4>]  
> n_tty_flush_buffer+0x21/0x320 drivers/tty/n_tty.c:357
> 1 lock held by syz-executor2/10827:
>   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]  
> perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
> 1 lock held by blkid/10832:
>   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]  
> lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> 1 lock held by syz-executor4/10835:
>   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]  
> lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> 1 lock held by syz-executor4/10845:
>   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]  
> lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> 
> =============================================
> 
> NMI backtrace for cpu 1
> CPU: 1 PID: 876 Comm: khungtaskd Not tainted 4.16.0+ #10
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
> Google 01/01/2011
> Call Trace:
>   __dump_stack lib/dump_stack.c:17 [inline]
>   dump_stack+0x194/0x24d lib/dump_stack.c:53
>   nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
>   nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
>   arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>   trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
>   check_hung_task kernel/hung_task.c:132 [inline]
>   check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
>   watchdog+0x90c/0xd60 kernel/hung_task.c:249
> INFO: rcu_sched self-detected stall on CPU
> 	0-....: (124996 ticks this GP) idle=75e/1/4611686018427387906  
> softirq=33205/33205 fqs=30980
> 	
>   (t=125000 jiffies g=17618 c=17617 q=921)
>   kthread+0x33c/0x400 kernel/kthread.c:238
>   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 7457 Comm: kworker/u4:5 Not tainted 4.16.0+ #10
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
> Google 01/01/2011
> Workqueue: events_unbound flush_to_ldisc
> RIP: 0010:__process_echoes+0x641/0x770 drivers/tty/n_tty.c:733
> RSP: 0018:ffff8801af4ff078 EFLAGS: 00000217
> RAX: 0000000000000000 RBX: ffffc90003673000 RCX: ffffffff8352d4c2
> RDX: 0000000000000006 RSI: 1ffff10039602994 RDI: ffffc9000367515e
> RBP: ffff8801af4ff0e0 R08: 1ffff10035e9fdb5 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000625628efd
> R13: dffffc0000000000 R14: 0000000000000efe R15: 0000000000001b15
> FS:  0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffd5bfa4ca8 CR3: 000000000846a005 CR4: 00000000001606f0
> DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Call Trace:
>   commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
>   n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
>   n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
>   __receive_buf drivers/tty/n_tty.c:1611 [inline]
>   n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
>   n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
>   tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
>   tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
>   receive_buf drivers/tty/tty_buffer.c:475 [inline]
>   flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
>   process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
>   worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>   kthread+0x33c/0x400 kernel/kthread.c:238
>   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> Code: 60 12 00 00 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 42 0f b6 04 28 38  
> d0 7f 08 84 c0 0f 85 21 01 00 00 42 80 bc 33 60 12 00 00 82 <74> 0f e8 48  
> 90 1e fe 4d 8d 74 24 02 e9 58 ff ff ff e8 39 90 1e
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is  
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug  
> report.
> Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-02 13:40 ` Steven Rostedt
@ 2018-04-02 15:33   ` Paul E. McKenney
  2018-04-02 16:04     ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2018-04-02 15:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: syzbot, linux-kernel, mingo, syzkaller-bugs, Peter Zijlstra

On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> On Mon, 02 Apr 2018 02:20:02 -0700
> syzbot <syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com> wrote:
> 
> > Hello,
> > 
> > syzbot hit the following crash on upstream commit
> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
> > Linux 4.16
> > syzbot dashboard link:  
> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> > 
> > Unfortunately, I don't have any reproducer for this crash yet.
> > Raw console output:  
> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> > Kernel config:  
> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> > compiler: gcc (GCC) 7.1.1 20170620
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
> > It will help syzbot understand when the bug is fixed. See footer for  
> > details.
> > If you forward the report, please keep this part and the footer.
> > 
> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount  
> > option "g\a�;e�K�׫>pquota"

Might not hurt to look into the above, though perhaps this is just syzkaller
playing around with mount options.

> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >        Not tainted 4.16.0+ #10
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > syz-executor3   D20944 10803   4492 0x80000002
> > Call Trace:
> >   context_switch kernel/sched/core.c:2862 [inline]
> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> 
> I don't think this is a perf issue. Looks like something is preventing
> rcu_sched from completing. If there's a CPU that is running in kernel
> space and never scheduling, that can cause this issue. Or if RCU
> somehow missed a transition into idle or user space.

The RCU CPU stall warning below strongly supports this position ...

> -- Steve
> 
> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0  
> > kernel/trace/trace_event_perf.c:161
> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
> >   put_event+0x24/0x30 kernel/events/core.c:4204
> >   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
> >   perf_release+0x37/0x50 kernel/events/core.c:4320
> >   __fput+0x327/0x7e0 fs/file_table.c:209
> >   ____fput+0x15/0x20 fs/file_table.c:243
> >   task_work_run+0x199/0x270 kernel/task_work.c:113
> >   exit_task_work include/linux/task_work.h:22 [inline]
> >   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
> >   do_group_exit+0x149/0x400 kernel/exit.c:968
> >   get_signal+0x73a/0x16d0 kernel/signal.c:2469
> >   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
> >   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
> >   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
> >   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
> >   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
> >   entry_SYSCALL_64_after_hwframe+0x42/0xb7
> > RIP: 0033:0x455269
> > RSP: 002b:00007f8976371ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> > RAX: 0000000000000000 RBX: 000000000072bec8 RCX: 0000000000455269
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bec8
> > RBP: 000000000072bec8 R08: 0000000000000000 R09: 000000000072bea0
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 00007ffe793f79cf R14: 00007f89763729c0 R15: 0000000000000000
> > 
> > Showing all locks held in the system:
> > 2 locks held by khungtaskd/876:
> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>]  
> > check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>] watchdog+0x1c5/0xd60
> > kernel/hung_task.c:249

... And two places to start looking are the two above rcu_read_lock() calls.
Especially given that khungtask shows up below.

> >   #1:  (tasklist_lock){.+.+}, at: [<0000000006b3009f>]  
> > debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> > 2 locks held by getty/4414:
> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> > 2 locks held by getty/4415:
> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> > 2 locks held by getty/4416:
> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> > 2 locks held by getty/4417:
> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> > 2 locks held by getty/4418:
> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> > 2 locks held by getty/4419:
> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> > 2 locks held by getty/4420:
> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]  
> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]  
> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> > 1 lock held by syz-executor3/10803:
> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]  
> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
> > 4 locks held by syz-executor5/10816:
> >   #0:  (&tty->legacy_mutex){+.+.}, at: [<00000000567b7b94>]  
> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
> >   #1:  (&tty->legacy_mutex/1){+.+.}, at: [<00000000567b7b94>]  
> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
> >   #2:  (&tty->ldisc_sem){++++}, at: [<000000002b6b6a29>]  
> > tty_ldisc_ref+0x1b/0x80 drivers/tty/tty_ldisc.c:298
> >   #3:  (&o_tty->termios_rwsem/1){++++}, at: [<0000000007d9a7a4>]  
> > n_tty_flush_buffer+0x21/0x320 drivers/tty/n_tty.c:357
> > 1 lock held by syz-executor2/10827:
> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]  
> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
> > 1 lock held by blkid/10832:
> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]  
> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> > 1 lock held by syz-executor4/10835:
> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]  
> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> > 1 lock held by syz-executor4/10845:
> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]  
> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> > 
> > =============================================
> > 
> > NMI backtrace for cpu 1
> > CPU: 1 PID: 876 Comm: khungtaskd Not tainted 4.16.0+ #10
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
> > Google 01/01/2011
> > Call Trace:
> >   __dump_stack lib/dump_stack.c:17 [inline]
> >   dump_stack+0x194/0x24d lib/dump_stack.c:53
> >   nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
> >   nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
> >   arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> >   trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
> >   check_hung_task kernel/hung_task.c:132 [inline]
> >   check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
> >   watchdog+0x90c/0xd60 kernel/hung_task.c:249
> > INFO: rcu_sched self-detected stall on CPU
> > 	0-....: (124996 ticks this GP) idle=75e/1/4611686018427387906  
> > softirq=33205/33205 fqs=30980
> > 	
> >   (t=125000 jiffies g=17618 c=17617 q=921)
> >   kthread+0x33c/0x400 kernel/kthread.c:238
> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> > Sending NMI from CPU 1 to CPUs 0:
> > NMI backtrace for cpu 0
> > CPU: 0 PID: 7457 Comm: kworker/u4:5 Not tainted 4.16.0+ #10
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
> > Google 01/01/2011
> > Workqueue: events_unbound flush_to_ldisc
> > RIP: 0010:__process_echoes+0x641/0x770 drivers/tty/n_tty.c:733
> > RSP: 0018:ffff8801af4ff078 EFLAGS: 00000217
> > RAX: 0000000000000000 RBX: ffffc90003673000 RCX: ffffffff8352d4c2
> > RDX: 0000000000000006 RSI: 1ffff10039602994 RDI: ffffc9000367515e
> > RBP: ffff8801af4ff0e0 R08: 1ffff10035e9fdb5 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000625628efd
> > R13: dffffc0000000000 R14: 0000000000000efe R15: 0000000000001b15
> > FS:  0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007ffd5bfa4ca8 CR3: 000000000846a005 CR4: 00000000001606f0
> > DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> > Call Trace:
> >   commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
> >   n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
> >   n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
> >   __receive_buf drivers/tty/n_tty.c:1611 [inline]
> >   n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
> >   n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
> >   tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
> >   tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
> >   receive_buf drivers/tty/tty_buffer.c:475 [inline]
> >   flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
> >   process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
> >   worker_thread+0x223/0x1990 kernel/workqueue.c:2247
> >   kthread+0x33c/0x400 kernel/kthread.c:238
> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406

And the above is another good place to look.

							Thanx, Paul

> > Code: 60 12 00 00 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 42 0f b6 04 28 38  
> > d0 7f 08 84 c0 0f 85 21 01 00 00 42 80 bc 33 60 12 00 00 82 <74> 0f e8 48  
> > 90 1e fe 4d 8d 74 24 02 e9 58 ff ff ff e8 39 90 1e
> > 
> > 
> > ---
> > This bug is generated by a dumb bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for details.
> > Direct all questions to syzkaller@googlegroups.com.
> > 
> > syzbot will keep track of this bug report.
> > If you forgot to add the Reported-by tag, once the fix for this bug is  
> > merged
> > into any tree, please reply to this email with:
> > #syz fix: exact-commit-title
> > To mark this as a duplicate of another syzbot report, please reply with:
> > #syz dup: exact-subject-of-another-report
> > If it's a one-off invalid bug report, please reply with:
> > #syz invalid
> > Note: if the crash happens again, it will cause creation of a new bug  
> > report.
> > Note: all commands must start from beginning of the line in the email body.
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-02 15:33   ` Paul E. McKenney
@ 2018-04-02 16:04     ` Dmitry Vyukov
  2018-04-02 16:21       ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2018-04-02 16:04 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
>> On Mon, 02 Apr 2018 02:20:02 -0700
>> syzbot <syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com> wrote:
>>
>> > Hello,
>> >
>> > syzbot hit the following crash on upstream commit
>> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
>> > Linux 4.16
>> > syzbot dashboard link:
>> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >
>> > Unfortunately, I don't have any reproducer for this crash yet.
>> > Raw console output:
>> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> > Kernel config:
>> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> > compiler: gcc (GCC) 7.1.1 20170620
>> >
>> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
>> > It will help syzbot understand when the bug is fixed. See footer for
>> > details.
>> > If you forward the report, please keep this part and the footer.
>> >
>> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> > option "g �;e�K�׫>pquota"
>
> Might not hurt to look into the above, though perhaps this is just syzkaller
> playing around with mount options.
>
>> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >        Not tainted 4.16.0+ #10
>> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > syz-executor3   D20944 10803   4492 0x80000002
>> > Call Trace:
>> >   context_switch kernel/sched/core.c:2862 [inline]
>> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>>
>> I don't think this is a perf issue. Looks like something is preventing
>> rcu_sched from completing. If there's a CPU that is running in kernel
>> space and never scheduling, that can cause this issue. Or if RCU
>> somehow missed a transition into idle or user space.
>
> The RCU CPU stall warning below strongly supports this position ...


I think this is this guy then:

https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40

#syz dup: INFO: rcu detected stall in __process_echoes


Looking retrospectively at the various hang/stall bugs that we have, I
think we need some kind of priority between them. I.e. we have RCU
stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
hangs and maybe something else. It would be useful if they fired
deterministically according to priorities: if there is an RCU stall,
it is always detected as a CPU stall; if there is no RCU stall but
there is a workqueue stall, it is always detected as a workqueue
stall, etc.
Currently if we have an RCU stall (effectively a CPU stall), it can be
detected either as an RCU stall or as a task hang, producing 2
different bug reports (which is bad).
One can say that it's only a matter of tuning timeouts, but at least
the hung task detector has the problem that if you set the timeout to
X, it can detect a hang anywhere between X and 2*X. And on one hand we
need a quite large timeout (a minute may not be enough), and on the
other hand we can't wait for an hour just to make sure that the
machine is indeed dead (these things happen every few minutes).
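
(For reference, a rough sketch of where the X..2*X window comes from,
modeled loosely on kernel/hung_task.c from memory -- simplified, not
the exact code: the checker wakes every X seconds and only reports a
task it has seen not scheduling across two consecutive scans.)

static void check_hung_task_sketch(struct task_struct *t,
                                   unsigned long timeout)
{
        unsigned long switch_count = t->nvcsw + t->nivcsw;

        if (switch_count != t->last_switch_count) {
                /* The task ran since the previous scan: just record
                 * the new count and wait for the next scan. */
                t->last_switch_count = switch_count;
                return;
        }
        /* Unchanged across a full period: report it now.  A task that
         * blocked right after a scan is first recorded ~X later and
         * reported another ~X after that. */
        pr_err("INFO: task %s:%d blocked for more than %lu seconds.\n",
               t->comm, task_pid_nr(t), timeout);
}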





>> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
>> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0
>> > kernel/trace/trace_event_perf.c:161
>> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
>> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
>> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
>> >   put_event+0x24/0x30 kernel/events/core.c:4204
>> >   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
>> >   perf_release+0x37/0x50 kernel/events/core.c:4320
>> >   __fput+0x327/0x7e0 fs/file_table.c:209
>> >   ____fput+0x15/0x20 fs/file_table.c:243
>> >   task_work_run+0x199/0x270 kernel/task_work.c:113
>> >   exit_task_work include/linux/task_work.h:22 [inline]
>> >   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>> >   do_group_exit+0x149/0x400 kernel/exit.c:968
>> >   get_signal+0x73a/0x16d0 kernel/signal.c:2469
>> >   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>> >   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>> >   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>> >   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>> >   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
>> >   entry_SYSCALL_64_after_hwframe+0x42/0xb7
>> > RIP: 0033:0x455269
>> > RSP: 002b:00007f8976371ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
>> > RAX: 0000000000000000 RBX: 000000000072bec8 RCX: 0000000000455269
>> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bec8
>> > RBP: 000000000072bec8 R08: 0000000000000000 R09: 000000000072bea0
>> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>> > R13: 00007ffe793f79cf R14: 00007f89763729c0 R15: 0000000000000000
>> >
>> > Showing all locks held in the system:
>> > 2 locks held by khungtaskd/876:
>> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>]
>> > check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
>> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>] watchdog+0x1c5/0xd60
>> > kernel/hung_task.c:249
>
> ... And two places to start looking are the two above rcu_read_lock() calls.
> Especially given that khungtask shows up below.
>
>> >   #1:  (tasklist_lock){.+.+}, at: [<0000000006b3009f>]
>> > debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
>> > 2 locks held by getty/4414:
>> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> > 2 locks held by getty/4415:
>> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> > 2 locks held by getty/4416:
>> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> > 2 locks held by getty/4417:
>> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> > 2 locks held by getty/4418:
>> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> > 2 locks held by getty/4419:
>> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> > 2 locks held by getty/4420:
>> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> > 1 lock held by syz-executor3/10803:
>> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
>> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
>> > 4 locks held by syz-executor5/10816:
>> >   #0:  (&tty->legacy_mutex){+.+.}, at: [<00000000567b7b94>]
>> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
>> >   #1:  (&tty->legacy_mutex/1){+.+.}, at: [<00000000567b7b94>]
>> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
>> >   #2:  (&tty->ldisc_sem){++++}, at: [<000000002b6b6a29>]
>> > tty_ldisc_ref+0x1b/0x80 drivers/tty/tty_ldisc.c:298
>> >   #3:  (&o_tty->termios_rwsem/1){++++}, at: [<0000000007d9a7a4>]
>> > n_tty_flush_buffer+0x21/0x320 drivers/tty/n_tty.c:357
>> > 1 lock held by syz-executor2/10827:
>> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
>> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
>> > 1 lock held by blkid/10832:
>> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
>> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
>> > 1 lock held by syz-executor4/10835:
>> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
>> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
>> > 1 lock held by syz-executor4/10845:
>> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
>> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
>> >
>> > =============================================
>> >
>> > NMI backtrace for cpu 1
>> > CPU: 1 PID: 876 Comm: khungtaskd Not tainted 4.16.0+ #10
>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> > Google 01/01/2011
>> > Call Trace:
>> >   __dump_stack lib/dump_stack.c:17 [inline]
>> >   dump_stack+0x194/0x24d lib/dump_stack.c:53
>> >   nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
>> >   nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
>> >   arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>> >   trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
>> >   check_hung_task kernel/hung_task.c:132 [inline]
>> >   check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
>> >   watchdog+0x90c/0xd60 kernel/hung_task.c:249
>> > INFO: rcu_sched self-detected stall on CPU
>> >     0-....: (124996 ticks this GP) idle=75e/1/4611686018427387906
>> > softirq=33205/33205 fqs=30980
>> >
>> >   (t=125000 jiffies g=17618 c=17617 q=921)
>> >   kthread+0x33c/0x400 kernel/kthread.c:238
>> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
>> > Sending NMI from CPU 1 to CPUs 0:
>> > NMI backtrace for cpu 0
>> > CPU: 0 PID: 7457 Comm: kworker/u4:5 Not tainted 4.16.0+ #10
>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> > Google 01/01/2011
>> > Workqueue: events_unbound flush_to_ldisc
>> > RIP: 0010:__process_echoes+0x641/0x770 drivers/tty/n_tty.c:733
>> > RSP: 0018:ffff8801af4ff078 EFLAGS: 00000217
>> > RAX: 0000000000000000 RBX: ffffc90003673000 RCX: ffffffff8352d4c2
>> > RDX: 0000000000000006 RSI: 1ffff10039602994 RDI: ffffc9000367515e
>> > RBP: ffff8801af4ff0e0 R08: 1ffff10035e9fdb5 R09: 0000000000000000
>> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000625628efd
>> > R13: dffffc0000000000 R14: 0000000000000efe R15: 0000000000001b15
>> > FS:  0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
>> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > CR2: 00007ffd5bfa4ca8 CR3: 000000000846a005 CR4: 00000000001606f0
>> > DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
>> > Call Trace:
>> >   commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
>> >   n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
>> >   n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
>> >   __receive_buf drivers/tty/n_tty.c:1611 [inline]
>> >   n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
>> >   n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
>> >   tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
>> >   tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
>> >   receive_buf drivers/tty/tty_buffer.c:475 [inline]
>> >   flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
>> >   process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
>> >   worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>> >   kthread+0x33c/0x400 kernel/kthread.c:238
>> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
>
> And the above is another good place to look.
>
>                                                         Thanx, Paul
>
>> > Code: 60 12 00 00 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 42 0f b6 04 28 38
>> > d0 7f 08 84 c0 0f 85 21 01 00 00 42 80 bc 33 60 12 00 00 82 <74> 0f e8 48
>> > 90 1e fe 4d 8d 74 24 02 e9 58 ff ff ff e8 39 90 1e
>> >
>> >
>> > ---
>> > This bug is generated by a dumb bot. It may contain errors.
>> > See https://goo.gl/tpsmEJ for details.
>> > Direct all questions to syzkaller@googlegroups.com.
>> >
>> > syzbot will keep track of this bug report.
>> > If you forgot to add the Reported-by tag, once the fix for this bug is
>> > merged
>> > into any tree, please reply to this email with:
>> > #syz fix: exact-commit-title
>> > To mark this as a duplicate of another syzbot report, please reply with:
>> > #syz dup: exact-subject-of-another-report
>> > If it's a one-off invalid bug report, please reply with:
>> > #syz invalid
>> > Note: if the crash happens again, it will cause creation of a new bug
>> > report.
>> > Note: all commands must start from beginning of the line in the email body.
>>
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180402153332.GM3948%40linux.vnet.ibm.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-02 16:04     ` Dmitry Vyukov
@ 2018-04-02 16:21       ` Paul E. McKenney
  2018-04-02 16:32         ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2018-04-02 16:21 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> syzbot <syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com> wrote:
> >>
> >> > Hello,
> >> >
> >> > syzbot hit the following crash on upstream commit
> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
> >> > Linux 4.16
> >> > syzbot dashboard link:
> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >
> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> > Raw console output:
> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> > Kernel config:
> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >
> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> > details.
> >> > If you forward the report, please keep this part and the footer.
> >> >
> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> > option "g �;e�K�׫>pquota"
> >
> > Might not hurt to look into the above, though perhaps this is just syzkaller
> > playing around with mount options.
> >
> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >        Not tainted 4.16.0+ #10
> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> > syz-executor3   D20944 10803   4492 0x80000002
> >> > Call Trace:
> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >>
> >> I don't think this is a perf issue. Looks like something is preventing
> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> space and never scheduling, that can cause this issue. Or if RCU
> >> somehow missed a transition into idle or user space.
> >
> > The RCU CPU stall warning below strongly supports this position ...
> 
> I think this is this guy then:
> 
> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> 
> #syz dup: INFO: rcu detected stall in __process_echoes

Seems likely to me!

> Looking retrospectively at the various hang/stall bugs that we have, I
> think we need some kind of priority between them. I.e. we have RCU
> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> hangs and maybe something else. It would be useful if they fired
> deterministically according to priorities: if there is an RCU stall,
> it is always detected as a CPU stall; if there is no RCU stall but
> there is a workqueue stall, it is always detected as a workqueue
> stall, etc.
> Currently if we have an RCU stall (effectively a CPU stall), it can be
> detected either as an RCU stall or as a task hang, producing 2
> different bug reports (which is bad).
> One can say that it's only a matter of tuning timeouts, but at least
> the hung task detector has the problem that if you set the timeout to
> X, it can detect a hang anywhere between X and 2*X. And on one hand we
> need a quite large timeout (a minute may not be enough), and on the
> other hand we can't wait for an hour just to make sure that the
> machine is indeed dead (these things happen every few minutes).

I suppose that we could have a global variable that was set to the
priority of the complaint in question, which would suppress all
lower-priority complaints.  Might need to be opt-in, though -- I would
guess that not everyone is going to be happy with one complaint suppressing
others, especially given the possibility that the two complaints might
be about different things.
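
Something like the following, purely illustrative (the names are made
up, and it punts on when the recorded priority gets reset):

enum stall_prio {
        STALL_PRIO_NONE,
        STALL_PRIO_TASK_HUNG,
        STALL_PRIO_WORKQUEUE,
        STALL_PRIO_RCU,         /* highest: effectively a CPU stall */
};

static atomic_t reported_stall_prio = ATOMIC_INIT(STALL_PRIO_NONE);

/* Each detector calls this before printing its report; a complaint is
 * suppressed if something of higher priority already fired. */
static bool stall_report_allowed(enum stall_prio prio)
{
        int cur = atomic_read(&reported_stall_prio);

        if (prio < cur)
                return false;
        atomic_cmpxchg(&reported_stall_prio, cur, prio);
        return true;
}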

Or did you have something more deft in mind?

							Thanx, Paul

> >> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
> >> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0
> >> > kernel/trace/trace_event_perf.c:161
> >> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
> >> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
> >> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
> >> >   put_event+0x24/0x30 kernel/events/core.c:4204
> >> >   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
> >> >   perf_release+0x37/0x50 kernel/events/core.c:4320
> >> >   __fput+0x327/0x7e0 fs/file_table.c:209
> >> >   ____fput+0x15/0x20 fs/file_table.c:243
> >> >   task_work_run+0x199/0x270 kernel/task_work.c:113
> >> >   exit_task_work include/linux/task_work.h:22 [inline]
> >> >   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
> >> >   do_group_exit+0x149/0x400 kernel/exit.c:968
> >> >   get_signal+0x73a/0x16d0 kernel/signal.c:2469
> >> >   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
> >> >   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
> >> >   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
> >> >   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
> >> >   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
> >> >   entry_SYSCALL_64_after_hwframe+0x42/0xb7
> >> > RIP: 0033:0x455269
> >> > RSP: 002b:00007f8976371ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> >> > RAX: 0000000000000000 RBX: 000000000072bec8 RCX: 0000000000455269
> >> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bec8
> >> > RBP: 000000000072bec8 R08: 0000000000000000 R09: 000000000072bea0
> >> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> >> > R13: 00007ffe793f79cf R14: 00007f89763729c0 R15: 0000000000000000
> >> >
> >> > Showing all locks held in the system:
> >> > 2 locks held by khungtaskd/876:
> >> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>]
> >> > check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
> >> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>] watchdog+0x1c5/0xd60
> >> > kernel/hung_task.c:249
> >
> > ... And two places to start looking are the two above rcu_read_lock() calls.
> > Especially given that khungtask shows up below.
> >
> >> >   #1:  (tasklist_lock){.+.+}, at: [<0000000006b3009f>]
> >> > debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> >> > 2 locks held by getty/4414:
> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> > 2 locks held by getty/4415:
> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> > 2 locks held by getty/4416:
> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> > 2 locks held by getty/4417:
> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> > 2 locks held by getty/4418:
> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> > 2 locks held by getty/4419:
> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> > 2 locks held by getty/4420:
> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> > 1 lock held by syz-executor3/10803:
> >> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
> >> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
> >> > 4 locks held by syz-executor5/10816:
> >> >   #0:  (&tty->legacy_mutex){+.+.}, at: [<00000000567b7b94>]
> >> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
> >> >   #1:  (&tty->legacy_mutex/1){+.+.}, at: [<00000000567b7b94>]
> >> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
> >> >   #2:  (&tty->ldisc_sem){++++}, at: [<000000002b6b6a29>]
> >> > tty_ldisc_ref+0x1b/0x80 drivers/tty/tty_ldisc.c:298
> >> >   #3:  (&o_tty->termios_rwsem/1){++++}, at: [<0000000007d9a7a4>]
> >> > n_tty_flush_buffer+0x21/0x320 drivers/tty/n_tty.c:357
> >> > 1 lock held by syz-executor2/10827:
> >> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
> >> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
> >> > 1 lock held by blkid/10832:
> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> >> > 1 lock held by syz-executor4/10835:
> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> >> > 1 lock held by syz-executor4/10845:
> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> >> >
> >> > =============================================
> >> >
> >> > NMI backtrace for cpu 1
> >> > CPU: 1 PID: 876 Comm: khungtaskd Not tainted 4.16.0+ #10
> >> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> > Google 01/01/2011
> >> > Call Trace:
> >> >   __dump_stack lib/dump_stack.c:17 [inline]
> >> >   dump_stack+0x194/0x24d lib/dump_stack.c:53
> >> >   nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
> >> >   nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
> >> >   arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> >> >   trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
> >> >   check_hung_task kernel/hung_task.c:132 [inline]
> >> >   check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
> >> >   watchdog+0x90c/0xd60 kernel/hung_task.c:249
> >> > INFO: rcu_sched self-detected stall on CPU
> >> >     0-....: (124996 ticks this GP) idle=75e/1/4611686018427387906
> >> > softirq=33205/33205 fqs=30980
> >> >
> >> >   (t=125000 jiffies g=17618 c=17617 q=921)
> >> >   kthread+0x33c/0x400 kernel/kthread.c:238
> >> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> >> > Sending NMI from CPU 1 to CPUs 0:
> >> > NMI backtrace for cpu 0
> >> > CPU: 0 PID: 7457 Comm: kworker/u4:5 Not tainted 4.16.0+ #10
> >> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> > Google 01/01/2011
> >> > Workqueue: events_unbound flush_to_ldisc
> >> > RIP: 0010:__process_echoes+0x641/0x770 drivers/tty/n_tty.c:733
> >> > RSP: 0018:ffff8801af4ff078 EFLAGS: 00000217
> >> > RAX: 0000000000000000 RBX: ffffc90003673000 RCX: ffffffff8352d4c2
> >> > RDX: 0000000000000006 RSI: 1ffff10039602994 RDI: ffffc9000367515e
> >> > RBP: ffff8801af4ff0e0 R08: 1ffff10035e9fdb5 R09: 0000000000000000
> >> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000625628efd
> >> > R13: dffffc0000000000 R14: 0000000000000efe R15: 0000000000001b15
> >> > FS:  0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
> >> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > CR2: 00007ffd5bfa4ca8 CR3: 000000000846a005 CR4: 00000000001606f0
> >> > DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> >> > Call Trace:
> >> >   commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
> >> >   n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
> >> >   n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
> >> >   __receive_buf drivers/tty/n_tty.c:1611 [inline]
> >> >   n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
> >> >   n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
> >> >   tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
> >> >   tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
> >> >   receive_buf drivers/tty/tty_buffer.c:475 [inline]
> >> >   flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
> >> >   process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
> >> >   worker_thread+0x223/0x1990 kernel/workqueue.c:2247
> >> >   kthread+0x33c/0x400 kernel/kthread.c:238
> >> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> >
> > And the above is another good place to look.
> >
> >                                                         Thanx, Paul
> >
> >> > Code: 60 12 00 00 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 42 0f b6 04 28 38
> >> > d0 7f 08 84 c0 0f 85 21 01 00 00 42 80 bc 33 60 12 00 00 82 <74> 0f e8 48
> >> > 90 1e fe 4d 8d 74 24 02 e9 58 ff ff ff e8 39 90 1e
> >> >
> >> >
> >> > ---
> >> > This bug is generated by a dumb bot. It may contain errors.
> >> > See https://goo.gl/tpsmEJ for details.
> >> > Direct all questions to syzkaller@googlegroups.com.
> >> >
> >> > syzbot will keep track of this bug report.
> >> > If you forgot to add the Reported-by tag, once the fix for this bug is
> >> > merged
> >> > into any tree, please reply to this email with:
> >> > #syz fix: exact-commit-title
> >> > To mark this as a duplicate of another syzbot report, please reply with:
> >> > #syz dup: exact-subject-of-another-report
> >> > If it's a one-off invalid bug report, please reply with:
> >> > #syz invalid
> >> > Note: if the crash happens again, it will cause creation of a new bug
> >> > report.
> >> > Note: all commands must start from beginning of the line in the email body.
> >>
> >
> > --
> > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180402153332.GM3948%40linux.vnet.ibm.com.
> > For more options, visit https://groups.google.com/d/optout.
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-02 16:21       ` Paul E. McKenney
@ 2018-04-02 16:32         ` Dmitry Vyukov
  2018-04-02 16:39           ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2018-04-02 16:32 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
>> >> On Mon, 02 Apr 2018 02:20:02 -0700
>> >> syzbot <syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com> wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > syzbot hit the following crash on upstream commit
>> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
>> >> > Linux 4.16
>> >> > syzbot dashboard link:
>> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >
>> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> > Raw console output:
>> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> > Kernel config:
>> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >
>> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
>> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> > details.
>> >> > If you forward the report, please keep this part and the footer.
>> >> >
>> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> >> > option "g �;e�K�׫>pquota"
>> >
>> > Might not hurt to look into the above, though perhaps this is just syzkaller
>> > playing around with mount options.
>> >
>> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >        Not tainted 4.16.0+ #10
>> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> > syz-executor3   D20944 10803   4492 0x80000002
>> >> > Call Trace:
>> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >>
>> >> I don't think this is a perf issue. Looks like something is preventing
>> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> somehow missed a transition into idle or user space.
>> >
>> > The RCU CPU stall warning below strongly supports this position ...
>>
>> I think this is this guy then:
>>
>> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>>
>> #syz dup: INFO: rcu detected stall in __process_echoes
>
> Seems likely to me!
>
>> Looking retrospectively at the various hang/stall bugs that we have, I
>> think we need some kind of priority between them. I.e. we have rcu
>> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> hang and maybe something else. It would be useful if they fire
>> deterministically according to priorities. If there is an rcu stall,
>> that's always detected as CPU stall. Then if there is no RCU stall,
>> but a workqueue stall, then that's always detected as workqueue stall,
>> etc.
>> Currently if we have an RCU stall (effectively CPU stall), that can be
>> detected either RCU stall or a task hung, producing 2 different bug
>> reports (which is bad).
>> One can say that it's only a matter of tuning timeouts, but at least
>> task hung detector has a problem that if you set timeout to X, it can
>> detect hung anywhere between X and 2*X. And on one hand we need quite
>> large timeout (a minute may not be enough), and on the other hand we
>> can't wait for an hour just to make sure that the machine is indeed
>> dead (these things happen every few minutes).
>
> I suppose that we could have a global variable that was set to the
> priority of the complaint in question, which would suppress all
> lower-priority complaints.  Might need to be opt-in, though -- I would
> guess that not everyone is going to be happy with one complaint suppressing
> others, especially given the possibility that the two complaints might
> be about different things.
>
> Or did you have something more deft in mind?


syzkaller generally looks only at the first report. One does not know
if/when there will be a second one, the second one can be induced by
the first one, and we generally want clean reports on a non-tainted
kernel. So we don't just need to suppress the lower-priority ones, we
need to produce the right report first.
I am thinking of maybe setting:
 - rcu stalls at 1.5 minutes
 - workqueue stalls at 2 minutes
 - task hangs at 2.5 minutes
 - and no output whatsoever at 3 minutes
Am I missing anything? I think at least spinlocks. Should they go
before or after rcu?
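
For concreteness, roughly this kind of table is what I have in mind (a
sketch only; the numbers are illustrative and none of the names below
refer to existing kernel code):

/*
 * Detectors in priority order: a later entry only gets to report if
 * all earlier (higher-priority) ones have stayed quiet so far.
 */
struct hang_detector {
	const char	*name;
	unsigned int	timeout_secs;
};

static const struct hang_detector detectors[] = {
	{ "rcu stall",		 90 },	/* 1.5 minutes */
	{ "workqueue stall",	120 },	/* 2 minutes */
	{ "task hang",		150 },	/* 2.5 minutes */
	{ "silent machine hang", 180 },	/* 3 minutes: no output at all */
};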

This will require fixing the task hang detector. I have not yet looked
at the workqueue detector.
Does at least RCU respect the given timeout more or less precisely?


>> >> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
>> >> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0
>> >> > kernel/trace/trace_event_perf.c:161
>> >> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
>> >> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
>> >> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
>> >> >   put_event+0x24/0x30 kernel/events/core.c:4204
>> >> >   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
>> >> >   perf_release+0x37/0x50 kernel/events/core.c:4320
>> >> >   __fput+0x327/0x7e0 fs/file_table.c:209
>> >> >   ____fput+0x15/0x20 fs/file_table.c:243
>> >> >   task_work_run+0x199/0x270 kernel/task_work.c:113
>> >> >   exit_task_work include/linux/task_work.h:22 [inline]
>> >> >   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>> >> >   do_group_exit+0x149/0x400 kernel/exit.c:968
>> >> >   get_signal+0x73a/0x16d0 kernel/signal.c:2469
>> >> >   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>> >> >   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>> >> >   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>> >> >   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>> >> >   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
>> >> >   entry_SYSCALL_64_after_hwframe+0x42/0xb7
>> >> > RIP: 0033:0x455269
>> >> > RSP: 002b:00007f8976371ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
>> >> > RAX: 0000000000000000 RBX: 000000000072bec8 RCX: 0000000000455269
>> >> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bec8
>> >> > RBP: 000000000072bec8 R08: 0000000000000000 R09: 000000000072bea0
>> >> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>> >> > R13: 00007ffe793f79cf R14: 00007f89763729c0 R15: 0000000000000000
>> >> >
>> >> > Showing all locks held in the system:
>> >> > 2 locks held by khungtaskd/876:
>> >> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>]
>> >> > check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
>> >> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>] watchdog+0x1c5/0xd60
>> >> > kernel/hung_task.c:249
>> >
>> > ... And two places to start looking are the two above rcu_read_lock() calls.
>> > Especially given that khungtask shows up below.
>> >
>> >> >   #1:  (tasklist_lock){.+.+}, at: [<0000000006b3009f>]
>> >> > debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
>> >> > 2 locks held by getty/4414:
>> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> > 2 locks held by getty/4415:
>> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> > 2 locks held by getty/4416:
>> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> > 2 locks held by getty/4417:
>> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> > 2 locks held by getty/4418:
>> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> > 2 locks held by getty/4419:
>> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> > 2 locks held by getty/4420:
>> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> > 1 lock held by syz-executor3/10803:
>> >> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
>> >> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
>> >> > 4 locks held by syz-executor5/10816:
>> >> >   #0:  (&tty->legacy_mutex){+.+.}, at: [<00000000567b7b94>]
>> >> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
>> >> >   #1:  (&tty->legacy_mutex/1){+.+.}, at: [<00000000567b7b94>]
>> >> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
>> >> >   #2:  (&tty->ldisc_sem){++++}, at: [<000000002b6b6a29>]
>> >> > tty_ldisc_ref+0x1b/0x80 drivers/tty/tty_ldisc.c:298
>> >> >   #3:  (&o_tty->termios_rwsem/1){++++}, at: [<0000000007d9a7a4>]
>> >> > n_tty_flush_buffer+0x21/0x320 drivers/tty/n_tty.c:357
>> >> > 1 lock held by syz-executor2/10827:
>> >> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
>> >> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
>> >> > 1 lock held by blkid/10832:
>> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
>> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
>> >> > 1 lock held by syz-executor4/10835:
>> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
>> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
>> >> > 1 lock held by syz-executor4/10845:
>> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
>> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
>> >> >
>> >> > =============================================
>> >> >
>> >> > NMI backtrace for cpu 1
>> >> > CPU: 1 PID: 876 Comm: khungtaskd Not tainted 4.16.0+ #10
>> >> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> >> > Google 01/01/2011
>> >> > Call Trace:
>> >> >   __dump_stack lib/dump_stack.c:17 [inline]
>> >> >   dump_stack+0x194/0x24d lib/dump_stack.c:53
>> >> >   nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
>> >> >   nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
>> >> >   arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>> >> >   trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
>> >> >   check_hung_task kernel/hung_task.c:132 [inline]
>> >> >   check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
>> >> >   watchdog+0x90c/0xd60 kernel/hung_task.c:249
>> >> > INFO: rcu_sched self-detected stall on CPU
>> >> >     0-....: (124996 ticks this GP) idle=75e/1/4611686018427387906
>> >> > softirq=33205/33205 fqs=30980
>> >> >
>> >> >   (t=125000 jiffies g=17618 c=17617 q=921)
>> >> >   kthread+0x33c/0x400 kernel/kthread.c:238
>> >> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
>> >> > Sending NMI from CPU 1 to CPUs 0:
>> >> > NMI backtrace for cpu 0
>> >> > CPU: 0 PID: 7457 Comm: kworker/u4:5 Not tainted 4.16.0+ #10
>> >> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> >> > Google 01/01/2011
>> >> > Workqueue: events_unbound flush_to_ldisc
>> >> > RIP: 0010:__process_echoes+0x641/0x770 drivers/tty/n_tty.c:733
>> >> > RSP: 0018:ffff8801af4ff078 EFLAGS: 00000217
>> >> > RAX: 0000000000000000 RBX: ffffc90003673000 RCX: ffffffff8352d4c2
>> >> > RDX: 0000000000000006 RSI: 1ffff10039602994 RDI: ffffc9000367515e
>> >> > RBP: ffff8801af4ff0e0 R08: 1ffff10035e9fdb5 R09: 0000000000000000
>> >> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000625628efd
>> >> > R13: dffffc0000000000 R14: 0000000000000efe R15: 0000000000001b15
>> >> > FS:  0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
>> >> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> > CR2: 00007ffd5bfa4ca8 CR3: 000000000846a005 CR4: 00000000001606f0
>> >> > DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
>> >> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
>> >> > Call Trace:
>> >> >   commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
>> >> >   n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
>> >> >   n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
>> >> >   __receive_buf drivers/tty/n_tty.c:1611 [inline]
>> >> >   n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
>> >> >   n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
>> >> >   tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
>> >> >   tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
>> >> >   receive_buf drivers/tty/tty_buffer.c:475 [inline]
>> >> >   flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
>> >> >   process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
>> >> >   worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>> >> >   kthread+0x33c/0x400 kernel/kthread.c:238
>> >> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
>> >
>> > And the above is another good place to look.
>> >
>> >                                                         Thanx, Paul
>> >
>> >> > Code: 60 12 00 00 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 42 0f b6 04 28 38
>> >> > d0 7f 08 84 c0 0f 85 21 01 00 00 42 80 bc 33 60 12 00 00 82 <74> 0f e8 48
>> >> > 90 1e fe 4d 8d 74 24 02 e9 58 ff ff ff e8 39 90 1e
>> >> >
>> >> >
>> >> > ---
>> >> > This bug is generated by a dumb bot. It may contain errors.
>> >> > See https://goo.gl/tpsmEJ for details.
>> >> > Direct all questions to syzkaller@googlegroups.com.
>> >> >
>> >> > syzbot will keep track of this bug report.
>> >> > If you forgot to add the Reported-by tag, once the fix for this bug is
>> >> > merged
>> >> > into any tree, please reply to this email with:
>> >> > #syz fix: exact-commit-title
>> >> > To mark this as a duplicate of another syzbot report, please reply with:
>> >> > #syz dup: exact-subject-of-another-report
>> >> > If it's a one-off invalid bug report, please reply with:
>> >> > #syz invalid
>> >> > Note: if the crash happens again, it will cause creation of a new bug
>> >> > report.
>> >> > Note: all commands must start from beginning of the line in the email body.
>> >>
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
>> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180402153332.GM3948%40linux.vnet.ibm.com.
>> > For more options, visit https://groups.google.com/d/optout.
>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-02 16:32         ` Dmitry Vyukov
@ 2018-04-02 16:39           ` Paul E. McKenney
  2018-04-02 17:11             ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2018-04-02 16:39 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> >> syzbot <syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com> wrote:
> >> >>
> >> >> > Hello,
> >> >> >
> >> >> > syzbot hit the following crash on upstream commit
> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
> >> >> > Linux 4.16
> >> >> > syzbot dashboard link:
> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >
> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> > Raw console output:
> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> > Kernel config:
> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >
> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> > details.
> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >
> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> >> > option "g �;e�K�׫>pquota"
> >> >
> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
> >> > playing around with mount options.
> >> >
> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >        Not tainted 4.16.0+ #10
> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> > syz-executor3   D20944 10803   4492 0x80000002
> >> >> > Call Trace:
> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >>
> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> somehow missed a transition into idle or user space.
> >> >
> >> > The RCU CPU stall warning below strongly supports this position ...
> >>
> >> I think this is this guy then:
> >>
> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >>
> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >
> > Seems likely to me!
> >
> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> think we need some kind of priority between them. I.e. we have rcu
> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> hang and maybe something else. It would be useful if they fire
> >> deterministically according to priorities. If there is an rcu stall,
> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> etc.
> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> detected either RCU stall or a task hung, producing 2 different bug
> >> reports (which is bad).
> >> One can say that it's only a matter of tuning timeouts, but at least
> >> task hung detector has a problem that if you set timeout to X, it can
> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> large timeout (a minute may not be enough), and on the other hand we
> >> can't wait for an hour just to make sure that the machine is indeed
> >> dead (these things happen every few minutes).
> >
> > I suppose that we could have a global variable that was set to the
> > priority of the complaint in question, which would suppress all
> > lower-priority complaints.  Might need to be opt-in, though -- I would
> > guess that not everyone is going to be happy with one complaint suppressing
> > others, especially given the possibility that the two complaints might
> > be about different things.
> >
> > Or did you have something more deft in mind?
> 
> 
> syzkaller generally looks only at the first report. One does not know
> if/when there will be a second one, the second one can be induced by
> the first one, and we generally want clean reports on a non-tainted
> kernel. So we don't just need to suppress the lower-priority ones, we
> need to produce the right report first.
> I am thinking of maybe setting:
>  - rcu stalls at 1.5 minutes
>  - workqueue stalls at 2 minutes
>  - task hangs at 2.5 minutes
>  - and no output whatsoever at 3 minutes
> Am I missing anything? I think at least spinlocks. Should they go
> before or after rcu?

That is what I know of, but the Linux kernel being what it is, there is
probably something more out there.  If not now, in a few months.  The
RCU CPU stall timeout can be set on the kernel-boot command line, but
you probably already knew that.

Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
was 1.5 -seconds-.  ;-)

> This will require fixing the task hang detector. I have not yet looked
> at the workqueue detector.
> Does at least RCU respect the given timeout more or less precisely?

Assuming that there is at least one CPU capable of taking scheduling-clock
interrupts, it should respect the timeout to within a few jiffies.

							Thanx, Paul

> >> >> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
> >> >> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0
> >> >> > kernel/trace/trace_event_perf.c:161
> >> >> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
> >> >> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
> >> >> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
> >> >> >   put_event+0x24/0x30 kernel/events/core.c:4204
> >> >> >   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
> >> >> >   perf_release+0x37/0x50 kernel/events/core.c:4320
> >> >> >   __fput+0x327/0x7e0 fs/file_table.c:209
> >> >> >   ____fput+0x15/0x20 fs/file_table.c:243
> >> >> >   task_work_run+0x199/0x270 kernel/task_work.c:113
> >> >> >   exit_task_work include/linux/task_work.h:22 [inline]
> >> >> >   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
> >> >> >   do_group_exit+0x149/0x400 kernel/exit.c:968
> >> >> >   get_signal+0x73a/0x16d0 kernel/signal.c:2469
> >> >> >   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
> >> >> >   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
> >> >> >   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
> >> >> >   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
> >> >> >   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
> >> >> >   entry_SYSCALL_64_after_hwframe+0x42/0xb7
> >> >> > RIP: 0033:0x455269
> >> >> > RSP: 002b:00007f8976371ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> >> >> > RAX: 0000000000000000 RBX: 000000000072bec8 RCX: 0000000000455269
> >> >> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bec8
> >> >> > RBP: 000000000072bec8 R08: 0000000000000000 R09: 000000000072bea0
> >> >> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> >> >> > R13: 00007ffe793f79cf R14: 00007f89763729c0 R15: 0000000000000000
> >> >> >
> >> >> > Showing all locks held in the system:
> >> >> > 2 locks held by khungtaskd/876:
> >> >> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>]
> >> >> > check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
> >> >> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>] watchdog+0x1c5/0xd60
> >> >> > kernel/hung_task.c:249
> >> >
> >> > ... And two places to start looking are the two above rcu_read_lock() calls.
> >> > Especially given that khungtask shows up below.
> >> >
> >> >> >   #1:  (tasklist_lock){.+.+}, at: [<0000000006b3009f>]
> >> >> > debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> >> >> > 2 locks held by getty/4414:
> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> > 2 locks held by getty/4415:
> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> > 2 locks held by getty/4416:
> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> > 2 locks held by getty/4417:
> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> > 2 locks held by getty/4418:
> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> > 2 locks held by getty/4419:
> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> > 2 locks held by getty/4420:
> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> > 1 lock held by syz-executor3/10803:
> >> >> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
> >> >> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
> >> >> > 4 locks held by syz-executor5/10816:
> >> >> >   #0:  (&tty->legacy_mutex){+.+.}, at: [<00000000567b7b94>]
> >> >> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
> >> >> >   #1:  (&tty->legacy_mutex/1){+.+.}, at: [<00000000567b7b94>]
> >> >> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
> >> >> >   #2:  (&tty->ldisc_sem){++++}, at: [<000000002b6b6a29>]
> >> >> > tty_ldisc_ref+0x1b/0x80 drivers/tty/tty_ldisc.c:298
> >> >> >   #3:  (&o_tty->termios_rwsem/1){++++}, at: [<0000000007d9a7a4>]
> >> >> > n_tty_flush_buffer+0x21/0x320 drivers/tty/n_tty.c:357
> >> >> > 1 lock held by syz-executor2/10827:
> >> >> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
> >> >> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
> >> >> > 1 lock held by blkid/10832:
> >> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
> >> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> >> >> > 1 lock held by syz-executor4/10835:
> >> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
> >> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> >> >> > 1 lock held by syz-executor4/10845:
> >> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
> >> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> >> >> >
> >> >> > =============================================
> >> >> >
> >> >> > NMI backtrace for cpu 1
> >> >> > CPU: 1 PID: 876 Comm: khungtaskd Not tainted 4.16.0+ #10
> >> >> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> >> > Google 01/01/2011
> >> >> > Call Trace:
> >> >> >   __dump_stack lib/dump_stack.c:17 [inline]
> >> >> >   dump_stack+0x194/0x24d lib/dump_stack.c:53
> >> >> >   nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
> >> >> >   nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
> >> >> >   arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> >> >> >   trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
> >> >> >   check_hung_task kernel/hung_task.c:132 [inline]
> >> >> >   check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
> >> >> >   watchdog+0x90c/0xd60 kernel/hung_task.c:249
> >> >> > INFO: rcu_sched self-detected stall on CPU
> >> >> >     0-....: (124996 ticks this GP) idle=75e/1/4611686018427387906
> >> >> > softirq=33205/33205 fqs=30980
> >> >> >
> >> >> >   (t=125000 jiffies g=17618 c=17617 q=921)
> >> >> >   kthread+0x33c/0x400 kernel/kthread.c:238
> >> >> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> >> >> > Sending NMI from CPU 1 to CPUs 0:
> >> >> > NMI backtrace for cpu 0
> >> >> > CPU: 0 PID: 7457 Comm: kworker/u4:5 Not tainted 4.16.0+ #10
> >> >> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> >> > Google 01/01/2011
> >> >> > Workqueue: events_unbound flush_to_ldisc
> >> >> > RIP: 0010:__process_echoes+0x641/0x770 drivers/tty/n_tty.c:733
> >> >> > RSP: 0018:ffff8801af4ff078 EFLAGS: 00000217
> >> >> > RAX: 0000000000000000 RBX: ffffc90003673000 RCX: ffffffff8352d4c2
> >> >> > RDX: 0000000000000006 RSI: 1ffff10039602994 RDI: ffffc9000367515e
> >> >> > RBP: ffff8801af4ff0e0 R08: 1ffff10035e9fdb5 R09: 0000000000000000
> >> >> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000625628efd
> >> >> > R13: dffffc0000000000 R14: 0000000000000efe R15: 0000000000001b15
> >> >> > FS:  0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
> >> >> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> >> > CR2: 00007ffd5bfa4ca8 CR3: 000000000846a005 CR4: 00000000001606f0
> >> >> > DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> >> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> >> >> > Call Trace:
> >> >> >   commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
> >> >> >   n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
> >> >> >   n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
> >> >> >   __receive_buf drivers/tty/n_tty.c:1611 [inline]
> >> >> >   n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
> >> >> >   n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
> >> >> >   tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
> >> >> >   tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
> >> >> >   receive_buf drivers/tty/tty_buffer.c:475 [inline]
> >> >> >   flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
> >> >> >   process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
> >> >> >   worker_thread+0x223/0x1990 kernel/workqueue.c:2247
> >> >> >   kthread+0x33c/0x400 kernel/kthread.c:238
> >> >> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> >> >
> >> > And the above is another good place to look.
> >> >
> >> >                                                         Thanx, Paul
> >> >
> >> >> > Code: 60 12 00 00 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 42 0f b6 04 28 38
> >> >> > d0 7f 08 84 c0 0f 85 21 01 00 00 42 80 bc 33 60 12 00 00 82 <74> 0f e8 48
> >> >> > 90 1e fe 4d 8d 74 24 02 e9 58 ff ff ff e8 39 90 1e
> >> >> >
> >> >> >
> >> >> > ---
> >> >> > This bug is generated by a dumb bot. It may contain errors.
> >> >> > See https://goo.gl/tpsmEJ for details.
> >> >> > Direct all questions to syzkaller@googlegroups.com.
> >> >> >
> >> >> > syzbot will keep track of this bug report.
> >> >> > If you forgot to add the Reported-by tag, once the fix for this bug is
> >> >> > merged
> >> >> > into any tree, please reply to this email with:
> >> >> > #syz fix: exact-commit-title
> >> >> > To mark this as a duplicate of another syzbot report, please reply with:
> >> >> > #syz dup: exact-subject-of-another-report
> >> >> > If it's a one-off invalid bug report, please reply with:
> >> >> > #syz invalid
> >> >> > Note: if the crash happens again, it will cause creation of a new bug
> >> >> > report.
> >> >> > Note: all commands must start from beginning of the line in the email body.
> >> >>
> >> >
> >> > --
> >> > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> >> > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> >> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180402153332.GM3948%40linux.vnet.ibm.com.
> >> > For more options, visit https://groups.google.com/d/optout.
> >>
> >
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-02 16:39           ` Paul E. McKenney
@ 2018-04-02 17:11             ` Dmitry Vyukov
  2018-04-02 17:23               ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2018-04-02 17:11 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Mon, Apr 2, 2018 at 6:39 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
>> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
>> >> <paulmck@linux.vnet.ibm.com> wrote:
>> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
>> >> >> On Mon, 02 Apr 2018 02:20:02 -0700
>> >> >> syzbot <syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com> wrote:
>> >> >>
>> >> >> > Hello,
>> >> >> >
>> >> >> > syzbot hit the following crash on upstream commit
>> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
>> >> >> > Linux 4.16
>> >> >> > syzbot dashboard link:
>> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >
>> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> > Raw console output:
>> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> > Kernel config:
>> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >
>> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
>> >> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> >> > details.
>> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >
>> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> >> >> > option "g �;e�K�׫>pquota"
>> >> >
>> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
>> >> > playing around with mount options.
>> >> >
>> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >        Not tainted 4.16.0+ #10
>> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> >> > syz-executor3   D20944 10803   4492 0x80000002
>> >> >> > Call Trace:
>> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >>
>> >> >> I don't think this is a perf issue. Looks like something is preventing
>> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> somehow missed a transition into idle or user space.
>> >> >
>> >> > The RCU CPU stall warning below strongly supports this position ...
>> >>
>> >> I think this is this guy then:
>> >>
>> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >>
>> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >
>> > Seems likely to me!
>> >
>> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> think we need some kind of priority between them. I.e. we have rcu
>> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> hang and maybe something else. It would be useful if they fire
>> >> deterministically according to priorities. If there is an rcu stall,
>> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> etc.
>> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> reports (which is bad).
>> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> task hung detector has a problem that if you set timeout to X, it can
>> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> large timeout (a minute may not be enough), and on the other hand we
>> >> can't wait for an hour just to make sure that the machine is indeed
>> >> dead (these things happen every few minutes).
>> >
>> > I suppose that we could have a global variable that was set to the
>> > priority of the complaint in question, which would suppress all
>> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> > guess that not everyone is going to be happy with one complaint suppressing
>> > others, especially given the possibility that the two complaints might
>> > be about different things.
>> >
>> > Or did you have something more deft in mind?
>>
>>
>> syzkaller generally looks only at the first report. One does not know
>> if/when there will be a second one, the second one can be induced by
>> the first one, and we generally want clean reports on a non-tainted
>> kernel. So we don't just need to suppress the lower-priority ones, we
>> need to produce the right report first.
>> I am thinking of maybe setting:
>>  - rcu stalls at 1.5 minutes
>>  - workqueue stalls at 2 minutes
>>  - task hangs at 2.5 minutes
>>  - and no output whatsoever at 3 minutes
>> Am I missing anything? I think at least spinlocks. Should they go
>> before or after rcu?
>
> That is what I know of, but the Linux kernel being what it is, there is
> probably something more out there.  If not now, in a few months.  The
> RCU CPU stall timeout can be set on the kernel-boot command line, but
> you probably already knew that.


Well, it's all based solely on a large number of patches and stopgaps.
If we fix the main problems for today, that's already good.


> Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
> was 1.5 -seconds-.  ;-)

Have you tried to instrument every basic block with a function call to
collect coverage, check every damn memory access for validity, enable
all thinkable and unthinkable debug configs and put the insanest load
one can imagine from a swarm of parallel threads? It makes things a
bit slower ;)


>> This will require fixing the task hang detector. I have not yet looked
>> at the workqueue detector.
>> Does at least RCU respect the given timeout more or less precisely?
>
> Assuming that there is at least one CPU capable of taking scheduling-clock
> interrupts, it should respect the timeout to within a few jiffies.

This is good!


>                                                         Thanx, Paul
>
>> >> >> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
>> >> >> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0
>> >> >> > kernel/trace/trace_event_perf.c:161
>> >> >> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
>> >> >> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
>> >> >> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
>> >> >> >   put_event+0x24/0x30 kernel/events/core.c:4204
>> >> >> >   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
>> >> >> >   perf_release+0x37/0x50 kernel/events/core.c:4320
>> >> >> >   __fput+0x327/0x7e0 fs/file_table.c:209
>> >> >> >   ____fput+0x15/0x20 fs/file_table.c:243
>> >> >> >   task_work_run+0x199/0x270 kernel/task_work.c:113
>> >> >> >   exit_task_work include/linux/task_work.h:22 [inline]
>> >> >> >   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>> >> >> >   do_group_exit+0x149/0x400 kernel/exit.c:968
>> >> >> >   get_signal+0x73a/0x16d0 kernel/signal.c:2469
>> >> >> >   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>> >> >> >   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>> >> >> >   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>> >> >> >   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>> >> >> >   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
>> >> >> >   entry_SYSCALL_64_after_hwframe+0x42/0xb7
>> >> >> > RIP: 0033:0x455269
>> >> >> > RSP: 002b:00007f8976371ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
>> >> >> > RAX: 0000000000000000 RBX: 000000000072bec8 RCX: 0000000000455269
>> >> >> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bec8
>> >> >> > RBP: 000000000072bec8 R08: 0000000000000000 R09: 000000000072bea0
>> >> >> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>> >> >> > R13: 00007ffe793f79cf R14: 00007f89763729c0 R15: 0000000000000000
>> >> >> >
>> >> >> > Showing all locks held in the system:
>> >> >> > 2 locks held by khungtaskd/876:
>> >> >> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>]
>> >> >> > check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
>> >> >> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>] watchdog+0x1c5/0xd60
>> >> >> > kernel/hung_task.c:249
>> >> >
>> >> > ... And two places to start looking are the two above rcu_read_lock() calls.
>> >> > Especially given that khungtask shows up below.
>> >> >
>> >> >> >   #1:  (tasklist_lock){.+.+}, at: [<0000000006b3009f>]
>> >> >> > debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
>> >> >> > 2 locks held by getty/4414:
>> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> >> > 2 locks held by getty/4415:
>> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> >> > 2 locks held by getty/4416:
>> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> >> > 2 locks held by getty/4417:
>> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> >> > 2 locks held by getty/4418:
>> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> >> > 2 locks held by getty/4419:
>> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> >> > 2 locks held by getty/4420:
>> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
>> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
>> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
>> >> >> > 1 lock held by syz-executor3/10803:
>> >> >> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
>> >> >> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
>> >> >> > 4 locks held by syz-executor5/10816:
>> >> >> >   #0:  (&tty->legacy_mutex){+.+.}, at: [<00000000567b7b94>]
>> >> >> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
>> >> >> >   #1:  (&tty->legacy_mutex/1){+.+.}, at: [<00000000567b7b94>]
>> >> >> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
>> >> >> >   #2:  (&tty->ldisc_sem){++++}, at: [<000000002b6b6a29>]
>> >> >> > tty_ldisc_ref+0x1b/0x80 drivers/tty/tty_ldisc.c:298
>> >> >> >   #3:  (&o_tty->termios_rwsem/1){++++}, at: [<0000000007d9a7a4>]
>> >> >> > n_tty_flush_buffer+0x21/0x320 drivers/tty/n_tty.c:357
>> >> >> > 1 lock held by syz-executor2/10827:
>> >> >> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
>> >> >> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
>> >> >> > 1 lock held by blkid/10832:
>> >> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
>> >> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
>> >> >> > 1 lock held by syz-executor4/10835:
>> >> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
>> >> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
>> >> >> > 1 lock held by syz-executor4/10845:
>> >> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
>> >> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
>> >> >> >
>> >> >> > =============================================
>> >> >> >
>> >> >> > NMI backtrace for cpu 1
>> >> >> > CPU: 1 PID: 876 Comm: khungtaskd Not tainted 4.16.0+ #10
>> >> >> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> >> >> > Google 01/01/2011
>> >> >> > Call Trace:
>> >> >> >   __dump_stack lib/dump_stack.c:17 [inline]
>> >> >> >   dump_stack+0x194/0x24d lib/dump_stack.c:53
>> >> >> >   nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
>> >> >> >   nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
>> >> >> >   arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>> >> >> >   trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
>> >> >> >   check_hung_task kernel/hung_task.c:132 [inline]
>> >> >> >   check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
>> >> >> >   watchdog+0x90c/0xd60 kernel/hung_task.c:249
>> >> >> > INFO: rcu_sched self-detected stall on CPU
>> >> >> >     0-....: (124996 ticks this GP) idle=75e/1/4611686018427387906
>> >> >> > softirq=33205/33205 fqs=30980
>> >> >> >
>> >> >> >   (t=125000 jiffies g=17618 c=17617 q=921)
>> >> >> >   kthread+0x33c/0x400 kernel/kthread.c:238
>> >> >> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
>> >> >> > Sending NMI from CPU 1 to CPUs 0:
>> >> >> > NMI backtrace for cpu 0
>> >> >> > CPU: 0 PID: 7457 Comm: kworker/u4:5 Not tainted 4.16.0+ #10
>> >> >> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> >> >> > Google 01/01/2011
>> >> >> > Workqueue: events_unbound flush_to_ldisc
>> >> >> > RIP: 0010:__process_echoes+0x641/0x770 drivers/tty/n_tty.c:733
>> >> >> > RSP: 0018:ffff8801af4ff078 EFLAGS: 00000217
>> >> >> > RAX: 0000000000000000 RBX: ffffc90003673000 RCX: ffffffff8352d4c2
>> >> >> > RDX: 0000000000000006 RSI: 1ffff10039602994 RDI: ffffc9000367515e
>> >> >> > RBP: ffff8801af4ff0e0 R08: 1ffff10035e9fdb5 R09: 0000000000000000
>> >> >> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000625628efd
>> >> >> > R13: dffffc0000000000 R14: 0000000000000efe R15: 0000000000001b15
>> >> >> > FS:  0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
>> >> >> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> >> > CR2: 00007ffd5bfa4ca8 CR3: 000000000846a005 CR4: 00000000001606f0
>> >> >> > DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
>> >> >> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
>> >> >> > Call Trace:
>> >> >> >   commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
>> >> >> >   n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
>> >> >> >   n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
>> >> >> >   __receive_buf drivers/tty/n_tty.c:1611 [inline]
>> >> >> >   n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
>> >> >> >   n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
>> >> >> >   tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
>> >> >> >   tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
>> >> >> >   receive_buf drivers/tty/tty_buffer.c:475 [inline]
>> >> >> >   flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
>> >> >> >   process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
>> >> >> >   worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>> >> >> >   kthread+0x33c/0x400 kernel/kthread.c:238
>> >> >> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
>> >> >
>> >> > And the above is another good place to look.
>> >> >
>> >> >                                                         Thanx, Paul
>> >> >
>> >> >> > Code: 60 12 00 00 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 42 0f b6 04 28 38
>> >> >> > d0 7f 08 84 c0 0f 85 21 01 00 00 42 80 bc 33 60 12 00 00 82 <74> 0f e8 48
>> >> >> > 90 1e fe 4d 8d 74 24 02 e9 58 ff ff ff e8 39 90 1e
>> >> >> >
>> >> >> >
>> >> >> > ---
>> >> >> > This bug is generated by a dumb bot. It may contain errors.
>> >> >> > See https://goo.gl/tpsmEJ for details.
>> >> >> > Direct all questions to syzkaller@googlegroups.com.
>> >> >> >
>> >> >> > syzbot will keep track of this bug report.
>> >> >> > If you forgot to add the Reported-by tag, once the fix for this bug is
>> >> >> > merged
>> >> >> > into any tree, please reply to this email with:
>> >> >> > #syz fix: exact-commit-title
>> >> >> > To mark this as a duplicate of another syzbot report, please reply with:
>> >> >> > #syz dup: exact-subject-of-another-report
>> >> >> > If it's a one-off invalid bug report, please reply with:
>> >> >> > #syz invalid
>> >> >> > Note: if the crash happens again, it will cause creation of a new bug
>> >> >> > report.
>> >> >> > Note: all commands must start from beginning of the line in the email body.
>> >> >>
>> >> >
>> >> > --
>> >> > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
>> >> > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
>> >> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180402153332.GM3948%40linux.vnet.ibm.com.
>> >> > For more options, visit https://groups.google.com/d/optout.
>> >>
>> >
>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-02 17:11             ` Dmitry Vyukov
@ 2018-04-02 17:23               ` Paul E. McKenney
  2018-04-09 12:54                 ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2018-04-02 17:23 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Mon, Apr 02, 2018 at 07:11:50PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 6:39 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> >> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
> >> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> >> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> >> >> syzbot <syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com> wrote:
> >> >> >>
> >> >> >> > Hello,
> >> >> >> >
> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
> >> >> >> > Linux 4.16
> >> >> >> > syzbot dashboard link:
> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >
> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> >> > Raw console output:
> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> > Kernel config:
> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >
> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> >> > details.
> >> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >> >
> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >
> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
> >> >> > playing around with mount options.
> >> >> >
> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >> >        Not tainted 4.16.0+ #10
> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> >> > syz-executor3   D20944 10803   4492 0x80000002
> >> >> >> > Call Trace:
> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >>
> >> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> >> somehow missed a transition into idle or user space.
> >> >> >
> >> >> > The RCU CPU stall warning below strongly supports this position ...
> >> >>
> >> >> I think this is this guy then:
> >> >>
> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >>
> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >
> >> > Seems likely to me!
> >> >
> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> >> hang and maybe something else. It would be useful if they fire
> >> >> deterministically according to priorities. If there is an rcu stall,
> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> >> etc.
> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> >> detected either RCU stall or a task hung, producing 2 different bug
> >> >> reports (which is bad).
> >> >> One can say that it's only a matter of tuning timeouts, but at least
> >> >> task hung detector has a problem that if you set timeout to X, it can
> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> >> large timeout (a minute may not be enough), and on the other hand we
> >> >> can't wait for an hour just to make sure that the machine is indeed
> >> >> dead (these things happen every few minutes).
> >> >
> >> > I suppose that we could have a global variable that was set to the
> >> > priority of the complaint in question, which would suppress all
> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
> >> > guess that not everyone is going to be happy with one complaint suppressing
> >> > others, especially given the possibility that the two complaints might
> >> > be about different things.
> >> >
> >> > Or did you have something more deft in mind?
> >>
> >>
> >> syzkaller generally looks only at the first report. One does not know
> >> if/when there will be a second one, or the second one can be induced
> >> by the first one, and we generally want clean reports on a non-tainted
> >> kernel. So we don't just need to suppress lower priority ones, we need
> >> to produce the right report first.
> >> I am thinking maybe setting:
> >>  - rcu stalls at 1.5 minutes
> >>  - workqueue stalls at 2 minutes
> >>  - task hungs at 2.5 minutes
> >>  - and no output whatsoever at 3 minutes
> >> Do I miss anything? I think at least spinlocks. Should they go before
> >> or after rcu?
> >
> > That is what I know of, but the Linux kernel being what it is, there is
> > probably something more out there.  If not now, in a few months.  The
> > RCU CPU stall timeout can be set on the kernel-boot command line, but
> > you probably already knew that.
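
For concreteness, the existing knobs map onto the staggering idea roughly
as follows; the values are purely illustrative and untested (assumes a
4.16-era kernel with CONFIG_WQ_WATCHDOG and CONFIG_DETECT_HUNG_TASK
enabled):

    rcupdate.rcu_cpu_stall_timeout=90    # RCU CPU stall warnings at ~1.5 minutes (boot parameter, in seconds)
    workqueue.watchdog_thresh=120        # workqueue stall reports at ~2 minutes (boot parameter, in seconds)
    echo 150 > /proc/sys/kernel/hung_task_timeout_secs
                                         # task-hung reports at ~2.5 minutes; as noted above,
                                         # this one can fire anywhere between X and 2*X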
> 
> Well, it's all based solely on a large number of patches and stopgaps.
> If we fix main problems for today, it's already good.

Fair enough!

> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
> > was 1.5 -seconds-.  ;-)
> 
> Have you tried to instrument every basic block with a function call to
> collect coverage, check every damn memory access for validity, enable
> all thinkable and unthinkable debug configs and put the insanest load
> one can imagine from a swarm of parallel threads? It makes things a
> bit slower ;)

Given that we wouldn't have had enough CPU or memory to accommodate
all of that back in DYNIX/ptx days, I am forced to answer "no".  ;-)

> >> This will require fixing task hung. Have not yet looked at workqueue detector.
> >> Does at least RCU respect the given timeout more or less precisely?
> >
> > Assuming that there is at least one CPU capable of taking scheduling-clock
> > interrupts, it should respect the timeout to within a few jiffies.
> 
> This is good!

;-)

                                                         Thanx, Paul

> >> >> >> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
> >> >> >> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0
> >> >> >> > kernel/trace/trace_event_perf.c:161
> >> >> >> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
> >> >> >> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
> >> >> >> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
> >> >> >> >   put_event+0x24/0x30 kernel/events/core.c:4204
> >> >> >> >   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
> >> >> >> >   perf_release+0x37/0x50 kernel/events/core.c:4320
> >> >> >> >   __fput+0x327/0x7e0 fs/file_table.c:209
> >> >> >> >   ____fput+0x15/0x20 fs/file_table.c:243
> >> >> >> >   task_work_run+0x199/0x270 kernel/task_work.c:113
> >> >> >> >   exit_task_work include/linux/task_work.h:22 [inline]
> >> >> >> >   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
> >> >> >> >   do_group_exit+0x149/0x400 kernel/exit.c:968
> >> >> >> >   get_signal+0x73a/0x16d0 kernel/signal.c:2469
> >> >> >> >   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
> >> >> >> >   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
> >> >> >> >   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
> >> >> >> >   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
> >> >> >> >   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
> >> >> >> >   entry_SYSCALL_64_after_hwframe+0x42/0xb7
> >> >> >> > RIP: 0033:0x455269
> >> >> >> > RSP: 002b:00007f8976371ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> >> >> >> > RAX: 0000000000000000 RBX: 000000000072bec8 RCX: 0000000000455269
> >> >> >> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bec8
> >> >> >> > RBP: 000000000072bec8 R08: 0000000000000000 R09: 000000000072bea0
> >> >> >> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> >> >> >> > R13: 00007ffe793f79cf R14: 00007f89763729c0 R15: 0000000000000000
> >> >> >> >
> >> >> >> > Showing all locks held in the system:
> >> >> >> > 2 locks held by khungtaskd/876:
> >> >> >> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>]
> >> >> >> > check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
> >> >> >> >   #0:  (rcu_read_lock){....}, at: [<000000008f2bec4b>] watchdog+0x1c5/0xd60
> >> >> >> > kernel/hung_task.c:249
> >> >> >
> >> >> > ... And two places to start looking are the two above rcu_read_lock() calls.
> >> >> > Especially given that khungtask shows up below.
> >> >> >
> >> >> >> >   #1:  (tasklist_lock){.+.+}, at: [<0000000006b3009f>]
> >> >> >> > debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> >> >> >> > 2 locks held by getty/4414:
> >> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> >> > 2 locks held by getty/4415:
> >> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> >> > 2 locks held by getty/4416:
> >> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> >> > 2 locks held by getty/4417:
> >> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> >> > 2 locks held by getty/4418:
> >> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> >> > 2 locks held by getty/4419:
> >> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> >> > 2 locks held by getty/4420:
> >> >> >> >   #0:  (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
> >> >> >> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >> >> >> >   #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
> >> >> >> > n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> >> >> >> > 1 lock held by syz-executor3/10803:
> >> >> >> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
> >> >> >> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
> >> >> >> > 4 locks held by syz-executor5/10816:
> >> >> >> >   #0:  (&tty->legacy_mutex){+.+.}, at: [<00000000567b7b94>]
> >> >> >> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
> >> >> >> >   #1:  (&tty->legacy_mutex/1){+.+.}, at: [<00000000567b7b94>]
> >> >> >> > tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
> >> >> >> >   #2:  (&tty->ldisc_sem){++++}, at: [<000000002b6b6a29>]
> >> >> >> > tty_ldisc_ref+0x1b/0x80 drivers/tty/tty_ldisc.c:298
> >> >> >> >   #3:  (&o_tty->termios_rwsem/1){++++}, at: [<0000000007d9a7a4>]
> >> >> >> > n_tty_flush_buffer+0x21/0x320 drivers/tty/n_tty.c:357
> >> >> >> > 1 lock held by syz-executor2/10827:
> >> >> >> >   #0:  (event_mutex){+.+.}, at: [<00000000c507b78a>]
> >> >> >> > perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
> >> >> >> > 1 lock held by blkid/10832:
> >> >> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
> >> >> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> >> >> >> > 1 lock held by syz-executor4/10835:
> >> >> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
> >> >> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> >> >> >> > 1 lock held by syz-executor4/10845:
> >> >> >> >   #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
> >> >> >> > lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
> >> >> >> >
> >> >> >> > =============================================
> >> >> >> >
> >> >> >> > NMI backtrace for cpu 1
> >> >> >> > CPU: 1 PID: 876 Comm: khungtaskd Not tainted 4.16.0+ #10
> >> >> >> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> >> >> > Google 01/01/2011
> >> >> >> > Call Trace:
> >> >> >> >   __dump_stack lib/dump_stack.c:17 [inline]
> >> >> >> >   dump_stack+0x194/0x24d lib/dump_stack.c:53
> >> >> >> >   nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
> >> >> >> >   nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
> >> >> >> >   arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> >> >> >> >   trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
> >> >> >> >   check_hung_task kernel/hung_task.c:132 [inline]
> >> >> >> >   check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
> >> >> >> >   watchdog+0x90c/0xd60 kernel/hung_task.c:249
> >> >> >> > INFO: rcu_sched self-detected stall on CPU
> >> >> >> >     0-....: (124996 ticks this GP) idle=75e/1/4611686018427387906
> >> >> >> > softirq=33205/33205 fqs=30980
> >> >> >> >
> >> >> >> >   (t=125000 jiffies g=17618 c=17617 q=921)
> >> >> >> >   kthread+0x33c/0x400 kernel/kthread.c:238
> >> >> >> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> >> >> >> > Sending NMI from CPU 1 to CPUs 0:
> >> >> >> > NMI backtrace for cpu 0
> >> >> >> > CPU: 0 PID: 7457 Comm: kworker/u4:5 Not tainted 4.16.0+ #10
> >> >> >> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> >> >> > Google 01/01/2011
> >> >> >> > Workqueue: events_unbound flush_to_ldisc
> >> >> >> > RIP: 0010:__process_echoes+0x641/0x770 drivers/tty/n_tty.c:733
> >> >> >> > RSP: 0018:ffff8801af4ff078 EFLAGS: 00000217
> >> >> >> > RAX: 0000000000000000 RBX: ffffc90003673000 RCX: ffffffff8352d4c2
> >> >> >> > RDX: 0000000000000006 RSI: 1ffff10039602994 RDI: ffffc9000367515e
> >> >> >> > RBP: ffff8801af4ff0e0 R08: 1ffff10035e9fdb5 R09: 0000000000000000
> >> >> >> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000625628efd
> >> >> >> > R13: dffffc0000000000 R14: 0000000000000efe R15: 0000000000001b15
> >> >> >> > FS:  0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
> >> >> >> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> >> >> > CR2: 00007ffd5bfa4ca8 CR3: 000000000846a005 CR4: 00000000001606f0
> >> >> >> > DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> >> >> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> >> >> >> > Call Trace:
> >> >> >> >   commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
> >> >> >> >   n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
> >> >> >> >   n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
> >> >> >> >   __receive_buf drivers/tty/n_tty.c:1611 [inline]
> >> >> >> >   n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
> >> >> >> >   n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
> >> >> >> >   tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
> >> >> >> >   tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
> >> >> >> >   receive_buf drivers/tty/tty_buffer.c:475 [inline]
> >> >> >> >   flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
> >> >> >> >   process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
> >> >> >> >   worker_thread+0x223/0x1990 kernel/workqueue.c:2247
> >> >> >> >   kthread+0x33c/0x400 kernel/kthread.c:238
> >> >> >> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> >> >> >
> >> >> > And the above is another good place to look.
> >> >> >
> >> >> >                                                         Thanx, Paul
> >> >> >
> >> >> >> > Code: 60 12 00 00 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 42 0f b6 04 28 38
> >> >> >> > d0 7f 08 84 c0 0f 85 21 01 00 00 42 80 bc 33 60 12 00 00 82 <74> 0f e8 48
> >> >> >> > 90 1e fe 4d 8d 74 24 02 e9 58 ff ff ff e8 39 90 1e
> >> >> >> >
> >> >> >> >
> >> >> >> > ---
> >> >> >> > This bug is generated by a dumb bot. It may contain errors.
> >> >> >> > See https://goo.gl/tpsmEJ for details.
> >> >> >> > Direct all questions to syzkaller@googlegroups.com.
> >> >> >> >
> >> >> >> > syzbot will keep track of this bug report.
> >> >> >> > If you forgot to add the Reported-by tag, once the fix for this bug is
> >> >> >> > merged
> >> >> >> > into any tree, please reply to this email with:
> >> >> >> > #syz fix: exact-commit-title
> >> >> >> > To mark this as a duplicate of another syzbot report, please reply with:
> >> >> >> > #syz dup: exact-subject-of-another-report
> >> >> >> > If it's a one-off invalid bug report, please reply with:
> >> >> >> > #syz invalid
> >> >> >> > Note: if the crash happens again, it will cause creation of a new bug
> >> >> >> > report.
> >> >> >> > Note: all commands must start from beginning of the line in the email body.
> >> >> >>
> >> >> >
> >> >> > --
> >> >> > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> >> >> > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> >> >> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180402153332.GM3948%40linux.vnet.ibm.com.
> >> >> > For more options, visit https://groups.google.com/d/optout.
> >> >>
> >> >
> >>
> >
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-02 17:23               ` Paul E. McKenney
@ 2018-04-09 12:54                 ` Dmitry Vyukov
  2018-04-09 16:20                   ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2018-04-09 12:54 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>> >> >> >>
>> >> >> >> > Hello,
>> >> >> >> >
>> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
>> >> >> >> > Linux 4.16
>> >> >> >> > syzbot dashboard link:
>> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >
>> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> >> > Raw console output:
>> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> > Kernel config:
>> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >
>> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
>> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> >> >> > details.
>> >> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >> >
>> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >
>> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
>> >> >> > playing around with mount options.
>> >> >> >
>> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >> >        Not tainted 4.16.0+ #10
>> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> >> >> > syz-executor3   D20944 10803   4492 0x80000002
>> >> >> >> > Call Trace:
>> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >>
>> >> >> >> I don't think this is a perf issue. Looks like something is preventing
>> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >
>> >> >> > The RCU CPU stall warning below strongly supports this position ...
>> >> >>
>> >> >> I think this is this guy then:
>> >> >>
>> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >>
>> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >
>> >> > Seems likely to me!
>> >> >
>> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> >> think we need some kind of priority between them. I.e. we have rcu
>> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> deterministically according to priorities. If there is an rcu stall,
>> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> >> etc.
>> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> >> reports (which is bad).
>> >> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> >> task hung detector has a problem that if you set timeout to X, it can
>> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> >> large timeout (a minute may not be enough), and on the other hand we
>> >> >> can't wait for an hour just to make sure that the machine is indeed
>> >> >> dead (these things happen every few minutes).
>> >> >
>> >> > I suppose that we could have a global variable that was set to the
>> >> > priority of the complaint in question, which would suppress all
>> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> >> > guess that not everyone is going to be happy with one complaint suppressing
>> >> > others, especially given the possibility that the two complaints might
>> >> > be about different things.
>> >> >
>> >> > Or did you have something more deft in mind?
>> >>
>> >>
>> >> syzkaller generally looks only at the first report. One does not know
>> >> if/when there will be a second one, or the second one can be induced
>> >> by the first one, and we generally want clean reports on a non-tainted
>> >> kernel. So we don't just need to suppress lower priority ones, we need
>> >> to produce the right report first.
>> >> I am thinking maybe setting:
>> >>  - rcu stalls at 1.5 minutes
>> >>  - workqueue stalls at 2 minutes
>> >>  - task hungs at 2.5 minutes
>> >>  - and no output whatsoever at 3 minutes
>> >> Do I miss anything? I think at least spinlocks. Should they go before
>> >> or after rcu?
>> >
>> > That is what I know of, but the Linux kernel being what it is, there is
>> > probably something more out there.  If not now, in a few months.  The
>> > RCU CPU stall timeout can be set on the kernel-boot command line, but
>> > you probably already knew that.
>>
>> Well, it's all based solely on a large number of patches and stopgaps.
>> If we fix main problems for today, it's already good.
>
> Fair enough!
>
>> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
>> > was 1.5 -seconds-.  ;-)
>>
>> Have you tried to instrument every basic block with a function call to
>> collect coverage, check every damn memory access for validity, enable
>> all thinkable and unthinkable debug configs and put the insanest load
>> one can imagine from a swarm of parallel threads? It makes things a
>> bit slower ;)
>
> Given that we wouldn't have had enough CPU or memory to accommodate
> all of that back in DYNIX/ptx days, I am forced to answer "no".  ;-)
>
>> >> This will require fixing task hung. Have not yet looked at workqueue detector.
>> >> Does at least RCU respect the given timeout more or less precisely?
>> >
>> > Assuming that there is at least one CPU capable of taking scheduling-clock
>> > interrupts, it should respect the timeout to within a few jiffies.


Hi Paul,

Speaking of stalls and RCU, we are seeing lots of crashes that go like this:

INFO: rcu_sched self-detected stall on CPU[  404.992530] INFO:
rcu_sched detected stalls on CPUs/tasks:
INFO: rcu_sched self-detected stall on CPU[  454.347448] INFO:
rcu_sched detected stalls on CPUs/tasks:
INFO: rcu_sched self-detected stall on CPU[  396.073634] INFO:
rcu_sched detected stalls on CPUs/tasks:

or like this:

INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched detected stalls on CPUs/tasks:
0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
softirq=57641/57641 fqs=31151
0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
softirq=57641/57641 fqs=31151
 (t=125002 jiffies g=31656 c=31655 q=910)

 INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched detected stalls on CPUs/tasks:
0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
softirq=65194/65194 fqs=31231
0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
softirq=65194/65194 fqs=31231
 (t=125002 jiffies g=34421 c=34420 q=1119)
(detected by 1, t=125002 jiffies, g=34421, c=34420, q=1119)


and then there is an unintelligible mess of two reports. Such crashes go
to the trash bin, because we can't even say which function hung. It
seems that in all these cases two different RCU stall detection facilities
race with each other. Is it possible to make them not race?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-09 12:54                 ` Dmitry Vyukov
@ 2018-04-09 16:20                   ` Paul E. McKenney
  2018-04-09 16:28                     ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2018-04-09 16:20 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> >> >>
> >> >> >> >> > Hello,
> >> >> >> >> >
> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
> >> >> >> >> > Linux 4.16
> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >
> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> >> >> > Raw console output:
> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> > Kernel config:
> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >
> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
> >> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> >> >> > details.
> >> >> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >> >> >
> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >
> >> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
> >> >> >> > playing around with mount options.
> >> >> >> >
> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >> >> >        Not tainted 4.16.0+ #10
> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> >> >> > syz-executor3   D20944 10803   4492 0x80000002
> >> >> >> >> > Call Trace:
> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >>
> >> >> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >
> >> >> >> > The RCU CPU stall warning below strongly supports this position ...
> >> >> >>
> >> >> >> I think this is this guy then:
> >> >> >>
> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >>
> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >
> >> >> > Seems likely to me!
> >> >> >
> >> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> deterministically according to priorities. If there is an rcu stall,
> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> >> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> >> >> etc.
> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
> >> >> >> reports (which is bad).
> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
> >> >> >> task hung detector has a problem that if you set timeout to X, it can
> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> >> >> large timeout (a minute may not be enough), and on the other hand we
> >> >> >> can't wait for an hour just to make sure that the machine is indeed
> >> >> >> dead (these things happen every few minutes).
> >> >> >
> >> >> > I suppose that we could have a global variable that was set to the
> >> >> > priority of the complaint in question, which would suppress all
> >> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
> >> >> > guess that not everyone is going to be happy with one complaint suppressing
> >> >> > others, especially given the possibility that the two complaints might
> >> >> > be about different things.
> >> >> >
> >> >> > Or did you have something more deft in mind?
> >> >>
> >> >>
> >> >> syzkaller generally looks only at the first report. One does not know
> >> >> if/when there will be a second one, or the second one can be induced
> >> >> by the first one, and we generally want clean reports on a non-tainted
> >> >> kernel. So we don't just need to suppress lower priority ones, we need
> >> >> to produce the right report first.
> >> >> I am thinking maybe setting:
> >> >>  - rcu stalls at 1.5 minutes
> >> >>  - workqueue stalls at 2 minutes
> >> >>  - task hungs at 2.5 minutes
> >> >>  - and no output whatsoever at 3 minutes
> >> >> Do I miss anything? I think at least spinlocks. Should they go before
> >> >> or after rcu?
> >> >
> >> > That is what I know of, but the Linux kernel being what it is, there is
> >> > probably something more out there.  If not now, in a few months.  The
> >> > RCU CPU stall timeout can be set on the kernel-boot command line, but
> >> > you probably already knew that.
> >>
> >> Well, it's all based solely on a large number of patches and stopgaps.
> >> If we fix main problems for today, it's already good.
> >
> > Fair enough!
> >
> >> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
> >> > was 1.5 -seconds-.  ;-)
> >>
> >> Have you tried to instrument every basic block with a function call to
> >> collect coverage, check every damn memory access for validity, enable
> >> all thinkable and unthinkable debug configs and put the insanest load
> >> one can imagine from a swarm of parallel threads? It makes things a
> >> bit slower ;)
> >
> > Given that we wouldn't have had enough CPU or memory to accommodate
> > all of that back in DYNIX/ptx days, I am forced to answer "no".  ;-)
> >
> >> >> This will require fixing task hung. Have not yet looked at workqueue detector.
> >> >> Does at least RCU respect the given timeout more or less precisely?
> >> >
> >> > Assuming that there is at least one CPU capable of taking scheduling-clock
> >> > interrupts, it should respect the timeout to within a few jiffies.
> 
> 
> Hi Paul,
> 
> Speaking of stalls and rcu, we are seeing lots of crashes that go like this:
> 
> INFO: rcu_sched self-detected stall on CPU[  404.992530] INFO:
> rcu_sched detected stalls on CPUs/tasks:
> INFO: rcu_sched self-detected stall on CPU[  454.347448] INFO:
> rcu_sched detected stalls on CPUs/tasks:
> INFO: rcu_sched self-detected stall on CPU[  396.073634] INFO:
> rcu_sched detected stalls on CPUs/tasks:
> 
> or like this:
> 
> INFO: rcu_sched self-detected stall on CPU
> INFO: rcu_sched detected stalls on CPUs/tasks:
> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
> softirq=57641/57641 fqs=31151
> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
> softirq=57641/57641 fqs=31151
>  (t=125002 jiffies g=31656 c=31655 q=910)
> 
>  INFO: rcu_sched self-detected stall on CPU
> INFO: rcu_sched detected stalls on CPUs/tasks:
> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
> softirq=65194/65194 fqs=31231
> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
> softirq=65194/65194 fqs=31231
>  (t=125002 jiffies g=34421 c=34420 q=1119)
> (detected by 1, t=125002 jiffies, g=34421, c=34420, q=1119)
> 
> 
> and then there is an unintelligible mess of 2 reports. Such crashes go
> to trash bin, because we can't even say which function hanged. It
> seems that in all cases 2 different rcu stall detection facilities
> race with each other. Is it possible to make them not race?

How about the following (untested, not for mainline) patch?  It suppresses
all but the "main" RCU flavor, which is rcu_sched for !PREEMPT builds and
rcu_preempt otherwise.  Either way, this is the RCU flavor corresponding
to synchronize_rcu().  This works well in the common case where there
is almost always an RCU grace period in flight.

One reason that this patch is not for mainline is that I am working on
merging the RCU-bh, RCU-preempt, and RCU-sched flavors into one thing,
at which point there won't be any races.  But that might be a couple
merge windows away from now.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 381b47a68ac6..31f7818f2d63 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1552,7 +1552,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
 	struct rcu_node *rnp;
 
 	if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
-	    !rcu_gp_in_progress(rsp))
+	    !rcu_gp_in_progress(rsp) || rsp != rcu_state_p)
 		return;
 	rcu_stall_kick_kthreads(rsp);
 	j = jiffies;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-09 16:20                   ` Paul E. McKenney
@ 2018-04-09 16:28                     ` Dmitry Vyukov
  2018-04-09 18:11                       ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2018-04-09 16:28 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> >> >> >> >>
>> >> >> >> >> > Hello,
>> >> >> >> >> >
>> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
>> >> >> >> >> > Linux 4.16
>> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >
>> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> >> >> > Raw console output:
>> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> > Kernel config:
>> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >
>> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> >> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
>> >> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> >> >> >> > details.
>> >> >> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >> >> >
>> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >
>> >> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
>> >> >> >> > playing around with mount options.
>> >> >> >> >
>> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >> >> >        Not tainted 4.16.0+ #10
>> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> >> >> >> > syz-executor3   D20944 10803   4492 0x80000002
>> >> >> >> >> > Call Trace:
>> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >>
>> >> >> >> >> I don't think this is a perf issue. Looks like something is preventing
>> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >
>> >> >> >> > The RCU CPU stall warning below strongly supports this position ...
>> >> >> >>
>> >> >> >> I think this is this guy then:
>> >> >> >>
>> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >>
>> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >
>> >> >> > Seems likely to me!
>> >> >> >
>> >> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> >> >> think we need some kind of priority between them. I.e. we have rcu
>> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> >> deterministically according to priorities. If there is an rcu stall,
>> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> >> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> >> >> etc.
>> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> >> >> reports (which is bad).
>> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> >> >> task hung detector has a problem that if you set timeout to X, it can
>> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> >> >> large timeout (a minute may not be enough), and on the other hand we
>> >> >> >> can't wait for an hour just to make sure that the machine is indeed
>> >> >> >> dead (these things happen every few minutes).
>> >> >> >
>> >> >> > I suppose that we could have a global variable that was set to the
>> >> >> > priority of the complaint in question, which would suppress all
>> >> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> >> >> > guess that not everyone is going to be happy with one complaint suppressing
>> >> >> > others, especially given the possibility that the two complaints might
>> >> >> > be about different things.
>> >> >> >
>> >> >> > Or did you have something more deft in mind?
>> >> >>
>> >> >>
>> >> >> syzkaller generally looks only at the first report. One does not know
>> >> >> if/when there will be a second one, or the second one can be induced
>> >> >> by the first one, and we generally want clean reports on a non-tainted
>> >> >> kernel. So we don't just need to suppress lower priority ones, we need
>> >> >> to produce the right report first.
>> >> >> I am thinking maybe setting:
>> >> >>  - rcu stalls at 1.5 minutes
>> >> >>  - workqueue stalls at 2 minutes
>> >> >>  - task hungs at 2.5 minutes
>> >> >>  - and no output whatsoever at 3 minutes
>> >> >> Do I miss anything? I think at least spinlocks. Should they go before
>> >> >> or after rcu?
>> >> >
>> >> > That is what I know of, but the Linux kernel being what it is, there is
>> >> > probably something more out there.  If not now, in a few months.  The
>> >> > RCU CPU stall timeout can be set on the kernel-boot command line, but
>> >> > you probably already knew that.
>> >>
>> >> Well, it's all based solely on a large number of patches and stopgaps.
>> >> If we fix main problems for today, it's already good.
>> >
>> > Fair enough!
>> >
>> >> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
>> >> > was 1.5 -seconds-.  ;-)
>> >>
>> >> Have you tried to instrument every basic block with a function call to
>> >> collect coverage, check every damn memory access for validity, enable
>> >> all thinkable and unthinkable debug configs and put the insanest load
>> >> one can imagine from a swarm of parallel threads? It makes things a
>> >> bit slower ;)
>> >
>> > Given that we wouldn't have had enough CPU or memory to accommodate
>> > all of that back in DYNIX/ptx days, I am forced to answer "no".  ;-)
>> >
>> >> >> This will require fixing task hung. Have not yet looked at workqueue detector.
>> >> >> Does at least RCU respect the given timeout more or less precisely?
>> >> >
>> >> > Assuming that there is at least one CPU capable of taking scheduling-clock
>> >> > interrupts, it should respect the timeout to within a few jiffies.
>>
>>
>> Hi Paul,
>>
>> Speaking of stalls and rcu, we are seeing lots of crashes that go like this:
>>
>> INFO: rcu_sched self-detected stall on CPU[  404.992530] INFO:
>> rcu_sched detected stalls on CPUs/tasks:
>> INFO: rcu_sched self-detected stall on CPU[  454.347448] INFO:
>> rcu_sched detected stalls on CPUs/tasks:
>> INFO: rcu_sched self-detected stall on CPU[  396.073634] INFO:
>> rcu_sched detected stalls on CPUs/tasks:
>>
>> or like this:
>>
>> INFO: rcu_sched self-detected stall on CPU
>> INFO: rcu_sched detected stalls on CPUs/tasks:
>> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
>> softirq=57641/57641 fqs=31151
>> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
>> softirq=57641/57641 fqs=31151
>>  (t=125002 jiffies g=31656 c=31655 q=910)
>>
>>  INFO: rcu_sched self-detected stall on CPU
>> INFO: rcu_sched detected stalls on CPUs/tasks:
>> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
>> softirq=65194/65194 fqs=31231
>> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
>> softirq=65194/65194 fqs=31231
>>  (t=125002 jiffies g=34421 c=34420 q=1119)
>> (detected by 1, t=125002 jiffies, g=34421, c=34420, q=1119)
>>
>>
>> and then there is an unintelligible mess of 2 reports. Such crashes go
>> to trash bin, because we can't even say which function hanged. It
>> seems that in all cases 2 different rcu stall detection facilities
>> race with each other. Is it possible to make them not race?
>
> How about the following (untested, not for mainline) patch?  It suppresses
> all but the "main" RCU flavor, which is rcu_sched for !PREEMPT builds and
> rcu_preempt otherwise.  Either way, this is the RCU flavor corresponding
> to synchronize_rcu().  This works well in the common case where there
> is almost always an RCU grace period in flight.
>
> One reason that this patch is not for mainline is that I am working on
> merging the RCU-bh, RCU-preempt, and RCU-sched flavors into one thing,
> at which point there won't be any races.  But that might be a couple
> merge windows away from now.
>
>                                                         Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 381b47a68ac6..31f7818f2d63 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1552,7 +1552,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>         struct rcu_node *rnp;
>
>         if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
> -           !rcu_gp_in_progress(rsp))
> +           !rcu_gp_in_progress(rsp) || rsp != rcu_state_p)
>                 return;
>         rcu_stall_kick_kthreads(rsp);
>         j = jiffies;


But don't they both relate to the same RCU flavor? They both say
rcu_sched. I assumed that the difference is "self-detected" vs "on
CPUs/tasks", i.e. detected on the current CPU vs on other CPUs.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-09 16:28                     ` Dmitry Vyukov
@ 2018-04-09 18:11                       ` Paul E. McKenney
  2018-04-10 11:13                         ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2018-04-09 18:11 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> >> >> >>
> >> >> >> >> >> > Hello,
> >> >> >> >> >> >
> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >
> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >
> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> >> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> >> >> >> > details.
> >> >> >> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >> >> >> >
> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >
> >> >> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
> >> >> >> >> > playing around with mount options.
> >> >> >> >> >
> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >> >> >> >        Not tainted 4.16.0+ #10
> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x80000002
> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >>
> >> >> >> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >
> >> >> >> >> > The RCU CPU stall warning below strongly supports this position ...
> >> >> >> >>
> >> >> >> >> I think this is this guy then:
> >> >> >> >>
> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >>
> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >
> >> >> >> > Seems likely to me!
> >> >> >> >
> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> >> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> >> deterministically according to priorities. If there is an rcu stall,
> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> >> >> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> >> >> >> etc.
> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
> >> >> >> >> reports (which is bad).
> >> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
> >> >> >> >> task hung detector has a problem that if you set timeout to X, it can
> >> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> >> >> >> large timeout (a minute may not be enough), and on the other hand we
> >> >> >> >> can't wait for an hour just to make sure that the machine is indeed
> >> >> >> >> dead (these things happen every few minutes).
> >> >> >> >
> >> >> >> > I suppose that we could have a global variable that was set to the
> >> >> >> > priority of the complaint in question, which would suppress all
> >> >> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
> >> >> >> > guess that not everyone is going to be happy with one complaint suppressing
> >> >> >> > others, especially given the possibility that the two complaints might
> >> >> >> > be about different things.
> >> >> >> >
> >> >> >> > Or did you have something more deft in mind?
> >> >> >>
> >> >> >>
> >> >> >> syzkaller generally looks only at the first report. One does not know
> >> >> >> if/when there will be a second one, or the second one can be induced
> >> >> >> by the first one, and we generally want clean reports on a non-tainted
> >> >> >> kernel. So we don't just need to suppress lower priority ones, we need
> >> >> >> to produce the right report first.
> >> >> >> I am thinking maybe setting:
> >> >> >>  - rcu stalls at 1.5 minutes
> >> >> >>  - workqueue stalls at 2 minutes
> >> >> >>  - task hungs at 2.5 minutes
> >> >> >>  - and no output whatsoever at 3 minutes
> >> >> >> Do I miss anything? I think at least spinlocks. Should they go before
> >> >> >> or after rcu?
> >> >> >
> >> >> > That is what I know of, but the Linux kernel being what it is, there is
> >> >> > probably something more out there.  If not now, in a few months.  The
> >> >> > RCU CPU stall timeout can be set on the kernel-boot command line, but
> >> >> > you probably already knew that.
> >> >>
> >> >> Well, it's all based solely on a large number of patches and stopgaps.
> >> >> If we fix main problems for today, it's already good.
> >> >
> >> > Fair enough!
> >> >
> >> >> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
> >> >> > was 1.5 -seconds-.  ;-)
> >> >>
> >> >> Have you tried to instrument every basic block with a function call to
> >> >> collect coverage, check every damn memory access for validity, enable
> >> >> all thinkable and unthinkable debug configs and put the insanest load
> >> >> one can imagine from a swarm of parallel threads? It makes things a
> >> >> bit slower ;)
> >> >
> >> > Given that we wouldn't have had enough CPU or memory to accommodate
> >> > all of that back in DYNIX/ptx days, I am forced to answer "no".  ;-)
> >> >
> >> >> >> This will require fixing task hung. Have not yet looked at workqueue detector.
> >> >> >> Does at least RCU respect the given timeout more or less precisely?
> >> >> >
> >> >> > Assuming that there is at least one CPU capable of taking scheduling-clock
> >> >> > interrupts, it should respect the timeout to within a few jiffies.
> >>
> >>
> >> Hi Paul,
> >>
> >> Speaking of stalls and rcu, we are seeing lots of crashes that go like this:
> >>
> >> INFO: rcu_sched self-detected stall on CPU[  404.992530] INFO:
> >> rcu_sched detected stalls on CPUs/tasks:
> >> INFO: rcu_sched self-detected stall on CPU[  454.347448] INFO:
> >> rcu_sched detected stalls on CPUs/tasks:
> >> INFO: rcu_sched self-detected stall on CPU[  396.073634] INFO:
> >> rcu_sched detected stalls on CPUs/tasks:
> >>
> >> or like this:
> >>
> >> INFO: rcu_sched self-detected stall on CPU
> >> INFO: rcu_sched detected stalls on CPUs/tasks:
> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
> >> softirq=57641/57641 fqs=31151
> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
> >> softirq=57641/57641 fqs=31151
> >>  (t=125002 jiffies g=31656 c=31655 q=910)
> >>
> >>  INFO: rcu_sched self-detected stall on CPU
> >> INFO: rcu_sched detected stalls on CPUs/tasks:
> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
> >> softirq=65194/65194 fqs=31231
> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
> >> softirq=65194/65194 fqs=31231
> >>  (t=125002 jiffies g=34421 c=34420 q=1119)
> >> (detected by 1, t=125002 jiffies, g=34421, c=34420, q=1119)
> >>
> >>
> >> and then there is an unintelligible mess of 2 reports. Such crashes go
> >> to trash bin, because we can't even say which function hanged. It
> >> seems that in all cases 2 different rcu stall detection facilities
> >> race with each other. Is it possible to make them not race?
> >
> > How about the following (untested, not for mainline) patch?  It suppresses
> > all but the "main" RCU flavor, which is rcu_sched for !PREEMPT builds and
> > rcu_preempt otherwise.  Either way, this is the RCU flavor corresponding
> > to synchronize_rcu().  This works well in the common case where there
> > is almost always an RCU grace period in flight.
> >
> > One reason that this patch is not for mainline is that I am working on
> > merging the RCU-bh, RCU-preempt, and RCU-sched flavors into one thing,
> > at which point there won't be any races.  But that might be a couple
> > merge windows away from now.
> >
> >                                                         Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 381b47a68ac6..31f7818f2d63 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1552,7 +1552,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
> >         struct rcu_node *rnp;
> >
> >         if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
> > -           !rcu_gp_in_progress(rsp))
> > +           !rcu_gp_in_progress(rsp) || rsp != rcu_state_p)
> >                 return;
> >         rcu_stall_kick_kthreads(rsp);
> >         j = jiffies;
> 
> But don't they both relate to the same rcu flavor? They both say
> rcu_sched. I assumed that the difference is "self-detected" vs "on
> CPUs/tasks", i.e. on the current CPU vs on other CPUs.

Right you are!

One approach would be to increase the value of RCU_STALL_RAT_DELAY,
which is currently two jiffies, to (say) 20 jiffies.  This is in
kernel/rcu/tree.h.  But this would fail on a sufficiently overloaded
system -- and the failure of the two-jiffy delay is a bit of a surprise,
given interrupts disabled and all that.  Are you by any chance loaded
heavily enough to see vCPU preemption?
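
For concreteness, a sketch of that tweak, assuming the constant is still
a plain #define in kernel/rcu/tree.h as described above (untested):

	-#define RCU_STALL_RAT_DELAY	2
	+#define RCU_STALL_RAT_DELAY	20

i.e. give the stalled CPU 20 jiffies rather than 2 to self-report before
some other CPU reports on its behalf.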

I could avoid at least some of these timing issues by instead using cmpxchg()
on ->jiffies_stall to allow only one CPU in, but leave the non-atomic
update to discourage overly long stall prints from running into the
next one.  This is not perfect, either, and is roughly equivalent to
setting RCU_STALL_RAT_DELAY to many seconds' worth of jiffies, but
avoiding that minute's delay.  But it should get rid of the duplication
in almost all cases, though it could allow a stall warning to overlap
with a later stall warning for that same grace period.  Which can
already happen anyway.  Also, a tens-of-seconds vCPU preemption can
still cause concurrent stall warnings, but if that is happening to you,
the concurrent stall warnings are probably the least of your problems.
Besides, we do need at least one CPU to actually report the stall, which
won't happen if that CPU's vCPU is indefinitely preempted.  So there is
only so much I can do about that particular corner case.

So how does the following (untested) patch work for you?

							Thanx, Paul

------------------------------------------------------------------------

commit 6a5ab1e68f8636d8823bb5a9aee35fc44c2be866
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Apr 9 11:04:46 2018 -0700

    rcu: Exclude near-simultaneous RCU CPU stall warnings
    
    There is a two-jiffy delay between the time that a CPU will self-report
    an RCU CPU stall warning and the time that some other CPU will report a
    warning on behalf of the first CPU.  This has worked well in the past,
    but on busy systems, it is possible for the two warnings to overlap,
    which makes interpreting them extremely difficult.
    
    This commit therefore uses a cmpxchg-based timing decision that
    allows only one report in a given one-minute period (assuming default
    stall-warning Kconfig parameters).  This approach will of course fail
    if you are seeing minute-long vCPU preemption, but in that case the
    overlapping RCU CPU stall warnings are the least of your worries.
    
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 381b47a68ac6..b7246bcbf633 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1429,8 +1429,6 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		return;
 	}
-	WRITE_ONCE(rsp->jiffies_stall,
-		   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
 	/*
@@ -1481,6 +1479,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 			sched_show_task(current);
 		}
 	}
+	/* Rewrite if needed in case of slow consoles. */
+	if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
+		WRITE_ONCE(rsp->jiffies_stall,
+			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
 
 	rcu_check_gp_kthread_starvation(rsp);
 
@@ -1525,6 +1527,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	rcu_dump_cpu_stacks(rsp);
 
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	/* Rewrite if needed in case of slow consoles. */
 	if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
 		WRITE_ONCE(rsp->jiffies_stall,
 			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
@@ -1548,6 +1551,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
 	unsigned long gpnum;
 	unsigned long gps;
 	unsigned long j;
+	unsigned long jn;
 	unsigned long js;
 	struct rcu_node *rnp;
 
@@ -1586,14 +1590,17 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
 	    ULONG_CMP_GE(gps, js))
 		return; /* No stall or GP completed since entering function. */
 	rnp = rdp->mynode;
+	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
 	if (rcu_gp_in_progress(rsp) &&
-	    (READ_ONCE(rnp->qsmask) & rdp->grpmask)) {
+	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
+	    cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
 
 		/* We haven't checked in, so go dump stack. */
 		print_cpu_stall(rsp);
 
 	} else if (rcu_gp_in_progress(rsp) &&
-		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) {
+		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
+		   cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
 
 		/* They had a few time units to dump stack, so complain. */
 		print_other_cpu_stall(rsp, gpnum);
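
A minimal userspace sketch of the "first detector wins" idiom that the
cmpxchg() gate above implements -- C11 atomics and pthreads stand in for
the kernel's cmpxchg() and per-CPU detectors, and all names here
(report_deadline, try_report, detector) are made up for illustration:

/* Illustrative only: whoever moves the deadline first gets to report. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS 4

/* Stand-in for rsp->jiffies_stall: earliest time a report is allowed. */
static _Atomic unsigned long report_deadline = 100;

static void try_report(int id, unsigned long now)
{
	unsigned long js = 100;		/* deadline sampled earlier */
	unsigned long jn = now + 1000;	/* pushed-out deadline */

	/*
	 * Analogue of cmpxchg(&rsp->jiffies_stall, js, jn) == js: only
	 * one caller can move the deadline from js to jn, so only one
	 * caller prints.
	 */
	if (now >= js &&
	    atomic_compare_exchange_strong(&report_deadline, &js, jn))
		printf("detector %d: reporting the stall\n", id);
	else
		printf("detector %d: report already claimed, staying quiet\n", id);
}

static void *detector(void *arg)
{
	try_report((int)(long)arg, 100);
	return NULL;
}

int main(void)
{
	pthread_t t[NTHREADS];
	long i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, detector, (void *)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);
	return 0;
}

Only the thread whose compare-exchange succeeds prints; the rest see that
the deadline has already been pushed out and stay quiet, which is how the
patch keeps print_cpu_stall() and print_other_cpu_stall() from interleaving
for the same grace period.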

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-09 18:11                       ` Paul E. McKenney
@ 2018-04-10 11:13                         ` Dmitry Vyukov
  2018-04-10 17:02                           ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2018-04-10 11:13 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Mon, Apr 9, 2018 at 8:11 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
>> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>> >> <paulmck@linux.vnet.ibm.com> wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >
>> >> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
>> >> >> >> >> >> > Linux 4.16
>> >> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >> >
>> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> >> >> >> > Raw console output:
>> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> >> > Kernel config:
>> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >> >
>> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> >> >> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
>> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> >> >> >> >> > details.
>> >> >> >> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >> >> >> >
>> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> >> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >> >
>> >> >> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
>> >> >> >> >> > playing around with mount options.
>> >> >> >> >> >
>> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >> >> >> >        Not tainted 4.16.0+ #10
>> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x80000002
>> >> >> >> >> >> > Call Trace:
>> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >> >>
>> >> >> >> >> >> I don't think this is a perf issue. Looks like something is preventing
>> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >> >
>> >> >> >> >> > The RCU CPU stall warning below strongly supports this position ...
>> >> >> >> >>
>> >> >> >> >> I think this is this guy then:
>> >> >> >> >>
>> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >> >>
>> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >> >
>> >> >> >> > Seems likely to me!
>> >> >> >> >
>> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> >> >> >> think we need some kind of priority between them. I.e. we have rcu
>> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> >> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> >> >> deterministically according to priorities. If there is an rcu stall,
>> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> >> >> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> >> >> >> etc.
>> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> >> >> >> reports (which is bad).
>> >> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> >> >> >> task hung detector has a problem that if you set timeout to X, it can
>> >> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> >> >> >> large timeout (a minute may not be enough), and on the other hand we
>> >> >> >> >> can't wait for an hour just to make sure that the machine is indeed
>> >> >> >> >> dead (these things happen every few minutes).
>> >> >> >> >
>> >> >> >> > I suppose that we could have a global variable that was set to the
>> >> >> >> > priority of the complaint in question, which would suppress all
>> >> >> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> >> >> >> > guess that not everyone is going to be happy with one complaint suppressing
>> >> >> >> > others, especially given the possibility that the two complaints might
>> >> >> >> > be about different things.
>> >> >> >> >
>> >> >> >> > Or did you have something more deft in mind?
>> >> >> >>
>> >> >> >>
>> >> >> >> syzkaller generally looks only at the first report. One does not know
>> >> >> >> if/when there will be a second one, or the second one can be induced
>> >> >> >> by the first one, and we generally want clean reports on a non-tainted
>> >> >> >> kernel. So we don't just need to suppress lower priority ones, we need
>> >> >> >> to produce the right report first.
>> >> >> >> I am thinking maybe setting:
>> >> >> >>  - rcu stalls at 1.5 minutes
>> >> >> >>  - workqueue stalls at 2 minutes
>> >> >> >>  - task hungs at 2.5 minutes
>> >> >> >>  - and no output whatsoever at 3 minutes
>> >> >> >> Do I miss anything? I think at least spinlocks. Should they go before
>> >> >> >> or after rcu?
>> >> >> >
>> >> >> > That is what I know of, but the Linux kernel being what it is, there is
>> >> >> > probably something more out there.  If not now, in a few months.  The
>> >> >> > RCU CPU stall timeout can be set on the kernel-boot command line, but
>> >> >> > you probably already knew that.
>> >> >>
>> >> >> Well, it's all based solely on a large number of patches and stopgaps.
>> >> >> If we fix main problems for today, it's already good.
>> >> >
>> >> > Fair enough!
>> >> >
>> >> >> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
>> >> >> > was 1.5 -seconds-.  ;-)
>> >> >>
>> >> >> Have you tried to instrument every basic block with a function call to
>> >> >> collect coverage, check every damn memory access for validity, enable
>> >> >> all thinkable and unthinkable debug configs and put the insanest load
>> >> >> one can imagine from a swarm of parallel threads? It makes things a
>> >> >> bit slower ;)
>> >> >
>> >> > Given that we wouldn't have had enough CPU or memory to accommodate
>> >> > all of that back in DYNIX/ptx days, I am forced to answer "no".  ;-)
>> >> >
>> >> >> >> This will require fixing task hung. Have not yet looked at workqueue detector.
>> >> >> >> Does at least RCU respect the given timeout more or less precisely?
>> >> >> >
>> >> >> > Assuming that there is at least one CPU capable of taking scheduling-clock
>> >> >> > interrupts, it should respect the timeout to within a few jiffies.
>> >>
>> >>
>> >> Hi Paul,
>> >>
>> >> Speaking of stalls and rcu, we are seeing lots of crashes that go like this:
>> >>
>> >> INFO: rcu_sched self-detected stall on CPU[  404.992530] INFO:
>> >> rcu_sched detected stalls on CPUs/tasks:
>> >> INFO: rcu_sched self-detected stall on CPU[  454.347448] INFO:
>> >> rcu_sched detected stalls on CPUs/tasks:
>> >> INFO: rcu_sched self-detected stall on CPU[  396.073634] INFO:
>> >> rcu_sched detected stalls on CPUs/tasks:
>> >>
>> >> or like this:
>> >>
>> >> INFO: rcu_sched self-detected stall on CPU
>> >> INFO: rcu_sched detected stalls on CPUs/tasks:
>> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
>> >> softirq=57641/57641 fqs=31151
>> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
>> >> softirq=57641/57641 fqs=31151
>> >>  (t=125002 jiffies g=31656 c=31655 q=910)
>> >>
>> >>  INFO: rcu_sched self-detected stall on CPU
>> >> INFO: rcu_sched detected stalls on CPUs/tasks:
>> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
>> >> softirq=65194/65194 fqs=31231
>> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
>> >> softirq=65194/65194 fqs=31231
>> >>  (t=125002 jiffies g=34421 c=34420 q=1119)
>> >> (detected by 1, t=125002 jiffies, g=34421, c=34420, q=1119)
>> >>
>> >>
>> >> and then there is an unintelligible mess of 2 reports. Such crashes go
>> >> to trash bin, because we can't even say which function hanged. It
>> >> seems that in all cases 2 different rcu stall detection facilities
>> >> race with each other. Is it possible to make them not race?
>> >
>> > How about the following (untested, not for mainline) patch?  It suppresses
>> > all but the "main" RCU flavor, which is rcu_sched for !PREEMPT builds and
>> > rcu_preempt otherwise.  Either way, this is the RCU flavor corresponding
>> > to synchronize_rcu().  This works well in the common case where there
>> > is almost always an RCU grace period in flight.
>> >
>> > One reason that this patch is not for mainline is that I am working on
>> > merging the RCU-bh, RCU-preempt, and RCU-sched flavors into one thing,
>> > at which point there won't be any races.  But that might be a couple
>> > merge windows away from now.
>> >
>> >                                                         Thanx, Paul
>> >
>> > ------------------------------------------------------------------------
>> >
>> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> > index 381b47a68ac6..31f7818f2d63 100644
>> > --- a/kernel/rcu/tree.c
>> > +++ b/kernel/rcu/tree.c
>> > @@ -1552,7 +1552,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>> >         struct rcu_node *rnp;
>> >
>> >         if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
>> > -           !rcu_gp_in_progress(rsp))
>> > +           !rcu_gp_in_progress(rsp) || rsp != rcu_state_p)
>> >                 return;
>> >         rcu_stall_kick_kthreads(rsp);
>> >         j = jiffies;
>>
>> But don't they both relate to the same rcu flavor? They both say
>> rcu_sched. I assumed that the difference is "self-detected" vs "on
>> CPUs/tasks", i.e. on the current CPU vs on other CPUs.
>
> Right you are!
>
> One approach would be to increase the value of RCU_STALL_RAT_DELAY,
> which is currently two jiffies, to (say) 20 jiffies.  This is in
> kernel/rcu/tree.h.  But this would fail on a sufficiently overloaded
> system -- and the failure of the two-jiffy delay is a bit of a surprise,
> given interrupts disabled and all that.  Are you by any chance loaded
> heavily enough to see vCPU preemption?
>
> I could avoid at least some of these timing issues by instead using cmpxchg()
> on ->jiffies_stall to allow only one CPU in, but leave the non-atomic
> update to discourage overly long stall prints from running into the
> next one.  This is not perfect, either, and is roughly equivalent to
> setting RCU_STALL_RAT_DELAY to many seconds' worth of jiffies, but
> avoiding that minute's delay.  But it should get rid of the duplication
> in almost all cases, though it could allow a stall warning to overlap
> with a later stall warning for that same grace period.  Which can
> already happen anyway.  Also, a tens-of-seconds vCPU preemption can
> still cause concurrent stall warnings, but if that is happening to you,
> the concurrent stall warnings are probably the least of your problems.
> Besides, we do need at least one CPU to actually report the stall, which
> won't happen if that CPU's vCPU is indefinitely preempted.  So there is
> only so much I can do about that particular corner case.
>
> So how does the following (untested) patch work for you?

Looks good to me.

We run on VMs, so we may well have vCPU preemption.


>                                                         Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 6a5ab1e68f8636d8823bb5a9aee35fc44c2be866
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Mon Apr 9 11:04:46 2018 -0700
>
>     rcu: Exclude near-simultaneous RCU CPU stall warnings
>
>     There is a two-jiffy delay between the time that a CPU will self-report
>     an RCU CPU stall warning and the time that some other CPU will report a
>     warning on behalf of the first CPU.  This has worked well in the past,
>     but on busy systems, it is possible for the two warnings to overlap,
>     which makes interpreting them extremely difficult.
>
>     This commit therefore uses a cmpxchg-based timing decision that
>     allows only one report in a given one-minute period (assuming default
>     stall-warning Kconfig parameters).  This approach will of course fail
>     if you are seeing minute-long vCPU preemption, but in that case the
>     overlapping RCU CPU stall warnings are the least of your worries.
>
>     Reported-by: Dmitry Vyukov <dvyukov@google.com>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 381b47a68ac6..b7246bcbf633 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1429,8 +1429,6 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
>                 raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>                 return;
>         }
> -       WRITE_ONCE(rsp->jiffies_stall,
> -                  jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
>         raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>
>         /*
> @@ -1481,6 +1479,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
>                         sched_show_task(current);
>                 }
>         }
> +       /* Rewrite if needed in case of slow consoles. */
> +       if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
> +               WRITE_ONCE(rsp->jiffies_stall,
> +                          jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
>
>         rcu_check_gp_kthread_starvation(rsp);
>
> @@ -1525,6 +1527,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
>         rcu_dump_cpu_stacks(rsp);
>
>         raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +       /* Rewrite if needed in case of slow consoles. */
>         if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
>                 WRITE_ONCE(rsp->jiffies_stall,
>                            jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
> @@ -1548,6 +1551,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>         unsigned long gpnum;
>         unsigned long gps;
>         unsigned long j;
> +       unsigned long jn;
>         unsigned long js;
>         struct rcu_node *rnp;
>
> @@ -1586,14 +1590,17 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>             ULONG_CMP_GE(gps, js))
>                 return; /* No stall or GP completed since entering function. */
>         rnp = rdp->mynode;
> +       jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
>         if (rcu_gp_in_progress(rsp) &&
> -           (READ_ONCE(rnp->qsmask) & rdp->grpmask)) {
> +           (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> +           cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
>
>                 /* We haven't checked in, so go dump stack. */
>                 print_cpu_stall(rsp);
>
>         } else if (rcu_gp_in_progress(rsp) &&
> -                  ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) {
> +                  ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> +                  cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
>
>                 /* They had a few time units to dump stack, so complain. */
>                 print_other_cpu_stall(rsp, gpnum);
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-10 11:13                         ` Dmitry Vyukov
@ 2018-04-10 17:02                           ` Paul E. McKenney
  2018-04-11 10:06                             ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2018-04-10 17:02 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Tue, Apr 10, 2018 at 01:13:13PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 9, 2018 at 8:11 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> > Hello,
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
> >> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> >> >> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> >> >> >> >> > details.
> >> >> >> >> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >> >
> >> >> >> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
> >> >> >> >> >> > playing around with mount options.
> >> >> >> >> >> >
> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >> >> >> >> >        Not tainted 4.16.0+ #10
> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x80000002
> >> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >> >>
> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >> >
> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this position ...
> >> >> >> >> >>
> >> >> >> >> >> I think this is this guy then:
> >> >> >> >> >>
> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >> >>
> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >> >
> >> >> >> >> > Seems likely to me!
> >> >> >> >> >
> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> >> >> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> >> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> >> >> deterministically according to priorities. If there is an rcu stall,
> >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> >> >> >> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> >> >> >> >> etc.
> >> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> >> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
> >> >> >> >> >> reports (which is bad).
> >> >> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
> >> >> >> >> >> task hung detector has a problem that if you set timeout to X, it can
> >> >> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> >> >> >> >> large timeout (a minute may not be enough), and on the other hand we
> >> >> >> >> >> can't wait for an hour just to make sure that the machine is indeed
> >> >> >> >> >> dead (these things happen every few minutes).
> >> >> >> >> >
> >> >> >> >> > I suppose that we could have a global variable that was set to the
> >> >> >> >> > priority of the complaint in question, which would suppress all
> >> >> >> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
> >> >> >> >> > guess that not everyone is going to be happy with one complaint suppressing
> >> >> >> >> > others, especially given the possibility that the two complaints might
> >> >> >> >> > be about different things.
> >> >> >> >> >
> >> >> >> >> > Or did you have something more deft in mind?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> syzkaller generally looks only at the first report. One does not know
> >> >> >> >> if/when there will be a second one, or the second one can be induced
> >> >> >> >> by the first one, and we generally want clean reports on a non-tainted
> >> >> >> >> kernel. So we don't just need to suppress lower priority ones, we need
> >> >> >> >> to produce the right report first.
> >> >> >> >> I am thinking maybe setting:
> >> >> >> >>  - rcu stalls at 1.5 minutes
> >> >> >> >>  - workqueue stalls at 2 minutes
> >> >> >> >>  - task hungs at 2.5 minutes
> >> >> >> >>  - and no output whatsoever at 3 minutes
> >> >> >> >> Do I miss anything? I think at least spinlocks. Should they go before
> >> >> >> >> or after rcu?
> >> >> >> >
> >> >> >> > That is what I know of, but the Linux kernel being what it is, there is
> >> >> >> > probably something more out there.  If not now, in a few months.  The
> >> >> >> > RCU CPU stall timeout can be set on the kernel-boot command line, but
> >> >> >> > you probably already knew that.
> >> >> >>
> >> >> >> Well, it's all based solely on a large number of patches and stopgaps.
> >> >> >> If we fix main problems for today, it's already good.
> >> >> >
> >> >> > Fair enough!
> >> >> >
> >> >> >> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
> >> >> >> > was 1.5 -seconds-.  ;-)
> >> >> >>
> >> >> >> Have you tried to instrument every basic block with a function call to
> >> >> >> collect coverage, check every damn memory access for validity, enable
> >> >> >> all thinkable and unthinkable debug configs and put the insanest load
> >> >> >> one can imagine from a swarm of parallel threads? It makes things a
> >> >> >> bit slower ;)
> >> >> >
> >> >> > Given that we wouldn't have had enough CPU or memory to accommodate
> >> >> > all of that back in DYNIX/ptx days, I am forced to answer "no".  ;-)
> >> >> >
> >> >> >> >> This will require fixing task hung. Have not yet looked at workqueue detector.
> >> >> >> >> Does at least RCU respect the given timeout more or less precisely?
> >> >> >> >
> >> >> >> > Assuming that there is at least one CPU capable of taking scheduling-clock
> >> >> >> > interrupts, it should respect the timeout to within a few jiffies.
> >> >>
> >> >>
> >> >> Hi Paul,
> >> >>
> >> >> Speaking of stalls and rcu, we are seeing lots of crashes that go like this:
> >> >>
> >> >> INFO: rcu_sched self-detected stall on CPU[  404.992530] INFO:
> >> >> rcu_sched detected stalls on CPUs/tasks:
> >> >> INFO: rcu_sched self-detected stall on CPU[  454.347448] INFO:
> >> >> rcu_sched detected stalls on CPUs/tasks:
> >> >> INFO: rcu_sched self-detected stall on CPU[  396.073634] INFO:
> >> >> rcu_sched detected stalls on CPUs/tasks:
> >> >>
> >> >> or like this:
> >> >>
> >> >> INFO: rcu_sched self-detected stall on CPU
> >> >> INFO: rcu_sched detected stalls on CPUs/tasks:
> >> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
> >> >> softirq=57641/57641 fqs=31151
> >> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
> >> >> softirq=57641/57641 fqs=31151
> >> >>  (t=125002 jiffies g=31656 c=31655 q=910)
> >> >>
> >> >>  INFO: rcu_sched self-detected stall on CPU
> >> >> INFO: rcu_sched detected stalls on CPUs/tasks:
> >> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
> >> >> softirq=65194/65194 fqs=31231
> >> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
> >> >> softirq=65194/65194 fqs=31231
> >> >>  (t=125002 jiffies g=34421 c=34420 q=1119)
> >> >> (detected by 1, t=125002 jiffies, g=34421, c=34420, q=1119)
> >> >>
> >> >>
> >> >> and then there is an unintelligible mess of 2 reports. Such crashes go
> >> >> to trash bin, because we can't even say which function hanged. It
> >> >> seems that in all cases 2 different rcu stall detection facilities
> >> >> race with each other. Is it possible to make them not race?
> >> >
> >> > How about the following (untested, not for mainline) patch?  It suppresses
> >> > all but the "main" RCU flavor, which is rcu_sched for !PREEMPT builds and
> >> > rcu_preempt otherwise.  Either way, this is the RCU flavor corresponding
> >> > to synchronize_rcu().  This works well in the common case where there
> >> > is almost always an RCU grace period in flight.
> >> >
> >> > One reason that this patch is not for mainline is that I am working on
> >> > merging the RCU-bh, RCU-preempt, and RCU-sched flavors into one thing,
> >> > at which point there won't be any races.  But that might be a couple
> >> > merge windows away from now.
> >> >
> >> >                                                         Thanx, Paul
> >> >
> >> > ------------------------------------------------------------------------
> >> >
> >> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> > index 381b47a68ac6..31f7818f2d63 100644
> >> > --- a/kernel/rcu/tree.c
> >> > +++ b/kernel/rcu/tree.c
> >> > @@ -1552,7 +1552,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
> >> >         struct rcu_node *rnp;
> >> >
> >> >         if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
> >> > -           !rcu_gp_in_progress(rsp))
> >> > +           !rcu_gp_in_progress(rsp) || rsp != rcu_state_p)
> >> >                 return;
> >> >         rcu_stall_kick_kthreads(rsp);
> >> >         j = jiffies;
> >>
> >> But don't they both relate to the same rcu flavor? They both say
> >> rcu_sched. I assumed that the difference is "self-detected" vs "on
> >> CPUs/tasks", i.e. on the current CPU vs on other CPUs.
> >
> > Right you are!
> >
> > One approach would be to increase the value of RCU_STALL_RAT_DELAY,
> > which is currently two jiffies, to (say) 20 jiffies.  This is in
> > kernel/rcu/tree.h.  But this would fail on a sufficiently overloaded
> > system -- and the failure of the two-jiffy delay is a bit of a surprise,
> > given interrupts disabled and all that.  Are you by any chance loaded
> > heavily enough to see vCPU preemption?
> >
> > I could avoid at least some of these timing issues by instead using cmpxchg()
> > on ->jiffies_stall to allow only one CPU in, but leave the non-atomic
> > update to discourage overly long stall prints from running into the
> > next one.  This is not perfect, either, and is roughly equivalent to
> > setting RCU_STALL_RAT_DELAY to many seconds' worth of jiffies, but
> > avoiding that minute's delay.  But it should get rid of the duplication
> > in almost all cases, though it could allow a stall warning to overlap
> > with a later stall warning for that same grace period.  Which can
> > already happen anyway.  Also, a tens-of-seconds vCPU preemption can
> > still cause concurrent stall warnings, but if that is happening to you,
> > the concurrent stall warnings are probably the least of your problems.
> > Besides, we do need at least one CPU to actually report the stall, which
> > won't happen if that CPU's vCPU is indefinitely preempted.  So there is
> > only so much I can do about that particular corner case.
> >
> > So how does the following (untested) patch work for you?
> 
> Looks good to me.
> 
> We run on VMs, so we may well have vCPU preemption.

Very good!  Please do get me a Tested-by when you get to that point.

                                                        Thanx, Paul

> > ------------------------------------------------------------------------
> >
> > commit 6a5ab1e68f8636d8823bb5a9aee35fc44c2be866
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Mon Apr 9 11:04:46 2018 -0700
> >
> >     rcu: Exclude near-simultaneous RCU CPU stall warnings
> >
> >     There is a two-jiffy delay between the time that a CPU will self-report
> >     an RCU CPU stall warning and the time that some other CPU will report a
> >     warning on behalf of the first CPU.  This has worked well in the past,
> >     but on busy systems, it is possible for the two warnings to overlap,
> >     which makes interpreting them extremely difficult.
> >
> >     This commit therefore uses a cmpxchg-based timing decision that
> >     allows only one report in a given one-minute period (assuming default
> >     stall-warning Kconfig parameters).  This approach will of course fail
> >     if you are seeing minute-long vCPU preemption, but in that case the
> >     overlapping RCU CPU stall warnings are the least of your worries.
> >
> >     Reported-by: Dmitry Vyukov <dvyukov@google.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 381b47a68ac6..b7246bcbf633 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1429,8 +1429,6 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
> >                 raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >                 return;
> >         }
> > -       WRITE_ONCE(rsp->jiffies_stall,
> > -                  jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
> >         raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >
> >         /*
> > @@ -1481,6 +1479,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
> >                         sched_show_task(current);
> >                 }
> >         }
> > +       /* Rewrite if needed in case of slow consoles. */
> > +       if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
> > +               WRITE_ONCE(rsp->jiffies_stall,
> > +                          jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
> >
> >         rcu_check_gp_kthread_starvation(rsp);
> >
> > @@ -1525,6 +1527,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
> >         rcu_dump_cpu_stacks(rsp);
> >
> >         raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > +       /* Rewrite if needed in case of slow consoles. */
> >         if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
> >                 WRITE_ONCE(rsp->jiffies_stall,
> >                            jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
> > @@ -1548,6 +1551,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
> >         unsigned long gpnum;
> >         unsigned long gps;
> >         unsigned long j;
> > +       unsigned long jn;
> >         unsigned long js;
> >         struct rcu_node *rnp;
> >
> > @@ -1586,14 +1590,17 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
> >             ULONG_CMP_GE(gps, js))
> >                 return; /* No stall or GP completed since entering function. */
> >         rnp = rdp->mynode;
> > +       jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> >         if (rcu_gp_in_progress(rsp) &&
> > -           (READ_ONCE(rnp->qsmask) & rdp->grpmask)) {
> > +           (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> > +           cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
> >
> >                 /* We haven't checked in, so go dump stack. */
> >                 print_cpu_stall(rsp);
> >
> >         } else if (rcu_gp_in_progress(rsp) &&
> > -                  ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) {
> > +                  ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> > +                  cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
> >
> >                 /* They had a few time units to dump stack, so complain. */
> >                 print_other_cpu_stall(rsp, gpnum);
> >
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-10 17:02                           ` Paul E. McKenney
@ 2018-04-11 10:06                             ` Dmitry Vyukov
  2018-04-11 19:36                               ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2018-04-11 10:06 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Tue, Apr 10, 2018 at 7:02 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>> >> >> <paulmck@linux.vnet.ibm.com> wrote:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
>> >> >> >> >> >> >> > Linux 4.16
>> >> >> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> >> >> >> >> > Raw console output:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> >> >> > Kernel config:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> >> >> >> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
>> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> >> >> >> >> >> > details.
>> >> >> >> >> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >> >> >
>> >> >> >> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
>> >> >> >> >> >> > playing around with mount options.
>> >> >> >> >> >> >
>> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >> >> >> >> >        Not tainted 4.16.0+ #10
>> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x80000002
>> >> >> >> >> >> >> > Call Trace:
>> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is preventing
>> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >> >> >
>> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this position ...
>> >> >> >> >> >>
>> >> >> >> >> >> I think this is this guy then:
>> >> >> >> >> >>
>> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >> >> >>
>> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >> >> >
>> >> >> >> >> > Seems likely to me!
>> >> >> >> >> >
>> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> >> >> >> >> think we need some kind of priority between them. I.e. we have rcu
>> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> >> >> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> >> >> >> deterministically according to priorities. If there is an rcu stall,
>> >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> >> >> >> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> >> >> >> >> etc.
>> >> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> >> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> >> >> >> >> reports (which is bad).
>> >> >> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> >> >> >> >> task hung detector has a problem that if you set timeout to X, it can
>> >> >> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> >> >> >> >> large timeout (a minute may not be enough), and on the other hand we
>> >> >> >> >> >> can't wait for an hour just to make sure that the machine is indeed
>> >> >> >> >> >> dead (these things happen every few minutes).
>> >> >> >> >> >
>> >> >> >> >> > I suppose that we could have a global variable that was set to the
>> >> >> >> >> > priority of the complaint in question, which would suppress all
>> >> >> >> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> >> >> >> >> > guess that not everyone is going to be happy with one complaint suppressing
>> >> >> >> >> > others, especially given the possibility that the two complaints might
>> >> >> >> >> > be about different things.
>> >> >> >> >> >
>> >> >> >> >> > Or did you have something more deft in mind?
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> syzkaller generally looks only at the first report. One does not know
>> >> >> >> >> if/when there will be a second one, or the second one can be induced
>> >> >> >> >> by the first one, and we generally want clean reports on a non-tainted
>> >> >> >> >> kernel. So we don't just need to suppress lower priority ones, we need
>> >> >> >> >> to produce the right report first.
>> >> >> >> >> I am thinking maybe setting:
>> >> >> >> >>  - rcu stalls at 1.5 minutes
>> >> >> >> >>  - workqueue stalls at 2 minutes
>> >> >> >> >>  - task hungs at 2.5 minutes
>> >> >> >> >>  - and no output whatsoever at 3 minutes
>> >> >> >> >> Do I miss anything? I think at least spinlocks. Should they go before
>> >> >> >> >> or after rcu?
>> >> >> >> >
>> >> >> >> > That is what I know of, but the Linux kernel being what it is, there is
>> >> >> >> > probably something more out there.  If not now, in a few months.  The
>> >> >> >> > RCU CPU stall timeout can be set on the kernel-boot command line, but
>> >> >> >> > you probably already knew that.
>> >> >> >>
>> >> >> >> Well, it's all based solely on a large number of patches and stopgaps.
>> >> >> >> If we fix main problems for today, it's already good.
>> >> >> >
>> >> >> > Fair enough!
>> >> >> >
>> >> >> >> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
>> >> >> >> > was 1.5 -seconds-.  ;-)
>> >> >> >>
>> >> >> >> Have you tried to instrument every basic block with a function call to
>> >> >> >> collect coverage, check every damn memory access for validity, enable
>> >> >> >> all thinkable and unthinkable debug configs and put the insanest load
>> >> >> >> one can imagine from a swarm of parallel threads? It makes things a
>> >> >> >> bit slower ;)
>> >> >> >
>> >> >> > Given that we wouldn't have had enough CPU or memory to accommodate
>> >> >> > all of that back in DYNIX/ptx days, I am forced to answer "no".  ;-)
>> >> >> >
>> >> >> >> >> This will require fixing task hung. Have not yet looked at workqueue detector.
>> >> >> >> >> Does at least RCU respect the given timeout more or less precisely?
>> >> >> >> >
>> >> >> >> > Assuming that there is at least one CPU capable of taking scheduling-clock
>> >> >> >> > interrupts, it should respect the timeout to within a few jiffies.
>> >> >>
>> >> >>
>> >> >> Hi Paul,
>> >> >>
>> >> >> Speaking of stalls and rcu, we are seeing lots of crashes that go like this:
>> >> >>
>> >> >> INFO: rcu_sched self-detected stall on CPU[  404.992530] INFO:
>> >> >> rcu_sched detected stalls on CPUs/tasks:
>> >> >> INFO: rcu_sched self-detected stall on CPU[  454.347448] INFO:
>> >> >> rcu_sched detected stalls on CPUs/tasks:
>> >> >> INFO: rcu_sched self-detected stall on CPU[  396.073634] INFO:
>> >> >> rcu_sched detected stalls on CPUs/tasks:
>> >> >>
>> >> >> or like this:
>> >> >>
>> >> >> INFO: rcu_sched self-detected stall on CPU
>> >> >> INFO: rcu_sched detected stalls on CPUs/tasks:
>> >> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
>> >> >> softirq=57641/57641 fqs=31151
>> >> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
>> >> >> softirq=57641/57641 fqs=31151
>> >> >>  (t=125002 jiffies g=31656 c=31655 q=910)
>> >> >>
>> >> >>  INFO: rcu_sched self-detected stall on CPU
>> >> >> INFO: rcu_sched detected stalls on CPUs/tasks:
>> >> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
>> >> >> softirq=65194/65194 fqs=31231
>> >> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
>> >> >> softirq=65194/65194 fqs=31231
>> >> >>  (t=125002 jiffies g=34421 c=34420 q=1119)
>> >> >> (detected by 1, t=125002 jiffies, g=34421, c=34420, q=1119)
>> >> >>
>> >> >>
>> >> >> and then there is an unintelligible mess of 2 reports. Such crashes go
>> >> >> to trash bin, because we can't even say which function hanged. It
>> >> >> seems that in all cases 2 different rcu stall detection facilities
>> >> >> race with each other. Is it possible to make them not race?
>> >> >
>> >> > How about the following (untested, not for mainline) patch?  It suppresses
>> >> > all but the "main" RCU flavor, which is rcu_sched for !PREEMPT builds and
>> >> > rcu_preempt otherwise.  Either way, this is the RCU flavor corresponding
>> >> > to synchronize_rcu().  This works well in the common case where there
>> >> > is almost always an RCU grace period in flight.
>> >> >
>> >> > One reason that this patch is not for mainline is that I am working on
>> >> > merging the RCU-bh, RCU-preempt, and RCU-sched flavors into one thing,
>> >> > at which point there won't be any races.  But that might be a couple
>> >> > merge windows away from now.
>> >> >
>> >> >                                                         Thanx, Paul
>> >> >
>> >> > ------------------------------------------------------------------------
>> >> >
>> >> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> >> > index 381b47a68ac6..31f7818f2d63 100644
>> >> > --- a/kernel/rcu/tree.c
>> >> > +++ b/kernel/rcu/tree.c
>> >> > @@ -1552,7 +1552,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>> >> >         struct rcu_node *rnp;
>> >> >
>> >> >         if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
>> >> > -           !rcu_gp_in_progress(rsp))
>> >> > +           !rcu_gp_in_progress(rsp) || rsp != rcu_state_p)
>> >> >                 return;
>> >> >         rcu_stall_kick_kthreads(rsp);
>> >> >         j = jiffies;
>> >>
>> >> But doesn't they both relate to the same rcu flavor? They both say
>> >> rcu_sched. I assumed that the difference is "self-detected" vs "on
>> >> CPUs/tasks", i.e. on the current CPU vs on other CPUs.
>> >
>> > Right you are!
>> >
>> > One approach would be to increase the value of RCU_STALL_RAT_DELAY,
>> > which is currently two jiffies to (say) 20 jiffies.  This is in
>> > kernel/rcu/tree.h.  But this would fail on a sufficiently overloaded
>> > system -- and the failure of the two-jiffy delay is a bit of a surprise,
>> > given interrupts disabled and all that.  Are you by any chance loaded
>> > heavily enough to see vCPU preemption?
>> >
>> > I could avoid at least some of these timing issues instead using cmpxchg()
>> > on ->jiffies_stall to allow only one CPU in, but leave the non-atomic
>> > update to discourage overly long stall prints from running into the
>> > next one.  This is not perfect, either, and is roughly equivalent to
>> > setting RCU_STALL_RAT_DELAY to many second's worth of jiffies, but
>> > avoiding that minute's delay.  But it should get rid of the duplication
>> > in almost all cases, though it could allow a stall warning to overlap
>> > with a later stall warning for that same grace period.  Which can
>> > already happen anyway.  Also, a tens-of-seconds vCPU preemption can
>> > still cause concurrent stall warnings, but if that is happening to you,
>> > the concurrent stall warnings are probably the least of your problems.
>> > Besides, we do need at least one CPU to actually report the stall, which
>> > won't happen if that CPU's vCPU is indefinitely preempted.  So there is
>> > only so much I can do about that particular corner case.
>> >
>> > So how does the following (untested) patch work for you?
>>
>> Looks good to me.
>>
>> We run on VMs, so we can well have vCPU preemption.
>
> Very good!  Please do get me a Tested-by when you get to that point.


Unfortunately I don't have a good way to test it until it's submitted
upstream. While we are seeing thousands of such instances, they happen
episodically on a farm of test machines. But they are still harmful,
especially when the system tries to reproduce a bug: it's mid-way
through and thinks it got a hook, but then suddenly, boom! It gets
some mess that it can't parse, and now it does not know whether it's
still the same bug or a different bug triggered by the same program,
so it does not know how to properly attribute the reproducer.
You can see these cases as they happen here (under the report/log
links in the table):
https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
Once the patch is submitted, the rate should go down.



>                                                         Thanx, Paul
>
>> > ------------------------------------------------------------------------
>> >
>> > commit 6a5ab1e68f8636d8823bb5a9aee35fc44c2be866
>> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> > Date:   Mon Apr 9 11:04:46 2018 -0700
>> >
>> >     rcu: Exclude near-simultaneous RCU CPU stall warnings
>> >
>> >     There is a two-jiffy delay between the time that a CPU will self-report
>> >     an RCU CPU stall warning and the time that some other CPU will report a
>> >     warning on behalf of the first CPU.  This has worked well in the past,
>> >     but on busy systems, it is possible for the two warnings to overlap,
>> >     which makes interpreting them extremely difficult.
>> >
>> >     This commit therefore uses a cmpxchg-based timing decision that
>> >     allows only one report in a given one-minute period (assuming default
>> >     stall-warning Kconfig parameters).  This approach will of course fail
>> >     if you are seeing minute-long vCPU preemption, but in that case the
>> >     overlapping RCU CPU stall warnings are the least of your worries.
>> >
>> >     Reported-by: Dmitry Vyukov <dvyukov@google.com>
>> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> >
>> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> > index 381b47a68ac6..b7246bcbf633 100644
>> > --- a/kernel/rcu/tree.c
>> > +++ b/kernel/rcu/tree.c
>> > @@ -1429,8 +1429,6 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
>> >                 raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>> >                 return;
>> >         }
>> > -       WRITE_ONCE(rsp->jiffies_stall,
>> > -                  jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
>> >         raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>> >
>> >         /*
>> > @@ -1481,6 +1479,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
>> >                         sched_show_task(current);
>> >                 }
>> >         }
>> > +       /* Rewrite if needed in case of slow consoles. */
>> > +       if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
>> > +               WRITE_ONCE(rsp->jiffies_stall,
>> > +                          jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
>> >
>> >         rcu_check_gp_kthread_starvation(rsp);
>> >
>> > @@ -1525,6 +1527,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
>> >         rcu_dump_cpu_stacks(rsp);
>> >
>> >         raw_spin_lock_irqsave_rcu_node(rnp, flags);
>> > +       /* Rewrite if needed in case of slow consoles. */
>> >         if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
>> >                 WRITE_ONCE(rsp->jiffies_stall,
>> >                            jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
>> > @@ -1548,6 +1551,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>> >         unsigned long gpnum;
>> >         unsigned long gps;
>> >         unsigned long j;
>> > +       unsigned long jn;
>> >         unsigned long js;
>> >         struct rcu_node *rnp;
>> >
>> > @@ -1586,14 +1590,17 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>> >             ULONG_CMP_GE(gps, js))
>> >                 return; /* No stall or GP completed since entering function. */
>> >         rnp = rdp->mynode;
>> > +       jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
>> >         if (rcu_gp_in_progress(rsp) &&
>> > -           (READ_ONCE(rnp->qsmask) & rdp->grpmask)) {
>> > +           (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
>> > +           cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
>> >
>> >                 /* We haven't checked in, so go dump stack. */
>> >                 print_cpu_stall(rsp);
>> >
>> >         } else if (rcu_gp_in_progress(rsp) &&
>> > -                  ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) {
>> > +                  ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
>> > +                  cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
>> >
>> >                 /* They had a few time units to dump stack, so complain. */
>> >                 print_other_cpu_stall(rsp, gpnum);
>> >
>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-11 10:06                             ` Dmitry Vyukov
@ 2018-04-11 19:36                               ` Paul E. McKenney
  2018-04-12  9:39                                 ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2018-04-11 19:36 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Wed, Apr 11, 2018 at 12:06:27PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 10, 2018 at 7:02 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >> >> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> > Hello,
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
> >> >> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> >> >> >> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> >> >> >> >> >> > details.
> >> >> >> >> >> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
> >> >> >> >> >> >> > playing around with mount options.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >> >> >> >> >> >        Not tainted 4.16.0+ #10
> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x80000002
> >> >> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this position ...
> >> >> >> >> >> >>
> >> >> >> >> >> >> I think this is this guy then:
> >> >> >> >> >> >>
> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >> >> >>
> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >> >> >
> >> >> >> >> >> > Seems likely to me!
> >> >> >> >> >> >
> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> >> >> >> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> >> >> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> >> >> >> deterministically according to priorities. If there is an rcu stall,
> >> >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> >> >> >> >> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> >> >> >> >> >> etc.
> >> >> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> >> >> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
> >> >> >> >> >> >> reports (which is bad).
> >> >> >> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
> >> >> >> >> >> >> task hung detector has a problem that if you set timeout to X, it can
> >> >> >> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> >> >> >> >> >> large timeout (a minute may not be enough), and on the other hand we
> >> >> >> >> >> >> can't wait for an hour just to make sure that the machine is indeed
> >> >> >> >> >> >> dead (these things happen every few minutes).
> >> >> >> >> >> >
> >> >> >> >> >> > I suppose that we could have a global variable that was set to the
> >> >> >> >> >> > priority of the complaint in question, which would suppress all
> >> >> >> >> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
> >> >> >> >> >> > guess that not everyone is going to be happy with one complaint suppressing
> >> >> >> >> >> > others, especially given the possibility that the two complaints might
> >> >> >> >> >> > be about different things.
> >> >> >> >> >> >
> >> >> >> >> >> > Or did you have something more deft in mind?
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> syzkaller generally looks only at the first report. One does not know
> >> >> >> >> >> if/when there will be a second one, or the second one can be induced
> >> >> >> >> >> by the first one, and we generally want clean reports on a non-tainted
> >> >> >> >> >> kernel. So we don't just need to suppress lower priority ones, we need
> >> >> >> >> >> to produce the right report first.
> >> >> >> >> >> I am thinking maybe setting:
> >> >> >> >> >>  - rcu stalls at 1.5 minutes
> >> >> >> >> >>  - workqueue stalls at 2 minutes
> >> >> >> >> >>  - task hungs at 2.5 minutes
> >> >> >> >> >>  - and no output whatsoever at 3 minutes
> >> >> >> >> >> Do I miss anything? I think at least spinlocks. Should they go before
> >> >> >> >> >> or after rcu?
> >> >> >> >> >
> >> >> >> >> > That is what I know of, but the Linux kernel being what it is, there is
> >> >> >> >> > probably something more out there.  If not now, in a few months.  The
> >> >> >> >> > RCU CPU stall timeout can be set on the kernel-boot command line, but
> >> >> >> >> > you probably already knew that.
> >> >> >> >>
> >> >> >> >> Well, it's all based solely on a large number of patches and stopgaps.
> >> >> >> >> If we fix main problems for today, it's already good.
> >> >> >> >
> >> >> >> > Fair enough!
> >> >> >> >
> >> >> >> >> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
> >> >> >> >> > was 1.5 -seconds-.  ;-)
> >> >> >> >>
> >> >> >> >> Have you tried to instrument every basic block with a function call to
> >> >> >> >> collect coverage, check every damn memory access for validity, enable
> >> >> >> >> all thinkable and unthinkable debug configs and put the insanest load
> >> >> >> >> one can imagine from a swarm of parallel threads? It makes things a
> >> >> >> >> bit slower ;)
> >> >> >> >
> >> >> >> > Given that we wouldn't have had enough CPU or memory to accommodate
> >> >> >> > all of that back in DYNIX/ptx days, I am forced to answer "no".  ;-)
> >> >> >> >
> >> >> >> >> >> This will require fixing task hung. Have not yet looked at workqueue detector.
> >> >> >> >> >> Does at least RCU respect the given timeout more or less precisely?
> >> >> >> >> >
> >> >> >> >> > Assuming that there is at least one CPU capable of taking scheduling-clock
> >> >> >> >> > interrupts, it should respect the timeout to within a few jiffies.
> >> >> >>
> >> >> >>
> >> >> >> Hi Paul,
> >> >> >>
> >> >> >> Speaking of stalls and rcu, we are seeing lots of crashes that go like this:
> >> >> >>
> >> >> >> INFO: rcu_sched self-detected stall on CPU[  404.992530] INFO:
> >> >> >> rcu_sched detected stalls on CPUs/tasks:
> >> >> >> INFO: rcu_sched self-detected stall on CPU[  454.347448] INFO:
> >> >> >> rcu_sched detected stalls on CPUs/tasks:
> >> >> >> INFO: rcu_sched self-detected stall on CPU[  396.073634] INFO:
> >> >> >> rcu_sched detected stalls on CPUs/tasks:
> >> >> >>
> >> >> >> or like this:
> >> >> >>
> >> >> >> INFO: rcu_sched self-detected stall on CPU
> >> >> >> INFO: rcu_sched detected stalls on CPUs/tasks:
> >> >> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
> >> >> >> softirq=57641/57641 fqs=31151
> >> >> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
> >> >> >> softirq=57641/57641 fqs=31151
> >> >> >>  (t=125002 jiffies g=31656 c=31655 q=910)
> >> >> >>
> >> >> >>  INFO: rcu_sched self-detected stall on CPU
> >> >> >> INFO: rcu_sched detected stalls on CPUs/tasks:
> >> >> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
> >> >> >> softirq=65194/65194 fqs=31231
> >> >> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
> >> >> >> softirq=65194/65194 fqs=31231
> >> >> >>  (t=125002 jiffies g=34421 c=34420 q=1119)
> >> >> >> (detected by 1, t=125002 jiffies, g=34421, c=34420, q=1119)
> >> >> >>
> >> >> >>
> >> >> >> and then there is an unintelligible mess of 2 reports. Such crashes go
> >> >> >> to trash bin, because we can't even say which function hanged. It
> >> >> >> seems that in all cases 2 different rcu stall detection facilities
> >> >> >> race with each other. Is it possible to make them not race?
> >> >> >
> >> >> > How about the following (untested, not for mainline) patch?  It suppresses
> >> >> > all but the "main" RCU flavor, which is rcu_sched for !PREEMPT builds and
> >> >> > rcu_preempt otherwise.  Either way, this is the RCU flavor corresponding
> >> >> > to synchronize_rcu().  This works well in the common case where there
> >> >> > is almost always an RCU grace period in flight.
> >> >> >
> >> >> > One reason that this patch is not for mainline is that I am working on
> >> >> > merging the RCU-bh, RCU-preempt, and RCU-sched flavors into one thing,
> >> >> > at which point there won't be any races.  But that might be a couple
> >> >> > merge windows away from now.
> >> >> >
> >> >> >                                                         Thanx, Paul
> >> >> >
> >> >> > ------------------------------------------------------------------------
> >> >> >
> >> >> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> >> > index 381b47a68ac6..31f7818f2d63 100644
> >> >> > --- a/kernel/rcu/tree.c
> >> >> > +++ b/kernel/rcu/tree.c
> >> >> > @@ -1552,7 +1552,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
> >> >> >         struct rcu_node *rnp;
> >> >> >
> >> >> >         if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
> >> >> > -           !rcu_gp_in_progress(rsp))
> >> >> > +           !rcu_gp_in_progress(rsp) || rsp != rcu_state_p)
> >> >> >                 return;
> >> >> >         rcu_stall_kick_kthreads(rsp);
> >> >> >         j = jiffies;
> >> >>
> >> >> But doesn't they both relate to the same rcu flavor? They both say
> >> >> rcu_sched. I assumed that the difference is "self-detected" vs "on
> >> >> CPUs/tasks", i.e. on the current CPU vs on other CPUs.
> >> >
> >> > Right you are!
> >> >
> >> > One approach would be to increase the value of RCU_STALL_RAT_DELAY,
> >> > which is currently two jiffies to (say) 20 jiffies.  This is in
> >> > kernel/rcu/tree.h.  But this would fail on a sufficiently overloaded
> >> > system -- and the failure of the two-jiffy delay is a bit of a surprise,
> >> > given interrupts disabled and all that.  Are you by any chance loaded
> >> > heavily enough to see vCPU preemption?
> >> >
> >> > I could avoid at least some of these timing issues instead using cmpxchg()
> >> > on ->jiffies_stall to allow only one CPU in, but leave the non-atomic
> >> > update to discourage overly long stall prints from running into the
> >> > next one.  This is not perfect, either, and is roughly equivalent to
> >> > setting RCU_STALL_RAT_DELAY to many second's worth of jiffies, but
> >> > avoiding that minute's delay.  But it should get rid of the duplication
> >> > in almost all cases, though it could allow a stall warning to overlap
> >> > with a later stall warning for that same grace period.  Which can
> >> > already happen anyway.  Also, a tens-of-seconds vCPU preemption can
> >> > still cause concurrent stall warnings, but if that is happening to you,
> >> > the concurrent stall warnings are probably the least of your problems.
> >> > Besides, we do need at least one CPU to actually report the stall, which
> >> > won't happen if that CPU's vCPU is indefinitely preempted.  So there is
> >> > only so much I can do about that particular corner case.
> >> >
> >> > So how does the following (untested) patch work for you?
> >>
> >> Looks good to me.
> >>
> >> We run on VMs, so we can well have vCPU preemption.
> >
> > Very good!  Please do get me a Tested-by when you get to that point.
> 
> Unfortunately I don't have a good way to test it until it's submitted
> upstream. While we are seeing thousands of such instances, they happen
> episodically on a farm of test machines. But they are still harmful,
> especially when the system tries to reproduce a bug, because it's
> mid-way through and thinks it got a hook, but then suddenly boom! it
> gets some mess that it can't parse and now it does not know if it's
> still the same bug, or maybe a different bug triggered by the same
> program, so it does not know how to properly attribute the reproducer.
> You can see these cases as they happen here (under report/log links in
> the table):
> https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
> When the patch is submitted, the rate should go down.

OK, I will bite...  How do you test fixes to problems that syzkaller finds?

                                                        Thanx, Paul

> >> > ------------------------------------------------------------------------
> >> >
> >> > commit 6a5ab1e68f8636d8823bb5a9aee35fc44c2be866
> >> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >> > Date:   Mon Apr 9 11:04:46 2018 -0700
> >> >
> >> >     rcu: Exclude near-simultaneous RCU CPU stall warnings
> >> >
> >> >     There is a two-jiffy delay between the time that a CPU will self-report
> >> >     an RCU CPU stall warning and the time that some other CPU will report a
> >> >     warning on behalf of the first CPU.  This has worked well in the past,
> >> >     but on busy systems, it is possible for the two warnings to overlap,
> >> >     which makes interpreting them extremely difficult.
> >> >
> >> >     This commit therefore uses a cmpxchg-based timing decision that
> >> >     allows only one report in a given one-minute period (assuming default
> >> >     stall-warning Kconfig parameters).  This approach will of course fail
> >> >     if you are seeing minute-long vCPU preemption, but in that case the
> >> >     overlapping RCU CPU stall warnings are the least of your worries.
> >> >
> >> >     Reported-by: Dmitry Vyukov <dvyukov@google.com>
> >> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >> >
> >> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> > index 381b47a68ac6..b7246bcbf633 100644
> >> > --- a/kernel/rcu/tree.c
> >> > +++ b/kernel/rcu/tree.c
> >> > @@ -1429,8 +1429,6 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
> >> >                 raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >> >                 return;
> >> >         }
> >> > -       WRITE_ONCE(rsp->jiffies_stall,
> >> > -                  jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
> >> >         raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >> >
> >> >         /*
> >> > @@ -1481,6 +1479,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
> >> >                         sched_show_task(current);
> >> >                 }
> >> >         }
> >> > +       /* Rewrite if needed in case of slow consoles. */
> >> > +       if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
> >> > +               WRITE_ONCE(rsp->jiffies_stall,
> >> > +                          jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
> >> >
> >> >         rcu_check_gp_kthread_starvation(rsp);
> >> >
> >> > @@ -1525,6 +1527,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
> >> >         rcu_dump_cpu_stacks(rsp);
> >> >
> >> >         raw_spin_lock_irqsave_rcu_node(rnp, flags);
> >> > +       /* Rewrite if needed in case of slow consoles. */
> >> >         if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
> >> >                 WRITE_ONCE(rsp->jiffies_stall,
> >> >                            jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
> >> > @@ -1548,6 +1551,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
> >> >         unsigned long gpnum;
> >> >         unsigned long gps;
> >> >         unsigned long j;
> >> > +       unsigned long jn;
> >> >         unsigned long js;
> >> >         struct rcu_node *rnp;
> >> >
> >> > @@ -1586,14 +1590,17 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
> >> >             ULONG_CMP_GE(gps, js))
> >> >                 return; /* No stall or GP completed since entering function. */
> >> >         rnp = rdp->mynode;
> >> > +       jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> >> >         if (rcu_gp_in_progress(rsp) &&
> >> > -           (READ_ONCE(rnp->qsmask) & rdp->grpmask)) {
> >> > +           (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> >> > +           cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
> >> >
> >> >                 /* We haven't checked in, so go dump stack. */
> >> >                 print_cpu_stall(rsp);
> >> >
> >> >         } else if (rcu_gp_in_progress(rsp) &&
> >> > -                  ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) {
> >> > +                  ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> >> > +                  cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
> >> >
> >> >                 /* They had a few time units to dump stack, so complain. */
> >> >                 print_other_cpu_stall(rsp, gpnum);
> >> >
> >>
> >
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-11 19:36                               ` Paul E. McKenney
@ 2018-04-12  9:39                                 ` Dmitry Vyukov
  2018-04-12 15:07                                   ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2018-04-12  9:39 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Wed, Apr 11, 2018 at 9:36 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>> >> >> >> <paulmck@linux.vnet.ibm.com> wrote:
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
>> >> >> >> >> >> >> >> > Linux 4.16
>> >> >> >> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> >> >> >> >> >> > Raw console output:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> >> >> >> > Kernel config:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> >> >> >> >> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
>> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> >> >> >> >> >> >> > details.
>> >> >> >> >> >> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> >> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
>> >> >> >> >> >> >> > playing around with mount options.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >> >> >> >> >> >        Not tainted 4.16.0+ #10
>> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x80000002
>> >> >> >> >> >> >> >> > Call Trace:
>> >> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is preventing
>> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this position ...
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> I think this is this guy then:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >> >> >> >
>> >> >> >> >> >> > Seems likely to me!
>> >> >> >> >> >> >
>> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> >> >> >> >> >> think we need some kind of priority between them. I.e. we have rcu
>> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> >> >> >> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> >> >> >> >> deterministically according to priorities. If there is an rcu stall,
>> >> >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> >> >> >> >> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> >> >> >> >> >> etc.
>> >> >> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> >> >> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> >> >> >> >> >> reports (which is bad).
>> >> >> >> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> >> >> >> >> >> task hung detector has a problem that if you set timeout to X, it can
>> >> >> >> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> >> >> >> >> >> large timeout (a minute may not be enough), and on the other hand we
>> >> >> >> >> >> >> can't wait for an hour just to make sure that the machine is indeed
>> >> >> >> >> >> >> dead (these things happen every few minutes).
>> >> >> >> >> >> >
>> >> >> >> >> >> > I suppose that we could have a global variable that was set to the
>> >> >> >> >> >> > priority of the complaint in question, which would suppress all
>> >> >> >> >> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> >> >> >> >> >> > guess that not everyone is going to be happy with one complaint suppressing
>> >> >> >> >> >> > others, especially given the possibility that the two complaints might
>> >> >> >> >> >> > be about different things.
>> >> >> >> >> >> >
>> >> >> >> >> >> > Or did you have something more deft in mind?
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> syzkaller generally looks only at the first report. One does not know
>> >> >> >> >> >> if/when there will be a second one, or the second one can be induced
>> >> >> >> >> >> by the first one, and we generally want clean reports on a non-tainted
>> >> >> >> >> >> kernel. So we don't just need to suppress lower priority ones, we need
>> >> >> >> >> >> to produce the right report first.
>> >> >> >> >> >> I am thinking maybe setting:
>> >> >> >> >> >>  - rcu stalls at 1.5 minutes
>> >> >> >> >> >>  - workqueue stalls at 2 minutes
>> >> >> >> >> >>  - task hungs at 2.5 minutes
>> >> >> >> >> >>  - and no output whatsoever at 3 minutes
>> >> >> >> >> >> Do I miss anything? I think at least spinlocks. Should they go before
>> >> >> >> >> >> or after rcu?
>> >> >> >> >> >
>> >> >> >> >> > That is what I know of, but the Linux kernel being what it is, there is
>> >> >> >> >> > probably something more out there.  If not now, in a few months.  The
>> >> >> >> >> > RCU CPU stall timeout can be set on the kernel-boot command line, but
>> >> >> >> >> > you probably already knew that.
>> >> >> >> >>
>> >> >> >> >> Well, it's all based solely on a large number of patches and stopgaps.
>> >> >> >> >> If we fix main problems for today, it's already good.
>> >> >> >> >
>> >> >> >> > Fair enough!
>> >> >> >> >
>> >> >> >> >> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
>> >> >> >> >> > was 1.5 -seconds-.  ;-)
>> >> >> >> >>
>> >> >> >> >> Have you tried to instrument every basic block with a function call to
>> >> >> >> >> collect coverage, check every damn memory access for validity, enable
>> >> >> >> >> all thinkable and unthinkable debug configs and put the insanest load
>> >> >> >> >> one can imagine from a swarm of parallel threads? It makes things a
>> >> >> >> >> bit slower ;)
>> >> >> >> >
>> >> >> >> > Given that we wouldn't have had enough CPU or memory to accommodate
>> >> >> >> > all of that back in DYNIX/ptx days, I am forced to answer "no".  ;-)
>> >> >> >> >
>> >> >> >> >> >> This will require fixing task hung. Have not yet looked at workqueue detector.
>> >> >> >> >> >> Does at least RCU respect the given timeout more or less precisely?
>> >> >> >> >> >
>> >> >> >> >> > Assuming that there is at least one CPU capable of taking scheduling-clock
>> >> >> >> >> > interrupts, it should respect the timeout to within a few jiffies.
>> >> >> >>
>> >> >> >>
>> >> >> >> Hi Paul,
>> >> >> >>
>> >> >> >> Speaking of stalls and rcu, we are seeing lots of crashes that go like this:
>> >> >> >>
>> >> >> >> INFO: rcu_sched self-detected stall on CPU[  404.992530] INFO:
>> >> >> >> rcu_sched detected stalls on CPUs/tasks:
>> >> >> >> INFO: rcu_sched self-detected stall on CPU[  454.347448] INFO:
>> >> >> >> rcu_sched detected stalls on CPUs/tasks:
>> >> >> >> INFO: rcu_sched self-detected stall on CPU[  396.073634] INFO:
>> >> >> >> rcu_sched detected stalls on CPUs/tasks:
>> >> >> >>
>> >> >> >> or like this:
>> >> >> >>
>> >> >> >> INFO: rcu_sched self-detected stall on CPU
>> >> >> >> INFO: rcu_sched detected stalls on CPUs/tasks:
>> >> >> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
>> >> >> >> softirq=57641/57641 fqs=31151
>> >> >> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
>> >> >> >> softirq=57641/57641 fqs=31151
>> >> >> >>  (t=125002 jiffies g=31656 c=31655 q=910)
>> >> >> >>
>> >> >> >>  INFO: rcu_sched self-detected stall on CPU
>> >> >> >> INFO: rcu_sched detected stalls on CPUs/tasks:
>> >> >> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
>> >> >> >> softirq=65194/65194 fqs=31231
>> >> >> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
>> >> >> >> softirq=65194/65194 fqs=31231
>> >> >> >>  (t=125002 jiffies g=34421 c=34420 q=1119)
>> >> >> >> (detected by 1, t=125002 jiffies, g=34421, c=34420, q=1119)
>> >> >> >>
>> >> >> >>
>> >> >> >> and then there is an unintelligible mess of 2 reports. Such crashes go
>> >> >> >> to trash bin, because we can't even say which function hanged. It
>> >> >> >> seems that in all cases 2 different rcu stall detection facilities
>> >> >> >> race with each other. Is it possible to make them not race?
>> >> >> >
>> >> >> > How about the following (untested, not for mainline) patch?  It suppresses
>> >> >> > all but the "main" RCU flavor, which is rcu_sched for !PREEMPT builds and
>> >> >> > rcu_preempt otherwise.  Either way, this is the RCU flavor corresponding
>> >> >> > to synchronize_rcu().  This works well in the common case where there
>> >> >> > is almost always an RCU grace period in flight.
>> >> >> >
>> >> >> > One reason that this patch is not for mainline is that I am working on
>> >> >> > merging the RCU-bh, RCU-preempt, and RCU-sched flavors into one thing,
>> >> >> > at which point there won't be any races.  But that might be a couple
>> >> >> > merge windows away from now.
>> >> >> >
>> >> >> >                                                         Thanx, Paul
>> >> >> >
>> >> >> > ------------------------------------------------------------------------
>> >> >> >
>> >> >> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> >> >> > index 381b47a68ac6..31f7818f2d63 100644
>> >> >> > --- a/kernel/rcu/tree.c
>> >> >> > +++ b/kernel/rcu/tree.c
>> >> >> > @@ -1552,7 +1552,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>> >> >> >         struct rcu_node *rnp;
>> >> >> >
>> >> >> >         if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
>> >> >> > -           !rcu_gp_in_progress(rsp))
>> >> >> > +           !rcu_gp_in_progress(rsp) || rsp != rcu_state_p)
>> >> >> >                 return;
>> >> >> >         rcu_stall_kick_kthreads(rsp);
>> >> >> >         j = jiffies;
>> >> >>
>> >> >> But doesn't they both relate to the same rcu flavor? They both say
>> >> >> rcu_sched. I assumed that the difference is "self-detected" vs "on
>> >> >> CPUs/tasks", i.e. on the current CPU vs on other CPUs.
>> >> >
>> >> > Right you are!
>> >> >
>> >> > One approach would be to increase the value of RCU_STALL_RAT_DELAY,
>> >> > which is currently two jiffies to (say) 20 jiffies.  This is in
>> >> > kernel/rcu/tree.h.  But this would fail on a sufficiently overloaded
>> >> > system -- and the failure of the two-jiffy delay is a bit of a surprise,
>> >> > given interrupts disabled and all that.  Are you by any chance loaded
>> >> > heavily enough to see vCPU preemption?
>> >> >
>> >> > I could avoid at least some of these timing issues instead using cmpxchg()
>> >> > on ->jiffies_stall to allow only one CPU in, but leave the non-atomic
>> >> > update to discourage overly long stall prints from running into the
>> >> > next one.  This is not perfect, either, and is roughly equivalent to
>> >> > setting RCU_STALL_RAT_DELAY to many second's worth of jiffies, but
>> >> > avoiding that minute's delay.  But it should get rid of the duplication
>> >> > in almost all cases, though it could allow a stall warning to overlap
>> >> > with a later stall warning for that same grace period.  Which can
>> >> > already happen anyway.  Also, a tens-of-seconds vCPU preemption can
>> >> > still cause concurrent stall warnings, but if that is happening to you,
>> >> > the concurrent stall warnings are probably the least of your problems.
>> >> > Besides, we do need at least one CPU to actually report the stall, which
>> >> > won't happen if that CPU's vCPU is indefinitely preempted.  So there is
>> >> > only so much I can do about that particular corner case.
>> >> >
>> >> > So how does the following (untested) patch work for you?
>> >>
>> >> Looks good to me.
>> >>
>> >> We run on VMs, so we can well have vCPU preemption.
>> >
>> > Very good!  Please do get me a Tested-by when you get to that point.
>>
>> Unfortunately I don't have a good way to test it until it's submitted
>> upstream. While we are seeing thousands of such instances, they happen
>> episodically on a farm of test machines. But they are still harmful,
>> especially when the system tries to reproduce a bug, because it's
>> mid-way through and thinks it got a hook, but then suddenly boom! it
>> gets some mess that it can't parse and now it does not know if it's
>> still the same bug, or maybe a different bug triggered by the same
>> program, so it does not know how to properly attribute the reproducer.
>> You can see these cases as they happen here (under report/log links in
>> the table):
>> https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
>> When the patch is submitted, the rate should go down.
>
> OK, I will bite...  How do you test fixes to problems that syzkaller finds?

I don't. I can't. No one can test that many fixes.

Normally syzbot provides reproducers for bugs. Then you have two
choices: (1) test it yourself (if you debugged it, you probably
already have everything set up for this), or (2) ask syzbot to test
the patch on this particular reproducer.
Some bugs don't have reproducers. Then you either localize the bug and
write a test, or go with the good old "it must be correct, right?".
Even in the second case, syzbot will notify you if the bug happens
again after the fix has landed; if it stays silent, then presumably
the fix indeed fixed the bug.
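
As a rough sketch of option (2), assuming the usual syzbot
test-request syntax: you reply to the syzbot report with a line like

  #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

replacing the tree and branch with whatever you want tested, and with
the candidate patch inline or attached; syzbot then applies the patch
to that tree, builds a kernel, runs the reproducer against it, and
replies with the result.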

Now, this is not a syzbot bug (syzbot reports bugs itself from its own
email address). This is more like you looked at somebody else's dmesg
and went "oh, this looks bad, let me copy-paste and report it".
So we can also go with the good old "it must be correct, right?" and
assess how well it goes after a few weeks, once it reaches syzbot, or
someone needs to write a test for rcu.

This could have been handled with some kind of "cluster-wide" test,
but I don't see how it is feasible. See this for details:
https://groups.google.com/d/msg/syzkaller-bugs/7ucgCkAJKSk/skZjgavRAQAJ
Especially the part where someone would need to go through and triage
hundreds of crashes, assess that they are not related to the new
patch, and then do something with them afterwards.




>> >> > ------------------------------------------------------------------------
>> >> >
>> >> > commit 6a5ab1e68f8636d8823bb5a9aee35fc44c2be866
>> >> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> >> > Date:   Mon Apr 9 11:04:46 2018 -0700
>> >> >
>> >> >     rcu: Exclude near-simultaneous RCU CPU stall warnings
>> >> >
>> >> >     There is a two-jiffy delay between the time that a CPU will self-report
>> >> >     an RCU CPU stall warning and the time that some other CPU will report a
>> >> >     warning on behalf of the first CPU.  This has worked well in the past,
>> >> >     but on busy systems, it is possible for the two warnings to overlap,
>> >> >     which makes interpreting them extremely difficult.
>> >> >
>> >> >     This commit therefore uses a cmpxchg-based timing decision that
>> >> >     allows only one report in a given one-minute period (assuming default
>> >> >     stall-warning Kconfig parameters).  This approach will of course fail
>> >> >     if you are seeing minute-long vCPU preemption, but in that case the
>> >> >     overlapping RCU CPU stall warnings are the least of your worries.
>> >> >
>> >> >     Reported-by: Dmitry Vyukov <dvyukov@google.com>
>> >> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> >> >
>> >> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> >> > index 381b47a68ac6..b7246bcbf633 100644
>> >> > --- a/kernel/rcu/tree.c
>> >> > +++ b/kernel/rcu/tree.c
>> >> > @@ -1429,8 +1429,6 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
>> >> >                 raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>> >> >                 return;
>> >> >         }
>> >> > -       WRITE_ONCE(rsp->jiffies_stall,
>> >> > -                  jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
>> >> >         raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>> >> >
>> >> >         /*
>> >> > @@ -1481,6 +1479,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
>> >> >                         sched_show_task(current);
>> >> >                 }
>> >> >         }
>> >> > +       /* Rewrite if needed in case of slow consoles. */
>> >> > +       if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
>> >> > +               WRITE_ONCE(rsp->jiffies_stall,
>> >> > +                          jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
>> >> >
>> >> >         rcu_check_gp_kthread_starvation(rsp);
>> >> >
>> >> > @@ -1525,6 +1527,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
>> >> >         rcu_dump_cpu_stacks(rsp);
>> >> >
>> >> >         raw_spin_lock_irqsave_rcu_node(rnp, flags);
>> >> > +       /* Rewrite if needed in case of slow consoles. */
>> >> >         if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
>> >> >                 WRITE_ONCE(rsp->jiffies_stall,
>> >> >                            jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
>> >> > @@ -1548,6 +1551,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>> >> >         unsigned long gpnum;
>> >> >         unsigned long gps;
>> >> >         unsigned long j;
>> >> > +       unsigned long jn;
>> >> >         unsigned long js;
>> >> >         struct rcu_node *rnp;
>> >> >
>> >> > @@ -1586,14 +1590,17 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>> >> >             ULONG_CMP_GE(gps, js))
>> >> >                 return; /* No stall or GP completed since entering function. */
>> >> >         rnp = rdp->mynode;
>> >> > +       jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
>> >> >         if (rcu_gp_in_progress(rsp) &&
>> >> > -           (READ_ONCE(rnp->qsmask) & rdp->grpmask)) {
>> >> > +           (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
>> >> > +           cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
>> >> >
>> >> >                 /* We haven't checked in, so go dump stack. */
>> >> >                 print_cpu_stall(rsp);
>> >> >
>> >> >         } else if (rcu_gp_in_progress(rsp) &&
>> >> > -                  ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) {
>> >> > +                  ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
>> >> > +                  cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
>> >> >
>> >> >                 /* They had a few time units to dump stack, so complain. */
>> >> >                 print_other_cpu_stall(rsp, gpnum);
>> >> >
>> >>
>> >
>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: INFO: task hung in perf_trace_event_unreg
  2018-04-12  9:39                                 ` Dmitry Vyukov
@ 2018-04-12 15:07                                   ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2018-04-12 15:07 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs,
	Peter Zijlstra, syzkaller

On Thu, Apr 12, 2018 at 11:39:42AM +0200, Dmitry Vyukov wrote:
> On Wed, Apr 11, 2018 at 9:36 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> >> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> > Hello,
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
> >> >> >> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> >> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> >> >> >> >> >> >> > Reported-by: syzbot+2dbc55da20fa246378fd@syzkaller.appspotmail.com
> >> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> >> >> >> >> >> >> > details.
> >> >> >> >> >> >> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> >> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
> >> >> >> >> >> >> >> > playing around with mount options.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >> >> >> >> >> >> >        Not tainted 4.16.0+ #10
> >> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x80000002
> >> >> >> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this position ...
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> I think this is this guy then:
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Seems likely to me!
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> >> >> >> >> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> >> >> >> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> >> >> >> >> deterministically according to priorities. If there is an rcu stall,
> >> >> >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> >> >> >> >> >> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> >> >> >> >> >> >> etc.
> >> >> >> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> >> >> >> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
> >> >> >> >> >> >> >> reports (which is bad).
> >> >> >> >> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
> >> >> >> >> >> >> >> task hung detector has a problem that if you set timeout to X, it can
> >> >> >> >> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> >> >> >> >> >> >> large timeout (a minute may not be enough), and on the other hand we
> >> >> >> >> >> >> >> can't wait for an hour just to make sure that the machine is indeed
> >> >> >> >> >> >> >> dead (these things happen every few minutes).
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > I suppose that we could have a global variable that was set to the
> >> >> >> >> >> >> > priority of the complaint in question, which would suppress all
> >> >> >> >> >> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
> >> >> >> >> >> >> > guess that not everyone is going to be happy with one complaint suppressing
> >> >> >> >> >> >> > others, especially given the possibility that the two complaints might
> >> >> >> >> >> >> > be about different things.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Or did you have something more deft in mind?
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> syzkaller generally looks only at the first report. One does not know
> >> >> >> >> >> >> if/when there will be a second one, or the second one can be induced
> >> >> >> >> >> >> by the first one, and we generally want clean reports on a non-tainted
> >> >> >> >> >> >> kernel. So we don't just need to suppress lower priority ones, we need
> >> >> >> >> >> >> to produce the right report first.
> >> >> >> >> >> >> I am thinking maybe setting:
> >> >> >> >> >> >>  - rcu stalls at 1.5 minutes
> >> >> >> >> >> >>  - workqueue stalls at 2 minutes
> >> >> >> >> >> >>  - task hangs at 2.5 minutes
> >> >> >> >> >> >>  - and no output whatsoever at 3 minutes
> >> >> >> >> >> >> Am I missing anything? I think at least spinlocks. Should they go before
> >> >> >> >> >> >> or after rcu?
> >> >> >> >> >> >
> >> >> >> >> >> > That is what I know of, but the Linux kernel being what it is, there is
> >> >> >> >> >> > probably something more out there.  If not now, in a few months.  The
> >> >> >> >> >> > RCU CPU stall timeout can be set on the kernel-boot command line, but
> >> >> >> >> >> > you probably already knew that.
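
(Purely as an illustration of how Dmitry's tiers above could map onto the
existing knobs -- the parameter names are the standard ones, but the values
are just the proposed tiers, so treat this as a sketch rather than a tested
recipe:

  rcupdate.rcu_cpu_stall_timeout=90    [RCU CPU stall warnings at 1.5 min]
  workqueue.watchdog_thresh=120        [workqueue watchdog (CONFIG_WQ_WATCHDOG) at 2 min]

on the kernel command line, plus

  echo 150 > /proc/sys/kernel/hung_task_timeout_secs    [task hangs at 2.5 min]

at run time.  The "no output whatsoever" tier has no kernel knob; presumably
that one lives on the syzkaller side.)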
> >> >> >> >> >>
> >> >> >> >> >> Well, it's all based solely on a large number of patches and stopgaps.
> >> >> >> >> >> If we fix the main problems for today, that's already good.
> >> >> >> >> >
> >> >> >> >> > Fair enough!
> >> >> >> >> >
> >> >> >> >> >> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
> >> >> >> >> >> > was 1.5 -seconds-.  ;-)
> >> >> >> >> >>
> >> >> >> >> >> Have you tried to instrument every basic block with a function call to
> >> >> >> >> >> collect coverage, check every damn memory access for validity, enable
> >> >> >> >> >> all thinkable and unthinkable debug configs and put the insanest load
> >> >> >> >> >> one can imagine from a swarm of parallel threads? It makes things a
> >> >> >> >> >> bit slower ;)
> >> >> >> >> >
> >> >> >> >> > Given that we wouldn't have had enough CPU or memory to accommodate
> >> >> >> >> > all of that back in DYNIX/ptx days, I am forced to answer "no".  ;-)
> >> >> >> >> >
> >> >> >> >> >> >> This will require fixing the task hung detector. I have not yet
> >> >> >> >> >> >> looked at the workqueue detector.
> >> >> >> >> >> >> Does at least RCU respect the given timeout more or less precisely?
> >> >> >> >> >> >
> >> >> >> >> >> > Assuming that there is at least one CPU capable of taking scheduling-clock
> >> >> >> >> >> > interrupts, it should respect the timeout to within a few jiffies.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Hi Paul,
> >> >> >> >>
> >> >> >> >> Speaking of stalls and rcu, we are seeing lots of crashes that go like this:
> >> >> >> >>
> >> >> >> >> INFO: rcu_sched self-detected stall on CPU[  404.992530] INFO:
> >> >> >> >> rcu_sched detected stalls on CPUs/tasks:
> >> >> >> >> INFO: rcu_sched self-detected stall on CPU[  454.347448] INFO:
> >> >> >> >> rcu_sched detected stalls on CPUs/tasks:
> >> >> >> >> INFO: rcu_sched self-detected stall on CPU[  396.073634] INFO:
> >> >> >> >> rcu_sched detected stalls on CPUs/tasks:
> >> >> >> >>
> >> >> >> >> or like this:
> >> >> >> >>
> >> >> >> >> INFO: rcu_sched self-detected stall on CPU
> >> >> >> >> INFO: rcu_sched detected stalls on CPUs/tasks:
> >> >> >> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
> >> >> >> >> softirq=57641/57641 fqs=31151
> >> >> >> >> 0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
> >> >> >> >> softirq=57641/57641 fqs=31151
> >> >> >> >>  (t=125002 jiffies g=31656 c=31655 q=910)
> >> >> >> >>
> >> >> >> >>  INFO: rcu_sched self-detected stall on CPU
> >> >> >> >> INFO: rcu_sched detected stalls on CPUs/tasks:
> >> >> >> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
> >> >> >> >> softirq=65194/65194 fqs=31231
> >> >> >> >> 0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
> >> >> >> >> softirq=65194/65194 fqs=31231
> >> >> >> >>  (t=125002 jiffies g=34421 c=34420 q=1119)
> >> >> >> >> (detected by 1, t=125002 jiffies, g=34421, c=34420, q=1119)
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> and then there is an unintelligible mess of 2 reports. Such crashes go
> >> >> >> >> to the trash bin, because we can't even say which function hung. It
> >> >> >> >> seems that in all cases 2 different rcu stall detection facilities
> >> >> >> >> race with each other. Is it possible to make them not race?
> >> >> >> >
> >> >> >> > How about the following (untested, not for mainline) patch?  It suppresses
> >> >> >> > all but the "main" RCU flavor, which is rcu_sched for !PREEMPT builds and
> >> >> >> > rcu_preempt otherwise.  Either way, this is the RCU flavor corresponding
> >> >> >> > to synchronize_rcu().  This works well in the common case where there
> >> >> >> > is almost always an RCU grace period in flight.
> >> >> >> >
> >> >> >> > One reason that this patch is not for mainline is that I am working on
> >> >> >> > merging the RCU-bh, RCU-preempt, and RCU-sched flavors into one thing,
> >> >> >> > at which point there won't be any races.  But that might be a couple
> >> >> >> > merge windows away from now.
> >> >> >> >
> >> >> >> >                                                         Thanx, Paul
> >> >> >> >
> >> >> >> > ------------------------------------------------------------------------
> >> >> >> >
> >> >> >> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> >> >> > index 381b47a68ac6..31f7818f2d63 100644
> >> >> >> > --- a/kernel/rcu/tree.c
> >> >> >> > +++ b/kernel/rcu/tree.c
> >> >> >> > @@ -1552,7 +1552,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
> >> >> >> >         struct rcu_node *rnp;
> >> >> >> >
> >> >> >> >         if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
> >> >> >> > -           !rcu_gp_in_progress(rsp))
> >> >> >> > +           !rcu_gp_in_progress(rsp) || rsp != rcu_state_p)
> >> >> >> >                 return;
> >> >> >> >         rcu_stall_kick_kthreads(rsp);
> >> >> >> >         j = jiffies;
> >> >> >>
> >> >> >> But don't they both relate to the same rcu flavor? They both say
> >> >> >> rcu_sched. I assumed that the difference is "self-detected" vs "on
> >> >> >> CPUs/tasks", i.e. on the current CPU vs on other CPUs.
> >> >> >
> >> >> > Right you are!
> >> >> >
> >> >> > One approach would be to increase the value of RCU_STALL_RAT_DELAY,
> >> >> > which is currently two jiffies, to (say) 20 jiffies.  This is in
> >> >> > kernel/rcu/tree.h.  But this would fail on a sufficiently overloaded
> >> >> > system -- and the failure of the two-jiffy delay is a bit of a surprise,
> >> >> > given interrupts disabled and all that.  Are you by any chance loaded
> >> >> > heavily enough to see vCPU preemption?
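
(For concreteness -- an illustrative sketch only; the exact definition in
kernel/rcu/tree.h may be formatted a bit differently:

	-#define RCU_STALL_RAT_DELAY	2
	+#define RCU_STALL_RAT_DELAY	20

i.e. give the stalling CPU a few more ticks to report its own stall before
another CPU rats on it.)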
> >> >> >
> >> >> > I could avoid at least some of these timing issues instead using cmpxchg()
> >> >> > on ->jiffies_stall to allow only one CPU in, but leave the non-atomic
> >> >> > update to discourage overly long stall prints from running into the
> >> >> > next one.  This is not perfect, either, and is roughly equivalent to
> >> >> > setting RCU_STALL_RAT_DELAY to many seconds' worth of jiffies, but
> >> >> > avoiding that minute's delay.  But it should get rid of the duplication
> >> >> > in almost all cases, though it could allow a stall warning to overlap
> >> >> > with a later stall warning for that same grace period.  Which can
> >> >> > already happen anyway.  Also, a tens-of-seconds vCPU preemption can
> >> >> > still cause concurrent stall warnings, but if that is happening to you,
> >> >> > the concurrent stall warnings are probably the least of your problems.
> >> >> > Besides, we do need at least one CPU to actually report the stall, which
> >> >> > won't happen if that CPU's vCPU is indefinitely preempted.  So there is
> >> >> > only so much I can do about that particular corner case.
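
To make the cmpxchg() idea above concrete, here is an illustrative sketch
only -- not the actual patch; print_stall_warning() is a made-up stand-in
for the real reporting path:

	static void check_cpu_stall_sketch(struct rcu_state *rsp)
	{
		unsigned long j = jiffies;
		unsigned long js = READ_ONCE(rsp->jiffies_stall);

		if (time_before(j, js))
			return;		/* No stall (yet). */

		/*
		 * Push ->jiffies_stall out atomically; only the CPU whose
		 * cmpxchg() wins gets to report this stall, so concurrent
		 * detections no longer interleave their output.
		 */
		if (cmpxchg(&rsp->jiffies_stall, js, j + 3 * HZ) != js)
			return;		/* Some other CPU beat us to it. */

		print_stall_warning(rsp);
	}

The 3*HZ push-out is arbitrary here; the point is only that a single winner
does the printing, while the existing non-atomic update after the report is
left alone, as described above.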
> >> >> >
> >> >> > So how does the following (untested) patch work for you?
> >> >>
> >> >> Looks good to me.
> >> >>
> >> >> We run on VMs, so we can well have vCPU preemption.
> >> >
> >> > Very good!  Please do get me a Tested-by when you get to that point.
> >>
> >> Unfortunately I don't have a good way to test it until it's submitted
> >> upstream. While we are seeing thousands of such instances, they happen
> >> episodically on a farm of test machines. But they are still harmful,
> >> especially when the system tries to reproduce a bug: it's mid-way
> >> through and thinks it has a hook, but then suddenly, boom, it gets
> >> some mess that it can't parse, and now it does not know whether it's
> >> still the same bug or maybe a different bug triggered by the same
> >> program, so it does not know how to properly attribute the reproducer.
> >> You can see these cases as they happen here (under report/log links in
> >> the table):
> >> https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
> >> When the patch is submitted, the rate should go down.
> >
> > OK, I will bite...  How do you test fixes to problems that syzkaller finds?
> 
> I don't. I can't. No one can test that many fixes.
> 
> Normally syzbot provides reproducers for bugs. Then you have 2
> choices: (1) test it yourself (if you debugged it, you probably
> already have everything set up for this), or (2) ask syzbot to test the
> patch on this particular reproducer.
> Some bugs don't have reproducers. Then you either localize the bug and
> write a test, or go with the good old "it must be correct, right?".
> Even in the second case, syzbot will notify you if the bug happens again
> after the fix has landed; if it stays silent, then presumably the fix
> indeed fixed the bug.
> 
> Now, this is not a syzbot bug (syzbot reports bugs itself from its own
> email address). This is more like you looked at somebody else's dmesg and
> went "oh, this looks bad, let me copy-paste and report it".
> So one can also go with the good old "it must be correct, right?" and
> assess how well it goes after a few weeks once it reaches syzbot, or
> someone needs to write a test for rcu.
> 
> This could have been handled with some kind of "cluster-wide" test,
> but I don't see how it is feasible. See this for details:
> https://groups.google.com/d/msg/syzkaller-bugs/7ucgCkAJKSk/skZjgavRAQAJ
> Especially the part where someone would need to go through and triage
> hundreds of crashes, assess that they are not related to the new
> patch, and do something with them afterwards.

Fair enough, and apologies for the hassle.  I don't expect that the
patch will be controversial, so it should go into the next merge
window.

							Thanx, Paul


end of thread, other threads:[~2018-04-12 15:06 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-02  9:20 INFO: task hung in perf_trace_event_unreg syzbot
2018-04-02 13:40 ` Steven Rostedt
2018-04-02 15:33   ` Paul E. McKenney
2018-04-02 16:04     ` Dmitry Vyukov
2018-04-02 16:21       ` Paul E. McKenney
2018-04-02 16:32         ` Dmitry Vyukov
2018-04-02 16:39           ` Paul E. McKenney
2018-04-02 17:11             ` Dmitry Vyukov
2018-04-02 17:23               ` Paul E. McKenney
2018-04-09 12:54                 ` Dmitry Vyukov
2018-04-09 16:20                   ` Paul E. McKenney
2018-04-09 16:28                     ` Dmitry Vyukov
2018-04-09 18:11                       ` Paul E. McKenney
2018-04-10 11:13                         ` Dmitry Vyukov
2018-04-10 17:02                           ` Paul E. McKenney
2018-04-11 10:06                             ` Dmitry Vyukov
2018-04-11 19:36                               ` Paul E. McKenney
2018-04-12  9:39                                 ` Dmitry Vyukov
2018-04-12 15:07                                   ` Paul E. McKenney
