All of lore.kernel.org
 help / color / mirror / Atom feed
* INFO: task hung in generic_file_write_iter
@ 2018-07-18  8:58 syzbot
  2018-07-18 10:28 ` Tetsuo Handa
  0 siblings, 1 reply; 30+ messages in thread
From: syzbot @ 2018-07-18  8:58 UTC (permalink / raw)
  To: ak, akpm, jack, jlayton, linux-kernel, linux-mm, mawilcox,
	mgorman, syzkaller-bugs, tim.c.chen

Hello,

syzbot found the following crash on:

HEAD commit:    30b06abfb92b Merge tag 'pinctrl-v4.18-3' of git://git.kern..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1240ed62400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6d0ccc9273f0e539
dashboard link: https://syzkaller.appspot.com/bug?extid=9933e4476f365f5d5a1b
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com

IPVS: ftp: loaded support on port[0] = 21
mmap: syz-executor7 (10902) uses deprecated remap_file_pages() syscall. See  
Documentation/vm/remap_file_pages.rst.
Process accounting resumed
INFO: task syz-executor0:4538 blocked for more than 140 seconds.
       Not tainted 4.18.0-rc5+ #151
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor0   D18664  4538      1 0x80000004
Call Trace:
  context_switch kernel/sched/core.c:2853 [inline]
  __schedule+0x87c/0x1ed0 kernel/sched/core.c:3501
  schedule+0xfb/0x450 kernel/sched/core.c:3545
  __rwsem_down_write_failed_common+0x95d/0x1630  
kernel/locking/rwsem-xadd.c:566
  rwsem_down_write_failed+0xe/0x10 kernel/locking/rwsem-xadd.c:595
  call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
  __down_write arch/x86/include/asm/rwsem.h:142 [inline]
  down_write+0xaa/0x130 kernel/locking/rwsem.c:72
  inode_lock include/linux/fs.h:715 [inline]
  generic_file_write_iter+0xed/0x870 mm/filemap.c:3289
  call_write_iter include/linux/fs.h:1793 [inline]
  new_sync_write fs/read_write.c:474 [inline]
  __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
  __kernel_write+0x10c/0x380 fs/read_write.c:506
  do_acct_process+0x1148/0x1660 kernel/acct.c:520
  slow_acct_process kernel/acct.c:579 [inline]
  acct_process+0x5f7/0x770 kernel/acct.c:605
  do_exit+0x1ae0/0x2750 kernel/exit.c:855
  do_group_exit+0x177/0x440 kernel/exit.c:968
  get_signal+0x88e/0x1970 kernel/signal.c:2468
  do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
  exit_to_usermode_loop+0x2e0/0x370 arch/x86/entry/common.c:162
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x40fdba
Code: f0 e8 ce d0 04 00 48 8b 6d 00 31 c9 48 8d 54 24 68 48 8b 5c 24 38 48  
be 00 20 00 20 84 00 00 00 eb 22 48 8b bc 24 90 00 00 00 <48> 8d 57 08 48  
8b 7c 24 60 48 8d 4f 01 48 89 c3 48 8b 74 24 30 48
RSP: 002b:00007fff0597c4f8 EFLAGS: 00000246 ORIG_RAX: 000000000000003d
RAX: fffffffffffffe00 RBX: 0000000000032da8 RCX: 000000000040fdba
RDX: 0000000040000000 RSI: 00007fff0597c514 RDI: ffffffffffffffff
RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000184c940
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
R13: 000000000000022a R14: 00007fff0597cba0 R15: 0000000000032cf4
INFO: task syz-executor4:10872 blocked for more than 140 seconds.
       Not tainted 4.18.0-rc5+ #151
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor4   D22192 10872   4545 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:2853 [inline]
  __schedule+0x87c/0x1ed0 kernel/sched/core.c:3501
  schedule+0xfb/0x450 kernel/sched/core.c:3545
  schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3603
  __mutex_lock_common kernel/locking/mutex.c:834 [inline]
  __mutex_lock+0xede/0x1820 kernel/locking/mutex.c:894
  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
  __do_sys_acct kernel/acct.c:285 [inline]
  __se_sys_acct kernel/acct.c:273 [inline]
  __x64_sys_acct+0xba/0x1f0 kernel/acct.c:273
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455ab9
Code: e0 1f 48 89 04 24 e8 b6 6f fd ff e8 81 6a fd ff e8 5c 68 fd ff 48 8d  
05 23 cd 48 00 48 89 04 24 48 c7 44 24 08 1d 00 00 00 e8 <13> 5e fd ff 0f  
0b e8 8c 44 00 00 e9 07 f0 ff ff cc cc cc cc cc cc
RSP: 002b:00007f315a9dbc68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a3
RAX: ffffffffffffffda RBX: 00007f315a9dc6d4 RCX: 0000000000455ab9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004bb522 R14: 00000000004c8788 R15: 0000000000000000
INFO: task syz-executor4:10927 blocked for more than 140 seconds.
       Not tainted 4.18.0-rc5+ #151
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor4   D23416 10927   4545 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:2853 [inline]
  __schedule+0x87c/0x1ed0 kernel/sched/core.c:3501
  schedule+0xfb/0x450 kernel/sched/core.c:3545
  schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3603
  __mutex_lock_common kernel/locking/mutex.c:834 [inline]
  __mutex_lock+0xede/0x1820 kernel/locking/mutex.c:894
  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
  __do_sys_acct kernel/acct.c:285 [inline]
  __se_sys_acct kernel/acct.c:273 [inline]
  __x64_sys_acct+0xba/0x1f0 kernel/acct.c:273
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455ab9
Code: Bad RIP value.
RSP: 002b:00007f315a9bac68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a3
RAX: ffffffffffffffda RBX: 00007f315a9bb6d4 RCX: 0000000000455ab9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 000000000072bf48 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004bb522 R14: 00000000004c8788 R15: 0000000000000001
INFO: task syz-executor1:10874 blocked for more than 140 seconds.
       Not tainted 4.18.0-rc5+ #151
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor1   D22192 10874   4540 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:2853 [inline]
  __schedule+0x87c/0x1ed0 kernel/sched/core.c:3501
  schedule+0xfb/0x450 kernel/sched/core.c:3545
  schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3603
  __mutex_lock_common kernel/locking/mutex.c:834 [inline]
  __mutex_lock+0xede/0x1820 kernel/locking/mutex.c:894
  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
  __do_sys_acct kernel/acct.c:285 [inline]
  __se_sys_acct kernel/acct.c:273 [inline]
  __x64_sys_acct+0xba/0x1f0 kernel/acct.c:273
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455ab9
Code: Bad RIP value.
RSP: 002b:00007feb44d73c68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a3
RAX: ffffffffffffffda RBX: 00007feb44d746d4 RCX: 0000000000455ab9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004bb522 R14: 00000000004c8788 R15: 0000000000000000
INFO: task syz-executor1:10928 blocked for more than 140 seconds.
       Not tainted 4.18.0-rc5+ #151
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor1   D23416 10928   4540 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:2853 [inline]
  __schedule+0x87c/0x1ed0 kernel/sched/core.c:3501
  schedule+0xfb/0x450 kernel/sched/core.c:3545
  schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3603
  __mutex_lock_common kernel/locking/mutex.c:834 [inline]
  __mutex_lock+0xede/0x1820 kernel/locking/mutex.c:894
  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
  __do_sys_acct kernel/acct.c:285 [inline]
  __se_sys_acct kernel/acct.c:273 [inline]
  __x64_sys_acct+0xba/0x1f0 kernel/acct.c:273
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455ab9
Code: Bad RIP value.
RSP: 002b:00007feb44d52c68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a3
RAX: ffffffffffffffda RBX: 00007feb44d536d4 RCX: 0000000000455ab9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 000000000072bf48 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004bb522 R14: 00000000004c8788 R15: 0000000000000001
INFO: task syz-executor2:10880 blocked for more than 140 seconds.
       Not tainted 4.18.0-rc5+ #151
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor2   D22192 10880   4541 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:2853 [inline]
  __schedule+0x87c/0x1ed0 kernel/sched/core.c:3501
  schedule+0xfb/0x450 kernel/sched/core.c:3545
  schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3603
  __mutex_lock_common kernel/locking/mutex.c:834 [inline]
  __mutex_lock+0xede/0x1820 kernel/locking/mutex.c:894
  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
  __do_sys_acct kernel/acct.c:285 [inline]
  __se_sys_acct kernel/acct.c:273 [inline]
  __x64_sys_acct+0xba/0x1f0 kernel/acct.c:273
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455ab9
Code: Bad RIP value.
RSP: 002b:00007fcc59e8ac68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a3
RAX: ffffffffffffffda RBX: 00007fcc59e8b6d4 RCX: 0000000000455ab9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004bb522 R14: 00000000004c8788 R15: 0000000000000000
INFO: task syz-executor2:10939 blocked for more than 140 seconds.
       Not tainted 4.18.0-rc5+ #151
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor2   D25408 10939   4541 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:2853 [inline]
  __schedule+0x87c/0x1ed0 kernel/sched/core.c:3501
  schedule+0xfb/0x450 kernel/sched/core.c:3545
  schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3603
  __mutex_lock_common kernel/locking/mutex.c:834 [inline]
  __mutex_lock+0xede/0x1820 kernel/locking/mutex.c:894
  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
  __do_sys_acct kernel/acct.c:285 [inline]
  __se_sys_acct kernel/acct.c:273 [inline]
  __x64_sys_acct+0xba/0x1f0 kernel/acct.c:273
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455ab9
Code: Bad RIP value.
RSP: 002b:00007fcc59e48c68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a3
RAX: ffffffffffffffda RBX: 00007fcc59e496d4 RCX: 0000000000455ab9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 000000000072bff0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004bb522 R14: 00000000004c8788 R15: 0000000000000002

Showing all locks held in the system:
1 lock held by khungtaskd/902:
  #0: 00000000c440fc08 (rcu_read_lock){....}, at:  
debug_show_all_locks+0xd0/0x428 kernel/locking/lockdep.c:4461
2 locks held by rs:main Q:Reg/4400:
  #0: 000000007117f10a (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x1bb/0x200  
fs/file.c:766
  #1: 00000000a5ecbe98 (sb_writers#6){.+.+}, at: file_start_write  
include/linux/fs.h:2737 [inline]
  #1: 00000000a5ecbe98 (sb_writers#6){.+.+}, at: vfs_write+0x452/0x560  
fs/read_write.c:548
1 lock held by rsyslogd/4402:
  #0: 00000000eb1d59fa (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x1bb/0x200  
fs/file.c:766
2 locks held by getty/4492:
  #0: 00000000469be283 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000b87f84ff (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4493:
  #0: 0000000038dacc5f (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000c5ccdb2a (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4494:
  #0: 00000000e4f8deb7 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000662526e8 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4495:
  #0: 000000004075d763 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000d2e8b65d (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4496:
  #0: 000000009840c6a9 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 0000000026d8298c (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4497:
  #0: 0000000048ba6bbd (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 000000000cb89ffd (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4498:
  #0: 000000001cf7a97e (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000ab83fb73 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
3 locks held by syz-executor0/4538:
  #0: 00000000eedbf98e (&acct->lock#2){+.+.}, at: acct_get kernel/acct.c:161  
[inline]
  #0: 00000000eedbf98e (&acct->lock#2){+.+.}, at: slow_acct_process  
kernel/acct.c:577 [inline]
  #0: 00000000eedbf98e (&acct->lock#2){+.+.}, at: acct_process+0x3c2/0x770  
kernel/acct.c:605
  #1: 000000001265a882 (sb_writers#13){.+.+}, at: file_start_write_trylock  
include/linux/fs.h:2744 [inline]
  #1: 000000001265a882 (sb_writers#13){.+.+}, at:  
do_acct_process+0x133c/0x1660 kernel/acct.c:517
  #2: 00000000824eb913 (&sb->s_type->i_mutex_key#18){++++}, at: inode_lock  
include/linux/fs.h:715 [inline]
  #2: 00000000824eb913 (&sb->s_type->i_mutex_key#18){++++}, at:  
generic_file_write_iter+0xed/0x870 mm/filemap.c:3289
6 locks held by syz-executor0/10863:
1 lock held by syz-executor4/10872:
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __do_sys_acct  
kernel/acct.c:285 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __se_sys_acct  
kernel/acct.c:273 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __x64_sys_acct+0xba/0x1f0  
kernel/acct.c:273
1 lock held by syz-executor4/10927:
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __do_sys_acct  
kernel/acct.c:285 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __se_sys_acct  
kernel/acct.c:273 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __x64_sys_acct+0xba/0x1f0  
kernel/acct.c:273
1 lock held by syz-executor1/10874:
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __do_sys_acct  
kernel/acct.c:285 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __se_sys_acct  
kernel/acct.c:273 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __x64_sys_acct+0xba/0x1f0  
kernel/acct.c:273
1 lock held by syz-executor1/10928:
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __do_sys_acct  
kernel/acct.c:285 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __se_sys_acct  
kernel/acct.c:273 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __x64_sys_acct+0xba/0x1f0  
kernel/acct.c:273
1 lock held by syz-executor2/10880:
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __do_sys_acct  
kernel/acct.c:285 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __se_sys_acct  
kernel/acct.c:273 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __x64_sys_acct+0xba/0x1f0  
kernel/acct.c:273
1 lock held by syz-executor2/10939:
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __do_sys_acct  
kernel/acct.c:285 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __se_sys_acct  
kernel/acct.c:273 [inline]
  #0: 00000000a79969e4 (acct_on_mutex){+.+.}, at: __x64_sys_acct+0xba/0x1f0  
kernel/acct.c:273

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 902 Comm: khungtaskd Not tainted 4.18.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
  check_hung_uninterruptible_tasks kernel/hung_task.c:196 [inline]
  watchdog+0x9c4/0xf80 kernel/hung_task.c:252
  kthread+0x345/0x410 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
Code: e2 00 01 1f 00 48 8b 75 08 75 2b 8b 90 90 12 00 00 83 fa 02 75 20 48  
8b 88 98 12 00 00 8b 80 94 12 00 00 48 8b 11 48 83 c2 01 <48> 39 d0 76 07  
48 89 34 d1 48 89 11 5d c3 0f 1f 00 55 40 0f b6 d6
RSP: 0018:ffff8801913be760 EFLAGS: 00000216
RAX: 0000000000040000 RBX: ffff88018e06b348 RCX: ffffc90001e34000
RDX: 0000000000040000 RSI: ffffffff81d44df2 RDI: 0000000000000007
RBP: ffff8801913be760 R08: ffff8801b465c540 R09: ffffed003b5e46d6
R10: 0000000000000003 R11: 0000000000000006 R12: dffffc0000000000
R13: 0000000000000042 R14: 0000000000000001 R15: 0000000000000020
FS:  00007f53ece9c700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000119f218 CR3: 00000001ba743000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  init_page_buffers+0x3e2/0x530 fs/buffer.c:904
  grow_dev_page fs/buffer.c:947 [inline]
  grow_buffers fs/buffer.c:1009 [inline]
  __getblk_slow fs/buffer.c:1036 [inline]
  __getblk_gfp+0x906/0xb10 fs/buffer.c:1313
  __bread_gfp+0x2d/0x310 fs/buffer.c:1347
  sb_bread include/linux/buffer_head.h:307 [inline]
  fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
  fat_ent_read_block fs/fat/fatent.c:441 [inline]
  fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
  fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
  __fat_get_block fs/fat/inode.c:148 [inline]
  fat_get_block+0x375/0xaf0 fs/fat/inode.c:183
  __block_write_begin_int+0x50d/0x1b00 fs/buffer.c:1958
  __block_write_begin fs/buffer.c:2008 [inline]
  block_write_begin+0xda/0x370 fs/buffer.c:2067
  cont_write_begin+0x569/0x860 fs/buffer.c:2417
  fat_write_begin+0x8d/0x120 fs/fat/inode.c:229
  generic_perform_write+0x3ae/0x6c0 mm/filemap.c:3139
  __generic_file_write_iter+0x26e/0x630 mm/filemap.c:3264
  generic_file_write_iter+0x438/0x870 mm/filemap.c:3292
  call_write_iter include/linux/fs.h:1793 [inline]
  new_sync_write fs/read_write.c:474 [inline]
  __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
  __kernel_write+0x10c/0x380 fs/read_write.c:506
  do_acct_process+0x1148/0x1660 kernel/acct.c:520
  acct_pin_kill+0x2e/0x100 kernel/acct.c:174
  pin_kill+0x29f/0xb60 fs/fs_pin.c:50
  acct_on+0x63b/0x8b0 kernel/acct.c:254
  __do_sys_acct kernel/acct.c:286 [inline]
  __se_sys_acct kernel/acct.c:273 [inline]
  __x64_sys_acct+0xc2/0x1f0 kernel/acct.c:273
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455ab9
Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f53ece9bc68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a3
RAX: ffffffffffffffda RBX: 00007f53ece9c6d4 RCX: 0000000000455ab9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004bb522 R14: 00000000004c8788 R15: 0000000000000000
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.663  
msecs


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2018-07-18  8:58 INFO: task hung in generic_file_write_iter syzbot
@ 2018-07-18 10:28 ` Tetsuo Handa
  2018-07-18 10:36   ` Dmitry Vyukov
  2018-07-20 10:36     ` Tetsuo Handa
  0 siblings, 2 replies; 30+ messages in thread
From: Tetsuo Handa @ 2018-07-18 10:28 UTC (permalink / raw)
  To: syzbot, akpm, linux-mm, mgorman, Michal Hocko
  Cc: ak, jack, jlayton, linux-kernel, mawilcox, syzkaller-bugs, tim.c.chen

On 2018/07/18 17:58, syzbot wrote:
> mmap: syz-executor7 (10902) uses deprecated remap_file_pages() syscall. See Documentation/vm/remap_file_pages.rst.

There are many reports which are stalling inside __getblk_gfp().
And there is horrible comment for __getblk_gfp():

  /*
   * __getblk_gfp() will locate (and, if necessary, create) the buffer_head
   * which corresponds to the passed block_device, block and size. The
   * returned buffer has its reference count incremented.
   *
   * __getblk_gfp() will lock up the machine if grow_dev_page's
   * try_to_free_buffers() attempt is failing.  FIXME, perhaps?
   */

This report is stalling after mount() completed and process used remap_file_pages().
I think that we might need to use debug printk(). But I don't know what to examine.

> CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
> Code: e2 00 01 1f 00 48 8b 75 08 75 2b 8b 90 90 12 00 00 83 fa 02 75 20 48 8b 88 98 12 00 00 8b 80 94 12 00 00 48 8b 11 48 83 c2 01 <48> 39 d0 76 07 48 89 34 d1 48 89 11 5d c3 0f 1f 00 55 40 0f b6 d6
> RSP: 0018:ffff8801913be760 EFLAGS: 00000216
> RAX: 0000000000040000 RBX: ffff88018e06b348 RCX: ffffc90001e34000
> RDX: 0000000000040000 RSI: ffffffff81d44df2 RDI: 0000000000000007
> RBP: ffff8801913be760 R08: ffff8801b465c540 R09: ffffed003b5e46d6
> R10: 0000000000000003 R11: 0000000000000006 R12: dffffc0000000000
> R13: 0000000000000042 R14: 0000000000000001 R15: 0000000000000020
> FS:  00007f53ece9c700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000119f218 CR3: 00000001ba743000 CR4: 00000000001406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  init_page_buffers+0x3e2/0x530 fs/buffer.c:904
>  grow_dev_page fs/buffer.c:947 [inline]
>  grow_buffers fs/buffer.c:1009 [inline]
>  __getblk_slow fs/buffer.c:1036 [inline]
>  __getblk_gfp+0x906/0xb10 fs/buffer.c:1313
>  __bread_gfp+0x2d/0x310 fs/buffer.c:1347
>  sb_bread include/linux/buffer_head.h:307 [inline]
>  fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
>  fat_ent_read_block fs/fat/fatent.c:441 [inline]
>  fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
>  fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
>  __fat_get_block fs/fat/inode.c:148 [inline]
>  fat_get_block+0x375/0xaf0 fs/fat/inode.c:183
>  __block_write_begin_int+0x50d/0x1b00 fs/buffer.c:1958
>  __block_write_begin fs/buffer.c:2008 [inline]
>  block_write_begin+0xda/0x370 fs/buffer.c:2067
>  cont_write_begin+0x569/0x860 fs/buffer.c:2417
>  fat_write_begin+0x8d/0x120 fs/fat/inode.c:229
>  generic_perform_write+0x3ae/0x6c0 mm/filemap.c:3139
>  __generic_file_write_iter+0x26e/0x630 mm/filemap.c:3264
>  generic_file_write_iter+0x438/0x870 mm/filemap.c:3292
>  call_write_iter include/linux/fs.h:1793 [inline]
>  new_sync_write fs/read_write.c:474 [inline]
>  __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
>  __kernel_write+0x10c/0x380 fs/read_write.c:506
>  do_acct_process+0x1148/0x1660 kernel/acct.c:520
>  acct_pin_kill+0x2e/0x100 kernel/acct.c:174
>  pin_kill+0x29f/0xb60 fs/fs_pin.c:50
>  acct_on+0x63b/0x8b0 kernel/acct.c:254
>  __do_sys_acct kernel/acct.c:286 [inline]
>  __se_sys_acct kernel/acct.c:273 [inline]
>  __x64_sys_acct+0xc2/0x1f0 kernel/acct.c:273
>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2018-07-18 10:28 ` Tetsuo Handa
@ 2018-07-18 10:36   ` Dmitry Vyukov
  2018-07-20 10:36     ` Tetsuo Handa
  1 sibling, 0 replies; 30+ messages in thread
From: Dmitry Vyukov @ 2018-07-18 10:36 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: syzbot, Andrew Morton, Linux-MM, Mel Gorman, Michal Hocko,
	Andi Kleen, Jan Kara, jlayton, LKML, Matthew Wilcox,
	syzkaller-bugs, tim.c.chen

On Wed, Jul 18, 2018 at 12:28 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> On 2018/07/18 17:58, syzbot wrote:
>> mmap: syz-executor7 (10902) uses deprecated remap_file_pages() syscall. See Documentation/vm/remap_file_pages.rst.
>
> There are many reports which are stalling inside __getblk_gfp().
> And there is horrible comment for __getblk_gfp():
>
>   /*
>    * __getblk_gfp() will locate (and, if necessary, create) the buffer_head
>    * which corresponds to the passed block_device, block and size. The
>    * returned buffer has its reference count incremented.
>    *
>    * __getblk_gfp() will lock up the machine if grow_dev_page's
>    * try_to_free_buffers() attempt is failing.  FIXME, perhaps?
>    */
>
> This report is stalling after mount() completed and process used remap_file_pages().
> I think that we might need to use debug printk(). But I don't know what to examine.


There is a thread in the same process that is doing acct on a mounted
fat filesystem, you can see it at the bottom of the report. Not sure
it's related or not.

FWIW the program that caused this is:

18:26:17 executing program 0:
syz_mount_image$vfat(&(0x7f0000001680)='vfat\x00',
&(0x7f0000000100)='./file0\x00', 0x100000000000dfff, 0x1,
&(0x7f0000000040)=[{&(0x7f00000016c0)="eb64c86d4f66732e66617400020441000500077008f8",
0x16}], 0x0, &(0x7f0000000140)=ANY=[])
chroot(&(0x7f00000000c0)='./file0\x00')
mknod(&(0x7f0000000500)='./file0/file0\x00', 0x0, 0x0)
acct(&(0x7f0000000280)='./file0/file0\x00')
umount2(&(0x7f0000000140)='./file0\x00', 0xffffff0f)




>> CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
>> Code: e2 00 01 1f 00 48 8b 75 08 75 2b 8b 90 90 12 00 00 83 fa 02 75 20 48 8b 88 98 12 00 00 8b 80 94 12 00 00 48 8b 11 48 83 c2 01 <48> 39 d0 76 07 48 89 34 d1 48 89 11 5d c3 0f 1f 00 55 40 0f b6 d6
>> RSP: 0018:ffff8801913be760 EFLAGS: 00000216
>> RAX: 0000000000040000 RBX: ffff88018e06b348 RCX: ffffc90001e34000
>> RDX: 0000000000040000 RSI: ffffffff81d44df2 RDI: 0000000000000007
>> RBP: ffff8801913be760 R08: ffff8801b465c540 R09: ffffed003b5e46d6
>> R10: 0000000000000003 R11: 0000000000000006 R12: dffffc0000000000
>> R13: 0000000000000042 R14: 0000000000000001 R15: 0000000000000020
>> FS:  00007f53ece9c700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 000000000119f218 CR3: 00000001ba743000 CR4: 00000000001406e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>>  init_page_buffers+0x3e2/0x530 fs/buffer.c:904
>>  grow_dev_page fs/buffer.c:947 [inline]
>>  grow_buffers fs/buffer.c:1009 [inline]
>>  __getblk_slow fs/buffer.c:1036 [inline]
>>  __getblk_gfp+0x906/0xb10 fs/buffer.c:1313
>>  __bread_gfp+0x2d/0x310 fs/buffer.c:1347
>>  sb_bread include/linux/buffer_head.h:307 [inline]
>>  fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
>>  fat_ent_read_block fs/fat/fatent.c:441 [inline]
>>  fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
>>  fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
>>  __fat_get_block fs/fat/inode.c:148 [inline]
>>  fat_get_block+0x375/0xaf0 fs/fat/inode.c:183
>>  __block_write_begin_int+0x50d/0x1b00 fs/buffer.c:1958
>>  __block_write_begin fs/buffer.c:2008 [inline]
>>  block_write_begin+0xda/0x370 fs/buffer.c:2067
>>  cont_write_begin+0x569/0x860 fs/buffer.c:2417
>>  fat_write_begin+0x8d/0x120 fs/fat/inode.c:229
>>  generic_perform_write+0x3ae/0x6c0 mm/filemap.c:3139
>>  __generic_file_write_iter+0x26e/0x630 mm/filemap.c:3264
>>  generic_file_write_iter+0x438/0x870 mm/filemap.c:3292
>>  call_write_iter include/linux/fs.h:1793 [inline]
>>  new_sync_write fs/read_write.c:474 [inline]
>>  __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
>>  __kernel_write+0x10c/0x380 fs/read_write.c:506
>>  do_acct_process+0x1148/0x1660 kernel/acct.c:520
>>  acct_pin_kill+0x2e/0x100 kernel/acct.c:174
>>  pin_kill+0x29f/0xb60 fs/fs_pin.c:50
>>  acct_on+0x63b/0x8b0 kernel/acct.c:254
>>  __do_sys_acct kernel/acct.c:286 [inline]
>>  __se_sys_acct kernel/acct.c:273 [inline]
>>  __x64_sys_acct+0xc2/0x1f0 kernel/acct.c:273
>>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2018-07-18 10:28 ` Tetsuo Handa
  2018-07-18 10:36   ` Dmitry Vyukov
@ 2018-07-20 10:36     ` Tetsuo Handa
  1 sibling, 0 replies; 30+ messages in thread
From: Tetsuo Handa @ 2018-07-20 10:36 UTC (permalink / raw)
  To: akpm, Alexander Viro
  Cc: syzbot, linux-mm, mgorman, Michal Hocko, ak, jack, jlayton,
	linux-kernel, mawilcox, syzkaller-bugs, tim.c.chen,
	linux-fsdevel

On 2018/07/18 19:28, Tetsuo Handa wrote:
> There are many reports which are stalling inside __getblk_gfp().

Currently 18 reports out of 65 "INFO: task hung in " reports.

  INFO: task hung in aead_recvmsg
  INFO: task hung in inode_sleep_on_writeback
  INFO: task hung in __writeback_inodes_sb_nr
  INFO: task hung in __blkdev_get (2)
  INFO: task hung in lookup_slow
  INFO: task hung in iterate_supers
  INFO: task hung in flush_work
  INFO: task hung in vfs_setxattr
  INFO: task hung in lock_mount
  INFO: task hung in __get_super
  INFO: task hung in do_unlinkat
  INFO: task hung in fat_fallocate
  INFO: task hung in generic_file_write_iter
  INFO: task hung in d_alloc_parallel
  INFO: task hung in __fdget_pos (2)
  INFO: task hung in path_openat
  INFO: task hung in do_truncate
  INFO: task hung in filename_create

> And there is horrible comment for __getblk_gfp():
> 
>   /*
>    * __getblk_gfp() will locate (and, if necessary, create) the buffer_head
>    * which corresponds to the passed block_device, block and size. The
>    * returned buffer has its reference count incremented.
>    *
>    * __getblk_gfp() will lock up the machine if grow_dev_page's
>    * try_to_free_buffers() attempt is failing.  FIXME, perhaps?
>    */
> 
> This report is stalling after mount() completed and process used remap_file_pages().
> I think that we might need to use debug printk(). But I don't know what to examine.
> 

Andrew, can you pick up this debug printk() patch?
I guess we can get the result within one week.

From 8f55e00b21fefffbc6abd9085ac503c52a302464 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Fri, 20 Jul 2018 19:29:06 +0900
Subject: [PATCH] fs/buffer.c: add debug print for __getblk_gfp() stall problem

Among syzbot's unresolved hung task reports, 18 out of 65 reports contain
__getblk_gfp() line in the backtrace. Since there is a comment block that
says that __getblk_gfp() will lock up the machine if try_to_free_buffers()
attempt from grow_dev_page() is failing, let's start from checking whether
syzbot is hitting that case. This change will be removed after the bug is
fixed.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Dmitry Vyukov <dvyukov@google.com>
---
 fs/buffer.c           | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/sched.h |  7 +++++++
 lib/Kconfig.debug     |  6 ++++++
 3 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index be31e28..ebf78ab 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -955,10 +955,20 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 			end_block = init_page_buffers(page, bdev,
 						(sector_t)index << sizebits,
 						size);
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+			current->getblk_executed |= 0x01;
+#endif
 			goto done;
 		}
-		if (!try_to_free_buffers(page))
+		if (!try_to_free_buffers(page)) {
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+			current->getblk_executed |= 0x02;
+#endif
 			goto failed;
+		}
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+		current->getblk_executed |= 0x04;
+#endif
 	}
 
 	/*
@@ -978,6 +988,9 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 	spin_unlock(&inode->i_mapping->private_lock);
 done:
 	ret = (block < end_block) ? 1 : -ENXIO;
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+	current->getblk_executed |= 0x08;
+#endif
 failed:
 	unlock_page(page);
 	put_page(page);
@@ -1033,6 +1046,12 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 		return NULL;
 	}
 
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+	current->getblk_stamp = jiffies;
+	current->getblk_executed = 0;
+	current->getblk_bh_count = 0;
+	current->getblk_bh_state = 0;
+#endif
 	for (;;) {
 		struct buffer_head *bh;
 		int ret;
@@ -1044,6 +1063,18 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 		ret = grow_buffers(bdev, block, size, gfp);
 		if (ret < 0)
 			return NULL;
+
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+		if (!time_after(jiffies, current->getblk_stamp + 3 * HZ))
+			continue;
+		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx\n",
+		       current->comm, current->pid, current->getblk_executed,
+		       current->getblk_bh_count, current->getblk_bh_state);
+		current->getblk_executed = 0;
+		current->getblk_bh_count = 0;
+		current->getblk_bh_state = 0;
+		current->getblk_stamp = jiffies;
+#endif
 	}
 }
 
@@ -3216,6 +3247,11 @@ int sync_dirty_buffer(struct buffer_head *bh)
  */
 static inline int buffer_busy(struct buffer_head *bh)
 {
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+	current->getblk_executed |= 0x80;
+	current->getblk_bh_count = atomic_read(&bh->b_count);
+	current->getblk_bh_state = bh->b_state;
+#endif
 	return atomic_read(&bh->b_count) |
 		(bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
 }
@@ -3254,11 +3290,18 @@ int try_to_free_buffers(struct page *page)
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageWriteback(page)) {
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+		current->getblk_executed |= 0x10;
+#endif
 		return 0;
+	}
 
 	if (mapping == NULL) {		/* can this still happen? */
 		ret = drop_buffers(page, &buffers_to_free);
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+		current->getblk_executed |= 0x20;
+#endif
 		goto out;
 	}
 
@@ -3282,6 +3325,9 @@ int try_to_free_buffers(struct page *page)
 	if (ret)
 		cancel_dirty_page(page);
 	spin_unlock(&mapping->private_lock);
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+	current->getblk_executed |= 0x40;
+#endif
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c4e0aba..95b143e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1207,6 +1207,13 @@ struct task_struct {
 	unsigned long			prev_lowest_stack;
 #endif
 
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+	unsigned long			getblk_stamp;
+	unsigned int			getblk_executed;
+	unsigned int			getblk_bh_count;
+	unsigned long			getblk_bh_state;
+#endif
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c731ff9..0747ce7 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2058,3 +2058,9 @@ config IO_STRICT_DEVMEM
 	  if the driver using a given range cannot be disabled.
 
 	  If in doubt, say Y.
+
+config DEBUG_AID_FOR_SYZBOT
+       bool "Additional debug code for syzbot"
+       default n
+       help
+         This option is intended for testing by syzbot.
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2018-07-20 10:36     ` Tetsuo Handa
  0 siblings, 0 replies; 30+ messages in thread
From: Tetsuo Handa @ 2018-07-20 10:36 UTC (permalink / raw)
  To: akpm, Alexander Viro
  Cc: syzbot, linux-mm, mgorman, Michal Hocko, ak, jack, jlayton,
	linux-kernel, mawilcox, syzkaller-bugs, tim.c.chen,
	linux-fsdevel

On 2018/07/18 19:28, Tetsuo Handa wrote:
> There are many reports which are stalling inside __getblk_gfp().

Currently 18 reports out of 65 "INFO: task hung in " reports.

  INFO: task hung in aead_recvmsg
  INFO: task hung in inode_sleep_on_writeback
  INFO: task hung in __writeback_inodes_sb_nr
  INFO: task hung in __blkdev_get (2)
  INFO: task hung in lookup_slow
  INFO: task hung in iterate_supers
  INFO: task hung in flush_work
  INFO: task hung in vfs_setxattr
  INFO: task hung in lock_mount
  INFO: task hung in __get_super
  INFO: task hung in do_unlinkat
  INFO: task hung in fat_fallocate
  INFO: task hung in generic_file_write_iter
  INFO: task hung in d_alloc_parallel
  INFO: task hung in __fdget_pos (2)
  INFO: task hung in path_openat
  INFO: task hung in do_truncate
  INFO: task hung in filename_create

> And there is horrible comment for __getblk_gfp():
> 
>   /*
>    * __getblk_gfp() will locate (and, if necessary, create) the buffer_head
>    * which corresponds to the passed block_device, block and size. The
>    * returned buffer has its reference count incremented.
>    *
>    * __getblk_gfp() will lock up the machine if grow_dev_page's
>    * try_to_free_buffers() attempt is failing.  FIXME, perhaps?
>    */
> 
> This report is stalling after mount() completed and process used remap_file_pages().
> I think that we might need to use debug printk(). But I don't know what to examine.
> 

Andrew, can you pick up this debug printk() patch?
I guess we can get the result within one week.

>From 8f55e00b21fefffbc6abd9085ac503c52a302464 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Fri, 20 Jul 2018 19:29:06 +0900
Subject: [PATCH] fs/buffer.c: add debug print for __getblk_gfp() stall problem

Among syzbot's unresolved hung task reports, 18 out of 65 reports contain
__getblk_gfp() line in the backtrace. Since there is a comment block that
says that __getblk_gfp() will lock up the machine if try_to_free_buffers()
attempt from grow_dev_page() is failing, let's start from checking whether
syzbot is hitting that case. This change will be removed after the bug is
fixed.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Dmitry Vyukov <dvyukov@google.com>
---
 fs/buffer.c           | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/sched.h |  7 +++++++
 lib/Kconfig.debug     |  6 ++++++
 3 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index be31e28..ebf78ab 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -955,10 +955,20 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 			end_block = init_page_buffers(page, bdev,
 						(sector_t)index << sizebits,
 						size);
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+			current->getblk_executed |= 0x01;
+#endif
 			goto done;
 		}
-		if (!try_to_free_buffers(page))
+		if (!try_to_free_buffers(page)) {
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+			current->getblk_executed |= 0x02;
+#endif
 			goto failed;
+		}
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+		current->getblk_executed |= 0x04;
+#endif
 	}
 
 	/*
@@ -978,6 +988,9 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 	spin_unlock(&inode->i_mapping->private_lock);
 done:
 	ret = (block < end_block) ? 1 : -ENXIO;
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+	current->getblk_executed |= 0x08;
+#endif
 failed:
 	unlock_page(page);
 	put_page(page);
@@ -1033,6 +1046,12 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 		return NULL;
 	}
 
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+	current->getblk_stamp = jiffies;
+	current->getblk_executed = 0;
+	current->getblk_bh_count = 0;
+	current->getblk_bh_state = 0;
+#endif
 	for (;;) {
 		struct buffer_head *bh;
 		int ret;
@@ -1044,6 +1063,18 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 		ret = grow_buffers(bdev, block, size, gfp);
 		if (ret < 0)
 			return NULL;
+
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+		if (!time_after(jiffies, current->getblk_stamp + 3 * HZ))
+			continue;
+		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx\n",
+		       current->comm, current->pid, current->getblk_executed,
+		       current->getblk_bh_count, current->getblk_bh_state);
+		current->getblk_executed = 0;
+		current->getblk_bh_count = 0;
+		current->getblk_bh_state = 0;
+		current->getblk_stamp = jiffies;
+#endif
 	}
 }
 
@@ -3216,6 +3247,11 @@ int sync_dirty_buffer(struct buffer_head *bh)
  */
 static inline int buffer_busy(struct buffer_head *bh)
 {
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+	current->getblk_executed |= 0x80;
+	current->getblk_bh_count = atomic_read(&bh->b_count);
+	current->getblk_bh_state = bh->b_state;
+#endif
 	return atomic_read(&bh->b_count) |
 		(bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
 }
@@ -3254,11 +3290,18 @@ int try_to_free_buffers(struct page *page)
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageWriteback(page)) {
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+		current->getblk_executed |= 0x10;
+#endif
 		return 0;
+	}
 
 	if (mapping == NULL) {		/* can this still happen? */
 		ret = drop_buffers(page, &buffers_to_free);
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+		current->getblk_executed |= 0x20;
+#endif
 		goto out;
 	}
 
@@ -3282,6 +3325,9 @@ int try_to_free_buffers(struct page *page)
 	if (ret)
 		cancel_dirty_page(page);
 	spin_unlock(&mapping->private_lock);
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+	current->getblk_executed |= 0x40;
+#endif
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c4e0aba..95b143e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1207,6 +1207,13 @@ struct task_struct {
 	unsigned long			prev_lowest_stack;
 #endif
 
+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+	unsigned long			getblk_stamp;
+	unsigned int			getblk_executed;
+	unsigned int			getblk_bh_count;
+	unsigned long			getblk_bh_state;
+#endif
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c731ff9..0747ce7 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2058,3 +2058,9 @@ config IO_STRICT_DEVMEM
 	  if the driver using a given range cannot be disabled.
 
 	  If in doubt, say Y.
+
+config DEBUG_AID_FOR_SYZBOT
+       bool "Additional debug code for syzbot"
+       default n
+       help
+         This option is intended for testing by syzbot.
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2018-07-20 10:36     ` Tetsuo Handa
  0 siblings, 0 replies; 30+ messages in thread
From: Tetsuo Handa @ 2018-07-20 10:36 UTC (permalink / raw)
  To: akpm, Alexander Viro
  Cc: syzbot, linux-mm, mgorman, Michal Hocko, ak, jack, jlayton,
	linux-kernel, mawilcox, syzkaller-bugs, tim.c.chen,
	linux-fsdevel

On 2018/07/18 19:28, Tetsuo Handa wrote:
> There are many reports which are stalling inside __getblk_gfp().

Currently 18 reports out of 65 "INFO: task hung in " reports.

  INFO: task hung in aead_recvmsg
  INFO: task hung in inode_sleep_on_writeback
  INFO: task hung in __writeback_inodes_sb_nr
  INFO: task hung in __blkdev_get (2)
  INFO: task hung in lookup_slow
  INFO: task hung in iterate_supers
  INFO: task hung in flush_work
  INFO: task hung in vfs_setxattr
  INFO: task hung in lock_mount
  INFO: task hung in __get_super
  INFO: task hung in do_unlinkat
  INFO: task hung in fat_fallocate
  INFO: task hung in generic_file_write_iter
  INFO: task hung in d_alloc_parallel
  INFO: task hung in __fdget_pos (2)
  INFO: task hung in path_openat
  INFO: task hung in do_truncate
  INFO: task hung in filename_create

> And there is horrible comment for __getblk_gfp():
> 
>   /*
>    * __getblk_gfp() will locate (and, if necessary, create) the buffer_head
>    * which corresponds to the passed block_device, block and size. The
>    * returned buffer has its reference count incremented.
>    *
>    * __getblk_gfp() will lock up the machine if grow_dev_page's
>    * try_to_free_buffers() attempt is failing.  FIXME, perhaps?
>    */
> 
> This report is stalling after mount() completed and process used remap_file_pages().
> I think that we might need to use debug printk(). But I don't know what to examine.
> 

Andrew, can you pick up this debug printk() patch?
I guess we can get the result within one week.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2018-07-20 10:36     ` Tetsuo Handa
  (?)
  (?)
@ 2018-07-20 20:06     ` Andrew Morton
  2018-07-30 15:07       ` Tetsuo Handa
  -1 siblings, 1 reply; 30+ messages in thread
From: Andrew Morton @ 2018-07-20 20:06 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Alexander Viro, syzbot, linux-mm, mgorman, Michal Hocko, ak,
	jack, jlayton, linux-kernel, mawilcox, syzkaller-bugs,
	tim.c.chen, linux-fsdevel

On Fri, 20 Jul 2018 19:36:23 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> > 
> > This report is stalling after mount() completed and process used remap_file_pages().
> > I think that we might need to use debug printk(). But I don't know what to examine.
> > 
> 
> Andrew, can you pick up this debug printk() patch?
> I guess we can get the result within one week.

Sure, let's toss it in -next for a while.

> >From 8f55e00b21fefffbc6abd9085ac503c52a302464 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Fri, 20 Jul 2018 19:29:06 +0900
> Subject: [PATCH] fs/buffer.c: add debug print for __getblk_gfp() stall problem
> 
> Among syzbot's unresolved hung task reports, 18 out of 65 reports contain
> __getblk_gfp() line in the backtrace. Since there is a comment block that
> says that __getblk_gfp() will lock up the machine if try_to_free_buffers()
> attempt from grow_dev_page() is failing, let's start from checking whether
> syzbot is hitting that case. This change will be removed after the bug is
> fixed.

I'm not sure that grow_dev_page() is hanging.  It has often been
suspected, but always is proven innocent.  Lets see.

>
> ...
>
> @@ -978,6 +988,9 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
>  	spin_unlock(&inode->i_mapping->private_lock);
>  done:
>  	ret = (block < end_block) ? 1 : -ENXIO;
> +#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
> +	current->getblk_executed |= 0x08;
> +#endif
>  failed:
>  	unlock_page(page);
>  	put_page(page);
> @@ -1033,6 +1046,12 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)

Something is wrong with your diff(1).  That's grow_dev_page(), not
blkdev_max_block().

>  		return NULL;
>  	}
>  
> +#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
> +	current->getblk_stamp = jiffies;

AFACIT getblk_stamp didn't need to be in the task_struct - it could be
a local.  Doesn't matter much.

> +	current->getblk_executed = 0;
> +	current->getblk_bh_count = 0;
> +	current->getblk_bh_state = 0;
> +#endif
>  	for (;;) {
>  		struct buffer_head *bh;
>  		int ret;
> @@ -1044,6 +1063,18 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
>  		ret = grow_buffers(bdev, block, size, gfp);
>  		if (ret < 0)
>  			return NULL;
> +
> +#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
> +		if (!time_after(jiffies, current->getblk_stamp + 3 * HZ))
> +			continue;
> +		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx\n",
> +		       current->comm, current->pid, current->getblk_executed,
> +		       current->getblk_bh_count, current->getblk_bh_state);
> +		current->getblk_executed = 0;
> +		current->getblk_bh_count = 0;
> +		current->getblk_bh_state = 0;
> +		current->getblk_stamp = jiffies;
> +#endif
>  	}
>  }
>  
> @@ -3216,6 +3247,11 @@ int sync_dirty_buffer(struct buffer_head *bh)
>   */
>  static inline int buffer_busy(struct buffer_head *bh)
>  {
> +#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
> +	current->getblk_executed |= 0x80;
> +	current->getblk_bh_count = atomic_read(&bh->b_count);
> +	current->getblk_bh_state = bh->b_state;
> +#endif

Some explanation of your design wouldn't have hurt.  What does
getblk_executed do, why were these particular fields chosen?

>  	return atomic_read(&bh->b_count) |
>  		(bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
>  }
>
> ...
>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2018-07-20 20:06     ` Andrew Morton
@ 2018-07-30 15:07       ` Tetsuo Handa
  2018-08-06 10:09         ` Jan Kara
  0 siblings, 1 reply; 30+ messages in thread
From: Tetsuo Handa @ 2018-07-30 15:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, syzbot, linux-mm, mgorman, Michal Hocko, ak,
	jack, jlayton, linux-kernel, mawilcox, syzkaller-bugs,
	tim.c.chen, linux-fsdevel

On 2018/07/21 5:06, Andrew Morton wrote:
> On Fri, 20 Jul 2018 19:36:23 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> 
>>>
>>> This report is stalling after mount() completed and process used remap_file_pages().
>>> I think that we might need to use debug printk(). But I don't know what to examine.
>>>
>>
>> Andrew, can you pick up this debug printk() patch?
>> I guess we can get the result within one week.
> 
> Sure, let's toss it in -next for a while.
> 
>> >From 8f55e00b21fefffbc6abd9085ac503c52a302464 Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Date: Fri, 20 Jul 2018 19:29:06 +0900
>> Subject: [PATCH] fs/buffer.c: add debug print for __getblk_gfp() stall problem
>>
>> Among syzbot's unresolved hung task reports, 18 out of 65 reports contain
>> __getblk_gfp() line in the backtrace. Since there is a comment block that
>> says that __getblk_gfp() will lock up the machine if try_to_free_buffers()
>> attempt from grow_dev_page() is failing, let's start from checking whether
>> syzbot is hitting that case. This change will be removed after the bug is
>> fixed.
> 
> I'm not sure that grow_dev_page() is hanging.  It has often been
> suspected, but always is proven innocent.  Lets see.

syzbot reproduced this problem ( https://syzkaller.appspot.com/text?tag=CrashLog&x=11f2fc44400000 ) .
It says that grow_dev_page() is returning 1 but __find_get_block() is failing forever. Any idea?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2018-07-30 15:07       ` Tetsuo Handa
@ 2018-08-06 10:09         ` Jan Kara
  2018-08-06 11:56           ` Tetsuo Handa
  2018-12-28 13:34           ` Tetsuo Handa
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Kara @ 2018-08-06 10:09 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrew Morton, Alexander Viro, syzbot, linux-mm, mgorman,
	Michal Hocko, ak, jack, jlayton, linux-kernel, mawilcox,
	syzkaller-bugs, tim.c.chen, linux-fsdevel

On Tue 31-07-18 00:07:22, Tetsuo Handa wrote:
> On 2018/07/21 5:06, Andrew Morton wrote:
> > On Fri, 20 Jul 2018 19:36:23 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> > 
> >>>
> >>> This report is stalling after mount() completed and process used remap_file_pages().
> >>> I think that we might need to use debug printk(). But I don't know what to examine.
> >>>
> >>
> >> Andrew, can you pick up this debug printk() patch?
> >> I guess we can get the result within one week.
> > 
> > Sure, let's toss it in -next for a while.
> > 
> >> >From 8f55e00b21fefffbc6abd9085ac503c52a302464 Mon Sep 17 00:00:00 2001
> >> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >> Date: Fri, 20 Jul 2018 19:29:06 +0900
> >> Subject: [PATCH] fs/buffer.c: add debug print for __getblk_gfp() stall problem
> >>
> >> Among syzbot's unresolved hung task reports, 18 out of 65 reports contain
> >> __getblk_gfp() line in the backtrace. Since there is a comment block that
> >> says that __getblk_gfp() will lock up the machine if try_to_free_buffers()
> >> attempt from grow_dev_page() is failing, let's start from checking whether
> >> syzbot is hitting that case. This change will be removed after the bug is
> >> fixed.
> > 
> > I'm not sure that grow_dev_page() is hanging.  It has often been
> > suspected, but always is proven innocent.  Lets see.
> 
> syzbot reproduced this problem ( https://syzkaller.appspot.com/text?tag=CrashLog&x=11f2fc44400000 ) .
> It says that grow_dev_page() is returning 1 but __find_get_block() is failing forever. Any idea?

Looks like some kind of a race where device block size gets changed while
getblk() runs (and creates buffers for underlying page). I don't have time
to nail it down at this moment can have a look into it later unless someone
beats me to it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2018-08-06 10:09         ` Jan Kara
@ 2018-08-06 11:56           ` Tetsuo Handa
  2018-08-20 14:12             ` Tetsuo Handa
  2018-12-28 13:34           ` Tetsuo Handa
  1 sibling, 1 reply; 30+ messages in thread
From: Tetsuo Handa @ 2018-08-06 11:56 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Alexander Viro, syzbot, linux-mm, mgorman,
	Michal Hocko, ak, jlayton, linux-kernel, mawilcox,
	syzkaller-bugs, tim.c.chen, linux-fsdevel

On 2018/08/06 19:09, Jan Kara wrote:
>> syzbot reproduced this problem ( https://syzkaller.appspot.com/text?tag=CrashLog&x=11f2fc44400000 ) .
>> It says that grow_dev_page() is returning 1 but __find_get_block() is failing forever. Any idea?

So far syzbot reproduced 7 times, and all reports are saying that
grow_dev_page() is returning 1 but __find_get_block() is failing forever.

  INFO: task hung in __get_super
  https://syzkaller.appspot.com/text?tag=CrashLog&x=17347272400000

  INFO: task hung in __blkdev_get (2)
  https://syzkaller.appspot.com/text?tag=CrashLog&x=144347fc400000
  https://syzkaller.appspot.com/text?tag=CrashLog&x=12382bfc400000

  INFO: task hung in __writeback_inodes_sb_nr
  https://syzkaller.appspot.com/text?tag=CrashLog&x=11f2fc44400000
  https://syzkaller.appspot.com/text?tag=CrashLog&x=13c2449c400000

  INFO: task hung in filename_create
  https://syzkaller.appspot.com/text?tag=CrashLog&x=174a672c400000

  INFO: task hung in lookup_slow
  https://syzkaller.appspot.com/text?tag=CrashLog&x=16113bdc400000

> 
> Looks like some kind of a race where device block size gets changed while
> getblk() runs (and creates buffers for underlying page). I don't have time
> to nail it down at this moment can have a look into it later unless someone
> beats me to it.
> 

It seems that loop device is relevant to this problem.

144347fc400000:

[  557.484258] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  560.491589] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  563.500432] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  566.508502] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  569.516546] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  572.524800] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  575.532499] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  575.803688] INFO: task syz-executor7:7893 blocked for more than 140 seconds.
[  575.810957]       Not tainted 4.18.0-rc7-next-20180801+ #29
[  575.816685] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  575.824663] syz-executor7   D24464  7893   4272 0x00000004
[  575.830298] Call Trace:
[  575.832911]  __schedule+0x87c/0x1ec0
[  575.879299]  schedule+0xfb/0x450
[  575.912448]  rwsem_down_read_failed+0x362/0x610
[  575.951922]  call_rwsem_down_read_failed+0x18/0x30
[  575.956843]  down_read+0xc3/0x1d0
[  575.990222]  __get_super.part.12+0x20f/0x2e0
[  575.994654]  get_super+0x2d/0x50
[  575.998016]  fsync_bdev+0x17/0xc0
[  576.001469]  invalidate_partition+0x35/0x60
[  576.005800]  drop_partitions.isra.13+0xe8/0x200
[  576.023366]  rescan_partitions+0x75/0x910
[  576.037534]  __blkdev_reread_part+0x1ad/0x230
[  576.042023]  blkdev_reread_part+0x26/0x40
[  576.046173]  loop_reread_partitions+0x163/0x190
[  576.055795]  loop_set_status+0xb95/0x1010
[  576.059938]  loop_set_status64+0xaa/0x100
[  576.078629]  lo_ioctl+0x90e/0x1d70
[  576.095290]  blkdev_ioctl+0x9cd/0x2030
[  576.146746]  block_ioctl+0xee/0x130
[  576.154704]  do_vfs_ioctl+0x1de/0x1720
[  576.183069]  ksys_ioctl+0xa9/0xd0
[  576.186519]  __x64_sys_ioctl+0x73/0xb0
[  576.190423]  do_syscall_64+0x1b9/0x820



It seems that there was loop_reread_partitions() failure before this message starts.

12382bfc400000:

[  205.467816] loop_reread_partitions: partition scan of loop0 (<E1><D3><F8>w<E5><EA><E4>S<E5>]}d<D0>^MI^A<BA><ED>!<F7><DE><92><A8>f<9B>8<CC><D6>W<DB><F5><AE>F5Eice^W<B5>^O<80>Z<E2>^H%<8D><BA>}
[  205.467816] <BE><8D> <8F>^OESC<CC><88>\<8B><A9>) failed (rc=1)

[  205.905418] loop_reread_partitions: partition scan of loop0 (<E1><D3><F8>w<E5><EA><E4>S<E5>]}d<D0>^MI^A<BA><ED>!<F7><DE><92><A8>f<9B>8<CC><D6>W<DB><F5><AE>F5Eice^W<B5>^O<80>Z<E2>^H%<8D><BA>}
[  205.905418] <BE><8D> <8F>^OESC<CC><88>\<8B><A9>) failed (rc=-16)

[  206.113592] loop_reread_partitions: partition scan of loop0 (<E1><D3><F8>w<E5><EA><E4>S<E5>]}d<D0>^MI^A<BA><ED>!<F7><DE><92><A8>f<9B>8<CC><D6>W<DB><F5><AE>F5Eice^W<B5>^O<80>Z<E2>^H%<8D><BA>}
[  206.113592] <BE><8D> <8F>^OESC<CC><88>\<8B><A9>) failed (rc=-16)

[  206.767225] loop_reread_partitions: partition scan of loop0 (<E1><D3><F8>w<E5><EA><E4>S<E5>]}d<D0>^MI^A<BA><ED>!<F7><DE><92><A8>f<9B>8<CC><D6>W<DB><F5><AE>F5Eice^W<B5>^O<80>Z<E2>^H%<8D><BA>}
[  206.767225] <BE><8D> <8F>^OESC<CC><88>\<8B><A9>) failed (rc=-16)

[  206.990060] loop_reread_partitions: partition scan of loop0 (<E1><D3><F8>w<E5><EA><E4>S<E5>]}d<D0>^MI^A<BA><ED>!<F7><DE><92><A8>f<9B>8<CC><D6>W<DB><F5><AE>F5Eice^W<B5>^O<80>Z<E2>^H%<8D><BA>}
[  206.990060] <BE><8D> <8F>^OESC<CC><88>\<8B><A9>) failed (rc=1)

[  208.630329] loop_reread_partitions: partition scan of loop0 (<E1><D3><F8>w<E5><EA><E4>S<E5>]}d<D0>^MI^A<BA><ED>!<F7><DE><92><A8>f<9B>8<CC><D6>W<DB><F5><AE>F5Eice^W<B5>^O<80>Z<E2>^H%<8D><BA>}
[  208.630329] <BE><8D> <8F>^OESC<CC><88>\<8B><A9>) failed (rc=-16)

[  228.101929] loop_reread_partitions: partition scan of loop0 (<E1><D3><F8>w<E5><EA><E4>S<E5>]}d<D0>^MI^A<BA><ED>!<F7><DE><92><A8>f<9B>8<CC><D6>W<DB><F5><AE>F5Eice^W<B5>^O<80>Z<E2>^H%<8D><BA>}
[  228.101929] <BE><8D> <8F>^OESC<CC><88>\<8B><A9>) failed (rc=-16)

[  228.605840] loop_reread_partitions: partition scan of loop0 (<E1><D3><F8>w<E5><EA><E4>S<E5>]}d<D0>^MI^A<BA><ED>!<F7><DE><92><A8>f<9B>8<CC><D6>W<DB><F5><AE>F5Eice^W<B5>^O<80>Z<E2>^H%<8D><BA>}
[  228.605840] <BE><8D> <8F>^OESC<CC><88>\<8B><A9>) failed (rc=1)

[  229.400629] loop_reread_partitions: partition scan of loop0 (<E1><D3><F8>w<E5><EA><E4>S<E5>]}d<D0>^MI^A<BA><ED>!<F7><DE><92><A8>f<9B>8<CC><D6>W<DB><F5><AE>F5Eice^W<B5>^O<80>Z<E2>^H%<8D><BA>}
[  229.400629] <BE><8D> <8F>^OESC<CC><88>\<8B><A9>) failed (rc=1)

[  229.622379] loop_reread_partitions: partition scan of loop0 (<E1><D3><F8>w<E5><EA><E4>S<E5>]}d<D0>^MI^A<BA><ED>!<F7><DE><92><A8>f<9B>8<CC><D6>W<DB><F5><AE>F5Eice^W<B5>^O<80>Z<E2>^H%<8D><BA>}
[  229.622379] <BE><8D> <8F>^OESC<CC><88>\<8B><A9>) failed (rc=1)

[  233.406342] syz-executor0(18214): getblk(): executed=9 bh_count=0 bh_state=0
[  236.414328] syz-executor0(18214): getblk(): executed=9 bh_count=0 bh_state=0
[  239.422327] syz-executor0(18214): getblk(): executed=9 bh_count=0 bh_state=0
[  242.430332] syz-executor0(18214): getblk(): executed=9 bh_count=0 bh_state=0
(...snipped...)
[  416.894327] syz-executor0(18214): getblk(): executed=9 bh_count=0 bh_state=0
[  419.902327] syz-executor0(18214): getblk(): executed=9 bh_count=0 bh_state=0
[  422.910325] syz-executor0(18214): getblk(): executed=9 bh_count=0 bh_state=0
[  425.918329] syz-executor0(18214): getblk(): executed=9 bh_count=0 bh_state=0

[  431.005871] NMI backtrace for cpu 0
[  431.005876] CPU: 0 PID: 18214 Comm: syz-executor0 Not tainted 4.18.0-rc6-next-20180725+ #18
[  431.005881] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  431.005884] RIP: 0010:write_comp_data+0x14/0x70
[  431.005893] Code: 4c 89 ef e8 1e 93 3e 00 e9 76 fc ff ff e8 b4 9c ca ff 90 90 90 90 55 65 4c 8b 04 25 40 ee 01 00 65 8b 05 2f 46 85 7e 48 89 e5 <a9> 00 01 1f 00 75 51 41 8b 80 98 12 00 00 83 f8 03 75 45 49 8b 80
[  431.005896] RSP: 0018:ffff880192487130 EFLAGS: 00000046
[  431.005903] RAX: 0000000080000000 RBX: ffff880168426b28 RCX: ffffffff81d704dd
[  431.005906] RDX: 000000000003085f RSI: 000000000000000d RDI: 0000000000000006
[  431.005910] RBP: ffff880192487130 R08: ffff8801d9696200 R09: fffff94000af6a66
[  431.005914] R10: fffff94000af6a66 R11: ffffea00057b5337 R12: 0000000000000006
[  431.005918] R13: dffffc0000000000 R14: 0000000000000006 R15: 000000000003085f
[  431.005923] FS:  00007f6c291f1700(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
[  431.005926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  431.005930] CR2: ffffffffff600400 CR3: 00000001c5ff4000 CR4: 00000000001406f0
[  431.005935] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  431.005938] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  431.005941] Call Trace:
[  431.005944]  __sanitizer_cov_trace_cmp8+0x18/0x20
[  431.005946]  __find_get_block+0x1bd/0xe60
[  431.005969]  __getblk_gfp+0x3dc/0xe00
[  431.005996]  __bread_gfp+0x2d/0x310
[  431.005999]  fat__get_entry+0x59c/0xa30
[  431.006017]  fat_get_short_entry+0x13c/0x2c0
[  431.006020]  fat_subdirs+0x13c/0x290
[  431.006043]  fat_fill_super+0x29f6/0x4430
[  431.006068]  vfat_fill_super+0x31/0x40
[  431.006070]  mount_bdev+0x314/0x3e0
[  431.006075]  vfat_mount+0x3c/0x50
[  431.006080]  legacy_get_tree+0x131/0x460
[  431.006083]  vfs_get_tree+0x1cb/0x5c0
[  431.006088]  do_mount+0x6f2/0x1e20
[  431.006113]  ksys_mount+0x12d/0x140
[  431.006116]  __x64_sys_mount+0xbe/0x150
[  431.006118]  do_syscall_64+0x1b9/0x820
[  431.006143]  entry_SYSCALL_64_after_hwframe+0x49/0xbe



There were a lot of directory bread failure messages before this message starts.

17347272400000:

[  431.497237] FAT-fs (loop3): Directory bread(block 2574) failed
[  431.505561] FAT-fs (loop3): Directory bread(block 2575) failed
[  431.516048] FAT-fs (loop3): Directory bread(block 2576) failed
[  431.525740] FAT-fs (loop3): Directory bread(block 2577) failed
[  431.535899] FAT-fs (loop3): Directory bread(block 2578) failed
[  431.636001] FAT-fs (loop5): bogus number of reserved sectors
[  431.641910] FAT-fs (loop5): Can't find a valid FAT filesystem
[  434.145080] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  437.152496] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  440.161598] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  443.169498] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
(...snipped...)
[  563.500432] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  566.508502] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  569.516546] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  572.524800] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  575.532499] syz-executor0(7883): getblk(): executed=9 bh_count=0 bh_state=0
[  575.803688] INFO: task syz-executor7:7893 blocked for more than 140 seconds.
[  575.810957]       Not tainted 4.18.0-rc7-next-20180801+ #29
[  575.816685] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  575.824663] syz-executor7   D24464  7893   4272 0x00000004
[  575.830298] Call Trace:
[  575.832911]  __schedule+0x87c/0x1ec0
[  575.879299]  schedule+0xfb/0x450
[  575.912448]  rwsem_down_read_failed+0x362/0x610
[  575.951922]  call_rwsem_down_read_failed+0x18/0x30
[  575.956843]  down_read+0xc3/0x1d0
[  575.990222]  __get_super.part.12+0x20f/0x2e0
[  575.994654]  get_super+0x2d/0x50
[  575.998016]  fsync_bdev+0x17/0xc0
[  576.001469]  invalidate_partition+0x35/0x60
[  576.005800]  drop_partitions.isra.13+0xe8/0x200
[  576.023366]  rescan_partitions+0x75/0x910
[  576.037534]  __blkdev_reread_part+0x1ad/0x230
[  576.042023]  blkdev_reread_part+0x26/0x40
[  576.046173]  loop_reread_partitions+0x163/0x190
[  576.055795]  loop_set_status+0xb95/0x1010
[  576.059938]  loop_set_status64+0xaa/0x100
[  576.078629]  lo_ioctl+0x90e/0x1d70
[  576.095290]  blkdev_ioctl+0x9cd/0x2030
[  576.146746]  block_ioctl+0xee/0x130
[  576.154704]  do_vfs_ioctl+0x1de/0x1720
[  576.183069]  ksys_ioctl+0xa9/0xd0
[  576.186519]  __x64_sys_ioctl+0x73/0xb0
[  576.190423]  do_syscall_64+0x1b9/0x820
[  576.246994]  entry_SYSCALL_64_after_hwframe+0x49/0xbe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2018-08-06 11:56           ` Tetsuo Handa
@ 2018-08-20 14:12             ` Tetsuo Handa
  0 siblings, 0 replies; 30+ messages in thread
From: Tetsuo Handa @ 2018-08-20 14:12 UTC (permalink / raw)
  To: Jan Kara, Andrew Morton
  Cc: Alexander Viro, syzbot, linux-mm, mgorman, Michal Hocko, ak,
	jlayton, linux-kernel, mawilcox, syzkaller-bugs, tim.c.chen,
	linux-fsdevel

On 2018/08/06 20:56, Tetsuo Handa wrote:
> On 2018/08/06 19:09, Jan Kara wrote:
>> Looks like some kind of a race where device block size gets changed while
>> getblk() runs (and creates buffers for underlying page). I don't have time
>> to nail it down at this moment can have a look into it later unless someone
>> beats me to it.
>>
> 
> It seems that loop device is relevant to this problem.

Speak of loop device, I'm waiting for Jens for three months
http://lkml.kernel.org/r/1527297408-4428-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
but Jens is too busy to come up with an alternative. Since that is a big patch, I wish we can
start testing that patch before Jan starts writing a patch for fixing getblk() race problem.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2018-08-06 10:09         ` Jan Kara
  2018-08-06 11:56           ` Tetsuo Handa
@ 2018-12-28 13:34           ` Tetsuo Handa
  2019-01-02 14:40             ` Jan Kara
  1 sibling, 1 reply; 30+ messages in thread
From: Tetsuo Handa @ 2018-12-28 13:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Alexander Viro, syzbot, linux-mm, mgorman,
	Michal Hocko, ak, jlayton, linux-kernel, mawilcox,
	syzkaller-bugs, tim.c.chen, linux-fsdevel

On 2018/08/06 19:09, Jan Kara wrote:
> On Tue 31-07-18 00:07:22, Tetsuo Handa wrote:
>> On 2018/07/21 5:06, Andrew Morton wrote:
>>> On Fri, 20 Jul 2018 19:36:23 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>
>>>>>
>>>>> This report is stalling after mount() completed and process used remap_file_pages().
>>>>> I think that we might need to use debug printk(). But I don't know what to examine.
>>>>>
>>>>
>>>> Andrew, can you pick up this debug printk() patch?
>>>> I guess we can get the result within one week.
>>>
>>> Sure, let's toss it in -next for a while.
>>>
>>>> >From 8f55e00b21fefffbc6abd9085ac503c52a302464 Mon Sep 17 00:00:00 2001
>>>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>>>> Date: Fri, 20 Jul 2018 19:29:06 +0900
>>>> Subject: [PATCH] fs/buffer.c: add debug print for __getblk_gfp() stall problem
>>>>
>>>> Among syzbot's unresolved hung task reports, 18 out of 65 reports contain
>>>> __getblk_gfp() line in the backtrace. Since there is a comment block that
>>>> says that __getblk_gfp() will lock up the machine if try_to_free_buffers()
>>>> attempt from grow_dev_page() is failing, let's start from checking whether
>>>> syzbot is hitting that case. This change will be removed after the bug is
>>>> fixed.
>>>
>>> I'm not sure that grow_dev_page() is hanging.  It has often been
>>> suspected, but always is proven innocent.  Lets see.
>>
>> syzbot reproduced this problem ( https://syzkaller.appspot.com/text?tag=CrashLog&x=11f2fc44400000 ) .
>> It says that grow_dev_page() is returning 1 but __find_get_block() is failing forever. Any idea?
> 
> Looks like some kind of a race where device block size gets changed while
> getblk() runs (and creates buffers for underlying page). I don't have time
> to nail it down at this moment can have a look into it later unless someone
> beats me to it.

I feel that the frequency of hitting this problem was decreased
by merging loop module's ioctl() serialization patches. But this
problem is still there, and syzbot got a new line in
https://syzkaller.appspot.com/text?tag=CrashLog&x=177f889f400000 .

  [  615.881781] __loop_clr_fd: partition scan of loop5 failed (rc=-22)
  [  619.059920] syz-executor4(2193): getblk(): executed=cd bh_count=0 bh_state=29
  [  622.069808] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0
  [  625.080013] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0
  [  628.089900] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0

I guess that loop module is somehow related to this problem.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2018-12-28 13:34           ` Tetsuo Handa
@ 2019-01-02 14:40             ` Jan Kara
  2019-01-02 14:46                 ` Dmitry Vyukov
  2019-01-02 16:07                 ` Tetsuo Handa
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Kara @ 2019-01-02 14:40 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Andrew Morton, Alexander Viro, syzbot, linux-mm,
	mgorman, Michal Hocko, ak, jlayton, linux-kernel, mawilcox,
	syzkaller-bugs, tim.c.chen, linux-fsdevel

On Fri 28-12-18 22:34:13, Tetsuo Handa wrote:
> On 2018/08/06 19:09, Jan Kara wrote:
> > On Tue 31-07-18 00:07:22, Tetsuo Handa wrote:
> >> On 2018/07/21 5:06, Andrew Morton wrote:
> >>> On Fri, 20 Jul 2018 19:36:23 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> >>>
> >>>>>
> >>>>> This report is stalling after mount() completed and process used remap_file_pages().
> >>>>> I think that we might need to use debug printk(). But I don't know what to examine.
> >>>>>
> >>>>
> >>>> Andrew, can you pick up this debug printk() patch?
> >>>> I guess we can get the result within one week.
> >>>
> >>> Sure, let's toss it in -next for a while.
> >>>
> >>>> >From 8f55e00b21fefffbc6abd9085ac503c52a302464 Mon Sep 17 00:00:00 2001
> >>>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >>>> Date: Fri, 20 Jul 2018 19:29:06 +0900
> >>>> Subject: [PATCH] fs/buffer.c: add debug print for __getblk_gfp() stall problem
> >>>>
> >>>> Among syzbot's unresolved hung task reports, 18 out of 65 reports contain
> >>>> __getblk_gfp() line in the backtrace. Since there is a comment block that
> >>>> says that __getblk_gfp() will lock up the machine if try_to_free_buffers()
> >>>> attempt from grow_dev_page() is failing, let's start from checking whether
> >>>> syzbot is hitting that case. This change will be removed after the bug is
> >>>> fixed.
> >>>
> >>> I'm not sure that grow_dev_page() is hanging.  It has often been
> >>> suspected, but always is proven innocent.  Lets see.
> >>
> >> syzbot reproduced this problem ( https://syzkaller.appspot.com/text?tag=CrashLog&x=11f2fc44400000 ) .
> >> It says that grow_dev_page() is returning 1 but __find_get_block() is failing forever. Any idea?
> > 
> > Looks like some kind of a race where device block size gets changed while
> > getblk() runs (and creates buffers for underlying page). I don't have time
> > to nail it down at this moment can have a look into it later unless someone
> > beats me to it.
> 
> I feel that the frequency of hitting this problem was decreased
> by merging loop module's ioctl() serialization patches. But this
> problem is still there, and syzbot got a new line in
> https://syzkaller.appspot.com/text?tag=CrashLog&x=177f889f400000 .
> 
>   [  615.881781] __loop_clr_fd: partition scan of loop5 failed (rc=-22)
>   [  619.059920] syz-executor4(2193): getblk(): executed=cd bh_count=0 bh_state=29
>   [  622.069808] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0
>   [  625.080013] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0
>   [  628.089900] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0
> 
> I guess that loop module is somehow related to this problem.

I had a look into this and the only good explanation for this I have is
that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
If that would happen, we'd get exactly the behavior syzkaller observes
because grow_buffers() would populate different page than
__find_get_block() then looks up.

However I don't see how that's possible since the filesystem has the block
device open exclusively and blkdev_bszset() makes sure we also have
exclusive access to the block device before changing the block device size.
So changing block device block size after filesystem gets access to the
device should be impossible. 

Anyway, could you perhaps add to your debug patch a dump of 'size' passed
to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
whether my theory is right or not. Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2019-01-02 14:46                 ` Dmitry Vyukov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Vyukov @ 2019-01-02 14:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tetsuo Handa, Andrew Morton, Alexander Viro, syzbot, Linux-MM,
	Mel Gorman, Michal Hocko, Andi Kleen, jlayton, LKML,
	Matthew Wilcox, syzkaller-bugs, tim.c.chen, linux-fsdevel

On Wed, Jan 2, 2019 at 3:40 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 28-12-18 22:34:13, Tetsuo Handa wrote:
> > On 2018/08/06 19:09, Jan Kara wrote:
> > > On Tue 31-07-18 00:07:22, Tetsuo Handa wrote:
> > >> On 2018/07/21 5:06, Andrew Morton wrote:
> > >>> On Fri, 20 Jul 2018 19:36:23 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> > >>>
> > >>>>>
> > >>>>> This report is stalling after mount() completed and process used remap_file_pages().
> > >>>>> I think that we might need to use debug printk(). But I don't know what to examine.
> > >>>>>
> > >>>>
> > >>>> Andrew, can you pick up this debug printk() patch?
> > >>>> I guess we can get the result within one week.
> > >>>
> > >>> Sure, let's toss it in -next for a while.
> > >>>
> > >>>> >From 8f55e00b21fefffbc6abd9085ac503c52a302464 Mon Sep 17 00:00:00 2001
> > >>>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > >>>> Date: Fri, 20 Jul 2018 19:29:06 +0900
> > >>>> Subject: [PATCH] fs/buffer.c: add debug print for __getblk_gfp() stall problem
> > >>>>
> > >>>> Among syzbot's unresolved hung task reports, 18 out of 65 reports contain
> > >>>> __getblk_gfp() line in the backtrace. Since there is a comment block that
> > >>>> says that __getblk_gfp() will lock up the machine if try_to_free_buffers()
> > >>>> attempt from grow_dev_page() is failing, let's start from checking whether
> > >>>> syzbot is hitting that case. This change will be removed after the bug is
> > >>>> fixed.
> > >>>
> > >>> I'm not sure that grow_dev_page() is hanging.  It has often been
> > >>> suspected, but always is proven innocent.  Lets see.
> > >>
> > >> syzbot reproduced this problem ( https://syzkaller.appspot.com/text?tag=CrashLog&x=11f2fc44400000 ) .
> > >> It says that grow_dev_page() is returning 1 but __find_get_block() is failing forever. Any idea?
> > >
> > > Looks like some kind of a race where device block size gets changed while
> > > getblk() runs (and creates buffers for underlying page). I don't have time
> > > to nail it down at this moment can have a look into it later unless someone
> > > beats me to it.
> >
> > I feel that the frequency of hitting this problem was decreased
> > by merging loop module's ioctl() serialization patches. But this
> > problem is still there, and syzbot got a new line in
> > https://syzkaller.appspot.com/text?tag=CrashLog&x=177f889f400000 .
> >
> >   [  615.881781] __loop_clr_fd: partition scan of loop5 failed (rc=-22)
> >   [  619.059920] syz-executor4(2193): getblk(): executed=cd bh_count=0 bh_state=29
> >   [  622.069808] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0
> >   [  625.080013] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0
> >   [  628.089900] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0
> >
> > I guess that loop module is somehow related to this problem.
>
> I had a look into this and the only good explanation for this I have is
> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> If that would happen, we'd get exactly the behavior syzkaller observes
> because grow_buffers() would populate different page than
> __find_get_block() then looks up.
>
> However I don't see how that's possible since the filesystem has the block
> device open exclusively and blkdev_bszset() makes sure we also have
> exclusive access to the block device before changing the block device size.
> So changing block device block size after filesystem gets access to the
> device should be impossible.

If this is that critical and impossible to fire, maybe it makes sense
to add a corresponding debug check to some code paths?
syzkaller will immediately catch any violations if they happen.


> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> whether my theory is right or not. Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2019-01-02 14:46                 ` Dmitry Vyukov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Vyukov @ 2019-01-02 14:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tetsuo Handa, Andrew Morton, Alexander Viro, syzbot, Linux-MM,
	Mel Gorman, Michal Hocko, Andi Kleen, jlayton, LKML,
	Matthew Wilcox, syzkaller-bugs, tim.c.chen, linux-fsdevel

On Wed, Jan 2, 2019 at 3:40 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 28-12-18 22:34:13, Tetsuo Handa wrote:
> > On 2018/08/06 19:09, Jan Kara wrote:
> > > On Tue 31-07-18 00:07:22, Tetsuo Handa wrote:
> > >> On 2018/07/21 5:06, Andrew Morton wrote:
> > >>> On Fri, 20 Jul 2018 19:36:23 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> > >>>
> > >>>>>
> > >>>>> This report is stalling after mount() completed and process used remap_file_pages().
> > >>>>> I think that we might need to use debug printk(). But I don't know what to examine.
> > >>>>>
> > >>>>
> > >>>> Andrew, can you pick up this debug printk() patch?
> > >>>> I guess we can get the result within one week.
> > >>>
> > >>> Sure, let's toss it in -next for a while.
> > >>>
> > >>>> >From 8f55e00b21fefffbc6abd9085ac503c52a302464 Mon Sep 17 00:00:00 2001
> > >>>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > >>>> Date: Fri, 20 Jul 2018 19:29:06 +0900
> > >>>> Subject: [PATCH] fs/buffer.c: add debug print for __getblk_gfp() stall problem
> > >>>>
> > >>>> Among syzbot's unresolved hung task reports, 18 out of 65 reports contain
> > >>>> __getblk_gfp() line in the backtrace. Since there is a comment block that
> > >>>> says that __getblk_gfp() will lock up the machine if try_to_free_buffers()
> > >>>> attempt from grow_dev_page() is failing, let's start from checking whether
> > >>>> syzbot is hitting that case. This change will be removed after the bug is
> > >>>> fixed.
> > >>>
> > >>> I'm not sure that grow_dev_page() is hanging.  It has often been
> > >>> suspected, but always is proven innocent.  Lets see.
> > >>
> > >> syzbot reproduced this problem ( https://syzkaller.appspot.com/text?tag=CrashLog&x=11f2fc44400000 ) .
> > >> It says that grow_dev_page() is returning 1 but __find_get_block() is failing forever. Any idea?
> > >
> > > Looks like some kind of a race where device block size gets changed while
> > > getblk() runs (and creates buffers for underlying page). I don't have time
> > > to nail it down at this moment can have a look into it later unless someone
> > > beats me to it.
> >
> > I feel that the frequency of hitting this problem was decreased
> > by merging loop module's ioctl() serialization patches. But this
> > problem is still there, and syzbot got a new line in
> > https://syzkaller.appspot.com/text?tag=CrashLog&x=177f889f400000 .
> >
> >   [  615.881781] __loop_clr_fd: partition scan of loop5 failed (rc=-22)
> >   [  619.059920] syz-executor4(2193): getblk(): executed=cd bh_count=0 bh_state=29
> >   [  622.069808] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0
> >   [  625.080013] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0
> >   [  628.089900] syz-executor4(2193): getblk(): executed=9 bh_count=0 bh_state=0
> >
> > I guess that loop module is somehow related to this problem.
>
> I had a look into this and the only good explanation for this I have is
> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> If that would happen, we'd get exactly the behavior syzkaller observes
> because grow_buffers() would populate different page than
> __find_get_block() then looks up.
>
> However I don't see how that's possible since the filesystem has the block
> device open exclusively and blkdev_bszset() makes sure we also have
> exclusive access to the block device before changing the block device size.
> So changing block device block size after filesystem gets access to the
> device should be impossible.

If this is that critical and impossible to fire, maybe it makes sense
to add a corresponding debug check to some code paths?
syzkaller will immediately catch any violations if they happen.


> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> whether my theory is right or not. Thanks!


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2019-01-02 14:40             ` Jan Kara
@ 2019-01-02 16:07                 ` Tetsuo Handa
  2019-01-02 16:07                 ` Tetsuo Handa
  1 sibling, 0 replies; 30+ messages in thread
From: Tetsuo Handa @ 2019-01-02 16:07 UTC (permalink / raw)
  To: Jan Kara, Andrew Morton
  Cc: Alexander Viro, syzbot, linux-mm, mgorman, Michal Hocko, ak,
	jlayton, linux-kernel, mawilcox, syzkaller-bugs, tim.c.chen,
	linux-fsdevel

On 2019/01/02 23:40, Jan Kara wrote:
> I had a look into this and the only good explanation for this I have is
> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> If that would happen, we'd get exactly the behavior syzkaller observes
> because grow_buffers() would populate different page than
> __find_get_block() then looks up.
> 
> However I don't see how that's possible since the filesystem has the block
> device open exclusively and blkdev_bszset() makes sure we also have
> exclusive access to the block device before changing the block device size.
> So changing block device block size after filesystem gets access to the
> device should be impossible. 
> 
> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> whether my theory is right or not. Thanks!
> 

OK. Andrew, will you add (or fold into) this change?

From e6f334380ad2c87457bfc2a4058316c47f75824a Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 3 Jan 2019 01:03:35 +0900
Subject: [PATCH] fs/buffer.c: dump more info for __getblk_gfp() stall problem

We need to dump more variables on top of
"fs/buffer.c: add debug print for __getblk_gfp() stall problem".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jan Kara <jack@suse.cz>
---
 fs/buffer.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 580fda0..a50acac 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1066,9 +1066,14 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 #ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
 		if (!time_after(jiffies, current->getblk_stamp + 3 * HZ))
 			continue;
-		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx\n",
+		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx "
+		       "bdev_super_blocksize=%lu size=%u "
+		       "bdev_super_blocksize_bits=%u bdev_inode_blkbits=%u\n",
 		       current->comm, current->pid, current->getblk_executed,
-		       current->getblk_bh_count, current->getblk_bh_state);
+		       current->getblk_bh_count, current->getblk_bh_state,
+		       bdev->bd_super->s_blocksize, size,
+		       bdev->bd_super->s_blocksize_bits,
+		       bdev->bd_inode->i_blkbits);
 		current->getblk_executed = 0;
 		current->getblk_bh_count = 0;
 		current->getblk_bh_state = 0;
-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2019-01-02 16:07                 ` Tetsuo Handa
  0 siblings, 0 replies; 30+ messages in thread
From: Tetsuo Handa @ 2019-01-02 16:07 UTC (permalink / raw)
  To: Jan Kara, Andrew Morton
  Cc: Alexander Viro, syzbot, linux-mm, mgorman, Michal Hocko, ak,
	jlayton, linux-kernel, mawilcox, syzkaller-bugs, tim.c.chen,
	linux-fsdevel

On 2019/01/02 23:40, Jan Kara wrote:
> I had a look into this and the only good explanation for this I have is
> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> If that would happen, we'd get exactly the behavior syzkaller observes
> because grow_buffers() would populate different page than
> __find_get_block() then looks up.
> 
> However I don't see how that's possible since the filesystem has the block
> device open exclusively and blkdev_bszset() makes sure we also have
> exclusive access to the block device before changing the block device size.
> So changing block device block size after filesystem gets access to the
> device should be impossible. 
> 
> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> whether my theory is right or not. Thanks!
> 

OK. Andrew, will you add (or fold into) this change?

>From e6f334380ad2c87457bfc2a4058316c47f75824a Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 3 Jan 2019 01:03:35 +0900
Subject: [PATCH] fs/buffer.c: dump more info for __getblk_gfp() stall problem

We need to dump more variables on top of
"fs/buffer.c: add debug print for __getblk_gfp() stall problem".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jan Kara <jack@suse.cz>
---
 fs/buffer.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 580fda0..a50acac 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1066,9 +1066,14 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 #ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
 		if (!time_after(jiffies, current->getblk_stamp + 3 * HZ))
 			continue;
-		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx\n",
+		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx "
+		       "bdev_super_blocksize=%lu size=%u "
+		       "bdev_super_blocksize_bits=%u bdev_inode_blkbits=%u\n",
 		       current->comm, current->pid, current->getblk_executed,
-		       current->getblk_bh_count, current->getblk_bh_state);
+		       current->getblk_bh_count, current->getblk_bh_state,
+		       bdev->bd_super->s_blocksize, size,
+		       bdev->bd_super->s_blocksize_bits,
+		       bdev->bd_inode->i_blkbits);
 		current->getblk_executed = 0;
 		current->getblk_bh_count = 0;
 		current->getblk_bh_state = 0;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2019-01-02 16:07                 ` Tetsuo Handa
  (?)
@ 2019-01-02 17:26                 ` Jan Kara
  2019-01-03  0:46                     ` Tetsuo Handa
  2019-01-08 10:04                   ` Tetsuo Handa
  -1 siblings, 2 replies; 30+ messages in thread
From: Jan Kara @ 2019-01-02 17:26 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Andrew Morton, Alexander Viro, syzbot, linux-mm,
	mgorman, Michal Hocko, ak, jlayton, linux-kernel, mawilcox,
	syzkaller-bugs, tim.c.chen, linux-fsdevel

On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> On 2019/01/02 23:40, Jan Kara wrote:
> > I had a look into this and the only good explanation for this I have is
> > that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> > If that would happen, we'd get exactly the behavior syzkaller observes
> > because grow_buffers() would populate different page than
> > __find_get_block() then looks up.
> > 
> > However I don't see how that's possible since the filesystem has the block
> > device open exclusively and blkdev_bszset() makes sure we also have
> > exclusive access to the block device before changing the block device size.
> > So changing block device block size after filesystem gets access to the
> > device should be impossible. 
> > 
> > Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> > to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> > whether my theory is right or not. Thanks!
> > 
> 
> OK. Andrew, will you add (or fold into) this change?
> 
> From e6f334380ad2c87457bfc2a4058316c47f75824a Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Thu, 3 Jan 2019 01:03:35 +0900
> Subject: [PATCH] fs/buffer.c: dump more info for __getblk_gfp() stall problem
> 
> We need to dump more variables on top of
> "fs/buffer.c: add debug print for __getblk_gfp() stall problem".
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Jan Kara <jack@suse.cz>
> ---
>  fs/buffer.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 580fda0..a50acac 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1066,9 +1066,14 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
>  #ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
>  		if (!time_after(jiffies, current->getblk_stamp + 3 * HZ))
>  			continue;
> -		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx\n",
> +		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx "
> +		       "bdev_super_blocksize=%lu size=%u "
> +		       "bdev_super_blocksize_bits=%u bdev_inode_blkbits=%u\n",
>  		       current->comm, current->pid, current->getblk_executed,
> -		       current->getblk_bh_count, current->getblk_bh_state);
> +		       current->getblk_bh_count, current->getblk_bh_state,
> +		       bdev->bd_super->s_blocksize, size,
> +		       bdev->bd_super->s_blocksize_bits,
> +		       bdev->bd_inode->i_blkbits);

Well, bd_super may be NULL if there's no filesystem mounted so it would be
safer to check for this rather than blindly dereferencing it... Otherwise
the change looks good to me.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2019-01-02 17:26                 ` Jan Kara
@ 2019-01-03  0:46                     ` Tetsuo Handa
  2019-01-08 10:04                   ` Tetsuo Handa
  1 sibling, 0 replies; 30+ messages in thread
From: Tetsuo Handa @ 2019-01-03  0:46 UTC (permalink / raw)
  To: Jan Kara, Andrew Morton
  Cc: Alexander Viro, syzbot, linux-mm, mgorman, Michal Hocko, ak,
	jlayton, linux-kernel, mawilcox, syzkaller-bugs, tim.c.chen,
	linux-fsdevel

On 2019/01/03 2:26, Jan Kara wrote:
> On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
>> On 2019/01/02 23:40, Jan Kara wrote:
>>> I had a look into this and the only good explanation for this I have is
>>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
>>> If that would happen, we'd get exactly the behavior syzkaller observes
>>> because grow_buffers() would populate different page than
>>> __find_get_block() then looks up.
>>>
>>> However I don't see how that's possible since the filesystem has the block
>>> device open exclusively and blkdev_bszset() makes sure we also have
>>> exclusive access to the block device before changing the block device size.
>>> So changing block device block size after filesystem gets access to the
>>> device should be impossible. 
>>>
>>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
>>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
>>> whether my theory is right or not. Thanks!
>>>
>>
>> OK. Andrew, will you add (or fold into) this change?
>>
>> From e6f334380ad2c87457bfc2a4058316c47f75824a Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Date: Thu, 3 Jan 2019 01:03:35 +0900
>> Subject: [PATCH] fs/buffer.c: dump more info for __getblk_gfp() stall problem
>>
>> We need to dump more variables on top of
>> "fs/buffer.c: add debug print for __getblk_gfp() stall problem".
>>
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Cc: Jan Kara <jack@suse.cz>
>> ---
>>  fs/buffer.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index 580fda0..a50acac 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -1066,9 +1066,14 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
>>  #ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
>>  		if (!time_after(jiffies, current->getblk_stamp + 3 * HZ))
>>  			continue;
>> -		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx\n",
>> +		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx "
>> +		       "bdev_super_blocksize=%lu size=%u "
>> +		       "bdev_super_blocksize_bits=%u bdev_inode_blkbits=%u\n",
>>  		       current->comm, current->pid, current->getblk_executed,
>> -		       current->getblk_bh_count, current->getblk_bh_state);
>> +		       current->getblk_bh_count, current->getblk_bh_state,
>> +		       bdev->bd_super->s_blocksize, size,
>> +		       bdev->bd_super->s_blocksize_bits,
>> +		       bdev->bd_inode->i_blkbits);
> 
> Well, bd_super may be NULL if there's no filesystem mounted so it would be
> safer to check for this rather than blindly dereferencing it... Otherwise
> the change looks good to me.

I see. Let's be cautious here.

From 317a0d0002b3d2cadae606055ad50f2926ca62d2 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 3 Jan 2019 09:42:02 +0900
Subject: [PATCH v2] fs/buffer.c: dump more info for __getblk_gfp() stall problem

We need to dump more variables on top of
"fs/buffer.c: add debug print for __getblk_gfp() stall problem".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jan Kara <jack@suse.cz>
---
 fs/buffer.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 580fda0..784de3d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1066,9 +1066,15 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 #ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
 		if (!time_after(jiffies, current->getblk_stamp + 3 * HZ))
 			continue;
-		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx\n",
+		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx bdev_super_blocksize=%ld size=%u bdev_super_blocksize_bits=%d bdev_inode_blkbits=%d\n",
 		       current->comm, current->pid, current->getblk_executed,
-		       current->getblk_bh_count, current->getblk_bh_state);
+		       current->getblk_bh_count, current->getblk_bh_state,
+		       IS_ERR_OR_NULL(bdev->bd_super) ? -1L :
+		       bdev->bd_super->s_blocksize, size,
+		       IS_ERR_OR_NULL(bdev->bd_super) ? -1 :
+		       bdev->bd_super->s_blocksize_bits,
+		       IS_ERR_OR_NULL(bdev->bd_inode) ? -1 :
+		       bdev->bd_inode->i_blkbits);
 		current->getblk_executed = 0;
 		current->getblk_bh_count = 0;
 		current->getblk_bh_state = 0;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2019-01-03  0:46                     ` Tetsuo Handa
  0 siblings, 0 replies; 30+ messages in thread
From: Tetsuo Handa @ 2019-01-03  0:46 UTC (permalink / raw)
  To: Jan Kara, Andrew Morton
  Cc: Alexander Viro, syzbot, linux-mm, mgorman, Michal Hocko, ak,
	jlayton, linux-kernel, mawilcox, syzkaller-bugs, tim.c.chen,
	linux-fsdevel

On 2019/01/03 2:26, Jan Kara wrote:
> On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
>> On 2019/01/02 23:40, Jan Kara wrote:
>>> I had a look into this and the only good explanation for this I have is
>>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
>>> If that would happen, we'd get exactly the behavior syzkaller observes
>>> because grow_buffers() would populate different page than
>>> __find_get_block() then looks up.
>>>
>>> However I don't see how that's possible since the filesystem has the block
>>> device open exclusively and blkdev_bszset() makes sure we also have
>>> exclusive access to the block device before changing the block device size.
>>> So changing block device block size after filesystem gets access to the
>>> device should be impossible. 
>>>
>>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
>>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
>>> whether my theory is right or not. Thanks!
>>>
>>
>> OK. Andrew, will you add (or fold into) this change?
>>
>> From e6f334380ad2c87457bfc2a4058316c47f75824a Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Date: Thu, 3 Jan 2019 01:03:35 +0900
>> Subject: [PATCH] fs/buffer.c: dump more info for __getblk_gfp() stall problem
>>
>> We need to dump more variables on top of
>> "fs/buffer.c: add debug print for __getblk_gfp() stall problem".
>>
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Cc: Jan Kara <jack@suse.cz>
>> ---
>>  fs/buffer.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index 580fda0..a50acac 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -1066,9 +1066,14 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
>>  #ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
>>  		if (!time_after(jiffies, current->getblk_stamp + 3 * HZ))
>>  			continue;
>> -		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx\n",
>> +		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx "
>> +		       "bdev_super_blocksize=%lu size=%u "
>> +		       "bdev_super_blocksize_bits=%u bdev_inode_blkbits=%u\n",
>>  		       current->comm, current->pid, current->getblk_executed,
>> -		       current->getblk_bh_count, current->getblk_bh_state);
>> +		       current->getblk_bh_count, current->getblk_bh_state,
>> +		       bdev->bd_super->s_blocksize, size,
>> +		       bdev->bd_super->s_blocksize_bits,
>> +		       bdev->bd_inode->i_blkbits);
> 
> Well, bd_super may be NULL if there's no filesystem mounted so it would be
> safer to check for this rather than blindly dereferencing it... Otherwise
> the change looks good to me.

I see. Let's be cautious here.

>From 317a0d0002b3d2cadae606055ad50f2926ca62d2 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 3 Jan 2019 09:42:02 +0900
Subject: [PATCH v2] fs/buffer.c: dump more info for __getblk_gfp() stall problem

We need to dump more variables on top of
"fs/buffer.c: add debug print for __getblk_gfp() stall problem".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jan Kara <jack@suse.cz>
---
 fs/buffer.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 580fda0..784de3d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1066,9 +1066,15 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 #ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
 		if (!time_after(jiffies, current->getblk_stamp + 3 * HZ))
 			continue;
-		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx\n",
+		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx bdev_super_blocksize=%ld size=%u bdev_super_blocksize_bits=%d bdev_inode_blkbits=%d\n",
 		       current->comm, current->pid, current->getblk_executed,
-		       current->getblk_bh_count, current->getblk_bh_state);
+		       current->getblk_bh_count, current->getblk_bh_state,
+		       IS_ERR_OR_NULL(bdev->bd_super) ? -1L :
+		       bdev->bd_super->s_blocksize, size,
+		       IS_ERR_OR_NULL(bdev->bd_super) ? -1 :
+		       bdev->bd_super->s_blocksize_bits,
+		       IS_ERR_OR_NULL(bdev->bd_inode) ? -1 :
+		       bdev->bd_inode->i_blkbits);
 		current->getblk_executed = 0;
 		current->getblk_bh_count = 0;
 		current->getblk_bh_state = 0;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2019-01-02 17:26                 ` Jan Kara
  2019-01-03  0:46                     ` Tetsuo Handa
@ 2019-01-08 10:04                   ` Tetsuo Handa
  2019-01-08 11:24                     ` Jan Kara
  1 sibling, 1 reply; 30+ messages in thread
From: Tetsuo Handa @ 2019-01-08 10:04 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Alexander Viro, syzbot, linux-mm, mgorman,
	Michal Hocko, ak, jlayton, linux-kernel, mawilcox,
	syzkaller-bugs, tim.c.chen, linux-fsdevel

On 2019/01/03 2:26, Jan Kara wrote:
> On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
>> On 2019/01/02 23:40, Jan Kara wrote:
>>> I had a look into this and the only good explanation for this I have is
>>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
>>> If that would happen, we'd get exactly the behavior syzkaller observes
>>> because grow_buffers() would populate different page than
>>> __find_get_block() then looks up.
>>>
>>> However I don't see how that's possible since the filesystem has the block
>>> device open exclusively and blkdev_bszset() makes sure we also have
>>> exclusive access to the block device before changing the block device size.
>>> So changing block device block size after filesystem gets access to the
>>> device should be impossible. 
>>>
>>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
>>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
>>> whether my theory is right or not. Thanks!
>>>

Got two reports. 'size' is 512 while bdev->bd_inode->i_blkbits is 12.

https://syzkaller.appspot.com/text?tag=CrashLog&x=1237c3ab400000

[  385.723941][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
(...snipped...)
[  568.159544][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12

https://syzkaller.appspot.com/text?tag=CrashLog&x=143383d7400000

[ 1355.681513][ T6893] syz-executor0(6893): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
[ 1358.274585][T15649] kworker/u4:17(15649): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
(...snipped...)
[ 1455.341572][ T6893] syz-executor0(6893): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
[ 1455.541457][T15649] kworker/u4:17(15649): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12

>>
>> OK. Andrew, will you add (or fold into) this change?
>>
>> From e6f334380ad2c87457bfc2a4058316c47f75824a Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Date: Thu, 3 Jan 2019 01:03:35 +0900
>> Subject: [PATCH] fs/buffer.c: dump more info for __getblk_gfp() stall problem
>>
>> We need to dump more variables on top of
>> "fs/buffer.c: add debug print for __getblk_gfp() stall problem".
>>
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Cc: Jan Kara <jack@suse.cz>
>> ---
>>  fs/buffer.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index 580fda0..a50acac 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -1066,9 +1066,14 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
>>  #ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
>>  		if (!time_after(jiffies, current->getblk_stamp + 3 * HZ))
>>  			continue;
>> -		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx\n",
>> +		printk(KERN_ERR "%s(%u): getblk(): executed=%x bh_count=%d bh_state=%lx "
>> +		       "bdev_super_blocksize=%lu size=%u "
>> +		       "bdev_super_blocksize_bits=%u bdev_inode_blkbits=%u\n",
>>  		       current->comm, current->pid, current->getblk_executed,
>> -		       current->getblk_bh_count, current->getblk_bh_state);
>> +		       current->getblk_bh_count, current->getblk_bh_state,
>> +		       bdev->bd_super->s_blocksize, size,
>> +		       bdev->bd_super->s_blocksize_bits,
>> +		       bdev->bd_inode->i_blkbits);
> 
> Well, bd_super may be NULL if there's no filesystem mounted so it would be
> safer to check for this rather than blindly dereferencing it... Otherwise
> the change looks good to me.
> 
> 								Honza
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2019-01-08 10:04                   ` Tetsuo Handa
@ 2019-01-08 11:24                     ` Jan Kara
  2019-01-08 11:49                         ` Dmitry Vyukov
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2019-01-08 11:24 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Andrew Morton, Alexander Viro, syzbot, linux-mm,
	mgorman, Michal Hocko, ak, jlayton, linux-kernel, mawilcox,
	syzkaller-bugs, tim.c.chen, linux-fsdevel

On Tue 08-01-19 19:04:06, Tetsuo Handa wrote:
> On 2019/01/03 2:26, Jan Kara wrote:
> > On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> >> On 2019/01/02 23:40, Jan Kara wrote:
> >>> I had a look into this and the only good explanation for this I have is
> >>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> >>> If that would happen, we'd get exactly the behavior syzkaller observes
> >>> because grow_buffers() would populate different page than
> >>> __find_get_block() then looks up.
> >>>
> >>> However I don't see how that's possible since the filesystem has the block
> >>> device open exclusively and blkdev_bszset() makes sure we also have
> >>> exclusive access to the block device before changing the block device size.
> >>> So changing block device block size after filesystem gets access to the
> >>> device should be impossible. 
> >>>
> >>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> >>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> >>> whether my theory is right or not. Thanks!
> >>>
> 
> Got two reports. 'size' is 512 while bdev->bd_inode->i_blkbits is 12.
> 
> https://syzkaller.appspot.com/text?tag=CrashLog&x=1237c3ab400000
> 
> [  385.723941][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> (...snipped...)
> [  568.159544][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12

Right, so indeed the block size in the superblock and in the block device
gets out of sync which explains why we endlessly loop in the buffer cache
code. The superblock uses blocksize of 512 while the block device thinks
the set block size is 4096.

And after staring into the code for some time, I finally have a trivial
reproducer:

truncate -s 1G /tmp/image
losetup /dev/loop0 /tmp/image
mkfs.ext4 -b 1024 /dev/loop0
mount -t ext4 /dev/loop0 /mnt
losetup -c /dev/loop0
l /mnt
<hangs>

And the problem is that LOOP_SET_CAPACITY ioctl ends up reseting block
device block size to 4096 by calling bd_set_size(). I have to think how to
best fix this...

Thanks for your help with debugging this!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2019-01-08 11:49                         ` Dmitry Vyukov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Vyukov @ 2019-01-08 11:49 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tetsuo Handa, Andrew Morton, Alexander Viro, syzbot, Linux-MM,
	Mel Gorman, Michal Hocko, Andi Kleen, jlayton, LKML,
	Matthew Wilcox, syzkaller-bugs, tim.c.chen, linux-fsdevel

On Tue, Jan 8, 2019 at 12:24 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 08-01-19 19:04:06, Tetsuo Handa wrote:
> > On 2019/01/03 2:26, Jan Kara wrote:
> > > On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> > >> On 2019/01/02 23:40, Jan Kara wrote:
> > >>> I had a look into this and the only good explanation for this I have is
> > >>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> > >>> If that would happen, we'd get exactly the behavior syzkaller observes
> > >>> because grow_buffers() would populate different page than
> > >>> __find_get_block() then looks up.
> > >>>
> > >>> However I don't see how that's possible since the filesystem has the block
> > >>> device open exclusively and blkdev_bszset() makes sure we also have
> > >>> exclusive access to the block device before changing the block device size.
> > >>> So changing block device block size after filesystem gets access to the
> > >>> device should be impossible.
> > >>>
> > >>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> > >>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> > >>> whether my theory is right or not. Thanks!
> > >>>
> >
> > Got two reports. 'size' is 512 while bdev->bd_inode->i_blkbits is 12.
> >
> > https://syzkaller.appspot.com/text?tag=CrashLog&x=1237c3ab400000
> >
> > [  385.723941][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > (...snipped...)
> > [  568.159544][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
>
> Right, so indeed the block size in the superblock and in the block device
> gets out of sync which explains why we endlessly loop in the buffer cache
> code. The superblock uses blocksize of 512 while the block device thinks
> the set block size is 4096.
>
> And after staring into the code for some time, I finally have a trivial
> reproducer:
>
> truncate -s 1G /tmp/image
> losetup /dev/loop0 /tmp/image
> mkfs.ext4 -b 1024 /dev/loop0
> mount -t ext4 /dev/loop0 /mnt
> losetup -c /dev/loop0
> l /mnt
> <hangs>
>
> And the problem is that LOOP_SET_CAPACITY ioctl ends up reseting block
> device block size to 4096 by calling bd_set_size(). I have to think how to
> best fix this...
>
> Thanks for your help with debugging this!

Wow! I am very excited.
We have 587 open "task hung" reports, I suspect this explains lots of them.
What would be some pattern that we can use to best-effort distinguish
most manifestations? Skimming through few reports I see "inode_lock",
"get_super", "blkdev_put" as common indicators. Anything else?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2019-01-08 11:49                         ` Dmitry Vyukov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Vyukov @ 2019-01-08 11:49 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tetsuo Handa, Andrew Morton, Alexander Viro, syzbot, Linux-MM,
	Mel Gorman, Michal Hocko, Andi Kleen, jlayton, LKML,
	Matthew Wilcox, syzkaller-bugs, tim.c.chen, linux-fsdevel

On Tue, Jan 8, 2019 at 12:24 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 08-01-19 19:04:06, Tetsuo Handa wrote:
> > On 2019/01/03 2:26, Jan Kara wrote:
> > > On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> > >> On 2019/01/02 23:40, Jan Kara wrote:
> > >>> I had a look into this and the only good explanation for this I have is
> > >>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> > >>> If that would happen, we'd get exactly the behavior syzkaller observes
> > >>> because grow_buffers() would populate different page than
> > >>> __find_get_block() then looks up.
> > >>>
> > >>> However I don't see how that's possible since the filesystem has the block
> > >>> device open exclusively and blkdev_bszset() makes sure we also have
> > >>> exclusive access to the block device before changing the block device size.
> > >>> So changing block device block size after filesystem gets access to the
> > >>> device should be impossible.
> > >>>
> > >>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> > >>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> > >>> whether my theory is right or not. Thanks!
> > >>>
> >
> > Got two reports. 'size' is 512 while bdev->bd_inode->i_blkbits is 12.
> >
> > https://syzkaller.appspot.com/text?tag=CrashLog&x=1237c3ab400000
> >
> > [  385.723941][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > (...snipped...)
> > [  568.159544][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
>
> Right, so indeed the block size in the superblock and in the block device
> gets out of sync which explains why we endlessly loop in the buffer cache
> code. The superblock uses blocksize of 512 while the block device thinks
> the set block size is 4096.
>
> And after staring into the code for some time, I finally have a trivial
> reproducer:
>
> truncate -s 1G /tmp/image
> losetup /dev/loop0 /tmp/image
> mkfs.ext4 -b 1024 /dev/loop0
> mount -t ext4 /dev/loop0 /mnt
> losetup -c /dev/loop0
> l /mnt
> <hangs>
>
> And the problem is that LOOP_SET_CAPACITY ioctl ends up reseting block
> device block size to 4096 by calling bd_set_size(). I have to think how to
> best fix this...
>
> Thanks for your help with debugging this!

Wow! I am very excited.
We have 587 open "task hung" reports, I suspect this explains lots of them.
What would be some pattern that we can use to best-effort distinguish
most manifestations? Skimming through few reports I see "inode_lock",
"get_super", "blkdev_put" as common indicators. Anything else?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2019-01-08 11:49                         ` Dmitry Vyukov
  (?)
@ 2019-01-09 13:30                         ` Jan Kara
  2019-01-14 15:11                             ` Dmitry Vyukov
  -1 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2019-01-09 13:30 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jan Kara, Tetsuo Handa, Andrew Morton, Alexander Viro, syzbot,
	Linux-MM, Mel Gorman, Michal Hocko, Andi Kleen, jlayton, LKML,
	Matthew Wilcox, syzkaller-bugs, tim.c.chen, linux-fsdevel

On Tue 08-01-19 12:49:08, Dmitry Vyukov wrote:
> On Tue, Jan 8, 2019 at 12:24 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Tue 08-01-19 19:04:06, Tetsuo Handa wrote:
> > > On 2019/01/03 2:26, Jan Kara wrote:
> > > > On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> > > >> On 2019/01/02 23:40, Jan Kara wrote:
> > > >>> I had a look into this and the only good explanation for this I have is
> > > >>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> > > >>> If that would happen, we'd get exactly the behavior syzkaller observes
> > > >>> because grow_buffers() would populate different page than
> > > >>> __find_get_block() then looks up.
> > > >>>
> > > >>> However I don't see how that's possible since the filesystem has the block
> > > >>> device open exclusively and blkdev_bszset() makes sure we also have
> > > >>> exclusive access to the block device before changing the block device size.
> > > >>> So changing block device block size after filesystem gets access to the
> > > >>> device should be impossible.
> > > >>>
> > > >>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> > > >>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> > > >>> whether my theory is right or not. Thanks!
> > > >>>
> > >
> > > Got two reports. 'size' is 512 while bdev->bd_inode->i_blkbits is 12.
> > >
> > > https://syzkaller.appspot.com/text?tag=CrashLog&x=1237c3ab400000
> > >
> > > [  385.723941][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > > (...snipped...)
> > > [  568.159544][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> >
> > Right, so indeed the block size in the superblock and in the block device
> > gets out of sync which explains why we endlessly loop in the buffer cache
> > code. The superblock uses blocksize of 512 while the block device thinks
> > the set block size is 4096.
> >
> > And after staring into the code for some time, I finally have a trivial
> > reproducer:
> >
> > truncate -s 1G /tmp/image
> > losetup /dev/loop0 /tmp/image
> > mkfs.ext4 -b 1024 /dev/loop0
> > mount -t ext4 /dev/loop0 /mnt
> > losetup -c /dev/loop0
> > l /mnt
> > <hangs>
> >
> > And the problem is that LOOP_SET_CAPACITY ioctl ends up reseting block
> > device block size to 4096 by calling bd_set_size(). I have to think how to
> > best fix this...
> >
> > Thanks for your help with debugging this!
> 
> Wow! I am very excited.
> We have 587 open "task hung" reports, I suspect this explains lots of them.
> What would be some pattern that we can use to best-effort distinguish
> most manifestations? Skimming through few reports I see "inode_lock",
> "get_super", "blkdev_put" as common indicators. Anything else?

Well, there will be always looping task with __getblk_gfp() on its stack
(which should be visible in the stacktrace generated by the stall
detector). Then there can be lots of other processes getting blocked due to
locks and other resources held by this task...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2019-01-14 15:11                             ` Dmitry Vyukov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Vyukov @ 2019-01-14 15:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tetsuo Handa, Andrew Morton, Alexander Viro, syzbot, Linux-MM,
	Mel Gorman, Michal Hocko, Andi Kleen, jlayton, LKML,
	Matthew Wilcox, syzkaller-bugs, tim.c.chen, linux-fsdevel

On Wed, Jan 9, 2019 at 2:30 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 08-01-19 12:49:08, Dmitry Vyukov wrote:
> > On Tue, Jan 8, 2019 at 12:24 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Tue 08-01-19 19:04:06, Tetsuo Handa wrote:
> > > > On 2019/01/03 2:26, Jan Kara wrote:
> > > > > On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> > > > >> On 2019/01/02 23:40, Jan Kara wrote:
> > > > >>> I had a look into this and the only good explanation for this I have is
> > > > >>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> > > > >>> If that would happen, we'd get exactly the behavior syzkaller observes
> > > > >>> because grow_buffers() would populate different page than
> > > > >>> __find_get_block() then looks up.
> > > > >>>
> > > > >>> However I don't see how that's possible since the filesystem has the block
> > > > >>> device open exclusively and blkdev_bszset() makes sure we also have
> > > > >>> exclusive access to the block device before changing the block device size.
> > > > >>> So changing block device block size after filesystem gets access to the
> > > > >>> device should be impossible.
> > > > >>>
> > > > >>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> > > > >>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> > > > >>> whether my theory is right or not. Thanks!
> > > > >>>
> > > >
> > > > Got two reports. 'size' is 512 while bdev->bd_inode->i_blkbits is 12.
> > > >
> > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=1237c3ab400000
> > > >
> > > > [  385.723941][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > > > (...snipped...)
> > > > [  568.159544][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > >
> > > Right, so indeed the block size in the superblock and in the block device
> > > gets out of sync which explains why we endlessly loop in the buffer cache
> > > code. The superblock uses blocksize of 512 while the block device thinks
> > > the set block size is 4096.
> > >
> > > And after staring into the code for some time, I finally have a trivial
> > > reproducer:
> > >
> > > truncate -s 1G /tmp/image
> > > losetup /dev/loop0 /tmp/image
> > > mkfs.ext4 -b 1024 /dev/loop0
> > > mount -t ext4 /dev/loop0 /mnt
> > > losetup -c /dev/loop0
> > > l /mnt
> > > <hangs>
> > >
> > > And the problem is that LOOP_SET_CAPACITY ioctl ends up reseting block
> > > device block size to 4096 by calling bd_set_size(). I have to think how to
> > > best fix this...
> > >
> > > Thanks for your help with debugging this!
> >
> > Wow! I am very excited.
> > We have 587 open "task hung" reports, I suspect this explains lots of them.
> > What would be some pattern that we can use to best-effort distinguish
> > most manifestations? Skimming through few reports I see "inode_lock",
> > "get_super", "blkdev_put" as common indicators. Anything else?
>
> Well, there will be always looping task with __getblk_gfp() on its stack
> (which should be visible in the stacktrace generated by the stall
> detector). Then there can be lots of other processes getting blocked due to
> locks and other resources held by this task...


Once we have a fix, I plan to do a sweep over existing open "task
hung" reports and dup lots of them onto this one. Probably preferring
to over-sweep rather then to under-sweep because there are too many of
them and lots does not seem to be actionable otherwise.
Tetsuo, do you have comments before I start?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2019-01-14 15:11                             ` Dmitry Vyukov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Vyukov @ 2019-01-14 15:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tetsuo Handa, Andrew Morton, Alexander Viro, syzbot, Linux-MM,
	Mel Gorman, Michal Hocko, Andi Kleen, jlayton, LKML,
	Matthew Wilcox, syzkaller-bugs, tim.c.chen, linux-fsdevel

On Wed, Jan 9, 2019 at 2:30 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 08-01-19 12:49:08, Dmitry Vyukov wrote:
> > On Tue, Jan 8, 2019 at 12:24 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Tue 08-01-19 19:04:06, Tetsuo Handa wrote:
> > > > On 2019/01/03 2:26, Jan Kara wrote:
> > > > > On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> > > > >> On 2019/01/02 23:40, Jan Kara wrote:
> > > > >>> I had a look into this and the only good explanation for this I have is
> > > > >>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> > > > >>> If that would happen, we'd get exactly the behavior syzkaller observes
> > > > >>> because grow_buffers() would populate different page than
> > > > >>> __find_get_block() then looks up.
> > > > >>>
> > > > >>> However I don't see how that's possible since the filesystem has the block
> > > > >>> device open exclusively and blkdev_bszset() makes sure we also have
> > > > >>> exclusive access to the block device before changing the block device size.
> > > > >>> So changing block device block size after filesystem gets access to the
> > > > >>> device should be impossible.
> > > > >>>
> > > > >>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> > > > >>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> > > > >>> whether my theory is right or not. Thanks!
> > > > >>>
> > > >
> > > > Got two reports. 'size' is 512 while bdev->bd_inode->i_blkbits is 12.
> > > >
> > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=1237c3ab400000
> > > >
> > > > [  385.723941][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > > > (...snipped...)
> > > > [  568.159544][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > >
> > > Right, so indeed the block size in the superblock and in the block device
> > > gets out of sync which explains why we endlessly loop in the buffer cache
> > > code. The superblock uses blocksize of 512 while the block device thinks
> > > the set block size is 4096.
> > >
> > > And after staring into the code for some time, I finally have a trivial
> > > reproducer:
> > >
> > > truncate -s 1G /tmp/image
> > > losetup /dev/loop0 /tmp/image
> > > mkfs.ext4 -b 1024 /dev/loop0
> > > mount -t ext4 /dev/loop0 /mnt
> > > losetup -c /dev/loop0
> > > l /mnt
> > > <hangs>
> > >
> > > And the problem is that LOOP_SET_CAPACITY ioctl ends up reseting block
> > > device block size to 4096 by calling bd_set_size(). I have to think how to
> > > best fix this...
> > >
> > > Thanks for your help with debugging this!
> >
> > Wow! I am very excited.
> > We have 587 open "task hung" reports, I suspect this explains lots of them.
> > What would be some pattern that we can use to best-effort distinguish
> > most manifestations? Skimming through few reports I see "inode_lock",
> > "get_super", "blkdev_put" as common indicators. Anything else?
>
> Well, there will be always looping task with __getblk_gfp() on its stack
> (which should be visible in the stacktrace generated by the stall
> detector). Then there can be lots of other processes getting blocked due to
> locks and other resources held by this task...


Once we have a fix, I plan to do a sweep over existing open "task
hung" reports and dup lots of them onto this one. Probably preferring
to over-sweep rather then to under-sweep because there are too many of
them and lots does not seem to be actionable otherwise.
Tetsuo, do you have comments before I start?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2019-01-14 15:13                               ` Dmitry Vyukov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Vyukov @ 2019-01-14 15:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tetsuo Handa, Andrew Morton, Alexander Viro, syzbot, Linux-MM,
	Mel Gorman, Michal Hocko, Andi Kleen, jlayton, LKML,
	syzkaller-bugs, tim.c.chen, linux-fsdevel

On Mon, Jan 14, 2019 at 4:11 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Jan 9, 2019 at 2:30 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Tue 08-01-19 12:49:08, Dmitry Vyukov wrote:
> > > On Tue, Jan 8, 2019 at 12:24 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Tue 08-01-19 19:04:06, Tetsuo Handa wrote:
> > > > > On 2019/01/03 2:26, Jan Kara wrote:
> > > > > > On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> > > > > >> On 2019/01/02 23:40, Jan Kara wrote:
> > > > > >>> I had a look into this and the only good explanation for this I have is
> > > > > >>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> > > > > >>> If that would happen, we'd get exactly the behavior syzkaller observes
> > > > > >>> because grow_buffers() would populate different page than
> > > > > >>> __find_get_block() then looks up.
> > > > > >>>
> > > > > >>> However I don't see how that's possible since the filesystem has the block
> > > > > >>> device open exclusively and blkdev_bszset() makes sure we also have
> > > > > >>> exclusive access to the block device before changing the block device size.
> > > > > >>> So changing block device block size after filesystem gets access to the
> > > > > >>> device should be impossible.
> > > > > >>>
> > > > > >>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> > > > > >>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> > > > > >>> whether my theory is right or not. Thanks!
> > > > > >>>
> > > > >
> > > > > Got two reports. 'size' is 512 while bdev->bd_inode->i_blkbits is 12.
> > > > >
> > > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=1237c3ab400000
> > > > >
> > > > > [  385.723941][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > > > > (...snipped...)
> > > > > [  568.159544][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > > >
> > > > Right, so indeed the block size in the superblock and in the block device
> > > > gets out of sync which explains why we endlessly loop in the buffer cache
> > > > code. The superblock uses blocksize of 512 while the block device thinks
> > > > the set block size is 4096.
> > > >
> > > > And after staring into the code for some time, I finally have a trivial
> > > > reproducer:
> > > >
> > > > truncate -s 1G /tmp/image
> > > > losetup /dev/loop0 /tmp/image
> > > > mkfs.ext4 -b 1024 /dev/loop0
> > > > mount -t ext4 /dev/loop0 /mnt
> > > > losetup -c /dev/loop0
> > > > l /mnt
> > > > <hangs>
> > > >
> > > > And the problem is that LOOP_SET_CAPACITY ioctl ends up reseting block
> > > > device block size to 4096 by calling bd_set_size(). I have to think how to
> > > > best fix this...
> > > >
> > > > Thanks for your help with debugging this!
> > >
> > > Wow! I am very excited.
> > > We have 587 open "task hung" reports, I suspect this explains lots of them.
> > > What would be some pattern that we can use to best-effort distinguish
> > > most manifestations? Skimming through few reports I see "inode_lock",
> > > "get_super", "blkdev_put" as common indicators. Anything else?
> >
> > Well, there will be always looping task with __getblk_gfp() on its stack
> > (which should be visible in the stacktrace generated by the stall
> > detector). Then there can be lots of other processes getting blocked due to
> > locks and other resources held by this task...
>
>
> Once we have a fix, I plan to do a sweep over existing open "task
> hung" reports and dup lots of them onto this one. Probably preferring
> to over-sweep rather then to under-sweep because there are too many of
> them and lots does not seem to be actionable otherwise.
> Tetsuo, do you have comments before I start?

Also, is it possible to add some kind of WARNING for this condition?
Taking into account how much effort it too to debug, looks like a
useful check. Or did I ask this already...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
@ 2019-01-14 15:13                               ` Dmitry Vyukov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Vyukov @ 2019-01-14 15:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tetsuo Handa, Andrew Morton, Alexander Viro, syzbot, Linux-MM,
	Mel Gorman, Michal Hocko, Andi Kleen, jlayton, LKML,
	syzkaller-bugs, tim.c.chen, linux-fsdevel

On Mon, Jan 14, 2019 at 4:11 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Jan 9, 2019 at 2:30 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Tue 08-01-19 12:49:08, Dmitry Vyukov wrote:
> > > On Tue, Jan 8, 2019 at 12:24 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Tue 08-01-19 19:04:06, Tetsuo Handa wrote:
> > > > > On 2019/01/03 2:26, Jan Kara wrote:
> > > > > > On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> > > > > >> On 2019/01/02 23:40, Jan Kara wrote:
> > > > > >>> I had a look into this and the only good explanation for this I have is
> > > > > >>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> > > > > >>> If that would happen, we'd get exactly the behavior syzkaller observes
> > > > > >>> because grow_buffers() would populate different page than
> > > > > >>> __find_get_block() then looks up.
> > > > > >>>
> > > > > >>> However I don't see how that's possible since the filesystem has the block
> > > > > >>> device open exclusively and blkdev_bszset() makes sure we also have
> > > > > >>> exclusive access to the block device before changing the block device size.
> > > > > >>> So changing block device block size after filesystem gets access to the
> > > > > >>> device should be impossible.
> > > > > >>>
> > > > > >>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> > > > > >>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> > > > > >>> whether my theory is right or not. Thanks!
> > > > > >>>
> > > > >
> > > > > Got two reports. 'size' is 512 while bdev->bd_inode->i_blkbits is 12.
> > > > >
> > > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=1237c3ab400000
> > > > >
> > > > > [  385.723941][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > > > > (...snipped...)
> > > > > [  568.159544][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > > >
> > > > Right, so indeed the block size in the superblock and in the block device
> > > > gets out of sync which explains why we endlessly loop in the buffer cache
> > > > code. The superblock uses blocksize of 512 while the block device thinks
> > > > the set block size is 4096.
> > > >
> > > > And after staring into the code for some time, I finally have a trivial
> > > > reproducer:
> > > >
> > > > truncate -s 1G /tmp/image
> > > > losetup /dev/loop0 /tmp/image
> > > > mkfs.ext4 -b 1024 /dev/loop0
> > > > mount -t ext4 /dev/loop0 /mnt
> > > > losetup -c /dev/loop0
> > > > l /mnt
> > > > <hangs>
> > > >
> > > > And the problem is that LOOP_SET_CAPACITY ioctl ends up reseting block
> > > > device block size to 4096 by calling bd_set_size(). I have to think how to
> > > > best fix this...
> > > >
> > > > Thanks for your help with debugging this!
> > >
> > > Wow! I am very excited.
> > > We have 587 open "task hung" reports, I suspect this explains lots of them.
> > > What would be some pattern that we can use to best-effort distinguish
> > > most manifestations? Skimming through few reports I see "inode_lock",
> > > "get_super", "blkdev_put" as common indicators. Anything else?
> >
> > Well, there will be always looping task with __getblk_gfp() on its stack
> > (which should be visible in the stacktrace generated by the stall
> > detector). Then there can be lots of other processes getting blocked due to
> > locks and other resources held by this task...
>
>
> Once we have a fix, I plan to do a sweep over existing open "task
> hung" reports and dup lots of them onto this one. Probably preferring
> to over-sweep rather then to under-sweep because there are too many of
> them and lots does not seem to be actionable otherwise.
> Tetsuo, do you have comments before I start?

Also, is it possible to add some kind of WARNING for this condition?
Taking into account how much effort it too to debug, looks like a
useful check. Or did I ask this already...


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: INFO: task hung in generic_file_write_iter
  2019-01-14 15:13                               ` Dmitry Vyukov
  (?)
@ 2019-01-15  9:29                               ` Jan Kara
  -1 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2019-01-15  9:29 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jan Kara, Tetsuo Handa, Andrew Morton, Alexander Viro, syzbot,
	Linux-MM, Mel Gorman, Michal Hocko, Andi Kleen, jlayton, LKML,
	syzkaller-bugs, tim.c.chen, linux-fsdevel

On Mon 14-01-19 16:13:08, Dmitry Vyukov wrote:
> On Mon, Jan 14, 2019 at 4:11 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Wed, Jan 9, 2019 at 2:30 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Tue 08-01-19 12:49:08, Dmitry Vyukov wrote:
> > > > On Tue, Jan 8, 2019 at 12:24 PM Jan Kara <jack@suse.cz> wrote:
> > > > >
> > > > > On Tue 08-01-19 19:04:06, Tetsuo Handa wrote:
> > > > > > On 2019/01/03 2:26, Jan Kara wrote:
> > > > > > > On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> > > > > > >> On 2019/01/02 23:40, Jan Kara wrote:
> > > > > > >>> I had a look into this and the only good explanation for this I have is
> > > > > > >>> that sb->s_blocksize is different from (1 << sb->s_bdev->bd_inode->i_blkbits).
> > > > > > >>> If that would happen, we'd get exactly the behavior syzkaller observes
> > > > > > >>> because grow_buffers() would populate different page than
> > > > > > >>> __find_get_block() then looks up.
> > > > > > >>>
> > > > > > >>> However I don't see how that's possible since the filesystem has the block
> > > > > > >>> device open exclusively and blkdev_bszset() makes sure we also have
> > > > > > >>> exclusive access to the block device before changing the block device size.
> > > > > > >>> So changing block device block size after filesystem gets access to the
> > > > > > >>> device should be impossible.
> > > > > > >>>
> > > > > > >>> Anyway, could you perhaps add to your debug patch a dump of 'size' passed
> > > > > > >>> to __getblk_slow() and bdev->bd_inode->i_blkbits? That should tell us
> > > > > > >>> whether my theory is right or not. Thanks!
> > > > > > >>>
> > > > > >
> > > > > > Got two reports. 'size' is 512 while bdev->bd_inode->i_blkbits is 12.
> > > > > >
> > > > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=1237c3ab400000
> > > > > >
> > > > > > [  385.723941][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > > > > > (...snipped...)
> > > > > > [  568.159544][  T439] kworker/u4:3(439): getblk(): executed=9 bh_count=0 bh_state=0 bdev_super_blocksize=512 size=512 bdev_super_blocksize_bits=9 bdev_inode_blkbits=12
> > > > >
> > > > > Right, so indeed the block size in the superblock and in the block device
> > > > > gets out of sync which explains why we endlessly loop in the buffer cache
> > > > > code. The superblock uses blocksize of 512 while the block device thinks
> > > > > the set block size is 4096.
> > > > >
> > > > > And after staring into the code for some time, I finally have a trivial
> > > > > reproducer:
> > > > >
> > > > > truncate -s 1G /tmp/image
> > > > > losetup /dev/loop0 /tmp/image
> > > > > mkfs.ext4 -b 1024 /dev/loop0
> > > > > mount -t ext4 /dev/loop0 /mnt
> > > > > losetup -c /dev/loop0
> > > > > l /mnt
> > > > > <hangs>
> > > > >
> > > > > And the problem is that LOOP_SET_CAPACITY ioctl ends up reseting block
> > > > > device block size to 4096 by calling bd_set_size(). I have to think how to
> > > > > best fix this...
> > > > >
> > > > > Thanks for your help with debugging this!
> > > >
> > > > Wow! I am very excited.
> > > > We have 587 open "task hung" reports, I suspect this explains lots of them.
> > > > What would be some pattern that we can use to best-effort distinguish
> > > > most manifestations? Skimming through few reports I see "inode_lock",
> > > > "get_super", "blkdev_put" as common indicators. Anything else?
> > >
> > > Well, there will be always looping task with __getblk_gfp() on its stack
> > > (which should be visible in the stacktrace generated by the stall
> > > detector). Then there can be lots of other processes getting blocked due to
> > > locks and other resources held by this task...
> >
> >
> > Once we have a fix, I plan to do a sweep over existing open "task

I have submitted the fix yesterday to linux-block ([PATCH 0/2] blkdev: Fix
livelock when loop device updates capacity).

> > hung" reports and dup lots of them onto this one. Probably preferring
> > to over-sweep rather then to under-sweep because there are too many of
> > them and lots does not seem to be actionable otherwise.
> > Tetsuo, do you have comments before I start?
> 
> Also, is it possible to add some kind of WARNING for this condition?
> Taking into account how much effort it too to debug, looks like a
> useful check. Or did I ask this already...

There are two things we could do:

1) Warn if we loop in __getblk_slow() more than couple of times (looping
once is normal, looping twice can happen easily due to races).

2) Warn & bail if block size passed to __getblk_slow() does not match the
block device block size.

I'll write the patch.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-01-15  9:29 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-18  8:58 INFO: task hung in generic_file_write_iter syzbot
2018-07-18 10:28 ` Tetsuo Handa
2018-07-18 10:36   ` Dmitry Vyukov
2018-07-20 10:36   ` Tetsuo Handa
2018-07-20 10:36     ` Tetsuo Handa
2018-07-20 10:36     ` Tetsuo Handa
2018-07-20 20:06     ` Andrew Morton
2018-07-30 15:07       ` Tetsuo Handa
2018-08-06 10:09         ` Jan Kara
2018-08-06 11:56           ` Tetsuo Handa
2018-08-20 14:12             ` Tetsuo Handa
2018-12-28 13:34           ` Tetsuo Handa
2019-01-02 14:40             ` Jan Kara
2019-01-02 14:46               ` Dmitry Vyukov
2019-01-02 14:46                 ` Dmitry Vyukov
2019-01-02 16:07               ` Tetsuo Handa
2019-01-02 16:07                 ` Tetsuo Handa
2019-01-02 17:26                 ` Jan Kara
2019-01-03  0:46                   ` Tetsuo Handa
2019-01-03  0:46                     ` Tetsuo Handa
2019-01-08 10:04                   ` Tetsuo Handa
2019-01-08 11:24                     ` Jan Kara
2019-01-08 11:49                       ` Dmitry Vyukov
2019-01-08 11:49                         ` Dmitry Vyukov
2019-01-09 13:30                         ` Jan Kara
2019-01-14 15:11                           ` Dmitry Vyukov
2019-01-14 15:11                             ` Dmitry Vyukov
2019-01-14 15:13                             ` Dmitry Vyukov
2019-01-14 15:13                               ` Dmitry Vyukov
2019-01-15  9:29                               ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.