linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* INFO: task hung in grab_super
@ 2018-07-18 10:38 syzbot
  2018-07-18 10:49 ` Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: syzbot @ 2018-07-18 10:38 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, syzkaller-bugs, viro

Hello,

syzbot found the following crash on:

HEAD commit:    3c53776e29f8 Mark HI and TASKLET softirq synchronous
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=136af978400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6d0ccc9273f0e539
dashboard link: https://syzkaller.appspot.com/bug?extid=f425456ea8aa16b40d20
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+f425456ea8aa16b40d20@syzkaller.appspotmail.com

INFO: task syz-executor1:20048 blocked for more than 140 seconds.
       Not tainted 4.18.0-rc5+ #152
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor1   D24600 20048   4592 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:2853 [inline]
  __schedule+0x87c/0x1ed0 kernel/sched/core.c:3501
  schedule+0xfb/0x450 kernel/sched/core.c:3545
  __rwsem_down_write_failed_common+0x95d/0x1630  
kernel/locking/rwsem-xadd.c:566
  rwsem_down_write_failed+0xe/0x10 kernel/locking/rwsem-xadd.c:595
  call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
  __down_write arch/x86/include/asm/rwsem.h:142 [inline]
  down_write+0xaa/0x130 kernel/locking/rwsem.c:72
  grab_super+0xac/0x2a0 fs/super.c:380
  sget_userns+0x457/0xf20 fs/super.c:508
  sget+0x10b/0x150 fs/super.c:567
  mount_bdev+0x111/0x3e0 fs/super.c:1147
  ext4_mount+0x34/0x40 fs/ext4/super.c:5844
  mount_fs+0xae/0x328 fs/super.c:1277
  vfs_kern_mount.part.34+0xdc/0x4e0 fs/namespace.c:1037
  vfs_kern_mount fs/namespace.c:1027 [inline]
  do_new_mount fs/namespace.c:2518 [inline]
  do_mount+0x581/0x30e0 fs/namespace.c:2848
  ksys_mount+0x12d/0x140 fs/namespace.c:3064
  __do_sys_mount fs/namespace.c:3078 [inline]
  __se_sys_mount fs/namespace.c:3075 [inline]
  __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455ab9
Code: e0 1f 48 89 04 24 e8 b6 6f fd ff e8 81 6a fd ff e8 5c 68 fd ff 48 8d  
05 23 cd 48 00 48 89 04 24 48 c7 44 24 08 1d 00 00 00 e8 <13> 5e fd ff 0f  
0b e8 8c 44 00 00 e9 07 f0 ff ff cc cc cc cc cc cc
RSP: 002b:00007f243a3f3c68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007f243a3f46d4 RCX: 0000000000455ab9
RDX: 0000000020000140 RSI: 0000000020000000 RDI: 0000000020000040
RBP: 000000000072bf48 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004c01fb R14: 00000000004cfe38 R15: 0000000000000001

Showing all locks held in the system:
1 lock held by khungtaskd/902:
  #0: 0000000035584d3c (rcu_read_lock){....}, at:  
debug_show_all_locks+0xd0/0x428 kernel/locking/lockdep.c:4461
1 lock held by rsyslogd/4451:
2 locks held by getty/4541:
  #0: 000000006f0b7018 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000bae5c9d0 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4542:
  #0: 00000000e9d8517f (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000150ebe8a (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4543:
  #0: 0000000040e50d21 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 0000000069970342 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4544:
  #0: 00000000955281b7 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 000000004616d311 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4545:
  #0: 000000003faa1dd4 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 000000000dfddacf (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4546:
  #0: 00000000c5cc544f (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000caf6720f (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4547:
  #0: 00000000c10d3596 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000e2223f78 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
1 lock held by syz-executor1/19991:
2 locks held by syz-executor1/20048:
  #0: 00000000ca0a2d7d (&bdev->bd_fsfreeze_mutex){+.+.}, at:  
mount_bdev+0xa5/0x3e0 fs/super.c:1141
  #1: 00000000af861ef0 (&type->s_umount_key#69){+.+.}, at:  
grab_super+0xac/0x2a0 fs/super.c:380

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 902 Comm: khungtaskd Not tainted 4.18.0-rc5+ #152
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
  check_hung_uninterruptible_tasks kernel/hung_task.c:196 [inline]
  watchdog+0x9c4/0xf80 kernel/hung_task.c:252
  kthread+0x345/0x410 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 19991 Comm: syz-executor1 Not tainted 4.18.0-rc5+ #152
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:__lock_is_held+0x76/0x140 kernel/locking/lockdep.c:3741
Code: 00 00 4c 8b 6d c8 4d 8d b4 24 38 08 00 00 45 31 ff 48 b8 00 00 00 00  
00 fc ff df 49 c1 ed 03 49 01 c5 41 8b 84 24 30 08 00 00 <85> c0 7f 24 e9  
85 00 00 00 41 0f b6 45 00 41 83 c7 01 84 c0 74 08
RSP: 0018:ffff88018ea3eed8 EFLAGS: 00000086
RAX: 0000000000000001 RBX: 0000000000000286 RCX: ffffc9000443f000
RDX: 1ffff10031c245de RSI: 00000000ffffffff RDI: ffffffff88f925e0
RBP: ffff88018ea3ef18 R08: ffff88018e1226c0 R09: ffffed003b5e46d6
R10: 0000000000000003 R11: 0000000000000001 R12: ffff88018e1226c0
R13: ffffed0031c245de R14: ffff88018e122ef8 R15: 0000000000000000
FS:  00007f243a415700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000001d8749000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  lock_is_held_type+0x121/0x210 kernel/locking/lockdep.c:3962
  lock_is_held include/linux/lockdep.h:344 [inline]
  ___might_sleep+0x295/0x330 kernel/sched/core.c:6152
  __might_sleep+0x95/0x190 kernel/sched/core.c:6140
  lock_page include/linux/pagemap.h:481 [inline]
  pagecache_get_page+0x787/0xe40 mm/filemap.c:1554
  find_or_create_page include/linux/pagemap.h:322 [inline]
  grow_dev_page fs/buffer.c:940 [inline]
  grow_buffers fs/buffer.c:1009 [inline]
  __getblk_slow fs/buffer.c:1036 [inline]
  __getblk_gfp+0x3bb/0xb10 fs/buffer.c:1313
  __bread_gfp+0x2d/0x310 fs/buffer.c:1347
  sb_bread_unmovable include/linux/buffer_head.h:313 [inline]
  ext4_fill_super+0x6de/0xda80 fs/ext4/super.c:3541
  mount_bdev+0x30c/0x3e0 fs/super.c:1174
  ext4_mount+0x34/0x40 fs/ext4/super.c:5844
  mount_fs+0xae/0x328 fs/super.c:1277
  vfs_kern_mount.part.34+0xdc/0x4e0 fs/namespace.c:1037
  vfs_kern_mount fs/namespace.c:1027 [inline]
  do_new_mount fs/namespace.c:2518 [inline]
  do_mount+0x581/0x30e0 fs/namespace.c:2848
  ksys_mount+0x12d/0x140 fs/namespace.c:3064
  __do_sys_mount fs/namespace.c:3078 [inline]
  __se_sys_mount fs/namespace.c:3075 [inline]
  __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455ab9
Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f243a414c68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007f243a4156d4 RCX: 0000000000455ab9
RDX: 0000000020000140 RSI: 0000000020000000 RDI: 0000000020000040
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004c01fb R14: 00000000004cfe38 R15: 0000000000000000
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.163  
msecs


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: INFO: task hung in grab_super
  2018-07-18 10:38 INFO: task hung in grab_super syzbot
@ 2018-07-18 10:49 ` Tetsuo Handa
  2018-07-18 11:02   ` Dmitry Vyukov
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2018-07-18 10:49 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, viro

Dmitry,

this is yet another example of stalling inside __bread_gfp().

Can you find all reports where NMI backtrace contains __bread_gfp ?

I need to wget all reports if I try to do that on my side.
If you can locally grep on your side, it will be nice.

On 2018/07/18 19:38, syzbot wrote:
> CPU: 1 PID: 19991 Comm: syz-executor1 Not tainted 4.18.0-rc5+ #152
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:__lock_is_held+0x76/0x140 kernel/locking/lockdep.c:3741
> Code: 00 00 4c 8b 6d c8 4d 8d b4 24 38 08 00 00 45 31 ff 48 b8 00 00 00 00 00 fc ff df 49 c1 ed 03 49 01 c5 41 8b 84 24 30 08 00 00 <85> c0 7f 24 e9 85 00 00 00 41 0f b6 45 00 41 83 c7 01 84 c0 74 08
> RSP: 0018:ffff88018ea3eed8 EFLAGS: 00000086
> RAX: 0000000000000001 RBX: 0000000000000286 RCX: ffffc9000443f000
> RDX: 1ffff10031c245de RSI: 00000000ffffffff RDI: ffffffff88f925e0
> RBP: ffff88018ea3ef18 R08: ffff88018e1226c0 R09: ffffed003b5e46d6
> R10: 0000000000000003 R11: 0000000000000001 R12: ffff88018e1226c0
> R13: ffffed0031c245de R14: ffff88018e122ef8 R15: 0000000000000000
> FS:  00007f243a415700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff600400 CR3: 00000001d8749000 CR4: 00000000001406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  lock_is_held_type+0x121/0x210 kernel/locking/lockdep.c:3962
>  lock_is_held include/linux/lockdep.h:344 [inline]
>  ___might_sleep+0x295/0x330 kernel/sched/core.c:6152
>  __might_sleep+0x95/0x190 kernel/sched/core.c:6140
>  lock_page include/linux/pagemap.h:481 [inline]
>  pagecache_get_page+0x787/0xe40 mm/filemap.c:1554
>  find_or_create_page include/linux/pagemap.h:322 [inline]
>  grow_dev_page fs/buffer.c:940 [inline]
>  grow_buffers fs/buffer.c:1009 [inline]
>  __getblk_slow fs/buffer.c:1036 [inline]
>  __getblk_gfp+0x3bb/0xb10 fs/buffer.c:1313
>  __bread_gfp+0x2d/0x310 fs/buffer.c:1347
>  sb_bread_unmovable include/linux/buffer_head.h:313 [inline]
>  ext4_fill_super+0x6de/0xda80 fs/ext4/super.c:3541
>  mount_bdev+0x30c/0x3e0 fs/super.c:1174
>  ext4_mount+0x34/0x40 fs/ext4/super.c:5844
>  mount_fs+0xae/0x328 fs/super.c:1277
>  vfs_kern_mount.part.34+0xdc/0x4e0 fs/namespace.c:1037
>  vfs_kern_mount fs/namespace.c:1027 [inline]
>  do_new_mount fs/namespace.c:2518 [inline]
>  do_mount+0x581/0x30e0 fs/namespace.c:2848
>  ksys_mount+0x12d/0x140 fs/namespace.c:3064
>  __do_sys_mount fs/namespace.c:3078 [inline]
>  __se_sys_mount fs/namespace.c:3075 [inline]
>  __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: INFO: task hung in grab_super
  2018-07-18 10:49 ` Tetsuo Handa
@ 2018-07-18 11:02   ` Dmitry Vyukov
  2018-07-18 11:41     ` Dmitry Vyukov
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2018-07-18 11:02 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro

On Wed, Jul 18, 2018 at 12:49 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> Dmitry,
>
> this is yet another example of stalling inside __bread_gfp().
>
> Can you find all reports where NMI backtrace contains __bread_gfp ?
>
> I need to wget all reports if I try to do that on my side.
> If you can locally grep on your side, it will be nice.


No, I don't have grep. But here are some of the matches:

https://groups.google.com/forum/#!searchin/syzkaller-bugs/__bread_gfp%7Csort:date
https://groups.google.com/forum/#!searchin/syzkaller-upstream-moderation/__bread_gfp%7Csort:date



> On 2018/07/18 19:38, syzbot wrote:
>> CPU: 1 PID: 19991 Comm: syz-executor1 Not tainted 4.18.0-rc5+ #152
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> RIP: 0010:__lock_is_held+0x76/0x140 kernel/locking/lockdep.c:3741
>> Code: 00 00 4c 8b 6d c8 4d 8d b4 24 38 08 00 00 45 31 ff 48 b8 00 00 00 00 00 fc ff df 49 c1 ed 03 49 01 c5 41 8b 84 24 30 08 00 00 <85> c0 7f 24 e9 85 00 00 00 41 0f b6 45 00 41 83 c7 01 84 c0 74 08
>> RSP: 0018:ffff88018ea3eed8 EFLAGS: 00000086
>> RAX: 0000000000000001 RBX: 0000000000000286 RCX: ffffc9000443f000
>> RDX: 1ffff10031c245de RSI: 00000000ffffffff RDI: ffffffff88f925e0
>> RBP: ffff88018ea3ef18 R08: ffff88018e1226c0 R09: ffffed003b5e46d6
>> R10: 0000000000000003 R11: 0000000000000001 R12: ffff88018e1226c0
>> R13: ffffed0031c245de R14: ffff88018e122ef8 R15: 0000000000000000
>> FS:  00007f243a415700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffffffff600400 CR3: 00000001d8749000 CR4: 00000000001406e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>>  lock_is_held_type+0x121/0x210 kernel/locking/lockdep.c:3962
>>  lock_is_held include/linux/lockdep.h:344 [inline]
>>  ___might_sleep+0x295/0x330 kernel/sched/core.c:6152
>>  __might_sleep+0x95/0x190 kernel/sched/core.c:6140
>>  lock_page include/linux/pagemap.h:481 [inline]
>>  pagecache_get_page+0x787/0xe40 mm/filemap.c:1554
>>  find_or_create_page include/linux/pagemap.h:322 [inline]
>>  grow_dev_page fs/buffer.c:940 [inline]
>>  grow_buffers fs/buffer.c:1009 [inline]
>>  __getblk_slow fs/buffer.c:1036 [inline]
>>  __getblk_gfp+0x3bb/0xb10 fs/buffer.c:1313
>>  __bread_gfp+0x2d/0x310 fs/buffer.c:1347
>>  sb_bread_unmovable include/linux/buffer_head.h:313 [inline]
>>  ext4_fill_super+0x6de/0xda80 fs/ext4/super.c:3541
>>  mount_bdev+0x30c/0x3e0 fs/super.c:1174
>>  ext4_mount+0x34/0x40 fs/ext4/super.c:5844
>>  mount_fs+0xae/0x328 fs/super.c:1277
>>  vfs_kern_mount.part.34+0xdc/0x4e0 fs/namespace.c:1037
>>  vfs_kern_mount fs/namespace.c:1027 [inline]
>>  do_new_mount fs/namespace.c:2518 [inline]
>>  do_mount+0x581/0x30e0 fs/namespace.c:2848
>>  ksys_mount+0x12d/0x140 fs/namespace.c:3064
>>  __do_sys_mount fs/namespace.c:3078 [inline]
>>  __se_sys_mount fs/namespace.c:3075 [inline]
>>  __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
>>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: INFO: task hung in grab_super
  2018-07-18 11:02   ` Dmitry Vyukov
@ 2018-07-18 11:41     ` Dmitry Vyukov
  2018-07-18 12:53       ` Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2018-07-18 11:41 UTC (permalink / raw)
  To: Tetsuo Handa, Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov,
	v9fs-developer
  Cc: syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro

On Wed, Jul 18, 2018 at 1:02 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Wed, Jul 18, 2018 at 12:49 PM, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>> Dmitry,
>>
>> this is yet another example of stalling inside __bread_gfp().
>>
>> Can you find all reports where NMI backtrace contains __bread_gfp ?
>>
>> I need to wget all reports if I try to do that on my side.
>> If you can locally grep on your side, it will be nice.
>
>
> No, I don't have grep. But here are some of the matches:
>
> https://groups.google.com/forum/#!searchin/syzkaller-bugs/__bread_gfp%7Csort:date
> https://groups.google.com/forum/#!searchin/syzkaller-upstream-moderation/__bread_gfp%7Csort:date
>
>
>
>> On 2018/07/18 19:38, syzbot wrote:
>>> CPU: 1 PID: 19991 Comm: syz-executor1 Not tainted 4.18.0-rc5+ #152
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>> RIP: 0010:__lock_is_held+0x76/0x140 kernel/locking/lockdep.c:3741
>>> Code: 00 00 4c 8b 6d c8 4d 8d b4 24 38 08 00 00 45 31 ff 48 b8 00 00 00 00 00 fc ff df 49 c1 ed 03 49 01 c5 41 8b 84 24 30 08 00 00 <85> c0 7f 24 e9 85 00 00 00 41 0f b6 45 00 41 83 c7 01 84 c0 74 08
>>> RSP: 0018:ffff88018ea3eed8 EFLAGS: 00000086
>>> RAX: 0000000000000001 RBX: 0000000000000286 RCX: ffffc9000443f000
>>> RDX: 1ffff10031c245de RSI: 00000000ffffffff RDI: ffffffff88f925e0
>>> RBP: ffff88018ea3ef18 R08: ffff88018e1226c0 R09: ffffed003b5e46d6
>>> R10: 0000000000000003 R11: 0000000000000001 R12: ffff88018e1226c0
>>> R13: ffffed0031c245de R14: ffff88018e122ef8 R15: 0000000000000000
>>> FS:  00007f243a415700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: ffffffffff600400 CR3: 00000001d8749000 CR4: 00000000001406e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Call Trace:
>>>  lock_is_held_type+0x121/0x210 kernel/locking/lockdep.c:3962
>>>  lock_is_held include/linux/lockdep.h:344 [inline]
>>>  ___might_sleep+0x295/0x330 kernel/sched/core.c:6152
>>>  __might_sleep+0x95/0x190 kernel/sched/core.c:6140
>>>  lock_page include/linux/pagemap.h:481 [inline]
>>>  pagecache_get_page+0x787/0xe40 mm/filemap.c:1554
>>>  find_or_create_page include/linux/pagemap.h:322 [inline]
>>>  grow_dev_page fs/buffer.c:940 [inline]
>>>  grow_buffers fs/buffer.c:1009 [inline]
>>>  __getblk_slow fs/buffer.c:1036 [inline]
>>>  __getblk_gfp+0x3bb/0xb10 fs/buffer.c:1313
>>>  __bread_gfp+0x2d/0x310 fs/buffer.c:1347
>>>  sb_bread_unmovable include/linux/buffer_head.h:313 [inline]
>>>  ext4_fill_super+0x6de/0xda80 fs/ext4/super.c:3541
>>>  mount_bdev+0x30c/0x3e0 fs/super.c:1174
>>>  ext4_mount+0x34/0x40 fs/ext4/super.c:5844
>>>  mount_fs+0xae/0x328 fs/super.c:1277
>>>  vfs_kern_mount.part.34+0xdc/0x4e0 fs/namespace.c:1037
>>>  vfs_kern_mount fs/namespace.c:1027 [inline]
>>>  do_new_mount fs/namespace.c:2518 [inline]
>>>  do_mount+0x581/0x30e0 fs/namespace.c:2848
>>>  ksys_mount+0x12d/0x140 fs/namespace.c:3064
>>>  __do_sys_mount fs/namespace.c:3078 [inline]
>>>  __se_sys_mount fs/namespace.c:3075 [inline]
>>>  __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
>>>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe


This seems to be related to 9p. After rerunning the log I got:

root@syzkaller:~# ps afxu | grep syz
root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_
[syz-executor] <defunct>
root@syzkaller:~# cat /proc/18253/task/*/stack
[<0>] p9_client_rpc+0x3a2/0x1400
[<0>] p9_client_flush+0x134/0x2a0
[<0>] p9_client_rpc+0x122c/0x1400
[<0>] p9_client_create+0xc56/0x16af
[<0>] v9fs_session_init+0x21a/0x1a80
[<0>] v9fs_mount+0x7c/0x900
[<0>] mount_fs+0xae/0x328
[<0>] vfs_kern_mount.part.34+0xdc/0x4e0
[<0>] do_mount+0x581/0x30e0
[<0>] ksys_mount+0x12d/0x140
[<0>] __x64_sys_mount+0xbe/0x150
[<0>] do_syscall_64+0x1b9/0x820
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff

There is a bunch of hangs in 9p, so let's do:

#syz dup: INFO: task hung in flush_work

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: INFO: task hung in grab_super
  2018-07-18 11:41     ` Dmitry Vyukov
@ 2018-07-18 12:53       ` Tetsuo Handa
  2018-07-18 13:04         ` Dmitry Vyukov
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2018-07-18 12:53 UTC (permalink / raw)
  To: Dmitry Vyukov, Eric Van Hensbergen, Ron Minnich,
	Latchesar Ionkov, v9fs-developer
  Cc: syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro

On 2018/07/18 20:41, Dmitry Vyukov wrote:
> This seems to be related to 9p. After rerunning the log I got:
> 
> root@syzkaller:~# ps afxu | grep syz
> root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_
> [syz-executor] <defunct>
> root@syzkaller:~# cat /proc/18253/task/*/stack
> [<0>] p9_client_rpc+0x3a2/0x1400
> [<0>] p9_client_flush+0x134/0x2a0
> [<0>] p9_client_rpc+0x122c/0x1400
> [<0>] p9_client_create+0xc56/0x16af
> [<0>] v9fs_session_init+0x21a/0x1a80
> [<0>] v9fs_mount+0x7c/0x900
> [<0>] mount_fs+0xae/0x328
> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
> [<0>] do_mount+0x581/0x30e0
> [<0>] ksys_mount+0x12d/0x140
> [<0>] __x64_sys_mount+0xbe/0x150
> [<0>] do_syscall_64+0x1b9/0x820
> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [<0>] 0xffffffffffffffff
> 
> There is a bunch of hangs in 9p, so let's do:
> 
> #syz dup: INFO: task hung in flush_work
> 
Then, is dumping all threads when khungtaskd fires a candidate
for CONFIG_DEBUG_AID_FOR_SYZBOT=y path?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: INFO: task hung in grab_super
  2018-07-18 12:53       ` Tetsuo Handa
@ 2018-07-18 13:04         ` Dmitry Vyukov
  2018-07-18 13:35           ` Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2018-07-18 13:04 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov,
	v9fs-developer, syzbot, linux-fsdevel, LKML, syzkaller-bugs,
	Al Viro

On Wed, Jul 18, 2018 at 2:53 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> On 2018/07/18 20:41, Dmitry Vyukov wrote:
>> This seems to be related to 9p. After rerunning the log I got:
>>
>> root@syzkaller:~# ps afxu | grep syz
>> root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_
>> [syz-executor] <defunct>
>> root@syzkaller:~# cat /proc/18253/task/*/stack
>> [<0>] p9_client_rpc+0x3a2/0x1400
>> [<0>] p9_client_flush+0x134/0x2a0
>> [<0>] p9_client_rpc+0x122c/0x1400
>> [<0>] p9_client_create+0xc56/0x16af
>> [<0>] v9fs_session_init+0x21a/0x1a80
>> [<0>] v9fs_mount+0x7c/0x900
>> [<0>] mount_fs+0xae/0x328
>> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
>> [<0>] do_mount+0x581/0x30e0
>> [<0>] ksys_mount+0x12d/0x140
>> [<0>] __x64_sys_mount+0xbe/0x150
>> [<0>] do_syscall_64+0x1b9/0x820
>> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [<0>] 0xffffffffffffffff
>>
>> There is a bunch of hangs in 9p, so let's do:
>>
>> #syz dup: INFO: task hung in flush_work
>>
> Then, is dumping all threads when khungtaskd fires a candidate
> for CONFIG_DEBUG_AID_FOR_SYZBOT=y path?

Perhaps would be useful. But maybe only tasks that are blocked for
more than timeout/2? and/or unkillable tasks? killable tasks are not a
problem.

Btw, I see that p9_client_rpc uses wait_event_killable, why wasn't it
killed along with the whole process?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: INFO: task hung in grab_super
  2018-07-18 13:04         ` Dmitry Vyukov
@ 2018-07-18 13:35           ` Tetsuo Handa
  2018-07-18 14:11             ` Dmitry Vyukov
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2018-07-18 13:35 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov,
	v9fs-developer, syzbot, linux-fsdevel, LKML, syzkaller-bugs,
	Al Viro

On 2018/07/18 22:04, Dmitry Vyukov wrote:
> On Wed, Jul 18, 2018 at 2:53 PM, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>> On 2018/07/18 20:41, Dmitry Vyukov wrote:
>>> This seems to be related to 9p. After rerunning the log I got:
>>>
>>> root@syzkaller:~# ps afxu | grep syz
>>> root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_
>>> [syz-executor] <defunct>
>>> root@syzkaller:~# cat /proc/18253/task/*/stack
>>> [<0>] p9_client_rpc+0x3a2/0x1400
>>> [<0>] p9_client_flush+0x134/0x2a0
>>> [<0>] p9_client_rpc+0x122c/0x1400
>>> [<0>] p9_client_create+0xc56/0x16af
>>> [<0>] v9fs_session_init+0x21a/0x1a80
>>> [<0>] v9fs_mount+0x7c/0x900
>>> [<0>] mount_fs+0xae/0x328
>>> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
>>> [<0>] do_mount+0x581/0x30e0
>>> [<0>] ksys_mount+0x12d/0x140
>>> [<0>] __x64_sys_mount+0xbe/0x150
>>> [<0>] do_syscall_64+0x1b9/0x820
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>> [<0>] 0xffffffffffffffff
>>>
>>> There is a bunch of hangs in 9p, so let's do:
>>>
>>> #syz dup: INFO: task hung in flush_work
>>>
>> Then, is dumping all threads when khungtaskd fires a candidate
>> for CONFIG_DEBUG_AID_FOR_SYZBOT=y path?
> 
> Perhaps would be useful. But maybe only tasks that are blocked for
> more than timeout/2? and/or unkillable tasks? killable tasks are not a
> problem.

TASK_KILLABLE waiters are not reported by khungtaskd, are they?

  /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
  if (t->state == TASK_UNINTERRUPTIBLE)
    check_hung_task(t, timeout);

And TASK_KILLABLE waiters can become a problem because

> 
> Btw, I see that p9_client_rpc uses wait_event_killable, why wasn't it
> killed along with the whole process?
> 

wait_event_killable() would return -ERESTARTSYS if got SIGKILL.
But if (c->status == Connected) && (type == P9_TFLUSH) is also true,
it ignores SIGKILL by retrying the loop...

  again:
    err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
    if ((err == -ERESTARTSYS) && (c->status == Connected) && (type == P9_TFLUSH)) {
      sigpending = 1;
      clear_thread_flag(TIF_SIGPENDING);
      goto again;
    }

I wish they don't ignore SIGKILL (by e.g. offloading operations to a kernel thread).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: INFO: task hung in grab_super
  2018-07-18 13:35           ` Tetsuo Handa
@ 2018-07-18 14:11             ` Dmitry Vyukov
  2018-07-18 14:17               ` Tetsuo Handa
  2018-08-02 22:18               ` [V9fs-developer] " Dominique Martinet
  0 siblings, 2 replies; 12+ messages in thread
From: Dmitry Vyukov @ 2018-07-18 14:11 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov,
	v9fs-developer, syzbot, linux-fsdevel, LKML, syzkaller-bugs,
	Al Viro

On Wed, Jul 18, 2018 at 3:35 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>>>> This seems to be related to 9p. After rerunning the log I got:
>>>>
>>>> root@syzkaller:~# ps afxu | grep syz
>>>> root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_
>>>> [syz-executor] <defunct>
>>>> root@syzkaller:~# cat /proc/18253/task/*/stack
>>>> [<0>] p9_client_rpc+0x3a2/0x1400
>>>> [<0>] p9_client_flush+0x134/0x2a0
>>>> [<0>] p9_client_rpc+0x122c/0x1400
>>>> [<0>] p9_client_create+0xc56/0x16af
>>>> [<0>] v9fs_session_init+0x21a/0x1a80
>>>> [<0>] v9fs_mount+0x7c/0x900
>>>> [<0>] mount_fs+0xae/0x328
>>>> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
>>>> [<0>] do_mount+0x581/0x30e0
>>>> [<0>] ksys_mount+0x12d/0x140
>>>> [<0>] __x64_sys_mount+0xbe/0x150
>>>> [<0>] do_syscall_64+0x1b9/0x820
>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>> [<0>] 0xffffffffffffffff
>>>>
>>>> There is a bunch of hangs in 9p, so let's do:
>>>>
>>>> #syz dup: INFO: task hung in flush_work
>>>>
>>> Then, is dumping all threads when khungtaskd fires a candidate
>>> for CONFIG_DEBUG_AID_FOR_SYZBOT=y path?
>>
>> Perhaps would be useful. But maybe only tasks that are blocked for
>> more than timeout/2? and/or unkillable tasks? killable tasks are not a
>> problem.
>
> TASK_KILLABLE waiters are not reported by khungtaskd, are they?
>
>   /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
>   if (t->state == TASK_UNINTERRUPTIBLE)
>     check_hung_task(t, timeout);
>
> And TASK_KILLABLE waiters can become a problem because
>
>>
>> Btw, I see that p9_client_rpc uses wait_event_killable, why wasn't it
>> killed along with the whole process?
>>
>
> wait_event_killable() would return -ERESTARTSYS if got SIGKILL.
> But if (c->status == Connected) && (type == P9_TFLUSH) is also true,
> it ignores SIGKILL by retrying the loop...
>
>   again:
>     err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
>     if ((err == -ERESTARTSYS) && (c->status == Connected) && (type == P9_TFLUSH)) {
>       sigpending = 1;
>       clear_thread_flag(TIF_SIGPENDING);
>       goto again;
>     }
>
> I wish they don't ignore SIGKILL (by e.g. offloading operations to a kernel thread).


I guess that's the problem, right? SIGKILL-ed task must not ignore
SIGKILL and hang in infinite loop. This would explain a bunch of hangs
in 9p.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: INFO: task hung in grab_super
  2018-07-18 14:11             ` Dmitry Vyukov
@ 2018-07-18 14:17               ` Tetsuo Handa
  2018-11-02 19:19                 ` Dmitry Vyukov
  2018-08-02 22:18               ` [V9fs-developer] " Dominique Martinet
  1 sibling, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2018-07-18 14:17 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov,
	v9fs-developer, syzbot, linux-fsdevel, LKML, syzkaller-bugs,
	Al Viro

On 2018/07/18 23:11, Dmitry Vyukov wrote:
> On Wed, Jul 18, 2018 at 3:35 PM, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>>>> This seems to be related to 9p. After rerunning the log I got:
>>>>>
>>>>> root@syzkaller:~# ps afxu | grep syz
>>>>> root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_
>>>>> [syz-executor] <defunct>
>>>>> root@syzkaller:~# cat /proc/18253/task/*/stack
>>>>> [<0>] p9_client_rpc+0x3a2/0x1400
>>>>> [<0>] p9_client_flush+0x134/0x2a0
>>>>> [<0>] p9_client_rpc+0x122c/0x1400
>>>>> [<0>] p9_client_create+0xc56/0x16af
>>>>> [<0>] v9fs_session_init+0x21a/0x1a80
>>>>> [<0>] v9fs_mount+0x7c/0x900
>>>>> [<0>] mount_fs+0xae/0x328
>>>>> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
>>>>> [<0>] do_mount+0x581/0x30e0
>>>>> [<0>] ksys_mount+0x12d/0x140
>>>>> [<0>] __x64_sys_mount+0xbe/0x150
>>>>> [<0>] do_syscall_64+0x1b9/0x820
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>> [<0>] 0xffffffffffffffff
>>>>>
>>>>> There is a bunch of hangs in 9p, so let's do:
>>>>>
>>>>> #syz dup: INFO: task hung in flush_work
>>>>>
>>>> Then, is dumping all threads when khungtaskd fires a candidate
>>>> for CONFIG_DEBUG_AID_FOR_SYZBOT=y path?
>>>
>>> Perhaps would be useful. But maybe only tasks that are blocked for
>>> more than timeout/2? and/or unkillable tasks? killable tasks are not a
>>> problem.
>>
>> TASK_KILLABLE waiters are not reported by khungtaskd, are they?
>>
>>   /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
>>   if (t->state == TASK_UNINTERRUPTIBLE)
>>     check_hung_task(t, timeout);
>>
>> And TASK_KILLABLE waiters can become a problem because
>>
>>>
>>> Btw, I see that p9_client_rpc uses wait_event_killable, why wasn't it
>>> killed along with the whole process?
>>>
>>
>> wait_event_killable() would return -ERESTARTSYS if got SIGKILL.
>> But if (c->status == Connected) && (type == P9_TFLUSH) is also true,
>> it ignores SIGKILL by retrying the loop...
>>
>>   again:
>>     err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
>>     if ((err == -ERESTARTSYS) && (c->status == Connected) && (type == P9_TFLUSH)) {
>>       sigpending = 1;
>>       clear_thread_flag(TIF_SIGPENDING);
>>       goto again;
>>     }
>>
>> I wish they don't ignore SIGKILL (by e.g. offloading operations to a kernel thread).
> 
> 
> I guess that's the problem, right? SIGKILL-ed task must not ignore
> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
> in 9p.

Did you check /proc/18253/task/*/stack after manually sending SIGKILL?
I mean, who (i.e. you or syzkaller programs) is sending a signal (not limited
to SIGKILL but any signal) that makes TASK_KILLABLE waiters to wake up?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [V9fs-developer] INFO: task hung in grab_super
  2018-07-18 14:11             ` Dmitry Vyukov
  2018-07-18 14:17               ` Tetsuo Handa
@ 2018-08-02 22:18               ` Dominique Martinet
  1 sibling, 0 replies; 12+ messages in thread
From: Dominique Martinet @ 2018-08-02 22:18 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Tetsuo Handa, Latchesar Ionkov, Eric Van Hensbergen,
	syzkaller-bugs, LKML, syzbot, Al Viro, v9fs-developer,
	linux-fsdevel, Ron Minnich

Dmitry Vyukov via V9fs-developer wrote on Wed, Jul 18, 2018:
> >> Btw, I see that p9_client_rpc uses wait_event_killable, why wasn't it
> >> killed along with the whole process?
> >>
> >
> > wait_event_killable() would return -ERESTARTSYS if got SIGKILL.
> > But if (c->status == Connected) && (type == P9_TFLUSH) is also true,
> > it ignores SIGKILL by retrying the loop...
> >
> >   again:
> >     err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
> >     if ((err == -ERESTARTSYS) && (c->status == Connected) && (type == P9_TFLUSH)) {
> >       sigpending = 1;
> >       clear_thread_flag(TIF_SIGPENDING);
> >       goto again;
> >     }
> >
> > I wish they don't ignore SIGKILL (by e.g. offloading operations to a kernel thread).
> 
> 
> I guess that's the problem, right? SIGKILL-ed task must not ignore
> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
> in 9p.

Tricky with the current way we handle this, as the normal action if
wait_event_killable is interrupted is to send a tflush message (which is
what you could also notice, if you just send one sigkill it'll just send
a flush message and wait for that instead)

There's work in progress to add refcounting to requests which would make
us one step closer to being able to not wait for the flush reply (or
rather, it'll give us the ability to wait for it asynchronously) ; we
should be able to get rid of that loop after that.

-- 
Dominique

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: INFO: task hung in grab_super
  2018-07-18 14:17               ` Tetsuo Handa
@ 2018-11-02 19:19                 ` Dmitry Vyukov
  2018-11-02 22:45                   ` Dominique Martinet
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2018-11-02 19:19 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov,
	v9fs-developer, syzbot, linux-fsdevel, LKML, syzkaller-bugs,
	Al Viro

On Wed, Jul 18, 2018 at 4:17 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> On 2018/07/18 23:11, Dmitry Vyukov wrote:
>> On Wed, Jul 18, 2018 at 3:35 PM, Tetsuo Handa
>> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>>>>> This seems to be related to 9p. After rerunning the log I got:
>>>>>>
>>>>>> root@syzkaller:~# ps afxu | grep syz
>>>>>> root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_
>>>>>> [syz-executor] <defunct>
>>>>>> root@syzkaller:~# cat /proc/18253/task/*/stack
>>>>>> [<0>] p9_client_rpc+0x3a2/0x1400
>>>>>> [<0>] p9_client_flush+0x134/0x2a0
>>>>>> [<0>] p9_client_rpc+0x122c/0x1400
>>>>>> [<0>] p9_client_create+0xc56/0x16af
>>>>>> [<0>] v9fs_session_init+0x21a/0x1a80
>>>>>> [<0>] v9fs_mount+0x7c/0x900
>>>>>> [<0>] mount_fs+0xae/0x328
>>>>>> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
>>>>>> [<0>] do_mount+0x581/0x30e0
>>>>>> [<0>] ksys_mount+0x12d/0x140
>>>>>> [<0>] __x64_sys_mount+0xbe/0x150
>>>>>> [<0>] do_syscall_64+0x1b9/0x820
>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>> [<0>] 0xffffffffffffffff
>>>>>>
>>>>>> There is a bunch of hangs in 9p, so let's do:
>>>>>>
>>>>>> #syz dup: INFO: task hung in flush_work
>>>>>>
>>>>> Then, is dumping all threads when khungtaskd fires a candidate
>>>>> for CONFIG_DEBUG_AID_FOR_SYZBOT=y path?
>>>>
>>>> Perhaps would be useful. But maybe only tasks that are blocked for
>>>> more than timeout/2? and/or unkillable tasks? killable tasks are not a
>>>> problem.
>>>
>>> TASK_KILLABLE waiters are not reported by khungtaskd, are they?
>>>
>>>   /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
>>>   if (t->state == TASK_UNINTERRUPTIBLE)
>>>     check_hung_task(t, timeout);
>>>
>>> And TASK_KILLABLE waiters can become a problem because
>>>
>>>>
>>>> Btw, I see that p9_client_rpc uses wait_event_killable, why wasn't it
>>>> killed along with the whole process?
>>>>
>>>
>>> wait_event_killable() would return -ERESTARTSYS if got SIGKILL.
>>> But if (c->status == Connected) && (type == P9_TFLUSH) is also true,
>>> it ignores SIGKILL by retrying the loop...
>>>
>>>   again:
>>>     err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
>>>     if ((err == -ERESTARTSYS) && (c->status == Connected) && (type == P9_TFLUSH)) {
>>>       sigpending = 1;
>>>       clear_thread_flag(TIF_SIGPENDING);
>>>       goto again;
>>>     }
>>>
>>> I wish they don't ignore SIGKILL (by e.g. offloading operations to a kernel thread).
>>
>>
>> I guess that's the problem, right? SIGKILL-ed task must not ignore
>> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
>> in 9p.
>
> Did you check /proc/18253/task/*/stack after manually sending SIGKILL?

Yes:

root@syzkaller:~# ps afxu | grep syz
root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_
[syz-executor] <defunct>
root@syzkaller:~# cat /proc/18253/task/*/stack
[<0>] p9_client_rpc+0x3a2/0x1400
[<0>] p9_client_flush+0x134/0x2a0
[<0>] p9_client_rpc+0x122c/0x1400
[<0>] p9_client_create+0xc56/0x16af
[<0>] v9fs_session_init+0x21a/0x1a80
[<0>] v9fs_mount+0x7c/0x900
[<0>] mount_fs+0xae/0x328
[<0>] vfs_kern_mount.part.34+0xdc/0x4e0
[<0>] do_mount+0x581/0x30e0
[<0>] ksys_mount+0x12d/0x140
[<0>] __x64_sys_mount+0xbe/0x150
[<0>] do_syscall_64+0x1b9/0x820
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff



> I mean, who (i.e. you or syzkaller programs) is sending a signal (not limited
> to SIGKILL but any signal) that makes TASK_KILLABLE waiters to wake up?

Both. syzkaller always SIGKILLs test process after some timeout and
expects it to go away. I also tried manually after that, but it does
not make any difference.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: INFO: task hung in grab_super
  2018-11-02 19:19                 ` Dmitry Vyukov
@ 2018-11-02 22:45                   ` Dominique Martinet
  0 siblings, 0 replies; 12+ messages in thread
From: Dominique Martinet @ 2018-11-02 22:45 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Tetsuo Handa, Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov,
	v9fs-developer, syzbot, linux-fsdevel, LKML, syzkaller-bugs,
	Al Viro

Dmitry Vyukov wrote on Fri, Nov 02, 2018:
> >> I guess that's the problem, right? SIGKILL-ed task must not ignore
> >> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
> >> in 9p.
> >
> > Did you check /proc/18253/task/*/stack after manually sending SIGKILL?
> 
> Yes:
> 
> root@syzkaller:~# ps afxu | grep syz
> root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_
> [syz-executor] <defunct>
> root@syzkaller:~# cat /proc/18253/task/*/stack
> [<0>] p9_client_rpc+0x3a2/0x1400
> [<0>] p9_client_flush+0x134/0x2a0
> [<0>] p9_client_rpc+0x122c/0x1400
> [<0>] p9_client_create+0xc56/0x16af
> [<0>] v9fs_session_init+0x21a/0x1a80
> [<0>] v9fs_mount+0x7c/0x900
> [<0>] mount_fs+0xae/0x328
> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
> [<0>] do_mount+0x581/0x30e0
> [<0>] ksys_mount+0x12d/0x140
> [<0>] __x64_sys_mount+0xbe/0x150
> [<0>] do_syscall_64+0x1b9/0x820
> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [<0>] 0xffffffffffffffff

Yes that's a known problem with the current code, since everything must
be cleaned up on the spot, the first kill sends a flush and waits again
for the flush reply to come; the second kill is completly ignored.

With the refcounting work we've done that went in this merge window
we're halfways there - memory can now have a lifetime independant of the
current request and won't be freed when the process exits p9_client_rpc,
so we can send the flush and return immediately; then have the rest of
the cleanup happen asynchronously when the flush reply comes or the
client is torn down, whichever happens first.

I've got this planned for 4.21 if I can find the time to do it early in
this cycle and I get it to work on first try, 4.22 if I run into
complications to make sure it's well tested in -next first.
My freetime is pretty limited this year so unless you want to help it'll
get done when it's ready :)

-- 
Dominique

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-11-02 22:45 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-18 10:38 INFO: task hung in grab_super syzbot
2018-07-18 10:49 ` Tetsuo Handa
2018-07-18 11:02   ` Dmitry Vyukov
2018-07-18 11:41     ` Dmitry Vyukov
2018-07-18 12:53       ` Tetsuo Handa
2018-07-18 13:04         ` Dmitry Vyukov
2018-07-18 13:35           ` Tetsuo Handa
2018-07-18 14:11             ` Dmitry Vyukov
2018-07-18 14:17               ` Tetsuo Handa
2018-11-02 19:19                 ` Dmitry Vyukov
2018-11-02 22:45                   ` Dominique Martinet
2018-08-02 22:18               ` [V9fs-developer] " Dominique Martinet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).