linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* INFO: task hung in io_wq_destroy
@ 2019-10-28 19:42 syzbot
  2019-10-28 21:00 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: syzbot @ 2019-10-28 19:42 UTC (permalink / raw)
  To: andriy.shevchenko, axboe, davem, f.fainelli, gregkh, idosch,
	kimbrownkd, linux-fsdevel, linux-kernel, netdev, petrm,
	syzkaller-bugs, tglx, viro, wanghai26, yuehaibing

Hello,

syzbot found the following crash on:

HEAD commit:    139c2d13 Add linux-next specific files for 20191025
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=137a3e97600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=28fd7a693df38d29
dashboard link: https://syzkaller.appspot.com/bug?extid=0f1cc17f85154f400465
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15415bdf600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=101fa6e4e00000

The bug was bisected to:

commit 3f982fff29b4ad339b36e9cf43422d1039f9917a
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Oct 24 17:35:03 2019 +0000

     Merge branch 'for-5.5/drivers' into for-next

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=17f7e44ce00000
final crash:    https://syzkaller.appspot.com/x/report.txt?x=140fe44ce00000
console output: https://syzkaller.appspot.com/x/log.txt?x=100fe44ce00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0f1cc17f85154f400465@syzkaller.appspotmail.com
Fixes: 3f982fff29b4 ("Merge branch 'for-5.5/drivers' into for-next")

INFO: task syz-executor696:18072 blocked for more than 143 seconds.
       Not tainted 5.4.0-rc4-next-20191025 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor696 D28160 18072   9609 0x00004004
Call Trace:
  context_switch kernel/sched/core.c:3385 [inline]
  __schedule+0x94a/0x1e70 kernel/sched/core.c:4070
  schedule+0xdc/0x2b0 kernel/sched/core.c:4144
  schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871
  do_wait_for_common kernel/sched/completion.c:83 [inline]
  __wait_for_common kernel/sched/completion.c:104 [inline]
  wait_for_common kernel/sched/completion.c:115 [inline]
  wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
  io_wq_destroy+0x247/0x470 fs/io-wq.c:784
  io_finish_async+0x102/0x180 fs/io_uring.c:2890
  io_ring_ctx_free fs/io_uring.c:3615 [inline]
  io_ring_ctx_wait_and_kill+0x249/0x710 fs/io_uring.c:3683
  io_uring_release+0x42/0x50 fs/io_uring.c:3691
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:188 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x65f/0x760 arch/x86/entry/common.c:300
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4019d0
Code: 01 f0 ff ff 0f 83 20 0c 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f  
44 00 00 83 3d fd 2c 2d 00 00 75 14 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 f4 0b 00 00 c3 48 83 ec 08 e8 5a 01 00 00
RSP: 002b:00007ffdcb9bf4b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00000000004019d0
RDX: 0000000000401970 RSI: 0000000020000080 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000670
R13: 0000000000402ae0 R14: 0000000000000000 R15: 0000000000000000

Showing all locks held in the system:
1 lock held by khungtaskd/1078:
  #0: ffffffff88faba80 (rcu_read_lock){....}, at:  
debug_show_all_locks+0x5f/0x279 kernel/locking/lockdep.c:5336
1 lock held by rsyslogd/9148:
  #0: ffff888099aa8660 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110  
fs/file.c:801
2 locks held by getty/9238:
  #0: ffff88809a77f090 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
  #1: ffffc90005f472e0 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9239:
  #0: ffff8880a7bc7090 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
  #1: ffffc90005f4f2e0 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9240:
  #0: ffff8880a83e7090 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
  #1: ffffc90005f3d2e0 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9241:
  #0: ffff88808e706090 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
  #1: ffffc90005f292e0 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9242:
  #0: ffff8880a7b75090 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
  #1: ffffc90005f312e0 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9243:
  #0: ffff8880a130f090 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
  #1: ffffc90005f532e0 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9244:
  #0: ffff88809b09e090 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
  #1: ffffc90005f212e0 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 1078 Comm: khungtaskd Not tainted 5.4.0-rc4-next-20191025 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
  nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
  check_hung_uninterruptible_tasks kernel/hung_task.c:269 [inline]
  watchdog+0xc8f/0x1350 kernel/hung_task.c:353
  kthread+0x361/0x430 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xe/0x10  
arch/x86/include/asm/irqflags.h:60


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: INFO: task hung in io_wq_destroy
  2019-10-28 19:42 INFO: task hung in io_wq_destroy syzbot
@ 2019-10-28 21:00 ` Jens Axboe
  2019-11-19  2:23   ` Eric Biggers
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2019-10-28 21:00 UTC (permalink / raw)
  To: syzbot, andriy.shevchenko, davem, f.fainelli, gregkh, idosch,
	kimbrownkd, linux-fsdevel, linux-kernel, netdev, petrm,
	syzkaller-bugs, tglx, viro, wanghai26, yuehaibing

On 10/28/19 1:42 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    139c2d13 Add linux-next specific files for 20191025
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=137a3e97600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=28fd7a693df38d29
> dashboard link: https://syzkaller.appspot.com/bug?extid=0f1cc17f85154f400465
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15415bdf600000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=101fa6e4e00000
> 
> The bug was bisected to:
> 
> commit 3f982fff29b4ad339b36e9cf43422d1039f9917a
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Thu Oct 24 17:35:03 2019 +0000
> 
>       Merge branch 'for-5.5/drivers' into for-next
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=17f7e44ce00000
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=140fe44ce00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=100fe44ce00000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0f1cc17f85154f400465@syzkaller.appspotmail.com
> Fixes: 3f982fff29b4 ("Merge branch 'for-5.5/drivers' into for-next")
> 
> INFO: task syz-executor696:18072 blocked for more than 143 seconds.
>         Not tainted 5.4.0-rc4-next-20191025 #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor696 D28160 18072   9609 0x00004004
> Call Trace:
>    context_switch kernel/sched/core.c:3385 [inline]
>    __schedule+0x94a/0x1e70 kernel/sched/core.c:4070
>    schedule+0xdc/0x2b0 kernel/sched/core.c:4144
>    schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871
>    do_wait_for_common kernel/sched/completion.c:83 [inline]
>    __wait_for_common kernel/sched/completion.c:104 [inline]
>    wait_for_common kernel/sched/completion.c:115 [inline]
>    wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
>    io_wq_destroy+0x247/0x470 fs/io-wq.c:784
>    io_finish_async+0x102/0x180 fs/io_uring.c:2890
>    io_ring_ctx_free fs/io_uring.c:3615 [inline]
>    io_ring_ctx_wait_and_kill+0x249/0x710 fs/io_uring.c:3683
>    io_uring_release+0x42/0x50 fs/io_uring.c:3691
>    __fput+0x2ff/0x890 fs/file_table.c:280
>    ____fput+0x16/0x20 fs/file_table.c:313
>    task_work_run+0x145/0x1c0 kernel/task_work.c:113
>    tracehook_notify_resume include/linux/tracehook.h:188 [inline]
>    exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>    prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>    syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>    do_syscall_64+0x65f/0x760 arch/x86/entry/common.c:300
>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4019d0
> Code: 01 f0 ff ff 0f 83 20 0c 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00 00 83 3d fd 2c 2d 00 00 75 14 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff
> ff 0f 83 f4 0b 00 00 c3 48 83 ec 08 e8 5a 01 00 00
> RSP: 002b:00007ffdcb9bf4b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00000000004019d0
> RDX: 0000000000401970 RSI: 0000000020000080 RDI: 0000000000000003
> RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000670
> R13: 0000000000402ae0 R14: 0000000000000000 R15: 0000000000000000
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/1078:
>    #0: ffffffff88faba80 (rcu_read_lock){....}, at:
> debug_show_all_locks+0x5f/0x279 kernel/locking/lockdep.c:5336
> 1 lock held by rsyslogd/9148:
>    #0: ffff888099aa8660 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110
> fs/file.c:801
> 2 locks held by getty/9238:
>    #0: ffff88809a77f090 (&tty->ldisc_sem){++++}, at:
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>    #1: ffffc90005f472e0 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9239:
>    #0: ffff8880a7bc7090 (&tty->ldisc_sem){++++}, at:
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>    #1: ffffc90005f4f2e0 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9240:
>    #0: ffff8880a83e7090 (&tty->ldisc_sem){++++}, at:
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>    #1: ffffc90005f3d2e0 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9241:
>    #0: ffff88808e706090 (&tty->ldisc_sem){++++}, at:
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>    #1: ffffc90005f292e0 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9242:
>    #0: ffff8880a7b75090 (&tty->ldisc_sem){++++}, at:
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>    #1: ffffc90005f312e0 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9243:
>    #0: ffff8880a130f090 (&tty->ldisc_sem){++++}, at:
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>    #1: ffffc90005f532e0 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9244:
>    #0: ffff88809b09e090 (&tty->ldisc_sem){++++}, at:
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>    #1: ffffc90005f212e0 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 
> =============================================
> 
> NMI backtrace for cpu 0
> CPU: 0 PID: 1078 Comm: khungtaskd Not tainted 5.4.0-rc4-next-20191025 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>    __dump_stack lib/dump_stack.c:77 [inline]
>    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
>    nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
>    nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
>    arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>    trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
>    check_hung_uninterruptible_tasks kernel/hung_task.c:269 [inline]
>    watchdog+0xc8f/0x1350 kernel/hung_task.c:353
>    kthread+0x361/0x430 kernel/kthread.c:255
>    ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xe/0x10
> arch/x86/include/asm/irqflags.h:60

This is fixed in my for-next branch for a few days at least, unfortunately
linux-next is still on the old one. Next version should be better.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: INFO: task hung in io_wq_destroy
  2019-10-28 21:00 ` Jens Axboe
@ 2019-11-19  2:23   ` Eric Biggers
  2019-11-19  3:15     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Biggers @ 2019-11-19  2:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: syzbot, andriy.shevchenko, davem, f.fainelli, gregkh, idosch,
	kimbrownkd, linux-fsdevel, linux-kernel, netdev, petrm,
	syzkaller-bugs, tglx, viro, wanghai26, yuehaibing

Hi Jens,

On Mon, Oct 28, 2019 at 03:00:08PM -0600, Jens Axboe wrote:
> This is fixed in my for-next branch for a few days at least, unfortunately
> linux-next is still on the old one. Next version should be better.

This is still occurring on linux-next.  Here's a report on next-20191115 from
https://syzkaller.appspot.com/text?tag=CrashReport&x=16fa3d1ce00000

INFO: task syz-executor.5:11684 can't die for more than 143 seconds.
syz-executor.5  D28160 11684   9114 0x00004004
Call Trace:
 context_switch kernel/sched/core.c:3385 [inline]
 __schedule+0x8e1/0x1f30 kernel/sched/core.c:4081
 schedule+0xdc/0x2b0 kernel/sched/core.c:4155
 schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871
 do_wait_for_common kernel/sched/completion.c:83 [inline]
 __wait_for_common kernel/sched/completion.c:104 [inline]
 wait_for_common kernel/sched/completion.c:115 [inline]
 wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
 io_wq_destroy+0x22a/0x450 fs/io-wq.c:1068
 io_finish_async+0x128/0x1b0 fs/io_uring.c:3423
 io_ring_ctx_free fs/io_uring.c:4220 [inline]
 io_ring_ctx_wait_and_kill+0x2b7/0x810 fs/io_uring.c:4293
 io_uring_release+0x42/0x50 fs/io_uring.c:4301
 __fput+0x2ff/0x890 fs/file_table.c:280
 ____fput+0x16/0x20 fs/file_table.c:313
 task_work_run+0x145/0x1c0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:164
 prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:278 [inline]
 do_syscall_64+0x676/0x790 arch/x86/entry/common.c:304
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4141d1
Code: 65 00 31 d2 be 53 25 44 00 bf 09 3d 44 00 ff 15 95 57 24 00 b9 c0 99 65 00 31 d2 be 44 25 44 00 bf 09 3d 44 00 ff 15 7e 57 24 <00> b9 20 9b 65 00 31 d2 be ac 94 44 00 bf 09 3d 44 00 ff 15 67 57
RSP: 002b:00007ffc4b532100 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00000000004141d1
RDX: 0000001b2d520000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000034194edd R09: 0000000034194ee1
R10: 00007ffc4b5321e0 R11: 0000000000000293 R12: 000000000075bf20
R13: 00000000000428a4 R14: 00000000007604b0 R15: 000000000075bf2c
INFO: task syz-executor.5:11684 blocked for more than 143 seconds.
      Not tainted 5.4.0-rc7-next-20191115 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.5  D28160 11684   9114 0x00004004
Call Trace:
 context_switch kernel/sched/core.c:3385 [inline]
 __schedule+0x8e1/0x1f30 kernel/sched/core.c:4081
 schedule+0xdc/0x2b0 kernel/sched/core.c:4155
 schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871
 do_wait_for_common kernel/sched/completion.c:83 [inline]
 __wait_for_common kernel/sched/completion.c:104 [inline]
 wait_for_common kernel/sched/completion.c:115 [inline]
 wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
 io_wq_destroy+0x22a/0x450 fs/io-wq.c:1068
 io_finish_async+0x128/0x1b0 fs/io_uring.c:3423
 io_ring_ctx_free fs/io_uring.c:4220 [inline]
 io_ring_ctx_wait_and_kill+0x2b7/0x810 fs/io_uring.c:4293
 io_uring_release+0x42/0x50 fs/io_uring.c:4301
 __fput+0x2ff/0x890 fs/file_table.c:280
 ____fput+0x16/0x20 fs/file_table.c:313
 task_work_run+0x145/0x1c0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:164
 prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:278 [inline]
 do_syscall_64+0x676/0x790 arch/x86/entry/common.c:304
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4141d1
Code: 65 00 31 d2 be 53 25 44 00 bf 09 3d 44 00 ff 15 95 57 24 00 b9 c0 99 65 00 31 d2 be 44 25 44 00 bf 09 3d 44 00 ff 15 7e 57 24 <00> b9 20 9b 65 00 31 d2 be ac 94 44 00 bf 09 3d 44 00 ff 15 67 57
RSP: 002b:00007ffc4b532100 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00000000004141d1
RDX: 0000001b2d520000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000034194edd R09: 0000000034194ee1
R10: 00007ffc4b5321e0 R11: 0000000000000293 R12: 000000000075bf20
R13: 00000000000428a4 R14: 00000000007604b0 R15: 000000000075bf2c

Showing all locks held in the system:
1 lock held by khungtaskd/834:
 #0: ffffffff88faccc0 (rcu_read_lock){....}, at: debug_show_all_locks+0x5f/0x279 kernel/locking/lockdep.c:5334
3 locks held by kworker/u4:4/835:
 #0: ffff8880ae837358 (&rq->lock){-.-.}, at: newidle_balance+0xa28/0xe80 kernel/sched/fair.c:10049
 #1: ffffffff88faccc0 (rcu_read_lock){....}, at: __update_idle_core+0x45/0x3f0 kernel/sched/fair.c:5697
 #2: ffffffff88faccc0 (rcu_read_lock){....}, at: batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:405 [inline]
 #2: ffffffff88faccc0 (rcu_read_lock){....}, at: batadv_nc_worker+0xe3/0x760 net/batman-adv/network-coding.c:718
1 lock held by rsyslogd/8970:
 #0: ffff8880a9919b60 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110 fs/file.c:801
2 locks held by getty/9060:
 #0: ffff8880917eb090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f392e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9061:
 #0: ffff8880a4ab7090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f352e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9062:
 #0: ffff888096614090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f2d2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9063:
 #0: ffff8880a776f090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f152e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9064:
 #0: ffff8880a5f2b090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f312e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9065:
 #0: ffff88809145c090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f192e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9066:
 #0: ffff8880a05c6090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f012e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 834 Comm: khungtaskd Not tainted 5.4.0-rc7-next-20191115 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x197/0x210 lib/dump_stack.c:118
 nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:269 [inline]
 watchdog+0xc8f/0x1350 kernel/hung_task.c:353
 kthread+0x361/0x430 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: INFO: task hung in io_wq_destroy
  2019-11-19  2:23   ` Eric Biggers
@ 2019-11-19  3:15     ` Jens Axboe
  2019-11-19  4:34       ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2019-11-19  3:15 UTC (permalink / raw)
  To: Eric Biggers
  Cc: syzbot, andriy.shevchenko, davem, f.fainelli, gregkh, idosch,
	kimbrownkd, linux-fsdevel, linux-kernel, netdev, petrm,
	syzkaller-bugs, tglx, viro, wanghai26, yuehaibing

On 11/18/19 7:23 PM, Eric Biggers wrote:
> Hi Jens,
> 
> On Mon, Oct 28, 2019 at 03:00:08PM -0600, Jens Axboe wrote:
>> This is fixed in my for-next branch for a few days at least, unfortunately
>> linux-next is still on the old one. Next version should be better.
> 
> This is still occurring on linux-next.  Here's a report on next-20191115 from
> https://syzkaller.appspot.com/text?tag=CrashReport&x=16fa3d1ce00000

Hmm, I'll take a look. Looking at the reproducer, it's got a massive
sleep at the end. I take it this triggers before that time actually
passes? Because that's around 11.5 days of sleep.

No luck reproducing this so far, I'll try on linux-next.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: INFO: task hung in io_wq_destroy
  2019-11-19  3:15     ` Jens Axboe
@ 2019-11-19  4:34       ` Jens Axboe
  2019-11-19 15:20         ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2019-11-19  4:34 UTC (permalink / raw)
  To: Eric Biggers
  Cc: syzbot, andriy.shevchenko, davem, f.fainelli, gregkh, idosch,
	kimbrownkd, linux-fsdevel, linux-kernel, netdev, petrm,
	syzkaller-bugs, tglx, viro, wanghai26, yuehaibing

On 11/18/19 8:15 PM, Jens Axboe wrote:
> On 11/18/19 7:23 PM, Eric Biggers wrote:
>> Hi Jens,
>>
>> On Mon, Oct 28, 2019 at 03:00:08PM -0600, Jens Axboe wrote:
>>> This is fixed in my for-next branch for a few days at least, unfortunately
>>> linux-next is still on the old one. Next version should be better.
>>
>> This is still occurring on linux-next.  Here's a report on next-20191115 from
>> https://syzkaller.appspot.com/text?tag=CrashReport&x=16fa3d1ce00000
> 
> Hmm, I'll take a look. Looking at the reproducer, it's got a massive
> sleep at the end. I take it this triggers before that time actually
> passes? Because that's around 11.5 days of sleep.
> 
> No luck reproducing this so far, I'll try on linux-next.

I see what it is - if the io-wq is setup and torn down before the
manager thread is started, then we won't create the workers we already
expected. The manager thread will exit without doing anything, but
teardown will wait for the expected workers to exit before being
allowed to proceed. That never happens.

I've got a patch for this, but I'll test it a bit and send it out
tomorrow.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: INFO: task hung in io_wq_destroy
  2019-11-19  4:34       ` Jens Axboe
@ 2019-11-19 15:20         ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2019-11-19 15:20 UTC (permalink / raw)
  To: Eric Biggers
  Cc: syzbot, andriy.shevchenko, davem, f.fainelli, gregkh, idosch,
	kimbrownkd, linux-fsdevel, linux-kernel, netdev, petrm,
	syzkaller-bugs, tglx, viro, wanghai26, yuehaibing

On 11/18/19 9:34 PM, Jens Axboe wrote:
> On 11/18/19 8:15 PM, Jens Axboe wrote:
>> On 11/18/19 7:23 PM, Eric Biggers wrote:
>>> Hi Jens,
>>>
>>> On Mon, Oct 28, 2019 at 03:00:08PM -0600, Jens Axboe wrote:
>>>> This is fixed in my for-next branch for a few days at least, unfortunately
>>>> linux-next is still on the old one. Next version should be better.
>>>
>>> This is still occurring on linux-next.  Here's a report on next-20191115 from
>>> https://syzkaller.appspot.com/text?tag=CrashReport&x=16fa3d1ce00000
>>
>> Hmm, I'll take a look. Looking at the reproducer, it's got a massive
>> sleep at the end. I take it this triggers before that time actually
>> passes? Because that's around 11.5 days of sleep.
>>
>> No luck reproducing this so far, I'll try on linux-next.
> 
> I see what it is - if the io-wq is setup and torn down before the
> manager thread is started, then we won't create the workers we already
> expected. The manager thread will exit without doing anything, but
> teardown will wait for the expected workers to exit before being
> allowed to proceed. That never happens.
> 
> I've got a patch for this, but I'll test it a bit and send it out
> tomorrow.

This should fix it - wait until the manager is started and has created
the required fixed workers, then check if it failed or not. That closes
the gap between startup and teardown, as we have settled things before
anyone is allowed to call io_wq_destroy().


diff --git a/fs/io-wq.c b/fs/io-wq.c
index 9174007ce107..1f640c489f7c 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -33,6 +33,7 @@ enum {
 enum {
 	IO_WQ_BIT_EXIT		= 0,	/* wq exiting */
 	IO_WQ_BIT_CANCEL	= 1,	/* cancel work on list */
+	IO_WQ_BIT_ERROR		= 2,	/* error on setup */
 };
 
 enum {
@@ -562,14 +563,14 @@ void io_wq_worker_sleeping(struct task_struct *tsk)
 	spin_unlock_irq(&wqe->lock);
 }
 
-static void create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
+static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
 {
 	struct io_wqe_acct *acct =&wqe->acct[index];
 	struct io_worker *worker;
 
 	worker = kcalloc_node(1, sizeof(*worker), GFP_KERNEL, wqe->node);
 	if (!worker)
-		return;
+		return false;
 
 	refcount_set(&worker->ref, 1);
 	worker->nulls_node.pprev = NULL;
@@ -581,7 +582,7 @@ static void create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
 				"io_wqe_worker-%d/%d", index, wqe->node);
 	if (IS_ERR(worker->task)) {
 		kfree(worker);
-		return;
+		return false;
 	}
 
 	spin_lock_irq(&wqe->lock);
@@ -599,6 +600,7 @@ static void create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
 		atomic_inc(&wq->user->processes);
 
 	wake_up_process(worker->task);
+	return true;
 }
 
 static inline bool io_wqe_need_worker(struct io_wqe *wqe, int index)
@@ -606,9 +608,6 @@ static inline bool io_wqe_need_worker(struct io_wqe *wqe, int index)
 {
 	struct io_wqe_acct *acct = &wqe->acct[index];
 
-	/* always ensure we have one bounded worker */
-	if (index == IO_WQ_ACCT_BOUND && !acct->nr_workers)
-		return true;
 	/* if we have available workers or no work, no need */
 	if (!hlist_nulls_empty(&wqe->free_list) || !io_wqe_run_queue(wqe))
 		return false;
@@ -621,10 +620,19 @@ static inline bool io_wqe_need_worker(struct io_wqe *wqe, int index)
 static int io_wq_manager(void *data)
 {
 	struct io_wq *wq = data;
+	int i;
 
-	while (!kthread_should_stop()) {
-		int i;
+	/* create fixed workers */
+	for (i = 0; i < wq->nr_wqes; i++) {
+		if (create_io_worker(wq, wq->wqes[i], IO_WQ_ACCT_BOUND))
+			continue;
+		goto err;
+	}
 
+	refcount_set(&wq->refs, wq->nr_wqes);
+	complete(&wq->done);
+
+	while (!kthread_should_stop()) {
 		for (i = 0; i < wq->nr_wqes; i++) {
 			struct io_wqe *wqe = wq->wqes[i];
 			bool fork_worker[2] = { false, false };
@@ -644,6 +652,10 @@ static int io_wq_manager(void *data)
 		schedule_timeout(HZ);
 	}
 
+	return 0;
+err:
+	set_bit(IO_WQ_BIT_ERROR, &wq->state);
+	complete(&wq->done);
 	return 0;
 }
 
@@ -982,7 +994,6 @@ struct io_wq *io_wq_create(unsigned bounded, struct mm_struct *mm,
 	wq->user = user;
 
 	i = 0;
-	refcount_set(&wq->refs, wq->nr_wqes);
 	for_each_online_node(node) {
 		struct io_wqe *wqe;
 
@@ -1020,6 +1031,10 @@ struct io_wq *io_wq_create(unsigned bounded, struct mm_struct *mm,
 	wq->manager = kthread_create(io_wq_manager, wq, "io_wq_manager");
 	if (!IS_ERR(wq->manager)) {
 		wake_up_process(wq->manager);
+		wait_for_completion(&wq->done);
+		if (test_bit(IO_WQ_BIT_ERROR, &wq->state))
+			goto err;
+		reinit_completion(&wq->done);
 		return wq;
 	}
 
@@ -1041,10 +1056,9 @@ void io_wq_destroy(struct io_wq *wq)
 {
 	int i;
 
-	if (wq->manager) {
-		set_bit(IO_WQ_BIT_EXIT, &wq->state);
+	set_bit(IO_WQ_BIT_EXIT, &wq->state);
+	if (wq->manager)
 		kthread_stop(wq->manager);
-	}
 
 	rcu_read_lock();
 	for (i = 0; i < wq->nr_wqes; i++) {

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-11-19 15:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-28 19:42 INFO: task hung in io_wq_destroy syzbot
2019-10-28 21:00 ` Jens Axboe
2019-11-19  2:23   ` Eric Biggers
2019-11-19  3:15     ` Jens Axboe
2019-11-19  4:34       ` Jens Axboe
2019-11-19 15:20         ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).