linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [syzbot] INFO: task hung in io_sq_thread_park (2)
@ 2021-07-15  3:19 syzbot
  2021-07-15  8:44 ` Pavel Begunkov
  0 siblings, 1 reply; 12+ messages in thread
From: syzbot @ 2021-07-15  3:19 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

Hello,

syzbot found the following issue on:

HEAD commit:    3dbdb38e2869 Merge branch 'for-5.14' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12ed1402300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a1fcf15a09815757
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1088c06c300000

The issue was bisected to:

commit 4d004099a668c41522242aa146a38cc4eb59cb1e
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Oct 2 09:04:21 2020 +0000

    lockdep: Fix lockdep recursion

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13e4f19c300000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=1014f19c300000
console output: https://syzkaller.appspot.com/x/log.txt?x=17e4f19c300000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com
Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")

INFO: task kworker/u4:4:9930 blocked for more than 143 seconds.
      Not tainted 5.13.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u4:4    state:D stack:25664 pid: 9930 ppid:     2 flags:0x00004000
Workqueue: events_unbound io_ring_exit_work
Call Trace:
 context_switch kernel/sched/core.c:4683 [inline]
 __schedule+0x934/0x2710 kernel/sched/core.c:5940
 schedule+0xd3/0x270 kernel/sched/core.c:6019
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6078
 __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
 __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
 io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7361
 io_ring_exit_work+0x15a/0x15d0 fs/io_uring.c:8823
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task iou-sqp-2939:2946 blocked for more than 143 seconds.
      Not tainted 5.13.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-sqp-2939    state:D stack:28696 pid: 2946 ppid:  8489 flags:0x00004004
Call Trace:
 context_switch kernel/sched/core.c:4683 [inline]
 __schedule+0x934/0x2710 kernel/sched/core.c:5940
 schedule+0xd3/0x270 kernel/sched/core.c:6019
 io_uring_cancel_generic+0x54d/0x890 fs/io_uring.c:9203
 io_sq_thread+0xa99/0x1250 fs/io_uring.c:6963
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:
1 lock held by khungtaskd/1643:
 #0: ffffffff8c17bb80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by in:imklog/8150:
 #0: ffff888034297270 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
3 locks held by kworker/u4:4/9930:
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
 #1: ffffc9000b31fdb0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
 #2: ffff888045068870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7361
1 lock held by iou-sqp-2939/2946:
 #0: ffff888045068870 (&sqd->lock){+.+.}-{3:3}, at: io_sqd_handle_event+0x2d6/0x350 fs/io_uring.c:6883

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1643 Comm: khungtaskd Not tainted 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
 nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:209 [inline]
 watchdog+0xd4b/0xfb0 kernel/hung_task.c:294
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 9784 Comm: kworker/0:7 Not tainted 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_power_efficient gc_worker
RIP: 0010:check_region_inline mm/kasan/generic.c:175 [inline]
RIP: 0010:kasan_check_range+0x2a/0x180 mm/kasan/generic.c:189
Code: 48 85 f6 0f 84 70 01 00 00 49 89 f9 41 54 44 0f b6 c2 49 01 f1 55 53 0f 82 18 01 00 00 48 b8 ff ff ff ff ff 7f ff ff 48 39 c7 <0f> 86 05 01 00 00 49 83 e9 01 48 89 fd 48 b8 00 00 00 00 00 fc ff
RSP: 0018:ffffc9000aedfad8 EFLAGS: 00000016
RAX: ffff7fffffffffff RBX: 00000000000004c2 RCX: ffffffff815ac59f
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff9049d938
RBP: ffff88802d62267a R08: 0000000000000000 R09: ffffffff9049d940
R10: fffffbfff2093b27 R11: 0000000000000000 R12: ffff88802d622658
R13: ffff88802d621c40 R14: 0000000000000000 R15: 23147e44cee2677e
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb48abd5000 CR3: 000000000be8e000 CR4: 0000000000350ef0
Call Trace:
 instrument_atomic_read include/linux/instrumented.h:71 [inline]
 test_bit include/asm-generic/bitops/instrumented-non-atomic.h:134 [inline]
 hlock_class kernel/locking/lockdep.c:199 [inline]
 lookup_chain_cache_add kernel/locking/lockdep.c:3701 [inline]
 validate_chain kernel/locking/lockdep.c:3757 [inline]
 __lock_acquire+0x162f/0x54a0 kernel/locking/lockdep.c:5015
 lock_acquire kernel/locking/lockdep.c:5625 [inline]
 lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
 process_one_work+0x8fc/0x1630 kernel/workqueue.c:2252
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-15  3:19 [syzbot] INFO: task hung in io_sq_thread_park (2) syzbot
@ 2021-07-15  8:44 ` Pavel Begunkov
  2021-07-15  9:15   ` syzbot
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Begunkov @ 2021-07-15  8:44 UTC (permalink / raw)
  To: syzbot, axboe, io-uring, linux-kernel, mingo, mingo, peterz,
	rostedt, syzkaller-bugs, will

On 7/15/21 4:19 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:

#syz test: git://git.kernel.dk/linux-block io_uring-5.14

> 
> HEAD commit:    3dbdb38e2869 Merge branch 'for-5.14' of git://git.kernel.o..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12ed1402300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=a1fcf15a09815757
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1088c06c300000
> 
> The issue was bisected to:
> 
> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Fri Oct 2 09:04:21 2020 +0000
> 
>     lockdep: Fix lockdep recursion
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13e4f19c300000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=1014f19c300000
> console output: https://syzkaller.appspot.com/x/log.txt?x=17e4f19c300000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com
> Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
> 
> INFO: task kworker/u4:4:9930 blocked for more than 143 seconds.
>       Not tainted 5.13.0-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u4:4    state:D stack:25664 pid: 9930 ppid:     2 flags:0x00004000
> Workqueue: events_unbound io_ring_exit_work
> Call Trace:
>  context_switch kernel/sched/core.c:4683 [inline]
>  __schedule+0x934/0x2710 kernel/sched/core.c:5940
>  schedule+0xd3/0x270 kernel/sched/core.c:6019
>  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6078
>  __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
>  __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
>  io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7361
>  io_ring_exit_work+0x15a/0x15d0 fs/io_uring.c:8823
>  process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
>  worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> INFO: task iou-sqp-2939:2946 blocked for more than 143 seconds.
>       Not tainted 5.13.0-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:iou-sqp-2939    state:D stack:28696 pid: 2946 ppid:  8489 flags:0x00004004
> Call Trace:
>  context_switch kernel/sched/core.c:4683 [inline]
>  __schedule+0x934/0x2710 kernel/sched/core.c:5940
>  schedule+0xd3/0x270 kernel/sched/core.c:6019
>  io_uring_cancel_generic+0x54d/0x890 fs/io_uring.c:9203
>  io_sq_thread+0xa99/0x1250 fs/io_uring.c:6963
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/1643:
>  #0: ffffffff8c17bb80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
> 1 lock held by in:imklog/8150:
>  #0: ffff888034297270 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
> 3 locks held by kworker/u4:4/9930:
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
>  #1: ffffc9000b31fdb0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
>  #2: ffff888045068870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7361
> 1 lock held by iou-sqp-2939/2946:
>  #0: ffff888045068870 (&sqd->lock){+.+.}-{3:3}, at: io_sqd_handle_event+0x2d6/0x350 fs/io_uring.c:6883
> 
> =============================================
> 
> NMI backtrace for cpu 1
> CPU: 1 PID: 1643 Comm: khungtaskd Not tainted 5.13.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:79 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
>  nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
>  nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
>  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
>  check_hung_uninterruptible_tasks kernel/hung_task.c:209 [inline]
>  watchdog+0xd4b/0xfb0 kernel/hung_task.c:294
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 9784 Comm: kworker/0:7 Not tainted 5.13.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: events_power_efficient gc_worker
> RIP: 0010:check_region_inline mm/kasan/generic.c:175 [inline]
> RIP: 0010:kasan_check_range+0x2a/0x180 mm/kasan/generic.c:189
> Code: 48 85 f6 0f 84 70 01 00 00 49 89 f9 41 54 44 0f b6 c2 49 01 f1 55 53 0f 82 18 01 00 00 48 b8 ff ff ff ff ff 7f ff ff 48 39 c7 <0f> 86 05 01 00 00 49 83 e9 01 48 89 fd 48 b8 00 00 00 00 00 fc ff
> RSP: 0018:ffffc9000aedfad8 EFLAGS: 00000016
> RAX: ffff7fffffffffff RBX: 00000000000004c2 RCX: ffffffff815ac59f
> RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff9049d938
> RBP: ffff88802d62267a R08: 0000000000000000 R09: ffffffff9049d940
> R10: fffffbfff2093b27 R11: 0000000000000000 R12: ffff88802d622658
> R13: ffff88802d621c40 R14: 0000000000000000 R15: 23147e44cee2677e
> FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fb48abd5000 CR3: 000000000be8e000 CR4: 0000000000350ef0
> Call Trace:
>  instrument_atomic_read include/linux/instrumented.h:71 [inline]
>  test_bit include/asm-generic/bitops/instrumented-non-atomic.h:134 [inline]
>  hlock_class kernel/locking/lockdep.c:199 [inline]
>  lookup_chain_cache_add kernel/locking/lockdep.c:3701 [inline]
>  validate_chain kernel/locking/lockdep.c:3757 [inline]
>  __lock_acquire+0x162f/0x54a0 kernel/locking/lockdep.c:5015
>  lock_acquire kernel/locking/lockdep.c:5625 [inline]
>  lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
>  process_one_work+0x8fc/0x1630 kernel/workqueue.c:2252
>  worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-15  8:44 ` Pavel Begunkov
@ 2021-07-15  9:15   ` syzbot
  2021-07-15 18:16     ` Pavel Begunkov
  0 siblings, 1 reply; 12+ messages in thread
From: syzbot @ 2021-07-15  9:15 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in io_sq_thread_park

INFO: task kworker/u4:1:10 blocked for more than 143 seconds.
      Not tainted 5.14.0-rc1-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u4:1    state:D stack:26320 pid:   10 ppid:     2 flags:0x00004000
Workqueue: events_unbound io_ring_exit_work
Call Trace:
 context_switch kernel/sched/core.c:4683 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:5940
 schedule+0xd3/0x270 kernel/sched/core.c:6019
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6078
 __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
 __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
 io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7314
 io_ring_exit_work+0x15a/0x1600 fs/io_uring.c:8771
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task iou-sqp-23145:23155 blocked for more than 143 seconds.
      Not tainted 5.14.0-rc1-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-sqp-23145   state:D stack:28720 pid:23155 ppid:  8844 flags:0x00004004
Call Trace:
 context_switch kernel/sched/core.c:4683 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:5940
 schedule+0xd3/0x270 kernel/sched/core.c:6019
 io_uring_cancel_generic+0x54d/0x890 fs/io_uring.c:9150
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6916
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:
3 locks held by kworker/u4:1/10:
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
 #1: ffffc90000cf7db0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
 #2: ffff88803d708470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7314
1 lock held by khungtaskd/1620:
 #0: ffffffff8b97b900 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by in:imklog/8316:
 #0: ffff888022f82ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
1 lock held by iou-sqp-23145/23155:
 #0: ffff88803d708470 (&sqd->lock){+.+.}-{3:3}, at: io_sqd_handle_event+0x2d6/0x350 fs/io_uring.c:6836

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1620 Comm: khungtaskd Not tainted 5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xd0a/0xfc0 kernel/hung_task.c:295
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 10540 Comm: kworker/u4:10 Not tainted 5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: phy8 ieee80211_iface_work
RIP: 0010:write_comp_data kernel/kcov.c:218 [inline]
RIP: 0010:__sanitizer_cov_trace_switch+0x65/0xf0 kernel/kcov.c:320
Code: 10 31 c9 65 4c 8b 24 25 00 f0 01 00 4d 85 d2 74 6b 4c 89 e6 bf 03 00 00 00 4c 8b 4c 24 20 49 8b 6c c8 10 e8 2d ff ff ff 84 c0 <74> 47 49 8b 84 24 18 15 00 00 41 8b bc 24 14 15 00 00 48 8b 10 48
RSP: 0018:ffffc9000bfa7030 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000016
RDX: 0000000000000000 RSI: ffff88801c5d8000 RDI: 0000000000000003
RBP: 0000000000000084 R08: ffffffff8aaa7a20 R09: ffffffff88878073
R10: 0000000000000020 R11: 0000000000000003 R12: ffff88801c5d8000
R13: dffffc0000000000 R14: ffff888053061087 R15: ffff888053061086
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f87cdfe4090 CR3: 000000002a61f000 CR4: 0000000000350ef0
Call Trace:
 _ieee802_11_parse_elems_crc+0x1e3/0x1f90 net/mac80211/util.c:1018
 ieee802_11_parse_elems_crc+0x89e/0xfe0 net/mac80211/util.c:1478
 ieee802_11_parse_elems net/mac80211/ieee80211_i.h:2186 [inline]
 ieee80211_bss_info_update+0x463/0xb70 net/mac80211/scan.c:212
 ieee80211_rx_bss_info net/mac80211/ibss.c:1126 [inline]
 ieee80211_rx_mgmt_probe_beacon+0xcc6/0x17c0 net/mac80211/ibss.c:1615
 ieee80211_ibss_rx_queued_mgmt+0xd30/0x1610 net/mac80211/ibss.c:1642
 ieee80211_iface_process_skb net/mac80211/iface.c:1426 [inline]
 ieee80211_iface_work+0x7f7/0xa40 net/mac80211/iface.c:1462
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295


Tested on:

commit:         1b48773f io_uring: fix io_drain_req()
git tree:       git://git.kernel.dk/linux-block io_uring-5.14
console output: https://syzkaller.appspot.com/x/log.txt?x=12040824300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
compiler:       


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-15  9:15   ` syzbot
@ 2021-07-15 18:16     ` Pavel Begunkov
  2021-07-16  0:14       ` syzbot
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Begunkov @ 2021-07-15 18:16 UTC (permalink / raw)
  To: syzbot, axboe, io-uring, linux-kernel, mingo, mingo, peterz,
	rostedt, syzkaller-bugs, will

On 7/15/21 10:15 AM, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> INFO: task hung in io_sq_thread_park

The interesting part here is this:

[  282.599913][ T1620] task:iou-sqp-23145   state:D stack:28720 pid:23155 ppid:  8844 flags:0x00004004
[  282.609927][ T1620] Call Trace:
[  282.613711][ T1620]  __schedule+0x93a/0x26f0
[  282.618131][ T1620]  ? io_schedule_timeout+0x140/0x140
[  282.623446][ T1620]  ? lockdep_hardirqs_on+0x79/0x100
[  282.628789][ T1620]  ? _raw_spin_unlock_irqrestore+0x3d/0x70
[  282.634647][ T1620]  schedule+0xd3/0x270
[  282.638874][ T1620]  io_uring_cancel_generic+0x54d/0x890
[  282.644428][ T1620]  ? __io_uring_free+0x170/0x170
[  282.649361][ T1620]  ? lockdep_hardirqs_on+0x79/0x100
[  282.655639][ T1620]  ? finish_wait+0x270/0x270
[  282.660346][ T1620]  io_sq_thread+0xaac/0x1250
[  282.665096][ T1620]  ? io_uring_cancel_generic+0x890/0x890
[  282.670788][ T1620]  ? ret_from_fork+0x8/0x30
[  282.675295][ T1620]  ? finish_wait+0x270/0x270
[  282.679983][ T1620]  ? rwlock_bug.part.0+0x90/0x90
[  282.684958][ T1620]  ? _raw_spin_unlock_irq+0x1f/0x40
[  282.690155][ T1620]  ? io_uring_cancel_generic+0x890/0x890
[  282.696394][ T1620]  ret_from_fork+0x1f/0x30

So we have a dying SQPOLL task, which is stuck in cancellation.
Let's add some debug output and see if we have a request lost.

#syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang 


> INFO: task kworker/u4:1:10 blocked for more than 143 seconds.
>       Not tainted 5.14.0-rc1-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u4:1    state:D stack:26320 pid:   10 ppid:     2 flags:0x00004000
> Workqueue: events_unbound io_ring_exit_work
> Call Trace:
>  context_switch kernel/sched/core.c:4683 [inline]
>  __schedule+0x93a/0x26f0 kernel/sched/core.c:5940
>  schedule+0xd3/0x270 kernel/sched/core.c:6019
>  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6078
>  __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
>  __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
>  io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7314
>  io_ring_exit_work+0x15a/0x1600 fs/io_uring.c:8771
>  process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
>  worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> INFO: task iou-sqp-23145:23155 blocked for more than 143 seconds.
>       Not tainted 5.14.0-rc1-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:iou-sqp-23145   state:D stack:28720 pid:23155 ppid:  8844 flags:0x00004004
> Call Trace:
>  context_switch kernel/sched/core.c:4683 [inline]
>  __schedule+0x93a/0x26f0 kernel/sched/core.c:5940
>  schedule+0xd3/0x270 kernel/sched/core.c:6019
>  io_uring_cancel_generic+0x54d/0x890 fs/io_uring.c:9150
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6916
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> 
> Showing all locks held in the system:
> 3 locks held by kworker/u4:1/10:
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
>  #1: ffffc90000cf7db0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
>  #2: ffff88803d708470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7314
> 1 lock held by khungtaskd/1620:
>  #0: ffffffff8b97b900 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
> 1 lock held by in:imklog/8316:
>  #0: ffff888022f82ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
> 1 lock held by iou-sqp-23145/23155:
>  #0: ffff88803d708470 (&sqd->lock){+.+.}-{3:3}, at: io_sqd_handle_event+0x2d6/0x350 fs/io_uring.c:6836
> 
> =============================================
> 
> NMI backtrace for cpu 1
> CPU: 1 PID: 1620 Comm: khungtaskd Not tainted 5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
>  nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
>  nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
>  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
>  check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
>  watchdog+0xd0a/0xfc0 kernel/hung_task.c:295
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 10540 Comm: kworker/u4:10 Not tainted 5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: phy8 ieee80211_iface_work
> RIP: 0010:write_comp_data kernel/kcov.c:218 [inline]
> RIP: 0010:__sanitizer_cov_trace_switch+0x65/0xf0 kernel/kcov.c:320
> Code: 10 31 c9 65 4c 8b 24 25 00 f0 01 00 4d 85 d2 74 6b 4c 89 e6 bf 03 00 00 00 4c 8b 4c 24 20 49 8b 6c c8 10 e8 2d ff ff ff 84 c0 <74> 47 49 8b 84 24 18 15 00 00 41 8b bc 24 14 15 00 00 48 8b 10 48
> RSP: 0018:ffffc9000bfa7030 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000016
> RDX: 0000000000000000 RSI: ffff88801c5d8000 RDI: 0000000000000003
> RBP: 0000000000000084 R08: ffffffff8aaa7a20 R09: ffffffff88878073
> R10: 0000000000000020 R11: 0000000000000003 R12: ffff88801c5d8000
> R13: dffffc0000000000 R14: ffff888053061087 R15: ffff888053061086
> FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f87cdfe4090 CR3: 000000002a61f000 CR4: 0000000000350ef0
> Call Trace:
>  _ieee802_11_parse_elems_crc+0x1e3/0x1f90 net/mac80211/util.c:1018
>  ieee802_11_parse_elems_crc+0x89e/0xfe0 net/mac80211/util.c:1478
>  ieee802_11_parse_elems net/mac80211/ieee80211_i.h:2186 [inline]
>  ieee80211_bss_info_update+0x463/0xb70 net/mac80211/scan.c:212
>  ieee80211_rx_bss_info net/mac80211/ibss.c:1126 [inline]
>  ieee80211_rx_mgmt_probe_beacon+0xcc6/0x17c0 net/mac80211/ibss.c:1615
>  ieee80211_ibss_rx_queued_mgmt+0xd30/0x1610 net/mac80211/ibss.c:1642
>  ieee80211_iface_process_skb net/mac80211/iface.c:1426 [inline]
>  ieee80211_iface_work+0x7f7/0xa40 net/mac80211/iface.c:1462
>  process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
>  worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> 
> 
> Tested on:
> 
> commit:         1b48773f io_uring: fix io_drain_req()
> git tree:       git://git.kernel.dk/linux-block io_uring-5.14
> console output: https://syzkaller.appspot.com/x/log.txt?x=12040824300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
> compiler:       
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-15 18:16     ` Pavel Begunkov
@ 2021-07-16  0:14       ` syzbot
  2021-07-16 10:38         ` Pavel Begunkov
  0 siblings, 1 reply; 12+ messages in thread
From: syzbot @ 2021-07-16  0:14 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in io_uring_cancel_generic

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25279 at fs/io_uring.c:9122 io_dump fs/io_uring.c:9117 [inline]
WARNING: CPU: 1 PID: 25279 at fs/io_uring.c:9122 io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
Modules linked in:
CPU: 1 PID: 25279 Comm: iou-sqp-25271 Tainted: G        W         5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_dump fs/io_uring.c:9122 [inline]
RIP: 0010:io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
Code: 08 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 9e 06 00 00 48 8b 85 10 08 00 00 48 89 44 24 40 <0f> 0b 48 8b 44 24 10 48 05 98 00 00 00 48 89 c2 48 89 44 24 08 48
RSP: 0018:ffffc9000ab87c50 EFLAGS: 00010246
RAX: ffff88803ddb7800 RBX: ffff888049f73880 RCX: 0000000000000000
RDX: 1ffff110093ee812 RSI: ffffffff81e2ebed RDI: ffff888049f74090
RBP: ffff888049f73880 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81e2ebc5 R11: 0000000000000001 R12: ffffc9000ab87d40
R13: ffff88803ddb7858 R14: 0000000000000001 R15: 0000000000000000
FS:  00007fb893f40700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f68dd0c6080 CR3: 000000002b457000 CR4: 0000000000350ef0
Call Trace:
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6930
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 25279 Comm: iou-sqp-25271 Tainted: G        W         5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 panic+0x306/0x73d kernel/panic.c:232
 __warn.cold+0x35/0x44 kernel/panic.c:606
 report_bug+0x1bd/0x210 lib/bug.c:199
 handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
 exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566
RIP: 0010:io_dump fs/io_uring.c:9122 [inline]
RIP: 0010:io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
Code: 08 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 9e 06 00 00 48 8b 85 10 08 00 00 48 89 44 24 40 <0f> 0b 48 8b 44 24 10 48 05 98 00 00 00 48 89 c2 48 89 44 24 08 48
RSP: 0018:ffffc9000ab87c50 EFLAGS: 00010246
RAX: ffff88803ddb7800 RBX: ffff888049f73880 RCX: 0000000000000000
RDX: 1ffff110093ee812 RSI: ffffffff81e2ebed RDI: ffff888049f74090
RBP: ffff888049f73880 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81e2ebc5 R11: 0000000000000001 R12: ffffc9000ab87d40
R13: ffff88803ddb7858 R14: 0000000000000001 R15: 0000000000000000
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6930
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel Offset: disabled
Rebooting in 86400 seconds..


Tested on:

commit:         d55b8f3c io_uring: add syz debug info
git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
console output: https://syzkaller.appspot.com/x/log.txt?x=16f6cb80300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
compiler:       


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-16  0:14       ` syzbot
@ 2021-07-16 10:38         ` Pavel Begunkov
  2021-07-16 10:57           ` syzbot
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Begunkov @ 2021-07-16 10:38 UTC (permalink / raw)
  To: syzbot, axboe, io-uring, linux-kernel, mingo, mingo, peterz,
	rostedt, syzkaller-bugs, will

On 7/16/21 1:14 AM, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> WARNING in io_uring_cancel_generic

attempt to get stats #2

#syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang 

> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 25279 at fs/io_uring.c:9122 io_dump fs/io_uring.c:9117 [inline]
> WARNING: CPU: 1 PID: 25279 at fs/io_uring.c:9122 io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
> Modules linked in:
> CPU: 1 PID: 25279 Comm: iou-sqp-25271 Tainted: G        W         5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:io_dump fs/io_uring.c:9122 [inline]
> RIP: 0010:io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
> Code: 08 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 9e 06 00 00 48 8b 85 10 08 00 00 48 89 44 24 40 <0f> 0b 48 8b 44 24 10 48 05 98 00 00 00 48 89 c2 48 89 44 24 08 48
> RSP: 0018:ffffc9000ab87c50 EFLAGS: 00010246
> RAX: ffff88803ddb7800 RBX: ffff888049f73880 RCX: 0000000000000000
> RDX: 1ffff110093ee812 RSI: ffffffff81e2ebed RDI: ffff888049f74090
> RBP: ffff888049f73880 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffffff81e2ebc5 R11: 0000000000000001 R12: ffffc9000ab87d40
> R13: ffff88803ddb7858 R14: 0000000000000001 R15: 0000000000000000
> FS:  00007fb893f40700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f68dd0c6080 CR3: 000000002b457000 CR4: 0000000000350ef0
> Call Trace:
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6930
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 1 PID: 25279 Comm: iou-sqp-25271 Tainted: G        W         5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
>  panic+0x306/0x73d kernel/panic.c:232
>  __warn.cold+0x35/0x44 kernel/panic.c:606
>  report_bug+0x1bd/0x210 lib/bug.c:199
>  handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
>  exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
>  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566
> RIP: 0010:io_dump fs/io_uring.c:9122 [inline]
> RIP: 0010:io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
> Code: 08 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 9e 06 00 00 48 8b 85 10 08 00 00 48 89 44 24 40 <0f> 0b 48 8b 44 24 10 48 05 98 00 00 00 48 89 c2 48 89 44 24 08 48
> RSP: 0018:ffffc9000ab87c50 EFLAGS: 00010246
> RAX: ffff88803ddb7800 RBX: ffff888049f73880 RCX: 0000000000000000
> RDX: 1ffff110093ee812 RSI: ffffffff81e2ebed RDI: ffff888049f74090
> RBP: ffff888049f73880 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffffff81e2ebc5 R11: 0000000000000001 R12: ffffc9000ab87d40
> R13: ffff88803ddb7858 R14: 0000000000000001 R15: 0000000000000000
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6930
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 
> 
> Tested on:
> 
> commit:         d55b8f3c io_uring: add syz debug info
> git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
> console output: https://syzkaller.appspot.com/x/log.txt?x=16f6cb80300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
> compiler:       
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-16 10:38         ` Pavel Begunkov
@ 2021-07-16 10:57           ` syzbot
  2021-07-19 16:57             ` Pavel Begunkov
  0 siblings, 1 reply; 12+ messages in thread
From: syzbot @ 2021-07-16 10:57 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in io_uring_cancel_generic

ctx ffff8880467ee000 submitted 1, dismantled 0, crefs 0, inflight 1, fallbacks 0, poll1 0, poll2 0, tw 0, hash 0, cs 0, issued 1 
------------[ cut here ]------------
WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_dump fs/io_uring.c:9142 [inline]
WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
Modules linked in:
CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
FS:  00007f9295c2d700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1c3f1a3000 CR3: 0000000018755000 CR4: 0000000000350ee0
Call Trace:
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 panic+0x306/0x73d kernel/panic.c:232
 __warn.cold+0x35/0x44 kernel/panic.c:606
 report_bug+0x1bd/0x210 lib/bug.c:199
 handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
 exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566
RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel Offset: disabled
Rebooting in 86400 seconds..


Tested on:

commit:         81fee56e io_uring: add syz debug info
git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
console output: https://syzkaller.appspot.com/x/log.txt?x=12bb485c300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
compiler:       


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-16 10:57           ` syzbot
@ 2021-07-19 16:57             ` Pavel Begunkov
  2021-07-19 17:13               ` Jens Axboe
  2021-07-19 17:15               ` syzbot
  0 siblings, 2 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-07-19 16:57 UTC (permalink / raw)
  To: syzbot, axboe, io-uring, linux-kernel, mingo, mingo, peterz,
	rostedt, syzkaller-bugs, will

On 7/16/21 11:57 AM, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> WARNING in io_uring_cancel_generic

__arm_poll doesn't remove a second poll entry in case of failed
__io_queue_proc(), it's most likely the cause here.

#syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang

> 
> ctx ffff8880467ee000 submitted 1, dismantled 0, crefs 0, inflight 1, fallbacks 0, poll1 0, poll2 0, tw 0, hash 0, cs 0, issued 1 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_dump fs/io_uring.c:9142 [inline]
> WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
> Modules linked in:
> CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
> RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
> Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
> RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
> RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
> RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
> R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
> R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
> FS:  00007f9295c2d700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1c3f1a3000 CR3: 0000000018755000 CR4: 0000000000350ee0
> Call Trace:
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
>  panic+0x306/0x73d kernel/panic.c:232
>  __warn.cold+0x35/0x44 kernel/panic.c:606
>  report_bug+0x1bd/0x210 lib/bug.c:199
>  handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
>  exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
>  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566
> RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
> RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
> Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
> RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
> RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
> RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
> R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
> R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 
> 
> Tested on:
> 
> commit:         81fee56e io_uring: add syz debug info
> git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
> console output: https://syzkaller.appspot.com/x/log.txt?x=12bb485c300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
> compiler:       
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-19 16:57             ` Pavel Begunkov
@ 2021-07-19 17:13               ` Jens Axboe
  2021-07-19 17:28                 ` Pavel Begunkov
  2021-07-19 17:15               ` syzbot
  1 sibling, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2021-07-19 17:13 UTC (permalink / raw)
  To: Pavel Begunkov, syzbot, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

On 7/19/21 10:57 AM, Pavel Begunkov wrote:
> On 7/16/21 11:57 AM, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>> WARNING in io_uring_cancel_generic
> 
> __arm_poll doesn't remove a second poll entry in case of failed
> __io_queue_proc(), it's most likely the cause here.
> 
> #syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang

Was my thought on seeing the last debug run too. Haven't written a test
case, but my initial thought was catching this at the time that double
poll is armed, in __io_queue_proc(). Totally untested, just tossing
it out there.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0cac361bf6b8..ed33de5fffd2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5002,6 +5002,9 @@ static void __io_queue_proc(struct io_poll_iocb *poll, struct io_poll_table *pt,
 	if (unlikely(poll->head)) {
 		struct io_poll_iocb *poll_one = poll;
 
+		/* first poll failed, don't arm double poll */
+		if (pt->error)
+			return;
 		/* already have a 2nd entry, fail a third attempt */
 		if (*poll_ptr) {
 			pt->error = -EINVAL;

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-19 16:57             ` Pavel Begunkov
  2021-07-19 17:13               ` Jens Axboe
@ 2021-07-19 17:15               ` syzbot
  1 sibling, 0 replies; 12+ messages in thread
From: syzbot @ 2021-07-19 17:15 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com

Tested on:

commit:         61d27d88 io_uring: double remove on poll arm failure
git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
compiler:       

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-19 17:13               ` Jens Axboe
@ 2021-07-19 17:28                 ` Pavel Begunkov
  2021-07-19 18:38                   ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Begunkov @ 2021-07-19 17:28 UTC (permalink / raw)
  To: Jens Axboe, syzbot, io-uring, linux-kernel, mingo, mingo, peterz,
	rostedt, syzkaller-bugs, will

On 7/19/21 6:13 PM, Jens Axboe wrote:
> On 7/19/21 10:57 AM, Pavel Begunkov wrote:
>> On 7/16/21 11:57 AM, syzbot wrote:
>>> Hello,
>>>
>>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>>> WARNING in io_uring_cancel_generic
>>
>> __arm_poll doesn't remove a second poll entry in case of failed
>> __io_queue_proc(), it's most likely the cause here.
>>
>> #syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang
> 
> Was my thought on seeing the last debug run too. Haven't written a test
> case, but my initial thought was catching this at the time that double
> poll is armed, in __io_queue_proc(). Totally untested, just tossing
> it out there.

Wouldn't help, unfortunately, the way syz triggers it is making a
request to go through __io_queue_proc() three times.

Either it's 3 waitqueues or we need to extend the check below to
the double poll entry.

if (poll_one->head == head)
	return;

> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 0cac361bf6b8..ed33de5fffd2 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5002,6 +5002,9 @@ static void __io_queue_proc(struct io_poll_iocb *poll, struct io_poll_table *pt,
>  	if (unlikely(poll->head)) {
>  		struct io_poll_iocb *poll_one = poll;
>  
> +		/* first poll failed, don't arm double poll */
> +		if (pt->error)
> +			return;
>  		/* already have a 2nd entry, fail a third attempt */
>  		if (*poll_ptr) {
>  			pt->error = -EINVAL;
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-19 17:28                 ` Pavel Begunkov
@ 2021-07-19 18:38                   ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2021-07-19 18:38 UTC (permalink / raw)
  To: Pavel Begunkov, syzbot, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

On 7/19/21 11:28 AM, Pavel Begunkov wrote:
> On 7/19/21 6:13 PM, Jens Axboe wrote:
>> On 7/19/21 10:57 AM, Pavel Begunkov wrote:
>>> On 7/16/21 11:57 AM, syzbot wrote:
>>>> Hello,
>>>>
>>>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>>>> WARNING in io_uring_cancel_generic
>>>
>>> __arm_poll doesn't remove a second poll entry in case of failed
>>> __io_queue_proc(), it's most likely the cause here.
>>>
>>> #syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang
>>
>> Was my thought on seeing the last debug run too. Haven't written a test
>> case, but my initial thought was catching this at the time that double
>> poll is armed, in __io_queue_proc(). Totally untested, just tossing
>> it out there.
> 
> Wouldn't help, unfortunately, the way syz triggers it is making a
> request to go through __io_queue_proc() three times.
> 
> Either it's 3 waitqueues or we need to extend the check below to
> the double poll entry.
> 
> if (poll_one->head == head)
> 	return;

Yes good point, that'd depend on single poll erroring first. Given
the variety of cases for it, catching it after the fact like in your
patch is likely the simplest/cleanest way.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-07-19 19:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-15  3:19 [syzbot] INFO: task hung in io_sq_thread_park (2) syzbot
2021-07-15  8:44 ` Pavel Begunkov
2021-07-15  9:15   ` syzbot
2021-07-15 18:16     ` Pavel Begunkov
2021-07-16  0:14       ` syzbot
2021-07-16 10:38         ` Pavel Begunkov
2021-07-16 10:57           ` syzbot
2021-07-19 16:57             ` Pavel Begunkov
2021-07-19 17:13               ` Jens Axboe
2021-07-19 17:28                 ` Pavel Begunkov
2021-07-19 18:38                   ` Jens Axboe
2021-07-19 17:15               ` syzbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).