All of lore.kernel.org
 help / color / mirror / Atom feed
* [syzbot] INFO: task hung in io_sq_thread_park (2)
@ 2021-07-15  3:19 syzbot
  2021-07-15  8:44 ` Pavel Begunkov
  0 siblings, 1 reply; 14+ messages in thread
From: syzbot @ 2021-07-15  3:19 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

Hello,

syzbot found the following issue on:

HEAD commit:    3dbdb38e2869 Merge branch 'for-5.14' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12ed1402300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a1fcf15a09815757
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1088c06c300000

The issue was bisected to:

commit 4d004099a668c41522242aa146a38cc4eb59cb1e
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Oct 2 09:04:21 2020 +0000

    lockdep: Fix lockdep recursion

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13e4f19c300000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=1014f19c300000
console output: https://syzkaller.appspot.com/x/log.txt?x=17e4f19c300000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com
Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")

INFO: task kworker/u4:4:9930 blocked for more than 143 seconds.
      Not tainted 5.13.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u4:4    state:D stack:25664 pid: 9930 ppid:     2 flags:0x00004000
Workqueue: events_unbound io_ring_exit_work
Call Trace:
 context_switch kernel/sched/core.c:4683 [inline]
 __schedule+0x934/0x2710 kernel/sched/core.c:5940
 schedule+0xd3/0x270 kernel/sched/core.c:6019
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6078
 __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
 __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
 io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7361
 io_ring_exit_work+0x15a/0x15d0 fs/io_uring.c:8823
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task iou-sqp-2939:2946 blocked for more than 143 seconds.
      Not tainted 5.13.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-sqp-2939    state:D stack:28696 pid: 2946 ppid:  8489 flags:0x00004004
Call Trace:
 context_switch kernel/sched/core.c:4683 [inline]
 __schedule+0x934/0x2710 kernel/sched/core.c:5940
 schedule+0xd3/0x270 kernel/sched/core.c:6019
 io_uring_cancel_generic+0x54d/0x890 fs/io_uring.c:9203
 io_sq_thread+0xa99/0x1250 fs/io_uring.c:6963
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:
1 lock held by khungtaskd/1643:
 #0: ffffffff8c17bb80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by in:imklog/8150:
 #0: ffff888034297270 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
3 locks held by kworker/u4:4/9930:
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
 #1: ffffc9000b31fdb0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
 #2: ffff888045068870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7361
1 lock held by iou-sqp-2939/2946:
 #0: ffff888045068870 (&sqd->lock){+.+.}-{3:3}, at: io_sqd_handle_event+0x2d6/0x350 fs/io_uring.c:6883

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1643 Comm: khungtaskd Not tainted 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
 nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:209 [inline]
 watchdog+0xd4b/0xfb0 kernel/hung_task.c:294
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 9784 Comm: kworker/0:7 Not tainted 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_power_efficient gc_worker
RIP: 0010:check_region_inline mm/kasan/generic.c:175 [inline]
RIP: 0010:kasan_check_range+0x2a/0x180 mm/kasan/generic.c:189
Code: 48 85 f6 0f 84 70 01 00 00 49 89 f9 41 54 44 0f b6 c2 49 01 f1 55 53 0f 82 18 01 00 00 48 b8 ff ff ff ff ff 7f ff ff 48 39 c7 <0f> 86 05 01 00 00 49 83 e9 01 48 89 fd 48 b8 00 00 00 00 00 fc ff
RSP: 0018:ffffc9000aedfad8 EFLAGS: 00000016
RAX: ffff7fffffffffff RBX: 00000000000004c2 RCX: ffffffff815ac59f
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff9049d938
RBP: ffff88802d62267a R08: 0000000000000000 R09: ffffffff9049d940
R10: fffffbfff2093b27 R11: 0000000000000000 R12: ffff88802d622658
R13: ffff88802d621c40 R14: 0000000000000000 R15: 23147e44cee2677e
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb48abd5000 CR3: 000000000be8e000 CR4: 0000000000350ef0
Call Trace:
 instrument_atomic_read include/linux/instrumented.h:71 [inline]
 test_bit include/asm-generic/bitops/instrumented-non-atomic.h:134 [inline]
 hlock_class kernel/locking/lockdep.c:199 [inline]
 lookup_chain_cache_add kernel/locking/lockdep.c:3701 [inline]
 validate_chain kernel/locking/lockdep.c:3757 [inline]
 __lock_acquire+0x162f/0x54a0 kernel/locking/lockdep.c:5015
 lock_acquire kernel/locking/lockdep.c:5625 [inline]
 lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
 process_one_work+0x8fc/0x1630 kernel/workqueue.c:2252
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-15  3:19 [syzbot] INFO: task hung in io_sq_thread_park (2) syzbot
@ 2021-07-15  8:44 ` Pavel Begunkov
  2021-07-15  9:15   ` syzbot
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2021-07-15  8:44 UTC (permalink / raw)
  To: syzbot, axboe, io-uring, linux-kernel, mingo, mingo, peterz,
	rostedt, syzkaller-bugs, will

On 7/15/21 4:19 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:

#syz test: git://git.kernel.dk/linux-block io_uring-5.14

> 
> HEAD commit:    3dbdb38e2869 Merge branch 'for-5.14' of git://git.kernel.o..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12ed1402300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=a1fcf15a09815757
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1088c06c300000
> 
> The issue was bisected to:
> 
> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Fri Oct 2 09:04:21 2020 +0000
> 
>     lockdep: Fix lockdep recursion
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13e4f19c300000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=1014f19c300000
> console output: https://syzkaller.appspot.com/x/log.txt?x=17e4f19c300000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com
> Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
> 
> INFO: task kworker/u4:4:9930 blocked for more than 143 seconds.
>       Not tainted 5.13.0-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u4:4    state:D stack:25664 pid: 9930 ppid:     2 flags:0x00004000
> Workqueue: events_unbound io_ring_exit_work
> Call Trace:
>  context_switch kernel/sched/core.c:4683 [inline]
>  __schedule+0x934/0x2710 kernel/sched/core.c:5940
>  schedule+0xd3/0x270 kernel/sched/core.c:6019
>  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6078
>  __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
>  __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
>  io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7361
>  io_ring_exit_work+0x15a/0x15d0 fs/io_uring.c:8823
>  process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
>  worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> INFO: task iou-sqp-2939:2946 blocked for more than 143 seconds.
>       Not tainted 5.13.0-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:iou-sqp-2939    state:D stack:28696 pid: 2946 ppid:  8489 flags:0x00004004
> Call Trace:
>  context_switch kernel/sched/core.c:4683 [inline]
>  __schedule+0x934/0x2710 kernel/sched/core.c:5940
>  schedule+0xd3/0x270 kernel/sched/core.c:6019
>  io_uring_cancel_generic+0x54d/0x890 fs/io_uring.c:9203
>  io_sq_thread+0xa99/0x1250 fs/io_uring.c:6963
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/1643:
>  #0: ffffffff8c17bb80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
> 1 lock held by in:imklog/8150:
>  #0: ffff888034297270 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
> 3 locks held by kworker/u4:4/9930:
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
>  #0: ffff888011069138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
>  #1: ffffc9000b31fdb0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
>  #2: ffff888045068870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7361
> 1 lock held by iou-sqp-2939/2946:
>  #0: ffff888045068870 (&sqd->lock){+.+.}-{3:3}, at: io_sqd_handle_event+0x2d6/0x350 fs/io_uring.c:6883
> 
> =============================================
> 
> NMI backtrace for cpu 1
> CPU: 1 PID: 1643 Comm: khungtaskd Not tainted 5.13.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:79 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
>  nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
>  nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
>  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
>  check_hung_uninterruptible_tasks kernel/hung_task.c:209 [inline]
>  watchdog+0xd4b/0xfb0 kernel/hung_task.c:294
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 9784 Comm: kworker/0:7 Not tainted 5.13.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: events_power_efficient gc_worker
> RIP: 0010:check_region_inline mm/kasan/generic.c:175 [inline]
> RIP: 0010:kasan_check_range+0x2a/0x180 mm/kasan/generic.c:189
> Code: 48 85 f6 0f 84 70 01 00 00 49 89 f9 41 54 44 0f b6 c2 49 01 f1 55 53 0f 82 18 01 00 00 48 b8 ff ff ff ff ff 7f ff ff 48 39 c7 <0f> 86 05 01 00 00 49 83 e9 01 48 89 fd 48 b8 00 00 00 00 00 fc ff
> RSP: 0018:ffffc9000aedfad8 EFLAGS: 00000016
> RAX: ffff7fffffffffff RBX: 00000000000004c2 RCX: ffffffff815ac59f
> RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff9049d938
> RBP: ffff88802d62267a R08: 0000000000000000 R09: ffffffff9049d940
> R10: fffffbfff2093b27 R11: 0000000000000000 R12: ffff88802d622658
> R13: ffff88802d621c40 R14: 0000000000000000 R15: 23147e44cee2677e
> FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fb48abd5000 CR3: 000000000be8e000 CR4: 0000000000350ef0
> Call Trace:
>  instrument_atomic_read include/linux/instrumented.h:71 [inline]
>  test_bit include/asm-generic/bitops/instrumented-non-atomic.h:134 [inline]
>  hlock_class kernel/locking/lockdep.c:199 [inline]
>  lookup_chain_cache_add kernel/locking/lockdep.c:3701 [inline]
>  validate_chain kernel/locking/lockdep.c:3757 [inline]
>  __lock_acquire+0x162f/0x54a0 kernel/locking/lockdep.c:5015
>  lock_acquire kernel/locking/lockdep.c:5625 [inline]
>  lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
>  process_one_work+0x8fc/0x1630 kernel/workqueue.c:2252
>  worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-15  8:44 ` Pavel Begunkov
@ 2021-07-15  9:15   ` syzbot
  2021-07-15 18:16     ` Pavel Begunkov
  0 siblings, 1 reply; 14+ messages in thread
From: syzbot @ 2021-07-15  9:15 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in io_sq_thread_park

INFO: task kworker/u4:1:10 blocked for more than 143 seconds.
      Not tainted 5.14.0-rc1-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u4:1    state:D stack:26320 pid:   10 ppid:     2 flags:0x00004000
Workqueue: events_unbound io_ring_exit_work
Call Trace:
 context_switch kernel/sched/core.c:4683 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:5940
 schedule+0xd3/0x270 kernel/sched/core.c:6019
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6078
 __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
 __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
 io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7314
 io_ring_exit_work+0x15a/0x1600 fs/io_uring.c:8771
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task iou-sqp-23145:23155 blocked for more than 143 seconds.
      Not tainted 5.14.0-rc1-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-sqp-23145   state:D stack:28720 pid:23155 ppid:  8844 flags:0x00004004
Call Trace:
 context_switch kernel/sched/core.c:4683 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:5940
 schedule+0xd3/0x270 kernel/sched/core.c:6019
 io_uring_cancel_generic+0x54d/0x890 fs/io_uring.c:9150
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6916
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:
3 locks held by kworker/u4:1/10:
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
 #1: ffffc90000cf7db0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
 #2: ffff88803d708470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7314
1 lock held by khungtaskd/1620:
 #0: ffffffff8b97b900 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by in:imklog/8316:
 #0: ffff888022f82ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
1 lock held by iou-sqp-23145/23155:
 #0: ffff88803d708470 (&sqd->lock){+.+.}-{3:3}, at: io_sqd_handle_event+0x2d6/0x350 fs/io_uring.c:6836

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1620 Comm: khungtaskd Not tainted 5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xd0a/0xfc0 kernel/hung_task.c:295
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 10540 Comm: kworker/u4:10 Not tainted 5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: phy8 ieee80211_iface_work
RIP: 0010:write_comp_data kernel/kcov.c:218 [inline]
RIP: 0010:__sanitizer_cov_trace_switch+0x65/0xf0 kernel/kcov.c:320
Code: 10 31 c9 65 4c 8b 24 25 00 f0 01 00 4d 85 d2 74 6b 4c 89 e6 bf 03 00 00 00 4c 8b 4c 24 20 49 8b 6c c8 10 e8 2d ff ff ff 84 c0 <74> 47 49 8b 84 24 18 15 00 00 41 8b bc 24 14 15 00 00 48 8b 10 48
RSP: 0018:ffffc9000bfa7030 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000016
RDX: 0000000000000000 RSI: ffff88801c5d8000 RDI: 0000000000000003
RBP: 0000000000000084 R08: ffffffff8aaa7a20 R09: ffffffff88878073
R10: 0000000000000020 R11: 0000000000000003 R12: ffff88801c5d8000
R13: dffffc0000000000 R14: ffff888053061087 R15: ffff888053061086
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f87cdfe4090 CR3: 000000002a61f000 CR4: 0000000000350ef0
Call Trace:
 _ieee802_11_parse_elems_crc+0x1e3/0x1f90 net/mac80211/util.c:1018
 ieee802_11_parse_elems_crc+0x89e/0xfe0 net/mac80211/util.c:1478
 ieee802_11_parse_elems net/mac80211/ieee80211_i.h:2186 [inline]
 ieee80211_bss_info_update+0x463/0xb70 net/mac80211/scan.c:212
 ieee80211_rx_bss_info net/mac80211/ibss.c:1126 [inline]
 ieee80211_rx_mgmt_probe_beacon+0xcc6/0x17c0 net/mac80211/ibss.c:1615
 ieee80211_ibss_rx_queued_mgmt+0xd30/0x1610 net/mac80211/ibss.c:1642
 ieee80211_iface_process_skb net/mac80211/iface.c:1426 [inline]
 ieee80211_iface_work+0x7f7/0xa40 net/mac80211/iface.c:1462
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295


Tested on:

commit:         1b48773f io_uring: fix io_drain_req()
git tree:       git://git.kernel.dk/linux-block io_uring-5.14
console output: https://syzkaller.appspot.com/x/log.txt?x=12040824300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
compiler:       


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-15  9:15   ` syzbot
@ 2021-07-15 18:16     ` Pavel Begunkov
  2021-07-16  0:14       ` syzbot
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2021-07-15 18:16 UTC (permalink / raw)
  To: syzbot, axboe, io-uring, linux-kernel, mingo, mingo, peterz,
	rostedt, syzkaller-bugs, will

On 7/15/21 10:15 AM, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> INFO: task hung in io_sq_thread_park

The interesting part here is this:

[  282.599913][ T1620] task:iou-sqp-23145   state:D stack:28720 pid:23155 ppid:  8844 flags:0x00004004
[  282.609927][ T1620] Call Trace:
[  282.613711][ T1620]  __schedule+0x93a/0x26f0
[  282.618131][ T1620]  ? io_schedule_timeout+0x140/0x140
[  282.623446][ T1620]  ? lockdep_hardirqs_on+0x79/0x100
[  282.628789][ T1620]  ? _raw_spin_unlock_irqrestore+0x3d/0x70
[  282.634647][ T1620]  schedule+0xd3/0x270
[  282.638874][ T1620]  io_uring_cancel_generic+0x54d/0x890
[  282.644428][ T1620]  ? __io_uring_free+0x170/0x170
[  282.649361][ T1620]  ? lockdep_hardirqs_on+0x79/0x100
[  282.655639][ T1620]  ? finish_wait+0x270/0x270
[  282.660346][ T1620]  io_sq_thread+0xaac/0x1250
[  282.665096][ T1620]  ? io_uring_cancel_generic+0x890/0x890
[  282.670788][ T1620]  ? ret_from_fork+0x8/0x30
[  282.675295][ T1620]  ? finish_wait+0x270/0x270
[  282.679983][ T1620]  ? rwlock_bug.part.0+0x90/0x90
[  282.684958][ T1620]  ? _raw_spin_unlock_irq+0x1f/0x40
[  282.690155][ T1620]  ? io_uring_cancel_generic+0x890/0x890
[  282.696394][ T1620]  ret_from_fork+0x1f/0x30

So we have a dying SQPOLL task, which is stuck in cancellation.
Let's add some debug output and see if we have a request lost.

#syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang 


> INFO: task kworker/u4:1:10 blocked for more than 143 seconds.
>       Not tainted 5.14.0-rc1-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u4:1    state:D stack:26320 pid:   10 ppid:     2 flags:0x00004000
> Workqueue: events_unbound io_ring_exit_work
> Call Trace:
>  context_switch kernel/sched/core.c:4683 [inline]
>  __schedule+0x93a/0x26f0 kernel/sched/core.c:5940
>  schedule+0xd3/0x270 kernel/sched/core.c:6019
>  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6078
>  __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
>  __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
>  io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7314
>  io_ring_exit_work+0x15a/0x1600 fs/io_uring.c:8771
>  process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
>  worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> INFO: task iou-sqp-23145:23155 blocked for more than 143 seconds.
>       Not tainted 5.14.0-rc1-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:iou-sqp-23145   state:D stack:28720 pid:23155 ppid:  8844 flags:0x00004004
> Call Trace:
>  context_switch kernel/sched/core.c:4683 [inline]
>  __schedule+0x93a/0x26f0 kernel/sched/core.c:5940
>  schedule+0xd3/0x270 kernel/sched/core.c:6019
>  io_uring_cancel_generic+0x54d/0x890 fs/io_uring.c:9150
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6916
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> 
> Showing all locks held in the system:
> 3 locks held by kworker/u4:1/10:
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
>  #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
>  #1: ffffc90000cf7db0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
>  #2: ffff88803d708470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x79/0xd0 fs/io_uring.c:7314
> 1 lock held by khungtaskd/1620:
>  #0: ffffffff8b97b900 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
> 1 lock held by in:imklog/8316:
>  #0: ffff888022f82ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
> 1 lock held by iou-sqp-23145/23155:
>  #0: ffff88803d708470 (&sqd->lock){+.+.}-{3:3}, at: io_sqd_handle_event+0x2d6/0x350 fs/io_uring.c:6836
> 
> =============================================
> 
> NMI backtrace for cpu 1
> CPU: 1 PID: 1620 Comm: khungtaskd Not tainted 5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
>  nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
>  nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
>  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
>  check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
>  watchdog+0xd0a/0xfc0 kernel/hung_task.c:295
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 10540 Comm: kworker/u4:10 Not tainted 5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: phy8 ieee80211_iface_work
> RIP: 0010:write_comp_data kernel/kcov.c:218 [inline]
> RIP: 0010:__sanitizer_cov_trace_switch+0x65/0xf0 kernel/kcov.c:320
> Code: 10 31 c9 65 4c 8b 24 25 00 f0 01 00 4d 85 d2 74 6b 4c 89 e6 bf 03 00 00 00 4c 8b 4c 24 20 49 8b 6c c8 10 e8 2d ff ff ff 84 c0 <74> 47 49 8b 84 24 18 15 00 00 41 8b bc 24 14 15 00 00 48 8b 10 48
> RSP: 0018:ffffc9000bfa7030 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000016
> RDX: 0000000000000000 RSI: ffff88801c5d8000 RDI: 0000000000000003
> RBP: 0000000000000084 R08: ffffffff8aaa7a20 R09: ffffffff88878073
> R10: 0000000000000020 R11: 0000000000000003 R12: ffff88801c5d8000
> R13: dffffc0000000000 R14: ffff888053061087 R15: ffff888053061086
> FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f87cdfe4090 CR3: 000000002a61f000 CR4: 0000000000350ef0
> Call Trace:
>  _ieee802_11_parse_elems_crc+0x1e3/0x1f90 net/mac80211/util.c:1018
>  ieee802_11_parse_elems_crc+0x89e/0xfe0 net/mac80211/util.c:1478
>  ieee802_11_parse_elems net/mac80211/ieee80211_i.h:2186 [inline]
>  ieee80211_bss_info_update+0x463/0xb70 net/mac80211/scan.c:212
>  ieee80211_rx_bss_info net/mac80211/ibss.c:1126 [inline]
>  ieee80211_rx_mgmt_probe_beacon+0xcc6/0x17c0 net/mac80211/ibss.c:1615
>  ieee80211_ibss_rx_queued_mgmt+0xd30/0x1610 net/mac80211/ibss.c:1642
>  ieee80211_iface_process_skb net/mac80211/iface.c:1426 [inline]
>  ieee80211_iface_work+0x7f7/0xa40 net/mac80211/iface.c:1462
>  process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
>  worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
>  kthread+0x3e5/0x4d0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> 
> 
> Tested on:
> 
> commit:         1b48773f io_uring: fix io_drain_req()
> git tree:       git://git.kernel.dk/linux-block io_uring-5.14
> console output: https://syzkaller.appspot.com/x/log.txt?x=12040824300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
> compiler:       
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-15 18:16     ` Pavel Begunkov
@ 2021-07-16  0:14       ` syzbot
  2021-07-16 10:38         ` Pavel Begunkov
  0 siblings, 1 reply; 14+ messages in thread
From: syzbot @ 2021-07-16  0:14 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in io_uring_cancel_generic

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25279 at fs/io_uring.c:9122 io_dump fs/io_uring.c:9117 [inline]
WARNING: CPU: 1 PID: 25279 at fs/io_uring.c:9122 io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
Modules linked in:
CPU: 1 PID: 25279 Comm: iou-sqp-25271 Tainted: G        W         5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_dump fs/io_uring.c:9122 [inline]
RIP: 0010:io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
Code: 08 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 9e 06 00 00 48 8b 85 10 08 00 00 48 89 44 24 40 <0f> 0b 48 8b 44 24 10 48 05 98 00 00 00 48 89 c2 48 89 44 24 08 48
RSP: 0018:ffffc9000ab87c50 EFLAGS: 00010246
RAX: ffff88803ddb7800 RBX: ffff888049f73880 RCX: 0000000000000000
RDX: 1ffff110093ee812 RSI: ffffffff81e2ebed RDI: ffff888049f74090
RBP: ffff888049f73880 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81e2ebc5 R11: 0000000000000001 R12: ffffc9000ab87d40
R13: ffff88803ddb7858 R14: 0000000000000001 R15: 0000000000000000
FS:  00007fb893f40700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f68dd0c6080 CR3: 000000002b457000 CR4: 0000000000350ef0
Call Trace:
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6930
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 25279 Comm: iou-sqp-25271 Tainted: G        W         5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 panic+0x306/0x73d kernel/panic.c:232
 __warn.cold+0x35/0x44 kernel/panic.c:606
 report_bug+0x1bd/0x210 lib/bug.c:199
 handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
 exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566
RIP: 0010:io_dump fs/io_uring.c:9122 [inline]
RIP: 0010:io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
Code: 08 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 9e 06 00 00 48 8b 85 10 08 00 00 48 89 44 24 40 <0f> 0b 48 8b 44 24 10 48 05 98 00 00 00 48 89 c2 48 89 44 24 08 48
RSP: 0018:ffffc9000ab87c50 EFLAGS: 00010246
RAX: ffff88803ddb7800 RBX: ffff888049f73880 RCX: 0000000000000000
RDX: 1ffff110093ee812 RSI: ffffffff81e2ebed RDI: ffff888049f74090
RBP: ffff888049f73880 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81e2ebc5 R11: 0000000000000001 R12: ffffc9000ab87d40
R13: ffff88803ddb7858 R14: 0000000000000001 R15: 0000000000000000
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6930
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel Offset: disabled
Rebooting in 86400 seconds..


Tested on:

commit:         d55b8f3c io_uring: add syz debug info
git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
console output: https://syzkaller.appspot.com/x/log.txt?x=16f6cb80300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
compiler:       


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-16  0:14       ` syzbot
@ 2021-07-16 10:38         ` Pavel Begunkov
  2021-07-16 10:57           ` syzbot
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2021-07-16 10:38 UTC (permalink / raw)
  To: syzbot, axboe, io-uring, linux-kernel, mingo, mingo, peterz,
	rostedt, syzkaller-bugs, will

On 7/16/21 1:14 AM, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> WARNING in io_uring_cancel_generic

attempt to get stats #2

#syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang 

> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 25279 at fs/io_uring.c:9122 io_dump fs/io_uring.c:9117 [inline]
> WARNING: CPU: 1 PID: 25279 at fs/io_uring.c:9122 io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
> Modules linked in:
> CPU: 1 PID: 25279 Comm: iou-sqp-25271 Tainted: G        W         5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:io_dump fs/io_uring.c:9122 [inline]
> RIP: 0010:io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
> Code: 08 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 9e 06 00 00 48 8b 85 10 08 00 00 48 89 44 24 40 <0f> 0b 48 8b 44 24 10 48 05 98 00 00 00 48 89 c2 48 89 44 24 08 48
> RSP: 0018:ffffc9000ab87c50 EFLAGS: 00010246
> RAX: ffff88803ddb7800 RBX: ffff888049f73880 RCX: 0000000000000000
> RDX: 1ffff110093ee812 RSI: ffffffff81e2ebed RDI: ffff888049f74090
> RBP: ffff888049f73880 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffffff81e2ebc5 R11: 0000000000000001 R12: ffffc9000ab87d40
> R13: ffff88803ddb7858 R14: 0000000000000001 R15: 0000000000000000
> FS:  00007fb893f40700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f68dd0c6080 CR3: 000000002b457000 CR4: 0000000000350ef0
> Call Trace:
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6930
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 1 PID: 25279 Comm: iou-sqp-25271 Tainted: G        W         5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
>  panic+0x306/0x73d kernel/panic.c:232
>  __warn.cold+0x35/0x44 kernel/panic.c:606
>  report_bug+0x1bd/0x210 lib/bug.c:199
>  handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
>  exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
>  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566
> RIP: 0010:io_dump fs/io_uring.c:9122 [inline]
> RIP: 0010:io_uring_cancel_generic+0x5fb/0xe50 fs/io_uring.c:9195
> Code: 08 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 9e 06 00 00 48 8b 85 10 08 00 00 48 89 44 24 40 <0f> 0b 48 8b 44 24 10 48 05 98 00 00 00 48 89 c2 48 89 44 24 08 48
> RSP: 0018:ffffc9000ab87c50 EFLAGS: 00010246
> RAX: ffff88803ddb7800 RBX: ffff888049f73880 RCX: 0000000000000000
> RDX: 1ffff110093ee812 RSI: ffffffff81e2ebed RDI: ffff888049f74090
> RBP: ffff888049f73880 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffffff81e2ebc5 R11: 0000000000000001 R12: ffffc9000ab87d40
> R13: ffff88803ddb7858 R14: 0000000000000001 R15: 0000000000000000
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6930
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 
> 
> Tested on:
> 
> commit:         d55b8f3c io_uring: add syz debug info
> git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
> console output: https://syzkaller.appspot.com/x/log.txt?x=16f6cb80300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
> compiler:       
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-16 10:38         ` Pavel Begunkov
@ 2021-07-16 10:57           ` syzbot
  2021-07-19 13:15             ` Pavel Begunkov
  2021-07-19 16:57             ` Pavel Begunkov
  0 siblings, 2 replies; 14+ messages in thread
From: syzbot @ 2021-07-16 10:57 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in io_uring_cancel_generic

ctx ffff8880467ee000 submitted 1, dismantled 0, crefs 0, inflight 1, fallbacks 0, poll1 0, poll2 0, tw 0, hash 0, cs 0, issued 1 
------------[ cut here ]------------
WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_dump fs/io_uring.c:9142 [inline]
WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
Modules linked in:
CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
FS:  00007f9295c2d700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1c3f1a3000 CR3: 0000000018755000 CR4: 0000000000350ee0
Call Trace:
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 panic+0x306/0x73d kernel/panic.c:232
 __warn.cold+0x35/0x44 kernel/panic.c:606
 report_bug+0x1bd/0x210 lib/bug.c:199
 handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
 exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566
RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel Offset: disabled
Rebooting in 86400 seconds..


Tested on:

commit:         81fee56e io_uring: add syz debug info
git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
console output: https://syzkaller.appspot.com/x/log.txt?x=12bb485c300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
compiler:       


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-16 10:57           ` syzbot
@ 2021-07-19 13:15             ` Pavel Begunkov
  2021-07-19 13:46               ` syzbot
  2021-07-19 16:57             ` Pavel Begunkov
  1 sibling, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2021-07-19 13:15 UTC (permalink / raw)
  To: syzbot, axboe, io-uring, syzkaller-bugs

On 7/16/21 11:57 AM, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> WARNING in io_uring_cancel_generic

Not clear yet, let's add more stats and rerun

#syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang 

> 
> ctx ffff8880467ee000 submitted 1, dismantled 0, crefs 0, inflight 1, fallbacks 0, poll1 0, poll2 0, tw 0, hash 0, cs 0, issued 1 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_dump fs/io_uring.c:9142 [inline]
> WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
> Modules linked in:
> CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
> RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
> Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
> RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
> RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
> RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
> R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
> R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
> FS:  00007f9295c2d700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1c3f1a3000 CR3: 0000000018755000 CR4: 0000000000350ee0
> Call Trace:
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
>  panic+0x306/0x73d kernel/panic.c:232
>  __warn.cold+0x35/0x44 kernel/panic.c:606
>  report_bug+0x1bd/0x210 lib/bug.c:199
>  handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
>  exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
>  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566
> RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
> RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
> Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
> RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
> RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
> RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
> R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
> R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 
> 
> Tested on:
> 
> commit:         81fee56e io_uring: add syz debug info
> git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
> console output: https://syzkaller.appspot.com/x/log.txt?x=12bb485c300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
> compiler:       
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-19 13:15             ` Pavel Begunkov
@ 2021-07-19 13:46               ` syzbot
  0 siblings, 0 replies; 14+ messages in thread
From: syzbot @ 2021-07-19 13:46 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, syzkaller-bugs

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in io_uring_cancel_generic

3rd poll ffff88801857e000
ctx ffff88801857e000 submitted 1, dismantled 0, crefs 0, inflight 1, fallbacks 0, poll1 0, poll2 0, tw 0, hash 0, cs 0, issued 1, cqes 1, poll tw 0, dpoll 1 
------------[ cut here ]------------
WARNING: CPU: 0 PID: 19055 at fs/io_uring.c:9166 io_dump fs/io_uring.c:9166 [inline]
WARNING: CPU: 0 PID: 19055 at fs/io_uring.c:9166 io_uring_cancel_generic+0x609/0x1050 fs/io_uring.c:9222
Modules linked in:
CPU: 0 PID: 19055 Comm: iou-sqp-19052 Tainted: G        W         5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_dump fs/io_uring.c:9166 [inline]
RIP: 0010:io_uring_cancel_generic+0x609/0x1050 fs/io_uring.c:9222
Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 e7 97 92 ff <0f> 0b e9 b1 fe ff ff e8 db 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
RSP: 0018:ffffc90002ff7c20 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880474b54c0 RCX: 0000000000000000
RDX: ffff8880474b54c0 RSI: ffffffff81e2f0c9 RDI: ffff88801857e518
RBP: ffff8880276e9f80 R08: 000000000000009d R09: 0000000000000000
R10: ffffffff815d066e R11: 0000000000000000 R12: ffffc90002ff7d40
R13: ffff888017015058 R14: 0000000000000000 R15: 0000000000000001
FS:  00007f3ab01c2700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000002aad848 CR3: 000000004a024000 CR4: 0000000000350ef0
Call Trace:
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6944
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 19055 Comm: iou-sqp-19052 Tainted: G        W         5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 panic+0x306/0x73d kernel/panic.c:232
 __warn.cold+0x35/0x44 kernel/panic.c:606
 report_bug+0x1bd/0x210 lib/bug.c:199
 handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
 exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566
RIP: 0010:io_dump fs/io_uring.c:9166 [inline]
RIP: 0010:io_uring_cancel_generic+0x609/0x1050 fs/io_uring.c:9222
Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 e7 97 92 ff <0f> 0b e9 b1 fe ff ff e8 db 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
RSP: 0018:ffffc90002ff7c20 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880474b54c0 RCX: 0000000000000000
RDX: ffff8880474b54c0 RSI: ffffffff81e2f0c9 RDI: ffff88801857e518
RBP: ffff8880276e9f80 R08: 000000000000009d R09: 0000000000000000
R10: ffffffff815d066e R11: 0000000000000000 R12: ffffc90002ff7d40
R13: ffff888017015058 R14: 0000000000000000 R15: 0000000000000001
 io_sq_thread+0xaac/0x1250 fs/io_uring.c:6944
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Kernel Offset: disabled
Rebooting in 86400 seconds..


Tested on:

commit:         6828fc5e io_uring: add syz debug info
git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
console output: https://syzkaller.appspot.com/x/log.txt?x=10ed934c300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
compiler:       


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-16 10:57           ` syzbot
  2021-07-19 13:15             ` Pavel Begunkov
@ 2021-07-19 16:57             ` Pavel Begunkov
  2021-07-19 17:13               ` Jens Axboe
  2021-07-19 17:15               ` syzbot
  1 sibling, 2 replies; 14+ messages in thread
From: Pavel Begunkov @ 2021-07-19 16:57 UTC (permalink / raw)
  To: syzbot, axboe, io-uring, linux-kernel, mingo, mingo, peterz,
	rostedt, syzkaller-bugs, will

On 7/16/21 11:57 AM, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> WARNING in io_uring_cancel_generic

__arm_poll doesn't remove a second poll entry in case of failed
__io_queue_proc(), it's most likely the cause here.

#syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang

> 
> ctx ffff8880467ee000 submitted 1, dismantled 0, crefs 0, inflight 1, fallbacks 0, poll1 0, poll2 0, tw 0, hash 0, cs 0, issued 1 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_dump fs/io_uring.c:9142 [inline]
> WARNING: CPU: 0 PID: 18216 at fs/io_uring.c:9142 io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
> Modules linked in:
> CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
> RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
> Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
> RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
> RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
> RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
> R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
> R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
> FS:  00007f9295c2d700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1c3f1a3000 CR3: 0000000018755000 CR4: 0000000000350ee0
> Call Trace:
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 0 PID: 18216 Comm: iou-sqp-18211 Not tainted 5.14.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
>  panic+0x306/0x73d kernel/panic.c:232
>  __warn.cold+0x35/0x44 kernel/panic.c:606
>  report_bug+0x1bd/0x210 lib/bug.c:199
>  handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
>  exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
>  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:566
> RIP: 0010:io_dump fs/io_uring.c:9142 [inline]
> RIP: 0010:io_uring_cancel_generic+0x608/0xea0 fs/io_uring.c:9198
> Code: 48 c1 ea 03 80 3c 02 00 0f 85 d5 02 00 00 48 8b 44 24 10 48 8b a8 98 00 00 00 48 39 6c 24 08 0f 85 02 03 00 00 e8 f8 97 92 ff <0f> 0b e9 af fe ff ff e8 ec 97 92 ff 48 8b 74 24 20 b9 08 00 00 00
> RSP: 0018:ffffc9000afefc40 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: ffff88802a09d4c0 RCX: 0000000000000000
> RDX: ffff88802a09d4c0 RSI: ffffffff81e2f0b8 RDI: ffff8880467ee510
> RBP: ffff8880462fb788 R08: 0000000000000081 R09: 0000000000000000
> R10: ffffffff815d066e R11: 0000000000000000 R12: ffff8880462fa458
> R13: ffffc9000afefd40 R14: 0000000000000001 R15: 0000000000000000
>  io_sq_thread+0xaac/0x1250 fs/io_uring.c:6932
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 
> 
> Tested on:
> 
> commit:         81fee56e io_uring: add syz debug info
> git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
> console output: https://syzkaller.appspot.com/x/log.txt?x=12bb485c300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
> compiler:       
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-19 16:57             ` Pavel Begunkov
@ 2021-07-19 17:13               ` Jens Axboe
  2021-07-19 17:28                 ` Pavel Begunkov
  2021-07-19 17:15               ` syzbot
  1 sibling, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2021-07-19 17:13 UTC (permalink / raw)
  To: Pavel Begunkov, syzbot, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

On 7/19/21 10:57 AM, Pavel Begunkov wrote:
> On 7/16/21 11:57 AM, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>> WARNING in io_uring_cancel_generic
> 
> __arm_poll doesn't remove a second poll entry in case of failed
> __io_queue_proc(), it's most likely the cause here.
> 
> #syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang

Was my thought on seeing the last debug run too. Haven't written a test
case, but my initial thought was catching this at the time that double
poll is armed, in __io_queue_proc(). Totally untested, just tossing
it out there.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0cac361bf6b8..ed33de5fffd2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5002,6 +5002,9 @@ static void __io_queue_proc(struct io_poll_iocb *poll, struct io_poll_table *pt,
 	if (unlikely(poll->head)) {
 		struct io_poll_iocb *poll_one = poll;
 
+		/* first poll failed, don't arm double poll */
+		if (pt->error)
+			return;
 		/* already have a 2nd entry, fail a third attempt */
 		if (*poll_ptr) {
 			pt->error = -EINVAL;

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-19 16:57             ` Pavel Begunkov
  2021-07-19 17:13               ` Jens Axboe
@ 2021-07-19 17:15               ` syzbot
  1 sibling, 0 replies; 14+ messages in thread
From: syzbot @ 2021-07-19 17:15 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com

Tested on:

commit:         61d27d88 io_uring: double remove on poll arm failure
git tree:       https://github.com/isilence/linux.git syztest_sqpoll_hang
kernel config:  https://syzkaller.appspot.com/x/.config?x=cfe2c0e42bc9993d
dashboard link: https://syzkaller.appspot.com/bug?extid=ac957324022b7132accf
compiler:       

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-19 17:13               ` Jens Axboe
@ 2021-07-19 17:28                 ` Pavel Begunkov
  2021-07-19 18:38                   ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2021-07-19 17:28 UTC (permalink / raw)
  To: Jens Axboe, syzbot, io-uring, linux-kernel, mingo, mingo, peterz,
	rostedt, syzkaller-bugs, will

On 7/19/21 6:13 PM, Jens Axboe wrote:
> On 7/19/21 10:57 AM, Pavel Begunkov wrote:
>> On 7/16/21 11:57 AM, syzbot wrote:
>>> Hello,
>>>
>>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>>> WARNING in io_uring_cancel_generic
>>
>> __arm_poll doesn't remove a second poll entry in case of failed
>> __io_queue_proc(), it's most likely the cause here.
>>
>> #syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang
> 
> Was my thought on seeing the last debug run too. Haven't written a test
> case, but my initial thought was catching this at the time that double
> poll is armed, in __io_queue_proc(). Totally untested, just tossing
> it out there.

Wouldn't help, unfortunately, the way syz triggers it is making a
request to go through __io_queue_proc() three times.

Either it's 3 waitqueues or we need to extend the check below to
the double poll entry.

if (poll_one->head == head)
	return;

> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 0cac361bf6b8..ed33de5fffd2 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5002,6 +5002,9 @@ static void __io_queue_proc(struct io_poll_iocb *poll, struct io_poll_table *pt,
>  	if (unlikely(poll->head)) {
>  		struct io_poll_iocb *poll_one = poll;
>  
> +		/* first poll failed, don't arm double poll */
> +		if (pt->error)
> +			return;
>  		/* already have a 2nd entry, fail a third attempt */
>  		if (*poll_ptr) {
>  			pt->error = -EINVAL;
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] INFO: task hung in io_sq_thread_park (2)
  2021-07-19 17:28                 ` Pavel Begunkov
@ 2021-07-19 18:38                   ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2021-07-19 18:38 UTC (permalink / raw)
  To: Pavel Begunkov, syzbot, io-uring, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, will

On 7/19/21 11:28 AM, Pavel Begunkov wrote:
> On 7/19/21 6:13 PM, Jens Axboe wrote:
>> On 7/19/21 10:57 AM, Pavel Begunkov wrote:
>>> On 7/16/21 11:57 AM, syzbot wrote:
>>>> Hello,
>>>>
>>>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>>>> WARNING in io_uring_cancel_generic
>>>
>>> __arm_poll doesn't remove a second poll entry in case of failed
>>> __io_queue_proc(), it's most likely the cause here.
>>>
>>> #syz test: https://github.com/isilence/linux.git syztest_sqpoll_hang
>>
>> Was my thought on seeing the last debug run too. Haven't written a test
>> case, but my initial thought was catching this at the time that double
>> poll is armed, in __io_queue_proc(). Totally untested, just tossing
>> it out there.
> 
> Wouldn't help, unfortunately, the way syz triggers it is making a
> request to go through __io_queue_proc() three times.
> 
> Either it's 3 waitqueues or we need to extend the check below to
> the double poll entry.
> 
> if (poll_one->head == head)
> 	return;

Yes good point, that'd depend on single poll erroring first. Given
the variety of cases for it, catching it after the fact like in your
patch is likely the simplest/cleanest way.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-07-19 19:15 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-15  3:19 [syzbot] INFO: task hung in io_sq_thread_park (2) syzbot
2021-07-15  8:44 ` Pavel Begunkov
2021-07-15  9:15   ` syzbot
2021-07-15 18:16     ` Pavel Begunkov
2021-07-16  0:14       ` syzbot
2021-07-16 10:38         ` Pavel Begunkov
2021-07-16 10:57           ` syzbot
2021-07-19 13:15             ` Pavel Begunkov
2021-07-19 13:46               ` syzbot
2021-07-19 16:57             ` Pavel Begunkov
2021-07-19 17:13               ` Jens Axboe
2021-07-19 17:28                 ` Pavel Begunkov
2021-07-19 18:38                   ` Jens Axboe
2021-07-19 17:15               ` syzbot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.