io-uring.vger.kernel.org archive mirror
* [syzbot] INFO: task hung in io_wqe_worker
From: syzbot @ 2021-10-21 21:10 UTC
  To: asml.silence, axboe, io-uring, linux-kernel, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    d999ade1cc86 Merge tag 'perf-tools-fixes-for-v5.15-2021-10..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=136f87d0b00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=bab9d35f204746a7
dashboard link: https://syzkaller.appspot.com/bug?extid=27d62ee6f256b186883e
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10d3f7ccb00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15d3600cb00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+27d62ee6f256b186883e@syzkaller.appspotmail.com

INFO: task iou-wrk-6609:6612 blocked for more than 143 seconds.
      Not tainted 5.15.0-rc5-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-wrk-6609    state:D stack:27944 pid: 6612 ppid:  6526 flags:0x00004006
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0xb44/0x5960 kernel/sched/core.c:6287
 schedule+0xd3/0x270 kernel/sched/core.c:6366
 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
 io_worker_exit fs/io-wq.c:183 [inline]
 io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:
1 lock held by khungtaskd/27:
 #0: ffffffff8b981ae0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 27 Comm: khungtaskd Not tainted 5.15.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1ae/0x220 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xc1d/0xf50 kernel/hung_task.c:295
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 1414 Comm: kworker/u4:5 Not tainted 5.15.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound toggle_allocation_gate
RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:85 [inline]
RIP: 0010:memory_is_nonzero mm/kasan/generic.c:102 [inline]
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:128 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:159 [inline]
RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
RIP: 0010:kasan_check_range+0xde/0x180 mm/kasan/generic.c:189
Code: 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 48 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 80 38 00 <74> f2 eb d4 41 bc 08 00 00 00 48 89 ea 45 29 dc 4d 8d 1c 2c eb 0c
RSP: 0018:ffffc90005aa7988 EFLAGS: 00000046
RAX: ffffed10021cd084 RBX: ffffed10021cd085 RCX: ffffffff81348c59
RDX: ffffed10021cd085 RSI: 0000000000000008 RDI: ffff888010e68420
RBP: ffffed10021cd084 R08: 0000000000000000 R09: ffff888010e68427
R10: ffffed10021cd084 R11: 000000000000003f R12: ffffffff8baabbe0
R13: ffff888010e68420 R14: 0000000000000000 R15: ffff88801dfeda50
FS:  0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f93e8906000 CR3: 000000000b68e000 CR4: 0000000000350ee0
Call Trace:
 instrument_atomic_read include/linux/instrumented.h:71 [inline]
 atomic64_read include/linux/atomic/atomic-instrumented.h:605 [inline]
 switch_mm_irqs_off+0x1e9/0xa10 arch/x86/mm/tlb.c:615
 use_temporary_mm arch/x86/kernel/alternative.c:741 [inline]
 __text_poke+0x447/0x8c0 arch/x86/kernel/alternative.c:838
 text_poke_bp_batch+0x3d7/0x560 arch/x86/kernel/alternative.c:1178
 text_poke_flush arch/x86/kernel/alternative.c:1268 [inline]
 text_poke_flush arch/x86/kernel/alternative.c:1265 [inline]
 text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1275
 arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:146
 jump_label_update+0x1d5/0x430 kernel/jump_label.c:830
 static_key_enable_cpuslocked+0x1b1/0x260 kernel/jump_label.c:177
 static_key_enable+0x16/0x20 kernel/jump_label.c:190
 toggle_allocation_gate mm/kfence/core.c:626 [inline]
 toggle_allocation_gate+0x100/0x390 mm/kfence/core.c:618
 process_one_work+0x9bf/0x16b0 kernel/workqueue.c:2297
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2444
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
----------------
Code disassembly (best guess):
   0:	74 f2                	je     0xfffffff4
   2:	48 89 c2             	mov    %rax,%rdx
   5:	b8 01 00 00 00       	mov    $0x1,%eax
   a:	48 85 d2             	test   %rdx,%rdx
   d:	75 56                	jne    0x65
   f:	5b                   	pop    %rbx
  10:	5d                   	pop    %rbp
  11:	41 5c                	pop    %r12
  13:	c3                   	retq
  14:	48 85 d2             	test   %rdx,%rdx
  17:	74 5e                	je     0x77
  19:	48 01 ea             	add    %rbp,%rdx
  1c:	eb 09                	jmp    0x27
  1e:	48 83 c0 01          	add    $0x1,%rax
  22:	48 39 d0             	cmp    %rdx,%rax
  25:	74 50                	je     0x77
  27:	80 38 00             	cmpb   $0x0,(%rax)
* 2a:	74 f2                	je     0x1e <-- trapping instruction
  2c:	eb d4                	jmp    0x2
  2e:	41 bc 08 00 00 00    	mov    $0x8,%r12d
  34:	48 89 ea             	mov    %rbp,%rdx
  37:	45 29 dc             	sub    %r11d,%r12d
  3a:	4d 8d 1c 2c          	lea    (%r12,%rbp,1),%r11
  3e:	eb 0c                	jmp    0x4c


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


* Re: [syzbot] INFO: task hung in io_wqe_worker
From: Pavel Begunkov @ 2021-10-21 23:47 UTC
  To: syzbot, axboe, io-uring, linux-kernel, syzkaller-bugs

On 10/21/21 22:10, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    d999ade1cc86 Merge tag 'perf-tools-fixes-for-v5.15-2021-10..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=136f87d0b00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=bab9d35f204746a7
> dashboard link: https://syzkaller.appspot.com/bug?extid=27d62ee6f256b186883e
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10d3f7ccb00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15d3600cb00000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+27d62ee6f256b186883e@syzkaller.appspotmail.com

#syz test: git://git.kernel.dk/linux-block io_uring-5.15


-- 
Pavel Begunkov


* Re: [syzbot] INFO: task hung in io_wqe_worker
From: syzbot @ 2021-10-22  4:38 UTC
  To: asml.silence, axboe, io-uring, linux-kernel, syzkaller-bugs

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in io_wqe_worker

INFO: task iou-wrk-9392:9401 blocked for more than 143 seconds.
      Not tainted 5.15.0-rc2-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-wrk-9392    state:D stack:27952 pid: 9401 ppid:  7038 flags:0x00004004
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0xb44/0x5960 kernel/sched/core.c:6287
 schedule+0xd3/0x270 kernel/sched/core.c:6366
 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
 io_worker_exit fs/io-wq.c:183 [inline]
 io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:
1 lock held by khungtaskd/27:
 #0: ffffffff8b981ae0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by cron/6230:
 #0: ffff8880b9c31a58 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2b/0x120 kernel/sched/core.c:474
1 lock held by in:imklog/6237:
 #0: ffff88801db6ad70 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:990

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 27 Comm: khungtaskd Not tainted 5.15.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1ae/0x220 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xc1d/0xf50 kernel/hung_task.c:295
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 6872 Comm: kworker/0:4 Not tainted 5.15.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_dev_trap_report_work
RIP: 0010:set_canary_byte mm/kfence/core.c:214 [inline]
RIP: 0010:for_each_canary mm/kfence/core.c:249 [inline]
RIP: 0010:kfence_guarded_alloc mm/kfence/core.c:321 [inline]
RIP: 0010:__kfence_alloc+0x635/0xca0 mm/kfence/core.c:779
Code: 71 8a b8 ff 48 8b 6c 24 10 48 be 00 00 00 00 00 fc ff df 48 c1 ed 03 48 01 f5 4d 39 f7 73 5c e8 81 84 b8 ff 4c 89 f8 45 89 fe <4c> 89 fa 48 c1 e8 03 41 83 e6 07 83 e2 07 48 b9 00 00 00 00 00 fc
RSP: 0018:ffffc9000548fb48 EFLAGS: 00000093
RAX: ffff88823bce0d4b RBX: ffffffff9028ec08 RCX: 0000000000000000
RDX: ffff888079f7d580 RSI: ffffffff81be60df RDI: 0000000000000003
RBP: fffffbfff2051d8e R08: ffff88823bce0d4b R09: ffffffff8eef500f
R10: ffffffff81be6131 R11: 0000000000000001 R12: ffff8881441fa000
R13: 00000000000000e8 R14: 000000003bce0d4b R15: ffff88823bce0d4b
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efccb07f000 CR3: 000000000b68e000 CR4: 0000000000350ef0
Call Trace:
 kfence_alloc include/linux/kfence.h:124 [inline]
 slab_alloc_node mm/slub.c:3124 [inline]
 kmem_cache_alloc_node+0x213/0x3d0 mm/slub.c:3242
 __alloc_skb+0x20b/0x340 net/core/skbuff.c:414
 alloc_skb include/linux/skbuff.h:1116 [inline]
 nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:664 [inline]
 nsim_dev_trap_report drivers/net/netdevsim/dev.c:721 [inline]
 nsim_dev_trap_report_work+0x2ac/0xbd0 drivers/net/netdevsim/dev.c:762
 process_one_work+0x9bf/0x16b0 kernel/workqueue.c:2297
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2444
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
----------------
Code disassembly (best guess):
   0:	71 8a                	jno    0xffffff8c
   2:	b8 ff 48 8b 6c       	mov    $0x6c8b48ff,%eax
   7:	24 10                	and    $0x10,%al
   9:	48 be 00 00 00 00 00 	movabs $0xdffffc0000000000,%rsi
  10:	fc ff df
  13:	48 c1 ed 03          	shr    $0x3,%rbp
  17:	48 01 f5             	add    %rsi,%rbp
  1a:	4d 39 f7             	cmp    %r14,%r15
  1d:	73 5c                	jae    0x7b
  1f:	e8 81 84 b8 ff       	callq  0xffb884a5
  24:	4c 89 f8             	mov    %r15,%rax
  27:	45 89 fe             	mov    %r15d,%r14d
* 2a:	4c 89 fa             	mov    %r15,%rdx <-- trapping instruction
  2d:	48 c1 e8 03          	shr    $0x3,%rax
  31:	41 83 e6 07          	and    $0x7,%r14d
  35:	83 e2 07             	and    $0x7,%edx
  38:	48                   	rex.W
  39:	b9 00 00 00 00       	mov    $0x0,%ecx
  3e:	00 fc                	add    %bh,%ah


Tested on:

commit:         b22fa62a io_uring: apply worker limits to previous users
git tree:       git://git.kernel.dk/linux-block io_uring-5.15
console output: https://syzkaller.appspot.com/x/log.txt?x=16a3172cb00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=cf1d1005f4fd6ccb
dashboard link: https://syzkaller.appspot.com/bug?extid=27d62ee6f256b186883e
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2



* Re: [syzbot] INFO: task hung in io_wqe_worker
From: Pavel Begunkov @ 2021-10-22 13:49 UTC
  To: syzbot, axboe, io-uring, linux-kernel, syzkaller-bugs

On 10/22/21 05:38, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> INFO: task hung in io_wqe_worker
> 
> INFO: task iou-wrk-9392:9401 blocked for more than 143 seconds.
>        Not tainted 5.15.0-rc2-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:iou-wrk-9392    state:D stack:27952 pid: 9401 ppid:  7038 flags:0x00004004
> Call Trace:
>   context_switch kernel/sched/core.c:4940 [inline]
>   __schedule+0xb44/0x5960 kernel/sched/core.c:6287
>   schedule+0xd3/0x270 kernel/sched/core.c:6366
>   schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
>   do_wait_for_common kernel/sched/completion.c:85 [inline]
>   __wait_for_common kernel/sched/completion.c:106 [inline]
>   wait_for_common kernel/sched/completion.c:117 [inline]
>   wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
>   io_worker_exit fs/io-wq.c:183 [inline]
>   io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
>   ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Easily reproducible. The worker is stuck in io_worker_exit():

static void io_worker_exit(struct io_worker *worker)
{
	...
	wait_for_completion(&worker->ref_done);
	...
}

The reference belongs to a create_worker_cb() task_work item. That item is
expected to be either executed or cancelled by io_wq_exit_workers(), but the
owner task never goes through __io_uring_cancel() (called in do_exit()) and
so never reaches io_wq_exit_workers().
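
To make the dependency concrete, here is a minimal userspace model of the
situation described above. It is only a sketch, not the io-wq code: every
model_* name is hypothetical, and the kernel completion is reduced to a flag
that gets set when the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in fs/io-wq.c. */
struct model_worker {
	int refs;        /* stands in for the worker's reference count */
	bool ref_done;   /* stands in for the worker->ref_done completion */
	bool cb_pending; /* a create_worker_cb()-like item is queued on the owner */
};

void model_worker_init(struct model_worker *w)
{
	w->refs = 1; /* the worker's own reference */
	w->ref_done = false;
	w->cb_pending = false;
}

/* Drop one reference; the last put "completes" ref_done. */
void model_put(struct model_worker *w)
{
	assert(w->refs > 0);
	if (--w->refs == 0)
		w->ref_done = true;
}

/* Queue a task_work-like item on the owner; the item pins the worker. */
void model_queue_create_cb(struct model_worker *w)
{
	w->refs++;
	w->cb_pending = true;
}

/* The owner either runs or cancels the item; both paths drop its reference. */
void model_owner_run_or_cancel(struct model_worker *w)
{
	if (w->cb_pending) {
		w->cb_pending = false;
		model_put(w);
	}
}

/* Worker exit: drop the own reference, then report whether the wait ends. */
bool model_worker_exit_would_finish(struct model_worker *w)
{
	model_put(w);
	return w->ref_done;
}

/* Returns true if the worker can exit, false if it would hang. */
bool model_scenario(bool owner_processes_task_work)
{
	struct model_worker w;

	model_worker_init(&w);
	model_queue_create_cb(&w);
	if (owner_processes_task_work)
		model_owner_run_or_cancel(&w);
	return model_worker_exit_would_finish(&w);
}
```

In this model, model_scenario(true) lets the worker exit, while
model_scenario(false) corresponds to the hang: the queued item still holds a
reference, so ref_done is never completed.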

Looking at the owner task, cat /proc/<pid>/stack shows:

[<0>] do_coredump+0x1d0/0x10e0
[<0>] get_signal+0x4a3/0x960
[<0>] arch_do_signal_or_restart+0xc3/0x6d0
[<0>] exit_to_user_mode_prepare+0x10e/0x190
[<0>] irqentry_exit_to_user_mode+0x9/0x20
[<0>] irqentry_exit+0x36/0x40
[<0>] exc_page_fault+0x95/0x190
[<0>] asm_exc_page_fault+0x1e/0x30

(gdb) l *(do_coredump+0x1d0-5)
0xffffffff81343ccb is in do_coredump (fs/coredump.c:469).
464
465             if (core_waiters > 0) {
466                     struct core_thread *ptr;
467
468                     freezer_do_not_count();
469                     wait_for_completion(&core_state->startup);
470                     freezer_count();
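
The -5 in the gdb expression above is presumably there because the stack
trace records a return address, i.e. the instruction after the call, and a
near call with a 32-bit displacement is 5 bytes long on x86-64. Subtracting 5
points gdb at the call itself, so it lists the line of the
wait_for_completion() call. A small sketch of that arithmetic; the helper
names are hypothetical and the symbol base is derived from the address gdb
printed:

```c
#include <stdint.h>

#define X86_CALL_REL32_LEN 5 /* opcode e8 plus a 32-bit displacement */

/* Hypothetical helper: map a return address back to its call instruction. */
uint64_t call_site(uint64_t sym_base, uint64_t ret_offset)
{
	return sym_base + ret_offset - X86_CALL_REL32_LEN;
}

/*
 * gdb resolved do_coredump+0x1d0-5 to 0xffffffff81343ccb, so in this build
 * do_coredump itself starts (0x1d0 - 5) bytes before that address.
 */
uint64_t do_coredump_base(void)
{
	return UINT64_C(0xffffffff81343ccb) - (0x1d0 - X86_CALL_REL32_LEN);
}
```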

I can't say anything more at the moment, as I'm not familiar with the coredump code.

-- 
Pavel Begunkov


* Re: [syzbot] INFO: task hung in io_wqe_worker
From: Pavel Begunkov @ 2021-10-22 13:57 UTC
  To: syzbot, axboe, io-uring, linux-kernel, syzkaller-bugs

On 10/22/21 14:49, Pavel Begunkov wrote:
> On 10/22/21 05:38, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>> INFO: task hung in io_wqe_worker
>>
>> INFO: task iou-wrk-9392:9401 blocked for more than 143 seconds.
>>        Not tainted 5.15.0-rc2-syzkaller #0
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:iou-wrk-9392    state:D stack:27952 pid: 9401 ppid:  7038 flags:0x00004004
>> Call Trace:
>>   context_switch kernel/sched/core.c:4940 [inline]
>>   __schedule+0xb44/0x5960 kernel/sched/core.c:6287
>>   schedule+0xd3/0x270 kernel/sched/core.c:6366
>>   schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
>>   do_wait_for_common kernel/sched/completion.c:85 [inline]
>>   __wait_for_common kernel/sched/completion.c:106 [inline]
>>   wait_for_common kernel/sched/completion.c:117 [inline]
>>   wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
>>   io_worker_exit fs/io-wq.c:183 [inline]
>>   io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
>>   ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> 
> Easily reproducible, it's stuck in
> 
> static void io_worker_exit(struct io_worker *worker)
> {
>      ...
>      wait_for_completion(&worker->ref_done);
>      ...
> }
> 
> The reference belongs to a create_worker_cb() task_work item. It's expected
> to either be executed or cancelled by io_wq_exit_workers(), but the owner
> task never goes __io_uring_cancel (called in do_exit()) and so never
> reaches io_wq_exit_workers().
> 
> Following the owner task, cat /proc/<pid>/stack:
> 
> [<0>] do_coredump+0x1d0/0x10e0
> [<0>] get_signal+0x4a3/0x960
> [<0>] arch_do_signal_or_restart+0xc3/0x6d0
> [<0>] exit_to_user_mode_prepare+0x10e/0x190
> [<0>] irqentry_exit_to_user_mode+0x9/0x20
> [<0>] irqentry_exit+0x36/0x40
> [<0>] exc_page_fault+0x95/0x190
> [<0>] asm_exc_page_fault+0x1e/0x30
> 
> (gdb) l *(do_coredump+0x1d0-5)
> 0xffffffff81343ccb is in do_coredump (fs/coredump.c:469).
> 464
> 465             if (core_waiters > 0) {
> 466                     struct core_thread *ptr;
> 467
> 468                     freezer_do_not_count();
> 469                     wait_for_completion(&core_state->startup);
> 470                     freezer_count();
> 
> Can't say anything more at the moment as not familiar with coredump

A simple hack that lets task_work items run during that wait
works around the problem:


diff --git a/fs/coredump.c b/fs/coredump.c
index 3224dee44d30..f6f9dfb02296 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -466,7 +466,8 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
  		struct core_thread *ptr;
  
  		freezer_do_not_count();
-		wait_for_completion(&core_state->startup);
+		while (wait_for_completion_interruptible(&core_state->startup))
+			tracehook_notify_signal();
  		freezer_count();
  		/*
  		 * Wait for all the threads to become inactive, so that



-- 
Pavel Begunkov


* Re: [syzbot] INFO: task hung in io_wqe_worker
From: Pavel Begunkov @ 2021-10-28 20:32 UTC
  To: syzbot, axboe, io-uring, linux-kernel, syzkaller-bugs

On 10/22/21 14:57, Pavel Begunkov wrote:
> On 10/22/21 14:49, Pavel Begunkov wrote:
>> On 10/22/21 05:38, syzbot wrote:
>>> Hello,
>>>
>>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>>> INFO: task hung in io_wqe_worker
>>>
>>> INFO: task iou-wrk-9392:9401 blocked for more than 143 seconds.
>>>        Not tainted 5.15.0-rc2-syzkaller #0
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> task:iou-wrk-9392    state:D stack:27952 pid: 9401 ppid:  7038 flags:0x00004004
>>> Call Trace:
>>>   context_switch kernel/sched/core.c:4940 [inline]
>>>   __schedule+0xb44/0x5960 kernel/sched/core.c:6287
>>>   schedule+0xd3/0x270 kernel/sched/core.c:6366
>>>   schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
>>>   do_wait_for_common kernel/sched/completion.c:85 [inline]
>>>   __wait_for_common kernel/sched/completion.c:106 [inline]
>>>   wait_for_common kernel/sched/completion.c:117 [inline]
>>>   wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
>>>   io_worker_exit fs/io-wq.c:183 [inline]
>>>   io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
>>>   ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

#syz test: https://github.com/isilence/linux.git syz_coredump



>>
>> Easily reproducible, it's stuck in
>>
>> static void io_worker_exit(struct io_worker *worker)
>> {
>>      ...
>>      wait_for_completion(&worker->ref_done);
>>      ...
>> }
>>
>> The reference belongs to a create_worker_cb() task_work item. It's expected
>> to either be executed or cancelled by io_wq_exit_workers(), but the owner
>> task never goes __io_uring_cancel (called in do_exit()) and so never
>> reaches io_wq_exit_workers().
>>
>> Following the owner task, cat /proc/<pid>/stack:
>>
>> [<0>] do_coredump+0x1d0/0x10e0
>> [<0>] get_signal+0x4a3/0x960
>> [<0>] arch_do_signal_or_restart+0xc3/0x6d0
>> [<0>] exit_to_user_mode_prepare+0x10e/0x190
>> [<0>] irqentry_exit_to_user_mode+0x9/0x20
>> [<0>] irqentry_exit+0x36/0x40
>> [<0>] exc_page_fault+0x95/0x190
>> [<0>] asm_exc_page_fault+0x1e/0x30
>>
>> (gdb) l *(do_coredump+0x1d0-5)
>> 0xffffffff81343ccb is in do_coredump (fs/coredump.c:469).
>> 464
>> 465             if (core_waiters > 0) {
>> 466                     struct core_thread *ptr;
>> 467
>> 468                     freezer_do_not_count();
>> 469                     wait_for_completion(&core_state->startup);
>> 470                     freezer_count();
>>
>> Can't say anything more at the moment as not familiar with coredump
> 
> A simple hack allowing task works to be executed from there
> workarounds the problem
> 
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 3224dee44d30..f6f9dfb02296 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -466,7 +466,8 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
>           struct core_thread *ptr;
> 
>           freezer_do_not_count();
> -        wait_for_completion(&core_state->startup);
> +        while (wait_for_completion_interruptible(&core_state->startup))
> +            tracehook_notify_signal();
>           freezer_count();
>           /*
>            * Wait for all the threads to become inactive, so that
> 
> 
> 

-- 
Pavel Begunkov


* Re: [syzbot] INFO: task hung in io_wqe_worker
From: syzbot @ 2021-10-28 22:35 UTC
  To: asml.silence, axboe, io-uring, linux-kernel, syzkaller-bugs

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+27d62ee6f256b186883e@syzkaller.appspotmail.com

Tested on:

commit:         5983fb88 io-wq: remove worker to owner dependency
git tree:       https://github.com/isilence/linux.git syz_coredump
kernel config:  https://syzkaller.appspot.com/x/.config?x=1f7f46d98a0da80e
dashboard link: https://syzkaller.appspot.com/bug?extid=27d62ee6f256b186883e
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Note: testing is done by a robot and is best-effort only.

