io-uring.vger.kernel.org archive mirror
* Re: INFO: task hung in __io_uring_task_cancel
From: Jens Axboe @ 2021-01-03 21:53 UTC
  To: Palash Oswal, io-uring, linux-fsdevel, linux-kernel, mingo,
	mingo, peterz, rostedt, syzkaller-bugs, viro, will

On 1/2/21 9:14 PM, Palash Oswal wrote:
> Hello,
> 
> I was running syzkaller and found the following issue:
> 
> Head Commit: b1313fe517ca3703119dcc99ef3bbf75ab42bcfb (v5.10.4)
> Git Tree: stable
> Console Output:
> [  242.769080] INFO: task repro:2639 blocked for more than 120 seconds.
> [  242.769096]       Not tainted 5.10.4 #8
> [  242.769103] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  242.769112] task:repro           state:D stack:    0 pid: 2639 ppid:  2638 flags:0x00000004
> [  242.769126] Call Trace:
> [  242.769148]  __schedule+0x28d/0x7e0
> [  242.769162]  ? __percpu_counter_sum+0x75/0x90
> [  242.769175]  schedule+0x4f/0xc0
> [  242.769187]  __io_uring_task_cancel+0xad/0xf0
> [  242.769198]  ? wait_woken+0x80/0x80
> [  242.769210]  bprm_execve+0x67/0x8a0
> [  242.769223]  do_execveat_common+0x1d2/0x220
> [  242.769235]  __x64_sys_execveat+0x5d/0x70
> [  242.769249]  do_syscall_64+0x38/0x90
> [  242.769260]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  242.769270] RIP: 0033:0x7f59ce45967d
> [  242.769277] RSP: 002b:00007ffd05d10a58 EFLAGS: 00000246 ORIG_RAX: 0000000000000142
> [  242.769290] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f59ce45967d
> [  242.769297] RDX: 0000000000000000 RSI: 0000000020000180 RDI: 00000000ffffffff
> [  242.769304] RBP: 00007ffd05d10a70 R08: 0000000000000000 R09: 00007ffd05d10a70
> [  242.769311] R10: 0000000000000000 R11: 0000000000000246 R12: 000055a91d37d320
> [  242.769318] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Can you see if this helps? The reproducer is pretty brutal: it forks
thousands of tasks with rings! But it should work, of course. I think this
one is pretty straightforward, and actually an older issue with the
poll rewaiting.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ca46f314640b..539de04f9183 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5103,6 +5103,12 @@ static bool io_poll_rewait(struct io_kiocb *req, struct io_poll_iocb *poll)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
+	/* Never re-wait on poll if the ctx or task is going away */
+	if (percpu_ref_is_dying(&ctx->refs) || req->task->flags & PF_EXITING) {
+		spin_lock_irq(&ctx->completion_lock);
+		return false;
+	}
+
 	if (!req->result && !READ_ONCE(poll->canceled)) {
 		struct poll_table_struct pt = { ._key = poll->events };
 

-- 
Jens Axboe
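
The guard in the diff above can be read as a small decision function:
never re-arm poll once the ring context or the owning task is being torn
down. What follows is a minimal userspace sketch of that decision only,
not the kernel code. The sketch_* types, the refs_dying field and the
SKETCH_TASK_EXITING bit are hypothetical stand-ins for
percpu_ref_is_dying(&ctx->refs) and PF_EXITING, and the locking and the
re-poll step done by the real io_poll_rewait() are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; not the real io_uring types. */
struct sketch_ctx {
	bool refs_dying;	/* models percpu_ref_is_dying(&ctx->refs) */
};

struct sketch_task {
	unsigned int flags;	/* models task->flags */
};

struct sketch_req {
	struct sketch_ctx *ctx;
	struct sketch_task *task;
	int result;		/* non-zero once the poll has a result */
	bool canceled;		/* models poll->canceled */
};

/* Illustrative bit standing in for PF_EXITING. */
#define SKETCH_TASK_EXITING 0x4u

/*
 * Returns true if the request should wait on poll again.
 * The teardown check comes first, so a dying context or exiting task
 * never re-arms, regardless of the request's own state.
 */
static bool sketch_should_rewait(const struct sketch_req *req)
{
	assert(req != NULL && req->ctx != NULL && req->task != NULL);

	if (req->ctx->refs_dying || (req->task->flags & SKETCH_TASK_EXITING))
		return false;

	return req->result == 0 && !req->canceled;
}
```

In this model a request with no result and no cancellation re-waits only
while both the context and the task are still alive.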



* Re: INFO: task hung in __io_uring_task_cancel
From: Palash Oswal @ 2021-01-04  5:23 UTC
  To: Hillf Danton, axboe
  Cc: io-uring, linux-kernel, syzkaller-bugs, Pavel Begunkov, viro,
	will, rostedt, peterz, mingo, mingo, linux-fsdevel

Hillf -
> Can you reproduce it again against 5.11-rc1 with the tiny diff applied
> to see if there is a missing wakeup in the mainline?

Hey Hillf, thanks for sharing the diff. It seems the reproducer
that I sent does not work on 5.11-rc1, so I'm trying to get
an updated reproducer for that.
I'm not well versed in the io_uring code yet, so it will
take me longer to get the reproducer going on 5.11-rc1.

Jens -
> Can you see if this helps? The reproducer is pretty brutal: it forks
> thousands of tasks with rings! But it should work, of course. I think this
> one is pretty straightforward, and actually an older issue with the
> poll rewaiting.

Hey Jens, I applied your diff to 5.10.4
(b1313fe517ca3703119dcc99ef3bbf75ab42bcfb), and unfortunately I'm
still seeing the task hang. Here's the console log, in case it helps
further:
root@syzkaller:~# [  242.840696] INFO: task repro:395 blocked for more than 120 seconds.
[  242.846353]       Not tainted 5.10.4+ #9
[  242.849951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.857665] task:repro           state:D stack:    0 pid:  395 ppid:   394 flags:0x00000004
[  242.867346] Call Trace:
[  242.870521]  __schedule+0x28d/0x7e0
[  242.873597]  ? __percpu_counter_sum+0x75/0x90
[  242.876794]  schedule+0x4f/0xc0
[  242.878803]  __io_uring_task_cancel+0xad/0xf0
[  242.880952]  ? wait_woken+0x80/0x80
[  242.882330]  bprm_execve+0x67/0x8a0
[  242.884142]  do_execveat_common+0x1d2/0x220
[  242.885610]  __x64_sys_execveat+0x5d/0x70
[  242.886708]  do_syscall_64+0x38/0x90
[  242.887727]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  242.889298] RIP: 0033:0x7ffabedd6469
[  242.890265] RSP: 002b:00007ffc56b8bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000142
[  242.892055] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffabedd6469
[  242.893776] RDX: 0000000000000000 RSI: 0000000020000180 RDI: 00000000ffffffff
[  242.895400] RBP: 00007ffc56b8bc90 R08: 0000000000000000 R09: 00007ffc56b8bc90
[  242.896879] R10: 0000000000000000 R11: 0000000000000246 R12: 0000559c19400bf0
[  242.898335] R13: 00007ffc56b8bdb0 R14: 0000000000000000 R15: 0000000000000000
[  363.691144] INFO: task repro:395 blocked for more than 241 seconds.
[  363.693724]       Not tainted 5.10.4+ #9
[  363.695513] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.700543] task:repro           state:D stack:    0 pid:  395 ppid:   394 flags:0x00000004
[  363.705747] Call Trace:
[  363.707359]  __schedule+0x28d/0x7e0
[  363.709603]  ? __percpu_counter_sum+0x75/0x90
[  363.712900]  schedule+0x4f/0xc0
[  363.715002]  __io_uring_task_cancel+0xad/0xf0
[  363.718026]  ? wait_woken+0x80/0x80
[  363.720137]  bprm_execve+0x67/0x8a0
[  363.721992]  do_execveat_common+0x1d2/0x220
[  363.723997]  __x64_sys_execveat+0x5d/0x70
[  363.725857]  do_syscall_64+0x38/0x90
[  363.727501]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  363.729510] RIP: 0033:0x7ffabedd6469
[  363.730913] RSP: 002b:00007ffc56b8bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000142
[  363.733747] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffabedd6469
[  363.736138] RDX: 0000000000000000 RSI: 0000000020000180 RDI: 00000000ffffffff
[  363.738431] RBP: 00007ffc56b8bc90 R08: 0000000000000000 R09: 00007ffc56b8bc90
[  363.740504] R10: 0000000000000000 R11: 0000000000000246 R12: 0000559c19400bf0
[  363.742560] R13: 00007ffc56b8bdb0 R14: 0000000000000000 R15: 0000000000000000


* Re: INFO: task hung in __io_uring_task_cancel
From: Palash Oswal @ 2021-01-04  9:40 UTC
  To: Hillf Danton
  Cc: axboe, io-uring, linux-kernel, syzkaller-bugs, Pavel Begunkov

On Mon, Jan 4, 2021 at 12:22 PM Hillf Danton <hdanton@sina.com> wrote:
> It is now updated.

Hello Hillf,

Thanks for the new diff. I applied it to 5.10.4 and tested with
the original reproducer, and the issue still persists.

root@syzkaller:~# [  242.925799] INFO: task repro:416 blocked for more than 120 seconds.
[  242.928095]       Not tainted 5.10.4+ #12
[  242.929034] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.930825] task:repro           state:D stack:    0 pid:  416 ppid:   415 flags:0x00000004
[  242.933404] Call Trace:
[  242.934365]  __schedule+0x28d/0x7e0
[  242.935199]  ? __percpu_counter_sum+0x75/0x90
[  242.936265]  schedule+0x4f/0xc0
[  242.937159]  __io_uring_task_cancel+0xc0/0xf0
[  242.938340]  ? wait_woken+0x80/0x80
[  242.939380]  bprm_execve+0x67/0x8a0
[  242.940163]  do_execveat_common+0x1d2/0x220
[  242.941090]  __x64_sys_execveat+0x5d/0x70
[  242.942056]  do_syscall_64+0x38/0x90
[  242.943088]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  242.944511] RIP: 0033:0x7fd0b781e469
[  242.945422] RSP: 002b:00007fffda20e9c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000142
[  242.947289] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd0b781e469
[  242.949031] RDX: 0000000000000000 RSI: 0000000020000180 RDI: 00000000ffffffff
[  242.950683] RBP: 00007fffda20e9e0 R08: 0000000000000000 R09: 00007fffda20e9e0
[  242.952450] R10: 0000000000000000 R11: 0000000000000246 R12: 0000556068200bf0
[  242.954045] R13: 00007fffda20eb00 R14: 0000000000000000 R15: 0000000000000000


linux git:(b1313fe517ca)  git diff
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0fcd065baa76..e0c5424e28b1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1867,8 +1867,7 @@ static void __io_free_req(struct io_kiocb *req)
     io_dismantle_req(req);

     percpu_counter_dec(&tctx->inflight);
-    if (atomic_read(&tctx->in_idle))
-        wake_up(&tctx->wait);
+    wake_up(&tctx->wait);
     put_task_struct(req->task);

     if (likely(!io_is_fallback_req(req)))
@@ -8853,12 +8852,11 @@ void __io_uring_task_cancel(void)
          * If we've seen completions, retry. This avoids a race where
          * a completion comes in before we did prepare_to_wait().
          */
-        if (inflight != tctx_inflight(tctx))
-            continue;
-        schedule();
+        if (inflight == tctx_inflight(tctx))
+            schedule();
+        finish_wait(&tctx->wait, &wait);
     } while (1);

-    finish_wait(&tctx->wait, &wait);
     atomic_dec(&tctx->in_idle);
 }
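
The second hunk above touches a "snapshot, prepare to wait, re-check,
then sleep" loop. As a rough illustration of why the re-check exists,
here is a single-threaded model of that loop, offered as a sketch under
stated assumptions rather than the kernel implementation. Everything
named sketch_* is hypothetical; the two hooks stand for the points where
another context could complete requests, and the early break on a zero
count is an assumption about the part of the loop the hunk does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the per-task state; not the real io_uring_task. */
struct sketch_tctx {
	long inflight;		/* outstanding requests */
	unsigned int sleeps;	/* times the loop would have called schedule() */
	unsigned int retries;	/* times the re-check skipped the sleep */
};

/* Hook type: a point where another context may complete requests. */
typedef void (*sketch_hook)(struct sketch_tctx *t);

/* Example hook: one request completes. */
static void sketch_complete_one(struct sketch_tctx *t)
{
	if (t->inflight > 0)
		t->inflight--;
}

/*
 * Model of the cancel-and-wait loop. If while_asleep never lowers the
 * count, this never returns, which is how a hang looks in this model.
 */
static void sketch_cancel_wait(struct sketch_tctx *t,
			       sketch_hook before_recheck,
			       sketch_hook while_asleep)
{
	assert(t != NULL);

	for (;;) {
		long snapshot = t->inflight;

		if (snapshot == 0)
			break;

		/* prepare_to_wait() would queue the waiter here. */
		if (before_recheck)
			before_recheck(t);

		/* A completion arrived after the snapshot: retry, do not sleep. */
		if (snapshot != t->inflight) {
			t->retries++;
			continue;
		}

		/* schedule() would sleep here until a completion wakes us. */
		t->sleeps++;
		if (while_asleep)
			while_asleep(t);
	}
}
```

Calling sketch_cancel_wait() with sketch_complete_one as the
before_recheck hook drains the count through retries alone, while passing
it as the while_asleep hook drains the count one sleep at a time.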
