linux-fsdevel.vger.kernel.org archive mirror
* [PATCH] fix syzkaller task hung in exit_aio
From: zhengbin @ 2019-03-06 13:53 UTC
  To: viro, bcrl, linux-fsdevel, linux-aio; +Cc: houtao1, yi.zhang

When testing the kernel with syzkaller, a task hangs in exit_aio:

INFO: task syz-executor.2:22372 blocked for more than 140 seconds.
      Not tainted 4.19.25 #5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.2  D27568 22372   2689 0x90000002
Call Trace:
 schedule+0x7c/0x1a0 kernel/sched/core.c:3516
 schedule_timeout+0x4cf/0x1140 kernel/time/timer.c:1780
 do_wait_for_common kernel/sched/completion.c:83 [inline]
 __wait_for_common kernel/sched/completion.c:104 [inline]
 wait_for_common kernel/sched/completion.c:115 [inline]
 wait_for_completion+0x27a/0x3d0 kernel/sched/completion.c:136
 exit_aio+0x2ef/0x3c0 fs/aio.c:881
 __mmput kernel/fork.c:1047 [inline]
 mmput+0xb4/0x460 kernel/fork.c:1071
 exit_mm kernel/exit.c:545 [inline]
 do_exit+0x79c/0x2cb0 kernel/exit.c:862
 do_group_exit+0x106/0x2f0 kernel/exit.c:978
 get_signal+0x325/0x1c80 kernel/signal.c:2572
 do_signal+0x94/0x16a0 arch/x86/kernel/signal.c:816
 exit_to_usermode_loop+0x108/0x1d0 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
 do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293

The reason is as follows:
io_submit_one-->aio_get_req-->percpu_ref_get(&ctx->reqs)
                           -->req->ki_refcnt=0
             -->aio_poll-->req->ki_refcnt=2
                        -->aio_poll_complete-->aio_complete-->iocb_put
                        -->iocb_put

iocb_put decrements req->ki_refcnt, which aio_poll sets to 2, so each
request needs two iocb_put calls: one via aio_poll_complete and one at
the end of aio_poll. Unfortunately, in some cases aio_poll_complete is
never called, so the counts do not match:

CPU 0                          CPU 1
aio_poll-->vfs_poll
                               eventfd_write-->spin_lock_irq(lock)
                                            -->..-->aio_poll_wake
                                            -->spin_unlock_irq(lock)
        -->spin_lock(lock)
        -->if (req->woken)
		mask = 0; --->did not call aio_poll_complete
        -->iocb_put

aio_poll_wake
	req->woken = true;
	if (mask) {
		if (!(mask & req->events))
			return 0;  --->did not call aio_poll_complete too

vfs_poll-->eventfd_poll-->poll_wait-->aio_poll_queue_proc(add
aio_poll_wake to req->head)

eventfd_write-->wake_up_locked_poll-->__wake_up_common-->curr->func
-->aio_poll_wake
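
The reference accounting described above can be sketched as a small
user-space model. This is only an illustration, not kernel code: the
model_* names are hypothetical, the struct keeps just a counter and a
flag, and the comment about ctx->reqs assumes that the final iocb_put is
what lets exit_aio make progress.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; names mirror fs/aio.c but are hypothetical. */
struct model_req {
	int ki_refcnt;	/* stands in for req->ki_refcnt */
	bool freed;	/* set when the last reference is dropped */
};

/* Model of iocb_put(): drop one reference, free on the last one. */
static void model_iocb_put(struct model_req *req)
{
	assert(req->ki_refcnt > 0);
	if (--req->ki_refcnt == 0)
		req->freed = true;	/* assumed: ctx->reqs is released here */
}

/* Model of aio_poll_complete(): completion ends in one iocb_put(). */
static void model_poll_complete(struct model_req *req)
{
	model_iocb_put(req);
}

/*
 * Model of the tail of aio_poll(): two references exist, the submitter
 * always drops one, and the other is dropped only if completion runs.
 * Returns true if the request ended up freed.
 */
static bool model_aio_poll(bool completion_runs)
{
	struct model_req req = { .ki_refcnt = 2, .freed = false };

	if (completion_runs)
		model_poll_complete(&req);
	model_iocb_put(&req);
	return req.freed;
}
```

Calling model_aio_poll(false) leaves the counter at 1, which corresponds
to the request that is never freed and keeps exit_aio waiting.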

This patch fixes that. While at it, also fix the reference leak in the
error handling path.

Signed-off-by: zhengbin <zhengbin13@huawei.com>
---
 fs/aio.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 38b741a..3bf8cdc 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1668,8 +1668,6 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 	__poll_t mask = key_to_poll(key);
 	unsigned long flags;

-	req->woken = true;
-
 	/* for instances that support it check for an event match first: */
 	if (mask) {
 		if (!(mask & req->events))
@@ -1687,12 +1685,14 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,

 			list_del_init(&req->wait.entry);
 			aio_poll_complete(iocb, mask);
+			req->woken = true;
 			return 1;
 		}
 	}

 	list_del_init(&req->wait.entry);
 	schedule_work(&req->work);
+	req->woken = true;
 	return 1;
 }

@@ -1777,8 +1777,10 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 	spin_unlock_irq(&ctx->ctx_lock);

 out:
-	if (unlikely(apt.error))
+	if (unlikely(apt.error)) {
+		iocb_put(aiocb);
 		return apt.error;
+	}

 	if (mask)
 		aio_poll_complete(aiocb, mask);
--
2.7.4


* Re: [PATCH] fix syzkaller task hung in exit_aio
From: Al Viro @ 2019-03-06 19:44 UTC
  To: zhengbin; +Cc: bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

On Wed, Mar 06, 2019 at 09:53:23PM +0800, zhengbin wrote:

> CPU 0                          CPU 1
> aio_poll-->vfs_poll
>                                eventfd_write-->spin_lock_irq(lock)
>                                             -->..-->aio_poll_wake
>                                             -->spin_unlock_irq(lock)
>         -->spin_lock(lock)
>         -->if (req->woken)
> 		mask = 0; --->did not call aio_poll_complete
>         -->iocb_put
> 
> aio_poll_wake
> 	req->woken = true;
> 	if (mask) {
> 		if (!(mask & req->events))
> 			return 0;  --->did not call aio_poll_complete too

... and it's still on waitqueue, so it shouldn't be different from
_not_ having had a wakeup yet.  And yes, aio_poll() in mainline right
now ends up _not_ adding it to "can be cancelled" list, leading to
that bug.
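
A rough user-space model of that interleaving, for illustration only.
The model_* names, the MODEL_IN/MODEL_OUT bits and the "cancellable"
flag (standing in for being put on the list of requests that can be
cancelled) are assumptions made for the sketch; inline completion and
the deferred work item are both counted as "handled".

```c
#include <assert.h>
#include <stdbool.h>

enum { MODEL_IN = 0x1, MODEL_OUT = 0x4 };	/* hypothetical event bits */

struct model_poll {
	unsigned events;	/* events the iocb asked for */
	bool woken;
	bool cancellable;	/* on the "can be cancelled" list */
	int handled;		/* completed or handed to the work item */
};

/* Wakeup side; set_woken_early selects the ordering before the patch. */
static int model_poll_wake(struct model_poll *req, unsigned mask,
			   bool set_woken_early)
{
	if (set_woken_early)
		req->woken = true;
	if (mask && !(mask & req->events))
		return 0;	/* not our event: stays on the waitqueue */
	req->handled++;
	req->woken = true;
	return 1;
}

/* Submitter side, after vfs_poll() has returned mask. */
static void model_poll_submit_tail(struct model_poll *req, unsigned mask)
{
	mask &= req->events;
	if (req->woken)
		return;			/* wakeup side is trusted to finish */
	if (mask)
		req->handled++;		/* event already pending: complete now */
	else
		req->cancellable = true;	/* really waiting for an event */
}

/* Neither completed nor cancellable: nothing can finish the request. */
static bool model_stranded(const struct model_poll *req)
{
	return req->handled == 0 && !req->cancellable;
}
```

With set_woken_early true and a wakeup for an event the request did not
ask for, the request ends up neither handled nor cancellable; with the
flag set only when the wakeup actually takes the request, the submitter
sees it as an ordinary waiter.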

> vfs_poll-->eventfd_poll-->poll_wait-->aio_poll_queue_proc(add
> aio_poll_wake to req->head)
> 
> eventfd_write-->wake_up_locked_poll-->__wake_up_common-->curr->func
> -->aio_poll_wake
> 
> This patch fixes that. by the way, fix the bug of the error handling path.

Leak on error is real (see thread a few days ago), and overall logics for
"woken" should be similar to what you suggest, but I'd rather handle it
slightly differently (see the same thread).

I've a patch that ought to fix that and it seems to survive testing; I'll
post once I finish carving it up - too many cleanups mixed into it.  Give
me a couple of hours; should be done (and posted) by then.


* Re: [PATCH] fix syzkaller task hung in exit_aio
From: Al Viro @ 2019-03-07  0:07 UTC
  To: zhengbin; +Cc: bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

On Wed, Mar 06, 2019 at 07:44:55PM +0000, Al Viro wrote:

> Leak on error is real (see thread a few days ago), and overall logics for
> "woken" should be similar to what you suggest, but I'd rather handle it
> slightly differently (see the same thread).
> 
> I've a patch that ought to fix that and it seems to survive testing; I'll
> post once I finish carving it up - too many cleanups mixed into it.  Give
> me a couple of hours; should be done (and posted) by then.

Carved up and posted - sorry, it took longer than I hoped ;-/
