io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joseph Qi <joseph.qi@linux.alibaba.com>
To: Jens Axboe <axboe@kernel.dk>, io-uring <io-uring@vger.kernel.org>
Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
	Pavel Begunkov <asml.silence@gmail.com>
Subject: Re: [PATCH] io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state
Date: Wed, 20 Jan 2021 10:07:34 +0800	[thread overview]
Message-ID: <7c176c50-0f62-6753-eeef-bbb7a803febf@linux.alibaba.com> (raw)
In-Reply-To: <e1617bdf-3b4f-f598-a0ad-13ad68bb1e42@kernel.dk>



On 1/20/21 1:18 AM, Jens Axboe wrote:
> IORING_OP_CLOSE is special in terms of cancelation, since it has an
> intermediate state where we've removed the file descriptor but hasn't
> closed the file yet. For that reason, it's currently marked with
> IO_WQ_WORK_NO_CANCEL to prevent cancelation. This ensures that the op
> is always run even if canceled, to prevent leaving us with a live file
> but an fd that is gone. However, with SQPOLL, since a cancel request
> doesn't carry any resources on behalf of the request being canceled, if
> we cancel before any of the close op has been run, we can end up with
> io-wq not having the ->files assigned. This can result in the following
> oops reported by Joseph:
> 
> BUG: kernel NULL pointer dereference, address: 00000000000000d8
> PGD 800000010b76f067 P4D 800000010b76f067 PUD 10b462067 PMD 0
> Oops: 0000 [#1] SMP PTI
> CPU: 1 PID: 1788 Comm: io_uring-sq Not tainted 5.11.0-rc4 #1
> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> RIP: 0010:__lock_acquire+0x19d/0x18c0
> Code: 00 00 8b 1d fd 56 dd 08 85 db 0f 85 43 05 00 00 48 c7 c6 98 7b 95 82 48 c7 c7 57 96 93 82 e8 9a bc f5 ff 0f 0b e9 2b 05 00 00 <48> 81 3f c0 ca 67 8a b8 00 00 00 00 41 0f 45 c0 89 04 24 e9 81 fe
> RSP: 0018:ffffc90001933828 EFLAGS: 00010002
> RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000d8
> RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: ffff888106e8a140 R15: 00000000000000d8
> FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000000d8 CR3: 0000000106efa004 CR4: 00000000003706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  lock_acquire+0x31a/0x440
>  ? close_fd_get_file+0x39/0x160
>  ? __lock_acquire+0x647/0x18c0
>  _raw_spin_lock+0x2c/0x40
>  ? close_fd_get_file+0x39/0x160
>  close_fd_get_file+0x39/0x160
>  io_issue_sqe+0x1334/0x14e0
>  ? lock_acquire+0x31a/0x440
>  ? __io_free_req+0xcf/0x2e0
>  ? __io_free_req+0x175/0x2e0
>  ? find_held_lock+0x28/0xb0
>  ? io_wq_submit_work+0x7f/0x240
>  io_wq_submit_work+0x7f/0x240
>  io_wq_cancel_cb+0x161/0x580
>  ? io_wqe_wake_worker+0x114/0x360
>  ? io_uring_get_socket+0x40/0x40
>  io_async_find_and_cancel+0x3b/0x140
>  io_issue_sqe+0xbe1/0x14e0
>  ? __lock_acquire+0x647/0x18c0
>  ? __io_queue_sqe+0x10b/0x5f0
>  __io_queue_sqe+0x10b/0x5f0
>  ? io_req_prep+0xdb/0x1150
>  ? mark_held_locks+0x6d/0xb0
>  ? mark_held_locks+0x6d/0xb0
>  ? io_queue_sqe+0x235/0x4b0
>  io_queue_sqe+0x235/0x4b0
>  io_submit_sqes+0xd7e/0x12a0
>  ? _raw_spin_unlock_irq+0x24/0x30
>  ? io_sq_thread+0x3ae/0x940
>  io_sq_thread+0x207/0x940
>  ? do_wait_intr_irq+0xc0/0xc0
>  ? __ia32_sys_io_uring_enter+0x650/0x650
>  kthread+0x134/0x180
>  ? kthread_create_worker_on_cpu+0x90/0x90
>  ret_from_fork+0x1f/0x30
> 
> Fix this by moving the IO_WQ_WORK_NO_CANCEL until _after_ we've modified
> the fdtable. Canceling before this point is totally fine, and running
> it in the io-wq context _after_ that point is also fine.
> 
> For 5.12, we'll handle this internally and get rid of the no-cancel
> flag, as IORING_OP_CLOSE is the only user of it.
> 
> Fixes: 14587a46646d ("io_uring: enable file table usage for SQPOLL rings")

As discussed with Pavel, this can not only happen in case sqpoll, but
also in case async cancel is from io-wq.

> Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>

In fact, it is reported by "Abaci <abaci@linux.alibaba.com>"

> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Reviewed-and-tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> 
> ---
> 
> Joseph, can you test this patch and see if this fixes it for you?
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index b76bb50f18c7..5f6f1e48954e 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -4472,7 +4472,6 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  	 * io_wq_work.flags, so initialize io_wq_work firstly.
>  	 */
>  	io_req_init_async(req);
> -	req->work.flags |= IO_WQ_WORK_NO_CANCEL;
>  
>  	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>  		return -EINVAL;
> @@ -4505,6 +4504,8 @@ static int io_close(struct io_kiocb *req, bool force_nonblock,
>  
>  	/* if the file has a flush method, be safe and punt to async */
>  	if (close->put_file->f_op->flush && force_nonblock) {
> +		/* not safe to cancel at this point */
> +		req->work.flags |= IO_WQ_WORK_NO_CANCEL;
>  		/* was never set, but play safe */
>  		req->flags &= ~REQ_F_NOWAIT;
>  		/* avoid grabbing files - we don't need the files */
> 

  reply	other threads:[~2021-01-20  2:10 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 17:18 [PATCH] io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state Jens Axboe
2021-01-20  2:07 ` Joseph Qi [this message]
2021-01-20  2:20   ` Jens Axboe
2021-01-20  2:25   ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c176c50-0f62-6753-eeef-bbb7a803febf@linux.alibaba.com \
    --to=joseph.qi@linux.alibaba.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=xiaoguang.wang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).