io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: Jens Axboe <axboe@kernel.dk>, Josef <josef.grieb@gmail.com>,
	io-uring@vger.kernel.org
Cc: norman@apache.org
Subject: Re: io_uring process termination/killing is not working
Date: Mon, 17 Aug 2020 13:16:21 +0300	[thread overview]
Message-ID: <c416ed54-67d8-6f55-53e3-bad43e60b379@gmail.com> (raw)
In-Reply-To: <67cf568c-27fc-d298-5267-1212f9421b74@kernel.dk>

On 16/08/2020 18:22, Jens Axboe wrote:
> On 8/16/20 7:53 AM, Jens Axboe wrote:
>> On 8/16/20 6:45 AM, Jens Axboe wrote:
>>> On 8/15/20 9:48 AM, Pavel Begunkov wrote:
>>>> On 15/08/2020 18:12, Jens Axboe wrote:
>>>>> On 8/15/20 12:45 AM, Pavel Begunkov wrote:
>>>>>> On 13/08/2020 02:32, Jens Axboe wrote:
>>>>>>> On 8/12/20 12:28 PM, Pavel Begunkov wrote:
>>>>>>>> On 12/08/2020 21:22, Pavel Begunkov wrote:
>>>>>>>>> On 12/08/2020 21:20, Pavel Begunkov wrote:
>>>>>>>>>> On 12/08/2020 21:05, Jens Axboe wrote:
>>>>>>>>>>> On 8/12/20 11:58 AM, Josef wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a weird issue on kernel 5.8.0/5.8.1, SIGINT even SIGKILL
>>>>>>>>>>>> doesn't work to kill this process(always state D or D+), literally I
>>>>>>>>>>>> have to terminate my VM because even the kernel can't kill the process
>>>>>>>>>>>> and no issue on 5.7.12-201, however if IOSQE_IO_LINK is not set, it
>>>>>>>>>>>> works
>>>>>>>>>>>>
>>>>>>>>>>>> I've attached a file to reproduce it
>>>>>>>>>>>> or here
>>>>>>>>>>>> https://gist.github.com/1Jo1/15cb3c63439d0c08e3589cfa98418b2c
>>>>>>>>>>>
>>>>>>>>>>> Thanks, I'll take a look at this. It's stuck in uninterruptible
>>>>>>>>>>> state, which is why you can't kill it.
>>>>>>>>>>
>>>>>>>>>> It looks like one of the hangs I've been talking about a few days ago,
>>>>>>>>>> an accept is inflight but can't be found by cancel_files() because it's
>>>>>>>>>> in a link.
>>>>>>>>>
>>>>>>>>> BTW, I described it a month ago, there were more details.
>>>>>>>>
>>>>>>>> https://lore.kernel.org/io-uring/34eb5e5a-8d37-0cae-be6c-c6ac4d85b5d4@gmail.com
>>>>>>>
>>>>>>> Yeah I think you're right. How about something like the below? That'll
>>>>>>> potentially cancel more than just the one we're looking for, but seems
>>>>>>> kind of silly to only cancel from the file table holding request and to
>>>>>>> the end.
>>>>>>
>>>>>> The bug is not poll/t-out related, IIRC my test reproduces it with
>>>>>> read(pipe)->open(). See the previously sent link.
>>>>>
>>>>> Right, but in this context for poll, I just mean any request that has a
>>>>> poll handler armed. Not necessarily only a pure poll. The patch should
>>>>> fix your case, too.
>>>>
>>>> Ok. I was thinking about sleeping in io_read(), etc. from io-wq context.
>>>> That should have the same effect.
>>>
>>> We already cancel any blocking work for the exiting task - but we do
>>> that _after_ trying to cancel files, so we should probably just swap
>>> those around in io_uring_flush(). That'll remove any need to find and
>>> cancel those explicitly in io_uring_cancel_files().
>>
>> I guess there's still the case of the task just closing the fd, not
>> necessarily exiting. So I do agree with you that the io-wq case is still
>> unhandled. I'll take a look...

Right. It should be in the cancel code itself.
Thanks for dealing with this. It seems, you had a busy week with all
these syzkaller reports.


> 
> The below should do it.
> 
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index dc506b75659c..346a3eb84785 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8063,6 +8063,33 @@ static bool io_timeout_remove_link(struct io_ring_ctx *ctx,
>  	return found;
>  }
>  
> +static bool io_cancel_link_cb(struct io_wq_work *work, void *data)
> +{
> +	return io_match_link(container_of(work, struct io_kiocb, work), data);
> +}
> +
> +static void io_attempt_cancel(struct io_ring_ctx *ctx, struct io_kiocb *req)
> +{
> +	enum io_wq_cancel cret;
> +
> +	/* cancel this particular work, if it's running */
> +	cret = io_wq_cancel_work(ctx->io_wq, &req->work);
> +	if (cret != IO_WQ_CANCEL_NOTFOUND)
> +		return;
> +
> +	/* find links that hold this pending, cancel those */
> +	cret = io_wq_cancel_cb(ctx->io_wq, io_cancel_link_cb, req, true);
> +	if (cret != IO_WQ_CANCEL_NOTFOUND)
> +		return;
> +
> +	/* if we have a poll link holding this pending, cancel that */
> +	if (io_poll_remove_link(ctx, req))
> +		return;
> +
> +	/* final option, timeout link is holding this req pending */
> +	io_timeout_remove_link(ctx, req);
> +}
> +
>  static void io_uring_cancel_files(struct io_ring_ctx *ctx,
>  				  struct files_struct *files)
>  {
> @@ -8116,10 +8143,8 @@ static void io_uring_cancel_files(struct io_ring_ctx *ctx,
>  				continue;
>  			}
>  		} else {
> -			io_wq_cancel_work(ctx->io_wq, &cancel_req->work);
> -			/* could be a link, check and remove if it is */
> -			if (!io_poll_remove_link(ctx, cancel_req))
> -				io_timeout_remove_link(ctx, cancel_req);
> +			/* cancel this request, or head link requests */
> +			io_attempt_cancel(ctx, cancel_req);
>  			io_put_req(cancel_req);
>  		}
>  
> 

-- 
Pavel Begunkov

      reply	other threads:[~2020-08-17 10:18 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-12 17:58 io_uring process termination/killing is not working Josef
2020-08-12 18:05 ` Jens Axboe
2020-08-12 18:20   ` Pavel Begunkov
2020-08-12 18:22     ` Pavel Begunkov
2020-08-12 18:28       ` Pavel Begunkov
2020-08-12 23:32         ` Jens Axboe
2020-08-13 16:07           ` Josef
2020-08-13 16:09             ` Jens Axboe
2020-08-15  7:45           ` Pavel Begunkov
2020-08-15 15:12             ` Jens Axboe
2020-08-15 16:48               ` Pavel Begunkov
2020-08-15 21:43                 ` Josef
2020-08-15 22:35                   ` Jens Axboe
2020-08-15 23:21                     ` Josef
2020-08-15 23:31                     ` Jens Axboe
2020-08-16  0:36                       ` Josef
2020-08-16  0:41                         ` Jens Axboe
2020-08-16  1:21                           ` Jens Axboe
2020-08-16  3:14                             ` Josef
2020-08-16  3:20                               ` Jens Axboe
2020-08-16 17:30                                 ` Jens Axboe
2020-08-16 21:09                                   ` Josef
2020-08-16 22:17                                     ` Jens Axboe
2020-08-17  8:58                                       ` Josef
2020-08-17 10:08                                         ` Pavel Begunkov
2020-08-16 13:45                 ` Jens Axboe
2020-08-16 14:53                   ` Jens Axboe
2020-08-16 15:22                     ` Jens Axboe
2020-08-17 10:16                       ` Pavel Begunkov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c416ed54-67d8-6f55-53e3-bad43e60b379@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=josef.grieb@gmail.com \
    --cc=norman@apache.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).