From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
	Josef <josef.grieb@gmail.com>,
	io-uring@vger.kernel.org
Cc: norman@apache.org
Subject: Re: io_uring process termination/killing is not working
Date: Sun, 16 Aug 2020 08:22:53 -0700
Message-ID: <67cf568c-27fc-d298-5267-1212f9421b74@kernel.dk>
In-Reply-To: <d2341bc7-e7c8-110f-e60c-39fc03c62160@kernel.dk>

On 8/16/20 7:53 AM, Jens Axboe wrote:
> On 8/16/20 6:45 AM, Jens Axboe wrote:
>> On 8/15/20 9:48 AM, Pavel Begunkov wrote:
>>> On 15/08/2020 18:12, Jens Axboe wrote:
>>>> On 8/15/20 12:45 AM, Pavel Begunkov wrote:
>>>>> On 13/08/2020 02:32, Jens Axboe wrote:
>>>>>> On 8/12/20 12:28 PM, Pavel Begunkov wrote:
>>>>>>> On 12/08/2020 21:22, Pavel Begunkov wrote:
>>>>>>>> On 12/08/2020 21:20, Pavel Begunkov wrote:
>>>>>>>>> On 12/08/2020 21:05, Jens Axboe wrote:
>>>>>>>>>> On 8/12/20 11:58 AM, Josef wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have a weird issue on kernel 5.8.0/5.8.1: SIGINT and even SIGKILL
>>>>>>>>>>> don't kill this process (it is always in state D or D+). I literally
>>>>>>>>>>> have to terminate my VM because even the kernel can't kill the process.
>>>>>>>>>>> There is no issue on 5.7.12-201, and if IOSQE_IO_LINK is not set, it
>>>>>>>>>>> works.
>>>>>>>>>>>
>>>>>>>>>>> I've attached a file to reproduce it
>>>>>>>>>>> or here
>>>>>>>>>>> https://gist.github.com/1Jo1/15cb3c63439d0c08e3589cfa98418b2c
>>>>>>>>>>
>>>>>>>>>> Thanks, I'll take a look at this. It's stuck in uninterruptible
>>>>>>>>>> state, which is why you can't kill it.
>>>>>>>>>
>>>>>>>>> It looks like one of the hangs I've been talking about a few days ago,
>>>>>>>>> an accept is inflight but can't be found by cancel_files() because it's
>>>>>>>>> in a link.
>>>>>>>>
>>>>>>>> BTW, I described it a month ago, there were more details.
>>>>>>>
>>>>>>> https://lore.kernel.org/io-uring/34eb5e5a-8d37-0cae-be6c-c6ac4d85b5d4@gmail.com
>>>>>>
>>>>>> Yeah I think you're right. How about something like the below? That'll
>>>>>> potentially cancel more than just the one we're looking for, but seems
>>>>>> kind of silly to only cancel from the file table holding request and to
>>>>>> the end.
>>>>>
>>>>> The bug is not poll/t-out related, IIRC my test reproduces it with
>>>>> read(pipe)->open(). See the previously sent link.
>>>>
>>>> Right, but in this context for poll, I just mean any request that has a
>>>> poll handler armed. Not necessarily only a pure poll. The patch should
>>>> fix your case, too.
>>>
>>> Ok. I was thinking about sleeping in io_read(), etc. from io-wq context.
>>> That should have the same effect.
>>
>> We already cancel any blocking work for the exiting task - but we do
>> that _after_ trying to cancel files, so we should probably just swap
>> those around in io_uring_flush(). That'll remove any need to find and
>> cancel those explicitly in io_uring_cancel_files().
> 
> I guess there's still the case of the task just closing the fd, not
> necessarily exiting. So I do agree with you that the io-wq case is still
> unhandled. I'll take a look...

The below should do it.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index dc506b75659c..346a3eb84785 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8063,6 +8063,33 @@ static bool io_timeout_remove_link(struct io_ring_ctx *ctx,
 	return found;
 }
 
+static bool io_cancel_link_cb(struct io_wq_work *work, void *data)
+{
+	return io_match_link(container_of(work, struct io_kiocb, work), data);
+}
+
+static void io_attempt_cancel(struct io_ring_ctx *ctx, struct io_kiocb *req)
+{
+	enum io_wq_cancel cret;
+
+	/* cancel this particular work, if it's running */
+	cret = io_wq_cancel_work(ctx->io_wq, &req->work);
+	if (cret != IO_WQ_CANCEL_NOTFOUND)
+		return;
+
+	/* find links that hold this pending, cancel those */
+	cret = io_wq_cancel_cb(ctx->io_wq, io_cancel_link_cb, req, true);
+	if (cret != IO_WQ_CANCEL_NOTFOUND)
+		return;
+
+	/* if we have a poll link holding this pending, cancel that */
+	if (io_poll_remove_link(ctx, req))
+		return;
+
+	/* final option, timeout link is holding this req pending */
+	io_timeout_remove_link(ctx, req);
+}
+
 static void io_uring_cancel_files(struct io_ring_ctx *ctx,
 				  struct files_struct *files)
 {
@@ -8116,10 +8143,8 @@ static void io_uring_cancel_files(struct io_ring_ctx *ctx,
 				continue;
 			}
 		} else {
-			io_wq_cancel_work(ctx->io_wq, &cancel_req->work);
-			/* could be a link, check and remove if it is */
-			if (!io_poll_remove_link(ctx, cancel_req))
-				io_timeout_remove_link(ctx, cancel_req);
+			/* cancel this request, or head link requests */
+			io_attempt_cancel(ctx, cancel_req);
 			io_put_req(cancel_req);
 		}
 
-- 
Jens Axboe
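
For anyone following the thread, the fallback order that io_attempt_cancel()
introduces in the patch above can be modelled in plain userspace C. This is
only a rough sketch of the control flow, not kernel code: every type and
function name below (model_req, cancel_work, cancel_link_head,
remove_poll_link, remove_timeout_link, attempt_cancel) is a hypothetical
stand-in, and the "where" field simply pretends to know where a request is
parked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the place a request is currently held pending. */
enum where {
	IN_WQ,               /* queued or running in the worker pool itself */
	BEHIND_WQ_LINK,      /* held by a link whose head is in the pool */
	BEHIND_POLL_LINK,    /* held by a link whose head has poll armed */
	BEHIND_TIMEOUT_LINK, /* held by a link whose head is a timeout */
};

/* Stand-in for a "not found" style result; the values are made up. */
enum cancel_ret { CANCEL_OK, CANCEL_NOTFOUND };

struct model_req {
	enum where where;  /* where the request is parked in this model */
	int steps_tried;   /* number of cancel strategies that were run */
};

/* Step 1: cancel this particular work item, if the pool has it. */
static enum cancel_ret cancel_work(struct model_req *req)
{
	req->steps_tried++;
	return req->where == IN_WQ ? CANCEL_OK : CANCEL_NOTFOUND;
}

/* Step 2: cancel pool work whose link chain holds this request. */
static enum cancel_ret cancel_link_head(struct model_req *req)
{
	req->steps_tried++;
	return req->where == BEHIND_WQ_LINK ? CANCEL_OK : CANCEL_NOTFOUND;
}

/* Step 3: remove a poll-armed link head that holds this request. */
static bool remove_poll_link(struct model_req *req)
{
	req->steps_tried++;
	return req->where == BEHIND_POLL_LINK;
}

/* Step 4: last resort, a timeout link head holds this request. */
static bool remove_timeout_link(struct model_req *req)
{
	req->steps_tried++;
	return req->where == BEHIND_TIMEOUT_LINK;
}

/* Try each strategy in order and stop at the first one that finds it. */
static void attempt_cancel(struct model_req *req)
{
	assert(req != NULL);

	if (cancel_work(req) != CANCEL_NOTFOUND)
		return;
	if (cancel_link_head(req) != CANCEL_NOTFOUND)
		return;
	if (remove_poll_link(req))
		return;
	remove_timeout_link(req);
}
```

In this model a request parked behind a poll link is only found on the third
attempt, which is the kind of case the old two-step code path could reach but
a request held by a link in the worker pool could not.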

