All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Timothy Pearson <tpearson@raptorengineering.com>
Cc: regressions <regressions@lists.linux.dev>,
	Pavel Begunkov <asml.silence@gmail.com>
Subject: Re: Regression in io_uring, leading to data corruption
Date: Tue, 7 Nov 2023 16:16:24 -0700	[thread overview]
Message-ID: <7bd67c30-3b9f-4541-864a-b082917ee8b5@kernel.dk> (raw)
In-Reply-To: <505960353.45641688.1699398777243.JavaMail.zimbra@raptorengineeringinc.com>

On 11/7/23 4:12 PM, Timothy Pearson wrote:
> 
> 
> ----- Original Message -----
>> From: "Jens Axboe" <axboe@kernel.dk>
>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>> Cc: "regressions" <regressions@lists.linux.dev>, "Pavel Begunkov" <asml.silence@gmail.com>
>> Sent: Tuesday, November 7, 2023 4:44:56 PM
>> Subject: Re: Regression in io_uring, leading to data corruption
> 
>> On 11/7/23 3:29 PM, Timothy Pearson wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Jens Axboe" <axboe@kernel.dk>
>>>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>>>> Cc: "regressions" <regressions@lists.linux.dev>, "Pavel Begunkov"
>>>> <asml.silence@gmail.com>
>>>> Sent: Tuesday, November 7, 2023 4:16:51 PM
>>>> Subject: Re: Regression in io_uring, leading to data corruption
>>>
>>>> On 11/7/23 3:07 PM, Timothy Pearson wrote:
>>>>> Interestingly enough that's where my current investigation is leading
>>>>> as well.  After instrumenting and re-instrumenting the codebase far
>>>>> more times than I'd like to admit, I've noticed that the workers don't
>>>>> time out on kernel builds that show the corruption, they are directly
>>>>> terminated via signal (SIGKILL).  On kernel builds with the delay, the
>>>>> workers time out and self-terminate.  I'm still trying to parse out
>>>>> what the exact difference between these two mechanisms would be and
>>>>> how it plays into the corruption, but at least it's a start...
>>>>
>>>> Maybe poke at how they are exiting - you say timeout, so they've been
>>>> idle for a while and then go away? This would then cause worker creation
>>>> again later on if, say, we have 1 worker left and it goes to sleep. So
>>>> the timeout itself may not tell you much, outside of then causing that
>>>> other condition to happen. You could even try and shrink the timeout to
>>>> HZ / 10 or something like that to make it more likely to happen.
>>>
>>> Agreed.  As of right now I can confirm that with the delay in place
>>> (no corruption) the workers are exiting on their own, no signals and
>>> no IO_EXIT bit being set.  When I remove the delay (reintroducing the
>>> corruption) I see signal 9 being sent to the workers, and a mix of
>>> IO_EXIT being set and not being set.
>>>
>>> Ignoring the signal 9 does not fix the corruption, which makes me
>>> wonder more about IO_EXIT and whether things are not fully committed /
>>> properly torn down when the worker thread terminates.  This also
>>> dovetails nicely with the fact that the observed write corruption
>>> always seems to be in the latter portions of the page, never at the
>>> beginning of the page, also indicating rapid / unclean termination of
>>> the writer process.
>>>
>>> Will keep digging...
>>
>> This is useful. If the workers are exiting, they will try and process
>> work that is still pending. And it obviously does, or the process would
>> hang on exit or ring exit. But it'll also cancel said work, which
>> obviously did not happen for the old kthread scheme, as there was no way
>> to do that. So you'd just wait for it. Hence maybe what's happening here
>> is that mtr/mysql/mariadb isn't properly waiting for pending writes to
>> finish? It's just assuming that previously submitted writes will finish
>> if the task is killed?
> 
> It's entirely possible.  What is the correct way to wait for pending
> writes via the liburing API?  MariaDB uses liburing under the hood,
> and if I know the call(s) to look for I can make sure it's properly
> handling task exit.

I'd expect the task to wait and verify the results of pending requests
before doing io_uring_queue_exit(). But I'm not familiar with the code
base, maybe the task just exits? Closing the io_uring fd from an exiting
task would do the same.

I tried the below patch when running mtr here, but don't see any of them
trigger. You can try that, that'll tell you if we ever run cancelations
on pending io-wq work. If that triggers, I can try and cook up something
that would figure out where that is coming from.

But since you said you're seeing exits on signal 9, that would seem to
indicate that someone ran SIGKILL with potentially pending IO.

>> What page size are you using?
> 
> I've tested on both 4k and 64k page kernels with no difference.
> MariaDB is using a 16k page size on disk, and when the corruption
> happens it's apparently only writing part of the 16k page.

OK.


diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 522196dfb0ff..2ee18905d57e 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -553,6 +553,8 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
 	struct io_wq *wq = worker->wq;
 	bool do_kill = test_bit(IO_WQ_BIT_EXIT, &wq->state);
 
+	WARN_ON_ONCE(do_kill);
+
 	do {
 		struct io_wq_work *work;
 
@@ -889,6 +891,7 @@ static bool io_wq_worker_wake(struct io_worker *worker, void *data)
 static void io_run_cancel(struct io_wq_work *work, struct io_wq *wq)
 {
 	do {
+		WARN_ON_ONCE(1);
 		work->flags |= IO_WQ_WORK_CANCEL;
 		wq->do_work(work);
 		work = wq->free_work(work);
@@ -934,6 +937,7 @@ void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work)
 	 */
 	if (test_bit(IO_WQ_BIT_EXIT, &wq->state) ||
 	    (work->flags & IO_WQ_WORK_CANCEL)) {
+		WARN_ON_ONCE(1);
 		io_run_cancel(work, wq);
 		return;
 	}
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ed254076c723..c0bd35e5429a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1943,7 +1943,9 @@ void io_wq_submit_work(struct io_wq_work *work)
 
 	/* either cancelled or io-wq is dying, so don't touch tctx->iowq */
 	if (work->flags & IO_WQ_WORK_CANCEL) {
+		WARN_ON_ONCE(1);
 fail:
+		WARN_ON_ONCE(1);
 		io_req_task_queue_fail(req, err);
 		return;
 	}

-- 
Jens Axboe


  reply	other threads:[~2023-11-07 23:16 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-07 16:34 Regression in io_uring, leading to data corruption Timothy Pearson
2023-11-07 16:49 ` Jens Axboe
2023-11-07 16:57   ` Timothy Pearson
2023-11-07 17:14     ` Jens Axboe
2023-11-07 21:22 ` Jens Axboe
2023-11-07 21:39   ` Timothy Pearson
2023-11-07 21:46     ` Jens Axboe
2023-11-07 22:07       ` Timothy Pearson
2023-11-07 22:16         ` Jens Axboe
2023-11-07 22:29           ` Timothy Pearson
2023-11-07 22:44             ` Jens Axboe
2023-11-07 23:12               ` Timothy Pearson
2023-11-07 23:16                 ` Jens Axboe [this message]
2023-11-07 23:34                   ` Timothy Pearson
2023-11-07 23:52                     ` Jens Axboe
2023-11-08  0:02                       ` Timothy Pearson
2023-11-08  0:09                         ` Jens Axboe
2023-11-08  3:27                           ` Timothy Pearson
2023-11-08  3:30                             ` Timothy Pearson
2023-11-08  4:00                           ` Timothy Pearson
2023-11-08 15:10                             ` Jens Axboe
2023-11-08 15:14                               ` Jens Axboe
2023-11-08 17:10                                 ` Timothy Pearson
2023-11-08 17:26                                   ` Jens Axboe
2023-11-08 17:40                                     ` Timothy Pearson
2023-11-08 17:49                                       ` Jens Axboe
2023-11-08 17:57                                         ` Jens Axboe
2023-11-08 18:36                                           ` Timothy Pearson
2023-11-08 18:51                                             ` Timothy Pearson
2023-11-08 19:08                                               ` Jens Axboe
2023-11-08 19:06                                             ` Jens Axboe
2023-11-08 22:05                                               ` Jens Axboe
2023-11-08 22:15                                                 ` Timothy Pearson
2023-11-08 22:18                                                   ` Jens Axboe
2023-11-08 22:28                                                     ` Timothy Pearson
2023-11-08 23:58                                                     ` Jens Axboe
2023-11-09 15:12                                                       ` Jens Axboe
2023-11-09 17:00                                                         ` Timothy Pearson
2023-11-09 17:17                                                           ` Jens Axboe
2023-11-09 17:24                                                             ` Timothy Pearson
2023-11-09 17:30                                                               ` Jens Axboe
2023-11-09 17:36                                                                 ` Timothy Pearson
2023-11-09 17:38                                                                   ` Jens Axboe
2023-11-09 17:42                                                                     ` Timothy Pearson
2023-11-09 17:45                                                                       ` Jens Axboe
2023-11-09 18:20                                                                         ` tpearson
2023-11-10  3:51                                                                           ` Jens Axboe
2023-11-10  4:35                                                                             ` Timothy Pearson
2023-11-10  6:48                                                                               ` Timothy Pearson
2023-11-10 14:52                                                                                 ` Jens Axboe
2023-11-11 18:42                                                                                   ` Timothy Pearson
2023-11-11 18:58                                                                                     ` Jens Axboe
2023-11-11 19:04                                                                                       ` Timothy Pearson
2023-11-11 19:11                                                                                         ` Jens Axboe
2023-11-11 19:15                                                                                           ` Timothy Pearson
2023-11-11 19:23                                                                                             ` Jens Axboe
2023-11-11 21:57                                                                                     ` Timothy Pearson
2023-11-13 17:06                                                                                       ` Timothy Pearson
2023-11-13 17:39                                                                                         ` Jens Axboe
2023-11-13 19:02                                                                                           ` Timothy Pearson
2023-11-13 19:29                                                                                             ` Jens Axboe
2023-11-13 20:58                                                                                               ` Timothy Pearson
2023-11-13 21:22                                                                                                 ` Timothy Pearson
2023-11-13 22:15                                                                                                 ` Jens Axboe
2023-11-13 23:19                                                                                                   ` Timothy Pearson
2023-11-13 23:48                                                                                                     ` Jens Axboe
2023-11-14  0:04                                                                                                       ` Timothy Pearson
2023-11-14  0:13                                                                                                         ` Jens Axboe
2023-11-14  0:52                                                                                                           ` Timothy Pearson
2023-11-14  5:06                                                                                                             ` Timothy Pearson
2023-11-14 13:17                                                                                                               ` Jens Axboe
2023-11-14 16:59                                                                                                                 ` Timothy Pearson
2023-11-14 17:04                                                                                                                   ` Jens Axboe
2023-11-14 17:14                                                                                                                     ` Timothy Pearson
2023-11-14 17:17                                                                                                                       ` Jens Axboe
2023-11-14 17:21                                                                                                                         ` Timothy Pearson
2023-11-14 17:57                                                                                                                           ` Timothy Pearson
2023-11-14 18:02                                                                                                                             ` Jens Axboe
2023-11-14 18:12                                                                                                                               ` Timothy Pearson
2023-11-14 18:26                                                                                                                                 ` Jens Axboe
2023-11-15 11:03                                                                                                                                   ` Timothy Pearson
2023-11-15 16:46                                                                                                                                     ` Jens Axboe
2023-11-15 17:03                                                                                                                                       ` Timothy Pearson
2023-11-15 18:30                                                                                                                                         ` Jens Axboe
2023-11-15 18:35                                                                                                                                           ` Timothy Pearson
2023-11-15 18:37                                                                                                                                             ` Jens Axboe
2023-11-15 18:40                                                                                                                                               ` Timothy Pearson
2023-11-15 19:00                                                                                                                                           ` Jens Axboe
2023-11-16  3:28                                                                                                                                             ` Timothy Pearson
2023-11-16  3:46                                                                                                                                               ` Jens Axboe
2023-11-16  3:54                                                                                                                                                 ` Timothy Pearson
2023-11-19  0:16                                                                                                                                                   ` Timothy Pearson
2023-11-13 20:47                                                                                         ` Jens Axboe
2023-11-13 21:08                                                                                           ` Timothy Pearson
2023-11-10 14:48                                                                               ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7bd67c30-3b9f-4541-864a-b082917ee8b5@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=asml.silence@gmail.com \
    --cc=regressions@lists.linux.dev \
    --cc=tpearson@raptorengineering.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.