From: Timothy Pearson <tpearson@raptorengineering.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: regressions <regressions@lists.linux.dev>,
	 Pavel Begunkov <asml.silence@gmail.com>
Subject: Re: Regression in io_uring, leading to data corruption
Date: Sat, 11 Nov 2023 12:42:39 -0600 (CST)
Message-ID: <414471427.46558741.1699728159797.JavaMail.zimbra@raptorengineeringinc.com>
In-Reply-To: <eee9b0ff-dae7-4613-8c63-fed1043e047c@kernel.dk>

----- Original Message -----
> From: "Jens Axboe" <axboe@kernel.dk>
> To: "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "regressions" <regressions@lists.linux.dev>, "Pavel Begunkov" <asml.silence@gmail.com>
> Sent: Friday, November 10, 2023 8:52:05 AM
> Subject: Re: Regression in io_uring, leading to data corruption

> On 11/9/23 11:48 PM, Timothy Pearson wrote:
>> 
>> 
>> ----- Original Message -----
>>> From: "Timothy Pearson" <tpearson@raptorengineering.com>
>>> To: "Jens Axboe" <axboe@kernel.dk>
>>> Cc: "regressions" <regressions@lists.linux.dev>, "Pavel Begunkov" <asml.silence@gmail.com>
>>> Sent: Thursday, November 9, 2023 10:35:08 PM
>>> Subject: Re: Regression in io_uring, leading to data corruption
>> 
>>> ----- Original Message -----
>>>> From: "Jens Axboe" <axboe@kernel.dk>
>>>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>>>> Cc: "regressions" <regressions@lists.linux.dev>, "Pavel Begunkov" <asml.silence@gmail.com>
>>>> Sent: Thursday, November 9, 2023 9:51:09 PM
>>>> Subject: Re: Regression in io_uring, leading to data corruption
>>>
>>>> Just to go back to basics, can you try this one? It'll do the exact same
>>>> retry that io-wq is doing, just from the same task itself. If this
>>>> fails, then something core is wrong. I don't think it will, or we'd see
>>>> this on other platforms too of course. If this works, then it validates
>>>> that it's some oddity on ppc with punting this operation to a thread off
>>>> this main task.
>>>>
>>>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>>>> index 64390d4e20c1..1d760570df04 100644
>>>> --- a/io_uring/rw.c
>>>> +++ b/io_uring/rw.c
>>>> @@ -968,7 +968,7 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
>>>> 	return IOU_OK;
>>>> }
>>>>
>>>> -int io_write(struct io_kiocb *req, unsigned int issue_flags)
>>>> +static int __io_write(struct io_kiocb *req, unsigned int issue_flags)
>>>> {
>>>> 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
>>>> 	struct io_rw_state __s, *s = &__s;
>>>> @@ -1092,6 +1092,19 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>>>> 	return ret;
>>>> }
>>>>
>>>> +int io_write(struct io_kiocb *req, unsigned int issue_flags)
>>>> +{
>>>> +	int ret;
>>>> +
>>>> +	ret = __io_write(req, issue_flags);
>>>> +	if (ret != -EAGAIN)
>>>> +		return ret;
>>>> +
>>>> +	ret = __io_write(req, issue_flags & ~IO_URING_F_NONBLOCK);
>>>> +	WARN_ON_ONCE(ret == -EAGAIN);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> void io_rw_fail(struct io_kiocb *req)
>>>> {
>>>> 	int res;
>>>>
>>>
>>> That does indeed "fix" the corruption issue.
>>>
>>> Where is the punting actually taking place?  I can see at least one location, but
>>> if it's a general issue with the punting process, I should probably apply any
>>> test mitigations to all locations, and I'm not familiar enough with the
>>> codebase to be sure I've got them all...
>>>
>>> Thanks!
>> 
>> I've been exploring a bunch of other possibilities, and one that has
>> been slowly coalescing is whether we're triggering a bug somewhere
>> else in the kernel.  Now that I know the io_write call is somehow
>> related to this issue, I went back over some of the earlier logs and
>> might have found something.
>> 
>> When I enable KCSAN I sporadically see this type of race:
>> 
>> [ 1549.152381] ==================================================================
>> [ 1549.152515] BUG: KCSAN: data-race in dd_has_work / dd_insert_requests
>> [ 1549.152609]
>> [ 1549.152644] read (marked) to 0xc0000000080c2e98 of 8 bytes by task 3372 on cpu 27:
>> [ 1549.153193]  dd_has_work+0x160/0x1b0
>> [ 1549.153259]  __blk_mq_sched_dispatch_requests+0x42c/0xdf0
>> [ 1549.153331]  blk_mq_sched_dispatch_requests+0xe4/0x120
>> [ 1549.153403]  blk_mq_run_hw_queue+0x358/0x390
>> [ 1549.153479]  blk_mq_flush_plug_list+0x8fc/0xea0
>> [ 1549.153556]  __blk_flush_plug+0x2bc/0x360
>> [ 1549.153622]  blk_finish_plug+0x60/0xa0
>> [ 1549.153689]  __iomap_dio_rw+0xd28/0x1140
>> [ 1549.153759]  iomap_dio_rw+0x80/0xf0
>> [ 1549.153825]  ext4_file_write_iter+0x9f8/0xff0 [ext4]
>> [ 1549.154249]  io_write+0x4bc/0x900
>> [ 1549.154309]  io_issue_sqe+0x12c/0x5f0
>> [ 1549.154370]  io_submit_sqes+0xdd4/0x1050
>> [ 1549.154429]  sys_io_uring_enter+0x344/0x15d0
>> [ 1549.154499]  system_call_exception+0x354/0x400
>> [ 1549.154569]  system_call_vectored_common+0x15c/0x2ec
>> [ 1549.154651]
>> [ 1549.154685] write to 0xc0000000080c2e98 of 8 bytes by task 3667 on cpu 32:
>> [ 1549.154757]  dd_insert_requests+0x81c/0xac0
>> [ 1549.154825]  blk_mq_flush_plug_list+0x8ec/0xea0
>> [ 1549.154902]  __blk_flush_plug+0x2bc/0x360
>> [ 1549.154968]  blk_finish_plug+0x60/0xa0
>> [ 1549.155034]  __iomap_dio_rw+0xd28/0x1140
>> [ 1549.155100]  iomap_dio_rw+0x80/0xf0
>> [ 1549.155166]  ext4_file_write_iter+0x9f8/0xff0 [ext4]
>> [ 1549.155563]  io_write+0x4bc/0x900
>> [ 1549.155606]  io_issue_sqe+0x12c/0x5f0
>> [ 1549.155648]  io_wq_submit_work+0x2e4/0x490
>> [ 1549.155692]  io_worker_handle_work+0xbac/0x1020
>> [ 1549.155745]  io_wq_worker+0x224/0x7b0
>> [ 1549.155792]  ret_from_kernel_user_thread+0x14/0x1c
>> [ 1549.155841]
>> [ 1549.155864] Reported by Kernel Concurrency Sanitizer on:
>> [ 1549.155904] CPU: 32 PID: 3667 Comm: iou-wrk-3372 Not tainted 6.6.0+ #10
>> [ 1549.155961] Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
>> [ 1549.156032] ==================================================================
>> 
>> Notably, the io_write calls are in the chain, and there were exactly
>> two of these races and two test failures in this test run.
>> 
>> io_write+0x4bc seems to be WRITE_ONCE(list->first, last->next) in wq_list_cut.
> 
> The above race would just cause hung IO in the IO scheduler; it would
> not lead to corruption. The io_write() call would be call_write_iter(),
> not sure where you get the other one from?
> 
> In any case, when I ran this test case last time, I just used /dev/shm/
> as the backing store and it still hit. No IO scheduler would be
> involved there.

Fair enough.  I was grasping at straws a bit that night.

Quick update on-list: it seems MariaDB writes through io_uring, then goes back and reads the same data with a standard synchronous read.  The data is valid on disk at some point after the read (i.e. once the process exits, the on-disk data checks out), but the read itself returns corrupt, stale, or garbage data.  MariaDB is the only application I've seen that mixes io_uring and standard I/O operations on the same file, and this may be contributing to the issues observed.
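To exercise that access pattern in isolation, here is a minimal sketch: it submits one buffered write through io_uring using the raw io_uring_setup(2)/io_uring_enter(2) syscalls (so liburing is not needed), then reads the same range back with pread(2) on the same descriptor and compares. The helper name uring_write_then_pread is hypothetical, a buffered (non-O_DIRECT) file and Linux 5.6+ headers (for IORING_OP_WRITE) are assumptions, and this is not MariaDB's actual I/O path.

```c
#define _GNU_SOURCE
#include <linux/io_uring.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes at `off` with one IORING_OP_WRITE,
 * then read the same range back with a plain pread(2) and compare.
 * Returns 0 if the data matches, 1 on a short write/read or mismatch,
 * and -1 if io_uring could not be used (old kernel, seccomp, ...). */
static int uring_write_then_pread(int fd, const char *buf, unsigned len,
                                  off_t off)
{
    struct io_uring_params p;
    memset(&p, 0, sizeof(p));
    int ring = (int)syscall(__NR_io_uring_setup, 4, &p);
    if (ring < 0)
        return -1;

    /* Map the SQ ring, CQ ring and SQE array (one mapping may cover both rings). */
    size_t sq_sz = p.sq_off.array + p.sq_entries * sizeof(unsigned);
    size_t cq_sz = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
    size_t sqe_sz = p.sq_entries * sizeof(struct io_uring_sqe);
    int single = (p.features & IORING_FEAT_SINGLE_MMAP) != 0;
    if (single && cq_sz > sq_sz)
        sq_sz = cq_sz;
    int prot = PROT_READ | PROT_WRITE;
    char *sq = mmap(NULL, sq_sz, prot, MAP_SHARED, ring, IORING_OFF_SQ_RING);
    char *cq = single ? sq
                      : mmap(NULL, cq_sz, prot, MAP_SHARED, ring, IORING_OFF_CQ_RING);
    struct io_uring_sqe *sqes =
        mmap(NULL, sqe_sz, prot, MAP_SHARED, ring, IORING_OFF_SQES);

    int rc = -1;
    if (sq != MAP_FAILED && cq != MAP_FAILED && sqes != MAP_FAILED) {
        /* Fill one SQE and publish it by advancing the SQ tail. */
        unsigned *sq_tail = (unsigned *)(sq + p.sq_off.tail);
        unsigned *sq_array = (unsigned *)(sq + p.sq_off.array);
        unsigned tail = *sq_tail;
        unsigned idx = tail & *(unsigned *)(sq + p.sq_off.ring_mask);
        memset(&sqes[idx], 0, sizeof(sqes[idx]));
        sqes[idx].opcode = IORING_OP_WRITE;
        sqes[idx].fd = fd;
        sqes[idx].addr = (unsigned long)buf;
        sqes[idx].len = len;
        sqes[idx].off = (unsigned long long)off;
        sq_array[idx] = idx;
        __atomic_store_n(sq_tail, tail + 1, __ATOMIC_RELEASE);

        /* Submit the SQE and wait for its completion. */
        if (syscall(__NR_io_uring_enter, ring, 1, 1, IORING_ENTER_GETEVENTS,
                    NULL, 0) == 1) {
            unsigned *cq_head = (unsigned *)(cq + p.cq_off.head);
            unsigned *cq_tail = (unsigned *)(cq + p.cq_off.tail);
            unsigned cmask = *(unsigned *)(cq + p.cq_off.ring_mask);
            struct io_uring_cqe *cqes = (struct io_uring_cqe *)(cq + p.cq_off.cqes);
            unsigned head = *cq_head;
            if (head != __atomic_load_n(cq_tail, __ATOMIC_ACQUIRE)) {
                int res = cqes[head & cmask].res;
                __atomic_store_n(cq_head, head + 1, __ATOMIC_RELEASE);
                if (res == (int)len) {
                    /* The synchronous read-back of the range just written. */
                    char *back = malloc(len);
                    if (back)
                        rc = (pread(fd, back, len, off) == (ssize_t)len &&
                              memcmp(back, buf, len) == 0) ? 0 : 1;
                    free(back);
                } else if (res >= 0) {
                    rc = 1; /* short write */
                }
            }
        }
    }

    if (sqes != MAP_FAILED)
        munmap(sqes, sqe_sz);
    if (!single && cq != MAP_FAILED)
        munmap(cq, cq_sz);
    if (sq != MAP_FAILED)
        munmap(sq, sq_sz);
    close(ring);
    return rc;
}
```

On a healthy kernel the helper should return 0 for a file created with mkstemp(3); a return of 1 on an affected system would match the stale-read symptom described above. Whether a single write/read pair is enough to trigger the problem, rather than many concurrent ones, is untested.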

Investigation continues...


Thread overview: 95+ messages
2023-11-07 16:34 Regression in io_uring, leading to data corruption Timothy Pearson
2023-11-07 16:49 ` Jens Axboe
2023-11-07 16:57   ` Timothy Pearson
2023-11-07 17:14     ` Jens Axboe
2023-11-07 21:22 ` Jens Axboe
2023-11-07 21:39   ` Timothy Pearson
2023-11-07 21:46     ` Jens Axboe
2023-11-07 22:07       ` Timothy Pearson
2023-11-07 22:16         ` Jens Axboe
2023-11-07 22:29           ` Timothy Pearson
2023-11-07 22:44             ` Jens Axboe
2023-11-07 23:12               ` Timothy Pearson
2023-11-07 23:16                 ` Jens Axboe
2023-11-07 23:34                   ` Timothy Pearson
2023-11-07 23:52                     ` Jens Axboe
2023-11-08  0:02                       ` Timothy Pearson
2023-11-08  0:09                         ` Jens Axboe
2023-11-08  3:27                           ` Timothy Pearson
2023-11-08  3:30                             ` Timothy Pearson
2023-11-08  4:00                           ` Timothy Pearson
2023-11-08 15:10                             ` Jens Axboe
2023-11-08 15:14                               ` Jens Axboe
2023-11-08 17:10                                 ` Timothy Pearson
2023-11-08 17:26                                   ` Jens Axboe
2023-11-08 17:40                                     ` Timothy Pearson
2023-11-08 17:49                                       ` Jens Axboe
2023-11-08 17:57                                         ` Jens Axboe
2023-11-08 18:36                                           ` Timothy Pearson
2023-11-08 18:51                                             ` Timothy Pearson
2023-11-08 19:08                                               ` Jens Axboe
2023-11-08 19:06                                             ` Jens Axboe
2023-11-08 22:05                                               ` Jens Axboe
2023-11-08 22:15                                                 ` Timothy Pearson
2023-11-08 22:18                                                   ` Jens Axboe
2023-11-08 22:28                                                     ` Timothy Pearson
2023-11-08 23:58                                                     ` Jens Axboe
2023-11-09 15:12                                                       ` Jens Axboe
2023-11-09 17:00                                                         ` Timothy Pearson
2023-11-09 17:17                                                           ` Jens Axboe
2023-11-09 17:24                                                             ` Timothy Pearson
2023-11-09 17:30                                                               ` Jens Axboe
2023-11-09 17:36                                                                 ` Timothy Pearson
2023-11-09 17:38                                                                   ` Jens Axboe
2023-11-09 17:42                                                                     ` Timothy Pearson
2023-11-09 17:45                                                                       ` Jens Axboe
2023-11-09 18:20                                                                         ` tpearson
2023-11-10  3:51                                                                           ` Jens Axboe
2023-11-10  4:35                                                                             ` Timothy Pearson
2023-11-10  6:48                                                                               ` Timothy Pearson
2023-11-10 14:52                                                                                 ` Jens Axboe
2023-11-11 18:42                                                                                   ` Timothy Pearson [this message]
2023-11-11 18:58                                                                                     ` Jens Axboe
2023-11-11 19:04                                                                                       ` Timothy Pearson
2023-11-11 19:11                                                                                         ` Jens Axboe
2023-11-11 19:15                                                                                           ` Timothy Pearson
2023-11-11 19:23                                                                                             ` Jens Axboe
2023-11-11 21:57                                                                                     ` Timothy Pearson
2023-11-13 17:06                                                                                       ` Timothy Pearson
2023-11-13 17:39                                                                                         ` Jens Axboe
2023-11-13 19:02                                                                                           ` Timothy Pearson
2023-11-13 19:29                                                                                             ` Jens Axboe
2023-11-13 20:58                                                                                               ` Timothy Pearson
2023-11-13 21:22                                                                                                 ` Timothy Pearson
2023-11-13 22:15                                                                                                 ` Jens Axboe
2023-11-13 23:19                                                                                                   ` Timothy Pearson
2023-11-13 23:48                                                                                                     ` Jens Axboe
2023-11-14  0:04                                                                                                       ` Timothy Pearson
2023-11-14  0:13                                                                                                         ` Jens Axboe
2023-11-14  0:52                                                                                                           ` Timothy Pearson
2023-11-14  5:06                                                                                                             ` Timothy Pearson
2023-11-14 13:17                                                                                                               ` Jens Axboe
2023-11-14 16:59                                                                                                                 ` Timothy Pearson
2023-11-14 17:04                                                                                                                   ` Jens Axboe
2023-11-14 17:14                                                                                                                     ` Timothy Pearson
2023-11-14 17:17                                                                                                                       ` Jens Axboe
2023-11-14 17:21                                                                                                                         ` Timothy Pearson
2023-11-14 17:57                                                                                                                           ` Timothy Pearson
2023-11-14 18:02                                                                                                                             ` Jens Axboe
2023-11-14 18:12                                                                                                                               ` Timothy Pearson
2023-11-14 18:26                                                                                                                                 ` Jens Axboe
2023-11-15 11:03                                                                                                                                   ` Timothy Pearson
2023-11-15 16:46                                                                                                                                     ` Jens Axboe
2023-11-15 17:03                                                                                                                                       ` Timothy Pearson
2023-11-15 18:30                                                                                                                                         ` Jens Axboe
2023-11-15 18:35                                                                                                                                           ` Timothy Pearson
2023-11-15 18:37                                                                                                                                             ` Jens Axboe
2023-11-15 18:40                                                                                                                                               ` Timothy Pearson
2023-11-15 19:00                                                                                                                                           ` Jens Axboe
2023-11-16  3:28                                                                                                                                             ` Timothy Pearson
2023-11-16  3:46                                                                                                                                               ` Jens Axboe
2023-11-16  3:54                                                                                                                                                 ` Timothy Pearson
2023-11-19  0:16                                                                                                                                                   ` Timothy Pearson
2023-11-13 20:47                                                                                         ` Jens Axboe
2023-11-13 21:08                                                                                           ` Timothy Pearson
2023-11-10 14:48                                                                               ` Jens Axboe
