All of lore.kernel.org
 help / color / mirror / Atom feed
From: Timothy Pearson <tpearson@raptorengineering.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: regressions <regressions@lists.linux.dev>,
	 Pavel Begunkov <asml.silence@gmail.com>,
	 Michael Ellerman <mpe@ellerman.id.au>,
	npiggin <npiggin@gmail.com>
Subject: Re: Regression in io_uring, leading to data corruption
Date: Wed, 15 Nov 2023 12:40:26 -0600 (CST)	[thread overview]
Message-ID: <904989402.47567402.1700073626048.JavaMail.zimbra@raptorengineeringinc.com> (raw)
In-Reply-To: <edc9d8b9-3583-4f68-9deb-57bff838138d@kernel.dk>



----- Original Message -----
> From: "Jens Axboe" <axboe@kernel.dk>
> To: "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "regressions" <regressions@lists.linux.dev>, "Pavel Begunkov" <asml.silence@gmail.com>, "Michael Ellerman"
> <mpe@ellerman.id.au>, "npiggin" <npiggin@gmail.com>
> Sent: Wednesday, November 15, 2023 12:37:31 PM
> Subject: Re: Regression in io_uring, leading to data corruption

> On 11/15/23 11:35 AM, Timothy Pearson wrote:
>> 
>> 
>> ----- Original Message -----
>>> From: "Jens Axboe" <axboe@kernel.dk>
>>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>>> Cc: "regressions" <regressions@lists.linux.dev>, "Pavel Begunkov"
>>> <asml.silence@gmail.com>, "Michael Ellerman"
>>> <mpe@ellerman.id.au>, "npiggin" <npiggin@gmail.com>
>>> Sent: Wednesday, November 15, 2023 12:30:15 PM
>>> Subject: Re: Regression in io_uring, leading to data corruption
>> 
>>> On 11/15/23 10:03 AM, Timothy Pearson wrote:
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Jens Axboe" <axboe@kernel.dk>
>>>>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>>>>> Cc: "regressions" <regressions@lists.linux.dev>, "Pavel Begunkov"
>>>>> <asml.silence@gmail.com>, "Michael Ellerman"
>>>>> <mpe@ellerman.id.au>, "npiggin" <npiggin@gmail.com>
>>>>> Sent: Wednesday, November 15, 2023 10:46:58 AM
>>>>> Subject: Re: Regression in io_uring, leading to data corruption
>>>>
>>>>> On 11/15/23 4:03 AM, Timothy Pearson wrote:
>>>>>> I haven't had much success in getting the IPI path to work properly,
>>>>>> but by leaving task_work_add() in TWA_SIGNAL_NO_IPI mode I was at
>>>>>> least able to narrow down one of the areas that's going wrong.  Bear
>>>>>> in mind that as soon as I reenable IPI the corruption returns with a
>>>>>> vengeance, so this is not the correct fix yet by any means -- I am
>>>>>> currently soliciting feedback on what else might be going wrong at
>>>>>> this point since I've already spent a couple of weeks on this and am
>>>>>> not sure how much more time I can spend before we just have to shut
>>>>>> io_uring down on ppc64 for the forseeable future.
>>>>>>
>>>>>> Whatever the root cause actually is, something is *very* sensitive to
>>>>>> timing in both the worker thread creation path and the io_queue_sqe()
>>>>>> / io_queue_async() paths.  I can make the corruption disappear by
>>>>>> adding a udelay(1000) before io_queue_async() in the io_queue_sqe()
>>>>>> function, however no amount of memory barriers in the io_queue_async()
>>>>>> path (including in the kbuf recycling code) will fully resolve the
>>>>>> problem.
>>>>>>
>>>>>> Jens, would a small delay like that in io_queue_sqe() reduce the
>>>>>> amount of workers being created overall?  I know with some of the
>>>>>> other delay locations worker allocation was changing, from what I see
>>>>>> this one wouldn't seem to have much effect, but I'm still looking for
>>>>>> a sanity check.  If we're needing to wait for a millisecond for some
>>>>>> other thread to complete before moving on that might be valuable
>>>>>> information -- would also potentially tie in to the IPI path still
>>>>>> malfunctioning as the worker would immediately start executing.
>>>>>
>>>>> If io_queue_sqe() ultimately ends up punting to io-wq for this request,
>>>>> then yes doing a 1ms delay in there would ultimately then need to a 1ms
>>>>> delay before we either pass to an existing worker or create a new one.
>>>>>
>>>>>> On a related note, how is inter-thread safety of the io_kiocb buffer
>>>>>> list guaranteed, especially on weak memory model systems?  As I
>>>>>> understand it, different workers running on different cores could
>>>>>> potentially be interacting with the same kiocb request and the same
>>>>>> buffer list, and that does dovetail with the fact that punting to a
>>>>>> different I/O worker (usually on another core) seems to provoke the
>>>>>> problem.  I tried adding memory barriers to some of the basic recycle
>>>>>> functions without too much success -- it seemed to help somewhat, but
>>>>>> nowhere near complete resolution, and the buffers are used in a number
>>>>>> of other places I didn't even try to poke at.  I wanted to get some
>>>>>> feedback on this concept before going down yet another rabbit hole...
>>>>>
>>>>> This relies on the fact that we grab the wq lock before inserting this
>>>>> work, and the unlocking will be a barrier. It's important to note that
>>>>> this isn't any different than from before io-wq was using native
>>>>> workers, the only difference is that it used to be kthreads before, and
>>>>> now it's native threads to the application. The kthreads did a bunch of
>>>>> work to assume the necessary identity to do the read or write operation
>>>>> (which is ultimately why that approach went away, as it was just
>>>>> inherently unsafe), whereas the native threads do not as they already
>>>>> have what they need.
>>>>>
>>>>> I had a patch that just punted to a kthread and did the necessary
>>>>> kthread_use_mm(), perform op, kthread_unuse_mm() and it works fine at
>>>>> that point. Within the existing code...
>>>>
>>>> Would you happen to have that patch still?  It would provide a
>>>> possible starting point for figuring out the exact difference.  If not
>>>> I guess I could hack something similar up.
>>>
>>> Let me see if I can find it, and make sure it applies on the current
>>> tree. I'll send you one in a bit.
>> 
>> Much appreciated.
>> 
>> New question -- should a user of liburing be able to oops the kernel
>> under any circumstances (userspace access / NULL pointer dereference)?
>> I've started putting together a torture test application and hit ...
>> something ... almost right away .  If this isn't supposed to happen
>> I'll send more details off-list for a double check.
> 
> No, certainly not. Please do send me the details. I'm assuming this is
> on an unmodified kernel, some of the patches that have been flung around
> in this thread are definitely not generally sane, it's just that knowing
> the context of what is being used makes that fine for test purposes.

I need to verify the kernel is in fact functionally unmodified, I've been caught out by my multiple test trees before.  Just wanted to verify this was the case before I check the kernel and try to put together a more reliable reproducer.

The location is exactly where I'd expect a problem with thread setup, and it's timing dependent...sound familiar? ;)

  reply	other threads:[~2023-11-15 18:40 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-07 16:34 Regression in io_uring, leading to data corruption Timothy Pearson
2023-11-07 16:49 ` Jens Axboe
2023-11-07 16:57   ` Timothy Pearson
2023-11-07 17:14     ` Jens Axboe
2023-11-07 21:22 ` Jens Axboe
2023-11-07 21:39   ` Timothy Pearson
2023-11-07 21:46     ` Jens Axboe
2023-11-07 22:07       ` Timothy Pearson
2023-11-07 22:16         ` Jens Axboe
2023-11-07 22:29           ` Timothy Pearson
2023-11-07 22:44             ` Jens Axboe
2023-11-07 23:12               ` Timothy Pearson
2023-11-07 23:16                 ` Jens Axboe
2023-11-07 23:34                   ` Timothy Pearson
2023-11-07 23:52                     ` Jens Axboe
2023-11-08  0:02                       ` Timothy Pearson
2023-11-08  0:09                         ` Jens Axboe
2023-11-08  3:27                           ` Timothy Pearson
2023-11-08  3:30                             ` Timothy Pearson
2023-11-08  4:00                           ` Timothy Pearson
2023-11-08 15:10                             ` Jens Axboe
2023-11-08 15:14                               ` Jens Axboe
2023-11-08 17:10                                 ` Timothy Pearson
2023-11-08 17:26                                   ` Jens Axboe
2023-11-08 17:40                                     ` Timothy Pearson
2023-11-08 17:49                                       ` Jens Axboe
2023-11-08 17:57                                         ` Jens Axboe
2023-11-08 18:36                                           ` Timothy Pearson
2023-11-08 18:51                                             ` Timothy Pearson
2023-11-08 19:08                                               ` Jens Axboe
2023-11-08 19:06                                             ` Jens Axboe
2023-11-08 22:05                                               ` Jens Axboe
2023-11-08 22:15                                                 ` Timothy Pearson
2023-11-08 22:18                                                   ` Jens Axboe
2023-11-08 22:28                                                     ` Timothy Pearson
2023-11-08 23:58                                                     ` Jens Axboe
2023-11-09 15:12                                                       ` Jens Axboe
2023-11-09 17:00                                                         ` Timothy Pearson
2023-11-09 17:17                                                           ` Jens Axboe
2023-11-09 17:24                                                             ` Timothy Pearson
2023-11-09 17:30                                                               ` Jens Axboe
2023-11-09 17:36                                                                 ` Timothy Pearson
2023-11-09 17:38                                                                   ` Jens Axboe
2023-11-09 17:42                                                                     ` Timothy Pearson
2023-11-09 17:45                                                                       ` Jens Axboe
2023-11-09 18:20                                                                         ` tpearson
2023-11-10  3:51                                                                           ` Jens Axboe
2023-11-10  4:35                                                                             ` Timothy Pearson
2023-11-10  6:48                                                                               ` Timothy Pearson
2023-11-10 14:52                                                                                 ` Jens Axboe
2023-11-11 18:42                                                                                   ` Timothy Pearson
2023-11-11 18:58                                                                                     ` Jens Axboe
2023-11-11 19:04                                                                                       ` Timothy Pearson
2023-11-11 19:11                                                                                         ` Jens Axboe
2023-11-11 19:15                                                                                           ` Timothy Pearson
2023-11-11 19:23                                                                                             ` Jens Axboe
2023-11-11 21:57                                                                                     ` Timothy Pearson
2023-11-13 17:06                                                                                       ` Timothy Pearson
2023-11-13 17:39                                                                                         ` Jens Axboe
2023-11-13 19:02                                                                                           ` Timothy Pearson
2023-11-13 19:29                                                                                             ` Jens Axboe
2023-11-13 20:58                                                                                               ` Timothy Pearson
2023-11-13 21:22                                                                                                 ` Timothy Pearson
2023-11-13 22:15                                                                                                 ` Jens Axboe
2023-11-13 23:19                                                                                                   ` Timothy Pearson
2023-11-13 23:48                                                                                                     ` Jens Axboe
2023-11-14  0:04                                                                                                       ` Timothy Pearson
2023-11-14  0:13                                                                                                         ` Jens Axboe
2023-11-14  0:52                                                                                                           ` Timothy Pearson
2023-11-14  5:06                                                                                                             ` Timothy Pearson
2023-11-14 13:17                                                                                                               ` Jens Axboe
2023-11-14 16:59                                                                                                                 ` Timothy Pearson
2023-11-14 17:04                                                                                                                   ` Jens Axboe
2023-11-14 17:14                                                                                                                     ` Timothy Pearson
2023-11-14 17:17                                                                                                                       ` Jens Axboe
2023-11-14 17:21                                                                                                                         ` Timothy Pearson
2023-11-14 17:57                                                                                                                           ` Timothy Pearson
2023-11-14 18:02                                                                                                                             ` Jens Axboe
2023-11-14 18:12                                                                                                                               ` Timothy Pearson
2023-11-14 18:26                                                                                                                                 ` Jens Axboe
2023-11-15 11:03                                                                                                                                   ` Timothy Pearson
2023-11-15 16:46                                                                                                                                     ` Jens Axboe
2023-11-15 17:03                                                                                                                                       ` Timothy Pearson
2023-11-15 18:30                                                                                                                                         ` Jens Axboe
2023-11-15 18:35                                                                                                                                           ` Timothy Pearson
2023-11-15 18:37                                                                                                                                             ` Jens Axboe
2023-11-15 18:40                                                                                                                                               ` Timothy Pearson [this message]
2023-11-15 19:00                                                                                                                                           ` Jens Axboe
2023-11-16  3:28                                                                                                                                             ` Timothy Pearson
2023-11-16  3:46                                                                                                                                               ` Jens Axboe
2023-11-16  3:54                                                                                                                                                 ` Timothy Pearson
2023-11-19  0:16                                                                                                                                                   ` Timothy Pearson
2023-11-13 20:47                                                                                         ` Jens Axboe
2023-11-13 21:08                                                                                           ` Timothy Pearson
2023-11-10 14:48                                                                               ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=904989402.47567402.1700073626048.JavaMail.zimbra@raptorengineeringinc.com \
    --to=tpearson@raptorengineering.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=regressions@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.