linux-block.vger.kernel.org archive mirror
From: Daniel Black <daniel@mariadb.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: Salvatore Bonaccorso <carnil@debian.org>,
	Pavel Begunkov <asml.silence@gmail.com>,
	linux-block@vger.kernel.org, io-uring@vger.kernel.org
Subject: Re: uring regression - lost write request
Date: Fri, 12 Nov 2021 17:25:31 +1100
Message-ID: <CABVffEOEayBow2Oot7_jNHbXL0CQq9SZCWmiWEJjbT6gVC7WKg@mail.gmail.com>
In-Reply-To: <c92f97e5-1a38-e23f-f371-c00261cacb6d@kernel.dk>

On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 11/11/21 10:28 AM, Jens Axboe wrote:
> > On 11/11/21 9:55 AM, Jens Axboe wrote:
> >> On 11/11/21 9:19 AM, Jens Axboe wrote:
> >>> On 11/11/21 8:29 AM, Jens Axboe wrote:
> >>>> On 11/11/21 7:58 AM, Jens Axboe wrote:
> >>>>> On 11/11/21 7:30 AM, Jens Axboe wrote:
> >>>>>> On 11/10/21 11:52 PM, Daniel Black wrote:
> >>>>>>>> Would it be possible to turn this into a full reproducer script?
> >>>>>>>> Something that someone that knows nothing about mysqld/mariadb can just
> >>>>>>>> run and have it reproduce. If I install the 10.6 packages from above,
> >>>>>>>> then it doesn't seem to use io_uring or be linked against liburing.
> >>>>>>>
> >>>>>>> Sorry Jens.
> >>>>>>>
> >>>>>>> Hope containers are ok.
> >>>>>>
> >>>>>> Don't think I have a way to run that, don't even know what podman is
> >>>>>> and nor does my distro. I'll google a bit and see if I can get this
> >>>>>> running.
> >>>>>>
> >>>>>> I'm fine building from source and running from there, as long as I
> >>>>>> know what to do. Would that make it any easier? It definitely would
> >>>>>> for me :-)
> >>>>>
> >>>>> The podman approach seemed to work,

Thanks for bearing with it.

> >>>>> and I was able to run all three
> >>>>> steps. Didn't see any hangs. I'm going to try again dropping down
> >>>>> the innodb pool size (box only has 32G of RAM).
> >>>>>
> >>>>> The storage can do a lot more than 5k IOPS, I'm going to try ramping
> >>>>> that up.

Good.

> >>>>>
> >>>>> Does your reproducer box have multiple NUMA nodes, or is it a single
> >>>>> socket/node box?

It was NUMA. Before 5.14.14 I could reproduce it with a simpler test on a single node.

> >>>>
> >>>> Doesn't seem to reproduce for me on current -git. What file system are
> >>>> you using?

Yes ext4.

> >>>
> >>> I seem to be able to hit it with ext4, guessing it has more cases that
> >>> punt to buffered IO. As I initially suspected, I think this is a race
> >>> with buffered file write hashing. I have a debug patch that just turns
> >>> a regular non-NUMA box into multiple nodes; that may or may not be
> >>> needed to hit this, but I definitely can now. Looks like this:
> >>>
> >>> Node7 DUMP
> >>> index=0, nr_w=1, max=128, r=0, f=1, h=0
> >>>   w=ffff8f5e8b8470c0, hashed=1/0, flags=2
> >>>   w=ffff8f5e95a9b8c0, hashed=1/0, flags=2
> >>> index=1, nr_w=0, max=127877, r=0, f=0, h=0
> >>> free_list
> >>>   worker=ffff8f5eaf2e0540
> >>> all_list
> >>>   worker=ffff8f5eaf2e0540
> >>>
> >>> where we see node7 in this case having two work items pending, but the
> >>> worker state is stalled on hash.
> >>>
> >>> The hash logic was rewritten as part of the io-wq worker threads being
> >>> changed for 5.11 iirc, which is why that was my initial suspicion here.
> >>>
> >>> I'll take a look at this and make a test patch. Looks like you are able
> >>> to test self-built kernels, is that correct?

I've been liberating prebuilt kernels; however, I'm on the path to self-built again.

Just searching for the holy penguin pee (from yaboot da(ze|ys)) to
peesign(sic) EFI kernels.
jk, working through docs:
https://docs.fedoraproject.org/en-US/quick-docs/kernel/build-custom-kernel/

> >> Can you try with this patch? It's against -git, but it will apply to
> >> 5.15 as well.
> >
> > I think that one covered one potential gap, but I just managed to
> > reproduce a stall even with it. So hang on testing that one, I'll send
> > you something more complete when I have confidence in it.
>
> Alright, give this one a go if you can. Against -git, but will apply to
> 5.15 as well.

Applied, built, attempting to boot....

Thread overview: 36+ messages
2021-10-22  3:12 uring regression - lost write request Daniel Black
2021-10-22  9:10 ` Pavel Begunkov
2021-10-25  9:57   ` Pavel Begunkov
2021-10-25 11:09     ` Daniel Black
2021-10-25 11:25       ` Pavel Begunkov
2021-10-30  7:30         ` Salvatore Bonaccorso
2021-11-01  7:28           ` Daniel Black
2021-11-09 22:58             ` Daniel Black
2021-11-09 23:24               ` Jens Axboe
2021-11-10 18:01                 ` Jens Axboe
2021-11-11  6:52                   ` Daniel Black
2021-11-11 14:30                     ` Jens Axboe
2021-11-11 14:58                       ` Jens Axboe
2021-11-11 15:29                         ` Jens Axboe
2021-11-11 16:19                           ` Jens Axboe
2021-11-11 16:55                             ` Jens Axboe
2021-11-11 17:28                               ` Jens Axboe
2021-11-11 23:44                                 ` Jens Axboe
2021-11-12  6:25                                   ` Daniel Black [this message]
2021-11-12 19:19                                     ` Salvatore Bonaccorso
2021-11-14 20:33                                   ` Daniel Black
2021-11-14 20:55                                     ` Jens Axboe
2021-11-14 21:02                                       ` Salvatore Bonaccorso
2021-11-14 21:03                                         ` Jens Axboe
2021-11-24  3:27                                       ` Daniel Black
2021-11-24 15:28                                         ` Jens Axboe
2021-11-24 16:10                                           ` Jens Axboe
2021-11-24 16:18                                             ` Greg Kroah-Hartman
2021-11-24 16:22                                               ` Jens Axboe
2021-11-24 22:52                                                 ` Stefan Metzmacher
2021-11-25  0:58                                                   ` Jens Axboe
2021-11-25 16:35                                                     ` Stefan Metzmacher
2021-11-25 17:11                                                       ` Jens Axboe
2022-02-09 23:01                                                       ` Stefan Metzmacher
2022-02-10  0:10                                                         ` Daniel Black
2021-11-24 22:57                                                 ` Daniel Black
