From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Jens Axboe <axboe@kernel.dk>,
Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>
Cc: "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: Memory coherency issue with IO thread offloading?
Date: Fri, 24 Mar 2023 07:27:25 +0000
Message-ID: <272cda99-3b1a-95cd-ce03-bc3d17d572ec@csgroup.eu>
In-Reply-To: <2b015a34-220e-674e-7301-2cf17ef45ed9@kernel.dk>
Hi,
On 23/03/2023 at 19:54, Jens Axboe wrote:
> Hi,
>
> I got a report sent to me from mariadb, where 5.10.158 works fine and
> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test
> case. Beware that this email is long, as I'm trying to include
> everything that may be relevant...
Which variant of powerpc? 32 or 64 bits? Book3S or BookE?
Christophe
>
> The test case in question is pretty simple. On debian testing, do:
>
> $ sudo apt-get install mariadb-test
> $ cd /usr/share/mysql/mysql-test
> $ ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql --force encryption.innodb_encryption,innodb,undo0 --repeat=200
>
> and if it fails, you'll see something like:
>
> encryption.innodb_encryption 'innodb,undo0' [ 6 pass ] 3120
> encryption.innodb_encryption 'innodb,undo0' [ 7 pass ] 3123
> encryption.innodb_encryption 'innodb,undo0' [ 8 pass ] 3042
> encryption.innodb_encryption 'innodb,undo0' [ 9 fail ]
> Test ended at 2023-03-23 16:55:17
>
> CURRENT_TEST: encryption.innodb_encryption
> mysqltest: At line 11: query 'SET @start_global_value = @@global.innodb_encryption_threads' failed: ER_UNKNOWN_SYSTEM_VARIABLE (1193): Unknown system variable 'innodb_encryption_threads'
>
> The result from queries just before the failure was:
> SET @start_global_value = @@global.innodb_encryption_threads;
>
> - saving '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/' to '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/'
> ***Warnings generated in error logs during shutdown after running tests: encryption.innodb_encryption
>
> 2023-03-23 16:55:17 0 [Warning] Plugin 'example_key_management' is of maturity level experimental while the server is stable
> 2023-03-23 16:55:17 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=221]. You may have to recover from a backup.
>
> where data read was not as expected.
>
> Now, there are a number of io_uring changes between .158 and .162, as it
> includes the backport that brought 5.10-stable into line with what
> 5.15-stable includes. I'll spare you all the digging I did to vet those
> changes, but the key thing is that it STILL happens on 6.3-git on
> powerpc.
>
> After ruling out many things, one key difference between 158 and 162 is
> that the former offloaded requests that could not be done nonblocking to
> a kthread, and 162 and newer offloads to an IO thread. An IO thread is
> just a normal thread created from the application submitting IO, the
> only difference is that it never exits to userspace. An IO thread has
> the same mm/files/you-name-it from the original task. It really is the
> same as a userspace thread created by the application. The switch to IO
> threads was done exactly because of that, rather than rely on a fragile
> scheme of having the kthread worker assume all sorts of identity from
> the original task, with surprises if things were missed. This is what
> caused most of the io_uring security issues in the past.
>
> The IO that mariadb does in this test is pretty simple - a bunch of
> largish buffered writes with IORING_OP_WRITEV, and some smallish (16K)
> buffered reads with IORING_OP_READV.
>
> Today I finally gave up and ran a basic experiment, which simply
> offloads the writes to a kthread. Since powerpc has an interesting
> memory coherency model, my suspicion was that the work involved with
> switching MMs for the kthread could just be the main difference here.
> The patch is really dumb and simple - rather than queue the write to an
> IO thread, it just offloads it to a kthread that then does
> kthread_use_mm(), performs the write with the same write handler, then
> kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would
> fail in 2..20 loops, I've now done 200 and 500 loops and it's fine.
>
> Which then leads me to the question, what about the IO thread offload
> makes this fail on powerpc (and no other arch I've tested on, including
> x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a
> thread in userspace in the application, and having that thread just
> perform the writes. Is there some magic involved with the kthread mm
> use/unuse that makes this sufficiently consistent on powerpc? I've tried
> any mix of isync()/mb and making the flush_dcache_page() unconditionally
> done in the filemap read/write helpers, and it still falls flat on its
> face with the offload to an IO thread.
>
> I must clearly be missing something here, which is why I'm emailing the
> powerpc Gods for help :-)
>
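The quoted message argues that the IO thread offload should be equivalent to the application having a userspace thread that just performs the writes. A minimal userspace sketch of that equivalent case follows. It is not the io_uring code path and is not expected to reproduce the failure by itself; the names run_offload_check, writer_thread, CHUNK and NCHUNK are hypothetical, and the 8 x 16K write size is an assumption chosen only to pair with the 16K reads mentioned above.

```c
#define _GNU_SOURCE             /* for pwritev()/preadv() on glibc */
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define CHUNK  (16 * 1024)      /* 16K, the read size mentioned in the report */
#define NCHUNK 8                /* hypothetical "largish" write: 8 x 16K */

struct write_job {
    int fd;
    unsigned char seed;
    int result;                 /* 0 on success, -1 on failure */
};

/* Fill one chunk with a pattern derived from the seed and chunk index. */
static void fill_chunk(unsigned char *buf, unsigned char seed, int idx)
{
    for (size_t i = 0; i < CHUNK; i++)
        buf[i] = (unsigned char)(seed + idx + i);
}

/* Worker thread: stands in for the offloaded write, one buffered pwritev(). */
static void *writer_thread(void *arg)
{
    struct write_job *job = arg;
    struct iovec iov[NCHUNK];
    unsigned char *data = malloc((size_t)NCHUNK * CHUNK);

    job->result = -1;
    if (!data)
        return NULL;
    for (int i = 0; i < NCHUNK; i++) {
        fill_chunk(data + (size_t)i * CHUNK, job->seed, i);
        iov[i].iov_base = data + (size_t)i * CHUNK;
        iov[i].iov_len = CHUNK;
    }
    /* A short write is treated as a failure to keep the sketch simple. */
    if (pwritev(job->fd, iov, NCHUNK, 0) == (ssize_t)NCHUNK * CHUNK)
        job->result = 0;
    free(data);
    return NULL;
}

/*
 * Returns 0 if every round read back what the worker wrote, the 1-based
 * round number on a data mismatch, or -1 on a setup or IO error.
 */
int run_offload_check(const char *path, int rounds)
{
    unsigned char *got = malloc(CHUNK), *want = malloc(CHUNK);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    int ret = -1;

    if (fd < 0 || !got || !want)
        goto out;
    for (int r = 1; r <= rounds; r++) {
        struct write_job job = { .fd = fd, .seed = (unsigned char)r };
        pthread_t tid;

        if (pthread_create(&tid, NULL, writer_thread, &job) != 0)
            goto out;
        pthread_join(tid, NULL);    /* the write completes before any read */
        if (job.result != 0)
            goto out;
        for (int i = 0; i < NCHUNK; i++) {
            struct iovec iov = { .iov_base = got, .iov_len = CHUNK };

            if (preadv(fd, &iov, 1, (off_t)i * CHUNK) != CHUNK)
                goto out;
            fill_chunk(want, job.seed, i);
            if (memcmp(got, want, CHUNK) != 0) {
                ret = r;            /* data read was not as expected */
                goto out;
            }
        }
    }
    ret = 0;
out:
    if (fd >= 0) {
        close(fd);
        unlink(path);
    }
    free(got);
    free(want);
    return ret;
}
```

Calling run_offload_check() with a path under /dev/shm and a few hundred rounds gives a baseline for plain threads on the same machine. A non-zero positive return would mean a mismatch even without io_uring involved, which would point away from the IO thread offload itself.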