linuxppc-dev.lists.ozlabs.org archive mirror
From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Jens Axboe <axboe@kernel.dk>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>
Cc: "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: Memory coherency issue with IO thread offloading?
Date: Fri, 24 Mar 2023 07:27:25 +0000
Message-ID: <272cda99-3b1a-95cd-ce03-bc3d17d572ec@csgroup.eu>
In-Reply-To: <2b015a34-220e-674e-7301-2cf17ef45ed9@kernel.dk>

Hi,

On 23/03/2023 at 19:54, Jens Axboe wrote:
> Hi,
> 
> I got a report sent to me from mariadb, where 5.10.158 works fine and
> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test
> case. Beware that this email is long, as I'm trying to include
> everything that may be relevant...

Which variant of powerpc? 32 or 64 bits? Book3S or BookE?

Christophe


> 
> The test case in question is pretty simple. On debian testing, do:
> 
> $ sudo apt-get install mariadb-test
> $ cd /usr/share/mysql/mysql-test
> $ ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql  --force encryption.innodb_encryption,innodb,undo0 --repeat=200
> 
> and if it fails, you'll see something like:
> 
> encryption.innodb_encryption 'innodb,undo0' [ 6 pass ]   3120
> encryption.innodb_encryption 'innodb,undo0' [ 7 pass ]   3123
> encryption.innodb_encryption 'innodb,undo0' [ 8 pass ]   3042
> encryption.innodb_encryption 'innodb,undo0' [ 9 fail ]
>          Test ended at 2023-03-23 16:55:17
> 
> CURRENT_TEST: encryption.innodb_encryption
> mysqltest: At line 11: query 'SET @start_global_value = @@global.innodb_encryption_threads' failed: ER_UNKNOWN_SYSTEM_VARIABLE (1193): Unknown system variable 'innodb_encryption_threads'
> 
> The result from queries just before the failure was:
> SET @start_global_value = @@global.innodb_encryption_threads;
> 
>   - saving '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/' to '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/'
> ***Warnings generated in error logs during shutdown after running tests: encryption.innodb_encryption
> 
> 2023-03-23 16:55:17 0 [Warning] Plugin 'example_key_management' is of maturity level experimental while the server is stable
> 2023-03-23 16:55:17 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=221]. You may have to recover from a backup.
> 
> where data read was not as expected.
> 
> Now, there are a number of io_uring changes between .158 and .162, as it
> includes the backport that brought 5.10-stable into line with what
> 5.15-stable includes. I'll spare you all the digging I did to vet those
> changes, but the key thing is that it STILL happens on 6.3-git on
> powerpc.
> 
> After ruling out many things, one key difference between 158 and 162 is
> that the former offloaded requests that could not be done nonblocking to
> a kthread, and 162 and newer offloads to an IO thread. An IO thread is
> just a normal thread created from the application submitting IO, the
> only difference is that it never exits to userspace. An IO thread has
> the same mm/files/you-name-it from the original task. It really is the
> same as a userspace thread created by the application. The switch to IO
> threads was done exactly because of that, rather than rely on a fragile
> scheme of having the kthread worker assume all sorts of identity from
> the original task, with surprises if things were missed. This is what
> caused most of the io_uring security issues in the past.
> 
> The IO that mariadb does in this test is pretty simple - a bunch of
> largish buffered writes with IORING_OP_WRITEV, and some smallish (16K)
> buffered reads with IORING_OP_READV.
> 
> Today I finally gave up and ran a basic experiment, which simply
> offloads the writes to a kthread. Since powerpc has an interesting
> memory coherency model, my suspicion was that the work involved with
> switching MMs for the kthread could just be the main difference here.
> The patch is really dumb and simple - rather than queue the write to an
> IO thread, it just offloads it to a kthread that then does
> kthread_use_mm(), performs the write with the same write handler, and
> calls kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would
> fail in 2..20 loops, I've now done 200 and 500 loops and it's fine.
> 
> Which then leads me to the question, what about the IO thread offload
> makes this fail on powerpc (and no other arch I've tested on, including
> x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a
> thread in userspace in the application, and having that thread just
> perform the writes. Is there some magic involved with the kthread mm
> use/unuse that makes this sufficiently consistent on powerpc? I've tried
> every mix of isync()/mb and making the flush_dcache_page() call
> unconditional in the filemap read/write helpers, and it still falls flat
> on its face with the offload to an IO thread.
> 
> I must clearly be missing something here, which is why I'm emailing the
> powerpc Gods for help :-)
> 
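
For readers following along, the model Jens describes in the quoted text (a
thread in the application that just performs the writes, with the submitter
reading the data back) can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the io_uring code path and not a
reproducer: the names offload_and_verify(), write_worker(), struct write_job,
the page count and the scratch file path are all hypothetical, and the
pthread_join() here gives ordering that the real offload does not rely on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAGE_SZ 16384 /* 16K, the read size mentioned in the quoted text */
#define NPAGES  8     /* hypothetical page count, chosen for illustration */

/* Hypothetical job descriptor handed from the submitter to the worker. */
struct write_job {
	int fd;
	unsigned char *buf; /* NPAGES * PAGE_SZ bytes, filled by the submitter */
};

/* Worker thread: shares address space and file table with the submitter. */
static void *write_worker(void *arg)
{
	struct write_job *job = arg;
	struct iovec iov[NPAGES];

	for (int i = 0; i < NPAGES; i++) {
		iov[i].iov_base = job->buf + (size_t)i * PAGE_SZ;
		iov[i].iov_len = PAGE_SZ;
	}
	/* One largish buffered vectored write at file offset 0. */
	return (void *)(intptr_t)pwritev(job->fd, iov, NPAGES, 0);
}

/* Returns 0 if every 16K page reads back as written, -1 otherwise. */
int offload_and_verify(void)
{
	char path[] = "/tmp/offload_XXXXXX"; /* hypothetical scratch file */
	struct write_job job;
	unsigned char *rd;
	pthread_t tid;
	void *res = NULL;
	int ret = 0;

	job.fd = mkstemp(path);
	if (job.fd < 0)
		return -1;
	unlink(path); /* file disappears once the descriptor is closed */

	job.buf = malloc((size_t)NPAGES * PAGE_SZ);
	rd = malloc(PAGE_SZ);
	assert(job.buf && rd);

	/* Submitter fills the data: page i holds the byte value i + 1. */
	for (int i = 0; i < NPAGES; i++)
		memset(job.buf + (size_t)i * PAGE_SZ, i + 1, PAGE_SZ);

	/* Offload the write to a thread instead of doing it inline. */
	if (pthread_create(&tid, NULL, write_worker, &job) != 0) {
		ret = -1;
		goto out;
	}
	pthread_join(tid, &res);
	if ((intptr_t)res != (intptr_t)NPAGES * PAGE_SZ)
		ret = -1; /* error or short write */

	/* Smallish buffered vectored reads, one 16K page at a time. */
	for (int i = 0; i < NPAGES && ret == 0; i++) {
		struct iovec iov = { .iov_base = rd, .iov_len = PAGE_SZ };

		if (preadv(job.fd, &iov, 1, (off_t)i * PAGE_SZ) != PAGE_SZ ||
		    memcmp(rd, job.buf + (size_t)i * PAGE_SZ, PAGE_SZ) != 0)
			ret = -1;
	}
out:
	free(rd);
	free(job.buf);
	close(job.fd);
	return ret;
}
```

Calling offload_and_verify() from a small main() and building with -pthread
should be enough to try it. On a coherent system it is expected to return 0;
a -1 would mean the data read back differs from what the worker wrote, which
is the class of symptom reported above.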

Thread overview: 16+ messages
2023-03-23 18:54 Memory coherency issue with IO thread offloading? Jens Axboe
2023-03-24  7:27 ` Christophe Leroy [this message]
2023-03-24 12:06   ` Jens Axboe
2023-03-25  0:15     ` Michael Ellerman
2023-03-25  0:20       ` Jens Axboe
2023-03-25  0:42 ` Michael Ellerman
2023-03-25  1:15   ` Jens Axboe
2023-03-25  1:20     ` Jens Axboe
2023-03-27  4:22       ` Nicholas Piggin
2023-03-27 12:39         ` Jens Axboe
2023-03-27 21:24           ` Jens Axboe
2023-03-28 12:51             ` Michael Ellerman
2023-03-28 16:38               ` Jens Axboe
2023-03-27 13:53     ` Michael Ellerman
2023-03-28  6:20 Daniel Black
2023-03-28 12:10 ` Michael Ellerman
