linuxppc-dev.lists.ozlabs.org archive mirror
* Memory coherency issue with IO thread offloading?
@ 2023-03-23 18:54 Jens Axboe
From: Jens Axboe @ 2023-03-23 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy; +Cc: linuxppc-dev

Hi,

I got a report sent to me from MariaDB, in which 5.10.158 works fine and
5.10.162 is broken. And in fact, current 6.3-rc also fails the test
case. Beware that this email is long, as I'm trying to include
everything that may be relevant...

The test case in question is pretty simple. On Debian testing, do:

$ sudo apt-get install mariadb-test
$ cd /usr/share/mysql/mysql-test
$ ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql  --force encryption.innodb_encryption,innodb,undo0 --repeat=200

and if it fails, you'll see something like:

encryption.innodb_encryption 'innodb,undo0' [ 6 pass ]   3120
encryption.innodb_encryption 'innodb,undo0' [ 7 pass ]   3123
encryption.innodb_encryption 'innodb,undo0' [ 8 pass ]   3042
encryption.innodb_encryption 'innodb,undo0' [ 9 fail ]
        Test ended at 2023-03-23 16:55:17

CURRENT_TEST: encryption.innodb_encryption
mysqltest: At line 11: query 'SET @start_global_value = @@global.innodb_encryption_threads' failed: ER_UNKNOWN_SYSTEM_VARIABLE (1193): Unknown system variable 'innodb_encryption_threads'

The result from queries just before the failure was:
SET @start_global_value = @@global.innodb_encryption_threads;

 - saving '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/' to '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/'
***Warnings generated in error logs during shutdown after running tests: encryption.innodb_encryption

2023-03-23 16:55:17 0 [Warning] Plugin 'example_key_management' is of maturity level experimental while the server is stable
2023-03-23 16:55:17 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=221]. You may have to recover from a backup.

where the data read back was not what was expected.

Now, there are a number of io_uring changes between .158 and .162, as it
includes the backport that brought 5.10-stable into line with what
5.15-stable includes. I'll spare you all the digging I did to vet those
changes, but the key thing is that it STILL happens on 6.3-git on
powerpc.

After ruling out many things, one key difference between 158 and 162 is
that the former offloaded requests that could not be done nonblocking to
a kthread, while 162 and newer offload them to an IO thread. An IO thread
is just a normal thread created from the application submitting IO; the
only difference is that it never exits to userspace. An IO thread has
the same mm/files/you-name-it as the original task. It really is the
same as a userspace thread created by the application. The switch to IO
threads was done exactly because of that, rather than relying on a fragile
scheme of having the kthread worker assume all sorts of identity from
the original task, with surprises if anything was missed. This is what
caused most of the io_uring security issues in the past.

The IO that mariadb does in this test is pretty simple - a bunch of
largish buffered writes with IORING_OP_WRITEV, and some smallish (16K)
buffered reads with IORING_OP_READV.

Today I finally gave up and ran a basic experiment, which simply
offloads the writes to a kthread. Since powerpc has an interesting
memory coherency model, my suspicion was that the work involved with
switching MMs for the kthread could just be the main difference here.
The patch is really dumb and simple - rather than queue the write to an
IO thread, it just offloads it to a kthread that calls
kthread_use_mm(), performs the write with the same write handler, then
calls kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would
fail within 2 to 20 loops; I've now done 200 and 500 loops and it's fine.

Which then leads me to the question, what about the IO thread offload
makes this fail on powerpc (and no other arch I've tested on, including
x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a
thread in userspace in the application, and having that thread just
perform the writes. Is there some magic involved with the kthread mm
use/unuse that makes this sufficiently consistent on powerpc? I've tried
every combination of isync()/mb, and making flush_dcache_page() unconditional
in the filemap read/write helpers, and it still falls flat on its
face with the offload to an IO thread.
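For comparison, here is a minimal userspace sketch of the "thread in the application that just performs the writes" case described above. It is an analogue only, not a reproducer: it uses plain pwritev()/preadv() from a pthread instead of io_uring, and the helper names (writer_thread, offload_write_then_verify) and sizes (a 1 MiB write split into two iovecs, read back in 16 KiB chunks) are hypothetical choices made for illustration. On a correctly behaving kernel it should always report that the data matches.

```c
#define _GNU_SOURCE /* pwritev()/preadv() prototypes on glibc */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define CHUNK   (16 * 1024) /* 16K reads, like the smallish READVs in the report */
#define NCHUNKS 64          /* hypothetical: 64 * 16K = 1 MiB buffered write */

struct write_job {
    int fd;
    const unsigned char *buf;
    size_t len;
    ssize_t result;
};

/* Worker thread: shares the caller's address space and fd table,
 * much as an IO thread shares the submitting task's mm/files. */
static void *writer_thread(void *arg)
{
    struct write_job *job = arg;
    size_t half = job->len / 2;
    /* Two iovecs, to mimic a vectored (WRITEV-style) write. */
    struct iovec iov[2] = {
        { .iov_base = (void *)job->buf,          .iov_len = half },
        { .iov_base = (void *)(job->buf + half), .iov_len = job->len - half },
    };
    job->result = pwritev(job->fd, iov, 2, 0);
    return NULL;
}

/* Writes a pattern from a worker thread, then reads it back from the
 * calling thread in CHUNK-sized pieces. Returns 0 if everything matches,
 * -1 on any error, short write/read, or data mismatch. */
int offload_write_then_verify(int fd)
{
    size_t len = (size_t)CHUNK * NCHUNKS;
    unsigned char *wbuf = malloc(len);
    unsigned char *rbuf = malloc(CHUNK);
    if (!wbuf || !rbuf) {
        free(wbuf);
        free(rbuf);
        return -1;
    }
    for (size_t i = 0; i < len; i++)
        wbuf[i] = (unsigned char)(i * 31 + 7);

    struct write_job job = { .fd = fd, .buf = wbuf, .len = len, .result = -1 };
    pthread_t tid;
    if (pthread_create(&tid, NULL, writer_thread, &job) != 0) {
        free(wbuf);
        free(rbuf);
        return -1;
    }
    /* pthread_join() orders the worker's effects before what follows. */
    pthread_join(tid, NULL);
    int rc = (job.result == (ssize_t)len) ? 0 : -1;

    for (size_t off = 0; rc == 0 && off < len; off += CHUNK) {
        struct iovec riov = { .iov_base = rbuf, .iov_len = CHUNK };
        if (preadv(fd, &riov, 1, (off_t)off) != (ssize_t)CHUNK ||
            memcmp(rbuf, wbuf + off, CHUNK) != 0)
            rc = -1;
    }
    free(wbuf);
    free(rbuf);
    return rc;
}
```

Call it on a read/write descriptor for a regular file (for example one created with mkstemp()) and build with -pthread.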

I must clearly be missing something here, which is why I'm emailing the
powerpc Gods for help :-)

-- 
Jens Axboe


* Memory coherency issue with IO thread offloading?
@ 2023-03-28  6:20 Daniel Black
From: Daniel Black @ 2023-03-28  6:20 UTC (permalink / raw)
  To: linuxppc-dev

Thanks Jens, Nick, Christophe and Michael for your work so far.

Apologies for the out of thread email.

Confirming that MariaDB 10.6+ is required (that's when we added liburing),
and that previous versions used libaio (which tested without incident, as
mpe confirmed when he retested).

We were getting failures like https://buildbot.mariadb.org/#/builders/231/builds/16857
in a container (of various distro userspaces) on bare metal. (We're now
back on the older, known-good kernel Jens indicated.)

End of /proc/cpuinfo on the bare-metal host:

processor    : 127
cpu        : POWER9, altivec supported
clock        : 3283.000000MHz
revision    : 2.2 (pvr 004e 1202)

timebase    : 512000000
platform    : PowerNV
model        : 9006-22P
machine        : PowerNV 9006-22P
firmware    : OPAL
MMU        : Radix



Thread overview: 16+ messages
2023-03-23 18:54 Memory coherency issue with IO thread offloading? Jens Axboe
2023-03-24  7:27 ` Christophe Leroy
2023-03-24 12:06   ` Jens Axboe
2023-03-25  0:15     ` Michael Ellerman
2023-03-25  0:20       ` Jens Axboe
2023-03-25  0:42 ` Michael Ellerman
2023-03-25  1:15   ` Jens Axboe
2023-03-25  1:20     ` Jens Axboe
2023-03-27  4:22       ` Nicholas Piggin
2023-03-27 12:39         ` Jens Axboe
2023-03-27 21:24           ` Jens Axboe
2023-03-28 12:51             ` Michael Ellerman
2023-03-28 16:38               ` Jens Axboe
2023-03-27 13:53     ` Michael Ellerman
2023-03-28  6:20 Daniel Black
2023-03-28 12:10 ` Michael Ellerman
