linuxppc-dev.lists.ozlabs.org archive mirror
From: Jens Axboe <axboe@kernel.dk>
To: Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: Memory coherency issue with IO thread offloading?
Date: Fri, 24 Mar 2023 19:15:57 -0600
Message-ID: <872a1b2b-5fe6-e1ac-5dda-dc806b21b3f5@kernel.dk>
In-Reply-To: <87h6u9u0e0.fsf@mpe.ellerman.id.au>

On 3/24/23 6:42 PM, Michael Ellerman wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>> Hi,
> 
> Hi Jens,
> 
> Thanks for the report.
> 
>> I got a report sent to me from mariadb, in where 5.10.158 works fine and
>> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test
>> case. Beware that this email is long, as I'm trying to include
>> everything that may be relevant...
>>
>> The test case in question is pretty simple. On debian testing, do:
>>
>> $ sudo apt-get install mariadb-test
>> $ cd /usr/share/mysql/mysql-test
>> $ ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql  --force encryption.innodb_encryption,innodb,undo0 --repeat=200
> 
> I mostly use Fedora, the package name is the same but the mtr binary
> ends up in /usr/share/mysql.
> 
>> and if it fails, you'll see something like:
>>
>> encryption.innodb_encryption 'innodb,undo0' [ 6 pass ]   3120
>> encryption.innodb_encryption 'innodb,undo0' [ 7 pass ]   3123
>> encryption.innodb_encryption 'innodb,undo0' [ 8 pass ]   3042
>> encryption.innodb_encryption 'innodb,undo0' [ 9 fail ]
>>         Test ended at 2023-03-23 16:55:17
> 
> I haven't been able to get this to fail yet. I've done several runs with
> --repeat=500 and haven't seen any errors yet.
> 
> Are there any CONFIG options I'd need to trip this?

I don't think you need any special CONFIG options. I'll attach my config
here, and I know the default distro one hits it too. But perhaps the
mariadb version is not new enough? I think you need 10.6 or above, as
that will use io_uring by default. What version are you running?

> ...
>> Today I finally gave up and ran a basic experiment, which simply
>> offloads the writes to a kthread. Since powerpc has an interesting
>> memory coherency model, my suspicion was that the work involved with
>> switching MMs for the kthread could just be the main difference here.
>> The patch is really dumb and simple - rather than queue the write to an
>> IO thread, it just offloads it to a kthread that then does
>> kthread_use_mm(), perform write with the same write handler,
>> kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would
>> fail in 2..20 loops, I've now done 200 and 500 loops and it's fine.
> 
> Can you share the patch that does that? It would help me track down
> where exactly in the io_uring code you're talking about.

Shoot yes, I actually meant to attach it but then forgot. Below!

>> Which then leads me to the question, what about the IO thread offload
>> makes this fail on powerpc (and no other arch I've tested on, including
>> x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a
>> thread in userspace in the application, and having that thread just
>> perform the writes. Is there some magic involved with the kthread mm
>> use/unuse that makes this sufficiently consistent on powerpc? I've tried
>> any mix of isync()/mb and making the flush_dcache_page() unconditionally
>> done in the filemap read/write helpers, and it still falls flat on its
>> face with the offload to an IO thread.
> 
> My first guess would be that there's some missing barriers between the
> thread that queues the IO and the IO worker thread. 

That was my guess too, and I consulted Paul McKenney as well on that.
And he had some ideas of course, in terms of ordering of the CQ ring.
But I tried it all out, and it still failed in the same way...

> I think you're using schedule_work() for that though, which should be a
> full barrier. Could it be on the completion side?

queue_work() for the patch, before that it's io-wq which is an internal
IO thread worker pool. The latter just needs a spin_lock() around
queueing the work, and then a wake of the task. Typing this out, maybe
this is where a barrier is now missing? If the IO thread is already
running rather than sleeping?
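
To make that question concrete, here is a minimal userspace sketch of
handing work to a worker that is already running, so no sleep/wake pair
is involved in the handoff. It uses C11 atomics and pthreads and is not
io-wq code; the names work_item, pending and handoff_once are
hypothetical. The release store stands in for whatever ordering the
queueing side provides, and the acquire load for the worker's side. If
either were relaxed, a weakly ordered CPU would be allowed to let the
worker see the pointer before the payload.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work item; illustrative only, not an io-wq structure. */
struct work_item {
    int payload;                    /* plain data filled in by the submitter */
};

static struct work_item item;
static _Atomic(struct work_item *) pending = NULL;  /* one-slot "queue" */

/* Worker that is already running: it polls and is never woken up. */
static void *worker(void *arg)
{
    struct work_item *w;

    /* Acquire pairs with the submitter's release store below. */
    while ((w = atomic_load_explicit(&pending, memory_order_acquire)) == NULL)
        ;
    *(int *)arg = w->payload;       /* guaranteed to see the published value */
    return NULL;
}

/* Submit one value to a running worker and return what the worker saw. */
int handoff_once(int value)
{
    pthread_t t;
    int seen = -1;
    int rc;

    atomic_store_explicit(&pending, NULL, memory_order_relaxed);
    rc = pthread_create(&t, NULL, worker, &seen);
    assert(rc == 0);

    item.payload = value;           /* 1. fill in the work */
    /* 2. publish it; release orders the payload write before the pointer */
    atomic_store_explicit(&pending, &item, memory_order_release);

    pthread_join(t, NULL);
    return seen;
}
```

With the release/acquire pair in place, handoff_once(42) returns 42 on
any architecture. The interesting failure mode is the one where the
ordering is only provided implicitly by a wakeup that never happens.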

> I can't think of any magic in kthread_use_mm() other than extra
> barriers. In particular kthread_unuse_mm() has an
> smp_mb__after_spinlock() which is a full memory barrier on powerpc but
> is a nop on some other architectures, x86 at least.

Yeah, I did poke at kthread_use_mm() and the related powerpc bits, but
didn't immediately find anything that seemed promising in this regard.
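
For reference, the difference a full barrier makes can be sketched in
userspace with the classic store-buffering test. This is only an
analogy under stated assumptions: atomic_thread_fence(memory_order_seq_cst)
stands in for a full barrier such as the one Michael mentions, and the
names x, y, r0, r1 and count_both_zero are hypothetical. With the fence
in both threads, the outcome where both loads return 0 is forbidden by
the C11 memory model; without it, that outcome is allowed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r0, r1;                  /* read by the parent after join */

static void *thread0(void *unused)
{
    (void)unused;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* full barrier */
    r0 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread1(void *unused)
{
    (void)unused;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* full barrier */
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Run the test repeatedly; count runs where both threads read 0. */
int count_both_zero(int iterations)
{
    int hits = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;
        int rc;

        atomic_store(&x, 0);
        atomic_store(&y, 0);
        rc = pthread_create(&a, NULL, thread0, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread1, NULL);
        assert(rc == 0);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r0 == 0 && r1 == 0)
            hits++;
    }
    return hits;
}
```

With both fences present, count_both_zero() returns 0 for any iteration
count. Removing the fences makes the both-zero outcome permissible,
although a short run like this may not actually observe it.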

>> I must clearly be missing something here, which is why I'm emailing the
>> powerpc Gods for help :-)
> 
> Unfortunately the true God of powerpc memory ordering has left us and
> ascended into the Metaverse ;)

;-)

-- 
Jens Axboe


