Re: [GIT PULL] io_uring changes for 5.9-rc1

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring <io-uring@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL] io_uring changes for 5.9-rc1
Date: Mon, 3 Aug 2020 13:48:12 -0700	[thread overview]
Message-ID: <CAHk-=wgaxWMA7DVTQq+KxqaWHPDrXDuScX9orzRgxdi7SBfmoA@mail.gmail.com> (raw)
In-Reply-To: <50466810-9148-e245-7c1e-e7435b753582@kernel.dk>

On Sun, Aug 2, 2020 at 2:41 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> Lots of cleanups in here, hardening the code and/or making it easier to
> read and fixing buts, but a core feature/change too adding support for
> real async buffered reads. With the latter in place, we just need
> buffered write async support and we're done relying on kthreads for the
> fast path. In detail:

That async buffered reads handling the the page locking flag is a
mess, and I'm really happy I committed my page locking scalability
change early, so that the conflicts there caught it.

Re-using the page bit waitqueue types and exporting them?

That part is fine, I guess, particularly since it came from the
wait_bit_key thing and have a smell of being generic.

Taking a random part of wake_page_function(), and calling it
"wake_page_match()" even though that's not at all that it does?

Not ok.

Adding random kiocb helper functions to a core header file, when they
are only used in one place, and when they only make sense in that one
place?

Not ok.

When the function is called "wake_page_match()", you'd expect it
matches the wake page information, wouldn't it?

Yeah, it did that. And then it also checked whether the bit we're
waiting had been set again, because everybody ostensibly wanted that.
Except they don't any more, and that's not what the name really
implied anyway.

And kiocb_wait_page_queue_init() has absolutely zero business being in
<linux/filemap.h>. There are absolutely no valid uses of that thing
outside of the one place that calls it.

I tried to fix up the things I could.

That said, like a lot of io_uring code, this is some seriously opaque
code. You say you've done a lot of cleanups, but I'm not convinced
those cleanups are in any way offsetting adding yet another union (how
many bugs did the last one add?) and a magic flag of "use this part of
the union" now.

And I don't know what loads you use for testing that thing, or what
happens when the "lock_page_async()" case actually fails to lock, and
just calls back the io_async_buf_func() wakeup function when the page
has unlocked...

That function doesn't actually lock the page either, but does the task
work. I hope that work then knows to do the right thing, but it's
really opaque and hard to follow.

Anyway, I'm not entirely happy with doing these kinds of changes in
the merge resolution, but the alternative was to not do the pull at
all, and require you to do a lot of cleanups before I would pull it.
Maybe I should have done that.

So this is a slightly grumpy email about how io_uring is (a) still
making me very nervous about a very lackadaisical approach to things,
and having the codepaths so obscure that I'm not convinced it's not
horribly buggy. And (b) I fixed things up without really being able to
test them. I tested that the _normal_ paths still seem to work fine,
but..

I really think that whole thing needs a lot of comments, particularly
around the whole io_rw_should_retry() area.

A bit and legible comment about how it will be caught by the
generic_file_buffered_read() page locking code, how the two cases
differ (it might get caught by the "I'm just waiting for it to be
unlocked", but it could *also* get caught by the "lock page now"
case), and how it continues the request.

As it is, it bounces between the generic code and very io_uring
specific code in strange and not easy to follow ways.

I've pushed out my merge of this thing, but you might also want to
take a look at commit 2a9127fcf229 ("mm: rewrite
wait_on_page_bit_common() logic"). I particular, the comment about how
there's no point in even testing the page bit any more when you get
woken up.

I left that

        if (test_bit(key->bit_nr, &key->page->flags))
                return -1;

logic in io_async_buf_func() (but it's not in "wake_page_match()" any
more), but I suspect it's bogus and pointless, for the same reason
that it isn't done for normal page waits now. Maybe it's better to
just queue the actual work regardless, it will then be caught in the
_real_ lock_page() or whatever it ends up doing - and if it only
really wants to see the "uptodate" bit being set, and was just waiting
for IO to finish, then it never really cared about the page lock bit
at all, it just wanted to be notified about IO being done.

So this was a really long email to tell you - again - that I'm not
happy  with how fragile io_uring is, and how the code seems to be
almost intentionally written to *be* fragile. Complex and hard to
understand, and as a result it has had a fairly high rate of fairly
nasty bugs.

I'm hoping this isn't going to be yet another case of "nasty bugs
because of complexity and a total disregard for explaining what is
going on".

              Linus