All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring <io-uring@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL] io_uring changes for 5.9-rc1
Date: Mon, 3 Aug 2020 13:48:12 -0700	[thread overview]
Message-ID: <CAHk-=wgaxWMA7DVTQq+KxqaWHPDrXDuScX9orzRgxdi7SBfmoA@mail.gmail.com> (raw)
In-Reply-To: <50466810-9148-e245-7c1e-e7435b753582@kernel.dk>

On Sun, Aug 2, 2020 at 2:41 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> Lots of cleanups in here, hardening the code and/or making it easier to
> read and fixing buts, but a core feature/change too adding support for
> real async buffered reads. With the latter in place, we just need
> buffered write async support and we're done relying on kthreads for the
> fast path. In detail:

That async buffered reads handling the the page locking flag is a
mess, and I'm really happy I committed my page locking scalability
change early, so that the conflicts there caught it.

Re-using the page bit waitqueue types and exporting them?

That part is fine, I guess, particularly since it came from the
wait_bit_key thing and have a smell of being generic.

Taking a random part of wake_page_function(), and calling it
"wake_page_match()" even though that's not at all that it does?

Not ok.

Adding random kiocb helper functions to a core header file, when they
are only used in one place, and when they only make sense in that one
place?

Not ok.

When the function is called "wake_page_match()", you'd expect it
matches the wake page information, wouldn't it?

Yeah, it did that. And then it also checked whether the bit we're
waiting had been set again, because everybody ostensibly wanted that.
Except they don't any more, and that's not what the name really
implied anyway.

And kiocb_wait_page_queue_init() has absolutely zero business being in
<linux/filemap.h>. There are absolutely no valid uses of that thing
outside of the one place that calls it.

I tried to fix up the things I could.

That said, like a lot of io_uring code, this is some seriously opaque
code. You say you've done a lot of cleanups, but I'm not convinced
those cleanups are in any way offsetting adding yet another union (how
many bugs did the last one add?) and a magic flag of "use this part of
the union" now.

And I don't know what loads you use for testing that thing, or what
happens when the "lock_page_async()" case actually fails to lock, and
just calls back the io_async_buf_func() wakeup function when the page
has unlocked...

That function doesn't actually lock the page either, but does the task
work. I hope that work then knows to do the right thing, but it's
really opaque and hard to follow.

Anyway, I'm not entirely happy with doing these kinds of changes in
the merge resolution, but the alternative was to not do the pull at
all, and require you to do a lot of cleanups before I would pull it.
Maybe I should have done that.

So this is a slightly grumpy email about how io_uring is (a) still
making me very nervous about a very lackadaisical approach to things,
and having the codepaths so obscure that I'm not convinced it's not
horribly buggy. And (b) I fixed things up without really being able to
test them. I tested that the _normal_ paths still seem to work fine,
but..

I really think that whole thing needs a lot of comments, particularly
around the whole io_rw_should_retry() area.

A bit and legible comment about how it will be caught by the
generic_file_buffered_read() page locking code, how the two cases
differ (it might get caught by the "I'm just waiting for it to be
unlocked", but it could *also* get caught by the "lock page now"
case), and how it continues the request.

As it is, it bounces between the generic code and very io_uring
specific code in strange and not easy to follow ways.

I've pushed out my merge of this thing, but you might also want to
take a look at commit 2a9127fcf229 ("mm: rewrite
wait_on_page_bit_common() logic"). I particular, the comment about how
there's no point in even testing the page bit any more when you get
woken up.

I left that

        if (test_bit(key->bit_nr, &key->page->flags))
                return -1;

logic in io_async_buf_func() (but it's not in "wake_page_match()" any
more), but I suspect it's bogus and pointless, for the same reason
that it isn't done for normal page waits now. Maybe it's better to
just queue the actual work regardless, it will then be caught in the
_real_ lock_page() or whatever it ends up doing - and if it only
really wants to see the "uptodate" bit being set, and was just waiting
for IO to finish, then it never really cared about the page lock bit
at all, it just wanted to be notified about IO being done.

So this was a really long email to tell you - again - that I'm not
happy  with how fragile io_uring is, and how the code seems to be
almost intentionally written to *be* fragile. Complex and hard to
understand, and as a result it has had a fairly high rate of fairly
nasty bugs.

I'm hoping this isn't going to be yet another case of "nasty bugs
because of complexity and a total disregard for explaining what is
going on".

              Linus

  reply	other threads:[~2020-08-03 20:48 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-02 21:41 [GIT PULL] io_uring changes for 5.9-rc1 Jens Axboe
2020-08-03 20:48 ` Linus Torvalds [this message]
2020-08-03 20:53   ` Linus Torvalds
2020-08-03 21:10     ` Konstantin Ryabitsev
2020-08-03 22:31       ` Jens Axboe
2020-08-03 22:30   ` Jens Axboe
2020-08-03 23:18     ` Jens Axboe
2020-08-03 23:31       ` Jens Axboe
2020-08-03 23:49         ` Linus Torvalds
2020-08-03 23:56           ` Jens Axboe
2020-08-04  0:11             ` Linus Torvalds
2020-08-03 23:34       ` Linus Torvalds
2020-08-03 23:43         ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wgaxWMA7DVTQq+KxqaWHPDrXDuScX9orzRgxdi7SBfmoA@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.