Re: [PATCH 2/2] fs: expose LOOKUP_NONBLOCK through openat2() RESOLVE_NONBLOCK

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 2/2] fs: expose LOOKUP_NONBLOCK through openat2() RESOLVE_NONBLOCK
Date: Fri, 11 Dec 2020 10:07:08 -0800	[thread overview]
Message-ID: <CAHk-=wjyquiZzQ9ws7s_NPREts=FgGO3oY26da77Eva4MRsuyQ@mail.gmail.com> (raw)
In-Reply-To: <20201211034541.GE3913616@dread.disaster.area>

On Thu, Dec 10, 2020 at 7:45 PM Dave Chinner <david@fromorbit.com> wrote:
>
> Part of the problem we have with the non-blocking behaviour is that
> the user interfaces have been under specified, poorly reviewed and
> targetted a single specific use case on a single filesystem rather
> than generic behaviour. And mostly they lack the necessary test
> coverage to ensure all filesystems behave the same way and to inform
> us of a regression that *would break userspace applications*.

Fair enough. I just didn't really see much a concern here, exactly
because this ends up not being a hard guarantee in the first place
(but the reason I suggested then adding that RESOLVE_NONBLOCK is so
that it could be tested without having to rely on timing and io_uring
that _should_ get the same result regardless in the end).

But the second reason I don't see much concern is exactly because it
wouldn't affect individual filesystems. There's nothing new going on
as far as  filesystem is concerned.

> Yes, I recognise and accept that some of the problems are partially
> my fault. I also have a habit of trying to learn from the mistakes
> I've made and then take steps to ensure that *we do not make those
> same mistakes again*.

So the third reason I reacted was because we have a history, and you
have traditionally not ever really cared unless it's about xfs and IO.
Which this thing would very explicitly not be about. The low-level
filesystem would never see the semantics at all, and could never get
it wrong.

Well, a filesystem could "get it wrong" in the same sense that it can
get the current LOOKUP_RCU wrong, of course.

But that would be either an outright bug and a correctness problem -
sleeping in RCU context - or be a performance problem - returning
ECHILD very easily due to other reasons. And it would be entirely
unrelated to the nonblocking path open, because it would be a
performance issue even _normally_, just not visible as semantics.

And that's the second reason I like this, and would like to see this,
and see RESOLVE_NONBLOCK: exactly because we have _had_ those subtle
issues that aren't actually correctness issues, but only "oh, this
situation always takes us out of RCU lookup, and it's a very subtle
performance problem".

For example, it used to be that whenever we saw a symlink, we'd throw
up our hands and say "we can't do this in RCU lookup" and give up.
That wasn't a low-level filesystem problem, it was literally at the
VFS layer, because the RCU lookup was fairly limited.

Or some of the security models (well, _all_ of them) used to just say
"oh, I can't do this check in RCU mode" and forced the slow path.

It was very noticeable from a performance angle under certain loads,
because RCU lookup is just _so_ much faster when you otherwise get
locking and reference counting cache conflicts. Yes, yes, Al fixed the
symlinks long ago, but we know RCU lookup still gives up fairly easily
in other circumstances.

And RCU lookup giving up eagerly is fine, of course. The whole point
is that it's an optimistic "let's see if we can do this really
quickly" interface that just works _most_ of the time.

But that also makes it easy to miss things, because it's so hard to
test when it's purely about latency and all the operations will retry
and get the right result in the end. The "noticeable performance
issues" are generally not very noticeable at the level of an
individual operation - you need to do a lot of them, and often in
parallel, to see the _big_ benefits.

So RESOLVE_NONBLOCK would be a nice way to perhaps add automated
testing for "did we screw up RCU pathname lookup", in addition to
perhaps making it easier to find the cases where we currently give up
too quickly just because it was _fairly_ rare and nobody had much
visibility into that case.

And we have had that "oh, RCU lookup broke" a couple of times by
mistake - and then it takes a while to notice, because it's purely
visible as a performance bug and not necessarily all _that_ obvious -
exactly because it's purely about performance, and the semantic end
result is the same.

           Linus