From: Dmitry Kadashev <dkadashev@gmail.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>,
Christian Brauner <christian.brauner@ubuntu.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
io-uring <io-uring@vger.kernel.org>
Subject: Re: [PATCH 05/14] namei: prepare do_mkdirat for refactoring
Date: Wed, 21 Jul 2021 17:02:37 +0700 [thread overview]
Message-ID: <CAOKbgA6JAGioiqMu1CQbTe-Wpb2_HDszRaz+rf=qna-oieF75A@mail.gmail.com> (raw)
In-Reply-To: <YPbV5wnUNw3SsSfI@zeniv-ca.linux.org.uk>
.On Tue, Jul 20, 2021 at 8:58 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Jul 20, 2021 at 01:59:29PM +0700, Dmitry Kadashev wrote:
>
> > > This is the wrong way to go. Really. Look at it that way - LOOKUP_REVAL
> > > is the final stage of escalation; if we had to go there, there's no
> > > point being optimistic about the last dcache lookup, nevermind trying
> > > to retry the parent pathwalk if we fail with -ESTALE doing it.
> > >
> > > I'm not saying that it's something worth optimizing for; the problem
> > > is different - the logics makes no sense whatsoever that way. It's
> > > a matter of reader's cycles wasted on "what the fuck are we trying
> > > to do here?", not the CPU cycles wasted on execution.
> > >
> > > While we are at it, it makes no sense for filename_parentat() and its
> > > ilk to go for RCU and normal if it's been given LOOKUP_REVAL - I mean,
> > > look at the sequence of calls in there. And try to make sense of
> > > it. Especially of the "OK, RCU attempt told us to sod off and try normal;
> > > here, let's call path_parentat() with LOOKUP_REVAL for flags and if it
> > > says -ESTALE, call it again with exact same arguments" part.
> > >
> > > Seriously, look at that from the point of view of somebody who tries
> > > to make sense of the entire thing
> >
> > OK, let me try to venture down that "change the way ESTALE retries are
> > done completely" path. The problem here is I'm not familiar with the
> > code enough to be sure the conversion is 1-to-1 (i.e. that we can't get
> > ESTALE from somewhere unexpected), and that retries are open-coded in
> > quite a few places it seems. Anyway, I'll try and dig in and come back
> > with either an RFC patch or some questions. Thanks for the feedback, Al.
>
> I'd try to look at the primitives that go through RCU/normal/REVAL series.
> They are all in fs/namei.c; filename_lookup(), filename_parentat(),
> do_filp_open() and do_filo_open_root(). The latter pair almost certainly
> is fine as-is.
>
> retry_estale() crap is limited to user_path_at/user_path_at_empty users,
> along with some filename_parentat() ones.
>
> There we follow that series with something that might give us ESTALE,
> and if it does, we want to repeat the whole thing in REVAL mode.
>
> OTOH, there are callers (and fairly similar ones, at that - look at e.g.
> AF_UNIX bind doing mknod) where we don't have that kind of logics.
>
> Question 1: which of those are lacking retry_estale(), even though they
> might arguably need it? Note that e.g. AF_UNIX bind uses kern_path_create(),
> so we need to look at all callchains leading to those, not just the ones
> in fs/namei.c guts.
>
> If most of those really want retry_estale, we'd be better off if we took
> the REVAL fallback out of filename_lookup() and filename_parentat()
> and turned massaged the users from
> do rcu/normal/reval lookups
> if failed, fuck off
> do other work
> if it fails with ESTALE
> do rcu/reval/reval (yes, really)
> if failed, fuck off
> do other work
> into
> do rcu/normal lookups
> if not failed
> do other work
> if something (including initial lookup) failed with ESTALE
> repeat the entire thing with LOOKUP_REVAL in the mix
> possibly with a helper function involved.
> For the ones that need retry_estale that's a win; for the rest it'd
> be boilerplate (that's basically the ones where "do other work" never
> fails with ESTALE).
>
> Question 2: how are "need retry_estale"/"fine just with ESTALE fallback
> in filename_{lookup,parentat}()" cases are distributed?
>
> If the majority is in "need retry_estale" class, then something similar
> to what's been outlined above would probably be a decent solution.
>
> Otherwise we'll need wrappers equivalent to current behaviour, and that's
> where it can get unpleasant - at which level in call chain do we put
> that wrapper? Sure, we can add filename_lookup_as_it_fucking_used_to_be().
> Except that it's not called directly by those "don't need retry_estale"
> users, so we'd need to provide such counterparts for them as well ;-/
>
> IOW, we need the call tree for filename_lookup()/filename_parentat(),
> with leaves (originators of call chain) marked with "does that user
> do retry_estale?" (and tracked far back for the answer to depend only
> upon the call site - if an intermediate can come from both kinds
> of places, we need to track back to its callers).
>
> Then we'll be able to see at which levels do we want those "as it used
> to behave" wrappers...
>
> If you want to dig around, that would probably be a reasonable place to
> start.
Thanks for the pointers! I am happy to dig into this. I can't spend much
time per week on this though, so it will take some time.
--
Dmitry Kadashev
next prev parent reply other threads:[~2021-07-21 10:24 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 01/14] namei: prepare do_rmdir for refactoring Dmitry Kadashev
2021-07-15 19:49 ` Al Viro
2021-07-20 6:52 ` Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 02/14] namei: clean up do_rmdir retry logic Dmitry Kadashev
2021-07-15 20:02 ` Al Viro
2021-07-15 10:35 ` [PATCH 03/14] namei: prepare do_unlinkat for refactoring Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 04/14] namei: clean up do_unlinkat retry logic Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 05/14] namei: prepare do_mkdirat for refactoring Dmitry Kadashev
2021-07-15 20:17 ` Al Viro
2021-07-20 6:59 ` Dmitry Kadashev
2021-07-20 13:55 ` Al Viro
2021-07-21 10:02 ` Dmitry Kadashev [this message]
2021-07-15 10:35 ` [PATCH 06/14] namei: clean up do_mkdirat retry logic Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 07/14] namei: prepare do_mknodat for refactoring Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 08/14] namei: clean up do_mknodat retry logic Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 09/14] namei: prepare do_symlinkat for refactoring Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 10/14] namei: clean up do_symlinkat retry logic Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 11/14] namei: prepare do_linkat for refactoring Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 12/14] namei: clean up do_linkat retry logic Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 13/14] namei: prepare do_renameat2 for refactoring Dmitry Kadashev
2021-07-15 10:36 ` [PATCH 14/14] namei: clean up do_renameat2 retry logic Dmitry Kadashev
2021-07-15 10:39 ` [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAOKbgA6JAGioiqMu1CQbTe-Wpb2_HDszRaz+rf=qna-oieF75A@mail.gmail.com' \
--to=dkadashev@gmail.com \
--cc=axboe@kernel.dk \
--cc=christian.brauner@ubuntu.com \
--cc=io-uring@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).