linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: NeilBrown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Daire Byrne <daire@dneg.com>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 01/10] VFS: support parallel updates in the one directory.
Date: Fri, 26 Aug 2022 12:06:55 -0700	[thread overview]
Message-ID: <CAHk-=wi_wwTxPTnFXsG8zdaem5YDnSd4OsCeP78yJgueQCb-1g@mail.gmail.com> (raw)
In-Reply-To: <166147984370.25420.13019217727422217511.stgit@noble.brown>

On Thu, Aug 25, 2022 at 7:16 PM NeilBrown <neilb@suse.de> wrote:
>
> If a filesystem supports parallel modification in directories, it sets
> FS_PAR_DIR_UPDATE on the file_system_type.  lookup_open() and the new
> lookup_hash_update() notice the flag and take a shared lock on the
> directory, and rely on a lock-bit in d_flags, much like parallel lookup
> relies on DCACHE_PAR_LOOKUP.

Ugh.

I absolutely believe in the DCACHE_PAR_LOOKUP model, and in "parallel
updates" being important, but I *despise* locking code like this

+       if (wq && IS_PAR_UPDATE(dir))
+               inode_lock_shared_nested(dir, I_MUTEX_PARENT);
+       else
+               inode_lock_nested(dir, I_MUTEX_PARENT);

and I really really hope there's some better model for this.

That "wq" test in particular is just absolutely disgusting. So now it
doesn't just depend on whether the directory supports parallel
updates, now the caller can choose whether to do the parallel thing or
not, and that's how "create" is different from "rename".

And that last difference is, I think, the one that makes me go "No. HELL NO".

Instead of it being up to the filesystem to say "I can do parallel
creates, but I need to serialize renames", this whole thing has been
set up to be about the caller making that decision.

That's just feels horribly horribly wrong.

Yes, I realize that to you that's just a "later patches will do
renames", but what if it really is a filesystem issue where the
filesystem can easily handle new names, but needs something else for
renames because it has various cross-directory issues, perhaps?

So I feel this is fundamentally wrong, and this ugliness is a small
effect of that wrongness.

I think we should strive for

 (a) make that 'wq' and DCACHE_PAR_LOOKUP bit be unconditional

 (b) aim for the inode lock being taken *after* the _lookup_hash(),
since the VFS layer side has to be able to handle the concurrency on
the dcache side anyway

 (c) at that point, the serialization really ends up being about the
call into the filesystem, and aim to simply move the
inode_lock{_shared]_nested() into the filesystem so that there's no
need for a flag and related conditional locking at all.

Because right now I think the main reason we cannot move the lock into
the filesystem is literally that we've made the lock cover not just
the filesystem part, but the "lookup and create dentry" part too.

But once you have that "DCACHE_PAR_LOOKUP" bit and the
d_alloc_parallel() logic to serialize a _particular_ dentry being
created (as opposed to serializing all the sleeping ops to that
directly), I really think we should strive to move the locking - that
no longer helps the VFS dcache layer - closer to the filesystem call
and eventually into it.

Please? I think these patches are "mostly going in the right
direction", but at the same time I feel like there's some serious
mis-design going on.

                Linus

  reply	other threads:[~2022-08-26 19:07 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-26  2:10 [PATCH/RFC 00/10 v5] Improve scalability of directory operations NeilBrown
2022-08-26  2:10 ` [PATCH 09/10] VFS: add LOOKUP_SILLY_RENAME NeilBrown
2022-08-27  1:21   ` Al Viro
2022-08-29  3:15     ` NeilBrown
2022-08-26  2:10 ` [PATCH 01/10] VFS: support parallel updates in the one directory NeilBrown
2022-08-26 19:06   ` Linus Torvalds [this message]
2022-08-26 23:06     ` NeilBrown
2022-08-27  0:13       ` Linus Torvalds
2022-08-27  0:23         ` Al Viro
2022-08-27 21:14         ` Al Viro
2022-08-27  0:17     ` Al Viro
2022-09-01  0:31       ` NeilBrown
2022-09-01  3:44         ` Al Viro
2022-08-27  3:43   ` Al Viro
2022-08-29  1:59     ` NeilBrown
2022-09-03  0:06       ` Al Viro
2022-09-03  1:40         ` NeilBrown
2022-09-03  2:12           ` Al Viro
2022-09-03 17:52             ` Al Viro
2022-09-04 23:33               ` NeilBrown
2022-08-26  2:10 ` [PATCH 08/10] NFSD: allow parallel creates from nfsd NeilBrown
2022-08-27  4:37   ` Al Viro
2022-08-29  3:12     ` NeilBrown
2022-08-26  2:10 ` [PATCH 05/10] VFS: export done_path_update() NeilBrown
2022-08-26  2:10 ` [PATCH 02/10] VFS: move EEXIST and ENOENT tests into lookup_hash_update() NeilBrown
2022-08-26  2:10 ` [PATCH 06/10] VFS: support concurrent renames NeilBrown
2022-08-27  4:12   ` Al Viro
2022-08-29  3:08     ` NeilBrown
2022-08-26  2:10 ` [PATCH 10/10] NFS: support parallel updates in the one directory NeilBrown
2022-08-26 15:31   ` John Stoffel
2022-08-26 23:13     ` NeilBrown
2022-08-26  2:10 ` [PATCH 03/10] VFS: move want_write checks into lookup_hash_update() NeilBrown
2022-08-27  3:48   ` Al Viro
2022-08-26  2:10 ` [PATCH 04/10] VFS: move dput() and mnt_drop_write() into done_path_update() NeilBrown
2022-08-26  2:10 ` [PATCH 07/10] VFS: hold DCACHE_PAR_UPDATE lock across d_revalidate() NeilBrown
2022-08-26 14:42 ` [PATCH/RFC 00/10 v5] Improve scalability of directory operations John Stoffel
2022-08-26 23:30   ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wi_wwTxPTnFXsG8zdaem5YDnSd4OsCeP78yJgueQCb-1g@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=chuck.lever@oracle.com \
    --cc=daire@dneg.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=trond.myklebust@hammerspace.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).