* [PATCH/RFC 00/10 v5] Improve scalability of directory operations
@ 2022-08-26  2:10 NeilBrown
From: NeilBrown @ 2022-08-26  2:10 UTC
  To: Al Viro, Linus Torvalds, Daire Byrne, Trond Myklebust, Chuck Lever
  Cc: Linux NFS Mailing List, linux-fsdevel, LKML

[I made up "v5" - I haven't been counting]

VFS currently holds an exclusive lock on the directory while making
changes: add, remove, rename.
When multiple threads make changes in the one directory, the contention
can be noticeable.
In the case of NFS over a high-latency link, this is easy to
demonstrate.  NFS doesn't really need VFS locking, as the server ensures
correctness.

Lustre uses a single(?) directory for object storage, and has patches
for ext4 to support concurrent updates (Lustre accesses ext4 directly,
not via the VFS).

XFS (it is claimed) does its own locking and doesn't need the VFS to
help at all.

This patch series allows filesystems to request a shared lock on
directories and provides serialisation on just the affected name, not the
whole directory.  It changes both the VFS and NFSD to use shared locks
when appropriate, and changes NFS to request shared locks.

The central enabling feature is a new dentry flag DCACHE_PAR_UPDATE
which acts as a bit-lock.  The ->d_lock spinlock is taken to set/clear
it, and wait_var_event() is used for waiting.  This flag is set on all
dentries that are part of a directory update, not just when a shared
lock is taken.
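To make the bit-lock idea concrete, here is a minimal userspace sketch, assuming pthreads as a stand-in for the kernel primitives: a mutex plays the role of ->d_lock, a condition variable replaces wait_var_event()/wake_up_var(), and PAR_UPDATE, struct entry and the par_update_*() helpers are hypothetical names, not code from the series.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define PAR_UPDATE 0x1u /* hypothetical analogue of DCACHE_PAR_UPDATE */

/* Hypothetical userspace stand-in for a dentry, reduced to what is needed. */
struct entry {
	pthread_mutex_t d_lock; /* plays the role of the ->d_lock spinlock */
	pthread_cond_t waitq;   /* plays the role of wait_var_event()/wake_up_var() */
	unsigned int flags;
};

static void entry_init(struct entry *e)
{
	pthread_mutex_init(&e->d_lock, NULL);
	pthread_cond_init(&e->waitq, NULL);
	e->flags = 0;
}

/* Set the bit if it is clear; returns true if the caller now owns the name. */
static bool par_update_trylock(struct entry *e)
{
	bool got;

	pthread_mutex_lock(&e->d_lock);
	got = !(e->flags & PAR_UPDATE);
	if (got)
		e->flags |= PAR_UPDATE;
	pthread_mutex_unlock(&e->d_lock);
	return got;
}

/* Wait until no other update holds the name, then claim it. */
static void par_update_lock(struct entry *e)
{
	pthread_mutex_lock(&e->d_lock);
	while (e->flags & PAR_UPDATE)
		pthread_cond_wait(&e->waitq, &e->d_lock);
	e->flags |= PAR_UPDATE;
	pthread_mutex_unlock(&e->d_lock);
}

/* Release the name and wake all waiters; each one rechecks the bit. */
static void par_update_unlock(struct entry *e)
{
	pthread_mutex_lock(&e->d_lock);
	assert(e->flags & PAR_UPDATE);
	e->flags &= ~PAR_UPDATE;
	pthread_cond_broadcast(&e->waitq);
	pthread_mutex_unlock(&e->d_lock);
}

/* Demo: several threads bump a counter, serialised only by the bit. */
struct demo {
	struct entry e;
	long counter;
	int iters;
};

static void *demo_worker(void *arg)
{
	struct demo *d = arg;

	for (int i = 0; i < d->iters; i++) {
		par_update_lock(&d->e);
		d->counter++;
		par_update_unlock(&d->e);
	}
	return NULL;
}
```

In the kernel, wait_var_event() sleeps on a shared, hashed wait queue, so the flag costs no extra space in each dentry; the per-entry condition variable here is only a simplification.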

When a shared lock is taken we must use d_alloc_parallel(), which
needs a wq that must remain until the update is completed.  To make use
of concurrent create, kern_path_create() would need to be passed a wq.
Rather than the churn required for that, we use exclusive locking when
no wq is provided.

One interesting consequence of this is that silly-rename becomes a
little more complex.  As the directory may not be exclusively locked,
the new silly-name needs to be locked (DCACHE_PAR_UPDATE) as well.
A new LOOKUP_SILLY_RENAME is added which helps implement this using
common code.
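A minimal single-threaded model of the silly-rename case follows, assuming a directory represented as a small table of names that each carry an update bit like DCACHE_PAR_UPDATE. All names are hypothetical and the ".nfs%08x" format is only illustrative. It shows the point made above: the caller already holds the bit on the file being unlinked, and the candidate silly-name must be claimed as well before it is used as a rename target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 16
#define NAME_LEN 32

/* Hypothetical directory entry: a name plus an "update in progress" bit. */
struct name_slot {
	char name[NAME_LEN];
	bool exists;     /* positive entry in the directory */
	bool par_update; /* some update currently owns this name */
};

struct dir {
	struct name_slot slot[MAX_NAMES];
};

/* Lookup-or-allocate the slot for a name; NULL if the table is full. */
static struct name_slot *dir_slot(struct dir *d, const char *name)
{
	struct name_slot *free_slot = NULL;

	for (int i = 0; i < MAX_NAMES; i++) {
		struct name_slot *s = &d->slot[i];

		if (s->name[0] != '\0' && strcmp(s->name, name) == 0)
			return s;
		if (s->name[0] == '\0' && free_slot == NULL)
			free_slot = s;
	}
	if (free_slot)
		snprintf(free_slot->name, NAME_LEN, "%s", name);
	return free_slot;
}

/* Claim a name for update; fails if another update already owns it. */
static bool name_trylock(struct name_slot *s)
{
	if (s->par_update)
		return false;
	s->par_update = true;
	return true;
}

/*
 * Silly-rename `victim`, whose bit the caller holds and keeps holding.
 * The directory may only be locked shared, so the silly-name is claimed
 * too, and released again once the rename is done.
 */
static struct name_slot *silly_rename(struct dir *d, struct name_slot *victim)
{
	char silly[NAME_LEN];

	assert(victim->par_update && victim->exists);
	for (unsigned int n = 0; n < 1000; n++) {
		snprintf(silly, sizeof(silly), ".nfs%08x", n);
		struct name_slot *s = dir_slot(d, silly);

		if (s == NULL)
			return NULL;      /* table full */
		if (!name_trylock(s))
			continue;         /* busy: another update owns it */
		if (s->exists) {
			s->par_update = false;
			continue;         /* already in use: try the next name */
		}
		s->exists = true;         /* "rename" victim to the silly-name */
		victim->exists = false;
		s->par_update = false;    /* release the silly-name only */
		return s;
	}
	return NULL;
}
```

Skipping a busy candidate rather than waiting for it is a choice made in this sketch, not necessarily what the series does; it avoids sleeping on a name owned by another update while the victim's bit is still held.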

While testing I found some odd behaviour that was caused by
d_revalidate() racing with rename().  To resolve this I used
DCACHE_PAR_UPDATE to ensure they cannot race any more.
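The d_revalidate()/rename() fix can be illustrated with a small threaded model, assuming an invented invariant (name == parent * 10) that rename() breaks briefly while it updates two fields. For brevity a pthread mutex stands in for the DCACHE_PAR_UPDATE bit-lock, and do_rename()/do_revalidate() are hypothetical analogues. Because the revalidate side takes the same per-name lock, it never observes the torn state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical entry whose (parent, name) pair rename() updates in two steps. */
struct entry {
	pthread_mutex_t par_update; /* stand-in for the DCACHE_PAR_UPDATE bit-lock */
	int parent;
	int name;                   /* invariant outside the lock: name == parent * 10 */
};

/* rename() analogue: changes both fields while holding the per-name lock. */
static void do_rename(struct entry *e, int new_parent)
{
	pthread_mutex_lock(&e->par_update);
	e->parent = new_parent;
	/* here the pair is briefly inconsistent */
	e->name = new_parent * 10;
	pthread_mutex_unlock(&e->par_update);
}

/* d_revalidate() analogue: takes the same lock, so it cannot see the window. */
static bool do_revalidate(struct entry *e)
{
	bool ok;

	pthread_mutex_lock(&e->par_update);
	ok = (e->name == e->parent * 10);
	pthread_mutex_unlock(&e->par_update);
	return ok;
}

struct race {
	struct entry e;
	int iters;
	int bad; /* written only by the revalidate thread */
};

static void *rename_worker(void *arg)
{
	struct race *r = arg;

	for (int i = 1; i <= r->iters; i++)
		do_rename(&r->e, i);
	return NULL;
}

static void *revalidate_worker(void *arg)
{
	struct race *r = arg;

	for (int i = 0; i < r->iters; i++)
		if (!do_revalidate(&r->e))
			r->bad++;
	return NULL;
}

/* Run both workers concurrently; returns how many torn states were seen. */
static int run_race(int iters)
{
	struct race r = { .iters = iters, .bad = 0 };
	pthread_t a, b;

	pthread_mutex_init(&r.e.par_update, NULL);
	r.e.parent = 0;
	r.e.name = 0;
	pthread_create(&a, NULL, rename_worker, &r);
	pthread_create(&b, NULL, revalidate_worker, &r);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_mutex_destroy(&r.e.par_update);
	return r.bad;
}
```

Removing the lock calls from do_revalidate() would make run_race() able to report torn states, which is the kind of odd behaviour the fix rules out.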

Testing, review, or other comments would be most welcome,

NeilBrown


---

NeilBrown (10):
      VFS: support parallel updates in the one directory.
      VFS: move EEXIST and ENOENT tests into lookup_hash_update()
      VFS: move want_write checks into lookup_hash_update()
      VFS: move dput() and mnt_drop_write() into done_path_update()
      VFS: export done_path_update()
      VFS: support concurrent renames.
      VFS: hold DCACHE_PAR_UPDATE lock across d_revalidate()
      NFSD: allow parallel creates from nfsd
      VFS: add LOOKUP_SILLY_RENAME
      NFS: support parallel updates in the one directory.


 fs/dcache.c            |  72 ++++-
 fs/namei.c             | 616 ++++++++++++++++++++++++++++++++---------
 fs/nfs/dir.c           |  28 +-
 fs/nfs/fs_context.c    |   6 +-
 fs/nfs/internal.h      |   3 +-
 fs/nfs/unlink.c        |  51 +++-
 fs/nfsd/nfs3proc.c     |  28 +-
 fs/nfsd/nfs4proc.c     |  29 +-
 fs/nfsd/nfsfh.c        |   9 +
 fs/nfsd/nfsproc.c      |  29 +-
 fs/nfsd/vfs.c          | 177 +++++-------
 include/linux/dcache.h |  28 ++
 include/linux/fs.h     |   5 +-
 include/linux/namei.h  |  39 ++-
 14 files changed, 799 insertions(+), 321 deletions(-)

Thread overview: 37+ messages
2022-08-26  2:10 [PATCH/RFC 00/10 v5] Improve scalability of directory operations NeilBrown
2022-08-26  2:10 ` [PATCH 09/10] VFS: add LOOKUP_SILLY_RENAME NeilBrown
2022-08-27  1:21   ` Al Viro
2022-08-29  3:15     ` NeilBrown
2022-08-26  2:10 ` [PATCH 01/10] VFS: support parallel updates in the one directory NeilBrown
2022-08-26 19:06   ` Linus Torvalds
2022-08-26 23:06     ` NeilBrown
2022-08-27  0:13       ` Linus Torvalds
2022-08-27  0:23         ` Al Viro
2022-08-27 21:14         ` Al Viro
2022-08-27  0:17     ` Al Viro
2022-09-01  0:31       ` NeilBrown
2022-09-01  3:44         ` Al Viro
2022-08-27  3:43   ` Al Viro
2022-08-29  1:59     ` NeilBrown
2022-09-03  0:06       ` Al Viro
2022-09-03  1:40         ` NeilBrown
2022-09-03  2:12           ` Al Viro
2022-09-03 17:52             ` Al Viro
2022-09-04 23:33               ` NeilBrown
2022-08-26  2:10 ` [PATCH 08/10] NFSD: allow parallel creates from nfsd NeilBrown
2022-08-27  4:37   ` Al Viro
2022-08-29  3:12     ` NeilBrown
2022-08-26  2:10 ` [PATCH 05/10] VFS: export done_path_update() NeilBrown
2022-08-26  2:10 ` [PATCH 02/10] VFS: move EEXIST and ENOENT tests into lookup_hash_update() NeilBrown
2022-08-26  2:10 ` [PATCH 06/10] VFS: support concurrent renames NeilBrown
2022-08-27  4:12   ` Al Viro
2022-08-29  3:08     ` NeilBrown
2022-08-26  2:10 ` [PATCH 10/10] NFS: support parallel updates in the one directory NeilBrown
2022-08-26 15:31   ` John Stoffel
2022-08-26 23:13     ` NeilBrown
2022-08-26  2:10 ` [PATCH 03/10] VFS: move want_write checks into lookup_hash_update() NeilBrown
2022-08-27  3:48   ` Al Viro
2022-08-26  2:10 ` [PATCH 04/10] VFS: move dput() and mnt_drop_write() into done_path_update() NeilBrown
2022-08-26  2:10 ` [PATCH 07/10] VFS: hold DCACHE_PAR_UPDATE lock across d_revalidate() NeilBrown
2022-08-26 14:42 ` [PATCH/RFC 00/10 v5] Improve scalability of directory operations John Stoffel
2022-08-26 23:30   ` NeilBrown
