From: Daire Byrne <daire@dneg.com>
To: NeilBrown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC 00/12] Allow concurrent directory updates.
Date: Wed, 15 Jun 2022 14:46:14 +0100
Message-ID: <CAPt2mGNjWXad6e7nSUTu=0ez1qU1wBNegrntgHKm5hOeBs5gQA@mail.gmail.com>
In-Reply-To: <165516173293.21248.14587048046993234326.stgit@noble.brown>

Neil,

Firstly, thank you for your work on this. I'm probably the main
beneficiary of this (NFSD) effort atm so I feel extra special and
lucky!

I have done some quick artificial tests, similar to before, where I am
using an NFS server and client separated by an (extreme) 200ms of
latency (great for testing parallelism). I am only using NFSv3 due to
the NFSD_CACHE_SIZE_SLOTS_PER_SESSION parallelism limitations for
NFSv4.
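
(As an aside, this sort of artificial latency can be injected with
tc/netem; the sketch below is illustrative only - the interface name
and exact qdisc options are assumptions, not necessarily what we used:)

client1 # tc qdisc add dev eth0 root netem delay 200ms
client1 # tc qdisc del dev eth0 root        # remove it again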

First, a client going direct to the server (VFS) with 10 simultaneous
create processes hitting the same directory:

client1 # for x in {1..1000}; do
    echo /srv/server1/data/touch.$x
done | xargs -n1 -P 10 -iX -t touch X 2>&1 | pv -l -a >|/dev/null

Without the patch (on the client), this reaches a steady state of 2.4
creates/s, and increasing the number of parallel create processes does
not change this aggregate performance.

With the patch, the creation rate increases to 15 creates/s and with
100 processes, it further scales up to 121 creates/s.

Now for the re-export case (NFSD), where an intermediary server
re-exports the originating server (200ms away) to clients on its local
LAN: there is no noticeable improvement for a single (unpatched)
client, but we do see an aggregate improvement when we use multiple
clients at once.
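
For context, a re-export setup of this kind looks roughly like the
following (the mount options, paths and fsid value are illustrative
assumptions rather than our exact configuration):

reexport1 # mount -o vers=3 server1:/data /srv/reexport1/data
reexport1 # cat /etc/exports
/srv/reexport1/data  *(rw,no_subtree_check,fsid=1234)
reexport1 # exportfs -ra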

# pdsh -Rssh -w 'client[1-10]' 'for x in {1..1000}; do
    echo /srv/reexport1/data/$(hostname -s).$x
done | xargs -n1 -P 10 -iX -t touch X 2>&1' | pv -l -a >|/dev/null

Without the patch applied to the re-export server, the aggregate is
around 2.2 creates/s, which is similar to doing it directly to the
originating server from a single client (above).

With the patch, the aggregate increases to 15 creates/s for 10 clients,
which again matches the results of a single patched client. Not quite a
10x increase, but a healthy improvement nonetheless.

However, it is at this point that I started to experience some
stability issues with the re-export server that are not present with
the vanilla unpatched v5.19-rc2 kernel. In particular the knfsd
threads start to lock up with stack traces like this:

[ 1234.460696] INFO: task nfsd:5514 blocked for more than 123 seconds.
[ 1234.461481]       Tainted: G        W   E     5.19.0-1.dneg.x86_64 #1
[ 1234.462289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1234.463227] task:nfsd            state:D stack:    0 pid: 5514 ppid:     2 flags:0x00004000
[ 1234.464212] Call Trace:
[ 1234.464677]  <TASK>
[ 1234.465104]  __schedule+0x2a9/0x8a0
[ 1234.465663]  schedule+0x55/0xc0
[ 1234.466183]  ? nfs_lookup_revalidate_dentry+0x3a0/0x3a0 [nfs]
[ 1234.466995]  __nfs_lookup_revalidate+0xdf/0x120 [nfs]
[ 1234.467732]  ? put_prev_task_stop+0x170/0x170
[ 1234.468374]  nfs_lookup_revalidate+0x15/0x20 [nfs]
[ 1234.469073]  lookup_dcache+0x5a/0x80
[ 1234.469639]  lookup_one_unlocked+0x59/0xa0
[ 1234.470244]  lookup_one_len_unlocked+0x1d/0x20
[ 1234.470951]  nfsd_lookup_dentry+0x190/0x470 [nfsd]
[ 1234.471663]  nfsd_lookup+0x88/0x1b0 [nfsd]
[ 1234.472294]  nfsd3_proc_lookup+0xb4/0x100 [nfsd]
[ 1234.473012]  nfsd_dispatch+0x161/0x290 [nfsd]
[ 1234.473689]  svc_process_common+0x48a/0x620 [sunrpc]
[ 1234.474402]  ? nfsd_svc+0x330/0x330 [nfsd]
[ 1234.475038]  ? nfsd_shutdown_threads+0xa0/0xa0 [nfsd]
[ 1234.475772]  svc_process+0xbc/0xf0 [sunrpc]
[ 1234.476408]  nfsd+0xda/0x190 [nfsd]
[ 1234.477011]  kthread+0xf0/0x120
[ 1234.477522]  ? kthread_complete_and_exit+0x20/0x20
[ 1234.478199]  ret_from_fork+0x22/0x30
[ 1234.478755]  </TASK>

For whatever reason, these hangs seem to affect our Netapp mounts and
re-exports rather than our originating Linux NFS servers (against which
all the tests above were done). This may be related to the fact that
those Netapps serve our home directories, so there could be some unique
locking patterns going on there.

This issue made things a bit too unstable to test at larger scales or
with our production workloads.
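
(For reference, the traces above come from the hung task detector; the
stacks of all blocked tasks can also be dumped on demand via sysrq,
assuming sysrq is enabled, e.g.:)

reexport1 # echo w > /proc/sysrq-trigger
reexport1 # dmesg | tail -n 200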

So all in all, the performance improvements in the knfsd re-export
case are looking great, and we have real world use cases that this
helps with (batch processing workloads with latencies >10ms). If we can
figure out the hanging knfsd threads, then I can test it more heavily.

Many thanks,

Daire

On Tue, 14 Jun 2022 at 00:19, NeilBrown <neilb@suse.de> wrote:
>
> VFS currently holds an exclusive lock on a directory during create,
> unlink, rename.  This imposes serialisation on all filesystems though
> some may not benefit from it, and some may be able to provide finer
> grained locking internally, thus reducing contention.
>
> This series allows the filesystem to request that the inode lock be
> shared rather than exclusive.  In that case an exclusive lock will be
> held on the dentry instead, much as is done for parallel lookup.
>
> The NFS filesystem can easily support concurrent updates (server does
> any needed serialisation) so it is converted.
>
> This series also converts nfsd to use the new interfaces so concurrent
> incoming NFS requests in the one directory can be handled concurrently.
>
> As a net result, if an NFS mounted filesystem is reexported over NFS,
> then multiple clients can create files in a single directory and all
> synchronisation will be handled on the final server.  This helps hide
> latency on the link from client to server.
>
> I include a few nfsd patches that aren't strictly needed for this work,
> but seem to be a logical consequence of the changes that I did have to
> make.
>
> I have only tested this lightly.  In particular the rename support is
> quite new and I haven't tried to break it yet.
>
> I post this for general review, and hopefully extra testing...  Daire
> Byrne has expressed interest in the NFS re-export parallelism.
>
> NeilBrown
>
>
> ---
>
> NeilBrown (12):
>       VFS: support parallel updates in the one directory.
>       VFS: move EEXIST and ENOENT tests into lookup_hash_update()
>       VFS: move want_write checks into lookup_hash_update()
>       VFS: move dput() and mnt_drop_write() into done_path_update()
>       VFS: export done_path_update()
>       VFS: support concurrent renames.
>       NFS: support parallel updates in the one directory.
>       nfsd: allow parallel creates from nfsd
>       nfsd: support concurrent renames.
>       nfsd: reduce locking in nfsd_lookup()
>       nfsd: use (un)lock_inode instead of fh_(un)lock
>       nfsd: discard fh_locked flag and fh_lock/fh_unlock
>
>
>  fs/dcache.c            |  59 ++++-
>  fs/namei.c             | 578 ++++++++++++++++++++++++++++++++---------
>  fs/nfs/dir.c           |  29 ++-
>  fs/nfs/inode.c         |   2 +
>  fs/nfs/unlink.c        |   5 +-
>  fs/nfsd/nfs2acl.c      |   6 +-
>  fs/nfsd/nfs3acl.c      |   4 +-
>  fs/nfsd/nfs3proc.c     |  37 +--
>  fs/nfsd/nfs4acl.c      |   7 +-
>  fs/nfsd/nfs4proc.c     |  61 ++---
>  fs/nfsd/nfs4state.c    |   8 +-
>  fs/nfsd/nfsfh.c        |  10 +-
>  fs/nfsd/nfsfh.h        |  58 +----
>  fs/nfsd/nfsproc.c      |  31 +--
>  fs/nfsd/vfs.c          | 243 ++++++++---------
>  fs/nfsd/vfs.h          |   8 +-
>  include/linux/dcache.h |  27 ++
>  include/linux/fs.h     |   1 +
>  include/linux/namei.h  |  30 ++-
>  19 files changed, 791 insertions(+), 413 deletions(-)
>
> --
> Signature
>
