From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 0/7] repair: Phase 6 performance improvements
Date: Thu, 22 Oct 2020 16:15:30 +1100
Message-ID: <20201022051537.2286402-1-david@fromorbit.com>
Hi folks,
Phase 6 is single threaded, processing a single AG at a time and a
single directory inode at a time. Phase 6 is often IO latency bound
despite the prefetching it does, resulting in low disk utilisation
and high runtimes. The solution is the same as for phases 3 and 4:
scan multiple AGs at once for directory inodes to process. This
patch set enables phase 6 to scan multiple AGs at once, and hence
requires concurrent updates of inode records as they can now be
accessed and modified by multiple scanning threads. We also need to
protect the bad inodes list from concurrent access, and then we can
enable concurrent processing of directories. A sketch of the bad
inode list locking pattern is included below.
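The pattern is simply a global mutex around all list manipulation;
the names and list structure in this sketch are illustrative, not
the actual repair/ code, and error handling is elided:

#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <pthread.h>

struct bad_ino {
        struct bad_ino  *next;
        uint64_t        ino;
};

static struct bad_ino   *bad_ino_list;
static pthread_mutex_t  bad_ino_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called concurrently from multiple AG scanning threads. */
static void
add_bad_ino(uint64_t ino)
{
        struct bad_ino  *b = malloc(sizeof(*b));

        b->ino = ino;
        pthread_mutex_lock(&bad_ino_lock);
        b->next = bad_ino_list;
        bad_ino_list = b;
        pthread_mutex_unlock(&bad_ino_lock);
}

/* Lookups take the same lock so the list cannot change underfoot. */
static bool
is_bad_ino(uint64_t ino)
{
        struct bad_ino  *b;
        bool            found = false;

        pthread_mutex_lock(&bad_ino_lock);
        for (b = bad_ino_list; b; b = b->next) {
                if (b->ino == ino) {
                        found = true;
                        break;
                }
        }
        pthread_mutex_unlock(&bad_ino_lock);
        return found;
}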
However, directory entry checking and reconstruction can also be CPU
bound - large directories overwhelm the directory name hash
structures because the algorithms have poor scalability - one is
O(n + n^2), another is O(n^2) when the number of dirents greatly
exceeds the hash table size. Hence we need to do more than just
parallelise across AGs - we need to parallelise processing within
AGs so that a single large directory doesn't completely serialise
processing within an AG. This is done by using bound-depth
workqueues to allow inode records to be processed asynchronously as
they are fetched from disk; a sketch of the throttling mechanism
follows.
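The key property of a bound-depth workqueue is that work submission
blocks once a maximum number of items is queued, so the inode record
fetcher cannot race arbitrarily far ahead of a worker stuck on a
huge directory. This is a minimal sketch of that throttling logic,
independent of the actual libfrog/workqueue.c changes; all names
here are illustrative:

#include <pthread.h>

struct bound_wq {
        pthread_mutex_t lock;
        pthread_cond_t  queue_low;
        unsigned int    queued;         /* submitted, not yet completed */
        unsigned int    max_queued;     /* 0 means unbounded */
};

static void
bound_wq_init(struct bound_wq *wq, unsigned int max_queued)
{
        pthread_mutex_init(&wq->lock, NULL);
        pthread_cond_init(&wq->queue_low, NULL);
        wq->queued = 0;
        wq->max_queued = max_queued;
}

/* Producer side: blocks while the queue is at its depth bound. */
static void
bound_wq_add(struct bound_wq *wq)
{
        pthread_mutex_lock(&wq->lock);
        while (wq->max_queued && wq->queued >= wq->max_queued)
                pthread_cond_wait(&wq->queue_low, &wq->lock);
        wq->queued++;
        /* link the work item and wake a worker thread here */
        pthread_mutex_unlock(&wq->lock);
}

/* Worker side: completing an item may unthrottle the producer. */
static void
bound_wq_done(struct bound_wq *wq)
{
        pthread_mutex_lock(&wq->lock);
        wq->queued--;
        pthread_cond_signal(&wq->queue_low);
        pthread_mutex_unlock(&wq->lock);
}

Blocking the producer rather than failing the submission keeps the
submission path simple and bounds both memory usage and the latency
between fetching an inode record and processing it.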
Further, we need to fix the bad algorithmic scalability of the
in-memory directory tracking structures. This is done through a
combination of better structures and more appropriate dynamic size
choices; the sketch below illustrates the structural change.
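To illustrate: the fixed-size hash keyed on dirent address degrades
to long chain walks on multi-million entry directories, whereas a
radix tree indexed by the dirent's address scales with tree height.
In this sketch, the radix_tree_*() calls assume the kernel-style API
that libfrog/radix-tree.c provides; the struct and function names
around them are illustrative, not the actual repair/ code:

#include "radix-tree.h" /* libfrog's kernel-style radix tree */

struct dir_hash_ent;    /* per-dirent tracking record */

struct dir_byaddr {
        struct radix_tree_root  root;   /* dirent address -> entry */
};

static void
byaddr_init(struct dir_byaddr *t)
{
        INIT_RADIX_TREE(&t->root, 0);
}

static int
byaddr_insert(struct dir_byaddr *t, unsigned long addr,
                struct dir_hash_ent *ent)
{
        /* O(tree height) instead of walking an overlong hash chain */
        return radix_tree_insert(&t->root, addr, ent);
}

static struct dir_hash_ent *
byaddr_lookup(struct dir_byaddr *t, unsigned long addr)
{
        return radix_tree_lookup(&t->root, addr);
}

A radix tree keyed this way also permits cheap in-order traversal of
entries by address, which suits the by-address walks phase 6 does.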
The results on a filesystem with a single 10 million entry directory
containing 400MB of directory entry data are as follows:
v5.6.0 (Baseline):

XFS_REPAIR Summary    Thu Oct 22 12:10:52 2020

Phase           Start           End             Duration
Phase 1:        10/22 12:06:41  10/22 12:06:41
Phase 2:        10/22 12:06:41  10/22 12:06:41
Phase 3:        10/22 12:06:41  10/22 12:07:00  19 seconds
Phase 4:        10/22 12:07:00  10/22 12:07:12  12 seconds
Phase 5:        10/22 12:07:12  10/22 12:07:13  1 second
Phase 6:        10/22 12:07:13  10/22 12:10:51  3 minutes, 38 seconds
Phase 7:        10/22 12:10:51  10/22 12:10:51

Total run time: 4 minutes, 10 seconds

real    4m11.151s
user    4m20.083s
sys     0m14.744s
5.9.0-rc1 + patchset:

XFS_REPAIR Summary    Thu Oct 22 13:19:02 2020

Phase           Start           End             Duration
Phase 1:        10/22 13:18:09  10/22 13:18:09
Phase 2:        10/22 13:18:09  10/22 13:18:09
Phase 3:        10/22 13:18:09  10/22 13:18:31  22 seconds
Phase 4:        10/22 13:18:31  10/22 13:18:45  14 seconds
Phase 5:        10/22 13:18:45  10/22 13:18:45
Phase 6:        10/22 13:18:45  10/22 13:19:00  15 seconds
Phase 7:        10/22 13:19:00  10/22 13:19:00

Total run time: 51 seconds

real    0m52.375s
user    1m3.739s
sys     0m20.346s
Performance improvements on filesystems with small directories and
really fast storage are, at best, modest. The big improvements are
seen with really large directories and/or relatively slow devices
that are IO latency bound and benefit from having more IO in flight
at once.
Cheers,
Dave.
Dave Chinner (7):
workqueue: bound maximum queue depth
repair: Protect bad inode list with mutex
repair: protect inode chunk tree records with a mutex
repair: parallelise phase 6
repair: don't duplicate names in phase 6
repair: convert the dir byaddr hash to a radix tree
repair: scale duplicate name checking in phase 6.
 libfrog/radix-tree.c |  46 +++++
 libfrog/workqueue.c  |  42 ++++-
 libfrog/workqueue.h  |   4 +
 repair/dir2.c        |  32 ++--
 repair/incore.h      |  23 +++
 repair/incore_ino.c  |  15 ++
 repair/phase6.c      | 396 +++++++++++++++++++++----------------------
 7 files changed, 338 insertions(+), 220 deletions(-)
--
2.28.0