fstests.vger.kernel.org archive mirror
* [NYE DELUGE 2/4] xfs: online repair in its entirety
From: Darrick J. Wong @ 2022-12-30 21:14 UTC
  To: Dave Chinner, Allison Henderson, Chandan Babu R, Catherine Hoang, djwong
  Cc: xfs, greg.marsden, shirley.ma, konrad.wilk, linux-fsdevel,
	Matthew Wilcox, tpkelly, smahar, Christoph Hellwig, fstests,
	Zorro Lang, Carlos Maiolino

Hi everyone,

As I've mentioned several times throughout 2022, I would like to merge
the online fsck feature in time for the 2023 LTS kernel.  This is the
second part of that effort.

This deluge contains all of the online repair kernel code, a significant
amount of restructuring of how repairs work in the userspace driver
program, and a ton of fstests updates to provide automated fuzz testing
and stress testing of forced repairs.

Within the kernel section, the major pieces are the use of tmpfs files
to provide pageable kernel memory for staging repair information;
lightweight hooks into the main xfs filesystem for scrub via jump
labels; coordinated inode scans for live index construction; and the
atomic file mapping swap feature.

Changes to the userspace driver program fall into three main categories:
restructuring how repairs are scheduled so that they're tracked by inode
or AG; establishing data dependency chains so that we scan and repair
things in the correct order; and reworking the systemd background
services to be more secure, enable periodic media scans, and provide
some semblance of fs corruption reporting.

The fstests changes are a substantial reworking of the fuzzing code to
fit the testing described in the design documentation; adding stress
testing of online repairs vs. fsstress; and functional tests for all the
new features that ride in with online repair.

For this review, I would like people to focus on the following:

- Are the major subsystems sufficiently documented that you could figure
  out what the code does?

- Do you see any problems that are severe enough to cause long term
  support hassles? (e.g. bad API design, writing weird metadata to disk)

- Can you spot mis-interactions between the subsystems?

- What were my blind spots in devising this feature?

- Are there missing pieces that you'd like to help build?

- Can I just merge all of this?

The one thing that is /not/ in scope for this review is any request for
more refactoring of existing subsystems.  While there are usually valid
arguments for performing such cleanups, those are separate tasks to be
prioritized separately.  I will get to them after merging online fsck,
because revising existing subsystems generally involves rebasing work
in this patchset, which means the affected patches need re-reviewing.
Unless it's absolutely necessary, this just creates more work for
everybody.

I've been running daily online /repairs/ of every computer I own for
the last eight months.  All modifications so far have been to optimize
data structures (holes in the xattr structures, excessively large rmap
btrees, and bugs in quota resource counter updates).  So far, no damage
has resulted from these operations.  All issues observed in that time
have been corrected in this submission.

Fuzz and stress testing of online repairs have been running well for a
year now.  As of this writing, online repair can fix slightly more
things than offline repair, and the fsstress+repair long soak test has
passed 100 million repairs with zero problems observed.

(For comparison, the long soak fsx test recently passed 92 billion file
operations, so online fsck has a ways to go...)

As a warning, the patches will likely take several days to trickle in.
While everyone else looks at this, I plan to prototype directory tree
reconstruction with Allison's parent pointers v27 patchset.  Having a
user of that functionality is (I think) the last major hurdle to
ensuring that parent pointers are a good fit for the problems that need
solving, which in turn is the last requirement for merging that feature.

--D


Thread overview: 106+ messages
2022-12-30 21:14 [NYE DELUGE 2/4] xfs: online repair in its entirety Darrick J. Wong
2022-12-30 22:19 ` [PATCHSET v24.0 0/5] fstests: race online scrub with fsstress Darrick J. Wong
2022-12-30 22:19   ` [PATCH 2/5] xfs: race fsstress with online scrubbers for AG and fs metadata Darrick J. Wong
2023-02-05 13:04     ` Zorro Lang
2023-02-07 16:58       ` Darrick J. Wong
2023-02-07 17:02     ` [PATCH v24.1 " Darrick J. Wong
2023-02-07 18:45       ` Zorro Lang
2022-12-30 22:19   ` [PATCH 5/5] xfs: race fsstress with online scrubbers for file metadata Darrick J. Wong
2022-12-30 22:19   ` [PATCH 3/5] fuzzy: add a custom xfs find utility for scrub stress tests Darrick J. Wong
2023-02-05 12:57     ` Zorro Lang
2023-02-07 16:57       ` Darrick J. Wong
2023-02-07 17:01     ` [PATCH v24.1 " Darrick J. Wong
2023-02-07 18:42       ` Zorro Lang
2022-12-30 22:19   ` [PATCH 1/5] xfs/357: switch fuzzing to agi 1 Darrick J. Wong
2023-02-07 18:46     ` Zorro Lang
2022-12-30 22:19   ` [PATCH 4/5] fuzzy: allow xfs scrub stress tests to pick preconfigured fsstress configs Darrick J. Wong
2023-02-07 18:48     ` Zorro Lang
2022-12-30 22:19 ` [PATCHSET v24.0 0/1] xfs: force rebuilding of metadata Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/1] fuzzy: use FORCE_REBUILD over injecting force_repair Darrick J. Wong
2023-02-14  8:00     ` Zorro Lang
2023-02-14 18:18       ` Darrick J. Wong
2023-02-16 14:57         ` Zorro Lang
2022-12-30 22:19 ` [PATCHSET v24.0 0/2] fstests: online repair of AG btrees Darrick J. Wong
2022-12-30 22:19   ` [PATCH 2/2] xfs: stress test ag repair functions Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/2] xfs: test rebuilding the entire filesystem with online fsck Darrick J. Wong
2023-02-18  6:06   ` [PATCHSET v24.0 0/2] fstests: online repair of AG btrees Zorro Lang
2022-12-30 22:19 ` [PATCHSET v24.0 0/1] fstests: online repair of inodes Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/1] xfs: race fsstress with online repair for inode record metadata Darrick J. Wong
2023-02-18  6:07     ` Zorro Lang
2022-12-30 22:19 ` [PATCHSET v24.0 0/4] fstests: online repair of file fork mappings Darrick J. Wong
2022-12-30 22:19   ` [PATCH 4/4] xfs: race fsstress with online repair for special file metadata Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/4] xfs: test rebuilding xattrs when the data fork is btree format Darrick J. Wong
2022-12-30 22:19   ` [PATCH 3/4] xfs: ensure that online file data fork repairs don't hit EDQUOT Darrick J. Wong
2022-12-30 22:19   ` [PATCH 2/4] xfs: race fsstress with online repair for inode and fork metadata Darrick J. Wong
2023-02-18  6:07   ` [PATCHSET v24.0 0/4] fstests: online repair of file fork mappings Zorro Lang
2022-12-30 22:19 ` [PATCHSET v24.0 0/1] fstests: online repair of quota and counters Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/1] xfs: race fsstress with online scrub and repair for quota metadata Darrick J. Wong
2023-02-18  6:10     ` Zorro Lang
2023-02-18  6:12     ` Zorro Lang
2022-12-30 22:19 ` [PATCHSET v24.0 0/1] fstests: online repair of quota counters Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/1] xfs: race fsstress with online scrub and repair for quotacheck Darrick J. Wong
2023-02-18  6:12     ` Zorro Lang
2022-12-30 22:19 ` [PATCHSET v24.0 0/1] fstests: online repair of file link counts Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/1] xfs: race fsstress with inode link count check and repair Darrick J. Wong
2023-02-18  6:13     ` Zorro Lang
2022-12-30 22:19 ` [PATCHSET v24.0 0/2] fstests: online repair for fs summary counters Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/2] xfs: test fs summary counter online repair Darrick J. Wong
2022-12-30 22:19   ` [PATCH 2/2] xfs: race fsstress with online repair for summary counters Darrick J. Wong
2023-02-18  6:14   ` [PATCHSET v24.0 0/2] fstests: online repair for fs " Zorro Lang
2022-12-30 22:19 ` [PATCHSET v24.0 0/1] fstests: online repair of rmap btrees Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/1] xfs/422: don't freeze while racing rmap repair and fsstress Darrick J. Wong
2023-02-18  6:15     ` Zorro Lang
2022-12-30 22:19 ` [PATCHSET v24.0 0/2] fstests: fix a few bugs in fs population Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/2] populate: take a snapshot of the filesystem if creation fails Darrick J. Wong
2022-12-30 22:19   ` [PATCH 2/2] populate: fix some weirdness in __populate_check_xfs_agbtree_height Darrick J. Wong
2023-02-18  6:16   ` [PATCHSET v24.0 0/2] fstests: fix a few bugs in fs population Zorro Lang
2022-12-30 22:19 ` [PATCHSET v24.0 00/24] fstests: improve xfs fuzzing Darrick J. Wong
2022-12-30 22:19   ` [PATCH 01/24] fuzzy: disable per-field random fuzzing by default Darrick J. Wong
2022-12-30 22:19   ` [PATCH 02/24] fuzzy: disable timstamp " Darrick J. Wong
2022-12-30 22:19   ` [PATCH 07/24] fuzzy: don't fuzz xattr namespace flags and values Darrick J. Wong
2022-12-30 22:19   ` [PATCH 05/24] fuzzy: don't fuzz inode generation numbers Darrick J. Wong
2022-12-30 22:19   ` [PATCH 03/24] fuzzy: don't fuzz the log sequence number Darrick J. Wong
2022-12-30 22:19   ` [PATCH 06/24] fuzzy: don't fuzz user-controllable inode flags Darrick J. Wong
2022-12-30 22:19   ` [PATCH 04/24] fuzzy: don't fuzz obsolete inode fields Darrick J. Wong
2022-12-30 22:19   ` [PATCH 12/24] common/fuzzy: fix some problems with the offline repair strategy Darrick J. Wong
2022-12-30 22:19   ` [PATCH 11/24] common/fuzzy: fix some problems with the online " Darrick J. Wong
2022-12-30 22:19   ` [PATCH 08/24] common/fuzzy: split out each repair strategy into a separate helper Darrick J. Wong
2022-12-30 22:19   ` [PATCH 10/24] common/fuzzy: hoist the post-repair fs modification step Darrick J. Wong
2022-12-30 22:19   ` [PATCH 09/24] common/fuzzy: add an underline to the full log between sections Darrick J. Wong
2022-12-30 22:19   ` [PATCH 13/24] common/fuzzy: fix some problems with the no-repair strategy Darrick J. Wong
2022-12-30 22:19   ` [PATCH 14/24] common/fuzzy: fix some problems with the online-then-offline repair strategy Darrick J. Wong
2022-12-30 22:19   ` [PATCH 16/24] xfs/{35[45],455}: fix bogus corruption errors Darrick J. Wong
2022-12-30 22:19   ` [PATCH 21/24] fuzzy: compress coredumps created while fuzzing Darrick J. Wong
2022-12-30 22:19   ` [PATCH 15/24] common/fuzzy: fix some problems with the post-repair fs modification code Darrick J. Wong
2022-12-30 22:19   ` [PATCH 19/24] common/fuzzy: exercise the filesystem a little harder after repairing Darrick J. Wong
2022-12-30 22:19   ` [PATCH 22/24] fuzzy: report the fuzzing repair strategy in seqres.full Darrick J. Wong
2022-12-30 22:19   ` [PATCH 17/24] common/fuzzy: evaluate xfs_check vs xfs_repair Darrick J. Wong
2022-12-30 22:19   ` [PATCH 20/24] fuzzy: dump metadata state before fuzzing Darrick J. Wong
2022-12-30 22:19   ` [PATCH 18/24] common: check xfs health after doing an online scrub Darrick J. Wong
2022-12-30 22:19   ` [PATCH 24/24] fuzzy: for fuzzing ag btrees, find the path to the AG header Darrick J. Wong
2022-12-30 22:19   ` [PATCH 23/24] xfs: improve metadata array field handling when fuzzing Darrick J. Wong
2022-12-30 22:19 ` [PATCHSET v24.0 0/5] fstests: strengthen fuzz testing Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/5] fuzzy: test fuzzing directory block mappings Darrick J. Wong
2022-12-30 22:19   ` [PATCH 2/5] fuzzy: test fuzzing xattr " Darrick J. Wong
2022-12-30 22:19   ` [PATCH 5/5] fuzzy: fuzz test key/pointers of inode btrees Darrick J. Wong
2022-12-30 22:19   ` [PATCH 4/5] xfs: fuzz test both repair strategies Darrick J. Wong
2022-12-30 22:19   ` [PATCH 3/5] fuzzy: test fuzzing realtime free space metadata Darrick J. Wong
2022-12-30 22:19 ` [PATCHSET v24.0 0/7] fstests: atomic file updates Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/7] xfs/122: fix for swapext log items Darrick J. Wong
2022-12-30 22:19   ` [PATCH 5/7] generic: test that file privilege gets dropped with FIEXCHANGE_RANGE Darrick J. Wong
2022-12-30 22:19   ` [PATCH 4/7] generic, xfs: test scatter-gather atomic file updates Darrick J. Wong
2022-12-30 22:19   ` [PATCH 3/7] generic: test new vfs swapext ioctl Darrick J. Wong
2022-12-30 22:19   ` [PATCH 2/7] generic: test old xfs extent swapping ioctl Darrick J. Wong
2022-12-30 22:19   ` [PATCH 7/7] fsstress: update for FIEXCHANGE_RANGE Darrick J. Wong
2022-12-30 22:19   ` [PATCH 6/7] fsx: support FIEXCHANGE_RANGE Darrick J. Wong
2023-02-28  1:55     ` Zorro Lang
2023-03-01  2:56       ` Darrick J. Wong
2022-12-30 22:19 ` [PATCHSET v24.0 0/1] fstests: online repair of realtime summaries Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/1] xfs: race fsstress with online repair of realtime summary files Darrick J. Wong
2022-12-30 22:19 ` [PATCHSET v24.0 0/1] fstests: online repair of extended attributes Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/1] xfs: race fsstress with online repair of extended attribute data Darrick J. Wong
2022-12-30 22:19 ` [PATCHSET v24.0 0/2] fstests: online repair of directories Darrick J. Wong
2022-12-30 22:19   ` [PATCH 1/2] xfs: ensure that online directory repairs don't hit EDQUOT Darrick J. Wong
2022-12-30 22:19   ` [PATCH 2/2] xfs: race fsstress with online repair of dirs and parent pointers Darrick J. Wong
2022-12-30 22:20 ` [PATCHSET v24.0 0/1] fstests: test automatic scrub optimization by default Darrick J. Wong
2022-12-30 22:20   ` [PATCH 1/1] xfs: test xfs_scrub dry run, preen, and repair mode Darrick J. Wong
