* [PATCH v12 00/20] xfs: online repair support
@ 2018-02-23  2:01 Darrick J. Wong
  2018-02-23  2:01 ` [PATCH 01/20] xfs: add helpers to calculate btree size Darrick J. Wong
                   ` (19 more replies)
  0 siblings, 20 replies; 25+ messages in thread
From: Darrick J. Wong @ 2018-02-23  2:01 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This is the twelfth revision of a patchset that adds to XFS kernel
support for online metadata scrubbing and repair.  There aren't any
on-disk format changes.

The first five patches add or expose various libxfs helpers that the
online repair code will use to reconstruct broken metadata.  Most
notably we add a NORMAP flag to the bmapi functions so that we can
use rmap data to rebuild block maps.

Patch six allows us to disable inode reclamation temporarily for the few
things that require full filesystem scans; at the moment that is
limited to the rmap rebuilder.

Patches 7-20 introduce the online repair functionality for space
metadata.  Our general strategy for rebuilding damaged primary metadata
is to rebuild the structure completely from secondary metadata and free
the old structure after the fact; we do not try to salvage anything.
Consequently, online repair requires rmapbt.  Rebuilding the secondary
metadata (rmap) is much harder -- due to our locking rules (primary and
then secondary) we have to shut down the filesystem temporarily while we
scan all the primary metadata for data to put in the new secondary
structure.
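
To make "rebuild the primary structure from secondary metadata" a bit
more concrete, here is a minimal userspace sketch, not the kernel code.
It assumes a simplified model in which the rmap records of one AG are
sorted by start block and any block not covered by a record is free.
The sk_* names and the struct layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: [start, start + len) in AG blocks. */
struct sk_extent {
	uint32_t start;
	uint32_t len;
};

/*
 * Derive free extents from reverse-mapping records.  The records must be
 * sorted by start block; they may overlap (shared blocks).  Every gap
 * between owned ranges, plus the tail of the AG, is reported as free.
 * Returns the number of free extents written to out; extents beyond
 * out_cap are silently dropped in this sketch.
 */
static size_t sk_free_from_rmap(const struct sk_extent *rmap, size_t nr,
				uint32_t ag_blocks,
				struct sk_extent *out, size_t out_cap)
{
	uint32_t next = 0;	/* first block not yet known to be owned */
	size_t n = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		uint32_t end = rmap[i].start + rmap[i].len;

		/* A gap before this record means nobody owns those blocks. */
		if (rmap[i].start > next && n < out_cap) {
			out[n].start = next;
			out[n].len = rmap[i].start - next;
			n++;
		}
		if (end > next)
			next = end;
	}

	/* Anything after the last owned block is free too. */
	if (next < ag_blocks && n < out_cap) {
		out[n].start = next;
		out[n].len = ag_blocks - next;
		n++;
	}
	return n;
}
```

With records {0,4}, {4,2}, {10,5}, {12,1} in a 20-block AG, the sketch
reports blocks 6-9 and 15-19 as free.  Nothing is salvaged from the old
structure; the result depends only on the rmap records.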

Reconstructing inodes is difficult -- the ability to rebuild files
depends on the filesystem being able to load an inode (xfs_iget), which
means repair has to know how to zap any part of an inode record that
might trigger corruption errors from iget.  To that end, we can now
reset most of an inode record or an inode fork so that we can rebuild
the file.

The refcount rebuilder is more or less the same algorithm that
xfs_repair uses, but modified to reflect the constraints of running in
kernel space.

For rmap rebuilds, we cannot have anything on the filesystem taking
exclusive locks and we cannot have any allocation activity at all.
Therefore, we start by freezing the filesystem to allow other
transactions to finish.  Then, we disable periodic inode reclaim and
roll the freeze back just enough so that we can create our own
transactions but other writes will block.  Next, we scan all other AG
metadata structures, every inode, and every block map to reconstruct the
rmap data.  Then, we reinitialize the rmap btree root and reload the
rmap btree.  Finally, we release all the resources we grabbed and the
filesystem returns to normal.
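
The ordering above can be sketched as a small userspace model.  This is
only an illustration of the sequence, not the patch code: the step names,
the sk_* identifiers, and the choice to always run the release step even
when an earlier step fails are assumptions.

```c
/* Hypothetical names for the steps described above, in order. */
enum sk_step {
	SK_FREEZE,		/* freeze so other transactions finish */
	SK_STOP_RECLAIM,	/* disable periodic inode reclaim */
	SK_PARTIAL_THAW,	/* allow our transactions, block other writes */
	SK_SCAN,		/* scan AG metadata, inodes, block maps */
	SK_REINIT_ROOT,		/* reinitialize the rmap btree root */
	SK_RELOAD,		/* reload the rmap btree */
	SK_RELEASE,		/* drop everything we grabbed */
	SK_NR_STEPS
};

struct sk_ctx {
	int trace[SK_NR_STEPS];	/* steps in the order they ran */
	int ntrace;
	int fail_at;		/* step that should fail, or -1 for none */
};

/* Stand-in for doing the real work of one step; it only records it. */
static int sk_do(struct sk_ctx *c, int step)
{
	c->trace[c->ntrace++] = step;
	return step == c->fail_at ? -1 : 0;
}

/* Run the steps in order; stop at the first error, but always release. */
static int sk_rebuild_rmap(struct sk_ctx *c)
{
	int error = 0;
	int step;

	for (step = SK_FREEZE; step < SK_RELEASE; step++) {
		error = sk_do(c, step);
		if (error)
			break;
	}
	sk_do(c, SK_RELEASE);
	return error;
}
```

The point of the model is that the scan only starts after the freeze,
the reclaim shutdown, and the partial thaw have all succeeded, and that
the release step is the single exit path.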

Looking forward, the parent pointer feature that Allison Henderson is
working on will enable us to reconstruct directories, at which point
we'll be able to reconstruct most of a lightly damaged filesystem.  But
that's future talk.

If you're going to start using this mess, you probably ought to just
pull from my git trees.  The kernel patches[1] should apply against
4.16-rc2.  xfsprogs[2] and xfstests[3] can be found in their usual
places.  The git trees contain all four series' worth of changes.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel

* [PATCH v13 00/20] xfs-4.17: online repair support
@ 2018-03-15 20:26 Darrick J. Wong
  2018-03-15 20:26 ` [PATCH 03/20] xfs: add repair helpers for the reverse mapping btree Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2018-03-15 20:26 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This is the thirteenth revision of a patchset that adds to XFS kernel
support for online metadata scrubbing and repair.  There aren't any
on-disk format changes.

The first five patches add or expose various libxfs helpers that the
online repair code will use to reconstruct broken metadata.  Most
notably we add a NORMAP flag to the bmapi functions so that we can
use rmap data to rebuild block maps.

Patch six allows us to disable inode reclamation temporarily for the few
things that require full filesystem scans; at the moment that is
limited to the rmap rebuilder.

Patches 7-20 introduce the online repair functionality for space
metadata.  Our general strategy for rebuilding damaged primary metadata
is to rebuild the structure completely from secondary metadata and free
the old structure after the fact; we do not try to salvage anything.
Consequently, online repair requires rmapbt.  Rebuilding the secondary
metadata (rmap) is much harder -- due to our locking rules (primary and
then secondary) we have to shut down the filesystem temporarily while we
scan all the primary metadata for data to put in the new secondary
structure.

Reconstructing inodes is difficult -- the ability to rebuild files
depends on the filesystem being able to load an inode (xfs_iget), which
means repair has to know how to zap any part of an inode record that
might trigger corruption errors from iget.  To that end, we can now
reset most of an inode record or an inode fork so that we can rebuild
the file.

The refcount rebuilder is more or less the same algorithm that
xfs_repair uses, but modified to reflect the constraints of running in
kernel space.

For rmap rebuilds, we cannot have anything on the filesystem taking
exclusive locks and we cannot have any allocation activity at all.
Therefore, we start by freezing the filesystem to allow other
transactions to finish.  Then, we disable periodic inode reclaim and
roll the freeze back just enough so that we can create our own
transactions but other writes will block.  Next, we scan all other AG
metadata structures, every inode, and every block map to reconstruct the
rmap data.  Then, we reinitialize the rmap btree root and reload the
rmap btree.  Finally, we release all the resources we grabbed and the
filesystem returns to normal.

Looking forward, the parent pointer feature that Allison Henderson is
working on will enable us to reconstruct directories, at which point
we'll be able to reconstruct most of a lightly damaged filesystem.  But
that's future talk.

If you're going to start using this mess, you probably ought to just
pull from my git trees.  The kernel patches[1] should apply against
4.16-rc5.  xfsprogs[2] and xfstests[3] can be found in their usual
places.  The git trees contain all four series' worth of changes.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel

* [PATCH v14 00/20] xfs-4.17: online repair support
@ 2018-03-26 23:55 Darrick J. Wong
  2018-03-26 23:56 ` [PATCH 03/20] xfs: add repair helpers for the reverse mapping btree Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2018-03-26 23:55 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This is the fourteenth revision of a patchset that adds to XFS kernel
support for online metadata scrubbing and repair.  There aren't any
on-disk format changes.

New since v13 of these patches is the addition of a new output flag
(XFS_SCRUB_OFLAG_UNTOUCHED) that is set when userspace has requested a
repair or a preen, but the kernel did not find that the metadata needed
fixing or optimization.  The flag was added because a misreporting
problem was discovered in xfs_scrub.  If metadata objects A and B can be
cross-referenced, a corruption in B results in xfs_scrub thinking that
it has to repair B (OFLAG_CORRUPT) and ought to ask the kernel if A also
needs repairs (OFLAG_XCORRUPT).  If we repair B and then try to repair
A, the re-examination of A has no way to communicate to xfs_scrub that A
was actually fine, and xfs_scrub mistakenly reports that it fixed A.
This series also fixes a bug wherein if userspace asked the kernel to
repair a metadata object D and the kernel did not support repairing D,
the kernel would return a runtime error even if D was not in need of a
repair.  This caused further reporting errors when xfs_scrub tried to
have OFLAG_XCORRUPT objects re-examined.
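
A minimal sketch of how a userspace caller might interpret the output
flags after a repair request, given the new flag.  This is not the
xfs_scrub code: the SK_* constants are hypothetical stand-ins for the
flags named above, their bit positions are assumptions, and the
three-way classification is one reading of the description.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the bit positions are assumptions. */
#define SK_OFLAG_CORRUPT	(1u << 1)	/* object itself is corrupt */
#define SK_OFLAG_XCORRUPT	(1u << 4)	/* cross-referencing found problems */
#define SK_OFLAG_UNTOUCHED	(1u << 7)	/* nothing needed fixing */

enum sk_outcome {
	SK_STILL_BROKEN,	/* corruption is still being reported */
	SK_ALREADY_FINE,	/* the kernel found nothing to fix */
	SK_REPAIRED		/* the kernel changed something */
};

/* Classify the output flags of a completed repair (or preen) request. */
static enum sk_outcome sk_classify_repair(uint32_t oflags)
{
	if (oflags & (SK_OFLAG_CORRUPT | SK_OFLAG_XCORRUPT))
		return SK_STILL_BROKEN;
	if (oflags & SK_OFLAG_UNTOUCHED)
		return SK_ALREADY_FINE;
	return SK_REPAIRED;
}
```

In the A/B example, the re-examination of A would come back with the
untouched flag set, so the caller can report "A was fine" instead of
"A was fixed".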

The first five patches add or expose various libxfs helpers that the
online repair code will use to reconstruct broken metadata.  Most
notably we add a NORMAP flag to the bmapi functions so that we can
use rmap data to rebuild block maps.

Patch six allows us to disable inode reclamation temporarily for the few
things that require full filesystem scans; at the moment that is
limited to the rmap rebuilder.

Patches 7-20 introduce the online repair functionality for space
metadata.  Our general strategy for rebuilding damaged primary metadata
is to rebuild the structure completely from secondary metadata and free
the old structure after the fact; we do not try to salvage anything.
Consequently, online repair requires rmapbt.  Rebuilding the secondary
metadata (rmap) is much harder -- due to our locking rules (primary and
then secondary) we have to shut down the filesystem temporarily while we
scan all the primary metadata for data to put in the new secondary
structure.

Reconstructing inodes is difficult -- the ability to rebuild files
depends on the filesystem being able to load an inode (xfs_iget), which
means repair has to know how to zap any part of an inode record that
might trigger corruption errors from iget.  To that end, we can now
reset most of an inode record or an inode fork so that we can rebuild
the file.

The refcount rebuilder is more or less the same algorithm that
xfs_repair uses, but modified to reflect the constraints of running in
kernel space.

For rmap rebuilds, we cannot have anything on the filesystem taking
exclusive locks and we cannot have any allocation activity at all.
Therefore, we start by freezing the filesystem to allow other
transactions to finish.  Then, we disable periodic inode reclaim and
roll the freeze back just enough so that we can create our own
transactions but other writes will block.  Next, we scan all other AG
metadata structures, every inode, and every block map to reconstruct the
rmap data.  Then, we reinitialize the rmap btree root and reload the
rmap btree.  Finally, we release all the resources we grabbed and the
filesystem returns to normal.

Looking forward, the parent pointer feature that Allison Henderson is
working on will enable us to reconstruct directories, at which point
we'll be able to reconstruct most of a lightly damaged filesystem.  But
that's future talk.

If you're going to start using this mess, you probably ought to just
pull from my git trees.  The kernel patches[1] should apply against
4.16-rc7.  xfsprogs[2] and xfstests[3] can be found in their usual
places.  The git trees contain all four series' worth of changes.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel


Thread overview: 25+ messages
2018-02-23  2:01 [PATCH v12 00/20] xfs: online repair support Darrick J. Wong
2018-02-23  2:01 ` [PATCH 01/20] xfs: add helpers to calculate btree size Darrick J. Wong
2018-02-23  2:01 ` [PATCH 02/20] xfs: expose various functions to repair code Darrick J. Wong
2018-02-23  2:02 ` [PATCH 03/20] xfs: add repair helpers for the reverse mapping btree Darrick J. Wong
2018-02-23  2:02 ` [PATCH 04/20] xfs: add repair helpers for the reference count btree Darrick J. Wong
2018-02-23  2:02 ` [PATCH 05/20] xfs: add BMAPI_NORMAP flag to perform block remapping without updating rmpabt Darrick J. Wong
2018-02-23  2:02 ` [PATCH 06/20] xfs: halt auto-reclamation activities while rebuilding rmap Darrick J. Wong
2018-02-23  2:02 ` [PATCH 07/20] xfs: create tracepoints for online repair Darrick J. Wong
2018-02-23  2:02 ` [PATCH 08/20] xfs: implement the metadata repair ioctl flag Darrick J. Wong
2018-02-23  2:02 ` [PATCH 09/20] xfs: add helper routines for the repair code Darrick J. Wong
2018-02-23  2:02 ` [PATCH 10/20] xfs: repair superblocks Darrick J. Wong
2018-02-23  2:03 ` [PATCH 11/20] xfs: repair the AGF and AGFL Darrick J. Wong
2018-02-23  2:03 ` [PATCH 12/20] xfs: repair the AGI Darrick J. Wong
2018-02-23  2:03 ` [PATCH 13/20] xfs: repair free space btrees Darrick J. Wong
2018-02-23  2:03 ` [PATCH 14/20] xfs: repair inode btrees Darrick J. Wong
2018-02-23  2:03 ` [PATCH 15/20] xfs: repair the rmapbt Darrick J. Wong
2018-02-23  2:03 ` [PATCH 16/20] xfs: repair refcount btrees Darrick J. Wong
2018-02-23  2:03 ` [PATCH 17/20] xfs: repair inode records Darrick J. Wong
2018-02-23  2:03 ` [PATCH 18/20] xfs: repair inode forks Darrick J. Wong
2018-02-23  2:03 ` [PATCH 19/20] xfs: repair inode block maps Darrick J. Wong
2018-02-23  2:03 ` [PATCH 20/20] xfs: repair damaged symlinks Darrick J. Wong
2018-03-15 20:26 [PATCH v13 00/20] xfs-4.17: online repair support Darrick J. Wong
2018-03-15 20:26 ` [PATCH 03/20] xfs: add repair helpers for the reverse mapping btree Darrick J. Wong
2018-03-26 23:55 [PATCH v14 00/20] xfs-4.17: online repair support Darrick J. Wong
2018-03-26 23:56 ` [PATCH 03/20] xfs: add repair helpers for the reverse mapping btree Darrick J. Wong
2018-03-27 23:03   ` Dave Chinner
2018-03-27 23:29     ` Darrick J. Wong
