From: "Darrick J. Wong" <djwong@kernel.org>
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org, willy@infradead.org,
	chandan.babu@oracle.com, allison.henderson@oracle.com,
	linux-fsdevel@vger.kernel.org, hch@infradead.org,
	catherine.hoang@oracle.com
Subject: [PATCH 4/8] xfs: document the testing plan for online fsck
Date: Mon, 06 Jun 2022 18:49:05 -0700
Message-ID: <165456654534.167418.5247534406783316379.stgit@magnolia>
In-Reply-To: <165456652256.167418.912764930038710353.stgit@magnolia>

From: Darrick J. Wong <djwong@kernel.org>

Start the fourth chapter of the online fsck design documentation, which
discusses the user interface and the background scrubbing service.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 .../filesystems/xfs-online-fsck-design.rst         |  105 ++++++++++++++++++++
 1 file changed, 105 insertions(+)


diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst
index 536698b138b8..bdb4bdda3180 100644
--- a/Documentation/filesystems/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs-online-fsck-design.rst
@@ -712,3 +712,108 @@ and the `evolution of existing per-function stress testing
 <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
 Each kernel patchset adding an online repair function will use the same branch
 name across the kernel, xfsprogs, and fstests git repos.
+
+User Interface
+==============
+
+Like offline fsck, the primary user of online fsck should be the system
+administrator.
+Online fsck presents two modes of operation to administrators:
+a foreground CLI process for online fsck on demand, and a background service
+that performs autonomous checking and repair.
+
+Checking on Demand
+------------------
+
+For administrators who want the absolute freshest information about the
+metadata in a filesystem, ``xfs_scrub`` can be run as a foreground process on
+a command line.
+The program checks every piece of metadata in the filesystem while the
+administrator waits for the results to be reported, just like the existing
+``xfs_repair`` tool.
+Both tools share a ``-n`` option to perform a read-only scan, and a ``-v``
+option to increase the verbosity of the information reported.
+
+A new feature of ``xfs_scrub`` is the ``-x`` option, which employs the error
+correction capabilities of the hardware to check data file contents.
+The media scan is not enabled by default because it may dramatically increase
+program runtime and consume a lot of bandwidth on older storage hardware.
+
+The output of a foreground invocation will be captured in the system log.
+
+The ``xfs_scrub_all`` program walks the list of mounted filesystems and
+initiates ``xfs_scrub`` for each of them in parallel.
+It serializes scans of any filesystems that resolve to the same top-level
+kernel block device to prevent resource overconsumption.
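+
+For illustration, a foreground session might look like the following sketch.
+It uses only the options described above, and the mount point ``/mnt`` is a
+placeholder::
+
+    # Read-only check of all metadata, with verbose output.
+    xfs_scrub -n -v /mnt
+
+    # Read-only check that also runs the media scan of file data.
+    xfs_scrub -n -x /mnt
+
+    # Run xfs_scrub on every mounted filesystem.
+    xfs_scrub_all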
+
+Background Service
+------------------
+
+To reduce the workload of system administrators, the ``xfs_scrub`` package
+provides a suite of `systemd <https://systemd.io/>`_ timers and services that
+run online fsck automatically on weekends.
+The background service configures scrub to run with as little privilege as
+possible (which is still quite a lot), the lowest IO priority, and in
+single-threaded mode. This minimizes the load generated on the system and
+avoids starving regular workloads.
+
+The output of the background service will also be captured in the system log.
+If desired, reports of failures (either due to inconsistencies or mere runtime
+errors) can be emailed automatically by setting the ``EMAIL_ADDR`` environment
+variable in the following service files:
+
+* ``xfs_scrub_fail@.service``
+* ``xfs_scrub_media_fail@.service``
+* ``xfs_scrub_all_fail.service``
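+
+One possible way to set ``EMAIL_ADDR`` without editing the packaged unit files
+is a systemd drop-in override. The sketch below shows this for the first
+service.
+The drop-in name ``email.conf`` and the address are placeholders. The sketch
+also assumes that the service honors an ``EMAIL_ADDR`` set with
+``Environment=``::
+
+    # Create a drop-in directory that applies to every instance of the template.
+    mkdir -p /etc/systemd/system/xfs_scrub_fail@.service.d
+    cat > /etc/systemd/system/xfs_scrub_fail@.service.d/email.conf << 'EOF'
+    [Service]
+    Environment=EMAIL_ADDR=admin@example.com
+    EOF
+    # Make systemd re-read the unit files.
+    systemctl daemon-reload
+
+The same kind of drop-in could be created for the other two services.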
+
+The decision to enable the background scan is left to the system administrator.
+This can be done by enabling one of the following:
+
+* ``xfs_scrub_all.timer`` on systemd systems to enable a weekly scan of the
+  metadata of all mounted filesystems.
+* ``xfs_scrub_all.cron`` can be used on non-systemd systems to schedule a
+  weekly scan of all mounted filesystems.
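+
+On a systemd system, enabling the weekly scan might look like this sketch::
+
+    # Enable the weekly timer and start it immediately.
+    systemctl enable --now xfs_scrub_all.timer
+    # Show when the next scan is scheduled.
+    systemctl list-timers xfs_scrub_all.timer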
+
+The automatic weekly scan is configured out of the box to perform an additional
+media scan of all file data once per month.
+This is less foolproof than, say, storing file data block checksums, but much
+more performant if application software provides its own integrity checking,
+redundancy can be provided elsewhere above the filesystem, or the storage
+device's integrity guarantees are deemed sufficient.
+
+**Question**: Are we using systemd unit directives to their maximum advantage
+to isolate the scrub process and control its resource usage?
+
+**Question**: Should we document how system administrators can modify the
+``xfs_scrub@.service`` file to contain the QoS hit?
+Or do we assume admins are familiar with existing systemd documentation?
+Where do we even document that?
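+
+As one hypothetical answer to the second question, an administrator could
+limit the scrub service with a drop-in. The drop-in uses the standard systemd
+``CPUQuota=`` and ``IOWeight=`` resource control directives.
+The values below are arbitrary, and ``IOWeight=`` depends on cgroup v2 IO
+controller support::
+
+    mkdir -p /etc/systemd/system/xfs_scrub@.service.d
+    cat > /etc/systemd/system/xfs_scrub@.service.d/qos.conf << 'EOF'
+    [Service]
+    # Allow at most a quarter of one CPU's time.
+    CPUQuota=25%
+    # Lower the IO weight from the default of 100.
+    IOWeight=10
+    EOF
+    systemctl daemon-reload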
+
+Proposed patchsets include
+`enabling the background service
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_.
+
+Health Reporting
+----------------
+
+XFS caches a summary of each filesystem's health status in memory.
+The information is updated whenever ``xfs_scrub`` is run, as well as whenever
+inconsistencies are detected in the filesystem metadata.
+System administrators can use the ``health`` command of ``xfs_spaceman`` to
+retrieve this information in a human-readable format.
+If problems have been observed, the administrator can decide to schedule a
+reduced service window in which to run the online repair tool to correct the
+problem.
+Failing that, the administrator can decide to schedule a maintenance window to
+run the traditional offline repair tool to correct the problem.
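+
+A minimal sketch of that workflow follows. The mount point ``/mnt`` and the
+device ``/dev/sdb1`` are placeholders::
+
+    # Report the cached health summary for the filesystem.
+    xfs_spaceman -c health /mnt
+
+    # If problems are reported, attempt an online repair.
+    xfs_scrub /mnt
+
+    # Failing that, unmount and run the offline repair tool.
+    umount /mnt
+    xfs_repair /dev/sdb1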
+
+**Question**: Should the health reporting integrate with the new fanotify fs
+error notification system?
+
+**Question**: Should we write a daemon to listen for corruption notifications
+and initiate a repair?
+
+Proposed patchsets include
+`wiring up health reports to corruption returns
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=corruption-health-reports>`_
+and
+`preservation of sickness info during memory reclaim
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=indirect-health-reporting>`_.

