All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Jan Tulak <jtulak@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>,
	Eric Sandeen <sandeen@sandeen.net>,
	linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
Date: Thu, 8 Mar 2018 08:28:38 -0800	[thread overview]
Message-ID: <20180308162838.GY18989@magnolia> (raw)
In-Reply-To: <CACj3i71SHP90DOAhpS3kWyY_6gnx9OzTqgHy7=uxAZH0cfUKMw@mail.gmail.com>

On Thu, Mar 08, 2018 at 11:57:40AM +0100, Jan Tulak wrote:
> On Tue, Mar 6, 2018 at 10:39 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Tue, Mar 06, 2018 at 12:51:18PM +0100, Jan Tulak wrote:
> >> On Tue, Mar 6, 2018 at 12:33 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
> >> > On 3/5/18 4:31 PM, Dave Chinner wrote:
> >> >> On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:
> >> >>> As for running automatically and fix any problems, we may need to make
> >> >>> a decision.  If it won't mount due to a log problem, do we automatically
> >> >>> use -L or drop to a shell and punt to the admin?  (That's what we would
> >> >>> do w/o any fsck -f invocation today...)
> >> >>
> >> >> Define the expected "forcefsck" semantics, and that will tell us
> >> >> what we need to do. Is it automatic system recovery? What if the
> >> >> root fs can't be mounted due to log replay problems?
> >> >
> >> > You're asking too much.  ;)  Semantics?  ;)  Best we can probably do
> >> > is copy what e2fsck does - it tries to replay the log before running
> >> > the actual fsck.  So ... what does e2fsck do if /it/ can't replay
> >> > the log?
> >>
> >> As far as I can tell, in that case, e2fsck exit code indicates 4 -
> >> File system errors left uncorrected, but I'm studying ext testing
> >> tools and will try to verify it.
> >> About the -L flag, I think it is a bad idea - we don't want anything
> >> dangerous to happen here, so if it can't be fixed safely and in an
> >> automated way, just bail out.
> >> That being said, I added a log replay attempt in there (via mount/unmount).
> >
> > I really don't advise doing that for a forced filesystem check. If
> > the log is corrupt, mounting it will trigger the problems we are
> > trying to avoid/fix by running a forced filesystem check. As it is,
> > we're probably being run in this mode because mounting has already
> > failed and causing the system not to boot.
> >
> > What we need to do is list how the startup scripts work according to
> > what error is returned, and then match the behaviour we want in a
> > specific corruption case to the behaviour of a specific return
> > value.
> >
> > i.e. if we have a dirty log, then really we need manual
> > intervention. That means we need to return an error that will cause
> > the startup script to stop and drop into an interactive shell for
> > the admin to fix manually.
> >
> > This is what I mean by "define the expected forcefsck semantics" -
> > describe the behaviour of the system in reponse to the errors we can
> > return to it, and match them to the problem cases we need to resolve
> > with fsck.xfs.
> 
> I tested it on Fedora 27. Exit codes 2 and 4 ("File system errors
> corrected, system should be rebooted" and "File system errors left
> uncorrected") drop the user into the emergency shell. Anything else
> and the boot continues.

FWIW Debian seems to panic() if the exit code has (1 << 2) set, where
"panic()" either drops to a shell if panic= is not given or actually
reboots the machine if panic= is given.  All other cases proceed with
boot, including 2 (errors fixed, reboot now).

That said, the installer seems to set up root xfs as pass 0 in fstab so
fsck is not included in the initramfs at all.

> This happens before root volume is mounted during the boot, so I
> propose this behaviour for fsck.xfs:
> - if the volume/device is mounted, exit with 16 - usage or syntax
> error (just to be sure)
> - if the volume/device has a dirty log, exit with 4 - errors left
> uncorrected (drop to the shell)
> - if we find no errors, exit with 0 - no errors
> - if we find anything and xfs_repair ends successfully, exit with 1 -
> errors corrected
> - anything else and exit with 8 - operational error
> 
> And is there any other way how to get the "there were some errors, but
> we corrected them" other than either 1) screenscrape xfs_repair or 2)
> run xfs_repair twice, once with -n to detect and then without -n to
> fix the found errors?

I wouldn't run it twice, repair can take quite a while to run.

--D

> >
> >> >>>> I also wonder if we can limit this to just the boot infrastructure,
> >> >>>> because I really don't like the idea of users using fsck.xfs -f to
> >> >>>> repair damage filesystems because "that's what I do to repair ext4
> >> >>>> filesystems"....
> >> >>>
> >> >>> Depending on how this gets fleshed out, fsck.xfs -f isn't any different
> >> >>> than bare xfs_repair...  (Unless all of the above suggestions about dirty
> >> >>> logs get added, then it certainly is!)  So, yeah...
> >> >>>
> >> >>> How would you propose limiting it to the boot environment?
> >> >>
> >> >> I have no idea - this is all way outside my area of expertise...
> >> >
> >> > A halfway measure would be to test whether the script is interactive, perhaps?
> >> >
> >> > https://www.tldp.org/LDP/abs/html/intandnonint.html
> >> >
> >> > case $- in
> >> > *i*)    # interactive shell
> >> > ;;
> >> > *)      # non-interactive shell
> >> > ;;
> >> >
> >>
> >> IMO, any such test would make fsck.xfs behave unpredictably for the
> >> user. If anyone wants to run fsck.xfs -f instead of xfs_repair, it is
> >> their choice.
> >
> > We limit user choices all the time. Default values, config options,
> > tuning variables, etc, IOWs, it's our choice as developers to allow
> > users to do something or not.  And in this case, we made this choice
> > to limit what fsck.xfs could do a long time ago:
> >
> > # man fsck.xfs
> > .....
> >         If you wish to check the consistency of an XFS filesystem,
> >         or repair a damaged or corrupt XFS filesystem, see
> >         xfs_repair(8).
> > .....
> > # fsck.xfs
> > If you wish to check the consistency of an XFS filesystem or
> > repair a damaged filesystem, see xfs_repair(8).
> > #
> >
> 
> At that point, it was a consistent behaviour, do nothing all the time,
> no matter what.
> 
> >
> >> We can print something "next time use xfs_repair
> >> directly" for an interactive session, but I don't like the idea of the
> >> script doing different things based on some (for the user) hidden
> >> variables.
> >
> > What hidden variable are you talking about here? Having a script
> > determine behaviour based on whether it is in an interactive
> > sessions or not is a common thing to do. There's nothing tricky or
> > unusual about it....
> >
> 
> I'm not aware of any script or tool that would refuse to work except
> when started in a specific environment and noninteractively (doesn't
> mean they don't exist, but it is not common). And because it seems
> that fsck.xfs -f will do only what bare xfs_repair would do, no log
> replay, nothing... then I really think that changing what the script
> does (not just altering its output) based on environment tests is
> unnecessary. And for anyone without this specific knowledge, it would
> be confusing - people expect that for the same input the application
> or script does the same thing at the end.
> 
> Cheers,
> Jan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-03-08 16:28 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-05 15:05 [PATCH] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
2018-03-05 21:56 ` Dave Chinner
2018-03-05 22:06   ` Eric Sandeen
2018-03-05 22:20     ` Darrick J. Wong
2018-03-05 22:31     ` Dave Chinner
2018-03-05 23:33       ` Eric Sandeen
2018-03-06 11:51         ` Jan Tulak
2018-03-06 21:39           ` Dave Chinner
2018-03-08 10:57             ` Jan Tulak
2018-03-08 16:28               ` Darrick J. Wong [this message]
2018-03-08 22:36                 ` Dave Chinner
2018-03-14 13:51                   ` Jan Tulak
2018-03-14 15:25                     ` Darrick J. Wong
2018-03-14 21:10                       ` Dave Chinner
2018-03-15 17:01                       ` Jan Tulak
2018-03-08 23:28               ` Eric Sandeen
2018-03-14 13:30                 ` Jan Tulak
2018-03-14 15:19                   ` Eric Sandeen
2018-03-15 11:16                     ` Jan Tulak
2018-03-15 22:19                       ` Dave Chinner
2018-03-15 17:45 ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Jan Tulak
2018-03-15 17:45   ` [PATCH 2/2 v1] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
2018-03-15 17:47     ` [PATCH 2/2 v2] " Jan Tulak
2018-03-15 17:50       ` [PATCH 2/2] " Jan Tulak
2018-03-15 18:11         ` Darrick J. Wong
2018-03-15 18:22           ` Jan Tulak
2018-03-15 18:28         ` [PATCH 2/2 v4] " Jan Tulak
2018-03-15 18:49           ` Darrick J. Wong
2018-03-16 10:19             ` Jan Tulak
2018-03-16 15:39               ` Darrick J. Wong
2018-03-16 17:07           ` [PATCH 2/2 v5] " Jan Tulak
2018-03-23  2:37             ` Eric Sandeen
2018-03-23  3:25               ` Darrick J. Wong
2018-03-23  3:29                 ` Eric Sandeen
2018-03-23  3:42                   ` Darrick J. Wong
2018-03-23 14:00                     ` Jan Tulak
2018-03-23 14:14               ` Jan Tulak
2018-03-23 14:33             ` [PATCH 2/2 v6] " Jan Tulak
2022-09-28  5:28               ` Darrick J. Wong
2022-09-29  8:31                 ` Carlos Maiolino
2018-03-15 18:03   ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Darrick J. Wong
2018-03-15 18:23   ` [PATCH 1/2 v2] " Jan Tulak
2018-03-15 18:44     ` Darrick J. Wong
2018-03-23  1:57     ` Eric Sandeen
2018-03-23  9:24       ` Jan Tulak
2018-03-23 14:32     ` [PATCH 1/2 v3] xfs_repair: add flag -e to modify exit code for " Jan Tulak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180308162838.GY18989@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=jtulak@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sandeen@sandeen.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.