All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fsck.xfs: allow forced repairs using xfs_repair
@ 2018-03-05 15:05 Jan Tulak
  2018-03-05 21:56 ` Dave Chinner
  2018-03-15 17:45 ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Jan Tulak
  0 siblings, 2 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-05 15:05 UTC (permalink / raw)
  To: linux-xfs; +Cc: sandeen, Jan Tulak

The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
run on every unclean shutdown. However, sometimes it may happen that the
root filesystem really requires the usage of xfs_repair and then it is a
hassle. This patch makes the situation a bit easier by detecting forced
checks (/forcefsck or fsck.mode=force), so user can require the repair,
without the repair being run all the time.

(Thanks Eric for suggesting this.)

Signed-off-by: Jan Tulak <jtulak@redhat.com>
---
 fsck/xfs_fsck.sh | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
index e52969e4..71bfa2e1 100755
--- a/fsck/xfs_fsck.sh
+++ b/fsck/xfs_fsck.sh
@@ -4,10 +4,12 @@
 #
 
 AUTO=false
-while getopts ":aApy" c
+FORCE=false
+while getopts ":aApyf" c
 do
 	case $c in
 	a|A|p|y)	AUTO=true;;
+	f)      	FORCE=true;;
 	esac
 done
 eval DEV=\${$#}
@@ -15,10 +17,18 @@ if [ ! -e $DEV ]; then
 	echo "$0: $DEV does not exist"
 	exit 8
 fi
+
+# The flag -f is added by systemd/init scripts when /forcefsck file is present
+# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
+# this check, user has to explicitly require a forced fsck.
+if $FORCE; then
+	xfs_repair $DEV
+	exit $?
+fi
+
 if $AUTO; then
 	echo "$0: XFS file system."
 else
 	echo "If you wish to check the consistency of an XFS filesystem or"
 	echo "repair a damaged filesystem, see xfs_repair(8)."
 fi
-exit 0
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-05 15:05 [PATCH] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
@ 2018-03-05 21:56 ` Dave Chinner
  2018-03-05 22:06   ` Eric Sandeen
  2018-03-15 17:45 ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Jan Tulak
  1 sibling, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2018-03-05 21:56 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-xfs, sandeen

On Mon, Mar 05, 2018 at 04:05:46PM +0100, Jan Tulak wrote:
> The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
> run on every unclean shutdown. However, sometimes it may happen that the
> root filesystem really requires the usage of xfs_repair and then it is a
> hassle. This patch makes the situation a bit easier by detecting forced
> checks (/forcefsck or fsck.mode=force), so user can require the repair,
> without the repair being run all the time.
> 
> (Thanks Eric for suggesting this.)
> 
> Signed-off-by: Jan Tulak <jtulak@redhat.com>

Ok, so I can see why support for this is probably neecssary, I have
a few reservations about the implementation....

> ---
>  fsck/xfs_fsck.sh | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index e52969e4..71bfa2e1 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -4,10 +4,12 @@
>  #
>  
>  AUTO=false
> -while getopts ":aApy" c
> +FORCE=false
> +while getopts ":aApyf" c
>  do
>  	case $c in
>  	a|A|p|y)	AUTO=true;;
> +	f)      	FORCE=true;;
>  	esac
>  done
>  eval DEV=\${$#}
> @@ -15,10 +17,18 @@ if [ ! -e $DEV ]; then
>  	echo "$0: $DEV does not exist"
>  	exit 8
>  fi
> +
> +# The flag -f is added by systemd/init scripts when /forcefsck file is present
> +# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
> +# this check, user has to explicitly require a forced fsck.
> +if $FORCE; then
> +	xfs_repair $DEV
> +	exit $?
> +fi

This needs to check that the xfs_repair binary is present in the
environment that is running fsck. If this is checking the root fs
from the initramfs, then distros are going to need to package
xfs_repair into their initramfs build scripts...

Also, if the log is dirty, xfs_repair won't run. If the filesystem
is already mounted read-only, xfs_repair won't run. So if we're
forcing a boot time check, we want it to run unconditionally and fix
any problems found automatically, right?

Also, fsck exit values have specific meaning to the boot
infrastructure and xfs_repair does not follow them. Hence returning
the output of xfs_repair to the fsck caller is going to result in
unexpected/undesired behaviour. From the fsck man page:

      The exit code returned by fsck is the sum of the following conditions:

              0      No errors
              1      Filesystem errors corrected
              2      System should be rebooted
              4      Filesystem errors left uncorrected
              8      Operational error
              16     Usage or syntax error
              32     Checking canceled by user request
              128    Shared-library error

So there's error post processing that is needed here so that the
infrastructure is given the correct status indication so it will
do things like reboot the system if necessary after a repair...

I also wonder if we can limit this to just the boot infrastructure,
because I really don't like the idea of users using fsck.xfs -f to
repair damage filesystems because "that's what I do to repair ext4
filesystems"....

Also missing is a fsck.xfs man page update to document the option.

>  if $AUTO; then
>  	echo "$0: XFS file system."
>  else
>  	echo "If you wish to check the consistency of an XFS filesystem or"
>  	echo "repair a damaged filesystem, see xfs_repair(8)."
>  fi
> -exit 0

I think we still need to exit with a zero status if we did nothing,
because that's what the caller is expecting....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-05 21:56 ` Dave Chinner
@ 2018-03-05 22:06   ` Eric Sandeen
  2018-03-05 22:20     ` Darrick J. Wong
  2018-03-05 22:31     ` Dave Chinner
  0 siblings, 2 replies; 46+ messages in thread
From: Eric Sandeen @ 2018-03-05 22:06 UTC (permalink / raw)
  To: Dave Chinner, Jan Tulak; +Cc: linux-xfs

On 3/5/18 3:56 PM, Dave Chinner wrote:
> On Mon, Mar 05, 2018 at 04:05:46PM +0100, Jan Tulak wrote:
>> The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
>> run on every unclean shutdown. However, sometimes it may happen that the
>> root filesystem really requires the usage of xfs_repair and then it is a
>> hassle. This patch makes the situation a bit easier by detecting forced
>> checks (/forcefsck or fsck.mode=force), so user can require the repair,
>> without the repair being run all the time.
>>
>> (Thanks Eric for suggesting this.)
>>
>> Signed-off-by: Jan Tulak <jtulak@redhat.com>
> 
> Ok, so I can see why support for this is probably neecssary, I have
> a few reservations about the implementation....
> 
>> ---
>>  fsck/xfs_fsck.sh | 14 ++++++++++++--
>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
>> index e52969e4..71bfa2e1 100755
>> --- a/fsck/xfs_fsck.sh
>> +++ b/fsck/xfs_fsck.sh
>> @@ -4,10 +4,12 @@
>>  #
>>  
>>  AUTO=false
>> -while getopts ":aApy" c
>> +FORCE=false
>> +while getopts ":aApyf" c
>>  do
>>  	case $c in
>>  	a|A|p|y)	AUTO=true;;
>> +	f)      	FORCE=true;;
>>  	esac
>>  done
>>  eval DEV=\${$#}
>> @@ -15,10 +17,18 @@ if [ ! -e $DEV ]; then
>>  	echo "$0: $DEV does not exist"
>>  	exit 8
>>  fi
>> +
>> +# The flag -f is added by systemd/init scripts when /forcefsck file is present
>> +# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
>> +# this check, user has to explicitly require a forced fsck.
>> +if $FORCE; then
>> +	xfs_repair $DEV
>> +	exit $?
>> +fi
> 
> This needs to check that the xfs_repair binary is present in the
> environment that is running fsck. If this is checking the root fs
> from the initramfs, then distros are going to need to package
> xfs_repair into their initramfs build scripts...

Fedora and RHEL does, FWIW.  Can others check?  What does Debian do?

> Also, if the log is dirty, xfs_repair won't run. If the filesystem
> is already mounted read-only, xfs_repair won't run. So if we're
> forcing a boot time check, we want it to run unconditionally and fix
> any problems found automatically, right?

Yep, I'm curious if this was tested - I played with something like this
a while ago but didn't take notes.  ;)

As for running automatically and fix any problems, we may need to make
a decision.  If it won't mount due to a log problem, do we automatically
use -L or drop to a shell and punt to the admin?  (That's what we would
do w/o any fsck -f invocation today...)

> Also, fsck exit values have specific meaning to the boot
> infrastructure and xfs_repair does not follow them. Hence returning
> the output of xfs_repair to the fsck caller is going to result in
> unexpected/undesired behaviour. From the fsck man page:
> 
>       The exit code returned by fsck is the sum of the following conditions:
> 
>               0      No errors
>               1      Filesystem errors corrected
>               2      System should be rebooted
>               4      Filesystem errors left uncorrected
>               8      Operational error
>               16     Usage or syntax error
>               32     Checking canceled by user request
>               128    Shared-library error
> 
> So there's error post processing that is needed here so that the
> infrastructure is given the correct status indication so it will
> do things like reboot the system if necessary after a repair...

Good point, thanks.
 
> I also wonder if we can limit this to just the boot infrastructure,
> because I really don't like the idea of users using fsck.xfs -f to
> repair damage filesystems because "that's what I do to repair ext4
> filesystems"....

Depending on how this gets fleshed out, fsck.xfs -f isn't any different
than bare xfs_repair...  (Unless all of the above suggestions about dirty
logs get added, then it certainly is!)  So, yeah...

How would you propose limiting it to the boot environment?  I wondered
about the script itself checking for /forcefsck or the boot parameters,
but at least the boot params probably last for the duration of the uptime.
And re-coding / re-implementing the systemd checks in our own script
probably is a bad idea, so forget I suggested it ...

> Also missing is a fsck.xfs man page update to document the option.

*nod*

 
>>  if $AUTO; then
>>  	echo "$0: XFS file system."
>>  else
>>  	echo "If you wish to check the consistency of an XFS filesystem or"
>>  	echo "repair a damaged filesystem, see xfs_repair(8)."
>>  fi
>> -exit 0
> 
> I think we still need to exit with a zero status if we did nothing,
> because that's what the caller is expecting....
> 
> Cheers,
> 
> Dave.
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-05 22:06   ` Eric Sandeen
@ 2018-03-05 22:20     ` Darrick J. Wong
  2018-03-05 22:31     ` Dave Chinner
  1 sibling, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-03-05 22:20 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, Jan Tulak, linux-xfs

On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:
> On 3/5/18 3:56 PM, Dave Chinner wrote:
> > On Mon, Mar 05, 2018 at 04:05:46PM +0100, Jan Tulak wrote:
> >> The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
> >> run on every unclean shutdown. However, sometimes it may happen that the
> >> root filesystem really requires the usage of xfs_repair and then it is a
> >> hassle. This patch makes the situation a bit easier by detecting forced
> >> checks (/forcefsck or fsck.mode=force), so user can require the repair,
> >> without the repair being run all the time.
> >>
> >> (Thanks Eric for suggesting this.)
> >>
> >> Signed-off-by: Jan Tulak <jtulak@redhat.com>
> > 
> > Ok, so I can see why support for this is probably neecssary, I have
> > a few reservations about the implementation....
> > 
> >> ---
> >>  fsck/xfs_fsck.sh | 14 ++++++++++++--
> >>  1 file changed, 12 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> >> index e52969e4..71bfa2e1 100755
> >> --- a/fsck/xfs_fsck.sh
> >> +++ b/fsck/xfs_fsck.sh
> >> @@ -4,10 +4,12 @@
> >>  #
> >>  
> >>  AUTO=false
> >> -while getopts ":aApy" c
> >> +FORCE=false
> >> +while getopts ":aApyf" c
> >>  do
> >>  	case $c in
> >>  	a|A|p|y)	AUTO=true;;
> >> +	f)      	FORCE=true;;
> >>  	esac
> >>  done
> >>  eval DEV=\${$#}
> >> @@ -15,10 +17,18 @@ if [ ! -e $DEV ]; then
> >>  	echo "$0: $DEV does not exist"
> >>  	exit 8
> >>  fi
> >> +
> >> +# The flag -f is added by systemd/init scripts when /forcefsck file is present
> >> +# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
> >> +# this check, user has to explicitly require a forced fsck.
> >> +if $FORCE; then
> >> +	xfs_repair $DEV
> >> +	exit $?
> >> +fi
> > 
> > This needs to check that the xfs_repair binary is present in the
> > environment that is running fsck. If this is checking the root fs
> > from the initramfs, then distros are going to need to package
> > xfs_repair into their initramfs build scripts...
> 
> Fedora and RHEL does, FWIW.  Can others check?  What does Debian do?

Ubuntu 16.04's initramfs hooks copy /sbin/fsck and
/sbin/fsck.$detectedrootfstype into the initramfs.

(...and, because I hate my own distro's defaults, I have my own
initramfs hook to stuff xfs_repair and e2fsck into the initramfs. :P)

> > Also, if the log is dirty, xfs_repair won't run. If the filesystem
> > is already mounted read-only, xfs_repair won't run. So if we're
> > forcing a boot time check, we want it to run unconditionally and fix
> > any problems found automatically, right?
> 
> Yep, I'm curious if this was tested - I played with something like this
> a while ago but didn't take notes.  ;)
> 
> As for running automatically and fix any problems, we may need to make
> a decision.  If it won't mount due to a log problem, do we automatically
> use -L or drop to a shell and punt to the admin?  (That's what we would
> do w/o any fsck -f invocation today...)

<shrug> I don't particularly like the idea of automatic -L.  That might
just be paranoia on my part, since the last time I had to run repair -L
was because the rootfs wouldn't mount was due to a bug in the log, and
in the end reinstalling the system was less troublesome than digging
through all the pieces of the now-destroyed rootfs. :/

--D

> > Also, fsck exit values have specific meaning to the boot
> > infrastructure and xfs_repair does not follow them. Hence returning
> > the output of xfs_repair to the fsck caller is going to result in
> > unexpected/undesired behaviour. From the fsck man page:
> > 
> >       The exit code returned by fsck is the sum of the following conditions:
> > 
> >               0      No errors
> >               1      Filesystem errors corrected
> >               2      System should be rebooted
> >               4      Filesystem errors left uncorrected
> >               8      Operational error
> >               16     Usage or syntax error
> >               32     Checking canceled by user request
> >               128    Shared-library error
> > 
> > So there's error post processing that is needed here so that the
> > infrastructure is given the correct status indication so it will
> > do things like reboot the system if necessary after a repair...
> 
> Good point, thanks.
>  
> > I also wonder if we can limit this to just the boot infrastructure,
> > because I really don't like the idea of users using fsck.xfs -f to
> > repair damage filesystems because "that's what I do to repair ext4
> > filesystems"....
> 
> Depending on how this gets fleshed out, fsck.xfs -f isn't any different
> than bare xfs_repair...  (Unless all of the above suggestions about dirty
> logs get added, then it certainly is!)  So, yeah...
> 
> How would you propose limiting it to the boot environment?  I wondered
> about the script itself checking for /forcefsck or the boot parameters,
> but at least the boot params probably last for the duration of the uptime.
> And re-coding / re-implementing the systemd checks in our own script
> probably is a bad idea, so forget I suggested it ...
> 
> > Also missing is a fsck.xfs man page update to document the option.
> 
> *nod*
> 
>  
> >>  if $AUTO; then
> >>  	echo "$0: XFS file system."
> >>  else
> >>  	echo "If you wish to check the consistency of an XFS filesystem or"
> >>  	echo "repair a damaged filesystem, see xfs_repair(8)."
> >>  fi
> >> -exit 0
> > 
> > I think we still need to exit with a zero status if we did nothing,
> > because that's what the caller is expecting....
> > 
> > Cheers,
> > 
> > Dave.
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-05 22:06   ` Eric Sandeen
  2018-03-05 22:20     ` Darrick J. Wong
@ 2018-03-05 22:31     ` Dave Chinner
  2018-03-05 23:33       ` Eric Sandeen
  1 sibling, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2018-03-05 22:31 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Jan Tulak, linux-xfs

On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:
> On 3/5/18 3:56 PM, Dave Chinner wrote:
> > On Mon, Mar 05, 2018 at 04:05:46PM +0100, Jan Tulak wrote:
> >> The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
> >> run on every unclean shutdown. However, sometimes it may happen that the
> >> root filesystem really requires the usage of xfs_repair and then it is a
> >> hassle. This patch makes the situation a bit easier by detecting forced
> >> checks (/forcefsck or fsck.mode=force), so user can require the repair,
> >> without the repair being run all the time.
> >>
> >> (Thanks Eric for suggesting this.)
> >>
> >> Signed-off-by: Jan Tulak <jtulak@redhat.com>
> > 
> > Ok, so I can see why support for this is probably neecssary, I have
> > a few reservations about the implementation....
> > 
> >> ---
> >>  fsck/xfs_fsck.sh | 14 ++++++++++++--
> >>  1 file changed, 12 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> >> index e52969e4..71bfa2e1 100755
> >> --- a/fsck/xfs_fsck.sh
> >> +++ b/fsck/xfs_fsck.sh
> >> @@ -4,10 +4,12 @@
> >>  #
> >>  
> >>  AUTO=false
> >> -while getopts ":aApy" c
> >> +FORCE=false
> >> +while getopts ":aApyf" c
> >>  do
> >>  	case $c in
> >>  	a|A|p|y)	AUTO=true;;
> >> +	f)      	FORCE=true;;
> >>  	esac
> >>  done
> >>  eval DEV=\${$#}
> >> @@ -15,10 +17,18 @@ if [ ! -e $DEV ]; then
> >>  	echo "$0: $DEV does not exist"
> >>  	exit 8
> >>  fi
> >> +
> >> +# The flag -f is added by systemd/init scripts when /forcefsck file is present
> >> +# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
> >> +# this check, user has to explicitly require a forced fsck.
> >> +if $FORCE; then
> >> +	xfs_repair $DEV
> >> +	exit $?
> >> +fi
> > 
> > This needs to check that the xfs_repair binary is present in the
> > environment that is running fsck. If this is checking the root fs
> > from the initramfs, then distros are going to need to package
> > xfs_repair into their initramfs build scripts...
> 
> Fedora and RHEL does, FWIW.  Can others check?  What does Debian do?

No idea. I've never needed to peel an initramfs to check what is
inside.

Ah, debian has a tool for that (lsinitramfs)....

$ lsinitramfs /boot/initrd.img-4.15.0-1-amd64  |grep xfs
lib/modules/4.13.0-1-amd64/kernel/fs/xfs
lib/modules/4.13.0-1-amd64/kernel/fs/xfs/xfs.ko
sbin/fsck.xfs
$

Nope, xfs_repair is not packaged in the debian initramfs. And, well,
on a more recently installed machine:

$ lsinitramfs /boot/initrd.img-4.15.0-rc8-amd64  |grep xfs
lib/modules/4.15.0-rc8-amd64/kernel/fs/xfs
lib/modules/4.15.0-rc8-amd64/kernel/fs/xfs/xfs.ko
$

fsck.xfs isn't even in the built initramfs for a machine running
only XFS filesystems....

> > Also, if the log is dirty, xfs_repair won't run. If the filesystem
> > is already mounted read-only, xfs_repair won't run. So if we're
> > forcing a boot time check, we want it to run unconditionally and fix
> > any problems found automatically, right?
> 
> Yep, I'm curious if this was tested - I played with something like this
> a while ago but didn't take notes.  ;)
> 
> As for running automatically and fix any problems, we may need to make
> a decision.  If it won't mount due to a log problem, do we automatically
> use -L or drop to a shell and punt to the admin?  (That's what we would
> do w/o any fsck -f invocation today...)

Define the expected "forcefsck" semantics, and that will tell us
what we need to do. Is it automatic system recovery? What if the
root fs can't be mounted due to log replay problems?

> > I also wonder if we can limit this to just the boot infrastructure,
> > because I really don't like the idea of users using fsck.xfs -f to
> > repair damage filesystems because "that's what I do to repair ext4
> > filesystems"....
> 
> Depending on how this gets fleshed out, fsck.xfs -f isn't any different
> than bare xfs_repair...  (Unless all of the above suggestions about dirty
> logs get added, then it certainly is!)  So, yeah...
> 
> How would you propose limiting it to the boot environment? 

I have no idea - this is all way outside my area of expertise...

> I wondered
> about the script itself checking for /forcefsck or the boot parameters,
> but at least the boot params probably last for the duration of the uptime.
> And re-coding / re-implementing the systemd checks in our own script
> probably is a bad idea, so forget I suggested it ...

But, yeah, we don't want to have to do that....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-05 22:31     ` Dave Chinner
@ 2018-03-05 23:33       ` Eric Sandeen
  2018-03-06 11:51         ` Jan Tulak
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Sandeen @ 2018-03-05 23:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jan Tulak, linux-xfs

On 3/5/18 4:31 PM, Dave Chinner wrote:
> On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:

...

> Nope, xfs_repair is not packaged in the debian initramfs. And, well,
> on a more recently installed machine:
> 
> $ lsinitramfs /boot/initrd.img-4.15.0-rc8-amd64  |grep xfs
> lib/modules/4.15.0-rc8-amd64/kernel/fs/xfs
> lib/modules/4.15.0-rc8-amd64/kernel/fs/xfs/xfs.ko
> $
> 
> fsck.xfs isn't even in the built initramfs for a machine running
> only XFS filesystems....

Ok, well, that's a distro issue not an xfsprogs issue.  :)  Shows
that the script needs to test for presence of xfs_repair, though.

>>> Also, if the log is dirty, xfs_repair won't run. If the filesystem
>>> is already mounted read-only, xfs_repair won't run. So if we're
>>> forcing a boot time check, we want it to run unconditionally and fix
>>> any problems found automatically, right?
>>
>> Yep, I'm curious if this was tested - I played with something like this
>> a while ago but didn't take notes.  ;)
>>
>> As for running automatically and fix any problems, we may need to make
>> a decision.  If it won't mount due to a log problem, do we automatically
>> use -L or drop to a shell and punt to the admin?  (That's what we would
>> do w/o any fsck -f invocation today...)
> 
> Define the expected "forcefsck" semantics, and that will tell us
> what we need to do. Is it automatic system recovery? What if the
> root fs can't be mounted due to log replay problems?

You're asking too much.  ;)  Semantics?  ;)  Best we can probably do
is copy what e2fsck does - it tries to replay the log before running
the actual fsck.  So ... what does e2fsck do if /it/ can't replay
the log?
 
>>> I also wonder if we can limit this to just the boot infrastructure,
>>> because I really don't like the idea of users using fsck.xfs -f to
>>> repair damage filesystems because "that's what I do to repair ext4
>>> filesystems"....
>>
>> Depending on how this gets fleshed out, fsck.xfs -f isn't any different
>> than bare xfs_repair...  (Unless all of the above suggestions about dirty
>> logs get added, then it certainly is!)  So, yeah...
>>
>> How would you propose limiting it to the boot environment? 
> 
> I have no idea - this is all way outside my area of expertise...

A halfway measure would be to test whether the script is interactive, perhaps?

https://www.tldp.org/LDP/abs/html/intandnonint.html

case $- in
*i*)    # interactive shell
;;
*)      # non-interactive shell
;;

-Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-05 23:33       ` Eric Sandeen
@ 2018-03-06 11:51         ` Jan Tulak
  2018-03-06 21:39           ` Dave Chinner
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Tulak @ 2018-03-06 11:51 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, linux-xfs

On Tue, Mar 6, 2018 at 12:33 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 3/5/18 4:31 PM, Dave Chinner wrote:
>> On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:
>
> ...
>
>> Nope, xfs_repair is not packaged in the debian initramfs. And, well,
>> on a more recently installed machine:
>>
>> $ lsinitramfs /boot/initrd.img-4.15.0-rc8-amd64  |grep xfs
>> lib/modules/4.15.0-rc8-amd64/kernel/fs/xfs
>> lib/modules/4.15.0-rc8-amd64/kernel/fs/xfs/xfs.ko
>> $
>>
>> fsck.xfs isn't even in the built initramfs for a machine running
>> only XFS filesystems....
>
> Ok, well, that's a distro issue not an xfsprogs issue.  :)  Shows
> that the script needs to test for presence of xfs_repair, though.

Agreed.

>
>>>> Also, if the log is dirty, xfs_repair won't run. If the filesystem
>>>> is already mounted read-only, xfs_repair won't run. So if we're
>>>> forcing a boot time check, we want it to run unconditionally and fix
>>>> any problems found automatically, right?
>>>
>>> Yep, I'm curious if this was tested - I played with something like this
>>> a while ago but didn't take notes.  ;)

I tested if it doesn't run when it shouldn't, but forgot to test with
a damaged fs. I thought I did test a dirty log, but apparently, I
didn't check the result.
/me looks ashamed at his "grep xfs_repair" history.

>>>
>>> As for running automatically and fix any problems, we may need to make
>>> a decision.  If it won't mount due to a log problem, do we automatically
>>> use -L or drop to a shell and punt to the admin?  (That's what we would
>>> do w/o any fsck -f invocation today...)
>>
>> Define the expected "forcefsck" semantics, and that will tell us
>> what we need to do. Is it automatic system recovery? What if the
>> root fs can't be mounted due to log replay problems?
>
> You're asking too much.  ;)  Semantics?  ;)  Best we can probably do
> is copy what e2fsck does - it tries to replay the log before running
> the actual fsck.  So ... what does e2fsck do if /it/ can't replay
> the log?

As far as I can tell, in that case, e2fsck exit code indicates 4 -
File system errors left uncorrected, but I'm studying ext testing
tools and will try to verify it.
About the -L flag, I think it is a bad idea - we don't want anything
dangerous to happen here, so if it can't be fixed safely and in an
automated way, just bail out.
That being said, I added a log replay attempt in there (via mount/unmount).

>
>>>> I also wonder if we can limit this to just the boot infrastructure,
>>>> because I really don't like the idea of users using fsck.xfs -f to
>>>> repair damage filesystems because "that's what I do to repair ext4
>>>> filesystems"....
>>>
>>> Depending on how this gets fleshed out, fsck.xfs -f isn't any different
>>> than bare xfs_repair...  (Unless all of the above suggestions about dirty
>>> logs get added, then it certainly is!)  So, yeah...
>>>
>>> How would you propose limiting it to the boot environment?
>>
>> I have no idea - this is all way outside my area of expertise...
>
> A halfway measure would be to test whether the script is interactive, perhaps?
>
> https://www.tldp.org/LDP/abs/html/intandnonint.html
>
> case $- in
> *i*)    # interactive shell
> ;;
> *)      # non-interactive shell
> ;;
>

IMO, any such test would make fsck.xfs behave unpredictably for the
user. If anyone wants to run fsck.xfs -f instead of xfs_repair, it is
their choice. We can print something "next time use xfs_repair
directly" for an interactive session, but I don't like the idea of the
script doing different things based on some (for the user) hidden
variables.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-06 11:51         ` Jan Tulak
@ 2018-03-06 21:39           ` Dave Chinner
  2018-03-08 10:57             ` Jan Tulak
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2018-03-06 21:39 UTC (permalink / raw)
  To: Jan Tulak; +Cc: Eric Sandeen, linux-xfs

On Tue, Mar 06, 2018 at 12:51:18PM +0100, Jan Tulak wrote:
> On Tue, Mar 6, 2018 at 12:33 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
> > On 3/5/18 4:31 PM, Dave Chinner wrote:
> >> On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:
> >>> As for running automatically and fix any problems, we may need to make
> >>> a decision.  If it won't mount due to a log problem, do we automatically
> >>> use -L or drop to a shell and punt to the admin?  (That's what we would
> >>> do w/o any fsck -f invocation today...)
> >>
> >> Define the expected "forcefsck" semantics, and that will tell us
> >> what we need to do. Is it automatic system recovery? What if the
> >> root fs can't be mounted due to log replay problems?
> >
> > You're asking too much.  ;)  Semantics?  ;)  Best we can probably do
> > is copy what e2fsck does - it tries to replay the log before running
> > the actual fsck.  So ... what does e2fsck do if /it/ can't replay
> > the log?
> 
> As far as I can tell, in that case, e2fsck exit code indicates 4 -
> File system errors left uncorrected, but I'm studying ext testing
> tools and will try to verify it.
> About the -L flag, I think it is a bad idea - we don't want anything
> dangerous to happen here, so if it can't be fixed safely and in an
> automated way, just bail out.
> That being said, I added a log replay attempt in there (via mount/unmount).

I really don't advise doing that for a forced filesystem check. If
the log is corrupt, mounting it will trigger the problems we are
trying to avoid/fix by running a forced filesystem check. As it is,
we're probably being run in this mode because mounting has already
failed and causing the system not to boot.

What we need to do is list how the startup scripts work according to
what error is returned, and then match the behaviour we want in a
specific corruption case to the behaviour of a specific return
value.

i.e. if we have a dirty log, then really we need manual
intervention. That means we need to return an error that will cause
the startup script to stop and drop into an interactive shell for
the admin to fix manually.

This is what I mean by "define the expected forcefsck semantics" -
describe the behaviour of the system in reponse to the errors we can
return to it, and match them to the problem cases we need to resolve
with fsck.xfs.

> >>>> I also wonder if we can limit this to just the boot infrastructure,
> >>>> because I really don't like the idea of users using fsck.xfs -f to
> >>>> repair damage filesystems because "that's what I do to repair ext4
> >>>> filesystems"....
> >>>
> >>> Depending on how this gets fleshed out, fsck.xfs -f isn't any different
> >>> than bare xfs_repair...  (Unless all of the above suggestions about dirty
> >>> logs get added, then it certainly is!)  So, yeah...
> >>>
> >>> How would you propose limiting it to the boot environment?
> >>
> >> I have no idea - this is all way outside my area of expertise...
> >
> > A halfway measure would be to test whether the script is interactive, perhaps?
> >
> > https://www.tldp.org/LDP/abs/html/intandnonint.html
> >
> > case $- in
> > *i*)    # interactive shell
> > ;;
> > *)      # non-interactive shell
> > ;;
> >
> 
> IMO, any such test would make fsck.xfs behave unpredictably for the
> user. If anyone wants to run fsck.xfs -f instead of xfs_repair, it is
> their choice.

We limit user choices all the time. Default values, config options,
tuning variables, etc, IOWs, it's our choice as developers to allow
users to do something or not.  And in this case, we made this choice
to limit what fsck.xfs could do a long time ago:

# man fsck.xfs
.....
	If you wish to check the consistency of an XFS filesystem,
	or repair a damaged or corrupt XFS filesystem, see
	xfs_repair(8).
.....
# fsck.xfs
If you wish to check the consistency of an XFS filesystem or
repair a damaged filesystem, see xfs_repair(8).
#


> We can print something "next time use xfs_repair
> directly" for an interactive session, but I don't like the idea of the
> script doing different things based on some (for the user) hidden
> variables.

What hidden variable are you talking about here? Having a script
determine behaviour based on whether it is in an interactive
sessions or not is a common thing to do. There's nothing tricky or
unusual about it....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-06 21:39           ` Dave Chinner
@ 2018-03-08 10:57             ` Jan Tulak
  2018-03-08 16:28               ` Darrick J. Wong
  2018-03-08 23:28               ` Eric Sandeen
  0 siblings, 2 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-08 10:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eric Sandeen, linux-xfs

On Tue, Mar 6, 2018 at 10:39 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Tue, Mar 06, 2018 at 12:51:18PM +0100, Jan Tulak wrote:
>> On Tue, Mar 6, 2018 at 12:33 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
>> > On 3/5/18 4:31 PM, Dave Chinner wrote:
>> >> On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:
>> >>> As for running automatically and fix any problems, we may need to make
>> >>> a decision.  If it won't mount due to a log problem, do we automatically
>> >>> use -L or drop to a shell and punt to the admin?  (That's what we would
>> >>> do w/o any fsck -f invocation today...)
>> >>
>> >> Define the expected "forcefsck" semantics, and that will tell us
>> >> what we need to do. Is it automatic system recovery? What if the
>> >> root fs can't be mounted due to log replay problems?
>> >
>> > You're asking too much.  ;)  Semantics?  ;)  Best we can probably do
>> > is copy what e2fsck does - it tries to replay the log before running
>> > the actual fsck.  So ... what does e2fsck do if /it/ can't replay
>> > the log?
>>
>> As far as I can tell, in that case, e2fsck exit code indicates 4 -
>> File system errors left uncorrected, but I'm studying ext testing
>> tools and will try to verify it.
>> About the -L flag, I think it is a bad idea - we don't want anything
>> dangerous to happen here, so if it can't be fixed safely and in an
>> automated way, just bail out.
>> That being said, I added a log replay attempt in there (via mount/unmount).
>
> I really don't advise doing that for a forced filesystem check. If
> the log is corrupt, mounting it will trigger the problems we are
> trying to avoid/fix by running a forced filesystem check. As it is,
> we're probably being run in this mode because mounting has already
> failed and causing the system not to boot.
>
> What we need to do is list how the startup scripts work according to
> what error is returned, and then match the behaviour we want in a
> specific corruption case to the behaviour of a specific return
> value.
>
> i.e. if we have a dirty log, then really we need manual
> intervention. That means we need to return an error that will cause
> the startup script to stop and drop into an interactive shell for
> the admin to fix manually.
>
> This is what I mean by "define the expected forcefsck semantics" -
> describe the behaviour of the system in reponse to the errors we can
> return to it, and match them to the problem cases we need to resolve
> with fsck.xfs.

I tested it on Fedora 27. Exit codes 2 and 4 ("File system errors
corrected, system should be rebooted" and "File system errors left
uncorrected") drop the user into the emergency shell. Anything else
and the boot continues.

This happens before root volume is mounted during the boot, so I
propose this behaviour for fsck.xfs:
- if the volume/device is mounted, exit with 16 - usage or syntax
error (just to be sure)
- if the volume/device has a dirty log, exit with 4 - errors left
uncorrected (drop to the shell)
- if we find no errors, exit with 0 - no errors
- if we find anything and xfs_repair ends successfully, exit with 1 -
errors corrected
- anything else and exit with 8 - operational error

And is there any other way how to get the "there were some errors, but
we corrected them" other than either 1) screenscrape xfs_repair or 2)
run xfs_repair twice, once with -n to detect and then without -n to
fix the found errors?

>
>> >>>> I also wonder if we can limit this to just the boot infrastructure,
>> >>>> because I really don't like the idea of users using fsck.xfs -f to
>> >>>> repair damage filesystems because "that's what I do to repair ext4
>> >>>> filesystems"....
>> >>>
>> >>> Depending on how this gets fleshed out, fsck.xfs -f isn't any different
>> >>> than bare xfs_repair...  (Unless all of the above suggestions about dirty
>> >>> logs get added, then it certainly is!)  So, yeah...
>> >>>
>> >>> How would you propose limiting it to the boot environment?
>> >>
>> >> I have no idea - this is all way outside my area of expertise...
>> >
>> > A halfway measure would be to test whether the script is interactive, perhaps?
>> >
>> > https://www.tldp.org/LDP/abs/html/intandnonint.html
>> >
>> > case $- in
>> > *i*)    # interactive shell
>> > ;;
>> > *)      # non-interactive shell
>> > ;;
>> >
>>
>> IMO, any such test would make fsck.xfs behave unpredictably for the
>> user. If anyone wants to run fsck.xfs -f instead of xfs_repair, it is
>> their choice.
>
> We limit user choices all the time. Default values, config options,
> tuning variables, etc, IOWs, it's our choice as developers to allow
> users to do something or not.  And in this case, we made this choice
> to limit what fsck.xfs could do a long time ago:
>
> # man fsck.xfs
> .....
>         If you wish to check the consistency of an XFS filesystem,
>         or repair a damaged or corrupt XFS filesystem, see
>         xfs_repair(8).
> .....
> # fsck.xfs
> If you wish to check the consistency of an XFS filesystem or
> repair a damaged filesystem, see xfs_repair(8).
> #
>

At that point, it was a consistent behaviour, do nothing all the time,
no matter what.

>
>> We can print something "next time use xfs_repair
>> directly" for an interactive session, but I don't like the idea of the
>> script doing different things based on some (for the user) hidden
>> variables.
>
> What hidden variable are you talking about here? Having a script
> determine behaviour based on whether it is in an interactive
> sessions or not is a common thing to do. There's nothing tricky or
> unusual about it....
>

I'm not aware of any script or tool that would refuse to work except
when started in a specific environment and noninteractively (doesn't
mean they don't exist, but it is not common). And because it seems
that fsck.xfs -f will do only what bare xfs_repair would do, no log
replay, nothing... then I really think that changing what the script
does (not just altering its output) based on environment tests is
unnecessary. And for anyone without this specific knowledge, it would
be confusing - people expect that for the same input the application
or script does the same thing at the end.

Cheers,
Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-08 10:57             ` Jan Tulak
@ 2018-03-08 16:28               ` Darrick J. Wong
  2018-03-08 22:36                 ` Dave Chinner
  2018-03-08 23:28               ` Eric Sandeen
  1 sibling, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-03-08 16:28 UTC (permalink / raw)
  To: Jan Tulak; +Cc: Dave Chinner, Eric Sandeen, linux-xfs

On Thu, Mar 08, 2018 at 11:57:40AM +0100, Jan Tulak wrote:
> On Tue, Mar 6, 2018 at 10:39 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Tue, Mar 06, 2018 at 12:51:18PM +0100, Jan Tulak wrote:
> >> On Tue, Mar 6, 2018 at 12:33 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
> >> > On 3/5/18 4:31 PM, Dave Chinner wrote:
> >> >> On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:
> >> >>> As for running automatically and fix any problems, we may need to make
> >> >>> a decision.  If it won't mount due to a log problem, do we automatically
> >> >>> use -L or drop to a shell and punt to the admin?  (That's what we would
> >> >>> do w/o any fsck -f invocation today...)
> >> >>
> >> >> Define the expected "forcefsck" semantics, and that will tell us
> >> >> what we need to do. Is it automatic system recovery? What if the
> >> >> root fs can't be mounted due to log replay problems?
> >> >
> >> > You're asking too much.  ;)  Semantics?  ;)  Best we can probably do
> >> > is copy what e2fsck does - it tries to replay the log before running
> >> > the actual fsck.  So ... what does e2fsck do if /it/ can't replay
> >> > the log?
> >>
> >> As far as I can tell, in that case, e2fsck exit code indicates 4 -
> >> File system errors left uncorrected, but I'm studying ext testing
> >> tools and will try to verify it.
> >> About the -L flag, I think it is a bad idea - we don't want anything
> >> dangerous to happen here, so if it can't be fixed safely and in an
> >> automated way, just bail out.
> >> That being said, I added a log replay attempt in there (via mount/unmount).
> >
> > I really don't advise doing that for a forced filesystem check. If
> > the log is corrupt, mounting it will trigger the problems we are
> > trying to avoid/fix by running a forced filesystem check. As it is,
> > we're probably being run in this mode because mounting has already
> > failed and causing the system not to boot.
> >
> > What we need to do is list how the startup scripts work according to
> > what error is returned, and then match the behaviour we want in a
> > specific corruption case to the behaviour of a specific return
> > value.
> >
> > i.e. if we have a dirty log, then really we need manual
> > intervention. That means we need to return an error that will cause
> > the startup script to stop and drop into an interactive shell for
> > the admin to fix manually.
> >
> > This is what I mean by "define the expected forcefsck semantics" -
> > describe the behaviour of the system in reponse to the errors we can
> > return to it, and match them to the problem cases we need to resolve
> > with fsck.xfs.
> 
> I tested it on Fedora 27. Exit codes 2 and 4 ("File system errors
> corrected, system should be rebooted" and "File system errors left
> uncorrected") drop the user into the emergency shell. Anything else
> and the boot continues.

FWIW Debian seems to panic() if the exit code has (1 << 2) set, where
"panic()" either drops to a shell if panic= is not given or actually
reboots the machine if panic= is given.  All other cases proceed with
boot, including 2 (errors fixed, reboot now).

That said, the installer seems to set up root xfs as pass 0 in fstab so
fsck is not included in the initramfs at all.

> This happens before root volume is mounted during the boot, so I
> propose this behaviour for fsck.xfs:
> - if the volume/device is mounted, exit with 16 - usage or syntax
> error (just to be sure)
> - if the volume/device has a dirty log, exit with 4 - errors left
> uncorrected (drop to the shell)
> - if we find no errors, exit with 0 - no errors
> - if we find anything and xfs_repair ends successfully, exit with 1 -
> errors corrected
> - anything else and exit with 8 - operational error
> 
> And is there any other way how to get the "there were some errors, but
> we corrected them" other than either 1) screenscrape xfs_repair or 2)
> run xfs_repair twice, once with -n to detect and then without -n to
> fix the found errors?

I wouldn't run it twice, repair can take quite a while to run.

--D

> >
> >> >>>> I also wonder if we can limit this to just the boot infrastructure,
> >> >>>> because I really don't like the idea of users using fsck.xfs -f to
> >> >>>> repair damage filesystems because "that's what I do to repair ext4
> >> >>>> filesystems"....
> >> >>>
> >> >>> Depending on how this gets fleshed out, fsck.xfs -f isn't any different
> >> >>> than bare xfs_repair...  (Unless all of the above suggestions about dirty
> >> >>> logs get added, then it certainly is!)  So, yeah...
> >> >>>
> >> >>> How would you propose limiting it to the boot environment?
> >> >>
> >> >> I have no idea - this is all way outside my area of expertise...
> >> >
> >> > A halfway measure would be to test whether the script is interactive, perhaps?
> >> >
> >> > https://www.tldp.org/LDP/abs/html/intandnonint.html
> >> >
> >> > case $- in
> >> > *i*)    # interactive shell
> >> > ;;
> >> > *)      # non-interactive shell
> >> > ;;
> >> >
> >>
> >> IMO, any such test would make fsck.xfs behave unpredictably for the
> >> user. If anyone wants to run fsck.xfs -f instead of xfs_repair, it is
> >> their choice.
> >
> > We limit user choices all the time. Default values, config options,
> > tuning variables, etc, IOWs, it's our choice as developers to allow
> > users to do something or not.  And in this case, we made this choice
> > to limit what fsck.xfs could do a long time ago:
> >
> > # man fsck.xfs
> > .....
> >         If you wish to check the consistency of an XFS filesystem,
> >         or repair a damaged or corrupt XFS filesystem, see
> >         xfs_repair(8).
> > .....
> > # fsck.xfs
> > If you wish to check the consistency of an XFS filesystem or
> > repair a damaged filesystem, see xfs_repair(8).
> > #
> >
> 
> At that point, it was a consistent behaviour, do nothing all the time,
> no matter what.
> 
> >
> >> We can print something "next time use xfs_repair
> >> directly" for an interactive session, but I don't like the idea of the
> >> script doing different things based on some (for the user) hidden
> >> variables.
> >
> > What hidden variable are you talking about here? Having a script
> > determine behaviour based on whether it is in an interactive
> > sessions or not is a common thing to do. There's nothing tricky or
> > unusual about it....
> >
> 
> I'm not aware of any script or tool that would refuse to work except
> when started in a specific environment and noninteractively (doesn't
> mean they don't exist, but it is not common). And because it seems
> that fsck.xfs -f will do only what bare xfs_repair would do, no log
> replay, nothing... then I really think that changing what the script
> does (not just altering its output) based on environment tests is
> unnecessary. And for anyone without this specific knowledge, it would
> be confusing - people expect that for the same input the application
> or script does the same thing at the end.
> 
> Cheers,
> Jan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-08 16:28               ` Darrick J. Wong
@ 2018-03-08 22:36                 ` Dave Chinner
  2018-03-14 13:51                   ` Jan Tulak
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2018-03-08 22:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Jan Tulak, Eric Sandeen, linux-xfs

On Thu, Mar 08, 2018 at 08:28:38AM -0800, Darrick J. Wong wrote:
> On Thu, Mar 08, 2018 at 11:57:40AM +0100, Jan Tulak wrote:
> > On Tue, Mar 6, 2018 at 10:39 PM, Dave Chinner <david@fromorbit.com> wrote:
> > > On Tue, Mar 06, 2018 at 12:51:18PM +0100, Jan Tulak wrote:
> > >> On Tue, Mar 6, 2018 at 12:33 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
> > >> > On 3/5/18 4:31 PM, Dave Chinner wrote:
> > >> >> On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:
> > >> >>> As for running automatically and fix any problems, we may need to make
> > >> >>> a decision.  If it won't mount due to a log problem, do we automatically
> > >> >>> use -L or drop to a shell and punt to the admin?  (That's what we would
> > >> >>> do w/o any fsck -f invocation today...)
> > >> >>
> > >> >> Define the expected "forcefsck" semantics, and that will tell us
> > >> >> what we need to do. Is it automatic system recovery? What if the
> > >> >> root fs can't be mounted due to log replay problems?
> > >> >
> > >> > You're asking too much.  ;)  Semantics?  ;)  Best we can probably do
> > >> > is copy what e2fsck does - it tries to replay the log before running
> > >> > the actual fsck.  So ... what does e2fsck do if /it/ can't replay
> > >> > the log?
> > >>
> > >> As far as I can tell, in that case, e2fsck exit code indicates 4 -
> > >> File system errors left uncorrected, but I'm studying ext testing
> > >> tools and will try to verify it.
> > >> About the -L flag, I think it is a bad idea - we don't want anything
> > >> dangerous to happen here, so if it can't be fixed safely and in an
> > >> automated way, just bail out.
> > >> That being said, I added a log replay attempt in there (via mount/unmount).
> > >
> > > I really don't advise doing that for a forced filesystem check. If
> > > the log is corrupt, mounting it will trigger the problems we are
> > > trying to avoid/fix by running a forced filesystem check. As it is,
> > > we're probably being run in this mode because mounting has already
> > > failed and causing the system not to boot.
> > >
> > > What we need to do is list how the startup scripts work according to
> > > what error is returned, and then match the behaviour we want in a
> > > specific corruption case to the behaviour of a specific return
> > > value.
> > >
> > > i.e. if we have a dirty log, then really we need manual
> > > intervention. That means we need to return an error that will cause
> > > the startup script to stop and drop into an interactive shell for
> > > the admin to fix manually.
> > >
> > > This is what I mean by "define the expected forcefsck semantics" -
> > > describe the behaviour of the system in reponse to the errors we can
> > > return to it, and match them to the problem cases we need to resolve
> > > with fsck.xfs.
> > 
> > I tested it on Fedora 27. Exit codes 2 and 4 ("File system errors
> > corrected, system should be rebooted" and "File system errors left
> > uncorrected") drop the user into the emergency shell. Anything else
> > and the boot continues.
> 
> FWIW Debian seems to panic() if the exit code has (1 << 2) set, where
> "panic()" either drops to a shell if panic= is not given or actually
> reboots the machine if panic= is given.  All other cases proceed with
> boot, including 2 (errors fixed, reboot now).
> 
> That said, the installer seems to set up root xfs as pass 0 in fstab so
> fsck is not included in the initramfs at all.

Ok, so how do we deal with the "all distros are different, none
correctly follow documented fsck behaviour"?  It seems to me like we
need fsck to drop into an admin shell if anything goes wrong,
perhaps with the output saying what went wrong.

e.g. if we corrected errors and need a reboot, then we return 4 to
get the system to drop into a shell with this:

*** Errors corrected, You must reboot now! ***
#

If we had any other failure, then we drop to the shell with this:

*** Failed to repair filesystem. Manual correction required! ***
#

And only in the case that we have an unmounted, clean filesystem do
we continue to boot and mount the root filesystem?

> > This happens before root volume is mounted during the boot, so I
> > propose this behaviour for fsck.xfs:
> > - if the volume/device is mounted, exit with 16 - usage or syntax
> > error (just to be sure)

That's error 4 - requires manual intervention to repair.

> > - if the volume/device has a dirty log, exit with 4 - errors left
> > uncorrected (drop to the shell)

Yup.

> > - if we find no errors, exit with 0 - no errors

Yup, but only if the filesystem is not mounted, otherwise it's
"requires reboot" because repair with no errors still rewrites all
the per-ag metadata and so changes the on disk metadata layout.
Continuing at this point with a mounted filesystem is guaranteed to
corrupt the filesystem.

> > - if we find anything and xfs_repair ends successfully, exit with 1 -
> > errors corrected

Same as the above case - needs reboot.

> > - anything else and exit with 8 - operational error

I'd argue that's "errors uncorrected" (i.e. error 4) because it
requires manual intervention to determine what the error was and
resolve the situation.

> > And is there any other way how to get the "there were some errors, but
> > we corrected them" other than either 1) screenscrape xfs_repair or 2)
> > run xfs_repair twice, once with -n to detect and then without -n to
> > fix the found errors?
> 
> I wouldn't run it twice, repair can take quite a while to run.

Besides, we have to treat both cases the same because even if there
were no errors xfs_repair still modifies the metadata on disk....

> > I'm not aware of any script or tool that would refuse to work except
> > when started in a specific environment and noninteractively (doesn't
> > mean they don't exist, but it is not common). And because it seems
> > that fsck.xfs -f will do only what bare xfs_repair would do, no log
> > replay, nothing... then I really think that changing what the script
> > does (not just altering its output) based on environment tests is
> > unnecessary.

This isn't a technical issue - this is a behaviour management issue.
We want people to always run xfs_repair when there's a problem that
needs fixing, not fsck.xfs. fsck.xfs is for startup scripts only and
has very limited scope in what we allow it to do. xfs_repair is the
tool users should be running to fix their filesystems, not trying to
do it indirectly through startup script infrastructure....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-08 10:57             ` Jan Tulak
  2018-03-08 16:28               ` Darrick J. Wong
@ 2018-03-08 23:28               ` Eric Sandeen
  2018-03-14 13:30                 ` Jan Tulak
  1 sibling, 1 reply; 46+ messages in thread
From: Eric Sandeen @ 2018-03-08 23:28 UTC (permalink / raw)
  To: Jan Tulak, Dave Chinner; +Cc: linux-xfs



On 3/8/18 4:57 AM, Jan Tulak wrote:
> I tested it on Fedora 27. Exit codes 2 and 4 ("File system errors
> corrected, system should be rebooted" and "File system errors left
> uncorrected") drop the user into the emergency shell. Anything else
> and the boot continues.
> 
> This happens before root volume is mounted during the boot, so I
> propose this behaviour for fsck.xfs:
> - if the volume/device is mounted, exit with 16 - usage or syntax
> error (just to be sure)
> - if the volume/device has a dirty log, exit with 4 - errors left
> uncorrected (drop to the shell)
> - if we find no errors, exit with 0 - no errors
> - if we find anything and xfs_repair ends successfully, exit with 1 -
> errors corrected
> - anything else and exit with 8 - operational error
> 
> And is there any other way how to get the "there were some errors, but
> we corrected them" other than either 1) screenscrape xfs_repair or 2)
> run xfs_repair twice, once with -n to detect and then without -n to
> fix the found errors?

If it's critical to report whether errors were fixed, it would be trivial
to add a new option to xfs_repair which causes it to test fs_is_dirty for
runs without "-n", and exit with a different value.

The approach above looks ok in general.  The main motivation for this
is to be able to use some config management tool to push out a config
to a whole fleet of machines in a lab, and have them check/fix the
root fs without interaction.  If a filesystem is failing to mount due
to a dirty log, it's likely that the admin is in a more hands-on mode
by then anyway.

A dirty log could also be "operational error" if we decree that forcefsck
and friends should only be used on a clean reboot - but I'm less certain
about that.

Perhaps a guiding principal should be that we have no easy way to repair
the root fs today in /any/ situation, so any correct/safe improvement
to that situation is an improvement, even if not all cases are handled
at this point.

-Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-08 23:28               ` Eric Sandeen
@ 2018-03-14 13:30                 ` Jan Tulak
  2018-03-14 15:19                   ` Eric Sandeen
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Tulak @ 2018-03-14 13:30 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, linux-xfs

On Fri, Mar 9, 2018 at 12:28 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
>
> If it's critical to report whether errors were fixed, it would be trivial
> to add a new option to xfs_repair which causes it to test fs_is_dirty for
> runs without "-n", and exit with a different value.

I have toyed with it a bit and this seems to be the best option. A
flag that changes the exit code on a successful run. Is exit code 3
ok? According to man page, only 1 and 2 are currently used and the
"everything is ok now, but an issue was there" should not be mixed
with the existing ones. I also thought about a flag that would change
all exit codes to fsck ones, but that seems too complicated and
completely unnecessary.

       [...] always return a  status  code
       of  0  if it completes without problems.  If a runtime error is encoun-
       tered during operation, it will return a status of 1.   In  this  case,
       xfs_repair should be restarted.  If xfs_repair is unable to proceed due
       to a dirty log, it will return a status of 2.

As for the flag, how about -e? It is not used yet and it makes some
sense as "e-exit code."

>
> The approach above looks ok in general.  The main motivation for this
> is to be able to use some config management tool to push out a config
> to a whole fleet of machines in a lab, and have them check/fix the
> root fs without interaction.  If a filesystem is failing to mount due
> to a dirty log, it's likely that the admin is in a more hands-on mode
> by then anyway.
>
> A dirty log could also be "operational error" if we decree that forcefsck
> and friends should only be used on a clean reboot - but I'm less certain
> about that.

I thought about it, but "errors left uncorrected" makes more sense here to me.
>
> Perhaps a guiding principal should be that we have no easy way to repair
> the root fs today in /any/ situation, so any correct/safe improvement
> to that situation is an improvement, even if not all cases are handled
> at this point.
>

Agreed.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-08 22:36                 ` Dave Chinner
@ 2018-03-14 13:51                   ` Jan Tulak
  2018-03-14 15:25                     ` Darrick J. Wong
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Tulak @ 2018-03-14 13:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, Eric Sandeen, linux-xfs

On Thu, Mar 8, 2018 at 11:36 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Mar 08, 2018 at 08:28:38AM -0800, Darrick J. Wong wrote:
>> On Thu, Mar 08, 2018 at 11:57:40AM +0100, Jan Tulak wrote:
>> > On Tue, Mar 6, 2018 at 10:39 PM, Dave Chinner <david@fromorbit.com> wrote:
>> > > On Tue, Mar 06, 2018 at 12:51:18PM +0100, Jan Tulak wrote:
>> > >> On Tue, Mar 6, 2018 at 12:33 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
>> > >> > On 3/5/18 4:31 PM, Dave Chinner wrote:
>> > >> >> On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:
>> > >> >>> As for running automatically and fix any problems, we may need to make
>> > >> >>> a decision.  If it won't mount due to a log problem, do we automatically
>> > >> >>> use -L or drop to a shell and punt to the admin?  (That's what we would
>> > >> >>> do w/o any fsck -f invocation today...)
>> > >> >>
>> > >> >> Define the expected "forcefsck" semantics, and that will tell us
>> > >> >> what we need to do. Is it automatic system recovery? What if the
>> > >> >> root fs can't be mounted due to log replay problems?
>> > >> >
>> > >> > You're asking too much.  ;)  Semantics?  ;)  Best we can probably do
>> > >> > is copy what e2fsck does - it tries to replay the log before running
>> > >> > the actual fsck.  So ... what does e2fsck do if /it/ can't replay
>> > >> > the log?
>> > >>
>> > >> As far as I can tell, in that case, e2fsck exit code indicates 4 -
>> > >> File system errors left uncorrected, but I'm studying ext testing
>> > >> tools and will try to verify it.
>> > >> About the -L flag, I think it is a bad idea - we don't want anything
>> > >> dangerous to happen here, so if it can't be fixed safely and in an
>> > >> automated way, just bail out.
>> > >> That being said, I added a log replay attempt in there (via mount/unmount).
>> > >
>> > > I really don't advise doing that for a forced filesystem check. If
>> > > the log is corrupt, mounting it will trigger the problems we are
>> > > trying to avoid/fix by running a forced filesystem check. As it is,
>> > > we're probably being run in this mode because mounting has already
>> > > failed and causing the system not to boot.
>> > >
>> > > What we need to do is list how the startup scripts work according to
>> > > what error is returned, and then match the behaviour we want in a
>> > > specific corruption case to the behaviour of a specific return
>> > > value.
>> > >
>> > > i.e. if we have a dirty log, then really we need manual
>> > > intervention. That means we need to return an error that will cause
>> > > the startup script to stop and drop into an interactive shell for
>> > > the admin to fix manually.
>> > >
>> > > This is what I mean by "define the expected forcefsck semantics" -
>> > > describe the behaviour of the system in reponse to the errors we can
>> > > return to it, and match them to the problem cases we need to resolve
>> > > with fsck.xfs.
>> >
>> > I tested it on Fedora 27. Exit codes 2 and 4 ("File system errors
>> > corrected, system should be rebooted" and "File system errors left
>> > uncorrected") drop the user into the emergency shell. Anything else
>> > and the boot continues.
>>
>> FWIW Debian seems to panic() if the exit code has (1 << 2) set, where
>> "panic()" either drops to a shell if panic= is not given or actually
>> reboots the machine if panic= is given.  All other cases proceed with
>> boot, including 2 (errors fixed, reboot now).
>>
>> That said, the installer seems to set up root xfs as pass 0 in fstab so
>> fsck is not included in the initramfs at all.
>
> Ok, so how do we deal with the "all distros are different, none
> correctly follow documented fsck behaviour"?  It seems to me like we
> need fsck to drop into an admin shell if anything goes wrong,
> perhaps with the output saying what went wrong.
>
> e.g. if we corrected errors and need a reboot, then we return 4 to
> get the system to drop into a shell with this:
>
> *** Errors corrected, You must reboot now! ***
> #

That makes a sense, but I have a question here. Can it happen with
xfs, with on an unmounted fs, and can I detect it somehow? I can parse
through -vvvv output, but that doesn't look like a sensible solution -
too many things that could be missed if it is an issue and we need to
watch out for corrections that requires a reboot.

And another thing is that after we drop the user into the shell, they
won't see any message directly I think but will have to read the logs
to see the "Errors corrected, You must reboot now!"

>
> If we had any other failure, then we drop to the shell with this:
>
> *** Failed to repair filesystem. Manual correction required! ***
> #
>
> And only in the case that we have an unmounted, clean filesystem do
> we continue to boot and mount the root filesystem?
>
>> > This happens before root volume is mounted during the boot, so I
>> > propose this behaviour for fsck.xfs:
>> > - if the volume/device is mounted, exit with 16 - usage or syntax
>> > error (just to be sure)
>
> That's error 4 - requires manual intervention to repair.

Mmh... yes, probably better to throw the shell at the user. If this
happens, something has gone bad anyway, because the fsck should be run
before systemd/init scripts attempts to mount the fs, so it should
never happen in the boot environment.

>
>> > - if the volume/device has a dirty log, exit with 4 - errors left
>> > uncorrected (drop to the shell)
>
> Yup.
>
>> > - if we find no errors, exit with 0 - no errors
>
> Yup, but only if the filesystem is not mounted, otherwise it's
> "requires reboot" because repair with no errors still rewrites all
> the per-ag metadata and so changes the on disk metadata layout.
> Continuing at this point with a mounted filesystem is guaranteed to
> corrupt the filesystem.

We refuse to start with a mounted fs, so this is no issue.

>
>> > - if we find anything and xfs_repair ends successfully, exit with 1 -
>> > errors corrected
>
> Same as the above case - needs reboot.

Do we? The fs wasn't mounted at this point yet. Maybe there is a
reason for a reboot, I just don't know about it. :-)

>
>> > - anything else and exit with 8 - operational error
>
> I'd argue that's "errors uncorrected" (i.e. error 4) because it
> requires manual intervention to determine what the error was and
> resolve the situation.
>
>> > And is there any other way how to get the "there were some errors, but
>> > we corrected them" other than either 1) screenscrape xfs_repair or 2)
>> > run xfs_repair twice, once with -n to detect and then without -n to
>> > fix the found errors?
>>
>> I wouldn't run it twice, repair can take quite a while to run.
>
> Besides, we have to treat both cases the same because even if there
> were no errors xfs_repair still modifies the metadata on disk....

A small change in xfs_repair solves this, no issue for fsck anymore.

>
>> > I'm not aware of any script or tool that would refuse to work except
>> > when started in a specific environment and noninteractively (doesn't
>> > mean they don't exist, but it is not common). And because it seems
>> > that fsck.xfs -f will do only what bare xfs_repair would do, no log
>> > replay, nothing... then I really think that changing what the script
>> > does (not just altering its output) based on environment tests is
>> > unnecessary.
>
> This isn't a technical issue - this is a behaviour management issue.
> We want people to always run xfs_repair when there's a problem that
> needs fixing, not fsck.xfs. fsck.xfs is for startup scripts only and
> has very limited scope in what we allow it to do. xfs_repair is the
> tool users should be running to fix their filesystems, not trying to
> do it indirectly through startup script infrastructure....
>

OK

Cheers,
Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-14 13:30                 ` Jan Tulak
@ 2018-03-14 15:19                   ` Eric Sandeen
  2018-03-15 11:16                     ` Jan Tulak
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Sandeen @ 2018-03-14 15:19 UTC (permalink / raw)
  To: Jan Tulak; +Cc: Dave Chinner, linux-xfs

On 3/14/18 9:30 AM, Jan Tulak wrote:
> On Fri, Mar 9, 2018 at 12:28 AM, Eric Sandeen <sandeen@sandeen.net>
> wrote:
>> 
>> If it's critical to report whether errors were fixed, it would be
>> trivial to add a new option to xfs_repair which causes it to test
>> fs_is_dirty for runs without "-n", and exit with a different
>> value.
> 
> I have toyed with it a bit and this seems to be the best option. A 
> flag that changes the exit code on a successful run. Is exit code 3 
> ok? According to man page, only 1 and 2 are currently used and the 
> "everything is ok now, but an issue was there" should not be mixed 
> with the existing ones. I also thought about a flag that would
> change all exit codes to fsck ones, but that seems too complicated
> and completely unnecessary.

Hm, I guess we'll have to.
 
> [...] always return a  status  code of  0  if it completes without
> problems.  If a runtime error is encoun- tered during operation, it
> will return a status of 1.   In  this  case, xfs_repair should be
> restarted.  If xfs_repair is unable to proceed due to a dirty log, it
> will return a status of 2.
> 
> As for the flag, how about -e? It is not used yet and it makes some 
> sense as "e-exit code."

sure, sounds fine.

-Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-14 13:51                   ` Jan Tulak
@ 2018-03-14 15:25                     ` Darrick J. Wong
  2018-03-14 21:10                       ` Dave Chinner
  2018-03-15 17:01                       ` Jan Tulak
  0 siblings, 2 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-03-14 15:25 UTC (permalink / raw)
  To: Jan Tulak; +Cc: Dave Chinner, Eric Sandeen, linux-xfs

On Wed, Mar 14, 2018 at 02:51:51PM +0100, Jan Tulak wrote:
> On Thu, Mar 8, 2018 at 11:36 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Mar 08, 2018 at 08:28:38AM -0800, Darrick J. Wong wrote:
> >> On Thu, Mar 08, 2018 at 11:57:40AM +0100, Jan Tulak wrote:
> >> > On Tue, Mar 6, 2018 at 10:39 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> > > On Tue, Mar 06, 2018 at 12:51:18PM +0100, Jan Tulak wrote:
> >> > >> On Tue, Mar 6, 2018 at 12:33 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
> >> > >> > On 3/5/18 4:31 PM, Dave Chinner wrote:
> >> > >> >> On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:
> >> > >> >>> As for running automatically and fix any problems, we may need to make
> >> > >> >>> a decision.  If it won't mount due to a log problem, do we automatically
> >> > >> >>> use -L or drop to a shell and punt to the admin?  (That's what we would
> >> > >> >>> do w/o any fsck -f invocation today...)
> >> > >> >>
> >> > >> >> Define the expected "forcefsck" semantics, and that will tell us
> >> > >> >> what we need to do. Is it automatic system recovery? What if the
> >> > >> >> root fs can't be mounted due to log replay problems?
> >> > >> >
> >> > >> > You're asking too much.  ;)  Semantics?  ;)  Best we can probably do
> >> > >> > is copy what e2fsck does - it tries to replay the log before running
> >> > >> > the actual fsck.  So ... what does e2fsck do if /it/ can't replay
> >> > >> > the log?
> >> > >>
> >> > >> As far as I can tell, in that case, e2fsck exit code indicates 4 -
> >> > >> File system errors left uncorrected, but I'm studying ext testing
> >> > >> tools and will try to verify it.
> >> > >> About the -L flag, I think it is a bad idea - we don't want anything
> >> > >> dangerous to happen here, so if it can't be fixed safely and in an
> >> > >> automated way, just bail out.
> >> > >> That being said, I added a log replay attempt in there (via mount/unmount).
> >> > >
> >> > > I really don't advise doing that for a forced filesystem check. If
> >> > > the log is corrupt, mounting it will trigger the problems we are
> >> > > trying to avoid/fix by running a forced filesystem check. As it is,
> >> > > we're probably being run in this mode because mounting has already
> >> > > failed and causing the system not to boot.
> >> > >
> >> > > What we need to do is list how the startup scripts work according to
> >> > > what error is returned, and then match the behaviour we want in a
> >> > > specific corruption case to the behaviour of a specific return
> >> > > value.
> >> > >
> >> > > i.e. if we have a dirty log, then really we need manual
> >> > > intervention. That means we need to return an error that will cause
> >> > > the startup script to stop and drop into an interactive shell for
> >> > > the admin to fix manually.
> >> > >
> >> > > This is what I mean by "define the expected forcefsck semantics" -
> >> > > describe the behaviour of the system in reponse to the errors we can
> >> > > return to it, and match them to the problem cases we need to resolve
> >> > > with fsck.xfs.
> >> >
> >> > I tested it on Fedora 27. Exit codes 2 and 4 ("File system errors
> >> > corrected, system should be rebooted" and "File system errors left
> >> > uncorrected") drop the user into the emergency shell. Anything else
> >> > and the boot continues.
> >>
> >> FWIW Debian seems to panic() if the exit code has (1 << 2) set, where
> >> "panic()" either drops to a shell if panic= is not given or actually
> >> reboots the machine if panic= is given.  All other cases proceed with
> >> boot, including 2 (errors fixed, reboot now).
> >>
> >> That said, the installer seems to set up root xfs as pass 0 in fstab so
> >> fsck is not included in the initramfs at all.
> >
> > Ok, so how do we deal with the "all distros are different, none
> > correctly follow documented fsck behaviour"?  It seems to me like we
> > need fsck to drop into an admin shell if anything goes wrong,
> > perhaps with the output saying what went wrong.
> >
> > e.g. if we corrected errors and need a reboot, then we return 4 to
> > get the system to drop into a shell with this:
> >
> > *** Errors corrected, You must reboot now! ***
> > #
> 
> That makes a sense, but I have a question here. Can it happen with
> xfs, with on an unmounted fs, and can I detect it somehow? I can parse

"errors fixed, you must reboot now" ought never happen with an unmounted
fs.  Eric pointed out that at least on Debian the initramfs scripts rely
on the initramfs hook scripts dumping everything the initramfs scripts
will need into the initramfs itself, so there is no more "load the root
fs to check the rootfs" like we used to have.

Perhaps that's why Debian ignore that case in
/usr/share/initramfs*/scripts/functions, and if we can make the same
argument for rhel/suse/anyotherdistrothatcares then maybe we can lay
this case to rest once and for all.

(Or not bother with -d I suppose, as Jan suggests further down.)

> through -vvvv output, but that doesn't look like a sensible solution -
> too many things that could be missed if it is an issue and we need to
> watch out for corrections that requires a reboot.
> 
> And another thing is that after we drop the user into the shell, they
> won't see any message directly I think but will have to read the logs
> to see the "Errors corrected, You must reboot now!"

Hopefully the repair output was the last thing printed on the screen
before we started the shell?

> >
> > If we had any other failure, then we drop to the shell with this:
> >
> > *** Failed to repair filesystem. Manual correction required! ***
> > #
> >
> > And only in the case that we have an unmounted, clean filesystem do
> > we continue to boot and mount the root filesystem?
> >
> >> > This happens before root volume is mounted during the boot, so I
> >> > propose this behaviour for fsck.xfs:
> >> > - if the volume/device is mounted, exit with 16 - usage or syntax
> >> > error (just to be sure)
> >
> > That's error 4 - requires manual intervention to repair.
> 
> Mmh... yes, probably better to throw the shell at the user. If this
> happens, something has gone bad anyway, because the fsck should be run
> before systemd/init scripts attempts to mount the fs, so it should
> never happen in the boot environment.

Agreed.

> >
> >> > - if the volume/device has a dirty log, exit with 4 - errors left
> >> > uncorrected (drop to the shell)
> >
> > Yup.
> >
> >> > - if we find no errors, exit with 0 - no errors
> >
> > Yup, but only if the filesystem is not mounted, otherwise it's
> > "requires reboot" because repair with no errors still rewrites all
> > the per-ag metadata and so changes the on disk metadata layout.
> > Continuing at this point with a mounted filesystem is guaranteed to
> > corrupt the filesystem.
> 
> We refuse to start with a mounted fs, so this is no issue.
> 
> >
> >> > - if we find anything and xfs_repair ends successfully, exit with 1 -
> >> > errors corrected
> >
> > Same as the above case - needs reboot.
> 
> Do we? The fs wasn't mounted at this point yet. Maybe there is a
> reason for a reboot, I just don't know about it. :-)

I agree that it's fine to move on to mounting the rootfs provided we
don't do -d.

--D

> >
> >> > - anything else and exit with 8 - operational error
> >
> > I'd argue that's "errors uncorrected" (i.e. error 4) because it
> > requires manual intervention to determine what the error was and
> > resolve the situation.
> >
> >> > And is there any other way how to get the "there were some errors, but
> >> > we corrected them" other than either 1) screenscrape xfs_repair or 2)
> >> > run xfs_repair twice, once with -n to detect and then without -n to
> >> > fix the found errors?
> >>
> >> I wouldn't run it twice, repair can take quite a while to run.
> >
> > Besides, we have to treat both cases the same because even if there
> > were no errors xfs_repair still modifies the metadata on disk....
> 
> A small change in xfs_repair solves this, no issue for fsck anymore.
> 
> >
> >> > I'm not aware of any script or tool that would refuse to work except
> >> > when started in a specific environment and noninteractively (doesn't
> >> > mean they don't exist, but it is not common). And because it seems
> >> > that fsck.xfs -f will do only what bare xfs_repair would do, no log
> >> > replay, nothing... then I really think that changing what the script
> >> > does (not just altering its output) based on environment tests is
> >> > unnecessary.
> >
> > This isn't a technical issue - this is a behaviour management issue.
> > We want people to always run xfs_repair when there's a problem that
> > needs fixing, not fsck.xfs. fsck.xfs is for startup scripts only and
> > has very limited scope in what we allow it to do. xfs_repair is the
> > tool users should be running to fix their filesystems, not trying to
> > do it indirectly through startup script infrastructure....
> >
> 
> OK
> 
> Cheers,
> Jan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-14 15:25                     ` Darrick J. Wong
@ 2018-03-14 21:10                       ` Dave Chinner
  2018-03-15 17:01                       ` Jan Tulak
  1 sibling, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2018-03-14 21:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Jan Tulak, Eric Sandeen, linux-xfs

On Wed, Mar 14, 2018 at 08:25:04AM -0700, Darrick J. Wong wrote:
> On Wed, Mar 14, 2018 at 02:51:51PM +0100, Jan Tulak wrote:
> > On Thu, Mar 8, 2018 at 11:36 PM, Dave Chinner <david@fromorbit.com> wrote:
> > >> > - if the volume/device has a dirty log, exit with 4 - errors left
> > >> > uncorrected (drop to the shell)
> > >
> > > Yup.
> > >
> > >> > - if we find no errors, exit with 0 - no errors
> > >
> > > Yup, but only if the filesystem is not mounted, otherwise it's
> > > "requires reboot" because repair with no errors still rewrites all
> > > the per-ag metadata and so changes the on disk metadata layout.
> > > Continuing at this point with a mounted filesystem is guaranteed to
> > > corrupt the filesystem.
> > 
> > We refuse to start with a mounted fs, so this is no issue.
> > 
> > >
> > >> > - if we find anything and xfs_repair ends successfully, exit with 1 -
> > >> > errors corrected
> > >
> > > Same as the above case - needs reboot.
> > 
> > Do we? The fs wasn't mounted at this point yet. Maybe there is a
> > reason for a reboot, I just don't know about it. :-)
> 
> I agree that it's fine to move on to mounting the rootfs provided we
> don't do -d.

Right, my was "same as the above case" where I said:

| Yup, but only if the filesystem is not mounted, otherwise it's
| "requires reboot"

-Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-14 15:19                   ` Eric Sandeen
@ 2018-03-15 11:16                     ` Jan Tulak
  2018-03-15 22:19                       ` Dave Chinner
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Tulak @ 2018-03-15 11:16 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, linux-xfs

On Wed, Mar 14, 2018 at 4:19 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 3/14/18 9:30 AM, Jan Tulak wrote:
>> On Fri, Mar 9, 2018 at 12:28 AM, Eric Sandeen <sandeen@sandeen.net>
>> wrote:
>>>
>>> If it's critical to report whether errors were fixed, it would be
>>> trivial to add a new option to xfs_repair which causes it to test
>>> fs_is_dirty for runs without "-n", and exit with a different
>>> value.
>>
>> I have toyed with it a bit and this seems to be the best option. A
>> flag that changes the exit code on a successful run. Is exit code 3
>> ok? According to man page, only 1 and 2 are currently used and the
>> "everything is ok now, but an issue was there" should not be mixed
>> with the existing ones. I also thought about a flag that would
>> change all exit codes to fsck ones, but that seems too complicated
>> and completely unnecessary.
>
> Hm, I guess we'll have to.

We can either map xfs_repair to fsck in the fsck.xfs script, or change
xfs_repair to use the fsck exit codes. I'm for the first variant (just
add a one new exit code for xfs_repair and remap it for fsck codes in
the script) rather than complicate xfs_repair with two sets of exit
codes.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-14 15:25                     ` Darrick J. Wong
  2018-03-14 21:10                       ` Dave Chinner
@ 2018-03-15 17:01                       ` Jan Tulak
  1 sibling, 0 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-15 17:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, Eric Sandeen, linux-xfs

On Wed, Mar 14, 2018 at 4:25 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
>> through -vvvv output, but that doesn't look like a sensible solution -
>> too many things that could be missed if it is an issue and we need to
>> watch out for corrections that requires a reboot.
>>
>> And another thing is that after we drop the user into the shell, they
>> won't see any message directly I think but will have to read the logs
>> to see the "Errors corrected, You must reboot now!"
>
> Hopefully the repair output was the last thing printed on the screen
> before we started the shell?
>

Sadly, not in all distros. What you see in Fedora is that a service
failed and "see 'systemctl status systemd-fsck-root.service' for
details" followed by some other status reports for other things. It at
least points you in the right direction and in the log you can see the
fsck.xfs and xfs_repair output. But I doubt there is any other option,
and at least it is in the log if not on the screen.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors
  2018-03-05 15:05 [PATCH] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
  2018-03-05 21:56 ` Dave Chinner
@ 2018-03-15 17:45 ` Jan Tulak
  2018-03-15 17:45   ` [PATCH 2/2 v1] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
                     ` (2 more replies)
  1 sibling, 3 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-15 17:45 UTC (permalink / raw)
  To: linux-xfs; +Cc: Jan Tulak

xfs_repair ends with a return code 0 if it finished ok, no matter if
there were some errors in the fs, or not. The new flag -e means that we
can avoid screenscraping and parsing text output to detect if an error
was found (and corrected).

If something could not be corrected or in any other case than the "found
something but fixed it all," the behaviour with this flag is unchanged.

Signed-off-by: Jan Tulak <jtulak@redhat.com>
---
 man/man8/xfs_repair.8 | 14 ++++++++++----
 repair/xfs_repair.c   | 17 +++++++++++++++--
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/man/man8/xfs_repair.8 b/man/man8/xfs_repair.8
index 85e4dc97..1b197571 100644
--- a/man/man8/xfs_repair.8
+++ b/man/man8/xfs_repair.8
@@ -4,7 +4,7 @@ xfs_repair \- repair an XFS filesystem
 .SH SYNOPSIS
 .B xfs_repair
 [
-.B \-dfLnPv
+.B \-defLnPv
 ] [
 .B \-m
 .I maxmem
@@ -168,6 +168,9 @@ Repair dangerously. Allow
 to repair an XFS filesystem mounted read only. This is typically done
 on a root filesystem from single user mode, immediately followed by a reboot.
 .TP
+.B \-e
+If any error was found and repaired, the exit code is 3 instead of the usual 0.
+.TP
 .B \-V
 Prints the version number and exits.
 .SS Checks Performed
@@ -512,14 +515,17 @@ will return a status of 1 if filesystem corruption was detected and
 0 if no filesystem corruption was detected.
 .B xfs_repair
 run without the \-n option will always return a status code of 0 if
-it completes without problems.  If a runtime error is encountered
-during operation, it will return a status of 1.  In this case,
+it completes without problems, unless the flag
+.B -e
+is used. If it is used, then status code 3 is reported when any issue with the
+filesystem was found, but could be fixed. If a runtime error is encountered during
+operation, it will return a status of 1. In this case,
 .B xfs_repair
 should be restarted.  If
 .B xfs_repair is unable
 to proceed due to a dirty log, it will return a status of 2.  See below.
 .SH DIRTY LOGS
-Due to the design of the XFS log, a dirty log can only be replayed 
+Due to the design of the XFS log, a dirty log can only be replayed
 by the kernel, on a machine having the same CPU architecture as the
 machine which was writing to the log.
 .B xfs_repair
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 312a0d08..94015da6 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -77,6 +77,7 @@ static char *c_opts[] = {
 static int	bhash_option_used;
 static long	max_mem_specified;	/* in megabytes */
 static int	phase2_threads = 32;
+static int report_corrected = 0;
 
 static void
 usage(void)
@@ -97,6 +98,7 @@ usage(void)
 "  -o subopts   Override default behaviour, refer to man page.\n"
 "  -t interval  Reporting interval in seconds.\n"
 "  -d           Repair dangerously.\n"
+"  -e           Exit with a non-zero code even when all errors were repaired.\n"
 "  -V           Reports version and exits.\n"), progname);
 	exit(1);
 }
@@ -214,12 +216,13 @@ process_args(int argc, char **argv)
 	ag_stride = 0;
 	thread_count = 1;
 	report_interval = PROG_RPT_DEFAULT;
+	report_corrected = 0;
 
 	/*
 	 * XXX have to add suboption processing here
 	 * attributes, quotas, nlinks, aligned_inos, sb_fbits
 	 */
-	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPt:")) != EOF)  {
+	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPet:")) != EOF)  {
 		switch (c) {
 		case 'D':
 			dumpcore = 1;
@@ -329,6 +332,9 @@ process_args(int argc, char **argv)
 		case 't':
 			report_interval = (int)strtol(optarg, NULL, 0);
 			break;
+		case 'e':
+			report_corrected = 1;
+			break;
 		case '?':
 			usage();
 		}
@@ -1096,5 +1102,12 @@ _("Repair of readonly mount complete.  Immediate reboot encouraged.\n"));
 
 	free(msgbuf);
 
-	return (0);
+	if (report_corrected) {
+		if (fs_is_dirty)
+			return (3);
+		else
+			return (0);
+	} else {
+		return (0);
+	}
 }
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 2/2 v1] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-15 17:45 ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Jan Tulak
@ 2018-03-15 17:45   ` Jan Tulak
  2018-03-15 17:47     ` [PATCH 2/2 v2] " Jan Tulak
  2018-03-15 18:03   ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Darrick J. Wong
  2018-03-15 18:23   ` [PATCH 1/2 v2] " Jan Tulak
  2 siblings, 1 reply; 46+ messages in thread
From: Jan Tulak @ 2018-03-15 17:45 UTC (permalink / raw)
  To: linux-xfs; +Cc: Jan Tulak

The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
run on every unclean shutdown. However, sometimes it may happen that the
root filesystem really requires the usage of xfs_repair and then it is a
hassle. This patch makes the situation a bit easier by detecting forced
checks (/forcefsck or fsck.mode=force), so user can require the repair,
without the repair being run all the time.

Signed-off-by: Jan Tulak <jtulak@redhat.com>

---
Changelog:
v1:
- test for xfs_repair binary
- run only in non-interactive session
- translate xfs_repair return codes to fsck ones
- run only if the filesystem is not mounted
- add manpage update
---
 fsck/xfs_fsck.sh    | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 man/man8/fsck.xfs.8 | 12 +++++++++-
 2 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
index e52969e4..92f246cf 100755
--- a/fsck/xfs_fsck.sh
+++ b/fsck/xfs_fsck.sh
@@ -3,11 +3,42 @@
 # Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
 #
 
+NAME=$0
+
+# get the right return code for fsck
+function repair2fsck_code() {
+	case $1 in
+	0)  return 0 # everything is ok
+		;;
+	1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
+		return 4 # errors left uncorrected
+		;;
+	2)  echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
+		return 4 # it should not me mounted during boot, something is wrong
+		;;
+	3)  return 1 # The fs has been fixed
+		;;
+	*)  echo "$NAME error: An unknown return code from xfs_repair '$1'" 1>&2
+		return 4 # something went wrong with xfs_repair
+	esac
+}
+
+function ensure_not_mounted() {
+	local dev=$1
+	mounted=`grep -c "^$dev " /proc/mounts`
+	if [ $mounted -ne 0 ]; then
+		echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
+		exit 4
+	fi
+}
+
 AUTO=false
-while getopts ":aApy" c
+FORCE=false
+while getopts ":aApyf" c
 do
 	case $c in
 	a|A|p|y)	AUTO=true;;
+	f)      	FORCE=true;;
 	esac
 done
 eval DEV=\${$#}
@@ -15,10 +46,41 @@ if [ ! -e $DEV ]; then
 	echo "$0: $DEV does not exist"
 	exit 8
 fi
+
+# The flag -f is added by systemd/init scripts when /forcefsck file is present
+# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
+# this check, user has to explicitly require a forced fsck.
+# But first of all, test if it is a non-interactive session. Use multiple
+# methods to capture most of the cases:
+# The case for *i* and -n "$PS1" are commonly suggested in bash manual
+# and the -t 0 test checks stdin
+case $- in
+	*i*) FORCE=false ;;
+esac
+if [ -n "$PS1" -o -t 0 ]; then
+	FORCE=false
+fi
+
+if $FORCE; then
+	if [ -f /sbin/xfs_repair ]; then
+		BIN="/sbin/xfs_repair"
+	elif [ -f /usr/sbin/xfs_repair ]; then
+		BIN="/usr/sbin/xfs_repair"
+	else
+		echo "$NAME error: xfs_repair was not found!" 1>&2
+		exit 4
+	fi
+
+	ensure_not_mounted $DEV
+
+	$BIN -e $DEV
+	repair2fsck_code $?
+	exit $?
+fi
+
 if $AUTO; then
 	echo "$0: XFS file system."
 else
 	echo "If you wish to check the consistency of an XFS filesystem or"
 	echo "repair a damaged filesystem, see xfs_repair(8)."
 fi
-exit 0
diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
index ace7252d..3eb3ad7f 100644
--- a/man/man8/fsck.xfs.8
+++ b/man/man8/fsck.xfs.8
@@ -1,6 +1,6 @@
 .TH fsck.xfs 8
 .SH NAME
-fsck.xfs \- do nothing, successfully
+fsck.xfs \- do nothing, successfuly
 .SH SYNOPSIS
 .B fsck.xfs
 [
@@ -21,6 +21,16 @@ If you wish to check the consistency of an XFS filesystem,
 or repair a damaged or corrupt XFS filesystem,
 see
 .BR xfs_repair (8).
+.PP
+However, it may happen that a forced repair is required and in makes sense to start
+.BR xfs_repair (8)
+on startup (either using /forcefsck file or fsck.mode=force kernel option). In this case
+.B fsck.xfs
+run
+.B xfs_repair
+for the user. But if run outside of boot environment or without the
+.B -f
+option, it does nothing.
 .
 .SH FILES
 .IR /etc/fstab .
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 2/2 v2] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-15 17:45   ` [PATCH 2/2 v1] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
@ 2018-03-15 17:47     ` Jan Tulak
  2018-03-15 17:50       ` [PATCH 2/2] " Jan Tulak
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Tulak @ 2018-03-15 17:47 UTC (permalink / raw)
  To: linux-xfs; +Cc: Jan Tulak

The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
run on every unclean shutdown. However, sometimes it may happen that the
root filesystem really requires the usage of xfs_repair and then it is a
hassle. This patch makes the situation a bit easier by detecting forced
checks (/forcefsck or fsck.mode=force), so user can require the repair,
without the repair being run all the time.

Signed-off-by: Jan Tulak <jtulak@redhat.com>

---
Changelog:
v2:
- return the "exit 0" at the end

v1:
- test for xfs_repair binary
- run only in non-interactive session
- translate xfs_repair return codes to fsck ones
- run only if the filesystem is not mounted
- add manpage update
---
 fsck/xfs_fsck.sh    | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 man/man8/fsck.xfs.8 | 12 +++++++++-
 2 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
index e52969e4..01561498 100755
--- a/fsck/xfs_fsck.sh
+++ b/fsck/xfs_fsck.sh
@@ -3,11 +3,42 @@
 # Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
 #
 
+NAME=$0
+
+# get the right return code for fsck
+function repair2fsck_code() {
+	case $1 in
+	0)  return 0 # everything is ok
+		;;
+	1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
+		return 4 # errors left uncorrected
+		;;
+	2)  echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
+		return 4 # it should not me mounted during boot, something is wrong
+		;;
+	3)  return 1 # The fs has been fixed
+		;;
+	*)  echo "$NAME error: An unknown return code from xfs_repair '$1'" 1>&2
+		return 4 # something went wrong with xfs_repair
+	esac
+}
+
+function ensure_not_mounted() {
+	local dev=$1
+	mounted=`grep -c "^$dev " /proc/mounts`
+	if [ $mounted -ne 0 ]; then
+		echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
+		exit 4
+	fi
+}
+
 AUTO=false
-while getopts ":aApy" c
+FORCE=false
+while getopts ":aApyf" c
 do
 	case $c in
 	a|A|p|y)	AUTO=true;;
+	f)      	FORCE=true;;
 	esac
 done
 eval DEV=\${$#}
@@ -15,10 +46,42 @@ if [ ! -e $DEV ]; then
 	echo "$0: $DEV does not exist"
 	exit 8
 fi
+
+# The flag -f is added by systemd/init scripts when /forcefsck file is present
+# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
+# this check, user has to explicitly require a forced fsck.
+# But first of all, test if it is a non-interactive session. Use multiple
+# methods to capture most of the cases:
+# The case for *i* and -n "$PS1" are commonly suggested in bash manual
+# and the -t 0 test checks stdin
+case $- in
+	*i*) FORCE=false ;;
+esac
+if [ -n "$PS1" -o -t 0 ]; then
+	FORCE=false
+fi
+
+if $FORCE; then
+	if [ -f /sbin/xfs_repair ]; then
+		BIN="/sbin/xfs_repair"
+	elif [ -f /usr/sbin/xfs_repair ]; then
+		BIN="/usr/sbin/xfs_repair"
+	else
+		echo "$NAME error: xfs_repair was not found!" 1>&2
+		exit 4
+	fi
+
+	ensure_not_mounted $DEV
+
+	$BIN -e $DEV
+	repair2fsck_code $?
+	exit $?
+fi
+
 if $AUTO; then
 	echo "$0: XFS file system."
 else
 	echo "If you wish to check the consistency of an XFS filesystem or"
 	echo "repair a damaged filesystem, see xfs_repair(8)."
 fi
-exit 0
+exit 0
\ No newline at end of file
diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
index ace7252d..3eb3ad7f 100644
--- a/man/man8/fsck.xfs.8
+++ b/man/man8/fsck.xfs.8
@@ -1,6 +1,6 @@
 .TH fsck.xfs 8
 .SH NAME
-fsck.xfs \- do nothing, successfully
+fsck.xfs \- do nothing, successfuly
 .SH SYNOPSIS
 .B fsck.xfs
 [
@@ -21,6 +21,16 @@ If you wish to check the consistency of an XFS filesystem,
 or repair a damaged or corrupt XFS filesystem,
 see
 .BR xfs_repair (8).
+.PP
+However, it may happen that a forced repair is required and in makes sense to start
+.BR xfs_repair (8)
+on startup (either using /forcefsck file or fsck.mode=force kernel option). In this case
+.B fsck.xfs
+run
+.B xfs_repair
+for the user. But if run outside of boot environment or without the
+.B -f
+option, it does nothing.
 .
 .SH FILES
 .IR /etc/fstab .
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 2/2] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-15 17:47     ` [PATCH 2/2 v2] " Jan Tulak
@ 2018-03-15 17:50       ` Jan Tulak
  2018-03-15 18:11         ` Darrick J. Wong
  2018-03-15 18:28         ` [PATCH 2/2 v4] " Jan Tulak
  0 siblings, 2 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-15 17:50 UTC (permalink / raw)
  To: linux-xfs; +Cc: Jan Tulak

The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
run on every unclean shutdown. However, sometimes it may happen that the
root filesystem really requires the usage of xfs_repair and then it is a
hassle. This patch makes the situation a bit easier by detecting forced
checks (/forcefsck or fsck.mode=force), so user can require the repair,
without the repair being run all the time.

Signed-off-by: Jan Tulak <jtulak@redhat.com>

---
Changelog:
v3:
- too quick with fixing in v2... add line at the end of the file
v2:
- return the "exit 0" at the end

v1:
- test for xfs_repair binary
- run only in non-interactive session
- translate xfs_repair return codes to fsck ones
- run only if the filesystem is not mounted
- add manpage update
---
 fsck/xfs_fsck.sh    | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 man/man8/fsck.xfs.8 | 12 +++++++++-
 2 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
index e52969e4..0ec6b049 100755
--- a/fsck/xfs_fsck.sh
+++ b/fsck/xfs_fsck.sh
@@ -3,11 +3,42 @@
 # Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
 #
 
+NAME=$0
+
+# get the right return code for fsck
+function repair2fsck_code() {
+	case $1 in
+	0)  return 0 # everything is ok
+		;;
+	1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
+		return 4 # errors left uncorrected
+		;;
+	2)  echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
+		return 4 # it should not me mounted during boot, something is wrong
+		;;
+	3)  return 1 # The fs has been fixed
+		;;
+	*)  echo "$NAME error: An unknown return code from xfs_repair '$1'" 1>&2
+		return 4 # something went wrong with xfs_repair
+	esac
+}
+
+function ensure_not_mounted() {
+	local dev=$1
+	mounted=`grep -c "^$dev " /proc/mounts`
+	if [ $mounted -ne 0 ]; then
+		echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
+		exit 4
+	fi
+}
+
 AUTO=false
-while getopts ":aApy" c
+FORCE=false
+while getopts ":aApyf" c
 do
 	case $c in
 	a|A|p|y)	AUTO=true;;
+	f)      	FORCE=true;;
 	esac
 done
 eval DEV=\${$#}
@@ -15,6 +46,38 @@ if [ ! -e $DEV ]; then
 	echo "$0: $DEV does not exist"
 	exit 8
 fi
+
+# The flag -f is added by systemd/init scripts when /forcefsck file is present
+# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
+# this check, user has to explicitly require a forced fsck.
+# But first of all, test if it is a non-interactive session. Use multiple
+# methods to capture most of the cases:
+# The case for *i* and -n "$PS1" are commonly suggested in bash manual
+# and the -t 0 test checks stdin
+case $- in
+	*i*) FORCE=false ;;
+esac
+if [ -n "$PS1" -o -t 0 ]; then
+	FORCE=false
+fi
+
+if $FORCE; then
+	if [ -f /sbin/xfs_repair ]; then
+		BIN="/sbin/xfs_repair"
+	elif [ -f /usr/sbin/xfs_repair ]; then
+		BIN="/usr/sbin/xfs_repair"
+	else
+		echo "$NAME error: xfs_repair was not found!" 1>&2
+		exit 4
+	fi
+
+	ensure_not_mounted $DEV
+
+	$BIN -e $DEV
+	repair2fsck_code $?
+	exit $?
+fi
+
 if $AUTO; then
 	echo "$0: XFS file system."
 else
diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
index ace7252d..3eb3ad7f 100644
--- a/man/man8/fsck.xfs.8
+++ b/man/man8/fsck.xfs.8
@@ -1,6 +1,6 @@
 .TH fsck.xfs 8
 .SH NAME
-fsck.xfs \- do nothing, successfully
+fsck.xfs \- do nothing, successfuly
 .SH SYNOPSIS
 .B fsck.xfs
 [
@@ -21,6 +21,16 @@ If you wish to check the consistency of an XFS filesystem,
 or repair a damaged or corrupt XFS filesystem,
 see
 .BR xfs_repair (8).
+.PP
+However, it may happen that a forced repair is required and in makes sense to start
+.BR xfs_repair (8)
+on startup (either using /forcefsck file or fsck.mode=force kernel option). In this case
+.B fsck.xfs
+run
+.B xfs_repair
+for the user. But if run outside of boot environment or without the
+.B -f
+option, it does nothing.
 .
 .SH FILES
 .IR /etc/fstab .
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors
  2018-03-15 17:45 ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Jan Tulak
  2018-03-15 17:45   ` [PATCH 2/2 v1] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
@ 2018-03-15 18:03   ` Darrick J. Wong
  2018-03-15 18:23   ` [PATCH 1/2 v2] " Jan Tulak
  2 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-03-15 18:03 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-xfs

On Thu, Mar 15, 2018 at 06:45:20PM +0100, Jan Tulak wrote:
> xfs_repair ends with a return code 0 if it finished ok, no matter if
> there were some errors in the fs, or not. The new flag -e means that we
> can avoid screenscraping and parsing text output to detect if an error
> was found (and corrected).
> 
> If something could not be corrected or in any other case than the "found
> something but fixed it all," the behaviour with this flag is unchanged.
> 
> Signed-off-by: Jan Tulak <jtulak@redhat.com>
> ---
>  man/man8/xfs_repair.8 | 14 ++++++++++----
>  repair/xfs_repair.c   | 17 +++++++++++++++--
>  2 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/man/man8/xfs_repair.8 b/man/man8/xfs_repair.8
> index 85e4dc97..1b197571 100644
> --- a/man/man8/xfs_repair.8
> +++ b/man/man8/xfs_repair.8
> @@ -4,7 +4,7 @@ xfs_repair \- repair an XFS filesystem
>  .SH SYNOPSIS
>  .B xfs_repair
>  [
> -.B \-dfLnPv
> +.B \-defLnPv
>  ] [
>  .B \-m
>  .I maxmem
> @@ -168,6 +168,9 @@ Repair dangerously. Allow
>  to repair an XFS filesystem mounted read only. This is typically done
>  on a root filesystem from single user mode, immediately followed by a reboot.
>  .TP
> +.B \-e
> +If any error was found and repaired, the exit code is 3 instead of the usual 0.

"If any metadata corruption was found, the status returned is 3..."

Clarifying exactly what kinds of errors generate status 3, and making
the language more consistent with the EXIT STATUS section.

> +.TP
>  .B \-V
>  Prints the version number and exits.
>  .SS Checks Performed
> @@ -512,14 +515,17 @@ will return a status of 1 if filesystem corruption was detected and
>  0 if no filesystem corruption was detected.
>  .B xfs_repair
>  run without the \-n option will always return a status code of 0 if
> -it completes without problems.  If a runtime error is encountered
> -during operation, it will return a status of 1.  In this case,
> +it completes without problems, unless the flag
> +.B -e
> +is used. If it is used, then status code 3 is reported when any issue with the

"...status 3 is returned..."

> +filesystem was found, but could be fixed. If a runtime error is encountered during
> +operation, it will return a status of 1. In this case,
>  .B xfs_repair
>  should be restarted.  If
>  .B xfs_repair is unable
>  to proceed due to a dirty log, it will return a status of 2.  See below.
>  .SH DIRTY LOGS
> -Due to the design of the XFS log, a dirty log can only be replayed 
> +Due to the design of the XFS log, a dirty log can only be replayed
>  by the kernel, on a machine having the same CPU architecture as the
>  machine which was writing to the log.
>  .B xfs_repair
> diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> index 312a0d08..94015da6 100644
> --- a/repair/xfs_repair.c
> +++ b/repair/xfs_repair.c
> @@ -77,6 +77,7 @@ static char *c_opts[] = {
>  static int	bhash_option_used;
>  static long	max_mem_specified;	/* in megabytes */
>  static int	phase2_threads = 32;
> +static int report_corrected = 0;

It is not necessary to initialize static global variables to zero.

Also, bool not int because the value is boolean.

>  static void
>  usage(void)
> @@ -97,6 +98,7 @@ usage(void)
>  "  -o subopts   Override default behaviour, refer to man page.\n"
>  "  -t interval  Reporting interval in seconds.\n"
>  "  -d           Repair dangerously.\n"
> +"  -e           Exit with a non-zero code even when all errors were repaired.\n"
>  "  -V           Reports version and exits.\n"), progname);
>  	exit(1);
>  }
> @@ -214,12 +216,13 @@ process_args(int argc, char **argv)
>  	ag_stride = 0;
>  	thread_count = 1;
>  	report_interval = PROG_RPT_DEFAULT;
> +	report_corrected = 0;

(or here)

>  
>  	/*
>  	 * XXX have to add suboption processing here
>  	 * attributes, quotas, nlinks, aligned_inos, sb_fbits
>  	 */
> -	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPt:")) != EOF)  {
> +	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPet:")) != EOF)  {
>  		switch (c) {
>  		case 'D':
>  			dumpcore = 1;
> @@ -329,6 +332,9 @@ process_args(int argc, char **argv)
>  		case 't':
>  			report_interval = (int)strtol(optarg, NULL, 0);
>  			break;
> +		case 'e':
> +			report_corrected = 1;
> +			break;
>  		case '?':
>  			usage();
>  		}
> @@ -1096,5 +1102,12 @@ _("Repair of readonly mount complete.  Immediate reboot encouraged.\n"));
>  
>  	free(msgbuf);
>  
> -	return (0);
> +	if (report_corrected) {
> +		if (fs_is_dirty)
> +			return (3);
> +		else
> +			return (0);
> +	} else {
> +		return (0);
> +	}

if (fs_is_dirty && report_corrected)
	return (3);
return (0);

?

--D

>  }
> -- 
> 2.15.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-15 17:50       ` [PATCH 2/2] " Jan Tulak
@ 2018-03-15 18:11         ` Darrick J. Wong
  2018-03-15 18:22           ` Jan Tulak
  2018-03-15 18:28         ` [PATCH 2/2 v4] " Jan Tulak
  1 sibling, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-03-15 18:11 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-xfs

On Thu, Mar 15, 2018 at 06:50:26PM +0100, Jan Tulak wrote:
> The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
> run on every unclean shutdown. However, sometimes it may happen that the
> root filesystem really requires the usage of xfs_repair and then it is a
> hassle. This patch makes the situation a bit easier by detecting forced
> checks (/forcefsck or fsck.mode=force), so user can require the repair,
> without the repair being run all the time.
> 
> Signed-off-by: Jan Tulak <jtulak@redhat.com>
> 
> ---
> Changelog:
> v3:
> - too quick with fixing in v2... add line at the end of the file
> v2:
> - return the "exit 0" at the end
> 
> v1:
> - test for xfs_repair binary
> - run only in non-interactive session
> - translate xfs_repair return codes to fsck ones
> - run only if the filesystem is not mounted
> - add manpage update
> ---
>  fsck/xfs_fsck.sh    | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  man/man8/fsck.xfs.8 | 12 +++++++++-
>  2 files changed, 75 insertions(+), 2 deletions(-)
> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index e52969e4..0ec6b049 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -3,11 +3,42 @@
>  # Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
>  #
>  
> +NAME=$0
> +
> +# get the right return code for fsck
> +function repair2fsck_code() {
> +	case $1 in
> +	0)  return 0 # everything is ok
> +		;;
> +	1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
> +		return 4 # errors left uncorrected
> +		;;
> +	2)  echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
> +		return 4 # it should not me mounted during boot, something is wrong
> +		;;
> +	3)  return 1 # The fs has been fixed
> +		;;
> +	*)  echo "$NAME error: An unknown return code from xfs_repair '$1'" 1>&2
> +		return 4 # something went wrong with xfs_repair
> +	esac
> +}
> +
> +function ensure_not_mounted() {
> +	local dev=$1
> +	mounted=`grep -c "^$dev " /proc/mounts`
> +	if [ $mounted -ne 0 ]; then
> +		echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
> +		exit 4
> +	fi
> +}
> +
>  AUTO=false
> -while getopts ":aApy" c
> +FORCE=false
> +while getopts ":aApyf" c
>  do
>  	case $c in
>  	a|A|p|y)	AUTO=true;;
> +	f)      	FORCE=true;;
>  	esac
>  done
>  eval DEV=\${$#}
> @@ -15,6 +46,38 @@ if [ ! -e $DEV ]; then
>  	echo "$0: $DEV does not exist"
>  	exit 8
>  fi
> +
> +# The flag -f is added by systemd/init scripts when /forcefsck file is present
> +# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
> +# this check, user has to explicitly require a forced fsck.
> +# But first of all, test if it is a non-interactive session. Use multiple
> +# methods to capture most of the cases:
> +# The case for *i* and -n "$PS1" are commonly suggested in bash manual
> +# and the -t 0 test checks stdin
> +case $- in
> +	*i*) FORCE=false ;;
> +esac
> +if [ -n "$PS1" -o -t 0 ]; then
> +	FORCE=false
> +fi
> +
> +if $FORCE; then
> +	if [ -f /sbin/xfs_repair ]; then
> +		BIN="/sbin/xfs_repair"
> +	elif [ -f /usr/sbin/xfs_repair ]; then
> +		BIN="/usr/sbin/xfs_repair"

Can we just run xfs_repair and assume it's in the PATH?

> +	else
> +		echo "$NAME error: xfs_repair was not found!" 1>&2
> +		exit 4
> +	fi
> +
> +	ensure_not_mounted $DEV
> +
> +	$BIN -e $DEV
> +	repair2fsck_code $?
> +	exit $?
> +fi
> +
>  if $AUTO; then
>  	echo "$0: XFS file system."
>  else
> diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
> index ace7252d..3eb3ad7f 100644
> --- a/man/man8/fsck.xfs.8
> +++ b/man/man8/fsck.xfs.8
> @@ -1,6 +1,6 @@
>  .TH fsck.xfs 8
>  .SH NAME
> -fsck.xfs \- do nothing, successfully
> +fsck.xfs \- do nothing, successfuly

successfully has two 'l'.

>  .SH SYNOPSIS
>  .B fsck.xfs
>  [
> @@ -21,6 +21,16 @@ If you wish to check the consistency of an XFS filesystem,
>  or repair a damaged or corrupt XFS filesystem,
>  see
>  .BR xfs_repair (8).
> +.PP
> +However, it may happen that a forced repair is required and in makes sense to start
> +.BR xfs_repair (8)
> +on startup (either using /forcefsck file or fsck.mode=force kernel option). In this case
> +.B fsck.xfs
> +run
> +.B xfs_repair
> +for the user. But if run outside of boot environment or without the
> +.B -f
> +option, it does nothing.

Bleh, I hate manpages.  I'll just ... do this without the formatting.

However, the system administrator can force fsck.xfs to run
xfs_repair(8) by creating a /forcefsck file, booting the system with
"fsck.mode=force" on the kernel command line, or by running fsck.xfs -f.

--D

>  .
>  .SH FILES
>  .IR /etc/fstab .
> -- 
> 2.15.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-15 18:11         ` Darrick J. Wong
@ 2018-03-15 18:22           ` Jan Tulak
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-15 18:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Mar 15, 2018 at 7:11 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
>> +
>> +if $FORCE; then
>> +     if [ -f /sbin/xfs_repair ]; then
>> +             BIN="/sbin/xfs_repair"
>> +     elif [ -f /usr/sbin/xfs_repair ]; then
>> +             BIN="/usr/sbin/xfs_repair"
>
> Can we just run xfs_repair and assume it's in the PATH?

Because not every distro has bundled xfs_repair into initramfs, I
would rather expect that they can forget to add it after this patch
than have an unexpected failure.

Other points are ok, I'm sending the patches...

Thanks,
Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 1/2 v2] xfs_repair: add flag -e to detect corrected errors
  2018-03-15 17:45 ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Jan Tulak
  2018-03-15 17:45   ` [PATCH 2/2 v1] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
  2018-03-15 18:03   ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Darrick J. Wong
@ 2018-03-15 18:23   ` Jan Tulak
  2018-03-15 18:44     ` Darrick J. Wong
                       ` (2 more replies)
  2 siblings, 3 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-15 18:23 UTC (permalink / raw)
  To: linux-xfs; +Cc: Jan Tulak

xfs_repair ends with a return code 0 if it finished ok, no matter if
there were some errors in the fs, or not. The new flag -e means that we
can avoid screenscraping and parsing text output to detect if an error
was found (and corrected).

If something could not be corrected or in any other case than the "found
something but fixed it all," the behaviour with this flag is unchanged.

Signed-off-by: Jan Tulak <jtulak@redhat.com>

---
v2:
- edit man page changes
- report_corrected is now bool
- minor code simplification
---
 man/man8/xfs_repair.8 | 15 +++++++++++----
 repair/xfs_repair.c   | 10 +++++++++-
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/man/man8/xfs_repair.8 b/man/man8/xfs_repair.8
index 85e4dc97..1ca3b614 100644
--- a/man/man8/xfs_repair.8
+++ b/man/man8/xfs_repair.8
@@ -4,7 +4,7 @@ xfs_repair \- repair an XFS filesystem
 .SH SYNOPSIS
 .B xfs_repair
 [
-.B \-dfLnPv
+.B \-defLnPv
 ] [
 .B \-m
 .I maxmem
@@ -168,6 +168,10 @@ Repair dangerously. Allow
 to repair an XFS filesystem mounted read only. This is typically done
 on a root filesystem from single user mode, immediately followed by a reboot.
 .TP
+.B \-e
+If any metadata corruption was found, the status returned is 3 instead of the
+usual 0.
+.TP
 .B \-V
 Prints the version number and exits.
 .SS Checks Performed
@@ -512,14 +516,17 @@ will return a status of 1 if filesystem corruption was detected and
 0 if no filesystem corruption was detected.
 .B xfs_repair
 run without the \-n option will always return a status code of 0 if
-it completes without problems.  If a runtime error is encountered
-during operation, it will return a status of 1.  In this case,
+it completes without problems, unless the flag
+.B -e
+is used. If it is used, then status 3 is reported when any issue with the
+filesystem was found, but could be fixed. If a runtime error is encountered during
+operation, it will return a status of 1. In this case,
 .B xfs_repair
 should be restarted.  If
 .B xfs_repair is unable
 to proceed due to a dirty log, it will return a status of 2.  See below.
 .SH DIRTY LOGS
-Due to the design of the XFS log, a dirty log can only be replayed 
+Due to the design of the XFS log, a dirty log can only be replayed
 by the kernel, on a machine having the same CPU architecture as the
 machine which was writing to the log.
 .B xfs_repair
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 312a0d08..a65709ce 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -77,6 +77,7 @@ static char *c_opts[] = {
 static int	bhash_option_used;
 static long	max_mem_specified;	/* in megabytes */
 static int	phase2_threads = 32;
+static bool report_corrected;
 
 static void
 usage(void)
@@ -97,6 +98,7 @@ usage(void)
 "  -o subopts   Override default behaviour, refer to man page.\n"
 "  -t interval  Reporting interval in seconds.\n"
 "  -d           Repair dangerously.\n"
+"  -e           Exit with a non-zero code even when all errors were repaired.\n"
 "  -V           Reports version and exits.\n"), progname);
 	exit(1);
 }
@@ -214,12 +216,13 @@ process_args(int argc, char **argv)
 	ag_stride = 0;
 	thread_count = 1;
 	report_interval = PROG_RPT_DEFAULT;
+	report_corrected = false;
 
 	/*
 	 * XXX have to add suboption processing here
 	 * attributes, quotas, nlinks, aligned_inos, sb_fbits
 	 */
-	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPt:")) != EOF)  {
+	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPet:")) != EOF)  {
 		switch (c) {
 		case 'D':
 			dumpcore = 1;
@@ -329,6 +332,9 @@ process_args(int argc, char **argv)
 		case 't':
 			report_interval = (int)strtol(optarg, NULL, 0);
 			break;
+		case 'e':
+			report_corrected = true;
+			break;
 		case '?':
 			usage();
 		}
@@ -1096,5 +1102,7 @@ _("Repair of readonly mount complete.  Immediate reboot encouraged.\n"));
 
 	free(msgbuf);
 
+	if (fs_is_dirty && report_corrected)
+		return (3);
 	return (0);
 }
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 2/2 v4] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-15 17:50       ` [PATCH 2/2] " Jan Tulak
  2018-03-15 18:11         ` Darrick J. Wong
@ 2018-03-15 18:28         ` Jan Tulak
  2018-03-15 18:49           ` Darrick J. Wong
  2018-03-16 17:07           ` [PATCH 2/2 v5] " Jan Tulak
  1 sibling, 2 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-15 18:28 UTC (permalink / raw)
  To: linux-xfs; +Cc: Jan Tulak

The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
run on every unclean shutdown. However, sometimes it may happen that the
root filesystem really requires the usage of xfs_repair and then it is a
hassle. This patch makes the situation a bit easier by detecting forced
checks (/forcefsck or fsck.mode=force), so user can require the repair,
without the repair being run all the time.

Signed-off-by: Jan Tulak <jtulak@redhat.com>

---
I omitted the ", or by running fsck.xfs -f." part, Darrick, because it
works only for non-interactive sessions. Running it manually should do
nothing.

---
Changelog:
v4:
- man page changes
v3:
- too quick with fixing in v2... add line at the end of the file
v2:
- return the "exit 0" at the end

v1:
- test for xfs_repair binary
- run only in non-interactive session
- translate xfs_repair return codes to fsck ones
- run only if the filesystem is not mounted
- add manpage update
---
 fsck/xfs_fsck.sh    | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 man/man8/fsck.xfs.8 |  7 ++++++
 2 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
index e52969e4..0ec6b049 100755
--- a/fsck/xfs_fsck.sh
+++ b/fsck/xfs_fsck.sh
@@ -3,11 +3,42 @@
 # Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
 #
 
+NAME=$0
+
+# get the right return code for fsck
+function repair2fsck_code() {
+	case $1 in
+	0)  return 0 # everything is ok
+		;;
+	1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
+		return 4 # errors left uncorrected
+		;;
+	2)  echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
+		return 4 # it should not me mounted during boot, something is wrong
+		;;
+	3)  return 1 # The fs has been fixed
+		;;
+	*)  echo "$NAME error: An unknown return code from xfs_repair '$1'" 1>&2
+		return 4 # something went wrong with xfs_repair
+	esac
+}
+
+function ensure_not_mounted() {
+	local dev=$1
+	mounted=`grep -c "^$dev " /proc/mounts`
+	if [ $mounted -ne 0 ]; then
+		echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
+		exit 4
+	fi
+}
+
 AUTO=false
-while getopts ":aApy" c
+FORCE=false
+while getopts ":aApyf" c
 do
 	case $c in
 	a|A|p|y)	AUTO=true;;
+	f)      	FORCE=true;;
 	esac
 done
 eval DEV=\${$#}
@@ -15,6 +46,38 @@ if [ ! -e $DEV ]; then
 	echo "$0: $DEV does not exist"
 	exit 8
 fi
+
+# The flag -f is added by systemd/init scripts when /forcefsck file is present
+# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
+# this check, user has to explicitly require a forced fsck.
+# But first of all, test if it is a non-interactive session. Use multiple
+# methods to capture most of the cases:
+# The case for *i* and -n "$PS1" are commonly suggested in bash manual
+# and the -t 0 test checks stdin
+case $- in
+	*i*) FORCE=false ;;
+esac
+if [ -n "$PS1" -o -t 0 ]; then
+	FORCE=false
+fi
+
+if $FORCE; then
+	if [ -f /sbin/xfs_repair ]; then
+		BIN="/sbin/xfs_repair"
+	elif [ -f /usr/sbin/xfs_repair ]; then
+		BIN="/usr/sbin/xfs_repair"
+	else
+		echo "$NAME error: xfs_repair was not found!" 1>&2
+		exit 4
+	fi
+
+	ensure_not_mounted $DEV
+
+	$BIN -e $DEV
+	repair2fsck_code $?
+	exit $?
+fi
+
 if $AUTO; then
 	echo "$0: XFS file system."
 else
diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
index ace7252d..08812be8 100644
--- a/man/man8/fsck.xfs.8
+++ b/man/man8/fsck.xfs.8
@@ -21,6 +21,13 @@ If you wish to check the consistency of an XFS filesystem,
 or repair a damaged or corrupt XFS filesystem,
 see
 .BR xfs_repair (8).
+.PP
+However, the system administrator can force
+.B fsck.xfs
+to run
+.BR xfs_repair (8)
+by creating a /forcefsck file or booting the system with
+"fsck.mode=force" on the kernel command line.
 .
 .SH FILES
 .IR /etc/fstab .
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/2 v2] xfs_repair: add flag -e to detect corrected errors
  2018-03-15 18:23   ` [PATCH 1/2 v2] " Jan Tulak
@ 2018-03-15 18:44     ` Darrick J. Wong
  2018-03-23  1:57     ` Eric Sandeen
  2018-03-23 14:32     ` [PATCH 1/2 v3] xfs_repair: add flag -e to modify exit code for " Jan Tulak
  2 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-03-15 18:44 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-xfs

On Thu, Mar 15, 2018 at 07:23:08PM +0100, Jan Tulak wrote:
> xfs_repair ends with a return code 0 if it finished ok, no matter if
> there were some errors in the fs, or not. The new flag -e means that we
> can avoid screenscraping and parsing text output to detect if an error
> was found (and corrected).
> 
> If something could not be corrected or in any other case than the "found
> something but fixed it all," the behaviour with this flag is unchanged.
> 
> Signed-off-by: Jan Tulak <jtulak@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> ---
> v2:
> - edit man page changes
> - report_corrected is now bool
> - minor code simplification
> ---
>  man/man8/xfs_repair.8 | 15 +++++++++++----
>  repair/xfs_repair.c   | 10 +++++++++-
>  2 files changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/man/man8/xfs_repair.8 b/man/man8/xfs_repair.8
> index 85e4dc97..1ca3b614 100644
> --- a/man/man8/xfs_repair.8
> +++ b/man/man8/xfs_repair.8
> @@ -4,7 +4,7 @@ xfs_repair \- repair an XFS filesystem
>  .SH SYNOPSIS
>  .B xfs_repair
>  [
> -.B \-dfLnPv
> +.B \-defLnPv
>  ] [
>  .B \-m
>  .I maxmem
> @@ -168,6 +168,10 @@ Repair dangerously. Allow
>  to repair an XFS filesystem mounted read only. This is typically done
>  on a root filesystem from single user mode, immediately followed by a reboot.
>  .TP
> +.B \-e
> +If any metadata corruption was found, the status returned is 3 instead of the
> +usual 0.
> +.TP
>  .B \-V
>  Prints the version number and exits.
>  .SS Checks Performed
> @@ -512,14 +516,17 @@ will return a status of 1 if filesystem corruption was detected and
>  0 if no filesystem corruption was detected.
>  .B xfs_repair
>  run without the \-n option will always return a status code of 0 if
> -it completes without problems.  If a runtime error is encountered
> -during operation, it will return a status of 1.  In this case,
> +it completes without problems, unless the flag
> +.B -e
> +is used. If it is used, then status 3 is reported when any issue with the
> +filesystem was found, but could be fixed. If a runtime error is encountered during
> +operation, it will return a status of 1. In this case,
>  .B xfs_repair
>  should be restarted.  If
>  .B xfs_repair is unable
>  to proceed due to a dirty log, it will return a status of 2.  See below.
>  .SH DIRTY LOGS
> -Due to the design of the XFS log, a dirty log can only be replayed 
> +Due to the design of the XFS log, a dirty log can only be replayed
>  by the kernel, on a machine having the same CPU architecture as the
>  machine which was writing to the log.
>  .B xfs_repair
> diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> index 312a0d08..a65709ce 100644
> --- a/repair/xfs_repair.c
> +++ b/repair/xfs_repair.c
> @@ -77,6 +77,7 @@ static char *c_opts[] = {
>  static int	bhash_option_used;
>  static long	max_mem_specified;	/* in megabytes */
>  static int	phase2_threads = 32;
> +static bool report_corrected;
>  
>  static void
>  usage(void)
> @@ -97,6 +98,7 @@ usage(void)
>  "  -o subopts   Override default behaviour, refer to man page.\n"
>  "  -t interval  Reporting interval in seconds.\n"
>  "  -d           Repair dangerously.\n"
> +"  -e           Exit with a non-zero code even when all errors were repaired.\n"
>  "  -V           Reports version and exits.\n"), progname);
>  	exit(1);
>  }
> @@ -214,12 +216,13 @@ process_args(int argc, char **argv)
>  	ag_stride = 0;
>  	thread_count = 1;
>  	report_interval = PROG_RPT_DEFAULT;
> +	report_corrected = false;
>  
>  	/*
>  	 * XXX have to add suboption processing here
>  	 * attributes, quotas, nlinks, aligned_inos, sb_fbits
>  	 */
> -	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPt:")) != EOF)  {
> +	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPet:")) != EOF)  {
>  		switch (c) {
>  		case 'D':
>  			dumpcore = 1;
> @@ -329,6 +332,9 @@ process_args(int argc, char **argv)
>  		case 't':
>  			report_interval = (int)strtol(optarg, NULL, 0);
>  			break;
> +		case 'e':
> +			report_corrected = true;
> +			break;
>  		case '?':
>  			usage();
>  		}
> @@ -1096,5 +1102,7 @@ _("Repair of readonly mount complete.  Immediate reboot encouraged.\n"));
>  
>  	free(msgbuf);
>  
> +	if (fs_is_dirty && report_corrected)
> +		return (3);
>  	return (0);
>  }
> -- 
> 2.15.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2 v4] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-15 18:28         ` [PATCH 2/2 v4] " Jan Tulak
@ 2018-03-15 18:49           ` Darrick J. Wong
  2018-03-16 10:19             ` Jan Tulak
  2018-03-16 17:07           ` [PATCH 2/2 v5] " Jan Tulak
  1 sibling, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-03-15 18:49 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-xfs

On Thu, Mar 15, 2018 at 07:28:50PM +0100, Jan Tulak wrote:
> The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
> run on every unclean shutdown. However, sometimes it may happen that the
> root filesystem really requires the usage of xfs_repair and then it is a
> hassle. This patch makes the situation a bit easier by detecting forced
> checks (/forcefsck or fsck.mode=force), so user can require the repair,
> without the repair being run all the time.
> 
> Signed-off-by: Jan Tulak <jtulak@redhat.com>
> 
> ---
> I omitted the ", or by running fsck.xfs -f." part, Darrick, because it
> works only for non-interactive sessions. Running it manually should do
> nothing.

Ok.

> ---
> Changelog:
> v4:
> - man page changes
> v3:
> - too quick with fixing in v2... add line at the end of the file
> v2:
> - return the "exit 0" at the end
> 
> v1:
> - test for xfs_repair binary
> - run only in non-interactive session
> - translate xfs_repair return codes to fsck ones
> - run only if the filesystem is not mounted
> - add manpage update
> ---
>  fsck/xfs_fsck.sh    | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  man/man8/fsck.xfs.8 |  7 ++++++
>  2 files changed, 71 insertions(+), 1 deletion(-)
> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index e52969e4..0ec6b049 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -3,11 +3,42 @@
>  # Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
>  #
>  
> +NAME=$0
> +
> +# get the right return code for fsck
> +function repair2fsck_code() {
> +	case $1 in
> +	0)  return 0 # everything is ok
> +		;;
> +	1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
> +		return 4 # errors left uncorrected
> +		;;
> +	2)  echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
> +		return 4 # it should not me mounted during boot, something is wrong

Sorry I missed this on the first go-round, but repair status 2 means the
log is dirty, so the admin must mount the fs to try to replay the log or
run xfs_repair -L to dump the log.  It does not mean that the fs is
already mounted.

"$NAME error: The filesystem log is dirty, either mount it to recover
the log.  If that fails, run xfs_repair -L to clear the log."

--D

> +		;;
> +	3)  return 1 # The fs has been fixed
> +		;;
> +	*)  echo "$NAME error: An unknown return code from xfs_repair '$1'" 1>&2
> +		return 4 # something went wrong with xfs_repair
> +	esac
> +}
> +
> +function ensure_not_mounted() {
> +	local dev=$1
> +	mounted=`grep -c "^$dev " /proc/mounts`
> +	if [ $mounted -ne 0 ]; then
> +		echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
> +		exit 4
> +	fi
> +}
> +
>  AUTO=false
> -while getopts ":aApy" c
> +FORCE=false
> +while getopts ":aApyf" c
>  do
>  	case $c in
>  	a|A|p|y)	AUTO=true;;
> +	f)      	FORCE=true;;
>  	esac
>  done
>  eval DEV=\${$#}
> @@ -15,6 +46,38 @@ if [ ! -e $DEV ]; then
>  	echo "$0: $DEV does not exist"
>  	exit 8
>  fi
> +
> +# The flag -f is added by systemd/init scripts when /forcefsck file is present
> +# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
> +# this check, user has to explicitly require a forced fsck.
> +# But first of all, test if it is a non-interactive session. Use multiple
> +# methods to capture most of the cases:
> +# The case for *i* and -n "$PS1" are commonly suggested in bash manual
> +# and the -t 0 test checks stdin
> +case $- in
> +	*i*) FORCE=false ;;
> +esac
> +if [ -n "$PS1" -o -t 0 ]; then
> +	FORCE=false
> +fi
> +
> +if $FORCE; then
> +	if [ -f /sbin/xfs_repair ]; then
> +		BIN="/sbin/xfs_repair"
> +	elif [ -f /usr/sbin/xfs_repair ]; then
> +		BIN="/usr/sbin/xfs_repair"
> +	else
> +		echo "$NAME error: xfs_repair was not found!" 1>&2
> +		exit 4
> +	fi
> +
> +	ensure_not_mounted $DEV
> +
> +	$BIN -e $DEV
> +	repair2fsck_code $?
> +	exit $?
> +fi
> +
>  if $AUTO; then
>  	echo "$0: XFS file system."
>  else
> diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
> index ace7252d..08812be8 100644
> --- a/man/man8/fsck.xfs.8
> +++ b/man/man8/fsck.xfs.8
> @@ -21,6 +21,13 @@ If you wish to check the consistency of an XFS filesystem,
>  or repair a damaged or corrupt XFS filesystem,
>  see
>  .BR xfs_repair (8).
> +.PP
> +However, the system administrator can force
> +.B fsck.xfs
> +to run
> +.BR xfs_repair (8)
> +by creating a /forcefsck file or booting the system with
> +"fsck.mode=force" on the kernel command line.
>  .
>  .SH FILES
>  .IR /etc/fstab .
> -- 
> 2.15.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-15 11:16                     ` Jan Tulak
@ 2018-03-15 22:19                       ` Dave Chinner
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2018-03-15 22:19 UTC (permalink / raw)
  To: Jan Tulak; +Cc: Eric Sandeen, linux-xfs

On Thu, Mar 15, 2018 at 12:16:44PM +0100, Jan Tulak wrote:
> On Wed, Mar 14, 2018 at 4:19 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> > On 3/14/18 9:30 AM, Jan Tulak wrote:
> >> On Fri, Mar 9, 2018 at 12:28 AM, Eric Sandeen <sandeen@sandeen.net>
> >> wrote:
> >>>
> >>> If it's critical to report whether errors were fixed, it would be
> >>> trivial to add a new option to xfs_repair which causes it to test
> >>> fs_is_dirty for runs without "-n", and exit with a different
> >>> value.
> >>
> >> I have toyed with it a bit and this seems to be the best option. A
> >> flag that changes the exit code on a successful run. Is exit code 3
> >> ok? According to man page, only 1 and 2 are currently used and the
> >> "everything is ok now, but an issue was there" should not be mixed
> >> with the existing ones. I also thought about a flag that would
> >> change all exit codes to fsck ones, but that seems too complicated
> >> and completely unnecessary.
> >
> > Hm, I guess we'll have to.
> 
> We can either map xfs_repair to fsck in the fsck.xfs script, or change
> xfs_repair to use the fsck exit codes. I'm for the first variant (just
> add a one new exit code for xfs_repair and remap it for fsck codes in
> the script) rather than complicate xfs_repair with two sets of exit
> codes.

Agreed, any fsck specific exit code mapping should be in the fsck
script. Let's try to keep xfs_repair isolated from all the fsck
interface bogosities....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2 v4] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-15 18:49           ` Darrick J. Wong
@ 2018-03-16 10:19             ` Jan Tulak
  2018-03-16 15:39               ` Darrick J. Wong
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Tulak @ 2018-03-16 10:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Mar 15, 2018 at 7:49 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Thu, Mar 15, 2018 at 07:28:50PM +0100, Jan Tulak wrote:
>> +
>> +# get the right return code for fsck
>> +function repair2fsck_code() {
>> +     case $1 in
>> +     0)  return 0 # everything is ok
>> +             ;;
>> +     1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
>> +             return 4 # errors left uncorrected
>> +             ;;
>> +     2)  echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
>> +             return 4 # it should not me mounted during boot, something is wrong
>
> Sorry I missed this on the first go-round, but repair status 2 means the
> log is dirty, so the admin must mount the fs to try to replay the log or
> run xfs_repair -L to dump the log.  It does not mean that the fs is
> already mounted.
>
> "$NAME error: The filesystem log is dirty, either mount it to recover
> the log.  If that fails, run xfs_repair -L to clear the log."
>

Right, thanks for spotting it. But I wonder if telling the user to
blindly use -L is safe. Maybe something like "run xfs_repair -L (can
be dangerous, refer to manual pages)" would be better.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2 v4] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-16 10:19             ` Jan Tulak
@ 2018-03-16 15:39               ` Darrick J. Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2018-03-16 15:39 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-xfs

On Fri, Mar 16, 2018 at 11:19:50AM +0100, Jan Tulak wrote:
> On Thu, Mar 15, 2018 at 7:49 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Thu, Mar 15, 2018 at 07:28:50PM +0100, Jan Tulak wrote:
> >> +
> >> +# get the right return code for fsck
> >> +function repair2fsck_code() {
> >> +     case $1 in
> >> +     0)  return 0 # everything is ok
> >> +             ;;
> >> +     1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
> >> +             return 4 # errors left uncorrected
> >> +             ;;
> >> +     2)  echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
> >> +             return 4 # it should not me mounted during boot, something is wrong
> >
> > Sorry I missed this on the first go-round, but repair status 2 means the
> > log is dirty, so the admin must mount the fs to try to replay the log or
> > run xfs_repair -L to dump the log.  It does not mean that the fs is
> > already mounted.
> >
> > "$NAME error: The filesystem log is dirty, either mount it to recover
> > the log.  If that fails, run xfs_repair -L to clear the log."
> >
> 
> Right, thanks for spotting it. But I wonder if telling the user to
> blindly use -L is safe. Maybe something like "run xfs_repair -L (can
> be dangerous, refer to manual pages)" would be better.

Yes.  "The filesystem log is dirty, please mount the filesystem to
recover the log.  If that fails, refer to the section DIRTY LOGS in the
xfs_repair manual page." ?

--D

> Jan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 2/2 v5] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-15 18:28         ` [PATCH 2/2 v4] " Jan Tulak
  2018-03-15 18:49           ` Darrick J. Wong
@ 2018-03-16 17:07           ` Jan Tulak
  2018-03-23  2:37             ` Eric Sandeen
  2018-03-23 14:33             ` [PATCH 2/2 v6] " Jan Tulak
  1 sibling, 2 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-16 17:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: Jan Tulak

The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
run on every unclean shutdown. However, sometimes it may happen that the
root filesystem really requires the usage of xfs_repair and then it is a
hassle. This patch makes the situation a bit easier by detecting forced
checks (/forcefsck or fsck.mode=force), so user can require the repair,
without the repair being run all the time.

Signed-off-by: Jan Tulak <jtulak@redhat.com>

---
Changelog:
v5:
- Change the message for xfs_repair code 2
v4:
- man page changes
v3:
- too quick with fixing in v2... add line at the end of the file
v2:
- return the "exit 0" at the end
v1:
- test for xfs_repair binary
- run only in non-interactive session
- translate xfs_repair return codes to fsck ones
- run only if the filesystem is not mounted
- add manpage update
---
 fsck/xfs_fsck.sh    | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 man/man8/fsck.xfs.8 |  7 ++++++
 2 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
index e52969e4..2ac09aeb 100755
--- a/fsck/xfs_fsck.sh
+++ b/fsck/xfs_fsck.sh
@@ -3,11 +3,44 @@
 # Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
 #
 
+NAME=$0
+
+# get the right return code for fsck
+function repair2fsck_code() {
+	case $1 in
+	0)  return 0 # everything is ok
+		;;
+	1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
+		return 4 # errors left uncorrected
+		;;
+	2)  echo "$NAME error: The filesystem log is dirty, mount it to recover" \
+		     "the log. If that fails, refer to the section DIRTY LOGS in the" \
+		     "xfs_repair manual page." 1>&2
+		return 4 # dirty log, don't do anything and let the user solve it
+		;;
+	3)  return 1 # The fs has been fixed
+		;;
+	*)  echo "$NAME error: An unknown return code from xfs_repair '$1'" 1>&2
+		return 4 # something went wrong with xfs_repair
+	esac
+}
+
+function ensure_not_mounted() {
+	local dev=$1
+	mounted=`grep -c "^$dev " /proc/mounts`
+	if [ $mounted -ne 0 ]; then
+		echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
+		exit 4
+	fi
+}
+
 AUTO=false
-while getopts ":aApy" c
+FORCE=false
+while getopts ":aApyf" c
 do
 	case $c in
 	a|A|p|y)	AUTO=true;;
+	f)      	FORCE=true;;
 	esac
 done
 eval DEV=\${$#}
@@ -15,6 +48,38 @@ if [ ! -e $DEV ]; then
 	echo "$0: $DEV does not exist"
 	exit 8
 fi
+
+# The flag -f is added by systemd/init scripts when /forcefsck file is present
+# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
+# this check, user has to explicitly require a forced fsck.
+# But first of all, test if it is a non-interactive session. Use multiple
+# methods to capture most of the cases:
+# The case for *i* and -n "$PS1" are commonly suggested in bash manual
+# and the -t 0 test checks stdin
+case $- in
+	*i*) FORCE=false ;;
+esac
+if [ -n "$PS1" -o -t 0 ]; then
+	FORCE=false
+fi
+
+if $FORCE; then
+	if [ -f /sbin/xfs_repair ]; then
+		BIN="/sbin/xfs_repair"
+	elif [ -f /usr/sbin/xfs_repair ]; then
+		BIN="/usr/sbin/xfs_repair"
+	else
+		echo "$NAME error: xfs_repair was not found!" 1>&2
+		exit 4
+	fi
+
+	ensure_not_mounted $DEV
+
+	$BIN -e $DEV
+	repair2fsck_code $?
+	exit $?
+fi
+
 if $AUTO; then
 	echo "$0: XFS file system."
 else
diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
index ace7252d..08812be8 100644
--- a/man/man8/fsck.xfs.8
+++ b/man/man8/fsck.xfs.8
@@ -21,6 +21,13 @@ If you wish to check the consistency of an XFS filesystem,
 or repair a damaged or corrupt XFS filesystem,
 see
 .BR xfs_repair (8).
+.PP
+However, the system administrator can force
+.B fsck.xfs
+to run
+.BR xfs_repair (8)
+by creating a /forcefsck file or booting the system with
+"fsck.mode=force" on the kernel command line.
 .
 .SH FILES
 .IR /etc/fstab .
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/2 v2] xfs_repair: add flag -e to detect corrected errors
  2018-03-15 18:23   ` [PATCH 1/2 v2] " Jan Tulak
  2018-03-15 18:44     ` Darrick J. Wong
@ 2018-03-23  1:57     ` Eric Sandeen
  2018-03-23  9:24       ` Jan Tulak
  2018-03-23 14:32     ` [PATCH 1/2 v3] xfs_repair: add flag -e to modify exit code for " Jan Tulak
  2 siblings, 1 reply; 46+ messages in thread
From: Eric Sandeen @ 2018-03-23  1:57 UTC (permalink / raw)
  To: Jan Tulak, linux-xfs

On 3/15/18 1:23 PM, Jan Tulak wrote:
> xfs_repair ends with a return code 0 if it finished ok, no matter if
> there were some errors in the fs, or not. The new flag -e means that we
> can avoid screenscraping and parsing text output to detect if an error
> was found (and corrected).
> 
> If something could not be corrected or in any other case than the "found
> something but fixed it all," the behaviour with this flag is unchanged.
> 
> Signed-off-by: Jan Tulak <jtulak@redhat.com>

A couple more minor things, sorry for chiming in late.

Can we make the changelog summary a little more specific, i.e.

"xfs_repair: add flag -e to modify exit code for corrected errors"

I had a late-breaking thought ;) that maybe we should skip up to exit
code 4, so that if for some reason in the future we need to, we can
OR together exit codes ala e2fsck to convey more information.
I don't know what other exit codes we might need, but this might
future-proof it a little.

(Actually, I can see an use today:  1 | 4 = 5 could mean
"we found and fixed some errors and then encountered an operational
problem and exited." - but that can come later, not here.  Skipping
to 4 would keep this option open.)

more below

> ---
> v2:
> - edit man page changes
> - report_corrected is now bool
> - minor code simplification
> ---
>  man/man8/xfs_repair.8 | 15 +++++++++++----
>  repair/xfs_repair.c   | 10 +++++++++-
>  2 files changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/man/man8/xfs_repair.8 b/man/man8/xfs_repair.8
> index 85e4dc97..1ca3b614 100644
> --- a/man/man8/xfs_repair.8
> +++ b/man/man8/xfs_repair.8
> @@ -4,7 +4,7 @@ xfs_repair \- repair an XFS filesystem
>  .SH SYNOPSIS
>  .B xfs_repair
>  [
> -.B \-dfLnPv
> +.B \-defLnPv
>  ] [
>  .B \-m
>  .I maxmem
> @@ -168,6 +168,10 @@ Repair dangerously. Allow
>  to repair an XFS filesystem mounted read only. This is typically done
>  on a root filesystem from single user mode, immediately followed by a reboot.
>  .TP
> +.B \-e
> +If any metadata corruption was found, the status returned is 3 instead of the
> +usual 0.

Sorry, I know Darrick already commented on this once, but I think
it probably needs to distinguish from the -n case a little more.  Maybe:

+If any metadata corruption was repaired, the status returned is 3 (4?) instead of the
+usual 0.

which makes it more clear that it's in repair mode, not dry-run mode?

> +.TP
>  .B \-V
>  Prints the version number and exits.
>  .SS Checks Performed
> @@ -512,14 +516,17 @@ will return a status of 1 if filesystem corruption was detected and
>  0 if no filesystem corruption was detected.
>  .B xfs_repair
>  run without the \-n option will always return a status code of 0 if
> -it completes without problems.  If a runtime error is encountered
> -during operation, it will return a status of 1.  In this case,
> +it completes without problems, unless the flag
> +.B -e
> +is used. If it is used, then status 3 is reported when any issue with the
> +filesystem was found, but could be fixed. If a runtime error is encountered during
> +operation, it will return a status of 1. In this case,
>  .B xfs_repair
>  should be restarted.  If
>  .B xfs_repair is unable
>  to proceed due to a dirty log, it will return a status of 2.  See below.
>  .SH DIRTY LOGS
> -Due to the design of the XFS log, a dirty log can only be replayed 
> +Due to the design of the XFS log, a dirty log can only be replayed
>  by the kernel, on a machine having the same CPU architecture as the
>  machine which was writing to the log.
>  .B xfs_repair
> diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> index 312a0d08..a65709ce 100644
> --- a/repair/xfs_repair.c
> +++ b/repair/xfs_repair.c
> @@ -77,6 +77,7 @@ static char *c_opts[] = {
>  static int	bhash_option_used;
>  static long	max_mem_specified;	/* in megabytes */
>  static int	phase2_threads = 32;
> +static bool report_corrected;

tab that out please, to match lines above

>  
>  static void
>  usage(void)
> @@ -97,6 +98,7 @@ usage(void)
>  "  -o subopts   Override default behaviour, refer to man page.\n"
>  "  -t interval  Reporting interval in seconds.\n"
>  "  -d           Repair dangerously.\n"
> +"  -e           Exit with a non-zero code even when all errors were repaired.\n"

+"  -e           Exit with a non-zero code if any errors were repaired.\n"


>  "  -V           Reports version and exits.\n"), progname);
>  	exit(1);
>  }
> @@ -214,12 +216,13 @@ process_args(int argc, char **argv)
>  	ag_stride = 0;
>  	thread_count = 1;
>  	report_interval = PROG_RPT_DEFAULT;
> +	report_corrected = false;
>  
>  	/*
>  	 * XXX have to add suboption processing here
>  	 * attributes, quotas, nlinks, aligned_inos, sb_fbits
>  	 */
> -	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPt:")) != EOF)  {
> +	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPet:")) != EOF)  {
>  		switch (c) {
>  		case 'D':
>  			dumpcore = 1;
> @@ -329,6 +332,9 @@ process_args(int argc, char **argv)
>  		case 't':
>  			report_interval = (int)strtol(optarg, NULL, 0);
>  			break;
> +		case 'e':
> +			report_corrected = true;
> +			break;
>  		case '?':
>  			usage();
>  		}

It looks like we can specify -e and -n together; I think they need to be
mutually exclusive, because the combination has no valid meaning that
I can see.

(if so, I guess that needs a usage/summary/manpage update to reflect the change)

Thanks,
-Eric

> @@ -1096,5 +1102,7 @@ _("Repair of readonly mount complete.  Immediate reboot encouraged.\n"));
>  
>  	free(msgbuf);
>  
> +	if (fs_is_dirty && report_corrected)
> +		return (3);
>  	return (0);
>  }
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2 v5] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-16 17:07           ` [PATCH 2/2 v5] " Jan Tulak
@ 2018-03-23  2:37             ` Eric Sandeen
  2018-03-23  3:25               ` Darrick J. Wong
  2018-03-23 14:14               ` Jan Tulak
  2018-03-23 14:33             ` [PATCH 2/2 v6] " Jan Tulak
  1 sibling, 2 replies; 46+ messages in thread
From: Eric Sandeen @ 2018-03-23  2:37 UTC (permalink / raw)
  To: Jan Tulak, linux-xfs

On 3/16/18 12:07 PM, Jan Tulak wrote:
> The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
> run on every unclean shutdown. However, sometimes it may happen that the
> root filesystem really requires the usage of xfs_repair and then it is a
> hassle. This patch makes the situation a bit easier by detecting forced
> checks (/forcefsck or fsck.mode=force)

and invoking xfs_repair.

(and then can strike the rest below)

>, so user can require the repair,
> without the repair being run all the time.
> 
> Signed-off-by: Jan Tulak <jtulak@redhat.com>
> 
> ---
> Changelog:
> v5:
> - Change the message for xfs_repair code 2
> v4:
> - man page changes
> v3:
> - too quick with fixing in v2... add line at the end of the file
> v2:
> - return the "exit 0" at the end
> v1:
> - test for xfs_repair binary
> - run only in non-interactive session
> - translate xfs_repair return codes to fsck ones
> - run only if the filesystem is not mounted
> - add manpage update
> ---
>  fsck/xfs_fsck.sh    | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  man/man8/fsck.xfs.8 |  7 ++++++
>  2 files changed, 73 insertions(+), 1 deletion(-)
> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index e52969e4..2ac09aeb 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -3,11 +3,44 @@
>  # Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
>  #
>  
> +NAME=$0
> +
> +# get the right return code for fsck
> +function repair2fsck_code() {
> +	case $1 in
> +	0)  return 0 # everything is ok
> +		;;
> +	1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
> +		return 4 # errors left uncorrected
> +		;;
> +	2)  echo "$NAME error: The filesystem log is dirty, mount it to recover" \
> +		     "the log. If that fails, refer to the section DIRTY LOGS in the" \
> +		     "xfs_repair manual page." 1>&2
> +		return 4 # dirty log, don't do anything and let the user solve it
> +		;;
> +	3)  return 1 # The fs has been fixed

(4, if patch 1/2 changes ...)

> +		;;
> +	*)  echo "$NAME error: An unknown return code from xfs_repair '$1'" 1>&2
> +		return 4 # something went wrong with xfs_repair
> +	esac
> +}
> +
> +function ensure_not_mounted() {
> +	local dev=$1
> +	mounted=`grep -c "^$dev " /proc/mounts`
> +	if [ $mounted -ne 0 ]; then
> +		echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
> +		exit 4
> +	fi
> +}

Is this necessary?  Doesn't xfs_repair already check this?

# xfs_repair /dev/loop0
xfs_repair: /dev/loop0 contains a mounted filesystem
xfs_repair: /dev/loop0 contains a mounted and writable filesystem

fatal error -- couldn't initialize XFS library

# echo $?
1

(script would then say "error: xfs_repair could not fix the filesystem."
and exit with 4.)

It might be nice to have the cleaner message above, but I wonder about
the wisdom and safety of hand-rolling another simplistic mount check in a bash
script.... thoughts?

> +
>  AUTO=false
> -while getopts ":aApy" c
> +FORCE=false
> +while getopts ":aApyf" c
>  do
>  	case $c in
>  	a|A|p|y)	AUTO=true;;
> +	f)      	FORCE=true;;
>  	esac
>  done
>  eval DEV=\${$#}
> @@ -15,6 +48,38 @@ if [ ! -e $DEV ]; then
>  	echo "$0: $DEV does not exist"
>  	exit 8
>  fi
> +
> +# The flag -f is added by systemd/init scripts when /forcefsck file is present
> +# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
> +# this check, user has to explicitly require a forced fsck.
> +# But first of all, test if it is a non-interactive session.

Let's say why, here:

"Invoking xfs_repair via fsck.xfs is only intended to happen via initscripts.
Normal administrative filesystem repairs should always invoke xfs_repair directly."

or something like that.

> Use multiple
> +# methods to capture most of the cases:
> +# The case for *i* and -n "$PS1" are commonly suggested in bash manual
> +# and the -t 0 test checks stdin
> +case $- in
> +	*i*) FORCE=false ;;
> +esac
> +if [ -n "$PS1" -o -t 0 ]; then
> +	FORCE=false
> +fi

Ok - I thought maybe we should be noisy about ignoring -f, but on
second thought, I like hiding it.

> +
> +if $FORCE; then
> +	if [ -f /sbin/xfs_repair ]; then
> +		BIN="/sbin/xfs_repair"
> +	elif [ -f /usr/sbin/xfs_repair ]; then
> +		BIN="/usr/sbin/xfs_repair"
> +	else
> +		echo "$NAME error: xfs_repair was not found!" 1>&2
> +		exit 4
> +	fi

Ok, similar question as Darrick had - why hardcode paths?  Seems reasonable
to assume that xfs_repair will be in the path, and who knows what the
path may be in any given initrd...

Could just do:

XFS_REPAIR=$(command -v xfs_repair)
if [ ! -x "$XFS_REPAIR" ] ; then
	echo "$NAME error: xfs_repair was not found!" 1>&2
	exit 4
fi

It make sense to check that it's available before running it, but I think
hardcoding paths runs the risk of missing it if it's somewhere unique.

> +
> +	ensure_not_mounted $DEV
> +
> +	$BIN -e $DEV

please rename $BIN as $XFS_REPAIR for clarity?

> +	repair2fsck_code $?
> +	exit $?
> +fi
> +
>  if $AUTO; then
>  	echo "$0: XFS file system."
>  else
> diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
> index ace7252d..08812be8 100644
> --- a/man/man8/fsck.xfs.8
> +++ b/man/man8/fsck.xfs.8
> @@ -21,6 +21,13 @@ If you wish to check the consistency of an XFS filesystem,
>  or repair a damaged or corrupt XFS filesystem,
>  see
>  .BR xfs_repair (8).
> +.PP
> +However, the system administrator can force
> +.B fsck.xfs
> +to run
> +.BR xfs_repair (8)

at boot time

> +by creating a /forcefsck file or booting the system with
> +"fsck.mode=force" on the kernel command line.
>  .
>  .SH FILES
>  .IR /etc/fstab .
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2 v5] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-23  2:37             ` Eric Sandeen
@ 2018-03-23  3:25               ` Darrick J. Wong
  2018-03-23  3:29                 ` Eric Sandeen
  2018-03-23 14:14               ` Jan Tulak
  1 sibling, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-03-23  3:25 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Jan Tulak, linux-xfs

On Thu, Mar 22, 2018 at 09:37:08PM -0500, Eric Sandeen wrote:
> On 3/16/18 12:07 PM, Jan Tulak wrote:
> > The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
> > run on every unclean shutdown. However, sometimes it may happen that the
> > root filesystem really requires the usage of xfs_repair and then it is a
> > hassle. This patch makes the situation a bit easier by detecting forced
> > checks (/forcefsck or fsck.mode=force)
> 
> and invoking xfs_repair.
> 
> (and then can strike the rest below)
> 
> >, so user can require the repair,
> > without the repair being run all the time.
> > 
> > Signed-off-by: Jan Tulak <jtulak@redhat.com>
> > 
> > ---
> > Changelog:
> > v5:
> > - Change the message for xfs_repair code 2
> > v4:
> > - man page changes
> > v3:
> > - too quick with fixing in v2... add line at the end of the file
> > v2:
> > - return the "exit 0" at the end
> > v1:
> > - test for xfs_repair binary
> > - run only in non-interactive session
> > - translate xfs_repair return codes to fsck ones
> > - run only if the filesystem is not mounted
> > - add manpage update
> > ---
> >  fsck/xfs_fsck.sh    | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  man/man8/fsck.xfs.8 |  7 ++++++
> >  2 files changed, 73 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> > index e52969e4..2ac09aeb 100755
> > --- a/fsck/xfs_fsck.sh
> > +++ b/fsck/xfs_fsck.sh
> > @@ -3,11 +3,44 @@
> >  # Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
> >  #
> >  
> > +NAME=$0
> > +
> > +# get the right return code for fsck
> > +function repair2fsck_code() {
> > +	case $1 in
> > +	0)  return 0 # everything is ok
> > +		;;
> > +	1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
> > +		return 4 # errors left uncorrected
> > +		;;
> > +	2)  echo "$NAME error: The filesystem log is dirty, mount it to recover" \
> > +		     "the log. If that fails, refer to the section DIRTY LOGS in the" \
> > +		     "xfs_repair manual page." 1>&2
> > +		return 4 # dirty log, don't do anything and let the user solve it
> > +		;;
> > +	3)  return 1 # The fs has been fixed
> 
> (4, if patch 1/2 changes ...)
> 
> > +		;;
> > +	*)  echo "$NAME error: An unknown return code from xfs_repair '$1'" 1>&2
> > +		return 4 # something went wrong with xfs_repair
> > +	esac
> > +}
> > +
> > +function ensure_not_mounted() {
> > +	local dev=$1
> > +	mounted=`grep -c "^$dev " /proc/mounts`
> > +	if [ $mounted -ne 0 ]; then
> > +		echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
> > +		exit 4
> > +	fi
> > +}
> 
> Is this necessary?  Doesn't xfs_repair already check this?
> 
> # xfs_repair /dev/loop0
> xfs_repair: /dev/loop0 contains a mounted filesystem
> xfs_repair: /dev/loop0 contains a mounted and writable filesystem
> 
> fatal error -- couldn't initialize XFS library
> 
> # echo $?
> 1
> 
> (script would then say "error: xfs_repair could not fix the filesystem."
> and exit with 4.)
> 
> It might be nice to have the cleaner message above, but I wonder about
> the wisdom and safety of hand-rolling another simplistic mount check in a bash
> script.... thoughts?
> 
> > +
> >  AUTO=false
> > -while getopts ":aApy" c
> > +FORCE=false
> > +while getopts ":aApyf" c
> >  do
> >  	case $c in
> >  	a|A|p|y)	AUTO=true;;
> > +	f)      	FORCE=true;;
> >  	esac
> >  done
> >  eval DEV=\${$#}
> > @@ -15,6 +48,38 @@ if [ ! -e $DEV ]; then
> >  	echo "$0: $DEV does not exist"
> >  	exit 8
> >  fi
> > +
> > +# The flag -f is added by systemd/init scripts when /forcefsck file is present
> > +# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
> > +# this check, user has to explicitly require a forced fsck.
> > +# But first of all, test if it is a non-interactive session.
> 
> Let's say why, here:
> 
> "Invoking xfs_repair via fsck.xfs is only intended to happen via initscripts.
> Normal administrative filesystem repairs should always invoke xfs_repair directly."
> 
> or something like that.
> 
> > Use multiple
> > +# methods to capture most of the cases:
> > +# The case for *i* and -n "$PS1" are commonly suggested in bash manual
> > +# and the -t 0 test checks stdin
> > +case $- in
> > +	*i*) FORCE=false ;;
> > +esac
> > +if [ -n "$PS1" -o -t 0 ]; then
> > +	FORCE=false
> > +fi
> 
> Ok - I thought maybe we should be noisy about ignoring -f, but on
> second thought, I like hiding it.
> 
> > +
> > +if $FORCE; then
> > +	if [ -f /sbin/xfs_repair ]; then
> > +		BIN="/sbin/xfs_repair"
> > +	elif [ -f /usr/sbin/xfs_repair ]; then
> > +		BIN="/usr/sbin/xfs_repair"
> > +	else
> > +		echo "$NAME error: xfs_repair was not found!" 1>&2
> > +		exit 4
> > +	fi
> 
> Ok, similar question as Darrick had - why hardcode paths?  Seems reasonable
> to assume that xfs_repair will be in the path, and who knows what the
> path may be in any given initrd...
> 
> Could just do:
> 
> XFS_REPAIR=$(command -v xfs_repair)
> if [ ! -x "$XFS_REPAIR" ] ; then
> 	echo "$NAME error: xfs_repair was not found!" 1>&2
> 	exit 4
> fi
>
> It make sense to check that it's available before running it, but I think
> hardcoding paths runs the risk of missing it if it's somewhere unique.

OTOH hardcoding the path (since we control where xfs_repair gets
installed) means that we avoid the "someone poisoned PATH" attack.  I'm
not sure I buy that argument since if you have root access you probably
can just bind mount a garbage atop /sbin/xfs_repair, but eh.

If you really /do/ want to hardcode paths into the script, you should
pick up PKG_ROOT_SBIN_DIR from the build system.

--D

> 
> > +
> > +	ensure_not_mounted $DEV
> > +
> > +	$BIN -e $DEV
> 
> please rename $BIN as $XFS_REPAIR for clarity?
> 
> > +	repair2fsck_code $?
> > +	exit $?
> > +fi
> > +
> >  if $AUTO; then
> >  	echo "$0: XFS file system."
> >  else
> > diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
> > index ace7252d..08812be8 100644
> > --- a/man/man8/fsck.xfs.8
> > +++ b/man/man8/fsck.xfs.8
> > @@ -21,6 +21,13 @@ If you wish to check the consistency of an XFS filesystem,
> >  or repair a damaged or corrupt XFS filesystem,
> >  see
> >  .BR xfs_repair (8).
> > +.PP
> > +However, the system administrator can force
> > +.B fsck.xfs
> > +to run
> > +.BR xfs_repair (8)
> 
> at boot time
> 
> > +by creating a /forcefsck file or booting the system with
> > +"fsck.mode=force" on the kernel command line.
> >  .
> >  .SH FILES
> >  .IR /etc/fstab .
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2 v5] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-23  3:25               ` Darrick J. Wong
@ 2018-03-23  3:29                 ` Eric Sandeen
  2018-03-23  3:42                   ` Darrick J. Wong
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Sandeen @ 2018-03-23  3:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Jan Tulak, linux-xfs



On 3/22/18 10:25 PM, Darrick J. Wong wrote:
>> It make sense to check that it's available before running it, but I think
>> hardcoding paths runs the risk of missing it if it's somewhere unique.

> OTOH hardcoding the path (since we control where xfs_repair gets
> installed) means that we avoid the "someone poisoned PATH" attack.  I'm
> not sure I buy that argument since if you have root access you probably
> can just bind mount a garbage atop /sbin/xfs_repair, but eh.

But this is the initramfs, right?  Do we really have any idea where things
land?  Do initramfs paths follow the installed system paths?  (maybe?)

-Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2 v5] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-23  3:29                 ` Eric Sandeen
@ 2018-03-23  3:42                   ` Darrick J. Wong
  2018-03-23 14:00                     ` Jan Tulak
  0 siblings, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2018-03-23  3:42 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Jan Tulak, linux-xfs

On Thu, Mar 22, 2018 at 10:29:27PM -0500, Eric Sandeen wrote:
> 
> 
> On 3/22/18 10:25 PM, Darrick J. Wong wrote:
> >> It make sense to check that it's available before running it, but I think
> >> hardcoding paths runs the risk of missing it if it's somewhere unique.
> 
> > OTOH hardcoding the path (since we control where xfs_repair gets
> > installed) means that we avoid the "someone poisoned PATH" attack.  I'm
> > not sure I buy that argument since if you have root access you probably
> > can just bind mount a garbage atop /sbin/xfs_repair, but eh.
> 
> But this is the initramfs, right?  Do we really have any idea where things
> land?  Do initramfs paths follow the installed system paths?  (maybe?)

They land wherever the initramfs construction script puts them, though
at least on Debian the binaries are usually put in the same places as
they are on the rootfs.  Not sure about dracut. :)

--D

> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/2 v2] xfs_repair: add flag -e to detect corrected errors
  2018-03-23  1:57     ` Eric Sandeen
@ 2018-03-23  9:24       ` Jan Tulak
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-23  9:24 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

On Fri, Mar 23, 2018 at 2:57 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 3/15/18 1:23 PM, Jan Tulak wrote:
>> xfs_repair ends with a return code 0 if it finished ok, no matter if
>> there were some errors in the fs, or not. The new flag -e means that we
>> can avoid screenscraping and parsing text output to detect if an error
>> was found (and corrected).
>>
>> If something could not be corrected or in any other case than the "found
>> something but fixed it all," the behaviour with this flag is unchanged.
>>
>> Signed-off-by: Jan Tulak <jtulak@redhat.com>
>
> A couple more minor things, sorry for chiming in late.
>
> Can we make the changelog summary a little more specific, i.e.
>
> "xfs_repair: add flag -e to modify exit code for corrected errors"
>
> I had a late-breaking thought ;) that maybe we should skip up to exit
> code 4, so that if for some reason in the future we need to, we can
> OR together exit codes ala e2fsck to convey more information.
> I don't know what other exit codes we might need, but this might
> future-proof it a little.
>
> (Actually, I can see an use today:  1 | 4 = 5 could mean
> "we found and fixed some errors and then encountered an operational
> problem and exited." - but that can come later, not here.  Skipping
> to 4 would keep this option open.)
>

That makes sense. I will update it accordingly. And thanks for the
rest of the things, fixing that too.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2 v5] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-23  3:42                   ` Darrick J. Wong
@ 2018-03-23 14:00                     ` Jan Tulak
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-23 14:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Eric Sandeen, linux-xfs

On Fri, Mar 23, 2018 at 4:42 AM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Thu, Mar 22, 2018 at 10:29:27PM -0500, Eric Sandeen wrote:
>>
>>
>> On 3/22/18 10:25 PM, Darrick J. Wong wrote:
>> >> It make sense to check that it's available before running it, but I think
>> >> hardcoding paths runs the risk of missing it if it's somewhere unique.
>>
>> > OTOH hardcoding the path (since we control where xfs_repair gets
>> > installed) means that we avoid the "someone poisoned PATH" attack.  I'm
>> > not sure I buy that argument since if you have root access you probably
>> > can just bind mount a garbage atop /sbin/xfs_repair, but eh.
>>
>> But this is the initramfs, right?  Do we really have any idea where things
>> land?  Do initramfs paths follow the installed system paths?  (maybe?)
>
> They land wherever the initramfs construction script puts them, though
> at least on Debian the binaries are usually put in the same places as
> they are on the rootfs.  Not sure about dracut. :)
>

AFAICT in the few cases I saw (Debian based distros, Fedora, Arch),
the paths were always the same on the rootfs and initramfs. But yeah,
I agree that it's better to avoid hardcoding. I didn't know about
"command -v", and "which" is not usually present, so I went with
hardcoding rather than implement it on my own. Which is no issue
now...

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2 v5] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-23  2:37             ` Eric Sandeen
  2018-03-23  3:25               ` Darrick J. Wong
@ 2018-03-23 14:14               ` Jan Tulak
  1 sibling, 0 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-23 14:14 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

On Fri, Mar 23, 2018 at 3:37 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
>> +function ensure_not_mounted() {
>> +     local dev=$1
>> +     mounted=`grep -c "^$dev " /proc/mounts`
>> +     if [ $mounted -ne 0 ]; then
>> +             echo "$NAME error: The filesystem to be checked must not be mounted." 1>&2
>> +             exit 4
>> +     fi
>> +}
>
> Is this necessary?  Doesn't xfs_repair already check this?
>
> # xfs_repair /dev/loop0
> xfs_repair: /dev/loop0 contains a mounted filesystem
> xfs_repair: /dev/loop0 contains a mounted and writable filesystem
>
> fatal error -- couldn't initialize XFS library
>
> # echo $?
> 1
>
> (script would then say "error: xfs_repair could not fix the filesystem."
> and exit with 4.)
>
> It might be nice to have the cleaner message above, but I wonder about
> the wisdom and safety of hand-rolling another simplistic mount check in a bash
> script.... thoughts?
>

Mmm, we are not hiding the message, so I guess it doesn't really
matter, the xfs_repair errors are going to be visible. This testing
remained there from when I wanted to use another exit code for mounted
fs and xfs_repair failures. But I'm removing the check now, with the
merged exit code it is not useful.

Other stuff: ok, changed.

Thanks,
Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 1/2 v3] xfs_repair: add flag -e to modify exit code for corrected errors
  2018-03-15 18:23   ` [PATCH 1/2 v2] " Jan Tulak
  2018-03-15 18:44     ` Darrick J. Wong
  2018-03-23  1:57     ` Eric Sandeen
@ 2018-03-23 14:32     ` Jan Tulak
  2 siblings, 0 replies; 46+ messages in thread
From: Jan Tulak @ 2018-03-23 14:32 UTC (permalink / raw)
  To: linux-xfs; +Cc: Jan Tulak

xfs_repair ends with a return code 0 if it finished ok, no matter if
there were some errors in the fs, or not. The new flag -e means that we
can avoid screenscraping and parsing text output to detect if an error
was found (and corrected).

If something could not be corrected or in any other case than the "found
something but fixed it all," the behaviour with this flag is unchanged.

Signed-off-by: Jan Tulak <jtulak@redhat.com>

---
v3:
- change the code from 3 to 4
- disallow -e -n combination
- change patch title
- minor man page edits

v2:
- edit man page changes
- report_corrected is now bool
- minor code simplification
---
 man/man8/xfs_repair.8 | 20 +++++++++++++++-----
 repair/xfs_repair.c   | 15 ++++++++++++++-
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/man/man8/xfs_repair.8 b/man/man8/xfs_repair.8
index 85e4dc97..19c13962 100644
--- a/man/man8/xfs_repair.8
+++ b/man/man8/xfs_repair.8
@@ -4,7 +4,7 @@ xfs_repair \- repair an XFS filesystem
 .SH SYNOPSIS
 .B xfs_repair
 [
-.B \-dfLnPv
+.B \-defLnPv
 ] [
 .B \-m
 .I maxmem
@@ -91,7 +91,9 @@ for a detailed description of the XFS realtime section.
 No modify mode. Specifies that
 .B xfs_repair
 should not modify the filesystem but should only scan the
-filesystem and indicate what repairs would have been made.
+filesystem and indicate what repairs would have been made. This option cannot
+be used together with
+.BR \-e .
 .TP
 .B \-P
 Disable prefetching of inode and directory blocks. Use this option if
@@ -168,6 +170,11 @@ Repair dangerously. Allow
 to repair an XFS filesystem mounted read only. This is typically done
 on a root filesystem from single user mode, immediately followed by a reboot.
 .TP
+.B \-e
+If any metadata corruption was repaired, the status returned is 4 instead of the
+usual 0. This option cannot be used together with
+.BR \-n .
+.TP
 .B \-V
 Prints the version number and exits.
 .SS Checks Performed
@@ -512,14 +519,17 @@ will return a status of 1 if filesystem corruption was detected and
 0 if no filesystem corruption was detected.
 .B xfs_repair
 run without the \-n option will always return a status code of 0 if
-it completes without problems.  If a runtime error is encountered
-during operation, it will return a status of 1.  In this case,
+it completes without problems, unless the flag
+.B -e
+is used. If it is used, then status 3 is reported when any issue with the
+filesystem was found, but could be fixed. If a runtime error is encountered during
+operation, it will return a status of 1. In this case,
 .B xfs_repair
 should be restarted.  If
 .B xfs_repair is unable
 to proceed due to a dirty log, it will return a status of 2.  See below.
 .SH DIRTY LOGS
-Due to the design of the XFS log, a dirty log can only be replayed 
+Due to the design of the XFS log, a dirty log can only be replayed
 by the kernel, on a machine having the same CPU architecture as the
 machine which was writing to the log.
 .B xfs_repair
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 312a0d08..6bb8ea26 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -77,6 +77,7 @@ static char *c_opts[] = {
 static int	bhash_option_used;
 static long	max_mem_specified;	/* in megabytes */
 static int	phase2_threads = 32;
+static bool	report_corrected;
 
 static void
 usage(void)
@@ -90,6 +91,7 @@ usage(void)
 "  -l logdev    Specifies the device where the external log resides.\n"
 "  -m maxmem    Maximum amount of memory to be used in megabytes.\n"
 "  -n           No modify mode, just checks the filesystem for damage.\n"
+"               (Cannot be used together with -e.)\n"
 "  -P           Disables prefetching.\n"
 "  -r rtdev     Specifies the device where the realtime section resides.\n"
 "  -v           Verbose output.\n"
@@ -97,6 +99,8 @@ usage(void)
 "  -o subopts   Override default behaviour, refer to man page.\n"
 "  -t interval  Reporting interval in seconds.\n"
 "  -d           Repair dangerously.\n"
+"  -e           Exit with a non-zero code if any errors were repaired.\n"
+"               (Cannot be used together with -n.)\n"
 "  -V           Reports version and exits.\n"), progname);
 	exit(1);
 }
@@ -214,12 +218,13 @@ process_args(int argc, char **argv)
 	ag_stride = 0;
 	thread_count = 1;
 	report_interval = PROG_RPT_DEFAULT;
+	report_corrected = false;
 
 	/*
 	 * XXX have to add suboption processing here
 	 * attributes, quotas, nlinks, aligned_inos, sb_fbits
 	 */
-	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPt:")) != EOF)  {
+	while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPet:")) != EOF)  {
 		switch (c) {
 		case 'D':
 			dumpcore = 1;
@@ -329,6 +334,9 @@ process_args(int argc, char **argv)
 		case 't':
 			report_interval = (int)strtol(optarg, NULL, 0);
 			break;
+		case 'e':
+			report_corrected = true;
+			break;
 		case '?':
 			usage();
 		}
@@ -339,6 +347,9 @@ process_args(int argc, char **argv)
 
 	if ((fs_name = argv[optind]) == NULL)
 		usage();
+
+	if (report_corrected && no_modify)
+		usage();
 }
 
 void __attribute__((noreturn))
@@ -1096,5 +1107,7 @@ _("Repair of readonly mount complete.  Immediate reboot encouraged.\n"));
 
 	free(msgbuf);
 
+	if (fs_is_dirty && report_corrected)
+		return (4);
 	return (0);
 }
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 2/2 v6] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-16 17:07           ` [PATCH 2/2 v5] " Jan Tulak
  2018-03-23  2:37             ` Eric Sandeen
@ 2018-03-23 14:33             ` Jan Tulak
  2022-09-28  5:28               ` Darrick J. Wong
  1 sibling, 1 reply; 46+ messages in thread
From: Jan Tulak @ 2018-03-23 14:33 UTC (permalink / raw)
  To: linux-xfs; +Cc: Jan Tulak

The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
run on every unclean shutdown. However, sometimes it may happen that the
root filesystem really requires the usage of xfs_repair and then it is a
hassle. This patch makes the situation a bit easier by detecting forced
checks (/forcefsck or fsck.mode=force) and invoking xfs_repair.

Signed-off-by: Jan Tulak <jtulak@redhat.com>

---
Changelog:
v6:
- update exit code 3->4
- avoid hardcoding xfs_repair path
- rename $BIN->$xfs_repair
- remove mounted check - xfs_repair does it on its own
- small manpage edit
- better explanation in the comment
v5:
- Change the message for xfs_repair code 2
v4:
- man page changes
v3:
- too quick with fixing in v2... add line at the end of the file
v2:
- return the "exit 0" at the end

v1:
- test for xfs_repair binary
- run only in non-interactive session
- translate xfs_repair return codes to fsck ones
- run only if the filesystem is not mounted
- add manpage update
---
 fsck/xfs_fsck.sh    | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 man/man8/fsck.xfs.8 |  7 +++++++
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
index e52969e4..c9fc3eb3 100755
--- a/fsck/xfs_fsck.sh
+++ b/fsck/xfs_fsck.sh
@@ -3,11 +3,35 @@
 # Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
 #
 
+NAME=$0
+
+# get the right return code for fsck
+function repair2fsck_code() {
+	case $1 in
+	0)  return 0 # everything is ok
+		;;
+	1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
+		return 4 # errors left uncorrected
+		;;
+	2)  echo "$NAME error: The filesystem log is dirty, mount it to recover" \
+		     "the log. If that fails, refer to the section DIRTY LOGS in the" \
+		     "xfs_repair manual page." 1>&2
+		return 4 # dirty log, don't do anything and let the user solve it
+		;;
+	4)  return 1 # The fs has been fixed
+		;;
+	*)  echo "$NAME error: An unknown return code from xfs_repair '$1'" 1>&2
+		return 4 # something went wrong with xfs_repair
+	esac
+}
+
 AUTO=false
-while getopts ":aApy" c
+FORCE=false
+while getopts ":aApyf" c
 do
 	case $c in
 	a|A|p|y)	AUTO=true;;
+	f)      	FORCE=true;;
 	esac
 done
 eval DEV=\${$#}
@@ -15,6 +39,37 @@ if [ ! -e $DEV ]; then
 	echo "$0: $DEV does not exist"
 	exit 8
 fi
+
+# The flag -f is added by systemd/init scripts when /forcefsck file is present
+# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
+# this check, user has to explicitly require a forced fsck.
+# But first of all, test if it is a non-interactive session.
+# Invoking xfs_repair via fsck.xfs is only intended to happen via initscripts.
+# Normal administrative filesystem repairs should always invoke xfs_repair
+# directly.
+#
+# Use multiple methods to capture most of the cases:
+# The case for *i* and -n "$PS1" are commonly suggested in bash manual
+# and the -t 0 test checks stdin
+case $- in
+	*i*) FORCE=false ;;
+esac
+if [ -n "$PS1" -o -t 0 ]; then
+	FORCE=false
+fi
+
+if $FORCE; then
+	XFS_REPAIR=`command -v xfs_repair`
+	if [ ! -x "$XFS_REPAIR" ] ; then
+		echo "$NAME error: xfs_repair was not found!" 1>&2
+		exit 4
+	fi
+
+	$XFS_REPAIR -e $DEV
+	repair2fsck_code $?
+	exit $?
+fi
+
 if $AUTO; then
 	echo "$0: XFS file system."
 else
diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
index ace7252d..a51baf7c 100644
--- a/man/man8/fsck.xfs.8
+++ b/man/man8/fsck.xfs.8
@@ -21,6 +21,13 @@ If you wish to check the consistency of an XFS filesystem,
 or repair a damaged or corrupt XFS filesystem,
 see
 .BR xfs_repair (8).
+.PP
+However, the system administrator can force
+.B fsck.xfs
+to run
+.BR xfs_repair (8)
+at boot time by creating a /forcefsck file or booting the system with
+"fsck.mode=force" on the kernel command line.
 .
 .SH FILES
 .IR /etc/fstab .
-- 
2.16.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2 v6] fsck.xfs: allow forced repairs using xfs_repair
  2018-03-23 14:33             ` [PATCH 2/2 v6] " Jan Tulak
@ 2022-09-28  5:28               ` Darrick J. Wong
  2022-09-29  8:31                 ` Carlos Maiolino
  0 siblings, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2022-09-28  5:28 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-xfs

On Fri, Mar 23, 2018 at 03:33:13PM +0100, Jan Tulak wrote:
> The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
> run on every unclean shutdown. However, sometimes it may happen that the
> root filesystem really requires the usage of xfs_repair and then it is a
> hassle. This patch makes the situation a bit easier by detecting forced
> checks (/forcefsck or fsck.mode=force) and invoking xfs_repair.
> 
> Signed-off-by: Jan Tulak <jtulak@redhat.com>
> 
> ---
> Changelog:
> v6:
> - update exit code 3->4
> - avoid hardcoding xfs_repair path
> - rename $BIN->$xfs_repair
> - remove mounted check - xfs_repair does it on its own
> - small manpage edit
> - better explanation in the comment
> v5:
> - Change the message for xfs_repair code 2
> v4:
> - man page changes
> v3:
> - too quick with fixing in v2... add line at the end of the file
> v2:
> - return the "exit 0" at the end
> 
> v1:
> - test for xfs_repair binary
> - run only in non-interactive session
> - translate xfs_repair return codes to fsck ones
> - run only if the filesystem is not mounted
> - add manpage update
> ---
>  fsck/xfs_fsck.sh    | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  man/man8/fsck.xfs.8 |  7 +++++++
>  2 files changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index e52969e4..c9fc3eb3 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -3,11 +3,35 @@
>  # Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
>  #
>  
> +NAME=$0
> +
> +# get the right return code for fsck
> +function repair2fsck_code() {
> +	case $1 in
> +	0)  return 0 # everything is ok
> +		;;
> +	1)  echo "$NAME error: xfs_repair could not fix the filesystem." 1>&2
> +		return 4 # errors left uncorrected
> +		;;
> +	2)  echo "$NAME error: The filesystem log is dirty, mount it to recover" \
> +		     "the log. If that fails, refer to the section DIRTY LOGS in the" \
> +		     "xfs_repair manual page." 1>&2
> +		return 4 # dirty log, don't do anything and let the user solve it
> +		;;
> +	4)  return 1 # The fs has been fixed
> +		;;
> +	*)  echo "$NAME error: An unknown return code from xfs_repair '$1'" 1>&2
> +		return 4 # something went wrong with xfs_repair
> +	esac
> +}
> +
>  AUTO=false
> -while getopts ":aApy" c
> +FORCE=false
> +while getopts ":aApyf" c
>  do
>  	case $c in
>  	a|A|p|y)	AUTO=true;;
> +	f)      	FORCE=true;;
>  	esac
>  done
>  eval DEV=\${$#}
> @@ -15,6 +39,37 @@ if [ ! -e $DEV ]; then
>  	echo "$0: $DEV does not exist"
>  	exit 8
>  fi
> +
> +# The flag -f is added by systemd/init scripts when /forcefsck file is present
> +# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
> +# this check, user has to explicitly require a forced fsck.
> +# But first of all, test if it is a non-interactive session.
> +# Invoking xfs_repair via fsck.xfs is only intended to happen via initscripts.
> +# Normal administrative filesystem repairs should always invoke xfs_repair
> +# directly.
> +#
> +# Use multiple methods to capture most of the cases:
> +# The case for *i* and -n "$PS1" are commonly suggested in bash manual
> +# and the -t 0 test checks stdin
> +case $- in
> +	*i*) FORCE=false ;;
> +esac
> +if [ -n "$PS1" -o -t 0 ]; then
> +	FORCE=false
> +fi
> +
> +if $FORCE; then
> +	XFS_REPAIR=`command -v xfs_repair`
> +	if [ ! -x "$XFS_REPAIR" ] ; then
> +		echo "$NAME error: xfs_repair was not found!" 1>&2
> +		exit 4
> +	fi
> +
> +	$XFS_REPAIR -e $DEV
> +	repair2fsck_code $?

Just to reopen years-old discussions --

Recently, a customer decided to add "fsck.mode=force" to the kernel
command line to force systemd to fsck the rootfs on boot.  They
performed a powerfail simulation, and on next boot they were dropped to
an emergency shell because the log was dirty and xfs_repair returned a
nonzero error code.  If the system was rebooted cleanly then xfs_repair
rebuilds the space metadata and exits quietly.

Earlier in this thread we decided not to do a mount/umount cycle to
clear a dirty log for fear that the mount could crash the kernel.  Would
anyone like to entertain the idea of adding that cycle to fsck.xfs if
the program argv includes '-y' and xfs_repair returns 2?  That would
only happen if the sysadmin *also* adds "fsck.repair=yes" to the kernel
command line.

Omitting a fsck.repair= setting means systemd passes -a to fsck instead
of -y.

--D

> +	exit $?
> +fi
> +
>  if $AUTO; then
>  	echo "$0: XFS file system."
>  else
> diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
> index ace7252d..a51baf7c 100644
> --- a/man/man8/fsck.xfs.8
> +++ b/man/man8/fsck.xfs.8
> @@ -21,6 +21,13 @@ If you wish to check the consistency of an XFS filesystem,
>  or repair a damaged or corrupt XFS filesystem,
>  see
>  .BR xfs_repair (8).
> +.PP
> +However, the system administrator can force
> +.B fsck.xfs
> +to run
> +.BR xfs_repair (8)
> +at boot time by creating a /forcefsck file or booting the system with
> +"fsck.mode=force" on the kernel command line.
>  .
>  .SH FILES
>  .IR /etc/fstab .
> -- 
> 2.16.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/2 v6] fsck.xfs: allow forced repairs using xfs_repair
  2022-09-28  5:28               ` Darrick J. Wong
@ 2022-09-29  8:31                 ` Carlos Maiolino
  0 siblings, 0 replies; 46+ messages in thread
From: Carlos Maiolino @ 2022-09-29  8:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

> > +
> > +# The flag -f is added by systemd/init scripts when /forcefsck file is present
> > +# or fsck.mode=force is used during boot; an unclean shutdown won't trigger
> > +# this check, user has to explicitly require a forced fsck.
> > +# But first of all, test if it is a non-interactive session.
> > +# Invoking xfs_repair via fsck.xfs is only intended to happen via initscripts.
> > +# Normal administrative filesystem repairs should always invoke xfs_repair
> > +# directly.
> > +#
> > +# Use multiple methods to capture most of the cases:
> > +# The case for *i* and -n "$PS1" are commonly suggested in bash manual
> > +# and the -t 0 test checks stdin
> > +case $- in
> > +	*i*) FORCE=false ;;
> > +esac
> > +if [ -n "$PS1" -o -t 0 ]; then
> > +	FORCE=false
> > +fi
> > +
> > +if $FORCE; then
> > +	XFS_REPAIR=`command -v xfs_repair`
> > +	if [ ! -x "$XFS_REPAIR" ] ; then
> > +		echo "$NAME error: xfs_repair was not found!" 1>&2
> > +		exit 4
> > +	fi
> > +
> > +	$XFS_REPAIR -e $DEV
> > +	repair2fsck_code $?
> 
> Just to reopen years-old discussions --
> 
> Recently, a customer decided to add "fsck.mode=force" to the kernel
> command line to force systemd to fsck the rootfs on boot.  They
> performed a powerfail simulation, and on next boot they were dropped to
> an emergency shell because the log was dirty and xfs_repair returned a
> nonzero error code.  If the system was rebooted cleanly then xfs_repair
> rebuilds the space metadata and exits quietly.
> 
> Earlier in this thread we decided not to do a mount/umount cycle to
> clear a dirty log for fear that the mount could crash the kernel.  Would
> anyone like to entertain the idea of adding that cycle to fsck.xfs if
> the program argv includes '-y' and xfs_repair returns 2?  That would
> only happen if the sysadmin *also* adds "fsck.repair=yes" to the kernel
> command line.
> 

I am not really opposed at it.

Particularly, I don't like the idea of the chance of unnoticed corruptions. And
I'm afraid this will just encourage some users to set fsck.mode=force
'by default', which I don't think is ideal. Not to mention, one of the advantages
of journaling FS'es is exactly avoid forcing a fsck at mount time :)

Anyway, this is just my $0.02, I'm not really opposed to this if people find
this useful somehow to avoid human interaction in case of a corrupted rootfs.

> Omitting a fsck.repair= setting means systemd passes -a to fsck instead
> of -y.
> 
> --D
> 
> > +	exit $?
> > +fi
> > +
> >  if $AUTO; then
> >  	echo "$0: XFS file system."
> >  else
> > diff --git a/man/man8/fsck.xfs.8 b/man/man8/fsck.xfs.8
> > index ace7252d..a51baf7c 100644
> > --- a/man/man8/fsck.xfs.8
> > +++ b/man/man8/fsck.xfs.8
> > @@ -21,6 +21,13 @@ If you wish to check the consistency of an XFS filesystem,
> >  or repair a damaged or corrupt XFS filesystem,
> >  see
> >  .BR xfs_repair (8).
> > +.PP
> > +However, the system administrator can force
> > +.B fsck.xfs
> > +to run
> > +.BR xfs_repair (8)
> > +at boot time by creating a /forcefsck file or booting the system with
> > +"fsck.mode=force" on the kernel command line.
> >  .
> >  .SH FILES
> >  .IR /etc/fstab .
> > --
> > 2.16.2
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos Maiolino

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2022-09-29  8:31 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-05 15:05 [PATCH] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
2018-03-05 21:56 ` Dave Chinner
2018-03-05 22:06   ` Eric Sandeen
2018-03-05 22:20     ` Darrick J. Wong
2018-03-05 22:31     ` Dave Chinner
2018-03-05 23:33       ` Eric Sandeen
2018-03-06 11:51         ` Jan Tulak
2018-03-06 21:39           ` Dave Chinner
2018-03-08 10:57             ` Jan Tulak
2018-03-08 16:28               ` Darrick J. Wong
2018-03-08 22:36                 ` Dave Chinner
2018-03-14 13:51                   ` Jan Tulak
2018-03-14 15:25                     ` Darrick J. Wong
2018-03-14 21:10                       ` Dave Chinner
2018-03-15 17:01                       ` Jan Tulak
2018-03-08 23:28               ` Eric Sandeen
2018-03-14 13:30                 ` Jan Tulak
2018-03-14 15:19                   ` Eric Sandeen
2018-03-15 11:16                     ` Jan Tulak
2018-03-15 22:19                       ` Dave Chinner
2018-03-15 17:45 ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Jan Tulak
2018-03-15 17:45   ` [PATCH 2/2 v1] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
2018-03-15 17:47     ` [PATCH 2/2 v2] " Jan Tulak
2018-03-15 17:50       ` [PATCH 2/2] " Jan Tulak
2018-03-15 18:11         ` Darrick J. Wong
2018-03-15 18:22           ` Jan Tulak
2018-03-15 18:28         ` [PATCH 2/2 v4] " Jan Tulak
2018-03-15 18:49           ` Darrick J. Wong
2018-03-16 10:19             ` Jan Tulak
2018-03-16 15:39               ` Darrick J. Wong
2018-03-16 17:07           ` [PATCH 2/2 v5] " Jan Tulak
2018-03-23  2:37             ` Eric Sandeen
2018-03-23  3:25               ` Darrick J. Wong
2018-03-23  3:29                 ` Eric Sandeen
2018-03-23  3:42                   ` Darrick J. Wong
2018-03-23 14:00                     ` Jan Tulak
2018-03-23 14:14               ` Jan Tulak
2018-03-23 14:33             ` [PATCH 2/2 v6] " Jan Tulak
2022-09-28  5:28               ` Darrick J. Wong
2022-09-29  8:31                 ` Carlos Maiolino
2018-03-15 18:03   ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Darrick J. Wong
2018-03-15 18:23   ` [PATCH 1/2 v2] " Jan Tulak
2018-03-15 18:44     ` Darrick J. Wong
2018-03-23  1:57     ` Eric Sandeen
2018-03-23  9:24       ` Jan Tulak
2018-03-23 14:32     ` [PATCH 1/2 v3] xfs_repair: add flag -e to modify exit code for " Jan Tulak

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.