util-linux.vger.kernel.org archive mirror
* fsck command line API
From: Pali Rohár @ 2017-12-27 10:14 UTC
  To: Karel Zak; +Cc: util-linux, Vojtěch Vladyka

Hello!

Vojtěch Vladyka is working on a new fsck tool, udffsck, for the UDF filesystem
[1] and I would like to know if there is some standardized command line
API for fsck tools, so that the new udffsck would be compatible.

In the util-linux repository there is a generic fsck wrapper which starts
the correct filesystem-specific fsck tool... And systemd probably has its
own wrapper which does a similar thing.

Is there already a defined set of command line arguments which fsck tools
are expected to accept? Or exit status values for those tools?

Also, a lot of filesystems record whether the last usage/mount was
correctly synchronized and unmounted. E.g. FAT has a dirty bit, ext4 has
a journal clean state, UDF has an integrity field... and in most cases when
fsck is started at boot time it makes sense to skip the fsck data check
routine if the filesystem state is clean (properly unmounted last time). Is
there some command line API to tell an fsck tool whether it should do a full
disk check (including data) or do it only if the filesystem is in a dirty
state?

[1] - https://github.com/pali/udftools/pull/7

-- 
Pali Rohár
pali.rohar@gmail.com


* Re: fsck command line API
From: Karel Zak @ 2017-12-29 12:02 UTC
  To: Pali Rohár; +Cc: util-linux, Vojtěch Vladyka

On Wed, Dec 27, 2017 at 11:14:29AM +0100, Pali Rohár wrote:
> Vojtěch Vladyka is working on a new fsck tool, udffsck, for the UDF filesystem
> [1] and I would like to know if there is some standardized command line
> API for fsck tools, so that the new udffsck would be compatible.
> 
> In the util-linux repository there is a generic fsck wrapper which starts
> the correct filesystem-specific fsck tool... And systemd probably has its
> own wrapper which does a similar thing.

systemd calls the fsck wrapper from util-linux:
https://github.com/systemd/systemd/blob/master/src/fsck/fsck.c#L446

> Is there already a defined set of command line arguments which fsck tools
> are expected to accept? Or exit status values for those tools?

We have no API. The solution is to call the fsck wrapper with a proper
command line.

    fsck [options] [--] [fs-specific-options]

The exit code returned when multiple filesystems are checked is the
bit-wise OR of the exit codes for each filesystem that is checked. The
man page lists the return codes. It's probably a good idea to use
the same codes in the fs-specific fsck tools.

See man fsck:

    FILESYSTEM SPECIFIC OPTIONS

       Options which are not understood by fsck are passed to the
       filesystem-specific checker!

       These options must not take arguments, as there is no way for
       fsck to be able to properly guess which options take arguments
       and which don't.

       Options and arguments which follow the -- are treated as
       filesystem-specific options to be passed to the
       filesystem-specific checker.

       Please note that fsck is not designed to pass arbitrarily
       complicated options to filesystem-specific checkers.  If you're
       doing something complicated, please just execute the
       filesystem-specific checker directly.  If you pass fsck some
       horribly complicated options and arguments, and it doesn't do
       what you expect, don't bother reporting it as a bug.  You're
       almost certainly doing something that you shouldn't be doing
       with fsck.  Options to different filesystem-specific fsck's are
       not standardized.
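
The `fsck [options] [--] [fs-specific-options]` form described above can be
sketched as a small helper that assembles the argument vector. This is only
an illustration: the function name, the device path and the example options
are assumptions, and nothing here is taken from the util-linux sources.

```python
def build_fsck_argv(device, wrapper_opts=(), fs_opts=()):
    """Assemble an argv for the generic fsck wrapper (hypothetical helper).

    wrapper_opts are options meant for the wrapper itself; fs_opts are
    placed after "--" so that the wrapper hands them to the
    filesystem-specific checker without having to guess what they mean.
    """
    argv = ["fsck", *wrapper_opts, device]
    if fs_opts:
        argv += ["--", *fs_opts]
    return argv


if __name__ == "__main__":
    # "-t udf" selects the filesystem type; "-f" is left to the checker.
    print(build_fsck_argv("/dev/sdb1", ["-t", "udf"], ["-f"]))
```

Putting checker options after `--` avoids relying on the wrapper passing
through options it does not understand.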



The fsck wrapper was originally designed by Ted, as were the
fsck.extN tools. So, my suggestion is to follow fsck.extN to keep
things as compatible as possible :-)

> Also, a lot of filesystems record whether the last usage/mount was
> correctly synchronized and unmounted. E.g. FAT has a dirty bit, ext4 has
> a journal clean state, UDF has an integrity field... and in most cases when
> fsck is started at boot time it makes sense to skip the fsck data check
> routine if the filesystem state is clean (properly unmounted last time). Is
> there some command line API to tell an fsck tool whether it should do a full
> disk check (including data) or do it only if the filesystem is in a dirty
> state?

The fsck wrapper does not care. It's the responsibility of fsck.<type> to do
the right thing by default, and if the default behavior is not possible
(not safe, etc.) then to ask for manual execution without the wrapper:

    fsck.foo: failed to fix foo; use "fsck.foo --something"

I think many fsck.<type> tools use options like -a (or -p), -y, -f and
-C. The -f (force) and -C (progress bar) options are supported by systemd too.

    Karel


-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: fsck command line API
From: Theodore Ts'o @ 2017-12-29 18:37 UTC
  To: Karel Zak; +Cc: Pali Rohár, util-linux, Vojtěch Vladyka

On Fri, Dec 29, 2017 at 01:02:42PM +0100, Karel Zak wrote:
> > Is there already a defined set of command line arguments which fsck tools
> > are expected to accept? Or exit status values for those tools?
> 
> We have no API. The solution is to call the fsck wrapper with a proper
> command line.
> 
>     fsck [options] [--] [fs-specific-options]
> 
> The exit code returned when multiple filesystems are checked is the
> bit-wise OR of the exit codes for each filesystem that is checked. The
> man page lists the return codes. It's probably a good idea to use
> the same codes in the fs-specific fsck tools.

Well, *fsck* may not have an API.  But the init scripts or systemd unit
files do have some common expectations.  In general fsck -a or -p will
be used by the boot scripts, and that means "automatic" or "preen".  The
-a and -p options should do the same thing: the sort of automatic,
"safe" fixups that can be done in an unattended setup, such as during
boot.

If there is a file system inconsistency that cannot be fixed, then
fsck should exit with an exit status of 4.  From the fsck man page:

       The exit code returned by fsck is the sum of the following
       conditions:

              0      No errors
              1      Filesystem errors corrected
              2      System should be rebooted
              4      Filesystem errors left uncorrected
              8      Operational error
              16     Usage or syntax error
              32     Checking canceled by user request
              128    Shared-library error

       The exit code returned when multiple filesystems are checked is
       the bit-wise OR of the exit codes for each filesystem that is
       checked.
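
Because each condition in the table is a separate bit, a script can decode an
exit status by testing the bits individually. A minimal Python sketch; the
enum and function names are made up for illustration, only the numeric values
come from the man page excerpt above:

```python
from enum import IntFlag


class FsckExit(IntFlag):
    """Exit status bits as listed in the fsck man page excerpt."""
    ERRORS_CORRECTED = 1
    REBOOT_NEEDED = 2
    ERRORS_UNCORRECTED = 4
    OPERATIONAL_ERROR = 8
    USAGE_ERROR = 16
    CANCELED = 32
    SHARED_LIBRARY_ERROR = 128


def describe(status):
    """Return the names of all conditions encoded in an exit status."""
    return [flag.name for flag in FsckExit if status & flag]


def combine(statuses):
    """Bit-wise OR of per-filesystem statuses, as the wrapper reports them."""
    combined = 0
    for status in statuses:
        combined |= status
    return combined


if __name__ == "__main__":
    print(describe(3))
    print(combine([0, 1, 4]))
```

An empty list from `describe` corresponds to exit status 0, "No errors".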

An exit status of 2 means that the system should be rebooted.
This is typically combined with 1 because this tends to happen when
the root file system has been modified, and so the kernel may have
incorrect file system state cached from using the root file system to
run fsck.  So fsck.extN will check to see if it is the root file
system being checked, and if it is, and changes were made to the file
system, then it will exit with a status code of 3.  If changes were
made to the file system and it is not the root file system, then it
will return with an exit code of 1.  This is not an error; just an
informative signal that changes were made by fsck.extN.  If no changes
were necessary, fsck.extN will return with an exit code of 0.

The other exit codes can be used if they are applicable; if the user
aborts an fsck run with ^C, then fsck.extN will return an exit status
code of 32.  If the command line has options which are not recognized,
fsck.extN will return 16.  And so on.
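
For a new checker, the convention described above for fsck.extN could be
followed along these lines. This is a sketch of the rule as stated in this
mail, not code from e2fsprogs; the function and parameter names are
hypothetical.

```python
def exit_status(changed, is_root, uncorrected=False):
    """Pick an exit status following the convention described above.

    changed:     the checker modified the file system
    is_root:     the file system being checked is the root file system
    uncorrected: inconsistencies remain that could not be fixed
    """
    status = 0
    if changed:
        status |= 1          # file system errors corrected
        if is_root:
            status |= 2      # kernel may hold stale state: ask for a reboot
    if uncorrected:
        status |= 4          # file system errors left uncorrected
    return status


if __name__ == "__main__":
    print(exit_status(changed=True, is_root=True))
    print(exit_status(changed=True, is_root=False))
    print(exit_status(changed=False, is_root=True))
```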


Another commonly used convention is that the -y option means to
just blindly fix all errors, regardless of whether that is safe and
regardless of whether a skilled sysadmin might avoid some data loss
by running the file system specific fsck manually, perhaps in
conjunction with low-level debugging tools (e.g., debugfs for
ext2/3/4 or xfs_db for xfs).

Some init scripts will use -y because they know that there is no
skilled operator around.  For example, in a mobile handset or an IoT
device, maybe fsck -y will be used because if the file system is
corrupted because of a power drop interacting with crappy (sorry,
"cost optimized") flash, all you can do is use fsck -y and pray.

Also -n is used to not fix any errors at all (-y means "all yes", -n
means "all no").  This is used with the exit status codes for scripts
that are trying to check the consistency of a file system without
actually doing anything.
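
A script doing such a read-only check might look roughly like the sketch
below. The function name and the three result strings are assumptions, and
the `run` parameter exists only so the logic can be exercised without a real
device. Whether a given checker accepts -n has to be verified for each
filesystem type.

```python
import subprocess


def check_read_only(device, run=subprocess.run):
    """Run "fsck -n" on a device and classify the exit status.

    Returns "clean" for status 0, "errors" if bit 4 (errors left
    uncorrected) is set, and "failed" for anything else, such as an
    operational or usage error.
    """
    status = run(["fsck", "-n", device]).returncode
    if status == 0:
        return "clean"
    if status & 4:
        return "errors"
    return "failed"
```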

Typically, if no options are given, fsck will operate in
interactive mode, where it will report each inconsistency, and then
ask the user if she would like to fix the inconsistency or not.
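
One way a new checker could accept these conventional options is sketched
below with Python's argparse. The program name fsck.foo, the mode strings
and the parser subclass are illustrative assumptions. Usage errors are
mapped to exit status 16 to match the table above.

```python
import argparse
import sys


class FsckArgumentParser(argparse.ArgumentParser):
    """ArgumentParser that exits with 16 (usage or syntax error)."""

    def error(self, message):
        self.print_usage(sys.stderr)
        self.exit(16, f"{self.prog}: error: {message}\n")


def parse_mode(argv):
    """Return (mode, force) for a hypothetical fsck.foo command line."""
    parser = FsckArgumentParser(prog="fsck.foo")
    group = parser.add_mutually_exclusive_group()
    # -a and -p are synonyms: automatic, "safe" repairs only.
    group.add_argument("-a", "-p", dest="mode", action="store_const",
                       const="preen")
    # -y answers "yes" to every question, -n answers "no".
    group.add_argument("-y", dest="mode", action="store_const", const="yes")
    group.add_argument("-n", dest="mode", action="store_const", const="no")
    parser.add_argument("-f", dest="force", action="store_true",
                        help="force a check even if marked clean")
    parser.add_argument("device")
    # With none of -a/-p/-y/-n given, fall back to asking the user.
    parser.set_defaults(mode="interactive")
    args = parser.parse_args(argv)
    return args.mode, args.force
```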

Regards,

						- Ted


* Re: fsck command line API
From: Pali Rohár @ 2017-12-30 12:03 UTC
  To: Theodore Ts'o; +Cc: Karel Zak, util-linux, Vojtěch Vladyka

Hi! Thank you for the detailed information. I have just one more question:
what should a filesystem-specific fsck do if it is started with -a or -p (or
with -y) during boot and the filesystem structures indicate that it was
cleanly unmounted last time? Should it scan the whole disk and check all
data (files/directories/structures) for consistency? Or should it trust
the "clean" state and stop? And if it stops, how do I tell the
filesystem-specific fsck to really scan the whole disk? For example, scanning
a 2TB disk is really time consuming, especially at boot time.

-- 
Pali Rohár
pali.rohar@gmail.com


* Re: fsck command line API
From: Theodore Ts'o @ 2017-12-30 13:20 UTC
  To: Pali Rohár; +Cc: Karel Zak, util-linux, Vojtěch Vladyka

On Sat, Dec 30, 2017 at 01:03:52PM +0100, Pali Rohár wrote:
> 
> Hi! Thank you for the detailed information. I have just one more question:
> what should a filesystem-specific fsck do if it is started with -a or -p (or
> with -y) during boot and the filesystem structures indicate that it was
> cleanly unmounted last time? Should it scan the whole disk and check all
> data (files/directories/structures) for consistency? Or should it trust
> the "clean" state and stop? And if it stops, how do I tell the
> filesystem-specific fsck to really scan the whole disk? For example, scanning
> a 2TB disk is really time consuming, especially at boot time.

What e2fsck (fsck.extN) does is check to see if the file system has
the "errors/corruptions were detected by the kernel" bit set.  If so,
it will do a full check.  Otherwise, if time-based or mount-based
criteria are enabled and have been exceeded, then e2fsck will do a full
check.  Otherwise, it will stop.
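
The decision described above can be written down as a short sketch. The
field names are invented for illustration and do not correspond to the ext4
superblock layout or to e2fsck internals. The `force` parameter stands in
for the -f option mentioned earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class FsState:
    """Hypothetical summary of what a checker reads from the superblock."""
    error_flag: bool          # kernel recorded errors/corruption
    mount_count: int          # mounts since the last full check
    max_mount_count: int      # <= 0 means mount-based checks are disabled
    seconds_since_check: int  # time since the last full check
    check_interval: int       # 0 means time-based checks are disabled


def needs_full_check(state, force=False):
    """Decide whether to run a full check or stop early."""
    if force or state.error_flag:
        return True
    if 0 < state.max_mount_count <= state.mount_count:
        return True
    if 0 < state.check_interval <= state.seconds_since_check:
        return True
    return False


if __name__ == "__main__":
    clean = FsState(False, 3, -1, 86400, 0)
    print(needs_full_check(clean))              # both criteria disabled
    print(needs_full_check(clean, force=True))  # explicit request wins
```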

The other thing e2fsck will do for ext4 file systems is replay the
journal.  This is useful because the fsck wrapper runs the file system
checkers in parallel, while mount -a mounts file systems serially.  So
replaying the journals in parallel when you have multiple disk spindles
can be a big win.  This may be less of a big deal these days since
systemd will run mounts in parallel.

As for time-based or mount-based full checks (see tune2fs for
discussion of this topic), we don't enable them by default any more in
e2fsprogs.  That's precisely because doing a full check of 10 TB disks
and 60 TB RAID arrays takes a long time.  The idea of doing mount-based
checks goes back to the BSD days, when disks and memory were
really crappy, and so checking to find problems before they became
catastrophic data loss events made sense.  These days for big disks,
the cost/benefit ratio doesn't work out as well.

Also, if you have snapshot support in your file system, you can simply
create a snapshot, and run the fsck on the snapshot.  An example of
how to do this can be found here:

https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/contrib/e2croncheck

Cheers,

					- Ted


* Re: fsck command line API
From: Pali Rohár @ 2017-12-30 19:48 UTC
  To: Theodore Ts'o; +Cc: Karel Zak, util-linux, Vojtěch Vladyka

OK, thank you for the explanation. For the new UDF fsck tool it would
then also make sense not to do a full check at boot time if the
filesystem is marked as clean.

-- 
Pali Rohár
pali.rohar@gmail.com
