* fsck command line API
From: Pali Rohár @ 2017-12-27 10:14 UTC
To: Karel Zak; +Cc: util-linux, Vojtěch Vladyka

Hello!

Vojtěch Vladyka is working on a new fsck tool, udffsck, for the UDF
filesystem [1], and I would like to know whether there is a standardized
command line API for fsck tools, so that the new udffsck would be
compatible.

The util-linux repository has a generic fsck wrapper which starts the
correct filesystem-specific fsck tool... And systemd probably has its own
wrapper which does a similar thing.

Is there already a defined set of command line arguments which fsck
tools are expected to accept? Or exit status values for those tools?

Also, a lot of filesystems store information about whether the last
usage/mount was correctly synchronized and unmounted. E.g. FAT has a dirty
bit, ext4 has a journal clean state, UDF has an integrity field... and in
most cases when fsck is started at boot time it makes sense to skip the
fsck data check routine if the filesystem state is clean (properly
unmounted last time). Is there some command line API to tell an fsck tool
whether it should do a full disk check (including data) or do it
conditionally, only if the filesystem is in a dirty state?

[1] - https://github.com/pali/udftools/pull/7

-- 
Pali Rohár
pali.rohar@gmail.com
* Re: fsck command line API
From: Karel Zak @ 2017-12-29 12:02 UTC
To: Pali Rohár; +Cc: util-linux, Vojtěch Vladyka

On Wed, Dec 27, 2017 at 11:14:29AM +0100, Pali Rohár wrote:
> Vojtěch Vladyka is working on a new fsck tool, udffsck, for the UDF
> filesystem [1], and I would like to know whether there is a standardized
> command line API for fsck tools, so that the new udffsck would be
> compatible.
>
> The util-linux repository has a generic fsck wrapper which starts the
> correct filesystem-specific fsck tool... And systemd probably has its own
> wrapper which does a similar thing.

systemd calls the fsck wrapper from util-linux:

  https://github.com/systemd/systemd/blob/master/src/fsck/fsck.c#L446

> Is there already a defined set of command line arguments which fsck
> tools are expected to accept? Or exit status values for those tools?

We have no API. The solution is to call the fsck wrapper with a proper
command line.

    fsck [options] [--] [fs-specific-options]

The exit code returned when multiple filesystems are checked is the
bit-wise OR of the exit codes for each filesystem that is checked. The
man page contains some return codes. It's probably a good idea to use
the same codes in the fs-specific fsck tools.

See man fsck:

  FILESYSTEM SPECIFIC OPTIONS

    Options which are not understood by fsck are passed to the
    filesystem-specific checker! These options must not take arguments,
    as there is no way for fsck to be able to properly guess which
    options take arguments and which don't.

    Options and arguments which follow the -- are treated as
    filesystem-specific options to be passed to the filesystem-specific
    checker.

    Please note that fsck is not designed to pass arbitrarily
    complicated options to filesystem-specific checkers. If you're
    doing something complicated, please just execute the
    filesystem-specific checker directly. If you pass fsck some
    horribly complicated options and arguments, and it doesn't do what
    you expect, don't bother reporting it as a bug. You're almost
    certainly doing something that you shouldn't be doing with fsck.
    Options to different filesystem-specific fsck's are not
    standardized.

The fsck wrapper was originally designed by Ted, as were the fsck.extN
tools. So, my suggestion is to follow fsck.extN to keep things as
compatible as possible :-)

> Also, a lot of filesystems store information about whether the last
> usage/mount was correctly synchronized and unmounted. E.g. FAT has a dirty
> bit, ext4 has a journal clean state, UDF has an integrity field... and in
> most cases when fsck is started at boot time it makes sense to skip the
> fsck data check routine if the filesystem state is clean (properly
> unmounted last time). Is there some command line API to tell an fsck tool
> whether it should do a full disk check (including data) or do it
> conditionally, only if the filesystem is in a dirty state?

The fsck wrapper does not care. It's the responsibility of fsck.<type>
to do good things by default, and if the default setting is not possible
(safe, etc.), to ask for manual execution without the wrapper:

    fsck.foo: failed to fix foo; use "fsck.foo --something"

I think many fsck.<type> tools use options like -a (or -p), -y, -f and -C.
The -f (force) and -C (progress bar) options are supported by systemd too.

    Karel

-- 
 Karel Zak <kzak@redhat.com>
 http://karelzak.blogspot.com
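The bit-wise OR rule quoted above from the man page can be illustrated with a minimal sketch. The function name combine_exit_codes is hypothetical and is not part of util-linux; the sketch only shows how per-filesystem exit codes would combine into the single status that a caller of the wrapper sees.

```python
from functools import reduce
from operator import or_


def combine_exit_codes(codes):
    """Bit-wise OR the per-filesystem exit codes into one status.

    Hypothetical helper for illustration only; it mirrors the rule
    described in the fsck man page, not the util-linux source.
    """
    return reduce(or_, codes, 0)


# Three filesystems: one clean (0), one with errors corrected (1),
# one with errors left uncorrected (4).
print(combine_exit_codes([0, 1, 4]))  # 5
```

Because the values are combined with OR rather than added, two filesystems that both report 1 still yield 1, so each bit keeps its meaning.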
* Re: fsck command line API
From: Theodore Ts'o @ 2017-12-29 18:37 UTC
To: Karel Zak; +Cc: Pali Rohár, util-linux, Vojtěch Vladyka

On Fri, Dec 29, 2017 at 01:02:42PM +0100, Karel Zak wrote:
> > Is there already a defined set of command line arguments which fsck
> > tools are expected to accept? Or exit status values for those tools?
>
> We have no API. The solution is to call the fsck wrapper with a proper
> command line.
>
>     fsck [options] [--] [fs-specific-options]
>
> The exit code returned when multiple filesystems are checked is the
> bit-wise OR of the exit codes for each filesystem that is checked. The
> man page contains some return codes. It's probably a good idea to use
> the same codes in the fs-specific fsck tools.

Well *fsck* may not have an API. But the init scripts or systemd unit
files do have some common expectations. In general fsck -a or -p will
be used by the boot scripts, and that means "automatic" or "preen". The
-a and -p options should do the same thing, and it's the sort of
automatic, "safe" fixups that can be done in an unattended setup, such
as during boot.

If there is a file system inconsistency that cannot be fixed, then
fsck should exit with the exit status of 4. From the fsck man page:

    The exit code returned by fsck is the sum of the following
    conditions:

          0      No errors
          1      Filesystem errors corrected
          2      System should be rebooted
          4      Filesystem errors left uncorrected
          8      Operational error
          16     Usage or syntax error
          32     Checking canceled by user request
          128    Shared-library error

    The exit code returned when multiple filesystems are checked is
    the bit-wise OR of the exit codes for each filesystem that is
    checked.

An exit status of 2 means that the system should be rebooted.
This is typically combined with 1 because this tends to happen when
the root file system has been modified, and so the kernel may have
incorrect file system state cached from using the root file system to
run fsck. So fsck.extN will check to see if it is the root file
system being checked, and if it is, and changes were made to the file
system, then it will exit with a status code of 3. If changes were
made to the file system and it is not the root file system, then it
will return with an exit code of 1. This is not an error; just an
informative signal that changes were made by fsck.extN. If no changes
were necessary, fsck.extN will return with an exit code of 0.

The other exit codes can be used if they are applicable; if the user
aborts an fsck run with ^C, then fsck.extN will return an exit status
code of 32. If the command line has options which are not recognized,
fsck.extN will return 16. And so on.

The other commonly used convention is that the -y option means to
just blindly fix all errors, regardless of whether it might be safe or
whether in the hands of a skilled sysadmin, some data loss might be
avoidable if the file system specific fsck is run manually, perhaps in
conjunction with low-level debugging tools (e.g., debugfs for
ext2/3/4 or xfs_db for xfs, etc.)

Some init scripts will use -y because they know that there is no
skilled operator around. For example, in a mobile handset or an IoT
device, maybe fsck -y will be used because if the file system is
corrupted because of a power drop interacting with crappy (sorry,
"cost optimized") flash, all you can do is use fsck -y and pray.

Also -n is used to not fix any errors at all (-y means "all yes", -n
means "all no"). This is used with the exit status codes for scripts
that are trying to check the consistency of a file system without
actually doing anything.

Typically, if no arguments are given, then fsck will operate in
interactive mode, where it will report each inconsistency, and then
ask the user if she would like to fix the inconsistency or not.

Regards,

                                        - Ted
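The exit status conventions described in this message can be sketched roughly as follows. This is only an illustration of the rules stated above (0, 1, 3 for a modified root file system, 4 for uncorrected errors), not code from e2fsprogs; the names FSCK_EXIT_BITS, checker_exit_status and describe_exit_status are hypothetical.

```python
# Bit values taken from the fsck man page table quoted above.
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def checker_exit_status(changes_made, is_root_fs, errors_left=False):
    """Exit status a filesystem-specific checker might return.

    Hypothetical sketch of the convention described for fsck.extN:
    1 if the file system was changed, plus 2 if it was the root file
    system, plus 4 if some inconsistency could not be fixed.
    """
    status = 0
    if changes_made:
        status |= 1
        if is_root_fs:
            status |= 2
    if errors_left:
        status |= 4
    return status


def describe_exit_status(status):
    """Decode an fsck exit status into the conditions it encodes."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_EXIT_BITS.items() if status & bit]


print(checker_exit_status(changes_made=True, is_root_fs=True))  # 3
print(describe_exit_status(3))
```

A boot script calling a checker with -a or -p could use a decoder like this to tell "changes were made, reboot" (3) apart from "errors remain, manual intervention needed" (4).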
* Re: fsck command line API
From: Pali Rohár @ 2017-12-30 12:03 UTC
To: Theodore Ts'o; +Cc: Karel Zak, util-linux, Vojtěch Vladyka

On Friday 29 December 2017 13:37:54 Theodore Ts'o wrote:
> On Fri, Dec 29, 2017 at 01:02:42PM +0100, Karel Zak wrote:
> > > Is there already a defined set of command line arguments which fsck
> > > tools are expected to accept? Or exit status values for those tools?
> >
> > We have no API. The solution is to call the fsck wrapper with a proper
> > command line.
> >
> >     fsck [options] [--] [fs-specific-options]
> >
> > The exit code returned when multiple filesystems are checked is the
> > bit-wise OR of the exit codes for each filesystem that is checked. The
> > man page contains some return codes. It's probably a good idea to use
> > the same codes in the fs-specific fsck tools.
>
> Well *fsck* may not have an API. But the init scripts or systemd unit
> files do have some common expectations. In general fsck -a or -p will
> be used by the boot scripts, and that means "automatic" or "preen". The
> -a and -p options should do the same thing, and it's the sort of
> automatic, "safe" fixups that can be done in an unattended setup, such
> as during boot.
>
> If there is a file system inconsistency that cannot be fixed, then
> fsck should exit with the exit status of 4. From the fsck man page:
>
>     The exit code returned by fsck is the sum of the following
>     conditions:
>
>           0      No errors
>           1      Filesystem errors corrected
>           2      System should be rebooted
>           4      Filesystem errors left uncorrected
>           8      Operational error
>           16     Usage or syntax error
>           32     Checking canceled by user request
>           128    Shared-library error
>
>     The exit code returned when multiple filesystems are checked is
>     the bit-wise OR of the exit codes for each filesystem that is
>     checked.
>
> An exit status of 2 means that the system should be rebooted.
> This is typically combined with 1 because this tends to happen when
> the root file system has been modified, and so the kernel may have
> incorrect file system state cached from using the root file system to
> run fsck. So fsck.extN will check to see if it is the root file
> system being checked, and if it is, and changes were made to the file
> system, then it will exit with a status code of 3. If changes were
> made to the file system and it is not the root file system, then it
> will return with an exit code of 1. This is not an error; just an
> informative signal that changes were made by fsck.extN. If no changes
> were necessary, fsck.extN will return with an exit code of 0.
>
> The other exit codes can be used if they are applicable; if the user
> aborts an fsck run with ^C, then fsck.extN will return an exit status
> code of 32. If the command line has options which are not recognized,
> fsck.extN will return 16. And so on.
>
> The other commonly used convention is that the -y option means to
> just blindly fix all errors, regardless of whether it might be safe or
> whether in the hands of a skilled sysadmin, some data loss might be
> avoidable if the file system specific fsck is run manually, perhaps in
> conjunction with low-level debugging tools (e.g., debugfs for
> ext2/3/4 or xfs_db for xfs, etc.)
>
> Some init scripts will use -y because they know that there is no
> skilled operator around. For example, in a mobile handset or an IoT
> device, maybe fsck -y will be used because if the file system is
> corrupted because of a power drop interacting with crappy (sorry,
> "cost optimized") flash, all you can do is use fsck -y and pray.
>
> Also -n is used to not fix any errors at all (-y means "all yes", -n
> means "all no"). This is used with the exit status codes for scripts
> that are trying to check the consistency of a file system without
> actually doing anything.
>
> Typically, if no arguments are given, then fsck will operate in
> interactive mode, where it will report each inconsistency, and then
> ask the user if she would like to fix the inconsistency or not.
>
> Regards,
>
>                                         - Ted

Hi! Thank you for the detailed information. I have just one more question:
what should a filesystem-specific fsck do if it is started with -a or -p
(or with -y) during boot and the filesystem structures indicate that it
was cleanly unmounted last time? Should it scan the whole disk and
check all data (files/directories/structures) for consistency? Or should
it trust the "clean" state and stop? And if it stops, how do you tell the
filesystem fsck to really scan the whole disk? For example, scanning a
2TB disk is really time consuming, especially at boot time.

-- 
Pali Rohár
pali.rohar@gmail.com
* Re: fsck command line API
From: Theodore Ts'o @ 2017-12-30 13:20 UTC
To: Pali Rohár; +Cc: Karel Zak, util-linux, Vojtěch Vladyka

On Sat, Dec 30, 2017 at 01:03:52PM +0100, Pali Rohár wrote:
>
> Hi! Thank you for the detailed information. I have just one more question:
> what should a filesystem-specific fsck do if it is started with -a or -p
> (or with -y) during boot and the filesystem structures indicate that it
> was cleanly unmounted last time? Should it scan the whole disk and
> check all data (files/directories/structures) for consistency? Or should
> it trust the "clean" state and stop? And if it stops, how do you tell the
> filesystem fsck to really scan the whole disk? For example, scanning a
> 2TB disk is really time consuming, especially at boot time.

What e2fsck (fsck.extN) does is check to see if the file system has
the "errors/corruptions were detected by the kernel" bit set. If so,
it will do a full check. Otherwise, if time-based or mount-based
criteria are enabled and exceeded, then e2fsck will do a full check.
Otherwise, it will stop.

The other thing e2fsck will do for ext4 file systems is to replay the
journal. This is useful because fsck will run the per-filesystem
checkers in parallel, while mount -a mounts file systems serially. So
replaying the journal in parallel when you have multiple disk spindles
can be a big win. This may be less of a big deal these days since
systemd will run mounts in parallel.

As far as time-based or mount-based full checks go (see tune2fs for
discussion on this topic), we don't enable this by default any more in
e2fsprogs. That's precisely because doing a full check of 10 TB disks
and 60 TB RAID arrays takes a long time. The idea of doing mount-based
checks goes back to the BSD days, when disks and memory were
really crappy, and so checking to find problems before they became
catastrophic data loss events made sense. These days for big disks,
the cost/benefit ratio doesn't work out as well.

Also, if you have snapshot support in your file system, you can simply
create a snapshot, and run the fsck on the snapshot. An example of
how to do this can be found here:

https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/contrib/e2croncheck

Cheers,

                                        - Ted
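The decision order described in this message (kernel error flag first, then time-based or mount-based criteria if enabled, otherwise stop) could be sketched as below. This is an illustration under stated assumptions, not e2fsck's implementation: the FsState fields and the needs_full_check function are hypothetical names, and the force parameter stands in for the -f option mentioned earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class FsState:
    """Hypothetical summary of on-disk state a checker might read."""
    kernel_error_flag: bool   # kernel recorded errors/corruption
    mount_count: int          # mounts since the last full check
    max_mount_count: int      # <= 0 means mount-based checks are disabled
    last_check: float         # time of the last full check, in seconds
    check_interval: float     # <= 0 means time-based checks are disabled


def needs_full_check(state, now, force=False):
    """Decide whether to run a full check or trust the clean state."""
    if force:
        return True
    if state.kernel_error_flag:
        return True
    if state.max_mount_count > 0 and state.mount_count >= state.max_mount_count:
        return True
    if state.check_interval > 0 and now - state.last_check >= state.check_interval:
        return True
    return False


# Cleanly unmounted, periodic checks disabled: skip the full scan.
clean = FsState(False, mount_count=12, max_mount_count=-1,
                last_check=0.0, check_interval=0.0)
print(needs_full_check(clean, now=1_000_000.0))  # False
```

Passing the current time in as an argument keeps the decision deterministic and easy to test; a real tool would read these values from the superblock or its equivalent.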
* Re: fsck command line API
From: Pali Rohár @ 2017-12-30 19:48 UTC
To: Theodore Ts'o; +Cc: Karel Zak, util-linux, Vojtěch Vladyka

On Saturday 30 December 2017 08:20:45 Theodore Ts'o wrote:
> On Sat, Dec 30, 2017 at 01:03:52PM +0100, Pali Rohár wrote:
> >
> > Hi! Thank you for the detailed information. I have just one more question:
> > what should a filesystem-specific fsck do if it is started with -a or -p
> > (or with -y) during boot and the filesystem structures indicate that it
> > was cleanly unmounted last time? Should it scan the whole disk and
> > check all data (files/directories/structures) for consistency? Or should
> > it trust the "clean" state and stop? And if it stops, how do you tell the
> > filesystem fsck to really scan the whole disk? For example, scanning a
> > 2TB disk is really time consuming, especially at boot time.
>
> What e2fsck (fsck.extN) does is check to see if the file system has
> the "errors/corruptions were detected by the kernel" bit set. If so,
> it will do a full check. Otherwise, if time-based or mount-based
> criteria are enabled and exceeded, then e2fsck will do a full check.
> Otherwise, it will stop.
>
> The other thing e2fsck will do for ext4 file systems is to replay the
> journal. This is useful because fsck will run the per-filesystem
> checkers in parallel, while mount -a mounts file systems serially. So
> replaying the journal in parallel when you have multiple disk spindles
> can be a big win. This may be less of a big deal these days since
> systemd will run mounts in parallel.
>
> As far as time-based or mount-based full checks go (see tune2fs for
> discussion on this topic), we don't enable this by default any more in
> e2fsprogs. That's precisely because doing a full check of 10 TB disks
> and 60 TB RAID arrays takes a long time. The idea of doing mount-based
> checks goes back to the BSD days, when disks and memory were
> really crappy, and so checking to find problems before they became
> catastrophic data loss events made sense. These days for big disks,
> the cost/benefit ratio doesn't work out as well.
>
> Also, if you have snapshot support in your file system, you can simply
> create a snapshot, and run the fsck on the snapshot. An example of
> how to do this can be found here:
>
> https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/contrib/e2croncheck
>
> Cheers,
>
>                                         - Ted

Ok, thank you for the explanation. For the new UDF fsck tool it would
then also make sense not to do a full check at boot time if the
filesystem is marked as clean.

-- 
Pali Rohár
pali.rohar@gmail.com