* ext4 fsck vs. kernel recovery policy
From: dann frazier @ 2019-08-27 19:10 UTC
To: linux-fsdevel, Theodore Ts'o, Jan Kara
Cc: Colin King, Ryan Harper

hey,
  I'm curious if there's a policy about what types of unclean
shutdowns 'e2fsck -p' can recover, vs. what the kernel will
automatically recover on mount. We're seeing that unclean shutdowns w/
data=journal,journal_checksum frequently result in invalid checksums that
cause the kernel to abort recovery, while 'e2fsck -p' resolves the
issue non-interactively.

Driver for this question is that some Ubuntu installs set fstab's
passno=0 for the root fs - which I'm told is based on the assumption
that both kernel & e2fsck -p have parity when it comes to automatic
recovery - that obviously does not appear to be the case - but I
wanted to confirm whether or not that is by design.

  -dann
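For readers unfamiliar with the field in question: passno is the sixth whitespace-separated column of an fstab entry, and a value of 0 (or an omitted field) tells the boot-time fsck pass to skip that filesystem. The following is only a minimal Python sketch; the helper name `unchecked_entries` and the sample fstab contents are hypothetical and not taken from any real Ubuntu install. It lists the entries that would never get a boot-time 'e2fsck -p'.

```python
def unchecked_entries(fstab_text):
    """Return (mount_point, fstype) pairs whose passno is 0 or omitted."""
    skipped = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        # fstab(5): a missing sixth field is treated as 0 (do not check).
        passno = int(fields[5]) if len(fields) > 5 else 0
        if passno == 0:
            skipped.append((fields[1], fields[2]))
    return skipped


# Hypothetical excerpt for illustration only.
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/vda1  /      ext4  data=journal,journal_checksum  0  0
/dev/vda2  /home  ext4  defaults                       0  2
"""

print(unchecked_entries(SAMPLE_FSTAB))
```

On a real system the result would also include entries such as swap or proc, which are expected to have passno=0; only the ext4 rows matter for the question above.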
* Re: ext4 fsck vs. kernel recovery policy
From: Andreas Dilger @ 2019-08-27 20:27 UTC
To: dann frazier
Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Colin King, Ryan Harper

On Aug 27, 2019, at 1:10 PM, dann frazier <dann.frazier@canonical.com> wrote:
>
> hey,
>  I'm curious if there's a policy about what types of unclean
> shutdowns 'e2fsck -p' can recover, vs. what the kernel will
> automatically recover on mount. We're seeing that unclean shutdowns w/
> data=journal,journal_checksum frequently result in invalid checksums that
> cause the kernel to abort recovery, while 'e2fsck -p' resolves the
> issue non-interactively.

The kernel journal recovery will only replay the journal blocks. It
doesn't do any check and repair of filesystem correctness. During and
after e2fsck replays the journal blocks it still does basic correctness
checking, and if an error is found it will fall back to a full scan.

> Driver for this question is that some Ubuntu installs set fstab's
> passno=0 for the root fs - which I'm told is based on the assumption
> that both kernel & e2fsck -p have parity when it comes to automatic
> recovery - that obviously does not appear to be the case - but I
> wanted to confirm whether or not that is by design.

The first thing to figure out is why there are errors with the journal
blocks. That can cause problems for both the kernel and e2fsck journal
replay.

Using data=journal is not a common option, so it is likely that the
issue relates to this. IMHO, using data=journal could be helpful for
small file writes and/or sync IO, but there have been discussions lately
about removing this functionality. If you have some use case that shows
real improvements with data=journal, please let us know.

Cheers, Andreas
* Re: ext4 fsck vs. kernel recovery policy
From: dann frazier @ 2019-08-29 22:53 UTC
To: Andreas Dilger
Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Colin King, Ryan Harper

On Tue, Aug 27, 2019 at 02:27:25PM -0600, Andreas Dilger wrote:
> On Aug 27, 2019, at 1:10 PM, dann frazier <dann.frazier@canonical.com> wrote:
> >
> > hey,
> >  I'm curious if there's a policy about what types of unclean
> > shutdowns 'e2fsck -p' can recover, vs. what the kernel will
> > automatically recover on mount. We're seeing that unclean shutdowns w/
> > data=journal,journal_checksum frequently result in invalid checksums that
> > cause the kernel to abort recovery, while 'e2fsck -p' resolves the
> > issue non-interactively.
>
> The kernel journal recovery will only replay the journal blocks. It
> doesn't do any check and repair of filesystem correctness. During and
> after e2fsck replays the journal blocks it still does basic correctness
> checking, and if an error is found it will fall back to a full scan.

hey Andreas!

Here's a log to clarify what I'm seeing:

$ sudo mount /dev/nbd0 mnt
JBD2: Invalid checksum recovering data block 517634 in log
JBD2: Invalid checksum recovering data block 517633 in log
[...]
JBD2: Invalid checksum recovering data block 517004 in log
JBD2: Invalid checksum recovering data block 4915712 in log
JBD2: recovery failed
EXT4-fs (nbd0): error loading journal
mount: /tmp/mnt: can't read superblock on /dev/nbd0.

$ sudo e2fsck -p /dev/nbd0
/dev/nbd0: recovering journal
JBD2: Invalid checksum recovering block 517732 in log
JBD2: Invalid checksum recovering block 517519 in log
[...]
JBD2: Invalid checksum recovering block 4915712 in log
Journal checksum error found in /dev/nbd0
/dev/nbd0: Clearing orphaned inode 128798 (uid=0, gid=0, mode=040600, size=4096)
/dev/nbd0: Clearing orphaned inode 514998 (uid=0, gid=0, mode=040600, size=4096)
[...]
/dev/nbd0: Clearing orphaned inode 774759 (uid=0, gid=0, mode=0100600, size=4096)
/dev/nbd0 was not cleanly unmounted, check forced.
/dev/nbd0: 2127984/2195456 files (0.0% non-contiguous), 2963178/8780544 blocks

So is it correct to say that the checksum errors were identifying
filesystem correctness issues, and therefore e2fsck was needed to
correct them?

> > Driver for this question is that some Ubuntu installs set fstab's
> > passno=0 for the root fs - which I'm told is based on the assumption
> > that both kernel & e2fsck -p have parity when it comes to automatic
> > recovery - that obviously does not appear to be the case - but I
> > wanted to confirm whether or not that is by design.
>
> The first thing to figure out is why there are errors with the journal
> blocks. That can cause problems for both the kernel and e2fsck journal
> replay.
>
> Using data=journal is not a common option, so it is likely that the
> issue relates to this.

You're probably right - this issue is very easy to reproduce w/
data=journal,journal_checksum. I was never able to reproduce it
otherwise.

> IMHO, using data=journal could be helpful for
> small file writes and/or sync IO, but there have been discussions lately
> about removing this functionality. If you have some use case that shows
> real improvements with data=journal, please let us know.

I don't have such a use case myself. The issue was reported by a user,
and it got me wondering about the basis for our passno=0 default.

  -dann
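One way to compare the two runs in the log above is to extract the block numbers each tool complained about. This is a rough Python sketch, not part of the original report: the function name `bad_journal_blocks` is hypothetical, and the sample strings contain only lines that actually appear in the log (the lines elided with "[...]" are not reconstructed, so the result says nothing about them).

```python
import re

# Matches both the kernel wording ("recovering data block N") and the
# e2fsck wording ("recovering block N") seen in the log.
CSUM_ERROR = re.compile(
    r"JBD2: Invalid checksum recovering (?:data )?block (\d+) in log"
)


def bad_journal_blocks(log_text):
    """Return the sorted, de-duplicated block numbers flagged in log_text."""
    return sorted({int(m.group(1)) for m in CSUM_ERROR.finditer(log_text)})


# Excerpts copied from the mount and e2fsck output in the message.
KERNEL_LOG = """\
JBD2: Invalid checksum recovering data block 517634 in log
JBD2: Invalid checksum recovering data block 517633 in log
JBD2: Invalid checksum recovering data block 517004 in log
JBD2: Invalid checksum recovering data block 4915712 in log
JBD2: recovery failed
"""
E2FSCK_LOG = """\
JBD2: Invalid checksum recovering block 517732 in log
JBD2: Invalid checksum recovering block 517519 in log
JBD2: Invalid checksum recovering block 4915712 in log
"""

kernel_blocks = bad_journal_blocks(KERNEL_LOG)
fsck_blocks = bad_journal_blocks(E2FSCK_LOG)
common = sorted(set(kernel_blocks) & set(fsck_blocks))
print(common)
```

With only the visible lines as input, block 4915712 is the one reported by both the kernel and e2fsck.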
* Re: ext4 fsck vs. kernel recovery policy
From: Eric Sandeen @ 2019-08-27 20:29 UTC
To: dann frazier, linux-fsdevel, Theodore Ts'o, Jan Kara
Cc: Colin King, Ryan Harper

On 8/27/19 2:10 PM, dann frazier wrote:
> hey,
>  I'm curious if there's a policy about what types of unclean
> shutdowns 'e2fsck -p' can recover, vs. what the kernel will
> automatically recover on mount. We're seeing that unclean shutdowns w/
> data=journal,journal_checksum frequently result in invalid checksums that
> cause the kernel to abort recovery, while 'e2fsck -p' resolves the
> issue non-interactively.
>
> Driver for this question is that some Ubuntu installs set fstab's
> passno=0 for the root fs - which I'm told is based on the assumption
> that both kernel & e2fsck -p have parity when it comes to automatic
> recovery - that obviously does not appear to be the case - but I
> wanted to confirm whether or not that is by design.
>
>  -dann

Ted or others more involved w/ ext4 will speak w/ authority but it's my
understanding that log replay, whether done by userspace or by the kernel,
should always return the filesystem to a consistent state. If that's not
the case, scripting things so that you grab a qcow-format e2image prior
to fsck so that you can share the problematic image with developers may
help.

(In XFS land, a large portion of the unreplayable logs we see are the
result of storage that didn't /actually/ persist IOs that it claimed were
persisted prior to the crash/poweroff.)

-Eric
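The "grab an e2image prior to fsck" suggestion could be scripted roughly as follows. This is a sketch under stated assumptions, not a recipe endorsed in the thread: the helper names, the device path and the image file name are all hypothetical. It relies only on 'e2image -Q' (write a QCOW2-format image, metadata only by default) and 'e2fsck -p', and it must be run against an unmounted device.

```python
import shlex
import subprocess


def recovery_commands(device, image_path):
    """Return the argv lists to run, in order: image capture, then fsck."""
    return [
        ["e2image", "-Q", device, image_path],  # QCOW2 metadata image
        ["e2fsck", "-p", device],               # non-interactive repair
    ]


def capture_then_fsck(device, image_path):
    """Capture the image first, then run e2fsck; return e2fsck's exit status."""
    image_cmd, fsck_cmd = recovery_commands(device, image_path)
    # Abort if the capture fails, so the damaged state is never repaired
    # before a copy of it has been saved.
    subprocess.run(image_cmd, check=True)
    # e2fsck exits non-zero when it corrects errors, so no check=True here.
    return subprocess.run(fsck_cmd).returncode


# Only print the commands; nothing is executed by this example.
for argv in recovery_commands("/dev/nbd0", "nbd0.e2i.qcow2"):
    print(shlex.join(argv))
```

Ordering is the point of the design: the image has to be taken before e2fsck modifies the filesystem, otherwise the state that made the kernel abort recovery is gone.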
* Re: ext4 fsck vs. kernel recovery policy
From: Eric Sandeen @ 2019-08-27 21:39 UTC
To: dann frazier, linux-fsdevel, Theodore Ts'o, Jan Kara
Cc: Colin King, Ryan Harper

On 8/27/19 3:29 PM, Eric Sandeen wrote:
> On 8/27/19 2:10 PM, dann frazier wrote:
>> hey,
>>  I'm curious if there's a policy about what types of unclean
>> shutdowns 'e2fsck -p' can recover, vs. what the kernel will
>> automatically recover on mount. We're seeing that unclean shutdowns w/
>> data=journal,journal_checksum frequently result in invalid checksums that
>> cause the kernel to abort recovery, while 'e2fsck -p' resolves the
>> issue non-interactively.
>>
>> Driver for this question is that some Ubuntu installs set fstab's
>> passno=0 for the root fs - which I'm told is based on the assumption
>> that both kernel & e2fsck -p have parity when it comes to automatic
>> recovery - that obviously does not appear to be the case - but I
>> wanted to confirm whether or not that is by design.
>>
>>  -dann
>
> Ted or others more involved w/ ext4 will speak w/ authority but it's my
> understanding that log replay, whether done by userspace or by the kernel,
> should always return the filesystem to a consistent state.

I should amend: "from an otherwise normal unclean shutdown".

Corruption discovered during recovery is a different matter, as adilger
pointed out.

-Eric
* Re: ext4 fsck vs. kernel recovery policy
From: dann frazier @ 2019-08-29 23:50 UTC
To: Eric Sandeen
Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Colin King, Ryan Harper

On Tue, Aug 27, 2019 at 03:29:09PM -0500, Eric Sandeen wrote:
> On 8/27/19 2:10 PM, dann frazier wrote:
> > hey,
> >  I'm curious if there's a policy about what types of unclean
> > shutdowns 'e2fsck -p' can recover, vs. what the kernel will
> > automatically recover on mount. We're seeing that unclean shutdowns w/
> > data=journal,journal_checksum frequently result in invalid checksums that
> > cause the kernel to abort recovery, while 'e2fsck -p' resolves the
> > issue non-interactively.
> >
> > Driver for this question is that some Ubuntu installs set fstab's
> > passno=0 for the root fs - which I'm told is based on the assumption
> > that both kernel & e2fsck -p have parity when it comes to automatic
> > recovery - that obviously does not appear to be the case - but I
> > wanted to confirm whether or not that is by design.
> >
> >  -dann
>
> Ted or others more involved w/ ext4 will speak w/ authority but it's my
> understanding that log replay, whether done by userspace or by the kernel,
> should always return the filesystem to a consistent state. If that's not
> the case, scripting things so that you grab a qcow-format e2image prior
> to fsck so that you can share the problematic image with developers may
> help.

Thanks Eric. I captured an image in case it's useful:
  https://people.canonical.com/~dannf/md2.e2ic.qcow2

  -dann

>
> (In XFS land, a large portion of the unreplayable logs we see are the
> result of storage that didn't /actually/ persist IOs that it claimed were
> persisted prior to the crash/poweroff.)
>
> -Eric