* [PATCH] handle journal checksum errors via ext4_error() @ 2009-11-04 17:37 Eric Sandeen 2009-11-08 21:34 ` Theodore Tso 2009-11-14 20:27 ` Theodore Tso 0 siblings, 2 replies; 7+ messages in thread From: Eric Sandeen @ 2009-11-04 17:37 UTC (permalink / raw) To: ext4 development As the code stands today, journal checksum errors on the root fs, resulting in only a partial journal replay, result in no kernel messages and no error condition, because it is all handled under a "!(sb->s_flags & MS_RDONLY)" test - and the root fs starts out life in RO mode. It seems to me that just calling ext4_error() here is almost the right thing to do; it will unconditionally mark the fs with errors, and act according to the selected mount -o errors= behavior... However, for the root fs, ext4_error will be short-circuited because the root is mounted readonly; and in fact these errors came into being while we were writing to a read-only fs during journal recovery. So we won't do a journal_abort or anything like that, and other than a printk and an error flag, not much action is taken for a partially-recovered fs. Does this seem like enough? Should we just let the root fs carry on if it's been partially-recovered, and therefore pretty well known to be corrupt? Signed-off-by: Eric Sandeen <sandeen@redhat.com> --- diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 312211e..08370ae 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2719,25 +2719,12 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_HAS_JOURNAL)) { if (ext4_load_journal(sb, es, journal_devnum)) goto failed_mount3; - if (!(sb->s_flags & MS_RDONLY) && - EXT4_SB(sb)->s_journal->j_failed_commit) { - ext4_msg(sb, KERN_CRIT, "error: " - "ext4_fill_super: Journal transaction " - "%u is corrupt", - EXT4_SB(sb)->s_journal->j_failed_commit); - if (test_opt(sb, ERRORS_RO)) { - ext4_msg(sb, KERN_CRIT, - "Mounting filesystem read-only"); - sb->s_flags |= MS_RDONLY; - EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; - es->s_state |= cpu_to_le16(EXT4_ERROR_FS); - } - if (test_opt(sb, ERRORS_PANIC)) { - EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; - es->s_state |= cpu_to_le16(EXT4_ERROR_FS); - ext4_commit_super(sb, 1); - goto failed_mount4; - } + + if (EXT4_SB(sb)->s_journal->j_failed_commit) { + ext4_error(sb, __func__, + "Journal transaction %u is corrupt, " + "filesystem only partially recovered", + EXT4_SB(sb)->s_journal->j_failed_commit); } } else if (test_opt(sb, NOLOAD) && !(sb->s_flags & MS_RDONLY) && EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER)) { ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] handle journal checksum errors via ext4_error() 2009-11-04 17:37 [PATCH] handle journal checksum errors via ext4_error() Eric Sandeen @ 2009-11-08 21:34 ` Theodore Tso 2009-11-14 20:27 ` Theodore Tso 1 sibling, 0 replies; 7+ messages in thread From: Theodore Tso @ 2009-11-08 21:34 UTC (permalink / raw) To: Eric Sandeen; +Cc: ext4 development On Wed, Nov 04, 2009 at 11:37:20AM -0600, Eric Sandeen wrote: > As the code stands today, journal checksum errors on the root fs, > resulting in only a partial journal replay, result in no kernel > messages and no error condition, because it is all handled under > a "!(sb->s_flags & MS_RDONLY)" test - and the root fs starts out > life in RO mode. > > It seems to me that just calling ext4_error() here is almost the > right thing to do; it will unconditionally mark the fs with errors, > and act according to the selected mount -o errors= behavior... > > However, for the root fs, ext4_error will be short-circuited because > the root is mounted readonly; and in fact these errors came into > being while we were writing to a read-only fs during journal recovery. > So we won't do a journal_abort or anything like that, and other than > a printk and an error flag, not much action is taken for a > partially-recovered fs. Does this seem like enough? > > Should we just let the root fs carry on if it's been partially-recovered, > and therefore pretty well known to be corrupt? I think we need to do more, especially since there will be a large number of desktop/laptop systems that only have a single root file system. Given that we are willing to modify the file system to replay the journal and deal with the orphaned inode list, even when it is mounted read-only, I think we should temporarily mark the file-system read/write, before calling ext4_error(), and then restore the file-system back to read-only status if we return from ext4_error(). That way when the init scripts run fsck, we'll at least get a forced fsck to clean up the file system afterwards. Given that we've seen how disatrous things can be when we abort a journal replay several checksums before the end, I think we really need to think about adding per-block checksums, so that in the case where the block is later overwritten by a subsequent transaction (which is quite likely for most metadata blocks, especially block or inode allocation bitmaps), we have a better chance of recovering from a failed checksum. Once we have per-block checksums, even if we don't have a newer version of the block, it may be that we're better off continuing to replay subsequent commits, and just not write the block where we know the checksum is bad, on the theory that while we will still need to run fsck afterwards, there will likely be less damage to repair... The other thing which we might do is see if we can be more aggressive checkpointing transactions, since if the blocks have been written to the final lcoation on disk, it doesn't matter if there was a failure writing them to the journal. This obviously has performance implications, though. - Ted ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] handle journal checksum errors via ext4_error() 2009-11-04 17:37 [PATCH] handle journal checksum errors via ext4_error() Eric Sandeen 2009-11-08 21:34 ` Theodore Tso @ 2009-11-14 20:27 ` Theodore Tso 2009-11-15 20:28 ` [PATCH] ext4: Remove failed journal checksum check Theodore Ts'o 1 sibling, 1 reply; 7+ messages in thread From: Theodore Tso @ 2009-11-14 20:27 UTC (permalink / raw) To: Eric Sandeen; +Cc: ext4 development On Wed, Nov 04, 2009 at 11:37:20AM -0600, Eric Sandeen wrote: > As the code stands today, journal checksum errors on the root fs, > resulting in only a partial journal replay, result in no kernel > messages and no error condition, because it is all handled under > a "!(sb->s_flags & MS_RDONLY)" test - and the root fs starts out > life in RO mode. > > It seems to me that just calling ext4_error() here is almost the > right thing to do; it will unconditionally mark the fs with errors, > and act according to the selected mount -o errors= behavior... > > However, for the root fs, ext4_error will be short-circuited because > the root is mounted readonly; and in fact these errors came into > being while we were writing to a read-only fs during journal recovery. > So we won't do a journal_abort or anything like that, and other than > a printk and an error flag, not much action is taken for a > partially-recovered fs. Does this seem like enough? > > Should we just let the root fs carry on if it's been partially-recovered, > and therefore pretty well known to be corrupt? I've been thinking about this some more, and I think we need to make change to jbd2_journal_load() so that it checks for the failed journal checksum, and then avoid resetting the journal. This allows for a userspace program like e2fsck to do something better about recovering from this situation. - Ted jbd2: don't wipe the journal on a failed journal checksum If there is a failed journal checksum, don't reset the journal. This allows for userspace programs to decide how to recover from this situation. It may be that ignoring the journal checksum failure might be a better way of recovering the file system. Once we add per-block checksums, we can definitely do better. Until then, a system administrator can try backing up the file system image (or taking a snapshot) and and trying to determine experimentally whether ignoring the checksum failure or aborting the journal replay results in less data loss. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- fs/jbd2/journal.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index fed8538..af60d98 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1248,6 +1248,13 @@ int jbd2_journal_load(journal_t *journal) if (jbd2_journal_recover(journal)) goto recovery_error; + if (journal->j_failed_commit) { + printk(KERN_ERR "JBD2: journal transaction %u on %s " + "is corrupt.\n", journal->j_failed_commit, + journal->j_devname); + return -EIO; + } + /* OK, we've finished with the dynamic journal bits: * reinitialise the dynamic contents of the superblock in memory * and reset them on disk. */ ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH] ext4: Remove failed journal checksum check 2009-11-14 20:27 ` Theodore Tso @ 2009-11-15 20:28 ` Theodore Ts'o 2009-11-17 16:05 ` Jan Kara 0 siblings, 1 reply; 7+ messages in thread From: Theodore Ts'o @ 2009-11-15 20:28 UTC (permalink / raw) To: Ext4 Developers List; +Cc: Theodore Ts'o Now that we are checking for failed journal checksums in the jbd2 layer, we don't need to check in the ext4 mount path --- since a checksum fail will result in ext4_load_journal() returning an error, causing the file system to refuse to be mounted until e2fsck can deal with the problem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- fs/ext4/super.c | 20 -------------------- 1 files changed, 0 insertions(+), 20 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 8662b2e..04c6690 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2721,26 +2721,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_HAS_JOURNAL)) { if (ext4_load_journal(sb, es, journal_devnum)) goto failed_mount3; - if (!(sb->s_flags & MS_RDONLY) && - EXT4_SB(sb)->s_journal->j_failed_commit) { - ext4_msg(sb, KERN_CRIT, "error: " - "ext4_fill_super: Journal transaction " - "%u is corrupt", - EXT4_SB(sb)->s_journal->j_failed_commit); - if (test_opt(sb, ERRORS_RO)) { - ext4_msg(sb, KERN_CRIT, - "Mounting filesystem read-only"); - sb->s_flags |= MS_RDONLY; - EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; - es->s_state |= cpu_to_le16(EXT4_ERROR_FS); - } - if (test_opt(sb, ERRORS_PANIC)) { - EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; - es->s_state |= cpu_to_le16(EXT4_ERROR_FS); - ext4_commit_super(sb, 1); - goto failed_mount4; - } - } } else if (test_opt(sb, NOLOAD) && !(sb->s_flags & MS_RDONLY) && EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER)) { ext4_msg(sb, KERN_ERR, "required journal recovery " -- 1.6.5.216.g5288a.dirty ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] ext4: Remove failed journal checksum check 2009-11-15 20:28 ` [PATCH] ext4: Remove failed journal checksum check Theodore Ts'o @ 2009-11-17 16:05 ` Jan Kara 2009-11-18 3:50 ` tytso 0 siblings, 1 reply; 7+ messages in thread From: Jan Kara @ 2009-11-17 16:05 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Ext4 Developers List > Now that we are checking for failed journal checksums in the jbd2 > layer, we don't need to check in the ext4 mount path --- since a > checksum fail will result in ext4_load_journal() returning an error, > causing the file system to refuse to be mounted until e2fsck can deal > with the problem. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > --- > fs/ext4/super.c | 20 -------------------- > 1 files changed, 0 insertions(+), 20 deletions(-) > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 8662b2e..04c6690 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -2721,26 +2721,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) > EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_HAS_JOURNAL)) { > if (ext4_load_journal(sb, es, journal_devnum)) > goto failed_mount3; But shouldn't we set the EXT4_ERROR_FS flag? We don't semm to do this in ext4_load_journal() when jbd2_journal_load() fails. Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ext4: Remove failed journal checksum check 2009-11-17 16:05 ` Jan Kara @ 2009-11-18 3:50 ` tytso 2009-11-18 10:10 ` Jan Kara 0 siblings, 1 reply; 7+ messages in thread From: tytso @ 2009-11-18 3:50 UTC (permalink / raw) To: Jan Kara; +Cc: Ext4 Developers List On Tue, Nov 17, 2009 at 05:05:42PM +0100, Jan Kara wrote: > But shouldn't we set the EXT4_ERROR_FS flag? We don't semm to do this > in ext4_load_journal() when jbd2_journal_load() fails. No, we don't need to set the EXT4_ERROR_FS flag. When jbd2_journal_load() fails, we are leaving the journal in place and we are refusing the mount. In the case of a root file system with this problem, this will lead to a panic, and the user will have to use a rescue CD. In any case, when e2fsck runs, the current version will report the error, abort the journal playback, and then force a full check of the file system. So this actually does what we want without setting the EXT4_ERROR_FS flag. In fact setting the flag will likely be pointless, since if the superblock is journalled, it will get overwritten during the journal replay. In fact, what I think e2fsck should do as the default option is to *skip* the journal transaction with the failed checksum, but *not* abort the journal replay, and to replay the rest of the journal transactions with correct checksums, and then force a full fsck. Aborting a journal transaction and abandoning 10 or more transactions after the failed transaction is likely to do far more damage. We're better off replaying the transactions, hope that some or all of the blocks in the skipped, failed transaction, are contained in subsequent transaction, and then clean up the file system afterwards. E2fsck should have a (non-default) option to replay the failed transaction anyway, and a really paranoid system administrator, though, could try it both ways. Using a LVM snapshot would allow the sysadmin to try both ways quite efficiently. Here's an excerpt from journal of a file system that was aborted during an fs_mark run. (Generated using "logdump -a" in debugfs): Found expected sequence 5735, type 2 (commit block) at block 1977 Found expected sequence 5736, type 1 (descriptor block) at block 1978 Dumping descriptor block, sequence 5736, at block 1978: FS block 277 logged at journal block 1979 (flags 0x0) FS block 2 logged at journal block 1980 (flags 0x2) FS block 1009 logged at journal block 1981 (flags 0x2) FS block 547 logged at journal block 1982 (flags 0x2) FS block 4433 logged at journal block 1983 (flags 0x2) FS block 267 logged at journal block 1984 (flags 0xa) Found expected sequence 5736, type 2 (commit block) at block 1985 Found expected sequence 5737, type 1 (descriptor block) at block 1986 Dumping descriptor block, sequence 5737, at block 1986: FS block 277 logged at journal block 1987 (flags 0x0) FS block 2 logged at journal block 1988 (flags 0x2) FS block 1009 logged at journal block 1989 (flags 0x2) FS block 547 logged at journal block 1990 (flags 0x2) FS block 4451 logged at journal block 1991 (flags 0x2) FS block 267 logged at journal block 1992 (flags 0xa) Found expected sequence 5737, type 2 (commit block) at block 1993 Found expected sequence 5738, type 1 (descriptor block) at block 1994 Dumping descriptor block, sequence 5738, at block 1994: FS block 277 logged at journal block 1995 (flags 0x0) FS block 2 logged at journal block 1996 (flags 0x2) FS block 1009 logged at journal block 1997 (flags 0x2) FS block 547 logged at journal block 1998 (flags 0x2) FS block 4680 logged at journal block 1999 (flags 0x2) FS block 267 logged at journal block 2000 (flags 0xa) Found expected sequence 5738, type 2 (commit block) at block 2001 Found expected sequence 5739, type 1 (descriptor block) at block 2002 Dumping descriptor block, sequence 5739, at block 2002: FS block 277 logged at journal block 2003 (flags 0x0) FS block 2 logged at journal block 2004 (flags 0x2) FS block 1009 logged at journal block 2005 (flags 0x2) FS block 547 logged at journal block 2006 (flags 0x2) FS block 4714 logged at journal block 2007 (flags 0x2) FS block 291 logged at journal block 2008 (flags 0xa) This is a best case, but note how many blocks can appear multiple times in the journal. If fs blocks 277, 2, 1009, or 547 are corrupted in any transaction before #5739, causing a checksum failure in commit #5436 (for example), replaying the subsequent transactions will recover the damage. In fact, if blocks 4433 or 267 are intact, we're better off replaying commit #5436, even if the journal checksum doesn't match, since the corrupted blocks will be repaired by subsequent commits, and at least that way we don't lose the updates to blocks 4433 and 267. So this is something that we really need to address in userspace, by making e2fsck smarter. (And this is also is why we really need per-block checksums; it will help us recover from corrupted journals much more easily and automatically.) - Ted ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ext4: Remove failed journal checksum check 2009-11-18 3:50 ` tytso @ 2009-11-18 10:10 ` Jan Kara 0 siblings, 0 replies; 7+ messages in thread From: Jan Kara @ 2009-11-18 10:10 UTC (permalink / raw) To: tytso; +Cc: Jan Kara, Ext4 Developers List On Tue 17-11-09 22:50:14, tytso@mit.edu wrote: > On Tue, Nov 17, 2009 at 05:05:42PM +0100, Jan Kara wrote: > > But shouldn't we set the EXT4_ERROR_FS flag? We don't semm to do this > > in ext4_load_journal() when jbd2_journal_load() fails. > > No, we don't need to set the EXT4_ERROR_FS flag. When > jbd2_journal_load() fails, we are leaving the journal in place and we > are refusing the mount. In the case of a root file system with this > problem, this will lead to a panic, and the user will have to use a > rescue CD. > > In any case, when e2fsck runs, the current version will report the > error, abort the journal playback, and then force a full check of the > file system. So this actually does what we want without setting the > EXT4_ERROR_FS flag. In fact setting the flag will likely be > pointless, since if the superblock is journalled, it will get > overwritten during the journal replay. Ah, you're right. Thanks for explanation. > In fact, what I think e2fsck should do as the default option is to > *skip* the journal transaction with the failed checksum, but *not* > abort the journal replay, and to replay the rest of the journal > transactions with correct checksums, and then force a full fsck. > Aborting a journal transaction and abandoning 10 or more transactions > after the failed transaction is likely to do far more damage. We're > better off replaying the transactions, hope that some or all of the > blocks in the skipped, failed transaction, are contained in subsequent > transaction, and then clean up the file system afterwards. I agree, Replaying all transactions with correct checksum seems like a better option. Of course, having per-block checksums and replaying all blocks that have correct checksum is even better. > E2fsck should have a (non-default) option to replay the failed > transaction anyway, and a really paranoid system administrator, > though, could try it both ways. Using a LVM snapshot would allow the > sysadmin to try both ways quite efficiently. Yes, possibly this can be useful as well. > Here's an excerpt from journal of a file system that was aborted > during an fs_mark run. (Generated using "logdump -a" in debugfs): > > Found expected sequence 5735, type 2 (commit block) at block 1977 > Found expected sequence 5736, type 1 (descriptor block) at block 1978 > Dumping descriptor block, sequence 5736, at block 1978: > FS block 277 logged at journal block 1979 (flags 0x0) > FS block 2 logged at journal block 1980 (flags 0x2) > FS block 1009 logged at journal block 1981 (flags 0x2) > FS block 547 logged at journal block 1982 (flags 0x2) > FS block 4433 logged at journal block 1983 (flags 0x2) > FS block 267 logged at journal block 1984 (flags 0xa) > Found expected sequence 5736, type 2 (commit block) at block 1985 > Found expected sequence 5737, type 1 (descriptor block) at block 1986 > Dumping descriptor block, sequence 5737, at block 1986: > FS block 277 logged at journal block 1987 (flags 0x0) > FS block 2 logged at journal block 1988 (flags 0x2) > FS block 1009 logged at journal block 1989 (flags 0x2) > FS block 547 logged at journal block 1990 (flags 0x2) > FS block 4451 logged at journal block 1991 (flags 0x2) > FS block 267 logged at journal block 1992 (flags 0xa) > Found expected sequence 5737, type 2 (commit block) at block 1993 > Found expected sequence 5738, type 1 (descriptor block) at block 1994 > Dumping descriptor block, sequence 5738, at block 1994: > FS block 277 logged at journal block 1995 (flags 0x0) > FS block 2 logged at journal block 1996 (flags 0x2) > FS block 1009 logged at journal block 1997 (flags 0x2) > FS block 547 logged at journal block 1998 (flags 0x2) > FS block 4680 logged at journal block 1999 (flags 0x2) > FS block 267 logged at journal block 2000 (flags 0xa) > Found expected sequence 5738, type 2 (commit block) at block 2001 > Found expected sequence 5739, type 1 (descriptor block) at block 2002 > Dumping descriptor block, sequence 5739, at block 2002: > FS block 277 logged at journal block 2003 (flags 0x0) > FS block 2 logged at journal block 2004 (flags 0x2) > FS block 1009 logged at journal block 2005 (flags 0x2) > FS block 547 logged at journal block 2006 (flags 0x2) > FS block 4714 logged at journal block 2007 (flags 0x2) > FS block 291 logged at journal block 2008 (flags 0xa) > > This is a best case, but note how many blocks can appear multiple > times in the journal. If fs blocks 277, 2, 1009, or 547 are corrupted > in any transaction before #5739, causing a checksum failure in commit > #5436 (for example), replaying the subsequent transactions will > recover the damage. In fact, if blocks 4433 or 267 are intact, we're > better off replaying commit #5436, even if the journal checksum > doesn't match, since the corrupted blocks will be repaired by > subsequent commits, and at least that way we don't lose the updates to > blocks 4433 and 267. Yes, it heavily depends on the load. But you're right that bitmaps, group descriptors and superblock are likely to be in further transactions. OTOH inodes, directory blocks or indirect blocks (or however we call such blocks for extents) and not that likely to be there. Honza ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-11-18 10:09 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-11-04 17:37 [PATCH] handle journal checksum errors via ext4_error() Eric Sandeen 2009-11-08 21:34 ` Theodore Tso 2009-11-14 20:27 ` Theodore Tso 2009-11-15 20:28 ` [PATCH] ext4: Remove failed journal checksum check Theodore Ts'o 2009-11-17 16:05 ` Jan Kara 2009-11-18 3:50 ` tytso 2009-11-18 10:10 ` Jan Kara
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.