* [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() @ 2009-11-06 22:33 Eric Sandeen 2009-11-07 0:26 ` Andreas Dilger 0 siblings, 1 reply; 14+ messages in thread From: Eric Sandeen @ 2009-11-06 22:33 UTC (permalink / raw) To: ext4 development commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext4_statfs() to update the on-disk superblock counters, but modified this buffer directly without any journaling of the change. This is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. Signed-off-by: Eric Sandeen <sandeen@redhat.com> --- diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 08370ae..b02daf6 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3605,6 +3605,8 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf) struct ext4_sb_info *sbi = EXT4_SB(sb); struct ext4_super_block *es = sbi->s_es; u64 fsid; + handle_t *handle; + int err = 0; if (test_opt(sb, MINIX_DF)) { sbi->s_overhead_last = 0; @@ -3650,20 +3652,33 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf) buf->f_blocks = ext4_blocks_count(es) - sbi->s_overhead_last; buf->f_bfree = percpu_counter_sum_positive(&sbi->s_freeblocks_counter) - percpu_counter_sum_positive(&sbi->s_dirtyblocks_counter); - ext4_free_blocks_count_set(es, buf->f_bfree); buf->f_bavail = buf->f_bfree - ext4_r_blocks_count(es); if (buf->f_bfree < ext4_r_blocks_count(es)) buf->f_bavail = 0; buf->f_files = le32_to_cpu(es->s_inodes_count); buf->f_ffree = percpu_counter_sum_positive(&sbi->s_freeinodes_counter); - es->s_free_inodes_count = cpu_to_le32(buf->f_ffree); buf->f_namelen = EXT4_NAME_LEN; fsid = le64_to_cpup((void *)es->s_uuid) ^ le64_to_cpup((void *)es->s_uuid + sizeof(u64)); buf->f_fsid.val[0] = fsid & 0xFFFFFFFFUL; buf->f_fsid.val[1] = (fsid >> 32) & 0xFFFFFFFFUL; - return 0; + handle = ext4_journal_start_sb(sb, 1); + if (IS_ERR(handle)) { + err = PTR_ERR(handle); + goto out; + } + err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh); + if (err) + goto journal_stop; + es->s_free_inodes_count = cpu_to_le32(buf->f_ffree); + ext4_free_blocks_count_set(es, buf->f_bfree); + err = ext4_handle_dirty_metadata(handle, NULL, EXT4_SB(sb)->s_sbh); + +journal_stop: + ext4_journal_stop(handle); +out: + return err; } /* Helper function for writing quotas on sync - we need to start transaction ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-06 22:33 [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() Eric Sandeen @ 2009-11-07 0:26 ` Andreas Dilger 2009-11-07 1:08 ` Eric Sandeen 2009-11-08 21:48 ` Theodore Tso 0 siblings, 2 replies; 14+ messages in thread From: Andreas Dilger @ 2009-11-07 0:26 UTC (permalink / raw) To: Eric Sandeen; +Cc: ext4 development On 2009-11-06, at 15:33, Eric Sandeen wrote: > commit a71ce8c6c9bf269b192f352ea555217815cf027e updated > ext4_statfs() to update the on-disk superblock counters, > but modified this buffer directly without any journaling of > the change. This is one of the accesses that was causing the crc > errors in journal replay as seen in kernel.org bugzilla #14354. > > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > --- > > > @@ -3650,20 +3652,33 @@ static int ext4_statfs(struct dentry > *dentry, struct kstatfs *buf) > + handle = ext4_journal_start_sb(sb, 1); > + if (IS_ERR(handle)) { > + err = PTR_ERR(handle); > + goto out; > + } > + err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh); > + if (err) > + goto journal_stop; > + es->s_free_inodes_count = cpu_to_le32(buf->f_ffree); > + ext4_free_blocks_count_set(es, buf->f_bfree); > + err = ext4_handle_dirty_metadata(handle, NULL, EXT4_SB(sb)->s_sbh); I admit to being the instigator of this change. The intention is that we want to update the on-disk superblock block/ inode counters from the per-cpu data periodically, since they are never updated anymore (only the group summaries are updated, to avoid contention). However, this isn't critical work, since it is only useful for read-only e2fsck not reporting spurious errors on the filesystem and dumpe2fs/debugfs having some chance at reporting a reasonable value for the filesystem space usage. Starting a transaction as part of statfs is really counter-productive to making that code efficient, which was the whole point of the original patch to remove the per-call "overhead" calculation. The intention was that the in-memory superblock would be updated whenever statfs is called (this doesn't cost anything, since we've already computed the value for statfs), and if the superblock is written to disk for some other reason they will go along for the ride. If the choice is between adding a proper transaction here, or not doing this at all, I'd rather just not do it at all. Of course, I'd like to work out some kind of compromise, like only updating the superblock when there is already a shadow BH that is being used to write to the journal, or similar. If there is a desire to keep a transaction here and update the superblock counters, it _definitely_ doesn't need to be done on every statfs, but at most once every 30 seconds or whatever. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-07 0:26 ` Andreas Dilger @ 2009-11-07 1:08 ` Eric Sandeen 2009-11-08 21:48 ` Theodore Tso 1 sibling, 0 replies; 14+ messages in thread From: Eric Sandeen @ 2009-11-07 1:08 UTC (permalink / raw) To: Andreas Dilger; +Cc: ext4 development Andreas Dilger wrote: > On 2009-11-06, at 15:33, Eric Sandeen wrote: >> commit a71ce8c6c9bf269b192f352ea555217815cf027e updated >> ext4_statfs() to update the on-disk superblock counters, but >> modified this buffer directly without any journaling of the change. >> This is one of the accesses that was causing the crc errors in >> journal replay as seen in kernel.org bugzilla #14354. >> >> Signed-off-by: Eric Sandeen <sandeen@redhat.com> --- ... > I admit to being the instigator of this change. > > The intention is that we want to update the on-disk superblock > block/inode counters from the per-cpu data periodically, since they > are never updated anymore (only the group summaries are updated, to > avoid contention). However, this isn't critical work, since it is > only useful for read-only e2fsck not reporting spurious errors on the > filesystem and dumpe2fs/debugfs having some chance at reporting a > reasonable value for the filesystem space usage. > > Starting a transaction as part of statfs is really counter-productive > to making that code efficient, which was the whole point of the > original patch to remove the per-call "overhead" calculation. > > The intention was that the in-memory superblock would be updated > whenever statfs is called (this doesn't cost anything, since we've > already computed the value for statfs), and if the superblock is > written to disk for some other reason they will go along for the > ride. > > If the choice is between adding a proper transaction here, or not > doing this at all, I'd rather just not do it at all. Of course, I'd > like to work out some kind of compromise, like only updating the > superblock when there is already a shadow BH that is being used to > write to the journal, or similar. > > If there is a desire to keep a transaction here and update the > superblock counters, it _definitely_ doesn't need to be done on every > statfs, but at most once every 30 seconds or whatever. You know, I think I thought about all that, and I wrote the patch anyway somehow; blame a late friday evening for that one. :) I'll think of a better route to take. Thanks, -Eric ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-07 0:26 ` Andreas Dilger 2009-11-07 1:08 ` Eric Sandeen @ 2009-11-08 21:48 ` Theodore Tso 2009-11-08 22:09 ` Eric Sandeen 2009-11-09 4:41 ` Andreas Dilger 1 sibling, 2 replies; 14+ messages in thread From: Theodore Tso @ 2009-11-08 21:48 UTC (permalink / raw) To: Andreas Dilger; +Cc: Eric Sandeen, ext4 development On Fri, Nov 06, 2009 at 05:26:51PM -0700, Andreas Dilger wrote: > If the choice is between adding a proper transaction here, or not > doing this at all, I'd rather just not do it at all. Of course, I'd > like to work out some kind of compromise, like only updating the > superblock when there is already a shadow BH that is being used to > write to the journal, or similar. In practice, the superblock is never going to modified in normal operations, unless a resize happens to be happening. Since we already force the superblock summary counters to be correct during an unmount or file system freeze, we really only need this so that it's correct after a file system crash. I don't think people generally end up calling statfs() all that frequently, so it's not clear how much adding a 30 second throttle would help. Maybe we should just not bother trying to update the superblock at all on a statfs()? Hmm... - Ted ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-08 21:48 ` Theodore Tso @ 2009-11-08 22:09 ` Eric Sandeen 2009-11-09 12:53 ` Theodore Tso 2009-11-09 4:41 ` Andreas Dilger 1 sibling, 1 reply; 14+ messages in thread From: Eric Sandeen @ 2009-11-08 22:09 UTC (permalink / raw) To: Theodore Tso; +Cc: Andreas Dilger, ext4 development Theodore Tso wrote: > On Fri, Nov 06, 2009 at 05:26:51PM -0700, Andreas Dilger wrote: >> If the choice is between adding a proper transaction here, or not >> doing this at all, I'd rather just not do it at all. Of course, I'd >> like to work out some kind of compromise, like only updating the >> superblock when there is already a shadow BH that is being used to >> write to the journal, or similar. > > In practice, the superblock is never going to modified in normal > operations, unless a resize happens to be happening. Since we already > force the superblock summary counters to be correct during an unmount > or file system freeze, we really only need this so that it's correct > after a file system crash. > > I don't think people generally end up calling statfs() all that > frequently, so it's not clear how much adding a 30 second throttle > would help. Maybe we should just not bother trying to update the > superblock at all on a statfs()? for now maybe that's better.... But don't we journal the superblock sometimes, not others ... for example write_super -> ext4_write_super -> ext4_commit_super does no journaling of superblock modifications. ext4_orphan_add, however, does. This would likely lead to trouble w/ the debugging patch ... though I didn't see it ... ? So I was premature w/ this patch, I think. Maybe we could unconditionally do the copy-out in jbd2_journal_write_metadata_buffer() ...? -Eric > Hmm... > > - Ted ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-08 22:09 ` Eric Sandeen @ 2009-11-09 12:53 ` Theodore Tso 2009-11-09 17:55 ` Andreas Dilger 0 siblings, 1 reply; 14+ messages in thread From: Theodore Tso @ 2009-11-09 12:53 UTC (permalink / raw) To: Eric Sandeen; +Cc: Andreas Dilger, ext4 development On Sun, Nov 08, 2009 at 04:09:40PM -0600, Eric Sandeen wrote: > > But don't we journal the superblock sometimes, not others ... for > example write_super -> ext4_write_super -> ext4_commit_super does no > journaling of superblock modifications. ext4_orphan_add, however, does. > This would likely lead to trouble w/ the debugging patch ... though I > didn't see it ... ? Ah, I had forgotten about ext4_orphan_add(); that is indeed the one place where we would be updating the super block under normal operations, besides online-resize. I've been looking at the write_super() paths, and from what I can tell it's only used in two places. The generic fsync() handler, file_fsync(), which we do't use, and sync_supers(), which will indeed call write_super() -> ext4_write_super() if sb->s_dirt is set. That led me to examine the places where we set s_dirt, and it's in a lot of places where we're no longer modifying the superblock any more, but we're still setting sb->s_dirt. I don't know why you didn't see problems with the debugging patch; the only thing I can think of is that since the actual superblock update is deferred to a timer-triggered callback, you were getting consistently lucky --- which is hard for me to believe, but I don't have a better suggestion. What I think we do need to do is eliminate all of the places where we set sb->s_dirt, and if we need to update the superblock, we do it ourselves, under journaling control. That leaves places which call ext4_commit_super() directly, which is at mount and unmount time (which should be OK, as long as it's before or after journalling is active) and when we freeze the filesystem, which might be OK, but we need to take a careful look at it. - Ted ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-09 12:53 ` Theodore Tso @ 2009-11-09 17:55 ` Andreas Dilger 0 siblings, 0 replies; 14+ messages in thread From: Andreas Dilger @ 2009-11-09 17:55 UTC (permalink / raw) To: Theodore Tso; +Cc: Eric Sandeen, ext4 development On 2009-11-09, at 05:53, Theodore Tso wrote: > On Sun, Nov 08, 2009 at 04:09:40PM -0600, Eric Sandeen wrote: >> But don't we journal the superblock sometimes, not others ... for >> example write_super -> ext4_write_super -> ext4_commit_super does no >> journaling of superblock modifications. ext4_orphan_add, however, >> does. >> This would likely lead to trouble w/ the debugging patch ... though I >> didn't see it ... ? > > Ah, I had forgotten about ext4_orphan_add(); that is indeed the one > place where we would be updating the super block under normal > operations, besides online-resize. > > I've been looking at the write_super() paths, and from what I can tell > it's only used in two places. The generic fsync() handler, > file_fsync(), which we do't use, and sync_supers(), which will indeed > call write_super() -> ext4_write_super() if sb->s_dirt is set. That > led me to examine the places where we set s_dirt, and it's in a lot of > places where we're no longer modifying the superblock any more, but > we're still setting sb->s_dirt. I don't know why you didn't see > problems with the debugging patch; the only thing I can think of is > that since the actual superblock update is deferred to a > timer-triggered callback, you were getting consistently lucky --- > which is hard for me to believe, but I don't have a better suggestion. I suspect this is because the only thing that changes in the superblock these days is the orphan list, so out-of-order writes to the superblock will at worst result in a few entries added/missing from the orphan list. I do recall that there are "inodes from a corrupt orphan list found" messages seen occasionally during full e2fsck runs, but it has never been important enough to investigate. > What I think we do need to do is eliminate all of the places where we > set sb->s_dirt, and if we need to update the superblock, we do it > ourselves, under journaling control. We have to ensure that writeout of the superblock is still being done correctly during non-journal mode operation. > That leaves places which call ext4_commit_super() directly, which is > at mount and unmount time (which should be OK, as long as it's before > or after journalling is active) and when we freeze the filesystem, > which might be OK, but we need to take a careful look at it. We also write out the superblock directly in ext4_error(), so that the EXT4_ERROR_FS flag is written to disk (if at all possible) rather than putting the superblock into a journal transaction that will not be replayed (due to the transaction never committing after the journal is aborted or the node panics). Since that will be in the last transaction anyways (unless errors=continue is used) I don't see it as a major problem. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-08 21:48 ` Theodore Tso 2009-11-08 22:09 ` Eric Sandeen @ 2009-11-09 4:41 ` Andreas Dilger 2009-11-15 3:29 ` Theodore Tso 1 sibling, 1 reply; 14+ messages in thread From: Andreas Dilger @ 2009-11-09 4:41 UTC (permalink / raw) To: Theodore Tso; +Cc: Eric Sandeen, ext4 development On 2009-11-08, at 14:48, Theodore Tso wrote: > On Fri, Nov 06, 2009 at 05:26:51PM -0700, Andreas Dilger wrote: >> If the choice is between adding a proper transaction here, or not >> doing this at all, I'd rather just not do it at all. Of course, I'd >> like to work out some kind of compromise, like only updating the >> superblock when there is already a shadow BH that is being used to >> write to the journal, or similar. > > In practice, the superblock is never going to modified in normal > operations, unless a resize happens to be happening. Actually, Eric had a important case where the superblock is modified is whenever any file is added to the orphan list, so this basically happens all the time. When the orphan code was added, it made sense to put the head of the orphan list on the superblock because it was updated for every truncate to change the free block counters. > Since we already force the superblock summary counters to be correct > during an unmount or file system freeze, we really only need this so > that it's correct after a file system crash. > > I don't think people generally end up calling statfs() all that > frequently, so it's not clear how much adding a 30 second throttle > would help. Maybe we should just not bother trying to update the > superblock at all on a statfs()? The reason we added this was for running a read-only e2fsck on a filesystem without getting spurious errors just because the superblock summaries were incorrect. The other alternative is to change e2fsck so that it doesn't consider just a block/inode summary an error. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-09 4:41 ` Andreas Dilger @ 2009-11-15 3:29 ` Theodore Tso 2009-11-16 23:38 ` Andreas Dilger 0 siblings, 1 reply; 14+ messages in thread From: Theodore Tso @ 2009-11-15 3:29 UTC (permalink / raw) To: Andreas Dilger; +Cc: Eric Sandeen, ext4 development On Sun, Nov 08, 2009 at 09:41:28PM -0700, Andreas Dilger wrote: > > The reason we added this was for running a read-only e2fsck on a > filesystem without getting spurious errors just because the superblock > summaries were incorrect. The other alternative is to change e2fsck > so that it doesn't consider just a block/inode summary an error. > We've been doing this for a while --- e2fsprogs 1.34, since April 2003. In e2fsck/super.c:check_super_block(): /* * Update the global counts from the block group counts. This * is needed for an experimental patch which eliminates * locking the entire filesystem when allocating blocks or * inodes; if the filesystem is not unmounted cleanly, the * global counts may not be accurate. */ if ((free_blocks != sb->s_free_blocks_count) || (free_inodes != sb->s_free_inodes_count)) { if (ctx->options & E2F_OPT_READONLY) ext2fs_unmark_valid(fs); else { sb->s_free_blocks_count = free_blocks; sb->s_free_inodes_count = free_inodes; ext2fs_mark_super_dirty(fs); } } I suppose I should edit out the bit about "experimental patch", which might have been true in 2003, but it's certainly not true any more. Hence, I think it's safe to eliminate the updates in ext4_statfs() altogether. - Ted commit 48efef97e03811e4492607ed7a7aefb35e2c0c00 Author: Theodore Ts'o <tytso@mit.edu> Date: Sat Nov 14 21:37:52 2009 -0500 ext4: Don't update the superblock in ext4_statfs() commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext4_statfs() to update the on-disk superblock counters, but modified this buffer directly without any journaling of the change. This is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 8662b2e..3cd519b 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3668,13 +3668,11 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf) buf->f_blocks = ext4_blocks_count(es) - sbi->s_overhead_last; buf->f_bfree = percpu_counter_sum_positive(&sbi->s_freeblocks_counter) - percpu_counter_sum_positive(&sbi->s_dirtyblocks_counter); - ext4_free_blocks_count_set(es, buf->f_bfree); buf->f_bavail = buf->f_bfree - ext4_r_blocks_count(es); if (buf->f_bfree < ext4_r_blocks_count(es)) buf->f_bavail = 0; buf->f_files = le32_to_cpu(es->s_inodes_count); buf->f_ffree = percpu_counter_sum_positive(&sbi->s_freeinodes_counter); - es->s_free_inodes_count = cpu_to_le32(buf->f_ffree); buf->f_namelen = EXT4_NAME_LEN; fsid = le64_to_cpup((void *)es->s_uuid) ^ le64_to_cpup((void *)es->s_uuid + sizeof(u64)); ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-15 3:29 ` Theodore Tso @ 2009-11-16 23:38 ` Andreas Dilger 2009-11-19 19:08 ` tytso 0 siblings, 1 reply; 14+ messages in thread From: Andreas Dilger @ 2009-11-16 23:38 UTC (permalink / raw) To: Theodore Tso; +Cc: Eric Sandeen, ext4 development On 2009-11-14, at 19:29, Theodore Tso wrote: > On Sun, Nov 08, 2009 at 09:41:28PM -0700, Andreas Dilger wrote: >> >> The reason we added this was for running a read-only e2fsck on a >> filesystem without getting spurious errors just because the >> superblock >> summaries were incorrect. The other alternative is to change e2fsck >> so that it doesn't consider just a block/inode summary an error. > > We've been doing this for a while --- e2fsprogs 1.34, since April > 2003. In e2fsck/super.c:check_super_block(): > > if ((free_blocks != sb->s_free_blocks_count) || > (free_inodes != sb->s_free_inodes_count)) { > if (ctx->options & E2F_OPT_READONLY) > ext2fs_unmark_valid(fs); > else { The problem is that if you do "e2fsck -fn" it will still report this as an error in the filesystem, even though "e2fsck -fp" will silently fix it. I just repeated this test and still see errors, even 30 minutes after a file was modified, even after multiple syncs. [adilger@webber ~]$ sync; sleep 10; sync [adilger@webber ~]$ e2fsck -fn /dev/dm-0 e2fsck 1.41.6.sun1 (30-May-2009) Warning! /dev/dm-0 is mounted. Warning: skipping journal recovery because doing a read-only filesystem check. Pass 1: Checking inodes, blocks, and sizes Deleted inode 884739 has zero dtime. Fix? no Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Block bitmap differences: -1784645 Fix? no Inode bitmap differences: -884739 Fix? no home: ********** WARNING: Filesystem still has errors ********** home: 72709/2621440 files (17.6% non-contiguous), 4730591/5242880 blocks [adilger@webber ~]$ echo $? 4 > Hence, I think it's safe to eliminate the updates in ext4_statfs() > altogether. Not yet. I'd be happy if the "-n" e2fsck treated these fields in the same way as it does for the real e2fsck. The other thing that comes to mind is that we don't recover the journal for a read-only e2fsck, but we DO recover it on a read-only mount seems inconsistent. It wouldn't be hard to have e2fsck -n read the journal and persistently cache the journal blocks in its internal cache (i.e. flag them so they can't be discarded from cache) before it runs the rest of the e2fsck. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-16 23:38 ` Andreas Dilger @ 2009-11-19 19:08 ` tytso 2009-11-23 11:57 ` Duane Griffin 0 siblings, 1 reply; 14+ messages in thread From: tytso @ 2009-11-19 19:08 UTC (permalink / raw) To: Andreas Dilger; +Cc: Eric Sandeen, ext4 development On Mon, Nov 16, 2009 at 03:38:16PM -0800, Andreas Dilger wrote: > > The problem is that if you do "e2fsck -fn" it will still report this > as an error in the filesystem, even though "e2fsck -fp" will > silently fix it. I just repeated this test and still see errors, > even 30 minutes after a file was modified, even after multiple > syncs. Sure, but running e2fsck -fn on a mounted file system will always potentially show problems. In fact, in your demonstration: > [adilger@webber ~]$ sync; sleep 10; sync > [adilger@webber ~]$ e2fsck -fn /dev/dm-0 > e2fsck 1.41.6.sun1 (30-May-2009) > Warning! /dev/dm-0 is mounted. > Warning: skipping journal recovery because doing a read-only > filesystem check. ... > Pass 1: Checking inodes, blocks, and sizes > Deleted inode 884739 has zero dtime. Fix? no ... > Pass 5: Checking group summary information\x0f > Block bitmap differences: -1784645 > Fix? no > > Inode bitmap differences: -884739 > Fix? no .... neither of these errors would be fixed by the hacking of updating the summary free blocks and inode counts. If the concern is what happens when someone runs e2fsck -fn on a mountd file system, I have a very hard time getting excited about that.... > The other thing that comes to mind is that we don't recover the journal > for a read-only e2fsck, but we DO recover it on a read-only mount > seems inconsistent. It wouldn't be hard to have e2fsck -n read the > journal and > persistently cache the journal blocks in its internal cache (i.e. flag > them so they can't be discarded from cache) before it runs the rest > of the > e2fsck. Eventually it would be nice if we did the same thing in both kernel and userspace when doing a read-only mount/check: build a redirection table that maps specific physical blocks to the block in the journal, and whenever the system tries to access a specific physical block, we look up the proper block to use instead in the redirection block. The one tricky bit about doing this in the kernel is that we would still have to replay the journal in the case of the read-only root. Why? Because otherwise older e2fsck's would get confused and replay the journal, and that would lead to some potentially serious confusion. Even if we fix this in future versions of e2fsck, we still need to be careful dealing with remounting a r/o filesystem to be read/write, especially in the journal=data mode. The simple way of handling journaled data blocks is to hack the bmap() function to use the redirection block, but the problem with doing that is the journal block will be left in the buffer heads in the page cache. If the file system is remounted r/w without first flushing these buffer heads, future attempts to modify these pages in the page cache could result in a random block in the journalling getting corrupted by an update, instead of updating the proper final location on disk for that data block. If we have someone who is at least some basic experience in kernel coding, but and an entry-level project getting involved with ext4, this would be an ideal, self-contained thing to try doing. I'd suggest implementing it in userspace first, using the userspace/kernel API framework that allows e2fsck/recovery.c to be roughly kept in sync with fs/jbd[2]/recovery.c, and avoiding the hair of r/o roots by always replaying the journal in the case of the root file system. Anyone interested? If so, let me know... - Ted ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-19 19:08 ` tytso @ 2009-11-23 11:57 ` Duane Griffin 2009-11-23 14:26 ` tytso 0 siblings, 1 reply; 14+ messages in thread From: Duane Griffin @ 2009-11-23 11:57 UTC (permalink / raw) To: tytso; +Cc: Andreas Dilger, Eric Sandeen, ext4 development 2009/11/19 <tytso@mit.edu>: > On Mon, Nov 16, 2009 at 03:38:16PM -0800, Andreas Dilger wrote: >> The other thing that comes to mind is that we don't recover the journal >> for a read-only e2fsck, but we DO recover it on a read-only mount >> seems inconsistent. It wouldn't be hard to have e2fsck -n read the >> journal and >> persistently cache the journal blocks in its internal cache (i.e. flag >> them so they can't be discarded from cache) before it runs the rest >> of the >> e2fsck. > > Eventually it would be nice if we did the same thing in both kernel > and userspace when doing a read-only mount/check: build a redirection > table that maps specific physical blocks to the block in the journal, > and whenever the system tries to access a specific physical block, we > look up the proper block to use instead in the redirection block. Unfortunately you can't just blindly give back the journalled block: it may have been escaped. So you need to read in the block from the journal, unescape it if required, then give it back. > The one tricky bit about doing this in the kernel is that we would > still have to replay the journal in the case of the read-only root. > Why? Because otherwise older e2fsck's would get confused and replay > the journal, and that would lead to some potentially serious > confusion. Even if we fix this in future versions of e2fsck, we still > need to be careful dealing with remounting a r/o filesystem to be > read/write, especially in the journal=data mode. Hmm. The e2fsck confusion is an interesting wrinkle. > The simple way of handling journaled data blocks is to hack the > bmap() function to use the redirection block, but the problem with > doing that is the journal block will be left in the buffer heads in > the page cache. If the file system is remounted r/w without first > flushing these buffer heads, future attempts to modify these pages in > the page cache could result in a random block in the journalling > getting corrupted by an update, instead of updating the proper final > location on disk for that data block. Yes, they certainly need to be flushed. > If we have someone who is at least some basic experience in kernel > coding, but and an entry-level project getting involved with ext4, > this would be an ideal, self-contained thing to try doing. I'd > suggest implementing it in userspace first, using the userspace/kernel > API framework that allows e2fsck/recovery.c to be roughly kept in sync > with fs/jbd[2]/recovery.c, and avoiding the hair of r/o roots by > always replaying the journal in the case of the root file system. > Anyone interested? If so, let me know... I am (still) interested in this. I'll have a look at the userspace side of things. > - Ted Cheers, Duane. -- "I never could learn to drink that blood and call it wine" - Bob Dylan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-23 11:57 ` Duane Griffin @ 2009-11-23 14:26 ` tytso 2009-11-23 14:40 ` Duane Griffin 0 siblings, 1 reply; 14+ messages in thread From: tytso @ 2009-11-23 14:26 UTC (permalink / raw) To: Duane Griffin; +Cc: Andreas Dilger, Eric Sandeen, ext4 development On Mon, Nov 23, 2009 at 11:57:44AM +0000, Duane Griffin wrote: > Unfortunately you can't just blindly give back the journalled block: > it may have been escaped. So you need to read in the block from the > journal, unescape it if required, then give it back. Good point; this is going to make supporting a read-only mount that doesn't replay the journal very difficult to implement for data=journal mode, since it would mean intercepting the actual block I/O read functions for data reads, which is outside of fs/ext4 in the generic fs/ and mm/ functions. Doing it for metadata blocks will be annoying, but at least it's all inside fs/ext4. - Ted ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() 2009-11-23 14:26 ` tytso @ 2009-11-23 14:40 ` Duane Griffin 0 siblings, 0 replies; 14+ messages in thread From: Duane Griffin @ 2009-11-23 14:40 UTC (permalink / raw) To: tytso; +Cc: Andreas Dilger, Eric Sandeen, ext4 development 2009/11/23 <tytso@mit.edu>: > On Mon, Nov 23, 2009 at 11:57:44AM +0000, Duane Griffin wrote: >> Unfortunately you can't just blindly give back the journalled block: >> it may have been escaped. So you need to read in the block from the >> journal, unescape it if required, then give it back. > > Good point; this is going to make supporting a read-only mount that > doesn't replay the journal very difficult to implement for > data=journal mode, since it would mean intercepting the actual block > I/O read functions for data reads, which is outside of fs/ext4 in the > generic fs/ and mm/ functions. Doing it for metadata blocks will be > annoying, but at least it's all inside fs/ext4. Yes, that is the sticking point I got to with my last attempt at this :( I was kinda hoping that someone would have a bright idea that doesn't involve hacking an IO completion callback into get_block_t... > - Ted Cheers, Duane. -- "I never could learn to drink that blood and call it wine" - Bob Dylan ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-11-23 14:40 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-11-06 22:33 [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs() Eric Sandeen 2009-11-07 0:26 ` Andreas Dilger 2009-11-07 1:08 ` Eric Sandeen 2009-11-08 21:48 ` Theodore Tso 2009-11-08 22:09 ` Eric Sandeen 2009-11-09 12:53 ` Theodore Tso 2009-11-09 17:55 ` Andreas Dilger 2009-11-09 4:41 ` Andreas Dilger 2009-11-15 3:29 ` Theodore Tso 2009-11-16 23:38 ` Andreas Dilger 2009-11-19 19:08 ` tytso 2009-11-23 11:57 ` Duane Griffin 2009-11-23 14:26 ` tytso 2009-11-23 14:40 ` Duane Griffin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.