* Fix(es) for ext2 fsync bug @ 2007-02-14 19:54 Valerie Henson 2007-02-14 20:31 ` David Chinner ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Valerie Henson @ 2007-02-14 19:54 UTC (permalink / raw) To: linux-fsdevel; +Cc: Can Sar, Junfeng Yang, Dawson Engler, Theodore Ts'o Just some quick notes on possible ways to fix the ext2 fsync bug that eXplode found. Whether or not anyone will bother to implement it is another matter. Background: The eXplode file system checker found a bug in ext2 fsync behavior. Do the following: truncate file A, create file B which reallocates one of A's old indirect blocks, fsync file B. If you then crash before file A's metadata is all written out, fsck will complete the truncate for file A... thereby deleting file B's data. So fsync file B doesn't guarantee data is on disk after a crash. Details: http://www.stanford.edu/~engler/explode-osdi06.pdf Two possible solutions I can think of: * Rearrange order of duplicate block checking and fixing file size in fsck. Not sure how hard this is. (Ted?) * Keep a set of "still allocated on disk" block bitmaps that gets flushed whenever a sync happens. Don't allocate these blocks. Journaling file systems already have to do this. -VAL ^ permalink raw reply [flat|nested] 20+ messages in thread
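A rough illustration of the on-disk state that the sequence above leaves behind may help. This is only a toy model, assuming a two-inode "disk" and a cached copy of it; the dictionaries, the block number 100, and the function names are all hypothetical and have nothing to do with real ext2 structures. It shows only that, after the crash, the same block is claimed by A as an indirect block and by B as a data block.

```python
# Toy model of the scenario (all names are hypothetical, not ext2 code).
# "disk" holds what has reached stable storage; "mem" is the cached state.

def run_scenario():
    X = 100  # block number of file A's old indirect block

    # Initially both views agree: inode A owns block X as an indirect block.
    disk = {"A": {"indirect": [X], "data": []}, "B": None}
    mem = {"A": {"indirect": [X], "data": []}, "B": None}

    # 1. Truncate A: only the in-memory copy changes; block X looks free.
    mem["A"] = {"indirect": [], "data": []}

    # 2. Create B; the allocator hands out block X again, now as a data block.
    mem["B"] = {"indirect": [], "data": [X]}

    # 3. fsync(B): B's inode and data reach the disk. A's inode does not.
    disk["B"] = mem["B"]

    # 4. Crash: the in-memory state is lost, so only `disk` survives.
    return X, disk


def claimants(disk, block):
    """Return (inode, role) pairs for every on-disk inode referencing block."""
    found = []
    for name, inode in sorted(disk.items()):
        if inode is None:
            continue
        for role in ("indirect", "data"):
            if block in inode[role]:
                found.append((name, role))
    return found


if __name__ == "__main__":
    X, disk = run_scenario()
    print(claimants(disk, X))  # prints [('A', 'indirect'), ('B', 'data')]
```

Whatever fsck then does with A's stale claim on the block is what puts B's fsync'ed data at risk.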
* Re: Fix(es) for ext2 fsync bug 2007-02-14 19:54 Fix(es) for ext2 fsync bug Valerie Henson @ 2007-02-14 20:31 ` David Chinner 2007-02-14 21:26 ` Dave Kleikamp 2007-02-14 21:08 ` sfaibish 2007-02-15 14:20 ` Theodore Tso 2 siblings, 1 reply; 20+ messages in thread From: David Chinner @ 2007-02-14 20:31 UTC (permalink / raw) To: Valerie Henson Cc: linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler, Theodore Ts'o On Wed, Feb 14, 2007 at 11:54:54AM -0800, Valerie Henson wrote: > Just some quick notes on possible ways to fix the ext2 fsync bug that > eXplode found. Whether or not anyone will bother to implement it is > another matter. > > Background: The eXplode file system checker found a bug in ext2 fsync > behavior. Do the following: truncate file A, create file B which > reallocates one of A's old indirect blocks, fsync file B. If you then > crash before file A's metadata is all written out, fsck will complete > the truncate for file A... thereby deleting file B's data. So fsync > file B doesn't guarantee data is on disk after a crash. Details: > > http://www.stanford.edu/~engler/explode-osdi06.pdf > > Two possible solutions I can think of: > > * Rearrange order of duplicate block checking and fixing file size in > fsck. Not sure how hard this is. (Ted?) > > * Keep a set of "still allocated on disk" block bitmaps that gets > flushed whenever a sync happens. Don't allocate these blocks. > Journaling file systems already have to do this. You don't need anything on disk or to fsck to fix this problem - just avoid it completely by keeping a list of recently truncated blocks in memory and don't reuse them until the old owner inode is sync'd to disk. XFS solves this problem in exactly this manner - it keeps a list of recently freed blocks whose freeing transactions have not yet been committed to disk to prevent them from being reused before it is safe to. 
See xfs_alloc_search_busy() and callers - if we try to reallocate a "busy" extent, we force the log to get the free transaction on disk before allowing the block to be reused... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 20+ messages in thread
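The in-memory "busy block" idea described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not XFS or ext2 code: the class name, its methods, and the `forced_syncs` counter are hypothetical, and "syncing the owner inode" is reduced to dropping entries from a dictionary. One deliberate difference from the XFS description: this sketch first looks for a non-busy block and only forces the old owner to disk when nothing else is free.

```python
class BusyBlockAllocator:
    """Toy allocator that avoids reusing blocks freed by an unsynced inode."""

    def __init__(self, free_blocks):
        self.free = set(free_blocks)
        self.busy = {}  # block number -> inode whose free is not yet on disk
        self.forced_syncs = 0

    def free_block(self, block, owner):
        # Free in memory, but the on-disk owner still points at the block.
        self.free.add(block)
        self.busy[block] = owner

    def inode_synced(self, owner):
        # The owner's metadata reached disk; its freed blocks are now safe.
        self.busy = {b: o for b, o in self.busy.items() if o != owner}

    def allocate(self):
        # Prefer a block that is not busy.
        for block in sorted(self.free):
            if block not in self.busy:
                self.free.remove(block)
                return block
        if not self.free:
            raise RuntimeError("no free blocks")
        # Only busy blocks remain: push the old owner to disk first, the
        # analogue of forcing the log before reusing a busy extent.
        block = min(self.free)
        self.forced_syncs += 1
        self.inode_synced(self.busy[block])
        self.free.remove(block)
        return block


if __name__ == "__main__":
    alloc = BusyBlockAllocator(free_blocks=[7])
    alloc.free_block(5, owner="A")  # truncating A frees block 5
    print(alloc.allocate())         # 7: block 5 is skipped while busy
    print(alloc.allocate())         # 5: handed out only after A is synced
```

Keying the busy entries by owner lets a single inode sync release all of that inode's freed blocks at once.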
* Re: Fix(es) for ext2 fsync bug 2007-02-14 20:31 ` David Chinner @ 2007-02-14 21:26 ` Dave Kleikamp 2007-02-14 23:32 ` David Chinner 0 siblings, 1 reply; 20+ messages in thread From: Dave Kleikamp @ 2007-02-14 21:26 UTC (permalink / raw) To: David Chinner Cc: Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler, Theodore Ts'o On Thu, 2007-02-15 at 07:31 +1100, David Chinner wrote: > On Wed, Feb 14, 2007 at 11:54:54AM -0800, Valerie Henson wrote: > > Just some quick notes on possible ways to fix the ext2 fsync bug that > > eXplode found. Whether or not anyone will bother to implement it is > > another matter. > > > > Background: The eXplode file system checker found a bug in ext2 fsync > > behavior. Do the following: truncate file A, create file B which > > reallocates one of A's old indirect blocks, fsync file B. If you then > > crash before file A's metadata is all written out, fsck will complete > > the truncate for file A... thereby deleting file B's data. So fsync > > file B doesn't guarantee data is on disk after a crash. Details: > > > > http://www.stanford.edu/~engler/explode-osdi06.pdf > > > > Two possible solutions I can think of: > > > > * Rearrange order of duplicate block checking and fixing file size in > > fsck. Not sure how hard this is. (Ted?) > > > > * Keep a set of "still allocated on disk" block bitmaps that gets > > flushed whenever a sync happens. Don't allocate these blocks. > > Journaling file systems already have to do this. > > You don't need anything on disk or to fsck to fix this problem - > just avoid it completely by keeping a list of recently truncated > blocks in memory and don't reuse them until the old owner inode is > sync'd to disk. I think that's pretty much what Val is suggesting. She suggests bitmaps rather than a list though. Maybe she should have used a better term than "flushed", as this list only needs to be cleared, rather than written to disk. 
-- David Kleikamp IBM Linux Technology Center ^ permalink raw reply [flat|nested] 20+ messages in thread
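The bitmap variant discussed above, where the structure lives only in memory and is cleared (not written out) when a sync completes, might look like this. It is a sketch under assumptions: `PendingFreeBitmap` and `pick_block` are hypothetical names, and "a sync happened" is represented by a single method call.

```python
class PendingFreeBitmap:
    """In-memory bitmap of blocks freed since the last sync (never on disk)."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)

    def mark_freed(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def is_pending(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def clear_on_sync(self):
        # After a full sync every earlier free is on disk: zero the bitmap.
        self.bits = bytearray(len(self.bits))


def pick_block(free_blocks, pending):
    """Return the lowest free block that is not pending, or None."""
    for block in sorted(free_blocks):
        if not pending.is_pending(block):
            return block
    return None
```

A bitmap costs a fixed amount of memory per block group and is cheap to test, while a list only costs memory for blocks that were actually freed; which is preferable depends on how many blocks are typically in flight.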
* Re: Fix(es) for ext2 fsync bug 2007-02-14 21:26 ` Dave Kleikamp @ 2007-02-14 23:32 ` David Chinner 0 siblings, 0 replies; 20+ messages in thread From: David Chinner @ 2007-02-14 23:32 UTC (permalink / raw) To: Dave Kleikamp Cc: David Chinner, Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler, Theodore Ts'o On Wed, Feb 14, 2007 at 03:26:22PM -0600, Dave Kleikamp wrote: > On Thu, 2007-02-15 at 07:31 +1100, David Chinner wrote: > > On Wed, Feb 14, 2007 at 11:54:54AM -0800, Valerie Henson wrote: > > > Just some quick notes on possible ways to fix the ext2 fsync bug that > > > eXplode found. Whether or not anyone will bother to implement it is > > > another matter. > > > > > > Background: The eXplode file system checker found a bug in ext2 fsync > > > behavior. Do the following: truncate file A, create file B which > > > reallocates one of A's old indirect blocks, fsync file B. If you then > > > crash before file A's metadata is all written out, fsck will complete > > > the truncate for file A... thereby deleting file B's data. So fsync > > > file B doesn't guarantee data is on disk after a crash. Details: > > > > > > http://www.stanford.edu/~engler/explode-osdi06.pdf > > > > > > Two possible solutions I can think of: > > > > > > * Rearrange order of duplicate block checking and fixing file size in > > > fsck. Not sure how hard this is. (Ted?) > > > > > > * Keep a set of "still allocated on disk" block bitmaps that gets > > > flushed whenever a sync happens. Don't allocate these blocks. > > > Journaling file systems already have to do this. > > > > You don't need anything on disk or to fsck to fix this problem - just > > avoid it completely by keeping a list of recently truncated blocks in > > memory and don't reuse them until the old owner inode is sync'd to disk. > > I think that's pretty much what Val is suggesting. She suggests bitmaps > rather than a list though. 
Maybe she should have used a better term than > "flushed", as this list only needs to be cleared, rather than written to > disk. Yeah, probably - I parsed the "still allocated on disk" block bitmaps phrase differently from what may have been intended... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-14 19:54 Fix(es) for ext2 fsync bug Valerie Henson 2007-02-14 20:31 ` David Chinner @ 2007-02-14 21:08 ` sfaibish 2007-02-15 14:20 ` Theodore Tso 2 siblings, 0 replies; 20+ messages in thread From: sfaibish @ 2007-02-14 21:08 UTC (permalink / raw) To: Valerie Henson, linux-fsdevel Cc: Can Sar, Junfeng Yang, Dawson Engler, Theodore Ts'o, kernel list Val, Maybe it is not only our (FS people) problem. We probably need to bring the kernel people judge as ext2 and ext3 are the base Linux FS. I add the kernel list for opinion. /Sorin On Wed, 14 Feb 2007 14:54:54 -0500, Valerie Henson <val_henson@linux.intel.com> wrote: > Just some quick notes on possible ways to fix the ext2 fsync bug that > eXplode found. Whether or not anyone will bother to implement it is > another matter. > > Background: The eXplode file system checker found a bug in ext2 fsync > behavior. Do the following: truncate file A, create file B which > reallocates one of A's old indirect blocks, fsync file B. If you then > crash before file A's metadata is all written out, fsck will complete > the truncate for file A... thereby deleting file B's data. So fsync > file B doesn't guarantee data is on disk after a crash. Details: > > http://www.stanford.edu/~engler/explode-osdi06.pdf > > Two possible solutions I can think of: > > * Rearrange order of duplicate block checking and fixing file size in > fsck. Not sure how hard this is. (Ted?) > > * Keep a set of "still allocated on disk" block bitmaps that gets > flushed whenever a sync happens. Don't allocate these blocks. > Journaling file systems already have to do this. > > -VAL > - > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" > in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-14 19:54 Fix(es) for ext2 fsync bug Valerie Henson 2007-02-14 20:31 ` David Chinner 2007-02-14 21:08 ` sfaibish @ 2007-02-15 14:20 ` Theodore Tso 2007-02-15 15:09 ` Dave Kleikamp 2007-02-20 21:30 ` Valerie Henson 2 siblings, 2 replies; 20+ messages in thread From: Theodore Tso @ 2007-02-15 14:20 UTC (permalink / raw) To: Valerie Henson; +Cc: linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler On Wed, Feb 14, 2007 at 11:54:54AM -0800, Valerie Henson wrote: > Background: The eXplode file system checker found a bug in ext2 fsync > behavior. Do the following: truncate file A, create file B which > reallocates one of A's old indirect blocks, fsync file B. If you then > crash before file A's metadata is all written out, fsck will complete > the truncate for file A... thereby deleting file B's data. So fsync > file B doesn't guarantee data is on disk after a crash. Details: It's actually not the case that fsck will complete the truncate for file A. The problem is that while e2fsck is processing indirect blocks in pass 1, the block which is marked as file A's indirect block (but which actually contains file B's data) gets "fixed" when e2fsck sees block numbers which look like illegal block numbers. So this ends up corrupting file B's data. This is actually a legal end result, BTW, since POSIX states that the result of fsync() is undefined if the system crashes. Technically fsync() did actually guarantee that file B's data is "on disk"; the problem is that e2fsck would corrupt the data afterwards. Ironically, fsync()'ing file B actually makes it more likely that it might get corrupted afterwards, since normally filesystem metadata gets sync'ed out on 5 second intervals, while data gets sync'ed out at 30 second intervals. > * Rearrange order of duplicate block checking and fixing file size in > fsck. Not sure how hard this is. (Ted?) It's not a matter of changing when we deal with fixing the file size, as described above. 
At fsck time, we would need to keep backup copies of any indirect blocks that get modified for whatever reason, and then in pass 1D, when we clone a block that has been claimed by multiple inodes, the inodes which claim the block as a data block should get a copy of the block before it was modified by e2fsck. > * Keep a set of "still allocated on disk" block bitmaps that gets > flushed whenever a sync happens. Don't allocate these blocks. > Journaling file systems already have to do this. A list would be more efficient, as others have pointed out. That would work, although the hard part is knowing when entries can be removed from the list. The machinery for knowing when metadata has been updated isn't present in ext2, and that's a fair amount of complexity. You could clear the list/bitmap after the 5 second metadata flush command has been kicked off, or if you associate a data block with the previous inode's owner, you could clear the entry when the inode's dirty bit has been cleared, but that doesn't completely get rid of the race unless you tie it to when the write has completed (and this assumes write barriers to make sure the block was actually flushed to the media). Another very heavyweight approach would be to simply force a full sync of the filesystem whenever fsync() is called. Not pretty, and without the proper write ordering, the race is still potentially there. I'd say that the best way to handle this is in fsck, but quite frankly it's a relatively low priority "bug" to handle, since a much simpler workaround is to tell people to use ext3 instead. Regards, - Ted ^ permalink raw reply [flat|nested] 20+ messages in thread
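The fsck-side idea in the message above (save an indirect block before "fixing" it, then hand the saved copy to any inode that claims the same block as data) can be sketched as below. This is a toy model, not e2fsck code: blocks are modelled as lists of integers, and `is_legal`, `pass1_fix_indirect`, and `pass1d_clone_for_data_owner` are hypothetical names that only borrow the pass numbering from the message.

```python
def is_legal(entry, nblocks):
    # 0 means "no block"; anything else must be a valid block number.
    return 0 <= entry < nblocks


def pass1_fix_indirect(blocks, block_no, nblocks, backups):
    """Zero illegal entries in an indirect block, saving the original first."""
    original = blocks[block_no]
    fixed = [e if is_legal(e, nblocks) else 0 for e in original]
    if fixed != original:
        # Keep the pre-fix contents; only the first backup is retained.
        backups.setdefault(block_no, list(original))
        blocks[block_no] = fixed


def pass1d_clone_for_data_owner(blocks, block_no, new_block_no, backups):
    """Give an inode that claims block_no as data a private, unmodified copy."""
    source = backups.get(block_no, blocks[block_no])
    blocks[new_block_no] = list(source)
    return new_block_no


if __name__ == "__main__":
    # Block 100 really holds file B's data, misread as A's block pointers.
    blocks = {100: [7, 999999, 12]}
    backups = {}
    pass1_fix_indirect(blocks, 100, nblocks=1000, backups=backups)
    pass1d_clone_for_data_owner(blocks, 100, 200, backups)
    print(blocks[100], blocks[200])
```

In this model the data owner ends up with the bytes it fsync'ed, while the inode that claims the block as an indirect block keeps the "fixed" version.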
* Re: Fix(es) for ext2 fsync bug 2007-02-15 14:20 ` Theodore Tso @ 2007-02-15 15:09 ` Dave Kleikamp 2007-02-15 15:59 ` sfaibish 2007-02-20 21:13 ` Valerie Henson 2007-02-20 21:30 ` Valerie Henson 1 sibling, 2 replies; 20+ messages in thread From: Dave Kleikamp @ 2007-02-15 15:09 UTC (permalink / raw) To: Theodore Tso Cc: Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler On Thu, 2007-02-15 at 09:20 -0500, Theodore Tso wrote: > Another very heavyweight approach would be to simply force a full sync > of the filesystem whenever fysnc() is called. Not pretty, and without > the proper write ordering, the race is still potentially there. I don't think this race is an issue, in that it would require the crash to happen before the fsync completed, so there would be no expectation that the data is safe. It's a moot point, since I don't think this is an acceptable solution anyway. > I'd say that the best way to handle this is in fsck, but quite frankly > it's relatively low priority "bug" to handle, since a much simpler > workaround is to tell people to use ext3 instead. Right. Who's still using ext2? -- David Kleikamp IBM Linux Technology Center ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-15 15:09 ` Dave Kleikamp @ 2007-02-15 15:59 ` sfaibish 2007-02-15 16:39 ` Dave Kleikamp 2007-02-15 18:54 ` Dawson Engler 2007-02-20 21:13 ` Valerie Henson 1 sibling, 2 replies; 20+ messages in thread From: sfaibish @ 2007-02-15 15:59 UTC (permalink / raw) To: Dave Kleikamp, Theodore Tso Cc: Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler On Thu, 15 Feb 2007 10:09:22 -0500, Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote: > On Thu, 2007-02-15 at 09:20 -0500, Theodore Tso wrote: > >> Another very heavyweight approach would be to simply force a full sync >> of the filesystem whenever fysnc() is called. Not pretty, and without >> the proper write ordering, the race is still potentially there. > > I don't think this race is an issue, in that it would require the crash > to happen before the fsync completed, so there would be no expectation > that the data is safe. It's a moot point, since I don't think this is > an acceptable solution anyway. > >> I'd say that the best way to handle this is in fsck, but quite frankly >> it's relatively low priority "bug" to handle, since a much simpler >> workaround is to tell people to use ext3 instead. > > Right. Who's still using ext2? It was my understanding from Dawson's presentation that ext3 and jfs have the same problem. It is not an ext2-only problem. Also, whatever solution we adopt, we need to be sure that we test it using the eXplode methodology. /Sorin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-15 15:59 ` sfaibish @ 2007-02-15 16:39 ` Dave Kleikamp 2007-02-15 17:15 ` Theodore Tso [not found] ` <21e789ec0702151111v4cb2aa8dqa168c886cb909c9@mail.gmail.com> 2007-02-15 18:54 ` Dawson Engler 1 sibling, 2 replies; 20+ messages in thread From: Dave Kleikamp @ 2007-02-15 16:39 UTC (permalink / raw) To: sfaibish Cc: Theodore Tso, Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler On Thu, 2007-02-15 at 10:59 -0500, sfaibish wrote: > On Thu, 15 Feb 2007 10:09:22 -0500, Dave Kleikamp > <shaggy@linux.vnet.ibm.com> wrote: > > > On Thu, 2007-02-15 at 09:20 -0500, Theodore Tso wrote: > > > >> Another very heavyweight approach would be to simply force a full sync > >> of the filesystem whenever fysnc() is called. Not pretty, and without > >> the proper write ordering, the race is still potentially there. > > > > I don't think this race is an issue, in that it would require the crash > > to happen before the fsync completed, so there would be no expectation > > that the data is safe. It's a moot point, since I don't think this is > > an acceptable solution anyway. > > > >> I'd say that the best way to handle this is in fsck, but quite frankly > >> it's relatively low priority "bug" to handle, since a much simpler > >> workaround is to tell people to use ext3 instead. > > > > Right. Who's still using ext2? > It was my understanding from the persentation of Dawson that ext3 and jfs > have > same problem. Hmm. If jfs has the problem, it is a bug. jfs is designed to handle this correctly. I'm pretty sure I've fixed at least one bug that eXplode has uncovered in the past. I'm not sure what was mentioned in the presentation though. I'd like any information about current problems in jfs. > It is not an ext2 only problem. Also whatever solution we > adopt > we need to be sure that we test it using the eXplode methodology. 
> > /Sorin -- David Kleikamp IBM Linux Technology Center ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-15 16:39 ` Dave Kleikamp @ 2007-02-15 17:15 ` Theodore Tso 2007-02-15 17:52 ` sfaibish [not found] ` <21e789ec0702151111v4cb2aa8dqa168c886cb909c9@mail.gmail.com> 1 sibling, 1 reply; 20+ messages in thread From: Theodore Tso @ 2007-02-15 17:15 UTC (permalink / raw) To: Dave Kleikamp Cc: sfaibish, Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler On Thu, Feb 15, 2007 at 10:39:02AM -0600, Dave Kleikamp wrote: > > It was my understanding from the persentation of Dawson that ext3 and jfs > > have ame problem. > > Hmm. If jfs has the problem, it is a bug. jfs is designed to handle > this correctly. I'm pretty sure I've fixed at least one bug that > eXplode has uncovered in the past. I'm not sure what was mentioned in > the presentation though. I'd like any information about current > problems in jfs. That was not my understanding of the charts that were presented earlier this week. Ext3 journaling code will deal with this case explicitly, just as jfs does. - Ted ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-15 17:15 ` Theodore Tso @ 2007-02-15 17:52 ` sfaibish 0 siblings, 0 replies; 20+ messages in thread From: sfaibish @ 2007-02-15 17:52 UTC (permalink / raw) To: Theodore Tso, Dave Kleikamp Cc: Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler On Thu, 15 Feb 2007 12:15:59 -0500, Theodore Tso <tytso@mit.edu> wrote: > On Thu, Feb 15, 2007 at 10:39:02AM -0600, Dave Kleikamp wrote: >> > It was my understanding from the persentation of Dawson that ext3 and >> jfs >> > have ame problem. >> >> Hmm. If jfs has the problem, it is a bug. jfs is designed to handle >> this correctly. I'm pretty sure I've fixed at least one bug that >> eXplode has uncovered in the past. I'm not sure what was mentioned in >> the presentation though. I'd like any information about current >> problems in jfs. > > That was not my understanding of the charts that were presented > earlier this week. Ext3 journaling code will deal with this case > explicitly, just as jfs does. My mistake: there were fsync bugs in JFS and ext2 that cannot be fixed by fsck. Not same for JFS and ext2. See quote: "There were two interesting fsync errors, one in JFS and one in ext2. The ext2 bug is a case where an implementation error points out a deeper design problem." ... "We found two bugs (one in JFS, one in Reiser4) where crashed disks cannot be recovered by fsck." > > - Ted > > -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/ ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <21e789ec0702151111v4cb2aa8dqa168c886cb909c9@mail.gmail.com>]
* Re: Fix(es) for ext2 fsync bug [not found] ` <21e789ec0702151111v4cb2aa8dqa168c886cb909c9@mail.gmail.com> @ 2007-02-15 19:26 ` Dave Kleikamp 0 siblings, 0 replies; 20+ messages in thread From: Dave Kleikamp @ 2007-02-15 19:26 UTC (permalink / raw) To: Junfeng Yang Cc: sfaibish, Theodore Tso, Valerie Henson, linux-fsdevel, Can Sar, Dawson Engler On Thu, 2007-02-15 at 11:11 -0800, Junfeng Yang wrote: > Hmm. If jfs has the problem, it is a bug. jfs is designed to > handle > this correctly. I'm pretty sure I've fixed at least one bug > that > eXplode has uncovered in the past. I'm not sure what was > mentioned in > the presentation though. I'd like any information about > current > problems in jfs. > > > I believe you have fixed the JFS fsync bug, Dave. It was caused by > reusing a directory inode as a file inode. If the machine crashes > later, fsck would think this file is a directory, and clear all its > data. Yeah. That one was fixed a while back. Thanks for clearing this up. Shaggy -- David Kleikamp IBM Linux Technology Center ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-15 15:59 ` sfaibish 2007-02-15 16:39 ` Dave Kleikamp @ 2007-02-15 18:54 ` Dawson Engler [not found] ` <21e789ec0702151118x1c6af801gd34981d72db0f5b2@mail.gmail.com> 1 sibling, 1 reply; 20+ messages in thread From: Dawson Engler @ 2007-02-15 18:54 UTC (permalink / raw) To: sfaibish Cc: Dave Kleikamp, Theodore Tso, Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang > It was my understanding from the persentation of Dawson that ext3 and jfs > have > same problem. It is not an ext2 only problem. Also whatever solution we > adopt > we need to be sure that we test it using the eXplode methodology. apologies for dropping in randomly into the discussion: if this is about the crash-during-recovery bugs, the specific ones i discussed have been fixed in jfs and ext3 (junfeng: this is correct, right?). i should have made this clear in the talk (along with many other things: grabbing junfeng's slides and blathering about them w/o preparation is not the right algorithm for giving a good talk.) the other error --- fsync of file data on ext2 that reuses a freed inode from a file that was not flushed to disk --- is still open. ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <21e789ec0702151118x1c6af801gd34981d72db0f5b2@mail.gmail.com>]
[parent not found: <21e789ec0702151128x744f61e5lb24d2da972af185a@mail.gmail.com>]
* Re: Fix(es) for ext2 fsync bug [not found] ` <21e789ec0702151128x744f61e5lb24d2da972af185a@mail.gmail.com> @ 2007-02-16 1:18 ` Theodore Tso 0 siblings, 0 replies; 20+ messages in thread From: Theodore Tso @ 2007-02-16 1:18 UTC (permalink / raw) To: Junfeng Yang Cc: engler, sfaibish, Dave Kleikamp, Valerie Henson, linux-fsdevel, Can Sar On Thu, Feb 15, 2007 at 11:28:46AM -0800, Junfeng Yang wrote: > > Actually, we found a crash-during-recovery bug in ext3 too. It's a race > between resetting the journal super block and replay of the journal. This > bug was fixed by Ted long time ago (3 years?). That was found in your original work (using UML) not the more recent work using EXPLODE, correct? - Ted ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-15 15:09 ` Dave Kleikamp 2007-02-15 15:59 ` sfaibish @ 2007-02-20 21:13 ` Valerie Henson [not found] ` <21e789ec0702201330x1c2706b7kcd055b97cb37e0e@mail.gmail.com> 1 sibling, 1 reply; 20+ messages in thread From: Valerie Henson @ 2007-02-20 21:13 UTC (permalink / raw) To: Dave Kleikamp Cc: Theodore Tso, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler On Thu, Feb 15, 2007 at 09:09:22AM -0600, Dave Kleikamp wrote: > > Right. Who's still using ext2? Google. (GoogleFS runs on top of ext2.) -VAL ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <21e789ec0702201330x1c2706b7kcd055b97cb37e0e@mail.gmail.com>]
* Re: Fix(es) for ext2 fsync bug [not found] ` <21e789ec0702201330x1c2706b7kcd055b97cb37e0e@mail.gmail.com> @ 2007-02-20 21:39 ` Valerie Henson 2007-02-20 21:47 ` Dawson Engler 2007-02-20 22:25 ` Dave Kleikamp 0 siblings, 2 replies; 20+ messages in thread From: Valerie Henson @ 2007-02-20 21:39 UTC (permalink / raw) To: Junfeng Yang Cc: Dave Kleikamp, Theodore Tso, linux-fsdevel, Can Sar, Dawson Engler On Tue, Feb 20, 2007 at 01:30:25PM -0800, Junfeng Yang wrote: > On 2/20/07, Valerie Henson <val_henson@linux.intel.com> wrote: > > > >Google. (GoogleFS runs on top of ext2.) > > It's surprising to know that... I guess they reply on GoogleFS's own > replication and checksumming for consistency. Yep, they just want a local file system with ultrafast on-line performance. They don't care about recovery time particularly because of the GoogleFS replication (although I heard rumors they have some fast fsck scheme, maybe resembling the dirty bit stuff I did last year). -VAL ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-20 21:39 ` Valerie Henson @ 2007-02-20 21:47 ` Dawson Engler 2007-02-20 22:25 ` Dave Kleikamp 1 sibling, 0 replies; 20+ messages in thread From: Dawson Engler @ 2007-02-20 21:47 UTC (permalink / raw) To: Valerie Henson Cc: Junfeng Yang, Dave Kleikamp, Theodore Tso, linux-fsdevel, Can Sar > On Tue, Feb 20, 2007 at 01:30:25PM -0800, Junfeng Yang wrote: > > On 2/20/07, Valerie Henson <val_henson@linux.intel.com> wrote: > > > > > >Google. (GoogleFS runs on top of ext2.) > > > > It's surprising to know that... I guess they reply on GoogleFS's own > > replication and checksumming for consistency. > > Yep, they just want a local file system with ultrafast on-line > performance. They don't care about recovery time particularly because > of the GoogleFS replication (although I heard rumors they have some > fast fsck scheme, maybe resembling the dirty bit stuff I did last > year). Actually, according to the GFS paper (which may be out of date), for the chunkservers that is true, but for their "master" they really want fast recovery as a way to reduce mean-time-to-repair (and thus increase availability). Though, given that they have shadow masters, perhaps everyone is happy as long as master recovery is usually fast. Dawson ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-20 21:39 ` Valerie Henson 2007-02-20 21:47 ` Dawson Engler @ 2007-02-20 22:25 ` Dave Kleikamp 1 sibling, 0 replies; 20+ messages in thread From: Dave Kleikamp @ 2007-02-20 22:25 UTC (permalink / raw) To: Valerie Henson Cc: Junfeng Yang, Theodore Tso, linux-fsdevel, Can Sar, Dawson Engler On Tue, 2007-02-20 at 21:39 +0000, Valerie Henson wrote: > On Tue, Feb 20, 2007 at 01:30:25PM -0800, Junfeng Yang wrote: > > On 2/20/07, Valerie Henson <val_henson@linux.intel.com> wrote: > > > > > >Google. (GoogleFS runs on top of ext2.) > > > > It's surprising to know that... I guess they reply on GoogleFS's own > > replication and checksumming for consistency. > > Yep, they just want a local file system with ultrafast on-line > performance. They don't care about recovery time particularly because > of the GoogleFS replication (although I heard rumors they have some > fast fsck scheme, maybe resembling the dirty bit stuff I did last > year). I wonder if they would consider this an important bug? I know nothing about GoogleFS, but I would guess that they have more sophisticated recovery than relying on an fsync shortly before a crash to ensure data integrity. Shaggy -- David Kleikamp IBM Linux Technology Center ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-15 14:20 ` Theodore Tso 2007-02-15 15:09 ` Dave Kleikamp @ 2007-02-20 21:30 ` Valerie Henson 2007-02-20 22:12 ` Erez Zadok 1 sibling, 1 reply; 20+ messages in thread From: Valerie Henson @ 2007-02-20 21:30 UTC (permalink / raw) To: Theodore Tso; +Cc: linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler On Thu, Feb 15, 2007 at 09:20:21AM -0500, Theodore Tso wrote: > > It's actually not the case that fsck will complete the truncate for > file A. The problem is that while e2fsck is processing indirect > blocks in pass 1, the block which is marked as file A's indirect block > (but which actually contain's file B's data) gets "fixed" when e2fsck > sees block numbers which look like illegal block numbers. So this > ends up corrupting file B's data. Ah, that's what happens. Thanks for the clarification. > This is actually legal end result, BTW, since it's POSIX states the > result of fsync() is undefined if the system crashes. Technically And POSIX also states that sync() is only required to schedule the writes, but may return before the actual writing is done. Looks like the only way you can guarantee data is on-disk according to POSIX is to reboot the system after every synchronous write. Man, we file systems developers sure have it easy! -VAL ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fix(es) for ext2 fsync bug 2007-02-20 21:30 ` Valerie Henson @ 2007-02-20 22:12 ` Erez Zadok 0 siblings, 0 replies; 20+ messages in thread From: Erez Zadok @ 2007-02-20 22:12 UTC (permalink / raw) To: Valerie Henson Cc: Theodore Tso, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler In message <20070220213014.GC5264@nifty>, Valerie Henson writes: > On Thu, Feb 15, 2007 at 09:20:21AM -0500, Theodore Tso wrote: > And POSIX also states that sync() is only required to schedule the > writes, but may return before the actual writing is done. Looks like One more reason to form a group to discuss POSIX updates/changes (as per LSF last week). > the only way you can guarantee data is on-disk according to POSIX is > to reboot the system after every synchronous write. Man, we file > systems developers sure have it easy! No need to be that extreme. :-) It should be enough to just unmount all file systems, unload all fs and disk drivers, then reload+remount everything. No? > -VAL Erez. ^ permalink raw reply [flat|nested] 20+ messages in thread