* Fix(es) for ext2 fsync bug
@ 2007-02-14 19:54 Valerie Henson
  2007-02-14 20:31 ` David Chinner
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Valerie Henson @ 2007-02-14 19:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Can Sar, Junfeng Yang, Dawson Engler, Theodore Ts'o

Just some quick notes on possible ways to fix the ext2 fsync bug that
eXplode found.  Whether or not anyone will bother to implement it is
another matter.

Background: The eXplode file system checker found a bug in ext2 fsync
behavior.  Do the following: truncate file A, create file B which
reallocates one of A's old indirect blocks, fsync file B.  If you then
crash before file A's metadata is all written out, fsck will complete
the truncate for file A... thereby deleting file B's data.  So fsync
file B doesn't guarantee data is on disk after a crash.  Details:

http://www.stanford.edu/~engler/explode-osdi06.pdf 

Two possible solutions I can think of:

* Rearrange order of duplicate block checking and fixing file size in
  fsck.  Not sure how hard this is. (Ted?)

* Keep a set of "still allocated on disk" block bitmaps that gets
  flushed whenever a sync happens.  Don't allocate these blocks.
  Journaling file systems already have to do this.
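
A minimal sketch of the second idea, assuming an in-memory bitmap that is simply cleared (not written out) once a sync completes. This is a toy Python model, not ext2 code; `PendingFreeAllocator` and its method names are hypothetical.

```python
class PendingFreeAllocator:
    """Toy block allocator that refuses to reuse recently freed blocks."""

    def __init__(self, nblocks):
        self.in_use = [False] * nblocks         # in-memory allocation bitmap
        self.still_on_disk = [False] * nblocks  # freed in memory, free not yet on disk

    def alloc(self):
        for blk, used in enumerate(self.in_use):
            if not used and not self.still_on_disk[blk]:
                self.in_use[blk] = True
                return blk
        raise RuntimeError("no reusable block; a real allocator might force a sync")

    def free(self, blk):
        self.in_use[blk] = False
        # The old owner's on-disk metadata may still point at this block.
        self.still_on_disk[blk] = True

    def sync_completed(self):
        # All dirty metadata has reached disk: clear the bitmap in memory.
        self.still_on_disk = [False] * len(self.in_use)


a = PendingFreeAllocator(4)
x = a.alloc()        # first free block
a.free(x)
y = a.alloc()        # skips x, which is still "allocated on disk"
a.sync_completed()
z = a.alloc()        # x is reusable again
```

The cost of this approach is one extra bit per block plus a hook at sync completion, which is the part ext2 does not obviously provide.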

-VAL


* Re: Fix(es) for ext2 fsync bug
  2007-02-14 19:54 Fix(es) for ext2 fsync bug Valerie Henson
@ 2007-02-14 20:31 ` David Chinner
  2007-02-14 21:26   ` Dave Kleikamp
  2007-02-14 21:08 ` sfaibish
  2007-02-15 14:20 ` Theodore Tso
  2 siblings, 1 reply; 20+ messages in thread
From: David Chinner @ 2007-02-14 20:31 UTC (permalink / raw)
  To: Valerie Henson
  Cc: linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler, Theodore Ts'o

On Wed, Feb 14, 2007 at 11:54:54AM -0800, Valerie Henson wrote:
> Just some quick notes on possible ways to fix the ext2 fsync bug that
> eXplode found.  Whether or not anyone will bother to implement it is
> another matter.
> 
> Background: The eXplode file system checker found a bug in ext2 fsync
> behavior.  Do the following: truncate file A, create file B which
> reallocates one of A's old indirect blocks, fsync file B.  If you then
> crash before file A's metadata is all written out, fsck will complete
> the truncate for file A... thereby deleting file B's data.  So fsync
> file B doesn't guarantee data is on disk after a crash.  Details:
> 
> http://www.stanford.edu/~engler/explode-osdi06.pdf 
> 
> Two possible solutions I can think of:
> 
> * Rearrange order of duplicate block checking and fixing file size in
>   fsck.  Not sure how hard this is. (Ted?)
> 
> * Keep a set of "still allocated on disk" block bitmaps that gets
>   flushed whenever a sync happens.  Don't allocate these blocks.
>   Journaling file systems already have to do this.

You don't need anything on disk or to fsck to fix this problem -
just avoid it completely by keeping a list of recently truncated
blocks in memory and don't reuse them until the old owner inode is
sync'd to disk.

XFS solves this problem in exactly this manner - it keeps a list of
recently freed blocks whose freeing transactions have not yet been
committed to disk to prevent them from being reused before it is
safe to. See xfs_alloc_search_busy() and callers - if we try to
reallocate a "busy" extent, we force the log to get the free
transaction on disk before allowing the block to be reused...
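
The busy-extent idea described above can be sketched roughly as follows. This is a toy Python model under stated assumptions, not the XFS implementation: `BusyExtentAllocator`, `force_log`, and the single-block granularity are all hypothetical simplifications.

```python
class BusyExtentAllocator:
    """Toy model: a freed block stays 'busy' until its free is on disk."""

    def __init__(self, free_blocks):
        self.free_blocks = list(free_blocks)
        self.busy = set()      # freed, but the freeing transaction is not yet on disk
        self.log_forces = 0    # how many times we had to wait for the log

    def free(self, blk):
        self.free_blocks.append(blk)
        self.busy.add(blk)

    def force_log(self):
        # Stand-in for pushing all pending free transactions to disk.
        self.log_forces += 1
        self.busy.clear()

    def alloc(self):
        blk = self.free_blocks.pop(0)
        if blk in self.busy:
            # Reusing a busy block: make the free durable first.
            self.force_log()
        return blk


alloc = BusyExtentAllocator([7])
first = alloc.alloc()    # block 7, never freed before: no log force
alloc.free(first)
second = alloc.alloc()   # block 7 again, but only after forcing the log
```

Unlike a scheme that skips busy blocks, this one may reuse them immediately and pays for it with a synchronous log write.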

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Fix(es) for ext2 fsync bug
  2007-02-14 19:54 Fix(es) for ext2 fsync bug Valerie Henson
  2007-02-14 20:31 ` David Chinner
@ 2007-02-14 21:08 ` sfaibish
  2007-02-15 14:20 ` Theodore Tso
  2 siblings, 0 replies; 20+ messages in thread
From: sfaibish @ 2007-02-14 21:08 UTC (permalink / raw)
  To: Valerie Henson, linux-fsdevel
  Cc: Can Sar, Junfeng Yang, Dawson Engler, Theodore Ts'o, kernel list

Val,

Maybe it is not only our (FS people) problem. We probably need to
bring in the kernel people to judge, as ext2 and ext3 are the base Linux
file systems. I have added the kernel list for opinions.

/Sorin

On Wed, 14 Feb 2007 14:54:54 -0500, Valerie Henson  
<val_henson@linux.intel.com> wrote:

> Just some quick notes on possible ways to fix the ext2 fsync bug that
> eXplode found.  Whether or not anyone will bother to implement it is
> another matter.
>
> Background: The eXplode file system checker found a bug in ext2 fsync
> behavior.  Do the following: truncate file A, create file B which
> reallocates one of A's old indirect blocks, fsync file B.  If you then
> crash before file A's metadata is all written out, fsck will complete
> the truncate for file A... thereby deleting file B's data.  So fsync
> file B doesn't guarantee data is on disk after a crash.  Details:
>
> http://www.stanford.edu/~engler/explode-osdi06.pdf
>
> Two possible solutions I can think of:
>
> * Rearrange order of duplicate block checking and fixing file size in
>   fsck.  Not sure how hard this is. (Ted?)
>
> * Keep a set of "still allocated on disk" block bitmaps that gets
>   flushed whenever a sync happens.  Don't allocate these blocks.
>   Journaling file systems already have to do this.
>
> -VAL


* Re: Fix(es) for ext2 fsync bug
  2007-02-14 20:31 ` David Chinner
@ 2007-02-14 21:26   ` Dave Kleikamp
  2007-02-14 23:32     ` David Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Kleikamp @ 2007-02-14 21:26 UTC (permalink / raw)
  To: David Chinner
  Cc: Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang,
	Dawson Engler, Theodore Ts'o

On Thu, 2007-02-15 at 07:31 +1100, David Chinner wrote:
> On Wed, Feb 14, 2007 at 11:54:54AM -0800, Valerie Henson wrote:
> > Just some quick notes on possible ways to fix the ext2 fsync bug that
> > eXplode found.  Whether or not anyone will bother to implement it is
> > another matter.
> > 
> > Background: The eXplode file system checker found a bug in ext2 fsync
> > behavior.  Do the following: truncate file A, create file B which
> > reallocates one of A's old indirect blocks, fsync file B.  If you then
> > crash before file A's metadata is all written out, fsck will complete
> > the truncate for file A... thereby deleting file B's data.  So fsync
> > file B doesn't guarantee data is on disk after a crash.  Details:
> > 
> > http://www.stanford.edu/~engler/explode-osdi06.pdf 
> > 
> > Two possible solutions I can think of:
> > 
> > * Rearrange order of duplicate block checking and fixing file size in
> >   fsck.  Not sure how hard this is. (Ted?)
> > 
> > * Keep a set of "still allocated on disk" block bitmaps that gets
> >   flushed whenever a sync happens.  Don't allocate these blocks.
> >   Journaling file systems already have to do this.
> 
> You don't need anything on disk or to fsck to fix this problem -
> just avoid it completely by keeping a list of recently truncated
> blocks in memory and don't reuse them until the old owner inode is
> sync'd to disk.

I think that's pretty much what Val is suggesting.  She suggests bitmaps
rather than a list though.  Maybe she should have used a better term
than "flushed", as this list only needs to be cleared, rather than
written to disk.

-- 
David Kleikamp
IBM Linux Technology Center



* Re: Fix(es) for ext2 fsync bug
  2007-02-14 21:26   ` Dave Kleikamp
@ 2007-02-14 23:32     ` David Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: David Chinner @ 2007-02-14 23:32 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: David Chinner, Valerie Henson, linux-fsdevel, Can Sar,
	Junfeng Yang, Dawson Engler, Theodore Ts'o

On Wed, Feb 14, 2007 at 03:26:22PM -0600, Dave Kleikamp wrote:
> On Thu, 2007-02-15 at 07:31 +1100, David Chinner wrote:
> > On Wed, Feb 14, 2007 at 11:54:54AM -0800, Valerie Henson wrote:
> > > Just some quick notes on possible ways to fix the ext2 fsync bug that
> > > eXplode found.  Whether or not anyone will bother to implement it is
> > > another matter.
> > > 
> > > Background: The eXplode file system checker found a bug in ext2 fsync
> > > behavior.  Do the following: truncate file A, create file B which
> > > reallocates one of A's old indirect blocks, fsync file B.  If you then
> > > crash before file A's metadata is all written out, fsck will complete
> > > the truncate for file A... thereby deleting file B's data.  So fsync
> > > file B doesn't guarantee data is on disk after a crash.  Details:
> > > 
> > > http://www.stanford.edu/~engler/explode-osdi06.pdf 
> > > 
> > > Two possible solutions I can think of:
> > > 
> > > * Rearrange order of duplicate block checking and fixing file size in
> > > fsck.  Not sure how hard this is. (Ted?)
> > > 
> > > * Keep a set of "still allocated on disk" block bitmaps that gets
> > > flushed whenever a sync happens.  Don't allocate these blocks.
> > > Journaling file systems already have to do this.
> > 
> > You don't need anything on disk or to fsck to fix this problem - just
> > avoid it completely by keeping a list of recently truncated blocks in
> > memory and don't reuse them until the old owner inode is sync'd to disk.
> 
> I think that's pretty much what Val is suggesting.  She suggests bitmaps
> rather than a list though.  Maybe she should have used a better term than
> "flushed", as this list only needs to be cleared, rather than written to
> disk.

Yeah, it probably was - I parsed the "still allocated on disk" block
bitmaps phrase differently from what may have been intended...

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Fix(es) for ext2 fsync bug
  2007-02-14 19:54 Fix(es) for ext2 fsync bug Valerie Henson
  2007-02-14 20:31 ` David Chinner
  2007-02-14 21:08 ` sfaibish
@ 2007-02-15 14:20 ` Theodore Tso
  2007-02-15 15:09   ` Dave Kleikamp
  2007-02-20 21:30   ` Valerie Henson
  2 siblings, 2 replies; 20+ messages in thread
From: Theodore Tso @ 2007-02-15 14:20 UTC (permalink / raw)
  To: Valerie Henson; +Cc: linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler

On Wed, Feb 14, 2007 at 11:54:54AM -0800, Valerie Henson wrote:
> Background: The eXplode file system checker found a bug in ext2 fsync
> behavior.  Do the following: truncate file A, create file B which
> reallocates one of A's old indirect blocks, fsync file B.  If you then
> crash before file A's metadata is all written out, fsck will complete
> the truncate for file A... thereby deleting file B's data.  So fsync
> file B doesn't guarantee data is on disk after a crash.  Details:

It's actually not the case that fsck will complete the truncate for
file A.  The problem is that while e2fsck is processing indirect
blocks in pass 1, the block which is marked as file A's indirect block
(but which actually contains file B's data) gets "fixed" when e2fsck
sees block numbers which look like illegal block numbers.  So this
ends up corrupting file B's data.
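
To make the failure mode concrete, here is a toy Python sketch of the effect described above. It is not e2fsck code: the block size, the filesystem size `NBLOCKS`, and the "zero any illegal entry" rule are simplifying assumptions chosen for illustration.

```python
import struct

NBLOCKS = 1000   # hypothetical filesystem size in blocks

def fix_indirect_block(raw):
    """Treat a 16-byte block as four 32-bit block numbers and zero illegal ones."""
    entries = struct.unpack("<4I", raw)
    fixed = [e if e < NBLOCKS else 0 for e in entries]
    return struct.pack("<4I", *fixed)

# Block 50 used to be file A's indirect block and now holds B's fsync'ed data.
disk = {50: b"hello, file B!!!"}
# A's on-disk inode predates the truncate, so it still names block 50.
stale_inode_a = {"indirect": 50}

before = disk[50]
disk[50] = fix_indirect_block(disk[stale_inode_a["indirect"]])
```

Because ordinary file data almost never looks like a list of valid block numbers, every entry is "repaired" and B's data is overwritten even though it did reach the disk.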

This is actually a legal end result, BTW, since POSIX states that the
result of fsync() is undefined if the system crashes.  Technically
fsync() did actually guarantee that file B's data is "on disk"; the
problem is that e2fsck would corrupt the data afterwards.  Ironically,
fsync()'ing file B actually makes it more likely that it might get
corrupted afterwards, since normally filesystem metadata gets sync'ed
out on 5 second intervals, while data gets sync'ed out at 30 second
intervals.

> * Rearrange order of duplicate block checking and fixing file size in
>   fsck.  Not sure how hard this is. (Ted?)

It's not a matter of changing when we deal with fixing the file size,
as described above.  At the fsck time, we would need to keep backup
copies of any indirect blocks that get modified for whatever reason,
and then in pass 1D, when we clone a block that has been claimed by
multiple inodes, the inodes which claim the block as a data block
should get a copy of the block before it was modified by e2fsck.
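
A rough sketch of that fsck-side approach, as a toy Python model. The function names `pass1_fix_indirect` and `pass1d_clone_for_data_owner`, the `backups` table, and the 16-byte block are hypothetical; real e2fsck passes are considerably more involved.

```python
import struct

NBLOCKS = 1000   # hypothetical filesystem size in blocks

def fix_entries(raw):
    """Zero every 32-bit entry that is not a legal block number."""
    n = len(raw) // 4
    entries = struct.unpack(f"<{n}I", raw)
    return struct.pack(f"<{n}I", *(e if e < NBLOCKS else 0 for e in entries))

disk = {50: b"file B's data..."}   # claimed by A as indirect, by B as data
backups = {}                       # block number -> contents before any fix

def pass1_fix_indirect(blk):
    fixed = fix_entries(disk[blk])
    if fixed != disk[blk]:
        backups.setdefault(blk, disk[blk])   # keep the pre-modification copy
        disk[blk] = fixed

def pass1d_clone_for_data_owner(blk, new_blk):
    # The inode that claims blk as a data block gets the original contents.
    disk[new_blk] = backups.get(blk, disk[blk])
    return new_blk

pass1_fix_indirect(50)
b_block = pass1d_clone_for_data_owner(50, 51)
```

The trade-off is memory or scratch space during fsck for every indirect block that gets modified, in exchange for needing no kernel changes.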

> * Keep a set of "still allocated on disk" block bitmaps that gets
>   flushed whenever a sync happens.  Don't allocate these blocks.
>   Journaling file systems already have to do this.

A list would be more efficient, as others have pointed out.  That
would work, although the difficulty is knowing when entries can be removed
from the list.  The machinery for knowing when metadata has been updated
isn't present in ext2, and that's a fair amount of complexity.  You
could clear the list/bitmap after the 5 second metadata flush command
has been kicked off, or if you associate a data block with the
previous inode's owner, you could clear the entry when the inode's
dirty bit has been cleared, but that doesn't completely get rid of the
race unless you tie it to when the write has completed (and this
assumes write barriers to make sure the block was actually flushed to
the media).
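
The second variant above (tie each freed block to its previous owner and release it only once that inode's write has completed) might look roughly like this. It is a toy Python sketch; `OwnerTrackedFreeList` and its callbacks are hypothetical, and the completion hook is exactly the machinery noted above as missing from ext2.

```python
class OwnerTrackedFreeList:
    """Toy model: a freed block is unusable until its old owner inode is on disk."""

    def __init__(self):
        self.pending = {}   # block number -> inode number of the previous owner

    def block_freed(self, blk, old_owner_ino):
        self.pending[blk] = old_owner_ino

    def may_reuse(self, blk):
        return blk not in self.pending

    def inode_write_completed(self, ino):
        # Must be driven by I/O completion (with write barriers), not by the
        # dirty bit being cleared when the write is merely submitted.
        self.pending = {b: o for b, o in self.pending.items() if o != ino}


fl = OwnerTrackedFreeList()
fl.block_freed(50, old_owner_ino=12)
reusable_before = fl.may_reuse(50)   # inode 12 not yet on disk
fl.inode_write_completed(12)
reusable_after = fl.may_reuse(50)    # inode 12 is on stable media
```

Clearing the entry any earlier than write completion reopens the race, only with a smaller window.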

Another very heavyweight approach would be to simply force a full sync
of the filesystem whenever fsync() is called.  Not pretty, and without
the proper write ordering, the race is still potentially there.

I'd say that the best way to handle this is in fsck, but quite frankly
it's a relatively low priority "bug" to handle, since a much simpler
workaround is to tell people to use ext3 instead.

Regards,

						- Ted


* Re: Fix(es) for ext2 fsync bug
  2007-02-15 14:20 ` Theodore Tso
@ 2007-02-15 15:09   ` Dave Kleikamp
  2007-02-15 15:59     ` sfaibish
  2007-02-20 21:13     ` Valerie Henson
  2007-02-20 21:30   ` Valerie Henson
  1 sibling, 2 replies; 20+ messages in thread
From: Dave Kleikamp @ 2007-02-15 15:09 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler

On Thu, 2007-02-15 at 09:20 -0500, Theodore Tso wrote:

> Another very heavyweight approach would be to simply force a full sync
> of the filesystem whenever fsync() is called.  Not pretty, and without
> the proper write ordering, the race is still potentially there.

I don't think this race is an issue, in that it would require the crash
to happen before the fsync completed, so there would be no expectation
that the data is safe.  It's a moot point, since I don't think this is
an acceptable solution anyway.

> I'd say that the best way to handle this is in fsck, but quite frankly
> it's relatively low priority "bug" to handle, since a much simpler
> workaround is to tell people to use ext3 instead.

Right.  Who's still using ext2?
-- 
David Kleikamp
IBM Linux Technology Center



* Re: Fix(es) for ext2 fsync bug
  2007-02-15 15:09   ` Dave Kleikamp
@ 2007-02-15 15:59     ` sfaibish
  2007-02-15 16:39       ` Dave Kleikamp
  2007-02-15 18:54       ` Dawson Engler
  2007-02-20 21:13     ` Valerie Henson
  1 sibling, 2 replies; 20+ messages in thread
From: sfaibish @ 2007-02-15 15:59 UTC (permalink / raw)
  To: Dave Kleikamp, Theodore Tso
  Cc: Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler

On Thu, 15 Feb 2007 10:09:22 -0500, Dave Kleikamp  
<shaggy@linux.vnet.ibm.com> wrote:

> On Thu, 2007-02-15 at 09:20 -0500, Theodore Tso wrote:
>
>> Another very heavyweight approach would be to simply force a full sync
>> of the filesystem whenever fsync() is called.  Not pretty, and without
>> the proper write ordering, the race is still potentially there.
>
> I don't think this race is an issue, in that it would require the crash
> to happen before the fsync completed, so there would be no expectation
> that the data is safe.  It's a moot point, since I don't think this is
> an acceptable solution anyway.
>
>> I'd say that the best way to handle this is in fsck, but quite frankly
>> it's relatively low priority "bug" to handle, since a much simpler
>> workaround is to tell people to use ext3 instead.
>
> Right.  Who's still using ext2?
It was my understanding from Dawson's presentation that ext3 and jfs
have the same problem. It is not an ext2-only problem. Also, whatever
solution we adopt, we need to be sure that we test it using the eXplode
methodology.

/Sorin


* Re: Fix(es) for ext2 fsync bug
  2007-02-15 15:59     ` sfaibish
@ 2007-02-15 16:39       ` Dave Kleikamp
  2007-02-15 17:15         ` Theodore Tso
       [not found]         ` <21e789ec0702151111v4cb2aa8dqa168c886cb909c9@mail.gmail.com>
  2007-02-15 18:54       ` Dawson Engler
  1 sibling, 2 replies; 20+ messages in thread
From: Dave Kleikamp @ 2007-02-15 16:39 UTC (permalink / raw)
  To: sfaibish
  Cc: Theodore Tso, Valerie Henson, linux-fsdevel, Can Sar,
	Junfeng Yang, Dawson Engler

On Thu, 2007-02-15 at 10:59 -0500, sfaibish wrote:
> On Thu, 15 Feb 2007 10:09:22 -0500, Dave Kleikamp  
> <shaggy@linux.vnet.ibm.com> wrote:
> 
> > On Thu, 2007-02-15 at 09:20 -0500, Theodore Tso wrote:
> >
> >> Another very heavyweight approach would be to simply force a full sync
> >> of the filesystem whenever fsync() is called.  Not pretty, and without
> >> the proper write ordering, the race is still potentially there.
> >
> > I don't think this race is an issue, in that it would require the crash
> > to happen before the fsync completed, so there would be no expectation
> > that the data is safe.  It's a moot point, since I don't think this is
> > an acceptable solution anyway.
> >
> >> I'd say that the best way to handle this is in fsck, but quite frankly
> >> it's relatively low priority "bug" to handle, since a much simpler
> >> workaround is to tell people to use ext3 instead.
> >
> > Right.  Who's still using ext2?
> It was my understanding from Dawson's presentation that ext3 and jfs
> have the same problem.

Hmm.  If jfs has the problem, it is a bug.  jfs is designed to handle
this correctly.  I'm pretty sure I've fixed at least one bug that
eXplode has uncovered in the past.  I'm not sure what was mentioned in
the presentation though.  I'd like any information about current
problems in jfs.

> It is not an ext2-only problem. Also, whatever solution we adopt,
> we need to be sure that we test it using the eXplode methodology.
> 
> /Sorin
-- 
David Kleikamp
IBM Linux Technology Center



* Re: Fix(es) for ext2 fsync bug
  2007-02-15 16:39       ` Dave Kleikamp
@ 2007-02-15 17:15         ` Theodore Tso
  2007-02-15 17:52           ` sfaibish
       [not found]         ` <21e789ec0702151111v4cb2aa8dqa168c886cb909c9@mail.gmail.com>
  1 sibling, 1 reply; 20+ messages in thread
From: Theodore Tso @ 2007-02-15 17:15 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: sfaibish, Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang,
	Dawson Engler

On Thu, Feb 15, 2007 at 10:39:02AM -0600, Dave Kleikamp wrote:
> > It was my understanding from Dawson's presentation that ext3 and jfs
> > have the same problem.
> 
> Hmm.  If jfs has the problem, it is a bug.  jfs is designed to handle
> this correctly.  I'm pretty sure I've fixed at least one bug that
> eXplode has uncovered in the past.  I'm not sure what was mentioned in
> the presentation though.  I'd like any information about current
> problems in jfs.

That was not my understanding of the charts that were presented
earlier this week.  Ext3 journaling code will deal with this case
explicitly, just as jfs does.  

						- Ted


* Re: Fix(es) for ext2 fsync bug
  2007-02-15 17:15         ` Theodore Tso
@ 2007-02-15 17:52           ` sfaibish
  0 siblings, 0 replies; 20+ messages in thread
From: sfaibish @ 2007-02-15 17:52 UTC (permalink / raw)
  To: Theodore Tso, Dave Kleikamp
  Cc: Valerie Henson, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler

On Thu, 15 Feb 2007 12:15:59 -0500, Theodore Tso <tytso@mit.edu> wrote:

> On Thu, Feb 15, 2007 at 10:39:02AM -0600, Dave Kleikamp wrote:
>> > It was my understanding from Dawson's presentation that ext3 and jfs
>> > have the same problem.
>>
>> Hmm.  If jfs has the problem, it is a bug.  jfs is designed to handle
>> this correctly.  I'm pretty sure I've fixed at least one bug that
>> eXplode has uncovered in the past.  I'm not sure what was mentioned in
>> the presentation though.  I'd like any information about current
>> problems in jfs.
>
> That was not my understanding of the charts that were presented
> earlier this week.  Ext3 journaling code will deal with this case
> explicitly, just as jfs does.

My mistake: there were fsync bugs in JFS and ext2, and bugs that cannot
be fixed by fsck, but they are not the same for JFS and ext2. See the quotes:
"There were two interesting fsync errors, one in JFS
and one in ext2. The ext2 bug is a case where an
implementation error points out a deeper design problem."
...
"We found two bugs (one in JFS, one in Reiser4) where crashed
disks cannot be recovered by fsck."


>
> 						- Ted
>
>





* Re: Fix(es) for ext2 fsync bug
  2007-02-15 15:59     ` sfaibish
  2007-02-15 16:39       ` Dave Kleikamp
@ 2007-02-15 18:54       ` Dawson Engler
       [not found]         ` <21e789ec0702151118x1c6af801gd34981d72db0f5b2@mail.gmail.com>
  1 sibling, 1 reply; 20+ messages in thread
From: Dawson Engler @ 2007-02-15 18:54 UTC (permalink / raw)
  To: sfaibish
  Cc: Dave Kleikamp, Theodore Tso, Valerie Henson, linux-fsdevel,
	Can Sar, Junfeng Yang

> It was my understanding from Dawson's presentation that ext3 and jfs
> have the same problem. It is not an ext2-only problem. Also, whatever
> solution we adopt, we need to be sure that we test it using the eXplode
> methodology.

apologies for dropping in randomly into the discussion: if this is
about the crash-during-recovery bugs, the specific ones i discussed
have been fixed in jfs and ext3 (junfeng: this is correct, right?).

i should have made this clear in the talk (along with many other things:
grabbing junfeng's slides and blathering about them w/o preparation is
not the right algorithm for giving a good talk.)

the other error --- fsync of file data on ext2 that reuses a freed inode
from a file that was not flushed to disk ---- is still open.


* Re: Fix(es) for ext2 fsync bug
       [not found]         ` <21e789ec0702151111v4cb2aa8dqa168c886cb909c9@mail.gmail.com>
@ 2007-02-15 19:26           ` Dave Kleikamp
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Kleikamp @ 2007-02-15 19:26 UTC (permalink / raw)
  To: Junfeng Yang
  Cc: sfaibish, Theodore Tso, Valerie Henson, linux-fsdevel, Can Sar,
	Dawson Engler

On Thu, 2007-02-15 at 11:11 -0800, Junfeng Yang wrote:
> > Hmm.  If jfs has the problem, it is a bug.  jfs is designed to handle
> > this correctly.  I'm pretty sure I've fixed at least one bug that
> > eXplode has uncovered in the past.  I'm not sure what was mentioned in
> > the presentation though.  I'd like any information about current
> > problems in jfs.
> 
> 
> I believe you have fixed the JFS fsync bug, Dave.  It was caused by
> reusing a directory inode as a file inode.  If the machine crashes
> later, fsck would think this file is a directory, and clear all its
> data. 

Yeah.  That one was fixed a while back.  Thanks for clearing this up.

Shaggy

-- 
David Kleikamp
IBM Linux Technology Center



* Re: Fix(es) for ext2 fsync bug
       [not found]           ` <21e789ec0702151128x744f61e5lb24d2da972af185a@mail.gmail.com>
@ 2007-02-16  1:18             ` Theodore Tso
  0 siblings, 0 replies; 20+ messages in thread
From: Theodore Tso @ 2007-02-16  1:18 UTC (permalink / raw)
  To: Junfeng Yang
  Cc: engler, sfaibish, Dave Kleikamp, Valerie Henson, linux-fsdevel, Can Sar

On Thu, Feb 15, 2007 at 11:28:46AM -0800, Junfeng Yang wrote:
> 
> Actually,  we found a crash-during-recovery bug in ext3 too.  It's a race
> between resetting the journal super block and replay of the journal.  This
> bug was fixed by Ted a long time ago (3 years?).

That was found in your original work (using UML) not the more recent
work using EXPLODE, correct?

						- Ted



* Re: Fix(es) for ext2 fsync bug
  2007-02-15 15:09   ` Dave Kleikamp
  2007-02-15 15:59     ` sfaibish
@ 2007-02-20 21:13     ` Valerie Henson
       [not found]       ` <21e789ec0702201330x1c2706b7kcd055b97cb37e0e@mail.gmail.com>
  1 sibling, 1 reply; 20+ messages in thread
From: Valerie Henson @ 2007-02-20 21:13 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Theodore Tso, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler

On Thu, Feb 15, 2007 at 09:09:22AM -0600, Dave Kleikamp wrote:
> 
> Right.  Who's still using ext2?

Google. (GoogleFS runs on top of ext2.)

-VAL


* Re: Fix(es) for ext2 fsync bug
  2007-02-15 14:20 ` Theodore Tso
  2007-02-15 15:09   ` Dave Kleikamp
@ 2007-02-20 21:30   ` Valerie Henson
  2007-02-20 22:12     ` Erez Zadok
  1 sibling, 1 reply; 20+ messages in thread
From: Valerie Henson @ 2007-02-20 21:30 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler

On Thu, Feb 15, 2007 at 09:20:21AM -0500, Theodore Tso wrote:
> 
> It's actually not the case that fsck will complete the truncate for
> file A.  The problem is that while e2fsck is processing indirect
> blocks in pass 1, the block which is marked as file A's indirect block
> (but which actually contains file B's data) gets "fixed" when e2fsck
> sees block numbers which look like illegal block numbers.  So this
> ends up corrupting file B's data.

Ah, that's what happens.  Thanks for the clarification.

> This is actually a legal end result, BTW, since POSIX states that the
> result of fsync() is undefined if the system crashes.  Technically

And POSIX also states that sync() is only required to schedule the
writes, but may return before the actual writing is done.  Looks like
the only way you can guarantee data is on-disk according to POSIX is
to reboot the system after every synchronous write.  Man, we file
systems developers sure have it easy!

-VAL


* Re: Fix(es) for ext2 fsync bug
       [not found]       ` <21e789ec0702201330x1c2706b7kcd055b97cb37e0e@mail.gmail.com>
@ 2007-02-20 21:39         ` Valerie Henson
  2007-02-20 21:47           ` Dawson Engler
  2007-02-20 22:25           ` Dave Kleikamp
  0 siblings, 2 replies; 20+ messages in thread
From: Valerie Henson @ 2007-02-20 21:39 UTC (permalink / raw)
  To: Junfeng Yang
  Cc: Dave Kleikamp, Theodore Tso, linux-fsdevel, Can Sar, Dawson Engler

On Tue, Feb 20, 2007 at 01:30:25PM -0800, Junfeng Yang wrote:
> On 2/20/07, Valerie Henson <val_henson@linux.intel.com> wrote:
> >
> >Google. (GoogleFS runs on top of ext2.)
>
> It's surprising to know that... I guess they rely on GoogleFS's own
> replication and checksumming for consistency.

Yep, they just want a local file system with ultrafast on-line
performance.  They don't care about recovery time particularly because
of the GoogleFS replication (although I heard rumors they have some
fast fsck scheme, maybe resembling the dirty bit stuff I did last
year).

-VAL


* Re: Fix(es) for ext2 fsync bug
  2007-02-20 21:39         ` Valerie Henson
@ 2007-02-20 21:47           ` Dawson Engler
  2007-02-20 22:25           ` Dave Kleikamp
  1 sibling, 0 replies; 20+ messages in thread
From: Dawson Engler @ 2007-02-20 21:47 UTC (permalink / raw)
  To: Valerie Henson
  Cc: Junfeng Yang, Dave Kleikamp, Theodore Tso, linux-fsdevel, Can Sar

> On Tue, Feb 20, 2007 at 01:30:25PM -0800, Junfeng Yang wrote:
> > On 2/20/07, Valerie Henson <val_henson@linux.intel.com> wrote:
> > >
> > >Google. (GoogleFS runs on top of ext2.)
> >
> > It's surprising to know that... I guess they rely on GoogleFS's own
> > replication and checksumming for consistency.
> 
> Yep, they just want a local file system with ultrafast on-line
> performance.  They don't care about recovery time particularly because
> of the GoogleFS replication (although I heard rumors they have some
> fast fsck scheme, maybe resembling the dirty bit stuff I did last
> year).

Actually, according to the GFS paper (which may be out of date), for
the chunkservers that is true, but for their "master" they really want
fast recovery as a way to reduce mean-time-to-repair (and thus increase
availability).

Though, given that they have shadow masters perhaps everyone is happy as
long as master recovery is usually fast.

Dawson


* Re: Fix(es) for ext2 fsync bug
  2007-02-20 21:30   ` Valerie Henson
@ 2007-02-20 22:12     ` Erez Zadok
  0 siblings, 0 replies; 20+ messages in thread
From: Erez Zadok @ 2007-02-20 22:12 UTC (permalink / raw)
  To: Valerie Henson
  Cc: Theodore Tso, linux-fsdevel, Can Sar, Junfeng Yang, Dawson Engler

In message <20070220213014.GC5264@nifty>, Valerie Henson writes:
> On Thu, Feb 15, 2007 at 09:20:21AM -0500, Theodore Tso wrote:

> And POSIX also states that sync() is only required to schedule the
> writes, but may return before the actual writing is done.  Looks like

One more reason to form a group to discuss POSIX updates/changes (as per LSF
last week).

> the only way you can guarantee data is on-disk according to POSIX is
> to reboot the system after every synchronous write.  Man, we file
> systems developers sure have it easy!

No need to be that extreme. :-) It should be enough to just unmount all file
systems, unload all fs and disk drivers, then reload+remount everything.
No?

> -VAL

Erez.


* Re: Fix(es) for ext2 fsync bug
  2007-02-20 21:39         ` Valerie Henson
  2007-02-20 21:47           ` Dawson Engler
@ 2007-02-20 22:25           ` Dave Kleikamp
  1 sibling, 0 replies; 20+ messages in thread
From: Dave Kleikamp @ 2007-02-20 22:25 UTC (permalink / raw)
  To: Valerie Henson
  Cc: Junfeng Yang, Theodore Tso, linux-fsdevel, Can Sar, Dawson Engler

On Tue, 2007-02-20 at 21:39 +0000, Valerie Henson wrote:
> On Tue, Feb 20, 2007 at 01:30:25PM -0800, Junfeng Yang wrote:
> > On 2/20/07, Valerie Henson <val_henson@linux.intel.com> wrote:
> > >
> > >Google. (GoogleFS runs on top of ext2.)
> >
> > It's surprising to know that... I guess they rely on GoogleFS's own
> > replication and checksumming for consistency.
> 
> Yep, they just want a local file system with ultrafast on-line
> performance.  They don't care about recovery time particularly because
> of the GoogleFS replication (although I heard rumors they have some
> fast fsck scheme, maybe resembling the dirty bit stuff I did last
> year).

I wonder if they would consider this an important bug?  I know nothing
about GoogleFS, but I would guess that they have more sophisticated
recovery than relying on an fsync shortly before a crash to ensure data
integrity.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center


