From: Dave Kleikamp
Subject: Re: Fix(es) for ext2 fsync bug
Date: Wed, 14 Feb 2007 15:26:22 -0600
Message-ID: <1171488382.13092.5.camel@kleikamp.austin.ibm.com>
In-Reply-To: <20070214203101.GQ44411608@melbourne.sgi.com>
References: <20070214195453.GB7521@nifty> <20070214203101.GQ44411608@melbourne.sgi.com>
To: David Chinner
Cc: Valerie Henson, linux-fsdevel@vger.kernel.org, Can Sar, Junfeng Yang, Dawson Engler, "Theodore Ts'o"

On Thu, 2007-02-15 at 07:31 +1100, David Chinner wrote:
> On Wed, Feb 14, 2007 at 11:54:54AM -0800, Valerie Henson wrote:
> > Just some quick notes on possible ways to fix the ext2 fsync bug that
> > eXplode found. Whether or not anyone will bother to implement it is
> > another matter.
> >
> > Background: The eXplode file system checker found a bug in ext2 fsync
> > behavior. Do the following: truncate file A, create file B which
> > reallocates one of A's old indirect blocks, fsync file B. If you then
> > crash before file A's metadata is all written out, fsck will complete
> > the truncate for file A... thereby deleting file B's data. So fsync
> > file B doesn't guarantee data is on disk after a crash. Details:
> >
> > http://www.stanford.edu/~engler/explode-osdi06.pdf
> >
> > Two possible solutions I can think of:
> >
> > * Rearrange order of duplicate block checking and fixing file size in
> >   fsck. Not sure how hard this is. (Ted?)
> >
> > * Keep a set of "still allocated on disk" block bitmaps that gets
> >   flushed whenever a sync happens. Don't allocate these blocks.
> >   Journaling file systems already have to do this.
>
> You don't need anything on disk or to fsck to fix this problem -
> just avoid it completely by keeping a list of recently truncated
> blocks in memory and don't reuse them until the old owner inode is
> sync'd to disk.

I think that's pretty much what Val is suggesting. She suggests bitmaps
rather than a list though. Maybe she should have used a better term than
"flushed", as this list only needs to be cleared, rather than written to
disk.

-- 
David Kleikamp
IBM Linux Technology Center
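The in-memory approach discussed in this thread (blocks freed by a truncate stay off-limits to the allocator until a sync, and the set is cleared rather than written to disk) can be sketched as a toy userspace allocator. This is a minimal sketch under stated assumptions, not ext2 code: the 64-block filesystem, `struct toy_fs` and the `toy_*` functions are hypothetical, and the pending set is cleared on any full sync rather than tracked per owner inode.

```c
#include <stdint.h>

/* Hypothetical toy filesystem: 64 blocks, so each bitmap fits in one word. */
#define NBLOCKS 64

struct toy_fs {
    uint64_t allocated;    /* in-memory allocation bitmap */
    uint64_t pending_free; /* freed by truncate, but on-disk metadata of the
                              old owner inode may still reference them */
};

/* Truncate path (hypothetical): release a block in memory, but remember
 * that it is still "allocated on disk" until the next sync. */
void toy_truncate_free(struct toy_fs *fs, int blk)
{
    uint64_t bit = UINT64_C(1) << blk;
    fs->allocated &= ~bit;
    fs->pending_free |= bit;
}

/* Allocation path: skip blocks that are allocated OR pending.
 * Returns the lowest usable block number, or -1 if none is reusable. */
int toy_alloc(struct toy_fs *fs)
{
    uint64_t busy = fs->allocated | fs->pending_free;
    for (int blk = 0; blk < NBLOCKS; blk++) {
        uint64_t bit = UINT64_C(1) << blk;
        if (!(busy & bit)) {
            fs->allocated |= bit;
            return blk;
        }
    }
    return -1;
}

/* Sync path: once the truncated inodes' metadata is on disk, the pending
 * set is simply cleared; the set itself is never written to disk. */
void toy_sync(struct toy_fs *fs)
{
    /* ... write dirty inodes and bitmaps here (omitted in this sketch) ... */
    fs->pending_free = 0;
}
```

For example, starting from a zeroed `struct toy_fs`, four calls to `toy_alloc()` return blocks 0 through 3; after `toy_truncate_free(&fs, 2)` the next allocation returns block 4 instead of reusing block 2, and block 2 becomes eligible again only after `toy_sync()`. Keeping the pending set as a bitmap (as Val suggests) rather than a list makes the "is this block pending?" check a single bit test.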