Re: ext4: Always verify extent tree blocks

From: "Darrick J. Wong" <djwong@us.ibm.com>
To: "黃一軒 (I-Hsuan Huang)" <ihhuang@abmail.org>
Cc: linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: ext4: Always verify extent tree blocks
Date: Tue, 30 Aug 2011 16:04:19 -0700	[thread overview]
Message-ID: <20110830230419.GF11984@tux1.beaverton.ibm.com> (raw)
In-Reply-To: <CAEd0oO1wm0iinsUJ2zwZM_+i2BnfU7uNA77ReiAcFTdwomKfkA@mail.gmail.com>

On Tue, Aug 30, 2011 at 09:08:59AM +0800, 黃一軒 (I-Hsuan Huang) wrote:
> Dear Darrick,
> 
> Thanks for your correction.
> Would you please kindly give me some idea about how to prevent extent
> attribute block error?
> I would like to try journal_checksum.

First you'd have to figure out how the extent attribute got corrupted (bad
disk, bad cable, bad memory, bad software....) and then we could put together a
plan to fix it.  The ext4 metadata checksumming patches (which are nearly ready
to be sent out) would at least help with silent corruption.

--D
> 
> Thanks,
> IH
> 
> 2011/8/30 Darrick J. Wong <djwong@us.ibm.com>
> 
> > On Mon, Aug 29, 2011 at 01:40:11PM +0800, 黃一軒 (I-Hsuan Huang) wrote:
> > > Dear Darrick,
> > >
> > > I still met the following failure symptom with your patch.
> > >
> > > <2>[    3.311674] EXT4-fs error (device mmcblk0p23): ext4_iget: inode
> > > #22944: (comm init) bad extended attribute block 173879342
> > >
> > > The validation code looks like not work.
> > > Is changing the flag enough to force validating extent header?
> >
> > Extended attribute blocks are not extent tree blocks.
> >
> > The patch in question only made ext4 complain consistently if the extent
> > tree
> > got corrupt.
> >
> > --D
> > >
> > > Thanks,
> > > IH Huang
> > >
> > > 2011/8/25 Darrick J. Wong <djwong@us.ibm.com>
> > >
> > > > On Wed, Aug 24, 2011 at 05:54:15PM +0800, 黃一軒 (I-Hsuan Huang) wrote:
> > > > > Dear Darrick,
> > > > >
> > > > > I have read your ext4 patch on Patchwork and have a question about
> > your
> > > > ext4
> > > > > patch to validate the extent header.
> > > > > Is you patch related to the Bug 20992:
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=20992
> > > >
> > > > Yes, I think it would fix that bug.  That said, I'm pretty close to
> > posting
> > > > a
> > > > patch set to implement full metadata checksuming, which includes a
> > better
> > > > solution than what's in patchwork.  Basically it implements a new
> > > > bufferhead
> > > > flag that tracks whether or not a metadata buffer has been validated,
> > and
> > > > skips
> > > > the checks if it has.
> > > >
> > > > Obviously that won't help you if you overwrite the filesystem while
> > it's
> > > > mounted... but oh well, at least it'll fix your case.  It was meant to
> > fix
> > > > the
> > > > case where the first time a bad extent block gets read it fails the
> > check
> > > > and
> > > > complains, but subsequent accesses of the same block don't even check.
> > > >
> > > > --D
> > > > >
> > > > > Ted mentioned that
> > > > >
> > > > > *Yep, looks like a bug alright.
> > > > >
> > > > > From what I can tell, you were in the middle of async I/O, at the
> > time
> > > > when
> > > > > the
> > > > > disk was corrupted.  The problem seemed to come after the I/O was
> > > > completed,
> > > > > and  ext4_convert_unwritten_extents() was trying to set the
> > initialized
> > > > bit
> > > > > on
> > > > > the extent tree.  At that point the extent tree must have gotten
> > > > corrupted
> > > > > on
> > > > > disk, and this seriously confused the extent conversion code, which
> > ended
> > > > up
> > > > > passing 0 to ext4_ext_put_in_cache() as the length of the extent, and
> > > > that
> > > > > tripped the BUG_ON in ext4_ext_put_in_cache().
> > > > >
> > > > > How did you corrupt the file system while it was mounted?   Was it
> > via
> > > > some
> > > > > dd
> > > > > to the disk device directly?
> > > > >
> > > > > We do have code that checks to make sure the extent tree is sane, but
> > we
> > > > > skip
> > > > > it if the data was already in the buffer cache, to save CPU costs.
> >  But
> > > > if
> > > > > you
> > > > > wrote to the disk device directly, it would have gone through the
> > buffer
> > > > > cache,
> > > > > since the extent tree was already cached, we would have skipped the
> > > > > validation
> > > > > step, and that could be the explanation for how the bug got
> > triggered.
> > > > >
> > > > > If so, I'm loathe to turn on the validation check unconditionally,
> > since
> > > > > that
> > > > > would kill performance.  I can probably change the BUG_ON in
> > > > > ext4_put_in_cache() to rather set the cache state to "invalid", which
> > > > would
> > > > > at
> > > > > least prevent the BUG_ON.  The filesystem was probably well and truly
> > > > > trashed,
> > > > > though, so sooner or later the ext4 fs code would have hit something
> > to
> > > > > cause
> > > > > it to be very unhappy.  Hoepfully it would be an ext4_error() call to
> > > > mark
> > > > > the
> > > > > file system as corrupted, as opposed to another BUG_ON.
> > > > > *
> > > > > Have a nice day.
> > > > >
> > > > > Thanks,
> > > > > Randall
> > > >
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html