All of lore.kernel.org
 help / color / mirror / Atom feed
From: Theodore Tso <tytso@mit.edu>
To: Kevin Shanahan <kmshanah@ucwb.org.au>
Cc: linux-ext4@vger.kernel.org
Subject: Re: More ext4 acl/xattr corruption - 4th occurence now
Date: Thu, 14 May 2009 10:07:32 -0400	[thread overview]
Message-ID: <20090514140732.GI11352@mit.edu> (raw)
In-Reply-To: <20090514132506.GD5146@kulgan>

On Thu, May 14, 2009 at 10:55:06PM +0930, Kevin Shanahan wrote:
> On Thu, May 14, 2009 at 08:37:00PM +0930, Kevin Shanahan wrote:
> > Sure - now running with 2.6.29.3 + your patch.
> > 
> >   patching file fs/ext4/inode.c
> >   Hunk #1 succeeded at 1040 with fuzz 1 (offset -80 lines).
> >   Hunk #2 succeeded at 1113 (offset -81 lines).
> >   Hunk #3 succeeded at 1184 (offset -93 lines).
> > 
> > I'll report any hits for "check_block_validity" in syslog.
> 
> That didn't take long:
> 
> May 14 22:49:17 hermes kernel: EXT4-fs error (device dm-0): check_block_validity: inode #759 logical block 1741329 mapped to 529 (size 1)
> May 14 22:49:17 hermes kernel: Aborting journal on device dm-0:8.
> May 14 22:49:17 hermes kernel: ext4_da_writepages: jbd2_start: 293 pages, ino 759; err -30
> May 14 22:49:17 hermes kernel: Pid: 374, comm: pdflush Not tainted 2.6.29.3 #1
> May 14 22:49:17 hermes kernel: Call Trace:

The reason why I put in the ext4_error() was because I figured it was
better to stop the filesystem from stomping on the inode table (since
that way would lie data loss).  It would also freeze the filesystem so
we could see what was happening.

Could you run debugfs on the filesystem, and then send me the results
of these commands:

debugfs: stat <759>
debugfs: ncheck 759

Also, could you try run e2fsck on the filesystem and send me the
output?  The ext4_error() should have stopped the kernel from doing
any damage (like stomping on block 529, which was part of the inode
table or block group descriptors).  If send me the dumpe2fs output
about group 0, we'll know for sure (i.e.):

Group 0: (Blocks 0-32767) [ITABLE_ZEROED]
  Checksum 0x3e19, unused inodes 8169
  Primary superblock at 0, Group descriptors at 1-5
  Reserved GDT blocks at 6-32
  Block bitmap at 33 (+33), Inode bitmap at 49 (+49)
  Inode table at 65-576 (+65)

(Your numbers will be different; this was from a 80GB hard drive).

> Any clues there? I don't think I'll be able to run this during the day
> if it's going to trigger and remount the fs read-only as easily as
> this.

Yes, this is a *huge*.  Finding this may have been just what we needed
to determine what has been going on.  Thank you!!

What I would suggest doing is to mount the filesystem with the mount
option -o nodelalloc.  This would be useful for two fronts: (1) it
would determine whether or not the problem shows up if we suppress
delayed allocation, and (2) if it works, hopefully it will prevent you
from further filesystem corruption incidents.  (At least as a
workaround; turning off delayed allocation will result in a
performance decrease, but better that then data loss, I always say!)

The fact that this triggered so quickly is actually a bit scary, so
hopefully we'll be able to track down exactly what happened fairly
quickly from here, and then fix the problem for real.

Thanks,

					- Ted

  reply	other threads:[~2009-05-14 14:07 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-13  6:26 More ext4 acl/xattr corruption - 4th occurence now Kevin Shanahan
2009-05-13 23:56 ` Kevin Shanahan
2009-05-14  4:40 ` Theodore Tso
2009-05-14 11:07   ` Kevin Shanahan
2009-05-14 11:17     ` Manish Katiyar
2009-05-14 12:30       ` Theodore Tso
2009-05-14 13:25     ` Kevin Shanahan
2009-05-14 14:07       ` Theodore Tso [this message]
2009-05-14 14:30         ` Kevin Shanahan
2009-05-14 15:44           ` Eric Sandeen
2009-05-14 21:07             ` Kevin Shanahan
2009-05-14 21:08               ` Eric Sandeen
2009-05-14 16:12           ` Theodore Tso
2009-05-14 21:02             ` Kevin Shanahan
2009-05-14 21:23               ` Theodore Tso
2009-05-14 21:33                 ` Kevin Shanahan
2009-05-15 23:18                   ` Kevin Shanahan
2009-05-15  1:21                 ` Eric Sandeen
2009-05-15 12:50                   ` Theodore Tso
2009-05-15 12:58                     ` Eric Sandeen
2009-05-15 15:24                       ` Eric Sandeen
2009-05-15 16:27                         ` Eric Sandeen
2009-05-15  4:55                 ` Aneesh Kumar K.V
2009-05-15 10:11                   ` Theodore Tso
2009-05-15 13:07                   ` Theodore Tso
2009-05-19 10:00                 ` Thierry Vignaud
2009-05-19 11:36                   ` Theodore Tso
2009-05-19 12:01                     ` Alex Tomas
2009-05-19 15:04                       ` Theodore Tso
2009-05-19 15:16                         ` Alex Tomas
2009-05-19 15:18                         ` Thierry Vignaud
2009-05-15  3:57             ` Alex Tomas
2009-05-15  4:58   ` Aneesh Kumar K.V
2009-05-15 10:27     ` Theodore Tso
2009-05-18  2:14       ` [PATCH] ext4: Add a comprehensive block validity check to ext4_get_blocks() (Was: More ext4 acl/xattr corruption - 4th occurence now) Theodore Tso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090514140732.GI11352@mit.edu \
    --to=tytso@mit.edu \
    --cc=kmshanah@ucwb.org.au \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.