From: Theodore Tso <tytso@mit.edu>
To: Sylvain Rochet <gradator-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: Fw: 2.6.28.9: EXT3/NFS inodes corruption
Date: Wed, 22 Apr 2009 20:11:39 -0400	[thread overview]
Message-ID: <20090423001139.GX15541@mit.edu> (raw)
In-Reply-To: <20090422234823.GA24477-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org>

On Thu, Apr 23, 2009 at 01:48:23AM +0200, Sylvain Rochet wrote:
> > 
> > This is on the client side; what happens when you look at the same
> > directory from the server side?
> 
> This is on the server side ;)
> 

On the server side, that means an inode table block also looks
corrupted.  I'm pretty sure that if you used debugfs to examine those
blocks you would have seen that the inodes were complete garbage.
Depending on the inode size, and assuming a 4k block size, there are
typically 32 or 16 inodes in a 4k block, so if you were to look at
the inodes by inode number, you would normally find that adjacent
inodes within the same 4k block are corrupted.  Of course, this just
tells us what had gotten damaged; it doesn't tell us whether it was
damaged by a kernel bug, a memory error, or a hard drive or
controller failure (and there are multiple types of storage stack
failures: complete garbage getting written into the right place, and
the right data getting written into the wrong place).
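
As a point of reference, a minimal debugfs inspection along these
lines would show this; the inode number 12345 is only illustrative,
assuming the filesystem sits on /dev/md10 as elsewhere in this
thread:

  # Find which inode table block holds inode 12345
  debugfs -R 'imap <12345>' /dev/md10

  # Hex-dump that block, using the block number imap reported
  debugfs -R 'bd BLOCKNR' /dev/md10

  # Decode the fields of the suspect inode itself
  debugfs -R 'stat <12345>' /dev/md10

If several consecutive inode numbers in the same table block all dump
as garbage, that points to a whole-block overwrite rather than a bug
that corrupts a single inode.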

> Well, these are the inode numbers of directories with entries 
> pointing to nonexistent inodes; of course we cannot delete these 
> directories anymore through a regular recursive deletion (well, not 
> without debugfs ;).  Considering the number of inodes, this is quite 
> a low corruption rate.

Well, sure, but any amount of corruption is extremely troubling....
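
As an aside, removing such a dangling directory entry by hand would
look roughly like this; the pathname is hypothetical, and the
filesystem has to be unmounted first:

  # Open the filesystem read-write and drop the bad entry; note that
  # debugfs's unlink does not adjust inode reference counts
  debugfs -w -R 'unlink /some/dir/bad-entry' /dev/md10

  # Let e2fsck fix up link counts and any remaining inconsistencies
  e2fsck -f /dev/md10

Running e2fsck over the whole filesystem is usually the safer route
in any case.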

> Yes, this is what we thought too, especially because we have used 
> ext3/NFS for a very long time without problems like this. I moved 
> all the data to the backup array, so we can now do read-write tests 
> on the primary one without impacting production much.
> 
> So, let's check the raid6 array; well, this is going to take a few days.
> 
> # badblocks -w -s /dev/md10
> 
> If everything goes well I will check disk by disk.
> 
> By the way, if such corruption doesn't happen on the backup storage 
> array, we can conclude that it is a hardware problem around the 
> primary one, but we are not going to be able to reach a conclusion 
> for a few weeks.
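
For what it's worth, the disk-by-disk pass might look like the sketch
below; the member device names are made up, and badblocks -w is
destructive, so the disks must be out of the array (or the data
already moved off, as here):

  # Destructive write test of each member disk in turn
  for disk in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
      badblocks -w -s -v "$disk"
  done

  # Optionally, also start a long SMART self-test on each disk
  smartctl -t long /dev/sdb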

Good luck!!

						- Ted
