From: Theodore Tso <tytso@mit.edu>
To: Sylvain Rochet <gradator-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: Fw: 2.6.28.9: EXT3/NFS inodes corruption
Date: Wed, 22 Apr 2009 20:11:39 -0400
Message-ID: <20090423001139.GX15541@mit.edu>
In-Reply-To: <20090422234823.GA24477-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org>

On Thu, Apr 23, 2009 at 01:48:23AM +0200, Sylvain Rochet wrote:
> > This is on the client side; what happens when you look at the same
> > directory from the server side?
>
> This is on the server side ;)

On the server side, that means an inode table block also looks
corrupted there.  I'm pretty sure that if you used debugfs to examine
those blocks, you would have seen that the inodes were complete
garbage.  Depending on the inode size, and assuming a 4k block size,
there are typically 32 or 16 inodes in a 4k block, so if you look at
the inodes by inode number, you will normally find that adjacent
inodes within the same 4k block are corrupted.

Of course, this only tells us what was damaged; whether it was damaged
by a kernel bug, bad memory, or a hard drive or controller failure is
a separate question (and there are multiple types of storage stack
failures: complete garbage getting written to the right place, and the
right data getting written to the wrong place).

> Well, these are the inode numbers of directories with entries
> pointing to nonexistent inodes; of course we cannot delete these
> directories anymore through a regular recursive deletion (well, not
> without debugfs ;).  Considering the total number of inodes, this is
> a very low corruption rate.

Well, sure, but any amount of corruption is extremely troubling....

> Yes, this is what we thought too, especially because we have used
> ext3/NFS for a very long time without problems like this.
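For the record, a sketch of how one might inspect a suspect inode with
debugfs (the device path and inode number below are hypothetical),
together with the inodes-per-block arithmetic for the two on-disk ext3
inode sizes:

```shell
# Hypothetical: dump a suspect inode and a neighbor on the server's
# filesystem device (read-only inspection; requires root):
#
#   debugfs -R 'stat <12345>' /dev/md10
#   debugfs -R 'stat <12346>' /dev/md10
#
# Inodes per 4k inode-table block = block size / on-disk inode size:
echo $((4096 / 128))   # 128-byte inodes -> 32 per block
echo $((4096 / 256))   # 256-byte inodes -> 16 per block
```

Because consecutive inode numbers share an inode-table block, a single
corrupted block shows up as a run of adjacent garbage inodes.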
> I moved all the data to the backup array so we can now do read-write
> tests on the primary one without impacting production much.
>
> So, let's check the raid6 array; well, this is going to take a few
> days.
>
>     # badblocks -w -s /dev/md10
>
> If everything goes well, I will check disk by disk.
>
> By the way, if such corruption doesn't happen on the backup storage
> array, we can conclude it is a hardware problem around the primary
> one, but we are not going to be able to draw conclusions for a few
> weeks.

Good luck!!

						- Ted
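Once the whole-array pass finishes, the disk-by-disk pass can be
scripted.  Note that `badblocks -w` is destructive (it overwrites
every block with test patterns), so the sketch below only prints the
commands rather than running them; the member-disk names are
hypothetical and would come from `mdadm --detail /dev/md10`:

```shell
# Hypothetical raid6 member disks; substitute the real members.
# badblocks -w OVERWRITES the disk, so echo the commands for review
# instead of executing them directly.
for disk in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    echo badblocks -w -s "$disk"
done
```

Removing the `echo` runs the real test, one disk at a time, which
helps isolate whether a single member (rather than the controller or
the array as a whole) is at fault.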
Thread overview (8+ messages):

2009-04-22 21:24  Fw: 2.6.28.9: EXT3/NFS inodes corruption -- Andrew Morton
2009-04-22 22:44  ` Theodore Tso
2009-04-22 23:48    ` Sylvain Rochet
2009-04-23  0:11      ` Theodore Tso  [this message]
2009-04-23 23:14        ` Sylvain Rochet