linux-kernel.vger.kernel.org archive mirror
* 2.4.5 data corruption
@ 2001-06-12 20:17 Larry McVoy
  2001-06-13 15:09 ` Nathan Straz
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Larry McVoy @ 2001-06-12 20:17 UTC
  To: linux-kernel; +Cc: tytso

Folks, I believe I have a reproducible test case which corrupts data in
2.4.5.

We do nightly, weekly, and monthly backups by copying our entire /home
partition on the company file server:

Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1             1.9G  1.7G  123M  93% /
/dev/hda6             1.9G  437M  1.4G  23% /tmp
/dev/sda1              37G   26G   11G  71% /home
/dev/sdc1              37G   26G   11G  70% /weekly
/dev/sdd1              37G   24G   13G  65% /monthly
/dev/sdb1              37G   26G   11G  71% /nightly

The sd? drives are actually IDE drives on a 3ware Escalade controller.
I have reason to believe the drives are good: before I installed them,
I scrubbed them with varying data patterns and verified that I got
back what I put there.  All tested cleanly overnight.

I recently added an integrity check to our backups - the integrity checker
writes out the path, the gzip adler32 checksum, the size, and the mtime of
each file.  Each time I do a backup, the backup scripts look for the
integrity listings in the other partitions and compare all files with the
same path, size, and mtime.
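For anyone wanting to run a similar check, here is a minimal Python sketch of the scheme described above.  It is not the actual backup script: the function names (adler32_file, build_listing, compare_listings), the in-memory dict format, and the use of whole-second mtimes are all assumptions made for illustration.  It computes Adler-32 with Python's zlib module and flags files whose path, size, and mtime match but whose checksum differs.

```python
import os
import zlib


def adler32_file(path, chunk_size=1 << 16):
    """Return the Adler-32 checksum of a file, reading it in chunks."""
    value = 1  # Adler-32 starting value (zlib's default)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)  # running checksum
    return value


def build_listing(root):
    """Hypothetical listing: relative path -> (adler32, size, mtime in whole seconds)."""
    listing = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            listing[rel] = (adler32_file(full), st.st_size, int(st.st_mtime))
    return listing


def compare_listings(new, old):
    """Return sorted paths present in both listings with equal size and
    mtime but a different checksum, i.e. copies that should match but don't."""
    mismatches = []
    for path, (csum, size, mtime) in new.items():
        other = old.get(path)
        if other is None:
            continue  # file only exists in one backup; nothing to compare
        old_csum, old_size, old_mtime = other
        if size == old_size and mtime == old_mtime and csum != old_csum:
            mismatches.append(path)
    return sorted(mismatches)
```

For example, compare_listings(build_listing("/nightly"), build_listing("/weekly")) would list files on /nightly whose checksum disagrees with a same-sized, same-mtime copy on /weekly.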

This morning I had a pile of errors after things had gone smoothly for
the last few weeks.  I suspected that I had screwed something up, looked
over the backup scripts, simplified them down to a simple cpio, and tried
again.  Another pile of errors, different set of files.

In both cases, only the newly created files were corrupted; the copies on
the live /home partition, as well as on the /weekly and /monthly
partitions, all compared cleanly.

I rebooted into 2.2.19 and tried again: no errors.  I was running 2.4.5
with no patches.  I power-cycled the machine between reboots, went through
the BIOS memory check, and also ran my own memory check; memory does not
seem to be an issue.

I think I can reproduce this; it takes a reboot and about 2 hours.  I made
it happen twice with 2.4.5; the first try on 2.2.19 did not reproduce it.

The data corruption looks like *extra* bytes added at the beginning of
files.  I only looked at a few; if we go down the path of debugging this,
I'll save them all next time.  The extra byte counts were small: in one
case the letter "1" had been added to the start of the file, and other
than that it was identical.  That's really weird.  As a file system guy,
I'd expect to see whole blocks of bad data, not small chunks.  Very strange.
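To confirm the "extra bytes at the start" pattern across all the saved files, one could check whether each corrupted copy is exactly some prefix followed by the good copy.  A minimal Python sketch, assuming both copies fit in memory; the function names (prepended_prefix, prepended_prefix_files) are hypothetical:

```python
from typing import Optional


def prepended_prefix(good: bytes, bad: bytes) -> Optional[bytes]:
    """If bad == prefix + good for some non-empty prefix, return the prefix;
    otherwise return None (the damage is something other than a prepend)."""
    extra = len(bad) - len(good)
    if extra <= 0:
        return None  # bad copy is not longer, so nothing was prepended
    if bad[extra:] == good:
        return bad[:extra]
    return None


def prepended_prefix_files(good_path: str, bad_path: str) -> Optional[bytes]:
    """Same check on two files; reads both fully into memory."""
    with open(good_path, "rb") as f:
        good = f.read()
    with open(bad_path, "rb") as f:
        bad = f.read()
    return prepended_prefix(good, bad)
```

For the case above, where a "1" was added to the start of an otherwise identical file, prepended_prefix would return b"1".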

One thing I haven't done is to rule out the 3ware controller.  I tend to
doubt it is the problem, but who knows.

There were no kernel messages complaining about anything during the 
backup, so the kernel doesn't seem to know there is a problem.

So, does anyone recognize these symptoms?  Does anyone care?  

Thread overview: 14+ messages
2001-06-12 20:17 2.4.5 data corruption Larry McVoy
2001-06-13 15:09 ` Nathan Straz
2001-06-13 23:39 ` Chris Mason
2001-06-14 18:20 ` Alan Cox
2001-06-14 22:27   ` Eugene Crosser
2001-06-14 22:49     ` Alan Cox
2001-06-15 19:54       ` Eugene Crosser
2001-06-15 20:17         ` Larry McVoy
2001-06-15 12:02     ` Russell Leighton
2001-06-19  3:00   ` Stefan Traby
2001-06-19  7:49     ` Alan Cox
2001-06-19  9:13   ` Pedro M. Rodrigues
     [not found] <53B208BD9A7FD311881A009027B6BBFB9EACFE@siamese>
2001-06-19 19:01 ` Alan Cox
2001-06-19 20:06   ` Stefan Traby
