From: "Jeffrey E. Hundstad" <jeffrey@hundstad.net>
To: Ville Herva <vherva@niksula.hut.fi>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Something corrupts raid5 disks slightly during reboot
Date: Fri, 31 Oct 2003 19:41:30 -0600
Message-ID: <3FA30F4A.5030500@hundstad.net>
In-Reply-To: <20031031190829.GM4868@niksula.cs.hut.fi>

Try:

hdparm -W0 /dev/hdX

for each of your IDE drives.  This turns off the drive's write cache; 
write caching is usually a bad thing with IDE drives anyway.
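
As a rough sketch, assuming the array members are hdb, hdc and hdg as in
the report below, the command can be applied to each drive in a loop. The
function name and the DRY_RUN switch are illustrative only and are not part
of hdparm; with DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch: disable the write cache on each IDE drive of the array.
# disable_write_cache and DRY_RUN are hypothetical names for illustration.
DRY_RUN=${DRY_RUN:-1}

disable_write_cache() {
    for dev in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            # Only show what would be run
            echo "hdparm -W0 $dev"
        else
            # -W0 switches the drive's write caching off
            hdparm -W0 "$dev"
        fi
    done
}

# Drive names taken from the quoted report (assumption: adjust to your box)
disable_write_cache /dev/hdb /dev/hdc /dev/hdg
```

Run it with DRY_RUN=0 as root to actually change the setting.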

Ville Herva wrote:

>I've been experiencing strange corruption on a raid5 volume for some time.
>Basically, after unmounting the filesystem, I can mount it again without
>problems. I can also raidstop the raid device in between and all is still
>fine:
>
>
>>umount /dev/md4; mount /dev/md4
>    - no corruption
>
>>umount /dev/md4; raidstop /dev/md4; raidstart /dev/md4; mount /dev/md4
>    - no corruption
>
>But after a reboot, the filesystem is corrupted:
>
>
>>mount /dev/md4
> EXT2-fs error (device md(9,4)): ext2_check_descriptors: Block bitmap for
> group 17 not in group (block 0)!
> EXT2-fs: group descriptors corrupted !
>
>(This is recoverable with e2fsck.)
>
>The array consists of three 80GB Samsung disks in raid5 mode, but I
>experienced this problem with two of the disks in raid0 mode, too. The raid
>consists of raw disks hdb,hdc,hdg (rather than partitions hdb1,hdc1,hdg1).
>
>On the same box I have three other raid arrays on different disks, all of
>which consist of partitions. These do not show corruption on boot.
>
>I made a little experiment and saved the first megabyte of hd[bcg] between
>umount,mount and umount,raidstop,raidstart,mount operations. They did not
>change.
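
A snapshot like this could be taken with dd. This is a minimal sketch only:
save_head and the example file names are assumptions, since the report does
not show the exact commands that were used.

```shell
#!/bin/sh
# Sketch: save the first megabyte of a device (or file) for later comparison.
# save_head is a hypothetical helper, not a command from the report.
save_head() {
    src=$1   # e.g. /dev/hdc
    dst=$2   # e.g. /scratch/hdc_first_mb_before
    # 1024 blocks of 1024 bytes = 1 MiB, read from the start of the source
    dd if="$src" of="$dst" bs=1024 count=1024 2>/dev/null
}
```

For example: save_head /dev/hdc /scratch/hdc_first_mb_before
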
>
>Then I did umount,raidstop and rebooted. After boot, the beginning of hdb was
>intact, but hdc and hdg had been tampered with. (Unfortunately, raidstart was
>automatically run on boot, but I did raidstop as the first thing.)
>
>I narrowed the difference down to bytes between 1060-1080 on hdc:
>
>root@linux:/scratch>od -x hdc_bytes-1060-1080_before_boot
>0000000 1e1e 00d0 000d 00d0 752e 4264 7714 3fa2
>0000020 0002 0014
>root@linux:/scratch>od -x hdc_bytes-1060-1080_after_boot
>0000000 1e1e 00d0 000d 00d0 75ff 4264 7427 3fa2
>0000020 0003 0014
>
>On hdg, this range differed too:
>
>root@linux:/scratch>od -x hdg_bytes-1060-1080_before_boot
>0000000 8000 0000 8000 0000 7526 3fa2 7539 3fa2
>0000020 0002 0014
>root@linux:/scratch>od -x hdg_bytes-1060-1080_after_boot
>0000000 8000 0000 8000 0000 75f7 3fa2 760a 3fa2
>0000020 0003 0014
>
>But there was an additional difference somewhere between 1kB and 5kB that
>wasn't there on hdc.
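
To pin down exactly which bytes differ between two saved images, cmp -l can
list every differing offset. A small sketch, assuming two snapshot files
taken before and after the reboot; diff_offsets is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the 0-based offsets of all bytes that differ between two files.
# diff_offsets is a hypothetical helper for illustration.
diff_offsets() {
    before=$1
    after=$2
    # cmp -l prints "offset byte1 byte2" per differing byte, with 1-based
    # decimal offsets; subtract 1 to get offsets as od or dd skip= count them.
    cmp -l "$before" "$after" | awk '{ print $1 - 1 }'
}
```

For example: diff_offsets hdg_first_mb_before hdg_first_mb_after
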
>
>When I copied the saved 1MB blocks back in place, the fs mounted without
>problems.
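
Writing a saved block back could look roughly like the sketch below.
restore_head is an assumed name, and the report does not say which command
was actually used. Note that this overwrites the start of the target, so it
should only be run with the array stopped and the filesystem unmounted.

```shell
#!/bin/sh
# Sketch: write a previously saved 1 MiB image back to the start of a device.
# restore_head is a hypothetical helper, not a command from the report.
restore_head() {
    saved=$1    # e.g. /scratch/hdc_first_mb_before
    target=$2   # e.g. /dev/hdc
    # conv=notrunc keeps the rest of the target intact when it is a regular
    # file; only the first 1024 blocks of 1024 bytes are overwritten.
    dd if="$saved" of="$target" bs=1024 count=1024 conv=notrunc 2>/dev/null
}
```
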
>
>AFAIK, the first 512 bytes on each disk should be the raid superblock and the
>next 512 may be the ext2 superblock. I assume 1060-1080 falls into the group
>descriptor table that gets corrupted.
>
>It may be something in userspace that corrupts the disks, but I cannot think
>what it could be.
>
>Right now, the kernel is 2.2.25-secure + patches, but earlier 2.2.x kernels
>exhibited this as well. These include the newest raid 0.90 patches for 2.2.
>
>Any ideas what might cause this or how to debug this further?
>
>
>-- v --
>
>v@iki.fi

