From: Ville Herva <vherva@niksula.hut.fi>
To: linux-kernel@vger.kernel.org
Subject: Something corrupts raid5 disks slightly during reboot
Date: Fri, 31 Oct 2003 21:08:29 +0200
Message-ID: <20031031190829.GM4868@niksula.cs.hut.fi>

I've been experiencing strange corruption on a raid5 volume for some time.
Basically, after unmounting the filesystem, I can mount it again without
problems. I can also raidstop the raid device in between and all is still
fine:

> umount /dev/md4; mount /dev/md4
    - no corruption
> umount /dev/md4; raidstop /dev/md4; raidstart /dev/md4; mount /dev/md4
    - no corruption

But after a reboot, the filesystem is corrupted:

> mount /dev/md4
 EXT2-fs error (device md(9,4)): ext2_check_descriptors: Block bitmap for
 group 17 not in group (block 0)!
 EXT2-fs: group descriptors corrupted !

(This is recoverable with e2fsck.)
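The error above comes from a sanity check on each group descriptor: the block bitmap recorded for group g has to fall inside that group's own block range, and the number in parentheses appears to be the bitmap block stored in the descriptor. The function below is a minimal sketch of that range test, modeled on the kind of check ext2_check_descriptors performs rather than copied from it; `bitmap_in_group` is a hypothetical helper, and the 4 KiB block size (first data block 0, 32768 blocks per group) used in the usage comment is an assumption about this filesystem.

```shell
#!/bin/sh
# bitmap_in_group GROUP BITMAP FIRST_DATA_BLOCK BLOCKS_PER_GROUP
# Hypothetical helper: succeed if block BITMAP lies inside GROUP's range,
#   FIRST + GROUP*PER_GROUP <= BITMAP < FIRST + (GROUP+1)*PER_GROUP
# (a sketch of the range test applied to each group descriptor).
bitmap_in_group() {
    start=$(( $3 + $1 * $4 ))
    [ "$2" -ge "$start" ] && [ "$2" -lt $(( start + $4 )) ]
}

# Assumed 4 KiB blocks: first data block 0, 32768 blocks per group.
# A descriptor whose bitmap field reads 0 cannot describe group 17:
#   bitmap_in_group 17 0 0 32768 || echo "group 17: block bitmap not in group"
```

Under these assumptions a bitmap at block 0 is only valid for group 0, which is why a zeroed descriptor fails the check.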

The array consists of three 80GB Samsung disks in raid5 mode, but I have also
seen this problem with two of the disks in raid0 mode. The array is built from
the raw disks hdb, hdc and hdg (rather than the partitions hdb1, hdc1, hdg1).

On the same box I have three other raid arrays on different disks, all of
which consist of partitions. These do not show corruption on boot.

I ran a small experiment and saved the first megabyte of hd[bcg] across the
umount/mount and umount/raidstop/raidstart/mount sequences. It did not
change.
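The snapshot-and-compare experiment can be sketched with dd and cmp. This is an illustration under stated assumptions, not the exact commands used here: the helper names `snapshot` and `same_as_snapshot` and the /scratch paths in the usage comments are hypothetical, and reading the raw devices requires root.

```shell
#!/bin/sh
# Sketch of the snapshot experiment (helper names are hypothetical).

# snapshot DEV OUT: copy the first megabyte (1024 x 1 KiB) of DEV into OUT
snapshot() {
    dd if="$1" of="$2" bs=1024 count=1024 2>/dev/null
}

# same_as_snapshot DEV SNAP: exit 0 if DEV's first megabyte still equals SNAP
same_as_snapshot() {
    dd if="$1" bs=1024 count=1024 2>/dev/null | cmp -s - "$2"
}

# Usage (as root, paths assumed):
#   for d in hdb hdc hdg; do snapshot /dev/$d /scratch/${d}_before; done
#   ... umount, raidstop, reboot, raidstop ...
#   for d in hdb hdc hdg; do
#       same_as_snapshot /dev/$d /scratch/${d}_before || echo "$d changed"
#   done
```

Comparing through a pipe avoids writing a second copy of each megabyte just to check it.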

Then I did umount, raidstop and rebooted. After the boot, the beginning of hdb
was intact, but hdc and hdg had been tampered with. (Unfortunately, raidstart
was run automatically on boot, but I did raidstop as the first thing.)

I narrowed the difference on hdc down to bytes 1060-1080:

root@linux:/scratch>od -x hdc_bytes-1060-1080_before_boot
0000000 1e1e 00d0 000d 00d0 752e 4264 7714 3fa2
0000020 0002 0014
root@linux:/scratch>od -x hdc_bytes-1060-1080_after_boot
0000000 1e1e 00d0 000d 00d0 75ff 4264 7427 3fa2
0000020 0003 0014

On hdg, this range differed too:

root@linux:/scratch>od -x hdg_bytes-1060-1080_before_boot
0000000 8000 0000 8000 0000 7526 3fa2 7539 3fa2
0000020 0002 0014
root@linux:/scratch>od -x hdg_bytes-1060-1080_after_boot
0000000 8000 0000 8000 0000 75f7 3fa2 760a 3fa2
0000020 0003 0014

On hdg there was also an additional difference somewhere between 1kB and
5kB that was not present on hdc.
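Pinning down where two snapshots differ, rather than bisecting by hand, can be done with `cmp -l`, which lists every differing byte. A minimal sketch; `diff_offsets` and `diff_ranges` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for locating changed bytes between two snapshots.

# diff_offsets A B: 0-based offset of every differing byte
# (cmp -l prints 1-based decimal offsets followed by the two octal bytes)
diff_offsets() {
    cmp -l "$1" "$2" | awk '{ print $1 - 1 }'
}

# diff_ranges A B: collapse consecutive offsets into inclusive "start-end" runs
diff_ranges() {
    diff_offsets "$1" "$2" | awk '
        NR == 1        { start = $1; prev = $1; next }
        $1 == prev + 1 { prev = $1; next }
                       { print start "-" prev; start = $1; prev = $1 }
        END { if (NR > 0) print start "-" prev }'
}
```

To view one reported run in the same form as the dumps above, od can read it straight out of a snapshot, e.g. `od -A d -x -j 1060 -N 20 /scratch/hdg_before` (file name assumed).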

When I copied the saved 1MB blocks back in place, the fs mounted without
problems.
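Writing a saved megabyte back over a disk is another dd one-liner. A minimal sketch, assuming the array is stopped while its members are rewritten; `restore` is a hypothetical helper name and the paths in the usage comment are assumptions.

```shell
#!/bin/sh
# restore DEV SNAP (hypothetical helper): write the saved first megabyte
# back over the start of DEV. conv=notrunc matters only when DEV is a
# regular image file; it stops dd from truncating the rest of it.
restore() {
    dd if="$2" of="$1" bs=1024 count=1024 conv=notrunc 2>/dev/null
}

# Usage (as root, array stopped, paths assumed):
#   for d in hdb hdc hdg; do restore /dev/$d /scratch/${d}_before; done
```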

AFAIK, the first 512 bytes on each disk should be the raid superblock, and
the next 512 bytes may be the ext2 superblock. I assume bytes 1060-1080 fall
into the group descriptor table, which is what gets corrupted.
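For reference when reading these offsets: the standard ext2 layout puts the superblock at byte 1024 from the start of the filesystem (the first 1024 bytes are left for boot code), and the md 0.90 persistent superblock is kept near the end of each member device: the device size is rounded down to a multiple of 64 KiB and the superblock occupies the 64 KiB just below that point. The sketch below computes that md offset the way the MD_NEW_SIZE_SECTORS macro in md_p.h does, and decodes the little-endian ext2 superblock fields covering bytes 1060-1079 of an image. It assumes (hypothetically) that the image begins at the start of the filesystem; in a raid5 array the filesystem's first bytes live in the first data chunk, which sits on only one member disk, so the decode only means something for that disk. All function names are hypothetical.

```shell
#!/bin/sh
# md_sb_offset SECTORS: byte offset of the md 0.90 superblock on a member
# device of SECTORS 512-byte sectors, following MD_NEW_SIZE_SECTORS:
# round down to a multiple of 128 sectors (64 KiB), then step back 128.
md_sb_offset() {
    echo $(( (($1 & ~127) - 128) * 512 ))
}

# le_u16 / le_u32 FILE OFFSET: little-endian integer stored at OFFSET
le_u16() {
    set -- $(od -A n -t u1 -j "$2" -N 2 "$1")
    echo $(( $1 + ($2 << 8) ))
}
le_u32() {
    set -- $(od -A n -t u1 -j "$2" -N 4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# ext2_fields IMAGE [SB]: superblock fields at SB+36 .. SB+55, assuming
# the ext2 superblock starts at byte SB of IMAGE (default 1024)
ext2_fields() {
    img=$1
    sb=${2:-1024}
    echo "s_frags_per_group=$(le_u32 "$img" $((sb + 36)))"
    echo "s_inodes_per_group=$(le_u32 "$img" $((sb + 40)))"
    echo "s_mtime=$(le_u32 "$img" $((sb + 44)))"
    echo "s_wtime=$(le_u32 "$img" $((sb + 48)))"
    echo "s_mnt_count=$(le_u16 "$img" $((sb + 52)))"
    # s_max_mnt_count is signed on disk; it is read as unsigned here
    echo "s_max_mnt_count=$(le_u16 "$img" $((sb + 54)))"
}
```

The sector count for md_sb_offset could come from `blockdev --getsz /dev/hdc` on systems whose util-linux provides that option.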

It may be something in userspace that corrupts the disks, but I cannot think
what it could be.

Right now, the kernel is 2.2.25-secure + patches, but earlier 2.2.x kernels
exhibited this as well. The patches include the newest raid 0.90 patches for
2.2.

Any ideas what might cause this or how to debug this further?


-- v --

v@iki.fi
