* Raid6 recovery
@ 2020-03-19 19:55 Glenn Greibesland
  2020-03-20 19:15 ` Wols Lists
  0 siblings, 1 reply; 13+ messages in thread
From: Glenn Greibesland @ 2020-03-19 19:55 UTC (permalink / raw)
  To: linux-raid

Hi. I need some help recovering from a multiple-disk failure on a
RAID6 array.
I had two failed disks and therefore shut down the server and
connected new disks.
After I powered on the server, another disk got kicked out of the
array, leaving it with only 15 of 18 working devices, so it won't
start.
I ran an offline test with smartctl and the disk that got thrown out
of the array seems totally fine.
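
For reference, the test was along these lines (with /dev/sdX standing
in for the kicked-out drive):

smartctl -t offline /dev/sdX    # immediate offline test; -t long runs the extended self-test
smartctl -a /dev/sdX            # check the self-test log and SMART attributes once it finishes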

Here is where I think I made a mistake: I used --re-add on the disk.
Now it is regarded as a spare, and the array still won't start.

I've been reading https://raid.wiki.kernel.org/index.php/RAID_Recovery
and I have tried `--assemble --scan --force --verbose` as well as a
manual `--assemble --force` specifying each drive. Neither of them
works (both report that 15 out of 18 devices is not enough).
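
For reference, the manual attempt was along these lines (same
partitions as in the --create dry run below):

mdadm --assemble --force --verbose /dev/md0 \
    /dev/sdj1 /dev/sdk1 /dev/sdi1 /dev/sdh1 /dev/sdo1 /dev/sdp1 \
    /dev/sdr1 /dev/sdq1 /dev/sdf1 /dev/sdb1 /dev/sdg1 /dev/sdd1 \
    /dev/sdm1 /dev/sdf2 /dev/sdc2 /dev/sdc1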

All drives have the same event count and Used Dev Size, but two of
the devices have a lower Avail Dev Size and a different Data Offset.

After a bit of digging in the manual and on different forums, I have
concluded that the next step for me is to recreate the array using
--assume-clean and --data-offset=variable.
I have tried a dry run of the command (answering no to "Continue
creating array?"), and mdadm accepts the parameters without any errors:


mdadm --create --assume-clean --level=6 --raid-devices=18
--size=3906763776s --chunk=512K --data-offset=variable /dev/md0
/dev/sdj1:262144s /dev/sdk1:262144s /dev/sdi1:262144s
/dev/sdh1:262144s /dev/sdo1:262144s /dev/sdp1:262144s
/dev/sdr1:262144s /dev/sdq1:262144s /dev/sdf1:262144s
/dev/sdb1:262144s /dev/sdg1:262144s /dev/sdd1:262144s
/dev/sdm1:262144s /dev/sdf2:241664s missing missing /dev/sdc2:241664s
/dev/sdc1:262144s
mdadm: /dev/sdj1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdk1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdi1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdh1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdo1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdp1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdr1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdq1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdf1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdb1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: partition table exists on /dev/sdb1 but will be lost or
       meaningless after creating array
mdadm: /dev/sdg1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdm1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdf2 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdc2 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
       level=raid6 devices=18 ctime=Wed Nov 14 22:53:28 2012
Continue creating array? N

My only worries now are the --size and --data-offset parameters.
According to the man page, the size should be specified in kilobytes;
it was kibibytes previously.
The Used Dev Size of all array members is 3906763776 sectors
(1862.89 GiB 2000.26 GB).

Should I convert the sectors into kilobytes, or does mdadm support
sectors as the unit for --size and --data-offset? It is not mentioned
in the manual, but I have seen it used in different forum threads, and
mdadm does not blow up when I try it.
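
For reference, assuming 512-byte sectors (which matches the GiB/GB
figures mdadm prints), the conversions are just a divide-by-two:

3906763776 sectors / 2 = 1953381888 KiB   # --size (3906763776 * 512 = 2000263053312 bytes)
262144 sectors     / 2 = 131072 KiB       # 128 MiB, the data offset on most members
241664 sectors     / 2 = 120832 KiB       # 118 MiB, the offset on /dev/sdf2 and /dev/sdc2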

Any other suggestions?

* raid6 recovery
@ 2011-01-14 16:16 Björn Englund
  2011-01-14 21:52 ` NeilBrown
  0 siblings, 1 reply; 13+ messages in thread
From: Björn Englund @ 2011-01-14 16:16 UTC (permalink / raw)
  To: linux-raid

Hi.

After a loss of communication with a drive in a 10-disk raid6, the
disk was dropped from the array.

I added it again with
mdadm /dev/md16 --add /dev/sdbq1

The array resynced, and I kept using the XFS filesystem on top of the raid.

After a while I started noticing filesystem errors.

I did
echo check > /sys/block/md16/md/sync_action

I got a lot of errors in /sys/block/md16/md/mismatch_cnt
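
That is, once the check run finished:

cat /proc/mdstat                        # shows the check progress while it runs
cat /sys/block/md16/md/mismatch_cnt     # came back with a large non-zero count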

I failed and removed the previously added disk from the array.

Did a check again (on the 9/10 array)
echo check > /sys/block/md16/md/sync_action

No errors in /sys/block/md16/md/mismatch_cnt

Wiped the superblock from /dev/sdbq1 and added it again to the array.
Let it finish resyncing.
Did a check and once again a lot of errors.
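
So the whole cycle was roughly the following (the superblock wipe done
with something like mdadm --zero-superblock):

mdadm /dev/md16 --fail /dev/sdbq1
mdadm /dev/md16 --remove /dev/sdbq1
echo check > /sys/block/md16/md/sync_action     # 9/10 array: no mismatches
mdadm --zero-superblock /dev/sdbq1
mdadm /dev/md16 --add /dev/sdbq1                # let the resync finish
echo check > /sys/block/md16/md/sync_action     # mismatch_cnt large again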

The drive now has slot 10 instead of slot 3, which it had before the
first error.

Examining each device (see below) shows 11 slots, with one of them
marked failed: (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)?


Any idea what is going on?

mdadm --version
mdadm - v2.6.9 - 10th March 2009

CentOS 5.5


mdadm -D /dev/md16
/dev/md16:
        Version : 1.01
  Creation Time : Thu Nov 25 09:15:54 2010
     Raid Level : raid6
     Array Size : 7809792000 (7448.00 GiB 7997.23 GB)
  Used Dev Size : 976224000 (931.00 GiB 999.65 GB)
   Raid Devices : 10
  Total Devices : 10
Preferred Minor : 16
    Persistence : Superblock is persistent

    Update Time : Fri Jan 14 16:22:10 2011
          State : clean
 Active Devices : 10
Working Devices : 10
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

           Name : 16
           UUID : fcd585d0:f2918552:7090d8da:532927c8
         Events : 90

    Number   Major   Minor   RaidDevice State
       0       8      145        0      active sync   /dev/sdj1
       1      65        1        1      active sync   /dev/sdq1
       2      65       17        2      active sync   /dev/sdr1
      10      68       65        3      active sync   /dev/sdbq1
       4      65       49        4      active sync   /dev/sdt1
       5      65       65        5      active sync   /dev/sdu1
       6      65      113        6      active sync   /dev/sdx1
       7      65      129        7      active sync   /dev/sdy1
       8      65       33        8      active sync   /dev/sds1
       9      65      145        9      active sync   /dev/sdz1



mdadm -E /dev/sdj1
/dev/sdj1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : fcd585d0:f2918552:7090d8da:532927c8
           Name : 16
  Creation Time : Thu Nov 25 09:15:54 2010
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
     Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
  Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 5db9c8f7:ce5b375e:757c53d0:04e89a06

    Update Time : Fri Jan 14 16:22:10 2011
       Checksum : 1f17a675 - correct
         Events : 90

     Chunk Size : 256K

    Array Slot : 0 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
   Array State : Uuuuuuuuuu 1 failed



mdadm -E /dev/sdq1
/dev/sdq1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : fcd585d0:f2918552:7090d8da:532927c8
           Name : 16
  Creation Time : Thu Nov 25 09:15:54 2010
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
     Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
  Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : fb113255:fda391a6:7368a42b:1d6d4655

    Update Time : Fri Jan 14 16:22:10 2011
       Checksum : 6ed7b859 - correct
         Events : 90

     Chunk Size : 256K

    Array Slot : 1 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
   Array State : uUuuuuuuuu 1 failed


 mdadm -E /dev/sdr1
/dev/sdr1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : fcd585d0:f2918552:7090d8da:532927c8
           Name : 16
  Creation Time : Thu Nov 25 09:15:54 2010
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
     Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
  Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : afcb4dd8:2aa58944:40a32ed9:eb6178af

    Update Time : Fri Jan 14 16:22:10 2011
       Checksum : 97a7a2d7 - correct
         Events : 90

     Chunk Size : 256K

    Array Slot : 2 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
   Array State : uuUuuuuuuu 1 failed


mdadm -E /dev/sdbq1
/dev/sdbq1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : fcd585d0:f2918552:7090d8da:532927c8
           Name : 16
  Creation Time : Thu Nov 25 09:15:54 2010
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
     Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
  Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 93c6ae7c:d8161356:7ada1043:d0c5a924

    Update Time : Fri Jan 14 16:22:10 2011
       Checksum : 2ca5aa8f - correct
         Events : 90

     Chunk Size : 256K

    Array Slot : 10 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
   Array State : uuuUuuuuuu 1 failed


and so on for the rest of the drives.

* raid6 recovery
@ 2009-01-15 15:24 Jason Weber
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Weber @ 2009-01-15 15:24 UTC (permalink / raw)
  To: linux-raid

Before I cause too much damage, I really need expert help.

Early this morning, the machine locked up and my 4x500GB raid6 did not
recover on reboot.
A smaller 2x18GB raid came up as normal.

/var/log/messages has:

Jan 15 01:12:22 wildfire Pid: 6056, comm: mdadm Tainted: P
2.6.19-gentoo-r5 #3

with some codes and a lot of others like it when it went down. And then,

Jan 15 01:16:37 wildfire mdadm: DeviceDisappeared event detected on md
device /dev/md1

I tried simply adding the drives back:

# mdadm /dev/md1 --add /dev/sdd /dev/sde
mdadm: cannot get array info for /dev/md1
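
Presumably that is because md1 was not assembled at all at that point,
e.g.:

cat /proc/mdstat           # md1 not listed as an active array
mdadm --detail /dev/md1    # fails, since the array is not running

so there was no running array to add anything to.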

Eventually I noticed that the drives had a different UUID than the one
in mdadm.conf; one byte had changed.  I have a backup of mdadm.conf, so
I know the config file itself had not changed.

So I changed mdadm.conf to match the drives and started an assemble:

# mdadm --assemble --verbose /dev/md1
mdadm: looking for devices for /dev/md1
mdadm: cannot open device
/dev/disk/by-uuid/d7a08e91-0a49-4e91-91d7-d9d1e9e6cda1: Device or
resource busy
mdadm: /dev/disk/by-uuid/d7a08e91-0a49-4e91-91d7-d9d1e9e6cda1 has wrong uuid.
mdadm: no recogniseable superblock on /dev/sdg1
mdadm: /dev/sdg1 has wrong uuid.
mdadm: no recogniseable superblock on /dev/sdg
mdadm: /dev/sdg has wrong uuid.
mdadm: cannot open device /dev/sdi2: Device or resource busy
mdadm: /dev/sdi2 has wrong uuid.
mdadm: cannot open device /dev/sdi1: Device or resource busy
mdadm: /dev/sdi1 has wrong uuid.
mdadm: cannot open device /dev/sdi: Device or resource busy
mdadm: /dev/sdi has wrong uuid.
mdadm: cannot open device /dev/sdh1: Device or resource busy
mdadm: /dev/sdh1 has wrong uuid.
mdadm: cannot open device /dev/sdh: Device or resource busy
mdadm: /dev/sdh has wrong uuid.
mdadm: /dev/sdc has wrong uuid.
mdadm: cannot open device /dev/sdb1: Device or resource busy
mdadm: /dev/sdb1 has wrong uuid.
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: /dev/sdb has wrong uuid.
mdadm: cannot open device /dev/sda4: Device or resource busy
mdadm: /dev/sda4 has wrong uuid.
mdadm: cannot open device /dev/sda3: Device or resource busy
mdadm: /dev/sda3 has wrong uuid.
mdadm: cannot open device /dev/sda2: Device or resource busy
mdadm: /dev/sda2 has wrong uuid.
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has wrong uuid.
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has wrong uuid.
mdadm: /dev/sdf is identified as a member of /dev/md1, slot 1.
mdadm: /dev/sde is identified as a member of /dev/md1, slot 0.
mdadm: /dev/sdd is identified as a member of /dev/md1, slot 3.

which has been sitting there for about four hours at full CPU, and as
far as I can tell there is not much drive activity (how can I tell?
The drives are not very loud relative to the overall machine noise).
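
(One rough way to watch per-disk activity without listening to the
drives, assuming sysstat is installed:

iostat -dx 5    # per-device throughput and utilisation every 5 seconds

or compare successive reads of /proc/diskstats.)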

As for "damage" I've done: first of all, a typo added /dev/sdc, one of
the md1 members, to the md0 array, so it now thinks it is an 18GB
device according to mdadm -E. Hopefully it was only marked as a spare,
so maybe it didn't get scrambled:

# mdadm -E /dev/sdc
/dev/sdc:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 96a4204f:7b6211e6:34105f4c:9857a351
  Creation Time : Tue May 17 23:03:53 2005
     Raid Level : raid1
  Used Dev Size : 17952512 (17.12 GiB 18.38 GB)
     Array Size : 17952512 (17.12 GiB 18.38 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 0

    Update Time : Thu Jan 15 01:52:42 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 195f64d3 - correct
         Events : 0.39649024


      Number   Major   Minor   RaidDevice State
this     2       8       32        2      spare   /dev/sdc

   0     0       8      113        0      active sync   /dev/sdh1
   1     1       8      129        1      active sync   /dev/sdi1
   2     2       8       32        2      spare   /dev/sdc

Here are the others:

# mdadm -E /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 00.91.00
           UUID : f92d43a8:5ab3f411:26e606b2:3c378a67
  Creation Time : Sat Oct 13 00:23:51 2007
     Raid Level : raid6
  Used Dev Size : 488386496 (465.76 GiB 500.11 GB)
     Array Size : 976772992 (931.52 GiB 1000.22 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

  Reshape pos'n : 9223371671782555647

    Update Time : Thu Jan 15 01:12:21 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : dca29b4 - correct
         Events : 0.79926

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       48        3      active sync   /dev/sdd

   0     0       8       64        0      active sync   /dev/sde
   1     1       8       80        1      active sync   /dev/sdf
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd

# mdadm -E /dev/sde
/dev/sde:
          Magic : a92b4efc
        Version : 00.91.00
           UUID : f92d43a8:5ab3f411:26e606b2:3c378a67
  Creation Time : Sat Oct 13 00:23:51 2007
     Raid Level : raid6
  Used Dev Size : 488386496 (465.76 GiB 500.11 GB)
     Array Size : 976772992 (931.52 GiB 1000.22 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

  Reshape pos'n : 9223371671782555647

    Update Time : Thu Jan 15 01:12:21 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : dca29be - correct
         Events : 0.79926

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       64        0      active sync   /dev/sde

   0     0       8       64        0      active sync   /dev/sde
   1     1       8       80        1      active sync   /dev/sdf
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd

# mdadm -E /dev/sdf
/dev/sdf:
          Magic : a92b4efc
        Version : 00.91.00
           UUID : f92d43a8:5ab3f411:26e606b2:3c378a67
  Creation Time : Sat Oct 13 00:23:51 2007
     Raid Level : raid6
  Used Dev Size : 488386496 (465.76 GiB 500.11 GB)
     Array Size : 976772992 (931.52 GiB 1000.22 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

  Reshape pos'n : 9223371671782555647

    Update Time : Thu Jan 15 01:12:21 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : dca29d0 - correct
         Events : 0.79926

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       80        1      active sync   /dev/sdf

   0     0       8       64        0      active sync   /dev/sde
   1     1       8       80        1      active sync   /dev/sdf
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd

/etc/mdadm.conf:
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md1 level=raid6 num-devices=4
UUID=f92d43a8:5ab3f411:26e606b2:3c378a67
ARRAY /dev/md0 level=raid1 num-devices=2
UUID=96a4204f:7b6211e6:34105f4c:9857a351

# This file was auto-generated on Tue, 11 Mar 2008 00:10:35 -0700
# by mkconf $Id: mkconf 324 2007-05-05 18:49:44Z madduck $

It previously said:
UUID=f92d43a8:5ab3f491:26e606b2:3c378a67

with ...491... instead of ...411...

Is mdadm --assemble supposed to take a long time, or should it return
almost immediately and let me watch /proc/mdstat, which currently just
says:

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdh1[0] sdi1[1]
      17952512 blocks [2/2] [UU]

unused devices: <none>

Also, I did modprobe raid456 manually before the assemble, since I
noticed only raid1 was listed. Maybe it would have been loaded
automatically at the right moment anyhow.

Should I just wait for the assemble, or is it doing nothing?
Can I recover /dev/sdc as well, or is that unimportant since I can
clear it and re-add it if the other three (or even two) sync up and
become available?

This md1 has been trouble since its inception a couple of years ago.
I seem to get corrupt files every week or so.  My little U320 SCSI md0
raid1 has been nearly uneventful for a much longer time.
Is raid6 less stable, or is my sata_sil24 card perhaps a bad choice?
Maybe SATA doesn't measure up to SCSI.  So please point out any obvious
foolishness on my part.

I do have a five-day-old single non-raid partial backup, which is now
the only copy of the data.
I'm very nervous about a critical loss.  If I absolutely need to start
over, I'd like to get some redundancy back into my data as soon as
possible.  Perhaps breaking it into a pair of raid1 arrays is smarter
anyhow.

-- Jason P Weber


Thread overview: 13+ messages
2020-03-19 19:55 Raid6 recovery Glenn Greibesland
2020-03-20 19:15 ` Wols Lists
     [not found]   ` <CA+9eyigMV-E=FwtXDWZszSsV6JOxxFOFVh6WzmeH=OC3heMUHw@mail.gmail.com>
2020-03-21  0:06     ` antlists
2020-03-21 11:54       ` Glenn Greibesland
2020-03-21 19:24         ` Phil Turmel
2020-03-21 22:12           ` Glenn Greibesland
2020-03-22  0:32             ` Phil Turmel
2020-03-23  9:23               ` Wols Lists
2020-03-23 12:35                 ` Glenn Greibesland
2020-03-22  0:05           ` Wols Lists
  -- strict thread matches above, loose matches on Subject: below --
2011-01-14 16:16 raid6 recovery Björn Englund
2011-01-14 21:52 ` NeilBrown
2009-01-15 15:24 Jason Weber
