* [HELP] Recover a RAID5 with 8 drives
@ 2014-01-28 15:29 Maurizio De Santis
  2014-01-28 20:11 ` AW: " Samer, Michael (I/ET-83, extern)
  0 siblings, 1 reply; 7+ messages in thread
From: Maurizio De Santis @ 2014-01-28 15:29 UTC (permalink / raw)
  To: linux-raid

Hi!

I think I've got a problem :-/ I have a QNAP NAS with an 8-disk RAID5.
Some days ago I got a "Disk Read/Write Error" on the 8th drive
(/dev/sdh), with the suggestion to replace the disk.

I replaced it, but after a while the RAID rebuild failed, and the QNAP
Admin Interface still gives me a "Disk Read/Write Error" on /dev/sdh.
Plus, I can't access the RAID data anymore :-/

I was following this guide
https://raid.wiki.kernel.org/index.php/RAID_Recovery but, since I
don't have any backups (I promise I will make them in the future!), I'm
afraid to run any possibly destructive command.

How do you suggest I proceed? I would like to assemble the RAID without
the 8th disk in order to mount it and back up the important data, but I
don't even know whether that is doable :-/ Moreover, looking at the
`mdadm --examine` output I see that sdb seems to have problems too, even
though the QNAP Admin Interface doesn't report it.
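
To make my question concrete, this is roughly the kind of thing I have
in mind (purely a guess on my side, based on the wiki page above; /mnt
is just an example mount point), but I don't want to run anything like
it before someone more expert than me confirms it is safe:

# mdadm --stop /dev/md0        # in case it is still partially assembled
# mdadm --assemble --force /dev/md0 /dev/sd[abcdefg]3
# mount -o ro /dev/md0 /mnt    # read-only, just to copy the data off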

Here is some information about the machine status:

# uname -a
Linux NAS 3.4.6 #1 SMP Thu Sep 12 10:56:51 CST 2013 x86_64 unknown

# mdadm -V
mdadm - v2.6.3 - 20th August 2007

# cat /etc/mdadm.conf
ARRAY /dev/md0 
devices=/dev/sda3,/dev/sdb3,/dev/sdc3,/dev/sdd3,/dev/sde3,/dev/sdf3,/dev/sdg3,/dev/sdh3

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] 
[raid4] [multipath]
md8 : active raid1 sdg2[2](S) sdf2[3](S) sde2[4](S) sdd2[5](S) 
sdc2[6](S) sdb2[1] sda2[0]
       530048 blocks [2/2] [UU]

md13 : active raid1 sda4[0] sde4[6] sdf4[5] sdg4[4] sdd4[3] sdc4[2] sdb4[1]
       458880 blocks [8/7] [UUUUUUU_]
       bitmap: 8/57 pages [32KB], 4KB chunk

md9 : active raid1 sda1[0] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
       530048 blocks [8/7] [UUUUUUU_]
       bitmap: 30/65 pages [120KB], 4KB chunk

unused devices: <none>

# mdadm --examine /dev/sd[abcdefgh]3
/dev/sda3:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
   Creation Time : Fri Jan 20 02:19:47 2012
      Raid Level : raid5
   Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
      Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 0

     Update Time : Fri Jan 24 17:19:58 2014
           State : clean
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : 982047ab - correct
          Events : 0.2944851

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3

    0     0       8        3        0      active sync   /dev/sda3
    1     1       0        0        1      faulty removed
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8       51        3      active sync   /dev/sdd3
    4     4       8       67        4      active sync   /dev/sde3
    5     5       8       83        5      active sync   /dev/sdf3
    6     6       8       99        6      active sync   /dev/sdg3
    7     7       0        0        7      faulty removed
/dev/sdb3:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
   Creation Time : Fri Jan 20 02:19:47 2012
      Raid Level : raid5
   Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
      Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
    Raid Devices : 8
   Total Devices : 8
Preferred Minor : 0

     Update Time : Fri Jan 24 17:09:57 2014
           State : active
  Active Devices : 7
Working Devices : 8
  Failed Devices : 1
   Spare Devices : 1
        Checksum : 97f3567d - correct
          Events : 0.2944837

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     1       8       19        1      active sync   /dev/sdb3

    0     0       8        3        0      active sync   /dev/sda3
    1     1       8       19        1      active sync   /dev/sdb3
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8       51        3      active sync   /dev/sdd3
    4     4       8       67        4      active sync   /dev/sde3
    5     5       8       83        5      active sync   /dev/sdf3
    6     6       8       99        6      active sync   /dev/sdg3
    7     7       0        0        7      faulty removed
    8     8       8      115        8      spare   /dev/sdh3
/dev/sdc3:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
   Creation Time : Fri Jan 20 02:19:47 2012
      Raid Level : raid5
   Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
      Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 0

     Update Time : Fri Jan 24 17:19:58 2014
           State : clean
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : 982047cf - correct
          Events : 0.2944851

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     2       8       35        2      active sync   /dev/sdc3

    0     0       8        3        0      active sync   /dev/sda3
    1     1       0        0        1      faulty removed
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8       51        3      active sync   /dev/sdd3
    4     4       8       67        4      active sync   /dev/sde3
    5     5       8       83        5      active sync   /dev/sdf3
    6     6       8       99        6      active sync   /dev/sdg3
    7     7       0        0        7      faulty removed
/dev/sdd3:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
   Creation Time : Fri Jan 20 02:19:47 2012
      Raid Level : raid5
   Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
      Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 0

     Update Time : Fri Jan 24 17:19:58 2014
           State : clean
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : 982047e1 - correct
          Events : 0.2944851

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     3       8       51        3      active sync   /dev/sdd3

    0     0       8        3        0      active sync   /dev/sda3
    1     1       0        0        1      faulty removed
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8       51        3      active sync   /dev/sdd3
    4     4       8       67        4      active sync   /dev/sde3
    5     5       8       83        5      active sync   /dev/sdf3
    6     6       8       99        6      active sync   /dev/sdg3
    7     7       0        0        7      faulty removed
/dev/sde3:
  Failed Devices : 2
   Spare Devices : 0
        Checksum : 982047f3 - correct
          Events : 0.2944851

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     4       8       67        4      active sync   /dev/sde3

    0     0       8        3        0      active sync   /dev/sda3
    1     1       0        0        1      faulty removed
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8       51        3      active sync   /dev/sdd3
    4     4       8       67        4      active sync   /dev/sde3
    5     5       8       83        5      active sync   /dev/sdf3
    6     6       8       99        6      active sync   /dev/sdg3
    7     7       0        0        7      faulty removed
/dev/sdf3:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
   Creation Time : Fri Jan 20 02:19:47 2012
      Raid Level : raid5
   Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
      Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 0

     Update Time : Fri Jan 24 17:19:58 2014
           State : clean
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : 98204805 - correct
          Events : 0.2944851

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     5       8       83        5      active sync   /dev/sdf3

    0     0       8        3        0      active sync   /dev/sda3
    1     1       0        0        1      faulty removed
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8       51        3      active sync   /dev/sdd3
    4     4       8       67        4      active sync   /dev/sde3
    5     5       8       83        5      active sync   /dev/sdf3
    6     6       8       99        6      active sync   /dev/sdg3
    7     7       0        0        7      faulty removed
/dev/sdg3:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
   Creation Time : Fri Jan 20 02:19:47 2012
      Raid Level : raid5
   Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
      Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
    Raid Devices : 8
   Total Devices : 7
Preferred Minor : 0

     Update Time : Fri Jan 24 17:19:58 2014
           State : clean
  Active Devices : 6
Working Devices : 6
  Failed Devices : 2
   Spare Devices : 0
        Checksum : 98204817 - correct
          Events : 0.2944851

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     6       8       99        6      active sync   /dev/sdg3

    0     0       8        3        0      active sync   /dev/sda3
    1     1       0        0        1      faulty removed
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8       51        3      active sync   /dev/sdd3
    4     4       8       67        4      active sync   /dev/sde3
    5     5       8       83        5      active sync   /dev/sdf3
    6     6       8       99        6      active sync   /dev/sdg3
    7     7       0        0        7      faulty removed
/dev/sdh3:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
   Creation Time : Fri Jan 20 02:19:47 2012
      Raid Level : raid5
   Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
      Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
    Raid Devices : 8
   Total Devices : 8
Preferred Minor : 0

     Update Time : Fri Jan 24 17:18:26 2014
           State : clean
  Active Devices : 6
Working Devices : 7
  Failed Devices : 2
   Spare Devices : 1
        Checksum : 98204851 - correct
          Events : 0.2944847

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     8       8      115        8      spare   /dev/sdh3

    0     0       8        3        0      active sync   /dev/sda3
    1     1       0        0        1      faulty removed
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8       51        3      active sync   /dev/sdd3
    4     4       8       67        4      active sync   /dev/sde3
    5     5       8       83        5      active sync   /dev/sdf3
    6     6       8       99        6      active sync   /dev/sdg3
    7     7       0        0        7      faulty removed
    8     8       8      115        8      spare   /dev/sdh3

# dmesg **edited (irrelevant parts removed)**
, wo:0, o:1, dev:sdb2
[  975.516724] RAID1 conf printout:
[  975.516728]  --- wd:2 rd:2
[  975.516732]  disk 0, wo:0, o:1, dev:sda2
[  975.516737]  disk 1, wo:0, o:1, dev:sdb2
[  975.516740] RAID1 conf printout:
[  975.516744]  --- wd:2 rd:2
[  975.516748]  disk 0, wo:0, o:1, dev:sda2
[  975.516753]  disk 1, wo:0, o:1, dev:sdb2
[  977.495709] md: unbind<sdh2>
[  977.505048] md: export_rdev(sdh2)
[  977.535277] md/raid1:md9: Disk failure on sdh1, disabling device.
[  977.575038]  disk 2, wo:0, o:1, dev:sdc1
[  977.575043]  disk 3, wo:0, o:1, dev:sdd1
[  977.575048]  disk 4, wo:0, o:1, dev:sde1
[  977.575053]  disk 5, wo:0, o:1, dev:sdf1
[  977.575058]  disk 6, wo:0, o:1, dev:sdg1
[  979.547149] md: unbind<sdh1>
[  979.558031] md: export_rdev(sdh1)
[  979.592646] md/raid1:md13: Disk failure on sdh4, disabling device.
[  979.592650] md/raid1:md13: Operation continuing on 7 devices.
[  979.650862] RAID1 conf printout:
[  979.650869]  --- wd:7 rd:8
[  979.650875]  disk 0, wo:0, o:1, dev:sda4
[  979.650880]  disk 1, wo:0, o:1, dev:sdb4
[  979.650885]  disk 2, wo:0, o:1, dev:sdc4
[  979.650890]  disk 3, wo:0, o:1, dev:sdd4
[  979.650895]  disk 4, wo:0, o:1, dev:sdg4
[  979.650900]  disk 5, wo:0, o:1, dev:sdf4
[  979.650905]  disk 6, wo:0, o:1, dev:sde4
[  979.650911]  disk 7, wo:1, o:0, dev:sdh4
[  979.656024] RAID1 conf printout:
[  979.656029]  --- wd:7 rd:8
[  979.656034]  disk 0, wo:0, o:1, dev:sda4
[  979.656039]  disk 1, wo:0, o:1, dev:sdb4
[  979.656044]  disk 2, wo:0, o:1, dev:sdc4
[  979.656049]  disk 3, wo:0, o:1, dev:sdd4
[  979.656054]  disk 4, wo:0, o:1, dev:sdg4
[  979.656059]  disk 5, wo:0, o:1, dev:sdf4
[  979.656063]  disk 6, wo:0, o:1, dev:sde4
[  981.604906] md: unbind<sdh4>
[  981.616035] md: export_rdev(sdh4)
[  981.753058] md/raid:md0: Disk failure on sdh3, disabling device.
[  981.753062] md/raid:md0: Operation continuing on 6 devices.
[  983.765852] md: unbind<sdh3>
[  983.777030] md: export_rdev(sdh3)
[ 1060.094825] journal commit I/O error
[ 1060.099196] journal commit I/O error
[ 1060.103525] journal commit I/O error
[ 1060.108698] journal commit I/O error
[ 1060.116311] journal commit I/O error
[ 1060.123634] journal commit I/O error
[ 1060.127225] journal commit I/O error
[ 1060.130930] journal commit I/O error
[ 1060.137651] EXT4-fs (md0): previous I/O error to superblock detected
[ 1060.178323] Buffer I/O error on device md0, logical block 0
[ 1060.181873] lost page write due to I/O error on md0
[ 1060.185634] EXT4-fs error (device md0): ext4_put_super:849: Couldn't 
clean up the journal
[ 1062.662723] md0: detected capacity change from 13991546060800 to 0
[ 1062.666308] md: md0 stopped.
[ 1062.669760] md: unbind<sda3>
[ 1062.681031] md: export_rdev(sda3)
[ 1062.684466] md: unbind<sdg3>
[ 1062.695023] md: export_rdev(sdg3)
[ 1062.698342] md: unbind<sdf3>
[ 1062.709021] md: export_rdev(sdf3)
[ 1062.712310] md: unbind<sde3>
[ 1062.723029] md: export_rdev(sde3)
[ 1062.726245] md: unbind<sdd3>
[ 1062.737022] md: export_rdev(sdd3)
[ 1062.740112] md: unbind<sdc3>
[ 1062.751022] md: export_rdev(sdc3)
[ 1062.753934] md: unbind<sdb3>
[ 1062.764021] md: export_rdev(sdb3)
[ 1063.772687] md: md0 stopped.
[ 1064.782381] md: md0 stopped.
[ 1065.792585] md: md0 stopped.
[ 1066.801668] md: md0 stopped.
[ 1067.812573] md: md0 stopped.
[ 1068.821548] md: md0 stopped.
[ 1069.830667] md: md0 stopped.
[ 1070.839554] md: md0 stopped.
[ 1071.848418] md: md0 stopped.

-- 

Maurizio De Santis


^ permalink raw reply	[flat|nested] 7+ messages in thread

* AW: [HELP] Recover a RAID5 with 8 drives
  2014-01-28 15:29 [HELP] Recover a RAID5 with 8 drives Maurizio De Santis
@ 2014-01-28 20:11 ` Samer, Michael (I/ET-83, extern)
  2014-01-29 14:14   ` Maurizio De Santis
  0 siblings, 1 reply; 7+ messages in thread
From: Samer, Michael (I/ET-83, extern) @ 2014-01-28 20:11 UTC (permalink / raw)
  To: 'Maurizio De Santis'; +Cc: 'linux-raid@vger.kernel.org'

Hello Maurizio
A very similar case happened to me (search the list for QNAP).
Your box dropped a second drive (= full failure) while rebuilding, I guess due to read errors and no TLER-capable drives.
Western Digital is prone to this.
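
(If you want to see whether your drives even offer it, something along
these lines should show the current SCT ERC setting; treat it as a
sketch, and it assumes smartctl is available on the NAS:

# smartctl -l scterc /dev/sda

If that comes back as disabled or unsupported, the drive retries a bad
sector internally for a very long time, which is exactly what gets it
kicked out of the array during a rebuild.)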

I was lucky enough to be able to copy all of my faulty drives (5 of 8), and I am currently trying to recreate the md superblocks, which were lost on the last write.
What drives do you use?

Cheers
Sam


-----Original Message-----
From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Maurizio De Santis
Sent: Tuesday, 28 January 2014 16:30
To: linux-raid@vger.kernel.org
Subject: [HELP] Recover a RAID5 with 8 drives

[quoted original message snipped]
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: AW: [HELP] Recover a RAID5 with 8 drives
  2014-01-28 20:11 ` AW: " Samer, Michael (I/ET-83, extern)
@ 2014-01-29 14:14   ` Maurizio De Santis
  2014-01-30  9:26     ` Brad Campbell
  2014-01-30 12:20     ` AW: " Samer, Michael (I/ET-83, extern)
  0 siblings, 2 replies; 7+ messages in thread
From: Maurizio De Santis @ 2014-01-29 14:14 UTC (permalink / raw)
  To: Samer, Michael (I/ET-83, extern); +Cc: 'linux-raid@vger.kernel.org'

*** resent in order to send it in text format (this time for real :-/ 
:-/ ) ***

Hi Michael,

I agree with you that our situations seem very similar; moreover, your
analysis seems correct to me, since our hard disks are all WD Caviar
Green, so they lack the TLER feature (which I wasn't aware of, thanks
for pointing that out too).

Luckily I just managed to access the RAID and back up the important
data by executing `mdadm --assemble --force /dev/md0
/dev/sd[abcdefgh]3`; so the crucial part is done, and now I have the
"freedom" to do whatever is needed to resolve the issue.

Now I would ask you:

  * how did you proceed in order to restore your situation? Do you have
    any suggestion?
  * reading about TLER, I believe I understood that the failing disks
    are not necessarily broken, but the RAID thinks they are; does that
    mean I can still use the failing disks?



On 28/01/2014 21:11, Samer, Michael (I/ET-83, extern) wrote:
> Hello Maurizio
> A very similar case happened to me (search the list for QNAP).
> Your box dropped a second drive (= full failure) while rebuilding, I guess due to read errors and no TLER-capable drives.
> Western Digital is prone to this.
>
> I was lucky enough to be able to copy all of my faulty drives (5 of 8), and I am currently trying to recreate the md superblocks, which were lost on the last write.
> What drives do you use?
>
> Cheers
> Sam
>
> [quoted original message snipped]


-- 

Maurizio De Santis
DEVELOPMENT MANAGER
Morgan S.p.A.
Via Degli Olmetti, 36
00060 Formello (RM), Italy
t. 06.9075275
w. www.morganspa.com
m. m.desantis@morganspa.com


According to Italian law Dlgs. 196/2003 concerning privacy, information contained in this message is confidential and intended for the addressee only; any use, copy or distribution of same is strictly prohibited. If you have received this message in error, you are requested to inform the sender as soon as possible and immediately destroy it.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: AW: [HELP] Recover a RAID5 with 8 drives
  2014-01-29 14:14   ` Maurizio De Santis
@ 2014-01-30  9:26     ` Brad Campbell
  2014-01-30 12:20     ` AW: " Samer, Michael (I/ET-83, extern)
  1 sibling, 0 replies; 7+ messages in thread
From: Brad Campbell @ 2014-01-30  9:26 UTC (permalink / raw)
  To: Maurizio De Santis, Samer, Michael (I/ET-83, extern)
  Cc: 'linux-raid@vger.kernel.org'

On 29/01/14 22:14, Maurizio De Santis wrote:
> *** resent in order to send it in text format (this time for real :-/
> :-/ ) ***
>
> Hi Michael,
>
> I agree with you that our situations seem very similar; moreover, your
> analysis seems correct to me, since our hard disks are all WD Caviar
> Green, so they lack the TLER feature (which I wasn't aware of, thanks
> for pointing that out too).


I'd double-check this. 9 out of 10 of my WD 2T Green drives have TLER;
it's just not enabled by default.
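
For the drives that do support it, it can be switched on with something
like the line below (just a sketch; 70 means 7.0 seconds, and the exact
value is a matter of taste):

for d in /dev/sd[a-h] ; do smartctl -l scterc,70,70 $d ; done

Bear in mind the setting is volatile on most drives, so it has to be
re-applied after every power cycle (an init/rc script is the usual place).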


^ permalink raw reply	[flat|nested] 7+ messages in thread

* AW: AW: [HELP] Recover a RAID5 with 8 drives
  2014-01-29 14:14   ` Maurizio De Santis
  2014-01-30  9:26     ` Brad Campbell
@ 2014-01-30 12:20     ` Samer, Michael (I/ET-83, extern)
  2014-01-30 12:22       ` Samer, Michael (I/ET-83, extern)
  2014-01-30 21:48       ` Chris Murphy
  1 sibling, 2 replies; 7+ messages in thread
From: Samer, Michael (I/ET-83, extern) @ 2014-01-30 12:20 UTC (permalink / raw)
  To: 'Maurizio De Santis'; +Cc: 'linux-raid@vger.kernel.org'

> Hi Michael,

Hi Maurizio,

> I agree with you that our situations seem very similar; moreover, your
> analysis seems correct to me, since our hard disks are all WD Caviar
> Green, so they lack the TLER feature (which I wasn't aware of, thanks
> for pointing that out too).

TLER is partly available, but not active by default, as you surely have
found out. Having to run smartctl every time just to force it on is
really a hardware problem, and we solved it by using real disks this
time.

> Luckily I just managed to access the RAID and back up the important
> data by executing `mdadm --assemble --force /dev/md0
> /dev/sd[abcdefgh]3`; so the crucial part is done, and now I have the
> "freedom" to do whatever is needed to resolve the issue.

Luckily your disks were not corrupted (bad blocks) like mine. 5 out of
8 disks had between 8k and 47k bad blocks, and one was completely dead
(a colleague at Ontrack helped me get that one, sdg, running again).
After getting most of the data transferred I realized that the md
superblock was gone on 5 of the 8 drives and the partition table was
unreadable on two, so I had to recreate them on the transferred disks.
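
(For anyone else reading along: the usual tool for copying a failing
drive block by block is GNU ddrescue. A rough sketch, with sdX/sdY as
placeholders for the failing source and an equal-or-larger target:

ddrescue -f -n  /dev/sdX /dev/sdY /root/sdX.map   # fast pass, skip the difficult areas
ddrescue -f -r3 /dev/sdX /dev/sdY /root/sdX.map   # then retry the bad areas a few times

The map file lets you interrupt and resume, and afterwards you work
only on the copies.)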

> Now I would ask you:
>
>   * how did you proceed in order to restore your situation? Do you have
>     any suggestion?

I would not have asked for help in my first mail if I could have
assembled it somehow. I'd welcome advice on how to go about it, as mdadm
is far outside my knowledge (which is good on hardware RAID controllers
and ZFS systems, which I run on big servers). Currently I need an
approach that could work; since my own problem never got a single reply,
it seems I'm on my own.
My plan would be (as I have a backup of all disks in the state they were
delivered to me):
a) use testdisk to copy the partition table from a working disk onto
the disks whose partition table is missing
b) get mdadm to write new superblocks onto the disks (the data is not
touched/altered that way, as I understand the mdadm manual) via
--create --assume-clean, leaving sdb out via the "missing" parameter
(see the sketch after this list), so something like:
mdadm --create --assume-clean /dev/md0 --level=5 --raid-devices=8
/dev/sd{a,c,d,e,f,g,h}3 missing
c) run an e2fsck on md0 (which is now possible, as I added 2 GB of RAM)
d) get the data onto our SAN

e) recreate the array manually, and this time export it into
mdadm.conf (which the QNAP does not do)
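
Spelled out, step b) would look something like the sketch below. All of
the values are assumptions that have to be checked against `mdadm
--examine` of the original members (metadata version, chunk size,
layout, and above all the device order, which must follow the original
RaidDevice slots, with "missing" in the slot of the drive that is left
out), and I would only ever try it on the copies:

mdadm --create --assume-clean /dev/md0 \
      --metadata=0.90 --level=5 --raid-devices=8 \
      --chunk=64 --layout=left-symmetric \
      /dev/sda3 missing /dev/sdc3 /dev/sdd3 \
      /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3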


The funny thing is that QNAP support pointed me in the wrong direction
(according to Neil Brown's answer), and the QNAP 859 is not able to run
an fsck by default, as it does not have enough RAM when the volume is
over 8 TiB.



>   * reading about TLER, I believe I understood that the failing disks
>     are not necessarily broken, but the RAID thinks they are; does that
>     mean I can still use the failing disks?

One bad block is enough for me to swap a drive, but you are right: the
disk just took too much time to recover, so it was thrown out of the
array. As my disks had developed large areas of defective blocks in
just 2 years of running, all of the disks were replaced. Just the 8
months of waiting have now cost more than the complete array plus the
replacement drives. It just proves: good disks are worth their money!

You might have read in my thread that this was the second QNAP with
the same problem (this time much more severe; the first case, about a
year back, was just like yours).


Cheers
Michael


On 28/01/2014 21:11, Samer, Michael (I/ET-83, extern) wrote:
> [earlier quoted messages snipped]
> /dev/sdb3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 8
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:09:57 2014
>             State : active
>    Active Devices : 7
> Working Devices : 8
>    Failed Devices : 1
>     Spare Devices : 1
>          Checksum : 97f3567d - correct
>            Events : 0.2944837
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     1       8       19        1      active sync   /dev/sdb3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       8       19        1      active sync   /dev/sdb3
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
>      8     8       8      115        8      spare   /dev/sdh3
> /dev/sdc3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 7
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:19:58 2014
>             State : clean
>    Active Devices : 6
> Working Devices : 6
>    Failed Devices : 2
>     Spare Devices : 0
>          Checksum : 982047cf - correct
>            Events : 0.2944851
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     2       8       35        2      active sync   /dev/sdc3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
> /dev/sdd3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 7
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:19:58 2014
>             State : clean
>    Active Devices : 6
> Working Devices : 6
>    Failed Devices : 2
>     Spare Devices : 0
>          Checksum : 982047e1 - correct
>            Events : 0.2944851
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     3       8       51        3      active sync   /dev/sdd3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
> /dev/sde3:
>    Failed Devices : 2
>     Spare Devices : 0
>          Checksum : 982047f3 - correct
>            Events : 0.2944851
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     4       8       67        4      active sync   /dev/sde3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
> /dev/sdf3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 7
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:19:58 2014
>             State : clean
>    Active Devices : 6
> Working Devices : 6
>    Failed Devices : 2
>     Spare Devices : 0
>          Checksum : 98204805 - correct
>            Events : 0.2944851
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     5       8       83        5      active sync   /dev/sdf3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
> /dev/sdg3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 7
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:19:58 2014
>             State : clean
>    Active Devices : 6
> Working Devices : 6
>    Failed Devices : 2
>     Spare Devices : 0
>          Checksum : 98204817 - correct
>            Events : 0.2944851
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     6       8       99        6      active sync   /dev/sdg3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
> /dev/sdh3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 8
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:18:26 2014
>             State : clean
>    Active Devices : 6
> Working Devices : 7
>    Failed Devices : 2
>     Spare Devices : 1
>          Checksum : 98204851 - correct
>            Events : 0.2944847
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     8       8      115        8      spare   /dev/sdh3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
>      8     8       8      115        8      spare   /dev/sdh3
>
> # dmesg **edited (removed unuseful parts)**
> , wo:0, o:1, dev:sdb2
> [  975.516724] RAID1 conf printout:
> [  975.516728]  --- wd:2 rd:2
> [  975.516732]  disk 0, wo:0, o:1, dev:sda2
> [  975.516737]  disk 1, wo:0, o:1, dev:sdb2
> [  975.516740] RAID1 conf printout:
> [  975.516744]  --- wd:2 rd:2
> [  975.516748]  disk 0, wo:0, o:1, dev:sda2
> [  975.516753]  disk 1, wo:0, o:1, dev:sdb2
> [  977.495709] md: unbind<sdh2>
> [  977.505048] md: export_rdev(sdh2)
> [  977.535277] md/raid1:md9: Disk failure on sdh1, disabling device.
> [  977.575038]  disk 2, wo:0, o:1, dev:sdc1
> [  977.575043]  disk 3, wo:0, o:1, dev:sdd1
> [  977.575048]  disk 4, wo:0, o:1, dev:sde1
> [  977.575053]  disk 5, wo:0, o:1, dev:sdf1
> [  977.575058]  disk 6, wo:0, o:1, dev:sdg1
> [  979.547149] md: unbind<sdh1>
> [  979.558031] md: export_rdev(sdh1)
> [  979.592646] md/raid1:md13: Disk failure on sdh4, disabling device.
> [  979.592650] md/raid1:md13: Operation continuing on 7 devices.
> [  979.650862] RAID1 conf printout:
> [  979.650869]  --- wd:7 rd:8
> [  979.650875]  disk 0, wo:0, o:1, dev:sda4
> [  979.650880]  disk 1, wo:0, o:1, dev:sdb4
> [  979.650885]  disk 2, wo:0, o:1, dev:sdc4
> [  979.650890]  disk 3, wo:0, o:1, dev:sdd4
> [  979.650895]  disk 4, wo:0, o:1, dev:sdg4
> [  979.650900]  disk 5, wo:0, o:1, dev:sdf4
> [  979.650905]  disk 6, wo:0, o:1, dev:sde4
> [  979.650911]  disk 7, wo:1, o:0, dev:sdh4
> [  979.656024] RAID1 conf printout:
> [  979.656029]  --- wd:7 rd:8
> [  979.656034]  disk 0, wo:0, o:1, dev:sda4
> [  979.656039]  disk 1, wo:0, o:1, dev:sdb4
> [  979.656044]  disk 2, wo:0, o:1, dev:sdc4
> [  979.656049]  disk 3, wo:0, o:1, dev:sdd4
> [  979.656054]  disk 4, wo:0, o:1, dev:sdg4
> [  979.656059]  disk 5, wo:0, o:1, dev:sdf4
> [  979.656063]  disk 6, wo:0, o:1, dev:sde4
> [  981.604906] md: unbind<sdh4>
> [  981.616035] md: export_rdev(sdh4)
> [  981.753058] md/raid:md0: Disk failure on sdh3, disabling device.
> [  981.753062] md/raid:md0: Operation continuing on 6 devices.
> [  983.765852] md: unbind<sdh3>
> [  983.777030] md: export_rdev(sdh3)
> [ 1060.094825] journal commit I/O error
> [ 1060.099196] journal commit I/O error
> [ 1060.103525] journal commit I/O error
> [ 1060.108698] journal commit I/O error
> [ 1060.116311] journal commit I/O error
> [ 1060.123634] journal commit I/O error
> [ 1060.127225] journal commit I/O error
> [ 1060.130930] journal commit I/O error
> [ 1060.137651] EXT4-fs (md0): previous I/O error to superblock detected
> [ 1060.178323] Buffer I/O error on device md0, logical block 0
> [ 1060.181873] lost page write due to I/O error on md0
> [ 1060.185634] EXT4-fs error (device md0): ext4_put_super:849: Couldn't
> clean up the journal
> [ 1062.662723] md0: detected capacity change from 13991546060800 to 0
> [ 1062.666308] md: md0 stopped.
> [ 1062.669760] md: unbind<sda3>
> [ 1062.681031] md: export_rdev(sda3)
> [ 1062.684466] md: unbind<sdg3>
> [ 1062.695023] md: export_rdev(sdg3)
> [ 1062.698342] md: unbind<sdf3>
> [ 1062.709021] md: export_rdev(sdf3)
> [ 1062.712310] md: unbind<sde3>
> [ 1062.723029] md: export_rdev(sde3)
> [ 1062.726245] md: unbind<sdd3>
> [ 1062.737022] md: export_rdev(sdd3)
> [ 1062.740112] md: unbind<sdc3>
> [ 1062.751022] md: export_rdev(sdc3)
> [ 1062.753934] md: unbind<sdb3>
> [ 1062.764021] md: export_rdev(sdb3)
> [ 1063.772687] md: md0 stopped.
> [ 1064.782381] md: md0 stopped.
> [ 1065.792585] md: md0 stopped.
> [ 1066.801668] md: md0 stopped.
> [ 1067.812573] md: md0 stopped.
> [ 1068.821548] md: md0 stopped.
> [ 1069.830667] md: md0 stopped.
> [ 1070.839554] md: md0 stopped.
> [ 1071.848418] md: md0 stopped.
>


-- 

Maurizio De Santis
DEVELOPMENT MANAGER
Morgan S.p.A.
Via Degli Olmetti, 36
00060 Formello (RM), Italy
t. 06.9075275
w. www.morganspa.com
m. m.desantis@morganspa.com

In ottemperanza al Dlgs. 196/2003 sulla tutela dei dati personali, le informazioni contenute in questo messaggio sono strettamente riservate e sono esclusivamente indirizzate al destinatario; qualsiasi uso, o divulgazione dello stesso è vietata. Nel caso in cui abbiate ricevuto questo messaggio per errore. Vi invitiamo ad avvertire il mittente al più presto e a procedere all'immediata distruzione dello stesso.

According to Italian law Dlgs. 196/2003 concerning privacy, information contained in this message is confidential and intended for the addressee only; any use, copy or distribution of same is strictly prohibited. If you have received this message in error, you are requested to inform the sender as soon as possible and immediately destroy it.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [HELP] Recover a RAID5 with 8 drives
  2014-01-30 12:20     ` AW: " Samer, Michael (I/ET-83, extern)
  2014-01-30 12:22       ` Samer, Michael (I/ET-83, extern)
@ 2014-01-30 21:48       ` Chris Murphy
  1 sibling, 0 replies; 7+ messages in thread
From: Chris Murphy @ 2014-01-30 21:48 UTC (permalink / raw)
  To: Samer, Michael (I/ET-83, extern)
  Cc: 'Maurizio De Santis', 'linux-raid@vger.kernel.org'


On Jan 30, 2014, at 5:20 AM, "Samer, Michael (I/ET-83, extern)" <extern.michael.samer@audi.de> wrote:
> 
>  * reading about TLER I believe I understood that the failing disks
> are not necessarly broken, but the RAID thinks they are; does it mean
>    that I can still use the failing disks?

For starters, make sure the SCSI command timer is increased to at least the drive's own error recovery timeout; there's a chance the drive recovers the data on its own before the kernel gives up and treats it as a read error.

echo 120 >/sys/block/sdX/device/timeout

This is a per-drive kernel setting, so you have to do it for each drive. It's also not persistent across reboots.
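
For example, to hit all eight members in one go (assuming they are sda
through sdh; adjust the glob to your setup):

  for d in /sys/block/sd[a-h]/device/timeout; do echo 120 > "$d"; done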

If during your rebuild any two drives report a read error, then it's a problem and the rebuild stops. In that case you can find the affected drive(s) and the sectors causing the read errors in dmesg. Then, on the individual physical drives, do sector reads to locate the exact LBAs affected. If these are AF disks, don't forget that for each bad 4096-byte sector you'll get 8 LBA read errors; to fix it, you must write to all 8 LBAs. I'd avoid using bs=4096, because that also changes any seek= value away from 512e-based LBAs and only invites confusion: keep everything 512-sector based, and just know that you have to write out a count of at least 8 to fix each physical sector. Before doing this, though, I'd report back here if you're getting two UREs on a stripe and the rebuild fails; someone else with more experience may help with which disk to choose.
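
As an illustration only, with a placeholder device and a made-up LBA (round the LBA reported in dmesg down to a multiple of 8 so the writes cover the whole 4096-byte physical sector):

  dd if=/dev/sdX of=/dev/null bs=512 skip=123456000 count=8               # should reproduce the read error
  dd if=/dev/zero of=/dev/sdX bs=512 seek=123456000 count=8 oflag=direct  # zero that physical sector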

Obviously the above means destroying some data on disk, but it's better to zero a bad sector (whose contents are lost anyway) than to leave it producing a URE; this way you only get one URE for that stripe on the next rebuild. What data gets corrupted as a result depends on what was located in those sectors: there is a decent chance it's free space if the array isn't full, next most likely is some file data or its metadata, and less likely still is file system metadata. Once the rebuild is done, run an fsck with -n to see whether there are file system problems and how serious they are. You don't necessarily want to run an fsck that writes changes to the file system as a first step; it could write changes that hose the fs even though the array rebuild was successful. Anyway, cross that bridge when you come to it, but it's better to work slowly and make as few changes as necessary on an array you don't have a current backup for.

Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-01-30 21:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-28 15:29 [HELP] Recover a RAID5 with 8 drives Maurizio De Santis
2014-01-28 20:11 ` AW: " Samer, Michael (I/ET-83, extern)
2014-01-29 14:14   ` Maurizio De Santis
2014-01-30  9:26     ` Brad Campbell
2014-01-30 12:20     ` AW: " Samer, Michael (I/ET-83, extern)
2014-01-30 12:22       ` Samer, Michael (I/ET-83, extern)
2014-01-30 21:48       ` Chris Murphy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.