linux-raid.vger.kernel.org archive mirror
* Data recovery after the failure of two disks of 4
@ 2012-09-05 13:34 Carabetta Giulio
  2012-09-11  1:03 ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread
From: Carabetta Giulio @ 2012-09-05 13:34 UTC (permalink / raw)
  To: 'linux-raid@vger.kernel.org'

I'm trying to recover a RAID 5 array after the failure of two disks out of four.
"Simply" put, the controller lost one disk, and a couple of minutes later it lost another.
A disk also disappeared while I was trying to pull the data off it, so I suspect a problem with the disks' controller boards...

However, the server was not doing anything special at the time of the fault, so the critical data should still be there on the disk surfaces...

Anyhow, I have two good disks and two faulty ones.

More specifically, the disks (4 identical 2 TB WD20EARS) are all partitioned the same way: the first partition is about 250 MB, and the second takes the rest of the space.
- sda1 and sdb1 as md0 (raid1) with /boot
- sdc1 and sdd1 as md2 (raid1) with swaps 
- sd[abcd]2 as md1 (RAID5) with root partition.

Swap is not an issue, and the boot array has no problems. The first time I hit the problem, the machine didn't boot only because the BIOS did not see the disks (both of the ones carrying a boot partition...), but that was a temporary error...

The first disk to fail was sdb, and the second was sda; I'm inferring this from the differences between the superblocks (the full superblock dumps are appended to this message):

---
sda2:
        Update Time: Mon Aug 27 20:46:05 2012
             Events: 622
       Array State: A.AA ('A' == active, '.' == missing)

sdb2:
        Update Time: Mon Aug 27 20:44:22 2012
             Events: 600
       Array State: AAAA ('A' == active, '.' == missing)

sdc2:
        Update Time: Mon Aug 27 20:46:33 2012
             Events: 625
       Array State: ..AA ('A' == active, '.' == missing)

sdd2:
        Update Time: Mon Aug 27 20:46:33 2012
             Events: 625
       Array State: ..AA ('A' == active, '.' == missing)
---

Right now I'm copying the partitions elsewhere with ddrescue, so I can replace the faulty disks and rebuild everything.
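
For the record, the copies go roughly like this (a sketch: the device and image paths are placeholders for my setup; the map file lets ddrescue resume and retry only the bad areas):

    # first pass: grab everything that reads cleanly, skip over errors quickly
    ddrescue -f -n /dev/sda2 /mnt/backup/sda2.img /mnt/backup/sda2.map
    # second pass: go back and retry the bad areas a few times
    ddrescue -f -r3 /dev/sda2 /mnt/backup/sda2.img /mnt/backup/sda2.map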

In the meantime, I did a first test on the md1 array (the root partition, the one with all my data...).

Trying to reassemble the array I got:

# mdadm --assemble --force --verbose /dev/md11 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
mdadm: forcing event count in /dev/sda2(o) from 622 upto 625
mdadm: Marking array /dev/md11 as 'clean'
mdadm: added /dev/sdb2 to /dev/md11 as 1 (possibly out of date)
mdadm: /dev/md11 has been started with 3 drives (out of 4).


Then I mounted the array and saw the correct file system.
To avoid a new fault (the disks are very unstable), I stopped and removed the array very quickly, so I didn't try to read any files; I just did a few ls...

Now the question.

I was copying only 3 disks: sdd, sdc, and the "freshest" of the faulty ones, sda. With RAID 5, 3 disks out of 4 should be sufficient...
But while copying the data, I got a read error on sda. I lost just 4 KiB, but I don't know which piece of which data it belongs to...
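
In principle the location of the hole can be worked out: with a 1.2 superblock the member's data starts at the Data Offset, and with the left-symmetric layout the parity rotates one disk per stripe. A sketch of the arithmetic (the bad-sector number is a placeholder; the real one comes from the ddrescue map file, whose positions are bytes, so divide by 512):

    # all values in 512-byte sectors, taken from mdadm --examine
    BAD_LBA=1234567       # placeholder: first bad sector of sda2, from the map file
    DATA_OFFSET=2048      # "Data Offset" of the member
    CHUNK=1024            # 512K chunk size
    NDISKS=4              # raid devices
    ROLE=0                # sda2 is "Active device 0"

    OFF=$((BAD_LBA - DATA_OFFSET))                 # offset inside the md data area
    STRIPE=$((OFF / CHUNK))                        # stripe number on this member
    PARITY=$(( (NDISKS - 1) - STRIPE % NDISKS ))   # left-symmetric parity rotation
    if [ "$ROLE" -eq "$PARITY" ]; then
        echo "stripe $STRIPE: the lost block held only parity"
    else
        DIDX=$(( (ROLE - PARITY - 1 + NDISKS) % NDISKS ))  # data chunk index in the stripe
        ARRAY_SECT=$(( (STRIPE * (NDISKS - 1) + DIDX) * CHUNK + OFF % CHUNK ))
        echo "the lost 4k maps to array sector $ARRAY_SECT of md1"
    fi

If the block turns out to hold data rather than parity, a filesystem tool could then say whether anything actually lives at that offset.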

So now I'm ddrescue'ing the fourth disk too.

And then what?

While I wait for the replacement disks (luckily they're under warranty, at least there's that...), I need some suggestions.

My plan is to copy the images onto the new disks and then try to assemble the array, but I don't know what the best approach would be (and whether there's anything better than a simple "mdadm --assemble").
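
What I have in mind is roughly this (a sketch: sdX2/sdY2 stand for the replacement disks' partitions; --readonly should keep md from writing anything until the data checks out):

    # put the rescued images onto the replacement disks
    ddrescue -f /mnt/backup/sda2.img /dev/sdX2 /mnt/backup/sda2-restore.map
    ddrescue -f /mnt/backup/sdb2.img /dev/sdY2 /mnt/backup/sdb2-restore.map

    # assemble read-only from the three freshest members, then look around
    mdadm --assemble --force --readonly --verbose /dev/md1 /dev/sdX2 /dev/sdc2 /dev/sdd2
    mount -o ro /dev/md1 /mnt/recovery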

Keeping sdc and sdd as they are, since they are intact (for the moment...): on one hand we have a disk with old data (sdb, the first to fail...) but without surface errors; on the other hand we have the disk with the newest data (sda, the last to fail), but with a 4 KiB hole.
Moreover, sda has already been forced "good"...

What options do I have?

Thanks

Giulio Carabetta

===================================================
    root@PartedMagic:/mnt# mdadm --examine /dev/sda2
    /dev/sda2:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 4e7bb63f:74d1ac58:b01b1b48:44c7b7d7
               Name : ubuntu:0
      Creation Time : Sun Sep 25 09:10:23 2011
         Raid Level : raid5
       Raid Devices : 4
     
     Avail Dev Size : 3906539520 (1862.78 GiB 2000.15 GB)
         Array Size : 5859807744 (5588.35 GiB 6000.44 GB)
      Used Dev Size : 3906538496 (1862.78 GiB 2000.15 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
              State : active
        Device UUID : 3d01cfa9:6313d51c:402b3ca5:815a84e9
     
        Update Time : Mon Aug 27 20:46:05 2012
           Checksum : c51fe8dc - correct
             Events : 622
     
             Layout : left-symmetric
         Chunk Size : 512K
     
       Device Role : Active device 0
       Array State : A.AA ('A' == active, '.' == missing)
     
     
    root@PartedMagic:/mnt# mdadm --examine /dev/sdb2
    /dev/sdb2:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 4e7bb63f:74d1ac58:b01b1b48:44c7b7d7
               Name : ubuntu:0
      Creation Time : Sun Sep 25 09:10:23 2011
         Raid Level : raid5
       Raid Devices : 4
     
     Avail Dev Size : 3906539520 (1862.78 GiB 2000.15 GB)
         Array Size : 5859807744 (5588.35 GiB 6000.44 GB)
      Used Dev Size : 3906538496 (1862.78 GiB 2000.15 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 0c64fdf8:c55ee450:01f05a3c:57b87308
     
        Update Time : Mon Aug 27 20:44:22 2012
           Checksum : fe6eb926 - correct
             Events : 600
     
             Layout : left-symmetric
         Chunk Size : 512K
     
       Device Role : Active device 1
       Array State : AAAA ('A' == active, '.' == missing)
     
     
    root@PartedMagic:/mnt# mdadm --examine /dev/sdc2
    /dev/sdc2:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 4e7bb63f:74d1ac58:b01b1b48:44c7b7d7
               Name : ubuntu:0
      Creation Time : Sun Sep 25 09:10:23 2011
         Raid Level : raid5
       Raid Devices : 4
     
     Avail Dev Size : 3906539520 (1862.78 GiB 2000.15 GB)
         Array Size : 5859807744 (5588.35 GiB 6000.44 GB)
      Used Dev Size : 3906538496 (1862.78 GiB 2000.15 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 0bb6c440:a2e47ae9:50eee929:fee9fa5e
     
        Update Time : Mon Aug 27 20:46:33 2012
           Checksum : 22e0c195 - correct
             Events : 625
     
             Layout : left-symmetric
         Chunk Size : 512K
     
       Device Role : Active device 2
       Array State : ..AA ('A' == active, '.' == missing)
     
     
    root@PartedMagic:/mnt# mdadm --examine /dev/sdd2
    /dev/sdd2:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 4e7bb63f:74d1ac58:b01b1b48:44c7b7d7
               Name : ubuntu:0
      Creation Time : Sun Sep 25 09:10:23 2011
         Raid Level : raid5
       Raid Devices : 4
     
     Avail Dev Size : 3906539520 (1862.78 GiB 2000.15 GB)
         Array Size : 5859807744 (5588.35 GiB 6000.44 GB)
      Used Dev Size : 3906538496 (1862.78 GiB 2000.15 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 1f06610d:379589ed:db2a719b:82419b35
     
        Update Time : Mon Aug 27 20:46:33 2012
           Checksum : 3bb3564f - correct
             Events : 625
     
             Layout : left-symmetric
         Chunk Size : 512K
     
       Device Role : Active device 3
       Array State : ..AA ('A' == active, '.' == missing)

===================================================


* Re: Data recovery after the failure of two disks of 4
  2012-09-05 13:34 Data recovery after the failure of two disks of 4 Carabetta Giulio
@ 2012-09-11  1:03 ` NeilBrown
  2012-09-12  7:59   ` R: " Carabetta Giulio
  2012-11-06 14:37   ` Carabetta Giulio
  0 siblings, 2 replies; 4+ messages in thread
From: NeilBrown @ 2012-09-11  1:03 UTC (permalink / raw)
  To: Carabetta Giulio; +Cc: 'linux-raid@vger.kernel.org'


On Wed, 5 Sep 2012 15:34:00 +0200 Carabetta Giulio <g.carabetta@abi.it> wrote:

> I'm trying to recover a RAID 5 array after the failure of two disks out of four.
> [...]
> Trying to reassemble the array I got:
> 
> # mdadm --assemble --force --verbose /dev/md11 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
> mdadm: forcing event count in /dev/sda2(o) from 622 upto 625
> mdadm: Marking array /dev/md11 as 'clean'
> mdadm: added /dev/sdb2 to /dev/md11 as 1 (possibly out of date)
> mdadm: /dev/md11 has been started with 3 drives (out of 4).
> 
> Then I mounted the array and saw the correct file system.
> To avoid a new fault (the disks are very unstable), I stopped and removed the array very quickly, so I didn't try to read any files; I just did a few ls...

Using --assemble --force is the correct thing to do.  It gives you the best
chance of getting all your data.
If you don't trust the drives, you should get replacements and use ddrescue
to copy the data from the bad device to the new device.  Then assemble the
array using the new device.

> 
> Now the question.
> 
> I was copying only 3 disks: sdd, sdc, and the "freshest" of the faulty ones, sda. With RAID 5, 3 disks out of 4 should be sufficient...
> But while copying the data, I got a read error on sda. I lost just 4 KiB, but I don't know which piece of which data it belongs to...

You might be lucky and it is a block that isn't used.  You might be unlucky
and it is some critical data.  There isn't a lot you can do about that though
- the data appears to be gone.

> 
> So now I'm ddrescue'ing the fourth disk too.
> 
> And then what?
> 
> While I wait for the replacement disks (luckily they're under warranty, at least there's that...), I need some suggestions.
> 
> My plan is to copy the images onto the new disks and then try to assemble the array, but I don't know what the best approach would be (and whether there's anything better than a simple "mdadm --assemble").

Yes, just copy from bad disk to good disk with ddrescue, then assemble with
mdadm.

NeilBrown


> [...]




* R: Data recovery after the failure of two disks of 4
  2012-09-11  1:03 ` NeilBrown
@ 2012-09-12  7:59   ` Carabetta Giulio
  2012-11-06 14:37   ` Carabetta Giulio
  1 sibling, 0 replies; 4+ messages in thread
From: Carabetta Giulio @ 2012-09-12  7:59 UTC (permalink / raw)
  To: 'NeilBrown'; +Cc: 'linux-raid@vger.kernel.org'

Thanks a lot, Neil.

I'll keep you posted: I'm waiting for the warranty replacements...


Giulio Carabetta 
Information Systems Office
Tel. +39 066767733
Cell. 3489025709
Fax. 0667678028
g.carabetta@abi.it
www.abi.it



-----Original Message-----
From: NeilBrown [mailto:neilb@suse.de] 
Sent: Tuesday, 11 September 2012 3:03 AM
To: Carabetta Giulio
Cc: 'linux-raid@vger.kernel.org'
Subject: Re: Data recovery after the failure of two disks of 4

On Wed, 5 Sep 2012 15:34:00 +0200 Carabetta Giulio <g.carabetta@abi.it> wrote:

> [...]

Using --assemble --force is the correct thing to do.  It gives you the best chance of getting all your data.
If you don't trust the drives, you should get replacements and use ddrescue to copy the data from the bad device to the new device.  Then assemble the array using the new device.

> [...]

You might be lucky and it is a block that isn't used.  You might be unlucky and it is some critical data.  There isn't a lot you can do about that though
- the data appears to be gone.

> [...]

Yes, just copy from bad disk to good disk with ddrescue, then assemble with mdadm.

NeilBrown

> [...]



* R: Data recovery after the failure of two disks of 4
  2012-09-11  1:03 ` NeilBrown
  2012-09-12  7:59   ` R: " Carabetta Giulio
@ 2012-11-06 14:37   ` Carabetta Giulio
  1 sibling, 0 replies; 4+ messages in thread
From: Carabetta Giulio @ 2012-11-06 14:37 UTC (permalink / raw)
  To: 'linux-raid@vger.kernel.org'

I did just as suggested: ddrescue the data onto new disks, then --assemble --force did the job, and I have all my data back.

I ran some checks (xfs_repair -n, SMART tests) and everything seems to be OK.
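
For the record, the checks were along these lines (device names assumed; -n keeps xfs_repair read-only, so it reports problems without changing anything):

    # read-only filesystem check on the reassembled array
    xfs_repair -n /dev/md1

    # long SMART self-test on each member, then read back the results
    smartctl -t long /dev/sda
    smartctl -a /dev/sda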

I found only this:

#mdadm --examine-bitmap /dev/md0
	Filename : /dev/md0
	   Magic : 42534658
mdadm: invalid bitmap magic 0x42534658, the bitmap file appears to be corrupted
	 Version : 1048576
mdadm: unknown bitmap version 1048576, either the bitmap file is corrupted or you need to upgrade your tools


Maybe it's because I did the --assemble --force from a live CD newer than my system: I'm on Ubuntu 11.04 (kernel 2.6.38-15, mdadm v3.1.4) and the live CD was PartedMagic (kernel 3.4.6, mdadm v3.2.5).

But the same --examine-bitmap done from the live CD gives the same error.

Do I just need to recreate the bitmap?
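
One thing worth double-checking first: --examine-bitmap normally takes a member device (or an external bitmap file), not the assembled array, so the first command below may already read cleanly. If the internal bitmap really is damaged, dropping and recreating it is a --grow operation (a sketch; sda1 as a member of md0 follows from the layout above):

    # read the internal bitmap from a member, not from /dev/md0 itself
    mdadm --examine-bitmap /dev/sda1

    # if it really is corrupted: drop the bitmap, then add a fresh internal one
    mdadm --grow --bitmap=none /dev/md0
    mdadm --grow --bitmap=internal /dev/md0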

Just out of curiosity: there was a disk error (a non-relocatable sector) and also a motherboard error that loses some SATA channels...


Giulio Carabetta 



-----Original Message-----
From: NeilBrown [mailto:neilb@suse.de] 
Sent: Tuesday, 11 September 2012 3:03 AM
To: Carabetta Giulio
Cc: 'linux-raid@vger.kernel.org'
Subject: Re: Data recovery after the failure of two disks of 4

On Wed, 5 Sep 2012 15:34:00 +0200 Carabetta Giulio <g.carabetta@abi.it> wrote:

> [...]



end of thread, other threads:[~2012-11-06 14:37 UTC | newest]

Thread overview: 4+ messages
2012-09-05 13:34 Data recovery after the failure of two disks of 4 Carabetta Giulio
2012-09-11  1:03 ` NeilBrown
2012-09-12  7:59   ` R: " Carabetta Giulio
2012-11-06 14:37   ` Carabetta Giulio
