* Recovery of failed RAID 6 and LVM
@ 2011-09-25  7:55 Marcin M. Jessa
  2011-09-25  8:39 ` Stan Hoeppner
                   ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-25  7:55 UTC (permalink / raw)
  To: linux-raid

Hi guys.


I have a RAID 6 setup with 5 2TB drives on Debian Wheezy [1] & [2].
Yesterday 3 of the drives failed, leaving the RAID setup broken.
Following [5] I managed to start the array and make it resync.
The problem I'm facing now is that I cannot access any of the LVM 
partitions [3] I have on top of my md0. fdisk says the disk doesn't 
contain a valid partition table [4].
I tried to run fsck on the LVM devices, without luck.
Does any of you have a suggestion or a method I could use to access my data?



[1]:
# mdadm -QD /dev/md0
/dev/md0:
         Version : 1.2
   Creation Time : Sat Sep 24 23:59:02 2011
      Raid Level : raid6
      Array Size : 5860531200 (5589.04 GiB 6001.18 GB)
   Used Dev Size : 1953510400 (1863.01 GiB 2000.39 GB)
    Raid Devices : 5
   Total Devices : 5
     Persistence : Superblock is persistent

     Update Time : Sun Sep 25 09:40:20 2011
           State : clean, degraded, recovering
  Active Devices : 3
Working Devices : 5
  Failed Devices : 0
   Spare Devices : 2

          Layout : left-symmetric
      Chunk Size : 512K

  Rebuild Status : 63% complete

            Name : odin:0  (local to host odin)
            UUID : be51de24:ebcc6eef:8fc41158:fc728448
          Events : 10314

     Number   Major   Minor   RaidDevice State
        0       8       65        0      active sync   /dev/sde1
        1       8       81        1      active sync   /dev/sdf1
        2       8       97        2      active sync   /dev/sdg1
        5       8      129        3      spare rebuilding   /dev/sdi1
        4       0        0        4      removed

        6       8      113        -      spare   /dev/sdh1


[2]:
# cat /proc/mdstat

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid6 sdh1[6](S) sdi1[5] sdg1[2] sdf1[1] sde1[0]
       5860531200 blocks super 1.2 level 6, 512k chunk, algorithm 2 
[5/3] [UUU__]
       [=======>.............]  recovery = 36.8% (720185308/1953510400) 
finish=441.4min speed=46564K/sec


[3]:
# lvdisplay
     Logging initialised at Sun Sep 25 09:49:11 2011
     Set umask from 0022 to 0077
     Finding all logical volumes
   --- Logical volume ---
   LV Name                /dev/fridge/storage
   VG Name                fridge
   LV UUID                kIhbSq-hePX-UIVv-uuiP-iK6w-djcz-iQ3cEI
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                4.88 TiB
   Current LE             1280000
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     6144
   Block device           253:0


[4]:

# fdisk  -l /dev/fridge/storage

Disk /dev/fridge/storage: 5368.7 GB, 5368709120000 bytes
255 heads, 63 sectors/track, 652708 cylinders, total 10485760000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1572864 bytes
Disk identifier: 0x00000000

Disk /dev/fridge/storage doesn't contain a valid partition table



[5]: 
http://en.wikipedia.org/wiki/Mdadm#Recovering_from_a_loss_of_raid_superblock



-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-25  7:55 Recovery of failed RAID 6 and LVM Marcin M. Jessa
@ 2011-09-25  8:39 ` Stan Hoeppner
  2011-09-25 10:07   ` Marcin M. Jessa
  2011-09-25 13:15 ` Phil Turmel
  2011-09-25 21:40 ` NeilBrown
  2 siblings, 1 reply; 44+ messages in thread
From: Stan Hoeppner @ 2011-09-25  8:39 UTC (permalink / raw)
  To: lists; +Cc: linux-raid

On 9/25/2011 2:55 AM, Marcin M. Jessa wrote:
> Hi guys.
>
>
> I have a RAID 6 setup with 5 2TB drives on Debian Wheezy [1] & [2].
> Yesterday 3 of the drives failed working leaving the RAID setup broken.

What was the hardware event that caused this situation?  Did you lose 
power to a 3-bay eSATA enclosure, or kick its data cable out?  Do you 
have a bad 3-in-1 hot-swap cage?  A flaky HBA or driver?

You need to identify the cause and fix it permanently or this will 
likely happen again and again.
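
A quick first pass, something along these lines (a rough sketch; adjust 
the device names to your system), usually shows whether it was the 
drives themselves, the cabling, or the controller:

   # What did the kernel log when the drives dropped out?
   dmesg | grep -iE 'ata[0-9]|sd[e-i]|link|reset|fail'
   # And what do the drives themselves report?
   for d in /dev/sd[e-i]; do
       echo "=== $d ==="
       smartctl -H -A "$d"
   done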

-- 
Stan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-25  8:39 ` Stan Hoeppner
@ 2011-09-25 10:07   ` Marcin M. Jessa
  0 siblings, 0 replies; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-25 10:07 UTC (permalink / raw)
  To: stan; +Cc: linux-raid

On 9/25/11 10:39 AM, Stan Hoeppner wrote:
> On 9/25/2011 2:55 AM, Marcin M. Jessa wrote:
>> Hi guys.
>>
>>
>> I have a RAID 6 setup with 5 2TB drives on Debian Wheezy [1] & [2].
>> Yesterday 3 of the drives failed working leaving the RAID setup broken.
>
> What was the hardware event that caused this situation? Did you lose
> power to, or kick the data cable out of a 3-bay eSATA enclosure? Do you
> have a bad 3 in 1 hot swap cage? A flaky HBA/driver?
>
> You need to identify the cause and fix it permanently or this will
> likely happen again and again.
>

The problem is the Seagate drives. Searching for issues with my
drives, I found many people complaining about the same problem as I 
have - drives failing randomly [1-4].
I just ordered WD drives to replace the Seagate ones one by one in the 
array. I will return the Seagate HDs to the shop.
Right now my main concern is to get the data back from the LVM 
partitions and back it up...
The worst thing is that the 3 drives failed so unexpectedly and so fast 
that I didn't even have a chance to set up a backup solution.
Any help would be greatly appreciated.



[1]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625922
[2]: 
http://forums.seagate.com/t5/Barracuda-XT-Barracuda-Barracuda/ST2000DL003-Barracuda-Green-not-detected-at-BIOS/td-p/87154/page/7
[3]: http://www.readynas.com/forum/viewtopic.php?f=65&t=51496&p=306494
[4]: http://forum.qnap.com/viewtopic.php?f=182&t=39893&start=30



-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-25  7:55 Recovery of failed RAID 6 and LVM Marcin M. Jessa
  2011-09-25  8:39 ` Stan Hoeppner
@ 2011-09-25 13:15 ` Phil Turmel
  2011-09-25 14:16   ` Marcin M. Jessa
  2011-09-25 14:41   ` Marcin M. Jessa
  2011-09-25 21:40 ` NeilBrown
  2 siblings, 2 replies; 44+ messages in thread
From: Phil Turmel @ 2011-09-25 13:15 UTC (permalink / raw)
  To: lists; +Cc: linux-raid

On 09/25/2011 03:55 AM, Marcin M. Jessa wrote:
> Hi guys.
> 
> 
> I have a RAID 6 setup with 5 2TB drives on Debian Wheezy [1] & [2].
> Yesterday 3 of the drives failed working leaving the RAID setup broken.
> Following [5] I managed to start the array and make it resync.
> The problem I'm facing now is I cannot access any of the LVM partitions [3] I have on top of my md0. Fdisk says the disk doesn't contain a valid partition table [4].
> I tried to run fsck on the lvm devices without luck.
> Has any of you a suggestion, a method I could use to access my data please?

[trim /]

> [5]: http://en.wikipedia.org/wiki/Mdadm#Recovering_from_a_loss_of_raid_superblock

These instructions are horrible!  If you make the slightest mistake, your data is completely hosed.

It first asks for your "mdadm -E" reports from the drives, but it has you filter them through a grep that throws away important information.  (Did you keep that report?)
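
For anyone who ends up in this spot later: capturing the complete, 
unfiltered reports is cheap.  A sketch, with the member names taken 
from your "mdadm -D" output, saved somewhere that is not on the array:

   for d in /dev/sd[e-i]1; do
       echo "=== $d ==="
       mdadm --examine "$d"
   done > /root/md0-examine.txt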

Next, it has you wipe the superblocks on the array members, destroying all possibility of future forensics.

Then, it has you re-create the array, but omits "--assume-clean", so the array rebuilds.  With the slightest mistake in superblock type, chunk size, layout, alignment, data offset, or device order, the rebuild will trash your data.  Default values for some of those have changed in mdadm from version to version, so a naive "--create" command has a good chance of getting something wrong.

There is no mention of attempting "--assemble --force" with your original superblocks, which is the correct first step in this situation.  And it nearly always works.
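
With the member partitions from your own report, that first step would 
have looked roughly like this (a sketch, not a tested recipe; nothing 
in it writes to the member devices):

   mdadm --stop /dev/md0
   mdadm --assemble --force /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1 \
         /dev/sdh1 /dev/sdi1
   cat /proc/mdstat     # did it come up, even if degraded?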

I'm sorry, Marcin, but you shouldn't expect to get your data back.  Per your "mdadm -D" report, the rebuild was already 63% done, so the destruction of your data is certainly complete now.

Regards,

Phil

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-25 13:15 ` Phil Turmel
@ 2011-09-25 14:16   ` Marcin M. Jessa
  2011-09-25 16:43     ` Phil Turmel
  2011-09-25 14:41   ` Marcin M. Jessa
  1 sibling, 1 reply; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-25 14:16 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On 9/25/11 3:15 PM, Phil Turmel wrote:
> On 09/25/2011 03:55 AM, Marcin M. Jessa wrote:
[...]

>> [5]: http://en.wikipedia.org/wiki/Mdadm#Recovering_from_a_loss_of_raid_superblock
>
> These instructions are horrible!  If you make the slightest mistake, your data is completely hosed.

Do you know of a better howto? I was desperate, googling a lot and 
trying different commands first in order to rebuild my RAID array, but 
with no luck. The only howto that got the array resyncing was the 
Wikipedia one I linked to...

> It first asks for your "mdadm -E" reports from the drives, but it has you filter them through a grep that throws away important information.  (Did you keep that report?)

No, unfortunately I did not.

> Next, it has you wipe the superblocks on the array members, destroying all possibility of future forensics.
> Then, it has you re-create the array, but omits "--assume-clean", so the array rebuilds.  With the slightest mistake in superblock type, chunk size, layout, alignment, data offset, or device order, the rebuild will trash your data.  Default values for some of those have changed in mdadm from version to version, so a naive "--create" command has a good chance of getting something wrong.

I tried to run mdadm --assemble --assume-clean /dev/md0 /dev/sd[f-j]1 
but AFAIR that only said that the devices which were still members of 
the array and still working were busy. I always stopped the array 
before running it.

> There is no mention of attempting "--assemble --force" with your original superblocks, which is the correct first step in this situation.  And it nearly always works.

I also tried running - with no luck:
  # mdadm --assemble --force --scan /dev/md0
  # mdadm --assemble --force /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1 
/dev/sdi1
  # mdadm --assemble --force --run /dev/md0 /dev/sde1 /dev/sdf1 
/dev/sdg1 /dev/sdi1
and
  # mdadm --assemble /dev/md0 --uuid=9f1b28cb:9efcd750:324cd77a:b318ed33 
  --force


> I'm sorry, Marcin, but you shouldn't expect to get your data back.  Per your "mdadm -D" report, the rebuild was already 63% done, so the destruction of your data is certainly complete now.

Oh sh** ! :( Really, is there nothing that can be done? What happened 
when I started resyncing? I thought the good, working drives would get 
the data synced with the one drive which failed (it did not really 
fail, it was up after reboot and smartctl --attributes --log=selftest 
shows it's healthy).


-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-25 13:15 ` Phil Turmel
  2011-09-25 14:16   ` Marcin M. Jessa
@ 2011-09-25 14:41   ` Marcin M. Jessa
  2011-09-25 16:19     ` Phil Turmel
  1 sibling, 1 reply; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-25 14:41 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On 9/25/11 3:15 PM, Phil Turmel wrote:


> I'm sorry, Marcin, but you shouldn't expect to get your data back.  Per your "mdadm -D" report, the rebuild was already 63% done, so the destruction of your data is certainly complete now.

What I don't understand is that I still have the LVM info. It's just 
that the LVs don't have a partition table stored on them anymore:

# pvdisplay
     Logging initialised at Sun Sep 25 16:38:43 2011
     Set umask from 0022 to 0077
     Scanning for physical volume names
   --- Physical volume ---
   PV Name               /dev/md0
   VG Name               fridge
   PV Size               5.46 TiB / not usable 3.00 MiB
   Allocatable           yes
   PE Size               4.00 MiB
   Total PE              1430793
   Free PE               102153
   Allocated PE          1328640
   PV UUID               Ubx1OW-jCyN-Vcy2-4p2W-L6Qb-u6W8-cVkrE4

# vgdisplay
     Logging initialised at Sun Sep 25 16:40:12 2011
     Set umask from 0022 to 0077
     Finding all volume groups
     Finding volume group "fridge"
   --- Volume group ---
   VG Name               fridge
   System ID
   Format                lvm2
   Metadata Areas        1
   Metadata Sequence No  33
   VG Access             read/write
   VG Status             resizable
   MAX LV                0
   Cur LV                7
   Open LV               0
   Max PV                0
   Cur PV                1
   Act PV                1
   VG Size               5.46 TiB
   PE Size               4.00 MiB
   Total PE              1430793
   Alloc PE / Size       1328640 / 5.07 TiB
   Free  PE / Size       102153 / 399.04 GiB
   VG UUID               ZD2fsN-dFq4-PcMB-owRh-WxGs-ciK8-PPwPbd

# lvdisplay
     Logging initialised at Sun Sep 25 16:40:30 2011
     Set umask from 0022 to 0077
     Finding all logical volumes
   --- Logical volume ---
   LV Name                /dev/fridge/storage
   VG Name                fridge
   LV UUID                kIhbSq-hePX-UIVv-uuiP-iK6w-djcz-iQ3cEI
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                4.88 TiB
   Current LE             1280000
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     6144
   Block device           253:0

   --- Logical volume ---
   LV Name                /dev/fridge/webstorage
   VG Name                fridge
   LV UUID                PuCGo1-LkRa-doEI-n8qU-mqS3-20Cw-SICWPk
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                100.00 GiB
   Current LE             25600
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     6144
   Block device           253:1

   --- Logical volume ---
   LV Name                /dev/fridge/mailstorage
   VG Name                fridge
   LV UUID                538TGs-fRYt-VT1n-r8jE-Uvv3-nNXl-Cf8ojP
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                30.00 GiB
   Current LE             7680
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     6144
   Block device           253:2

   --- Logical volume ---
   LV Name                /dev/fridge/web01
   VG Name                fridge
   LV UUID                NsABmI-ok5I-GCaE-yGV6-Dqp6-Qedz-jVDS6Y
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                10.00 GiB
   Current LE             2560
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     6144
   Block device           253:3

   --- Logical volume ---
   LV Name                /dev/fridge/db01
   VG Name                fridge
   LV UUID                qa88nB-MqX8-25YN-MEqf-ln81-vNtP-w2yVMW
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                10.00 GiB
   Current LE             2560
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     6144
   Block device           253:4

   --- Logical volume ---
   LV Name                /dev/fridge/mail01
   VG Name                fridge
   LV UUID                qxUbLd-SaDq-wCwd-Z5M6-2llk-8SJh-vTlruR
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                10.00 GiB
   Current LE             2560
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     6144
   Block device           253:5

   --- Logical volume ---
   LV Name                /dev/fridge/win8
   VG Name                fridge
   LV UUID                TPsBeN-Nj2o-w1mt-pkS8-d9zu-wCMm-vRv3e7
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                30.00 GiB
   Current LE             7680
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     6144
   Block device           253:6



-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-25 14:41   ` Marcin M. Jessa
@ 2011-09-25 16:19     ` Phil Turmel
  0 siblings, 0 replies; 44+ messages in thread
From: Phil Turmel @ 2011-09-25 16:19 UTC (permalink / raw)
  To: lists; +Cc: linux-raid

On 09/25/2011 10:41 AM, Marcin M. Jessa wrote:
> On 9/25/11 3:15 PM, Phil Turmel wrote:
> 
> 
>> I'm sorry, Marcin, but you shouldn't expect to get your data back.  Per your "mdadm -D" report, the rebuild was already 63% done, so the destruction of your data is certainly complete now.
> 
> What I don't understand is I still have the LVM info. It's just the LVs don't have partition table stored anymore:

You probably got the device order partially correct, which would put some of your data blocks in the correct location.  Having the LVM metadata line up is not terribly surprising.  When some drives are placed back into the correct slots, but not others, only the non-parity data on the correctly placed drives will be correct.  The rebuild will destroy the parity data on those devices, and much of the data on the other devices.  Your partition tables were probably among the latter.

If chunk size, data offset, or layout were also incorrect, then even fewer good data blocks will show up by chance in the correct location.

Without the original mdadm -E reports (complete), there's no way I know of to figure out what happened, much less repair it.

Phil

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-25 14:16   ` Marcin M. Jessa
@ 2011-09-25 16:43     ` Phil Turmel
  0 siblings, 0 replies; 44+ messages in thread
From: Phil Turmel @ 2011-09-25 16:43 UTC (permalink / raw)
  To: lists; +Cc: linux-raid

On 09/25/2011 10:16 AM, Marcin M. Jessa wrote:
> On 9/25/11 3:15 PM, Phil Turmel wrote:
>> On 09/25/2011 03:55 AM, Marcin M. Jessa wrote:
> [...]
> 
>>> [5]: http://en.wikipedia.org/wiki/Mdadm#Recovering_from_a_loss_of_raid_superblock
>>
>> These instructions are horrible!  If you make the slightest mistake, your data is completely hosed.
> 
> Do you know of a better howto ? I was desperate googling a lot, trying to run different commands first in order to rebuild my raid array, but with no luck. The only howto that started resyncing was the wikipedia one I linked to...

The mdadm(1) and md(7) manual pages are first.  Next would be anything on or linked from Neil Brown's blog: http://neil.brown.name/blog/mdadm

Of course, you found this list somehow.  It's the official home of mdadm development, and the primary developer, Neil Brown, is an active participant.

>> It first asks for your "mdadm -E" reports from the drives, but it has you filter them through a grep that throws away important information.  (Did you keep that report?)
> 
> No, unfortunately I did not.

Then there's no way to determine any of the original parameters of the array, nor the proper device order.  You can't rely on the device names themselves, as modern kernels try to identify drives on multiple controllers simultaneously, and slight timing changes will change what name ends up where.  Only the original superblock will have a positive ID.
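
If you rebuild this machine, it is worth recording a mapping from 
kernel names to something stable while the array is still healthy.  A 
rough sketch, assuming udev's /dev/disk/by-id links exist on your 
system; keep the copies somewhere off the array:

   ls -l /dev/disk/by-id/ > /root/disk-ids.txt      # serials vs. sdX names
   mdadm --detail /dev/md0 > /root/md0-detail.txt   # array geometry and slots
   mdadm --examine /dev/sd[e-i]1 > /root/md0-examine.txt   # per-member superblocks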

>> Next, it has you wipe the superblocks on the array members, destroying all possibility of future forensics.
>> Then, it has you re-create the array, but omits "--assume-clean", so the array rebuilds.  With the slightest mistake in superblock type, chunk size, layout, alignment, data offset, or device order, the rebuild will trash your data.  Default values for some of those have changed in mdadm from version to version, so a naive "--create" command has a good chance of getting something wrong.
> 
> I tried to run mdadm --assemble --assume-clean /dev/md0 /dev/sd[f-j]1 but AFAIR that only said that the devices which were still members of the array and still working were busy. I always stopped the array before running it.

"--assume-clean" only applies to "--create" operations, where it suppresses the starting rebuild.  This gives you the opportunity to run "fsck -n" to test whether the device order and other parameters you used results in a working filesystem.

Devices can be reported busy for a variety of reasons.  I would examine /proc/mdstat, the output of "dmsetup table", and the contents of /sys/block/.
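
Concretely, something like this (a sketch) usually shows what is 
holding the members:

   cat /proc/mdstat        # is a half-assembled array still claiming the disks?
   dmsetup table           # are LVM/device-mapper mappings still sitting on md0?
   ls /sys/block/md0/md/   # the dev-* entries show what the kernel has attached
   # If the volume group is still active on a stale array, release it first:
   vgchange -an fridge
   mdadm --stop /dev/md0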

>> There is no mention of attempting "--assemble --force" with your original superblocks, which is the correct first step in this situation.  And it nearly always works.
> 
> I also tried running - with no luck:
>  # mdadm --assemble --force --scan /dev/md0
>  # mdadm --assemble --force /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdi1
>  # mdadm --assemble --force --run /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdi1
> and
>  # mdadm --assemble /dev/md0 --uuid=9f1b28cb:9efcd750:324cd77a:b318ed33  --force

If these failed with "device busy", you never really tested whether assembly could have worked.

>> I'm sorry, Marcin, but you shouldn't expect to get your data back.  Per your "mdadm -D" report, the rebuild was already 63% done, so the destruction of your data is certainly complete now.
> 
> Oh sh** ! :( Really, is there nothing that can be done? What happened when I started resyncing? I thought the good, working drives would get the data synced with the one drive which failed (it did not really fail, it was up after reboot and smartctl --attributes --log=selftest shows it's healthy).

"--zero-superblock" destroys all previous knowledge of the member devices' condition, role, or history.  After that, all are considered "good", with the role specified with "--create".

Phil

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-25  7:55 Recovery of failed RAID 6 and LVM Marcin M. Jessa
  2011-09-25  8:39 ` Stan Hoeppner
  2011-09-25 13:15 ` Phil Turmel
@ 2011-09-25 21:40 ` NeilBrown
  2011-09-25 21:58   ` Marcin M. Jessa
  2 siblings, 1 reply; 44+ messages in thread
From: NeilBrown @ 2011-09-25 21:40 UTC (permalink / raw)
  To: lists; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3836 bytes --]

On Sun, 25 Sep 2011 09:55:04 +0200 "Marcin M. Jessa" <lists@yazzy.org> wrote:

> Hi guys.
> 
> 
> I have a RAID 6 setup with 5 2TB drives on Debian Wheezy [1] & [2].
> Yesterday 3 of the drives failed working leaving the RAID setup broken.
> Following [5] I managed to start the array and make it resync.

As has been noted, you seem to be quite lucky. [5] contains fairly bad advice
but it isn't obvious yet that you have lost everything.


> The problem I'm facing now is I cannot access any of the LVM partitions 
> [3] I have on top of my md0. Fdisk says the disk doesn't contain a valid 
> partition table [4].

You wouldn't expect an LV to contain a partition table.  You would expect it
to contain a filesystem.
What does
   fsck -f -n /dev/fridge/storage

show??
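
It might also be worth a read-only look at what signature, if any, sits 
at the start of the LV - something like:

   blkid /dev/fridge/storage      # reports any filesystem/LVM signature it finds
   file -s /dev/fridge/storage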

NeilBrown


> I tried to run fsck on the lvm devices without luck.
> Has any of you a suggestion, a method I could use to access my data please?
> 
> 
> 
> [1]:
> # mdadm -QD /dev/md0
> /dev/md0:
>          Version : 1.2
>    Creation Time : Sat Sep 24 23:59:02 2011
>       Raid Level : raid6
>       Array Size : 5860531200 (5589.04 GiB 6001.18 GB)
>    Used Dev Size : 1953510400 (1863.01 GiB 2000.39 GB)
>     Raid Devices : 5
>    Total Devices : 5
>      Persistence : Superblock is persistent
> 
>      Update Time : Sun Sep 25 09:40:20 2011
>            State : clean, degraded, recovering
>   Active Devices : 3
> Working Devices : 5
>   Failed Devices : 0
>    Spare Devices : 2
> 
>           Layout : left-symmetric
>       Chunk Size : 512K
> 
>   Rebuild Status : 63% complete
> 
>             Name : odin:0  (local to host odin)
>             UUID : be51de24:ebcc6eef:8fc41158:fc728448
>           Events : 10314
> 
>      Number   Major   Minor   RaidDevice State
>         0       8       65        0      active sync   /dev/sde1
>         1       8       81        1      active sync   /dev/sdf1
>         2       8       97        2      active sync   /dev/sdg1
>         5       8      129        3      spare rebuilding   /dev/sdi1
>         4       0        0        4      removed
> 
>         6       8      113        -      spare   /dev/sdh1
> 
> 
> [2]:
> # cat /proc/mdstat
> 
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md0 : active raid6 sdh1[6](S) sdi1[5] sdg1[2] sdf1[1] sde1[0]
>        5860531200 blocks super 1.2 level 6, 512k chunk, algorithm 2 
> [5/3] [UUU__]
>        [=======>.............]  recovery = 36.8% (720185308/1953510400) 
> finish=441.4min speed=46564K/sec
> 
> 
> [3]:
> # lvdisplay
>      Logging initialised at Sun Sep 25 09:49:11 2011
>      Set umask from 0022 to 0077
>      Finding all logical volumes
>    --- Logical volume ---
>    LV Name                /dev/fridge/storage
>    VG Name                fridge
>    LV UUID                kIhbSq-hePX-UIVv-uuiP-iK6w-djcz-iQ3cEI
>    LV Write Access        read/write
>    LV Status              available
>    # open                 0
>    LV Size                4.88 TiB
>    Current LE             1280000
>    Segments               1
>    Allocation             inherit
>    Read ahead sectors     auto
>    - currently set to     6144
>    Block device           253:0
> 
> 
> [4]:
> 
> # fdisk  -l /dev/fridge/storage
> 
> Disk /dev/fridge/storage: 5368.7 GB, 5368709120000 bytes
> 255 heads, 63 sectors/track, 652708 cylinders, total 10485760000 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 524288 bytes / 1572864 bytes
> Disk identifier: 0x00000000
> 
> Disk /dev/fridge/storage doesn't contain a valid partition table
> 
> 
> 
> [5]: 
> http://en.wikipedia.org/wiki/Mdadm#Recovering_from_a_loss_of_raid_superblock
> 
> 
> 


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-25 21:40 ` NeilBrown
@ 2011-09-25 21:58   ` Marcin M. Jessa
  2011-09-25 22:18     ` NeilBrown
  0 siblings, 1 reply; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-25 21:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 9/25/11 11:40 PM, NeilBrown wrote:

[...]

> You wouldn't expect an LV to contain a partition table.  You would expect it
> to contain a filesystem.

Yes, there is still data available on the LVs.
I actually managed to grab some files from one of the LVs using 
foremost. But foremost is limited and creates its own directory 
hierarchy with changed file names.

> What does
>     fsck -f -n /dev/fridge/storage
>
> show??

# fsck -f -n /dev/fridge/storage
fsck from util-linux 2.19.1
e2fsck 1.42-WIP (02-Jul-2011)
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open 
/dev/mapper/fridge-storage

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
     e2fsck -b 8193 <device>


-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-25 21:58   ` Marcin M. Jessa
@ 2011-09-25 22:18     ` NeilBrown
  2011-09-25 22:21       ` Marcin M. Jessa
       [not found]       ` <4E804062.3020700@yazzy.org>
  0 siblings, 2 replies; 44+ messages in thread
From: NeilBrown @ 2011-09-25 22:18 UTC (permalink / raw)
  To: lists; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]

On Sun, 25 Sep 2011 23:58:00 +0200 "Marcin M. Jessa" <lists@yazzy.org> wrote:

> On 9/25/11 11:40 PM, NeilBrown wrote:
> 
> [...]
> 
> > You wouldn't expect an LV to contain a partition table.  You would expect it
> > to contain a filesystem.
> 
> Yes, there is still data available on the LVs.
> I actually managed to grab some files from one of the LVs using 
> foremost. But foremost is limited and creates it's own directory 
> hierarchy with file names being changed.
> 
> > What does
> >     fsck -f -n /dev/fridge/storage
> >
> > show??
> 
> # fsck -f -n /dev/fridge/storage
> fsck from util-linux 2.19.1
> e2fsck 1.42-WIP (02-Jul-2011)
> fsck.ext2: Superblock invalid, trying backup blocks...
> fsck.ext2: Bad magic number in super-block while trying to open 
> /dev/mapper/fridge-storage
> 
> The superblock could not be read or does not describe a correct ext2
> filesystem.  If the device is valid and it really contains an ext2
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
>      e2fsck -b 8193 <device>
> 
> 

Do you remember what filesystem you had on 'storage'?  Was it ext3 or ext4 or
xfs or something else?

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-25 22:18     ` NeilBrown
@ 2011-09-25 22:21       ` Marcin M. Jessa
       [not found]       ` <4E804062.3020700@yazzy.org>
  1 sibling, 0 replies; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-25 22:21 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 9/26/11 12:18 AM, NeilBrown wrote:

> Do you remember what filesystem you had on 'storage'?  Was it ext3 or ext4 or
> xfs or something else?

It was EXT4.


-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
       [not found]       ` <4E804062.3020700@yazzy.org>
@ 2011-09-26  9:31         ` NeilBrown
  2011-09-26 10:53           ` Marcin M. Jessa
  2011-09-27 19:12           ` Marcin M. Jessa
  0 siblings, 2 replies; 44+ messages in thread
From: NeilBrown @ 2011-09-26  9:31 UTC (permalink / raw)
  To: lists, linux-raid

[-- Attachment #1: Type: text/plain, Size: 951 bytes --]

On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa" <lists@yazzy.org> wrote:

> On 9/26/11 12:18 AM, NeilBrown wrote:
> 
> > Do you remember what filesystem you had on 'storage'?  Was it ext3 or ext4 or
> > xfs or something else?
> 
> You're giving me some hope here and then silence :)
> Why did you ask about the file system? Should I run fsck on the LV ?
> 
> 
> 

You already did run fsck on the LV.  It basically said that it didn't
recognise the filesystem at all.
I asked in case maybe it was XFS in which case a different tool would be
required.
But you said it was EXT4, so the fsck.ext2 which you used should have worked
if anything would.

It is certainly odd that the LVM info is all consistent, but the filesystem
info has disappeared.  It could be that you have the chunksize or device order
wrong and so it is looking for the filesystem info at the wrong place.

Nothing else I can suggest - sorry.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-26  9:31         ` NeilBrown
@ 2011-09-26 10:53           ` Marcin M. Jessa
  2011-09-26 11:10             ` NeilBrown
  2011-09-27 19:12           ` Marcin M. Jessa
  1 sibling, 1 reply; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-26 10:53 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 9/26/11 11:31 AM, NeilBrown wrote:
> On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa"<lists@yazzy.org>  wrote:
>
>> On 9/26/11 12:18 AM, NeilBrown wrote:
>>
>>> Do you remember what filesystem you had on 'storage'?  Was it ext3 or ext4 or
>>> xfs or something else?
>>
>> You're giving me some hope here and then silence :)
>> Why did you ask about the file system? Should I run fsck on the LV ?
>>
>>
>>
>
> You already did run fsck on the LV.  It basically said that it didn't
> recognise the filesystem at all.
> I asked in case maybe it was XFS in which case a different tool would be
> required.
> But you said it was EXT4, so the fsck.ext2 which you used should have worked
> if anything would.
>
> It is certainly odd that the LVM info is all consistent, but the filesystem
> info has disappeared.  It could be that you have the chunksize or device order
> wrong and so it is looking for the filesystem info at the wrong place.
>
> Nothing else I can suggest - sorry.

Would it be worth a shot to use parted to create an msdos label, then 
make a partition with an ext filesystem on top of it, and run fsck?



-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-26 10:53           ` Marcin M. Jessa
@ 2011-09-26 11:10             ` NeilBrown
  0 siblings, 0 replies; 44+ messages in thread
From: NeilBrown @ 2011-09-26 11:10 UTC (permalink / raw)
  To: lists; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 760 bytes --]

On Mon, 26 Sep 2011 12:53:32 +0200 "Marcin M. Jessa" <lists@yazzy.org> wrote:

> On 9/26/11 11:31 AM, NeilBrown wrote:

> > Nothing else I can suggest - sorry.
> 
> Would it be worth a shot to use parted, create msdos label and then make 
> a partition with a ext file system on top of it and run fsck?
> 

I cannot imagine how doing that would actually improve your situation at
all.  It sounds like you are just corrupting things more.  But maybe I don't
understand what you are trying to do.

I really wouldn't write anything to any device until you had found out where
the data you want is.   However as I cannot suggest how to find the data (I
really think it is beyond repair - sorry) there is still nothing I can
suggest.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-26  9:31         ` NeilBrown
  2011-09-26 10:53           ` Marcin M. Jessa
@ 2011-09-27 19:12           ` Marcin M. Jessa
  2011-09-27 23:13             ` NeilBrown
  1 sibling, 1 reply; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-27 19:12 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 9/26/11 11:31 AM, NeilBrown wrote:
> On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa"<lists@yazzy.org>  wrote:
>
>> On 9/26/11 12:18 AM, NeilBrown wrote:
>>
>>> Do you remember what filesystem you had on 'storage'?  Was it ext3 or ext4 or
>>> xfs or something else?
>>
>> You're giving me some hope here and then silence :)
>> Why did you ask about the file system? Should I run fsck on the LV ?
>>
>>
>>
>
> You already did run fsck on the LV.  It basically said that it didn't
> recognise the filesystem at all.
> I asked in case maybe it was XFS in which case a different tool would be
> required.

Looks like I didn't remember correctly. I ran testdisk and it reported 
the file system to be XFS. What would you suggest now Neil?


-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-27 19:12           ` Marcin M. Jessa
@ 2011-09-27 23:13             ` NeilBrown
  2011-09-28  2:50               ` Stan Hoeppner
  2011-09-30 20:01               ` Marcin M. Jessa
  0 siblings, 2 replies; 44+ messages in thread
From: NeilBrown @ 2011-09-27 23:13 UTC (permalink / raw)
  To: lists; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]

On Tue, 27 Sep 2011 21:12:58 +0200 "Marcin M. Jessa" <lists@yazzy.org> wrote:

> On 9/26/11 11:31 AM, NeilBrown wrote:
> > On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa"<lists@yazzy.org>  wrote:
> >
> >> On 9/26/11 12:18 AM, NeilBrown wrote:
> >>
> >>> Do you remember what filesystem you had on 'storage'?  Was it ext3 or ext4 or
> >>> xfs or something else?
> >>
> >> You're giving me some hope here and then silence :)
> >> Why did you ask about the file system? Should I run fsck on the LV ?
> >>
> >>
> >>
> >
> > You already did run fsck on the LV.  It basically said that it didn't
> > recognise the filesystem at all.
> > I asked in case maybe it was XFS in which case a different tool would be
> > required.
> 
> Looks like I didn't remember correctly. I ran testdisk and it reported 
> the file system to be XFS. What would you suggest now Neil?
> 
> 

Presumably
   xfs_check /dev/fridge/storage

and then maybe
   xfs_repair /dev/fridge/storage

but I have no experience with XFS - I'm just reading man pages.
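
(The xfs_repair man page also describes a no-modify mode, which seems 
like the safer first pass:

   xfs_repair -n /dev/fridge/storage   # -n: check only, change nothing

and only run the real repair if that output looks plausible.)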

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-27 23:13             ` NeilBrown
@ 2011-09-28  2:50               ` Stan Hoeppner
  2011-09-28  7:10                 ` Marcin M. Jessa
  2011-09-28 10:38                 ` Michal Soltys
  2011-09-30 20:01               ` Marcin M. Jessa
  1 sibling, 2 replies; 44+ messages in thread
From: Stan Hoeppner @ 2011-09-28  2:50 UTC (permalink / raw)
  To: NeilBrown; +Cc: lists, linux-raid

On 9/27/2011 6:13 PM, NeilBrown wrote:
> On Tue, 27 Sep 2011 21:12:58 +0200 "Marcin M. Jessa"<lists@yazzy.org>  wrote:
>
>> On 9/26/11 11:31 AM, NeilBrown wrote:
>>> On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa"<lists@yazzy.org>   wrote:
>>>
>>>> On 9/26/11 12:18 AM, NeilBrown wrote:
>>>>
>>>>> Do you remember what filesystem you had on 'storage'?  Was it ext3 or ext4 or
>>>>> xfs or something else?
>>>>
>>>> You're giving me some hope here and then silence :)
>>>> Why did you ask about the file system? Should I run fsck on the LV ?
>>>>
>>>>
>>>>
>>>
>>> You already did run fsck on the LV.  It basically said that it didn't
>>> recognise the filesystem at all.
>>> I asked in case maybe it was XFS in which case a different tool would be
>>> required.
>>
>> Looks like I didn't remember correctly. I ran testdisk and it reported
>> the file system to be XFS. What would you suggest now Neil?
>>
>>
>
> Presumably
>     xfs_check /dev/fridge/storage
>
> and then maybe
>     xfs_repair /dev/fridge/storage
>
> but I have no experience with XFS - I'm just reading man pages.
>
> NeilBrown

Reading the thread, and the many like it over the past months/years, may 
yield a clue as to why you wish to move on to something other than Linux 
RAID...

-- 
Stan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28  2:50               ` Stan Hoeppner
@ 2011-09-28  7:10                 ` Marcin M. Jessa
  2011-09-28  7:51                   ` David Brown
  2011-09-28 16:12                   ` Stan Hoeppner
  2011-09-28 10:38                 ` Michal Soltys
  1 sibling, 2 replies; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-28  7:10 UTC (permalink / raw)
  To: stan; +Cc: linux-raid

On 9/28/11 4:50 AM, Stan Hoeppner wrote:

> Reading the thread, and the many like it over the past months/years, may
> yield a clue as to why you wish to move on to something other than Linux
> RAID...

:) I will give it another chance.
In case of failure FreeBSD and ZFS would be another option.


-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28  7:10                 ` Marcin M. Jessa
@ 2011-09-28  7:51                   ` David Brown
  2011-09-28 16:12                   ` Stan Hoeppner
  1 sibling, 0 replies; 44+ messages in thread
From: David Brown @ 2011-09-28  7:51 UTC (permalink / raw)
  To: linux-raid

On 28/09/2011 09:10, Marcin M. Jessa wrote:
> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
>
>> Reading the thread, and the many like it over the past months/years, may
>> yield a clue as to why you wish to move on to something other than Linux
>> RAID...
>
> :) I will give it another chance.
> In case of failure FreeBSD and ZFS would be another option.
>
>

Don't forget that in the face of 3 disk drives that suddenly decide to 
play silly buggers, /no/ raid system will cope well.  You are not having 
a problem because of Linux software raid - your problem is due to bad 
hardware.  If you had a similar situation with a hardware raid system, 
it is quite unlikely that you would have had any chance of recovering 
your raid.  What spoiled your chances of recovery here is the 
unfortunate bad advice you found on a website - but that won't happen 
again, since you now know to post here before trying anything!

The key lesson to take away from this experience is to set up a backup 
solution /before/ disaster strikes :-)



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28  2:50               ` Stan Hoeppner
  2011-09-28  7:10                 ` Marcin M. Jessa
@ 2011-09-28 10:38                 ` Michal Soltys
  2011-09-28 13:20                   ` Brad Campbell
  2011-09-28 16:31                   ` Stan Hoeppner
  1 sibling, 2 replies; 44+ messages in thread
From: Michal Soltys @ 2011-09-28 10:38 UTC (permalink / raw)
  To: stan; +Cc: NeilBrown, lists, linux-raid

W dniu 28.09.2011 04:50, Stan Hoeppner pisze:
>
> Reading the thread, and the many like it over the past months/years, may
> yield a clue as to why you wish to move on to something other than Linux
> RAID...
>

IMHO, in almost all cases what sits at the end of the chain is 
misinformation. This might be a bit bold, but a lot of users don't even 
spend a few minutes doing elementary homework like man 
md/mdadm/mdadm.conf and less /usr/src/linux/Documentation/md.txt, be it 
during normal usage or when problems happen. Rumors and forum wisdom 
can be really damaging - not to look far away - how many people keep 
believing that xfs eats your data and fills it with 0s?

It's hard to find cases where the md driver or mdadm was really at 
fault for something. For the most part the typical route is: 
[bottom-barrel cheap desktop] hardware / [terribly designed SATA] cable 
issues -> a user applying random googled suggestions (with shaking 
hands) -> really, really bad problems. But that's not md's failure.

I'd put lots of responsibility on [big] distros as well, which have 
been trying (for many years already) to turn Linux into layers of 
GUI/script-wrapped (and often buggy) experience, trying to hide any and 
all technical details at all cost. But that's OT ...


Be it flexibility, recoverability (with a cooled head, and after 
panicking while being /away/ from the drives and md) or performance 
(especially after some small adjustments - namely stripe_cache_size for 
write speeds), it's hard to challenge md. And some awesome features are 
coming too (e.g. http://thread.gmane.org/gmane.linux.raid/34708 ).
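
(For reference, the stripe_cache_size adjustment is just a sysfs write; 
a sketch for md0 - the value is in pages per member device, so memory 
use grows with it:

   echo 4096 > /sys/block/md0/md/stripe_cache_size   # default is 256
   cat /sys/block/md0/md/stripe_cache_size

4096 is a common starting point for sequential-write workloads.)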

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 10:38                 ` Michal Soltys
@ 2011-09-28 13:20                   ` Brad Campbell
  2011-09-28 19:02                     ` Thomas Fjellstrom
  2011-09-28 16:31                   ` Stan Hoeppner
  1 sibling, 1 reply; 44+ messages in thread
From: Brad Campbell @ 2011-09-28 13:20 UTC (permalink / raw)
  To: linux-raid

On 28/09/11 18:38, Michal Soltys wrote:

> It's hard to find cases, when md driver or mdadm was really at fault for
> something. For the most part the typical route is: [bottom barrel cheap
> desktop ]hardware/[terribly designed sata ]cable issues -> a user
> applying random googled suggestions (with shaking hands) -> really,
> really bad problems. But that's not md's failure.

This really sums it up succinctly.

If you watched the cases of disaster that swing past linux-raid, the 
ones who always walk away whistling a happy tune are the ones who stop, 
think and ask for help.

The basket cases are more often than not created by people trying stuff 
out because they saw it mentioned somewhere else.

I'd suggest that users of real hardware raid suffer fewer "problems" 
because, having paid a bucketload of money for their raid card, they are 
far less likely to cheap out on cables, enclosures, drives and power 
supplies.

Most of the tales of woe here are related to the failures associated 
with commodity hardware. The 8TB I lost was entirely due to using a $15 
SATA controller.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28  7:10                 ` Marcin M. Jessa
  2011-09-28  7:51                   ` David Brown
@ 2011-09-28 16:12                   ` Stan Hoeppner
  2011-09-28 16:30                     ` Marcin M. Jessa
  2011-09-28 23:49                     ` NeilBrown
  1 sibling, 2 replies; 44+ messages in thread
From: Stan Hoeppner @ 2011-09-28 16:12 UTC (permalink / raw)
  To: lists; +Cc: linux-raid

On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
>
>> Reading the thread, and the many like it over the past months/years, may
>> yield a clue as to why you wish to move on to something other than Linux
>> RAID...
>
> :) I will give it another chance.
> In case of failure FreeBSD and ZFS would be another option.

I was responding to Neil's exhaustion with mdadm.  I was speculating 
that help threads such as yours may be a contributing factor, 
requesting/requiring Neil to become Superman many times per month to try 
to save some OP's bacon.

-- 
Stan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 16:12                   ` Stan Hoeppner
@ 2011-09-28 16:30                     ` Marcin M. Jessa
  2011-09-28 18:56                       ` Thomas Fjellstrom
  2011-09-28 23:49                     ` NeilBrown
  1 sibling, 1 reply; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-28 16:30 UTC (permalink / raw)
  To: stan; +Cc: linux-raid

On 9/28/11 6:12 PM, Stan Hoeppner wrote:
> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
>> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
>>
>>> Reading the thread, and the many like it over the past months/years, may
>>> yield a clue as to why you wish to move on to something other than Linux
>>> RAID...
>>
>> :) I will give it another chance.
>> In case of failure FreeBSD and ZFS would be another option.
>
> I was responding to Neil's exhaustion with mdadm. I was speculating that
> help threads such as yours may be a contributing factor,
> requesting/requiring Neil to become Superman many times per month to try
> to save some OP's bacon.

That's what mailing lists are for. And more will come as long as there 
is no documentation on how to save your behind in case of failures like 
that. Or if the docs with examples available online are utterly useless.


-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 10:38                 ` Michal Soltys
  2011-09-28 13:20                   ` Brad Campbell
@ 2011-09-28 16:31                   ` Stan Hoeppner
  2011-09-28 16:37                     ` Marcin M. Jessa
  1 sibling, 1 reply; 44+ messages in thread
From: Stan Hoeppner @ 2011-09-28 16:31 UTC (permalink / raw)
  To: Michal Soltys; +Cc: NeilBrown, lists, linux-raid

On 9/28/2011 5:38 AM, Michal Soltys wrote:
> W dniu 28.09.2011 04:50, Stan Hoeppner pisze:
>>
>> Reading the thread, and the many like it over the past months/years, may
>> yield a clue as to why you wish to move on to something other than Linux
>> RAID...
>>
>
> IMHO, in almost all cases - at the end of the chain - is misinformation.
> While this might be a bit bold - a lot of users don't even spend a few
> minutes doing elementary homework like man md/mdadm/mdadm.conf and less
> /usr/src/linux/Documentation/md.txt. Be it normal usage, or when
> problems happen. Rumors and forum wisdom can be really damaging - not to
> look far away - how many people keep believing that xfs eats your data
> and fills it with 0s ?

Two people have responded to my comment above.  Neither read it in the 
proper context.  I was responding to Neil.  Neil wants to quit Linux 
RAID.  I simply alluded to the fact that 'desperate, please save my 
bacon!' help threads such as the current one may be a factor.

-- 
Stan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 16:31                   ` Stan Hoeppner
@ 2011-09-28 16:37                     ` Marcin M. Jessa
  2011-09-28 19:03                       ` Thomas Fjellstrom
  0 siblings, 1 reply; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-28 16:37 UTC (permalink / raw)
  To: stan; +Cc: Michal Soltys, NeilBrown, linux-raid

On 9/28/11 6:31 PM, Stan Hoeppner wrote:

> Two people have responded to my comment above. Neither read it in the
> proper context. I was responding to Neil. Neil wants to quit Linux RAID.
> I simply alluded to the fact that 'desperate, please save my bacon!'
> help threads such as the current one may be a factor.

Right, so you're saying we're betting on a soon to be dead horse?


-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 16:30                     ` Marcin M. Jessa
@ 2011-09-28 18:56                       ` Thomas Fjellstrom
  2011-09-28 19:26                         ` Marcin M. Jessa
  0 siblings, 1 reply; 44+ messages in thread
From: Thomas Fjellstrom @ 2011-09-28 18:56 UTC (permalink / raw)
  To: lists; +Cc: stan, linux-raid

On September 28, 2011, Marcin M. Jessa wrote:
> On 9/28/11 6:12 PM, Stan Hoeppner wrote:
> > On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
> >> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
> >>> Reading the thread, and the many like it over the past months/years,
> >>> may yield a clue as to why you wish to move on to something other than
> >>> Linux RAID...
> >>> 
> >> :) I will give it another chance.
> >> 
> >> In case of failure FreeBSD and ZFS would be another option.
> > 
> > I was responding to Neil's exhaustion with mdadm. I was speculating that
> > help threads such as yours may be a contributing factor,
> > requesting/requiring Neil to become Superman many times per month to try
> > to save some OP's bacon.
> 
> That's what mailing lists are for. And more will come as long as there
> is no documentation on how to save your behind in case of failures like
> that. Or if the docs with examples available online are utterly useless.

I think that those of us that have been helped on the list should think about 
contributing some wiki docs, if there's one we can edit.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 13:20                   ` Brad Campbell
@ 2011-09-28 19:02                     ` Thomas Fjellstrom
  0 siblings, 0 replies; 44+ messages in thread
From: Thomas Fjellstrom @ 2011-09-28 19:02 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-raid

On September 28, 2011, Brad Campbell wrote:
> On 28/09/11 18:38, Michal Soltys wrote:
> > It's hard to find cases, when md driver or mdadm was really at fault for
> > something. For the most part the typical route is: [bottom barrel cheap
> > desktop ]hardware/[terribly designed sata ]cable issues -> a user
> > applying random googled suggestions (with shaking hands) -> really,
> > really bad problems. But that's not md's failure.
> 
> This really sums it up succinctly.
> 
> If you watched the cases of disaster that swing past linux-raid, the
> ones who always walk away whistling a happy tune are the ones who stop,
> think and ask for help.
> 
> The basket cases are more often than not created by people trying stuff
> out because they saw it mentioned somewhere else.
> 
> I'd suggest that users of real hardware raid suffer less "problems"
> because as they pay a bucketload of money for their raid card, they are
> far less likely to cheap out on cables, enclosures, drives and power
> supplies.
> 
> Most of the tales of woe here are related to the failures associated
> with commodity hardware. The 8TB I lost was entirely due to using a $15
> SATA controller.
> 

I completely agree. The last time I lost my array, it was because I 
fat-fingered an mdadm command. I can't remember exactly what it was now - 
either a reshape or a drive replacement. Now I try to be very, very careful.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 16:37                     ` Marcin M. Jessa
@ 2011-09-28 19:03                       ` Thomas Fjellstrom
  2011-09-28 19:29                         ` Marcin M. Jessa
  0 siblings, 1 reply; 44+ messages in thread
From: Thomas Fjellstrom @ 2011-09-28 19:03 UTC (permalink / raw)
  To: lists; +Cc: stan, Michal Soltys, NeilBrown, linux-raid

On September 28, 2011, Marcin M. Jessa wrote:
> On 9/28/11 6:31 PM, Stan Hoeppner wrote:
> > Two people have responded to my comment above. Neither read it in the
> > proper context. I was responding to Neil. Neil wants to quit Linux RAID.
> > I simply alluded to the fact that 'desperate, please save my bacon!'
> > help threads such as the current one may be a factor.
> 
> Right, so you're saying we're betting on a soon to be dead horse?

Oh heck no. There's no way md would die if Neil left. And I doubt he'd leave 
it without a maintainer. At least not under normal circumstances.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 18:56                       ` Thomas Fjellstrom
@ 2011-09-28 19:26                         ` Marcin M. Jessa
  2011-09-28 19:42                           ` Thomas Fjellstrom
  0 siblings, 1 reply; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-28 19:26 UTC (permalink / raw)
  To: thomas; +Cc: stan, linux-raid

On 9/28/11 8:56 PM, Thomas Fjellstrom wrote:
> On September 28, 2011, Marcin M. Jessa wrote:
>> On 9/28/11 6:12 PM, Stan Hoeppner wrote:
>>> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
>>>> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
>>>>> Reading the thread, and the many like it over the past months/years,
>>>>> may yield a clue as to why you wish to move on to something other than
>>>>> Linux RAID...
>>>>>
>>>> :) I will give it another chance.
>>>>
>>>> In case of failure FreeBSD and ZFS would be another option.
>>>
>>> I was responding to Neil's exhaustion with mdadm. I was speculating that
>>> help threads such as yours may be a contributing factor,
>>> requesting/requiring Neil to become Superman many times per month to try
>>> to save some OP's bacon.
>>
>> That's what mailing lists are for. And more will come as long as there
>> is no documentation on how to save your behind in case of failures like
>> that. Or if the docs with examples available online are utterly useless.
>
> I think that those of us that have been helped on the list should think about
> contributing some wiki docs, if there's one we can edit.


I have a site, ezunix.org (a bit crippled since the crash), where I 
document anything I come across that can be useful.
But after all these messages I still don't know what to do if you lose 
3 drives in a 5-drive RAID 6 setup ;)
I was told I was doing it wrong, but never how to do it right.
And that was the case on all the mailing lists I came across before I 
found that Wikipedia page with the incorrect instructions.



-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 19:03                       ` Thomas Fjellstrom
@ 2011-09-28 19:29                         ` Marcin M. Jessa
  2011-09-28 19:43                           ` Thomas Fjellstrom
  0 siblings, 1 reply; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-28 19:29 UTC (permalink / raw)
  To: thomas; +Cc: stan, Michal Soltys, NeilBrown, linux-raid

On 9/28/11 9:03 PM, Thomas Fjellstrom wrote:
> On September 28, 2011, Marcin M. Jessa wrote:
>> On 9/28/11 6:31 PM, Stan Hoeppner wrote:
>>> Two people have responded to my comment above. Neither read it in the
>>> proper context. I was responding to Neil. Neil wants to quit Linux RAID.
>>> I simply suggested that 'desperate, please save my bacon!' help threads
>>> such as the current one may be a factor.
>>
>> Right, so you're saying we're betting on a soon to be dead horse?
>
> Oh heck no. There's no way md would die if Neil left. And I doubt he'd leave
> it without a maintainer. At least not under normal circumstances.

Let's hope not. I saw the planned changes for Linux 3.1 and they look great.


-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 19:26                         ` Marcin M. Jessa
@ 2011-09-28 19:42                           ` Thomas Fjellstrom
  0 siblings, 0 replies; 44+ messages in thread
From: Thomas Fjellstrom @ 2011-09-28 19:42 UTC (permalink / raw)
  To: lists; +Cc: stan, linux-raid

On September 28, 2011, Marcin M. Jessa wrote:
> On 9/28/11 8:56 PM, Thomas Fjellstrom wrote:
> > On September 28, 2011, Marcin M. Jessa wrote:
> >> On 9/28/11 6:12 PM, Stan Hoeppner wrote:
> >>> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
> >>>> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
> >>>>> Reading the thread, and the many like it over the past months/years,
> >>>>> may yield a clue as to why you wish to move on to something other
> >>>>> than Linux RAID...
> >>>>> 
> >>>> :) I will give it another chance.
> >>>> 
> >>>> In case of failure FreeBSD and ZFS would be another option.
> >>> 
> >>> I was responding to Neil's exhaustion with mdadm. I was speculating
> >>> that help threads such as yours may be a contributing factor,
> >>> requesting/requiring Neil to become Superman many times per month to
> >>> try to save some OP's bacon.
> >> 
> >> That's what mailing lists are for. And more will come as long as there
> >> is no documentation on how to save your behind in case of failures like
> >> that. Or if the docs with examples available online are utterly useless.
> > 
> > I think that those of us that have been helped on the list should think
> > about contributing some wiki docs, if there's one we can edit.
> 
> I have a site, ezunix.org (a bit crippled since the crash), where I
> document anything useful I come across.
> But after all these messages I still don't know what to do when you lose
> 3 drives in a 5-drive RAID 6 setup ;)
> I was told I was doing it wrong, but never how to do it right.
> And that was the case with all the mailing lists I came across before I
> found that Wikipedia page with its incorrect instructions.

I think they did mention how to do it right: something like `mdadm --assemble 
--force`, since losing 3 drives at once most likely means the drives 
themselves are fine.

I recently lost ALL of the drives in my 7-drive RAID 5 array. The first one 
was kicked, then the rest fell offline at the same time. Because those 6 
drives dropped out together, their metadata all agreed on the current state 
of the array, so nothing was lost beyond some data that was still sitting in 
the page cache. In my case, after I ran 
`mdadm --assemble --verbose /dev/md1 /dev/sd[fhijedg]`, only one drive was 
left out, which was the first drive to go. Then I ran 
`mdadm --re-add /dev/md1 /dev/sdi` to add that drive back, and since I use 
the very nice bitmap feature, it only took my array a few minutes to resync.
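
To pull that together, the whole sequence looked roughly like this (a sketch 
only, using my device names, so adjust for your own setup and check the 
superblocks before forcing anything):

  # see what each member thinks the state of the array is
  mdadm --examine /dev/sd[fhijedg]

  # try a plain assemble first
  mdadm --assemble --verbose /dev/md1 /dev/sd[fhijedg]

  # if that refuses because the event counts are slightly out of step,
  # stop whatever it did start and retry with --force
  mdadm --stop /dev/md1
  mdadm --assemble --force --verbose /dev/md1 /dev/sd[fhijedg]

  # re-add the drive that was kicked first; with a write-intent bitmap
  # only the out-of-date regions get resynced
  mdadm --re-add /dev/md1 /dev/sdi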

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 19:29                         ` Marcin M. Jessa
@ 2011-09-28 19:43                           ` Thomas Fjellstrom
  0 siblings, 0 replies; 44+ messages in thread
From: Thomas Fjellstrom @ 2011-09-28 19:43 UTC (permalink / raw)
  To: lists; +Cc: stan, Michal Soltys, NeilBrown, linux-raid

On September 28, 2011, Marcin M. Jessa wrote:
> On 9/28/11 9:03 PM, Thomas Fjellstrom wrote:
> > On September 28, 2011, Marcin M. Jessa wrote:
> >> On 9/28/11 6:31 PM, Stan Hoeppner wrote:
> >>> Two people have responded to my comment above. Neither read it in the
> >>> proper context. I was responding to Neil. Neil wants to quit Linux
> >>> RAID. I simply suggested that 'desperate, please save my bacon!' help
> >>> threads such as the current one may be a factor.
> >> 
> >> Right, so you're saying we're betting on a soon to be dead horse?
> > 
> > Oh heck no. There's no way md would die if Neil left. And I doubt he'd
> > leave it without a maintainer. At least not under normal circumstances.
> 
> Let's hope not. I saw the planned changes for Linux 3.1 and they look
> great.

I believe he said it wasn't going to happen any time soon. Just that he was 
thinking about retiring eventually.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 16:12                   ` Stan Hoeppner
  2011-09-28 16:30                     ` Marcin M. Jessa
@ 2011-09-28 23:49                     ` NeilBrown
  2011-09-29  9:03                       ` David Brown
  2011-09-29 18:28                       ` Dan Williams
  1 sibling, 2 replies; 44+ messages in thread
From: NeilBrown @ 2011-09-28 23:49 UTC (permalink / raw)
  To: stan; +Cc: lists, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2248 bytes --]

On Wed, 28 Sep 2011 11:12:08 -0500 Stan Hoeppner <stan@hardwarefreak.com>
wrote:

> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
> > On 9/28/11 4:50 AM, Stan Hoeppner wrote:
> >
> >> Reading the thread, and the many like it over the past months/years, may
> >> yield a clue as to why you wish to move on to something other than Linux
> >> RAID...
> >
> > :) I will give it another chance.
> > In case of failure FreeBSD and ZFS would be another option.
> 
> I was responding to Neil's exhaustion with mdadm.  I was speculating 
> that help threads such as yours may be a contributing factor, 
> requesting/requiring Neil to become Superman many times per month to try 
> to save some OP's bacon.
> 

No, I don't really think they are a factor - though thanks for thinking
about it.

Obviously not all "help threads" end with a good result but quite a few do
and one has to take the rough with the smooth.
And each help thread is a potential learning experience.  If I see patterns
of failure recurring it will guide and motivate me to improve md or mdadm to
make that failure mode less likely.

I think it is simply that it isn't new any more.  I first started
contributing to md early in 2000, and 11 years is a long time.  Not as long
as Mr Torvalds has worked on Linux of course, but Linux is a lot bigger than
md so there is more room to be interested.
There have been many highlights over that time, but the ones that stick in my
memory are when others have contributed in significant ways.  I really value
that, whether it is code, or review, or documentation, or making a wiki, or
answering mailing list questions before I do, or even putting extra time in
to reproduce a bug so we can drill down to the cause.

I figure that appearing competent, capable and in control isn't going to
attract new blood - new blood wants wide open frontiers with lots of
opportunity (I started in md when it was essentially unmaintained - I know
the attraction).  So I just want to say that there is certainly room and
opportunity over here.

I'm not about to drop md, but I would love an apprentice or two (or 3 or 4)
and would aim to provide the same mix of independence and oversight as Linus
does.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 23:49                     ` NeilBrown
@ 2011-09-29  9:03                       ` David Brown
  2011-09-29 15:21                         ` Stan Hoeppner
  2011-09-29 18:28                       ` Dan Williams
  1 sibling, 1 reply; 44+ messages in thread
From: David Brown @ 2011-09-29  9:03 UTC (permalink / raw)
  To: linux-raid

On 29/09/2011 01:49, NeilBrown wrote:
> On Wed, 28 Sep 2011 11:12:08 -0500 Stan Hoeppner<stan@hardwarefreak.com>
> wrote:
>
>> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
>>> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
>>>
>>>> Reading the thread, and the many like it over the past months/years, may
>>>> yield a clue as to why you wish to move on to something other than Linux
>>>> RAID...
>>>
>>> :) I will give it another chance.
>>> In case of failure FreeBSD and ZFS would be another option.
>>
>> I was responding to Neil's exhaustion with mdadm.  I was speculating
>> that help threads such as yours may be a contributing factor,
>> requesting/requiring Neil to become Superman many times per month to try
>> to save some OP's bacon.
>>
>
> No, I don't really think they are a factor - though thanks for thinking
> about it.
>
> Obviously not all "help threads" end with a good result but quite a few do
> and one has to take the rough with the smooth.
> And each help thread is a potential learning experience.  If I see patterns
> of failure recurring it will guide and motivate me to improve md or mdadm to
> make that failure mode less likely.
>
> I think it is simply that it isn't new any more.  I first started
> contributing to md early in 2000, and 11 years is a long time.  Not as long
> as Mr Torvalds has worked on Linux of course, but Linux is a lot bigger than
> md so there is more room to be interested.
> There have been many highlights over that time, but the ones that stick in my
> memory are when others have contributed in significant ways.  I really value
> that, whether it is code, or review, or documentation, or making a wiki, or
> answering mailing list questions before I do, or even putting extra time in
> to reproduce a bug so we can drill down to the cause.
>
> I figure that appearing competent, capable and in control isn't going to
> attract new blood - new blood wants wide open frontiers with lots of
> opportunity (I started in md when it was essentially unmaintained - I know
> the attraction).  So I just want to say that there is certainly room and
> opportunity over here.
>
> I'm not about to drop md, but I would love an apprentice or two (or 3 or 4)
> and would aim to provide the same mix of independence and oversight as Linus
> does.
>
> NeilBrown

One challenge for getting apprentices in this particular area is the 
hardware costs.  For someone to be able to really help you out in 
development and serious testing of md, they are going to need access to 
a machine with plenty of disks, preferably with hotplug bays and with a 
mix of hdds and ssds.  That's going to be out of reach for many potential 
assistants - it is hard enough finding someone with the required talent, 
interest and time to spend on md/mdadm.  Finding people with money - 
especially people using md raid professionally - should be a lot easier. 
  So if there is anyone out there who is willing and able to contribute 
seriously to md/mdadm, but is hindered by lack of hardware, then I for 
one would be willing to contribute to a fund to help out.




^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-29  9:03                       ` David Brown
@ 2011-09-29 15:21                         ` Stan Hoeppner
  2011-09-29 17:14                           ` David Brown
  0 siblings, 1 reply; 44+ messages in thread
From: Stan Hoeppner @ 2011-09-29 15:21 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

On 9/29/2011 4:03 AM, David Brown wrote:

> One challenge for getting apprentices in this particular area is the
> hardware costs. For someone to be able to really help you out in
> development and serious testing of md, they are going to need access to
> a machine with plenty of disks, preferably with hotplug bays and with a
> mix of hdds and ssds. That's going to be out of reach for many potential
> assistants - it is hard enough finding someone with the required talent,
> interest and time to spend on md/mdadm. Finding people with money -
> especially people using md raid professionally - should be a lot easier.
> So if there is anyone out there who is willing and able to contribute
> seriously to md/mdadm, but is hindered by lack of hardware, then I for
> one would be willing to contribute to a fund to help out.

Almost a dozen different people from Intel have contributed code 
recently.  I would think such folks wouldn't have any problem getting 
access to all the hardware they could need given Intel's financial 
resources.  Seems like a good recruiting pool.

-- 
Stan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-29 15:21                         ` Stan Hoeppner
@ 2011-09-29 17:14                           ` David Brown
  0 siblings, 0 replies; 44+ messages in thread
From: David Brown @ 2011-09-29 17:14 UTC (permalink / raw)
  To: linux-raid

On 29/09/11 17:21, Stan Hoeppner wrote:
> On 9/29/2011 4:03 AM, David Brown wrote:
>
>> One challenge for getting apprentices in this particular area is the
>> hardware costs. For someone to be able to really help you out in
>> development and serious testing of md, they are going to need access to
>> a machine with plenty of disks, preferably with hotplug bays and with a
>> mix of hdds and ssds. That's going to be out of reach for many potential
>> assistants - it is hard enough finding someone with the required talent,
>> interest and time to spend on md/mdadm. Finding people with money -
>> especially people using md raid professionally - should be a lot easier.
>> So if there is anyone out there who is willing and able to contribute
>> seriously to md/mdadm, but is hindered by lack of hardware, then I for
>> one would be willing to contribute to a fund to help out.
>
> Almost a dozen different people from Intel have contributed code
> recently. I would think such folks wouldn't have any problem getting
> access to all the hardware they could need given Intel's financial
> resources. Seems like a good recruiting pool.
>

I am sure you are right that these people should have access to plenty 
of hardware - in particular, they should be in an ideal position to help 
with testing/tuning for SSD usage.  But they may not have the time to 
help much, unless Intel is happy to pay them to do so.  People who have 
lots of time - say, a new graduate or someone "between jobs" - often 
don't have the hardware.  All I am saying is that /if/ there are such 
people around, and they are serious about working on md, then I think it 
should be possible to raise a little money to help them help us.  (The 
same applies to existing md developers, of course.)


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-28 23:49                     ` NeilBrown
  2011-09-29  9:03                       ` David Brown
@ 2011-09-29 18:28                       ` Dan Williams
  2011-09-29 23:07                         ` NeilBrown
  1 sibling, 1 reply; 44+ messages in thread
From: Dan Williams @ 2011-09-29 18:28 UTC (permalink / raw)
  To: NeilBrown; +Cc: stan, lists, linux-raid

On Wed, Sep 28, 2011 at 4:49 PM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 28 Sep 2011 11:12:08 -0500 Stan Hoeppner <stan@hardwarefreak.com>
> wrote:
>
>> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
>> > On 9/28/11 4:50 AM, Stan Hoeppner wrote:
>> >
>> >> Reading the thread, and the many like it over the past months/years, may
>> >> yield a clue as to why you wish to move on to something other than Linux
>> >> RAID...
>> >
>> > :) I will give it another chance.
>> > In case of failure FreeBSD and ZFS would be another option.
>>
>> I was responding to Neil's exhaustion with mdadm.  I was speculating
>> that help threads such as yours may be a contributing factor,
>> requesting/requiring Neil to become Superman many times per month to try
>> to save some OP's bacon.
>>
>
> No, I don't really think they are a factor - though thanks for thinking
> about it.
>
> Obviously not all "help threads" end with a good result but quite a few do
> and one has to take the rough with the smooth.
> And each help thread is a potential learning experience.  If I see patterns
> of failure recurring it will guide and motivate me to improve md or mdadm to
> make that failure mode less likely.
>
> I think it is simply that it isn't new any more.  I first started
> contributing to md early in 2000, and 11 years is a long time.  Not as long
> > as Mr Torvalds has worked on Linux of course, but Linux is a lot bigger than
> > md so there is more room to be interested.
> > There have been many highlights over that time, but the ones that stick in my
> > memory are when others have contributed in significant ways.  I really value
> > that, whether it is code, or review, or documentation, or making a wiki, or
> > answering mailing list questions before I do, or even putting extra time in
> > to reproduce a bug so we can drill down to the cause.
> >
> > I figure that appearing competent, capable and in control isn't going to
> attract new blood - new blood wants wide open frontiers with lots of
> opportunity (I started in md when it was essentially unmaintained - I know
> the attraction).  So I just want to say that there is certainly room and
> opportunity over here.
>
> I'm not about to drop md, but I would love an apprentice or two (or 3 or 4)
> and would aim to provide the same mix of independence and oversight as Linus
> does.
>

What if as a starting point we could get a Patchwork queue hosted
somewhere so you could at least start formally delegating incoming
patches for an apprentice to disposition?

The hardest part about maintenance is taste, and md has been thriving
on good-taste pragmatic decisions for a while now.

--
Dan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-29 18:28                       ` Dan Williams
@ 2011-09-29 23:07                         ` NeilBrown
  2011-09-30  0:18                           ` Williams, Dan J
  0 siblings, 1 reply; 44+ messages in thread
From: NeilBrown @ 2011-09-29 23:07 UTC (permalink / raw)
  To: Dan Williams; +Cc: stan, lists, linux-raid

[-- Attachment #1: Type: text/plain, Size: 3881 bytes --]

On Thu, 29 Sep 2011 11:28:36 -0700 Dan Williams <dan.j.williams@intel.com>
wrote:

> On Wed, Sep 28, 2011 at 4:49 PM, NeilBrown <neilb@suse.de> wrote:
> > On Wed, 28 Sep 2011 11:12:08 -0500 Stan Hoeppner <stan@hardwarefreak.com>
> > wrote:
> >
> >> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
> >> > On 9/28/11 4:50 AM, Stan Hoeppner wrote:
> >> >
> >> >> Reading the thread, and the many like it over the past months/years, may
> >> >> yield a clue as to why you wish to move on to something other than Linux
> >> >> RAID...
> >> >
> >> > :) I will give it another chance.
> >> > In case of failure FreeBSD and ZFS would be another option.
> >>
> >> I was responding to Neil's exhaustion with mdadm.  I was speculating
> >> that help threads such as yours may be a contributing factor,
> >> requesting/requiring Neil to become Superman many times per month to try
> >> to save some OP's bacon.
> >>
> >
> > No, I don't really think they are a factor - though thanks for thinking
> > about it.
> >
> > Obviously not all "help threads" end with a good result but quite a few do
> > and one has to take the rough with the smooth.
> > And each help thread is a potential learning experience.  If I see patterns
> > of failure recurring it will guide and motivate me to improve md or mdadm to
> > make that failure mode less likely.
> >
> > I think it is simply that it isn't new any more.  I first started
> > contributing to md early in 2000, and 11 years is a long time.  Not as long
> > as Mr Torvalds has worked on Linux of course, but Linux is a lot bigger than
> > md so there is more room to be interested.
> > There have been many highlights over that time, but the ones that stick in my
> > memory are when others have contributed in significant ways.  I really value
> > that, whether it is code, or review, or documentation, or making a wiki, or
> > answering mailing list questions before I do, or even putting extra time in
> > to reproduce a bug so we can drill down to the cause.
> >
> > I figure that appearing competent, capable and in control isn't going to
> > attract new blood - new blood wants wide open frontiers with lots of
> > opportunity (I started in md when it was essentially unmaintained - I know
> > the attraction).  So I just want to say that there is certainly room and
> > opportunity over here.
> >
> > I'm not about to drop md, but I would love an apprentice or two (or 3 or 4)
> > and would aim to provide the same mix of independence and oversight as Linus
> > does.
> >
> 
> What if as a starting point we could get a Patchwork queue hosted
> somewhere so you could at least start formally delegating incoming
> patches for an apprentice to disposition?

I don't know much about Patchwork ... what sort of value does it add?

But I don't think much of the idea of delegation.  I don't see a thriving
developer community full of people who want work delegated to them.   Rather
I see a thriving developer community of people who see problems and want to
fix them and dive in and do stuff.
An apprentice who needs to have stuff delegated to them will always be an
apprentice.  A master starts by doing the things their master doesn't want to
do, then moves to the things the master didn't think to do and finally
blossoms by doing the things their master didn't know how to do.

> 
> The hardest part about maintenance is taste, and md has been thriving
> on good-taste pragmatic decisions for a while now.

Taste is learnt by practice.  Having someone correct - or at least
highlight - your mistakes is important, but making the mistakes in the first
place is vital.


I think the starting point is simply to do.  Read the code, ask a question,
suggest a design, send a patch, pick a task off the road-map (or make one up
yourself) and start work on it.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-29 23:07                         ` NeilBrown
@ 2011-09-30  0:18                           ` Williams, Dan J
  2011-10-05  2:15                             ` NeilBrown
  0 siblings, 1 reply; 44+ messages in thread
From: Williams, Dan J @ 2011-09-30  0:18 UTC (permalink / raw)
  To: NeilBrown; +Cc: stan, lists, linux-raid

On Thu, Sep 29, 2011 at 4:07 PM, NeilBrown <neilb@suse.de> wrote:
>> What if as a starting point we could get a Patchwork queue hosted
>> somewhere so you could at least start formally delegating incoming
>> patches for an apprentice to disposition?
>
> I don't know much about Patchwork ... what sort of value does it add?

It just makes things more transparent.  It gives a submitter a web
interface for viewing the state of a patch: accepted, rejected,
under review.  It lets a maintainer, or a group of maintainers, see
the backlog and assign (delegate) patches between them.  It also
automates the collection of Acked-by, Reviewed-by, etc. tags.
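
There is also pwclient, the small command-line client that talks to
Patchwork over XML-RPC, for anyone who never wants to leave the
terminal.  From memory (so treat the exact flags as an assumption,
and the patch ID below is made up), the day-to-day flow is roughly:

  # list patches still in the 'New' state for the configured project
  pwclient list -s New

  # apply one to a local test tree
  pwclient git-am 12345

  # record the outcome so the submitter sees it in the web UI
  pwclient update -s Accepted 12345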

> But I don't think much of the idea of delegation.  I don't see a thriving
> developer community full of people who want work delegated to them.

So I only meant "delegate" in the Patchwork parlance to make "who is
merging this md/mdadm patch" clear as the apprentice ramps up.  But
this is probably too much mechanics.

It simply sounds like you want a similar situation like what happened
with git.  I.e. a "Junio" to take over but you'll still be around to
course correct and send patches.

--
Dan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-27 23:13             ` NeilBrown
  2011-09-28  2:50               ` Stan Hoeppner
@ 2011-09-30 20:01               ` Marcin M. Jessa
  2011-09-30 21:47                 ` Thomas Fjellstrom
  1 sibling, 1 reply; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-30 20:01 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 9/28/11 1:13 AM, NeilBrown wrote:
> On Tue, 27 Sep 2011 21:12:58 +0200 "Marcin M. Jessa"<lists@yazzy.org>  wrote:
>
>> On 9/26/11 11:31 AM, NeilBrown wrote:
>>> On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa"<lists@yazzy.org>   wrote:
>>>
>>>> On 9/26/11 12:18 AM, NeilBrown wrote:
>>>>
>>>>> Do you remember what filesystem you had on 'storage'?  Was it ext3 or ext4 or
>>>>> xfs or something else?
>>>>
>>>> You're giving me some hope here and then silence :)
>>>> Why did you ask about the file system? Should I run fsck on the LV ?
>>>>
>>>>
>>>>
>>>
>>> You already did run fsck on the LV.  It basically said that it didn't
>>> recognise the filesystem at all.
>>> I asked in case maybe it was XFS in which case a different tool would be
>>> required.
>>
>> Looks like I didn't remember correctly. I ran testdisk and it reported
>> the file system to be XFS. What would you suggest now Neil?
>>
>>
>
> Presumably
>     xfs_check /dev/fridge/storage
>
> and then maybe
>     xfs_repair /dev/fridge/storage
>
> but I have no experience with XFS - I'm just reading man pages.
>

That didn't work, so I decided to give photorec a spin, and so far it 
has found and recovered lots of files [1].
Why the heck is photorec able to do that when the normal file system 
repair tools are just useless?


[1]:
Disk /dev/dm-0 - 5368 GB / 5000 GiB (RO)
Partition Start End Size in sectors
No partition 0 10485759999 10485760000 [Whole disk]


Pass 2 - Reading sector 1506591634/10485760000, 843 files found
Elapsed time 10h16m59s - Estimated time for achievement 61h17m10
txt: 445 recovered
exe: 198 recovered
mpg: 62 recovered
swf: 33 recovered
tx?: 33 recovered
gif: 25 recovered
gpg: 19 recovered
bmp: 5 recovered
gz: 4 recovered
riff: 4 recovered
others: 15 recovered



-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-30 20:01               ` Marcin M. Jessa
@ 2011-09-30 21:47                 ` Thomas Fjellstrom
  2011-09-30 22:30                   ` Marcin M. Jessa
  0 siblings, 1 reply; 44+ messages in thread
From: Thomas Fjellstrom @ 2011-09-30 21:47 UTC (permalink / raw)
  To: lists; +Cc: NeilBrown, linux-raid

On September 30, 2011, Marcin M. Jessa wrote:
> On 9/28/11 1:13 AM, NeilBrown wrote:
> > On Tue, 27 Sep 2011 21:12:58 +0200 "Marcin M. Jessa"<lists@yazzy.org>  
wrote:
> >> On 9/26/11 11:31 AM, NeilBrown wrote:
> >>> On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa"<lists@yazzy.org>   
wrote:
> >>>> On 9/26/11 12:18 AM, NeilBrown wrote:
> >>>>> Do you remember what filesystem you had on 'storage'?  Was it ext3 or
> >>>>> ext4 or xfs or something else?
> >>>> 
> >>>> You're giving me some hope here and then silence :)
> >>>> Why did you ask about the file system? Should I run fsck on the LV ?
> >>> 
> >>> You already did run fsck on the LV.  It basically said that it didn't
> >>> recognise the filesystem at all.
> >>> I asked in case maybe it was XFS in which case a different tool would
> >>> be required.
> >> 
> >> Looks like I didn't remember correctly. I ran testdisk and it reported
> >> the file system to be XFS. What would you suggest now Neil?
> > 
> > Presumably
> > 
> >     xfs_check /dev/fridge/storage
> > 
> > and then maybe
> > 
> >     xfs_repair /dev/fridge/storage
> > 
> > but I have no experience with XFS - I'm just reading man pages.
> 
> That didn't work, so I decided to give photorec a spin, and so far it
> has found and recovered lots of files [1].
> Why the heck is photorec able to do that when the normal file system
> repair tools are just useless?


The file system metadata was trashed, and photorec looks at the data only, 
looking for file headers and trying to pull out contiguous files.
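
For anyone hitting this thread later, I think the invocation is just
something along the lines of (going from memory, so double check the
options against the testdisk docs; the output directory is only an
example, and should live on a different disk):

  # scan the raw LV and drop whatever it carves out into recup.* dirs
  photorec /log /d /mnt/recovery/recup /dev/dm-0

after which it drops you into the interactive menus to pick the
partition and the file types to carve for.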

> 
> [1]:
> Disk /dev/dm-0 - 5368 GB / 5000 GiB (RO)
> Partition Start End Size in sectors
> No partition 0 10485759999 10485760000 [Whole disk]
> 
> 
> Pass 2 - Reading sector 1506591634/10485760000, 843 files found
> Elapsed time 10h16m59s - Estimated time for achievement 61h17m10
> txt: 445 recovered
> exe: 198 recovered
> mpg: 62 recovered
> swf: 33 recovered
> tx?: 33 recovered
> gif: 25 recovered
> gpg: 19 recovered
> bmp: 5 recovered
> gz: 4 recovered
> riff: 4 recovered
> others: 15 recovered


-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-30 21:47                 ` Thomas Fjellstrom
@ 2011-09-30 22:30                   ` Marcin M. Jessa
  0 siblings, 0 replies; 44+ messages in thread
From: Marcin M. Jessa @ 2011-09-30 22:30 UTC (permalink / raw)
  To: thomas; +Cc: linux-raid

On 9/30/11 11:47 PM, Thomas Fjellstrom wrote:

> The file system metadata was trashed, and photorec looks at the data only,
> looking for file headers and trying to pull out contiguous files.

I'd pay for a tool if only it could restore directories and file names 
as well...

-- 

Marcin M. Jessa

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Recovery of failed RAID 6 and LVM
  2011-09-30  0:18                           ` Williams, Dan J
@ 2011-10-05  2:15                             ` NeilBrown
  0 siblings, 0 replies; 44+ messages in thread
From: NeilBrown @ 2011-10-05  2:15 UTC (permalink / raw)
  To: Williams, Dan J; +Cc: stan, lists, linux-raid

[-- Attachment #1: Type: text/plain, Size: 500 bytes --]

On Thu, 29 Sep 2011 17:18:45 -0700 "Williams, Dan J"
<dan.j.williams@intel.com> wrote:


> It simply sounds like you want a similar situation like what happened
> with git.  I.e. a "Junio" to take over but you'll still be around to
> course correct and send patches.

That isn't necessary in the first instance.  It could possibly (hopefully)
reach that stage, but taking over as maintainer is a big ask and not
something to be expected in a hurry.
I'm happy to start small.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2011-10-05  2:15 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-25  7:55 Recovery of failed RAID 6 and LVM Marcin M. Jessa
2011-09-25  8:39 ` Stan Hoeppner
2011-09-25 10:07   ` Marcin M. Jessa
2011-09-25 13:15 ` Phil Turmel
2011-09-25 14:16   ` Marcin M. Jessa
2011-09-25 16:43     ` Phil Turmel
2011-09-25 14:41   ` Marcin M. Jessa
2011-09-25 16:19     ` Phil Turmel
2011-09-25 21:40 ` NeilBrown
2011-09-25 21:58   ` Marcin M. Jessa
2011-09-25 22:18     ` NeilBrown
2011-09-25 22:21       ` Marcin M. Jessa
     [not found]       ` <4E804062.3020700@yazzy.org>
2011-09-26  9:31         ` NeilBrown
2011-09-26 10:53           ` Marcin M. Jessa
2011-09-26 11:10             ` NeilBrown
2011-09-27 19:12           ` Marcin M. Jessa
2011-09-27 23:13             ` NeilBrown
2011-09-28  2:50               ` Stan Hoeppner
2011-09-28  7:10                 ` Marcin M. Jessa
2011-09-28  7:51                   ` David Brown
2011-09-28 16:12                   ` Stan Hoeppner
2011-09-28 16:30                     ` Marcin M. Jessa
2011-09-28 18:56                       ` Thomas Fjellstrom
2011-09-28 19:26                         ` Marcin M. Jessa
2011-09-28 19:42                           ` Thomas Fjellstrom
2011-09-28 23:49                     ` NeilBrown
2011-09-29  9:03                       ` David Brown
2011-09-29 15:21                         ` Stan Hoeppner
2011-09-29 17:14                           ` David Brown
2011-09-29 18:28                       ` Dan Williams
2011-09-29 23:07                         ` NeilBrown
2011-09-30  0:18                           ` Williams, Dan J
2011-10-05  2:15                             ` NeilBrown
2011-09-28 10:38                 ` Michal Soltys
2011-09-28 13:20                   ` Brad Campbell
2011-09-28 19:02                     ` Thomas Fjellstrom
2011-09-28 16:31                   ` Stan Hoeppner
2011-09-28 16:37                     ` Marcin M. Jessa
2011-09-28 19:03                       ` Thomas Fjellstrom
2011-09-28 19:29                         ` Marcin M. Jessa
2011-09-28 19:43                           ` Thomas Fjellstrom
2011-09-30 20:01               ` Marcin M. Jessa
2011-09-30 21:47                 ` Thomas Fjellstrom
2011-09-30 22:30                   ` Marcin M. Jessa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.