* recovering from raid5 corruption
@ 2012-04-29 22:38 Shaya Potter
  2012-04-29 22:52 ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: Shaya Potter @ 2012-04-29 22:38 UTC (permalink / raw)
  To: linux-raid

Somehow my raid5 got corrupted in the context of a main disk failure 
(which wasn't RAID related).

Compounding this issue, I had already had one disk in the raid5 go bad 
and was in the process of getting it replaced.

This raid array was 5 disks.

What I mean by corrupted is that the superblocks of 3 of the remaining 4 
devices seemed to have been wiped out (i.e. they had a UUID of all 0s, 
though enough remained that each still knew it was part of an md device).

Now, the one whose superblock seems fine places itself at position 3 
(of 0-4), and the missing disk was at position 2.
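
For what it's worth, this per-device position and UUID information lives 
in the md superblock and can be read without writing anything; the device 
name below is just an example:

mdadm --examine /dev/sdb1    # read-only: prints the array UUID and the
                             # raid-disk slot this member thinks it holds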

This would imply that there are only 6 permutations possible for the 
other 3 disks (even if that assumption is wrong, there are only 120 
permutations possible, which I should easily be able to iterate over).

Further compounding this, there were 2 LVM logical volumes on the 
physical raid device.

I've tried being cute and trying all 6 permutations to force-recreate 
the array, but LVM isn't picking up anything (pvscan/lvscan/lvmdiskscan).

The original raid had metadata version 0.90.00 (created in 2008), while 
the new one has version 1.20.

Have I ruined any chance of recovery by shooting in the dark with my 
cute attempts? Am I SOL, or is there a better/proper way I can try to 
recover this data?

Luckily for me, I've been on a backup binge of late, but there is still 
about 500GB-1TB of stuff that wasn't backed up.

thanks, any help would be appreciated.


* Re: recovering from raid5 corruption
  2012-04-29 22:38 recovering from raid5 corruption Shaya Potter
@ 2012-04-29 22:52 ` NeilBrown
  2012-04-29 23:29   ` Shaya Potter
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2012-04-29 22:52 UTC (permalink / raw)
  To: Shaya Potter; +Cc: linux-raid


On Sun, 29 Apr 2012 18:38:29 -0400 Shaya Potter <spotter@gmail.com> wrote:

> somehow my raid5 got corrupted in the contexts of a main disk failure 
> (which wasn't raid related).
> 
> compounding this issue was that I had already had one disk in the raid5 
> go bad and was in the process of getting it replaced.
> 
> this raid array was 5 disks.
> 
> What I mean by corrupted is that the superblock of 3 of the remaining 4 
> devices seemed to have been wiped out (i.e. had a UUID of all 0s, though 
> still enough that it knew it was part of an md device)
> 
> now, the one whose superblock seems fine, places it at position disk 3 
> (of 0-4) and the missing disk at disk 2.
> 
> this would imply that there are only 6 permutations possible for the 
> other 3 disks.   (even if that assumptions is wrong, there are only 120 
> permutations possible, which I should easily be able to iterate over).
> 
> further compounding this, is that there were 2 LVM logical disks on the 
> physical raid device.
> 
> I've tried being cute and trying all 6 permutations to force recreate 
> the array, but lvm isn't picking up anything. (pvscan/lvscan/lvmdiskscan)
> 
> The original raid had a version of 0.90.00 (created in 2008), while the 
> new one has a version 1.20.
> 
> have I ruined any chances of recovery by shooting in the dark with my 
> cute attempts, am I SOL or is there a better/proper way I can try to 
> recover this data?
> 
> Luckily for me, I've been on a backup binge of late, but there still 
> about 500-1TB of stuff that wasn't backed up.

You've written a new superblock 4K into each device, where previously
there was something.  So you have probably corrupted something, though
we cannot easily tell what.

Retry your experiment with --metadata=0.90.  Hopefully one of those
combinations will work better.  If it does, make a backup of the data you
want to keep, then I would suggest rebuilding the array from scratch.
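
A minimal sketch of that experiment is below.  The device names, their
order and the 64K chunk size are assumptions that have to be checked
against the real array; slot 2 stays "missing", and --assume-clean keeps
mdadm from starting any resync:

mdadm --stop /dev/md0
mdadm --create /dev/md0 --metadata=0.90 --level=5 --raid-devices=5 \
      --chunk=64 --assume-clean \
      /dev/sda1 /dev/sdb1 missing /dev/sdc1 /dev/sdd1
pvscan    # does LVM see the physical volume with this ordering?

mdadm will usually notice the old metadata and ask for confirmation
before creating; repeat the create with each of the six orderings of the
three unknown disks until pvscan (and later the filesystem) looks sane.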

NeilBrown



* Re: recovering from raid5 corruption
  2012-04-29 22:52 ` NeilBrown
@ 2012-04-29 23:29   ` Shaya Potter
  2012-04-29 23:41     ` Shaya Potter
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Shaya Potter @ 2012-04-29 23:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 04/29/2012 06:52 PM, NeilBrown wrote:
>
> You've written a new superblock 4K into each device, where previously
> there was something.  So you have probably corrupted something, though
> we cannot easily tell what.
>
> Retry your experiment with --metadata=0.90.  Hopefully one of those
> combinations will work better.  If it does, make a backup of the data you
> want to keep, then I would suggest rebuilding the array from scratch.

ok, thanks, that was a huge help.

I have it set up correctly now (obvious from the fact that I can read 
the LVM configuration without any gibberish when the disks are ordered 
correctly).

However, now I need to figure out how to recreate the LVM appropriately. 
I see the configuration "file" at the start of the raid array (less -f 
/dev/md0, which I'm including below).

I assume there should be a way to reuse this data to recreate the LVM?

Any continued advice would be appreciated; googling doesn't seem to come 
up with much if one doesn't have a backup of the LVM metadata.
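
One way to capture that embedded text into a file (a rough sketch: the
read length only needs to cover everything before pe_start, which is 384
sectors in the dump below, and the output file name is arbitrary):

dd if=/dev/md0 bs=512 count=384 | strings -n 8 > lvm-meta.txt
# then trim lvm-meta.txt down to the single, most recent "raid5 { ... }"
# block before handing it to vgcfgrestore -f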

raid5 {
id = "8r27WQ-HvIw-0RQV-aksr-LJGN-DLVD-1WBg8h"
seqno = 6
status = ["RESIZEABLE", "READ", "WRITE"]
extent_size = 8192
max_lv = 0
max_pv = 0

physical_volumes {

pv0 {
id = "7P0W3p-XoPg-rCo8-HJ2G-Hfxc-UDWI-x6nQck"
device = "/dev/md0"

status = ["ALLOCATABLE"]
dev_size = 11721107456
pe_start = 384
pe_count = 1430799
}
}

logical_volumes {

data {
id = "YZvrHt-Glyr-wnj0-QzV1-qRe6-VcRH-D7wU3U"
status = ["READ", "WRITE", "VISIBLE"]
segment_count = 1

segment1 {
start_extent = 0
extent_count = 524288

type = "striped"
stripe_count = 1        # linear

stripes = [
"pv0", 0
]
}
}

image {
id = "uHOzpc-l8L7-eF5h-Fa0C-EsCS-sM6X-3GpOP0"
status = ["READ", "WRITE", "VISIBLE"]
segment_count = 1

segment1 {
start_extent = 0
extent_count = 906511

type = "striped"
stripe_count = 1        # linear
stripes = [
"pv0", 524288
]
}
}
}
}
# Generated by LVM2 version 2.02.39 (2008-06-27): Wed Aug 19 23:36:50 2009

contents = "Text Format Volume Group"
version = 1

description = ""

creation_host = "nas"   # Linux nas 2.6.27-14-generic #1 SMP Fri Jul 24 
22:19:33 UTC 2009 i686
creation_time = 1250739410      # Wed Aug 19 23:36:50 2009



* Re: recovering from raid5 corruption
  2012-04-29 23:29   ` Shaya Potter
@ 2012-04-29 23:41     ` Shaya Potter
  2012-04-29 23:44     ` NeilBrown
  2012-04-29 23:45     ` NeilBrown
  2 siblings, 0 replies; 12+ messages in thread
From: Shaya Potter @ 2012-04-29 23:41 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 04/29/2012 07:29 PM, Shaya Potter wrote:
>
> however, now I need to figure out how to recreate the lvm appropriately.
> I see the configuration "file" in at the start of the raid array (less
> -f /dev/md0 which I'm including below)
>
> I asusme there should be a way to reuse this data to recreate the lvm?
>
> any continued advice would be appreciated, googling doesn't seem to come
> up with much if one doesn't have a backup of the lvm data.

<configuration snipped>

Found an old Linux Journal article that seems to say what I should do, 
but it's not working.

http://www.linuxjournal.com/article/8874?page=0,2

root@nas:~# vgcfgrestore -f 1.cfg raid5
   Incorrect metadata area header checksum
   Incorrect metadata area header checksum
   Restore failed.
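
That checksum error is most likely because LVM's first metadata area
header sits about 4K into the PV, which is exactly where the stray v1.2
md superblock was written, so the PV label/header has to be recreated
before vgcfgrestore will accept the volume group.  The commonly
documented sequence is roughly the sketch below, reusing the UUID and
config file already shown in this thread; --restorefile makes pvcreate
take the extent layout from the file so the data area is not touched:

pvcreate --uuid 7P0W3p-XoPg-rCo8-HJ2G-Hfxc-UDWI-x6nQck \
         --restorefile 1.cfg /dev/md0
vgcfgrestore -f 1.cfg raid5
vgchange -ay raid5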


* Re: recovering from raid5 corruption
  2012-04-29 23:29   ` Shaya Potter
  2012-04-29 23:41     ` Shaya Potter
@ 2012-04-29 23:44     ` NeilBrown
  2012-04-29 23:45     ` NeilBrown
  2 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2012-04-29 23:44 UTC (permalink / raw)
  To: Shaya Potter; +Cc: linux-raid


On Sun, 29 Apr 2012 19:29:10 -0400 Shaya Potter <spotter@gmail.com> wrote:

> On 04/29/2012 06:52 PM, NeilBrown wrote:
> >
> > You've written a new superblock 4K into each device, where previously
> > there was something.  So you have probably corrupted something, though
> > we cannot easily tell what.
> >
> > Retry your experiment with --metadata=0.90.  Hopefully one of those
> > combinations will work better.  If it does, make a backup of the data you
> > want to keep, then I would suggest rebuilding the array from scratch.
> 
> ok, thanks, that was a huge help.
> 
> I have it setup correctly now (obvious due to the fact that I can read 
> the lvm configuration without any gibberish when ordered correctly).
> 
> however, now I need to figure out how to recreate the lvm appropriately. 
>   I see the configuration "file" in at the start of the raid array (less 
> -f /dev/md0 which I'm including below)
> 
> I asusme there should be a way to reuse this data to recreate the lvm?
> 
> any continued advice would be appreciated, googling doesn't seem to come 
> up with much if one doesn't have a backup of the lvm data.

I don't use LVM myself so I don't know all the details, but doesn't 'pvscan'
find them?  I don't know what you do after that.

NeilBrown


> 
> raid5 {
> id = "8r27WQ-HvIw-0RQV-aksr-LJGN-DLVD-1WBg8h"
> seqno = 6
> status = ["RESIZEABLE", "READ", "WRITE"]
> extent_size = 8192
> max_lv = 0
> max_pv = 0
> 
> physical_volumes {
> 
> pv0 {
> id = "7P0W3p-XoPg-rCo8-HJ2G-Hfxc-UDWI-x6nQck"
> device = "/dev/md0"
> 
> status = ["ALLOCATABLE"]
> dev_size = 11721107456
> pe_start = 384
> pe_count = 1430799
> }
> }
> 
> logical_volumes {
> 
> data {
> id = "YZvrHt-Glyr-wnj0-QzV1-qRe6-VcRH-D7wU3U"
> status = ["READ", "WRITE", "VISIBLE"]
> segment_count = 1
> 
> segment1 {
> start_extent = 0
> extent_count = 524288
> 
> type = "striped"
> stripe_count = 1        # linear
> 
> stripes = [
> "pv0", 0
> ]
> }
> }
> 
> image {
> id = "uHOzpc-l8L7-eF5h-Fa0C-EsCS-sM6X-3GpOP0"
> status = ["READ", "WRITE", "VISIBLE"]
> segment_count = 1
> 
> segment1 {
> start_extent = 0
> extent_count = 906511
> 
> type = "striped"
> stripe_count = 1        # linear
> stripes = [
> "pv0", 524288
> ]
> }
> }
> }
> }
> # Generated by LVM2 version 2.02.39 (2008-06-27): Wed Aug 19 23:36:50 2009
> 
> contents = "Text Format Volume Group"
> version = 1
> 
> description = ""
> 
> creation_host = "nas"   # Linux nas 2.6.27-14-generic #1 SMP Fri Jul 24 
> 22:19:33 UTC 2009 i686
> creation_time = 1250739410      # Wed Aug 19 23:36:50 2009




* Re: recovering from raid5 corruption
  2012-04-29 23:29   ` Shaya Potter
  2012-04-29 23:41     ` Shaya Potter
  2012-04-29 23:44     ` NeilBrown
@ 2012-04-29 23:45     ` NeilBrown
  2012-04-29 23:51       ` Shaya Potter
  2012-04-30  0:46       ` Shaya Potter
  2 siblings, 2 replies; 12+ messages in thread
From: NeilBrown @ 2012-04-29 23:45 UTC (permalink / raw)
  To: Shaya Potter; +Cc: linux-raid


On Sun, 29 Apr 2012 19:29:10 -0400 Shaya Potter <spotter@gmail.com> wrote:

> On 04/29/2012 06:52 PM, NeilBrown wrote:
> >
> > You've written a new superblock 4K into each device, where previously
> > there was something.  So you have probably corrupted something, though
> > we cannot easily tell what.
> >
> > Retry your experiment with --metadata=0.90.  Hopefully one of those
> > combinations will work better.  If it does, make a backup of the data you
> > want to keep, then I would suggest rebuilding the array from scratch.
> 
> ok, thanks, that was a huge help.
> 
> I have it setup correctly now (obvious due to the fact that I can read 
> the lvm configuration without any gibberish when ordered correctly).

I should add that this only proves that you have the first device correct;
the rest may be wrong.
You need to activate the LVM, then look at the filesystem and see if it is
consistent before you can be sure that all devices are in the correct
position.

NeilBrown



> 
> however, now I need to figure out how to recreate the lvm appropriately. 
>   I see the configuration "file" in at the start of the raid array (less 
> -f /dev/md0 which I'm including below)
> 
> I asusme there should be a way to reuse this data to recreate the lvm?
> 
> any continued advice would be appreciated, googling doesn't seem to come 
> up with much if one doesn't have a backup of the lvm data.
> 
> raid5 {
> id = "8r27WQ-HvIw-0RQV-aksr-LJGN-DLVD-1WBg8h"
> seqno = 6
> status = ["RESIZEABLE", "READ", "WRITE"]
> extent_size = 8192
> max_lv = 0
> max_pv = 0
> 
> physical_volumes {
> 
> pv0 {
> id = "7P0W3p-XoPg-rCo8-HJ2G-Hfxc-UDWI-x6nQck"
> device = "/dev/md0"
> 
> status = ["ALLOCATABLE"]
> dev_size = 11721107456
> pe_start = 384
> pe_count = 1430799
> }
> }
> 
> logical_volumes {
> 
> data {
> id = "YZvrHt-Glyr-wnj0-QzV1-qRe6-VcRH-D7wU3U"
> status = ["READ", "WRITE", "VISIBLE"]
> segment_count = 1
> 
> segment1 {
> start_extent = 0
> extent_count = 524288
> 
> type = "striped"
> stripe_count = 1        # linear
> 
> stripes = [
> "pv0", 0
> ]
> }
> }
> 
> image {
> id = "uHOzpc-l8L7-eF5h-Fa0C-EsCS-sM6X-3GpOP0"
> status = ["READ", "WRITE", "VISIBLE"]
> segment_count = 1
> 
> segment1 {
> start_extent = 0
> extent_count = 906511
> 
> type = "striped"
> stripe_count = 1        # linear
> stripes = [
> "pv0", 524288
> ]
> }
> }
> }
> }
> # Generated by LVM2 version 2.02.39 (2008-06-27): Wed Aug 19 23:36:50 2009
> 
> contents = "Text Format Volume Group"
> version = 1
> 
> description = ""
> 
> creation_host = "nas"   # Linux nas 2.6.27-14-generic #1 SMP Fri Jul 24 
> 22:19:33 UTC 2009 i686
> creation_time = 1250739410      # Wed Aug 19 23:36:50 2009




* Re: recovering from raid5 corruption
  2012-04-29 23:45     ` NeilBrown
@ 2012-04-29 23:51       ` Shaya Potter
  2012-04-30  0:46       ` Shaya Potter
  1 sibling, 0 replies; 12+ messages in thread
From: Shaya Potter @ 2012-04-29 23:51 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 04/29/2012 07:45 PM, NeilBrown wrote:
> On Sun, 29 Apr 2012 19:29:10 -0400 Shaya Potter<spotter@gmail.com>  wrote:
>
>> On 04/29/2012 06:52 PM, NeilBrown wrote:
>>>
>>> You've written a new superblock 4K into each device, where previously
>>> there was something.  So you have probably corrupted something, though
>>> we cannot easily tell what.
>>>
>>> Retry your experiment with --metadata=0.90.  Hopefully one of those
>>> combinations will work better.  If it does, make a backup of the data you
>>> want to keep, then I would suggest rebuilding the array from scratch.
>>
>> ok, thanks, that was a huge help.
>>
>> I have it setup correctly now (obvious due to the fact that I can read
>> the lvm configuration without any gibberish when ordered correctly).
>
> I should add that this only proves that you have the first device correct,
> the rest may be wrong.
> You need to activate the LVM, then look at the filesystem and see if it is
> consistent before you can be sure that all devices are in the correct
> position.

True, but that also brings my permutations down to 2 (based on my 
assumption that the disks in slot 2 (missing) and slot 3 (uncorrupted) 
are correct).  So hopefully restoring the LVM doesn't mess up any other 
data (though I'll probably have to make sure that ext3 doesn't do any 
journal reading/writing when I attempt a mount to check whether it's set 
up correctly as well).
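
Once the VG is restored, one way to look at the filesystem without
letting ext3 replay or write the journal is sketched here (the LV names
are the ones from the metadata above; the mount point is arbitrary, and
fsck -n output is only indicative while the journal is unreplayed):

vgchange -ay raid5
fsck.ext3 -n /dev/raid5/data              # -n: check only, change nothing
mount -o ro,noload /dev/raid5/data /mnt   # noload: skip journal replay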


* Re: recovering from raid5 corruption
  2012-04-29 23:45     ` NeilBrown
  2012-04-29 23:51       ` Shaya Potter
@ 2012-04-30  0:46       ` Shaya Potter
  2012-04-30  1:09         ` NeilBrown
  1 sibling, 1 reply; 12+ messages in thread
From: Shaya Potter @ 2012-04-30  0:46 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 04/29/2012 07:45 PM, NeilBrown wrote:
> On Sun, 29 Apr 2012 19:29:10 -0400 Shaya Potter<spotter@gmail.com>  wrote:
>
>> On 04/29/2012 06:52 PM, NeilBrown wrote:
>>>
>>> You've written a new superblock 4K into each device, where previously
>>> there was something.  So you have probably corrupted something, though
>>> we cannot easily tell what.
>>>
>>> Retry your experiment with --metadata=0.90.  Hopefully one of those
>>> combinations will work better.  If it does, make a backup of the data you
>>> want to keep, then I would suggest rebuilding the array from scratch.
>>
>> ok, thanks, that was a huge help.
>>
>> I have it setup correctly now (obvious due to the fact that I can read
>> the lvm configuration without any gibberish when ordered correctly).
>
> I should add that this only proves that you have the first device correct,
> the rest may be wrong.
> You need to activate the LVM, then look at the filesystem and see if it is
> consistent before you can be sure that all devices are in the correct
> position.

This cheat sheet came in handy:

http://www.datadisk.co.uk/html_docs/redhat/rh_lvm.htm

I did the method at the bottom, "corrupt LVM metadata but replacing the 
faulty disk":

Copy/paste the config file out of the beginning of the fs, then:

pvcreate --uuid <uuid for pv0, from config file> /dev/md0
vgcfgrestore -f <config file> <vg name>
vgchange -a y <vg name>
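
If that succeeds, a quick read-only sanity check before mounting anything
might look like:

vgs raid5    # volume group summary
lvs raid5    # should list the "data" and "image" logical volumes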

Some cursory testing of large contiguous files that have checksumming 
built in seems to indicate that they are all ok.  I probably have other 
corruption due to the md 0.90 to 1.20 metadata booboo, but if that's 
only 16k-20k (4k * 4 or 5 disks) spread out over 3TB of data, I'm very 
happy :)  And it's mostly family photo data, so it's not the biggest deal 
if the large majority is ok.

<shew> relieved.


* Re: recovering from raid5 corruption
  2012-04-30  0:46       ` Shaya Potter
@ 2012-04-30  1:09         ` NeilBrown
  2012-04-30  1:13           ` Shaya Potter
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2012-04-30  1:09 UTC (permalink / raw)
  To: Shaya Potter; +Cc: linux-raid


On Sun, 29 Apr 2012 20:46:36 -0400 Shaya Potter <spotter@gmail.com> wrote:

> On 04/29/2012 07:45 PM, NeilBrown wrote:
> > On Sun, 29 Apr 2012 19:29:10 -0400 Shaya Potter<spotter@gmail.com>  wrote:
> >
> >> On 04/29/2012 06:52 PM, NeilBrown wrote:
> >>>
> >>> You've written a new superblock 4K into each device, where previously
> >>> there was something.  So you have probably corrupted something, though
> >>> we cannot easily tell what.
> >>>
> >>> Retry your experiment with --metadata=0.90.  Hopefully one of those
> >>> combinations will work better.  If it does, make a backup of the data you
> >>> want to keep, then I would suggest rebuilding the array from scratch.
> >>
> >> ok, thanks, that was a huge help.
> >>
> >> I have it setup correctly now (obvious due to the fact that I can read
> >> the lvm configuration without any gibberish when ordered correctly).
> >
> > I should add that this only proves that you have the first device correct,
> > the rest may be wrong.
> > You need to activate the LVM, then look at the filesystem and see if it is
> > consistent before you can be sure that all devices are in the correct
> > position.
> 
> this cheat sheet came in handy
> 
> http://www.datadisk.co.uk/html_docs/redhat/rh_lvm.htm
> 
> did the method at the bottom "corrupt LVM metadata but replacing the 
> faulty disk"
> 
> copy/paste config file out of beginning of fs.
> 
> pvcreate --uuid <uuid for pv0, from config file> /dev/md0
> vgcfgrestore -f <config file> <pv name>
> vgchange -a y <pv name>
> 
> some cursory testing of large contigious files that have checksumming 
> built in seems to indicate that they are all ok.  probably have other 
> corruption due to the md 0,90 to 1.20 metadata booboo, but if that's 
> only 16k-20k (4k * 4 or 5 disks) spread out over 3tb of data, I'm very 
> happy :)  and it's mostly family photo data, so not the biggest deal if 
> the large majority is ok.
> 
> <shew> relieved.

Excellent.  Thanks for keeping us informed.

If you were using 3.3.1, 3.3.2, or 3.3.3 when this happened, then I know what
caused it and suggest upgrading to 3.3.4.

NeilBrown



* Re: recovering from raid5 corruption
  2012-04-30  1:09         ` NeilBrown
@ 2012-04-30  1:13           ` Shaya Potter
  2012-04-30  6:29             ` Jan Ceuleers
  0 siblings, 1 reply; 12+ messages in thread
From: Shaya Potter @ 2012-04-30  1:13 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 04/29/2012 09:09 PM, NeilBrown wrote:
> On Sun, 29 Apr 2012 20:46:36 -0400 Shaya Potter<spotter@gmail.com>  wrote:
>
>> On 04/29/2012 07:45 PM, NeilBrown wrote:
>>> On Sun, 29 Apr 2012 19:29:10 -0400 Shaya Potter<spotter@gmail.com>   wrote:
>>>
>>>> On 04/29/2012 06:52 PM, NeilBrown wrote:
>>>>>
>>>>> You've written a new superblock 4K into each device, where previously
>>>>> there was something.  So you have probably corrupted something, though
>>>>> we cannot easily tell what.
>>>>>
>>>>> Retry your experiment with --metadata=0.90.  Hopefully one of those
>>>>> combinations will work better.  If it does, make a backup of the data you
>>>>> want to keep, then I would suggest rebuilding the array from scratch.
>>>>
>>>> ok, thanks, that was a huge help.
>>>>
>>>> I have it setup correctly now (obvious due to the fact that I can read
>>>> the lvm configuration without any gibberish when ordered correctly).
>>>
>>> I should add that this only proves that you have the first device correct,
>>> the rest may be wrong.
>>> You need to activate the LVM, then look at the filesystem and see if it is
>>> consistent before you can be sure that all devices are in the correct
>>> position.
>>
>> this cheat sheet came in handy
>>
>> http://www.datadisk.co.uk/html_docs/redhat/rh_lvm.htm
>>
>> did the method at the bottom "corrupt LVM metadata but replacing the
>> faulty disk"
>>
>> copy/paste config file out of beginning of fs.
>>
>> pvcreate --uuid<uuid for pv0, from config file>  /dev/md0
>> vgcfgrestore -f<config file>  <pv name>
>> vgchange -a y<pv name>
>>
>> some cursory testing of large contigious files that have checksumming
>> built in seems to indicate that they are all ok.  probably have other
>> corruption due to the md 0,90 to 1.20 metadata booboo, but if that's
>> only 16k-20k (4k * 4 or 5 disks) spread out over 3tb of data, I'm very
>> happy :)  and it's mostly family photo data, so not the biggest deal if
>> the large majority is ok.
>>
>> <shew>  relieved.
>
> Excellent.  Thanks for keeping us informed.
>
> If you were using 3.3.1, 3.3.2, or 3.3.3 when this happened, then I know what
> caused it and suggest upgrading to 3.3.4.

Don't think so.  The main disk died, so I plugged a new main disk in and 
installed Ubuntu 12.04 server on it, but it wasn't playing nice, so I 
turned around and installed Debian squeeze, and that's when I noticed the 
issue.  Debian is running 2.6.32.  Ubuntu is running some 3.something, 
but I'm unsure which one.


* Re: recovering from raid5 corruption
  2012-04-30  1:13           ` Shaya Potter
@ 2012-04-30  6:29             ` Jan Ceuleers
  2012-04-30 15:33               ` Shaya Potter
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Ceuleers @ 2012-04-30  6:29 UTC (permalink / raw)
  To: Shaya Potter; +Cc: NeilBrown, linux-raid

On 30/04/12 03:13, Shaya Potter wrote:
> On 04/29/2012 09:09 PM, NeilBrown wrote:
>> If you were using 3.3.1, 3.3.2, or 3.3.3 when this happened, then I
>> know what
>> caused it and suggest upgrading to 3.3.4.
>
> dont think so. main disk died, so plugged a new main disk in and
> installed ubuntu 12.04 server on it, but it wasn't playing nice, so
> turned around and installed debian squeeze and thats when I noticed the
> issue. debian is running 2.6.32. Ubuntu is running some 3.something, but
> unsure which one.

Ubuntu 12.04 includes a 3.2.0-based kernel.

The issue was introduced in commit c744a65c1e2d59acc54333ce8 (included 
in 3.3-rc7) and fixed by commit 30b8aa9172dfeaac6d77897c67ee9f9fc574cdbb 
(included in 3.4-rc1). The trouble is that the faulty commit was 
submitted to stable, with the request to backport it as far as 
practicable ("This is suitable for any stable kernel (though there might 
be some conflicts with obvious fixes in earlier kernels)").

I haven't checked, but I'm fairly sure that the Ubuntu 12.04 kernel does 
indeed include the faulty commit and does not yet have the fix (as the 
fix wasn't upstreamed until last week).
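
For anyone who wants to check a particular tree, git can list which
mainline tags contain the two commits (this only covers mainline tags;
distro kernels pick up stable backports separately, which is exactly the
concern here):

git tag --contains c744a65c1e2d
git tag --contains 30b8aa9172dfeaac6d77897c67ee9f9fc574cdbb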

HTH, Jan


* Re: recovering from raid5 corruption
  2012-04-30  6:29             ` Jan Ceuleers
@ 2012-04-30 15:33               ` Shaya Potter
  0 siblings, 0 replies; 12+ messages in thread
From: Shaya Potter @ 2012-04-30 15:33 UTC (permalink / raw)
  To: Jan Ceuleers; +Cc: NeilBrown, linux-raid

On 04/30/2012 02:29 AM, Jan Ceuleers wrote:
> On 30/04/12 03:13, Shaya Potter wrote:
>> On 04/29/2012 09:09 PM, NeilBrown wrote:
>>> If you were using 3.3.1, 3.3.2, or 3.3.3 when this happened, then I
>>> know what
>>> caused it and suggest upgrading to 3.3.4.
>>
>> dont think so. main disk died, so plugged a new main disk in and
>> installed ubuntu 12.04 server on it, but it wasn't playing nice, so
>> turned around and installed debian squeeze and thats when I noticed the
>> issue. debian is running 2.6.32. Ubuntu is running some 3.something, but
>> unsure which one.
>
> Ubuntu 12.04 includes a 3.2.0-based kernel.
>
> The issue was introduced in commit c744a65c1e2d59acc54333ce8 (included
> in 3.3-rc7) and fixed by commit 30b8aa9172dfeaac6d77897c67ee9f9fc574cdbb
> (included in 3.4-rc1). The trouble is that the faulty commit was
> submitted to stable, with the request to backport it as far as
> practicable ("This is suitable for any stable kernel (though there might
> be some conflicts with obvious fixes in earlier kernels)").
>
> I haven't checked, but I'm fairly sure that the Ubuntu 12.04 kernel does
> indeed include the faulty commit and does not yet have the fix (as the
> fix wasn't upstreamed until last week).

Confirmed with the #ubuntu-kernel guys that they have the bad commit in 
their kernel, so they now know about it and seem to be on top of it.

