* Recover RAID6 with 4 disks removed
@ 2020-02-06 13:46 Nicolas Karolak
  2020-02-06 14:07 ` Reindl Harald
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Karolak @ 2020-02-06 13:46 UTC (permalink / raw)
  To: linux-raid

Hello,
I have (had...) a RAID6 array with 8 disks and tried to remove 4 disks
from it, and obviously I messed up. Here are the commands I issued (I
do not have their output):

```
mdadm --manage /dev/md1 --fail /dev/sdh
mdadm --manage /dev/md1 --fail /dev/sdg
mdadm --detail /dev/md1
cat /proc/mdstat
mdadm --manage /dev/md1 --fail /dev/sdf
mdadm --manage /dev/md1 --fail /dev/sde
mdadm --detail /dev/md1
cat /proc/mdstat
mdadm --manage /dev/md1 --remove /dev/sdh
mdadm --manage /dev/md1 --remove /dev/sdg
mdadm --manage /dev/md1 --remove /dev/sde
mdadm --manage /dev/md1 --remove /dev/sdf
mdadm --detail /dev/md1
cat /proc/mdstat
mdadm --grow /dev/md1 --raid-devices=4
mdadm --grow /dev/md1 --array-size 7780316160  # from here it started going wrong on the system
```

I began to get "input/output" errors; `ls`, `cat`, and almost every other
command stopped working (something like "/usr/sbin/ls not found"). The
`mdadm` command was still working, so I did this:

```
mdadm --manage /dev/md1 --re-add /dev/sde
mdadm --manage /dev/md1 --re-add /dev/sdf
mdadm --manage /dev/md1 --re-add /dev/sdg
mdadm --manage /dev/md1 --re-add /dev/sdh
mdadm --grow /dev/md1 --raid-devices=8
```

The disks were re-added, but as "spares". After that I powered down
the server and made backups of the disks with `dd`.
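
(A minimal sketch of that imaging step, assuming the four removed members are
/dev/sde../dev/sdh and that /backup has enough space - both the names and the
path are illustrative; the other four members, whose names are not given
above, would be imaged the same way.)

```
for d in sde sdf sdg sdh; do
    # raw image of each member; conv=noerror,sync keeps going past read errors
    dd if=/dev/$d of=/backup/$d.img bs=1M conv=noerror,sync status=progress
done
```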

Is there any hope to retrieve the data? If yes, then how?

Any help is really appreciated. Thanks in advance.

Regards,
Nicolas KAROLAK


* Re: Recover RAID6 with 4 disks removed
  2020-02-06 13:46 Recover RAID6 with 4 disks removed Nicolas Karolak
@ 2020-02-06 14:07 ` Reindl Harald
  2020-02-06 16:02   ` Nicolas KAROLAK
  2020-02-06 16:22   ` Robin Hill
  0 siblings, 2 replies; 6+ messages in thread
From: Reindl Harald @ 2020-02-06 14:07 UTC (permalink / raw)
  To: Nicolas Karolak, linux-raid



On 06.02.20 at 14:46, Nicolas Karolak wrote:
> I have (had...) a RAID6 array with 8 disks and tried to remove 4 disks
> from it, and obviously I messed up. Here are the commands I issued (I
> do not have their output):

didn't you realize that RAID6 has redundancy to survive *exactly two*
failing disks, no matter how many disks the array has, and that the data and
redundancy information are spread over the disks?

> mdadm --manage /dev/md1 --fail /dev/sdh
> mdadm --manage /dev/md1 --fail /dev/sdg
> mdadm --detail /dev/md1
> cat /proc/mdstat
> mdadm --manage /dev/md1 --fail /dev/sdf
> mdadm --manage /dev/md1 --fail /dev/sde
> mdadm --detail /dev/md1
> cat /proc/mdstat
> mdadm --manage /dev/md1 --remove /dev/sdh
> mdadm --manage /dev/md1 --remove /dev/sdg
> mdadm --manage /dev/md1 --remove /dev/sde
> mdadm --manage /dev/md1 --remove /dev/sdf
> mdadm --detail /dev/md1
> cat /proc/mdstat
> mdadm --grow /dev/md1 --raid-devices=4
> mdadm --grow /dev/md1 --array-size 7780316160  # from here it started going wrong on the system

because mdadm didn't prevent you from shooting yourself in the foot, likely
for cases where one needs a hammer to restore from an uncommon state as a
last resort

setting more than one disk to "fail" at the same time is asking for trouble,
no matter what

what happens when one drive starts to puke after you have removed every bit
of redundancy and happily started a reshape that implies heavy IO?

> I began to get "input/output" errors; `ls`, `cat`, and almost every other
> command stopped working (something like "/usr/sbin/ls not found"). The
> `mdadm` command was still working, so I did this:
> 
> ```
> mdadm --manage /dev/md1 --re-add /dev/sde
> mdadm --manage /dev/md1 --re-add /dev/sdf
> mdadm --manage /dev/md1 --re-add /dev/sdg
> mdadm --manage /dev/md1 --re-add /dev/sdh
> mdadm --grow /dev/md1 --raid-devices=8
> ```
> 
> The disks were re-added, but as "spares". After that I powered down
> the server and made backups of the disks with `dd`.
> 
> Is there any hope to retrieve the data? If yes, then how?

unlikely - the reshape that was started did writes


* Re: Recover RAID6 with 4 disks removed
  2020-02-06 14:07 ` Reindl Harald
@ 2020-02-06 16:02   ` Nicolas KAROLAK
  2020-02-06 19:27     ` Reindl Harald
  2020-02-06 16:22   ` Robin Hill
  1 sibling, 1 reply; 6+ messages in thread
From: Nicolas KAROLAK @ 2020-02-06 16:02 UTC (permalink / raw)
  To: Reindl Harald; +Cc: linux-raid

On Thu, Feb 06, 2020 at 03:07:00PM +0100, Reindl Harald wrote:
> didn't you realize that RAID6 has redundancy to survive *exactly two*
> failing disks, no matter how many disks the array has, and that the data and
> redundancy information are spread over the disks?

Not at that moment; I tested on a VM before, but with 6 disks and removing 2,
and then did it on the server without thinking/realizing that 4 is different
from 2 and that it would obviously f**k the RAID array... (>_<')

> unlikely - the reshape that was started did writes

That's what I was afraid of. Thank you anyway for the answer.


* Re: Recover RAID6 with 4 disks removed
  2020-02-06 14:07 ` Reindl Harald
  2020-02-06 16:02   ` Nicolas KAROLAK
@ 2020-02-06 16:22   ` Robin Hill
  2020-02-06 18:11     ` Wols Lists
  1 sibling, 1 reply; 6+ messages in thread
From: Robin Hill @ 2020-02-06 16:22 UTC (permalink / raw)
  To: Nicolas Karolak; +Cc: Reindl Harald, linux-raid

On Thu Feb 06, 2020 at 03:07:00PM +0100, Reindl Harald wrote:

> On 06.02.20 at 14:46, Nicolas Karolak wrote:
> > I have (had...) a RAID6 array with 8 disks and tried to remove 4 disks
> > from it, and obviously I messed up. Here are the commands I issued (I
> > do not have their output):
> 
> didn't you realize that RAID6 has redundancy to survive *exactly two*
> failing disks, no matter how many disks the array has, and that the data and
> redundancy information are spread over the disks?
> 
> > mdadm --manage /dev/md1 --fail /dev/sdh
> > mdadm --manage /dev/md1 --fail /dev/sdg
> > mdadm --detail /dev/md1
> > cat /proc/mdstat
> > mdadm --manage /dev/md1 --fail /dev/sdf
> > mdadm --manage /dev/md1 --fail /dev/sde
> > mdadm --detail /dev/md1
> > cat /proc/mdstat
> > mdadm --manage /dev/md1 --remove /dev/sdh
> > mdadm --manage /dev/md1 --remove /dev/sdg
> > mdadm --manage /dev/md1 --remove /dev/sde
> > mdadm --manage /dev/md1 --remove /dev/sdf
> > mdadm --detail /dev/md1
> > cat /proc/mdstat
> > mdadm --grow /dev/md1 --raid-devices=4
> > mdadm --grow /dev/md1 --array-size 7780316160  # from here it started going wrong on the system
> 
> because mdadm didn't prevent you from shooting yourself in the foot, likely
> for cases where one needs a hammer to restore from an uncommon state as a
> last resort
> 
> setting more than one disk to "fail" at the same time is asking for trouble,
> no matter what
> 
> what happens when one drive starts to puke after you have removed every bit
> of redundancy and happily started a reshape that implies heavy IO?
> 
> > I began to get "input/output" errors; `ls`, `cat`, and almost every other
> > command stopped working (something like "/usr/sbin/ls not found"). The
> > `mdadm` command was still working, so I did this:
> > 
> > ```
> > mdadm --manage /dev/md1 --re-add /dev/sde
> > mdadm --manage /dev/md1 --re-add /dev/sdf
> > mdadm --manage /dev/md1 --re-add /dev/sdg
> > mdadm --manage /dev/md1 --re-add /dev/sdh
> > mdadm --grow /dev/md1 --raid-devices=8
> > ```
> > 
> > The disks were re-added, but as "spares". After that I powered down
> > the server and made backups of the disks with `dd`.
> > 
> > Is there any hope to retrieve the data? If yes, then how?
> 
> unlikely - the reshape that was started did writes

I don't think it'll have written anything as the array was in a failed
state. You'll have lost the metadata on the original disks though as
they were removed & re-added (unless you have anything recording these
before the above operations?) so that means doing a create --assume-clean
and "fsck -n" loop with all combinations until you find the correct
order (and assumes they were added at the same time and so share the
same offset). At least you know the positions of 4 of the array members,
so that reduces the number of combinations you'll need.
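
A rough sketch of what that loop might look like, to be run only against the
overlay devices described after the next paragraph, never the real disks. The
slot order of the four known members, the /dev/mapper names, the chunk size
and the metadata version are all assumptions here and must be adjusted to
match the original array (it also assumes a filesystem sits directly on md1):

```
known="/dev/mapper/sda /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd"  # hypothetical: the 4 members with known slots
set -- sde sdf sdg sdh        # the 4 re-added disks, order unknown
for a; do for b; do for c; do for d; do
    # skip orderings that reuse a disk
    [ "$(printf '%s\n' "$a" "$b" "$c" "$d" | sort -u | wc -l)" -eq 4 ] || continue
    mdadm --stop /dev/md1 2>/dev/null
    mdadm --create /dev/md1 --assume-clean --run --level=6 --raid-devices=8 \
          --chunk=512 --metadata=1.2 \
          $known /dev/mapper/$a /dev/mapper/$b /dev/mapper/$c /dev/mapper/$d
    # read-only check; a clean result marks a candidate order
    fsck -n /dev/md1 && echo "candidate order: $a $b $c $d"
done; done; done; done
```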

Check the wiki - there should be instructions on there regarding use of
overlays to prevent further accidental damage. There may even be scripts
to help with automating the create/fsck process.
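
For reference, the overlay recipe on the wiki boils down to roughly the
following (a sketch: the member names, the 4G overlay size and the /tmp path
are illustrative and need adapting):

```
for d in sda sdb sdc sdd sde sdf sdg sdh; do
    size=$(blockdev --getsz /dev/$d)         # member size in 512-byte sectors
    truncate -s 4G /tmp/overlay-$d           # sparse copy-on-write file
    loop=$(losetup -f --show /tmp/overlay-$d)
    # expose /dev/mapper/$d: reads come from the real disk, writes go to the overlay
    echo "0 $size snapshot /dev/$d $loop P 8" | dmsetup create $d
done
```

Experiments then run against /dev/mapper/sd? while the real disks stay
untouched; `dmsetup remove` plus `losetup -d` tears an overlay down again.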

Cheers,
    Robin


* Re: Recover RAID6 with 4 disks removed
  2020-02-06 16:22   ` Robin Hill
@ 2020-02-06 18:11     ` Wols Lists
  0 siblings, 0 replies; 6+ messages in thread
From: Wols Lists @ 2020-02-06 18:11 UTC (permalink / raw)
  To: Nicolas Karolak, Reindl Harald, linux-raid

On 06/02/20 16:22, Robin Hill wrote:
> On Thu Feb 06, 2020 at 03:07:00PM +0100, Reindl Harald wrote:
> 
>> On 06.02.20 at 14:46, Nicolas Karolak wrote:
>>> I have (had...) a RAID6 array with 8 disks and tried to remove 4 disks
>>> from it, and obviously I messed up. Here are the commands I issued (I
>>> do not have their output):
>>
>> didn't you realize that RAID6 has redundancy to survive *exactly two*
>> failing disks, no matter how many disks the array has, and that the data and
>> redundancy information are spread over the disks?
>>
>>> mdadm --manage /dev/md1 --fail /dev/sdh
>>> mdadm --manage /dev/md1 --fail /dev/sdg
>>> mdadm --detail /dev/md1
>>> cat /proc/mdstat
>>> mdadm --manage /dev/md1 --fail /dev/sdf
>>> mdadm --manage /dev/md1 --fail /dev/sde
>>> mdadm --detail /dev/md1
>>> cat /proc/mdstat
>>> mdadm --manage /dev/md1 --remove /dev/sdh
>>> mdadm --manage /dev/md1 --remove /dev/sdg
>>> mdadm --manage /dev/md1 --remove /dev/sde
>>> mdadm --manage /dev/md1 --remove /dev/sdf
>>> mdadm --detail /dev/md1
>>> cat /proc/mdstat
>>> mdadm --grow /dev/md1 --raid-devices=4
>>> mdadm --grow /dev/md1 --array-size 7780316160  # from here it started going wrong on the system
>>
>> because mdadm didn't prevent you from shooting yourself in the foot, likely
>> for cases where one needs a hammer to restore from an uncommon state as a
>> last resort
>>
>> setting more than one disk to "fail" at the same time is asking for trouble,
>> no matter what
>>
>> what happens when one drive starts to puke after you have removed every bit
>> of redundancy and happily started a reshape that implies heavy IO?
>>
>>> I began to get "input/output" errors; `ls`, `cat`, and almost every other
>>> command stopped working (something like "/usr/sbin/ls not found"). The
>>> `mdadm` command was still working, so I did this:
>>>
>>> ```
>>> mdadm --manage /dev/md1 --re-add /dev/sde
>>> mdadm --manage /dev/md1 --re-add /dev/sdf
>>> mdadm --manage /dev/md1 --re-add /dev/sdg
>>> mdadm --manage /dev/md1 --re-add /dev/sdh
>>> mdadm --grow /dev/md1 --raid-devices=8
>>> ```
>>>
>>> The disks were re-added, but as "spares". After that I powered down
>>> the server and made backups of the disks with `dd`.
>>>
>>> Is there any hope to retrieve the data? If yes, then how?
>>
>> unlikely - the reshape that was started did writes
> 
> I don't think it'll have written anything as the array was in a failed
> state.

That was my reaction, too ...

> You'll have lost the metadata on the original disks though as
> they were removed & re-added (unless you have anything recording these
> before the above operations?)

Will you?

> so that means doing a create --assume-clean
> and "fsck -n" loop with all combinations until you find the correct
> order (and assumes they were added at the same time and so share the
> same offset). At least you know the positions of 4 of the array members,
> so that reduces the number of combinations you'll need.

I'm not sure about that ... BUT DO NOT try anything that may be
destructive without making sure you've got backups !!!

What I would try (there've been plenty of reports of disks being added
back as spares) is to take out sdh and sdg (the first two disks that were
removed), which will give you a degraded 6-drive array that SHOULD have
all the data on it. Do a forced assembly and run - it will hopefully work!

If it does, then you need to re-add the other two drives and hope
nothing else goes wrong while the array sorts itself out ...
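
A rough sketch of that forced assembly, assuming the six surviving members
are the four disks that were never failed (shown here as /dev/sda../dev/sdd,
hypothetical names) plus /dev/sde and /dev/sdf - and ideally tried against
the overlays mentioned below first:

```
mdadm --stop /dev/md1
# --force fixes up mismatched event counts, --run starts the array degraded
mdadm --assemble --force --run /dev/md1 \
      /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
cat /proc/mdstat   # should show md1 active but degraded (6 of 8 devices)
```

If sde and sdf now carry "spare" metadata from the re-add, the assembly may
still refuse them, in which case the create --assume-clean route above is
the fallback.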
> 
> Check the wiki - there should be instructions on there regarding use of
> overlays to prevent further accidental damage. There may even be scripts
> to help with automating the create/fsck process.
> 
https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

> Cheers,
>     Robin
> 
Cheers,
Wol


* Re: Recover RAID6 with 4 disks removed
  2020-02-06 16:02   ` Nicolas KAROLAK
@ 2020-02-06 19:27     ` Reindl Harald
  0 siblings, 0 replies; 6+ messages in thread
From: Reindl Harald @ 2020-02-06 19:27 UTC (permalink / raw)
  To: Nicolas KAROLAK; +Cc: linux-raid



On 06.02.20 at 17:02, Nicolas KAROLAK wrote:
> On Thu, Feb 06, 2020 at 03:07:00PM +0100, Reindl Harald wrote:
>> didn't you realize that RAID6 has redundancy to survive *exactly two*
>> failing disks, no matter how many disks the array has, and that the data and
>> redundancy information are spread over the disks?
> 
> Not at that moment; I tested on a VM before, but with 6 disks and removing 2,
> and then did it on the server without thinking/realizing that 4 is different
> from 2 and that it would obviously f**k the RAID array... (>_<')

seriously?

but even without knowing what one is doing, who in their right mind removes
more than one disk at a time from a working array with data?

restore your backups; as you had the space to make dd images, I guess you
made backups before such an operation

https://en.wikipedia.org/wiki/RAID

RAID 5 consists of block-level striping with distributed parity. Unlike
RAID 4, parity information is distributed among the drives, requiring
all drives but one to be present to operate

RAID 6 consists of block-level striping with double distributed parity.
Double parity provides fault tolerance up to two failed drives

