* raid5 to raid6 reshape never appeared to start, how to cancel/revert
@ 2017-05-22 18:57 Roger Heflin
  2017-05-22 19:33 ` Andreas Klauer
From: Roger Heflin @ 2017-05-22 18:57 UTC (permalink / raw)
  To: Linux RAID

I had a 3 disk raid5 with a hot spare.  I ran this:
mdadm --grow /dev/md126 --level=6 --backup-file /root/r6rebuild

I suspect I should have changed the number of devices in the above command to 4.

The reshape "started" according to /proc/mdstat but never got past
block 1, and the time to complete started going up.  I did stop the
array and have tried to do a revert-reshape but it indicates it will
only revert a number of devices change.
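
For reference, a revert attempt of this kind would normally look
something like the following; the exact invocation is not shown in the
thread, so the device list here is an assumption:

mdadm --assemble /dev/md126 --update=revert-reshape \
      --backup-file=/root/r6rebuild /dev/sd[abe]1 /dev/sdd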

The backup-file was created on a separate ssd.

trying to assemble now gets this:
 mdadm --assemble /dev/md126 /dev/sd[abe]1 /dev/sdd
--backup-file=/root/r6rebuild
mdadm: Failed to restore critical section for reshape, sorry.

examine shows this (sdd was the spare when the --grow was issued)
 mdadm --examine /dev/sdd
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 2fb920b1:ce7407fd:dd1a1aa6:74dcda71
  Creation Time : Wed May 19 19:04:03 2010
     Raid Level : raid6
  Used Dev Size : 488384384 (465.76 GiB 500.11 GB)
     Array Size : 976768768 (931.52 GiB 1000.21 GB)


   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 126

  Reshape pos'n : 0
     New Layout : left-symmetric

    Update Time : Mon May 22 09:25:38 2017
          State : clean
Internal Bitmap : present
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : bc406f24 - correct
         Events : 6140735

         Layout : left-symmetric-6
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       48        3      active   /dev/sdd

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8       65        1      active sync   /dev/sde1
   2     2       8        1        2      active sync   /dev/sda1
   3     3       8       48        3      active   /dev/sdd

mdadm-3.4-2.fc25.x86_64

kernel 4.10.15-200.fc25, fully updated as of 2 days ago.

Examine seems to indicate that the reshape is still flagged as in
progress, but the revert is unable to cancel this grow even though it
never actually made any progress.

The data is not super critical; I don't believe I would lose anything,
as this array is used primarily for backups.

It does appear that I added sdd rather than sdd1 but I don't believe
that is anything critical to the issue as it should still work fine
with the entire disk.

Any ideas on how to abort the reshape that never started, or how to
get it to continue?  The desired final target is a 4-disk raid6.  I
have not yet rebooted.


* Re: raid5 to raid6 reshape never appeared to start, how to cancel/revert
  2017-05-22 18:57 raid5 to raid6 reshape never appeared to start, how to cancel/revert Roger Heflin
@ 2017-05-22 19:33 ` Andreas Klauer
  2017-05-22 20:04   ` Roger Heflin
From: Andreas Klauer @ 2017-05-22 19:33 UTC (permalink / raw)
  To: Roger Heflin; +Cc: Linux RAID

On Mon, May 22, 2017 at 01:57:44PM -0500, Roger Heflin wrote:
> I had a 3 disk raid5 with a hot spare.  I ran this:
> mdadm --grow /dev/md126 --level=6 --backup-file /root/r6rebuild
> 
> I suspect I should have changed the number of devices in the above command to 4.

It doesn't hurt to specify, but that much is implied.
Growing 3 device raid5 + spare to raid6 results in 4 device raid6.
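
Spelled out, the original command would have been something like this
(same backup file; shown only to make the implied device count explicit):

mdadm --grow /dev/md126 --level=6 --raid-devices=4 \
      --backup-file=/root/r6rebuild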

> The backup-file was created on a separate ssd.

Is there anything meaningful in this file?
 
> trying to assemble now gets this:
>  mdadm --assemble /dev/md126 /dev/sd[abe]1 /dev/sdd
> --backup-file=/root/r6rebuild
> mdadm: Failed to restore critical section for reshape, sorry.
> 
> examine shows this (sdd was the spare when the --grow was issued)
>  mdadm --examine /dev/sdd
> /dev/sdd1:

You wrote /dev/sdd above, is it sdd1 now? 

>         Version : 0.91.00

Ancient metadata. You could probably update it to 1.0...
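
Once the array is healthy again, that conversion is normally done at
assemble time, roughly like this (a sketch; as far as I know 0.90 can
only be converted to 1.0 this way, and only when the array is clean and
no reshape is pending):

mdadm --stop /dev/md126
mdadm --assemble /dev/md126 --update=metadata /dev/sd[abe]1 /dev/sdd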

>   Reshape pos'n : 0

So maybe nothing at all changed on disk?

You could try your luck with overlay

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

mdadm --create /dev/md42 --metadata=0.90 --level=5 --chunk=64 \
      --raid-devices=3 /dev/overlay/{a,b,c}
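
Setting up the overlay devices per that wiki page looks roughly like
this for each member (scratch paths, overlay sizes and device names
below are assumptions, not taken from this thread):

# keep the real member read-only, put all writes in a sparse COW file
blockdev --setro /dev/sdb1
truncate -s 10G /mnt/scratch/overlay-sdb1.img
loop=$(losetup -f --show /mnt/scratch/overlay-sdb1.img)
size=$(blockdev --getsz /dev/sdb1)   # member size in 512-byte sectors
dmsetup create overlay-sdb1 --table "0 $size snapshot /dev/sdb1 $loop P 8"
# repeat for each member, then run --create against /dev/mapper/overlay-*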

> It does appear that I added sdd rather than sdd1 but I don't believe
> that is anything critical to the issue as it should still work fine
> with the entire disk.

It is critical because if you use the wrong one the data will be shifted.

If the partition goes to the very end of the drive, I think the 0.90 
metadata could be interpreted both ways (as metadata for partition 
as well as whole drive).
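
One read-only way to see which interpretation lines up is to examine
both nodes and compare sizes (none of this writes to the disk):

mdadm --examine /dev/sdd
mdadm --examine /dev/sdd1
blockdev --getsz /dev/sdd    # sector counts: a 0.90 superblock lives
blockdev --getsz /dev/sdd1   # within the last 64-128 KiB of its device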

If possible you should find some way to migrate to 1.2 metadata.
But worry about it once you have access to your data.

Regards
Andreas Klauer


* Re: raid5 to raid6 reshape never appeared to start, how to cancel/revert
  2017-05-22 19:33 ` Andreas Klauer
@ 2017-05-22 20:04   ` Roger Heflin
  2017-05-26 19:27     ` Roger Heflin
From: Roger Heflin @ 2017-05-22 20:04 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Linux RAID

On Mon, May 22, 2017 at 2:33 PM, Andreas Klauer
<Andreas.Klauer@metamorpher.de> wrote:
> On Mon, May 22, 2017 at 01:57:44PM -0500, Roger Heflin wrote:
>> I had a 3 disk raid5 with a hot spare.  I ran this:
>> mdadm --grow /dev/md126 --level=6 --backup-file /root/r6rebuild
>>
>> I suspect I should have changed the number of devices in the above command to 4.
>
> It doesn't hurt to specify, but that much is implied.
> Growing 3 device raid5 + spare to raid6 results in 4 device raid6.
>

Yes.

>> The backup-file was created on a separate ssd.
>
> Is there anything meaningful in this file?
>

16MB in size, but od -x indicates all zeros, so no, there is nothing
meaningful in the file.
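
(That check was something along the lines of

od -Ax -x /root/r6rebuild | head

od collapses runs of identical lines into a single '*', so an all-zero
file produces almost no output.)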

>> trying to assemble now gets this:
>>  mdadm --assemble /dev/md126 /dev/sd[abe]1 /dev/sdd
>> --backup-file=/root/r6rebuild
>> mdadm: Failed to restore critical section for reshape, sorry.
>>
>> examine shows this (sdd was the spare when the --grow was issued)
>>  mdadm --examine /dev/sdd
>> /dev/sdd1:
>
> You wrote /dev/sdd above, is it sdd1 now?
>
>>         Version : 0.91.00
>
> Ancient metadata. You could probably update it to 1.0...
>

I know.

>>   Reshape pos'n : 0
>
> So maybe nothing at all changed on disk?
>
> You could try your luck with overlay
>
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
>
> mdadm --create /dev/md42 --metadata=0.90 --level=5 --chunk=64 \
>       --raid-devices=3 /dev/overlay/{a,b,c}
>
>> It does appear that I added sdd rather than sdd1 but I don't believe
>> that is anything critical to the issue as it should still work fine
>> with the entire disk.
>
> It is critical because if you use the wrong one the data will be shifted.
>
> If the partition goes to the very end of the drive, I think the 0.90
> metadata could be interpreted both ways (as metadata for partition
> as well as whole drive).
>
> If possible you should find some way to migrate to 1.2 metadata.
> But worry about it once you have access to your data.
>

I deal with others messing up partition/no partition recoveries often
enough to not be worried about how to debug and/or fix that mistake.

I found a patch from Neil from 2016 that may be a solution to this
issue; I am not sure it is an exact match to my issue, but it looks
pretty close.

http://comments.gmane.org/gmane.linux.raid/51095

> Regards
> Andreas Klauer


* Re: raid5 to raid6 reshape never appeared to start, how to cancel/revert
  2017-05-22 20:04   ` Roger Heflin
@ 2017-05-26 19:27     ` Roger Heflin
From: Roger Heflin @ 2017-05-26 19:27 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Linux RAID

On Mon, May 22, 2017 at 3:04 PM, Roger Heflin <rogerheflin@gmail.com> wrote:
> On Mon, May 22, 2017 at 2:33 PM, Andreas Klauer
> <Andreas.Klauer@metamorpher.de> wrote:
>> On Mon, May 22, 2017 at 01:57:44PM -0500, Roger Heflin wrote:
>>> I had a 3 disk raid5 with a hot spare.  I ran this:
>>> mdadm --grow /dev/md126 --level=6 --backup-file /root/r6rebuild
>>>
>>> I suspect I should have changed the number of devices in the above command to 4.
>>
>> It doesn't hurt to specify, but that much is implied.
>> Growing 3 device raid5 + spare to raid6 results in 4 device raid6.
>>
>
> Yes.
>
>>> The backup-file was created on a separate ssd.
>>
>> Is there anything meaningful in this file?
>>
>
> 16MB in size, but od -x indicates all zeros, so no, there is nothing
> meaningful in the file.
>
>>> trying to assemble now gets this:
>>>  mdadm --assemble /dev/md126 /dev/sd[abe]1 /dev/sdd
>>> --backup-file=/root/r6rebuild
>>> mdadm: Failed to restore critical section for reshape, sorry.
>>>
>>> examine shows this (sdd was the spare when the --grow was issued)
>>>  mdadm --examine /dev/sdd
>>> /dev/sdd1:
>>
>> You wrote /dev/sdd above, is it sdd1 now?
>>
>>>         Version : 0.91.00
>>
>> Ancient metadata. You could probably update it to 1.0...
>>
>
> I know.
>
>>>   Reshape pos'n : 0
>>
>> So maybe nothing at all changed on disk?
>>
>> You could try your luck with overlay
>>
>> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
>>
>> mdadm --create /dev/md42 --metadata=0.90 --level=5 --chunk=64 \
>>       --raid-devices=3 /dev/overlay/{a,b,c}
>>
>>> It does appear that I added sdd rather than sdd1 but I don't believe
>>> that is anything critical to the issue as it should still work fine
>>> with the entire disk.
>>
>> It is critical because if you use the wrong one the data will be shifted.
>>
>> If the partition goes to the very end of the drive, I think the 0.90
>> metadata could be interpreted both ways (as metadata for partition
>> as well as whole drive).
>>
>> If possible you should find some way to migrate to 1.2 metadata.
>> But worry about it once you have access to your data.
>>
>
> I deal with others messing up partition/no partition recoveries often
> enough to not be worried about how to debug and/or fix that mistake.
>
> I found a patch from Neil from 2016 that may be a solution to this
> issue; I am not sure it is an exact match to my issue, but it looks
> pretty close.
>
> http://comments.gmane.org/gmane.linux.raid/51095
>
>> Regards
>> Andreas Klauer

Thanks for the ideas.  The patch I mentioned was already in the mdadm
I had, so that was no help.

I got it back by doing an --assume-clean, and initially I could see
the PV but not the VG.  I checked the device and it did look like a
few KB were missing between the PV label and the first VG metadata I
saw on the disk.
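
The re-create was presumably along the lines Andreas suggested earlier,
i.e. something like the following sketch (device order, layout and chunk
size taken from the --examine output above; ideally run against overlays
first):

mdadm --create /dev/md126 --assume-clean --metadata=0.90 --level=5 \
      --layout=left-symmetric --chunk=64 --raid-devices=3 \
      /dev/sdb1 /dev/sde1 /dev/sda1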

I tried a vgcfgrestore and that failed with some weird errors I have
never seen before, about write failures and checksum failures (and I
have used vgcfgrestore successfully a number of times before).  I
finally saved out the first 1M of data to another disk, zeroed where
the header should have been, and did a pvcreate --uuid followed by
another vgcfgrestore and a vgchange -ay, and it found the LV and the
filesystem appears to be fully intact.  I am guessing that something
did write a few KB to the disk during the attempt to convert it to
raid6.  I am verifying and/or saving anything that I want (there may
be nothing important on it) and will then rebuild it as a new raid6
with new metadata.
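
For the archives, that sequence corresponds roughly to the following;
the VG name, PV UUID and paths are placeholders, not the exact commands
used:

dd if=/dev/md126 of=/root/md126-first1M bs=1M count=1   # save the old header area
dd if=/dev/zero of=/dev/md126 bs=4k count=1             # clear the PV label area (exact extent zeroed above not stated)
pvcreate --uuid <pv-uuid> --restorefile /etc/lvm/backup/<vgname> /dev/md126
vgcfgrestore <vgname>
vgchange -ay <vgname>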

