* RAID 6 "Failed to restore critical section for reshape, sorry." - recovery advice?
@ 2015-12-09 23:12 George Rapp
  2015-12-10  8:22 ` Mikael Abrahamsson
  0 siblings, 1 reply; 10+ messages in thread
From: George Rapp @ 2015-12-09 23:12 UTC (permalink / raw)
  To: linux-raid

Hello RAID folks -

This refers back to my old thread at
http://marc.info/?l=linux-raid&m=143880359028232&w=2 which I'm only
now getting back to working on; the delay was caused by my need to
clone a failing disk.

Recall that I was attempting to grow a 5-disk RAID 6 array to 6 disks,
but writing to the backup-file was inhibited by SELinux.

# mdadm --add /dev/md4 /dev/sdi1
# mdadm --grow --raid-devices=6
--backup-file=/home/gwr/2015/2015-08/grow_md4.bak /dev/md4

The second command threw a bunch of SELinux errors about access to
/home/gwr/c/grow_md4.bak. The reshape operation sat for many minutes
at 0% progress, according to /proc/mdstat. However, the file
/home/gwr/c/grow_md4.bak *was* created, with a size of 6295552 bytes.

I appear to have hit a segmentation fault or other runtime error
when attempting to stop the RAID 6 array with "# mdadm --stop
/dev/md4". The system log from that situation has been uploaded to
https://app.box.com/s/3pksam3c7n79anpnzvsrwekzqwtsvlf6
Notably, the backup file was created, but it contains nothing but
zero/null bytes.

My most recent raid.status file (generated with "# mdadm
--examine /dev/sd[cdg]4 /dev/sd[hij]1" - and, yes, I know my partition
layout is a mess) has been uploaded to
https://app.box.com/s/pbienbpdanr0rq224b9ag2qu36vk76iv
What I find interesting about this status file is that the reshape of
the array appears to have made no progress (note the "Reshape pos'n : 0"
on all six devices).

I have been using the recovery advice found at
https://raid.wiki.kernel.org/index.php/RAID_Recovery and
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID,
especially the part about creating overlay files so as not to damage
my actual disks. The overlay devices are in the variable $OVERLAYS.
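
For reference, the overlays were set up roughly like this, following the
wiki's approach (this is a sketch rather than a transcript of my exact
commands; the /tmp paths and the 4G size are just examples):

# DEVICES="/dev/sdd4 /dev/sdc4 /dev/sdg4 /dev/sdh1 /dev/sdj1 /dev/sdi1"
# OVERLAYS=""
# for d in $DEVICES; do
>     b=$(basename $d)
>     truncate -s 4G /tmp/overlay-$b             # sparse file that absorbs all writes
>     loop=$(losetup -f --show /tmp/overlay-$b)  # attach it to a loop device
>     size=$(blockdev --getsz $d)                # member size in 512-byte sectors
>     echo "0 $size snapshot $d $loop P 8" | dmsetup create $b
>     OVERLAYS="$OVERLAYS /dev/mapper/$b"
> done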

When I attempt to assemble the RAID 6 array using the backup file, I get this:

# mdadm --assemble --verbose --force
--backup-file=/home/gwr/2015/2015-08/grow_md6.bak /dev/md4 $OVERLAYS
mdadm: looking for devices for /dev/md4
mdadm: /dev/mapper/sdg4 is identified as a member of /dev/md4, slot 2.
mdadm: /dev/mapper/sdi1 is identified as a member of /dev/md4, slot 5.
mdadm: /dev/mapper/sdc4 is identified as a member of /dev/md4, slot 1.
mdadm: /dev/mapper/sdh1 is identified as a member of /dev/md4, slot 3.
mdadm: /dev/mapper/sdj1 is identified as a member of /dev/md4, slot 4.
mdadm: /dev/mapper/sdd4 is identified as a member of /dev/md4, slot 0.
mdadm: /dev/md4 has an active reshape - checking if critical section
needs to be restored
mdadm: No backup metadata on /home/gwr/2015/2015-08/grow_md6.bak
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.

and when I omit the backup file (passing --invalid-backup instead), I get this:

# mdadm --assemble --verbose --force --invalid-backup /dev/md4 $OVERLAYS
mdadm: looking for devices for /dev/md4
mdadm: /dev/mapper/sdg4 is identified as a member of /dev/md4, slot 2.
mdadm: /dev/mapper/sdi1 is identified as a member of /dev/md4, slot 5.
mdadm: /dev/mapper/sdc4 is identified as a member of /dev/md4, slot 1.
mdadm: /dev/mapper/sdh1 is identified as a member of /dev/md4, slot 3.
mdadm: /dev/mapper/sdj1 is identified as a member of /dev/md4, slot 4.
mdadm: /dev/mapper/sdd4 is identified as a member of /dev/md4, slot 0.
mdadm: /dev/md4 has an active reshape - checking if critical section
needs to be restored
mdadm: Failed to find backup of critical section
mdadm: continuing without restoring backup
mdadm: added /dev/mapper/sdc4 to /dev/md4 as 1
mdadm: added /dev/mapper/sdg4 to /dev/md4 as 2
mdadm: added /dev/mapper/sdh1 to /dev/md4 as 3
mdadm: added /dev/mapper/sdj1 to /dev/md4 as 4
mdadm: added /dev/mapper/sdi1 to /dev/md4 as 5
mdadm: added /dev/mapper/sdd4 to /dev/md4 as 0
mdadm: /dev/md4: Need a backup file to complete reshape of this array.
mdadm: Please provided one with "--backup-file=..."

I even tried --update=revert-reshape; no luck:

# mdadm --assemble --verbose --invalid-backup --force
--update=revert-reshape /dev/md4 $OVERLAYS
mdadm: looking for devices for /dev/md4
mdadm: /dev/mapper/sdg4 is identified as a member of /dev/md4, slot 2.
mdadm: /dev/mapper/sdi1 is identified as a member of /dev/md4, slot 5.
mdadm: /dev/mapper/sdc4 is identified as a member of /dev/md4, slot 1.
mdadm: /dev/mapper/sdh1 is identified as a member of /dev/md4, slot 3.
mdadm: /dev/mapper/sdj1 is identified as a member of /dev/md4, slot 4.
mdadm: /dev/mapper/sdd4 is identified as a member of /dev/md4, slot 0.
mdadm: /dev/md4 has an active reshape - checking if critical section
needs to be restored
mdadm: Failed to find backup of critical section
mdadm: continuing without restoring backup
mdadm: added /dev/mapper/sdc4 to /dev/md4 as 1
mdadm: added /dev/mapper/sdg4 to /dev/md4 as 2
mdadm: added /dev/mapper/sdh1 to /dev/md4 as 3
mdadm: added /dev/mapper/sdj1 to /dev/md4 as 4
mdadm: added /dev/mapper/sdi1 to /dev/md4 as 5
mdadm: added /dev/mapper/sdd4 to /dev/md4 as 0
mdadm: /dev/md4: Need a backup file to complete reshape of this array.
mdadm: Please provided one with "--backup-file=..."
mdadm: (Don't specify --update=revert-reshape again, that part succeeded.)


How can the array have an active reshape if the reshape pos'n is 0 on
all devices? Doesn't that mean that the reshape never actually
started? If so, can I just revert -- somehow -- to a 5-device RAID 6
array to recover my data?

Thanks.

-- 
George Rapp  (Pataskala, OH) Home: george.rapp -- at -- gmail.com
LinkedIn profile: https://www.linkedin.com/in/georgerapp
Phone: +1 740 936 RAPP (740 936 7277)


* Re: RAID 6 "Failed to restore critical section for reshape, sorry." - recovery advice?
  2015-12-09 23:12 RAID 6 "Failed to restore critical section for reshape, sorry." - recovery advice? George Rapp
@ 2015-12-10  8:22 ` Mikael Abrahamsson
  2015-12-10 22:05   ` George Rapp
  0 siblings, 1 reply; 10+ messages in thread
From: Mikael Abrahamsson @ 2015-12-10  8:22 UTC (permalink / raw)
  To: George Rapp; +Cc: linux-raid

On Wed, 9 Dec 2015, George Rapp wrote:

> # mdadm --assemble --verbose --invalid-backup --force
> --update=revert-reshape /dev/md4 $OVERLAYS
> mdadm: looking for devices for /dev/md4
> mdadm: /dev/mapper/sdg4 is identified as a member of /dev/md4, slot 2.
> mdadm: /dev/mapper/sdi1 is identified as a member of /dev/md4, slot 5.
> mdadm: /dev/mapper/sdc4 is identified as a member of /dev/md4, slot 1.
> mdadm: /dev/mapper/sdh1 is identified as a member of /dev/md4, slot 3.
> mdadm: /dev/mapper/sdj1 is identified as a member of /dev/md4, slot 4.
> mdadm: /dev/mapper/sdd4 is identified as a member of /dev/md4, slot 0.
> mdadm: /dev/md4 has an active reshape - checking if critical section
> needs to be restored
> mdadm: Failed to find backup of critical section
> mdadm: continuing without restoring backup
> mdadm: added /dev/mapper/sdc4 to /dev/md4 as 1
> mdadm: added /dev/mapper/sdg4 to /dev/md4 as 2
> mdadm: added /dev/mapper/sdh1 to /dev/md4 as 3
> mdadm: added /dev/mapper/sdj1 to /dev/md4 as 4
> mdadm: added /dev/mapper/sdi1 to /dev/md4 as 5
> mdadm: added /dev/mapper/sdd4 to /dev/md4 as 0
> mdadm: /dev/md4: Need a backup file to complete reshape of this array.
> mdadm: Please provided one with "--backup-file=..."
> mdadm: (Don't specify --update=revert-reshape again, that part succeeded.)
>
> How can the array have an active reshape if the reshape pos'n is 0 on
> all devices? Doesn't that mean that the reshape never actually
> started? If so, can I just revert -- somehow -- to a 5-device RAID 6
> array to recover my data?

Just a shot in the dark: what happens if you add a backup file to the
above command, but without revert-reshape? I.e. specify --invalid-backup
but also supply a backup file. The text above could indicate that this
might help.

If you get the array up and running again but it's still reshaping at
position 0, issue a --continue to it (this has worked for others). Also, I
would get the latest git version of mdadm and try with that one if you're
still using v3.2.2, as per the link to your original August email.
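
For the --continue part, I mean something along these lines (untested
here, and the backup file path is only an example):

# mdadm --grow --continue /dev/md4 --backup-file=/root/grow_md4.bak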

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: RAID 6 "Failed to restore critical section for reshape, sorry." - recovery advice?
  2015-12-10  8:22 ` Mikael Abrahamsson
@ 2015-12-10 22:05   ` George Rapp
  2015-12-10 22:34     ` George Rapp
  0 siblings, 1 reply; 10+ messages in thread
From: George Rapp @ 2015-12-10 22:05 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid

On Thu, Dec 10, 2015 at 3:22 AM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> On Wed, 9 Dec 2015, George Rapp wrote:
>
>> # mdadm --assemble --verbose --invalid-backup --force
>> --update=revert-reshape /dev/md4 $OVERLAYS
>> mdadm: looking for devices for /dev/md4
>> [snipped remainder of output]
>
> Just a shot in the dark: what happens if you add a backup file to the above
> command, but without revert-reshape? I.e. specify --invalid-backup but also
> supply a backup file. The text above could indicate that this might help.

Mikael -

First, thanks for the reply.

I tried that and got a different error message:

# mdadm --assemble --verbose --invalid-backup --force
--backup-file=/home/gwr/2015/2015-08/grow_md6.bak /dev/md4 $OVERLAYS
mdadm: looking for devices for /dev/md4
mdadm: /dev/mapper/sdd4 is identified as a member of /dev/md4, slot 0.
mdadm: /dev/mapper/sdc4 is identified as a member of /dev/md4, slot 1.
mdadm: /dev/mapper/sdg4 is identified as a member of /dev/md4, slot 2.
mdadm: /dev/mapper/sdh1 is identified as a member of /dev/md4, slot 3.
mdadm: /dev/mapper/sdi1 is identified as a member of /dev/md4, slot 5.
mdadm: /dev/mapper/sdj1 is identified as a member of /dev/md4, slot 4.
mdadm: /dev/md4 has an active reshape - checking if critical section
needs to be restored
mdadm: No backup metadata on /home/gwr/2015/2015-08/grow_md6.bak
mdadm: Failed to find backup of critical section
mdadm: continuing without restoring backup
mdadm: added /dev/mapper/sdc4 to /dev/md4 as 1
mdadm: added /dev/mapper/sdg4 to /dev/md4 as 2
mdadm: added /dev/mapper/sdh1 to /dev/md4 as 3
mdadm: added /dev/mapper/sdj1 to /dev/md4 as 4
mdadm: added /dev/mapper/sdi1 to /dev/md4 as 5
mdadm: added /dev/mapper/sdd4 to /dev/md4 as 0
mdadm: failed to RUN_ARRAY /dev/md4: Invalid argument

> If you get the array up and running again but it's still reshaping at
> position 0, issue a --continue to it (this has worked for others). Also, I
> would get the latest git version of mdadm and try with that one if you're
> still using v3.2.2, as per the link to your original August email.

Should have updated that. I'm up to mdadm v3.3.4 now, which is the
latest version offered by the Fedora 22 update repo (and Fedora 23
doesn't offer a newer one):

# mdadm --version
mdadm - v3.3.4 - 3rd August 2015

Thanks again.
-- 
George Rapp  (Pataskala, OH) Home: george.rapp -- at -- gmail.com
LinkedIn profile: https://www.linkedin.com/in/georgerapp
Phone: +1 740 936 RAPP (740 936 7277)


* Re: RAID 6 "Failed to restore critical section for reshape, sorry." - recovery advice?
  2015-12-10 22:05   ` George Rapp
@ 2015-12-10 22:34     ` George Rapp
  2015-12-15 14:11       ` Mikael Abrahamsson
  2015-12-21  1:35       ` NeilBrown
  0 siblings, 2 replies; 10+ messages in thread
From: George Rapp @ 2015-12-10 22:34 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid

On Thu, Dec 10, 2015 at 5:05 PM, George Rapp <george.rapp@gmail.com> wrote:
> On Thu, Dec 10, 2015 at 3:22 AM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
>>
>> Just a shot in the dark: what happens if you add a backup file to the above
>> command, but without revert-reshape? I.e. specify --invalid-backup but also
>> supply a backup file. The text above could indicate that this might help.
>
> Mikael -
>
> First, thanks for the reply.
>
> I tried that and got a different error message:
>
> # mdadm --assemble --verbose --invalid-backup --force
> --backup-file=/home/gwr/2015/2015-08/grow_md6.bak /dev/md4 $OVERLAYS
> mdadm: looking for devices for /dev/md4
> mdadm: /dev/mapper/sdd4 is identified as a member of /dev/md4, slot 0.
> mdadm: /dev/mapper/sdc4 is identified as a member of /dev/md4, slot 1.
> mdadm: /dev/mapper/sdg4 is identified as a member of /dev/md4, slot 2.
> mdadm: /dev/mapper/sdh1 is identified as a member of /dev/md4, slot 3.
> mdadm: /dev/mapper/sdi1 is identified as a member of /dev/md4, slot 5.
> mdadm: /dev/mapper/sdj1 is identified as a member of /dev/md4, slot 4.
> mdadm: /dev/md4 has an active reshape - checking if critical section
> needs to be restored
> mdadm: No backup metadata on /home/gwr/2015/2015-08/grow_md6.bak
> mdadm: Failed to find backup of critical section
> mdadm: continuing without restoring backup
> mdadm: added /dev/mapper/sdc4 to /dev/md4 as 1
> mdadm: added /dev/mapper/sdg4 to /dev/md4 as 2
> mdadm: added /dev/mapper/sdh1 to /dev/md4 as 3
> mdadm: added /dev/mapper/sdj1 to /dev/md4 as 4
> mdadm: added /dev/mapper/sdi1 to /dev/md4 as 5
> mdadm: added /dev/mapper/sdd4 to /dev/md4 as 0
> mdadm: failed to RUN_ARRAY /dev/md4: Invalid argument

Forgot to include the contents of the system log:

[  928.679299] md: bind<dm-1>
[  928.679809] md: bind<dm-2>
[  928.681957] md: bind<dm-3>
[  928.693345] md: bind<dm-5>
[  928.694155] md: bind<dm-4>
[  928.696251] md: bind<dm-0>
[  928.709133] md/raid:md4: reshape_position too early for
auto-recovery - aborting.
[  928.709159] md: pers->run() failed ...
[  928.709425] md: md4 stopped.
[  928.709442] md: unbind<dm-0>
[  928.709449] md: export_rdev(dm-0)
[  928.709462] md: unbind<dm-4>
[  928.709468] md: export_rdev(dm-4)
[  928.709477] md: unbind<dm-5>
[  928.709483] md: export_rdev(dm-5)
[  928.709493] md: unbind<dm-3>
[  928.709499] md: export_rdev(dm-3)
[  928.709510] md: unbind<dm-2>
[  928.709515] md: export_rdev(dm-2)
[  928.709524] md: unbind<dm-1>
[  928.709529] md: export_rdev(dm-1)
[  928.831905] md: bind<dm-0>
[  928.859783] md: bind<dm-3>
[  928.864100] md: bind<dm-1>
[  928.872128] md: bind<dm-4>
[  928.878222] md: bind<dm-5>
[  928.886799] md: bind<dm-2>

I appear to be too early in the reshape for auto-recovery, but too far
along to just say "never mind on that whole reshape business". Any
other thoughts?

-- 
George Rapp  (Pataskala, OH) Home: george.rapp -- at -- gmail.com
LinkedIn profile: https://www.linkedin.com/in/georgerapp
Phone: +1 740 936 RAPP (740 936 7277)


* Re: RAID 6 "Failed to restore critical section for reshape, sorry." - recovery advice?
  2015-12-10 22:34     ` George Rapp
@ 2015-12-15 14:11       ` Mikael Abrahamsson
  2015-12-21  1:35       ` NeilBrown
  1 sibling, 0 replies; 10+ messages in thread
From: Mikael Abrahamsson @ 2015-12-15 14:11 UTC (permalink / raw)
  To: George Rapp; +Cc: linux-raid

On Thu, 10 Dec 2015, George Rapp wrote:

> I appear to be too early in the reshape for auto-recovery, but too far 
> along to just say "never mind on that whole reshape business". Any other 
> thoughts?

what does "cat /proc/mdstat" say after these commands?

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: RAID 6 "Failed to restore critical section for reshape, sorry." - recovery advice?
  2015-12-10 22:34     ` George Rapp
  2015-12-15 14:11       ` Mikael Abrahamsson
@ 2015-12-21  1:35       ` NeilBrown
  2015-12-23  2:04         ` George Rapp
  1 sibling, 1 reply; 10+ messages in thread
From: NeilBrown @ 2015-12-21  1:35 UTC (permalink / raw)
  To: George Rapp, Mikael Abrahamsson; +Cc: linux-raid


On Fri, Dec 11 2015, George Rapp wrote:
>
> I appear to be too early in the reshape for auto-recovery, but too far
> along to just say "never mind on that whole reshape business". Any
> other thoughts?
>

What this means is that you've hit a corner case that was never thought
through properly and isn't handled correctly.

The current state of the array is (I think) that it looks like a reshape
to reduce the number of devices in the array has very nearly completed.
Only the first stripe needs to be completed.  Whether that first stripe
is still in the old "N+1" device layout or the new "N" device layout is
unknown to the kernel - this information is only in the backup file
(which doesn't exist).
By telling mdadm --invalid-backup, you effectively tell mdadm that there
is nothing useful in the backup file so it should know that the reshape
has actually completed.  But it has no way to tell the kernel that.
What it should do in this case is (I think) rewrite the metadata to
record that the reshape is complete.  But it doesn't.

It shouldn't be too hard to fix, but it isn't trivial either and I'm
unlikely to get anywhere before the Christmas break.

If you can get reshape to work at all (disable selinux?) you could try
--update=revert-reshape and let the reshape to more devices progress for
a while, and then revert it.

If you cannot get anywhere, then use
  "mdadm --dump=/tmp/whatever /dev/mdthing"

to create a copy of the metadata in some sparse files.
Then tar those up (a compressed tar archive should be tiny) and email them.
Then I can try and see if I can make something work on exactly the array
you have.

NeilBrown




* Re: RAID 6 "Failed to restore critical section for reshape, sorry." - recovery advice?
  2015-12-21  1:35       ` NeilBrown
@ 2015-12-23  2:04         ` George Rapp
  2015-12-23  2:18           ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: George Rapp @ 2015-12-23  2:04 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mikael Abrahamsson, Linux-RAID

On Sun, Dec 20, 2015 at 8:35 PM, NeilBrown <nfbrown@novell.com> wrote:
> On Fri, Dec 11 2015, George Rapp wrote:
>>
>> I appear to be too early in the reshape for auto-recovery, but too far
>> along to just say "never mind on that whole reshape business". Any
>> other thoughts?
>>
>
> What this means is that you've hit a corner case that was never thought
> through properly and isn't handled correctly.

Neil -

Thanks for the reply. Please see my comments inline below.

> The current state of the array is (I think) that it looks like a reshape
> to reduce the number of devices in the array has very nearly completed.
> Only the first stripe needs to be completed.  Whether that first stripe
> is still in the old "N+1" device layout or the new "N" device layout is
> unknown to the kernel - this information is only in the backup file
> (which doesn't exist).

Hmmm. Maybe you're thinking of a different case. This is mine:
http://marc.info/?l=linux-raid&m=143880359028232&w=2

My problem was that I was *increasing* the number of devices from 5 to
6. Also, I don't believe the reshape actually got anywhere, per
/proc/mdstat, which I was watching at the time, because the kernel was
denied write access to my backup file by SELinux.

> By telling mdadm --invalid-backup, you effectively tell mdadm that there
> is nothing useful in the backup file so it should know that the reshape
> has actually completed.  But it has no way to tell the kernel that.
> What it should do in this case is (I think) rewrite the metadata to
> record that the reshape is complete.  But it doesn't.

IMHO, it'd be better in my case to revert to a 5-drive array and
rewrite the metadata to reflect that, since I don't believe the
reshape ever actually began. Once I have access to the array again and
have fsck'ed the filesystem, then I can re-try the "mdadm --grow
--raid-devices=6" command (with SELinux disabled, and watching from
the dunce chair in the opposite corner of the room ... 8^) later.

> It shouldn't be too hard to fix, but it isn't trivial either and I'm
> unlikely to get anywhere before the Christmas break.
>
> If you can get reshape to work at all (disable selinux?) you could try
> --update=revert-reshape and let the reshape to more devices progress for
> a while, and then revert it.
>
> If you cannot get anywhere, then use
>   "mdadm --dump=/tmp/whatever /dev/mdthing"
>
> to create a copy of the metadata in some sparse files.
> Then tar those up (a compressed tar archive should be tiny) and email them.
> Then I can try and see if I can make something work on exactly the array
> you have.

Since the array won't run, I can't obtain the metadata you're looking for:

# mdadm --dump=/home/gwr/c/dev-md4-mdadm-dump /dev/md4
mdadm: Cannot find RAID metadata on /dev/md4

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md4 : inactive dm-2[5](S) dm-1[1](S) dm-4[8](S) dm-5[7](S) dm-0[0](S) dm-3[6](S)
      11513452944 blocks super 1.2

(The funny member names are the overlay devices I created to experiment
with various recovery operations nondestructively.)

I was able to dump the metadata from the six component devices of the
RAID 6 array with:

# mdadm --dump=/home/gwr/c/dev-md4-mdadm-dump /dev/sdj1
/dev/sdj1 saved as /home/gwr/c/dev-md4-mdadm-dump/sdj1.
/dev/sdj1 also saved as
/home/gwr/c/dev-md4-mdadm-dump/wwn-0x5000cca222d7b996-part1.
/dev/sdj1 also saved as
/home/gwr/c/dev-md4-mdadm-dump/wwn-0x13372914453769768960x-part1.
/dev/sdj1 also saved as
/home/gwr/c/dev-md4-mdadm-dump/ata-Hitachi_HUA722020ALA331_B9HP5Y2F-part1.

but after doing that, I have a whole bunch of huge (presumably
sparse) files in my output directory:

[root@backend3 dev-md4-mdadm-dump]# ll
total 192
-rw-r--r-- 4 root root 1964963324416 Dec 22 20:42
ata-Hitachi_HUA722020ALA331_B8HGR23Z-part1
-rw-r--r-- 4 root root 1964964405248 Dec 22 20:42
ata-Hitachi_HUA722020ALA331_B9HP5Y2F-part1
-rw-r--r-- 4 root root 1964963324416 Dec 22 20:42
ata-ST2000DL003-9VT166_6YD0YXL1-part1
-rw-r--r-- 4 root root 1964963323392 Dec 22 20:42
ata-ST32000542AS_5XW29Z1K-part4
-rw-r--r-- 4 root root 1964964405248 Dec 22 20:41
ata-ST32000542AS_5XW2D8GA-part4
-rw-r--r-- 4 root root 1964968599552 Dec 22 20:42
ata-TP02000GB_TPW140709340083-part4
-rw-r--r-- 4 root root 1964964405248 Dec 22 20:41 sdc4
-rw-r--r-- 4 root root 1964963323392 Dec 22 20:42 sdd4
-rw-r--r-- 4 root root 1964968599552 Dec 22 20:42 sdg4
-rw-r--r-- 4 root root 1964963324416 Dec 22 20:42 sdh1
-rw-r--r-- 4 root root 1964963324416 Dec 22 20:42 sdi1
-rw-r--r-- 4 root root 1964964405248 Dec 22 20:42 sdj1
-rw-r--r-- 4 root root 1964964405248 Dec 22 20:42
wwn-0x13372914453769768960x-part1
-rw-r--r-- 4 root root 1964963324416 Dec 22 20:42
wwn-0x14378343057695330304x-part1
-rw-r--r-- 4 root root 1964964405248 Dec 22 20:41
wwn-0x1508759625694990336x-part4
-rw-r--r-- 4 root root 1964963323392 Dec 22 20:42
wwn-0x4884205575019188224x-part4
-rw-r--r-- 4 root root 1964963323392 Dec 22 20:42 wwn-0x5000c5002f1743c8-part4
-rw-r--r-- 4 root root 1964964405248 Dec 22 20:41 wwn-0x5000c50030e214f0-part4
-rw-r--r-- 4 root root 1964963324416 Dec 22 20:42 wwn-0x5000c5003e0f5862-part1
-rw-r--r-- 4 root root 1964963324416 Dec 22 20:42 wwn-0x5000cca222d4c78a-part1
-rw-r--r-- 4 root root 1964964405248 Dec 22 20:42 wwn-0x5000cca222d7b996-part1
-rw-r--r-- 4 root root 1964968599552 Dec 22 20:42 wwn-0x50014ee208b56ae8-part4
-rw-r--r-- 4 root root 1964963324416 Dec 22 20:42
wwn-0x6368721060505866240x-part1
-rw-r--r-- 4 root root 1964968599552 Dec 22 20:42
wwn-0x7703416737422790657x-part4

and "tar -c -v -z -f dev-md4-mdadm-dump.tar.gz dev-md4-mdadm-dump/"
didn't produce a tiny file, but a huge one (15MB and growing in just
10 minutes), so I killed it.

Any further thoughts about how to proceed? Thanks for your help.

George
-- 
George Rapp  (Pataskala, OH) Home: george.rapp -- at -- gmail.com
LinkedIn profile: https://www.linkedin.com/in/georgerapp
Phone: +1 740 936 RAPP (740 936 7277)


* Re: RAID 6 "Failed to restore critical section for reshape, sorry." - recovery advice?
  2015-12-23  2:04         ` George Rapp
@ 2015-12-23  2:18           ` NeilBrown
       [not found]             ` <CAF-KpgZ=HY_HKvj5buFOKseUV0GLeOLR1m3B0EYxrYcD3R5ieA@mail.gmail.com>
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2015-12-23  2:18 UTC (permalink / raw)
  To: George Rapp; +Cc: Mikael Abrahamsson, Linux-RAID


On Wed, Dec 23 2015, George Rapp wrote:

>
>> The current state of the array is (I think) that it looks like a reshape
>> to reduce the number of devices in the array has very nearly completed.
>> Only the first stripe needs to be completed.  Whether that first stripe
>> is still in the old "N+1" device layout or the new "N" device layout is
>> unknown to the kernel - this information is only in the backup file
>> (which doesn't exist).
>
> Hmmm. Maybe you're thinking of a different case. This is mine:
> http://marc.info/?l=linux-raid&m=143880359028232&w=2
>
> My problem was that I was *increasing* the number of devices from 5 to
> 6. Also, I don't believe the reshape actually got anywhere, per
> /proc/mdstat, which I was watching at the time, because the kernel was
> denied write access to my backup file by SELinux.

Yes, but then you did "--assemble --update=revert-reshape", didn't you?
So now it looks like it is being reduced in size.

> I was able to dump the metadata from the six component devices of the
> RAID 6 array with:
>
> # mdadm --dump=/home/gwr/c/dev-md4-mdadm-dump /dev/sdj1
> /dev/sdj1 saved as /home/gwr/c/dev-md4-mdadm-dump/sdj1.
> /dev/sdj1 also saved as
> /home/gwr/c/dev-md4-mdadm-dump/wwn-0x5000cca222d7b996-part1.
> /dev/sdj1 also saved as
> /home/gwr/c/dev-md4-mdadm-dump/wwn-0x13372914453769768960x-part1.
> /dev/sdj1 also saved as
> /home/gwr/c/dev-md4-mdadm-dump/ata-Hitachi_HUA722020ALA331_B9HP5Y2F-part1.

Oh, that's right - you give component device names to --dump, not the
array.  Sorry.

>
> but after doing that, I have a whole bunch of huge (presumably
> sparse) files in my output directory:
>
> [root@backend3 dev-md4-mdadm-dump]# ll
> total 192
> -rw-r--r-- 4 root root 1964963324416 Dec 22 20:42
> ata-Hitachi_HUA722020ALA331_B8HGR23Z-part1
...
> -rw-r--r-- 4 root root 1964963324416 Dec 22 20:42
> wwn-0x6368721060505866240x-part1
> -rw-r--r-- 4 root root 1964968599552 Dec 22 20:42
> wwn-0x7703416737422790657x-part4
>
> and "tar -c -v -z -f dev-md4-mdadm-dump.tar.gz dev-md4-mdadm-dump/"
> didn't produce a tiny file, but a huge one (15MB and growing in just
> 10 minutes), so I killed it.
>
> Any further thoughts about how to proceed? Thanks for your help.

Maybe add "-S" option to tar.
I don't do this often enough to remember the details.  I though tar
auto-detected sparse files, but apparently not.

NeilBrown



* Re: RAID 6 "Failed to restore critical section for reshape, sorry." - recovery advice?
       [not found]             ` <CAF-KpgZ=HY_HKvj5buFOKseUV0GLeOLR1m3B0EYxrYcD3R5ieA@mail.gmail.com>
@ 2016-01-04  2:16               ` NeilBrown
  2016-01-22 19:24                 ` George Rapp
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2016-01-04  2:16 UTC (permalink / raw)
  To: George Rapp; +Cc: linux-raid


On Sun, Dec 27 2015, George Rapp wrote:

> Please find attached the output of the following command:
>
> # tar -c -v -z --sparse -f dev-md4-mdadm-dump.tar.gz dev-md4-mdadm-dump
>
> Thanks again for your help!

Thanks.

If you apply the following patch to mdadm (
   git clone git://neil.brown.name/mdadm
   apply patch
   make
) and then try to assemble with --update=revert-reshape, it should
assemble as a 5-device array with no reshape happening.
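
Spelled out, that's roughly the following (the patch file name is just
whatever you save the diff below as):

# git clone git://neil.brown.name/mdadm
# cd mdadm
# patch -p1 < /path/to/revert-reshape.patch
# make
# ./mdadm --version   # use the freshly built binary, not the installed one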

I probably want more safety checks before this goes upstream, but it
is safe enough for you.

NeilBrown

diff --git a/super1.c b/super1.c
index 10e00652c4ee..efc0491fc94d 100644
--- a/super1.c
+++ b/super1.c
@@ -1317,6 +1317,17 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
 			unsigned long long reshape_sectors;
 			long reshape_chunk;
 			rv = 0;
+			/* If the reshape hasn't started, just stop it */
+			if (sb->reshape_position == 0 &&
+			    (__le32_to_cpu(sb->delta_disks) > 0 ||
+			     (__le32_to_cpu(sb->delta_disks) == 0 &&
+			      !(sb->feature_map & __cpu_to_le32(MD_FEATURE_RESHAPE_BACKWARDS))))) {
+				sb->feature_map &= ~__cpu_to_le32(MD_FEATURE_RESHAPE_ACTIVE);
+				sb->raid_disks = __cpu_to_le32(__le32_to_cpu(sb->raid_disks) -
+							       __le32_to_cpu(sb->delta_disks));
+				sb->delta_disks = 0;
+				goto done;
+			}
 			/* reshape_position is a little messy.
 			 * Its value must be a multiple of the larger
 			 * chunk size, and of the "after" data disks.
@@ -1363,6 +1374,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
 				sb->new_offset = __cpu_to_le32(-offset_delta);
 				sb->data_size = __cpu_to_le64(__le64_to_cpu(sb->data_size) - offset_delta);
 			}
+		done:;
 		}
 	} else if (strcmp(update, "_reshape_progress")==0)
 		sb->reshape_position = __cpu_to_le64(info->reshape_progress);




* Re: RAID 6 "Failed to restore critical section for reshape, sorry." - recovery advice?
  2016-01-04  2:16               ` NeilBrown
@ 2016-01-22 19:24                 ` George Rapp
  0 siblings, 0 replies; 10+ messages in thread
From: George Rapp @ 2016-01-22 19:24 UTC (permalink / raw)
  To: Linux-RAID

On Sun, Jan 3, 2016 at 9:16 PM, NeilBrown <neilb@suse.com> wrote:
> On Sun, Dec 27 2015, George Rapp wrote:
>
> If you apply the following patch to mdadm (
>    git clone git://neil.brown.name/mdadm
>    apply patch
>    make
> ) and then try to assemble with --update=revert-reshape, it should
> assemble as a 5-device array with no reshape happening.
>
> I probably want more safety checks before this goes upstream, but it
> is safe enough for you.
>
> [patch snipped]
>

I've replied privately to Neil, but for list archive purposes, his
patch solved my problem.

After applying his patch, I used the custom-built version of mdadm to
assemble my array:

# UUID=$(mdadm -E /dev/sdd4 | perl -ne '/Array UUID : (\S+)/ and print $1')
# DEVICES=$(cat /proc/partitions | parallel --tagstring {5} --colsep ' +' \
      mdadm -E /dev/{5} | grep $UUID | parallel --colsep '\t' echo /dev/{1})
# ./mdadm --assemble --verbose --update=revert-reshape /dev/md4 $DEVICES
mdadm: looking for devices for /dev/md4
mdadm: /dev/sdd4 is identified as a member of /dev/md4, slot 0.
mdadm: /dev/sdc4 is identified as a member of /dev/md4, slot 1.
mdadm: /dev/sdg4 is identified as a member of /dev/md4, slot 2.
mdadm: /dev/sdh1 is identified as a member of /dev/md4, slot 3.
mdadm: /dev/sdi1 is identified as a member of /dev/md4, slot 5.
mdadm: /dev/sdj1 is identified as a member of /dev/md4, slot 4.
mdadm: device 10 in /dev/md4 has wrong state in superblock, but
/dev/sdi1 seems ok
mdadm: added /dev/sdc4 to /dev/md4 as 1
mdadm: added /dev/sdg4 to /dev/md4 as 2
mdadm: added /dev/sdh1 to /dev/md4 as 3
mdadm: added /dev/sdj1 to /dev/md4 as 4
mdadm: added /dev/sdi1 to /dev/md4 as 5
mdadm: added /dev/sdd4 to /dev/md4 as 0
mdadm: /dev/md4 has been started with 5 drives and 1 spare.
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md4 : active raid6 sdd4[0] sdi1[8](S) sdj1[7] sdh1[6] sdg4[5] sdc4[1]
      5756723712 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

For future reference, I would highly recommend executing

# setenforce 0

or, if you're really paranoid, disabling SELinux completely
- edit /etc/sysconfig/selinux and add or modify this line:
SELINUX=disabled
- # touch /.autorelabel
- # systemctl reboot
(and go get a cup of coffee while your filesystem gets relabeled)

before making any metadata changes on a RAID 6 array.
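
(If you just want to check which mode you are currently in before touching
anything, "# getenforce" prints Enforcing, Permissive, or Disabled.)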

Thanks again, Neil and members of the Linux-RAID community!
-- 
George Rapp  (Pataskala, OH) Home: george.rapp -- at -- gmail.com
LinkedIn profile: https://www.linkedin.com/in/georgerapp
Phone: +1 740 936 RAPP (740 936 7277)
