* Advice recovering from interrupted grow on RAID5 array
@ 2013-10-15  1:59 John Yates
  2013-10-16  5:26 ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: John Yates @ 2013-10-15  1:59 UTC (permalink / raw)
  To: linux-raid

Midway through a RAID5 grow operation from 5 to 6 USB-connected
drives, system logs show that the kernel lost communication with some
of the drive ports, which has left my array in a state from which I
have not been able to reassemble it. After reseating the cable
connections and rebooting, all of the drives appear to be functioning
normally, so hopefully the data is still intact. I need advice on
recovery steps for the array.

It appears that the drives failed in quick succession, with /dev/sdc1
the last one standing, so its superblock marks the others as missing.
The superblocks of the other drives show all drives as available.
(--examine output below)

>mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
mdadm: too-old timestamp on backup-metadata on device-5
mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
mdadm: /dev/md127 assembled from 1 drives - not enough to start the array.

Since the Events count on /dev/sdc1 was only slightly higher, I
retried the assemble with the --force option. This appears to have
copied the Events count of /dev/sdc1 over to /dev/sdd1, /dev/sde1, and
/dev/sdf1, but the array still failed to assemble, though a verbose
assemble command now shows 4 drives:

mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
/dev/sdf1 /dev/sdg1 --verbose
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
mdadm: :/dev/md127 has an active reshape - checking if critical
section needs to be restored
mdadm: too-old timestamp on backup-metadata on device-5
mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
mdadm: added /dev/sdf1 to /dev/md127 as 1
mdadm: added /dev/sdd1 to /dev/md127 as 2
mdadm: added /dev/sdc1 to /dev/md127 as 3
mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
mdadm: added /dev/sde1 to /dev/md127 as 0
mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.


Is there a way to correct the superblock data to allow assembly again
and hopefully restart the grow process? Thanks for any help!
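For anyone comparing the superblocks by hand, the interesting fields
can be pulled out of the --examine output with a short pipeline (a
sketch only; the field names assume the v1.2 layout shown below, and
sample.txt here stands in for real `mdadm --examine /dev/sdX1` output):

```shell
# Save the --examine output per device, e.g.:
#   mdadm --examine /dev/sdc1 > sample.txt
# A saved sample stands in for the real command here:
cat > sample.txt <<'EOF'
    Update Time : Mon Oct 14 01:57:26 2013
       Checksum : cf1c1046 - correct
         Events : 155281
EOF
# Pull out the Events counter; comparing this value across members
# shows which superblocks fell behind:
events=$(awk -F': *' '/Events/ {print $2}' sample.txt)
echo "Events=$events"
```

Looping that over /dev/sd[b-g]1 gives a quick side-by-side view of
which members mdadm considers out of date.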

--examine before --assemble --force
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : 2f5a0e84:f258e71d:9dd414e4:42dc45a0

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:52:28 2013
       Checksum : ca0111bd - correct
         Events : 155279

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : c188fbad:7d9efd7e:a3fb4c45:833e30b9

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:57:26 2013
       Checksum : cf1c1046 - correct
         Events : 155281

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : ...A.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : cda5e64c:a516c4fe:f79216b9:728ecd37

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:52:28 2013
       Checksum : e03f8b96 - correct
         Events : 155279

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : 4a3c1fe5:08d55a2d:7e3796ad:2f4ece45

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:52:28 2013
       Checksum : a98f44b7 - correct
         Events : 155279

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : 8bcd957a:0dd511c1:020851aa:b4f2963a

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:52:28 2013
       Checksum : c4404b07 - correct
         Events : 155279

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : da285064:616afb61:d275b2bb:7dc91d94

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:52:28 2013
       Checksum : b9cc8048 - correct
         Events : 155279

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)


--examine after --assemble --force (current state)

/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : 2f5a0e84:f258e71d:9dd414e4:42dc45a0

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:52:28 2013
       Checksum : ca0111bd - correct
         Events : 155279

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : c188fbad:7d9efd7e:a3fb4c45:833e30b9

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:57:26 2013
       Checksum : cf1c1046 - correct
         Events : 155281

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : ...A.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : cda5e64c:a516c4fe:f79216b9:728ecd37

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:52:28 2013
       Checksum : e03f8b98 - correct
         Events : 155281

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : 4a3c1fe5:08d55a2d:7e3796ad:2f4ece45

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:52:28 2013
       Checksum : a98f44b9 - correct
         Events : 155281

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : 8bcd957a:0dd511c1:020851aa:b4f2963a

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:52:28 2013
       Checksum : c4404b09 - correct
         Events : 155281

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 331103c1:c6a2afce:56b0404d:4786a453
           Name : localhost:archive
  Creation Time : Thu Nov 15 21:04:04 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 3905990738 (1862.52 GiB 1999.87 GB)
     Array Size : 9764974080 (9312.61 GiB 9999.33 GB)
  Used Dev Size : 3905989632 (1862.52 GiB 1999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1106 sectors
          State : clean
    Device UUID : da285064:616afb61:d275b2bb:7dc91d94

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 5568890880 (5310.91 GiB 5702.54 GB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 14 01:52:28 2013
       Checksum : b9cc8048 - correct
         Events : 155279

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)


* Re: Advice recovering from interrupted grow on RAID5 array
  2013-10-15  1:59 Advice recovering from interrupted grow on RAID5 array John Yates
@ 2013-10-16  5:26 ` NeilBrown
  2013-10-16 13:02   ` John Yates
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2013-10-16  5:26 UTC (permalink / raw)
  To: John Yates; +Cc: linux-raid

On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@gmail.com> wrote:

> Midway through a RAID5 grow operation from 5 to 6 USB connected
> drives, system logs show that the kernel lost communication with some
> of the drive ports which has left my array in a state that I have not
> been able to reassemble. After reseating the cable connections and
> rebooting, all of the drives appear to be functioning normally, so
> hopefully the data is still intact. I need advice on recovery steps
> for the array.
> 
> It appears that each drive failed in quick succession with /dev/sdc1
> being the last standing and having the others marked as missing in its
> superblock. The superblocks of the other drives show all drives as
> available. (--examine output below)
> 
> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
> mdadm: too-old timestamp on backup-metadata on device-5
> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array.

Did you try following the suggestion and run

 export MDADM_GROW_ALLOW_OLD=1

and then try the --assemble again?

NeilBrown
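A small aside on shell mechanics: the variable can also be scoped to
the single mdadm invocation instead of exported (a sketch; the mdadm
line is commented out here since it needs the real array, and
/dev/sd[b-g]1 is shorthand for the device list in the original report):

```shell
# One-shot form of the suggestion above; the variable is visible only
# to that one command and does not linger in the shell:
#   MDADM_GROW_ALLOW_OLD=1 mdadm --assemble /dev/md127 /dev/sd[b-g]1 --verbose
# Demonstrated with a harmless command instead of mdadm:
inside=$(MDADM_GROW_ALLOW_OLD=1 sh -c 'echo "${MDADM_GROW_ALLOW_OLD:-unset}"')
echo "inside: $inside"
echo "after:  ${MDADM_GROW_ALLOW_OLD:-unset}"
```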



* Re: Advice recovering from interrupted grow on RAID5 array
  2013-10-16  5:26 ` NeilBrown
@ 2013-10-16 13:02   ` John Yates
  2013-10-17  0:07     ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: John Yates @ 2013-10-16 13:02 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Oct 16, 2013 at 1:26 AM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@gmail.com> wrote:
>
>> Midway through a RAID5 grow operation from 5 to 6 USB connected
>> drives, system logs show that the kernel lost communication with some
>> of the drive ports which has left my array in a state that I have not
>> been able to reassemble. After reseating the cable connections and
>> rebooting, all of the drives appear to be functioning normally, so
>> hopefully the data is still intact. I need advice on recovery steps
>> for the array.
>>
>> It appears that each drive failed in quick succession with /dev/sdc1
>> being the last standing and having the others marked as missing in its
>> superblock. The superblocks of the other drives show all drives as
>> available. (--examine output below)
>>
>> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
>> mdadm: too-old timestamp on backup-metadata on device-5
>> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
>> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array.
>
> Did you try following the suggestion and run
>
>  export MDADM_GROW_ALLOW_OLD=1
>
> and then try the --assemble again?
>
> NeilBrown

Yes I did, thanks. Not much change though. It accepts the timestamp,
but then appears not to use it.

mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
/dev/sdf1 /dev/sdg1 --verbose
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
mdadm: :/dev/md127 has an active reshape - checking if critical
section needs to be restored
mdadm: accepting backup with timestamp 1381360844 for array with
timestamp 1381729948
mdadm: backup-metadata found on device-5 but is not needed
mdadm: added /dev/sdf1 to /dev/md127 as 1
mdadm: added /dev/sdd1 to /dev/md127 as 2
mdadm: added /dev/sdc1 to /dev/md127 as 3
mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
mdadm: added /dev/sde1 to /dev/md127 as 0
mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.


* Re: Advice recovering from interrupted grow on RAID5 array
  2013-10-16 13:02   ` John Yates
@ 2013-10-17  0:07     ` NeilBrown
  2013-10-17  5:36       ` John Yates
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2013-10-17  0:07 UTC (permalink / raw)
  To: John Yates; +Cc: linux-raid

On Wed, 16 Oct 2013 09:02:52 -0400 John Yates <jyates65@gmail.com> wrote:

> On Wed, Oct 16, 2013 at 1:26 AM, NeilBrown <neilb@suse.de> wrote:
> > On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@gmail.com> wrote:
> >
> >> Midway through a RAID5 grow operation from 5 to 6 USB connected
> >> drives, system logs show that the kernel lost communication with some
> >> of the drive ports which has left my array in a state that I have not
> >> been able to reassemble. After reseating the cable connections and
> >> rebooting, all of the drives appear to be functioning normally, so
> >> hopefully the data is still intact. I need advice on recovery steps
> >> for the array.
> >>
> >> It appears that each drive failed in quick succession with /dev/sdc1
> >> being the last standing and having the others marked as missing in its
> >> superblock. The superblocks of the other drives show all drives as
> >> available. (--examine output below)
> >>
> >> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
> >> mdadm: too-old timestamp on backup-metadata on device-5
> >> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
> >> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array.
> >
> > Did you try following the suggestion and run
> >
> >  export MDADM_GROW_ALLOW_OLD=1
> >
> > and then try the --assemble again?
> >
> > NeilBrown
> 
> Yes I did, thanks. Not much change though. It accepts the timestamp,
> but then appears not to use it.
> 
> mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
> /dev/sdf1 /dev/sdg1 --verbose
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
> mdadm: :/dev/md127 has an active reshape - checking if critical
> section needs to be restored
> mdadm: accepting backup with timestamp 1381360844 for array with
> timestamp 1381729948
> mdadm: backup-metadata found on device-5 but is not needed
> mdadm: added /dev/sdf1 to /dev/md127 as 1
> mdadm: added /dev/sdd1 to /dev/md127 as 2
> mdadm: added /dev/sdc1 to /dev/md127 as 3
> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
> mdadm: added /dev/sde1 to /dev/md127 as 0
> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.


What about with MDADM_GROW_ALLOW_OLD=1 *and* --force ??

If that doesn't work, please add --verbose as well, and report the output.

NeilBrown



* Re: Advice recovering from interrupted grow on RAID5 array
  2013-10-17  0:07     ` NeilBrown
@ 2013-10-17  5:36       ` John Yates
  2013-10-21  1:09         ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: John Yates @ 2013-10-17  5:36 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Oct 16, 2013 at 8:07 PM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 16 Oct 2013 09:02:52 -0400 John Yates <jyates65@gmail.com> wrote:
>
>> On Wed, Oct 16, 2013 at 1:26 AM, NeilBrown <neilb@suse.de> wrote:
>> > On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@gmail.com> wrote:
>> >
>> >> Midway through a RAID5 grow operation from 5 to 6 USB connected
>> >> drives, system logs show that the kernel lost communication with some
>> >> of the drive ports which has left my array in a state that I have not
>> >> been able to reassemble. After reseating the cable connections and
>> >> rebooting, all of the drives appear to be functioning normally, so
>> >> hopefully the data is still intact. I need advice on recovery steps
>> >> for the array.
>> >>
>> >> It appears that each drive failed in quick succession with /dev/sdc1
>> >> being the last standing and having the others marked as missing in its
>> >> superblock. The superblocks of the other drives show all drives as
>> >> available. (--examine output below)
>> >>
>> >> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
>> >> mdadm: too-old timestamp on backup-metadata on device-5
>> >> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
>> >> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array.
>> >
>> > Did you try following the suggestion and run
>> >
>> >  export MDADM_GROW_ALLOW_OLD=1
>> >
>> > and then try the --assemble again?
>> >
>> > NeilBrown
>>
>> Yes I did, thanks. Not much change though. It accepts the timestamp,
>> but then appears not to use it.
>>
>> mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
>> /dev/sdf1 /dev/sdg1 --verbose
>> mdadm: looking for devices for /dev/md127
>> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
>> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
>> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
>> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
>> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
>> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
>> mdadm: :/dev/md127 has an active reshape - checking if critical
>> section needs to be restored
>> mdadm: accepting backup with timestamp 1381360844 for array with
>> timestamp 1381729948
>> mdadm: backup-metadata found on device-5 but is not needed
>> mdadm: added /dev/sdf1 to /dev/md127 as 1
>> mdadm: added /dev/sdd1 to /dev/md127 as 2
>> mdadm: added /dev/sdc1 to /dev/md127 as 3
>> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
>> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
>> mdadm: added /dev/sde1 to /dev/md127 as 0
>> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
>
>
> What about with MDADM_GROW_ALLOW_OLD=1 *and* --force ??
>
> If that doesn't work, please add --verbose as well, and report the output.
>
> NeilBrown

Thanks Neil. I had tried that as well (output below). I'm wondering if
there is a way to fix the metadata for /dev/sdc1, since it is the odd
one out: its --examine data marks the other disks as bad when I don't
believe they really are (just the result of a partial kernel or driver
crash). I have read about people zeroing the superblock on a device so
that it can be recreated, but I am not sure exactly how that works and
am hesitant to try it since a reshape was in progress. I have also
read about people having success by re-running the original mdadm
--create while leaving the data intact, but again I am hesitant to try
that, especially because of the reshape
state.

Or... maybe this all has more to do with the Update Time, since the
output seems to indicate 4 drives are usable. All of the drives have
the same Update Time except for /dev/sdc1 which is about 5 minutes
later than the rest. Since it is the fourth device, perhaps the
assemble is satisfied with devices 0, 1, 2, 3, but then seeing an
Update Time on devices 4 and 5 that is earlier than device 3, it
marks them as "possibly out of date" and stops trying to assemble the
array. Hard to tell, but I still would not have any idea how to
overcome that scenario. I appreciate your help!
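The two epoch timestamps from the verbose output, and the Update Time
gap between members, can be sanity-checked with shell arithmetic (a
sketch; the numbers are the ones mdadm and --examine printed above):

```shell
# Backup-metadata timestamp vs. array superblock timestamp, from the
# "accepting backup with timestamp ..." line in the verbose output:
backup=1381360844
array=1381729948
echo "backup lags the array by $(( array - backup )) seconds"   # roughly 4.3 days
# Update Time gap: /dev/sdc1 (01:57:26) vs. the other members (01:52:28):
echo "sdc1 is $(( (57*60 + 26) - (52*60 + 28) )) seconds newer" # about 5 minutes
```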

# export MDADM_GROW_ALLOW_OLD=1
# mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
/dev/sdf1 /dev/sdg1 --force --verbose
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
mdadm: :/dev/md127 has an active reshape - checking if critical
section needs to be restored
mdadm: accepting backup with timestamp 1381360844 for array with
timestamp 1381729948
mdadm: backup-metadata found on device-5 but is not needed
mdadm: added /dev/sdf1 to /dev/md127 as 1
mdadm: added /dev/sdd1 to /dev/md127 as 2
mdadm: added /dev/sdc1 to /dev/md127 as 3
mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
mdadm: added /dev/sde1 to /dev/md127 as 0
mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.


* Re: Advice recovering from interrupted grow on RAID5 array
  2013-10-17  5:36       ` John Yates
@ 2013-10-21  1:09         ` NeilBrown
  2013-10-21 16:29           ` John Yates
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2013-10-21  1:09 UTC (permalink / raw)
  To: John Yates; +Cc: linux-raid


On Thu, 17 Oct 2013 01:36:28 -0400 John Yates <jyates65@gmail.com> wrote:

> On Wed, Oct 16, 2013 at 8:07 PM, NeilBrown <neilb@suse.de> wrote:
> > On Wed, 16 Oct 2013 09:02:52 -0400 John Yates <jyates65@gmail.com> wrote:
> >
> >> On Wed, Oct 16, 2013 at 1:26 AM, NeilBrown <neilb@suse.de> wrote:
> >> > On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@gmail.com> wrote:
> >> >
> >> >> Midway through a RAID5 grow operation from 5 to 6 USB connected
> >> >> drives, system logs show that the kernel lost communication with some
> >> >> of the drive ports which has left my array in a state that I have not
> >> >> been able to reassemble. After reseating the cable connections and
> >> >> rebooting, all of the drives appear to be functioning normally, so
> >> >> hopefully the data is still intact. I need advice on recovery steps
> >> >> for the array.
> >> >>
> >> >> It appears that each drive failed in quick succession with /dev/sdc1
> >> >> being the last standing and having the others marked as missing in its
> >> >> superblock. The superblocks of the other drives show all drives as
> >> >> available. (--examine output below)
> >> >>
> >> >> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
> >> >> mdadm: too-old timestamp on backup-metadata on device-5
> >> >> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
> >> >> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array.
> >> >
> >> > Did you try following the suggestion and run
> >> >
> >> >  export MDADM_GROW_ALLOW_OLD=1
> >> >
> >> > and then try the --assemble again?
> >> >
> >> > NeilBrown
> >>
> >> Yes I did, thanks. Not much change though. It accepts the timestamp,
> >> but then appears not to use it.
> >>
> >> mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
> >> /dev/sdf1 /dev/sdg1 --verbose
> >> mdadm: looking for devices for /dev/md127
> >> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
> >> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
> >> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
> >> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
> >> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
> >> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
> >> mdadm: :/dev/md127 has an active reshape - checking if critical
> >> section needs to be restored
> >> mdadm: accepting backup with timestamp 1381360844 for array with
> >> timestamp 1381729948
> >> mdadm: backup-metadata found on device-5 but is not needed
> >> mdadm: added /dev/sdf1 to /dev/md127 as 1
> >> mdadm: added /dev/sdd1 to /dev/md127 as 2
> >> mdadm: added /dev/sdc1 to /dev/md127 as 3
> >> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
> >> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
> >> mdadm: added /dev/sde1 to /dev/md127 as 0
> >> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
> >
> >
> > What about with MDADM_GROW_ALLOW_OLD=1 *and* --force ??
> >
> > If that doesn't work, please add --verbose as well, and report the output.
> >
> > NeilBrown
> 
> Thanks Neil. I had tried that as well (output below). I'm wondering if
> there is a way to fix the metadata for /dev/sdc1 since that seems to
> be the odd one where the --examine data indicates that the other disks
> are all bad when I don't believe they really are (just the result of a
> partial kernel or driver crash). I have read about some people zeroing
> the superblock on a device so that it can be recreated, but I am not
> sure exactly how that works and am hesitant to try it since a reshape
> was in progress. I have also read about people having had success by
> re-running the original mdadm --create while leaving the data intact,
> but again I am hesitant to try that, especially because of the reshape
> state.
> 
> Or... maybe this all has more to do with the Update Time, since the
> output seems to indicate 4 drives are usable. All of the drives have
> the same Update Time except for /dev/sdc1 which is about 5 minutes
> later than the rest. Since it is the fourth device, perhaps the
> assemble is satisfied with devices 0, 1, 2, 3, but then seeing an
> Update Time on devices 4 and 5 that is earlier than device 3, it
> marks them as "possibly out of date" and stops trying to assemble the
> array. Hard to tell, but I still would not have any idea how to
> overcome that scenario. I appreciate your help!
> 
> # export MDADM_GROW_ALLOW_OLD=1
> # mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
> /dev/sdf1 /dev/sdg1 --force --verbose
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
> mdadm: :/dev/md127 has an active reshape - checking if critical
> section needs to be restored
> mdadm: accepting backup with timestamp 1381360844 for array with
> timestamp 1381729948
> mdadm: backup-metadata found on device-5 but is not needed
> mdadm: added /dev/sdf1 to /dev/md127 as 1
> mdadm: added /dev/sdd1 to /dev/md127 as 2
> mdadm: added /dev/sdc1 to /dev/md127 as 3
> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
> mdadm: added /dev/sde1 to /dev/md127 as 0
> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.

That shouldn't happen.  With '-f' it should force the event count of either b1
or g1 (or maybe both) to match the others.

What version of mdadm are you using? (mdadm -V)

Maybe try the latest
  git clone git://git.neil.brown.name/mdadm
  cd mdadm
  make mdadm
  ./mdadm .....

NeilBrown


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Advice recovering from interrupted grow on RAID5 array
  2013-10-21  1:09         ` NeilBrown
@ 2013-10-21 16:29           ` John Yates
  2013-10-21 20:06             ` John Yates
  0 siblings, 1 reply; 9+ messages in thread
From: John Yates @ 2013-10-21 16:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Sun, Oct 20, 2013 at 9:09 PM, NeilBrown <neilb@suse.de> wrote:
> On Thu, 17 Oct 2013 01:36:28 -0400 John Yates <jyates65@gmail.com> wrote:
>
>> [...]
>
> That shouldn't happen.  With '-f' it should force the event count of either b1
> or g1 (or maybe both) to match the others.
>
> What version of mdadm are you using? (mdadm -V)
>

mdadm - v3.3 - 3rd September 2013
(Arch Linux)

> Maybe try the latest
>   git clone git://git.neil.brown.name/mdadm
>   cd mdadm
>   make mdadm
>   ./mdadm .....
>
> NeilBrown

OK, trying the latest...

# ./mdadm -V
mdadm - v3.3-27-ga4921f3 - 16th October 2013

# uname -rv
3.11.4-1-ARCH #1 SMP PREEMPT Sat Oct 5 21:22:51 CEST 2013

No change in the result and I don't see errors anywhere indicating a
problem writing to /dev/sdb1 or /dev/sdg1. Are there any more debug
options that I am overlooking?

# ./mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1
/dev/sde1 /dev/sdf1 /dev/sdg1 -f -v
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
mdadm: :/dev/md127 has an active reshape - checking if critical
section needs to be restored
mdadm: accepting backup with timestamp 1381360844 for array with
timestamp 1381729948
mdadm: backup-metadata found on device-5 but is not needed
mdadm: added /dev/sdf1 to /dev/md127 as 1
mdadm: added /dev/sdd1 to /dev/md127 as 2
mdadm: added /dev/sdc1 to /dev/md127 as 3
mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
mdadm: added /dev/sde1 to /dev/md127 as 0
mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.

# ./mdadm --examine /dev/sd[bcdefg]1 | egrep '/dev/sd|Events|Update|Role|State'
/dev/sdb1:
          State : clean
    Update Time : Mon Oct 14 01:52:28 2013
         Events : 155279
   Device Role : Active device 4
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc1:
          State : clean
    Update Time : Mon Oct 14 01:57:26 2013
         Events : 155281
   Device Role : Active device 3
   Array State : ...A.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
          State : clean
    Update Time : Mon Oct 14 01:52:28 2013
         Events : 155281
   Device Role : Active device 2
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
          State : clean
    Update Time : Mon Oct 14 01:52:28 2013
         Events : 155281
   Device Role : Active device 0
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
          State : clean
    Update Time : Mon Oct 14 01:52:28 2013
         Events : 155281
   Device Role : Active device 1
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
          State : clean
    Update Time : Mon Oct 14 01:52:28 2013
         Events : 155279
   Device Role : Active device 5
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)



Not sure if this is significant, but at boot time they are all shown as
spares, though the indexing seems odd in that index 2 is skipped:

# cat /proc/mdstat
Personalities :
md127 : inactive sdf1[1](S) sde1[0](S) sdg1[6](S) sdd1[3](S)
sdb1[5](S) sdc1[4](S)
      11717972214 blocks super 1.2

unused devices: <none>


Then I do an `mdadm --stop /dev/md127` before trying the assemble.


* Re: Advice recovering from interrupted grow on RAID5 array
  2013-10-21 16:29           ` John Yates
@ 2013-10-21 20:06             ` John Yates
  2013-10-21 22:51               ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: John Yates @ 2013-10-21 20:06 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Mon, Oct 21, 2013 at 12:29 PM, John Yates <jyates65@gmail.com> wrote:
> On Sun, Oct 20, 2013 at 9:09 PM, NeilBrown <neilb@suse.de> wrote:
>> [...]
>> That shouldn't happen.  With '-f' it should force the event count of either b1
>> or g1 (or maybe both) to match the others.
>>
>> What version of mdadm are you using? (mdadm -V)
>>
>
> mdadm - v3.3 - 3rd September 2013
> (Arch Linux)
>
>> Maybe try the latest
>>   git clone git://git.neil.brown.name/mdadm
>>   cd mdadm
>>   make mdadm
>>   ./mdadm .....
>>
>> NeilBrown
>
> OK, trying the latest...
>
> # ./mdadm -V
> mdadm - v3.3-27-ga4921f3 - 16th October 2013
>
> # uname -rv
> 3.11.4-1-ARCH #1 SMP PREEMPT Sat Oct 5 21:22:51 CEST 2013
>
> No change in the result and I don't see errors anywhere indicating a
> problem writing to /dev/sdb1 or /dev/sdg1. Are there any more debug
> options that I am overlooking?
>
> # ./mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1
> /dev/sde1 /dev/sdf1 /dev/sdg1 -f -v
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
> mdadm: :/dev/md127 has an active reshape - checking if critical
> section needs to be restored
> mdadm: accepting backup with timestamp 1381360844 for array with
> timestamp 1381729948
> mdadm: backup-metadata found on device-5 but is not needed
> mdadm: added /dev/sdf1 to /dev/md127 as 1
> mdadm: added /dev/sdd1 to /dev/md127 as 2
> mdadm: added /dev/sdc1 to /dev/md127 as 3
> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
> mdadm: added /dev/sde1 to /dev/md127 as 0
> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
>
> # ./mdadm --examine /dev/sd[bcdefg]1 | egrep '/dev/sd|Events|Update|Role|State'
> /dev/sdb1:
>           State : clean
>     Update Time : Mon Oct 14 01:52:28 2013
>          Events : 155279
>    Device Role : Active device 4
>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdc1:
>           State : clean
>     Update Time : Mon Oct 14 01:57:26 2013
>          Events : 155281
>    Device Role : Active device 3
>    Array State : ...A.. ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdd1:
>           State : clean
>     Update Time : Mon Oct 14 01:52:28 2013
>          Events : 155281
>    Device Role : Active device 2
>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sde1:
>           State : clean
>     Update Time : Mon Oct 14 01:52:28 2013
>          Events : 155281
>    Device Role : Active device 0
>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdf1:
>           State : clean
>     Update Time : Mon Oct 14 01:52:28 2013
>          Events : 155281
>    Device Role : Active device 1
>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdg1:
>           State : clean
>     Update Time : Mon Oct 14 01:52:28 2013
>          Events : 155279
>    Device Role : Active device 5
>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
>
>
>
> Not sure if this is significant, but at boot time they are all shown as
> spares, though the indexing seems odd in that index 2 is skipped:
>
> # cat /proc/mdstat
> Personalities :
> md127 : inactive sdf1[1](S) sde1[0](S) sdg1[6](S) sdd1[3](S)
> sdb1[5](S) sdc1[4](S)
>       11717972214 blocks super 1.2
>
> unused devices: <none>
>
>
> Then I do an `mdadm --stop /dev/md127` before trying the assemble.

OK, I got the array started and it has resumed reshaping.

Line 806 of Assemble.c:
for (i = 0; i < content->array.raid_disks && i < bestcnt; i++) {

'bestcnt' appears to be the length of the list of candidate devices,
which includes slots held by non-members. The loop condition here
limits iteration to the number of devices in the array, so in my case,
where some non-member entries come early in the list, the later
members are never considered for the forced event-count update.
Perhaps the 'i < content->array.raid_disks' condition is not needed
here?


* Re: Advice recovering from interrupted grow on RAID5 array
  2013-10-21 20:06             ` John Yates
@ 2013-10-21 22:51               ` NeilBrown
  0 siblings, 0 replies; 9+ messages in thread
From: NeilBrown @ 2013-10-21 22:51 UTC (permalink / raw)
  To: John Yates; +Cc: linux-raid

On Mon, 21 Oct 2013 16:06:27 -0400 John Yates <jyates65@gmail.com> wrote:

> On Mon, Oct 21, 2013 at 12:29 PM, John Yates <jyates65@gmail.com> wrote:
> > On Sun, Oct 20, 2013 at 9:09 PM, NeilBrown <neilb@suse.de> wrote:
> >> [...]
> >> That shouldn't happen.  With '-f' it should force the event count of either b1
> >> or g1 (or maybe both) to match the others.
> >>
> >> What version of mdadm are you using? (mdadm -V)
> >>
> >
> > mdadm - v3.3 - 3rd September 2013
> > (Arch Linux)
> >
> >> Maybe try the latest
> >>   git clone git://git.neil.brown.name/mdadm
> >>   cd mdadm
> >>   make mdadm
> >>   ./mdadm .....
> >>
> >> NeilBrown
> >
> > OK, trying the latest...
> >
> > # ./mdadm -V
> > mdadm - v3.3-27-ga4921f3 - 16th October 2013
> >
> > # uname -rv
> > 3.11.4-1-ARCH #1 SMP PREEMPT Sat Oct 5 21:22:51 CEST 2013
> >
> > No change in the result and I don't see errors anywhere indicating a
> > problem writing to /dev/sdb1 or /dev/sdg1. Are there any more debug
> > options that I am overlooking?
> >
> > # ./mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1
> > /dev/sde1 /dev/sdf1 /dev/sdg1 -f -v
> > mdadm: looking for devices for /dev/md127
> > mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
> > mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
> > mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
> > mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
> > mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
> > mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
> > mdadm: :/dev/md127 has an active reshape - checking if critical
> > section needs to be restored
> > mdadm: accepting backup with timestamp 1381360844 for array with
> > timestamp 1381729948
> > mdadm: backup-metadata found on device-5 but is not needed
> > mdadm: added /dev/sdf1 to /dev/md127 as 1
> > mdadm: added /dev/sdd1 to /dev/md127 as 2
> > mdadm: added /dev/sdc1 to /dev/md127 as 3
> > mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
> > mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
> > mdadm: added /dev/sde1 to /dev/md127 as 0
> > mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
> >
> > # ./mdadm --examine /dev/sd[bcdefg]1 | egrep '/dev/sd|Events|Update|Role|State'
> > /dev/sdb1:
> >           State : clean
> >     Update Time : Mon Oct 14 01:52:28 2013
> >          Events : 155279
> >    Device Role : Active device 4
> >    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> > /dev/sdc1:
> >           State : clean
> >     Update Time : Mon Oct 14 01:57:26 2013
> >          Events : 155281
> >    Device Role : Active device 3
> >    Array State : ...A.. ('A' == active, '.' == missing, 'R' == replacing)
> > /dev/sdd1:
> >           State : clean
> >     Update Time : Mon Oct 14 01:52:28 2013
> >          Events : 155281
> >    Device Role : Active device 2
> >    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> > /dev/sde1:
> >           State : clean
> >     Update Time : Mon Oct 14 01:52:28 2013
> >          Events : 155281
> >    Device Role : Active device 0
> >    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> > /dev/sdf1:
> >           State : clean
> >     Update Time : Mon Oct 14 01:52:28 2013
> >          Events : 155281
> >    Device Role : Active device 1
> >    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> > /dev/sdg1:
> >           State : clean
> >     Update Time : Mon Oct 14 01:52:28 2013
> >          Events : 155279
> >    Device Role : Active device 5
> >    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> >
> >
> >
> > Not sure if this is significant, but at boot time they are all shown as
> > spares, though the indexing seems odd in that index 2 is skipped:
> >
> > # cat /proc/mdstat
> > Personalities :
> > md127 : inactive sdf1[1](S) sde1[0](S) sdg1[6](S) sdd1[3](S)
> > sdb1[5](S) sdc1[4](S)
> >       11717972214 blocks super 1.2
> >
> > unused devices: <none>
> >
> >
> > Then I do an `mdadm --stop /dev/md127` before trying the assemble.
> 
> > OK, I got the array started and it has resumed reshaping.

I love it when someone solves their own problem :-)

> 
> Line 806 of Assemble.c:
> for (i = 0; i < content->array.raid_disks && i < bestcnt; i++) {
> 
> > 'bestcnt' appears to be the number of entries in the list of available
> > devices, including non-array members. The loop condition here limits
> > iteration to the number of devices in the array. In my array, some
> > non-member entries come early in the list, so later members are never
> > considered for updating. Perhaps the 'i < content->array.raid_disks'
> > condition is not needed here?

And when they find a bug for me too?  Double prizes!

This code was right, once.
The idea of the 'best' array was that entries 0 through raid_disks-1 record
the 'best' device for the corresponding slot in the array, with subsequent
entries holding spares.  While that was the layout, stopping at raid_disks
was correct.

However, when I added support for replacement devices I subtly changed the
meaning of 'best'.  There are now two entries for each slot (the original
and the replacement), so the spares start at raid_disks*2.

So this loop is now wrong.  It should skip the replacement entries and
continue up to raid_disks*2, i.e.
-		for (i = 0; i < content->array.raid_disks && i < bestcnt; i++) {
+		for (i = 0; i < content->array.raid_disks*2 && i < bestcnt; i+=2) {

though in the actual patch I'll wrap that line and fix a couple of similar
errors.

Thanks!

NeilBrown


Thread overview: 9+ messages
2013-10-15  1:59 Advice recovering from interrupted grow on RAID5 array John Yates
2013-10-16  5:26 ` NeilBrown
2013-10-16 13:02   ` John Yates
2013-10-17  0:07     ` NeilBrown
2013-10-17  5:36       ` John Yates
2013-10-21  1:09         ` NeilBrown
2013-10-21 16:29           ` John Yates
2013-10-21 20:06             ` John Yates
2013-10-21 22:51               ` NeilBrown
