* drives failed during reshape, array won't even force-assemble
@ 2017-01-25 13:27 Thomas Warntjen
  2017-01-30 18:13 ` Phil Turmel
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Warntjen @ 2017-01-25 13:27 UTC (permalink / raw)
  To: linux-raid

On my new Ubuntu Server 16.04 LTS server I have an old RAID5 made from 
5+1 WD Red 3TB drives which I wanted to upgrade first to RAID6 (5+2) and 
then to 6 data disks, so I added 2 new drives and started the reshape:

# mdadm /dev/md1 --grow --level=6 --backup=/root/raid6.backupfile

When the reshape was at ~70%, some wonky cabling caused some of the 
drives to fail temporarily (I heard the drives spin down after I 
accidentally touched the cable - SMART says the disks are OK and another 
array on those disks starts just fine).
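
(For reference, the SMART check mentioned here is typically something 
like the following; smartctl comes from smartmontools and the device 
name is only an example, not taken from the original session:)

# smartctl -H /dev/sdd     # overall health self-assessment
# smartctl -A /dev/sdd     # attribute table (reallocated/pending sectors etc.)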
After a reboot, the array won't start, marking all the drives as spares 
(md1):

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] 
[raid0] [raid10]
md1 : inactive sdg3[3](S) sdj3[1](S) sdi3[6](S) sdh3[0](S) sdc3[2](S) 
sdd3[4](S) sdf3[5](S) sde3[8](S)
       23429580800 blocks super 0.91

md127 : active (auto-read-only) raid6 sdj1[7] sdi1[4] sdg1[2] sdh1[6] 
sdc1[0] sdf1[1] sde1[5] sdd1[3]
       6346752 blocks super 1.2 level 6, 512k chunk, algorithm 2 [8/8] 
[UUUUUUUU]

md0 : active raid1 sdb1[2] sda1[1]
       240022528 blocks super 1.2 [2/2] [UU]
       bitmap: 1/2 pages [4KB], 65536KB chunk


# mdadm --detail /dev/md1
/dev/md1:
         Version : 0.91
      Raid Level : raid0
   Total Devices : 8
Preferred Minor : 0
     Persistence : Superblock is persistent

           State : inactive

       New Level : raid6
      New Layout : left-symmetric
   New Chunksize : 64K

            UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
          Events : 0.12370980

     Number   Major   Minor   RaidDevice

        -       8       35        -        /dev/sdc3
        -       8       51        -        /dev/sdd3
        -       8       67        -        /dev/sde3
        -       8       83        -        /dev/sdf3
        -       8       99        -        /dev/sdg3
        -       8      115        -        /dev/sdh3
        -       8      131        -        /dev/sdi3
        -       8      147        -        /dev/sdj3


Since that was the second time the reshape was interrupted (the first 
time was an intentional reboot) I thought I knew what I was doing and 
stopped and force-assembled the array. That didn't work and probably 
borked it some more...

So according to the RAID-Wiki 
(https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID) 
I stopped the array and created overlay files (and copied the backup-file).
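
(For reference, the wiki's overlay setup boils down to something like 
the sketch below; the /tmp paths, the sd[c-j]3 glob and the 8-sector 
chunk size are assumptions based on the device list above, not a record 
of the exact commands that were run:)

for d in /dev/sd[c-j]3; do
     f=/tmp/overlay-$(basename $d)
     truncate -s $(blockdev --getsize64 $d) $f     # sparse copy-on-write file
     loop=$(losetup -f --show $f)
     # reads hit the real partition, writes land in the overlay file
     echo "0 $(blockdev --getsz $d) snapshot $d $loop P 8" | dmsetup create $(basename $d)
done
OVERLAYS=$(echo /dev/mapper/sd[c-j]3)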

mdadm -E suggests that sdd and sdf were probably the failing drives:

# parallel --tag -k mdadm -E ::: $OVERLAYS|grep -E 'Update'
/dev/mapper/sdc3            Update Time : Tue Jan 24 21:03:00 2017
/dev/mapper/sdd3            Update Time : Tue Jan 24 21:02:49 2017
/dev/mapper/sde3            Update Time : Tue Jan 24 21:10:19 2017
/dev/mapper/sdf3            Update Time : Tue Jan 24 21:02:49 2017
/dev/mapper/sdh3            Update Time : Tue Jan 24 21:03:00 2017
/dev/mapper/sdi3            Update Time : Tue Jan 24 21:10:19 2017
/dev/mapper/sdj3            Update Time : Tue Jan 24 21:03:00 2017
/dev/mapper/sdg3            Update Time : Tue Jan 24 21:10:19 2017

# parallel --tag -k mdadm -E ::: $OVERLAYS|grep -E 'Events'
/dev/mapper/sdc3                 Events : 12370980
/dev/mapper/sdd3                 Events : 12370974
/dev/mapper/sde3                 Events : 12370980
/dev/mapper/sdf3                 Events : 12370974
/dev/mapper/sdh3                 Events : 12370980
/dev/mapper/sdi3                 Events : 12370980
/dev/mapper/sdj3                 Events : 12370980
/dev/mapper/sdg3                 Events : 12370980
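
(The same view without GNU parallel, for anyone following along - a 
sketch assuming $OVERLAYS is set as above:)

for d in $OVERLAYS; do
     echo "== $d"
     mdadm -E $d | grep -E 'Update Time|Events'
done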


Obviously the disks have diverging ideas about the health of the array 
and, interestingly, also about their own identity:

/dev/sdc3:
       Number   Major   Minor   RaidDevice State
this     2       8       35        2      active sync   /dev/sdc3

    0     0       8      131        0      active sync   /dev/sdi3
    1     1       8      163        1      active sync
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8      115        3      active sync   /dev/sdh3
    4     4       0        0        4      faulty removed
    5     5       0        0        5      faulty removed
    6     6       8      147        6      active   /dev/sdj3
    7     7       8       67        7      spare   /dev/sde3

/dev/sdd3:
       Number   Major   Minor   RaidDevice State
this     4       8       51        4      active sync   /dev/sdd3

    0     0       8      131        0      active sync   /dev/sdi3
    1     1       8      163        1      active sync
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8      115        3      active sync   /dev/sdh3
    4     4       8       51        4      active sync   /dev/sdd3
    5     5       8       83        5      active sync   /dev/sdf3
    6     6       8      147        6      active   /dev/sdj3
    7     7       8       67        7      spare   /dev/sde3

/dev/sde3:
       Number   Major   Minor   RaidDevice State
this     8       8       67        8      spare   /dev/sde3

    0     0       0        0        0      removed
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8      115        3      active sync   /dev/sdh3
    4     4       0        0        4      faulty removed
    5     5       0        0        5      faulty removed
    6     6       8      147        6      active   /dev/sdj3
    7     7       8      131        7      faulty   /dev/sdi3

/dev/sdf3:
       Number   Major   Minor   RaidDevice State
this     5       8       83        5      active sync   /dev/sdf3

    0     0       8      131        0      active sync   /dev/sdi3
    1     1       8      163        1      active sync
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8      115        3      active sync   /dev/sdh3
    4     4       8       51        4      active sync   /dev/sdd3
    5     5       8       83        5      active sync   /dev/sdf3
    6     6       8      147        6      active   /dev/sdj3
    7     7       8       67        7      spare   /dev/sde3

/dev/sdg3:
       Number   Major   Minor   RaidDevice State
this     3       8      115        3      active sync   /dev/sdh3

    0     0       0        0        0      removed
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8      115        3      active sync   /dev/sdh3
    4     4       0        0        4      faulty removed
    5     5       0        0        5      faulty removed
    6     6       8      147        6      active   /dev/sdj3
    7     7       8      131        7      faulty   /dev/sdi3

/dev/sdh3:
       Number   Major   Minor   RaidDevice State
this     0       8      131        0      active sync   /dev/sdi3

    0     0       8      131        0      active sync   /dev/sdi3
    1     1       8      163        1      active sync
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8      115        3      active sync   /dev/sdh3
    4     4       0        0        4      faulty removed
    5     5       0        0        5      faulty removed
    6     6       8      147        6      active   /dev/sdj3
    7     7       8       67        7      spare   /dev/sde3

/dev/sdi3:
       Number   Major   Minor   RaidDevice State
this     6       8      147        6      active   /dev/sdj3

    0     0       0        0        0      removed
    1     1       0        0        1      faulty removed
    2     2       0        0        2      faulty removed
    3     3       8      115        3      active sync   /dev/sdh3
    4     4       0        0        4      faulty removed
    5     5       0        0        5      faulty removed
    6     6       8      147        6      active   /dev/sdj3
    7     7       8      131        7      faulty   /dev/sdi3

/dev/sdj3:
       Number   Major   Minor   RaidDevice State
this     1       8      163        1      active sync

    0     0       8      131        0      active sync   /dev/sdi3
    1     1       8      163        1      active sync
    2     2       8       35        2      active sync   /dev/sdc3
    3     3       8      115        3      active sync   /dev/sdh3
    4     4       0        0        4      faulty removed
    5     5       0        0        5      faulty removed
    6     6       8      147        6      active   /dev/sdj3
    7     7       8       67        7      spare   /dev/sde3


(for reference)

# l /dev/mapper/
total 0
drwxr-xr-x  2 root root     220 Jan 25 12:34 .
drwxr-xr-x 20 root root    5.5K Jan 25 12:34 ..
crw-------  1 root root 10, 236 Jan 25 12:20 control
lrwxrwxrwx  1 root root       7 Jan 25 12:55 sdc3 -> ../dm-4
lrwxrwxrwx  1 root root       7 Jan 25 12:55 sdd3 -> ../dm-6
lrwxrwxrwx  1 root root       7 Jan 25 12:55 sde3 -> ../dm-5
lrwxrwxrwx  1 root root       7 Jan 25 12:55 sdf3 -> ../dm-7
lrwxrwxrwx  1 root root       7 Jan 25 12:55 sdg3 -> ../dm-2
lrwxrwxrwx  1 root root       7 Jan 25 12:55 sdh3 -> ../dm-3
lrwxrwxrwx  1 root root       7 Jan 25 12:55 sdi3 -> ../dm-0
lrwxrwxrwx  1 root root       7 Jan 25 12:55 sdj3 -> ../dm-1


The event counts of the drives don't look too bad, so I try to assemble 
the array:

# mdadm --assemble /dev/md1 $OVERLAYS --verbose 
--backup-file=raid6.backupfile
mdadm: looking for devices for /dev/md1
mdadm: /dev/mapper/sdc3 is identified as a member of /dev/md1, slot 2.
mdadm: /dev/mapper/sdd3 is identified as a member of /dev/md1, slot 4.
mdadm: /dev/mapper/sde3 is identified as a member of /dev/md1, slot 8.
mdadm: /dev/mapper/sdf3 is identified as a member of /dev/md1, slot 5.
mdadm: /dev/mapper/sdh3 is identified as a member of /dev/md1, slot 0.
mdadm: /dev/mapper/sdi3 is identified as a member of /dev/md1, slot 6.
mdadm: /dev/mapper/sdj3 is identified as a member of /dev/md1, slot 1.
mdadm: /dev/mapper/sdg3 is identified as a member of /dev/md1, slot 3.
mdadm: ignoring /dev/mapper/sdg3 as it reports /dev/mapper/sdc3 as failed
mdadm: ignoring /dev/mapper/sdi3 as it reports /dev/mapper/sdc3 as failed
mdadm: device 16 in /dev/md1 has wrong state in superblock, but 
/dev/mapper/sde3 seems ok
mdadm: /dev/md1 has an active reshape - checking if critical section 
needs to be restored
mdadm: restoring critical section
mdadm: added /dev/mapper/sdj3 to /dev/md1 as 1
mdadm: added /dev/mapper/sdc3 to /dev/md1 as 2
mdadm: no uptodate device for slot 3 of /dev/md1
mdadm: added /dev/mapper/sdd3 to /dev/md1 as 4 (possibly out of date)
mdadm: added /dev/mapper/sdf3 to /dev/md1 as 5 (possibly out of date)
mdadm: no uptodate device for slot 6 of /dev/md1
mdadm: added /dev/mapper/sde3 to /dev/md1 as 8
mdadm: added /dev/mapper/sdh3 to /dev/md1 as 0
mdadm: /dev/md1 assembled from 3 drives and 1 spare - not enough to 
start the array.


That was to be expected; now with --force:

# mdadm --assemble /dev/md1 $OVERLAYS --verbose 
--backup-file=raid6.backupfile --force
mdadm: looking for devices for /dev/md1
mdadm: /dev/mapper/sdc3 is identified as a member of /dev/md1, slot 2.
mdadm: /dev/mapper/sdd3 is identified as a member of /dev/md1, slot 4.
mdadm: /dev/mapper/sde3 is identified as a member of /dev/md1, slot 8.
mdadm: /dev/mapper/sdf3 is identified as a member of /dev/md1, slot 5.
mdadm: /dev/mapper/sdh3 is identified as a member of /dev/md1, slot 0.
mdadm: /dev/mapper/sdi3 is identified as a member of /dev/md1, slot 6.
mdadm: /dev/mapper/sdj3 is identified as a member of /dev/md1, slot 1.
mdadm: /dev/mapper/sdg3 is identified as a member of /dev/md1, slot 3.
mdadm: clearing FAULTY flag for device 2 in /dev/md1 for /dev/mapper/sde3
mdadm: Marking array /dev/md1 as 'clean'
mdadm: /dev/md1 has an active reshape - checking if critical section 
needs to be restored
mdadm: restoring critical section
mdadm: added /dev/mapper/sdj3 to /dev/md1 as 1
mdadm: added /dev/mapper/sdc3 to /dev/md1 as 2
mdadm: added /dev/mapper/sdg3 to /dev/md1 as 3
mdadm: added /dev/mapper/sdd3 to /dev/md1 as 4 (possibly out of date)
mdadm: added /dev/mapper/sdf3 to /dev/md1 as 5 (possibly out of date)
mdadm: added /dev/mapper/sdi3 to /dev/md1 as 6
mdadm: added /dev/mapper/sde3 to /dev/md1 as 8
mdadm: added /dev/mapper/sdh3 to /dev/md1 as 0
mdadm: failed to RUN_ARRAY /dev/md1: Input/output error


In kern.log the following messages appeared:

Jan 25 13:02:51 Oghma kernel: [  765.051249] md: md1 stopped.
Jan 25 13:03:04 Oghma kernel: [  778.562635] md: bind<dm-1>
Jan 25 13:03:04 Oghma kernel: [  778.562780] md: bind<dm-4>
Jan 25 13:03:04 Oghma kernel: [  778.562891] md: bind<dm-2>
Jan 25 13:03:04 Oghma kernel: [  778.562999] md: bind<dm-6>
Jan 25 13:03:04 Oghma kernel: [  778.563104] md: bind<dm-7>
Jan 25 13:03:04 Oghma kernel: [  778.563207] md: bind<dm-0>
Jan 25 13:03:04 Oghma kernel: [  778.563400] md: bind<dm-5>
Jan 25 13:03:04 Oghma kernel: [  778.563577] md: bind<dm-3>
Jan 25 13:03:04 Oghma kernel: [  778.563720] md: kicking non-fresh dm-7 
from array!
Jan 25 13:03:04 Oghma kernel: [  778.563729] md: unbind<dm-7>
Jan 25 13:03:04 Oghma kernel: [  778.577201] md: export_rdev(dm-7)
Jan 25 13:03:04 Oghma kernel: [  778.577213] md: kicking non-fresh dm-6 
from array!
Jan 25 13:03:04 Oghma kernel: [  778.577223] md: unbind<dm-6>
Jan 25 13:03:04 Oghma kernel: [  778.605194] md: export_rdev(dm-6)
Jan 25 13:03:04 Oghma kernel: [  778.607491] md/raid:md1: reshape will 
continue
Jan 25 13:03:04 Oghma kernel: [  778.607541] md/raid:md1: device dm-3 
operational as raid disk 0
Jan 25 13:03:04 Oghma kernel: [  778.607545] md/raid:md1: device dm-2 
operational as raid disk 3
Jan 25 13:03:04 Oghma kernel: [  778.607549] md/raid:md1: device dm-4 
operational as raid disk 2
Jan 25 13:03:04 Oghma kernel: [  778.607551] md/raid:md1: device dm-1 
operational as raid disk 1
Jan 25 13:03:04 Oghma kernel: [  778.608605] md/raid:md1: allocated 7548kB
Jan 25 13:03:04 Oghma kernel: [  778.608733] md/raid:md1: not enough 
operational devices (3/7 failed)
Jan 25 13:03:04 Oghma kernel: [  778.608760] RAID conf printout:
Jan 25 13:03:04 Oghma kernel: [  778.608763]  --- level:6 rd:7 wd:4
Jan 25 13:03:04 Oghma kernel: [  778.608766]  disk 0, o:1, dev:dm-3
Jan 25 13:03:04 Oghma kernel: [  778.608769]  disk 1, o:1, dev:dm-1
Jan 25 13:03:04 Oghma kernel: [  778.608771]  disk 2, o:1, dev:dm-4
Jan 25 13:03:04 Oghma kernel: [  778.608773]  disk 3, o:1, dev:dm-2
Jan 25 13:03:04 Oghma kernel: [  778.608776]  disk 6, o:1, dev:dm-0
Jan 25 13:03:04 Oghma kernel: [  778.609364] md/raid:md1: failed to run 
raid set.
Jan 25 13:03:04 Oghma kernel: [  778.609367] md: pers->run() failed ...
Jan 25 13:03:04 Oghma kernel: [  778.609509] md: md1 stopped.
Jan 25 13:03:04 Oghma kernel: [  778.609519] md: unbind<dm-3>
Jan 25 13:03:04 Oghma kernel: [  778.629256] md: export_rdev(dm-3)
Jan 25 13:03:04 Oghma kernel: [  778.629273] md: unbind<dm-5>
Jan 25 13:03:04 Oghma kernel: [  778.649237] md: export_rdev(dm-5)
Jan 25 13:03:04 Oghma kernel: [  778.649255] md: unbind<dm-0>
Jan 25 13:03:04 Oghma kernel: [  778.665242] md: export_rdev(dm-0)
Jan 25 13:03:04 Oghma kernel: [  778.665259] md: unbind<dm-2>
Jan 25 13:03:04 Oghma kernel: [  778.681241] md: export_rdev(dm-2)
Jan 25 13:03:04 Oghma kernel: [  778.681258] md: unbind<dm-4>
Jan 25 13:03:04 Oghma kernel: [  778.693306] md: export_rdev(dm-4)
Jan 25 13:03:04 Oghma kernel: [  778.693323] md: unbind<dm-1>
Jan 25 13:03:04 Oghma kernel: [  778.705242] md: export_rdev(dm-1)


This seems to be the same problem this guy had 5 years ago 
(https://www.spinics.net/lists/raid/msg37483.html), but he got enough 
disks going to start the array.
What else can I do? This is my last hope :/

kernel: 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017 
x86_64 x86_64 x86_64 GNU/Linux
mdadm: installed was "v3.3 - 3rd September 2013", now updated to "v3.4 - 
28th January 2016"

Thanks in advance!


* Re: drives failed during reshape, array won't even force-assemble
  2017-01-25 13:27 drives failed during reshape, array won't even force-assemble Thomas Warntjen
@ 2017-01-30 18:13 ` Phil Turmel
  2017-01-30 19:57   ` Thomas Warntjen
  0 siblings, 1 reply; 6+ messages in thread
From: Phil Turmel @ 2017-01-30 18:13 UTC (permalink / raw)
  To: Thomas Warntjen, linux-raid

Hi Thomas,

On 01/25/2017 08:27 AM, Thomas Warntjen wrote:
> On my new Ubuntu Server 16.04 LTS server I have an old RAID5 made from
> 5+1 WD Red 3TB drives which I wanted to upgrade first to RAID6 (5+2) and
> then to 6 data disks, so I added 2 new drives and started the reshape:

[trim /]

> This seems to be the same problem this guy had 5 years ago
> (https://www.spinics.net/lists/raid/msg37483.html), but he got enough
> disks going to start the array.
> What else can I do? This is my last hope :/
> 
> kernel: 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017
> x86_64 x86_64 x86_64 GNU/Linux
> mdadm: installed was "v3.3 - 3rd September 2013", now updated to "v3.4 -
> 28th January 2016"
> 
> Thanks in advance!

Did you ever get any help?  Or solve it on your own?  This looks like a
missed mail in the list archives.

Phil



* Re: drives failed during reshape, array won't even force-assemble
  2017-01-30 18:13 ` Phil Turmel
@ 2017-01-30 19:57   ` Thomas Warntjen
  2017-01-31  0:29     ` Phil Turmel
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Warntjen @ 2017-01-30 19:57 UTC (permalink / raw)
  To: Phil Turmel, linux-raid

Hi Phil,

thanks for your reply - sadly it's the first I've received, so no, I 
haven't solved it yet. Any help is still highly appreciated!

Thomas


* Re: drives failed during reshape, array won't even force-assemble
  2017-01-30 19:57   ` Thomas Warntjen
@ 2017-01-31  0:29     ` Phil Turmel
  2017-02-01 18:55       ` Thomas Warntjen
  0 siblings, 1 reply; 6+ messages in thread
From: Phil Turmel @ 2017-01-31  0:29 UTC (permalink / raw)
  To: Thomas Warntjen, linux-raid

On 01/30/2017 02:57 PM, Thomas Warntjen wrote:
> Hi Phil,
> 
> thanks for your reply - sadly it's the first I've received, so no, I
> haven't solved it yet. Any help is still highly appreciated!

Ok.

I'm a bit surprised forced assembly didn't work.  Please provide fresh
mdadm --examine output for all member devices (untrimmed), plus the
output from "ls -l /dev/disk/by-id/ata-*".

That'll help.  Please paste inline and turn off line wrap, so it all
comes through neatly.
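
(For anyone following along, that output can be collected in one paste 
with something like the following; the sd[c-j]3 glob is an assumption 
based on the earlier listing:)

for d in /dev/sd[c-j]3; do echo "=== $d ==="; mdadm --examine $d; done
ls -l /dev/disk/by-id/ata-*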

Phil



* Re: drives failed during reshape, array won't even force-assemble
  2017-01-31  0:29     ` Phil Turmel
@ 2017-02-01 18:55       ` Thomas Warntjen
  2017-02-04  0:52         ` Weedy
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Warntjen @ 2017-02-01 18:55 UTC (permalink / raw)
  To: Phil Turmel, linux-raid

Holy cow, I poked it with a stick and I think I did it!

As I wrote before, after a reboot the array was there but didn't start, 
and I noticed the same thing happened with the overlay files right 
after I created them:

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] 
[raid0] [raid10]
md1 : inactive dm-0[8](S) dm-1[6](S) dm-7[4](S) dm-6[2](S) dm-5[0](S) 
dm-3[1](S) dm-4[5](S) dm-2[3](S)
       23429580800 blocks super 0.91

# mdadm --detail /dev/md1
/dev/md1:
         Version : 0.91
      Raid Level : raid0
   Total Devices : 8
Preferred Minor : 0
     Persistence : Superblock is persistent

           State : inactive

       New Level : raid6
      New Layout : left-symmetric
   New Chunksize : 64K

            UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
          Events : 0.12370980

     Number   Major   Minor   RaidDevice

        -     252        0        -        /dev/dm-0
        -     252        1        -        /dev/dm-1
        -     252        2        -        /dev/dm-2
        -     252        3        -        /dev/dm-3
        -     252        4        -        /dev/dm-4
        -     252        5        -        /dev/dm-5
        -     252        6        -        /dev/dm-6
        -     252        7        -        /dev/dm-7

	
Now I tried

# mdadm --run /dev/md1
mdadm: failed to start array /dev/md1: Input/output error


and something interesting happened:

# mdadm --detail /dev/md1
/dev/md1:
         Version : 0.91
   Creation Time : Thu Sep  1 22:23:00 2011
      Raid Level : raid6
   Used Dev Size : 18446744073709551615
    Raid Devices : 7
   Total Devices : 6
Preferred Minor : 1
     Persistence : Superblock is persistent

     Update Time : Tue Jan 24 21:10:19 2017
           State : active, FAILED, Not Started
  Active Devices : 4
Working Devices : 6
  Failed Devices : 0
   Spare Devices : 2

          Layout : left-symmetric-6
      Chunk Size : 64K

      New Layout : left-symmetric

            UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
          Events : 0.12370980

     Number   Major   Minor   RaidDevice State
        0     252        5        0      active sync   /dev/dm-5
        1     252        3        1      active sync   /dev/dm-3
        2     252        6        2      active sync   /dev/dm-6
        3     252        2        3      active sync   /dev/dm-2
        -       0        0        4      removed
        -       0        0        5      removed
        6     252        1        6      spare rebuilding   /dev/dm-1

        8     252        0        -      spare   /dev/dm-0
	
	
Let's try to add the missing drives:

# mdadm --manage /dev/md1 --add /dev/mapper/sdc3
mdadm: re-added /dev/mapper/sdc3
	
# mdadm --manage /dev/md1 --add /dev/mapper/sdd3
mdadm: re-added /dev/mapper/sdd3
	
# mdadm --detail /dev/md1
/dev/md1:
         Version : 0.91
   Creation Time : Thu Sep  1 22:23:00 2011
      Raid Level : raid6
   Used Dev Size : 18446744073709551615
    Raid Devices : 7
   Total Devices : 8
Preferred Minor : 1
     Persistence : Superblock is persistent

     Update Time : Tue Jan 24 21:10:19 2017
           State : active, degraded, Not Started
  Active Devices : 6
Working Devices : 8
  Failed Devices : 0
   Spare Devices : 2

          Layout : left-symmetric-6
      Chunk Size : 64K

      New Layout : left-symmetric

            UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
          Events : 0.12370980

     Number   Major   Minor   RaidDevice State
        0     252        5        0      active sync   /dev/dm-5
        1     252        3        1      active sync   /dev/dm-3
        2     252        6        2      active sync   /dev/dm-6
        3     252        2        3      active sync   /dev/dm-2
        4     252        7        4      active sync   /dev/dm-7
        5     252        4        5      active sync   /dev/dm-4
        6     252        1        6      spare rebuilding   /dev/dm-1

        8     252        0        -      spare   /dev/dm-0
	

Not bad at all! But it still won't start, even with --run. Maybe it 
would if I waited long enough for the rebuild to finish? But I still 
don't see it in /proc/mdstat, and I don't want to wait several days to 
see if it really rebuilds in the background.
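
(For reference, the sysfs attributes poked below can also be read, which 
helps to see what md currently thinks of the array and its members; 
these are standard md attributes, the output is not from the original 
session:)

# cat /sys/block/md1/md/array_state
# cat /sys/block/md1/md/sync_action
# grep . /sys/block/md1/md/dev-*/state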

So I poke it with a stick...

# echo "clean" > /sys/block/md1/md/array_state
-bash: echo: write error: Invalid argument	

nope

# echo "active" > /sys/block/md1/md/array_state
-bash: echo: write error: Invalid argument	

nope

# echo "readonly" > /sys/block/md1/md/array_state

wait, no error?

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] 
[raid0] [raid10]
md1 : active (read-only) raid6 dm-0[5] dm-2[4] dm-7[6] dm-6[3] dm-4[0] 
dm-1[2] dm-5[1] dm-3[8](S)
       14643488000 blocks super 0.91 level 6, 64k chunk, algorithm 18 
[7/6] [UUUUUU_]
       resync=PENDING
       bitmap: 175/175 pages [700KB], 8192KB chunk

# mdadm --detail /dev/md1
/dev/md1:
         Version : 0.91
   Creation Time : Thu Sep  1 22:23:00 2011
      Raid Level : raid6
      Array Size : 14643488000 (13965.12 GiB 14994.93 GB)
   Used Dev Size : 18446744073709551615
    Raid Devices : 7
   Total Devices : 8
Preferred Minor : 1
     Persistence : Superblock is persistent

   Intent Bitmap : Internal

     Update Time : Tue Jan 24 21:10:19 2017
           State : clean, degraded, resyncing (PENDING)
  Active Devices : 6
Working Devices : 8
  Failed Devices : 0
   Spare Devices : 2

          Layout : left-symmetric-6
      Chunk Size : 64K

      New Layout : left-symmetric

            UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
          Events : 0.12370980

     Number   Major   Minor   RaidDevice State
        0     252        4        0      active sync   /dev/dm-4
        1     252        5        1      active sync   /dev/dm-5
        2     252        1        2      active sync   /dev/dm-1
        3     252        6        3      active sync   /dev/dm-6
        4     252        2        4      active sync   /dev/dm-2
        5     252        0        5      active sync   /dev/dm-0
        6     252        7        6      spare rebuilding   /dev/dm-7

        8     252        3        -      spare   /dev/dm-3


still no error
	
# echo "clean" > /sys/block/md1/md/array_state

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] 
[raid0] [raid10]
md1 : active raid6 raid6 dm-0[5] dm-2[4] dm-7[6] dm-6[3] dm-4[0] dm-1[2] 
dm-5[1] dm-3[8](S)
       14643488000 blocks super 0.91 level 6, 64k chunk, algorithm 18 
[7/6] [UUUUUU_]
       [==============>......]  reshape = 74.6% (2185464448/2928697600) 
finish=7719.3min speed=1603K/sec
       bitmap: 175/175 pages [700KB], 8192KB chunk
       14643488000 blocks super 0.91 level 6, 64k chunk, algorithm 18 
[7/6] [UUUUUU_]
       resync=PENDING
       bitmap: 175/175 pages [700KB], 8192KB chunk

# mdadm --detail /dev/md1
/dev/md1:
         Version : 0.91
   Creation Time : Thu Sep  1 22:23:00 2011
      Raid Level : raid6
      Array Size : 14643488000 (13965.12 GiB 14994.93 GB)
   Used Dev Size : 18446744073709551615
    Raid Devices : 7
   Total Devices : 8
Preferred Minor : 1
     Persistence : Superblock is persistent

   Intent Bitmap : Internal

     Update Time : Tue Jan 31 20:09:30 2017
           State : clean, degraded, reshaping
  Active Devices : 6
Working Devices : 8
  Failed Devices : 0
   Spare Devices : 2

          Layout : left-symmetric-6
      Chunk Size : 64K

  Reshape Status : 74% complete
      New Layout : left-symmetric

            UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
          Events : 0.12370982

     Number   Major   Minor   RaidDevice State
        0     252        4        0      active sync   /dev/dm-4
        1     252        5        1      active sync   /dev/dm-5
        2     252        1        2      active sync   /dev/dm-1
        3     252        6        3      active sync   /dev/dm-6
        4     252        2        4      active sync   /dev/dm-2
        5     252        0        5      active sync   /dev/dm-0
        6     252        7        6      spare rebuilding   /dev/dm-7

        8     252        3        -      spare   /dev/dm-3
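
(As an aside, mdadm can do the same read-only/read-write toggling 
directly, which may be less fiddly than the sysfs writes - a note, not 
something from the original session:)

# mdadm --readonly /dev/md1      # roughly what echoing "readonly" into array_state did
# mdadm --readwrite /dev/md1     # lets a pending resync/reshape proceed again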

	
Looks good! fsck shows no errors and nothing in lost+found, so I 
stopped the reshape (so the overlays won't fill the disk), mounted it 
read-only and backed up the more important data. That finished today, 
so I rebooted and did it for real. The reshape is finished, resync is at 
24% (6 hours to go), and fsck still looks good. w00t!
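
(The steps described here map onto something like the following; the 
freeze via sync_action, the mount point and the rsync target are 
assumptions, not a record of the exact commands:)

# echo frozen > /sys/block/md1/md/sync_action    # pause the reshape so the overlays stop growing
# mount -o ro /dev/md1 /mnt/md1-recovery
# rsync -a /mnt/md1-recovery/important/ /backup/important/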

	


* Re: drives failed during reshape, array won't even force-assemble
  2017-02-01 18:55       ` Thomas Warntjen
@ 2017-02-04  0:52         ` Weedy
  0 siblings, 0 replies; 6+ messages in thread
From: Weedy @ 2017-02-04  0:52 UTC (permalink / raw)
  To: Thomas Warntjen; +Cc: Phil Turmel, Linux RAID

On 1 February 2017 at 13:55, Thomas Warntjen <thomas@warntjen.net> wrote:
> Holy cow, I poked it with a stick and I think I did it!
>

I really hate these "wait, why the F did that fix it?" solutions. 
You're always left feeling like you haven't learned anything after all 
your work.

Congrats you have your data back :)

