* drives failed during reshape, array won't even force-assemble
@ 2017-01-25 13:27 Thomas Warntjen
2017-01-30 18:13 ` Phil Turmel
0 siblings, 1 reply; 6+ messages in thread
From: Thomas Warntjen @ 2017-01-25 13:27 UTC (permalink / raw)
To: linux-raid
On my new Ubuntu Server 16.04 LTS machine I have an old RAID5 made from
5+1 WD Red 3TB drives which I wanted to upgrade first to RAID6 (5+2) and
then to 6 data disks, so I added 2 new drives and started the reshape:
# mdadm /dev/md1 --grow --level=6 --backup-file=/root/raid6.backupfile
When the reshape was at ~70% some wonky cabling caused some of the
drives to fail temporarily (I heard the drives spin down after I
accidentally touched the cable - SMART says the disks are ok and another
array on those disks starts just fine).
After a reboot, the array won't start, marking all the drives as spares
(md1):
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath]
[raid0] [raid10]
md1 : inactive sdg3[3](S) sdj3[1](S) sdi3[6](S) sdh3[0](S) sdc3[2](S)
sdd3[4](S) sdf3[5](S) sde3[8](S)
23429580800 blocks super 0.91
md127 : active (auto-read-only) raid6 sdj1[7] sdi1[4] sdg1[2] sdh1[6]
sdc1[0] sdf1[1] sde1[5] sdd1[3]
6346752 blocks super 1.2 level 6, 512k chunk, algorithm 2 [8/8]
[UUUUUUUU]
md0 : active raid1 sdb1[2] sda1[1]
240022528 blocks super 1.2 [2/2] [UU]
bitmap: 1/2 pages [4KB], 65536KB chunk
# mdadm --detail /dev/md1
/dev/md1:
Version : 0.91
Raid Level : raid0
Total Devices : 8
Preferred Minor : 0
Persistence : Superblock is persistent
State : inactive
New Level : raid6
New Layout : left-symmetric
New Chunksize : 64K
UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
Events : 0.12370980
Number Major Minor RaidDevice
- 8 35 - /dev/sdc3
- 8 51 - /dev/sdd3
- 8 67 - /dev/sde3
- 8 83 - /dev/sdf3
- 8 99 - /dev/sdg3
- 8 115 - /dev/sdh3
- 8 131 - /dev/sdi3
- 8 147 - /dev/sdj3
Since that was the second time the reshape was interrupted (the first
time was an intentional reboot) I thought I knew what I was doing and
stopped and force-assembled the array. That didn't work and probably
borked it some more...
So according to the RAID-Wiki
(https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID)
I stopped the array and created overlay files (and copied the backup-file).
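For readers following along: the overlay setup from the wiki boils down to one dm snapshot per member device, backed by a sparse file, so all writes land in the overlay and the real members stay untouched. A minimal sketch of that recipe (the function name, overlay size and /tmp paths are my own choices, not the wiki's exact commands):

```shell
# make_overlay DEV: create a dm snapshot overlay for one member so that
# recovery experiments never write to the real partition.
# Illustrative sketch of the wiki recipe; size/paths are assumptions.
make_overlay() {
  d=$1
  base=$(basename "$d")
  size=$(blockdev --getsz "$d")            # device length in 512-byte sectors
  truncate -s4G "/tmp/overlay-$base"       # sparse file absorbs all writes
  loop=$(losetup -f --show "/tmp/overlay-$base")
  # dm table: <start> <length> snapshot <origin> <COW dev> P <chunksize>
  echo "0 $size snapshot $d $loop P 8" | dmsetup create "$base"
}

# usage (as root): for d in /dev/sd[c-j]3; do make_overlay "$d"; done
#                  OVERLAYS=$(ls /dev/mapper/sd?3)
```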
mdadm -E tells me that sdd and sdf were probably the failing drives:
# parallel --tag -k mdadm -E ::: $OVERLAYS|grep -E 'Update'
/dev/mapper/sdc3 Update Time : Tue Jan 24 21:03:00 2017
/dev/mapper/sdd3 Update Time : Tue Jan 24 21:02:49 2017
/dev/mapper/sde3 Update Time : Tue Jan 24 21:10:19 2017
/dev/mapper/sdf3 Update Time : Tue Jan 24 21:02:49 2017
/dev/mapper/sdh3 Update Time : Tue Jan 24 21:03:00 2017
/dev/mapper/sdi3 Update Time : Tue Jan 24 21:10:19 2017
/dev/mapper/sdj3 Update Time : Tue Jan 24 21:03:00 2017
/dev/mapper/sdg3 Update Time : Tue Jan 24 21:10:19 2017
# parallel --tag -k mdadm -E ::: $OVERLAYS|grep -E 'Events'
/dev/mapper/sdc3 Events : 12370980
/dev/mapper/sdd3 Events : 12370974
/dev/mapper/sde3 Events : 12370980
/dev/mapper/sdf3 Events : 12370974
/dev/mapper/sdh3 Events : 12370980
/dev/mapper/sdi3 Events : 12370980
/dev/mapper/sdj3 Events : 12370980
/dev/mapper/sdg3 Events : 12370980
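To quantify the gap, the event counts can be boiled down to their extremes; a gap of 6 events like this one is exactly the "slightly stale members" case that --force is meant to handle. A small helper (the function name is mine):

```shell
# min_max_events: read `mdadm -E` output on stdin and print the lowest
# and highest event counts seen. Helper name is my own invention.
min_max_events() {
  awk '/Events/ {print $NF}' | sort -n | sed -n '1p;$p'
}

# usage: parallel --tag -k mdadm -E ::: $OVERLAYS | min_max_events
```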
Obviously the disks have diverging ideas about the health of the array
and, interestingly, also about their own identity:
/dev/sdc3:
Number Major Minor RaidDevice State
this 2 8 35 2 active sync /dev/sdc3
0 0 8 131 0 active sync /dev/sdi3
1 1 8 163 1 active sync
2 2 8 35 2 active sync /dev/sdc3
3 3 8 115 3 active sync /dev/sdh3
4 4 0 0 4 faulty removed
5 5 0 0 5 faulty removed
6 6 8 147 6 active /dev/sdj3
7 7 8 67 7 spare /dev/sde3
/dev/sdd3:
Number Major Minor RaidDevice State
this 4 8 51 4 active sync /dev/sdd3
0 0 8 131 0 active sync /dev/sdi3
1 1 8 163 1 active sync
2 2 8 35 2 active sync /dev/sdc3
3 3 8 115 3 active sync /dev/sdh3
4 4 8 51 4 active sync /dev/sdd3
5 5 8 83 5 active sync /dev/sdf3
6 6 8 147 6 active /dev/sdj3
7 7 8 67 7 spare /dev/sde3
/dev/sde3:
Number Major Minor RaidDevice State
this 8 8 67 8 spare /dev/sde3
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 115 3 active sync /dev/sdh3
4 4 0 0 4 faulty removed
5 5 0 0 5 faulty removed
6 6 8 147 6 active /dev/sdj3
7 7 8 131 7 faulty /dev/sdi3
/dev/sdf3:
Number Major Minor RaidDevice State
this 5 8 83 5 active sync /dev/sdf3
0 0 8 131 0 active sync /dev/sdi3
1 1 8 163 1 active sync
2 2 8 35 2 active sync /dev/sdc3
3 3 8 115 3 active sync /dev/sdh3
4 4 8 51 4 active sync /dev/sdd3
5 5 8 83 5 active sync /dev/sdf3
6 6 8 147 6 active /dev/sdj3
7 7 8 67 7 spare /dev/sde3
/dev/sdg3:
Number Major Minor RaidDevice State
this 3 8 115 3 active sync /dev/sdh3
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 115 3 active sync /dev/sdh3
4 4 0 0 4 faulty removed
5 5 0 0 5 faulty removed
6 6 8 147 6 active /dev/sdj3
7 7 8 131 7 faulty /dev/sdi3
/dev/sdh3:
Number Major Minor RaidDevice State
this 0 8 131 0 active sync /dev/sdi3
0 0 8 131 0 active sync /dev/sdi3
1 1 8 163 1 active sync
2 2 8 35 2 active sync /dev/sdc3
3 3 8 115 3 active sync /dev/sdh3
4 4 0 0 4 faulty removed
5 5 0 0 5 faulty removed
6 6 8 147 6 active /dev/sdj3
7 7 8 67 7 spare /dev/sde3
/dev/sdi3:
Number Major Minor RaidDevice State
this 6 8 147 6 active /dev/sdj3
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 115 3 active sync /dev/sdh3
4 4 0 0 4 faulty removed
5 5 0 0 5 faulty removed
6 6 8 147 6 active /dev/sdj3
7 7 8 131 7 faulty /dev/sdi3
/dev/sdj3:
Number Major Minor RaidDevice State
this 1 8 163 1 active sync
0 0 8 131 0 active sync /dev/sdi3
1 1 8 163 1 active sync
2 2 8 35 2 active sync /dev/sdc3
3 3 8 115 3 active sync /dev/sdh3
4 4 0 0 4 faulty removed
5 5 0 0 5 faulty removed
6 6 8 147 6 active /dev/sdj3
7 7 8 67 7 spare /dev/sde3
(for reference)
# ls -l /dev/mapper/
total 0
drwxr-xr-x 2 root root 220 Jan 25 12:34 .
drwxr-xr-x 20 root root 5.5K Jan 25 12:34 ..
crw------- 1 root root 10, 236 Jan 25 12:20 control
lrwxrwxrwx 1 root root 7 Jan 25 12:55 sdc3 -> ../dm-4
lrwxrwxrwx 1 root root 7 Jan 25 12:55 sdd3 -> ../dm-6
lrwxrwxrwx 1 root root 7 Jan 25 12:55 sde3 -> ../dm-5
lrwxrwxrwx 1 root root 7 Jan 25 12:55 sdf3 -> ../dm-7
lrwxrwxrwx 1 root root 7 Jan 25 12:55 sdg3 -> ../dm-2
lrwxrwxrwx 1 root root 7 Jan 25 12:55 sdh3 -> ../dm-3
lrwxrwxrwx 1 root root 7 Jan 25 12:55 sdi3 -> ../dm-0
lrwxrwxrwx 1 root root 7 Jan 25 12:55 sdj3 -> ../dm-1
The event counts of the drives don't look too bad, so I try to assemble
the array:
# mdadm --assemble /dev/md1 $OVERLAYS --verbose
--backup-file=raid6.backupfile
mdadm: looking for devices for /dev/md1
mdadm: /dev/mapper/sdc3 is identified as a member of /dev/md1, slot 2.
mdadm: /dev/mapper/sdd3 is identified as a member of /dev/md1, slot 4.
mdadm: /dev/mapper/sde3 is identified as a member of /dev/md1, slot 8.
mdadm: /dev/mapper/sdf3 is identified as a member of /dev/md1, slot 5.
mdadm: /dev/mapper/sdh3 is identified as a member of /dev/md1, slot 0.
mdadm: /dev/mapper/sdi3 is identified as a member of /dev/md1, slot 6.
mdadm: /dev/mapper/sdj3 is identified as a member of /dev/md1, slot 1.
mdadm: /dev/mapper/sdg3 is identified as a member of /dev/md1, slot 3.
mdadm: ignoring /dev/mapper/sdg3 as it reports /dev/mapper/sdc3 as failed
mdadm: ignoring /dev/mapper/sdi3 as it reports /dev/mapper/sdc3 as failed
mdadm: device 16 in /dev/md1 has wrong state in superblock, but
/dev/mapper/sde3 seems ok
mdadm: /dev/md1 has an active reshape - checking if critical section
needs to be restored
mdadm: restoring critical section
mdadm: added /dev/mapper/sdj3 to /dev/md1 as 1
mdadm: added /dev/mapper/sdc3 to /dev/md1 as 2
mdadm: no uptodate device for slot 3 of /dev/md1
mdadm: added /dev/mapper/sdd3 to /dev/md1 as 4 (possibly out of date)
mdadm: added /dev/mapper/sdf3 to /dev/md1 as 5 (possibly out of date)
mdadm: no uptodate device for slot 6 of /dev/md1
mdadm: added /dev/mapper/sde3 to /dev/md1 as 8
mdadm: added /dev/mapper/sdh3 to /dev/md1 as 0
mdadm: /dev/md1 assembled from 3 drives and 1 spare - not enough to
start the array.
That was to be expected; now with --force:
# mdadm --assemble /dev/md1 $OVERLAYS --verbose
--backup-file=raid6.backupfile --force
mdadm: looking for devices for /dev/md1
mdadm: /dev/mapper/sdc3 is identified as a member of /dev/md1, slot 2.
mdadm: /dev/mapper/sdd3 is identified as a member of /dev/md1, slot 4.
mdadm: /dev/mapper/sde3 is identified as a member of /dev/md1, slot 8.
mdadm: /dev/mapper/sdf3 is identified as a member of /dev/md1, slot 5.
mdadm: /dev/mapper/sdh3 is identified as a member of /dev/md1, slot 0.
mdadm: /dev/mapper/sdi3 is identified as a member of /dev/md1, slot 6.
mdadm: /dev/mapper/sdj3 is identified as a member of /dev/md1, slot 1.
mdadm: /dev/mapper/sdg3 is identified as a member of /dev/md1, slot 3.
mdadm: clearing FAULTY flag for device 2 in /dev/md1 for /dev/mapper/sde3
mdadm: Marking array /dev/md1 as 'clean'
mdadm: /dev/md1 has an active reshape - checking if critical section
needs to be restored
mdadm: restoring critical section
mdadm: added /dev/mapper/sdj3 to /dev/md1 as 1
mdadm: added /dev/mapper/sdc3 to /dev/md1 as 2
mdadm: added /dev/mapper/sdg3 to /dev/md1 as 3
mdadm: added /dev/mapper/sdd3 to /dev/md1 as 4 (possibly out of date)
mdadm: added /dev/mapper/sdf3 to /dev/md1 as 5 (possibly out of date)
mdadm: added /dev/mapper/sdi3 to /dev/md1 as 6
mdadm: added /dev/mapper/sde3 to /dev/md1 as 8
mdadm: added /dev/mapper/sdh3 to /dev/md1 as 0
mdadm: failed to RUN_ARRAY /dev/md1: Input/output error
In kern.log the following messages appeared:
Jan 25 13:02:51 Oghma kernel: [ 765.051249] md: md1 stopped.
Jan 25 13:03:04 Oghma kernel: [ 778.562635] md: bind<dm-1>
Jan 25 13:03:04 Oghma kernel: [ 778.562780] md: bind<dm-4>
Jan 25 13:03:04 Oghma kernel: [ 778.562891] md: bind<dm-2>
Jan 25 13:03:04 Oghma kernel: [ 778.562999] md: bind<dm-6>
Jan 25 13:03:04 Oghma kernel: [ 778.563104] md: bind<dm-7>
Jan 25 13:03:04 Oghma kernel: [ 778.563207] md: bind<dm-0>
Jan 25 13:03:04 Oghma kernel: [ 778.563400] md: bind<dm-5>
Jan 25 13:03:04 Oghma kernel: [ 778.563577] md: bind<dm-3>
Jan 25 13:03:04 Oghma kernel: [ 778.563720] md: kicking non-fresh dm-7
from array!
Jan 25 13:03:04 Oghma kernel: [ 778.563729] md: unbind<dm-7>
Jan 25 13:03:04 Oghma kernel: [ 778.577201] md: export_rdev(dm-7)
Jan 25 13:03:04 Oghma kernel: [ 778.577213] md: kicking non-fresh dm-6
from array!
Jan 25 13:03:04 Oghma kernel: [ 778.577223] md: unbind<dm-6>
Jan 25 13:03:04 Oghma kernel: [ 778.605194] md: export_rdev(dm-6)
Jan 25 13:03:04 Oghma kernel: [ 778.607491] md/raid:md1: reshape will
continue
Jan 25 13:03:04 Oghma kernel: [ 778.607541] md/raid:md1: device dm-3
operational as raid disk 0
Jan 25 13:03:04 Oghma kernel: [ 778.607545] md/raid:md1: device dm-2
operational as raid disk 3
Jan 25 13:03:04 Oghma kernel: [ 778.607549] md/raid:md1: device dm-4
operational as raid disk 2
Jan 25 13:03:04 Oghma kernel: [ 778.607551] md/raid:md1: device dm-1
operational as raid disk 1
Jan 25 13:03:04 Oghma kernel: [ 778.608605] md/raid:md1: allocated 7548kB
Jan 25 13:03:04 Oghma kernel: [ 778.608733] md/raid:md1: not enough
operational devices (3/7 failed)
Jan 25 13:03:04 Oghma kernel: [ 778.608760] RAID conf printout:
Jan 25 13:03:04 Oghma kernel: [ 778.608763] --- level:6 rd:7 wd:4
Jan 25 13:03:04 Oghma kernel: [ 778.608766] disk 0, o:1, dev:dm-3
Jan 25 13:03:04 Oghma kernel: [ 778.608769] disk 1, o:1, dev:dm-1
Jan 25 13:03:04 Oghma kernel: [ 778.608771] disk 2, o:1, dev:dm-4
Jan 25 13:03:04 Oghma kernel: [ 778.608773] disk 3, o:1, dev:dm-2
Jan 25 13:03:04 Oghma kernel: [ 778.608776] disk 6, o:1, dev:dm-0
Jan 25 13:03:04 Oghma kernel: [ 778.609364] md/raid:md1: failed to run
raid set.
Jan 25 13:03:04 Oghma kernel: [ 778.609367] md: pers->run() failed ...
Jan 25 13:03:04 Oghma kernel: [ 778.609509] md: md1 stopped.
Jan 25 13:03:04 Oghma kernel: [ 778.609519] md: unbind<dm-3>
Jan 25 13:03:04 Oghma kernel: [ 778.629256] md: export_rdev(dm-3)
Jan 25 13:03:04 Oghma kernel: [ 778.629273] md: unbind<dm-5>
Jan 25 13:03:04 Oghma kernel: [ 778.649237] md: export_rdev(dm-5)
Jan 25 13:03:04 Oghma kernel: [ 778.649255] md: unbind<dm-0>
Jan 25 13:03:04 Oghma kernel: [ 778.665242] md: export_rdev(dm-0)
Jan 25 13:03:04 Oghma kernel: [ 778.665259] md: unbind<dm-2>
Jan 25 13:03:04 Oghma kernel: [ 778.681241] md: export_rdev(dm-2)
Jan 25 13:03:04 Oghma kernel: [ 778.681258] md: unbind<dm-4>
Jan 25 13:03:04 Oghma kernel: [ 778.693306] md: export_rdev(dm-4)
Jan 25 13:03:04 Oghma kernel: [ 778.693323] md: unbind<dm-1>
Jan 25 13:03:04 Oghma kernel: [ 778.705242] md: export_rdev(dm-1)
This seems to be the same problem this guy had 5 years ago
(https://www.spinics.net/lists/raid/msg37483.html), but he got enough
disks going to start the array.
What else is there I can do? This is my last hope :/
kernel: 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux
mdadm: installed was "v3.3 - 3rd September 2013", now updated to "v3.4 -
28th January 2016"
Thanks in advance!
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: drives failed during reshape, array won't even force-assemble
2017-01-25 13:27 drives failed during reshape, array won't even force-assemble Thomas Warntjen
@ 2017-01-30 18:13 ` Phil Turmel
2017-01-30 19:57 ` Thomas Warntjen
From: Phil Turmel @ 2017-01-30 18:13 UTC (permalink / raw)
To: Thomas Warntjen, linux-raid
Hi Thomas,
On 01/25/2017 08:27 AM, Thomas Warntjen wrote:
> On my new Ubuntu Server 16.04 LTS machine I have an old RAID5 made from
> 5+1 WD Red 3TB drives which I wanted to upgrade first to RAID6 (5+2) and
> then to 6 data disks, so I added 2 new drives and started the reshape:
[trim /]
> This seems to be the same problem this guy had 5 years ago
> https://www.spinics.net/lists/raid/msg37483.html but he got enough disks
> going to start the array.
> What else is there I can do? This is my last hope :/
>
> kernel: 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017
> x86_64 x86_64 x86_64 GNU/Linux
> mdadm: installed was "v3.3 - 3rd September 2013", now updated to "v3.4 -
> 28th January 2016"
>
> Thanks in advance!
Did you ever get any help? Or solve it on your own? This looks like a
missed mail in the list archives.
Phil
* Re: drives failed during reshape, array won't even force-assemble
2017-01-30 18:13 ` Phil Turmel
@ 2017-01-30 19:57 ` Thomas Warntjen
2017-01-31 0:29 ` Phil Turmel
From: Thomas Warntjen @ 2017-01-30 19:57 UTC (permalink / raw)
To: Phil Turmel, linux-raid
Hi Phil,
thanks for your reply - sadly it's the first I've got, so no, I haven't
solved it yet. Any help is still highly appreciated!
Thomas
* Re: drives failed during reshape, array won't even force-assemble
2017-01-30 19:57 ` Thomas Warntjen
@ 2017-01-31 0:29 ` Phil Turmel
2017-02-01 18:55 ` Thomas Warntjen
From: Phil Turmel @ 2017-01-31 0:29 UTC (permalink / raw)
To: Thomas Warntjen, linux-raid
On 01/30/2017 02:57 PM, Thomas Warntjen wrote:
> Hi Phil,
>
> thanks for your reply - sadly it's the first I got so no, I haven't
> solved it yet. Any help is still highly appreciated!
Ok.
I'm a bit surprised forced assembly didn't work. Please provide fresh
mdadm --examine output for all member devices (untrimmed), plus the
output from "ls -l /dev/disk/by-id/ata-*".
That'll help. Please paste inline and turn off line wrap, so it all
comes through neatly.
Phil
* Re: drives failed during reshape, array won't even force-assemble
2017-01-31 0:29 ` Phil Turmel
@ 2017-02-01 18:55 ` Thomas Warntjen
2017-02-04 0:52 ` Weedy
From: Thomas Warntjen @ 2017-02-01 18:55 UTC (permalink / raw)
To: Phil Turmel, linux-raid
Holy cow, I poked it with a stick and I think I did it!
As I wrote before, after a reboot the array was there but didn't start,
and I noticed the same thing happened with the overlay files right after
I created them:
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath]
[raid0] [raid10]
md1 : inactive dm-0[8](S) dm-1[6](S) dm-7[4](S) dm-6[2](S) dm-5[0](S)
dm-3[1](S) dm-4[5](S) dm-2[3](S)
23429580800 blocks super 0.91
# mdadm --detail /dev/md1
/dev/md1:
Version : 0.91
Raid Level : raid0
Total Devices : 8
Preferred Minor : 0
Persistence : Superblock is persistent
State : inactive
New Level : raid6
New Layout : left-symmetric
New Chunksize : 64K
UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
Events : 0.12370980
Number Major Minor RaidDevice
- 252 0 - /dev/dm-0
- 252 1 - /dev/dm-1
- 252 2 - /dev/dm-2
- 252 3 - /dev/dm-3
- 252 4 - /dev/dm-4
- 252 5 - /dev/dm-5
- 252 6 - /dev/dm-6
- 252 7 - /dev/dm-7
Now I tried
# mdadm --run /dev/md1
mdadm: failed to start array /dev/md1: Input/output error
and something interesting happened:
# mdadm --detail /dev/md1
/dev/md1:
Version : 0.91
Creation Time : Thu Sep 1 22:23:00 2011
Raid Level : raid6
Used Dev Size : 18446744073709551615
Raid Devices : 7
Total Devices : 6
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Tue Jan 24 21:10:19 2017
State : active, FAILED, Not Started
Active Devices : 4
Working Devices : 6
Failed Devices : 0
Spare Devices : 2
Layout : left-symmetric-6
Chunk Size : 64K
New Layout : left-symmetric
UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
Events : 0.12370980
Number Major Minor RaidDevice State
0 252 5 0 active sync /dev/dm-5
1 252 3 1 active sync /dev/dm-3
2 252 6 2 active sync /dev/dm-6
3 252 2 3 active sync /dev/dm-2
- 0 0 4 removed
- 0 0 5 removed
6 252 1 6 spare rebuilding /dev/dm-1
8 252 0 - spare /dev/dm-0
Let's try to add the missing drives:
# mdadm --manage /dev/md1 --add /dev/mapper/sdc3
mdadm: re-added /dev/mapper/sdc3
# mdadm --manage /dev/md1 --add /dev/mapper/sdd3
mdadm: re-added /dev/mapper/sdd3
# mdadm --detail /dev/md1
/dev/md1:
Version : 0.91
Creation Time : Thu Sep 1 22:23:00 2011
Raid Level : raid6
Used Dev Size : 18446744073709551615
Raid Devices : 7
Total Devices : 8
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Tue Jan 24 21:10:19 2017
State : active, degraded, Not Started
Active Devices : 6
Working Devices : 8
Failed Devices : 0
Spare Devices : 2
Layout : left-symmetric-6
Chunk Size : 64K
New Layout : left-symmetric
UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
Events : 0.12370980
Number Major Minor RaidDevice State
0 252 5 0 active sync /dev/dm-5
1 252 3 1 active sync /dev/dm-3
2 252 6 2 active sync /dev/dm-6
3 252 2 3 active sync /dev/dm-2
4 252 7 4 active sync /dev/dm-7
5 252 4 5 active sync /dev/dm-4
6 252 1 6 spare rebuilding /dev/dm-1
8 252 0 - spare /dev/dm-0
Not bad at all! But it still won't start, even with --run. Maybe if I
wait long enough for the rebuild to finish? But I still don't see it in
/proc/mdstat and I don't want to wait for several days to see if it
really rebuilds in the background.
So I poke it with a stick...
# echo "clean" > /sys/block/md1/md/array_state
-bash: echo: write error: Invalid argument
nope
# echo "active" > /sys/block/md1/md/array_state
-bash: echo: write error: Invalid argument
nope
# echo "readonly" > /sys/block/md1/md/array_state
wait, no error?
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath]
[raid0] [raid10]
md1 : active (read-only) raid6 dm-0[5] dm-2[4] dm-7[6] dm-6[3] dm-4[0]
dm-1[2] dm-5[1] dm-3[8](S)
14643488000 blocks super 0.91 level 6, 64k chunk, algorithm 18
[7/6] [UUUUUU_]
resync=PENDING
bitmap: 175/175 pages [700KB], 8192KB chunk
# mdadm --detail /dev/md1
/dev/md1:
Version : 0.91
Creation Time : Thu Sep 1 22:23:00 2011
Raid Level : raid6
Array Size : 14643488000 (13965.12 GiB 14994.93 GB)
Used Dev Size : 18446744073709551615
Raid Devices : 7
Total Devices : 8
Preferred Minor : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Jan 24 21:10:19 2017
State : clean, degraded, resyncing (PENDING)
Active Devices : 6
Working Devices : 8
Failed Devices : 0
Spare Devices : 2
Layout : left-symmetric-6
Chunk Size : 64K
New Layout : left-symmetric
UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
Events : 0.12370980
Number Major Minor RaidDevice State
0 252 4 0 active sync /dev/dm-4
1 252 5 1 active sync /dev/dm-5
2 252 1 2 active sync /dev/dm-1
3 252 6 3 active sync /dev/dm-6
4 252 2 4 active sync /dev/dm-2
5 252 0 5 active sync /dev/dm-0
6 252 7 6 spare rebuilding /dev/dm-7
8 252 3 - spare /dev/dm-3
still no error
# echo "clean" > /sys/block/md1/md/array_state
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath]
[raid0] [raid10]
md1 : active raid6 dm-0[5] dm-2[4] dm-7[6] dm-6[3] dm-4[0] dm-1[2]
dm-5[1] dm-3[8](S)
14643488000 blocks super 0.91 level 6, 64k chunk, algorithm 18
[7/6] [UUUUUU_]
[==============>......] reshape = 74.6% (2185464448/2928697600)
finish=7719.3min speed=1603K/sec
bitmap: 175/175 pages [700KB], 8192KB chunk
# mdadm --detail /dev/md1
/dev/md1:
Version : 0.91
Creation Time : Thu Sep 1 22:23:00 2011
Raid Level : raid6
Array Size : 14643488000 (13965.12 GiB 14994.93 GB)
Used Dev Size : 18446744073709551615
Raid Devices : 7
Total Devices : 8
Preferred Minor : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Jan 31 20:09:30 2017
State : clean, degraded, reshaping
Active Devices : 6
Working Devices : 8
Failed Devices : 0
Spare Devices : 2
Layout : left-symmetric-6
Chunk Size : 64K
Reshape Status : 74% complete
New Layout : left-symmetric
UUID : 7a58ed4f:baf1934e:a2963c6e:a542ed71
Events : 0.12370982
Number Major Minor RaidDevice State
0 252 4 0 active sync /dev/dm-4
1 252 5 1 active sync /dev/dm-5
2 252 1 2 active sync /dev/dm-1
3 252 6 3 active sync /dev/dm-6
4 252 2 4 active sync /dev/dm-2
5 252 0 5 active sync /dev/dm-0
6 252 7 6 spare rebuilding /dev/dm-7
8 252 3 - spare /dev/dm-3
Looks good! fsck shows no errors, nothing in lost+found, so I stopped
the reshape (so the overlays wouldn't fill the disk), mounted it
read-only and backed up the more important data. That finished today, so
I rebooted and did it for real. Reshape is finished, resync at 24% (6
hours to go), fsck still looks good. w00t!
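For anyone who lands here with the same symptoms, here is the sequence that actually unstuck my array, collected into one function for reference. The function name is mine, the drive names match my setup, and you should absolutely run this against dm overlays before touching real members:

```shell
# recover_md1: the sequence from this thread in one place.
# All device names are specific to my array - adapt before use,
# and test on overlays first!
recover_md1() {
  mdadm --assemble --force /dev/md1 "$@" --backup-file=raid6.backupfile
  mdadm --run /dev/md1 || true        # expected to fail, but it gets the
                                      # kernel to re-read the superblocks
  mdadm --manage /dev/md1 --add /dev/mapper/sdc3   # "re-added", no rebuild
  mdadm --manage /dev/md1 --add /dev/mapper/sdd3
  echo readonly > /sys/block/md1/md/array_state    # array comes up read-only
  echo clean    > /sys/block/md1/md/array_state    # reshape resumes
}

# usage (as root): recover_md1 $OVERLAYS ; watch cat /proc/mdstat
```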
* Re: drives failed during reshape, array won't even force-assemble
2017-02-01 18:55 ` Thomas Warntjen
@ 2017-02-04 0:52 ` Weedy
From: Weedy @ 2017-02-04 0:52 UTC (permalink / raw)
To: Thomas Warntjen; +Cc: Phil Turmel, Linux RAID
On 1 February 2017 at 13:55, Thomas Warntjen <thomas@warntjen.net> wrote:
> Holy cow, I poked it with a stick and I think I did it!
>
I really hate these "wait, why the F did that fix it" solutions. You're
always left feeling like you haven't learned anything after all your
work.
Congrats, you have your data back :)