* Re: Help needed recovering from raid failure
@ 2015-04-29 18:17 Peter van Es
2015-04-29 23:27 ` NeilBrown
0 siblings, 1 reply; 7+ messages in thread
From: Peter van Es @ 2015-04-29 18:17 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
Dear Neil,
First of all, I really appreciate you trying to help me. This is the first time I'm deploying software RAID, so the guidance is very welcome.
> On 29 Apr 2015, at 00:26, NeilBrown <neilb@suse.de> wrote:
>
> This isn't really reporting anything new.
> There is probably a daily cron job which reports all degraded arrays. This
> message is reported by that job.
I understand...
>
>
> Why do you think the array is off-line? The above message doesn't suggest
> that.
>
My Ubuntu server was accessible through ssh but did not serve web pages, files, etc. When I went to the console,
it told me it had taken the array offline because of degraded /dev/sdd2 and /dev/sdc2.
Those two drives were out of the array.
>
>>
>> Needless to say, I can't boot the system anymore as the boot drive is /dev/md0, and GRUB can't
>> get at it. I do need to recover data (I know, but there's stuff on there I have no backup for--yet).
>
> You boot off a RAID5? Does grub support that? I didn't know.
> But md0 hasn't failed, has it?
>
> Confused.
Well, it took a little time but yes, I managed to define a raid 5 array that the system was able to boot from.
> There is something VERY sick here. I suggest that you tread very carefully.
>
> All your '1' partitions should be about 2GB and the '2' partitions about 2TB
>
> But the --examine output suggests sda2 and sdb2 are 2TB, while sdd2 and sde2
> are 2GB.
>
> That really really shouldn't happen. Maybe check your partition table
> (fdisk).
> I really cannot see how this would happen.
But this question, and the previous question you asked, tell me a little of what I may have done…
I think I confused /dev/md0 and /dev/md1 (now called /dev/md126 and /dev/md127 when running off the USB stick).
/dev/md0 is a swap array (around 6GB, comprised of 4 x 2 GB in raid 5)
/dev/md1 is the boot and data array (around 5 TB, comprised of 4 x ~2 TB in raid 5)
I must have confused them and tried to add the /dev/sdc2 and /dev/sdd2 partitions to the /dev/md0 array (mdadm --add /dev/md0 /dev/sdc2)
instead of to the /dev/md1 array. They were then added as spare drives and their superblocks were overwritten, but since
a) no swap space was in use, and
b) they were added as spares,
the data should not have been overwritten.
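As a reconstruction of that slip (not taken from actual shell history; device names are as I describe above):

```shell
# What was intended: re-add the dropped members to the data array
#   mdadm --add /dev/md1 /dev/sdc2
#   mdadm --add /dev/md1 /dev/sdd2

# What was most likely run instead: adding them to the swap array,
# which made them spares of md0 and overwrote their md superblocks
mdadm --add /dev/md0 /dev/sdc2
mdadm --add /dev/md0 /dev/sdd2
```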
>
> Can you
> mdadm -Ss
>
> to stop all the arrays, then
>
> fdisk -l /dev/sd?
>
> then
>
> mdadm -Esvv
>
Neil, here they are: again, I appreciate you taking the time and guiding me through this!
Is there any way to resurrect the superblocks and try to force-assemble the array, skipping the failing drive /dev/sdd2? (The /dev/sdd2 drive produced some errors I observed in the log; /dev/sdc2 must have had a one-off issue to be taken out.) I have two new drives (arrived today) and a new SSD. I would want to get the array assembled using /dev/sdc2, perhaps forcing it back into the array geometry and "hoping for the best", and then install a new /dev/sdd2 to be recovered onto. Then I'll create boot and swap partitions on the SSD, which means that any array failure should not prevent the system from booting.
Requested outputs are below
Thanks,
Peter
fdisk output: (USB devices deleted)
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f24ee
Device Boot Start End Blocks Id System
/dev/sda1 2048 3905535 1951744 fd Linux raid autodetect
/dev/sda2 * 3905536 3907028991 1951561728 fd Linux raid autodetect
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00029d5c
Device Boot Start End Blocks Id System
/dev/sdb1 2048 3905535 1951744 fd Linux raid autodetect
/dev/sdb2 * 3905536 3907028991 1951561728 fd Linux raid autodetect
Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000727bf
Device Boot Start End Blocks Id System
/dev/sdd1 2048 3905535 1951744 fd Linux raid autodetect
/dev/sdd2 * 3905536 3907028991 1951561728 fd Linux raid autodetect
Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009fe7f
Device Boot Start End Blocks Id System
/dev/sde1 2048 3905535 1951744 fd Linux raid autodetect
/dev/sde2 * 3905536 3907028991 1951561728 fd Linux raid autodetect
mdadm -Esvv output (USB devices deleted)
/dev/sde2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:42 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3903121408 (1861.15 GiB 1998.40 GB)
Array Size : 5850624 (5.58 GiB 5.99 GB)
Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : cdae3287:91168194:942ba99d:1a85c466
Update Time : Wed Apr 29 17:46:25 2015
Checksum : b8b84dad - correct
Events : 30
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : AAAA ('A' == active, '.' == missing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:42 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
Array Size : 5850624 (5.58 GiB 5.99 GB)
Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : b051f523:4887e729:cd63bed1:8c2a7575
Update Time : Wed Apr 29 17:46:25 2015
Checksum : 453ddeef - correct
Events : 30
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAA ('A' == active, '.' == missing)
/dev/sde:
MBR Magic : aa55
Partition[0] : 3903488 sectors at 2048 (type fd)
Partition[1] : 3903123456 sectors at 3905536 (type fd)
/dev/sdd2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:42 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3903121408 (1861.15 GiB 1998.40 GB)
Array Size : 5850624 (5.58 GiB 5.99 GB)
Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 0f3f2b91:09cbb344:e52c4c4b:722d65c4
Update Time : Wed Apr 29 17:46:25 2015
Checksum : 7e273c0f - correct
Events : 30
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : AAAA ('A' == active, '.' == missing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:42 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
Array Size : 5850624 (5.58 GiB 5.99 GB)
Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : b6668730:3b1380bf:556700d9:30df829c
Update Time : Wed Apr 29 17:46:25 2015
Checksum : 15b83814 - correct
Events : 30
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAA ('A' == active, '.' == missing)
/dev/sdd:
MBR Magic : aa55
Partition[0] : 3903488 sectors at 2048 (type fd)
Partition[1] : 3903123456 sectors at 3905536 (type fd)
/dev/sdb2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
Name : ubuntu:1 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:58 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3902861312 (1861.03 GiB 1998.26 GB)
Array Size : 5854290432 (5583.09 GiB 5994.79 GB)
Used Dev Size : 3902860288 (1861.03 GiB 1998.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f1e79609:79b7ac23:55197f70:e8fbfd58
Update Time : Sun Apr 26 05:59:13 2015
Checksum : 696f4e76 - correct
Events : 18014
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AA.. ('A' == active, '.' == missing)
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:42 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
Array Size : 5850624 (5.58 GiB 5.99 GB)
Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f52239b1:0fb87e7e:71e29ea4:bf67184a
Update Time : Wed Apr 29 17:46:25 2015
Checksum : ce9c9cd0 - correct
Events : 30
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAA ('A' == active, '.' == missing)
/dev/sdb:
MBR Magic : aa55
Partition[0] : 3903488 sectors at 2048 (type fd)
Partition[1] : 3903123456 sectors at 3905536 (type fd)
/dev/sda2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
Name : ubuntu:1 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:58 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3902861312 (1861.03 GiB 1998.26 GB)
Array Size : 5854290432 (5583.09 GiB 5994.79 GB)
Used Dev Size : 3902860288 (1861.03 GiB 1998.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 713e556d:ca104217:785db68a:d820a57b
Update Time : Sun Apr 26 05:59:13 2015
Checksum : fda151f9 - correct
Events : 18014
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AA.. ('A' == active, '.' == missing)
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:42 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
Array Size : 5850624 (5.58 GiB 5.99 GB)
Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : c483532d:06f93351:cfdf5a92:e83855b5
Update Time : Wed Apr 29 17:46:25 2015
Checksum : 76650d1c - correct
Events : 30
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAA ('A' == active, '.' == missing)
/dev/sda:
MBR Magic : aa55
Partition[0] : 3903488 sectors at 2048 (type fd)
Partition[1] : 3903123456 sectors at 3905536 (type fd)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Help needed recovering from raid failure
2015-04-29 18:17 Help needed recovering from raid failure Peter van Es
@ 2015-04-29 23:27 ` NeilBrown
2015-04-30 19:25 ` Peter van Es
0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2015-04-29 23:27 UTC (permalink / raw)
To: Peter van Es; +Cc: linux-raid
On Wed, 29 Apr 2015 20:17:09 +0200 Peter van Es <vanes.peter@gmail.com> wrote:
> Dear Neil,
>
> first of all, I really appreciate you trying to help me. This is the first time I’m deploying software raid, so really appreciate the guidance.
>
>
> > On 29 Apr 2015, at 00:26, NeilBrown <neilb@suse.de> wrote:
> >
> > This isn't really reporting anything new.
> > There is probably a daily cron job which reports all degraded arrays. This
> > message is reported by that job.
>
> I understand...
>
> >
> >
> > Why do you think the array is off-line? The above message doesn't suggest
> > that.
> >
>
> My Ubuntu server was accessible through ssh but did not serve webpages, files etc. When I went to the console,
> it told me it had taken the array offline because of degraded /dev/sdd2 and /dev/sdc2
> Those two drives were out of the array.
>
> >
> >>
> >> Needless to say, I can't boot the system anymore as the boot drive is /dev/md0, and GRUB can't
> >> get at it. I do need to recover data (I know, but there's stuff on there I have no backup for--yet).
> >
> > You boot off a RAID5? Does grub support that? I didn't know.
> > But md0 hasn't failed, has it?
> >
> > Confused.
>
> Well, it took a little time but yes, I managed to define a raid 5 array that the system was able to boot from.
>
> > There is something VERY sick here. I suggest that you tread very carefully.
> >
> > All your '1' partitions should be about 2GB and the '2' partitions about 2TB
> >
> > But the --examine output suggests sda2 and sdb2 are 2TB, while sdd2 and sde2
> > are 2GB.
> >
> > That really really shouldn't happen. Maybe check your partition table
> > (fdisk).
> > I really cannot see how this would happen.
>
> But this question, and the previous question you asked, tell me a little of what I may have done…
>
> I think I confused /dev/md0 and /dev/md1 (now called /dev/md126 and /dev/md127 when running off the USB stick).
>
> /dev/md0 is a swap array (around 6GB, comprised of 4 x 2 GB in raid 5)
> /dev/md1 is the boot and data array (around 5 TB, comprised of 4 x ~2 TB in raid 5)
>
> I must have confused them and tried to add the /dev/sdc2 and /dev/sdd2 drive to the /dev/md0 array (mdadm --add /dev/md0 /dev/sdc2)
Oops!
> instead of to the /dev/md1 array. They were then added as spare drives, their superblocks were overwritten, but since
> a) no swap space was used, and
> b) they were added as spares
>
> The data should not have been overwritten.
Hopefully not.
>
> >
> > Can you
> > mdadm -Ss
> >
> > to stop all the arrays, then
> >
> > fdisk -l /dev/sd?
> >
> > then
> >
> > mdadm -Esvv
> >
>
> Neil, here they are: again, I appreciate you taking the time and guiding me through this!
>
> Is there any way to resurrect the super blocks and try to force assemble the array, skipping the failing drive /dev/sdd2 (the /dev/sdd2 drive created some errors I observed in the log, /dev/sdc2 must have had a one off issue to be taken out….). I have two new drives (arrived today), and a new SSD drive. I would want to get the new array assembled using /dev/sdc2 perhaps forcing it back to the array geometry and “hoping for the best” and then install a new /dev/sdd2 to be recovered. Then I’ll create a boot and swap drive off the SSD which means that any array failures should not prevent the system from booting…
As you have destroyed some metadata, it is no longer possible to 'assemble'
the array. We need to re-create it.
sda2 and sdb2 appear to be the first two drives of the array. sdd2 failed
first, so sdc2 is a better choice to use. It is probably reasonable to
assume that it was the fourth drive in the array. If that assumption proves
false then it might be the third.
Before doing this, double check that the names haven't changed, so check that
mdadm --examine /dev/sda2
shows
> Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
> Device Role : Active device 0
(among other info) and that
mdadm --examine /dev/sdb2
shows the same Array UUID and
> Device Role : Active device 1
Then run
mdadm -C /dev/md1 -l5 -n4 --data-offset=262144s --metadata=1.2 --assume-clean \
/dev/sda2 /dev/sdb2 missing /dev/sde2
Then
fsck -n -f /dev/md1
If that works, mount /dev/md1 and have a look around and confirm everything
looks OK.
If fsck complains, we might have sde2 in the wrong position. Or maybe sde
and sdd changed names.
run
mdadm -Ss
then rerun the -C command with a different list of devices. e.g.
/dev/sda2 /dev/sdb2 /dev/sde2 missing
Always have one 'missing' device or you will be very likely to get
out-of-sync data.
Once you have data that looks OK, copy out any really, really important stuff;
then, if you think the 4th drive is reliable enough, or if you have replaced
it, add the '2' partition of the fourth drive to the array and let it rebuild.
Then you should be back to a safe working array.
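For reference, the steps above consolidated into one sketch. The device order is an assumption that may need permuting as discussed, and the array may be named md126/md127 under the USB-stick boot, so adapt before running anything:

```shell
# Stop any auto-assembled arrays first
mdadm -Ss

# Re-create the big array with the original geometry. One slot is left
# 'missing' so a parity rebuild cannot overwrite data while we test;
# --assume-clean prevents an initial resync, and --data-offset matches
# the value in the surviving superblocks (262144 sectors).
mdadm -C /dev/md1 -l5 -n4 --data-offset=262144s --metadata=1.2 --assume-clean \
    /dev/sda2 /dev/sdb2 missing /dev/sde2

# Read-only filesystem check: -n makes no changes to the filesystem
fsck -n -f /dev/md1

# If fsck is unhappy, stop the array and retry the -C command with a
# different device order, e.g. /dev/sda2 /dev/sdb2 /dev/sde2 missing
mdadm -Ss
```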
NeilBrown
* Re: Help needed recovering from raid failure
2015-04-29 23:27 ` NeilBrown
@ 2015-04-30 19:25 ` Peter van Es
2015-05-01 2:31 ` NeilBrown
0 siblings, 1 reply; 7+ messages in thread
From: Peter van Es @ 2015-04-30 19:25 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
Neil,
thanks. I followed your instructions (slightly modified, as my version of mdadm did not support the --data-offset option). /dev/sdd was the 3rd drive and I had physically removed the 4th drive from my server.
I managed to restart the array. Then I replaced the failing drive, created partitions the same as on /dev/sda and added it to the two arrays.
The data array is now rebuilding, and will be done in 440 minutes... It appears that I've lost nothing important...
One question: I did spot that the Array UUID has changed on the Create command. Is there any way of getting it back to the old value ?
Peter
>
> Before doing this, double check that the names have changed, so check that
> mdadm --examine /dev/sda2
> shows
>> Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
>> Device Role : Active device 0
>
> (among other info) and that
> mdadm --examine /dev/sdb2
> show the same Array UUID and
>> Device Role : Active device 1
>
>
> Then run
>
> mdadm -C /dev/md1 -l5 -n4 --data-offset=262144s --metadata=1.2 --assume-clean \
> /dev/sda2 /dev/sdb2 missing /dev/sde2
* Re: Help needed recovering from raid failure
2015-04-30 19:25 ` Peter van Es
@ 2015-05-01 2:31 ` NeilBrown
0 siblings, 0 replies; 7+ messages in thread
From: NeilBrown @ 2015-05-01 2:31 UTC (permalink / raw)
To: Peter van Es; +Cc: linux-raid
On Thu, 30 Apr 2015 21:25:04 +0200 Peter van Es <vanes.peter@gmail.com> wrote:
> Neil,
>
> thanks. I followed your instructions (slightly modified as my version of mdadm did not support the --data-offset stanza). /dev/sdd was the 3rd drive and I had physically removed the 4th drive from my server.
>
> I managed to restart the array. Then I replaced the failing drive, created partitions the same as on /dev/sda and added it to the two arrays.
>
> It is now rebuilding for the data array, and will be done in 440 minutes.... It appears that I've lost nothing important...
Excellent.
>
> One question: I did spot that the Array UUID has changed on the Create command. Is there any way of getting it back to the old value ?
Why would you want to?
But I think you can. Firstly stop the array (so you need to be booted from a
USB or similar) and then
mdadm --assemble /dev/mdWHATEVER --update=uuid --uuid=your:favo:rite:nums ..list.of.devices..
NeilBrown
>
> Peter
>
>
> >
> > Before doing this, double check that the names have changed, so check that
> > mdadm --examine /dev/sda2
> > shows
> >> Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
> >> Device Role : Active device 0
> >
> > (among other info) and that
> > mdadm --examine /dev/sdb2
> > show the same Array UUID and
> >> Device Role : Active device 1
> >
> >
> > Then run
> >
> > mdadm -C /dev/md1 -l5 -n4 --data-offset=262144s --metadata=1.2 --assume-clean \
> > /dev/sda2 /dev/sdb2 missing /dev/sde2
>
* Re: Help needed recovering from raid failure
2015-04-27 9:35 Peter van Es
2015-04-27 11:07 ` Mikael Abrahamsson
@ 2015-04-28 22:26 ` NeilBrown
1 sibling, 0 replies; 7+ messages in thread
From: NeilBrown @ 2015-04-28 22:26 UTC (permalink / raw)
To: Peter van Es; +Cc: linux-raid
On Mon, 27 Apr 2015 11:35:09 +0200 Peter van Es <vanes.peter@gmail.com> wrote:
> Sorry for the long post...
>
> I am running Ubuntu LTS 14.04.02 Server edition, 64 bits, with 4x 2.0TB drives in a raid-5 array.
>
> The 4th drive was beginning to show read errors. Because it was weekend, I could not go out
> and buy a spare 2TB drive to replace the one that was beginning to fail.
>
> I first got a fail event:
>
> This is an automatically generated mail message from mdadm
> running on bali
>
> A Fail event had been detected on md device /dev/md/1.
>
> It could be related to component device /dev/sdd2.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md1 : active raid5 sdc2[2] sdb2[1] sda2[0] sdd2[3](F)
> 5854290432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
>
> md0 : active raid5 sdc1[2] sdd1[3] sdb1[1] sda1[0]
> 5850624 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
>
> unused devices: <none>
>
> And then subsequently, around 18 hours later:
>
> This is an automatically generated mail message from mdadm
> running on bali
>
> A DegradedArray event had been detected on md device /dev/md/1.
This isn't really reporting anything new.
There is probably a daily cron job which reports all degraded arrays. This
message is reported by that job.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md1 : active raid5 sdc2[2] sdb2[1] sda2[0] sdd2[3](F)
> 5854290432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
>
> md0 : active raid5 sdc1[2] sdd1[3] sdb1[1] sda1[0]
> 5850624 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
>
> unused devices: <none>
>
> The server had taken the array off line at that point.
Why do you think the array is off-line? The above message doesn't suggest
that.
>
> Needless to say, I can't boot the system anymore as the boot drive is /dev/md0, and GRUB can't
> get at it. I do need to recover data (I know, but there's stuff on there I have no backup for--yet).
You boot off a RAID5? Does grub support that? I didn't know.
But md0 hasn't failed, has it?
Confused.
>
> I booted Linux from a USB stick (which is on /dev/sdc1 hence changing the numbering),
> in recovery mode. Below is the output of /proc/mdstat and
> mdadm --examine. It looks like somehow the /dev/sdd2 and /dev/sde2 drives took on the
> super block of the /dev/md127 device (my swap file). May that have been done by the boot from
> the Ubuntu USB stick?
There is something VERY sick here. I suggest that you tread very carefully.
All your '1' partitions should be about 2GB and the '2' partitions about 2TB
But the --examine output suggests sda2 and sdb2 are 2TB, while sdd2 and sde2
are 2GB.
That really really shouldn't happen. Maybe check your partition table
(fdisk).
I really cannot see how this would happen.
>
> My plan... assemble a degraded array, with /dev/sde2 (the 4th drive, formerly known as /dev/sdd2) not in it.
> Because the fail event put the file system in RO mode, I expect /dev/sdd2 (formerly /dev/sdc2) to be ok.
> Then insert new 2TB drive in slot 4. Let system resync and recover.
>
> I'm running xfs on the /dev/md1 device.
>
> Questions:
>
> 1. is this the wise course of action ?
> 2. how exactly do I reassemble the array (/etc/mdadm.conf is inaccessible in recovery mode)
> 3. what command line options do I use exactly from the --examine output below without screwing things up
>
> And help or pointers gratefully accepted
Can you
mdadm -Ss
to stop all the arrays, then
fdisk -l /dev/sd?
then
mdadm -Esvv
and post all of that. Hopefully some of it will make sense.
NeilBrown
* Re: Help needed recovering from raid failure
2015-04-27 9:35 Peter van Es
@ 2015-04-27 11:07 ` Mikael Abrahamsson
2015-04-28 22:26 ` NeilBrown
1 sibling, 0 replies; 7+ messages in thread
From: Mikael Abrahamsson @ 2015-04-27 11:07 UTC (permalink / raw)
To: Peter van Es; +Cc: linux-raid
> I booted Linux from a USB stick (which is on /dev/sdc1 hence changing the numbering),
> in recovery mode. Below is the output of /proc/mdstat and
> mdadm --examine. It looks like somehow the /dev/sdd2 and /dev/sde2 drives took on the
> super block of the /dev/md127 device (my swap file). May that have been done by the boot from
> the Ubuntu USB stick?
Your event counters are strange: two drives are showing 18014, and two
drives are showing an event count of 26. Two drives show an update time of
the 26th of April, two show an update time of the 27th. This doesn't make
much sense.
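The rule of thumb behind that observation: md trusts the superblocks with the highest event count, since every array update bumps the counter on all active members. As a toy illustration (a hypothetical helper, not part of mdadm), picking the freshest devices from "name event-count" pairs:

```shell
# Hypothetical helper: read "device event-count" pairs on stdin and print
# the devices whose superblocks carry the highest event count,
# i.e. the freshest view of the array.
freshest() {
  sort -k2,2 -rn | awk 'NR==1 { max = $2 } $2 == max { print $1 }' | sort
}

# The counts from this thread: sda2/sdb2 are current, sdd2/sde2 are stale
printf '%s %s\n' sda2 18014 sdb2 18014 sdd2 26 sde2 26 | freshest
# prints: sda2 and sdb2, one per line
```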
If I were you, I would try to make really really sure that I had unplugged
the drive that first went offline, then I would use "mdadm --assemble
--force <md> <component drives>" to get the array up in degraded mode, I
would then mount it read-only and try to copy the most important
information onto some other disk. After that you can try to add the new
drive you bought and let it re-sync. Most likely this will not work, as you
most likely have read errors on at least one other drive. You can use
"smartctl" from "smartmontools" to verify. Most likely you will have
"pending sectors" (sectors that can't be read) on at least one
other drive.
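A sketch of that rescue sequence. The device names follow the thread's original naming and the mount/copy paths are placeholders, so this is illustrative only, not something to run verbatim:

```shell
# With the drive that failed first physically unplugged, force-assemble
# the remaining members despite their mismatched event counts
mdadm --assemble --force /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2

# Mount read-only so nothing on the degraded array gets modified
mount -o ro /dev/md1 /mnt

# Copy the most important data somewhere safe before any rebuild attempt
rsync -a /mnt/important/ /backup/important/
```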
Also, I recommend you do this:
for x in /sys/block/sd[a-z] ; do
echo 180 > $x/device/timeout
done
echo 4096 > /sys/block/md0/md/stripe_cache_size
Change md0 above to your md-device. This will increase your kernel
timeouts and lessen the risk that drives will be considered dead when they
are only having problems reading a block.
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Help needed recovering from raid failure
@ 2015-04-27 9:35 Peter van Es
2015-04-27 11:07 ` Mikael Abrahamsson
2015-04-28 22:26 ` NeilBrown
0 siblings, 2 replies; 7+ messages in thread
From: Peter van Es @ 2015-04-27 9:35 UTC (permalink / raw)
To: linux-raid
Sorry for the long post...
I am running Ubuntu LTS 14.04.02 Server edition, 64 bits, with 4x 2.0TB drives in a raid-5 array.
The 4th drive was beginning to show read errors. Because it was weekend, I could not go out
and buy a spare 2TB drive to replace the one that was beginning to fail.
I first got a fail event:
This is an automatically generated mail message from mdadm
running on bali
A Fail event had been detected on md device /dev/md/1.
It could be related to component device /dev/sdd2.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdc2[2] sdb2[1] sda2[0] sdd2[3](F)
5854290432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
md0 : active raid5 sdc1[2] sdd1[3] sdb1[1] sda1[0]
5850624 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
And then subsequently, around 18 hours later:
This is an automatically generated mail message from mdadm
running on bali
A DegradedArray event had been detected on md device /dev/md/1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdc2[2] sdb2[1] sda2[0] sdd2[3](F)
5854290432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
md0 : active raid5 sdc1[2] sdd1[3] sdb1[1] sda1[0]
5850624 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
The server had taken the array off line at that point.
Needless to say, I can't boot the system anymore as the boot drive is /dev/md0, and GRUB can't
get at it. I do need to recover data (I know, but there's stuff on there I have no backup for--yet).
I booted Linux from a USB stick (which is on /dev/sdc1 hence changing the numbering),
in recovery mode. Below is the output of /proc/mdstat and
mdadm --examine. It looks like somehow the /dev/sdd2 and /dev/sde2 drives took on the
super block of the /dev/md127 device (my swap array). Might that have been done by the boot from
the Ubuntu USB stick?
My plan... assemble a degraded array, with /dev/sde2 (the 4th drive, formerly known as /dev/sdd2) not in it.
Because the fail event put the file system in RO mode, I expect /dev/sdd2 (formerly /dev/sdc2) to be ok.
Then insert new 2TB drive in slot 4. Let system resync and recover.
I'm running xfs on the /dev/md1 device.
Questions:
1. is this the wise course of action ?
2. how exactly do I reassemble the array (/etc/mdadm.conf is inaccessible in recovery mode)
3. what command line options do I use exactly from the --examine output below without screwing things up
And help or pointers gratefully accepted
Peter van Es
/proc/mdstat (in recovery)
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md126 : inactive sdb2[1](S) sda2[0](S)
3902861312 blocks super 1.2
md127 : active raid5 sde2[5](S) sde1[3] sdb1[1] sda1[0] sdd1[2] sdd2[4](S)
5850624 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
mdadm --examine /dev/sd[abde]2
/dev/sda2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
Name : ubuntu:1 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:58 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3902861312 (1861.03 GiB 1998.26 GB)
Array Size : 5854290432 (5583.09 GiB 5994.79 GB)
Used Dev Size : 3902860288 (1861.03 GiB 1998.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 713e556d:ca104217:785db68a:d820a57b
Update Time : Sun Apr 26 05:59:13 2015
Checksum : fda151f9 - correct
Events : 18014
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AA.. ('A' == active, '.' == missing)
/dev/sdb2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
Name : ubuntu:1 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:58 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3902861312 (1861.03 GiB 1998.26 GB)
Array Size : 5854290432 (5583.09 GiB 5994.79 GB)
Used Dev Size : 3902860288 (1861.03 GiB 1998.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f1e79609:79b7ac23:55197f70:e8fbfd58
Update Time : Sun Apr 26 05:59:13 2015
Checksum : 696f4e76 - correct
Events : 18014
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AA.. ('A' == active, '.' == missing)
/dev/sdd2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:42 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3903121408 (1861.15 GiB 1998.40 GB)
Array Size : 5850624 (5.58 GiB 5.99 GB)
Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 0f3f2b91:09cbb344:e52c4c4b:722d65c4
Update Time : Mon Apr 27 08:37:15 2015
Checksum : 7e241855 - correct
Events : 26
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : AAAA ('A' == active, '.' == missing)
/dev/sde2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Wed Apr 1 22:27:42 2015
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3903121408 (1861.15 GiB 1998.40 GB)
Array Size : 5850624 (5.58 GiB 5.99 GB)
Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : cdae3287:91168194:942ba99d:1a85c466
Update Time : Mon Apr 27 08:37:15 2015
Checksum : b8b529f3 - correct
Events : 26
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : AAAA ('A' == active, '.' == missing)
end of thread, other threads:[~2015-05-01 2:31 UTC | newest]
Thread overview: 7+ messages
-- links below jump to the message on this page --
2015-04-29 18:17 Help needed recovering from raid failure Peter van Es
2015-04-29 23:27 ` NeilBrown
2015-04-30 19:25 ` Peter van Es
2015-05-01 2:31 ` NeilBrown
-- strict thread matches above, loose matches on Subject: below --
2015-04-27 9:35 Peter van Es
2015-04-27 11:07 ` Mikael Abrahamsson
2015-04-28 22:26 ` NeilBrown