* failed raid re-create changed dev size
From: Andris Berzins @ 2012-12-10 23:37 UTC (permalink / raw)
To: linux-raid
Hello,
I have a raid5 array with 6 devices. One drive died; however, I hot-removed the wrong one, so the raid failed.
I found on the Internet that this can be fixed by re-creating the raid with the same configuration, but with the dead drive set as 'missing'.
I saved the --examine data for all drives and re-ran the raid creation:
mdadm --verbose --create /dev/md1 --level=5 --raid-devices=6 /dev/sdb missing /dev/sdc /dev/sdg /dev/sdf /dev/sde
However, I cannot mount md1 and it looks like it contains random data. The device order is correct. However, I noticed that the new raid device size differs from the old one.
Old working raid --examine:
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 39f653e1:515c53b4:a88a1588:0bf0fd07
Name : spire:1 (local to host spire)
Creation Time : Fri Jun 29 01:59:00 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
Array Size : 14651325440 (13972.59 GiB 15002.96 GB)
Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 93e652d6:35c69c71:501601cd:7e640042
Update Time : Sun Dec 9 19:50:50 2012
Checksum : f30bd58d - correct
Events : 128287
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : ..A.AA ('A' == active, '.' == missing)
Re-created raid --examine:
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 39f653e1:515c53b4:a88a1588:0bf0fd07
Name : spire:1 (local to host spire)
Creation Time : Mon Dec 10 21:04:01 2012
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
Array Size : 14650675200 (13971.97 GiB 15002.29 GB)
Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 4d0fe5bc:c05ec2c4:68805eab:bed7ef9c
Update Time : Mon Dec 10 21:04:39 2012
Checksum : 676939cb - correct
Events : 4
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : A.AAAA ('A' == active, '.' == missing)
So "Avail Dev Size" differs, with the re-created one being ~100 MB smaller.
What could be the reason why I can't mount the re-created raid?
One idea: smartctl on one of the drives shows 17 reallocated sectors:
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 17
Could this be the reason why the re-created raid is smaller?
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: failed raid re-create changed dev size
From: Mikael Abrahamsson @ 2012-12-11 8:15 UTC (permalink / raw)
To: Andris Berzins; +Cc: linux-raid
On Tue, 11 Dec 2012, Andris Berzins wrote:
> I found on internet that this can be fixed if I re-create raid with the same configuration, but with dead drive set as 'missing'.
This is extremely dangerous. You would have been better off by first
trying --assemble --force.
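[Editor's note: for reference, a forced assemble of this array might look like the sketch below. The device list is taken from the original post's create command and is illustrative only; the exact member set depends on your array, and these commands must of course run against real hardware.]

```shell
# Sketch: stop the half-assembled array, then force-assemble it from
# its surviving members (device names are hypothetical).
mdadm --stop /dev/md1
mdadm --assemble --force /dev/md1 /dev/sdb /dev/sdc /dev/sdg /dev/sdf /dev/sde
# Check the result before mounting anything:
cat /proc/mdstat
```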
> However, I can not mount md1 and looks like it contains random data. The
> order is correct. However, I noticed that new raid device size is
> different from old one.
Your data offset is different; most likely you're not using the same mdadm
version as was initially used to create the raid. mdadm defaults have
changed over time regarding chunk size, data offset and others.
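[Editor's note: the size change in the original post is consistent with this. The difference between the two data offsets accounts exactly for the shrink in Avail Dev Size, as a quick shell-arithmetic check shows (all values taken from the --examine output above):]

```shell
# Data offset grew from 2048 to 262144 sectors:
echo $(( 262144 - 2048 ))            # 260096 sectors
# Avail Dev Size shrank by exactly the same amount:
echo $(( 5860531120 - 5860271024 ))  # 260096 sectors
# At 512 bytes per sector, that is ~127 MiB -- roughly the
# "~100MB" difference the original post observed:
echo $(( 260096 * 512 ))             # 133169152 bytes
```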
I don't know off the top of my head how to change the mdadm data offset
unfortunately.
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: failed raid re-create changed dev size
From: Andris Berzins @ 2012-12-11 9:01 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: linux-raid
Quoting "Mikael Abrahamsson" <swmike@swm.pp.se>:
> On Tue, 11 Dec 2012, Andris Berzins wrote:
>
>> I found on internet that this can be fixed if I re-create raid with the same
>> configuration, but with dead drive set as 'missing'.
>
> This is extremely dangerous. You would have been better off by first
> trying --assemble --force.
>
>> However, I can not mount md1 and looks like it contains random data. The
>> order is correct. However, I noticed that new raid device size is
>> different from old one.
>
> Your data offset is different, most likely you're not using the same mdadm
> version as was initially used to create the raid. Mdadm defaults have
> changed over time regarding chunk size, data offset and others.
Thank you!
I did not notice that.
Looks like I will have to try downgrading the kernel:
http://serverfault.com/questions/427683/what-parameters-to-mdadm-to-re-create-md-device-with-payload-starting-at-0x2200
>
> I don't know off the top of my head how to change the mdadm data offset
> unfortunately.
>
> --
> Mikael Abrahamsson email: swmike@swm.pp.se
* Re: failed raid re-create changed dev size
From: Robin Hill @ 2012-12-11 9:27 UTC (permalink / raw)
To: Andris Berzins; +Cc: Mikael Abrahamsson, linux-raid
On Tue Dec 11, 2012 at 11:01:35AM +0200, Andris Berzins wrote:
> Quoting "Mikael Abrahamsson" <swmike@swm.pp.se>:
> > On Tue, 11 Dec 2012, Andris Berzins wrote:
> >
> >> I found on internet that this can be fixed if I re-create raid with the same
> >> configuration, but with dead drive set as 'missing'.
> >
> > This is extremely dangerous. You would have been better off by first
> > trying --assemble --force.
> >
> >> However, I can not mount md1 and looks like it contains random data. The
> >> order is correct. However, I noticed that new raid device size is
> >> different from old one.
> >
> > Your data offset is different, most likely you're not using the same mdadm
> > version as was initially used to create the raid. Mdadm defaults have
> > changed over time regarding chunk size, data offset and others.
>
> Thank you!
> I did not notice that.
> Looks like I will have to try downgrading the kernel:
> http://serverfault.com/questions/427683/what-parameters-to-mdadm-to-re-create-md-device-with-payload-starting-at-0x2200
>
No, that's an entirely unrelated issue (though if you are running one of
the affected kernel versions you really ought to up/downgrade it).
>
> >
> > I don't know off the top of my head how to change the mdadm data offset
> > unfortunately.
> >
There's no way in the current mdadm releases to do so. Neil does have a
version in git which will allow specifying the data offsets on a
disk-by-disk basis (as it is possible to have multiple offsets within an
array). If all the disks in your array were using an offset of 2048
though, then you'll just need to downgrade mdadm to 3.2.3 (or slightly
earlier) - from what I can find it was changed to 262144 for 3.2.4 (it
was 272 pre version 3.0, so don't go any further back than that).
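[Editor's note: putting that together, a re-create attempt with the downgraded mdadm would look roughly like the sketch below. This assumes mdadm 3.2.3 is installed, that every member originally used a 2048-sector data offset, and that the device order and 'missing' slot match the saved --examine output; --assume-clean prevents mdadm from rewriting parity. Hardware commands, shown for illustration only.]

```shell
# Confirm the old superblock's offset before destroying anything:
mdadm --examine /dev/sdc | grep 'Data Offset'   # expect: 2048 sectors
mdadm --version                                  # expect: v3.2.3 or earlier
# Re-create with the original geometry (hypothetical device names):
mdadm --create /dev/md1 --metadata=1.2 --level=5 --raid-devices=6 \
      --chunk=512 --assume-clean \
      /dev/sdb missing /dev/sdc /dev/sdg /dev/sdf /dev/sde
# Verify the new superblock landed at the old offset:
mdadm --examine /dev/sdc | grep 'Data Offset'
```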
You will likely have lost some data though, as the new superblocks will
have overwritten part of the data, so make sure you run an fsck
afterwards (start with fsck -n though, to make sure the overall array
looks okay first).
Good luck (and please make sure you try just forcing an assemble next
time),
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
* Re: failed raid re-create changed dev size
From: Andris Berzins @ 2012-12-11 14:35 UTC (permalink / raw)
To: Robin Hill; +Cc: Mikael Abrahamsson, linux-raid
Quoting "Robin Hill" <robin@robinhill.me.uk>:
>>> I don't know off the top of my head how to change the mdadm data offset
>>> unfortunately.
>>>
> There's no way in the current mdadm releases to do so. Neil does have a
> version in git which will allow specifying the data offsets on a
> disk-by-disk basis (as it is possible to have multiple offsets within an
> array). If all the disks in your array were using an offset of 2048
> though, then you'll just need to downgrade mdadm to 3.2.3 (or slightly
> earlier) - from what I can find it was changed to 262144 for 3.2.4 (it
> was 272 pre version 3.0, so don't go any further back than that).
Downgraded to mdadm version 3.2.3 and re-created array successfully!
>
> You will likely have lost some data though, as the new superblocks will
> have overwritten part of the data, so make sure you run a fsck
> afterwards (start with fsck -n though, to make sure the overall array
> looks okay first).
Is it possible that no data was damaged? It is a LUKS partition; I mapped it and ran "fsck -n" on the underlying ext3 partition,
but fsck returned immediately with status "clean".
> Good luck (and please make sure you try just forcing an assemble next
> time),
I will try to contact the blog authors who suggested re-creating a failed array instead of forcing an assemble, and ask them to add a
notice on their blogs.
* Re: failed raid re-create changed dev size
From: Robin Hill @ 2012-12-11 14:49 UTC (permalink / raw)
To: Andris Berzins; +Cc: linux-raid
On Tue Dec 11, 2012 at 04:35:15PM +0200, Andris Berzins wrote:
> Quoting "Robin Hill" <robin@robinhill.me.uk>:
> >
> > You will likely have lost some data though, as the new superblocks will
> > have overwritten part of the data, so make sure you run a fsck
> > afterwards (start with fsck -n though, to make sure the overall array
> > looks okay first).
>
> Is it possible that no data was damaged? It is a LUKS partition; I
> mapped it and ran "fsck -n" on the underlying ext3 partition,
> but fsck returned immediately with status "clean".
>
By default fsck just checks whether the filesystem is marked as
dirty or clean, and skips the full check if it's clean. You'll need to
use "-f" to force it to run.
Cheers,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
* Re: failed raid re-create changed dev size
From: Brad Campbell @ 2012-12-12 1:04 UTC (permalink / raw)
To: linux-raid
On 11/12/12 22:35, Andris Berzins wrote:
> I will try to contact these blog authors, who suggested to re-create failed array instead of forcing an assemble, to make a
> notice on their blogs.
>
If you manage that you'll halve the traffic on this list!
* Re: failed raid re-create changed dev size
From: Andris Berzins @ 2012-12-12 16:10 UTC (permalink / raw)
To: Robin Hill; +Cc: linux-raid
>> Is it possible that no data was damaged? It is LUKS partition, i
>> mapped it and run "fsck -n" on underlying ext3 partition,
>> but fsck returned immediately with status "clean".
>>
> By default fsck will just check whether the filesystem is marked as
> dirty/clean and just skip running if it's clean. You'll need to use "-f"
> to force it to run.
It seems that something got damaged; I see several traces like the one below in dmesg.
I tried to run "fsck -f -n", but it looks like it will take several months on this 15 TB fs with a billion files.
Any ideas?
[151680.304424] INFO: task mv:11190 blocked for more than 120 seconds.
[151680.304426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[151680.304429] mv D ffffffff81806200 0 11190 8887 0x00000000
[151680.304434] ffff8800b6f79558 0000000000000086 ffff8800b6f79518 ffff8800b5f3f580
[151680.304439] ffff8800b6f79fd8 ffff8800b6f79fd8 ffff8800b6f79fd8 00000000000137c0
[151680.304444] ffff8800ba319700 ffff8800b6fd4500 ffff8800b6f79528 ffff8800bfc94080
[151680.304450] Call Trace:
[151680.304454] [<ffffffff811a8db0>] ? __wait_on_buffer+0x30/0x30
[151680.304459] [<ffffffff816590ff>] schedule+0x3f/0x60
[151680.304463] [<ffffffff816591af>] io_schedule+0x8f/0xd0
[151680.304466] [<ffffffff811a8dbe>] sleep_on_buffer+0xe/0x20
[151680.304471] [<ffffffff816599cf>] __wait_on_bit+0x5f/0x90
[151680.304475] [<ffffffff812f0fa8>] ? generic_make_request+0x68/0x70
[151680.304479] [<ffffffff811a8db0>] ? __wait_on_buffer+0x30/0x30
[151680.304484] [<ffffffff81659a7c>] out_of_line_wait_on_bit+0x7c/0x90
[151680.304488] [<ffffffff8108acc0>] ? autoremove_wake_function+0x40/0x40
[151680.304492] [<ffffffff811a8dae>] __wait_on_buffer+0x2e/0x30
[151680.304496] [<ffffffff811a9e58>] bh_submit_read+0x68/0x80
[151680.304500] [<ffffffff811f19fe>] read_block_bitmap+0xde/0x150
[151680.304505] [<ffffffff8125686b>] ? do_get_write_access+0x34b/0x4d0
[151680.304509] [<ffffffff811f31ef>] ext3_new_blocks+0x29f/0x710
[151680.304514] [<ffffffff811f5fc7>] ext3_alloc_blocks+0x57/0xf0
[151680.304519] [<ffffffff811f6636>] ext3_alloc_branch+0x56/0x2d0
[151680.304522] [<ffffffff811a9ad3>] ? __getblk+0x33/0x70
[151680.304527] [<ffffffff811f63cb>] ? ext3_get_branch+0x8b/0x150
[151680.304531] [<ffffffff811f970f>] ext3_get_blocks_handle+0x2ef/0x640
[151680.304535] [<ffffffff811f9b24>] ext3_get_block+0xc4/0x120
[151680.304539] [<ffffffff8165b00e>] ? _raw_spin_lock+0xe/0x20
[151680.304544] [<ffffffff811ab7ae>] __block_write_begin+0x1ce/0x520
[151680.304548] [<ffffffff811f9a60>] ? ext3_get_blocks_handle+0x640/0x640
[151680.304553] [<ffffffff81117f98>] ? grab_cache_page_write_begin+0x78/0xe0
[151680.304557] [<ffffffff811f8e73>] ext3_write_begin+0xc3/0x280
[151680.304562] [<ffffffff8111757a>] generic_perform_write+0xca/0x210
[151680.304566] [<ffffffff816591cb>] ? io_schedule+0xab/0xd0
[151680.304571] [<ffffffff8111771d>] generic_file_buffered_write+0x5d/0x90
[151680.304576] [<ffffffff81119139>] __generic_file_aio_write+0x229/0x440
[151680.304580] [<ffffffff811193c2>] generic_file_aio_write+0x72/0xe0
[151680.304585] [<ffffffff8117792a>] do_sync_write+0xda/0x120
[151680.304590] [<ffffffff812d7f88>] ? apparmor_file_permission+0x18/0x20
[151680.304594] [<ffffffff8129d6ec>] ? security_file_permission+0x2c/0xb0
[151680.304598] [<ffffffff81177ed1>] ? rw_verify_area+0x61/0xf0
[151680.304602] [<ffffffff81178233>] vfs_write+0xb3/0x180
[151680.304606] [<ffffffff8117855a>] sys_write+0x4a/0x90
[151680.304610] [<ffffffff81663602>] system_call_fastpath+0x16/0x1b
* Re: failed raid re-create changed dev size
From: Robin Hill @ 2012-12-13 9:30 UTC (permalink / raw)
To: Andris Berzins; +Cc: linux-raid
On Wed Dec 12, 2012 at 06:10:59PM +0200, Andris Berzins wrote:
> >> Is it possible that no data was damaged? It is LUKS partition, i
> >> mapped it and run "fsck -n" on underlying ext3 partition,
> >> but fsck returned immediately with status "clean".
> >>
> > By default fsck will just check whether the filesystem is marked as
> > dirty/clean and just skip running if it's clean. You'll need to use "-f"
> > to force it to run.
>
> It seems that something got damaged. I have several traces as shown
> below in dmesg.
>
> Tried to run "fsck -f -n" but it looks that it will take several month
> on this 15TB fs with billion files.
> Any ideas?
>
Sorry, no. You've got a corrupted filesystem and fsck is the tool to fix
that. If you're certain that the array is now set up correctly (which it
probably is if LUKS is able to map it okay), then you can skip the "-n"
pass and proceed straight to repair. Depending on memory, you may also
want to look into setting up scratch_files in e2fsck.conf as it can suck
up a lot of memory for large filesystems. You may also want to look into
moving to ext4 once you've got the filesystem fixed - fsck times should
be much lower than with ext3.
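[Editor's note: the scratch_files setup mentioned here is a short stanza in e2fsck.conf that tells e2fsck to spill some of its in-memory data structures to disk. A sketch, with an example directory path:]

```ini
; /etc/e2fsck.conf -- directory must exist and have free space
[scratch_files]
directory = /var/cache/e2fsck
```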
The only other option would be to reformat and restore from backup.
Good luck,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
* Re: failed raid re-create changed dev size
From: Andris Berzins @ 2012-12-13 16:13 UTC (permalink / raw)
To: Robin Hill; +Cc: linux-raid
Quoting "Robin Hill" <robin@robinhill.me.uk>:
> On Wed Dec 12, 2012 at 06:10:59PM +0200, Andris Berzins wrote:
>
>>>> Is it possible that no data was damaged? It is LUKS partition, i
>>>> mapped it and run "fsck -n" on underlying ext3 partition,
>>>> but fsck returned immediately with status "clean".
>>>>
>>> By default fsck will just check whether the filesystem is marked as
>>> dirty/clean and just skip running if it's clean. You'll need to use "-f"
>>> to force it to run.
>>
>> It seems that something got damaged. I have several traces as shown
>> below in dmesg.
>>
>> Tried to run "fsck -f -n" but it looks that it will take several month
>> on this 15TB fs with billion files.
>> Any ideas?
>>
> Sorry, no. You've got a corrupted filesystem and fsck is the tool to fix
> that. If you're certain that the array is now set up correctly (which it
> probably is if LUKS is able to map it okay), then you can skip the "-n"
> pass and proceed straight to repair. Depending on memory, you may also
> want to look into setting up scratch_files in e2fsck.conf as it can suck
> up a lot of memory for large filesystems. You may also want to look into
> moving to ext4 once you've got the filesystem fixed - fsck times should
> be much lower than with ext3.
Thank you for the suggestions!
fsck finished sooner than I thought. :)
Very interesting. It turns out that the raid re-creation with the wrong offset did not damage the underlying file system?
# fsck -f -n /dev/mapper/data
fsck from util-linux 2.20.1
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/data: 121432484/457854976 files (0.1% non-contiguous), 2827329571/3662830720 blocks