* failed raid re-create changed dev size
@ 2012-12-10 23:37 Andris Berzins
  2012-12-11  8:15 ` Mikael Abrahamsson
  0 siblings, 1 reply; 10+ messages in thread
From: Andris Berzins @ 2012-12-10 23:37 UTC (permalink / raw)
  To: linux-raid

Hello,

I have a raid5 array with 6 devices. One drive died; however, I hot-removed the wrong one, so the raid failed.

I found on the Internet that this can be fixed by re-creating the raid with the same configuration, but with the dead drive set as 'missing'.
I saved --examine data for all drives and re-ran raid creation:

mdadm --verbose --create /dev/md1 --level=5 --raid-devices=6 /dev/sdb missing /dev/sdc /dev/sdg /dev/sdf /dev/sde

However, I cannot mount md1, and it looks like it contains random data. The device order is correct, but I noticed that the new raid device size differs from the old one.

Old working raid --examine:
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 39f653e1:515c53b4:a88a1588:0bf0fd07
           Name : spire:1  (local to host spire)
  Creation Time : Fri Jun 29 01:59:00 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 14651325440 (13972.59 GiB 15002.96 GB)
  Used Dev Size : 5860530176 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 93e652d6:35c69c71:501601cd:7e640042

    Update Time : Sun Dec  9 19:50:50 2012
       Checksum : f30bd58d - correct
         Events : 128287

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : ..A.AA ('A' == active, '.' == missing)


Re-created raid examine:
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 39f653e1:515c53b4:a88a1588:0bf0fd07
           Name : spire:1  (local to host spire)
  Creation Time : Mon Dec 10 21:04:01 2012
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
     Array Size : 14650675200 (13971.97 GiB 15002.29 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 4d0fe5bc:c05ec2c4:68805eab:bed7ef9c

    Update Time : Mon Dec 10 21:04:39 2012
       Checksum : 676939cb - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : A.AAAA ('A' == active, '.' == missing)


So "Avail Dev Size" differs with re-created one being smaller for ~100MB.
What could be the reason why I can't mount re-created raid?


Idea: smartctl on one of the drives shows 17 reallocated sectors:
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       17
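
(For reference, that line comes from something like "smartctl -A /dev/sdb";
the device name here is just an example.)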

Could this be the reason why the re-created raid is smaller?



* Re: failed raid re-create changed dev size
  2012-12-10 23:37 failed raid re-create changed dev size Andris Berzins
@ 2012-12-11  8:15 ` Mikael Abrahamsson
  2012-12-11  9:01   ` Andris Berzins
  0 siblings, 1 reply; 10+ messages in thread
From: Mikael Abrahamsson @ 2012-12-11  8:15 UTC (permalink / raw)
  To: Andris Berzins; +Cc: linux-raid

On Tue, 11 Dec 2012, Andris Berzins wrote:

> I found on the Internet that this can be fixed by re-creating the raid with the same configuration, but with the dead drive set as 'missing'.

This is extremely dangerous. You would have been better off by first 
trying --assemble --force.
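
For reference, that would have looked something like this (a sketch using
the device names from your create command, with the dead drive left out):

mdadm --stop /dev/md1
mdadm --assemble --force /dev/md1 /dev/sdb /dev/sdc /dev/sdg /dev/sdf /dev/sde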

> However, I cannot mount md1, and it looks like it contains random data.
> The device order is correct, but I noticed that the new raid device size
> differs from the old one.

Your data offset is different; most likely you're not using the same mdadm 
version as was initially used to create the raid. Mdadm defaults have 
changed over time regarding chunk size, data offset, and other parameters.
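
A quick way to check is to compare the mdadm version and the offset that
each superblock reports, e.g.:

mdadm --version
mdadm --examine /dev/sdc | grep -E 'Data Offset|Chunk Size'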

I don't know off the top of my head how to change the mdadm data offset 
unfortunately.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: failed raid re-create changed dev size
  2012-12-11  8:15 ` Mikael Abrahamsson
@ 2012-12-11  9:01   ` Andris Berzins
  2012-12-11  9:27     ` Robin Hill
  0 siblings, 1 reply; 10+ messages in thread
From: Andris Berzins @ 2012-12-11  9:01 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid

Quoting "Mikael Abrahamsson" <swmike@swm.pp.se>:
> On Tue, 11 Dec 2012, Andris Berzins wrote:
> 
>> I found on the Internet that this can be fixed by re-creating the raid
>> with the same configuration, but with the dead drive set as 'missing'.
> 
> This is extremely dangerous. You would have been better off by first
> trying --assemble --force.
> 
>> However, I cannot mount md1, and it looks like it contains random data.
>> The device order is correct, but I noticed that the new raid device size
>> differs from the old one.
> 
> Your data offset is different; most likely you're not using the same mdadm
> version as was initially used to create the raid. Mdadm defaults have
> changed over time regarding chunk size, data offset, and other parameters.

Thank you!
I did not notice that.
Looks like I will have to try downgrading the kernel:
http://serverfault.com/questions/427683/what-parameters-to-mdadm-to-re-create-md-device-with-payload-starting-at-0x2200


> 
> I don't know off the top of my head how to change the mdadm data offset
> unfortunately.
> 
> --
> Mikael Abrahamsson    email: swmike@swm.pp.se



* Re: failed raid re-create changed dev size
  2012-12-11  9:01   ` Andris Berzins
@ 2012-12-11  9:27     ` Robin Hill
  2012-12-11 14:35       ` Andris Berzins
  0 siblings, 1 reply; 10+ messages in thread
From: Robin Hill @ 2012-12-11  9:27 UTC (permalink / raw)
  To: Andris Berzins; +Cc: Mikael Abrahamsson, linux-raid


On Tue Dec 11, 2012 at 11:01:35AM +0200, Andris Berzins wrote:

> Quoting "Mikael Abrahamsson" <swmike@swm.pp.se>:
> > On Tue, 11 Dec 2012, Andris Berzins wrote:
> > 
> >> I found on the Internet that this can be fixed by re-creating the raid
> >> with the same configuration, but with the dead drive set as 'missing'.
> > 
> > This is extremely dangerous. You would have been better off by first
> > trying --assemble --force.
> > 
> >> However, I cannot mount md1, and it looks like it contains random data.
> >> The device order is correct, but I noticed that the new raid device size
> >> differs from the old one.
> > 
> > Your data offset is different; most likely you're not using the same mdadm
> > version as was initially used to create the raid. Mdadm defaults have
> > changed over time regarding chunk size, data offset, and other parameters.
> 
> Thank you!
> I did not notice that.
> Looks like I will have to try downgrading the kernel:
> http://serverfault.com/questions/427683/what-parameters-to-mdadm-to-re-create-md-device-with-payload-starting-at-0x2200
> 
No, that's an entirely unrelated issue (though if you are running one of
the affected kernel versions you really ought to up/downgrade it).

> 
> > 
> > I don't know off the top of my head how to change the mdadm data offset
> > unfortunately.
> > 
There's no way in the current mdadm releases to do so. Neil does have a
version in git which will allow specifying the data offsets on a
disk-by-disk basis (as it is possible to have multiple offsets within an
array). If all the disks in your array were using an offset of 2048
though, then you'll just need to downgrade mdadm to 3.2.3 (or slightly
earlier) - from what I can find it was changed to 262144 for 3.2.4 (it
was 272 pre version 3.0, so don't go any further back than that).
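
As a rough sketch (untested; the tarball location is an assumption, and the
device order is taken from your original create command):

wget https://www.kernel.org/pub/linux/utils/raid/mdadm/mdadm-3.2.3.tar.gz
tar xzf mdadm-3.2.3.tar.gz
cd mdadm-3.2.3 && make
# re-create with the 3.2.3 binary, then verify the 2048-sector offset
./mdadm --verbose --create /dev/md1 --metadata=1.2 --level=5 \
    --raid-devices=6 /dev/sdb missing /dev/sdc /dev/sdg /dev/sdf /dev/sde
./mdadm --examine /dev/sdc | grep 'Data Offset'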

You will likely have lost some data though, as the new superblocks will
have overwritten part of the data, so make sure you run an fsck
afterwards (start with fsck -n though, to make sure the overall array
looks okay first).

Good luck (and please make sure you try just forcing an assemble next
time),
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: failed raid re-create changed dev size
  2012-12-11  9:27     ` Robin Hill
@ 2012-12-11 14:35       ` Andris Berzins
  2012-12-11 14:49         ` Robin Hill
  2012-12-12  1:04         ` Brad Campbell
  0 siblings, 2 replies; 10+ messages in thread
From: Andris Berzins @ 2012-12-11 14:35 UTC (permalink / raw)
  To: Robin Hill; +Cc: Mikael Abrahamsson, linux-raid

Quoting "Robin Hill" <robin@robinhill.me.uk>:
>>> I don't know off the top of my head how to change the mdadm data offset
>>> unfortunately.
>>> 
> There's no way in the current mdadm releases to do so. Neil does have a
> version in git which will allow specifying the data offsets on a
> disk-by-disk basis (as it is possible to have multiple offsets within an
> array). If all the disks in your array were using an offset of 2048
> though, then you'll just need to downgrade mdadm to 3.2.3 (or slightly
> earlier) - from what I can find it was changed to 262144 for 3.2.4 (it
> was 272 pre version 3.0, so don't go any further back than that).

Downgraded to mdadm version 3.2.3 and re-created the array successfully!

> 
> You will likely have lost some data though, as the new superblocks will
> have overwritten part of the data, so make sure you run a fsck
> afterwards (start with fsck -n though, to make sure the overall array
> looks okay first).

Is it possible that no data was damaged? It is a LUKS partition; I mapped it and ran "fsck -n" on the underlying ext3 partition,
but fsck returned immediately with status "clean".


> Good luck (and please make sure you try just forcing an assemble next
> time),

I will try to contact the blog authors who suggested re-creating a failed array instead of forcing an assemble, and ask them to post a
notice on their blogs.


> Robin
> --
> ___
> ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
> / / )      | Little Jim says ....                            |
> // !!       |      "He fallen in de water !!"                 |



* Re: failed raid re-create changed dev size
  2012-12-11 14:35       ` Andris Berzins
@ 2012-12-11 14:49         ` Robin Hill
  2012-12-12 16:10           ` Andris Berzins
  2012-12-12  1:04         ` Brad Campbell
  1 sibling, 1 reply; 10+ messages in thread
From: Robin Hill @ 2012-12-11 14:49 UTC (permalink / raw)
  To: Andris Berzins; +Cc: linux-raid


On Tue Dec 11, 2012 at 04:35:15PM +0200, Andris Berzins wrote:

> Quoting "Robin Hill" <robin@robinhill.me.uk>:
> > 
> > You will likely have lost some data though, as the new superblocks will
> > have overwritten part of the data, so make sure you run a fsck
> > afterwards (start with fsck -n though, to make sure the overall array
> > looks okay first).
> 
> Is it possible that no data was damaged? It is a LUKS partition; I
> mapped it and ran "fsck -n" on the underlying ext3 partition,
> but fsck returned immediately with status "clean".
> 
By default fsck will just check whether the filesystem is marked as
dirty/clean, and skip running if it's clean. You'll need to use "-f"
to force it to run.
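
For a LUKS-backed filesystem like yours, that would be something along
these lines ("data" is just an example mapping name):

cryptsetup luksOpen /dev/md1 data
fsck -f -n /dev/mapper/data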

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: failed raid re-create changed dev size
  2012-12-11 14:35       ` Andris Berzins
  2012-12-11 14:49         ` Robin Hill
@ 2012-12-12  1:04         ` Brad Campbell
  1 sibling, 0 replies; 10+ messages in thread
From: Brad Campbell @ 2012-12-12  1:04 UTC (permalink / raw)
  To: linux-raid

On 11/12/12 22:35, Andris Berzins wrote:

> I will try to contact the blog authors who suggested re-creating a failed array instead of forcing an assemble, and ask them to post a
> notice on their blogs.
>

If you manage that, you'll halve the traffic on this list!


* Re: failed raid re-create changed dev size
  2012-12-11 14:49         ` Robin Hill
@ 2012-12-12 16:10           ` Andris Berzins
  2012-12-13  9:30             ` Robin Hill
  0 siblings, 1 reply; 10+ messages in thread
From: Andris Berzins @ 2012-12-12 16:10 UTC (permalink / raw)
  To: Robin Hill; +Cc: linux-raid

>> Is it possible that no data was damaged? It is a LUKS partition; I
>> mapped it and ran "fsck -n" on the underlying ext3 partition,
>> but fsck returned immediately with status "clean".
>> 
> By default fsck will just check whether the filesystem is marked as
> dirty/clean, and skip running if it's clean. You'll need to use "-f"
> to force it to run.

It seems that something got damaged. I have several traces in dmesg, as shown below.

I tried to run "fsck -f -n", but it looks like it will take several months on this 15 TB fs with a billion files.
Any ideas?

[151680.304424] INFO: task mv:11190 blocked for more than 120 seconds.
[151680.304426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[151680.304429] mv              D ffffffff81806200     0 11190   8887 0x00000000
[151680.304434]  ffff8800b6f79558 0000000000000086 ffff8800b6f79518 ffff8800b5f3f580
[151680.304439]  ffff8800b6f79fd8 ffff8800b6f79fd8 ffff8800b6f79fd8 00000000000137c0
[151680.304444]  ffff8800ba319700 ffff8800b6fd4500 ffff8800b6f79528 ffff8800bfc94080
[151680.304450] Call Trace:
[151680.304454]  [<ffffffff811a8db0>] ? __wait_on_buffer+0x30/0x30
[151680.304459]  [<ffffffff816590ff>] schedule+0x3f/0x60
[151680.304463]  [<ffffffff816591af>] io_schedule+0x8f/0xd0
[151680.304466]  [<ffffffff811a8dbe>] sleep_on_buffer+0xe/0x20
[151680.304471]  [<ffffffff816599cf>] __wait_on_bit+0x5f/0x90
[151680.304475]  [<ffffffff812f0fa8>] ? generic_make_request+0x68/0x70
[151680.304479]  [<ffffffff811a8db0>] ? __wait_on_buffer+0x30/0x30
[151680.304484]  [<ffffffff81659a7c>] out_of_line_wait_on_bit+0x7c/0x90
[151680.304488]  [<ffffffff8108acc0>] ? autoremove_wake_function+0x40/0x40
[151680.304492]  [<ffffffff811a8dae>] __wait_on_buffer+0x2e/0x30
[151680.304496]  [<ffffffff811a9e58>] bh_submit_read+0x68/0x80
[151680.304500]  [<ffffffff811f19fe>] read_block_bitmap+0xde/0x150
[151680.304505]  [<ffffffff8125686b>] ? do_get_write_access+0x34b/0x4d0
[151680.304509]  [<ffffffff811f31ef>] ext3_new_blocks+0x29f/0x710
[151680.304514]  [<ffffffff811f5fc7>] ext3_alloc_blocks+0x57/0xf0
[151680.304519]  [<ffffffff811f6636>] ext3_alloc_branch+0x56/0x2d0
[151680.304522]  [<ffffffff811a9ad3>] ? __getblk+0x33/0x70
[151680.304527]  [<ffffffff811f63cb>] ? ext3_get_branch+0x8b/0x150
[151680.304531]  [<ffffffff811f970f>] ext3_get_blocks_handle+0x2ef/0x640
[151680.304535]  [<ffffffff811f9b24>] ext3_get_block+0xc4/0x120
[151680.304539]  [<ffffffff8165b00e>] ? _raw_spin_lock+0xe/0x20
[151680.304544]  [<ffffffff811ab7ae>] __block_write_begin+0x1ce/0x520
[151680.304548]  [<ffffffff811f9a60>] ? ext3_get_blocks_handle+0x640/0x640
[151680.304553]  [<ffffffff81117f98>] ? grab_cache_page_write_begin+0x78/0xe0
[151680.304557]  [<ffffffff811f8e73>] ext3_write_begin+0xc3/0x280
[151680.304562]  [<ffffffff8111757a>] generic_perform_write+0xca/0x210
[151680.304566]  [<ffffffff816591cb>] ? io_schedule+0xab/0xd0
[151680.304571]  [<ffffffff8111771d>] generic_file_buffered_write+0x5d/0x90
[151680.304576]  [<ffffffff81119139>] __generic_file_aio_write+0x229/0x440
[151680.304580]  [<ffffffff811193c2>] generic_file_aio_write+0x72/0xe0
[151680.304585]  [<ffffffff8117792a>] do_sync_write+0xda/0x120
[151680.304590]  [<ffffffff812d7f88>] ? apparmor_file_permission+0x18/0x20
[151680.304594]  [<ffffffff8129d6ec>] ? security_file_permission+0x2c/0xb0
[151680.304598]  [<ffffffff81177ed1>] ? rw_verify_area+0x61/0xf0
[151680.304602]  [<ffffffff81178233>] vfs_write+0xb3/0x180
[151680.304606]  [<ffffffff8117855a>] sys_write+0x4a/0x90
[151680.304610]  [<ffffffff81663602>] system_call_fastpath+0x16/0x1b



> 
> Cheers,
> Robin
> --
> ___
> ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
> / / )      | Little Jim says ....                            |
> // !!       |      "He fallen in de water !!"                 |



* Re: failed raid re-create changed dev size
  2012-12-12 16:10           ` Andris Berzins
@ 2012-12-13  9:30             ` Robin Hill
  2012-12-13 16:13               ` Andris Berzins
  0 siblings, 1 reply; 10+ messages in thread
From: Robin Hill @ 2012-12-13  9:30 UTC (permalink / raw)
  To: Andris Berzins; +Cc: linux-raid


On Wed Dec 12, 2012 at 06:10:59PM +0200, Andris Berzins wrote:

> >> Is it possible that no data was damaged? It is a LUKS partition; I
> >> mapped it and ran "fsck -n" on the underlying ext3 partition,
> >> but fsck returned immediately with status "clean".
> >> 
> > By default fsck will just check whether the filesystem is marked as
> > dirty/clean, and skip running if it's clean. You'll need to use "-f"
> > to force it to run.
> 
> It seems that something got damaged. I have several traces in dmesg, as
> shown below.
> 
> I tried to run "fsck -f -n", but it looks like it will take several
> months on this 15 TB fs with a billion files.
> Any ideas?
> 
Sorry, no. You've got a corrupted filesystem and fsck is the tool to fix
that. If you're certain that the array is now set up correctly (which it
probably is if LUKS is able to map it okay), then you can skip the "-n"
pass and proceed straight to repair. Depending on memory, you may also
want to look into setting up scratch_files in e2fsck.conf as it can suck
up a lot of memory for large filesystems. You may also want to look into
moving to ext4 once you've got the filesystem fixed - fsck times should
be much lower than with ext3.
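
A minimal /etc/e2fsck.conf sketch for that (the directory path is just an
example, and it must already exist):

[scratch_files]
directory = /var/cache/e2fsck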

The only other option would be to reformat and restore from backup.

Good luck,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: failed raid re-create changed dev size
  2012-12-13  9:30             ` Robin Hill
@ 2012-12-13 16:13               ` Andris Berzins
  0 siblings, 0 replies; 10+ messages in thread
From: Andris Berzins @ 2012-12-13 16:13 UTC (permalink / raw)
  To: Robin Hill; +Cc: linux-raid

Quoting "Robin Hill" <robin@robinhill.me.uk>:
> On Wed Dec 12, 2012 at 06:10:59PM +0200, Andris Berzins wrote:
> 
>>>> Is it possible that no data was damaged? It is a LUKS partition; I
>>>> mapped it and ran "fsck -n" on the underlying ext3 partition,
>>>> but fsck returned immediately with status "clean".
>>>> 
>>> By default fsck will just check whether the filesystem is marked as
>>> dirty/clean, and skip running if it's clean. You'll need to use "-f"
>>> to force it to run.
>> 
>> It seems that something got damaged. I have several traces in dmesg, as
>> shown below.
>> 
>> I tried to run "fsck -f -n", but it looks like it will take several
>> months on this 15 TB fs with a billion files.
>> Any ideas?
>> 
> Sorry, no. You've got a corrupted filesystem and fsck is the tool to fix
> that. If you're certain that the array is now set up correctly (which it
> probably is if LUKS is able to map it okay), then you can skip the "-n"
> pass and proceed straight to repair. Depending on memory, you may also
> want to look into setting up scratch_files in e2fsck.conf as it can suck
> up a lot of memory for large filesystems. You may also want to look into
> moving to ext4 once you've got the filesystem fixed - fsck times should
> be much lower than with ext3.

Thank you for the suggestions!
fsck finished sooner than I thought. :)
Very interesting. It turns out that the raid re-creation with the wrong offset did not damage the underlying filesystem?

# fsck -f -n /dev/mapper/data
fsck from util-linux 2.20.1
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/data: 121432484/457854976 files (0.1% non-contiguous), 2827329571/3662830720 blocks




> 
> The only other option would be to reformat and restore from backup.
> 
> Good luck,
> Robin
> --
> ___
> ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
> / / )      | Little Jim says ....                            |
> // !!       |      "He fallen in de water !!"                 |


