* How to recover after md crash during reshape?
@ 2015-10-20 2:35 andras
2015-10-20 12:50 ` Anugraha Sinha
` (3 more replies)
0 siblings, 4 replies; 24+ messages in thread
From: andras @ 2015-10-20 2:35 UTC (permalink / raw)
To: linux-raid
Dear all,
I have a serious (to me) problem, and I'm seeking some pro advice in
recovering a RAID6 volume after a crash at the beginning of a reshape.
Thank you all in advance for any help!
The details:
I'm running Debian.
uname -r says:
kernel 3.2.0-4-amd64
dmesg says:
Linux version 3.2.0-4-amd64 (debian-kernel@lists.debian.org)
(gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.68-1+deb7u3
mdadm -V says:
mdadm - v3.2.5 - 18th May 2012
I used to have a RAID6 volume with 7 disks in it. I recently bought
another 3 new HDDs and was trying to add them to the array.
I put them in the machine (hot-plugged), partitioned them, then did:
mdadm --add /dev/md1 /dev/sdh1 /dev/sdi1 /dev/sdj1
This worked fine, /proc/mdstat showed them as three spares. Then I did:
mdadm --grow --raid-devices=10 /dev/md1
Yes, I was dumb enough to start the process without a backup option -
(copy-paste error from https://raid.wiki.kernel.org/index.php/Growing).
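For comparison, a minimal dry-run sketch of the grow sequence as the RAID
wiki recommends it, with the backup file that was missed here. It only
prints the commands for review; the backup-file path is a hypothetical
example, and it must live on a device outside the array being grown:

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the recommended grow sequence.
# Device names are the ones from this thread; BACKUP is a hypothetical
# path and must NOT be on the array being reshaped.
set -eu
MD=/dev/md1
BACKUP=/root/md1-grow-backup
echo "mdadm --add $MD /dev/sdh1 /dev/sdi1 /dev/sdj1"
echo "mdadm --grow $MD --raid-devices=10 --backup-file=$BACKUP"
```

The backup file lets mdadm restore the critical section of the reshape
after a crash like the one described below.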
The grow command immediately (well, after 2 seconds) crashed the MD driver:
Oct 17 17:30:27 bazsalikom kernel: [7869821.514718] sd 0:0:0:0:
[sdj] Attached SCSI disk
Oct 17 18:39:21 bazsalikom kernel: [7873955.418679] sdh: sdh1
Oct 17 18:39:37 bazsalikom kernel: [7873972.155084] sdi: sdi1
Oct 17 18:39:49 bazsalikom kernel: [7873983.916038] sdj: sdj1
Oct 17 18:40:33 bazsalikom kernel: [7874027.963430] md: bind<sdh1>
Oct 17 18:40:34 bazsalikom kernel: [7874028.263656] md: bind<sdi1>
Oct 17 18:40:34 bazsalikom kernel: [7874028.361112] md: bind<sdj1>
Oct 17 18:59:48 bazsalikom kernel: [7875182.667815] md: reshape of
RAID array md1
Oct 17 18:59:48 bazsalikom kernel: [7875182.667818] md: minimum
_guaranteed_ speed: 1000 KB/sec/disk.
Oct 17 18:59:48 bazsalikom kernel: [7875182.667821] md: using
maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for reshape.
Oct 17 18:59:48 bazsalikom kernel: [7875182.667831] md: using 128k
window, over a total of 1465135936k.
--> Oct 17 18:59:50 bazsalikom kernel: [7875184.326245] md: md_do_sync()
got signal ... exiting
Oct 17 19:02:46 bazsalikom kernel: [7875360.928059] md1_raid6
D ffff88021fc12780 0 282 2 0x00000000
Oct 17 19:02:46 bazsalikom kernel: [7875360.928066]
ffff880213fd9140 0000000000000046 ffff8800aa80c140 ffff880201fe08c0
Oct 17 19:02:46 bazsalikom kernel: [7875360.928073]
0000000000012780 ffff880211845fd8 ffff880211845fd8 ffff880213fd9140
Oct 17 19:02:46 bazsalikom kernel: [7875360.928079]
ffff8800a77d8a40 ffffffff81071331 0000000000000046 ffff8802135a0c00
Oct 17 19:02:46 bazsalikom kernel: [7875360.928085] Call Trace:
Oct 17 19:02:46 bazsalikom kernel: [7875360.928095]
[<ffffffff81071331>] ? arch_local_irq_save+0x11/0x17
Oct 17 19:02:46 bazsalikom kernel: [7875360.928111]
[<ffffffffa0124c6c>] ? check_reshape+0x27b/0x51a [raid456]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928128]
[<ffffffffa013ade4>] ? scsi_request_fn+0x443/0x51e [scsi_mod]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928134]
[<ffffffff8103f6e2>] ? try_to_wake_up+0x197/0x197
Oct 17 19:02:46 bazsalikom kernel: [7875360.928144]
[<ffffffffa00ef3b8>] ? md_check_recovery+0x2a5/0x514 [md_mod]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928151]
[<ffffffffa01286c7>] ? raid5d+0x1c/0x483 [raid456]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928156]
[<ffffffff81071331>] ? arch_local_irq_save+0x11/0x17
Oct 17 19:02:46 bazsalikom kernel: [7875360.928160]
[<ffffffff81071331>] ? arch_local_irq_save+0x11/0x17
Oct 17 19:02:46 bazsalikom kernel: [7875360.928169]
[<ffffffffa00e9256>] ? md_thread+0x114/0x132 [md_mod]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928174]
[<ffffffff8105fdf3>] ? add_wait_queue+0x3c/0x3c
Oct 17 19:02:46 bazsalikom kernel: [7875360.928183]
[<ffffffffa00e9142>] ? md_rdev_init+0xea/0xea [md_mod]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928188]
[<ffffffff8105f7a1>] ? kthread+0x76/0x7e
Oct 17 19:02:46 bazsalikom kernel: [7875360.928194]
[<ffffffff81357ff4>] ? kernel_thread_helper+0x4/0x10
Oct 17 19:02:46 bazsalikom kernel: [7875360.928199]
[<ffffffff8105f72b>] ? kthread_worker_fn+0x139/0x139
Oct 17 19:02:46 bazsalikom kernel: [7875360.928204]
[<ffffffff81357ff0>] ? gs_change+0x13/0x13
Oct 17 19:04:46 bazsalikom kernel: [7875480.928055] md1_raid6
D ffff88021fc12780 0 282 2 0x00000000
Oct 17 19:04:46 bazsalikom kernel: [7875480.928062]
ffff880213fd9140 0000000000000046 ffff8800aa80c140 ffff880201fe08c0
Oct 17 19:04:46 bazsalikom kernel: [7875480.928069]
0000000000012780 ffff880211845fd8 ffff880211845fd8 ffff880213fd9140
Oct 17 19:04:46 bazsalikom kernel: [7875480.928075]
ffff8800a77d8a40 ffffffff81071331 0000000000000046 ffff8802135a0c00
Oct 17 19:04:46 bazsalikom kernel: [7875480.928082] Call Trace:
Oct 17 19:04:46 bazsalikom kernel: [7875480.928091]
[<ffffffff81071331>] ? arch_local_irq_save+0x11/0x17
Oct 17 19:04:46 bazsalikom kernel: [7875480.928108]
[<ffffffffa0124c6c>] ? check_reshape+0x27b/0x51a [raid456]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928124]
[<ffffffffa013ade4>] ? scsi_request_fn+0x443/0x51e [scsi_mod]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928130]
[<ffffffff8103f6e2>] ? try_to_wake_up+0x197/0x197
Oct 17 19:04:46 bazsalikom kernel: [7875480.928141]
[<ffffffffa00ef3b8>] ? md_check_recovery+0x2a5/0x514 [md_mod]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928148]
[<ffffffffa01286c7>] ? raid5d+0x1c/0x483 [raid456]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928153]
[<ffffffff81071331>] ? arch_local_irq_save+0x11/0x17
Oct 17 19:04:46 bazsalikom kernel: [7875480.928157]
[<ffffffff81071331>] ? arch_local_irq_save+0x11/0x17
Oct 17 19:04:46 bazsalikom kernel: [7875480.928166]
[<ffffffffa00e9256>] ? md_thread+0x114/0x132 [md_mod]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928171]
[<ffffffff8105fdf3>] ? add_wait_queue+0x3c/0x3c
Oct 17 19:04:46 bazsalikom kernel: [7875480.928180]
[<ffffffffa00e9142>] ? md_rdev_init+0xea/0xea [md_mod]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928185]
[<ffffffff8105f7a1>] ? kthread+0x76/0x7e
Oct 17 19:04:46 bazsalikom kernel: [7875480.928191]
[<ffffffff81357ff4>] ? kernel_thread_helper+0x4/0x10
Oct 17 19:04:46 bazsalikom kernel: [7875480.928196]
[<ffffffff8105f72b>] ? kthread_worker_fn+0x139/0x139
Oct 17 19:04:46 bazsalikom kernel: [7875480.928200]
[<ffffffff81357ff0>] ? gs_change+0x13/0x13
Oct 17 19:04:46 bazsalikom kernel: [7875480.928212] jbd2/md1-8
D ffff88021fc92780 0 1731 2 0x00000000
Oct 17 19:04:46 bazsalikom kernel: [7875480.928218]
ffff880213693180 0000000000000046 ffff880200000000 ffff880216d04180
Oct 17 19:04:46 bazsalikom kernel: [7875480.928224]
0000000000012780 ffff880213df3fd8 ffff880213df3fd8 ffff880213693180
Oct 17 19:04:46 bazsalikom kernel: [7875480.928230]
0000000000000000 00000001135a0d70 ffff8802135a0d60 ffff8802135a0d70
Oct 17 19:04:46 bazsalikom kernel: [7875480.928236] Call Trace:
Oct 17 19:04:46 bazsalikom kernel: [7875480.928243]
[<ffffffffa0123804>] ? get_active_stripe+0x24c/0x505 [raid456]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928248]
[<ffffffff8103f6e2>] ? try_to_wake_up+0x197/0x197
Oct 17 19:04:46 bazsalikom kernel: [7875480.928255]
[<ffffffffa01258c8>] ? make_request+0x1b4/0x37a [raid456]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928260]
[<ffffffff8105fdf3>] ? add_wait_queue+0x3c/0x3c
Oct 17 19:04:46 bazsalikom kernel: [7875480.928278]
[<ffffffffa00e8d47>] ? md_make_request+0xee/0x1db [md_mod]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928283]
[<ffffffff8119a3ec>] ? generic_make_request+0x90/0xcf
Oct 17 19:04:46 bazsalikom kernel: [7875480.928287]
[<ffffffff8119a4fe>] ? submit_bio+0xd3/0xf1
Oct 17 19:04:46 bazsalikom kernel: [7875480.928293]
[<ffffffff81121b78>] ? bio_alloc_bioset+0x43/0xb6
Oct 17 19:04:46 bazsalikom kernel: [7875480.928297]
[<ffffffff8111da68>] ? submit_bh+0xe2/0xff
Oct 17 19:04:46 bazsalikom kernel: [7875480.928304]
[<ffffffffa0167674>] ? jbd2_journal_commit_transaction+0x803/0x10bf
[jbd2]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928309]
[<ffffffff8100d02f>] ? load_TLS+0x7/0xa
Oct 17 19:04:46 bazsalikom kernel: [7875480.928313]
[<ffffffff8100d69e>] ? __switch_to+0x133/0x258
Oct 17 19:04:46 bazsalikom kernel: [7875480.928318]
[<ffffffff81350dd1>] ? _raw_spin_lock_irqsave+0x9/0x25
Oct 17 19:04:46 bazsalikom kernel: [7875480.928323]
[<ffffffff8105267a>] ? lock_timer_base.isra.29+0x23/0x47
Oct 17 19:04:46 bazsalikom kernel: [7875480.928330]
[<ffffffffa016b166>] ? kjournald2+0xc0/0x20a [jbd2]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928334]
[<ffffffff8105fdf3>] ? add_wait_queue+0x3c/0x3c
Oct 17 19:04:46 bazsalikom kernel: [7875480.928341]
[<ffffffffa016b0a6>] ? commit_timeout+0x5/0x5 [jbd2]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928345]
[<ffffffff8105f7a1>] ? kthread+0x76/0x7e
Oct 17 19:04:46 bazsalikom kernel: [7875480.928349]
[<ffffffff81357ff4>] ? kernel_thread_helper+0x4/0x10
Oct 17 19:04:46 bazsalikom kernel: [7875480.928354]
[<ffffffff8105f72b>] ? kthread_worker_fn+0x139/0x139
Oct 17 19:04:46 bazsalikom kernel: [7875480.928358]
[<ffffffff81357ff0>] ? gs_change+0x13/0x13
Oct 17 19:04:46 bazsalikom kernel: [7875480.928408] smbd
D ffff88021fc12780 0 3063 25481 0x00000000
Oct 17 19:04:46 bazsalikom kernel: [7875480.928413]
ffff880213e07780 0000000000000082 0000000000000000 ffffffff8160d020
Oct 17 19:04:46 bazsalikom kernel: [7875480.928418]
0000000000012780 ffff880003cabfd8 ffff880003cabfd8 ffff880213e07780
Oct 17 19:04:46 bazsalikom kernel: [7875480.928424]
0000000000000000 00000001135a0d70 ffff8802135a0d60 ffff8802135a0d70
Oct 17 19:04:46 bazsalikom kernel: [7875480.928429] Call Trace:
Oct 17 19:04:46 bazsalikom kernel: [7875480.928435]
[<ffffffffa0123804>] ? get_active_stripe+0x24c/0x505 [raid456]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928439]
[<ffffffff8103f6e2>] ? try_to_wake_up+0x197/0x197
Oct 17 19:04:46 bazsalikom kernel: [7875480.928445]
[<ffffffffa01258c8>] ? make_request+0x1b4/0x37a [raid456]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928450]
[<ffffffff8105fdf3>] ? add_wait_queue+0x3c/0x3c
Oct 17 19:04:46 bazsalikom kernel: [7875480.928457]
[<ffffffffa00e8d47>] ? md_make_request+0xee/0x1db [md_mod]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928468]
[<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928473]
[<ffffffff8119a3ec>] ? generic_make_request+0x90/0xcf
Oct 17 19:04:46 bazsalikom kernel: [7875480.928477]
[<ffffffff8119a4fe>] ? submit_bio+0xd3/0xf1
Oct 17 19:04:46 bazsalikom kernel: [7875480.928482]
[<ffffffff810bedab>] ? __lru_cache_add+0x2b/0x51
Oct 17 19:04:46 bazsalikom kernel: [7875480.928486]
[<ffffffff811259dd>] ? mpage_readpages+0x113/0x134
Oct 17 19:04:46 bazsalikom kernel: [7875480.928496]
[<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928500]
[<ffffffff81109033>] ? poll_freewait+0x97/0x97
Oct 17 19:04:46 bazsalikom kernel: [7875480.928505]
[<ffffffff81036628>] ? should_resched+0x5/0x23
Oct 17 19:04:46 bazsalikom kernel: [7875480.928508]
[<ffffffff8134fa44>] ? _cond_resched+0x7/0x1c
Oct 17 19:04:46 bazsalikom kernel: [7875480.928513]
[<ffffffff810bdd31>] ? __do_page_cache_readahead+0x11e/0x1c3
Oct 17 19:04:46 bazsalikom kernel: [7875480.928517]
[<ffffffff810be02e>] ? ra_submit+0x19/0x1d
Oct 17 19:04:46 bazsalikom kernel: [7875480.928522]
[<ffffffff810b689b>] ? generic_file_aio_read+0x282/0x5cf
Oct 17 19:04:46 bazsalikom kernel: [7875480.928528]
[<ffffffff810fadc4>] ? do_sync_read+0xb4/0xec
Oct 17 19:04:46 bazsalikom kernel: [7875480.928532]
[<ffffffff810fb4af>] ? vfs_read+0x9f/0xe6
Oct 17 19:04:46 bazsalikom kernel: [7875480.928536]
[<ffffffff810fb61f>] ? sys_pread64+0x53/0x6e
Oct 17 19:04:46 bazsalikom kernel: [7875480.928540]
[<ffffffff81355e92>] ? system_call_fastpath+0x16/0x1b
Oct 17 19:04:46 bazsalikom kernel: [7875480.928549] imap
D ffff88021fc12780 0 3121 4613 0x00000000
Oct 17 19:04:46 bazsalikom kernel: [7875480.928554]
ffff880216db1100 0000000000000082 ffffea0000000000 ffffffff8160d020
Oct 17 19:04:46 bazsalikom kernel: [7875480.928559]
0000000000012780 ffff8800cf5b1fd8 ffff8800cf5b1fd8 ffff880216db1100
Oct 17 19:04:46 bazsalikom kernel: [7875480.928564]
0000000000000000 00000001135a0d70 ffff8802135a0d60 ffff8802135a0d70
Oct 17 19:04:46 bazsalikom kernel: [7875480.928569] Call Trace:
Oct 17 19:04:46 bazsalikom kernel: [7875480.928576]
[<ffffffffa0123804>] ? get_active_stripe+0x24c/0x505 [raid456]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928580]
[<ffffffff8103f6e2>] ? try_to_wake_up+0x197/0x197
Oct 17 19:04:46 bazsalikom kernel: [7875480.928585]
[<ffffffffa01258c8>] ? make_request+0x1b4/0x37a [raid456]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928590]
[<ffffffff8105fdf3>] ? add_wait_queue+0x3c/0x3c
Oct 17 19:04:46 bazsalikom kernel: [7875480.928597]
[<ffffffffa00e8d47>] ? md_make_request+0xee/0x1db [md_mod]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928607]
[<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928611]
[<ffffffff8119a3ec>] ? generic_make_request+0x90/0xcf
Oct 17 19:04:46 bazsalikom kernel: [7875480.928615]
[<ffffffff8119a4fe>] ? submit_bio+0xd3/0xf1
Oct 17 19:04:46 bazsalikom kernel: [7875480.928619]
[<ffffffff810bedab>] ? __lru_cache_add+0x2b/0x51
Oct 17 19:04:46 bazsalikom kernel: [7875480.928623]
[<ffffffff811259dd>] ? mpage_readpages+0x113/0x134
Oct 17 19:04:46 bazsalikom kernel: [7875480.928633]
[<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928637]
[<ffffffff8110b27f>] ? dput+0x27/0xee
Oct 17 19:04:46 bazsalikom kernel: [7875480.928641]
[<ffffffff811110df>] ? mntput_no_expire+0x1e/0xc9
Oct 17 19:04:46 bazsalikom kernel: [7875480.928646]
[<ffffffff810bdd31>] ? __do_page_cache_readahead+0x11e/0x1c3
Oct 17 19:04:46 bazsalikom kernel: [7875480.928650]
[<ffffffff810bdff1>] ? force_page_cache_readahead+0x5f/0x83
Oct 17 19:04:46 bazsalikom kernel: [7875480.928654]
[<ffffffff810b85e5>] ? sys_fadvise64_64+0x141/0x1e2
Oct 17 19:04:46 bazsalikom kernel: [7875480.928658]
[<ffffffff81355e92>] ? system_call_fastpath+0x16/0x1b
Oct 17 19:04:46 bazsalikom kernel: [7875480.928667] smbd
D ffff88021fc12780 0 3155 25481 0x00000000
Oct 17 19:04:46 bazsalikom kernel: [7875480.928672]
ffff8802135d8780 0000000000000086 0000000000000000 ffffffff8160d020
Oct 17 19:04:46 bazsalikom kernel: [7875480.928677]
0000000000012780 ffff880005267fd8 ffff880005267fd8 ffff8802135d8780
Oct 17 19:04:46 bazsalikom kernel: [7875480.928683]
0000000000000000 00000001135a0d70 ffff8802135a0d60 ffff8802135a0d70
Oct 17 19:04:46 bazsalikom kernel: [7875480.928688] Call Trace:
Oct 17 19:04:46 bazsalikom kernel: [7875480.928694]
[<ffffffffa0123804>] ? get_active_stripe+0x24c/0x505 [raid456]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928698]
[<ffffffff8103f6e2>] ? try_to_wake_up+0x197/0x197
Oct 17 19:04:46 bazsalikom kernel: [7875480.928704]
[<ffffffffa01258c8>] ? make_request+0x1b4/0x37a [raid456]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928708]
[<ffffffff8105fdf3>] ? add_wait_queue+0x3c/0x3c
Oct 17 19:04:46 bazsalikom kernel: [7875480.928715]
[<ffffffffa00e8d47>] ? md_make_request+0xee/0x1db [md_mod]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928725]
[<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928729]
[<ffffffff8119a3ec>] ? generic_make_request+0x90/0xcf
Oct 17 19:04:46 bazsalikom kernel: [7875480.928733]
[<ffffffff8119a4fe>] ? submit_bio+0xd3/0xf1
Oct 17 19:04:46 bazsalikom kernel: [7875480.928737]
[<ffffffff810bedab>] ? __lru_cache_add+0x2b/0x51
Oct 17 19:04:46 bazsalikom kernel: [7875480.928741]
[<ffffffff811259dd>] ? mpage_readpages+0x113/0x134
Oct 17 19:04:46 bazsalikom kernel: [7875480.928751]
[<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4]
Oct 17 19:04:46 bazsalikom kernel: [7875480.928755]
[<ffffffff81109033>] ? poll_freewait+0x97/0x97
Oct 17 19:04:46 bazsalikom kernel: [7875480.928759]
[<ffffffff81036628>] ? should_resched+0x5/0x23
Oct 17 19:04:46 bazsalikom kernel: [7875480.928762]
[<ffffffff8134fa44>] ? _cond_resched+0x7/0x1c
Oct 17 19:04:46 bazsalikom kernel: [7875480.928767]
[<ffffffff810bdd31>] ? __do_page_cache_readahead+0x11e/0x1c3
Oct 17 19:04:46 bazsalikom kernel: [7875480.928771]
[<ffffffff810be02e>] ? ra_submit+0x19/0x1d
Oct 17 19:04:46 bazsalikom kernel: [7875480.928775]
[<ffffffff810b689b>] ? generic_file_aio_read+0x282/0x5cf
Oct 17 19:04:46 bazsalikom kernel: [7875480.928780]
[<ffffffff810fadc4>] ? do_sync_read+0xb4/0xec
Oct 17 19:04:46 bazsalikom kernel: [7875480.928784]
[<ffffffff810fb4af>] ? vfs_read+0x9f/0xe6
Oct 17 19:04:46 bazsalikom kernel: [7875480.928788]
[<ffffffff810fb61f>] ? sys_pread64+0x53/0x6e
Oct 17 19:04:46 bazsalikom kernel: [7875480.928792]
[<ffffffff81355e92>] ? system_call_fastpath+0x16/0x1b
From here on, things went downhill pretty damn fast. I was not able to
unmount the file-system or stop or restart the array (/proc/mdstat went
away), and any process trying to touch /dev/md1 hung, so eventually I ran
out of options and hit the reset button on the machine.
Upon reboot, the array wouldn't assemble, it was complaining that SDA
and SDA1 had the same superblock info on it.
mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar
superblocks.
If they are really different, please --zero the superblock on one
If they are the same or overlap, please remove one from the
DEVICE list in mdadm.conf.
At this point, I looked at the drives and it appeared that the drive
letters had been re-arranged by the kernel. My three new HDDs (which used
to be SDH, SDI, SDJ) now appeared as SDA, SDB and SDD.
I've read up on this a little, and everyone seemed to suggest that you
repair this superblock corruption by zeroing out the superblock, so I
did:
mdadm --zero-superblock /dev/sda1
At this point mdadm started complaining about the superblock on SDB
(and later SDD), so I ended up zeroing out the superblock on all three of
the new hard drives:
mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdd1
After this, the array would assemble but wouldn't start, stating that
it didn't have enough disks in it - which is correct for the new
(10-device) array: I had just removed 3 drives from a RAID6.
Right now, /proc/mdstat says:
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : inactive sdh2[0](S) sdc2[6](S) sdj1[5](S) sde1[4](S)
sdg1[3](S) sdi1[2](S) sdf2[1](S)
10744335040 blocks super 0.91
mdadm -E /dev/sdc2 says:
/dev/sdc2:
Magic : a92b4efc
Version : 0.91.00
UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
Creation Time : Sat Oct 2 07:21:53 2010
Raid Level : raid6
Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
Raid Devices : 10
Total Devices : 10
Preferred Minor : 1
Reshape pos'n : 4096
Delta Devices : 3 (7->10)
Update Time : Sat Oct 17 18:59:50 2015
State : active
Active Devices : 10
Working Devices : 10
Failed Devices : 0
Spare Devices : 0
Checksum : fad60788 - correct
Events : 2579239
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 6 8 98 6 active sync
0 0 8 50 0 active sync
1 1 8 18 1 active sync
2 2 8 65 2 active sync /dev/sde1
3 3 8 33 3 active sync /dev/sdc1
4 4 8 1 4 active sync /dev/sda1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 98 6 active sync
7 7 8 145 7 active sync /dev/sdj1
8 8 8 129 8 active sync /dev/sdi1
9 9 8 113 9 active sync /dev/sdh1
So, if I read this right, the superblock here states that the array is
in the middle of a reshape from 7 to 10 devices, but that the reshape had
only just started (the position is 4096).
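A quick back-of-the-envelope check confirms how little was rewritten.
Assuming the reshape position shares the KiB unit of the "Used Dev Size"
field (an assumption; the exact unit doesn't change the conclusion),
4096 out of 1465135936 is a vanishing fraction:

```shell
#!/bin/sh
# Sanity check: reshape position vs. per-device size, unit assumed KiB.
set -eu
POS=4096
DEV_SIZE=1465135936
PCT=$((POS * 100 / DEV_SIZE))   # integer percent; rounds down
echo "reshape progress: ${PCT}% (position $POS of $DEV_SIZE)"
```

This prints "reshape progress: 0% ...", i.e. well under one percent of
the array had been touched when the crash happened.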
What's interesting is that the device names listed here don't match the
ones reported by /proc/mdstat, and are actually incorrect. The correct
partition numbers are in /proc/mdstat.
The superblocks on the 6 other original disks match, except of course
for which one they mark as 'this' and for the checksum.
I've read here (http://ubuntuforums.org/showthread.php?t=2133576),
among many other places, that it might be possible to recover the data on
the array by re-creating it in the state it was in before the reshape.
I've also read that if I want to re-create an array in read-only mode, I
should re-create it degraded.
So, what I thought I would do is this:
mdadm --create /dev/md1 --level=6 --raid-devices=7 /dev/sdh2
/dev/sdf2 /dev/sdi1 /dev/sdg1 /dev/sde1 missing missing
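A sketch of how such a re-create is usually built up, as a dry run that
only prints the command for review. The metadata version, chunk size and
layout are copied from the mdadm -E output above; the device list and its
order are the poster's guess and are NOT confirmed - they must match the
original RaidDevice order exactly, or the data will be scrambled:

```shell
#!/bin/sh
# Dry-run sketch ONLY: build and print the re-create command; never run
# it until the device order is confirmed. --assume-clean stops mdadm
# from starting a resync that would overwrite parity; the device order
# below is an unverified assumption taken from the thread.
set -eu
CMD="mdadm --create /dev/md1 --assume-clean --metadata=0.90 \
--level=6 --chunk=64 --layout=left-symmetric --raid-devices=7 \
/dev/sdh2 /dev/sdf2 /dev/sdi1 /dev/sdg1 /dev/sde1 missing missing"
echo "$CMD"
```

Running such a command against device-mapper overlay snapshots of the
disks, rather than the disks themselves, keeps the experiment reversible
while the device order is still in doubt.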
Obviously, at this point, I'm trying to be as cautious as possible in
not causing any further damage, if that's at all possible.
It seems that this issue has some similarities to this bug:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1001019
So, please all mdadm gurus, help me out! How can I recover as much of
the data on this volume as possible?
Thanks again,
Andras Tantos
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape?
2015-10-20 2:35 How to recover after md crash during reshape? andras
@ 2015-10-20 12:50 ` Anugraha Sinha
2015-10-20 13:04 ` Wols Lists
` (2 subsequent siblings)
3 siblings, 0 replies; 24+ messages in thread
From: Anugraha Sinha @ 2015-10-20 12:50 UTC (permalink / raw)
To: andras, linux-raid
Hi Andras,
> Upon reboot, the array wouldn't assemble, it was complaining that SDA
> and SDA1 had the same superblock info on it.
>
> mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar
> superblocks.
> If they are really different, please --zero the superblock on one
> If they are the same or overlap, please remove one from the
> DEVICE list in mdadm.conf.
>
> At this point, I looked at the drives and it appeared that the drive
> letters got re-arranged by the kernel. My three new HDD-s (which used to
> be SDH, SDI, SDJ) now appear as SDA, SDB and SDD.
>
> I've read up on this a little and everyone seemed to suggest that you
> repair this super-block corruption by zeroing out the suport-block, so I
> did:
>
> mdadm --zero-superblock /dev/sda1
>
> At this point mdadm started complaining about the super-block on SDB
> (and later SDD) so I ended up zeroing out the superblock on all three of
> the new hard-drives:
>
> mdadm --zero-superblock /dev/sdb1
> mdadm --zero-superblock /dev/sdd1
Before doing zero-superblock, you should have removed the drives from
the array first, and only then zeroed the superblock information.
That way the array would have known about the removal of the devices,
and it would have reassembled and started again.
Anyway, I suggest you first remove the devices that mdadm expects to be
present.
In my opinion you should first execute
[just as a safeguard, you may do this as well]
mdadm --stop /dev/md1
[then]
mdadm /dev/md1 --fail /dev/sda1 --remove /dev/sda1
mdadm /dev/md1 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md1 --fail /dev/sdd1 --remove /dev/sdd1
Then check what /proc/mdstat says, and what mdadm -D /dev/md1 reports.
If things are good and you are lucky, restart the array (mdadm --run).
Thereafter, try to remove the existing partitions on /dev/sda, /dev/sdb
and /dev/sdd (using GNU Parted), recreate the partitions, and probably
run mkfs on the newly created partitions as well.
The above will solve the issue of /dev/sda and /dev/sda1 having similar
superblock information.
Finally, take a backup, and then add the drives and grow your array again.
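The repartitioning step suggested above could be scripted as a dry run
first, printing the commands for review before touching the drives. GNU
Parted is the tool named in the suggestion; the GPT label and 1MiB start
alignment are assumptions, not something stated in the thread:

```shell
#!/bin/sh
# Dry-run sketch: print the repartitioning steps for the three renamed
# drives before re-adding them. Label type and alignment are assumed.
set -eu
for d in /dev/sda /dev/sdb /dev/sdd; do
    echo "parted -s $d mklabel gpt"
    echo "parted -s $d mkpart primary 1MiB 100%"
done
```

Reviewing the printed commands before running them avoids repeating the
earlier mistake of operating on the wrong (renamed) drive letters.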
I hope things work for you.
Regards
Anugraha
On 10/20/2015 11:35 AM, andras@tantosonline.com wrote:
> [...]
> [<ffffffff8119a4fe>] ? submit_bio+0xd3/0xf1
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928482]
> [<ffffffff810bedab>] ? __lru_cache_add+0x2b/0x51
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928486]
> [<ffffffff811259dd>] ? mpage_readpages+0x113/0x134
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928496]
> [<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4]
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928500]
> [<ffffffff81109033>] ? poll_freewait+0x97/0x97
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928505]
> [<ffffffff81036628>] ? should_resched+0x5/0x23
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928508]
> [<ffffffff8134fa44>] ? _cond_resched+0x7/0x1c
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928513]
> [<ffffffff810bdd31>] ? __do_page_cache_readahead+0x11e/0x1c3
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928517]
> [<ffffffff810be02e>] ? ra_submit+0x19/0x1d
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928522]
> [<ffffffff810b689b>] ? generic_file_aio_read+0x282/0x5cf
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928528]
> [<ffffffff810fadc4>] ? do_sync_read+0xb4/0xec
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928532]
> [<ffffffff810fb4af>] ? vfs_read+0x9f/0xe6
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928536]
> [<ffffffff810fb61f>] ? sys_pread64+0x53/0x6e
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928540]
> [<ffffffff81355e92>] ? system_call_fastpath+0x16/0x1b
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928549] imap D
> ffff88021fc12780 0 3121 4613 0x00000000
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928554]
> ffff880216db1100 0000000000000082 ffffea0000000000 ffffffff8160d020
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928559]
> 0000000000012780 ffff8800cf5b1fd8 ffff8800cf5b1fd8 ffff880216db1100
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928564]
> 0000000000000000 00000001135a0d70 ffff8802135a0d60 ffff8802135a0d70
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928569] Call Trace:
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928576]
> [<ffffffffa0123804>] ? get_active_stripe+0x24c/0x505 [raid456]
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928580]
> [<ffffffff8103f6e2>] ? try_to_wake_up+0x197/0x197
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928585]
> [<ffffffffa01258c8>] ? make_request+0x1b4/0x37a [raid456]
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928590]
> [<ffffffff8105fdf3>] ? add_wait_queue+0x3c/0x3c
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928597]
> [<ffffffffa00e8d47>] ? md_make_request+0xee/0x1db [md_mod]
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928607]
> [<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4]
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928611]
> [<ffffffff8119a3ec>] ? generic_make_request+0x90/0xcf
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928615]
> [<ffffffff8119a4fe>] ? submit_bio+0xd3/0xf1
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928619]
> [<ffffffff810bedab>] ? __lru_cache_add+0x2b/0x51
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928623]
> [<ffffffff811259dd>] ? mpage_readpages+0x113/0x134
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928633]
> [<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4]
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928637]
> [<ffffffff8110b27f>] ? dput+0x27/0xee
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928641]
> [<ffffffff811110df>] ? mntput_no_expire+0x1e/0xc9
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928646]
> [<ffffffff810bdd31>] ? __do_page_cache_readahead+0x11e/0x1c3
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928650]
> [<ffffffff810bdff1>] ? force_page_cache_readahead+0x5f/0x83
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928654]
> [<ffffffff810b85e5>] ? sys_fadvise64_64+0x141/0x1e2
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928658]
> [<ffffffff81355e92>] ? system_call_fastpath+0x16/0x1b
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928667] smbd D
> ffff88021fc12780 0 3155 25481 0x00000000
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928672]
> ffff8802135d8780 0000000000000086 0000000000000000 ffffffff8160d020
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928677]
> 0000000000012780 ffff880005267fd8 ffff880005267fd8 ffff8802135d8780
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928683]
> 0000000000000000 00000001135a0d70 ffff8802135a0d60 ffff8802135a0d70
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928688] Call Trace:
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928694]
> [<ffffffffa0123804>] ? get_active_stripe+0x24c/0x505 [raid456]
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928698]
> [<ffffffff8103f6e2>] ? try_to_wake_up+0x197/0x197
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928704]
> [<ffffffffa01258c8>] ? make_request+0x1b4/0x37a [raid456]
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928708]
> [<ffffffff8105fdf3>] ? add_wait_queue+0x3c/0x3c
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928715]
> [<ffffffffa00e8d47>] ? md_make_request+0xee/0x1db [md_mod]
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928725]
> [<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4]
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928729]
> [<ffffffff8119a3ec>] ? generic_make_request+0x90/0xcf
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928733]
> [<ffffffff8119a4fe>] ? submit_bio+0xd3/0xf1
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928737]
> [<ffffffff810bedab>] ? __lru_cache_add+0x2b/0x51
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928741]
> [<ffffffff811259dd>] ? mpage_readpages+0x113/0x134
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928751]
> [<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4]
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928755]
> [<ffffffff81109033>] ? poll_freewait+0x97/0x97
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928759]
> [<ffffffff81036628>] ? should_resched+0x5/0x23
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928762]
> [<ffffffff8134fa44>] ? _cond_resched+0x7/0x1c
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928767]
> [<ffffffff810bdd31>] ? __do_page_cache_readahead+0x11e/0x1c3
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928771]
> [<ffffffff810be02e>] ? ra_submit+0x19/0x1d
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928775]
> [<ffffffff810b689b>] ? generic_file_aio_read+0x282/0x5cf
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928780]
> [<ffffffff810fadc4>] ? do_sync_read+0xb4/0xec
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928784]
> [<ffffffff810fb4af>] ? vfs_read+0x9f/0xe6
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928788]
> [<ffffffff810fb61f>] ? sys_pread64+0x53/0x6e
> Oct 17 19:04:46 bazsalikom kernel: [7875480.928792]
> [<ffffffff81355e92>] ? system_call_fastpath+0x16/0x1b
>
> From here on, things went downhill pretty damn fast. I was not able to
> unmount the file-system, stop or re-start the array (/proc/mdstat went
> away), any process trying to touch /dev/md1 hung, so eventually, I ran
> out of options and hit the reset button on the machine.
>
> Upon reboot, the array wouldn't assemble: it was complaining that SDA
> and SDA1 had the same superblock info on them.
>
> mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar
> superblocks.
> If they are really different, please --zero the superblock on one
> If they are the same or overlap, please remove one from the
> DEVICE list in mdadm.conf.
>
> At this point, I looked at the drives and it appeared that the drive
> letters got re-arranged by the kernel. My three new HDD-s (which used to
> be SDH, SDI, SDJ) now appear as SDA, SDB and SDD.
>
> I've read up on this a little and everyone seemed to suggest that you
> repair this super-block corruption by zeroing out the super-block, so I
> did:
>
> mdadm --zero-superblock /dev/sda1
>
> At this point mdadm started complaining about the super-block on SDB
> (and later SDD) so I ended up zeroing out the superblock on all three of
> the new hard-drives:
>
> mdadm --zero-superblock /dev/sdb1
> mdadm --zero-superblock /dev/sdd1
>
> After this, the array would assemble, but wouldn't start, stating that
> it doesn't have enough disks in it - which is correct for the new array:
> I had just removed 3 drives from a RAID6.
>
> Right now, /proc/mdstat says:
>
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md1 : inactive sdh2[0](S) sdc2[6](S) sdj1[5](S) sde1[4](S)
> sdg1[3](S) sdi1[2](S) sdf2[1](S)
> 10744335040 blocks super 0.91
>
> mdadm -E /dev/sdc2 says:
> /dev/sdc2:
> Magic : a92b4efc
> Version : 0.91.00
> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
> Creation Time : Sat Oct 2 07:21:53 2010
> Raid Level : raid6
> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
> Raid Devices : 10
> Total Devices : 10
> Preferred Minor : 1
>
>
> Reshape pos'n : 4096
> Delta Devices : 3 (7->10)
>
>
> Update Time : Sat Oct 17 18:59:50 2015
> State : active
> Active Devices : 10
> Working Devices : 10
> Failed Devices : 0
> Spare Devices : 0
> Checksum : fad60788 - correct
> Events : 2579239
>
>
> Layout : left-symmetric
> Chunk Size : 64K
>
>
> Number Major Minor RaidDevice State
> this 6 8 98 6 active sync
>
>
> 0 0 8 50 0 active sync
> 1 1 8 18 1 active sync
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 33 3 active sync /dev/sdc1
> 4 4 8 1 4 active sync /dev/sda1
> 5 5 8 81 5 active sync /dev/sdf1
> 6 6 8 98 6 active sync
> 7 7 8 145 7 active sync /dev/sdj1
> 8 8 8 129 8 active sync /dev/sdi1
> 9 9 8 113 9 active sync /dev/sdh1
>
> So, if I read this right, the superblock here states that the array is
> in the middle of a reshape from 7 to 10 devices, but it just started
> (4096 is the position).
> What's interesting is the device names listed here don't match the ones
> reported by /proc/mdstat, and are actually incorrect. The right
> partition numbers are in /proc/mdstat.
>
> The superblocks on the 6 other original disks match, except for of
> course which one they mark as 'this' and the checksum.
>
> I've read in here (http://ubuntuforums.org/showthread.php?t=2133576)
> among many other places that it might be possible to recover the data on
> the array by trying to re-create it to the state before the re-shape.
>
> I've also read that if I want to re-create an array in read-only mode, I
> should re-create it degraded.
>
> So, what I thought I would do is this:
>
> mdadm --create /dev/md1 --level=6 --raid-devices=7 /dev/sdh2
> /dev/sdf2 /dev/sdi1 /dev/sdg1 /dev/sde1 missing missing
>
> Obviously, at this point, I'm trying to be as cautious as possible in
> not causing any further damage, if that's at all possible.
>
> It seems that this issue has some similarities to this bug:
> https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1001019
>
> So, please all mdadm gurus, help me out! How can I recover as much of
> the data on this volume as possible?
>
> Thanks again,
> Andras Tantos
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: How to recover after md crash during reshape?
2015-10-20 2:35 How to recover after md crash during reshape? andras
2015-10-20 12:50 ` Anugraha Sinha
@ 2015-10-20 13:04 ` Wols Lists
2015-10-20 13:49 ` Phil Turmel
2015-10-21 1:35 ` How to recover after md crash during reshape? Neil Brown
3 siblings, 0 replies; 24+ messages in thread
From: Wols Lists @ 2015-10-20 13:04 UTC (permalink / raw)
To: andras, linux-raid
On 20/10/15 03:35, andras@tantosonline.com wrote:
> From here on, things went downhill pretty damn fast. I was not able to
> unmount the file-system, stop or re-start the array (/proc/mdstat went
> away), any process trying to touch /dev/md1 hung, so eventually, I ran
> out of options and hit the reset button on the machine.
>
> Upon reboot, the array wouldn't assemble: it was complaining that SDA
> and SDA1 had the same superblock info on them.
>
> mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar
> superblocks.
> If they are really different, please --zero the superblock on one
> If they are the same or overlap, please remove one from the
> DEVICE list in mdadm.conf.
>
> At this point, I looked at the drives and it appeared that the drive
> letters got re-arranged by the kernel. My three new HDD-s (which used to
> be SDH, SDI, SDJ) now appear as SDA, SDB and SDD.
>
> I've read up on this a little and everyone seemed to suggest that you
> repair this super-block corruption by zeroing out the super-block, so I
> did:
>
> mdadm --zero-superblock /dev/sda1
OUCH !!!
REALLY REALLY REALLY don't do anything now until the experts chime in !!!
It looks to me like you have a 0.9 superblock, and this error message is
both common and erroneous. There's only one superblock, but it looks to
mdadm like it's both a disk superblock and a partition superblock.
You've just wiped those drives, I think ...
The experts should be able to recover it for you (I hope), but your
array is now damaged - don't damage it any further !!!
Cheers,
Wol
* Re: How to recover after md crash during reshape?
2015-10-20 2:35 How to recover after md crash during reshape? andras
2015-10-20 12:50 ` Anugraha Sinha
2015-10-20 13:04 ` Wols Lists
@ 2015-10-20 13:49 ` Phil Turmel
[not found] ` <3baf849321d819483c5d20c005a31844@tantosonline.com>
2015-10-21 1:35 ` How to recover after md crash during reshape? Neil Brown
3 siblings, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2015-10-20 13:49 UTC (permalink / raw)
To: andras, linux-raid
Good morning Andras,
On 10/19/2015 10:35 PM, andras@tantosonline.com wrote:
> Dear all,
>
> I have a serious (to me) problem, and I'm seeking some pro advice in
> recovering a RAID6 volume after a crash at the beginning of a reshape.
> Thank you all in advance for any help!
>
> The details:
>
> I'm running Debian.
> uname -r says:
> kernel 3.2.0-4-amd64
> dmsg says:
> Linux version 3.2.0-4-amd64 (debian-kernel@lists.debian.org)
> (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.68-1+deb7u3
> mdadm -v says:
> mdadm - v3.2.5 - 18th May 2012
>
> I used to have a RAID6 volume with 7 disks on it. I've recently bought
> another 3 new HDD-s and was trying to add them to the array.
> I've put them in the machine (hot-plug), partitioned them then did:
>
> mdadm --add /dev/md1 /dev/sdh1 /dev/sdi1 /dev/sdj1
>
> This worked fine, /proc/mdstat showed them as three spares. Then I did:
>
> mdadm --grow --raid-devices=10 /dev/md1
>
> Yes, I was dumb enough to start the process without a backup option -
> (copy-paste error from https://raid.wiki.kernel.org/index.php/Growing).
The normal way to recover from this mistake is to issue
mdadm --grow --continue /dev/md1 --backup-file .....
> This immediately (well, after 2 seconds) crashed the MD driver:
Crashing is a bug, of course, but you are using an old kernel. New
kernels *generally* have fewer bugs than old kernels :-) In newer
kernels it would have just held @ 0% progress while still otherwise running.
Same observation applies to the mdadm utility too. Consider using a
relatively new rescue CD for further operations.
[trim /]
> Upon reboot, the array wouldn't assemble, it was complaining that SDA
> and SDA1 had the same superblock info on it.
>
> mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar
> superblocks.
> If they are really different, please --zero the superblock on one
> If they are the same or overlap, please remove one from the
> DEVICE list in mdadm.conf.
This is a completely separate problem, and the warning is a bit
misleading. It is a side effect of version 0.90 metadata that could not
be solved in a backward compatible manner. Which is why v1.x metadata
was created and became the default years ago. Basically, v0.90
metadata, which is placed at the end of a device, when used on the last
partition of a disk, is ambiguous about whether it belongs to the last
partition or the disk as a whole.
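The arithmetic behind that ambiguity can be sketched directly (a toy calculation, assuming the classic v0.90 placement rule of a 64 KiB superblock at the last 64 KiB-aligned offset of the device; the sector counts are made up for illustration):

```python
# v0.90 metadata places its superblock near the end of the device:
# round the device size down to a 64 KiB boundary, then back off 64 KiB.
MD_RESERVED_SECTORS = 128  # 64 KiB in 512-byte sectors

def sb_offset(dev_sectors):
    return (dev_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS

# Hypothetical 2 TB disk with one partition starting at sector 2048
# and running to the end of the disk.
disk_sectors = 3907029168
part_start = 2048
part_sectors = disk_sectors - part_start

# Superblock location as seen from the whole disk...
whole_disk_sb = sb_offset(disk_sectors)
# ...and as seen from inside the last partition (absolute sector).
partition_sb = part_start + sb_offset(part_sectors)

# Because the partition start is 64 KiB-aligned, both calculations land
# on the same sector -- mdadm cannot tell which device owns the superblock.
print(whole_disk_sb, partition_sb)  # both 3907028992
```

Any 64 KiB-aligned last partition reaching the end of the disk triggers this, which is why the warning is so common with v0.90 arrays.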
Normally, you can update the metadata in place from v0.90 to v1.0 with
mdadm --assemble --update=metadata ....
> At this point, I looked at the drives and it appeared that the drive
> letters got re-arranged by the kernel. My three new HDD-s (which used to
> be SDH, SDI, SDJ) now appear as SDA, SDB and SDD.
This is common and often screws people up. The kernel assigns names
based on discovery order, which varies, especially with hotplugging.
You need a map of your array and its devices versus the underlying drive
serial numbers. This is so important I created a script years ago to
generate this information. Please download and run it, and post the
results here so we can precisely tailor the instructions we give.
https://github.com/pturmel/lsdrv
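The idea of that map can be sketched in a few lines: pair the stable drive serial numbers with the unstable kernel names, so members can be re-identified after any reshuffle. The input lines below are a simplified, made-up rendering of lsdrv-style output (the real script's format differs), purely to show the mapping:

```python
import re

# Toy sample in the spirit of lsdrv output: "scsi H:C:I:L ATA MODEL
# {SERIAL}" with the kernel block-device name appended for brevity.
lsdrv_sample = """\
scsi 6:0:0:0 ATA ST31500541AS {6XW0BQL0} sdc
scsi 7:0:0:0 ATA ST31500541AS {5XW05FFV} sde
scsi 8:0:0:0 ATA ST31500341AS {9VS1EFFD} sdg
"""

# Map the (stable) drive serial number to the (unstable) kernel name.
serial_to_name = {}
for line in lsdrv_sample.splitlines():
    m = re.search(r"\{([^}]+)\}\s+(\S+)$", line)
    if m:
        serial_to_name[m.group(1)] = m.group(2)

print(serial_to_name["6XW0BQL0"])  # -> sdc
```

Recorded once, such a map lets you state array roles in terms of serial numbers, which survive reboots and hotplug reordering.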
> I've read up on this a little and everyone seemed to suggest that you
> repair this super-block corruption by zeroing out the suport-block, so I
> did:
>
> mdadm --zero-superblock /dev/sda1
"Everyone" was wrong. Your drives only had the one superblock. It was
just misidentified in two contexts. You destroyed the only superblock
on those devices.
[trim /]
> After this, the array would assemble, but wouldn't start, stating that
> it doesn't have enough disks in it - which is correct for the new array:
> I just removed 3 drives from a RAID6.
>
> Right now, /proc/mdstat says:
>
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md1 : inactive sdh2[0](S) sdc2[6](S) sdj1[5](S) sde1[4](S)
> sdg1[3](S) sdi1[2](S) sdf2[1](S)
> 10744335040 blocks super 0.91
>
> mdadm -E /dev/sdc2 says:
> /dev/sdc2:
> Magic : a92b4efc
> Version : 0.91.00
> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
> Creation Time : Sat Oct 2 07:21:53 2010
> Raid Level : raid6
> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
> Raid Devices : 10
> Total Devices : 10
> Preferred Minor : 1
>
>
> Reshape pos'n : 4096
> Delta Devices : 3 (7->10)
>
>
> Update Time : Sat Oct 17 18:59:50 2015
> State : active
> Active Devices : 10
> Working Devices : 10
> Failed Devices : 0
> Spare Devices : 0
> Checksum : fad60788 - correct
> Events : 2579239
>
>
> Layout : left-symmetric
> Chunk Size : 64K
>
>
> Number Major Minor RaidDevice State
> this 6 8 98 6 active sync
>
>
> 0 0 8 50 0 active sync
> 1 1 8 18 1 active sync
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 33 3 active sync /dev/sdc1
> 4 4 8 1 4 active sync /dev/sda1
> 5 5 8 81 5 active sync /dev/sdf1
> 6 6 8 98 6 active sync
> 7 7 8 145 7 active sync /dev/sdj1
> 8 8 8 129 8 active sync /dev/sdi1
> 9 9 8 113 9 active sync /dev/sdh1
>
> So, if I read this right, the superblock here states that the array is
> in the middle of a reshape from 7 to 10 devices, but it just started
> (4096 is the position).
Yup, just a little ways in at the beginning. Probably where it tried to
write its first critical section to the backup file.
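For a sense of scale, a rough sketch (assuming "Reshape pos'n : 4096" is in KiB, consistent with the other sizes in the -E output, and using its 64K chunk size):

```python
# Rough scale of the interrupted reshape.
reshape_pos_kib = 4096        # "Reshape pos'n : 4096" from mdadm -E
chunk_kib = 64                # "Chunk Size : 64K"
data_disks_new = 10 - 2       # RAID6: 10 devices minus 2 parity

# One full data stripe in the new 10-device geometry.
stripe_kib = chunk_kib * data_disks_new   # 512 KiB
stripes_done = reshape_pos_kib // stripe_kib

print(stripes_done)  # 8 -- only the first ~4 MiB of the array was touched
```

So at most a handful of stripes at the very start of the array were rewritten before the crash.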
> What's interesting is the device names listed here don't match the ones
> reported by /proc/mdstat, and are actually incorrect. The right
> partition numbers are in /proc/mdstat.
Names in the superblock are recorded per the last successful assembly.
Which is why a map of actual roles vs. drive serial numbers is so important.
> I've read in here (http://ubuntuforums.org/showthread.php?t=2133576)
> among many other places that it might be possible to recover the data on
> the array by trying to re-create it to the state before the re-shape.
Yes, since you have destroyed those superblocks, and the reshape
position is so low. You might lose a little at the beginning of your
array. Or might not, if it crashed at the first critical section as I
suspect.
> I've also read that if I want to re-create an array in read-only mode, I
> should re-create it degraded.
Not necessary or recommended in this case.
> So, what I thought I would do is this:
>
> mdadm --create /dev/md1 --level=6 --raid-devices=7 /dev/sdh2
> /dev/sdf2 /dev/sdi1 /dev/sdg1 /dev/sde1 missing missing
>
> Obviously, at this point, I'm trying to be as cautious as possible in
> not causing any further damage, if that's at all possible.
Good, because the above would destroy your array. You'd get modern
defaults for metadata version, offset, and chunk size.
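To illustrate that point (emphatically not a command to run at this stage): any eventual re-create would have to pin every historical parameter explicitly and list the members in their original role order, which the "this N" lines of the -E reports encode. A sketch, using the role numbers from the -E output posted later in this thread; the flag set shown is an assumption for illustration, not a recommendation:

```python
# Illustration only (do NOT run anything against the array yet):
# rebuilding the member order from the "this N" role numbers that
# `mdadm -E` prints for each partition.
roles = {
    0: "/dev/sdf2",
    1: "/dev/sdd2",
    2: "/dev/sdg1",
    3: "/dev/sde1",
    4: "/dev/sdc1",
    5: "/dev/sdh1",
    6: "/dev/sdi2",
}
raid_devices = 7
members = [roles.get(n, "missing") for n in range(raid_devices)]

# A bare `mdadm --create` would silently apply *modern* defaults; the
# old array used v0.90 metadata, 64 KiB chunks and left-symmetric
# layout, so all of those would have to be stated explicitly.
cmd = (["mdadm", "--create", "/dev/md1", "--assume-clean",
        "--metadata=0.90", "--level=6", "--raid-devices=7",
        "--chunk=64", "--layout=left-symmetric"] + members)
print(" ".join(cmd))
```

The key observations: member order comes from the superblocks, not from current /dev/sdX letters, and every parameter left unstated gets a default that has changed since 2010.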
Please supply all of your mdadm -E reports for the seven partitions and
the lsdrv output I requested. Just post the text inline in your reply.
Do *not* do anything else.
Phil
* Re: How to recover after md crash during reshape?
[not found] ` <3baf849321d819483c5d20c005a31844@tantosonline.com>
@ 2015-10-20 15:42 ` Phil Turmel
2015-10-20 22:34 ` Anugraha Sinha
` (3 more replies)
0 siblings, 4 replies; 24+ messages in thread
From: Phil Turmel @ 2015-10-20 15:42 UTC (permalink / raw)
To: andras, Linux-RAID
Hi Andras,
{ Added linux-raid back -- convention on kernel.org is to reply-to-all,
trim replies, and either interleave or bottom post. I'm trimming less
than normal this time so the list can see. }
On 10/20/2015 10:48 AM, andras@tantosonline.com wrote:
> On 2015-10-20 08:49, Phil Turmel wrote:
>> Please supply all of you mdadm -E reports for the seven partitions and
>> the lsdrv output I requests. Just post the text inline in your reply.
>>
>> Do *not* do anything else.
>>
>> Phil
> Thanks for all the help!
>
> Here's the output of lsdrv:
>
> PCI [pata_marvell] 04:00.1 IDE interface: Marvell Technology Group Ltd.
> 88SE9128 IDE Controller (rev 11)
> ├scsi 0:x:x:x [Empty]
> └scsi 2:x:x:x [Empty]
> PCI [pata_jmicron] 05:00.1 IDE interface: JMicron Technology Corp.
> JMB363 SATA/IDE Controller (rev 02)
> ├scsi 1:x:x:x [Empty]
> └scsi 3:x:x:x [Empty]
> PCI [ahci] 04:00.0 SATA controller: Marvell Technology Group Ltd.
> 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11)
> ├scsi 4:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1JDN8}
> │└sda 1.82t [8:0] Partitioned (dos)
> │ └sda1 1.82t [8:1] Empty/Unknown
> └scsi 5:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1H84Q}
> └sdb 1.82t [8:16] Partitioned (dos)
> └sdb1 1.82t [8:17] ext4 'data' {d1403616-a9c6-4cd9-8d92-1aabc81fe373}
> PCI [ata_piix] 00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10
> Family) 4 port SATA IDE Controller #1
> ├scsi 6:0:0:0 ATA ST31500541AS {6XW0BQL0}
> │└sdc 1.36t [8:32] Partitioned (dos)
> │ └sdc1 1.36t [8:33] MD raid6 (10) inactive
> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> ├scsi 6:0:1:0 ATA WDC WD20EARS-00M {WD-WMAZA0348342}
> │└sdd 1.82t [8:48] Partitioned (dos)
> │ ├sdd1 525.53m [8:49] ext4 'boot1' {a3a1cedc-3866-4d80-af18-a7a4db99d880}
> │ ├sdd2 1.36t [8:50] MD raid6 (10) inactive
> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> │ └sdd3 465.24g [8:51] MD raid1 (3) inactive
> {f89cbbf7-66e9-eb44-42ea-8b6c723593c7}
> ├scsi 7:0:0:0 ATA ST31500541AS {5XW05FFV}
> │└sde 1.36t [8:64] Partitioned (dos)
> │ └sde1 1.36t [8:65] MD raid6 (10) inactive
> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> └scsi 7:0:1:0 ATA WDC WD20EARS-00M {WD-WMAZA0209553}
> └sdf 1.82t [8:80] Partitioned (dos)
> ├sdf1 525.53m [8:81] ext4 'boot2' {9b0e1e49-c736-47c0-89a1-4cac07c1d5ef}
> ├sdf2 1.36t [8:82] MD raid6 (10) inactive
> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> └sdf3 465.24g [8:83] MD raid1 (1/3) (w/ sdi3) in_sync
> {f89cbbf7-66e9-eb44-42ea-8b6c723593c7}
> └md0 465.24g [9:0] MD v0.90 raid1 (3) clean DEGRADED
> {f89cbbf7:66e9eb44:42ea8b6c:723593c7}
> │ ext4 'root' {ceb15bfe-e082-484c-9015-1fcc8889b798}
> └Mounted as /dev/disk/by-uuid/ceb15bfe-e082-484c-9015-1fcc8889b798 @ /
> PCI [ata_piix] 00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10
> Family) 2 port SATA IDE Controller #2
> ├scsi 8:0:0:0 ATA ST31500341AS {9VS1EFFD}
> │└sdg 1.36t [8:96] Partitioned (dos)
> │ └sdg1 1.36t [8:97] MD raid6 (10) inactive
> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> └scsi 10:0:0:0 ATA Hitachi HDS5C302 {ML2220F30TEBLE}
> └sdh 1.82t [8:112] Partitioned (dos)
> └sdh1 1.82t [8:113] MD raid6 (10) inactive
> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> PCI [ahci] 05:00.0 SATA controller: JMicron Technology Corp. JMB363
> SATA/IDE Controller (rev 02)
> ├scsi 9:0:0:0 ATA WDC WD2002FAEX-0 {WD-WMAY01975001}
> │└sdi 1.82t [8:128] Partitioned (dos)
> │ ├sdi1 525.53m [8:129] Empty/Unknown
> │ ├sdi2 1.36t [8:130] MD raid6 (10) inactive
> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> │ └sdi3 465.24g [8:131] MD raid1 (2/3) (w/ sdf3) in_sync
> {f89cbbf7-66e9-eb44-42ea-8b6c723593c7}
> │ └md0 465.24g [9:0] MD v0.90 raid1 (3) clean DEGRADED
> {f89cbbf7:66e9eb44:42ea8b6c:723593c7}
> │ ext4 'root' {ceb15bfe-e082-484c-9015-1fcc8889b798}
> └scsi 11:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1JCDE}
> └sdj 1.82t [8:144] Partitioned (dos)
> └sdj1 1.82t [8:145] Empty/Unknown
> Other Block Devices
> ├loop0 0.00k [7:0] Empty/Unknown
> ├loop1 0.00k [7:1] Empty/Unknown
> ├loop2 0.00k [7:2] Empty/Unknown
> ├loop3 0.00k [7:3] Empty/Unknown
> ├loop4 0.00k [7:4] Empty/Unknown
> ├loop5 0.00k [7:5] Empty/Unknown
> ├loop6 0.00k [7:6] Empty/Unknown
> └loop7 0.00k [7:7] Empty/Unknown
>
>
> mdadm output:
>
> mdadm -E /dev/sdb1 /dev/sda1 /dev/sdc1 /dev/sdd2 /dev/sde1 /dev/sdh1
> /dev/sdg1 /dev/sdi2 /dev/sdj1 /dev/sdf2
> mdadm: No md superblock detected on /dev/sdb1.
> mdadm: No md superblock detected on /dev/sda1.
> /dev/sdc1:
> Magic : a92b4efc
> Version : 0.91.00
> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
> Creation Time : Sat Oct 2 07:21:53 2010
> Raid Level : raid6
> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
> Raid Devices : 10
> Total Devices : 10
> Preferred Minor : 1
>
> Reshape pos'n : 4096
> Delta Devices : 3 (7->10)
>
> Update Time : Sat Oct 17 18:59:50 2015
> State : active
> Active Devices : 10
> Working Devices : 10
> Failed Devices : 0
> Spare Devices : 0
> Checksum : fad60723 - correct
> Events : 2579239
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 4 8 1 4 active sync /dev/sda1
>
> 0 0 8 50 0 active sync /dev/sdd2
> 1 1 8 18 1 active sync
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 33 3 active sync /dev/sdc1
> 4 4 8 1 4 active sync /dev/sda1
> 5 5 8 81 5 active sync /dev/sdf1
> 6 6 8 98 6 active sync
> 7 7 8 145 7 active sync /dev/sdj1
> 8 8 8 129 8 active sync /dev/sdi1
> 9 9 8 113 9 active sync /dev/sdh1
> /dev/sdd2:
> Magic : a92b4efc
> Version : 0.91.00
> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
> Creation Time : Sat Oct 2 07:21:53 2010
> Raid Level : raid6
> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
> Raid Devices : 10
> Total Devices : 10
> Preferred Minor : 1
>
> Reshape pos'n : 4096
> Delta Devices : 3 (7->10)
>
> Update Time : Sat Oct 17 18:59:50 2015
> State : active
> Active Devices : 10
> Working Devices : 10
> Failed Devices : 0
> Spare Devices : 0
> Checksum : fad6072e - correct
> Events : 2579239
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 1 8 18 1 active sync
>
> 0 0 8 50 0 active sync /dev/sdd2
> 1 1 8 18 1 active sync
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 33 3 active sync /dev/sdc1
> 4 4 8 1 4 active sync /dev/sda1
> 5 5 8 81 5 active sync /dev/sdf1
> 6 6 8 98 6 active sync
> 7 7 8 145 7 active sync /dev/sdj1
> 8 8 8 129 8 active sync /dev/sdi1
> 9 9 8 113 9 active sync /dev/sdh1
> /dev/sde1:
> Magic : a92b4efc
> Version : 0.91.00
> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
> Creation Time : Sat Oct 2 07:21:53 2010
> Raid Level : raid6
> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
> Raid Devices : 10
> Total Devices : 10
> Preferred Minor : 1
>
> Reshape pos'n : 4096
> Delta Devices : 3 (7->10)
>
> Update Time : Sat Oct 17 18:59:50 2015
> State : active
> Active Devices : 10
> Working Devices : 10
> Failed Devices : 0
> Spare Devices : 0
> Checksum : fad60741 - correct
> Events : 2579239
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 3 8 33 3 active sync /dev/sdc1
>
> 0 0 8 50 0 active sync /dev/sdd2
> 1 1 8 18 1 active sync
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 33 3 active sync /dev/sdc1
> 4 4 8 1 4 active sync /dev/sda1
> 5 5 8 81 5 active sync /dev/sdf1
> 6 6 8 98 6 active sync
> 7 7 8 145 7 active sync /dev/sdj1
> 8 8 8 129 8 active sync /dev/sdi1
> 9 9 8 113 9 active sync /dev/sdh1
> /dev/sdh1:
> Magic : a92b4efc
> Version : 0.91.00
> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
> Creation Time : Sat Oct 2 07:21:53 2010
> Raid Level : raid6
> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
> Raid Devices : 10
> Total Devices : 10
> Preferred Minor : 1
>
> Reshape pos'n : 4096
> Delta Devices : 3 (7->10)
>
> Update Time : Sat Oct 17 18:59:50 2015
> State : active
> Active Devices : 10
> Working Devices : 10
> Failed Devices : 0
> Spare Devices : 0
> Checksum : fad60775 - correct
> Events : 2579239
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 5 8 81 5 active sync /dev/sdf1
>
> 0 0 8 50 0 active sync /dev/sdd2
> 1 1 8 18 1 active sync
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 33 3 active sync /dev/sdc1
> 4 4 8 1 4 active sync /dev/sda1
> 5 5 8 81 5 active sync /dev/sdf1
> 6 6 8 98 6 active sync
> 7 7 8 145 7 active sync /dev/sdj1
> 8 8 8 129 8 active sync /dev/sdi1
> 9 9 8 113 9 active sync /dev/sdh1
> /dev/sdg1:
> Magic : a92b4efc
> Version : 0.91.00
> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
> Creation Time : Sat Oct 2 07:21:53 2010
> Raid Level : raid6
> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
> Raid Devices : 10
> Total Devices : 10
> Preferred Minor : 1
>
> Reshape pos'n : 4096
> Delta Devices : 3 (7->10)
>
> Update Time : Sat Oct 17 18:59:50 2015
> State : active
> Active Devices : 10
> Working Devices : 10
> Failed Devices : 0
> Spare Devices : 0
> Checksum : fad6075f - correct
> Events : 2579239
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 2 8 65 2 active sync /dev/sde1
>
> 0 0 8 50 0 active sync /dev/sdd2
> 1 1 8 18 1 active sync
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 33 3 active sync /dev/sdc1
> 4 4 8 1 4 active sync /dev/sda1
> 5 5 8 81 5 active sync /dev/sdf1
> 6 6 8 98 6 active sync
> 7 7 8 145 7 active sync /dev/sdj1
> 8 8 8 129 8 active sync /dev/sdi1
> 9 9 8 113 9 active sync /dev/sdh1
> /dev/sdi2:
> Magic : a92b4efc
> Version : 0.91.00
> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
> Creation Time : Sat Oct 2 07:21:53 2010
> Raid Level : raid6
> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
> Raid Devices : 10
> Total Devices : 10
> Preferred Minor : 1
>
> Reshape pos'n : 4096
> Delta Devices : 3 (7->10)
>
> Update Time : Sat Oct 17 18:59:50 2015
> State : active
> Active Devices : 10
> Working Devices : 10
> Failed Devices : 0
> Spare Devices : 0
> Checksum : fad60788 - correct
> Events : 2579239
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 6 8 98 6 active sync
>
> 0 0 8 50 0 active sync /dev/sdd2
> 1 1 8 18 1 active sync
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 33 3 active sync /dev/sdc1
> 4 4 8 1 4 active sync /dev/sda1
> 5 5 8 81 5 active sync /dev/sdf1
> 6 6 8 98 6 active sync
> 7 7 8 145 7 active sync /dev/sdj1
> 8 8 8 129 8 active sync /dev/sdi1
> 9 9 8 113 9 active sync /dev/sdh1
> mdadm: No md superblock detected on /dev/sdj1.
> /dev/sdf2:
> Magic : a92b4efc
> Version : 0.91.00
> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
> Creation Time : Sat Oct 2 07:21:53 2010
> Raid Level : raid6
> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
> Raid Devices : 10
> Total Devices : 10
> Preferred Minor : 1
>
> Reshape pos'n : 4096
> Delta Devices : 3 (7->10)
>
> Update Time : Sat Oct 17 18:59:50 2015
> State : active
> Active Devices : 10
> Working Devices : 10
> Failed Devices : 0
> Spare Devices : 0
> Checksum : fad6074c - correct
> Events : 2579239
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 0 8 50 0 active sync /dev/sdd2
>
> 0 0 8 50 0 active sync /dev/sdd2
> 1 1 8 18 1 active sync
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 33 3 active sync /dev/sdc1
> 4 4 8 1 4 active sync /dev/sda1
> 5 5 8 81 5 active sync /dev/sdf1
> 6 6 8 98 6 active sync
> 7 7 8 145 7 active sync /dev/sdj1
> 8 8 8 129 8 active sync /dev/sdi1
> 9 9 8 113 9 active sync /dev/sdh1
> Apparently my problems don't stop adding up: now SDD started developing
> problems, so my root partition (md0) is now degraded. I will attempt to
> dd out whatever I can from that drive and continue...
Don't. You have another problem: green & desktop drives in a raid
array. They aren't built for it and will give you grief of one form or
another. Anyways, their problem with timeout mismatch can be worked
around with long driver timeouts. Before you do anything else, you
*MUST* run this command:
for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
(Arrange for this to happen on every boot, and keep doing it manually
until your boot scripts are fixed.)
Then you can add your missing mirror and let MD fix it:
mdadm /dev/md0 --add /dev/sdd3
After that's done syncing, you can have MD fix any remaining UREs in
that raid1 with:
echo check >/sys/block/md0/md/sync_action
While that's in progress, take the time to read through the links in the
postscript -- the timeout mismatch problem and its impact on
unrecoverable read errors has been hashed out on this list many times.
Now to your big array. It is vital that it also be cleaned of UREs
after re-creation before you do anything else. Which means it must
*not* be created degraded (the redundancy is needed to fix UREs).
According to lsdrv and your "mdadm -E" reports, the creation order you
need is:
raid device 0 /dev/sdf2 {WD-WMAZA0209553}
raid device 1 /dev/sdd2 {WD-WMAZA0348342}
raid device 2 /dev/sdg1 {9VS1EFFD}
raid device 3 /dev/sde1 {5XW05FFV}
raid device 4 /dev/sdc1 {6XW0BQL0}
raid device 5 /dev/sdh1 {ML2220F30TEBLE}
raid device 6 /dev/sdi2 {WD-WMAY01975001}
Chunk size is 64k.
Make sure your partially assembled array is stopped:
mdadm --stop /dev/md1
Re-create your array as follows:
mdadm --create --assume-clean --verbose \
--metadata=1.0 --raid-devices=7 --chunk=64 --level=6 \
/dev/md1 /dev/sd{f2,d2,g1,e1,c1,h1,i2}
Use "fsck -n" to check your array's filesystem (expect some damage at
the very beginning). If it looks reasonable, use fsck to fix any damage.
Then clean up any lingering UREs:
echo check > /sys/block/md1/md/sync_action
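Progress of that check can be watched via sysfs; `sync_completed` reports sectors as "done / total". A small sketch (the helper name is mine, and it assumes a check is actually running -- the file reads "none" when the array is idle):

```shell
#!/bin/sh
# Sketch: turn md's sync_completed file ("done / total", in sectors)
# into a percentage. Pass the file path,
# e.g. /sys/block/md1/md/sync_completed.
scrub_progress() {
    awk '{ printf "%.1f%%\n", 100 * $1 / $3 }' "$1"
}
```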
Now you can mount it and catch any critical backups. (You do know that
raid != backup, I hope.)
Your array now has a new UUID, so you probably want to fix your
mdadm.conf file and your initramfs.
Finally, go back and do your --grow, with the --backup-file.
In the future, buy drives with raid ratings like the WD Red family, and
make sure you have a cron job that regularly kicks off array scrubs. I
do mine weekly.
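Debian's mdadm package already ships a monthly checkarray cron job; a hand-rolled weekly variant might look like this (the file path and schedule are illustrative only):

```shell
# /etc/cron.d/md-scrub (hypothetical file): scrub md1 every Sunday 02:30.
30 2 * * 0 root [ -w /sys/block/md1/md/sync_action ] && echo check > /sys/block/md1/md/sync_action
```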
HTH,
Phil
[1] http://marc.info/?l=linux-raid&m=139050322510249&w=2
[2] http://marc.info/?l=linux-raid&m=135863964624202&w=2
[3] http://marc.info/?l=linux-raid&m=135811522817345&w=1
[4] http://marc.info/?l=linux-raid&m=133761065622164&w=2
[5] http://marc.info/?l=linux-raid&m=132477199207506
[6] http://marc.info/?l=linux-raid&m=133665797115876&w=2
[7] https://www.marc.info/?l=linux-raid&m=142487508806844&w=3
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape?
2015-10-20 15:42 ` Phil Turmel
@ 2015-10-20 22:34 ` Anugraha Sinha
2015-10-21 3:52 ` andras
` (2 subsequent siblings)
3 siblings, 0 replies; 24+ messages in thread
From: Anugraha Sinha @ 2015-10-20 22:34 UTC (permalink / raw)
To: Phil Turmel; +Cc: Andras Tantos, Linux-RAID
Hi Phil,
Thanks for all the information shared by you over this thread.
It is really informative.
Regards
Anugraha
On Wed, Oct 21, 2015 at 12:42 AM, Phil Turmel <philip@turmel.org> wrote:
> Hi Andras,
>
> { Added linux-raid back -- convention on kernel.org is to reply-to-all,
> trim replies, and either interleave or bottom post. I'm trimming less
> than normal this time so the list can see. }
>
> On 10/20/2015 10:48 AM, andras@tantosonline.com wrote:
>> On 2015-10-20 08:49, Phil Turmel wrote:
>
>>> Please supply all of your mdadm -E reports for the seven partitions and
>>> the lsdrv output I requested. Just post the text inline in your reply.
>>>
>>> Do *not* do anything else.
>>>
>>> Phil
>
>> Thanks for all the help!
>>
>> Here's the output of lsdrv:
>>
>> PCI [pata_marvell] 04:00.1 IDE interface: Marvell Technology Group Ltd.
>> 88SE9128 IDE Controller (rev 11)
>> ├scsi 0:x:x:x [Empty]
>> └scsi 2:x:x:x [Empty]
>> PCI [pata_jmicron] 05:00.1 IDE interface: JMicron Technology Corp.
>> JMB363 SATA/IDE Controller (rev 02)
>> ├scsi 1:x:x:x [Empty]
>> └scsi 3:x:x:x [Empty]
>> PCI [ahci] 04:00.0 SATA controller: Marvell Technology Group Ltd.
>> 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11)
>> ├scsi 4:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1JDN8}
>> │└sda 1.82t [8:0] Partitioned (dos)
>> │ └sda1 1.82t [8:1] Empty/Unknown
>> └scsi 5:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1H84Q}
>> └sdb 1.82t [8:16] Partitioned (dos)
>> └sdb1 1.82t [8:17] ext4 'data' {d1403616-a9c6-4cd9-8d92-1aabc81fe373}
>> PCI [ata_piix] 00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10
>> Family) 4 port SATA IDE Controller #1
>> ├scsi 6:0:0:0 ATA ST31500541AS {6XW0BQL0}
>> │└sdc 1.36t [8:32] Partitioned (dos)
>> │ └sdc1 1.36t [8:33] MD raid6 (10) inactive
>> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
>> ├scsi 6:0:1:0 ATA WDC WD20EARS-00M {WD-WMAZA0348342}
>> │└sdd 1.82t [8:48] Partitioned (dos)
>> │ ├sdd1 525.53m [8:49] ext4 'boot1' {a3a1cedc-3866-4d80-af18-a7a4db99d880}
>> │ ├sdd2 1.36t [8:50] MD raid6 (10) inactive
>> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
>> │ └sdd3 465.24g [8:51] MD raid1 (3) inactive
>> {f89cbbf7-66e9-eb44-42ea-8b6c723593c7}
>> ├scsi 7:0:0:0 ATA ST31500541AS {5XW05FFV}
>> │└sde 1.36t [8:64] Partitioned (dos)
>> │ └sde1 1.36t [8:65] MD raid6 (10) inactive
>> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
>> └scsi 7:0:1:0 ATA WDC WD20EARS-00M {WD-WMAZA0209553}
>> └sdf 1.82t [8:80] Partitioned (dos)
>> ├sdf1 525.53m [8:81] ext4 'boot2' {9b0e1e49-c736-47c0-89a1-4cac07c1d5ef}
>> ├sdf2 1.36t [8:82] MD raid6 (10) inactive
>> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
>> └sdf3 465.24g [8:83] MD raid1 (1/3) (w/ sdi3) in_sync
>> {f89cbbf7-66e9-eb44-42ea-8b6c723593c7}
>> └md0 465.24g [9:0] MD v0.90 raid1 (3) clean DEGRADED
>> {f89cbbf7:66e9eb44:42ea8b6c:723593c7}
>> │ ext4 'root' {ceb15bfe-e082-484c-9015-1fcc8889b798}
>> └Mounted as /dev/disk/by-uuid/ceb15bfe-e082-484c-9015-1fcc8889b798 @ /
>> PCI [ata_piix] 00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10
>> Family) 2 port SATA IDE Controller #2
>> ├scsi 8:0:0:0 ATA ST31500341AS {9VS1EFFD}
>> │└sdg 1.36t [8:96] Partitioned (dos)
>> │ └sdg1 1.36t [8:97] MD raid6 (10) inactive
>> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
>> └scsi 10:0:0:0 ATA Hitachi HDS5C302 {ML2220F30TEBLE}
>> └sdh 1.82t [8:112] Partitioned (dos)
>> └sdh1 1.82t [8:113] MD raid6 (10) inactive
>> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
>> PCI [ahci] 05:00.0 SATA controller: JMicron Technology Corp. JMB363
>> SATA/IDE Controller (rev 02)
>> ├scsi 9:0:0:0 ATA WDC WD2002FAEX-0 {WD-WMAY01975001}
>> │└sdi 1.82t [8:128] Partitioned (dos)
>> │ ├sdi1 525.53m [8:129] Empty/Unknown
>> │ ├sdi2 1.36t [8:130] MD raid6 (10) inactive
>> {5e57a17d-43eb-0786-42ea-8b6c723593c7}
>> │ └sdi3 465.24g [8:131] MD raid1 (2/3) (w/ sdf3) in_sync
>> {f89cbbf7-66e9-eb44-42ea-8b6c723593c7}
>> │ └md0 465.24g [9:0] MD v0.90 raid1 (3) clean DEGRADED
>> {f89cbbf7:66e9eb44:42ea8b6c:723593c7}
>> │ ext4 'root' {ceb15bfe-e082-484c-9015-1fcc8889b798}
>> └scsi 11:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1JCDE}
>> └sdj 1.82t [8:144] Partitioned (dos)
>> └sdj1 1.82t [8:145] Empty/Unknown
>> Other Block Devices
>> ├loop0 0.00k [7:0] Empty/Unknown
>> ├loop1 0.00k [7:1] Empty/Unknown
>> ├loop2 0.00k [7:2] Empty/Unknown
>> ├loop3 0.00k [7:3] Empty/Unknown
>> ├loop4 0.00k [7:4] Empty/Unknown
>> ├loop5 0.00k [7:5] Empty/Unknown
>> ├loop6 0.00k [7:6] Empty/Unknown
>> └loop7 0.00k [7:7] Empty/Unknown
>>
>>
>> mdadm output:
>>
>> mdadm -E /dev/sdb1 /dev/sda1 /dev/sdc1 /dev/sdd2 /dev/sde1 /dev/sdh1
>> /dev/sdg1 /dev/sdi2 /dev/sdj1 /dev/sdf2
>
>> mdadm: No md superblock detected on /dev/sdb1.
>
>> mdadm: No md superblock detected on /dev/sda1.
>
>> /dev/sdc1:
>> Magic : a92b4efc
>> Version : 0.91.00
>> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>> Creation Time : Sat Oct 2 07:21:53 2010
>> Raid Level : raid6
>> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>> Raid Devices : 10
>> Total Devices : 10
>> Preferred Minor : 1
>>
>> Reshape pos'n : 4096
>> Delta Devices : 3 (7->10)
>>
>> Update Time : Sat Oct 17 18:59:50 2015
>> State : active
>> Active Devices : 10
>> Working Devices : 10
>> Failed Devices : 0
>> Spare Devices : 0
>> Checksum : fad60723 - correct
>> Events : 2579239
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 4 8 1 4 active sync /dev/sda1
>>
>> 0 0 8 50 0 active sync /dev/sdd2
>> 1 1 8 18 1 active sync
>> 2 2 8 65 2 active sync /dev/sde1
>> 3 3 8 33 3 active sync /dev/sdc1
>> 4 4 8 1 4 active sync /dev/sda1
>> 5 5 8 81 5 active sync /dev/sdf1
>> 6 6 8 98 6 active sync
>> 7 7 8 145 7 active sync /dev/sdj1
>> 8 8 8 129 8 active sync /dev/sdi1
>> 9 9 8 113 9 active sync /dev/sdh1
>
>> /dev/sdd2:
>> Magic : a92b4efc
>> Version : 0.91.00
>> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>> Creation Time : Sat Oct 2 07:21:53 2010
>> Raid Level : raid6
>> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>> Raid Devices : 10
>> Total Devices : 10
>> Preferred Minor : 1
>>
>> Reshape pos'n : 4096
>> Delta Devices : 3 (7->10)
>>
>> Update Time : Sat Oct 17 18:59:50 2015
>> State : active
>> Active Devices : 10
>> Working Devices : 10
>> Failed Devices : 0
>> Spare Devices : 0
>> Checksum : fad6072e - correct
>> Events : 2579239
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 1 8 18 1 active sync
>>
>> 0 0 8 50 0 active sync /dev/sdd2
>> 1 1 8 18 1 active sync
>> 2 2 8 65 2 active sync /dev/sde1
>> 3 3 8 33 3 active sync /dev/sdc1
>> 4 4 8 1 4 active sync /dev/sda1
>> 5 5 8 81 5 active sync /dev/sdf1
>> 6 6 8 98 6 active sync
>> 7 7 8 145 7 active sync /dev/sdj1
>> 8 8 8 129 8 active sync /dev/sdi1
>> 9 9 8 113 9 active sync /dev/sdh1
>
>> /dev/sde1:
>> Magic : a92b4efc
>> Version : 0.91.00
>> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>> Creation Time : Sat Oct 2 07:21:53 2010
>> Raid Level : raid6
>> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>> Raid Devices : 10
>> Total Devices : 10
>> Preferred Minor : 1
>>
>> Reshape pos'n : 4096
>> Delta Devices : 3 (7->10)
>>
>> Update Time : Sat Oct 17 18:59:50 2015
>> State : active
>> Active Devices : 10
>> Working Devices : 10
>> Failed Devices : 0
>> Spare Devices : 0
>> Checksum : fad60741 - correct
>> Events : 2579239
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 3 8 33 3 active sync /dev/sdc1
>>
>> 0 0 8 50 0 active sync /dev/sdd2
>> 1 1 8 18 1 active sync
>> 2 2 8 65 2 active sync /dev/sde1
>> 3 3 8 33 3 active sync /dev/sdc1
>> 4 4 8 1 4 active sync /dev/sda1
>> 5 5 8 81 5 active sync /dev/sdf1
>> 6 6 8 98 6 active sync
>> 7 7 8 145 7 active sync /dev/sdj1
>> 8 8 8 129 8 active sync /dev/sdi1
>> 9 9 8 113 9 active sync /dev/sdh1
>
>> /dev/sdh1:
>> Magic : a92b4efc
>> Version : 0.91.00
>> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>> Creation Time : Sat Oct 2 07:21:53 2010
>> Raid Level : raid6
>> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>> Raid Devices : 10
>> Total Devices : 10
>> Preferred Minor : 1
>>
>> Reshape pos'n : 4096
>> Delta Devices : 3 (7->10)
>>
>> Update Time : Sat Oct 17 18:59:50 2015
>> State : active
>> Active Devices : 10
>> Working Devices : 10
>> Failed Devices : 0
>> Spare Devices : 0
>> Checksum : fad60775 - correct
>> Events : 2579239
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 5 8 81 5 active sync /dev/sdf1
>>
>> 0 0 8 50 0 active sync /dev/sdd2
>> 1 1 8 18 1 active sync
>> 2 2 8 65 2 active sync /dev/sde1
>> 3 3 8 33 3 active sync /dev/sdc1
>> 4 4 8 1 4 active sync /dev/sda1
>> 5 5 8 81 5 active sync /dev/sdf1
>> 6 6 8 98 6 active sync
>> 7 7 8 145 7 active sync /dev/sdj1
>> 8 8 8 129 8 active sync /dev/sdi1
>> 9 9 8 113 9 active sync /dev/sdh1
>
>> /dev/sdg1:
>> Magic : a92b4efc
>> Version : 0.91.00
>> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>> Creation Time : Sat Oct 2 07:21:53 2010
>> Raid Level : raid6
>> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>> Raid Devices : 10
>> Total Devices : 10
>> Preferred Minor : 1
>>
>> Reshape pos'n : 4096
>> Delta Devices : 3 (7->10)
>>
>> Update Time : Sat Oct 17 18:59:50 2015
>> State : active
>> Active Devices : 10
>> Working Devices : 10
>> Failed Devices : 0
>> Spare Devices : 0
>> Checksum : fad6075f - correct
>> Events : 2579239
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 2 8 65 2 active sync /dev/sde1
>>
>> 0 0 8 50 0 active sync /dev/sdd2
>> 1 1 8 18 1 active sync
>> 2 2 8 65 2 active sync /dev/sde1
>> 3 3 8 33 3 active sync /dev/sdc1
>> 4 4 8 1 4 active sync /dev/sda1
>> 5 5 8 81 5 active sync /dev/sdf1
>> 6 6 8 98 6 active sync
>> 7 7 8 145 7 active sync /dev/sdj1
>> 8 8 8 129 8 active sync /dev/sdi1
>> 9 9 8 113 9 active sync /dev/sdh1
>
>> /dev/sdi2:
>> Magic : a92b4efc
>> Version : 0.91.00
>> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>> Creation Time : Sat Oct 2 07:21:53 2010
>> Raid Level : raid6
>> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>> Raid Devices : 10
>> Total Devices : 10
>> Preferred Minor : 1
>>
>> Reshape pos'n : 4096
>> Delta Devices : 3 (7->10)
>>
>> Update Time : Sat Oct 17 18:59:50 2015
>> State : active
>> Active Devices : 10
>> Working Devices : 10
>> Failed Devices : 0
>> Spare Devices : 0
>> Checksum : fad60788 - correct
>> Events : 2579239
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 6 8 98 6 active sync
>>
>> 0 0 8 50 0 active sync /dev/sdd2
>> 1 1 8 18 1 active sync
>> 2 2 8 65 2 active sync /dev/sde1
>> 3 3 8 33 3 active sync /dev/sdc1
>> 4 4 8 1 4 active sync /dev/sda1
>> 5 5 8 81 5 active sync /dev/sdf1
>> 6 6 8 98 6 active sync
>> 7 7 8 145 7 active sync /dev/sdj1
>> 8 8 8 129 8 active sync /dev/sdi1
>> 9 9 8 113 9 active sync /dev/sdh1
>
>> mdadm: No md superblock detected on /dev/sdj1.
>
>> /dev/sdf2:
>> Magic : a92b4efc
>> Version : 0.91.00
>> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>> Creation Time : Sat Oct 2 07:21:53 2010
>> Raid Level : raid6
>> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>> Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>> Raid Devices : 10
>> Total Devices : 10
>> Preferred Minor : 1
>>
>> Reshape pos'n : 4096
>> Delta Devices : 3 (7->10)
>>
>> Update Time : Sat Oct 17 18:59:50 2015
>> State : active
>> Active Devices : 10
>> Working Devices : 10
>> Failed Devices : 0
>> Spare Devices : 0
>> Checksum : fad6074c - correct
>> Events : 2579239
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 0 8 50 0 active sync /dev/sdd2
>>
>> 0 0 8 50 0 active sync /dev/sdd2
>> 1 1 8 18 1 active sync
>> 2 2 8 65 2 active sync /dev/sde1
>> 3 3 8 33 3 active sync /dev/sdc1
>> 4 4 8 1 4 active sync /dev/sda1
>> 5 5 8 81 5 active sync /dev/sdf1
>> 6 6 8 98 6 active sync
>> 7 7 8 145 7 active sync /dev/sdj1
>> 8 8 8 129 8 active sync /dev/sdi1
>> 9 9 8 113 9 active sync /dev/sdh1
>
>> Apparently my problems don't stop adding up: now SDD started developing
>> problems, so my root partition (md0) is now degraded. I will attempt to
>> dd out whatever I can from that drive and continue...
>
> Don't. You have another problem: green & desktop drives in a raid
> array. They aren't built for it and will give you grief of one form or
> another. Anyways, their problem with timeout mismatch can be worked
> around with long driver timeouts. Before you do anything else, you
> *MUST* run this command:
>
> for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
>
> (Arrange for this to happen on every boot, and keep doing it manually
> until your boot scripts are fixed.)
>
> Then you can add your missing mirror and let MD fix it:
>
> mdadm /dev/md0 --add /dev/sdd3
>
> After that's done syncing, you can have MD fix any remaining UREs in
> that raid1 with:
>
> echo check >/sys/block/md0/md/sync_action
>
> While that's in progress, take the time to read through the links in the
> postscript -- the timeout mismatch problem and its impact on
> unrecoverable read errors has been hashed out on this list many times.
>
> Now to your big array. It is vital that it also be cleaned of UREs
> after re-creation before you do anything else. Which means it must
> *not* be created degraded (the redundancy is needed to fix UREs).
>
> According to lsdrv and your "mdadm -E" reports, the creation order you
> need is:
>
> raid device 0 /dev/sdf2 {WD-WMAZA0209553}
> raid device 1 /dev/sdd2 {WD-WMAZA0348342}
> raid device 2 /dev/sdg1 {9VS1EFFD}
> raid device 3 /dev/sde1 {5XW05FFV}
> raid device 4 /dev/sdc1 {6XW0BQL0}
> raid device 5 /dev/sdh1 {ML2220F30TEBLE}
> raid device 6 /dev/sdi2 {WD-WMAY01975001}
>
> Chunk size is 64k.
>
> Make sure your partially assembled array is stopped:
>
> mdadm --stop /dev/md1
>
> Re-create your array as follows:
>
> mdadm --create --assume-clean --verbose \
> --metadata=1.0 --raid-devices=7 --chunk=64 --level=6 \
> /dev/md1 /dev/sd{f2,d2,g1,e1,c1,h1,i2}
>
> Use "fsck -n" to check your array's filesystem (expect some damage at
> the very beginning). If it looks reasonable, use fsck to fix any damage.
>
> Then clean up any lingering UREs:
>
> echo check > /sys/block/md1/md/sync_action
>
> Now you can mount it and catch any critical backups. (You do know that
> raid != backup, I hope.)
>
> Your array now has a new UUID, so you probably want to fix your
> mdadm.conf file and your initramfs.
>
> Finally, go back and do your --grow, with the --backup-file.
>
> In the future, buy drives with raid ratings like the WD Red family, and
> make sure you have a cron job that regularly kicks off array scrubs. I
> do mine weekly.
>
> HTH,
>
> Phil
>
> [1] http://marc.info/?l=linux-raid&m=139050322510249&w=2
> [2] http://marc.info/?l=linux-raid&m=135863964624202&w=2
> [3] http://marc.info/?l=linux-raid&m=135811522817345&w=1
> [4] http://marc.info/?l=linux-raid&m=133761065622164&w=2
> [5] http://marc.info/?l=linux-raid&m=132477199207506
> [6] http://marc.info/?l=linux-raid&m=133665797115876&w=2
> [7] https://www.marc.info/?l=linux-raid&m=142487508806844&w=3
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape?
2015-10-20 2:35 How to recover after md crash during reshape? andras
` (2 preceding siblings ...)
2015-10-20 13:49 ` Phil Turmel
@ 2015-10-21 1:35 ` Neil Brown
2015-10-21 4:03 ` andras
2015-10-21 12:18 ` Phil Turmel
3 siblings, 2 replies; 24+ messages in thread
From: Neil Brown @ 2015-10-21 1:35 UTC (permalink / raw)
To: andras, linux-raid
andras@tantosonline.com writes:
Phil has provided lots of useful advice, I'll just add a couple of
clarifications:
>
> mdadm --grow --raid-devices=10 /dev/md1
>
> Yes, I was dumb enough to start the process without a backup option -
> (copy-paste error from https://raid.wiki.kernel.org/index.php/Growing).
Nothing dumb about that - you don't need a --backup option.
If you did, mdadm would have complained.
You only need --backup when the size of the array is unchanged or
decreasing.
(or when growing to a degraded array. e.g. you can reshape a 4-drive
raid5 to a degraded 5-drive raid5 without adding a spare. This will
require a --backup. I'm fairly sure it also requires --force because
it is a very strange thing to do).
When reshaping to a larger array, mdadm only requires a backup while
reshaping the first few stripes, and it uses some space in one of the
new (previously spare) devices to store that backup.
>
> This immediately (well, after 2 seconds) crashed the MD driver:
>
> Oct 17 17:30:27 bazsalikom kernel: [7869821.514718] sd 0:0:0:0:
> [sdj] Attached SCSI disk
> Oct 17 18:39:21 bazsalikom kernel: [7873955.418679] sdh: sdh1
> Oct 17 18:39:37 bazsalikom kernel: [7873972.155084] sdi: sdi1
> Oct 17 18:39:49 bazsalikom kernel: [7873983.916038] sdj: sdj1
> Oct 17 18:40:33 bazsalikom kernel: [7874027.963430] md: bind<sdh1>
> Oct 17 18:40:34 bazsalikom kernel: [7874028.263656] md: bind<sdi1>
> Oct 17 18:40:34 bazsalikom kernel: [7874028.361112] md: bind<sdj1>
> Oct 17 18:59:48 bazsalikom kernel: [7875182.667815] md: reshape of
> RAID array md1
> Oct 17 18:59:48 bazsalikom kernel: [7875182.667818] md: minimum
> _guaranteed_ speed: 1000 KB/sec/disk.
> Oct 17 18:59:48 bazsalikom kernel: [7875182.667821] md: using
> maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for reshape.
> Oct 17 18:59:48 bazsalikom kernel: [7875182.667831] md: using 128k
> window, over a total of 1465135936k.
> --> Oct 17 18:59:50 bazsalikom kernel: [7875184.326245] md: md_do_sync()
> got signal ... exiting
This is very strange ... maybe some messages missing?
Probably an IO error while writing to a new device.
>
> From here on, things went downhill pretty damn fast. I was not able to
> unmount the file-system, stop or re-start the array (/proc/mdstat went
> away), any process trying to touch /dev/md1 hung, so eventually, I run
> out of options and hit the reset button on the machine.
>
> Upon reboot, the array wouldn't assemble, it was complaining that SDA
> and SDA1 had the same superblock info on it.
>
> mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar
> superblocks.
> If they are really different, please --zero the superblock on one
> If they are the same or overlap, please remove one from the
> DEVICE list in mdadm.conf.
It's very hard to make messages like this clear without being incredibly
verbose...
In this case /dev/sda and /dev/sda1 obviously overlap (that is obvious,
isn't it?).
So in that case you need to remove one of them from the DEVICE list.
You probably don't have a DEVICE list so it defaults to everything listed in
/proc/partitions.
The "correct" thing to do at this point would have been to add a DEVICE
list to mdadm.conf which only listed the devices that might be part of
an array. e.g.
DEVICE /dev/sd[a-z][1-9]
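That glob behaves exactly like shell pattern matching, which makes it easy to check what a candidate DEVICE line covers -- note that whole-disk nodes like /dev/sda (and double-digit partitions like /dev/sda10) fall outside `sd[a-z][1-9]`:

```shell
#!/bin/sh
# Demonstration: which names a "DEVICE /dev/sd[a-z][1-9]" line covers.
# case(1) uses the same glob syntax that mdadm.conf DEVICE lines use.
matches_device_line() {
    case "$1" in
        /dev/sd[a-z][1-9]) echo yes ;;
        *)                 echo no  ;;
    esac
}

matches_device_line /dev/sda     # whole disk -> no
matches_device_line /dev/sda1    # partition  -> yes
```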
> So, if I read this right, the superblock here states that the array is
> in the middle of a reshape from 7 to 10 devices, but it just started
> (4096 is the position).
> What's interesting is the device names listed here don't match the ones
> reported by /proc/mdstat, and are actually incorrect. The right
> partition numbers are in /proc/mdstat.
>
> The superblocks on the 6 other original disks match, except for of
> course which one they mark as 'this' and the checksum.
>
> I've read in here (http://ubuntuforums.org/showthread.php?t=2133576)
> among many other places that it might be possible to recover the data on
> the array by trying to re-create it to the state before the re-shape.
>
> I've also read that if I want to re-create an array in read-only mode, I
> should re-create it degraded.
>
> So, what I thought I would do is this:
>
> mdadm --create /dev/md1 --level=6 --raid-devices=7 /dev/sdh2
> /dev/sdf2 /dev/sdi1 /dev/sdg1 /dev/sde1 missing missing
Phil has given good advice on this point which is worth following.
It is quite possible that there will still be corruption.
mdadm reads the first few stripes and stores them somewhere in each of
the spares. md (in the kernel) then reads those stripes again and
writes them out in the new configuration. It appears that one of the
writes failed, others might have succeeded. This may not have corrupted
anything (the first few blocks are in the same position for both the old
and new layout) but it might have done.
So if the filesystem seems corrupt after the array is re-created, that
is likely the reason.
The data still exists in the backup on those new devices (if you haven't
done anything to them) and could be restored.
If you do want to look for the backup, it is around about the middle of
the device and has some metadata which contains the string
"md_backup_data-1". If you find that, you are close to getting the
backup data back.
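Locating it can be as simple as a binary-safe grep for the magic string; `-a` treats the device contents as text, and `-o -b` print each match with its byte offset (a read-only sketch -- the helper name is mine, and it works on any file or device node):

```shell
#!/bin/sh
# Sketch: print "offset:md_backup_data-1" for each occurrence of md's
# reshape-backup magic string in the given file or device. Read-only.
find_backup_magic() {
    grep -aob 'md_backup_data-1' "$1" || echo 'no backup metadata found'
}
```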
NeilBrown
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape?
2015-10-20 15:42 ` Phil Turmel
2015-10-20 22:34 ` Anugraha Sinha
@ 2015-10-21 3:52 ` andras
2015-10-21 12:01 ` Phil Turmel
2015-10-21 16:17 ` Wols Lists
2015-10-25 14:15 ` andras
3 siblings, 1 reply; 24+ messages in thread
From: andras @ 2015-10-21 3:52 UTC (permalink / raw)
To: Phil Turmel; +Cc: Linux-RAID
Phil,
Thank you so much for the detailed explanation and your patience with
me! Sorry for not being more responsive - I don't have access to this
mail account from work.
>
>> Apparently my problems don't stop adding up: now SDD started
>> developing
>> problems, so my root partition (md0) is now degraded. I will attempt
>> to
>> dd out whatever I can from that drive and continue...
>
> Don't. You have another problem: green & desktop drives in a raid
> array. They aren't built for it and will give you grief of one form or
> another. Anyways, their problem with timeout mismatch can be worked
> around with long driver timeouts. Before you do anything else, you
> *MUST* run this command:
>
> for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
>
> (Arrange for this to happen on every boot, and keep doing it manually
> until your boot scripts are fixed.)
Yes, will do. In your links below it seems that you're half advocating
for using desktop drives in RAID arrays, half advocating against. From
what I can tell, it seems the recommendation might depend on the
use-case. If one doesn't care too much about instant performance in case
of errors, one might want to use desktop drives (with the above fix).
If one wants reliable performance, one probably wants NAS drives. Did I
understand the basic trade-off correctly?
It seems that people also think that green drives are a bad idea in
RAIDs in general - mostly because the frequent parking of heads reduces
life-time. Is that a correct statement?
> Then you can add your missing mirror and let MD fix it:
>
> mdadm /dev/md0 --add /dev/sdd3
>
> After that's done syncing, you can have MD fix any remaining UREs in
> that raid1 with:
>
> echo check >/sys/block/md0/md/sync_action
>
> While that's in progress, take the time to read through the links in
> the
> postscript -- the timeout mismatch problem and its impact on
> unrecoverable read errors has been hashed out on this list many times.
>
> Now to your big array. It is vital that it also be cleaned of UREs
> after re-creation before you do anything else. Which means it must
> *not* be created degraded (the redundancy is needed to fix UREs).
>
> According to lsdrv and your "mdadm -E" reports, the creation order you
> need is:
>
> raid device 0 /dev/sdf2 {WD-WMAZA0209553}
> raid device 1 /dev/sdd2 {WD-WMAZA0348342}
> raid device 2 /dev/sdg1 {9VS1EFFD}
> raid device 3 /dev/sde1 {5XW05FFV}
> raid device 4 /dev/sdc1 {6XW0BQL0}
> raid device 5 /dev/sdh1 {ML2220F30TEBLE}
> raid device 6 /dev/sdi2 {WD-WMAY01975001}
>
> Chunk size is 64k.
>
> Make sure your partially assembled array is stopped:
>
> mdadm --stop /dev/md1
>
> Re-create your array as follows:
>
> mdadm --create --assume-clean --verbose \
> --metadata=1.0 --raid-devices=7 --chunk=64 --level=6 \
> /dev/md1 /dev/sd{f2,d2,g1,e1,c1,h1,i2}
>
> Use "fsck -n" to check your array's filesystem (expect some damage at
> the very beginning). If it looks reasonable, use fsck to fix any damage.
>
> Then clean up any lingering UREs:
>
> echo check > /sys/block/md1/md/sync_action
>
> Now you can mount it and catch any critical backups. (You do know that
> raid != backup, I hope.)
>
> Your array now has a new UUID, so you probably want to fix your
> mdadm.conf file and your initramfs.
Yes sir! I will go through the steps and report back. One question: the
reason I shouldn't attempt to re-create the new 10-disk array is that it
would wipe out the 7->10 grow progress, so MD would think that it's a
fully grown 10-disk array, right?
> Finally, go back and do your --grow, with the --backup-file.
>
> In the future, buy drives with raid ratings like the WD Red family, and
> make sure you have a cron job that regularly kicks off array scrubs. I
> do mine weekly.
Thanks for the info. This is the first time someone mentions scrubbing
with regards to RAID to me, but it makes total sense. I will set it up.
Thanks again,
Andras
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape?
2015-10-21 1:35 ` How to recover after md crash during reshape? Neil Brown
@ 2015-10-21 4:03 ` andras
2015-10-21 12:18 ` Phil Turmel
1 sibling, 0 replies; 24+ messages in thread
From: andras @ 2015-10-21 4:03 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil,
Thanks for helping me out!
>> Oct 17 18:59:48 bazsalikom kernel: [7875182.667831] md: using
>> 128k
>> window, over a total of 1465135936k.
>> --> Oct 17 18:59:50 bazsalikom kernel: [7875184.326245] md:
>> md_do_sync()
>> got signal ... exiting
>
> This is very strange ... maybe some messages missing?
> Probably an IO error while writing to a new device.
I'm not sure what happened either. This is from /var/log/messages.
Maybe those messages go to a different log?
>> mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar
>> superblocks.
>> If they are really different, please --zero the superblock on
>> one
>> If they are the same or overlap, please remove one from the
>> DEVICE list in mdadm.conf.
>
> It's very hard to make messages like this clear without being
> incredibly
> verbose...
>
> In this case /dev/sda and /dev/sda1 obviously overlap (that is obvious,
> isn't it?).
> So in that case you need to remove one of them from the DEVICE list.
> You probably don't have a DEVICE list so it defaults to everything
> listed in
> /proc/partitions.
> The "correct" thing to do at this point would have been to add a DEVICE
> list to mdadm.conf which only listed the devices that might be part of
> an array. e.g.
>
> DEVICE /dev/sd[a-z][1-9]
Understood. My problem was that when I googled the problem, people
agreed with the suggested solution of zeroing the superblock. I guess
that tells you how much you should trust 'common wisdom'.
>
> Phil has given good advice on this point which is worth following.
> It is quite possible that there will still be corruption.
>
> mdadm reads the first few stripes and stores them somewhere in each of
> the spares. md (in the kernel) then reads those stripes again and
> writes them out in the new configuration. It appears that one of the
> writes failed, others might have succeeded. This may not have
> corrupted
> anything (the first few blocks are in the same position for both the
> old
> and new layout) but it might have done.
>
> So if the filesystem seems corrupt after the array is re-created, that
> is likely the reason.
> The data still exists in the backup on those new devices (if you
> haven't
> done anything to them) and could be restored.
>
> If you do want to look for the backup, it is around about the middle of
> the device and has some metadata which contains the string
> "md_backup_data-1". If you find that, you are close to getting the
> backup data back.
>
> NeilBrown
Oh, gosh, I hope I don't have to do surgery that deep. No, I haven't
touched the new HDDs other than zeroing the superblock. So whatever was
on them is still there. I'll see how much damage there is to the FS
after I reconstruct the array.
Thanks for all the help!
Andras
* Re: How to recover after md crash during reshape?
2015-10-21 3:52 ` andras
@ 2015-10-21 12:01 ` Phil Turmel
0 siblings, 0 replies; 24+ messages in thread
From: Phil Turmel @ 2015-10-21 12:01 UTC (permalink / raw)
To: andras; +Cc: Linux-RAID
Good morning Andras,
On 10/20/2015 11:52 PM, andras@tantosonline.com wrote:
> Phil,
>
> Thank you so much for the detailed explanation and your patience with
> me! Sorry for not being more responsive - I don't have access to this
> mail account from work.
No worries.
>> for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
>>
>> (Arrange for this to happen on every boot, and keep doing it manually
>> until your boot scripts are fixed.)
>
> Yes, will do. In your links below it seems that you're half advocating
> for using desktop drives in RAID arrays, half advocating against. From
> what I can tell, it seems the recommendation might depend on the
> use-case. If one doesn't care too much about instant performance in case
> of errors, one might want to use desktop drives (with the above fix).
> If one wants reliable performance, one probably wants NAS drives. Did I
> understand the basic trade-off correctly?
Times change. At the time some of those were written, desktop drives
with scterc support were still available, but default off. Those are ok
in a raid if you have the appropriate smartctl command in your boot scripts.
Long timeouts with non-scterc drives, in my opinion, create a user
impression that things are broken, even if the drive is fine (UREs are
natural and unavoidable in the life of a drive). Users are prone to
drastic measures when they think something is broken. Also,
*applications* might not wait that long for their read, either. So, I
only recommend the long timeout solution when an array is already in
trouble with such drives.
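One simple way to make the 180-second timeout persist across boots (a sketch, assuming an rc.local-style boot script; it mirrors the loop quoted above):

```shell
# Hedged sketch: re-apply the long driver timeout at every boot,
# e.g. from /etc/rc.local before its final "exit 0".
for x in /sys/block/sd*/device/timeout ; do
    echo 180 > "$x"
done
```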
> It seems that people also think that green drives are a bad idea in
> RAIDs in general - mostly because the frequent parking of heads reduces
> life-time. Is that a correct statement?
I don't have enough experience with green drives to say. The few that I
have (bought before I discovered the dropped scterc support) became part
of my offsite backup rotation.
> Yes sir! I will go through the steps and report back. One question: the
> reason I shouldn't attempt to re-create the new 10-disk array is that it
> would wipe out the 7->10 grow progress, so MD would think that it's a
> fully grown 10-disk array, right?
Right. Your three extra drives never really were incorporated into the
array, so the data layout is still a 7-drive pattern.
Phil
* Re: How to recover after md crash during reshape?
2015-10-21 1:35 ` How to recover after md crash during reshape? Neil Brown
2015-10-21 4:03 ` andras
@ 2015-10-21 12:18 ` Phil Turmel
2015-10-21 20:26 ` Neil Brown
1 sibling, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2015-10-21 12:18 UTC (permalink / raw)
To: Neil Brown, andras, linux-raid
Good morning Neil,
On 10/20/2015 09:35 PM, Neil Brown wrote:
> Nothing dumb about that - you don't need a --backup option.
> If you did, mdadm would have complained.
>
> You only need --backup when the size of the array is unchanged or
> decreasing.
> mdadm reads the first few stripes and stores them somewhere in each of
> the spares. md (in the kernel) then reads those stripes again and
> writes them out in the new configuration. It appears that one of the
> writes failed, others might have succeeded. This may not have corrupted
> anything (the first few blocks are in the same position for both the old
> and new layout) but it might have done.
> If you do want to look for the backup, it is around about the middle of
> the device and has some metadata which contains the string
> "md_backup_data-1". If you find that, you are close to getting the
> backup data back.
Hmmm. This feature has advanced beyond my last look at the code. I was
under the impression the backup option was only optional when mdadm
could move the data offset. Does this new algorithm apply to v0.90
metadata, a v3.2 kernel, and v3.2.5 mdadm?
Phil
* Re: How to recover after md crash during reshape?
2015-10-21 16:17 ` Wols Lists
@ 2015-10-21 16:05 ` Phil Turmel
0 siblings, 0 replies; 24+ messages in thread
From: Phil Turmel @ 2015-10-21 16:05 UTC (permalink / raw)
To: Wols Lists, andras, Linux-RAID
Hi Wols,
I'm glad you've got the big picture correct, but some details need to be
addressed:
On 10/21/2015 12:17 PM, Wols Lists wrote:
> tl;dr summary ...
>
> Desktop drives are spec'd as being okay with one soft error per 10TB
> read - that's where a read fails, you try again, and everything's okay.
No, this isn't correct.
That spec is for *unrecoverable* read errors. For desktop drives,
typically spec'd as one such error every 1e14 bits read, on average.
These are failures where you really have lost the sector contents. Such
sectors are marked as "Pending Relocations" in drive firmware. But the
recording surface might still be good, so the drive waits for a write to
that pending sector, which it then verifies, before deciding to relocate
or not.
When MD raid receives a read error, whether in normal operation or a
scrub, it will reconstruct the missing data and write it back, closing
this loop immediately. Where "normal operation" means "read errors are
reported by the drive before the driver times out".
> A resync will scan the array from start to finish - if you have 10TB's
> worth of disk, you MUST be prepared to handle these errors.
>
> By default, mdadm will assume a disk is faulty and kick it after about
> 10secs, but a desktop drive will hang for maybe several minutes before
> reporting a problem.
MD raid has no timeout, and does not kick drives out for occasional
read errors. The timeout is in the per-device drivers (SCSI, SATA,
whatever). Which defaults to 30 seconds. Desktop drives typically keep
trying to read a bad sector for 120 seconds or more, ignoring the world
while they do so. Drives with default SCTERC support typically report a
read error within four to seven seconds.
With a desktop drive, the linux device driver bails after 30 seconds and
resets the link to the drive -- which gets ignored. And keeps getting
ignored until the original read retry cycle finishes. During this time,
MD has reconstructed the data and told the driver to write the fixed
sector. That *write* also fails (because the driver is failing to
reset) and that *write error* kicks the drive out of the array.
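For drives that do support SCTERC, the recovery limit can be inspected and capped with smartctl (a hedged sketch; /dev/sdX is a placeholder, and the values are in tenths of a second):

```shell
# Hedged sketch: query and set SCT Error Recovery Control so the drive
# gives up on a bad sector well before the 30 s SCSI-layer timeout.
smartctl -l scterc /dev/sdX         # show current read/write ERC limits
smartctl -l scterc,70,70 /dev/sdX   # cap both at 7.0 seconds
```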
Anyways, please consider reading the threads I pointed Andras at :-)
Phil
* Re: How to recover after md crash during reshape?
2015-10-20 15:42 ` Phil Turmel
2015-10-20 22:34 ` Anugraha Sinha
2015-10-21 3:52 ` andras
@ 2015-10-21 16:17 ` Wols Lists
2015-10-21 16:05 ` Phil Turmel
2015-10-25 14:15 ` andras
3 siblings, 1 reply; 24+ messages in thread
From: Wols Lists @ 2015-10-21 16:17 UTC (permalink / raw)
To: andras, Linux-RAID
On 20/10/15 16:42, Phil Turmel wrote:
> Don't. You have another problem: green & desktop drives in a raid
> array. They aren't built for it and will give you grief of one form or
> another. Anyways, their problem with timeout mismatch can be worked
> around with long driver timeouts. Before you do anything else, you
> *MUST* run this command:
>
> for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
>
> (Arrange for this to happen on every boot, and keep doing it manually
> until your boot scripts are fixed.)
tl;dr summary ...
Desktop drives are spec'd as being okay with one soft error per 10TB
read - that's where a read fails, you try again, and everything's okay.
A resync will scan the array from start to finish - if you have 10TB's
worth of disk, you MUST be prepared to handle these errors.
By default, mdadm will assume a disk is faulty and kick it after about
10secs, but a desktop drive will hang for maybe several minutes before
reporting a problem.
In other words, your drives can meet manufacturer's specs, but, with
default settings, your array will never be able to rebuild after a
problem! (Note that many people will say "I've never had a problem", but
most drives are better than spec. You just don't want to be the unlucky
one ...)
Not that I have any (yet), but I'd second the recommendation for WD
Reds. I've got Seagate Barracudas (not raid-compliant), and the Reds are
not much more expensive, and are also the only drives I've found that
support the raid features - mostly that by default they will fail and
report a problem very quickly. (Plus they're spec'd at reading about
40TB per soft error :-)
Cheers,
Wol
* Re: How to recover after md crash during reshape?
2015-10-21 12:18 ` Phil Turmel
@ 2015-10-21 20:26 ` Neil Brown
2015-10-21 20:37 ` Phil Turmel
0 siblings, 1 reply; 24+ messages in thread
From: Neil Brown @ 2015-10-21 20:26 UTC (permalink / raw)
To: Phil Turmel, andras, linux-raid
Phil Turmel <philip@turmel.org> writes:
> Good morning Neil,
>
> On 10/20/2015 09:35 PM, Neil Brown wrote:
>
>> Nothing dumb about that - you don't need a --backup option.
>> If you did, mdadm would have complained.
>>
>> You only need --backup when the size of the array is unchanged or
>> decreasing.
>
>> mdadm reads the first few stripes and stores them somewhere in each of
>> the spares. md (in the kernel) then reads those stripes again and
>> writes them out in the new configuration. It appears that one of the
>> writes failed, others might have succeeded. This may not have corrupted
>> anything (the first few blocks are in the same position for both the old
>> and new layout) but it might have done.
>
>> If you do want to look for the backup, it is around about the middle of
>> the device and has some metadata which contains the string
>> "md_backup_data-1". If you find that, you are close to getting the
>> backup data back.
>
> Hmmm. This feature has advanced beyond my last look at the code. I was
> under the impression the backup option was only optional when mdadm
> could move the data offset. Does this new algorithm apply to v0.90
> metadata, a v3.2 kernel, and v3.2.5 mdadm?
>
It isn't a new algorithm, it is the original algorithm.
In mdadm-2.4-pre1 (march 2006), you couldn't specify a backup file, but
you could grow a raid5 to more devices.
That was changed by a patch with comment:
Allow resize to backup to a file.
To support resizing an array without a spare, mdadm now understands
--backup-file=
which should point to a file for storing a backup of critical data.
This can be given to --grow which will create the file, or
--assemble which will restore from the file if needed.
The backup-file was subsequently used to support in-place reshapes and
array shrinking.
NeilBrown
* Re: How to recover after md crash during reshape?
2015-10-21 20:26 ` Neil Brown
@ 2015-10-21 20:37 ` Phil Turmel
0 siblings, 0 replies; 24+ messages in thread
From: Phil Turmel @ 2015-10-21 20:37 UTC (permalink / raw)
To: Neil Brown, andras, linux-raid
On 10/21/2015 04:26 PM, Neil Brown wrote:
> Phil Turmel <philip@turmel.org> writes:
>> Hmmm. This feature has advanced beyond my last look at the code. I was
>> under the impression the backup option was only optional when mdadm
>> could move the data offset. Does this new algorithm apply to v0.90
>> metadata, a v3.2 kernel, and v3.2.5 mdadm?
>>
>
> It isn't a new algorithm, it is the original algorithm.
>
> In mdadm-2.4-pre1 (march 2006), you couldn't specify a backup file, but
> you could grow a raid5 to more devices.
> That was changed by a patch with comment:
>
> Allow resize to backup to a file.
>
> To support resizing an array without a spare, mdadm now understands
> --backup-file=
> which should point to a file for storing a backup of critical data.
> This can be given to --grow which will create the file, or
> --assemble which will restore from the file if needed.
>
> The backup-file was subsequently used to support in-place reshapes and
> array shrinking.
Ah, ok. I wasn't using parity raid that far back, and never noticed
that growing to more devices worked that way.
Thanks for clarifying.
Phil
* Re: How to recover after md crash during reshape?
2015-10-20 15:42 ` Phil Turmel
` (2 preceding siblings ...)
2015-10-21 16:17 ` Wols Lists
@ 2015-10-25 14:15 ` andras
2015-10-25 23:02 ` Phil Turmel
3 siblings, 1 reply; 24+ messages in thread
From: andras @ 2015-10-25 14:15 UTC (permalink / raw)
To: Phil Turmel; +Cc: Linux-RAID
Phil,
Thanks for all the help. I finally have some progress (and new
problems).
> Now to your big array. It is vital that it also be cleaned of UREs
> after re-creation before you do anything else. Which means it must
> *not* be created degraded (the redundancy is needed to fix UREs).
>
> According to lsdrv and your "mdadm -E" reports, the creation order you
> need is:
>
> raid device 0 /dev/sdf2 {WD-WMAZA0209553}
> raid device 1 /dev/sdd2 {WD-WMAZA0348342}
> raid device 2 /dev/sdg1 {9VS1EFFD}
> raid device 3 /dev/sde1 {5XW05FFV}
> raid device 4 /dev/sdc1 {6XW0BQL0}
> raid device 5 /dev/sdh1 {ML2220F30TEBLE}
> raid device 6 /dev/sdi2 {WD-WMAY01975001}
>
> Chunk size is 64k.
>
> Make sure your partially assembled array is stopped:
>
> mdadm --stop /dev/md1
>
> Re-create your array as follows:
>
> mdadm --create --assume-clean --verbose \
> --metadata=1.0 --raid-devices=7 --chunk=64 --level=6 \
> /dev/md1 /dev/sd{f2,d2,g1,e1,c1,h1,i2}
Being very paranoid at this stage, instead of trying to re-create the
array on the original drives, I dd-ed their content to a different set
of (bigger) drives, and issued the command on them.
The array assembled fine:
md1 : active raid6 sdc2[6] sdd1[5] sdg1[4] sdb1[3] sdf1[2] sdh2[1]
sda2[0]
7325679040 blocks super 1.0 level 6, 64k chunk, algorithm 2 [7/7]
[UUUUUUU]
bitmap: 0/11 pages [0KB], 65536KB chunk
> Use "fsck -n" to check your array's filesystem (expect some damage at
> the very beginning). If it looks reasonable, use fsck to fix any damage.
fsck -n ran to completion but reported a ton of errors, mostly stemming
from the initial (ext4) superblock being damaged.
e2fsck 1.42.12 (29-Aug-2014)
ext2fs_check_desc: Corrupt group descriptor: bad block for block
bitmap
fsck.ext4: Group descriptors look bad... trying backup blocks...
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal
anyway.
Clear journal? no
The filesystem size (according to the superblock) is 1831419920
blocks
The physical size of the device is 1831419760 blocks
Either the superblock or the partition table is likely to be
corrupt!
Abort? no
data contains a file system with errors, check forced.
Resize inode not valid. Recreate? no
Pass 1: Checking inodes, blocks, and sizes
Inode 7 has illegal block(s). Clear? no
Illegal block #448536 (4285956422) in inode 7. IGNORED.
Illegal block #448537 (4292313414) in inode 7. IGNORED.
Illegal block #448538 (3675619654) in inode 7. IGNORED.
Illegal block #448539 (3686760774) in inode 7. IGNORED.
Illegal block #448541 (1880654150) in inode 7. IGNORED.
Illegal block #448542 (3636035910) in inode 7. IGNORED.
Illegal block #448543 (2516877638) in inode 7. IGNORED.
Illegal block #448544 (2920513862) in inode 7. IGNORED.
Illegal block #449560 (4285956537) in inode 7. IGNORED.
Illegal block #449561 (4292313529) in inode 7. IGNORED.
Illegal block #449562 (3675619769) in inode 7. IGNORED.
Too many illegal blocks in inode 7.
Clear inode? no
Suppress messages? no
...
and so on...
So I issued the real fsck command. Interestingly, it reported a
completely different set of issues; my guess is that after fixing the
superblock, the inconsistencies that fsck -n was reporting went away,
and the real ones started to show up. At any rate, the file system now
seems to be clean, except for this message:
The filesystem size (according to the superblock) is 1831419920
blocks
The physical size of the device is 1831419760 blocks
Either the superblock or the partition table is likely to be
corrupt!
This problem prevents me from mounting the FS:
mount -o ro /dev/md1 /mnt -v
mount: wrong fs type, bad option, bad superblock on /dev/md1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
And dmesg reports:
[ 5859.527778] EXT4-fs (md1): bad geometry: block count 1831419920
exceeds size of device (1831419760 blocks)
So here I am right now. I can see a few paths forward, but first a
question:
Why is it that the re-created MD device is (ever so slightly) different
in size than the ext4 filesystem that it used to contain? I doubt it
has anything to do with the grow operation, as I didn't get far enough
to actually resize the filesystem...
One side-effect of using different drives (and dd) is that the partition
table is now misaligned with the new disk geometry. For example:
fdisk -l /dev/sdb
Disk /dev/sdb: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x3e6b39b9
Device Boot Start End Sectors Size Id Type
/dev/sdb1 63 2930272064 2930272002 1.4T fd Linux raid
autodetect
Partition 2 does not start on physical sector boundary.
Could this be the root cause?
Here's the sizes of all the other relevant partitions:
/dev/sda2 976752064 3907029167 2930277104 1.4T fd Linux raid
autodetect
/dev/sdb1 63 2930272064 2930272002 1.4T fd Linux raid
autodetect
/dev/sdc2 976752064 3907029167 2930277104 1.4T fd Linux raid
autodetect
/dev/sdd1 63 3907024064 3907024002 1.8T fd Linux raid
autodetect
/dev/sdf1 63 2930272064 2930272002 1.4T fd Linux raid
autodetect
/dev/sdg1 63 2930272064 2930272002 1.4T fd Linux raid
autodetect
/dev/sdh2 976752064 3907029167 2930277104 1.4T fd Linux raid
autodetect
If I look at the sizes reported by fdisk above, on a 7-disk RAID6 with
each partition of that size I should have about 1831420000 4KB blocks
available. I'm sure mdadm takes some sectors for management, but I
don't know how many.
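The arithmetic above can be reproduced directly (a sketch using the fdisk figures quoted in this message; RAID6 keeps 7 - 2 = 5 drives' worth of data):

```shell
# Recompute the expected 4K-block capacity from the smallest member
# partition size reported by fdisk (2930272002 512-byte sectors).
part_sectors=2930272002
data_disks=5                                # 7-drive RAID6 -> 5 data
blocks_per_part=$(( part_sectors / 8 ))     # 512B sectors -> 4K blocks
echo $(( blocks_per_part * data_disks ))    # prints 1831420000
```

This lands within a couple hundred blocks of both the superblock figure (1831419920) and the device figure (1831419760), so the gap must come from per-device overhead.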
So, I thought of three ways of fixing it:
1. Re-create the array again, but this time force the array size to the
one reported by the filesystem, using --size. What is the unit for
--size? Is it bytes?
2. Re-create the array again, but this time use the original
super-blocks version (0.91 I think). Could that make a difference in the
size of the array?
3. Instead of dd-ing whole drives, dd just the raid6 partitions so the
partition table is correct for the drives. Maybe the misalignment trips
mdadm up and makes it create the array with the incorrect size?
Thanks for all the help again,
Andras
* Re: How to recover after md crash during reshape?
2015-10-25 14:15 ` andras
@ 2015-10-25 23:02 ` Phil Turmel
2015-10-28 16:31 ` Andras Tantos
0 siblings, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2015-10-25 23:02 UTC (permalink / raw)
To: andras; +Cc: Linux-RAID
On 10/25/2015 10:15 AM, andras@tantosonline.com wrote:
> Phil,
>
> Thanks for all the help. I finally have some progress (and new problems).
> [ 5859.527778] EXT4-fs (md1): bad geometry: block count 1831419920
> exceeds size of device (1831419760 blocks)
> So, I thought of three ways of fixing it:
> 1. Re-create the array again, but this time force the array size to the
> one reported by the filesystem, using --size. What is the unit for --size?
> Is that bytes?
Yep. You'll need to use the --size option on a create. Note that it
specifies the amount of each device to use, not the overall array size.
According to "man mdadm", its unit is k == 1024 bytes. Use the exact
size from your original => --size=1465135936
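Since --size is per-device in KiB, the resulting 7-drive RAID6 capacity can be sanity-checked as follows (a sketch using the figures above):

```shell
# Hedged check: array capacity = per-device size * data drives.
per_device_kib=1465135936     # --size value, in KiB
data_disks=5                  # 7 drives minus 2 for RAID6 parity
echo $(( per_device_kib * data_disks ))   # prints 7325679680 (KiB)
```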
> 2. Re-create the array again, but this time use the original
> super-blocks version (0.91 I think). Could that make a difference in the
> size of the array?
v0.91 really is just a flag that means v0.90 w/ a reshape in progress.
But yes, the size used would be somewhat different. With the override
above, it won't matter. v1.x metadata has more features, and modern
mdadm normally reserves enough room to support them.
> 3. Instead of DD-ing whole drives, dd just the raid6 partitions so the
> partition table is correct for the drives. Maybe the misalignment trips
> mdadm up and makes it create the array with the incorrect size?
Yes, dd just the partition contents, so the final array is aligned.
This is *really* important for drives that have logical 512-byte sectors
but physical 4k-sectors. When you put your repaired array back in
service, keep this alignment.
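The per-partition copy described above might look like this (a hedged sketch; /dev/sdX1 and /dev/sdY1 are placeholders for the old and new member partitions, with the destination partition created on a 4K-aligned boundary first):

```shell
# Hedged sketch: copy only the partition contents, not the whole disk,
# so the destination keeps its own (aligned) partition table.
dd if=/dev/sdX1 of=/dev/sdY1 bs=4M
sync   # flush the copy before touching it with mdadm
```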
Phil
* Re: How to recover after md crash during reshape?
2015-10-25 23:02 ` Phil Turmel
@ 2015-10-28 16:31 ` Andras Tantos
2015-10-28 16:42 ` Phil Turmel
0 siblings, 1 reply; 24+ messages in thread
From: Andras Tantos @ 2015-10-28 16:31 UTC (permalink / raw)
To: Phil Turmel; +Cc: Linux-RAID
Thanks again Phil!
I'm almost there...
>> [ 5859.527778] EXT4-fs (md1): bad geometry: block count 1831419920
>> exceeds size of device (1831419760 blocks)
>
>Yep. You'll need to use the --size option on a create. Note that it
>specifies the amount of each device to use, not the overall array size.
>According to "man mdadm", its unit is k == 1024 bytes. Use the exact
>size from your original => --size=1465135936
When I try to do that, I get the following message:
root@bazsalikom:~# mdadm --create --assume-clean --verbose
--metadata=1.0 --raid-devices=7 --size=1465135936 --chunk=64 --level=6
/dev/md1 /dev/sde2 /dev/sdc2 /dev/sdf1 /dev/sdd1 /dev/sdb1 /dev/sdg1
/dev/sdh2
mdadm: layout defaults to left-symmetric
mdadm: /dev/sde2 appears to contain an ext2fs file system
size=-1216020180K mtime=Wed Dec 8 11:55:07 1954
mdadm: /dev/sde2 appears to be part of a raid array:
level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdc2 appears to contain an ext2fs file system
size=-1264254912K mtime=Sat Jul 18 15:26:57 2015
mdadm: /dev/sdc2 appears to be part of a raid array:
level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdf1 is smaller than given size. 1465135808K <
1465135936K + metadata
mdadm: /dev/sdd1 is smaller than given size. 1465135808K <
1465135936K + metadata
mdadm: /dev/sdb1 is smaller than given size. 1465135808K <
1465135936K + metadata
mdadm: /dev/sdg1 appears to be part of a raid array:
level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdh2 appears to be part of a raid array:
level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: create aborted
To be able to re-create the array, I *have* to specify metadata
version 0.9:
root@bazsalikom:~# mdadm --create --assume-clean --verbose
--metadata=0.9 --raid-devices=7 --size=1465135936 --chunk=64 --level=6
/dev/md1 /dev/sde2 /dev/sdc2 /dev/sdf1 /dev/sdd1 /dev/sdb1 /dev/sdg1
/dev/sdh2
mdadm: layout defaults to left-symmetric
mdadm: /dev/sde2 appears to contain an ext2fs file system
size=-1216020180K mtime=Wed Dec 8 11:55:07 1954
mdadm: /dev/sde2 appears to be part of a raid array:
level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdc2 appears to contain an ext2fs file system
size=-1264254912K mtime=Sat Jul 18 15:26:57 2015
mdadm: /dev/sdc2 appears to be part of a raid array:
level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdf1 appears to be part of a raid array:
level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdg1 appears to be part of a raid array:
level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdh2 appears to be part of a raid array:
level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: largest drive (/dev/sdg1) exceeds size (1465135936K) by more
than 1%
Continue creating array? y
mdadm: array /dev/md1 started.
Is this a problem? Can I upgrade my array to 1.0 metadata? Should I?
Andras
* Re: How to recover after md crash during reshape?
2015-10-28 16:31 ` Andras Tantos
@ 2015-10-28 16:42 ` Phil Turmel
2015-10-28 17:10 ` Andras Tantos
2015-10-29 16:59 ` Andras Tantos
0 siblings, 2 replies; 24+ messages in thread
From: Phil Turmel @ 2015-10-28 16:42 UTC (permalink / raw)
To: Andras Tantos; +Cc: Linux-RAID
On 10/28/2015 12:31 PM, Andras Tantos wrote:
> Thanks again Phil!
>
> I'm almost there...
>
>>> [ 5859.527778] EXT4-fs (md1): bad geometry: block count 1831419920
>>> exceeds size of device (1831419760 blocks)
>>
>>Yep. You'll need to use the --size option on a create. Note that it
>>specifies the amount of each device to use, not the overall array size.
>>According to "man mdadm", its unit is k == 1024 bytes. Use the exact
>>size from your original => --size=1465135936
>
> When I try to do that, I get the following message:
>
> root@bazsalikom:~# mdadm --create --assume-clean --verbose
> --metadata=1.0 --raid-devices=7 --size=1465135936 --chunk=64 --level=6
> /dev/md1 /dev/sde2 /dev/sdc2 /dev/sdf1 /dev/sdd1 /dev/sdb1 /dev/sdg1
> /dev/sdh2
> mdadm: layout defaults to left-symmetric
> mdadm: /dev/sde2 appears to contain an ext2fs file system
> size=-1216020180K mtime=Wed Dec 8 11:55:07 1954
> mdadm: /dev/sde2 appears to be part of a raid array:
> level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
> mdadm: /dev/sdc2 appears to contain an ext2fs file system
> size=-1264254912K mtime=Sat Jul 18 15:26:57 2015
> mdadm: /dev/sdc2 appears to be part of a raid array:
> level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
> mdadm: /dev/sdf1 is smaller than given size. 1465135808K <
> 1465135936K + metadata
> mdadm: /dev/sdd1 is smaller than given size. 1465135808K <
> 1465135936K + metadata
> mdadm: /dev/sdb1 is smaller than given size. 1465135808K <
> 1465135936K + metadata
> mdadm: /dev/sdg1 appears to be part of a raid array:
> level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
> mdadm: /dev/sdh2 appears to be part of a raid array:
> level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
> mdadm: create aborted
>
> To be able to re-create the array, I *have* to specify metadata
> version 0.9:
>
> root@bazsalikom:~# mdadm --create --assume-clean --verbose
> --metadata=0.9 --raid-devices=7 --size=1465135936 --chunk=64 --level=6
> /dev/md1 /dev/sde2 /dev/sdc2 /dev/sdf1 /dev/sdd1 /dev/sdb1 /dev/sdg1
> /dev/sdh2
> mdadm: layout defaults to left-symmetric
> mdadm: /dev/sde2 appears to contain an ext2fs file system
> size=-1216020180K mtime=Wed Dec 8 11:55:07 1954
> mdadm: /dev/sde2 appears to be part of a raid array:
> level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
> mdadm: /dev/sdc2 appears to contain an ext2fs file system
> size=-1264254912K mtime=Sat Jul 18 15:26:57 2015
> mdadm: /dev/sdc2 appears to be part of a raid array:
> level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
> mdadm: /dev/sdf1 appears to be part of a raid array:
> level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
> mdadm: /dev/sdd1 appears to be part of a raid array:
> level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
> mdadm: /dev/sdb1 appears to be part of a raid array:
> level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
> mdadm: /dev/sdg1 appears to be part of a raid array:
> level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
> mdadm: /dev/sdh2 appears to be part of a raid array:
> level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
> mdadm: largest drive (/dev/sdg1) exceeds size (1465135936K) by more
> than 1%
> Continue creating array? y
> mdadm: array /dev/md1 started.
>
> Is this a problem? Can I upgrade my array to 1.0 metadata? Should I?
Hmm. Interesting. Your version of mdadm is insisting on reserving much
more space between end of content and the v1.0 metadata than when using
v0.90 metadata.
I'm curious how much. Please show the output of "cat /proc/partitions".
If you stop the array cleanly and then manually re-assemble with
--update=metadata, you might get around it. (Specify all of the devices
explicitly to ensure you don't get burned by v0.90's problems with last
partitions.)
You definitely don't want to stay on v0.90, but you may need to for now
to get out of trouble.
Phil
* Re: How to recover after md crash during reshape?
2015-10-28 16:42 ` Phil Turmel
@ 2015-10-28 17:10 ` Andras Tantos
2015-10-28 17:38 ` Phil Turmel
2015-10-29 16:59 ` Andras Tantos
1 sibling, 1 reply; 24+ messages in thread
From: Andras Tantos @ 2015-10-28 17:10 UTC (permalink / raw)
To: Phil Turmel; +Cc: Linux-RAID
Phil,
>> To be able to re-assemble the array, I *have* to specify metadata
>> version 0.9:
>>
>> Is this a problem? Can I upgrade my array to 1.0 metadata? Should I?
>
> Hmm. Interesting. Your version of mdadm is insisting on reserving much
> more space between end of content and the v1.0 metadata than when using
> v0.90 metadata.
>
> I'm curious how much. Please show the output of "cat /proc/partitions".
root@bazsalikom:/home/tantos# cat /proc/partitions
major minor #blocks name
8 16 1465138584 sdb
8 17 1465136001 sdb1
8 48 1465138584 sdd
8 49 1465136001 sdd1
8 80 1465138584 sdf
8 81 1465136001 sdf1
8 96 1953513527 sdg
8 97 1953512001 sdg1
8 112 1953514584 sdh
8 113 538145 sdh1
8 114 1465138552 sdh2
8 115 487837854 sdh3
8 64 1953514584 sde
8 65 538145 sde1
8 66 1465138552 sde2
8 67 487837854 sde3
8 32 1953514584 sdc
8 33 538145 sdc1
8 34 1465138552 sdc2
8 35 487837854 sdc3
9 0 487837760 md0
9 1 7325679680 md1
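Working backwards from these figures answers the "how much" question (a sketch; /proc/partitions counts 1 KiB blocks, and RAID6 on 7 devices stores 5 devices' worth of data):

```shell
# md1 capacity divided by the 5 data disks gives the per-member size
# mdadm is using -- all figures in 1 KiB blocks.
md1_total=7325679680
per_member=$(( md1_total / 5 ))
echo $per_member                      # 1465135936, the "size" mdadm printed
echo $(( 1465136001 - per_member ))   # 65 KiB of headroom on sdb1, the
                                      # smallest member partition
```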
Andras
* Re: How to recover after md crash during reshape?
2015-10-28 17:10 ` Andras Tantos
@ 2015-10-28 17:38 ` Phil Turmel
0 siblings, 0 replies; 24+ messages in thread
From: Phil Turmel @ 2015-10-28 17:38 UTC (permalink / raw)
To: Andras Tantos; +Cc: Linux-RAID
On 10/28/2015 01:10 PM, Andras Tantos wrote:
> Phil,
>
>>> To be able to re-assemble the array, I *have* to specify metadata
>>> version 0.9:
>>>
>>> Is this a problem? Can I upgrade my array to 1.0 metadata? Should I?
>>
>> Hmm. Interesting. Your version of mdadm is insisting on reserving much
>> more space between end of content and the v1.0 metadata than when using
>> v0.90 metadata.
>>
>> I'm curious how much. Please show the output of "cat /proc/partitions".
Ok. I think your version of mdadm is trying to put a bitmap on the v1.0
array, which can be suppressed with --bitmap=none. Or just do the
--assemble --update.
Phil
* Re: How to recover after md crash during reshape?
2015-10-28 16:42 ` Phil Turmel
2015-10-28 17:10 ` Andras Tantos
@ 2015-10-29 16:59 ` Andras Tantos
2015-10-30 18:12 ` Phil Turmel
1 sibling, 1 reply; 24+ messages in thread
From: Andras Tantos @ 2015-10-29 16:59 UTC (permalink / raw)
To: Phil Turmel; +Cc: Linux-RAID
Phil,
On 10/28/2015 9:42 AM, Phil Turmel wrote:
> If you stop the array cleanly and then manually re-assemble with
> --update=metadata, you might get around it. (Specify all of the
> devices explicitly to ensure you don't get burned by v0.90's problems
> with last partitions.) You definitely don't want to stay on v0.90, but
> you may need to for now to get out of trouble. Phil
It seems that my mdadm doesn't have an --update=metadata option, which
if I understand it right means I have to re-create the array with the
no-bitmap option. How dangerous is that? Is it possible that things get
overwritten during the re-create process in the data portion of the array?
I've read that GRUB (which is my bootloader) didn't support v1.0
superblocks for a while. It seems that version 0.99 of GRUB (which is
what I have) has it, but how can I make certain? I don't want to render
my system un-bootable...
Can you expand a little on the problems of v0.90 superblocks and why
upgrading is advantageous? What I've read about the differences (the
lifted limits on the number of devices per array and on the 2TB size
per device) doesn't really apply to my case.
Thanks,
Andras
* Re: How to recover after md crash during reshape?
2015-10-29 16:59 ` Andras Tantos
@ 2015-10-30 18:12 ` Phil Turmel
2015-11-03 23:42 ` How to recover after md crash during reshape? - SOLVED/SUMMARY Andras Tantos
0 siblings, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2015-10-30 18:12 UTC (permalink / raw)
To: Andras Tantos; +Cc: Linux-RAID
On 10/29/2015 12:59 PM, Andras Tantos wrote:
> Phil,
>
> On 10/28/2015 9:42 AM, Phil Turmel wrote:
>> If you stop the array cleanly and then manually re-assemble with
>> --update=metadata, you might get around it. (Specify all of the
>> devices explicitly to ensure you don't get burned by v0.90's problems
>> with last partitions.) You definitely don't want to stay on v0.90, but
>> you may need to for now to get out of trouble. Phil
>
> It seems that my mdadm doesn't have an --update=metadata option, which
> if I understand it right means I have to re-create the array with the
> no-bitmap option. How dangerous is that? Is it possible that things get
> overwritten during the re-create process in the data portion of the array?
Just clone and compile a local copy of the latest mdadm, then run it as
./mdadm for the --update operation.
git clone git://github.com/neilbrown/mdadm
> I've read that GRUB (which is my bootloader) didn't support v1.0
> superblocks for a while. It seems that 0.99 version of GRUB (which is
> what I have) has it, but how to make certain? I don't want to render my
> system un-bootable...
Old grub doesn't understand MD at all, which is why you needed a mirror
that has the content starting at the beginning of the partition. To
grub, it doesn't look like a mirror. This is true for v1.0 as well.
> Can you expand a little bit on the problems of v0.90 superblocks and why
> upgrading is advantageous? What I've read about the differences (lifted
> limit of number of devices/array and 2TB per device limit) don't really
> apply to my case.
v0.90 will screw up if you have it on the last partition of a device,
and that partition runs very close to the end of the device. v0.90
doesn't include size info in the metadata itself, so it is ambiguous in
that case whether the superblock belongs to the device as a whole or the
partition. That'll really scramble an array.
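The placement rule makes the ambiguity concrete. Here is a sketch with hypothetical sizes, using the v0.90 offset rule (the superblock goes in the last 64 KiB-aligned 64 KiB block, per MD_NEW_SIZE_SECTORS() in the kernel's md_p.h):

```shell
# v0.90 superblock offset, in 512-byte sectors: round the device size
# down to a 64 KiB boundary, then back off one 64 KiB block (128 sectors).
sb_offset() { echo $(( ($1 & ~127) - 128 )); }

disk=$(( 2**31 ))                  # hypothetical 1 TiB disk, in sectors
part_start=2048                    # typical 1 MiB-aligned partition start
part_len=$(( disk - part_start ))  # last partition runs to the end of the disk

sb_offset $disk                                  # 2147483520
echo $(( part_start + $(sb_offset $part_len) ))  # 2147483520 -- the very
# same sector, so a scan cannot tell whether the superblock it found
# belongs to the whole disk or to the last partition
```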
Just say no to v0.90.
Phil
* Re: How to recover after md crash during reshape? - SOLVED/SUMMARY
2015-10-30 18:12 ` Phil Turmel
@ 2015-11-03 23:42 ` Andras Tantos
0 siblings, 0 replies; 24+ messages in thread
From: Andras Tantos @ 2015-11-03 23:42 UTC (permalink / raw)
To: Phil Turmel; +Cc: Linux-RAID
Thank you all who helped me solve my problem, especially Phil Turmel,
to whom I will be in debt for the rest of my life. Right now my family
photos - and my marriage - are safe.
For people who might be interested in the future, here's a quick
summary of the events and the recovery:
Trouble:
==========
Was going to extend RAID6 array from 7 disks to 10. Array reshape
crashed early in the process. After reboot, the array wouldn't
re-assemble with error message:
mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar
superblocks.
If they are really different, please --zero the superblock on one
If they are the same or overlap, please remove one from the
DEVICE list in mdadm.conf.
What I SHOULD have done here is remove SDA from the DEVICE list in
mdadm.conf, followed by mdadm --grow --continue /dev/md1 --backup-file .....
What I did instead was zero the superblock of SDA1.
The same message appeared for the other two new HDDs in the array as
well. By the time I had zeroed the superblocks of all three new disks,
the array assembled but didn't start because it was missing three drives.
Recovery:
===========
1. Look at the partitions listed in /proc/mdstat for the array.
2. For each of the constituents of the array, do mdadm -E <disk name
from the array>
3. Note all the parameters, especially these: 'Chunk Size', 'Raid
Level', 'Version'
4. Make sure all remaining disks show the same event count ('Events')
and they have correct checksum and all the above parameters match.
5. Note the order of the disks in the array. You can find that in this line:
Number Major Minor RaidDevice State
this 6 8 98 6 active sync
6. If all matches, stop the array:
mdadm --stop /dev/md1
7. Re-create your array as follows:
mdadm --create --assume-clean --verbose \
--metadata=1.0 --raid-devices=7 --chunk=64 --level=6 \
/dev/md1 <list of devices in the exact order from note 5 above>
Replace the number of devices, chunk size and raid level with those
from note 3 above. In my case, I had to specify metadata version 0.9,
which was my original metadata version (as reported by the 'Version'
parameter in point 3 above). YMMV.
8. If all goes well, the array will now re-assemble with the original 7
disks. The data on the array is corrupted up to the point where the
reshape stopped, so...
9. fsck -n /dev/md1 to assess the damage. If it doesn't look terrible,
fix the errors: fsck -y /dev/md1.
10. Mount the array and rejoice in the data that's recovered.
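Steps 2-4 above can be mechanized. A sketch of the comparison, run here against two hypothetical mdadm -E excerpts instead of live disks (on a real system you would feed it the output of mdadm -E for each member):

```shell
# Extract only the fields that must agree before a re-create is safe.
key_fields() { grep -E 'Version|Raid Level|Chunk Size|Events' ; }

# Hypothetical excerpts from two members' mdadm -E output:
a='      Version : 0.90.00
   Raid Level : raid6
   Chunk Size : 64K
       Events : 0.1234'
b='      Version : 0.90.00
   Raid Level : raid6
   Chunk Size : 64K
       Events : 0.1234'

if [ "$(echo "$a" | key_fields)" = "$(echo "$b" | key_fields)" ]; then
    echo "members agree -- note device order and proceed"
else
    echo "MISMATCH -- stop and investigate"
fi
```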
Final notes:
===============
I still don't know the root cause of the crash. What I did notice is
that this particular (Core2 Duo) system seems to become unstable with
more than 9 HDDs. It doesn't seem to be a power supply issue, as it has
trouble even when about half of the drives are supplied from a second PSU.
Version 0.9 metadata has some problems, causing the misleading message
in the first place. Upgrading to version 1.0 metadata is a good idea.
If you use desktop or green drives in your array, fix the short kernel
timeout on SATA devices (30s). Issue this on every boot:
for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
If you don't do that, the first unrecoverable read error will degrade
your array instead of simply relocating the failing sector on the hard
drive.
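The loop above can be tried safely against a mock sysfs tree first (the real path is /sys/block/*/device/timeout, and writing there needs root):

```shell
# Build a throwaway stand-in for /sys/block so the loop can be exercised
# without touching real devices.
sys=$(mktemp -d)
mkdir -p "$sys/block/sda/device" "$sys/block/sdb/device"
echo 30 > "$sys/block/sda/device/timeout"    # 30 s is the kernel default
echo 30 > "$sys/block/sdb/device/timeout"
for x in "$sys"/block/*/device/timeout ; do echo 180 > "$x" ; done
cat "$sys/block/sda/device/timeout"          # now reads 180
```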
To find and fix unrecoverable read errors on your array, regularly issue:
echo check >/sys/block/md0/md/sync_action
This is a looooong operation on a large RAID6 array, but makes sure that
bad sectors don't accumulate in seldom-accessed corners and destroy your
array at the worst possible time.
Andras
2015-10-20 2:35 How to recover after md crash during reshape? andras
2015-10-20 12:50 ` Anugraha Sinha
2015-10-20 13:04 ` Wols Lists
2015-10-20 13:49 ` Phil Turmel
[not found] ` <3baf849321d819483c5d20c005a31844@tantosonline.com>
2015-10-20 15:42 ` Phil Turmel
2015-10-20 22:34 ` Anugraha Sinha
2015-10-21 3:52 ` andras
2015-10-21 12:01 ` Phil Turmel
2015-10-21 16:17 ` Wols Lists
2015-10-21 16:05 ` Phil Turmel
2015-10-25 14:15 ` andras
2015-10-25 23:02 ` Phil Turmel
2015-10-28 16:31 ` Andras Tantos
2015-10-28 16:42 ` Phil Turmel
2015-10-28 17:10 ` Andras Tantos
2015-10-28 17:38 ` Phil Turmel
2015-10-29 16:59 ` Andras Tantos
2015-10-30 18:12 ` Phil Turmel
2015-11-03 23:42 ` How to recover after md crash during reshape? - SOLVED/SUMMARY Andras Tantos
2015-10-21 1:35 ` How to recover after md crash during reshape? Neil Brown
2015-10-21 4:03 ` andras
2015-10-21 12:18 ` Phil Turmel
2015-10-21 20:26 ` Neil Brown
2015-10-21 20:37 ` Phil Turmel