* How to recover after md crash during reshape?
@ 2015-10-20  2:35 andras
  2015-10-20 12:50 ` Anugraha Sinha
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread

From: andras @ 2015-10-20  2:35 UTC (permalink / raw)
To: linux-raid

Dear all,

I have a serious (to me) problem, and I'm seeking some pro advice on recovering
a RAID6 volume after a crash at the beginning of a reshape. Thank you all in
advance for any help!

The details:

I'm running Debian.
uname -r says:
    3.2.0-4-amd64
dmesg says:
    Linux version 3.2.0-4-amd64 (debian-kernel@lists.debian.org)
    (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.68-1+deb7u3
mdadm -V says:
    mdadm - v3.2.5 - 18th May 2012

I used to have a RAID6 volume with 7 disks in it. I recently bought 3 new HDDs
and was trying to add them to the array. I put them in the machine (hot-plug),
partitioned them, then did:

    mdadm --add /dev/md1 /dev/sdh1 /dev/sdi1 /dev/sdj1

This worked fine; /proc/mdstat showed them as three spares. Then I did:

    mdadm --grow --raid-devices=10 /dev/md1

Yes, I was dumb enough to start the process without a backup option (a
copy-paste error from https://raid.wiki.kernel.org/index.php/Growing).

This immediately (well, after 2 seconds) crashed the MD driver:

Oct 17 17:30:27 bazsalikom kernel: [7869821.514718] sd 0:0:0:0: [sdj] Attached SCSI disk
Oct 17 18:39:21 bazsalikom kernel: [7873955.418679] sdh: sdh1
Oct 17 18:39:37 bazsalikom kernel: [7873972.155084] sdi: sdi1
Oct 17 18:39:49 bazsalikom kernel: [7873983.916038] sdj: sdj1
Oct 17 18:40:33 bazsalikom kernel: [7874027.963430] md: bind<sdh1>
Oct 17 18:40:34 bazsalikom kernel: [7874028.263656] md: bind<sdi1>
Oct 17 18:40:34 bazsalikom kernel: [7874028.361112] md: bind<sdj1>
Oct 17 18:59:48 bazsalikom kernel: [7875182.667815] md: reshape of RAID array md1
Oct 17 18:59:48 bazsalikom kernel: [7875182.667818] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Oct 17 18:59:48 bazsalikom kernel: [7875182.667821] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
Oct 17 18:59:48 bazsalikom kernel: [7875182.667831] md: using 128k window, over a total of 1465135936k.
--> Oct 17 18:59:50 bazsalikom kernel: [7875184.326245] md: md_do_sync() got signal ... exiting
Oct 17 19:02:46 bazsalikom kernel: [7875360.928059] md1_raid6  D ffff88021fc12780  0  282  2 0x00000000
Oct 17 19:02:46 bazsalikom kernel: [7875360.928066] ffff880213fd9140 0000000000000046 ffff8800aa80c140 ffff880201fe08c0
Oct 17 19:02:46 bazsalikom kernel: [7875360.928073] 0000000000012780 ffff880211845fd8 ffff880211845fd8 ffff880213fd9140
Oct 17 19:02:46 bazsalikom kernel: [7875360.928079] ffff8800a77d8a40 ffffffff81071331 0000000000000046 ffff8802135a0c00
Oct 17 19:02:46 bazsalikom kernel: [7875360.928085] Call Trace:
Oct 17 19:02:46 bazsalikom kernel: [7875360.928095] [<ffffffff81071331>] ? arch_local_irq_save+0x11/0x17
Oct 17 19:02:46 bazsalikom kernel: [7875360.928111] [<ffffffffa0124c6c>] ? check_reshape+0x27b/0x51a [raid456]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928128] [<ffffffffa013ade4>] ? scsi_request_fn+0x443/0x51e [scsi_mod]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928134] [<ffffffff8103f6e2>] ? try_to_wake_up+0x197/0x197
Oct 17 19:02:46 bazsalikom kernel: [7875360.928144] [<ffffffffa00ef3b8>] ? md_check_recovery+0x2a5/0x514 [md_mod]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928151] [<ffffffffa01286c7>] ? raid5d+0x1c/0x483 [raid456]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928156] [<ffffffff81071331>] ? arch_local_irq_save+0x11/0x17
Oct 17 19:02:46 bazsalikom kernel: [7875360.928160] [<ffffffff81071331>] ? arch_local_irq_save+0x11/0x17
Oct 17 19:02:46 bazsalikom kernel: [7875360.928169] [<ffffffffa00e9256>] ? md_thread+0x114/0x132 [md_mod]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928174] [<ffffffff8105fdf3>] ? add_wait_queue+0x3c/0x3c
Oct 17 19:02:46 bazsalikom kernel: [7875360.928183] [<ffffffffa00e9142>] ? md_rdev_init+0xea/0xea [md_mod]
Oct 17 19:02:46 bazsalikom kernel: [7875360.928188] [<ffffffff8105f7a1>] ? kthread+0x76/0x7e
Oct 17 19:02:46 bazsalikom kernel: [7875360.928194] [<ffffffff81357ff4>] ? kernel_thread_helper+0x4/0x10
Oct 17 19:02:46 bazsalikom kernel: [7875360.928199] [<ffffffff8105f72b>] ? kthread_worker_fn+0x139/0x139
Oct 17 19:02:46 bazsalikom kernel: [7875360.928204] [<ffffffff81357ff0>] ? gs_change+0x13/0x13

[... at 19:04:46 the same md1_raid6 trace repeated verbatim, followed by hung-task
traces for jbd2/md1-8 (pid 1731, blocked under jbd2_journal_commit_transaction),
smbd (pids 3063 and 3155) and imap (pid 3121), all stuck in
get_active_stripe+0x24c/0x505 [raid456] via make_request/md_make_request ...]

From here on, things went downhill pretty damn fast. I was not able to unmount
the file system or to stop or restart the array (/proc/mdstat went away), and
any process trying to touch /dev/md1 hung, so eventually I ran out of options
and hit the reset button on the machine.

Upon reboot, the array wouldn't assemble; it complained that sda and sda1 had
the same superblock info on them:

    mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar superblocks.
          If they are really different, please --zero the superblock on one
          If they are the same or overlap, please remove one from the
          DEVICE list in mdadm.conf.

At this point, I looked at the drives and it appeared that the drive letters
had been re-arranged by the kernel.
My three new HDDs (which used to be sdh, sdi, sdj) now appear as sda, sdb and sdd.

I read up on this a little, and everyone seemed to suggest that you repair this
superblock corruption by zeroing out the superblock, so I did:

    mdadm --zero-superblock /dev/sda1

At this point mdadm started complaining about the superblock on sdb (and later
sdd), so I ended up zeroing out the superblock on all three of the new drives:

    mdadm --zero-superblock /dev/sdb1
    mdadm --zero-superblock /dev/sdd1

After this, the array would assemble but wouldn't start, stating that it
doesn't have enough disks in it - which is correct for the new array: I had
just removed 3 drives from a RAID6. Right now, /proc/mdstat says:

    Personalities : [raid1] [raid6] [raid5] [raid4]
    md1 : inactive sdh2[0](S) sdc2[6](S) sdj1[5](S) sde1[4](S) sdg1[3](S) sdi1[2](S) sdf2[1](S)
          10744335040 blocks super 0.91

mdadm -E /dev/sdc2 says:

    /dev/sdc2:
              Magic : a92b4efc
            Version : 0.91.00
               UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
      Creation Time : Sat Oct  2 07:21:53 2010
         Raid Level : raid6
      Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
         Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
       Raid Devices : 10
      Total Devices : 10
    Preferred Minor : 1

      Reshape pos'n : 4096
      Delta Devices : 3 (7->10)

        Update Time : Sat Oct 17 18:59:50 2015
              State : active
     Active Devices : 10
    Working Devices : 10
     Failed Devices : 0
      Spare Devices : 0
           Checksum : fad60788 - correct
             Events : 2579239

             Layout : left-symmetric
         Chunk Size : 64K

          Number   Major   Minor   RaidDevice State
    this     6       8       98        6      active sync

       0     0       8       50        0      active sync
       1     1       8       18        1      active sync
       2     2       8       65        2      active sync   /dev/sde1
       3     3       8       33        3      active sync   /dev/sdc1
       4     4       8        1        4      active sync   /dev/sda1
       5     5       8       81        5      active sync   /dev/sdf1
       6     6       8       98        6      active sync
       7     7       8      145        7      active sync   /dev/sdj1
       8     8       8      129        8      active sync   /dev/sdi1
       9     9       8      113        9      active sync   /dev/sdh1

So, if I read this right, the superblock states that the array is in the middle
of a reshape from 7 to 10 devices, but that the reshape had only just started
(4096 is the position). What's interesting is that the device names listed here
don't match the ones reported by /proc/mdstat, and are actually incorrect; the
right partition numbers are in /proc/mdstat. The superblocks on the 6 other
original disks match, except of course for which device they mark as 'this',
and for the checksum.

I've read here (http://ubuntuforums.org/showthread.php?t=2133576), among many
other places, that it might be possible to recover the data by re-creating the
array in the state it was in before the reshape. I've also read that if I want
to re-create an array in read-only mode, I should re-create it degraded. So
what I thought I would do is this:

    mdadm --create /dev/md1 --level=6 --raid-devices=7 /dev/sdh2 /dev/sdf2 /dev/sdi1 /dev/sdg1 /dev/sde1 missing missing

Obviously, at this point, I'm trying to be as cautious as possible and not
cause any further damage, if that's at all possible. It seems that this issue
has some similarities to this bug:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1001019

So, please, all mdadm gurus, help me out! How can I recover as much of the data
on this volume as possible?

Thanks again,
Andras Tantos

^ permalink raw reply	[flat|nested] 24+ messages in thread
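For comparison with the command that started this, the wiki's grow invocation
includes a --backup-file, and any experimental "--create ... missing missing"
pass can be kept away from the real disks with copy-on-write overlays. A sketch
only - the backup path, overlay file names and sizes below are illustrative,
not taken from this thread:

```shell
# 1) The grow as the wiki recommends it: --backup-file gives mdadm scratch
#    space for the critical first stripes, so an interrupted reshape can be
#    restarted instead of wedging the array.
mdadm --grow --raid-devices=10 --backup-file=/root/md1-grow.bak /dev/md1

# 2) Before experimenting with re-creating the old 7-disk layout, put a
#    dm-snapshot overlay over each member so writes land in a throwaway
#    COW file instead of on the real partition (shown for one member):
truncate -s 4G /tmp/ovl-sdh2                 # COW scratch file (size guess)
loop=$(losetup -f --show /tmp/ovl-sdh2)      # back it with a loop device
size=$(blockdev --getsz /dev/sdh2)           # origin size in 512B sectors
dmsetup create ovl-sdh2 \
    --table "0 $size snapshot /dev/sdh2 $loop N 8"
# ...repeat per member, then point mdadm --create at /dev/mapper/ovl-*.
```

With overlays in place, a wrong --create guess only dirties the COW files,
which can be discarded with dmsetup remove and losetup -d and tried again.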
* Re: How to recover after md crash during reshape?
  2015-10-20  2:35 How to recover after md crash during reshape? andras
@ 2015-10-20 12:50 ` Anugraha Sinha
  2015-10-20 13:04 ` Wols Lists
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread

From: Anugraha Sinha @ 2015-10-20 12:50 UTC (permalink / raw)
To: andras, linux-raid

Hi Andras,

> Upon reboot, the array wouldn't assemble, it was complaining that SDA
> and SDA1 had the same superblock info on it.
>
>     mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar superblocks.
>           If they are really different, please --zero the superblock on one
>           If they are the same or overlap, please remove one from the
>           DEVICE list in mdadm.conf.
>
> At this point, I looked at the drives and it appeared that the drive
> letters got re-arranged by the kernel. My three new HDDs (which used to
> be sdh, sdi, sdj) now appear as sda, sdb and sdd.
>
> I read up on this a little, and everyone seemed to suggest that you
> repair this superblock corruption by zeroing out the superblock, so I did:
>
>     mdadm --zero-superblock /dev/sda1
>
> At this point mdadm started complaining about the superblock on sdb
> (and later sdd), so I ended up zeroing out the superblock on all three
> of the new drives:
>
>     mdadm --zero-superblock /dev/sdb1
>     mdadm --zero-superblock /dev/sdd1

Before doing zero-superblock, you should have removed the drives from the array
first, and only then zeroed the superblock information. That way the array
would have known about the removal, and it would have reassembled and started
again.

Anyway, I suggest you first remove the devices which mdadm is expecting to be
present. In my opinion you should first execute

    mdadm --stop /dev/md1        [just as a safeguard, you may do this as well]

then

    mdadm /dev/md1 --fail /dev/sda1 --remove /dev/sda1
    mdadm /dev/md1 --fail /dev/sdb1 --remove /dev/sdb1
    mdadm /dev/md1 --fail /dev/sdd1 --remove /dev/sdd1

Then check what /proc/mdstat says, and check what mdadm -D /dev/md1 says. If
things are good and you are lucky, restart the array (mdadm --run). Thereafter
try to remove the existing partitions on /dev/sda, /dev/sdb and /dev/sdd
(using GNU Parted), recreate the partitions, and probably mkfs the newly
created partitions as well. This will solve the issue of /dev/sda and
/dev/sda1 having similar superblock information. Finally, take a backup and
then add and grow your array again.

I hope things work for you.

Regards
Anugraha

On 10/20/2015 11:35 AM, andras@tantosonline.com wrote:
> Dear all,
>
> I have a serious (to me) problem, and I'm seeking some pro advice on
> recovering a RAID6 volume after a crash at the beginning of a reshape.
> [...]
mntput_no_expire+0x1e/0xc9 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928646] > [<ffffffff810bdd31>] ? __do_page_cache_readahead+0x11e/0x1c3 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928650] > [<ffffffff810bdff1>] ? force_page_cache_readahead+0x5f/0x83 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928654] > [<ffffffff810b85e5>] ? sys_fadvise64_64+0x141/0x1e2 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928658] > [<ffffffff81355e92>] ? system_call_fastpath+0x16/0x1b > Oct 17 19:04:46 bazsalikom kernel: [7875480.928667] smbd D > ffff88021fc12780 0 3155 25481 0x00000000 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928672] > ffff8802135d8780 0000000000000086 0000000000000000 ffffffff8160d020 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928677] > 0000000000012780 ffff880005267fd8 ffff880005267fd8 ffff8802135d8780 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928683] > 0000000000000000 00000001135a0d70 ffff8802135a0d60 ffff8802135a0d70 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928688] Call Trace: > Oct 17 19:04:46 bazsalikom kernel: [7875480.928694] > [<ffffffffa0123804>] ? get_active_stripe+0x24c/0x505 [raid456] > Oct 17 19:04:46 bazsalikom kernel: [7875480.928698] > [<ffffffff8103f6e2>] ? try_to_wake_up+0x197/0x197 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928704] > [<ffffffffa01258c8>] ? make_request+0x1b4/0x37a [raid456] > Oct 17 19:04:46 bazsalikom kernel: [7875480.928708] > [<ffffffff8105fdf3>] ? add_wait_queue+0x3c/0x3c > Oct 17 19:04:46 bazsalikom kernel: [7875480.928715] > [<ffffffffa00e8d47>] ? md_make_request+0xee/0x1db [md_mod] > Oct 17 19:04:46 bazsalikom kernel: [7875480.928725] > [<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4] > Oct 17 19:04:46 bazsalikom kernel: [7875480.928729] > [<ffffffff8119a3ec>] ? generic_make_request+0x90/0xcf > Oct 17 19:04:46 bazsalikom kernel: [7875480.928733] > [<ffffffff8119a4fe>] ? submit_bio+0xd3/0xf1 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928737] > [<ffffffff810bedab>] ? 
__lru_cache_add+0x2b/0x51 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928741] > [<ffffffff811259dd>] ? mpage_readpages+0x113/0x134 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928751] > [<ffffffffa017d19a>] ? noalloc_get_block_write+0x17/0x17 [ext4] > Oct 17 19:04:46 bazsalikom kernel: [7875480.928755] > [<ffffffff81109033>] ? poll_freewait+0x97/0x97 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928759] > [<ffffffff81036628>] ? should_resched+0x5/0x23 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928762] > [<ffffffff8134fa44>] ? _cond_resched+0x7/0x1c > Oct 17 19:04:46 bazsalikom kernel: [7875480.928767] > [<ffffffff810bdd31>] ? __do_page_cache_readahead+0x11e/0x1c3 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928771] > [<ffffffff810be02e>] ? ra_submit+0x19/0x1d > Oct 17 19:04:46 bazsalikom kernel: [7875480.928775] > [<ffffffff810b689b>] ? generic_file_aio_read+0x282/0x5cf > Oct 17 19:04:46 bazsalikom kernel: [7875480.928780] > [<ffffffff810fadc4>] ? do_sync_read+0xb4/0xec > Oct 17 19:04:46 bazsalikom kernel: [7875480.928784] > [<ffffffff810fb4af>] ? vfs_read+0x9f/0xe6 > Oct 17 19:04:46 bazsalikom kernel: [7875480.928788] > [<ffffffff810fb61f>] ? sys_pread64+0x53/0x6e > Oct 17 19:04:46 bazsalikom kernel: [7875480.928792] > [<ffffffff81355e92>] ? system_call_fastpath+0x16/0x1b > > From here on, things went downhill pretty damn fast. I was not able to > unmount the file-system, stop or re-start the array (/proc/mdstat went > away), any process trying to touch /dev/md1 hung, so eventually, I run > out of options and hit the reset button on the machine. > > Upon reboot, the array wouldn't assemble, it was complaining that SDA > and SDA1 had the same superblock info on it. > > mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar > superblocks. > If they are really different, please --zero the superblock on one > If they are the same or overlap, please remove one from the > DEVICE list in mdadm.conf. 
> > At this point, I looked at the drives and it appeared that the drive > letters got re-arranged by the kernel. My three new HDD-s (which used to > be SDH, SDI, SDJ) now appear as SDA, SDB and SDD. > > I've read up on this a little and everyone seemed to suggest that you > repair this super-block corruption by zeroing out the super-block, so I > did: > > mdadm --zero-superblock /dev/sda1 > > At this point mdadm started complaining about the super-block on SDB > (and later SDD) so I ended up zeroing out the superblock on all three of > the new hard-drives: > > mdadm --zero-superblock /dev/sdb1 > mdadm --zero-superblock /dev/sdd1 > > After this, the array would assemble, but wouldn't start, stating that > it doesn't have enough disks in it - which is correct for the new array: > I just removed 3 drives from a RAID6. > > Right now, /proc/mdstat says: > > Personalities : [raid1] [raid6] [raid5] [raid4] > md1 : inactive sdh2[0](S) sdc2[6](S) sdj1[5](S) sde1[4](S) > sdg1[3](S) sdi1[2](S) sdf2[1](S) > 10744335040 blocks super 0.91 > > mdadm -E /dev/sdc2 says: > /dev/sdc2: > Magic : a92b4efc > Version : 0.91.00 > UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 > Creation Time : Sat Oct 2 07:21:53 2010 > Raid Level : raid6 > Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) > Array Size : 11721087488 (11178.10 GiB 12002.39 GB) > Raid Devices : 10 > Total Devices : 10 > Preferred Minor : 1 > > > Reshape pos'n : 4096 > Delta Devices : 3 (7->10) > > > Update Time : Sat Oct 17 18:59:50 2015 > State : active > Active Devices : 10 > Working Devices : 10 > Failed Devices : 0 > Spare Devices : 0 > Checksum : fad60788 - correct > Events : 2579239 > > > Layout : left-symmetric > Chunk Size : 64K > > > Number Major Minor RaidDevice State > this 6 8 98 6 active sync > > > 0 0 8 50 0 active sync > 1 1 8 18 1 active sync > 2 2 8 65 2 active sync /dev/sde1 > 3 3 8 33 3 active sync /dev/sdc1 > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 81 5 active sync /dev/sdf1 > 6 6 8 98 6 active sync > 7 7
8 145 7 active sync /dev/sdj1 > 8 8 8 129 8 active sync /dev/sdi1 > 9 9 8 113 9 active sync /dev/sdh1 > > So, if I read this right, the superblock here states that the array is > in the middle of a reshape from 7 to 10 devices, but it just started > (4096 is the position). > What's interesting is the device names listed here don't match the ones > reported by /proc/mdstat, and are actually incorrect. The right > partition numbers are in /proc/mdstat. > > The superblocks on the 6 other original disks match, except, of > course, for which one they mark as 'this' and the checksum. > > I've read in here (http://ubuntuforums.org/showthread.php?t=2133576) > among many other places that it might be possible to recover the data on > the array by trying to re-create it to the state before the re-shape. > > I've also read that if I want to re-create an array in read-only mode, I > should re-create it degraded. > > So, what I thought I would do is this: > > mdadm --create /dev/md1 --level=6 --raid-devices=7 /dev/sdh2 > /dev/sdf2 /dev/sdi1 /dev/sdg1 /dev/sde1 missing missing > > Obviously, at this point, I'm trying to be as cautious as possible in > not causing any further damage, if that's at all possible. > > It seems that this issue has some similarities to this bug: > https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1001019 > > So, please all mdadm gurus, help me out! How can I recover as much of > the data on this volume as possible? > > Thanks again, > Andras Tantos > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 24+ messages in thread
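[Editorial note: the root mistake in the original post was starting the grow without a backup file. A hedged sketch of the invocation with the critical-section backup follows; the backup path is illustrative and must live on a filesystem *outside* the array being reshaped. The command is printed rather than executed here, since running it against the wrong array is destructive.]

```shell
#!/bin/sh
# Illustrative only: BACKUP is a made-up path; put the backup file on
# storage that is NOT part of /dev/md1.
BACKUP=/root/md1-grow.backup
CMD="mdadm --grow --raid-devices=10 --backup-file=$BACKUP /dev/md1"
# Print instead of running, for safety.
echo "$CMD"
# -> mdadm --grow --raid-devices=10 --backup-file=/root/md1-grow.backup /dev/md1
```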
* Re: How to recover after md crash during reshape? 2015-10-20 2:35 How to recover after md crash during reshape? andras 2015-10-20 12:50 ` Anugraha Sinha @ 2015-10-20 13:04 ` Wols Lists 2015-10-20 13:49 ` Phil Turmel 2015-10-21 1:35 ` How to recover after md crash during reshape? Neil Brown 3 siblings, 0 replies; 24+ messages in thread From: Wols Lists @ 2015-10-20 13:04 UTC (permalink / raw) To: andras, linux-raid On 20/10/15 03:35, andras@tantosonline.com wrote: > From here on, things went downhill pretty damn fast. I was not able to > unmount the file-system, stop or re-start the array (/proc/mdstat went > away), any process trying to touch /dev/md1 hung, so eventually, I run > out of options and hit the reset button on the machine. > > Upon reboot, the array wouldn't assemble, it was complaining that SDA > and SDA1 had the same superblock info on it. > > mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar > superblocks. > If they are really different, please --zero the superblock on one > If they are the same or overlap, please remove one from the > DEVICE list in mdadm.conf. > > At this point, I looked at the drives and it appeared that the drive > letters got re-arranged by the kernel. My three new HDD-s (which used to > be SDH, SDI, SDJ) now appear as SDA, SDB and SDD. > > I've read up on this a little and everyone seemed to suggest that you > repair this super-block corruption by zeroing out the super-block, so I > did: > > mdadm --zero-superblock /dev/sda1 OUCH !!! REALLY REALLY REALLY don't do anything now until the experts chime in !!! It looks to me like you have a 0.9 superblock, and this error message is both common and erroneous. There's only one superblock, but it looks to mdadm like it's both a disk superblock and a partition superblock. You've just wiped those drives, I think ... The experts should be able to recover it for you (I hope), but your array is now damaged - don't damage it any further !!!
Cheers, Wol ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape? 2015-10-20 2:35 How to recover after md crash during reshape? andras 2015-10-20 12:50 ` Anugraha Sinha 2015-10-20 13:04 ` Wols Lists @ 2015-10-20 13:49 ` Phil Turmel [not found] ` <3baf849321d819483c5d20c005a31844@tantosonline.com> 2015-10-21 1:35 ` How to recover after md crash during reshape? Neil Brown 3 siblings, 1 reply; 24+ messages in thread From: Phil Turmel @ 2015-10-20 13:49 UTC (permalink / raw) To: andras, linux-raid Good morning Andras, On 10/19/2015 10:35 PM, andras@tantosonline.com wrote: > Dear all, > > I have a serious (to me) problem, and I'm seeking some pro advice in > recovering a RAID6 volume after a crash at the beginning of a reshape. > Thank you all in advance for any help! > > The details: > > I'm running Debian. > uname -r says: > kernel 3.2.0-4-amd64 > dmsg says: > Linux version 3.2.0-4-amd64 (debian-kernel@lists.debian.org) > (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.68-1+deb7u3 > mdadm -v says: > mdadm - v3.2.5 - 18th May 2012 > > I used to have a RAID6 volume with 7 disks on it. I've recently bought > another 3 new HDD-s and was trying to add them to the array. > I've put them in the machine (hot-plug), partitioned them then did: > > mdadm --add /dev/md1 /dev/sdh1 /dev/sdi1 /dev/sdj1 > > This worked fine, /proc/mdstat showed them as three spares. Then I did: > > mdadm --grow --raid-devices=10 /dev/md1 > > Yes, I was dumb enough to start the process without a backup option - > (copy-paste error from https://raid.wiki.kernel.org/index.php/Growing). The normal way to recover from this mistake is to issue mdadm --grow --continue /dev/md1 --backup-file ..... > This immediately (well, after 2 seconds) crashed the MD driver: Crashing is a bug, of course, but you are using an old kernel. New kernels *generally* have fewer bugs than old kernels :-) In newer kernels it would have just held @ 0% progress while still otherwise running. 
Same observation applies to the mdadm utility too. Consider using a relatively new rescue CD for further operations. [trim /] > Upon reboot, the array wouldn't assemble, it was complaining that SDA > and SDA1 had the same superblock info on it. > > mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar > superblocks. > If they are really different, please --zero the superblock on one > If they are the same or overlap, please remove one from the > DEVICE list in mdadm.conf. This is a completely separate problem, and the warning is a bit misleading. It is a side effect of version 0.90 metadata that could not be solved in a backward compatible manner. Which is why v1.x metadata was created and became the default years ago. Basically, v0.90 metadata, which is placed at the end of a device, when used on the last partition of a disk, is ambiguous about whether it belongs to the last partition or the disk as a whole. Normally, you can update the metadata in place from v0.90 to v1.0 with mdadm --assemble --update=metadata .... > At this point, I looked at the drives and it appeared that the drive > letters got re-arranged by the kernel. My three new HDD-s (which used to > be SDH, SDI, SDJ) now appear as SDA, SDB and SDD. This is common and often screws people up. The kernel assigns names based on discovery order, which varies, especially with hotplugging. You need a map of your array and its devices versus the underlying drive serial numbers. This is so important I created a script years ago to generate this information. Please download and run it, and post the results here so we can precisely tailor the instructions we give. https://github.com/pturmel/lsdrv > I've read up on this a little and everyone seemed to suggest that you > repair this super-block corruption by zeroing out the suport-block, so I > did: > > mdadm --zero-superblock /dev/sda1 "Everyone" was wrong. Your drives only had the one superblock. It was just misidentified in two contexts. 
You destroyed the only superblock on those devices. [trim /] > After this, the array would assemble, but wouldn't start, stating that > it doesn't have enough disks in it - which is correct for the new array: > I just removed 3 drives from a RAID6. > > Right now, /proc/mdstat says: > > Personalities : [raid1] [raid6] [raid5] [raid4] > md1 : inactive sdh2[0](S) sdc2[6](S) sdj1[5](S) sde1[4](S) > sdg1[3](S) sdi1[2](S) sdf2[1](S) > 10744335040 blocks super 0.91 > > mdadm -E /dev/sdc2 says: > /dev/sdc2: > Magic : a92b4efc > Version : 0.91.00 > UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 > Creation Time : Sat Oct 2 07:21:53 2010 > Raid Level : raid6 > Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) > Array Size : 11721087488 (11178.10 GiB 12002.39 GB) > Raid Devices : 10 > Total Devices : 10 > Preferred Minor : 1 > > > Reshape pos'n : 4096 > Delta Devices : 3 (7->10) > > > Update Time : Sat Oct 17 18:59:50 2015 > State : active > Active Devices : 10 > Working Devices : 10 > Failed Devices : 0 > Spare Devices : 0 > Checksum : fad60788 - correct > Events : 2579239 > > > Layout : left-symmetric > Chunk Size : 64K > > > Number Major Minor RaidDevice State > this 6 8 98 6 active sync > > > 0 0 8 50 0 active sync > 1 1 8 18 1 active sync > 2 2 8 65 2 active sync /dev/sde1 > 3 3 8 33 3 active sync /dev/sdc1 > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 81 5 active sync /dev/sdf1 > 6 6 8 98 6 active sync > 7 7 8 145 7 active sync /dev/sdj1 > 8 8 8 129 8 active sync /dev/sdi1 > 9 9 8 113 9 active sync /dev/sdh1 > > So, if I read this right, the superblock here states that the array is > in the middle of a reshape from 7 to 10 devices, but it just started > (4096 is the position). Yup, just a little ways in at the beginning. Probably where it tried to write its first critical section to the backup file. > What's interesting is the device names listed here don't match the ones > reported by /proc/mdstat, and are actually incorrect. The right > partition numbers are in /proc/mdstat. 
Names in the superblock are recorded per the last successful assembly. Which is why a map of actual roles vs. drive serial numbers is so important. > I've read in here (http://ubuntuforums.org/showthread.php?t=2133576) > among many other places that it might be possible to recover the data on > the array by trying to re-create it to the state before the re-shape. Yes, since you have destroyed those superblocks, and the reshape position is so low. You might lose a little at the beginning of your array. Or might not, if it crashed at the first critical section as I suspect. > I've also read that if I want to re-create an array in read-only mode, I > should re-create it degraded. Not necessary or recommended in this case. > So, what I thought I would do is this: > > mdadm --create /dev/md1 --level=6 --raid-devices=7 /dev/sdh2 > /dev/sdf2 /dev/sdi1 /dev/sdg1 /dev/sde1 missing missing > > Obviously, at this point, I'm trying to be as cautious as possible in > not causing any further damage, if that's at all possible. Good, because the above would destroy your array. You'd get modern defaults for metadata version, offset, and chunk size. Please supply all of your mdadm -E reports for the seven partitions and the lsdrv output I requested. Just post the text inline in your reply. Do *not* do anything else. Phil ^ permalink raw reply [flat|nested] 24+ messages in thread
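[Editorial note: Phil's lsdrv script gives the complete picture. As a rough stand-in, a serial-number-to-current-name map can usually be read from the /dev/disk/by-id symlinks — this assumes udev populates that directory, which is standard on Debian of this vintage. A minimal sketch, parameterised on the directory only so it can be tried against a fake tree:]

```shell
#!/bin/sh
# Print "<by-id link> -> <current kernel name>" for each ata-* link.
# The optional argument defaults to the real /dev/disk/by-id.
map_serials() {
    byid="${1:-/dev/disk/by-id}"
    for link in "$byid"/ata-*; do
        [ -e "$link" ] || continue   # glob matched nothing; skip
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

map_serials
```

Because the by-id names embed the drive model and serial, this survives the kernel reshuffling sdX letters across reboots.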
[parent not found: <3baf849321d819483c5d20c005a31844@tantosonline.com>]
* Re: How to recover after md crash during reshape? [not found] ` <3baf849321d819483c5d20c005a31844@tantosonline.com> @ 2015-10-20 15:42 ` Phil Turmel 2015-10-20 22:34 ` Anugraha Sinha ` (3 more replies) 0 siblings, 4 replies; 24+ messages in thread From: Phil Turmel @ 2015-10-20 15:42 UTC (permalink / raw) To: andras, Linux-RAID Hi Andras, { Added linux-raid back -- convention on kernel.org is to reply-to-all, trim replies, and either interleave or bottom post. I'm trimming less than normal this time so the list can see. } On 10/20/2015 10:48 AM, andras@tantosonline.com wrote: > On 2015-10-20 08:49, Phil Turmel wrote: >> Please supply all of you mdadm -E reports for the seven partitions and >> the lsdrv output I requests. Just post the text inline in your reply. >> >> Do *not* do anything else. >> >> Phil > Thanks for all the help! > > Here's the output of lsdrv: > > PCI [pata_marvell] 04:00.1 IDE interface: Marvell Technology Group Ltd. > 88SE9128 IDE Controller (rev 11) > ├scsi 0:x:x:x [Empty] > └scsi 2:x:x:x [Empty] > PCI [pata_jmicron] 05:00.1 IDE interface: JMicron Technology Corp. > JMB363 SATA/IDE Controller (rev 02) > ├scsi 1:x:x:x [Empty] > └scsi 3:x:x:x [Empty] > PCI [ahci] 04:00.0 SATA controller: Marvell Technology Group Ltd. 
> 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11) > ├scsi 4:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1JDN8} > │└sda 1.82t [8:0] Partitioned (dos) > │ └sda1 1.82t [8:1] Empty/Unknown > └scsi 5:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1H84Q} > └sdb 1.82t [8:16] Partitioned (dos) > └sdb1 1.82t [8:17] ext4 'data' {d1403616-a9c6-4cd9-8d92-1aabc81fe373} > PCI [ata_piix] 00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 > Family) 4 port SATA IDE Controller #1 > ├scsi 6:0:0:0 ATA ST31500541AS {6XW0BQL0} > │└sdc 1.36t [8:32] Partitioned (dos) > │ └sdc1 1.36t [8:33] MD raid6 (10) inactive > {5e57a17d-43eb-0786-42ea-8b6c723593c7} > ├scsi 6:0:1:0 ATA WDC WD20EARS-00M {WD-WMAZA0348342} > │└sdd 1.82t [8:48] Partitioned (dos) > │ ├sdd1 525.53m [8:49] ext4 'boot1' {a3a1cedc-3866-4d80-af18-a7a4db99d880} > │ ├sdd2 1.36t [8:50] MD raid6 (10) inactive > {5e57a17d-43eb-0786-42ea-8b6c723593c7} > │ └sdd3 465.24g [8:51] MD raid1 (3) inactive > {f89cbbf7-66e9-eb44-42ea-8b6c723593c7} > ├scsi 7:0:0:0 ATA ST31500541AS {5XW05FFV} > │└sde 1.36t [8:64] Partitioned (dos) > │ └sde1 1.36t [8:65] MD raid6 (10) inactive > {5e57a17d-43eb-0786-42ea-8b6c723593c7} > └scsi 7:0:1:0 ATA WDC WD20EARS-00M {WD-WMAZA0209553} > └sdf 1.82t [8:80] Partitioned (dos) > ├sdf1 525.53m [8:81] ext4 'boot2' {9b0e1e49-c736-47c0-89a1-4cac07c1d5ef} > ├sdf2 1.36t [8:82] MD raid6 (10) inactive > {5e57a17d-43eb-0786-42ea-8b6c723593c7} > └sdf3 465.24g [8:83] MD raid1 (1/3) (w/ sdi3) in_sync > {f89cbbf7-66e9-eb44-42ea-8b6c723593c7} > └md0 465.24g [9:0] MD v0.90 raid1 (3) clean DEGRADED > {f89cbbf7:66e9eb44:42ea8b6c:723593c7} > │ ext4 'root' {ceb15bfe-e082-484c-9015-1fcc8889b798} > └Mounted as /dev/disk/by-uuid/ceb15bfe-e082-484c-9015-1fcc8889b798 @ / > PCI [ata_piix] 00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 > Family) 2 port SATA IDE Controller #2 > ├scsi 8:0:0:0 ATA ST31500341AS {9VS1EFFD} > │└sdg 1.36t [8:96] Partitioned (dos) > │ └sdg1 1.36t [8:97] MD raid6 (10) inactive > {5e57a17d-43eb-0786-42ea-8b6c723593c7} > └scsi 
10:0:0:0 ATA Hitachi HDS5C302 {ML2220F30TEBLE} > └sdh 1.82t [8:112] Partitioned (dos) > └sdh1 1.82t [8:113] MD raid6 (10) inactive > {5e57a17d-43eb-0786-42ea-8b6c723593c7} > PCI [ahci] 05:00.0 SATA controller: JMicron Technology Corp. JMB363 > SATA/IDE Controller (rev 02) > ├scsi 9:0:0:0 ATA WDC WD2002FAEX-0 {WD-WMAY01975001} > │└sdi 1.82t [8:128] Partitioned (dos) > │ ├sdi1 525.53m [8:129] Empty/Unknown > │ ├sdi2 1.36t [8:130] MD raid6 (10) inactive > {5e57a17d-43eb-0786-42ea-8b6c723593c7} > │ └sdi3 465.24g [8:131] MD raid1 (2/3) (w/ sdf3) in_sync > {f89cbbf7-66e9-eb44-42ea-8b6c723593c7} > │ └md0 465.24g [9:0] MD v0.90 raid1 (3) clean DEGRADED > {f89cbbf7:66e9eb44:42ea8b6c:723593c7} > │ ext4 'root' {ceb15bfe-e082-484c-9015-1fcc8889b798} > └scsi 11:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1JCDE} > └sdj 1.82t [8:144] Partitioned (dos) > └sdj1 1.82t [8:145] Empty/Unknown > Other Block Devices > ├loop0 0.00k [7:0] Empty/Unknown > ├loop1 0.00k [7:1] Empty/Unknown > ├loop2 0.00k [7:2] Empty/Unknown > ├loop3 0.00k [7:3] Empty/Unknown > ├loop4 0.00k [7:4] Empty/Unknown > ├loop5 0.00k [7:5] Empty/Unknown > ├loop6 0.00k [7:6] Empty/Unknown > └loop7 0.00k [7:7] Empty/Unknown > > > mdadm output: > > mdadm -E /dev/sdb1 /dev/sda1 /dev/sdc1 /dev/sdd2 /dev/sde1 /dev/sdh1 > /dev/sdg1 /dev/sdi2 /dev/sdj1 /dev/sdf2 > mdadm: No md superblock detected on /dev/sdb1. > mdadm: No md superblock detected on /dev/sda1. 
> /dev/sdc1: > Magic : a92b4efc > Version : 0.91.00 > UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 > Creation Time : Sat Oct 2 07:21:53 2010 > Raid Level : raid6 > Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) > Array Size : 11721087488 (11178.10 GiB 12002.39 GB) > Raid Devices : 10 > Total Devices : 10 > Preferred Minor : 1 > > Reshape pos'n : 4096 > Delta Devices : 3 (7->10) > > Update Time : Sat Oct 17 18:59:50 2015 > State : active > Active Devices : 10 > Working Devices : 10 > Failed Devices : 0 > Spare Devices : 0 > Checksum : fad60723 - correct > Events : 2579239 > > Layout : left-symmetric > Chunk Size : 64K > > Number Major Minor RaidDevice State > this 4 8 1 4 active sync /dev/sda1 > > 0 0 8 50 0 active sync /dev/sdd2 > 1 1 8 18 1 active sync > 2 2 8 65 2 active sync /dev/sde1 > 3 3 8 33 3 active sync /dev/sdc1 > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 81 5 active sync /dev/sdf1 > 6 6 8 98 6 active sync > 7 7 8 145 7 active sync /dev/sdj1 > 8 8 8 129 8 active sync /dev/sdi1 > 9 9 8 113 9 active sync /dev/sdh1 > /dev/sdd2: > Magic : a92b4efc > Version : 0.91.00 > UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 > Creation Time : Sat Oct 2 07:21:53 2010 > Raid Level : raid6 > Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) > Array Size : 11721087488 (11178.10 GiB 12002.39 GB) > Raid Devices : 10 > Total Devices : 10 > Preferred Minor : 1 > > Reshape pos'n : 4096 > Delta Devices : 3 (7->10) > > Update Time : Sat Oct 17 18:59:50 2015 > State : active > Active Devices : 10 > Working Devices : 10 > Failed Devices : 0 > Spare Devices : 0 > Checksum : fad6072e - correct > Events : 2579239 > > Layout : left-symmetric > Chunk Size : 64K > > Number Major Minor RaidDevice State > this 1 8 18 1 active sync > > 0 0 8 50 0 active sync /dev/sdd2 > 1 1 8 18 1 active sync > 2 2 8 65 2 active sync /dev/sde1 > 3 3 8 33 3 active sync /dev/sdc1 > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 81 5 active sync /dev/sdf1 > 6 6 8 98 6 active sync > 7 7 8 145 7 active sync /dev/sdj1 > 
8 8 8 129 8 active sync /dev/sdi1 > 9 9 8 113 9 active sync /dev/sdh1 > /dev/sde1: > Magic : a92b4efc > Version : 0.91.00 > UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 > Creation Time : Sat Oct 2 07:21:53 2010 > Raid Level : raid6 > Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) > Array Size : 11721087488 (11178.10 GiB 12002.39 GB) > Raid Devices : 10 > Total Devices : 10 > Preferred Minor : 1 > > Reshape pos'n : 4096 > Delta Devices : 3 (7->10) > > Update Time : Sat Oct 17 18:59:50 2015 > State : active > Active Devices : 10 > Working Devices : 10 > Failed Devices : 0 > Spare Devices : 0 > Checksum : fad60741 - correct > Events : 2579239 > > Layout : left-symmetric > Chunk Size : 64K > > Number Major Minor RaidDevice State > this 3 8 33 3 active sync /dev/sdc1 > > 0 0 8 50 0 active sync /dev/sdd2 > 1 1 8 18 1 active sync > 2 2 8 65 2 active sync /dev/sde1 > 3 3 8 33 3 active sync /dev/sdc1 > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 81 5 active sync /dev/sdf1 > 6 6 8 98 6 active sync > 7 7 8 145 7 active sync /dev/sdj1 > 8 8 8 129 8 active sync /dev/sdi1 > 9 9 8 113 9 active sync /dev/sdh1 > /dev/sdh1: > Magic : a92b4efc > Version : 0.91.00 > UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 > Creation Time : Sat Oct 2 07:21:53 2010 > Raid Level : raid6 > Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) > Array Size : 11721087488 (11178.10 GiB 12002.39 GB) > Raid Devices : 10 > Total Devices : 10 > Preferred Minor : 1 > > Reshape pos'n : 4096 > Delta Devices : 3 (7->10) > > Update Time : Sat Oct 17 18:59:50 2015 > State : active > Active Devices : 10 > Working Devices : 10 > Failed Devices : 0 > Spare Devices : 0 > Checksum : fad60775 - correct > Events : 2579239 > > Layout : left-symmetric > Chunk Size : 64K > > Number Major Minor RaidDevice State > this 5 8 81 5 active sync /dev/sdf1 > > 0 0 8 50 0 active sync /dev/sdd2 > 1 1 8 18 1 active sync > 2 2 8 65 2 active sync /dev/sde1 > 3 3 8 33 3 active sync /dev/sdc1 > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 81 5 
active sync /dev/sdf1 > 6 6 8 98 6 active sync > 7 7 8 145 7 active sync /dev/sdj1 > 8 8 8 129 8 active sync /dev/sdi1 > 9 9 8 113 9 active sync /dev/sdh1 > /dev/sdg1: > Magic : a92b4efc > Version : 0.91.00 > UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 > Creation Time : Sat Oct 2 07:21:53 2010 > Raid Level : raid6 > Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) > Array Size : 11721087488 (11178.10 GiB 12002.39 GB) > Raid Devices : 10 > Total Devices : 10 > Preferred Minor : 1 > > Reshape pos'n : 4096 > Delta Devices : 3 (7->10) > > Update Time : Sat Oct 17 18:59:50 2015 > State : active > Active Devices : 10 > Working Devices : 10 > Failed Devices : 0 > Spare Devices : 0 > Checksum : fad6075f - correct > Events : 2579239 > > Layout : left-symmetric > Chunk Size : 64K > > Number Major Minor RaidDevice State > this 2 8 65 2 active sync /dev/sde1 > > 0 0 8 50 0 active sync /dev/sdd2 > 1 1 8 18 1 active sync > 2 2 8 65 2 active sync /dev/sde1 > 3 3 8 33 3 active sync /dev/sdc1 > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 81 5 active sync /dev/sdf1 > 6 6 8 98 6 active sync > 7 7 8 145 7 active sync /dev/sdj1 > 8 8 8 129 8 active sync /dev/sdi1 > 9 9 8 113 9 active sync /dev/sdh1 > /dev/sdi2: > Magic : a92b4efc > Version : 0.91.00 > UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 > Creation Time : Sat Oct 2 07:21:53 2010 > Raid Level : raid6 > Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) > Array Size : 11721087488 (11178.10 GiB 12002.39 GB) > Raid Devices : 10 > Total Devices : 10 > Preferred Minor : 1 > > Reshape pos'n : 4096 > Delta Devices : 3 (7->10) > > Update Time : Sat Oct 17 18:59:50 2015 > State : active > Active Devices : 10 > Working Devices : 10 > Failed Devices : 0 > Spare Devices : 0 > Checksum : fad60788 - correct > Events : 2579239 > > Layout : left-symmetric > Chunk Size : 64K > > Number Major Minor RaidDevice State > this 6 8 98 6 active sync > > 0 0 8 50 0 active sync /dev/sdd2 > 1 1 8 18 1 active sync > 2 2 8 65 2 active sync /dev/sde1 > 3 3 8 33 
3 active sync /dev/sdc1 > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 81 5 active sync /dev/sdf1 > 6 6 8 98 6 active sync > 7 7 8 145 7 active sync /dev/sdj1 > 8 8 8 129 8 active sync /dev/sdi1 > 9 9 8 113 9 active sync /dev/sdh1 > mdadm: No md superblock detected on /dev/sdj1. > /dev/sdf2: > Magic : a92b4efc > Version : 0.91.00 > UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 > Creation Time : Sat Oct 2 07:21:53 2010 > Raid Level : raid6 > Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) > Array Size : 11721087488 (11178.10 GiB 12002.39 GB) > Raid Devices : 10 > Total Devices : 10 > Preferred Minor : 1 > > Reshape pos'n : 4096 > Delta Devices : 3 (7->10) > > Update Time : Sat Oct 17 18:59:50 2015 > State : active > Active Devices : 10 > Working Devices : 10 > Failed Devices : 0 > Spare Devices : 0 > Checksum : fad6074c - correct > Events : 2579239 > > Layout : left-symmetric > Chunk Size : 64K > > Number Major Minor RaidDevice State > this 0 8 50 0 active sync /dev/sdd2 > > 0 0 8 50 0 active sync /dev/sdd2 > 1 1 8 18 1 active sync > 2 2 8 65 2 active sync /dev/sde1 > 3 3 8 33 3 active sync /dev/sdc1 > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 81 5 active sync /dev/sdf1 > 6 6 8 98 6 active sync > 7 7 8 145 7 active sync /dev/sdj1 > 8 8 8 129 8 active sync /dev/sdi1 > 9 9 8 113 9 active sync /dev/sdh1 > Apparently my problems don't stop adding up: now SDD started developing > problems, so my root partition (md0) is now degraded. I will attempt to > dd out whatever I can from that drive and continue... Don't. You have another problem: green & desktop drives in a raid array. They aren't built for it and will give you grief of one form or another. Anyways, their problem with timeout mismatch can be worked around with long driver timeouts. Before you do anything else, you *MUST* run this command: for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done (Arrange for this to happen on every boot, and keep doing it manually until your boot scripts are fixed.) 
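[Editorial note: one way to arrange the 180-second timeout on every boot, as Phil instructs, is to wrap the same loop in a small script hook — a sketch assuming a sysvinit-era Debian where /etc/rc.local runs late in boot; adapt for other init systems. The sysfs root is a parameter purely so the loop can be exercised without root.]

```shell
#!/bin/sh
# Sketch for /etc/rc.local: raise the SCSI command timeout on every block
# device so desktop-class drives get time to finish deep error recovery
# before the kernel resets the link. At boot the argument is simply /sys;
# it is parameterised only for testing.
set_disk_timeouts() {
    sysfs="${1:-/sys}"
    for t in "$sysfs"/block/*/device/timeout; do
        [ -w "$t" ] || continue   # skip unmatched globs and non-root runs
        echo 180 > "$t"
    done
}

set_disk_timeouts
```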
Then you can add your missing mirror and let MD fix it: mdadm /dev/md0 --add /dev/sdd3 After that's done syncing, you can have MD fix any remaining UREs in that raid1 with: echo check >/sys/block/md0/md/sync_action While that's in progress, take the time to read through the links in the postscript -- the timeout mismatch problem and its impact on unrecoverable read errors has been hashed out on this list many times. Now to your big array. It is vital that it also be cleaned of UREs after re-creation before you do anything else. Which means it must *not* be created degraded (the redundancy is needed to fix UREs). According to lsdrv and your "mdadm -E" reports, the creation order you need is: raid device 0 /dev/sdf2 {WD-WMAZA0209553} raid device 1 /dev/sdd2 {WD-WMAZA0348342} raid device 2 /dev/sdg1 {9VS1EFFD} raid device 3 /dev/sde1 {5XW05FFV} raid device 4 /dev/sdc1 {6XW0BQL0} raid device 5 /dev/sdh1 {ML2220F30TEBLE} raid device 6 /dev/sdi2 {WD-WMAY01975001} Chunk size is 64k. Make sure your partially assembled array is stopped: mdadm --stop /dev/md1 Re-create your array as follows: mdadm --create --assume-clean --verbose \ --metadata=1.0 --raid-devices=7 --chunk=64 --level=6 \ /dev/md1 /dev/sd{f2,d2,g1,e1,c1,h1,i2} Use "fsck -n" to check your array's filesystem (expect some damage at the very beginning). If it looks reasonable, use fsck to fix any damage. Then clean up any lingering UREs: echo check > /sys/block/md1/md/sync_action Now you can mount it and catch any critical backups. (You do know that raid != backup, I hope.) Your array now has a new UUID, so you probably want to fix your mdadm.conf file and your initramfs. Finally, go back and do your --grow, with the --backup-file. In the future, buy drives with raid ratings like the WD Red family, and make sure you have a cron job that regularly kicks off array scrubs. I do mine weekly. 
HTH, Phil [1] http://marc.info/?l=linux-raid&m=139050322510249&w=2 [2] http://marc.info/?l=linux-raid&m=135863964624202&w=2 [3] http://marc.info/?l=linux-raid&m=135811522817345&w=1 [4] http://marc.info/?l=linux-raid&m=133761065622164&w=2 [5] http://marc.info/?l=linux-raid&m=132477199207506 [6] http://marc.info/?l=linux-raid&m=133665797115876&w=2 [7] https://www.marc.info/?l=linux-raid&m=142487508806844&w=3 -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 24+ messages in thread
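[Editor's note: Phil's parting advice about regular scrubs can be wired up with a one-line cron job. A hedged sketch, assuming a /etc/cron.d-style crontab and the array name /dev/md1 from this thread; the schedule is arbitrary, and Debian's mdadm package may already ship a similar job that calls checkarray:]

```shell
# /etc/cron.d/md-scrub (hypothetical) -- weekly scrub, Sundays at 03:00.
# "check" reads every stripe and lets MD rewrite any unreadable sectors
# from redundancy; progress appears in /proc/mdstat.
0 3 * * 0  root  echo check > /sys/block/md1/md/sync_action
```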
* Re: How to recover after md crash during reshape? 2015-10-20 15:42 ` Phil Turmel @ 2015-10-20 22:34 ` Anugraha Sinha 2015-10-21 3:52 ` andras ` (2 subsequent siblings) 3 siblings, 0 replies; 24+ messages in thread From: Anugraha Sinha @ 2015-10-20 22:34 UTC (permalink / raw) To: Phil Turmel; +Cc: Andras Tantos, Linux-RAID Hi Phil, Thanks for all the information shared by you over this thread. It is really informative. Regards Anugraha On Wed, Oct 21, 2015 at 12:42 AM, Phil Turmel <philip@turmel.org> wrote: > Hi Andras, > > { Added linux-raid back -- convention on kernel.org is to reply-to-all, > trim replies, and either interleave or bottom post. I'm trimming less > than normal this time so the list can see. } > > On 10/20/2015 10:48 AM, andras@tantosonline.com wrote: >> On 2015-10-20 08:49, Phil Turmel wrote: > >>> Please supply all of your mdadm -E reports for the seven partitions and >>> the lsdrv output I requested. Just post the text inline in your reply. >>> >>> Do *not* do anything else. >>> >>> Phil > >> Thanks for all the help! >> >> Here's the output of lsdrv: >> >> PCI [pata_marvell] 04:00.1 IDE interface: Marvell Technology Group Ltd. >> 88SE9128 IDE Controller (rev 11) >> ├scsi 0:x:x:x [Empty] >> └scsi 2:x:x:x [Empty] >> PCI [pata_jmicron] 05:00.1 IDE interface: JMicron Technology Corp. >> JMB363 SATA/IDE Controller (rev 02) >> ├scsi 1:x:x:x [Empty] >> └scsi 3:x:x:x [Empty] >> PCI [ahci] 04:00.0 SATA controller: Marvell Technology Group Ltd. 
>> 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11) >> ├scsi 4:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1JDN8} >> │└sda 1.82t [8:0] Partitioned (dos) >> │ └sda1 1.82t [8:1] Empty/Unknown >> └scsi 5:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1H84Q} >> └sdb 1.82t [8:16] Partitioned (dos) >> └sdb1 1.82t [8:17] ext4 'data' {d1403616-a9c6-4cd9-8d92-1aabc81fe373} >> PCI [ata_piix] 00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 >> Family) 4 port SATA IDE Controller #1 >> ├scsi 6:0:0:0 ATA ST31500541AS {6XW0BQL0} >> │└sdc 1.36t [8:32] Partitioned (dos) >> │ └sdc1 1.36t [8:33] MD raid6 (10) inactive >> {5e57a17d-43eb-0786-42ea-8b6c723593c7} >> ├scsi 6:0:1:0 ATA WDC WD20EARS-00M {WD-WMAZA0348342} >> │└sdd 1.82t [8:48] Partitioned (dos) >> │ ├sdd1 525.53m [8:49] ext4 'boot1' {a3a1cedc-3866-4d80-af18-a7a4db99d880} >> │ ├sdd2 1.36t [8:50] MD raid6 (10) inactive >> {5e57a17d-43eb-0786-42ea-8b6c723593c7} >> │ └sdd3 465.24g [8:51] MD raid1 (3) inactive >> {f89cbbf7-66e9-eb44-42ea-8b6c723593c7} >> ├scsi 7:0:0:0 ATA ST31500541AS {5XW05FFV} >> │└sde 1.36t [8:64] Partitioned (dos) >> │ └sde1 1.36t [8:65] MD raid6 (10) inactive >> {5e57a17d-43eb-0786-42ea-8b6c723593c7} >> └scsi 7:0:1:0 ATA WDC WD20EARS-00M {WD-WMAZA0209553} >> └sdf 1.82t [8:80] Partitioned (dos) >> ├sdf1 525.53m [8:81] ext4 'boot2' {9b0e1e49-c736-47c0-89a1-4cac07c1d5ef} >> ├sdf2 1.36t [8:82] MD raid6 (10) inactive >> {5e57a17d-43eb-0786-42ea-8b6c723593c7} >> └sdf3 465.24g [8:83] MD raid1 (1/3) (w/ sdi3) in_sync >> {f89cbbf7-66e9-eb44-42ea-8b6c723593c7} >> └md0 465.24g [9:0] MD v0.90 raid1 (3) clean DEGRADED >> {f89cbbf7:66e9eb44:42ea8b6c:723593c7} >> │ ext4 'root' {ceb15bfe-e082-484c-9015-1fcc8889b798} >> └Mounted as /dev/disk/by-uuid/ceb15bfe-e082-484c-9015-1fcc8889b798 @ / >> PCI [ata_piix] 00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 >> Family) 2 port SATA IDE Controller #2 >> ├scsi 8:0:0:0 ATA ST31500341AS {9VS1EFFD} >> │└sdg 1.36t [8:96] Partitioned (dos) >> │ └sdg1 1.36t [8:97] MD raid6 (10) inactive >> 
{5e57a17d-43eb-0786-42ea-8b6c723593c7} >> └scsi 10:0:0:0 ATA Hitachi HDS5C302 {ML2220F30TEBLE} >> └sdh 1.82t [8:112] Partitioned (dos) >> └sdh1 1.82t [8:113] MD raid6 (10) inactive >> {5e57a17d-43eb-0786-42ea-8b6c723593c7} >> PCI [ahci] 05:00.0 SATA controller: JMicron Technology Corp. JMB363 >> SATA/IDE Controller (rev 02) >> ├scsi 9:0:0:0 ATA WDC WD2002FAEX-0 {WD-WMAY01975001} >> │└sdi 1.82t [8:128] Partitioned (dos) >> │ ├sdi1 525.53m [8:129] Empty/Unknown >> │ ├sdi2 1.36t [8:130] MD raid6 (10) inactive >> {5e57a17d-43eb-0786-42ea-8b6c723593c7} >> │ └sdi3 465.24g [8:131] MD raid1 (2/3) (w/ sdf3) in_sync >> {f89cbbf7-66e9-eb44-42ea-8b6c723593c7} >> │ └md0 465.24g [9:0] MD v0.90 raid1 (3) clean DEGRADED >> {f89cbbf7:66e9eb44:42ea8b6c:723593c7} >> │ ext4 'root' {ceb15bfe-e082-484c-9015-1fcc8889b798} >> └scsi 11:0:0:0 ATA ST2000DM001-1ER1 {Z4Z1JCDE} >> └sdj 1.82t [8:144] Partitioned (dos) >> └sdj1 1.82t [8:145] Empty/Unknown >> Other Block Devices >> ├loop0 0.00k [7:0] Empty/Unknown >> ├loop1 0.00k [7:1] Empty/Unknown >> ├loop2 0.00k [7:2] Empty/Unknown >> ├loop3 0.00k [7:3] Empty/Unknown >> ├loop4 0.00k [7:4] Empty/Unknown >> ├loop5 0.00k [7:5] Empty/Unknown >> ├loop6 0.00k [7:6] Empty/Unknown >> └loop7 0.00k [7:7] Empty/Unknown >> >> >> mdadm output: >> >> mdadm -E /dev/sdb1 /dev/sda1 /dev/sdc1 /dev/sdd2 /dev/sde1 /dev/sdh1 >> /dev/sdg1 /dev/sdi2 /dev/sdj1 /dev/sdf2 > >> mdadm: No md superblock detected on /dev/sdb1. > >> mdadm: No md superblock detected on /dev/sda1. 
> >> /dev/sdc1: >> Magic : a92b4efc >> Version : 0.91.00 >> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 >> Creation Time : Sat Oct 2 07:21:53 2010 >> Raid Level : raid6 >> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) >> Array Size : 11721087488 (11178.10 GiB 12002.39 GB) >> Raid Devices : 10 >> Total Devices : 10 >> Preferred Minor : 1 >> >> Reshape pos'n : 4096 >> Delta Devices : 3 (7->10) >> >> Update Time : Sat Oct 17 18:59:50 2015 >> State : active >> Active Devices : 10 >> Working Devices : 10 >> Failed Devices : 0 >> Spare Devices : 0 >> Checksum : fad60723 - correct >> Events : 2579239 >> >> Layout : left-symmetric >> Chunk Size : 64K >> >> Number Major Minor RaidDevice State >> this 4 8 1 4 active sync /dev/sda1 >> >> 0 0 8 50 0 active sync /dev/sdd2 >> 1 1 8 18 1 active sync >> 2 2 8 65 2 active sync /dev/sde1 >> 3 3 8 33 3 active sync /dev/sdc1 >> 4 4 8 1 4 active sync /dev/sda1 >> 5 5 8 81 5 active sync /dev/sdf1 >> 6 6 8 98 6 active sync >> 7 7 8 145 7 active sync /dev/sdj1 >> 8 8 8 129 8 active sync /dev/sdi1 >> 9 9 8 113 9 active sync /dev/sdh1 > >> /dev/sdd2: >> Magic : a92b4efc >> Version : 0.91.00 >> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 >> Creation Time : Sat Oct 2 07:21:53 2010 >> Raid Level : raid6 >> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) >> Array Size : 11721087488 (11178.10 GiB 12002.39 GB) >> Raid Devices : 10 >> Total Devices : 10 >> Preferred Minor : 1 >> >> Reshape pos'n : 4096 >> Delta Devices : 3 (7->10) >> >> Update Time : Sat Oct 17 18:59:50 2015 >> State : active >> Active Devices : 10 >> Working Devices : 10 >> Failed Devices : 0 >> Spare Devices : 0 >> Checksum : fad6072e - correct >> Events : 2579239 >> >> Layout : left-symmetric >> Chunk Size : 64K >> >> Number Major Minor RaidDevice State >> this 1 8 18 1 active sync >> >> 0 0 8 50 0 active sync /dev/sdd2 >> 1 1 8 18 1 active sync >> 2 2 8 65 2 active sync /dev/sde1 >> 3 3 8 33 3 active sync /dev/sdc1 >> 4 4 8 1 4 active sync /dev/sda1 >> 5 5 8 81 5 
active sync /dev/sdf1 >> 6 6 8 98 6 active sync >> 7 7 8 145 7 active sync /dev/sdj1 >> 8 8 8 129 8 active sync /dev/sdi1 >> 9 9 8 113 9 active sync /dev/sdh1 > >> /dev/sde1: >> Magic : a92b4efc >> Version : 0.91.00 >> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 >> Creation Time : Sat Oct 2 07:21:53 2010 >> Raid Level : raid6 >> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) >> Array Size : 11721087488 (11178.10 GiB 12002.39 GB) >> Raid Devices : 10 >> Total Devices : 10 >> Preferred Minor : 1 >> >> Reshape pos'n : 4096 >> Delta Devices : 3 (7->10) >> >> Update Time : Sat Oct 17 18:59:50 2015 >> State : active >> Active Devices : 10 >> Working Devices : 10 >> Failed Devices : 0 >> Spare Devices : 0 >> Checksum : fad60741 - correct >> Events : 2579239 >> >> Layout : left-symmetric >> Chunk Size : 64K >> >> Number Major Minor RaidDevice State >> this 3 8 33 3 active sync /dev/sdc1 >> >> 0 0 8 50 0 active sync /dev/sdd2 >> 1 1 8 18 1 active sync >> 2 2 8 65 2 active sync /dev/sde1 >> 3 3 8 33 3 active sync /dev/sdc1 >> 4 4 8 1 4 active sync /dev/sda1 >> 5 5 8 81 5 active sync /dev/sdf1 >> 6 6 8 98 6 active sync >> 7 7 8 145 7 active sync /dev/sdj1 >> 8 8 8 129 8 active sync /dev/sdi1 >> 9 9 8 113 9 active sync /dev/sdh1 > >> /dev/sdh1: >> Magic : a92b4efc >> Version : 0.91.00 >> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 >> Creation Time : Sat Oct 2 07:21:53 2010 >> Raid Level : raid6 >> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) >> Array Size : 11721087488 (11178.10 GiB 12002.39 GB) >> Raid Devices : 10 >> Total Devices : 10 >> Preferred Minor : 1 >> >> Reshape pos'n : 4096 >> Delta Devices : 3 (7->10) >> >> Update Time : Sat Oct 17 18:59:50 2015 >> State : active >> Active Devices : 10 >> Working Devices : 10 >> Failed Devices : 0 >> Spare Devices : 0 >> Checksum : fad60775 - correct >> Events : 2579239 >> >> Layout : left-symmetric >> Chunk Size : 64K >> >> Number Major Minor RaidDevice State >> this 5 8 81 5 active sync /dev/sdf1 >> >> 0 0 8 50 0 
active sync /dev/sdd2 >> 1 1 8 18 1 active sync >> 2 2 8 65 2 active sync /dev/sde1 >> 3 3 8 33 3 active sync /dev/sdc1 >> 4 4 8 1 4 active sync /dev/sda1 >> 5 5 8 81 5 active sync /dev/sdf1 >> 6 6 8 98 6 active sync >> 7 7 8 145 7 active sync /dev/sdj1 >> 8 8 8 129 8 active sync /dev/sdi1 >> 9 9 8 113 9 active sync /dev/sdh1 > >> /dev/sdg1: >> Magic : a92b4efc >> Version : 0.91.00 >> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 >> Creation Time : Sat Oct 2 07:21:53 2010 >> Raid Level : raid6 >> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) >> Array Size : 11721087488 (11178.10 GiB 12002.39 GB) >> Raid Devices : 10 >> Total Devices : 10 >> Preferred Minor : 1 >> >> Reshape pos'n : 4096 >> Delta Devices : 3 (7->10) >> >> Update Time : Sat Oct 17 18:59:50 2015 >> State : active >> Active Devices : 10 >> Working Devices : 10 >> Failed Devices : 0 >> Spare Devices : 0 >> Checksum : fad6075f - correct >> Events : 2579239 >> >> Layout : left-symmetric >> Chunk Size : 64K >> >> Number Major Minor RaidDevice State >> this 2 8 65 2 active sync /dev/sde1 >> >> 0 0 8 50 0 active sync /dev/sdd2 >> 1 1 8 18 1 active sync >> 2 2 8 65 2 active sync /dev/sde1 >> 3 3 8 33 3 active sync /dev/sdc1 >> 4 4 8 1 4 active sync /dev/sda1 >> 5 5 8 81 5 active sync /dev/sdf1 >> 6 6 8 98 6 active sync >> 7 7 8 145 7 active sync /dev/sdj1 >> 8 8 8 129 8 active sync /dev/sdi1 >> 9 9 8 113 9 active sync /dev/sdh1 > >> /dev/sdi2: >> Magic : a92b4efc >> Version : 0.91.00 >> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 >> Creation Time : Sat Oct 2 07:21:53 2010 >> Raid Level : raid6 >> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) >> Array Size : 11721087488 (11178.10 GiB 12002.39 GB) >> Raid Devices : 10 >> Total Devices : 10 >> Preferred Minor : 1 >> >> Reshape pos'n : 4096 >> Delta Devices : 3 (7->10) >> >> Update Time : Sat Oct 17 18:59:50 2015 >> State : active >> Active Devices : 10 >> Working Devices : 10 >> Failed Devices : 0 >> Spare Devices : 0 >> Checksum : fad60788 - correct >> 
Events : 2579239 >> >> Layout : left-symmetric >> Chunk Size : 64K >> >> Number Major Minor RaidDevice State >> this 6 8 98 6 active sync >> >> 0 0 8 50 0 active sync /dev/sdd2 >> 1 1 8 18 1 active sync >> 2 2 8 65 2 active sync /dev/sde1 >> 3 3 8 33 3 active sync /dev/sdc1 >> 4 4 8 1 4 active sync /dev/sda1 >> 5 5 8 81 5 active sync /dev/sdf1 >> 6 6 8 98 6 active sync >> 7 7 8 145 7 active sync /dev/sdj1 >> 8 8 8 129 8 active sync /dev/sdi1 >> 9 9 8 113 9 active sync /dev/sdh1 > >> mdadm: No md superblock detected on /dev/sdj1. > >> /dev/sdf2: >> Magic : a92b4efc >> Version : 0.91.00 >> UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7 >> Creation Time : Sat Oct 2 07:21:53 2010 >> Raid Level : raid6 >> Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) >> Array Size : 11721087488 (11178.10 GiB 12002.39 GB) >> Raid Devices : 10 >> Total Devices : 10 >> Preferred Minor : 1 >> >> Reshape pos'n : 4096 >> Delta Devices : 3 (7->10) >> >> Update Time : Sat Oct 17 18:59:50 2015 >> State : active >> Active Devices : 10 >> Working Devices : 10 >> Failed Devices : 0 >> Spare Devices : 0 >> Checksum : fad6074c - correct >> Events : 2579239 >> >> Layout : left-symmetric >> Chunk Size : 64K >> >> Number Major Minor RaidDevice State >> this 0 8 50 0 active sync /dev/sdd2 >> >> 0 0 8 50 0 active sync /dev/sdd2 >> 1 1 8 18 1 active sync >> 2 2 8 65 2 active sync /dev/sde1 >> 3 3 8 33 3 active sync /dev/sdc1 >> 4 4 8 1 4 active sync /dev/sda1 >> 5 5 8 81 5 active sync /dev/sdf1 >> 6 6 8 98 6 active sync >> 7 7 8 145 7 active sync /dev/sdj1 >> 8 8 8 129 8 active sync /dev/sdi1 >> 9 9 8 113 9 active sync /dev/sdh1 > >> Apparently my problems don't stop adding up: now SDD started developing >> problems, so my root partition (md0) is now degraded. I will attempt to >> dd out whatever I can from that drive and continue... > > Don't. You have another problem: green & desktop drives in a raid > array. They aren't built for it and will give you grief of one form or > another. 
Anyways, their problem with timeout mismatch can be worked > around with long driver timeouts. Before you do anything else, you > *MUST* run this command: > > for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done > > (Arrange for this to happen on every boot, and keep doing it manually > until your boot scripts are fixed.) > > Then you can add your missing mirror and let MD fix it: > > mdadm /dev/md0 --add /dev/sdd3 > > After that's done syncing, you can have MD fix any remaining UREs in > that raid1 with: > > echo check >/sys/block/md0/md/sync_action > > While that's in progress, take the time to read through the links in the > postscript -- the timeout mismatch problem and its impact on > unrecoverable read errors has been hashed out on this list many times. > > Now to your big array. It is vital that it also be cleaned of UREs > after re-creation before you do anything else. Which means it must > *not* be created degraded (the redundancy is needed to fix UREs). > > According to lsdrv and your "mdadm -E" reports, the creation order you > need is: > > raid device 0 /dev/sdf2 {WD-WMAZA0209553} > raid device 1 /dev/sdd2 {WD-WMAZA0348342} > raid device 2 /dev/sdg1 {9VS1EFFD} > raid device 3 /dev/sde1 {5XW05FFV} > raid device 4 /dev/sdc1 {6XW0BQL0} > raid device 5 /dev/sdh1 {ML2220F30TEBLE} > raid device 6 /dev/sdi2 {WD-WMAY01975001} > > Chunk size is 64k. > > Make sure your partially assembled array is stopped: > > mdadm --stop /dev/md1 > > Re-create your array as follows: > > mdadm --create --assume-clean --verbose \ > --metadata=1.0 --raid-devices=7 --chunk=64 --level=6 \ > /dev/md1 /dev/sd{f2,d2,g1,e1,c1,h1,i2} > > Use "fsck -n" to check your array's filesystem (expect some damage at > the very beginning). If it looks reasonable, use fsck to fix any damage. > > Then clean up any lingering UREs: > > echo check > /sys/block/md1/md/sync_action > > Now you can mount it and catch any critical backups. (You do know that > raid != backup, I hope.) 
> > Your array now has a new UUID, so you probably want to fix your > mdadm.conf file and your initramfs. > > Finally, go back and do your --grow, with the --backup-file. > > In the future, buy drives with raid ratings like the WD Red family, and > make sure you have a cron job that regularly kicks off array scrubs. I > do mine weekly. > > HTH, > > Phil > > [1] http://marc.info/?l=linux-raid&m=139050322510249&w=2 > [2] http://marc.info/?l=linux-raid&m=135863964624202&w=2 > [3] http://marc.info/?l=linux-raid&m=135811522817345&w=1 > [4] http://marc.info/?l=linux-raid&m=133761065622164&w=2 > [5] http://marc.info/?l=linux-raid&m=132477199207506 > [6] http://marc.info/?l=linux-raid&m=133665797115876&w=2 > [7] https://www.marc.info/?l=linux-raid&m=142487508806844&w=3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape? 2015-10-20 15:42 ` Phil Turmel 2015-10-20 22:34 ` Anugraha Sinha @ 2015-10-21 3:52 ` andras 2015-10-21 12:01 ` Phil Turmel 2015-10-21 16:17 ` Wols Lists 2015-10-25 14:15 ` andras 3 siblings, 1 reply; 24+ messages in thread From: andras @ 2015-10-21 3:52 UTC (permalink / raw) To: Phil Turmel; +Cc: Linux-RAID Phil, Thank you so much for the detailed explanation and your patience with me! Sorry for not being more responsive - I don't have access to this mail account from work. > >> Apparently my problems don't stop adding up: now SDD started >> developing >> problems, so my root partition (md0) is now degraded. I will attempt >> to >> dd out whatever I can from that drive and continue... > > Don't. You have another problem: green & desktop drives in a raid > array. They aren't built for it and will give you grief of one form or > another. Anyways, their problem with timeout mismatch can be worked > around with long driver timeouts. Before you do anything else, you > *MUST* run this command: > > for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done > > (Arrange for this to happen on every boot, and keep doing it manually > until your boot scripts are fixed.) Yes, will do. In your links below it seems that you're half advocating for using desktop drives in RAID arrays, half advocating against. From what I can tell, it seems the recommendation might depend on the use-case. If one doesn't care too much about instant performance in case of errors, one might want to use desktop drives (with the above fix). If one wants reliable performance, one probably wants NAS drives. Did I understand the basic trade-off correctly? It seems that people also think that green drives are a bad idea in RAIDs in general - mostly because the frequent parking of heads reduces life-time. Is that a correct statement? 
> Then you can add your missing mirror and let MD fix it: > > mdadm /dev/md0 --add /dev/sdd3 > > After that's done syncing, you can have MD fix any remaining UREs in > that raid1 with: > > echo check >/sys/block/md0/md/sync_action > > While that's in progress, take the time to read through the links in > the > postscript -- the timeout mismatch problem and its impact on > unrecoverable read errors has been hashed out on this list many times. > > Now to your big array. It is vital that it also be cleaned of UREs > after re-creation before you do anything else. Which means it must > *not* be created degraded (the redundancy is needed to fix UREs). > > According to lsdrv and your "mdadm -E" reports, the creation order you > need is: > > raid device 0 /dev/sdf2 {WD-WMAZA0209553} > raid device 1 /dev/sdd2 {WD-WMAZA0348342} > raid device 2 /dev/sdg1 {9VS1EFFD} > raid device 3 /dev/sde1 {5XW05FFV} > raid device 4 /dev/sdc1 {6XW0BQL0} > raid device 5 /dev/sdh1 {ML2220F30TEBLE} > raid device 6 /dev/sdi2 {WD-WMAY01975001} > > Chunk size is 64k. > > Make sure your partially assembled array is stopped: > > mdadm --stop /dev/md1 > > Re-create your array as follows: > > mdadm --create --assume-clean --verbose \ > --metadata=1.0 --raid-devices=7 --chunk=64 --level=6 \ > /dev/md1 /dev/sd{f2,d2,g1,e1,c1,h1,i2} > > Use "fsck -n" to check your array's filesystem (expect some damage at > the very beginning). If it looks reasonable, use fsck to fix any damage. > > Then clean up any lingering UREs: > > echo check > /sys/block/md1/md/sync_action > > Now you can mount it and catch any critical backups. (You do know that > raid != backup, I hope.) > > Your array now has a new UUID, so you probably want to fix your > mdadm.conf file and your initramfs. Yes sir! I will go through the steps and report back. 
One question: the reason I shouldn't attempt to re-create the new 10-disk array is that it would wipe out the 7->10 grow progress, so MD would think that it's a fully grown 10-disk array, right? > Finally, go back and do your --grow, with the --backup-file. > > In the future, buy drives with raid ratings like the WD Red family, and > make sure you have a cron job that regularly kicks off array scrubs. I > do mine weekly. Thanks for the info. This is the first time someone mentions scrubbing with regards to RAID to me, but it makes total sense. I will set it up. Thanks again, Andras ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape? 2015-10-21 3:52 ` andras @ 2015-10-21 12:01 ` Phil Turmel 0 siblings, 0 replies; 24+ messages in thread From: Phil Turmel @ 2015-10-21 12:01 UTC (permalink / raw) To: andras; +Cc: Linux-RAID Good morning Andras, On 10/20/2015 11:52 PM, andras@tantosonline.com wrote: > Phil, > > Thank you so much for the detailed explanation and your patience with > me! Sorry for not being more responsive - I don't have access to this > mail account from work. No worries. >> for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done >> >> (Arrange for this to happen on every boot, and keep doing it manually >> until your boot scripts are fixed.) > > Yes, will do. In your links below it seems that you're half advocating > for using desktop drives in RAID arrays, half advocating against. From > what I can tell, it seems the recommendation might depend on the > use-case. If one doesn't care too much about instant performance in case > of errors, one might want to use desktop drives (with the above fix). > If one wants reliable performance, one probably wants NAS drives. Did I > understand the basic trade-off correctly? Times change. At the time some of those were written, desktop drives with scterc support were still available, but default off. Those are ok in a raid if you have the appropriate smartctl command in your boot scripts. Long timeouts with non-scterc drives, in my opinion, create a user impression that things are broken, even if the drive is fine (UREs are natural and unavoidable in the life of a drive). Users are prone to drastic measures when they think something is broken. Also, *applications* might not wait that long for their read, either. So, I only recommend the long timeout solution when an array is already in trouble with such drives. > It seems that people also think that green drives are a bad idea in > RAIDs in general - mostly because the frequent parking of heads reduces > life-time. 
Is that a correct statement? I don't have enough experience with green drives to say. The few that I have (bought before I discovered the dropped scterc support) became part of my offsite backup rotation. > Yes sir! I will go through the steps and report back. One question: the > reason I shouldn't attempt to re-create the new 10-disk array is that it > would wipe out the 7->10 grow progress, so MD would think that it's a > fully grown 10-disk array, right? Right. Your three extra drives never really were incorporated into the array, so the data layout is still a 7-drive pattern. Phil ^ permalink raw reply [flat|nested] 24+ messages in thread
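[Editor's note: for completeness, once the re-created 7-drive array is clean, the originally intended grow would look like the sketch below. The device names are the ones from Andras's first attempt and will likely differ after the rebuild; the backup-file path is an assumption, and it must live on a filesystem that is *not* on the array being reshaped:]

```shell
# re-add the three new disks as spares (hypothetical device names)
mdadm --add /dev/md1 /dev/sdh1 /dev/sdi1 /dev/sdj1
# grow 7 -> 10 devices, this time with a backup file covering the
# critical section at the start of the reshape
mdadm --grow /dev/md1 --raid-devices=10 --backup-file=/root/md1-grow.backup
```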
* Re: How to recover after md crash during reshape? 2015-10-20 15:42 ` Phil Turmel 2015-10-20 22:34 ` Anugraha Sinha 2015-10-21 3:52 ` andras @ 2015-10-21 16:17 ` Wols Lists 2015-10-21 16:05 ` Phil Turmel 2015-10-25 14:15 ` andras 3 siblings, 1 reply; 24+ messages in thread From: Wols Lists @ 2015-10-21 16:17 UTC (permalink / raw) To: andras, Linux-RAID On 20/10/15 16:42, Phil Turmel wrote: > Don't. You have another problem: green & desktop drives in a raid > array. They aren't built for it and will give you grief of one form or > another. Anyways, their problem with timeout mismatch can be worked > around with long driver timeouts. Before you do anything else, you > *MUST* run this command: > > for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done > > (Arrange for this to happen on every boot, and keep doing it manually > until your boot scripts are fixed.) tl;dr summary ... Desktop drives are spec'd as being okay with one soft error per 10TB read - that's where a read fails, you try again, and everything's okay. A resync will scan the array from start to finish - if you have 10TB's worth of disk, you MUST be prepared to handle these errors. By default, mdadm will assume a disk is faulty and kick it after about 10secs, but a desktop drive will hang for maybe several minutes before reporting a problem. In other words, your drives can meet manufacturer's specs, but, with default settings, your array will never be able to rebuild after a problem! (Note that many people will say "I've never had a problem", but most drives are better than spec. You just don't want to be the unlucky one ...) Not that I have any (yet), but I'd second the recommendation for WD Reds. I've got Seagate Barracudas (not raid-compliant), and the Reds are not much more expensive, and are also the only drives I've found that support the raid features - mostly that by default they will fail and report a problem very quickly. 
(Plus they're spec'd at reading about 40TB per soft error :-) Cheers, Wol ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape? 2015-10-21 16:17 ` Wols Lists @ 2015-10-21 16:05 ` Phil Turmel 0 siblings, 0 replies; 24+ messages in thread From: Phil Turmel @ 2015-10-21 16:05 UTC (permalink / raw) To: Wols Lists, andras, Linux-RAID Hi Wols, I'm glad you've got the big picture correct, but some details need to be addressed: On 10/21/2015 12:17 PM, Wols Lists wrote: > tl;dr summary ... > > Desktop drives are spec'd as being okay with one soft error per 10TB > read - that's where a read fails, you try again, and everything's okay. No, this isn't correct. That spec is for *unrecoverable* read errors. For desktop drives, typically spec'd as one such error every 1e14 bits read, on average. These are failures where you really have lost the sector contents. Such sectors are marked as "Pending Relocations" in drive firmware. But the recording surface might still be good, so the drive waits for a write to that pending sector, which it then verifies, before deciding to relocate or not. When MD raid receives a read error, whether in normal operation or a scrub, it will reconstruct the missing data and write it back, closing this loop immediately. Where "normal operation" means "read errors are reported by the drive before the driver times out". > A resync will scan the array from start to finish - if you have 10TB's > worth of disk, you MUST be prepared to handle these errors. > > By default, mdadm will assume a disk is faulty and kick it after about > 10secs, but a desktop drive will hang for maybe several minutes before > reporting a problem. MD raid has no timeout, and does not kick drives out for occasional read errors. The timeout is in the per-device drivers (SCSI, SATA, whatever). Which defaults to 30 seconds. Desktop drives typically keep trying to read a bad sector for 120 seconds or more, ignoring the world while they do so. Drives with default SCTERC support typically report a read error within four to seven seconds. 
With a desktop drive, the linux device driver bails after 30 seconds and resets the link to the drive -- which gets ignored. And keeps getting ignored until the original read retry cycle finishes. During this time, MD has reconstructed the data and told the driver to write the fixed sector. That *write* also fails (because the driver is failing to reset) and that *write error* kicks the drive out of the array. Anyways, please consider reading the threads I pointed Andras at :-) Phil ^ permalink raw reply [flat|nested] 24+ messages in thread
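[Editor's note: for drives that do support SCT Error Recovery Control, the usual fix Phil alludes to is to cap the drive's internal retry time well below the 30-second driver timeout, from a boot script. A hedged sketch -- the 7-second value is conventional rather than from this thread, smartctl's scterc arguments are in tenths of a second, and drives without SCTERC support will simply report the command as unsupported (those need the long driver timeout instead):]

```shell
# set a 7.0 s read / 7.0 s write error-recovery limit on each array
# member; the /dev/sd[a-j] glob matches the drives in this thread and
# is an assumption for any other system.
for d in /dev/sd[a-j] ; do
    smartctl -l scterc,70,70 "$d"
done
```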
* Re: How to recover after md crash during reshape? 2015-10-20 15:42 ` Phil Turmel ` (2 preceding siblings ...) 2015-10-21 16:17 ` Wols Lists @ 2015-10-25 14:15 ` andras 2015-10-25 23:02 ` Phil Turmel 3 siblings, 1 reply; 24+ messages in thread From: andras @ 2015-10-25 14:15 UTC (permalink / raw) To: Phil Turmel; +Cc: Linux-RAID Phil, Thanks for all the help. I finally have some progress (and new problems). > Now to your big array. It is vital that it also be cleaned of UREs > after re-creation before you do anything else. Which means it must > *not* be created degraded (the redundancy is needed to fix UREs). > > According to lsdrv and your "mdadm -E" reports, the creation order you > need is: > > raid device 0 /dev/sdf2 {WD-WMAZA0209553} > raid device 1 /dev/sdd2 {WD-WMAZA0348342} > raid device 2 /dev/sdg1 {9VS1EFFD} > raid device 3 /dev/sde1 {5XW05FFV} > raid device 4 /dev/sdc1 {6XW0BQL0} > raid device 5 /dev/sdh1 {ML2220F30TEBLE} > raid device 6 /dev/sdi2 {WD-WMAY01975001} > > Chunk size is 64k. > > Make sure your partially assembled array is stopped: > > mdadm --stop /dev/md1 > > Re-create your array as follows: > > mdadm --create --assume-clean --verbose \ > --metadata=1.0 --raid-devices=7 --chunk=64 --level=6 \ > /dev/md1 /dev/sd{f2,d2,g1,e1,c1,h1,i2} Being very paranoid at this stage, instead of trying to re-create the array on the original drives, I dd-ed their content to a different set of (bigger) drives, and issued the command on them. The array assembled fine: md1 : active raid6 sdc2[6] sdd1[5] sdg1[4] sdb1[3] sdf1[2] sdh2[1] sda2[0] 7325679040 blocks super 1.0 level 6, 64k chunk, algorithm 2 [7/7] [UUUUUUU] bitmap: 0/11 pages [0KB], 65536KB chunk > Use "fsck -n" to check your array's filesystem (expect some damage at > the very beginning). If it looks reasonable, use fsck to fix any damage. fsck -n ran to completion but reported a ton of errors, mostly stemming from the initial (ext4) superblock being damaged. 
e2fsck 1.42.12 (29-Aug-2014)
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
fsck.ext4: Group descriptors look bad... trying backup blocks...
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.
Clear journal? no

The filesystem size (according to the superblock) is 1831419920 blocks
The physical size of the device is 1831419760 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? no

data contains a file system with errors, check forced.
Resize inode not valid. Recreate? no

Pass 1: Checking inodes, blocks, and sizes
Inode 7 has illegal block(s). Clear? no
Illegal block #448536 (4285956422) in inode 7. IGNORED.
Illegal block #448537 (4292313414) in inode 7. IGNORED.
Illegal block #448538 (3675619654) in inode 7. IGNORED.
Illegal block #448539 (3686760774) in inode 7. IGNORED.
Illegal block #448541 (1880654150) in inode 7. IGNORED.
Illegal block #448542 (3636035910) in inode 7. IGNORED.
Illegal block #448543 (2516877638) in inode 7. IGNORED.
Illegal block #448544 (2920513862) in inode 7. IGNORED.
Illegal block #449560 (4285956537) in inode 7. IGNORED.
Illegal block #449561 (4292313529) in inode 7. IGNORED.
Illegal block #449562 (3675619769) in inode 7. IGNORED.
Too many illegal blocks in inode 7.
Clear inode? no
Suppress messages? no

... and so on...

So I issued the real fsck command. Interestingly, it reported a completely different set of issues; my guess is that after fixing the superblock, the inconsistencies that fsck -n was complaining about went away, and the real ones started to show up. At any rate, the file system now seems to be clean, except for this message:

The filesystem size (according to the superblock) is 1831419920 blocks
The physical size of the device is 1831419760 blocks
Either the superblock or the partition table is likely to be corrupt!
This problem prevents me from mounting the FS:

mount -o ro /dev/md1 /mnt -v
mount: wrong fs type, bad option, bad superblock on /dev/md1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so.

And dmesg reports:

[ 5859.527778] EXT4-fs (md1): bad geometry: block count 1831419920 exceeds size of device (1831419760 blocks)

So here I am right now. I can see a few paths forward, but first a question: why is it that the re-created MD device is different in size (ever so slightly) than the ext4 filesystem that it used to contain? I doubt it has anything to do with the grow operation, as I didn't get far enough to actually resize the filesystem...

One side-effect of using different drives (and dd) is that the partition table is now misaligned with the new disk geometry. For example:

fdisk -l /dev/sdb
Disk /dev/sdb: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x3e6b39b9

Device     Boot Start        End     Sectors    Size Id Type
/dev/sdb1        63   2930272064  2930272002  1.4T fd Linux raid autodetect

Partition 2 does not start on physical sector boundary.

Could this be the root cause? Here are the sizes of all the other relevant partitions:

/dev/sda2   976752064  3907029167  2930277104  1.4T fd Linux raid autodetect
/dev/sdb1          63  2930272064  2930272002  1.4T fd Linux raid autodetect
/dev/sdc2   976752064  3907029167  2930277104  1.4T fd Linux raid autodetect
/dev/sdd1          63  3907024064  3907024002  1.8T fd Linux raid autodetect
/dev/sdf1          63  2930272064  2930272002  1.4T fd Linux raid autodetect
/dev/sdg1          63  2930272064  2930272002  1.4T fd Linux raid autodetect
/dev/sdh2   976752064  3907029167  2930277104  1.4T fd Linux raid autodetect

If I look at the size reported by fdisk above, on a 7-disk raid6, with each partition of that size, I should have 1831420000 sectors available.
I'm sure mdadm takes some sectors for management, but I don't know how much. So, I thought of three ways of fixing it:

1. Re-create the array again, but this time force the array size to the one reported by the filesystem, using --size. What is the unit for --size? Is that bytes?
2. Re-create the array again, but this time use the original superblock version (0.91 I think). Could that make a difference in the size of the array?
3. Instead of dd-ing whole drives, dd just the raid6 partitions so the partition table is correct for the drives. Maybe the misalignment trips mdadm up and makes it create the array with the incorrect size?

Thanks for all the help again,
Andras
^ permalink raw reply [flat|nested] 24+ messages in thread
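[Editorial aside: the expected geometry can be sanity-checked with a little shell arithmetic. This is a sketch using the figures from this thread -- RAID6 usable capacity is (n - 2) times the per-device size, and this ext4 filesystem uses 4 KiB blocks:]

```shell
# RAID6 sets aside two members' worth of space for parity: usable = (n - 2) * per_device
per_device_kib=1465135936                      # per-member size of the original array, in KiB
fs_blocks=$(( (7 - 2) * per_device_kib / 4 ))  # convert usable KiB to 4 KiB ext4 blocks
echo $fs_blocks                                # 1831419920 -- exactly the superblock's block count
```

So the filesystem's own size agrees perfectly with a 7-disk RAID6 built from 1465135936 KiB members; the question is only why the re-created array came out slightly smaller.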
* Re: How to recover after md crash during reshape? 2015-10-25 14:15 ` andras @ 2015-10-25 23:02 ` Phil Turmel 2015-10-28 16:31 ` Andras Tantos 0 siblings, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2015-10-25 23:02 UTC (permalink / raw)
To: andras; +Cc: Linux-RAID

On 10/25/2015 10:15 AM, andras@tantosonline.com wrote:
> Phil,
>
> Thanks for all the help. I finally have some progress (and new problems).

> [ 5859.527778] EXT4-fs (md1): bad geometry: block count 1831419920
> exceeds size of device (1831419760 blocks)

> So, I thought of three ways of fixing it:
> 1. Re-create the array again, but this time force the array size to the
> one reported by the filesystem, using --size. What is the unit for --size?
> Is that bytes?

Yep. You'll need to use the --size option on a create. Note that it specifies the amount of each device to use, not the overall array size. According to "man mdadm", its unit is k == 1024 bytes. Use the exact size from your original => --size=1465135936

> 2. Re-create the array again, but this time use the original
> superblock version (0.91 I think). Could that make a difference in the
> size of the array?

v0.91 really is just a flag that means v0.90 with a reshape in progress. But yes, the size used would be somewhat different. With the override above, it won't matter. v1.x metadata has more features, and modern mdadm normally reserves enough room to support them.

> 3. Instead of dd-ing whole drives, dd just the raid6 partitions so the
> partition table is correct for the drives. Maybe the misalignment trips
> mdadm up and makes it create the array with the incorrect size?

Yes, dd just the partition contents, so the final array is aligned. This is *really* important for drives that have logical 512-byte sectors but physical 4k sectors. When you put your repaired array back in service, keep this alignment.

Phil
^ permalink raw reply [flat|nested] 24+ messages in thread
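[Editorial aside: the shortfall can be quantified from the two block counts in the error message. A sketch -- attributing the difference to per-member metadata/bitmap reservation is an inference consistent with Phil's explanation, not something explicitly confirmed in the thread:]

```shell
fs_blocks=1831419920    # ext4 block count recorded in the superblock (4 KiB blocks)
dev_blocks=1831419760   # block count of the re-created, slightly smaller array
missing_kib=$(( (fs_blocks - dev_blocks) * 4 ))
# A 7-disk RAID6 has 5 data members, so divide the total by 5:
echo "$missing_kib KiB total, $(( missing_kib / 5 )) KiB per data member"  # 640 and 128
```

128 KiB per member is exactly the kind of end-of-device reservation a newer mdadm makes for v1.0 metadata that v0.90 creation did not.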
* Re: How to recover after md crash during reshape? 2015-10-25 23:02 ` Phil Turmel @ 2015-10-28 16:31 ` Andras Tantos 2015-10-28 16:42 ` Phil Turmel 0 siblings, 1 reply; 24+ messages in thread
From: Andras Tantos @ 2015-10-28 16:31 UTC (permalink / raw)
To: Phil Turmel; +Cc: Linux-RAID

Thanks again Phil!

I'm almost there...

>> [ 5859.527778] EXT4-fs (md1): bad geometry: block count 1831419920
>> exceeds size of device (1831419760 blocks)
>
> Yep. You'll need to use the --size option on a create. Note that it
> specifies the amount of each device to use, not the overall array size.
> According to "man mdadm", its unit is k == 1024 bytes. Use the exact
> size from your original => --size=1465135936

When I try to do that, I get the following message:

root@bazsalikom:~# mdadm --create --assume-clean --verbose --metadata=1.0 --raid-devices=7 --size=1465135936 --chunk=64 --level=6 /dev/md1 /dev/sde2 /dev/sdc2 /dev/sdf1 /dev/sdd1 /dev/sdb1 /dev/sdg1 /dev/sdh2
mdadm: layout defaults to left-symmetric
mdadm: /dev/sde2 appears to contain an ext2fs file system
    size=-1216020180K  mtime=Wed Dec  8 11:55:07 1954
mdadm: /dev/sde2 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdc2 appears to contain an ext2fs file system
    size=-1264254912K  mtime=Sat Jul 18 15:26:57 2015
mdadm: /dev/sdc2 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdf1 is smaller than given size. 1465135808K < 1465135936K + metadata
mdadm: /dev/sdd1 is smaller than given size. 1465135808K < 1465135936K + metadata
mdadm: /dev/sdb1 is smaller than given size.
1465135808K < 1465135936K + metadata
mdadm: /dev/sdg1 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdh2 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: create aborted

To be able to re-assemble the array, I *have* to specify metadata version 0.9:

root@bazsalikom:~# mdadm --create --assume-clean --verbose --metadata=0.9 --raid-devices=7 --size=1465135936 --chunk=64 --level=6 /dev/md1 /dev/sde2 /dev/sdc2 /dev/sdf1 /dev/sdd1 /dev/sdb1 /dev/sdg1 /dev/sdh2
mdadm: layout defaults to left-symmetric
mdadm: /dev/sde2 appears to contain an ext2fs file system
    size=-1216020180K  mtime=Wed Dec  8 11:55:07 1954
mdadm: /dev/sde2 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdc2 appears to contain an ext2fs file system
    size=-1264254912K  mtime=Sat Jul 18 15:26:57 2015
mdadm: /dev/sdc2 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdf1 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdg1 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: /dev/sdh2 appears to be part of a raid array:
    level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015
mdadm: largest drive (/dev/sdg1) exceeds size (1465135936K) by more than 1%
Continue creating array? y
mdadm: array /dev/md1 started.

Is this a problem? Can I upgrade my array to 1.0 metadata? Should I?

Andras
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape? 2015-10-28 16:31 ` Andras Tantos @ 2015-10-28 16:42 ` Phil Turmel 2015-10-28 17:10 ` Andras Tantos 2015-10-29 16:59 ` Andras Tantos 0 siblings, 2 replies; 24+ messages in thread From: Phil Turmel @ 2015-10-28 16:42 UTC (permalink / raw) To: Andras Tantos; +Cc: Linux-RAID On 10/28/2015 12:31 PM, Andras Tantos wrote: > Thanks again Phil! > > I'm almost there... > >>> [ 5859.527778] EXT4-fs (md1): bad geometry: block count 1831419920 >>> exceeds size of device (1831419760 blocks) >> >>Yep. You'll need to use the --size option on a create. Note that it >>specifies the amount of each device to use, not the overall array size. >>According to "man mdadm", its units is k == 1024 bytes. Use the exact >>size from your original => --size=1465135936 > > When I try to do that, I get the following message: > > root@bazsalikom:~# mdadm --create --assume-clean --verbose > --metadata=1.0 --raid-devices=7 --size=1465135936 --chunk=64 --level=6 > /dev/md1 /dev/sde2 /dev/sdc2 /dev/sdf1 /dev/sdd1 /dev/sdb1 /dev/sdg1 > /dev/sdh2 > mdadm: layout defaults to left-symmetric > mdadm: /dev/sde2 appears to contain an ext2fs file system > size=-1216020180K mtime=Wed Dec 8 11:55:07 1954 > mdadm: /dev/sde2 appears to be part of a raid array: > level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015 > mdadm: /dev/sdc2 appears to contain an ext2fs file system > size=-1264254912K mtime=Sat Jul 18 15:26:57 2015 > mdadm: /dev/sdc2 appears to be part of a raid array: > level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015 > mdadm: /dev/sdf1 is smaller than given size. 1465135808K < > 1465135936K + metadata > mdadm: /dev/sdd1 is smaller than given size. 1465135808K < > 1465135936K + metadata > mdadm: /dev/sdb1 is smaller than given size. 
1465135808K < > 1465135936K + metadata > mdadm: /dev/sdg1 appears to be part of a raid array: > level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015 > mdadm: /dev/sdh2 appears to be part of a raid array: > level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015 > mdadm: create aborted > > To be able to re-assemble the array, I *have* to specify metadata > version 0.9: > > root@bazsalikom:~# mdadm --create --assume-clean --verbose > --metadata=0.9 --raid-devices=7 --size=1465135936 --chunk=64 --level=6 > /dev/md1 /dev/sde2 /dev/sdc2 /dev/sdf1 /dev/sdd1 /dev/sdb1 /dev/sdg1 > /dev/sdh2 > mdadm: layout defaults to left-symmetric > mdadm: /dev/sde2 appears to contain an ext2fs file system > size=-1216020180K mtime=Wed Dec 8 11:55:07 1954 > mdadm: /dev/sde2 appears to be part of a raid array: > level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015 > mdadm: /dev/sdc2 appears to contain an ext2fs file system > size=-1264254912K mtime=Sat Jul 18 15:26:57 2015 > mdadm: /dev/sdc2 appears to be part of a raid array: > level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015 > mdadm: /dev/sdf1 appears to be part of a raid array: > level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015 > mdadm: /dev/sdd1 appears to be part of a raid array: > level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015 > mdadm: /dev/sdb1 appears to be part of a raid array: > level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015 > mdadm: /dev/sdg1 appears to be part of a raid array: > level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015 > mdadm: /dev/sdh2 appears to be part of a raid array: > level=raid6 devices=7 ctime=Wed Oct 28 09:17:55 2015 > mdadm: largest drive (/dev/sdg1) exceeds size (1465135936K) by more > than 1% > Continue creating array? y > mdadm: array /dev/md1 started. > > Is this a problem? Can I upgrade my array to 1.0 metadata? Should I? Hmm. Interesting. Your version of mdadm is insisting on reserving much more space between end of content and the v1.0 metadata than when using v0.90 metadata. 
I'm curious how much. Please show the output of "cat /proc/partitions". If you stop the array cleanly and then manually re-assemble with --update=metadata, you might get around it. (Specify all of the devices explicitly to ensure you don't get burned by v0.90's problems with last partitions.) You definitely don't want to stay on v0.90, but you may need to for now to get out of trouble. Phil ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape? 2015-10-28 16:42 ` Phil Turmel @ 2015-10-28 17:10 ` Andras Tantos 2015-10-28 17:38 ` Phil Turmel 2015-10-29 16:59 ` Andras Tantos 1 sibling, 1 reply; 24+ messages in thread
From: Andras Tantos @ 2015-10-28 17:10 UTC (permalink / raw)
To: Phil Turmel; +Cc: Linux-RAID

Phil,

>> To be able to re-assemble the array, I *have* to specify metadata
>> version 0.9:
>>
>> Is this a problem? Can I upgrade my array to 1.0 metadata? Should I?
>
> Hmm. Interesting. Your version of mdadm is insisting on reserving much
> more space between end of content and the v1.0 metadata than when using
> v0.90 metadata.
>
> I'm curious how much. Please show the output of "cat /proc/partitions".

root@bazsalikom:/home/tantos# cat /proc/partitions
major minor    #blocks  name

   8     16  1465138584  sdb
   8     17  1465136001  sdb1
   8     48  1465138584  sdd
   8     49  1465136001  sdd1
   8     80  1465138584  sdf
   8     81  1465136001  sdf1
   8     96  1953513527  sdg
   8     97  1953512001  sdg1
   8    112  1953514584  sdh
   8    113      538145  sdh1
   8    114  1465138552  sdh2
   8    115   487837854  sdh3
   8     64  1953514584  sde
   8     65      538145  sde1
   8     66  1465138552  sde2
   8     67   487837854  sde3
   8     32  1953514584  sdc
   8     33      538145  sdc1
   8     34  1465138552  sdc2
   8     35   487837854  sdc3
   9      0   487837760  md0
   9      1  7325679680  md1

Andras
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape? 2015-10-28 17:10 ` Andras Tantos @ 2015-10-28 17:38 ` Phil Turmel 0 siblings, 0 replies; 24+ messages in thread From: Phil Turmel @ 2015-10-28 17:38 UTC (permalink / raw) To: Andras Tantos; +Cc: Linux-RAID On 10/28/2015 01:10 PM, Andras Tantos wrote: > Phil, > >>> To be able to re-assemble the array, I *have* to specify metadata >>> version 0.9: >>> >>> Is this a problem? Can I upgrade my array to 1.0 metadata? Should I? >> >> Hmm. Interesting. Your version of mdadm is insisting on reserving much >> more space between end of content and the v1.0 metadata than when using >> v0.90 metadata. >> >> I'm curious how much. Please show the output of "cat /proc/partitions". Ok. I think your version of mdadm is trying to put a bitmap on the v1.0 array, which can be suppressed with --bitmap=none. Or just do the --assemble --update. Phil ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape? 2015-10-28 16:42 ` Phil Turmel 2015-10-28 17:10 ` Andras Tantos @ 2015-10-29 16:59 ` Andras Tantos 2015-10-30 18:12 ` Phil Turmel 1 sibling, 1 reply; 24+ messages in thread From: Andras Tantos @ 2015-10-29 16:59 UTC (permalink / raw) To: Phil Turmel; +Cc: Linux-RAID Phil, On 10/28/2015 9:42 AM, Phil Turmel wrote: > If you stop the array cleanly and then manually re-assemble with > --update=metadata, you might get around it. (Specify all of the > devices explicitly to ensure you don't get burned by v0.90's problems > with last partitions.) You definitely don't want to stay on v0.90, but > you may need to for now to get out of trouble. Phil It seems that my mdadm doesn't have an --update=metadata option, which if I understand it right means I have to re-create the array with the no-bitmap option. How dangerous is that? Is it possible that things get overwritten during the re-create process in the data portion of the array? I've read that GRUB (which is my bootloader) didn't support v1.0 superblocks for a while. It seems that 0.99 version of GRUB (which is what I have) has it, but how to make certain? I don't want to render my system un-bootable... Can you expand a little bit on the problems of v0.90 superblocks and why upgrading is advantageous? What I've read about the differences (lifted limit of number of devices/array and 2TB per device limit) don't really apply to my case. Thanks, Andras ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape? 2015-10-29 16:59 ` Andras Tantos @ 2015-10-30 18:12 ` Phil Turmel 2015-11-03 23:42 ` How to recover after md crash during reshape? - SOLVED/SUMMARY Andras Tantos 0 siblings, 1 reply; 24+ messages in thread From: Phil Turmel @ 2015-10-30 18:12 UTC (permalink / raw) To: Andras Tantos; +Cc: Linux-RAID On 10/29/2015 12:59 PM, Andras Tantos wrote: > Phil, > > On 10/28/2015 9:42 AM, Phil Turmel wrote: >> If you stop the array cleanly and then manually re-assemble with >> --update=metadata, you might get around it. (Specify all of the >> devices explicitly to ensure you don't get burned by v0.90's problems >> with last partitions.) You definitely don't want to stay on v0.90, but >> you may need to for now to get out of trouble. Phil > > It seems that my mdadm doesn't have an --update=metadata option, which > if I understand it right means I have to re-create the array with the > no-bitmap option. How dangerous is that? Is it possible that things get > overwritten during the re-create process in the data portion of the array? Just clone and compile a local copy of the latest mdadm, then run it as ./mdadm for the --update operation. git clone git://github.com/neilbrown/mdadm > I've read that GRUB (which is my bootloader) didn't support v1.0 > superblocks for a while. It seems that 0.99 version of GRUB (which is > what I have) has it, but how to make certain? I don't want to render my > system un-bootable... Old grub doesn't understand MD at all, which is why you needed a mirror that has the content starting at the beginning of the partition. To grub, it doesn't look like a mirror. This is true for v1.0 as well. > Can you expand a little bit on the problems of v0.90 superblocks and why > upgrading is advantageous? What I've read about the differences (lifted > limit of number of devices/array and 2TB per device limit) don't really > apply to my case. 
v0.90 will screw up if you have it on the last partition of a device, and that partition runs very close to the end of the device. v0.90 doesn't include size info in the metadata itself, so it is ambiguous in that case whether the superblock belongs to the device as a whole or the partition. That'll really scramble an array. Just say no to v0.90. Phil ^ permalink raw reply [flat|nested] 24+ messages in thread
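[Editorial aside: Phil's point can be illustrated numerically. A v0.90 superblock lives in the last 64 KiB-aligned 64 KiB slot of whatever device it describes (the classic MD_NEW_SIZE_SECTORS placement rule), so for a 64 KiB-aligned last partition that runs to the very end of the disk, the two candidate locations coincide. A sketch with made-up sizes:]

```shell
# v0.90 placement rule: last 64 KiB-aligned 64 KiB block (sizes in 512-byte sectors,
# so 64 KiB = 128 sectors).
sb_offset() { echo $(( ($1 & ~127) - 128 )); }

disk=3907029120                     # hypothetical disk size, 64 KiB-aligned
part_start=2048
part_size=$(( disk - part_start ))  # last partition runs to the very end of the disk

whole=$(sb_offset "$disk")                            # location if it belongs to /dev/sdX
inpart=$(( part_start + $(sb_offset "$part_size") ))  # location if it belongs to /dev/sdX1
[ "$whole" -eq "$inpart" ] && echo "ambiguous: same sector either way"
```

That coincidence is exactly what produces the "very similar superblocks" warning seen earlier in this thread; v1.x avoids it by recording the data offset and size in the metadata itself.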
* Re: How to recover after md crash during reshape? - SOLVED/SUMMARY 2015-10-30 18:12 ` Phil Turmel @ 2015-11-03 23:42 ` Andras Tantos 0 siblings, 0 replies; 24+ messages in thread
From: Andras Tantos @ 2015-11-03 23:42 UTC (permalink / raw)
To: Phil Turmel; +Cc: Linux-RAID

Thank you all who helped me solve my problem, especially Phil Turmel, to whom I am indebted for the rest of my life. Right now my family photos - and my marriage - are safe.

For people who might be interested in the future, here's a quick summary of the events and the recovery:

Trouble:
==========
Was going to extend a RAID6 array from 7 disks to 10. The array reshape crashed early in the process. After reboot, the array wouldn't re-assemble, with the error message:

mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar superblocks.
If they are really different, please --zero the superblock on one
If they are the same or overlap, please remove one from the DEVICE list in mdadm.conf.

What I SHOULD have done here is remove SDA from the DEVICE list in mdadm.conf, followed by:

mdadm --grow --continue /dev/md1 --backup-file .....

What I did instead was zero the superblock of SDA1. The same message appeared for the other two new HDDs in the array as well. By the time I had zeroed the superblocks of all three new disks, the array assembled but didn't start because it was missing three drives.

Recovery:
===========
1. Look at the partitions listed in /proc/mdstat for the array.
2. For each of the constituents of the array, do: mdadm -E <disk name from the array>
3. Note all the parameters, especially these: 'Chunk Size', 'Raid Level', 'Version'
4. Make sure all remaining disks show the same event count ('Events'), have correct checksums, and all the above parameters match.
5. Note the order of the disks in the array. You can find that in this line:
   Number Major Minor RaidDevice State
   this 6 8 98 6 active sync
6. If all matches, stop the array: mdadm --stop /dev/md1
7.
Re-create your array as follows:

mdadm --create --assume-clean --verbose \
    --metadata=1.0 --raid-devices=7 --chunk=64 --level=6 \
    /dev/md1 <list of devices in the exact order from note 5 above>

Replace the number of devices, chunk size and raid level from note 3 above. For me, I had to specify metadata version 0.9, which was my original metadata version (as reported by the 'Version' parameter in point 3 above). YMMV.

8. If all goes well, the array will now re-assemble with the original 7 disks. The data on the array is corrupted up to the point where the reshape stopped, so...
9. fsck -n /dev/md1 to assess the damage. If it doesn't look terrible, fix the errors: fsck -y /dev/md1
10. Mount the array and rejoice in the data that's recovered.

Final notes:
===============
I still don't know the root cause of the crash. What I did notice is that this particular (Core2 Duo) system seems to become unstable with more than 9 HDDs. It doesn't seem to be a power supply issue, as it has trouble even if about half of the drives are supplied from a second PSU.

Version 0.9 metadata has some problems, causing the misleading message in the first place. Upgrading to version 1.0 metadata is a good idea.

If you use desktop or green drives in your array, fix the short kernel timeout on SATA devices (30s). Issue this on every boot:

for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done

If you don't do that, the first unrecoverable read error will degrade your array instead of simply relocating the failing sector on the hard drive.

To find and fix unrecoverable read errors on your array, regularly issue:

echo check >/sys/block/md0/md/sync_action

This is a looooong operation on a large RAID6 array, but it makes sure that bad sectors don't accumulate in seldom-accessed corners and destroy your array at the worst possible time.

Andras
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to recover after md crash during reshape? 2015-10-20 2:35 How to recover after md crash during reshape? andras ` (2 preceding siblings ...) 2015-10-20 13:49 ` Phil Turmel @ 2015-10-21 1:35 ` Neil Brown 2015-10-21 4:03 ` andras 2015-10-21 12:18 ` Phil Turmel 3 siblings, 2 replies; 24+ messages in thread
From: Neil Brown @ 2015-10-21 1:35 UTC (permalink / raw)
To: andras, linux-raid

[-- Attachment #1: Type: text/plain, Size: 5493 bytes --]

andras@tantosonline.com writes:

Phil has provided lots of useful advice, I'll just add a couple of clarifications:

> mdadm --grow --raid-devices=10 /dev/md1
>
> Yes, I was dumb enough to start the process without a backup option -
> (copy-paste error from https://raid.wiki.kernel.org/index.php/Growing).

Nothing dumb about that - you don't need a --backup option. If you did, mdadm would have complained. You only need --backup when the size of the array is unchanged or decreasing. (Or when growing to a degraded array; e.g. you can reshape a 4-drive raid5 to a degraded 5-drive raid5 without adding a spare. This will require a --backup. I'm fairly sure it also requires --force because it is a very strange thing to do.)

When reshaping to a larger array, mdadm only requires a backup while reshaping the first few stripes, and it uses some space in one of the new (previously spare) devices to store that backup.
> > This immediately (well, after 2 seconds) crashed the MD driver: > > Oct 17 17:30:27 bazsalikom kernel: [7869821.514718] sd 0:0:0:0: > [sdj] Attached SCSI disk > Oct 17 18:39:21 bazsalikom kernel: [7873955.418679] sdh: sdh1 > Oct 17 18:39:37 bazsalikom kernel: [7873972.155084] sdi: sdi1 > Oct 17 18:39:49 bazsalikom kernel: [7873983.916038] sdj: sdj1 > Oct 17 18:40:33 bazsalikom kernel: [7874027.963430] md: bind<sdh1> > Oct 17 18:40:34 bazsalikom kernel: [7874028.263656] md: bind<sdi1> > Oct 17 18:40:34 bazsalikom kernel: [7874028.361112] md: bind<sdj1> > Oct 17 18:59:48 bazsalikom kernel: [7875182.667815] md: reshape of > RAID array md1 > Oct 17 18:59:48 bazsalikom kernel: [7875182.667818] md: minimum > _guaranteed_ speed: 1000 KB/sec/disk. > Oct 17 18:59:48 bazsalikom kernel: [7875182.667821] md: using > maximum available idle IO bandwidth (but not more than 200000 KB/sec) > for reshape. > Oct 17 18:59:48 bazsalikom kernel: [7875182.667831] md: using 128k > window, over a total of 1465135936k. > --> Oct 17 18:59:50 bazsalikom kernel: [7875184.326245] md: md_do_sync() > got signal ... exiting This is very strange ... maybe some messages missing? Probably an IO error while writing to a new device. > > From here on, things went downhill pretty damn fast. I was not able to > unmount the file-system, stop or re-start the array (/proc/mdstat went > away), any process trying to touch /dev/md1 hung, so eventually, I run > out of options and hit the reset button on the machine. > > Upon reboot, the array wouldn't assemble, it was complaining that SDA > and SDA1 had the same superblock info on it. > > mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar > superblocks. > If they are really different, please --zero the superblock on one > If they are the same or overlap, please remove one from the > DEVICE list in mdadm.conf. It's very hard to make messages like this clear without being incredibly verbose... 
In this case /dev/sda and /dev/sda1 obviously overlap (that is obvious, isn't it?). So in that case you need to remove one of them from the DEVICE list. You probably don't have a DEVICE list so it defaults to everything listed in /proc/partitions. The "correct" thing to do at this point would have been to add a DEVICE list to mdadm.conf which only listed the devices that might be part of an array. e.g. DEVICE /dev/sd[a-z][1-9] > So, if I read this right, the superblock here states that the array is > in the middle of a reshape from 7 to 10 devices, but it just started > (4096 is the position). > What's interesting is the device names listed here don't match the ones > reported by /proc/mdstat, and are actually incorrect. The right > partition numbers are in /proc/mdstat. > > The superblocks on the 6 other original disks match, except for of > course which one they mark as 'this' and the checksum. > > I've read in here (http://ubuntuforums.org/showthread.php?t=2133576) > among many other places that it might be possible to recover the data on > the array by trying to re-create it to the state before the re-shape. > > I've also read that if I want to re-create an array in read-only mode, I > should re-create it degraded. > > So, what I thought I would do is this: > > mdadm --create /dev/md1 --level=6 --raid-devices=7 /dev/sdh2 > /dev/sdf2 /dev/sdi1 /dev/sdg1 /dev/sde1 missing missing Phil has given good advice on this point which is worth following. It is quite possible that there will still be corruption. mdadm reads the first few stripes and stores them somewhere in each of the spares. md (in the kernel) then reads those stripes again and writes them out in the new configuration. It appears that one of the writes failed, others might have succeeded. This may not have corrupted anything (the first few blocks are in the same position for both the old and new layout) but it might have done. 
So if the filesystem seems corrupt after the array is re-created, that is likely the reason. The data still exists in the backup on those new devices (if you haven't done anything to them) and could be restored. If you do want to look for the backup, it is around about the middle of the device and has some metadata which contains the string "md_backup_data-1". If you find that, you are close to getting the backup data back. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 818 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
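[Editorial aside: a minimal way to hunt for that backup blob is to scan a dd image (or the raw device) for the magic string Neil mentions. Only the "md_backup_data-1" string comes from the thread; the path below is a placeholder:]

```shell
# -a: treat binary data as text, -b: print the byte offset, -o: print only the match.
# The offset of the first hit tells you where the backup metadata starts on the device.
grep -abo 'md_backup_data-1' /path/to/new-disk-image.bin | head -n 1
```

On a multi-terabyte device this takes a while; Neil's hint that the blob sits around the middle of the device can narrow the search (e.g. by feeding grep from dd with a suitable skip).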
* Re: How to recover after md crash during reshape? 2015-10-21 1:35 ` How to recover after md crash during reshape? Neil Brown @ 2015-10-21 4:03 ` andras 2015-10-21 12:18 ` Phil Turmel 1 sibling, 0 replies; 24+ messages in thread
From: andras @ 2015-10-21 4:03 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid

Neil,

Thanks for helping me out!

>> Oct 17 18:59:48 bazsalikom kernel: [7875182.667831] md: using 128k
>> window, over a total of 1465135936k.
>> --> Oct 17 18:59:50 bazsalikom kernel: [7875184.326245] md: md_do_sync()
>> got signal ... exiting
>
> This is very strange ... maybe some messages missing?
> Probably an IO error while writing to a new device.

I'm not sure what happened either. This is /var/log/messages. Maybe those things go into a different log?

>> mdadm: WARNING /dev/sda and /dev/sda1 appear to have very similar
>> superblocks.
>> If they are really different, please --zero the superblock on one
>> If they are the same or overlap, please remove one from the
>> DEVICE list in mdadm.conf.
>
> It's very hard to make messages like this clear without being incredibly
> verbose...
>
> In this case /dev/sda and /dev/sda1 obviously overlap (that is obvious,
> isn't it?).
> So in that case you need to remove one of them from the DEVICE list.
> You probably don't have a DEVICE list so it defaults to everything
> listed in /proc/partitions.
> The "correct" thing to do at this point would have been to add a DEVICE
> list to mdadm.conf which only listed the devices that might be part of
> an array. e.g.
>
> DEVICE /dev/sd[a-z][1-9]

Understood. My problem was that when I googled for the problem, people agreed with the suggested solution of zeroing the superblock. I guess it tells you how much you should trust 'common wisdom'.

> Phil has given good advice on this point which is worth following.
> It is quite possible that there will still be corruption.
> mdadm reads the first few stripes and stores them somewhere in each of
> the spares.  md (in the kernel) then reads those stripes again and
> writes them out in the new configuration.  It appears that one of the
> writes failed; others might have succeeded.  This may not have corrupted
> anything (the first few blocks are in the same position for both the old
> and new layout) but it might have done.
>
> So if the filesystem seems corrupt after the array is re-created, that
> is likely the reason.
> The data still exists in the backup on those new devices (if you haven't
> done anything to them) and could be restored.
>
> If you do want to look for the backup, it is around about the middle of
> the device and has some metadata which contains the string
> "md_backup_data-1".  If you find that, you are close to getting the
> backup data back.
>
> NeilBrown

Oh gosh, I hope I don't have to do that deep of a surgery.  No, I
haven't touched the new HDDs other than zeroing the superblock, so
whatever was on them is still there.  I'll see how much damage there is
to the FS after I reconstruct the array.

Thanks for all the help!
Andras
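Neil's hint about scanning for the "md_backup_data-1" string can be sketched with dd and grep. The loop below demonstrates the technique on a small temporary file; against a real array member you would point IMG at the device itself (e.g. /dev/sdh1, a name assumed from this thread) and extend the window range to cover the middle region of the disk. This is an illustrative sketch, not a recovery procedure from the thread.

```shell
# Sketch: locate the "md_backup_data-1" magic by scanning a block image
# in fixed-size windows with dd + grep.  Demonstrated on a small
# temporary file; on a real member you would scan the device instead.
IMG=$(mktemp)
dd if=/dev/zero of="$IMG" bs=1M count=4 2>/dev/null          # 4 MiB dummy image
printf 'md_backup_data-1' |
  dd of="$IMG" bs=1 seek=2097152 conv=notrunc 2>/dev/null    # plant magic at 2 MiB
bs=1048576                                                   # 1 MiB scan window
result=
for skip in 0 1 2 3; do
  # grep -aob prints the byte offset of each match within the window
  off=$(dd if="$IMG" bs=$bs skip=$skip count=1 2>/dev/null |
        grep -aob 'md_backup_data-1' | cut -d: -f1 | head -n1)
  if [ -n "$off" ]; then
    result=$((skip * bs + off))
    echo "found at byte offset $result"
    break
  fi
done
rm -f "$IMG"
```

Scanning window by window keeps memory use bounded, which matters when the "image" is a multi-terabyte drive rather than a 4 MiB test file.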
* Re: How to recover after md crash during reshape?
  2015-10-21  1:35 ` How to recover after md crash during reshape? Neil Brown
  2015-10-21  4:03 ` andras
@ 2015-10-21 12:18 ` Phil Turmel
  2015-10-21 20:26 ` Neil Brown
  1 sibling, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2015-10-21 12:18 UTC (permalink / raw)
To: Neil Brown, andras, linux-raid

Good morning Neil,

On 10/20/2015 09:35 PM, Neil Brown wrote:

> Nothing dumb about that - you don't need a --backup option.
> If you did, mdadm would have complained.
>
> You only need --backup when the size of the array is unchanged or
> decreasing.
>
> mdadm reads the first few stripes and stores them somewhere in each of
> the spares.  md (in the kernel) then reads those stripes again and
> writes them out in the new configuration.  It appears that one of the
> writes failed, others might have succeeded.  This may not have corrupted
> anything (the first few blocks are in the same position for both the old
> and new layout) but it might have done.
>
> If you do want to look for the backup, it is around about the middle of
> the device and has some metadata which contains the string
> "md_backup_data-1".  If you find that, you are close to getting the
> backup data back.

Hmmm.  This feature has advanced beyond my last look at the code.  I was
under the impression the backup option was only optional when mdadm
could move the data offset.  Does this new algorithm apply to v0.90
metadata, a v3.2 kernel, and v3.2.5 mdadm?

Phil
* Re: How to recover after md crash during reshape?
  2015-10-21 12:18 ` Phil Turmel
@ 2015-10-21 20:26 ` Neil Brown
  2015-10-21 20:37 ` Phil Turmel
  0 siblings, 1 reply; 24+ messages in thread
From: Neil Brown @ 2015-10-21 20:26 UTC (permalink / raw)
To: Phil Turmel, andras, linux-raid

Phil Turmel <philip@turmel.org> writes:

> Good morning Neil,
>
> On 10/20/2015 09:35 PM, Neil Brown wrote:
>
>> Nothing dumb about that - you don't need a --backup option.
>> If you did, mdadm would have complained.
>>
>> You only need --backup when the size of the array is unchanged or
>> decreasing.
>>
>> mdadm reads the first few stripes and stores them somewhere in each of
>> the spares.  md (in the kernel) then reads those stripes again and
>> writes them out in the new configuration.  It appears that one of the
>> writes failed, others might have succeeded.  This may not have corrupted
>> anything (the first few blocks are in the same position for both the old
>> and new layout) but it might have done.
>>
>> If you do want to look for the backup, it is around about the middle of
>> the device and has some metadata which contains the string
>> "md_backup_data-1".  If you find that, you are close to getting the
>> backup data back.
>
> Hmmm.  This feature has advanced beyond my last look at the code.  I was
> under the impression the backup option was only optional when mdadm
> could move the data offset.  Does this new algorithm apply to v0.90
> metadata, a v3.2 kernel, and v3.2.5 mdadm?

It isn't a new algorithm, it is the original algorithm.

In mdadm-2.4-pre1 (March 2006), you couldn't specify a backup file, but
you could grow a raid5 to more devices.
That was changed by a patch with comment:

    Allow resize to backup to a file.

    To support resizing an array without a spare, mdadm now understands
    --backup-file= which should point to a file for storing a backup of
    critical data.
    This can be given to --grow which will create the file, or
    --assemble which will restore from the file if needed.

The backup-file was subsequently used to support in-place reshapes and
array shrinking.

NeilBrown
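The save-then-rewrite mechanism in the patch comment above can be illustrated in miniature. This is a toy analogy on a temporary file, not mdadm's real on-disk backup format; in actual use the analogous commands are "mdadm --grow ... --backup-file=FILE" to create the backup and "mdadm --assemble ... --backup-file=FILE" to restore from it, as the patch comment describes (the FILE path is up to you).

```shell
# Toy illustration of the --backup-file idea: save the critical region
# before rewriting it, so an interrupted "reshape" can be rolled back.
img=$(mktemp)
backup=$(mktemp)
printf 'AAAABBBBCCCCDDDD' > "$img"                    # stand-in for the array
# 1. back up the critical section (first 8 bytes) before touching it
dd if="$img" of="$backup" bs=1 count=8 2>/dev/null
# 2. start rewriting it in the new layout, but "crash" halfway through
printf 'XXXX' | dd of="$img" bs=1 conv=notrunc 2>/dev/null
# 3. recovery: restore the critical section from the backup on assemble
dd if="$backup" of="$img" bs=1 count=8 conv=notrunc 2>/dev/null
restored=$(cat "$img")
echo "$restored"          # original contents are intact again
rm -f "$img" "$backup"
```

The point of the design is that the stripes being relocated are the only window of vulnerability, so only that small region needs a backup, not the whole array.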
* Re: How to recover after md crash during reshape?
  2015-10-21 20:26 ` Neil Brown
@ 2015-10-21 20:37 ` Phil Turmel
  0 siblings, 0 replies; 24+ messages in thread
From: Phil Turmel @ 2015-10-21 20:37 UTC (permalink / raw)
To: Neil Brown, andras, linux-raid

On 10/21/2015 04:26 PM, Neil Brown wrote:

> Phil Turmel <philip@turmel.org> writes:
>
>> Hmmm.  This feature has advanced beyond my last look at the code.  I was
>> under the impression the backup option was only optional when mdadm
>> could move the data offset.  Does this new algorithm apply to v0.90
>> metadata, a v3.2 kernel, and v3.2.5 mdadm?
>
> It isn't a new algorithm, it is the original algorithm.
>
> In mdadm-2.4-pre1 (March 2006), you couldn't specify a backup file, but
> you could grow a raid5 to more devices.
> That was changed by a patch with comment:
>
>     Allow resize to backup to a file.
>
>     To support resizing an array without a spare, mdadm now understands
>     --backup-file= which should point to a file for storing a backup of
>     critical data.
>     This can be given to --grow which will create the file, or
>     --assemble which will restore from the file if needed.
>
> The backup-file was subsequently used to support in-place reshapes and
> array shrinking.

Ah, ok.  I wasn't using parity raid that far back, and never noticed
that growing to more devices worked that way.  Thanks for clarifying.

Phil
end of thread, other threads:[~2015-11-03 23:42 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-20  2:35 How to recover after md crash during reshape? andras
2015-10-20 12:50 ` Anugraha Sinha
2015-10-20 13:04 ` Wols Lists
2015-10-20 13:49 ` Phil Turmel
[not found] ` <3baf849321d819483c5d20c005a31844@tantosonline.com>
2015-10-20 15:42 ` Phil Turmel
2015-10-20 22:34 ` Anugraha Sinha
2015-10-21  3:52 ` andras
2015-10-21 12:01 ` Phil Turmel
2015-10-21 16:17 ` Wols Lists
2015-10-21 16:05 ` Phil Turmel
2015-10-25 14:15 ` andras
2015-10-25 23:02 ` Phil Turmel
2015-10-28 16:31 ` Andras Tantos
2015-10-28 16:42 ` Phil Turmel
2015-10-28 17:10 ` Andras Tantos
2015-10-28 17:38 ` Phil Turmel
2015-10-29 16:59 ` Andras Tantos
2015-10-30 18:12 ` Phil Turmel
2015-11-03 23:42 ` How to recover after md crash during reshape? - SOLVED/SUMMARY Andras Tantos
2015-10-21  1:35 ` How to recover after md crash during reshape? Neil Brown
2015-10-21  4:03 ` andras
2015-10-21 12:18 ` Phil Turmel
2015-10-21 20:26 ` Neil Brown
2015-10-21 20:37 ` Phil Turmel