* Accidental grow before add
@ 2010-09-26  7:27 Mike Hartman
  2010-09-26  9:39 ` Mike Hartman
  2010-09-27  8:11 ` Jon Hardcastle
  0 siblings, 2 replies; 14+ messages in thread
From: Mike Hartman @ 2010-09-26  7:27 UTC (permalink / raw)
  To: linux-raid

I think I may have mucked up my array, but I'm hoping somebody can
give me a tip to retrieve the situation.

I had just added a new disk to my system and partitioned it in
preparation for adding it to my RAID 6 array, growing it from 7
devices to 8. However, I jumped the gun (guess I'm more tired than I
thought) and ran the grow command before I added the new disk to the
array as a spare.

In other words, I should have run:

mdadm --add /dev/md0 /dev/md3p1
mdadm --grow /dev/md0 --raid-devices=8 --backup-file=/grow_md0.bak

but instead I just ran

mdadm --grow /dev/md0 --raid-devices=8 --backup-file=/grow_md0.bak

I immediately checked /proc/mdstat and got the following output:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sdk1[0] md2p1[7] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
      7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/7] [UUUUUUU_]
      [>....................]  reshape =  0.0% (79600/1464845568) finish=3066.3min speed=7960K/sec

md3 : active raid0 sdb1[0] sdh1[1]
      1465141760 blocks super 1.2 128k chunks

md2 : active raid0 sdc1[0] sdd1[1]
      1465141760 blocks super 1.2 128k chunks

md1 : active raid0 sdi1[0] sdm1[1]
      1465141760 blocks super 1.2 128k chunks

unused devices: <none>

At this point I figured I was probably ok. It looked like it was
restructuring the array to expect 8 disks, and with only 7 it would
just end up being in a degraded state. So I figured I'd just cost
myself some time - one reshape to get to the degraded 8 disk state,
and another reshape to activate the new disk instead of just the one
reshape onto the new disk. I went ahead and added the new disk as a
spare, figuring the current reshape operation would ignore it until it
completed, and then the system would notice it was degraded with a
spare available and rebuild it.

However, things have slowed to a crawl (relative to the time it
normally takes to regrow this array) so I'm afraid something has gone
wrong. As you can see in the initial mdstat above, it started at
7960K/sec - quite fast for a reshape on this array. But just a couple
minutes after that it had dropped down to only 667K. It worked its way
back up through 1801K to 10277K, which is about average for a reshape
on this array. Not sure how long it stayed at that level, but now
(still only 10 or 15 minutes after the original mistake) it's plunged
all the way down to 40K/s. It's been down at this level for several
minutes and still dropping slowly. This doesn't strike me as a good
sign for the health of the unusual regrow operation.

Anybody have a theory on what could be causing the slowness? Does it
seem like a reasonable consequence of growing an array without a spare
attached? I'm hoping that this particular growing mistake isn't
automatically fatal or mdadm would have warned me or asked for a
confirmation or something. Worst case scenario I'm hoping the array
survives even if I just have to live with this speed and wait for it
to finish - although at the current rate that would take over a
year... Dare I mount the array's partition to check on the contents,
or would that risk messing it up worse?

Here's the latest /proc/mdstat:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 md3p1[8](S) sdk1[0] md2p1[7] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
      7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/7] [UUUUUUU_]
      [>....................]  reshape =  0.1% (1862640/1464845568) finish=628568.8min speed=38K/sec

md3 : active raid0 sdb1[0] sdh1[1]
      1465141760 blocks super 1.2 128k chunks

md2 : active raid0 sdc1[0] sdd1[1]
      1465141760 blocks super 1.2 128k chunks

md1 : active raid0 sdi1[0] sdm1[1]
      1465141760 blocks super 1.2 128k chunks

unused devices: <none>

Mike
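
For anyone following along in a similar situation, this is roughly how a reshape like this can be monitored. This is only a sketch: /dev/md0 is the device name from this thread, and the sysctl value shown is just an example.

# Watch reshape progress and the reported speed
watch -n 30 cat /proc/mdstat

# Per-member state: which devices are active, spare or faulty
mdadm --detail /dev/md0

# The kernel throttles resync/reshape between these limits (in KB/s);
# raising the minimum can help if other I/O is starving the reshape
cat /proc/sys/dev/raid/speed_limit_min
cat /proc/sys/dev/raid/speed_limit_max
echo 50000 > /proc/sys/dev/raid/speed_limit_min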


* Re: Accidental grow before add
  2010-09-26  7:27 Accidental grow before add Mike Hartman
@ 2010-09-26  9:39 ` Mike Hartman
  2010-09-26  9:54   ` Mike Hartman
  2010-09-27  8:11 ` Jon Hardcastle
  1 sibling, 1 reply; 14+ messages in thread
From: Mike Hartman @ 2010-09-26  9:39 UTC (permalink / raw)
  To: linux-raid

> I think I may have mucked up my array, but I'm hoping somebody can
> give me a tip to retrieve the situation.
>
> I had just added a new disk to my system and partitioned it in
> preparation for adding it to my RAID 6 array, growing it from 7
> devices to 8. However, I jumped the gun (guess I'm more tired than I
> thought) and ran the grow command before I added the new disk to the
> array as a spare.
>
> In other words, I should have run:
>
> mdadm --add /dev/md0 /dev/md3p1
> mdadm --grow /dev/md0 --raid-devices=8 --backup-file=/grow_md0.bak
>
> but instead I just ran
>
> mdadm --grow /dev/md0 --raid-devices=8 --backup-file=/grow_md0.bak
>
> I immediately checked /proc/mdstat and got the following output:
>
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md2p1[7] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>      7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2
> [8/7] [UUUUUUU_]
>      [>....................]  reshape =  0.0% (79600/1464845568)
> finish=3066.3min speed=7960K/sec
>
> md3 : active raid0 sdb1[0] sdh1[1]
>      1465141760 blocks super 1.2 128k chunks
>
> md2 : active raid0 sdc1[0] sdd1[1]
>      1465141760 blocks super 1.2 128k chunks
>
> md1 : active raid0 sdi1[0] sdm1[1]
>      1465141760 blocks super 1.2 128k chunks
>
> unused devices: <none>
>
> At this point I figured I was probably ok. It looked like it was
> restructuring the array to expect 8 disks, and with only 7 it would
> just end up being in a degraded state. So I figured I'd just cost
> myself some time - one reshape to get to the degraded 8 disk state,
> and another reshape to activate the new disk instead of just the one
> reshape onto the new disk. I went ahead and added the new disk as a
> spare, figuring the current reshape operation would ignore it until it
> completed, and then the system would notice it was degraded with a
> spare available and rebuild it.
>
> However, things have slowed to a crawl (relative to the time it
> normally takes to regrow this array) so I'm afraid something has gone
> wrong. As you can see in the initial mdstat above, it started at
> 7960K/sec - quite fast for a reshape on this array. But just a couple
> minutes after that it had dropped down to only 667K. It worked its way
> back up through 1801K to 10277K, which is about average for a reshape
> on this array. Not sure how long it stayed at that level, but now
> (still only 10 or 15 minutes after the original mistake) it's plunged
> all the way down to 40K/s. It's been down at this level for several
> minutes and still dropping slowly. This doesn't strike me as a good
> sign for the health of the unusual regrow operation.
>
> Anybody have a theory on what could be causing the slowness? Does it
> seem like a reasonable consequence to growing an array without a spare
> attached? I'm hoping that this particular growing mistake isn't
> automatically fatal or mdadm would have warned me or asked for a
> confirmation or something. Worst case scenario I'm hoping the array
> survives even if I just have to live with this speed and wait for it
> to finish - although at the current rate that would take over a
> year... Dare I mount the array's partition to check on the contents,
> or would that risk messing it up worse?
>
> Here's the latest /proc/mdstat:
>
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 md3p1[8](S) sdk1[0] md2p1[7] sde1[6] sdf1[5]
> md1p1[4] sdl1[3] sdj1[1]
>      7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2
> [8/7] [UUUUUUU_]
>      [>....................]  reshape =  0.1% (1862640/1464845568)
> finish=628568.8min speed=38K/sec
>
> md3 : active raid0 sdb1[0] sdh1[1]
>      1465141760 blocks super 1.2 128k chunks
>
> md2 : active raid0 sdc1[0] sdd1[1]
>      1465141760 blocks super 1.2 128k chunks
>
> md1 : active raid0 sdi1[0] sdm1[1]
>      1465141760 blocks super 1.2 128k chunks
>
> unused devices: <none>
>
> Mike
>

And now the speed has picked back up to the normal rate (for now), but
for some reason it has marked one of the existing drives as failed.
Especially weird since that "drive" is one of my RAID 0s, and its
component disks look fine. Since I was already "missing" the drive I
forgot to add, that leaves me with no more room for failures. I have
no idea why mdadm has decided this other drive failed (the timing is
awfully coincidental) but if whatever it is happens again I'm really
in trouble. Here's the latest mdstat:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 md3p1[8](S) sdk1[0] md2p1[7](F) sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
      7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/6] [UUUUUU__]
      [>....................]  reshape =  3.1% (45582368/1464845568) finish=2251.5min speed=10505K/sec

md3 : active raid0 sdb1[0] sdh1[1]
      1465141760 blocks super 1.2 128k chunks

md2 : active raid0 sdc1[0] sdd1[1]
      1465141760 blocks super 1.2 128k chunks

md1 : active raid0 sdi1[0] sdm1[1]
      1465141760 blocks super 1.2 128k chunks

unused devices: <none>

Mike


* Re: Accidental grow before add
  2010-09-26  9:39 ` Mike Hartman
@ 2010-09-26  9:54   ` Mike Hartman
  2010-09-26  9:59     ` Mikael Abrahamsson
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Hartman @ 2010-09-26  9:54 UTC (permalink / raw)
  To: linux-raid

>> I think I may have mucked up my array, but I'm hoping somebody can
>> give me a tip to retrieve the situation.
>>
>> I had just added a new disk to my system and partitioned it in
>> preparation for adding it to my RAID 6 array, growing it from 7
>> devices to 8. However, I jumped the gun (guess I'm more tired than I
>> thought) and ran the grow command before I added the new disk to the
>> array as a spare.
>>
>> In other words, I should have run:
>>
>> mdadm --add /dev/md0 /dev/md3p1
>> mdadm --grow /dev/md0 --raid-devices=8 --backup-file=/grow_md0.bak
>>
>> but instead I just ran
>>
>> mdadm --grow /dev/md0 --raid-devices=8 --backup-file=/grow_md0.bak
>>
>> I immediately checked /proc/mdstat and got the following output:
>>
>> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
>> md0 : active raid6 sdk1[0] md2p1[7] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>>      7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2
>> [8/7] [UUUUUUU_]
>>      [>....................]  reshape =  0.0% (79600/1464845568)
>> finish=3066.3min speed=7960K/sec
>>
>> md3 : active raid0 sdb1[0] sdh1[1]
>>      1465141760 blocks super 1.2 128k chunks
>>
>> md2 : active raid0 sdc1[0] sdd1[1]
>>      1465141760 blocks super 1.2 128k chunks
>>
>> md1 : active raid0 sdi1[0] sdm1[1]
>>      1465141760 blocks super 1.2 128k chunks
>>
>> unused devices: <none>
>>
>> At this point I figured I was probably ok. It looked like it was
>> restructuring the array to expect 8 disks, and with only 7 it would
>> just end up being in a degraded state. So I figured I'd just cost
>> myself some time - one reshape to get to the degraded 8 disk state,
>> and another reshape to activate the new disk instead of just the one
>> reshape onto the new disk. I went ahead and added the new disk as a
>> spare, figuring the current reshape operation would ignore it until it
>> completed, and then the system would notice it was degraded with a
>> spare available and rebuild it.
>>
>> However, things have slowed to a crawl (relative to the time it
>> normally takes to regrow this array) so I'm afraid something has gone
>> wrong. As you can see in the initial mdstat above, it started at
>> 7960K/sec - quite fast for a reshape on this array. But just a couple
>> minutes after that it had dropped down to only 667K. It worked its way
>> back up through 1801K to 10277K, which is about average for a reshape
>> on this array. Not sure how long it stayed at that level, but now
>> (still only 10 or 15 minutes after the original mistake) it's plunged
>> all the way down to 40K/s. It's been down at this level for several
>> minutes and still dropping slowly. This doesn't strike me as a good
>> sign for the health of the unusual regrow operation.
>>
>> Anybody have a theory on what could be causing the slowness? Does it
>> seem like a reasonable consequence to growing an array without a spare
>> attached? I'm hoping that this particular growing mistake isn't
>> automatically fatal or mdadm would have warned me or asked for a
>> confirmation or something. Worst case scenario I'm hoping the array
>> survives even if I just have to live with this speed and wait for it
>> to finish - although at the current rate that would take over a
>> year... Dare I mount the array's partition to check on the contents,
>> or would that risk messing it up worse?
>>
>> Here's the latest /proc/mdstat:
>>
>> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
>> md0 : active raid6 md3p1[8](S) sdk1[0] md2p1[7] sde1[6] sdf1[5]
>> md1p1[4] sdl1[3] sdj1[1]
>>      7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2
>> [8/7] [UUUUUUU_]
>>      [>....................]  reshape =  0.1% (1862640/1464845568)
>> finish=628568.8min speed=38K/sec
>>
>> md3 : active raid0 sdb1[0] sdh1[1]
>>      1465141760 blocks super 1.2 128k chunks
>>
>> md2 : active raid0 sdc1[0] sdd1[1]
>>      1465141760 blocks super 1.2 128k chunks
>>
>> md1 : active raid0 sdi1[0] sdm1[1]
>>      1465141760 blocks super 1.2 128k chunks
>>
>> unused devices: <none>
>>
>> Mike
>>
>
> And now the speed has picked back up to the normal rate (for now), but
> for some reason it has marked one of the existing drives as failed.
> Especially weird since that "drive" is one of my RAID 0s, and its
> component disks look fine. Since I was already "missing" the drive I
> forgot to add, that leaves me with no more room for failures. I have
> no idea why mdadm has decided this other drive failed (the timing is
> awfully coincidental) but if whatever it is happens again I'm really
> in trouble. Here's the latest mdstat:
>
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 md3p1[8](S) sdk1[0] md2p1[7](F) sde1[6] sdf1[5]
> md1p1[4] sdl1[3] sdj1[1]
>      7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2
> [8/6] [UUUUUU__]
>      [>....................]  reshape =  3.1% (45582368/1464845568)
> finish=2251.5min speed=10505K/sec
>
> md3 : active raid0 sdb1[0] sdh1[1]
>      1465141760 blocks super 1.2 128k chunks
>
> md2 : active raid0 sdc1[0] sdd1[1]
>      1465141760 blocks super 1.2 128k chunks
>
> md1 : active raid0 sdi1[0] sdm1[1]
>      1465141760 blocks super 1.2 128k chunks
>
> unused devices: <none>
>
> Mike
>

I've stopped /dev/md0 with "mdadm --stop /dev/md0" because I'm just
too worried about what might happen to the array if another component
mysteriously fails.

Most pressing question: if md2 reports all component drives healthy
and correct in mdstat, then why would md2's md2p1 partition suddenly
show up as a failed component of md0 (since it's obvious the
underlying hardware is ok)? And does a drive "failing" during a
reshape corrupt the reshape (in which case irreparable damage has
already been done)?

Assuming that my array isn't already destroyed, and assuming that this
mysterious failure without any hard drives failing doesn't crop up
again, is there any way to force the array to immediately start
incorporating spares introduced after the reshape began (both the new
drive and now the one that "failed")? Because right now I'm in the
position of needing to complete a multi-day reshape operation with no
safety net at all and that scares the hell out of me.

Mike

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Accidental grow before add
  2010-09-26  9:54   ` Mike Hartman
@ 2010-09-26  9:59     ` Mikael Abrahamsson
  2010-09-26 10:18       ` Mike Hartman
  0 siblings, 1 reply; 14+ messages in thread
From: Mikael Abrahamsson @ 2010-09-26  9:59 UTC (permalink / raw)
  To: Mike Hartman; +Cc: linux-raid

On Sun, 26 Sep 2010, Mike Hartman wrote:

> I've stopped /dev/md0 with "mdadm --stop /dev/md0" because I'm just
> too worried about what might happen to the array if another component
> mysteriously fails.

You need to start looking in dmesg / other logs to see what has happened 
and why things have failed. Without that information it's impossible to 
tell what's going on.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
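
A rough sketch of the kind of log digging Mikael is suggesting (log file locations vary by distribution; the device names are the ones from the mdstat output earlier in the thread):

# Recent kernel messages for the md driver and the suspect devices
dmesg | grep -i -E 'md0|md2|sdc|sdd|ata'

# Older messages, in case the kernel ring buffer has already wrapped
grep -i -E 'md0|sdd' /var/log/messages | tail -n 200

# Array state and the per-member metadata recorded in the superblocks
mdadm --detail /dev/md0
mdadm --examine /dev/md2p1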


* Re: Accidental grow before add
  2010-09-26  9:59     ` Mikael Abrahamsson
@ 2010-09-26 10:18       ` Mike Hartman
  2010-09-26 10:38         ` Robin Hill
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Hartman @ 2010-09-26 10:18 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid

> You need to start looking in dmesg / other logs to see what has happened and
> why things have failed. Without that information it's impossible to tell
> what's going on.
>
> --
> Mikael Abrahamsson    email: swmike@swm.pp.se
>

I've uploaded the dmesg output starting with the reshape to
www.hartmanipulation.com/raid/dmesg_6.txt. It looks like /dev/sdd is
having some kind of intermittent read issues (which wasn't happening
before the reshape started) but I still don't understand why it
wouldn't be marked as failed in the md2 section of mdstat, since md0
is accessing it via md2.

At any rate, that doesn't help me with my most immediate issue: does a
drive failing during a reshape corrupt the array? Or am I safe to
resume the reshape? Is there any way to restore my safety net a bit
before resuming the reshape, or will I just have to hope nothing else
goes wrong between now and the time the new hot spare is finally
incorporated?

Mike


* Re: Accidental grow before add
  2010-09-26 10:18       ` Mike Hartman
@ 2010-09-26 10:38         ` Robin Hill
  2010-09-26 19:34           ` Mike Hartman
  0 siblings, 1 reply; 14+ messages in thread
From: Robin Hill @ 2010-09-26 10:38 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1803 bytes --]

On Sun Sep 26, 2010 at 06:18:09AM -0400, Mike Hartman wrote:

> > You need to start looking in dmesg / other logs to see what has happened and
> > why things have failed. Without that information it's impossible to tell
> > what's going on.
> >
> I've uploaded the dmesg output starting with the reshape to
> www.hartmanipulation.com/raid/dmesg_6.txt. It looks like /dev/sdd is
> having some kind of intermittent read issues (which wasn't happening
> before the reshape started) but I still don't understand why it
> wouldn't be marked as failed in the md2 section of mdstat, since md0
> is accessing it via md2.
> 
I think this is because it's a RAID0 array.  It can't fail the device
without (irrecoverably) failing the array, so it's left to the normal
block device error reporting/handling process.

> At any rate, that doesn't help me with my most immediate issue: does a
> drive failing during a reshape corrupt the array? Or am I safe to
> resume the reshape? Is there any way to restore my safety net a bit
> before resuming the reshape, or will I just have to hope nothing else
> goes wrong between now and the time the new hot spare is finally
> incorporated?
> 
Failure of a device during the reshape certainly shouldn't corrupt the
array (I don't see how it would anyway, unless there's a screw-up in the
code).  I don't think there's any way to "restore your safety net"
though (short of imaging all the drives as backups), but it's probably
worth while doing a read test of all member devices before you continue.

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]
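
A sketch of how one might confirm that picture - that the nested RAID0 itself still looks healthy while one of its physical members is the real problem. Device names are assumed from the thread, and smartctl comes from the smartmontools package:

# md's view of the nested RAID0 and of the parent array
mdadm --detail /dev/md2
mdadm --detail /dev/md0

# SMART health of the physical disks behind md2
smartctl -H /dev/sdc
smartctl -H /dev/sdd
# Attributes that usually betray a dying disk
smartctl -a /dev/sdd | grep -i -E 'reallocated|pending|uncorrect'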


* Re: Accidental grow before add
  2010-09-26 10:38         ` Robin Hill
@ 2010-09-26 19:34           ` Mike Hartman
  2010-09-26 21:22             ` Robin Hill
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Hartman @ 2010-09-26 19:34 UTC (permalink / raw)
  To: linux-raid

>> > You need to start looking in dmesg / other logs to see what has happened and
>> > why things have failed. Without that information it's impossible to tell
>> > what's going on.
>> >
>> I've uploaded the dmesg output starting with the reshape to
>> www.hartmanipulation.com/raid/dmesg_6.txt. It looks like /dev/sdd is
>> having some kind of intermittent read issues (which wasn't happening
>> before the reshape started) but I still don't understand why it
>> wouldn't be marked as failed in the md2 section of mdstat, since md0
>> is accessing it via md2.
>>
> I think this is because it's a RAID0 array.  It can't fail the device
> without (irrecoverably) failing the array, so it's left to the normal
> block device error reporting/handling process.

I guess that makes sense.

>
>> At any rate, that doesn't help me with my most immediate issue: does a
>> drive failing during a reshape corrupt the array? Or am I safe to
>> resume the reshape? Is there any way to restore my safety net a bit
>> before resuming the reshape, or will I just have to hope nothing else
>> goes wrong between now and the time the new hot spare is finally
>> incorporated?
>>
> Failure of a device during the reshape certainly shouldn't corrupt the
> array (I don't see how it would anyway, unless there's a screw-up in the
> code).

I guess I was thinking that the reshape was restriping all the data
under the assumption of 7 (or 8) drives, and one failing might change
the restriping requirements in midstream and leave it in an
unrecoverable in-between state. Very glad to hear that's not the case.

> I don't think there's any way to "restore your safety net"
> though (short of imaging all the drives as backups), but it's probably
> worth while doing a read test of all member devices before you continue.

Can you recommend a good way to perform such a read test? Would I just
dd the entire contents of each disk to /dev/null or is there a more
efficient way of doing it?

>
> Cheers,
>    Robin
> --
>     ___
>    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
>   / / )      | Little Jim says ....                            |
>  // !!       |      "He fallen in de water !!"                 |

Thanks for the help Robin!

Mike


* Re: Accidental grow before add
  2010-09-26 19:34           ` Mike Hartman
@ 2010-09-26 21:22             ` Robin Hill
  0 siblings, 0 replies; 14+ messages in thread
From: Robin Hill @ 2010-09-26 21:22 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1901 bytes --]

On Sun Sep 26, 2010 at 03:34:17PM -0400, Mike Hartman wrote:

> >> At any rate, that doesn't help me with my most immediate issue: does a
> >> drive failing during a reshape corrupt the array? Or am I safe to
> >> resume the reshape? Is there any way to restore my safety net a bit
> >> before resuming the reshape, or will I just have to hope nothing else
> >> goes wrong between now and the time the new hot spare is finally
> >> incorporated?
> >>
> > Failure of a device during the reshape certainly shouldn't corrupt the
> > array (I don't see how it would anyway, unless there's a screw-up in the
> > code).
> 
> I guess I was thinking that the reshape was restriping all the data
> under the assumption of 7 (or 8) drives, and one failing might change
> the restriping requirements in midstream and leave it in an
> unrecoverable in-between state. Very glad to hear that's not the case.
> 
I've not looked at the code, so I can't be certain.  I'm pretty sure
I've had this happen to me during a reshape though, without any issues.

> > I don't think there's any way to "restore your safety net"
> > though (short of imaging all the drives as backups), but it's probably
> > worth while doing a read test of all member devices before you continue.
> 
> Can you recommend a good way to perform such a read test? Would I just
> dd the entire contents of each disk to /dev/null or is there a more
> efficient way of doing it?
> 
That's what I'd do anyway.  You could also try running a full SMART
test - that should pick up anything.  I'd still go with dd though, as
I'm more confident of what that's actually doing.

Good luck,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]
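
A minimal sketch of the read test and SMART check being discussed (device names assumed from the thread; dd here only reads, so it writes nothing to the disks):

# Read every sector of one member disk; any read error shows up in dmesg.
# Repeat for each member (sdb, sdc, sdd, sde, sdf, sdh, sdi, sdj, sdk, sdl, sdm).
dd if=/dev/sdd of=/dev/null bs=1M conv=noerror
# On Linux, 'kill -USR1 <pid of dd>' makes dd print its progress so far.

# Alternative/cross-check: a long SMART self-test, then read the results
smartctl -t long /dev/sdd
smartctl -l selftest /dev/sdd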


* Re: Accidental grow before add
  2010-09-26  7:27 Accidental grow before add Mike Hartman
  2010-09-26  9:39 ` Mike Hartman
@ 2010-09-27  8:11 ` Jon Hardcastle
  2010-09-27  9:05   ` Mike Hartman
  1 sibling, 1 reply; 14+ messages in thread
From: Jon Hardcastle @ 2010-09-27  8:11 UTC (permalink / raw)
  To: linux-raid, Mike Hartman

--- On Sun, 26/9/10, Mike Hartman <mike@hartmanipulation.com> wrote:

> From: Mike Hartman <mike@hartmanipulation.com>
> Subject: Accidental grow before add
> To: linux-raid@vger.kernel.org
> Date: Sunday, 26 September, 2010, 8:27
> I think I may have mucked up my array, but I'm hoping somebody can
> give me a tip to retrieve the situation.
>
> I had just added a new disk to my system and partitioned it in
> preparation for adding it to my RAID 6 array, growing it from 7
> devices to 8. However, I jumped the gun (guess I'm more tired than I
> thought) and ran the grow command before I added the new disk to the
> array as a spare.
>
> In other words, I should have run:
>
> mdadm --add /dev/md0 /dev/md3p1
> mdadm --grow /dev/md0 --raid-devices=8 --backup-file=/grow_md0.bak
>
> but instead I just ran
>
> mdadm --grow /dev/md0 --raid-devices=8 --backup-file=/grow_md0.bak
>
> I immediately checked /proc/mdstat and got the following output:
>
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md2p1[7] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/7] [UUUUUUU_]
>       [>....................]  reshape =  0.0% (79600/1464845568) finish=3066.3min speed=7960K/sec
>
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> md2 : active raid0 sdc1[0] sdd1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> unused devices: <none>
>
> At this point I figured I was probably ok. It looked like it was
> restructuring the array to expect 8 disks, and with only 7 it would
> just end up being in a degraded state. So I figured I'd just cost
> myself some time - one reshape to get to the degraded 8 disk state,
> and another reshape to activate the new disk instead of just the one
> reshape onto the new disk. I went ahead and added the new disk as a
> spare, figuring the current reshape operation would ignore it until it
> completed, and then the system would notice it was degraded with a
> spare available and rebuild it.
>
> However, things have slowed to a crawl (relative to the time it
> normally takes to regrow this array) so I'm afraid something has gone
> wrong. As you can see in the initial mdstat above, it started at
> 7960K/sec - quite fast for a reshape on this array. But just a couple
> minutes after that it had dropped down to only 667K. It worked its way
> back up through 1801K to 10277K, which is about average for a reshape
> on this array. Not sure how long it stayed at that level, but now
> (still only 10 or 15 minutes after the original mistake) it's plunged
> all the way down to 40K/s. It's been down at this level for several
> minutes and still dropping slowly. This doesn't strike me as a good
> sign for the health of the unusual regrow operation.
>
> Anybody have a theory on what could be causing the slowness? Does it
> seem like a reasonable consequence to growing an array without a spare
> attached? I'm hoping that this particular growing mistake isn't
> automatically fatal or mdadm would have warned me or asked for a
> confirmation or something. Worst case scenario I'm hoping the array
> survives even if I just have to live with this speed and wait for it
> to finish - although at the current rate that would take over a
> year... Dare I mount the array's partition to check on the contents,
> or would that risk messing it up worse?
>
> Here's the latest /proc/mdstat:
>
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 md3p1[8](S) sdk1[0] md2p1[7] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/7] [UUUUUUU_]
>       [>....................]  reshape =  0.1% (1862640/1464845568) finish=628568.8min speed=38K/sec
>
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> md2 : active raid0 sdc1[0] sdd1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> unused devices: <none>
>
> Mike
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

I am more interested to know why it kicked off a reshape that would leave the array in a degraded state without a warning or needing a '--force'. Are you sure there wasn't capacity to 'grow' anyway?

Also, when I first ran my reshape (going from RAID 5 to RAID 6) it was incredibly slow, though - it literally took days.


      


* Re: Accidental grow before add
  2010-09-27  8:11 ` Jon Hardcastle
@ 2010-09-27  9:05   ` Mike Hartman
  2010-09-28 15:14     ` Nagilum
  2010-09-30 16:13     ` Mike Hartman
  0 siblings, 2 replies; 14+ messages in thread
From: Mike Hartman @ 2010-09-27  9:05 UTC (permalink / raw)
  To: Jon; +Cc: linux-raid

> I am more interested to know why it kicked off a reshape that would leave the array in a degraded state without a warning and
> needing a '--force' are you sure there wasn't capacity to 'grow' anyway?

Positive. I had no spare of any kind and mdstat was showing all disks
were in use. Now I've got the new drive in there as a spare, but it
was added after the reshape started and mdadm doesn't seem to be
trying to use it yet. I'm thinking it's going through the original
reshape I kicked off (transforming it from an intact 7 disk RAID 6 to
a degraded 8 disk RAID 6) and then when it gets to the end it will run
another reshape to pick up the new spare. I too am surprised there
wasn't at least a warning if not a confirmation.

> Also, when i first ran my reshape it was incredibly slow from Raid5~6 tho.. it literally took days.

I did a RAID 5 -> RAID 6 conversion the other week and it was also
slower than a normal resizing, but only 2-2.5 times as slow. Adding a
new disk usually takes a bit less than 2 days on this array and that
conversion took closer to 4. However, at the slowest rate I reported
above it would have taken something like 11 months - definitely a whole
different ballpark.

At any rate, apparently one of my other drives in the array was
throwing some read errors. Eventually it did something unrecoverable
and was dropped from the array. Once that happened the speed returned
to a more normal level, but I stopped the arrays to run a complete
read test on every drive before continuing. With an already degraded
array, losing that drive killed any failure buffer I had left. I want
to make quite sure all the other drives will finish the reshape
properly before risking it. Then I guess it's just a matter of waiting
3 or 4 days for both reshapes to complete.

Mike


* Re: Accidental grow before add
  2010-09-27  9:05   ` Mike Hartman
@ 2010-09-28 15:14     ` Nagilum
  2010-10-05  5:18       ` Neil Brown
  2010-09-30 16:13     ` Mike Hartman
  1 sibling, 1 reply; 14+ messages in thread
From: Nagilum @ 2010-09-28 15:14 UTC (permalink / raw)
  To: Mike Hartman; +Cc: Jon, linux-raid


----- Message from mike@hartmanipulation.com ---------

>> I am more interested to know why it kicked off a reshape that would  
>> leave the array in a degraded state without a warning and
>> needing a '--force' are you sure there wasn't capacity to 'grow' anyway?
>
> Positive. I had no spare of any kind and mdstat was showing all disks
> were in use.

Yep, a warning/safety net would be good. At the moment mdadm assumes  
you know what you're doing.

> Now I've got the new drive in there as a spare, but it
> was added after the reshape started and mdadm doesn't seem to be
> trying to use it yet. I'm thinking it's going through the original
> reshape I kicked off (transforming it from an intact 7 disk RAID 6 to
> a degraded 8 disk RAID 6) and then when it gets to the end it will run
> another reshape to pick up the new spare.

Yes, that's what's going to happen.

>> Also, when i first ran my reshape it was incredibly slow from  
>> Raid5~6 tho.. it literally took days.
> I did a RAID 5 -> RAID 6 conversion the other week and it was also
> slower than a normal resizing, but only 2-2.5 times as slow. Adding a
> new disk usually takes a bit less than 2 days on this array and that
> conversion took closer to 4. However, at the slowest rate I reported
> above it would have taken something 11 months - definitely a whole
> different ballpark.

Yeah that was due to the disk errors.
I find "iostat -d 2 -kx" helpful to understand what's going on.

> At any rate, apparently one of my other drives in the array was
> throwing some read errors. Eventually it did something unrecoverable
> and was dropped from the array. Once that happened the speed returned
> to a more normal level, but I stopped the arrays to run a complete
> read test on every drive before continuing. With an already degraded
> array, losing that drive killed any failure buffer I had left. I want
> to make quite sure all the other drives will finish the reshape
> properly before risking it. Then I guess it's just a matter of waiting
> 3 or 4 days for both reshapes to complete.

Yep, I once got bitten by a Linux kernel bug that caused a RAID 5 to
become corrupt when a drive failed during a reshape. I managed to recover, though.
Since then I always do a raid-check before starting any changes.
Good luck and thanks for the story so far.
Alex.

========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@nagilum.org \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..
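
For reference, the monitoring Alex mentions, plus a quick way to see what the array is doing at any given moment (a sketch only; /dev/md0 as used in this thread):

# Per-device throughput and utilisation every 2 seconds, in kB, with
# extended stats - a slow or error-prone member tends to stand out quickly
iostat -d 2 -kx

# What md is currently doing on the array (idle, check, resync, recover, reshape)
cat /sys/block/md0/md/sync_action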


* Re: Accidental grow before add
  2010-09-27  9:05   ` Mike Hartman
  2010-09-28 15:14     ` Nagilum
@ 2010-09-30 16:13     ` Mike Hartman
  2010-10-05  6:24       ` Neil Brown
  1 sibling, 1 reply; 14+ messages in thread
From: Mike Hartman @ 2010-09-30 16:13 UTC (permalink / raw)
  To: Jon; +Cc: linux-raid

In the spirit of providing full updates for interested parties/future Googlers:

> I'm thinking it's going through the original
> reshape I kicked off (transforming it from an intact 7 disk RAID 6 to
> a degraded 8 disk RAID 6) and then when it gets to the end it will run
> another reshape to pick up the new spare.

Well that "first" reshape finally finished and it looks like it
actually did switch over to bringing in the new spare at some point in
midstream. I only noticed it after the reshape completed, but here's
the window where it happened.


23:02 (New spare still unused):

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sdk1[0] md3p1[8](S) sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
      7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/6] [UUUUUU__]
      [===============>.....]  reshape = 76.4% (1119168512/1464845568) finish=654.5min speed=8801K/sec

md3 : active raid0 sdb1[0] sdh1[1]
      1465141760 blocks super 1.2 128k chunks

md1 : active raid0 sdi1[0] sdm1[1]
      1465141760 blocks super 1.2 128k chunks

unused devices: <none>


23:03 (Spare flag is gone, although it's not marked as "Up" yet further down):

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
      8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/6] [UUUUUU__]
      [===============>.....]  recovery = 78.7% (1152999432/1464845568) finish=161.1min speed=32245K/sec

md3 : active raid0 sdb1[0] sdh1[1]
      1465141760 blocks super 1.2 128k chunks

md1 : active raid0 sdi1[0] sdm1[1]
      1465141760 blocks super 1.2 128k chunks

unused devices: <none>



14:57 (It seemed to stall at the percent complete above for about 16 hours):

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
      8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/6] [UUUUUU__]
      [===============>.....]  recovery = 79.1% (1160057740/1464845568) finish=161.3min speed=31488K/sec

md3 : active raid0 sdb1[0] sdh1[1]
      1465141760 blocks super 1.2 128k chunks

md1 : active raid0 sdi1[0] sdm1[1]
      1465141760 blocks super 1.2 128k chunks

unused devices: <none>



15:01 (And the leap forward):

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
      8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/6] [UUUUUU__]
      [==================>..]  recovery = 92.3% (1352535224/1464845568) finish=58.9min speed=31729K/sec

md3 : active raid0 sdb1[0] sdh1[1]
      1465141760 blocks super 1.2 128k chunks

md1 : active raid0 sdi1[0] sdm1[1]
      1465141760 blocks super 1.2 128k chunks

unused devices: <none>



16:05 (Finishing clean, with only the drive that failed in mid-reshape
still missing):

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
      8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/7] [UUUUUUU_]

md3 : active raid0 sdb1[0] sdh1[1]
      1465141760 blocks super 1.2 128k chunks

md1 : active raid0 sdi1[0] sdm1[1]
      1465141760 blocks super 1.2 128k chunks

unused devices: <none>


So it seemed to pause for about 16 hours to pull in the spare, but
that's 4-5 times faster than it would normally take to grow the array
onto a new one. I assume that's because I was already reshaping the
array to fit across 8 disks (they just weren't all there) so when it
saw the new one it only had to update the new disk. Hopefully it will
go that fast when I replace the other disk that died.

Everything seems to have worked out ok - I just did a forced fsck on
the filesystem and it didn't mention correcting anything. Mounted it
and everything seems to be intact. Hopefully this whole thread will be
useful for someone in a similar situation. Thanks to everyone for the
help.

Mike


* Re: Accidental grow before add
  2010-09-28 15:14     ` Nagilum
@ 2010-10-05  5:18       ` Neil Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2010-10-05  5:18 UTC (permalink / raw)
  To: Nagilum; +Cc: Mike Hartman, Jon, linux-raid

On Tue, 28 Sep 2010 17:14:51 +0200
Nagilum <nagilum@nagilum.org> wrote:

> 
> ----- Message from mike@hartmanipulation.com ---------
> 
> >> I am more interested to know why it kicked off a reshape that would  
> >> leave the array in a degraded state without a warning and
> >> needing a '--force' are you sure there wasn't capacity to 'grow' anyway?
> >
> > Positive. I had no spare of any kind and mdstat was showing all disks
> > were in use.
> 
> Yep, a warning/safety net would be good. At the moment mdadm assumes  
> you know what you're doing.
> 

I've added this to my list of possible enhancements for mdadm-3.2

Thanks,
NeilBrown


* Re: Accidental grow before add
  2010-09-30 16:13     ` Mike Hartman
@ 2010-10-05  6:24       ` Neil Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2010-10-05  6:24 UTC (permalink / raw)
  To: Mike Hartman; +Cc: Jon, linux-raid

On Thu, 30 Sep 2010 12:13:27 -0400
Mike Hartman <mike@hartmanipulation.com> wrote:

> In the spirit of providing full updates for interested parties/future Googlers:
> 
> > I'm thinking it's going through the original
> > reshape I kicked off (transforming it from an intact 7 disk RAID 6 to
> > a degraded 8 disk RAID 6) and then when it gets to the end it will run
> > another reshape to pick up the new spare.
> 
> Well that "first" reshape finally finished and it looks like it
> actually did switch over to bringing in the new spare at some point in
> midstream. I only noticed it after the reshape completed, but here's
> the window where it happened.
> 
> 
> 23:02 (New spare still unused):
> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md3p1[8](S) sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2
> [8/6] [UUUUUU__]
>       [===============>.....]  reshape = 76.4% (1119168512/1464845568)
> finish=654.5min speed=8801K/sec
> 
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>
> 
> 
> 23:03 (Spare flag is gone, although it's not marked as "Up" yet further down):
> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2
> [8/6] [UUUUUU__]
>       [===============>.....]  recovery = 78.7%
> (1152999432/1464845568) finish=161.1min speed=32245K/sec
> 
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>

This is really strange.  I cannot reproduce any behaviour like this.
What kernel are you using?

What should happen is that the reshape will continue to the end, and then a
recovery will start from the beginning of the array, incorporating the new
device.  This is what happens in my tests.

At about 84% the reshape should start going a lot faster as it no longer
needs to read data - it just writes zeros.  But there is nothing interesting
that can happen around 77%.





> 
> 
> 
> 14:57 (It seemed to stall at the percent complete above for about 16 hours):

This is also extremely odd.  I think you are saying that the 'speed' stayed
at a fairly normal level, but the 'recovery =' percent didn't change.
Looking at the code - that cannot happen!

Maybe there is a perfectly reasonable explanation - possibly dependant on the
particular kernel you are using - but I cannot see it.

I would certainly recommend a 'check' and a 'fsck' (if you haven't already).

NeilBrown




> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2
> [8/6] [UUUUUU__]
>       [===============>.....]  recovery = 79.1%
> (1160057740/1464845568) finish=161.3min speed=31488K/sec
> 
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>
> 
> 
> 
> 15:01 (And the leap forward):
> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2
> [8/6] [UUUUUU__]
>       [==================>..]  recovery = 92.3%
> (1352535224/1464845568) finish=58.9min speed=31729K/sec
> 
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>
> 
> 
> 
> 16:05 (Finishing clean, with only the drive that failed in mid-reshape
> still missing):
> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2
> [8/7] [UUUUUUU_]
> 
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>
> 
> 
> So it seemed to pause for about 16 hours to pull in the spare, but
> that's 4-5 times faster than it would normally take to grow the array
> onto a new one. I assume that's because I was already reshaping the
> array to fit across 8 disks (they just weren't all there) so when it
> saw the new one it only had to update the new disk. Hopefully it will
> go that fast when I replace the other disk that died.
> 
> Everything seems to have worked out ok - I just did a forced fsck on
> the filesystem and it didn't mention correcting anything. Mounted it
> and everything seems to be intact. Hopefully this whole thread will be
> useful for someone in a similar situation. Thanks to everyone for the
> help.
> 
> Mike
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
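
A hedged sketch of the 'check' and read-only fsck Neil recommends (the filesystem device below is a placeholder - substitute whatever partition on the array actually holds the filesystem):

# Trigger an md consistency check; progress appears in /proc/mdstat
echo check > /sys/block/md0/md/sync_action
# After it finishes, a non-zero value here means inconsistent stripes were found
cat /sys/block/md0/md/mismatch_cnt

# Read-only filesystem check: -n answers "no" to every repair prompt
fsck -f -n /dev/md0p1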

