Raid10 reshape bug

* Raid10 reshape bug
@ 2021-02-19 20:13 Phillip Susi
  2021-03-08 16:39 ` Phillip Susi
  0 siblings, 1 reply; 3+ messages in thread
From: Phillip Susi @ 2021-02-19 20:13 UTC
  To: linux-raid

In the process of upgrading a Xen server I broke the previous raid1 and
used the removed disk to create a new raid10 to prepare the new install.
I think initially I created it in the default near configuration, so I
reshaped it to offset with 1M chunk size.  I got the domUs up and
running again and was pretty happy with the result, so I blew away the
old system disk and added that disk to the new array and allowed it to
sync.  Then I thought that the 1M chunk size was hurting performance, so
I requested a reshape to a 256k chunk size with mdadm -G /dev/md0 -c
256.  It looked like it was proceeding fine so I went home for the
night.

When I came in this morning, mdadm -D showed that the reshape was
complete, but I started getting ELF errors and such running various
programs and I started to get a feeling that something had gone horribly
wrong.  At one point I was trying to run blockdev --getsz and instead
the system somehow ran findmnt.  mdadm -E showed that there was a very
large unused section of the disk both before and after.  This is
probably because I had used -s to restrict the used size of the device
to be only 256g instead of the full 2tb so it wouldn't take so long to
resync, and since there was plenty of unused space, md decided to just
write back the new layout stripes in unused space further down the disk.
At this point I rebooted and grub could not recognize the filesystem.  I
booted other media and tried an e2fsck, but it had so many complaints
(one being that the root directory was not, in fact, a directory, so it
deleted it) that I just gave up and started reinstalling and restoring
the domU from backup.

Clearly somehow the reshape process did NOT write the data back to the
disk in the correct place.  This was using Debian testing with Linux
5.10.0 and mdadm v4.1.

I will try to reproduce it in a vm at some point.
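
For a reproduction like that, one way to tell whether a reshape moved
data to the wrong place would be to checksum the array contents before
and after.  The following is only a minimal sketch in Python, not
something that was run as part of this report: the helper name
digest_prefix and its parameters are made up for illustration, and it
assumes the array is idle (nothing mounted or writing) while it is read.

```python
import hashlib


def digest_prefix(path, length, block_size=1 << 20):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`.

    `path` may be a regular file or a block device such as /dev/md0
    (reading a device normally needs root).  Hypothetical helper for
    illustration only.
    """
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            buf = f.read(min(block_size, remaining))
            if not buf:
                break  # file or device is shorter than `length`
            h.update(buf)
            remaining -= len(buf)
    return h.hexdigest()
```

The idea would be to record digest_prefix("/dev/md0", n) before
requesting the reshape, with n chosen to be no larger than the array
size both before and after, then compute it again once mdadm -D reports
the reshape as complete.  A chunk size change should not alter the data
the array presents, so differing digests on an otherwise idle array
would point to data having been written back in the wrong place.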
