linux-lvm.redhat.com archive mirror
* [linux-lvm] LVM RAID5 out-of-sync recovery
@ 2016-10-03 23:49 Slava Prisivko
  2016-10-04  9:45 ` Giuliano Procida
  0 siblings, 1 reply; 12+ messages in thread
From: Slava Prisivko @ 2016-10-03 23:49 UTC (permalink / raw)
  To: linux-lvm


Hi,

In order to mitigate cross-posting, here's the original question on
Serverfault.SE: LVM RAID5 out-of-sync recovery
<https://serverfault.com/questions/806805/lvm-raid5-out-of-sync-recovery>,
but feel free to answer wherever you deem appropriate.

How can one recover from an LVM RAID5 out-of-sync?

I have an LVM RAID5 configuration (RAID5 using the LVM tools).

However, because of a technical problem the mirrors went out of sync. You can
reproduce this as explained in this Unix & Linux question
<http://unix.stackexchange.com/a/182503>:

> Playing with my Jessie VM, I disconnected (virtually) one disk. That
worked, the machine stayed running. lvs, though, gave no indication the
arrays were degraded. I re-attached the disk, and removed a second. Stayed
running (this is raid6). Re-attached, still no indication from lvs. I ran
lvconvert --repair on the volume, it told me it was OK. Then I pulled a
third disk... and the machine died. Re-inserted it, rebooted, and am now
unsure how to fix.

If I had been using mdadm, I could have probably recovered the data using
`mdadm --force --assemble`, but I was not able to achieve the same using
the LVM tools.

I have tried to concatenate rmeta and rimage for each mirror and put them
on three linear devices in order to feed them to mdadm (because LVM
leverages MD), but without success: `mdadm --examine` does not recognize
the superblock, because the mdadm superblock format
<https://raid.wiki.kernel.org/index.php/RAID_superblock_formats> differs
from the dm_raid superblock format (search for "dm_raid_superblock" in
<https://github.com/torvalds/linux/blob/43f4d36cbf82428374966568ea57a0bc0d664a20/drivers/md/dm-raid.c>).
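Concretely, the kind of stacking I tried looks roughly like this (a sketch;
the 8192/65536 sector counts are the rmeta/rimage sizes visible in the dm
tables later in this thread, and the device names are illustrative):

    # build one "member" device per leg by appending rimage after rmeta
    printf '%s\n' \
      '0 8192 linear /dev/mapper/vg-test_rmeta_0 0' \
      '8192 65536 linear /dev/mapper/vg-test_rimage_0 0' \
      | dmsetup create md_leg0
    mdadm --examine /dev/mapper/md_leg0    # does not recognise any superblock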

I tried to understand how device-mapper RAID
<https://www.kernel.org/doc/Documentation/device-mapper/dm-raid.txt>
leverages MD, but was unable to find any documentation, and the kernel code
<https://github.com/torvalds/linux/blob/43f4d36cbf82428374966568ea57a0bc0d664a20/drivers/md/dm-raid.c>
is quite complicated.

I also tried to rebuild the mirror directly by using `dmsetup`, but it
can't rebuild if metadata is out of sync.
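What I attempted looked roughly like the following (a sketch; the table
follows the dm-raid format from the documentation above, the sizes and minor
numbers are the ones from the verbose lvchange output further below, and
"rebuild 1" is meant to mark leg 1 for reconstruction):

    dmsetup create vg-test --table '0 131072 raid raid5_ls 5 128 region_size 1024 rebuild 1 3 253:35 253:36 253:37 253:38 253:39 253:40'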

Overall, almost the only useful information I could find is the "RAIDing with
LVM vs MDRAID - pros and cons?" question on Unix & Linux SE
<http://unix.stackexchange.com/a/182503>.

The output of various commands is provided below.

    # lvs -a -o +devices

    test                           vg   rwi---r---  64.00m test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
    [test_rimage_0]                vg   Iwi-a-r-r-  32.00m /dev/sdc2(1)
    [test_rimage_1]                vg   Iwi-a-r-r-  32.00m /dev/sda2(238244)
    [test_rimage_2]                vg   Iwi-a-r-r-  32.00m /dev/sdb2(148612)
    [test_rmeta_0]                 vg   ewi-a-r-r-   4.00m /dev/sdc2(0)
    [test_rmeta_1]                 vg   ewi-a-r-r-   4.00m /dev/sda2(238243)
    [test_rmeta_2]                 vg   ewi-a-r-r-   4.00m /dev/sdb2(148611)

I cannot activate the LV:

    # lvchange -ay vg/test -v
        Activating logical volume "test" exclusively.
        activation/volume_list configuration setting not defined: Checking only host tags for vg/test.
        Loading vg-test_rmeta_0 table (253:35)
        Suppressed vg-test_rmeta_0 (253:35) identical table reload.
        Loading vg-test_rimage_0 table (253:36)
        Suppressed vg-test_rimage_0 (253:36) identical table reload.
        Loading vg-test_rmeta_1 table (253:37)
        Suppressed vg-test_rmeta_1 (253:37) identical table reload.
        Loading vg-test_rimage_1 table (253:38)
        Suppressed vg-test_rimage_1 (253:38) identical table reload.
        Loading vg-test_rmeta_2 table (253:39)
        Suppressed vg-test_rmeta_2 (253:39) identical table reload.
        Loading vg-test_rimage_2 table (253:40)
        Suppressed vg-test_rimage_2 (253:40) identical table reload.
        Creating vg-test
        Loading vg-test table (253:87)
      device-mapper: reload ioctl on (253:87) failed: Invalid argument
        Removing vg-test (253:87)

While trying to activate, I'm getting the following in dmesg:

    device-mapper: table: 253:87: raid: Cannot change device positions in RAID array
    device-mapper: ioctl: error adding target to table

lvconvert only works on active LVs:
    # lvconvert --repair vg/test
      vg/test must be active to perform this operation.

I have the following LVM version:

    # lvm version
      LVM version:     2.02.145(2) (2016-03-04)
      Library version: 1.02.119 (2016-03-04)
      Driver version:  4.34.0

And the following kernel version:

    Linux server 4.4.8-hardened-r1-1 #1 SMP

--
Best regards,
Slava Prisivko.



* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-03 23:49 [linux-lvm] LVM RAID5 out-of-sync recovery Slava Prisivko
@ 2016-10-04  9:45 ` Giuliano Procida
  2016-10-04 22:14   ` Slava Prisivko
  0 siblings, 1 reply; 12+ messages in thread
From: Giuliano Procida @ 2016-10-04  9:45 UTC (permalink / raw)
  To: LVM general discussion and development

Before anything else, I would have suggested backing up the image and
meta sub LVs, but it looks like you are just testing.

Clear down any odd state with dmsetup remove /dev/vg/... and then run:

vgextend --restoremissing

Actually, always run LVM commands with -v -t before really running them.
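Something along these lines (illustrative; substitute your own VG, LV and PV names):

  dmsetup remove /dev/vg/test            # plus any leftover test_rmeta_*/test_rimage_* nodes
  vgextend -t -v --restoremissing vg /dev/sda2    # dry run first, then repeat without -t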

On 4 October 2016 at 00:49, Slava Prisivko <vprisivko@gmail.com> wrote:
> In order to mitigate cross-posting, here's the original question on
> Serverfault.SE: LVM RAID5 out-of-sync recovery, but feel free to answer
> wherever you deem appropriate.
>
> How can one recover from an LVM RAID5 out-of-sync?

I suppose it's supposed to recover mostly automatically.
*If* your array is assembled (or whatever the LVM-equivalent
terminology is) then you can force a given subset of PVs to be
resynced.
http://man7.org/linux/man-pages/man8/lvchange.8.html - look for rebuild
However, this does not seem to be your problem.
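For reference, the invocation is along these lines (replace /dev/sdX2 with
the PV carrying the stale leg):

  lvchange --rebuild /dev/sdX2 vg/test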

> I have an LVM RAID5 configuration (RAID5 using the LVM tools).
>
> However, because of a technical problem mirrors went out of sync. You can
> reproduce this as explained in this Unix & Linux question:
>
>> Playing with my Jessie VM, I disconnected (virtually) one disk. That
>> worked, the machine stayed running. lvs, though, gave no indication the
>> arrays were degraded.

You should have noticed something in the kernel logs. Also, lvs should
have reported that the array was now (p)artial.

>> I re-attached the disk, and removed a second. Stayed
>> running (this is raid6). Re-attached, still no indication from lvs. I ran
>> lvconvert --repair on the volume, it told me it was OK. Then I pulled a
>> third disk... and the machine died. Re-inserted it, rebooted, and am now
>> unsure how to fix.

So this is RAID6 rather than RAID5?
And you killed 3 disks in a RAID 6 array?

> If I had been using mdadm, I could have probably recovered the data using
> `mdadm --force --assemble`, but I was not able to achieve the same using the
> LVM tools.

LVM is very different. :-(

> I have tried to concatenate rmeta and rimage for each mirror and put them on
> three linear devices in order to feed them to the mdadm (because LVM
> leverages MD), but without success (`mdadm --examine` does not recognize the
> superblock), because it appears that the mdadm superblock format differs
> from the dm_raid superblock format (search for the "dm_raid_superblock").

Not only that, but (as far as I can tell), LVM RAID 6 parity (well,
syndrome) is calculated in a different manner to the older mdadm RAID;
it uses an industry-standard layout instead of the (more obvious?) md
layout.
I wrote a utility to parity-check the default LVM RAID6 layout with
the usual stripe size (easily adjusted) here:
https://drive.google.com/open?id=0B8dHrWSoVcaDbkY3WmkxSmpfSVE

You can use this to see to what degree the data in the image LVs are
in fact in/out of sync. I've not attempted to add sync functionality
to this.

> I tried to understand how device-mapper RAID leverages MD, but was unable to
> find any documentation while the kernel code is quite complicated.
>
> I also tried to rebuild the mirror directly by using `dmsetup`, but it can't
> rebuild if metadata is out of sync.
>
> Overall, almost the only useful information I could find is RAIDing with LVM
> vs MDRAID - pros and cons? question on Unix & Linux SE.

Well, I would read through this as well (versions 6 and 7 also available):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/Logical_Volume_Manager_Administration/index.html

> The output of various commands is provided below.
>
>     # lvs -a -o +devices
>
>     test                           vg   rwi---r---  64.00m
> test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
>     [test_rimage_0]                vg   Iwi-a-r-r-  32.00m /dev/sdc2(1)
>     [test_rimage_1]                vg   Iwi-a-r-r-  32.00m /dev/sda2(238244)
>     [test_rimage_2]                vg   Iwi-a-r-r-  32.00m /dev/sdb2(148612)
>     [test_rmeta_0]                 vg   ewi-a-r-r-   4.00m /dev/sdc2(0)
>     [test_rmeta_1]                 vg   ewi-a-r-r-   4.00m /dev/sda2(238243)
>     [test_rmeta_2]                 vg   ewi-a-r-r-   4.00m /dev/sdb2(148611)
>
> I cannot activate the LV:
>
>     # lvchange -ay vg/test -v
>         Activating logical volume "test" exclusively.
>         activation/volume_list configuration setting not defined: Checking
> only host tags for vg/test.
>         Loading vg-test_rmeta_0 table (253:35)
>         Suppressed vg-test_rmeta_0 (253:35) identical table reload.
>         Loading vg-test_rimage_0 table (253:36)
>         Suppressed vg-test_rimage_0 (253:36) identical table reload.
>         Loading vg-test_rmeta_1 table (253:37)
>         Suppressed vg-test_rmeta_1 (253:37) identical table reload.
>         Loading vg-test_rimage_1 table (253:38)
>         Suppressed vg-test_rimage_1 (253:38) identical table reload.
>         Loading vg-test_rmeta_2 table (253:39)
>         Suppressed vg-test_rmeta_2 (253:39) identical table reload.
>         Loading vg-test_rimage_2 table (253:40)
>         Suppressed vg-test_rimage_2 (253:40) identical table reload.
>         Creating vg-test
>         Loading vg-test table (253:87)
>       device-mapper: reload ioctl on (253:87) failed: Invalid argument
>         Removing vg-test (253:87)
>
> While trying to activate I'm getting the following in the dmesg:
>
>     device-mapper: table: 253:87: raid: Cannot change device positions in
> RAID array
>     device-mapper: ioctl: error adding target to table

That's a new error message to me. I would try clearing out the dm
table (dmsetup remove /dev/vg/test_*) before trying again (-v -t,
first).

> lvconvert only works on active LVs:
>     # lvconvert --repair vg/test
>       vg/test must be active to perform this operation.

And it requires new PVs ("replacement drives") to put the subLVs on.
It's probably not what you want.

> I have the following LVM version:
>
>     # lvm version
>       LVM version:     2.02.145(2) (2016-03-04)
>       Library version: 1.02.119 (2016-03-04)
>       Driver version:  4.34.0

I would update LVM to whatever is in Debian testing as there has been
a fair bit of change this year.

> And the following kernel version:
>
>     Linux server 4.4.8-hardened-r1-1 #1 SMP

More useful would be the contents of /etc/lvm/backup/vg and the output
of vgs and pvs.


* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-04  9:45 ` Giuliano Procida
@ 2016-10-04 22:14   ` Slava Prisivko
  2016-10-05 12:48     ` Giuliano Procida
  0 siblings, 1 reply; 12+ messages in thread
From: Slava Prisivko @ 2016-10-04 22:14 UTC (permalink / raw)
  To: LVM general discussion and development


Thanks!

On Tue, Oct 4, 2016 at 12:49 PM Giuliano Procida <giuliano.procida@gmail.com>
wrote:

Before anything else, I would have suggested backing up the image and
meta sub LVs, but it looks like you are just testing.

Already did. I'm not testing, I just renamed the LVs to "test_*" because
the previous name doesn't matter.

There is nothing particularly important there, but I would like to
understand whether I would be able to recover should something similar
happen in the future.


Clear down any odd state with dmsetup remove /dev/vg/... and then run:

vgextend --restoremissing

I didn't have to, because all the PVs are present:

# pvs
  PV         VG Fmt  Attr PSize   PFree
  /dev/sda2  vg lvm2 a--    1.82t   1.10t
  /dev/sdb2  vg lvm2 a--    3.64t   1.42t
  /dev/sdc2  vg lvm2 a--  931.51g 195.18g


Actually, always run LVM commands with -v -t before really running them.

Thanks! I had backed up the rmeta* and rimage* sub-LVs, so I didn't feel the
need to use -t. Am I wrong?
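For reference, a backup of the sub-LVs can be as simple as this (the
destination path is just an example):

    for lv in test_rmeta_0 test_rmeta_1 test_rmeta_2 test_rimage_0 test_rimage_1 test_rimage_2; do
        dd if=/dev/mapper/vg-$lv of=/root/$lv.bak bs=1M
    done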


On 4 October 2016 at 00:49, Slava Prisivko <vprisivko@gmail.com> wrote:
> In order to mitigate cross-posting, here's the original question on
> Serverfault.SE: LVM RAID5 out-of-sync recovery, but feel free to answer
> wherever you deem appropriate.
>
> How can one recover from an LVM RAID5 out-of-sync?

I suppose it's supposed to recover mostly automatically.
*If* your array is assembled (or whatever the LVM-equivalent
terminology is) then you can force a given subset of PVs to be
resynced.
http://man7.org/linux/man-pages/man8/lvchange.8.html - look for rebuild
However, this does not seem to be your problem.

Yeah, I tried, but in vain:
# lvchange --rebuild /dev/sda2 vg/test -v
    Archiving volume group "vg" metadata (seqno 518).
Do you really want to rebuild 1 PVs of logical volume vg/test [y/n]: y
    Accepted input: [y]
  vg/test must be active to perform this operation.

> I have an LVM RAID5 configuration (RAID5 using the LVM tools).
>
> However, because of a technical problem mirrors went out of sync. You can
> reproduce this as explained in this Unix & Linux question:
>
>> Playing with my Jessie VM, I disconnected (virtually) one disk. That
>> worked, the machine stayed running. lvs, though, gave no indication the
>> arrays were degraded.

You should have noticed something in the kernel logs. Also, lvs should
have reported that the array was now (p)artial.

Yes, I've noticed it. The problem was a faulty SATA cable (as I learned
later), so when I switched the computer on for the first time, /dev/sda was
missing (in the current device allocation). I switched off the computer,
swapped the /dev/sda and /dev/sdb SATA cable (without thinking about the
consequences) and switched it on. This time the /dev/sdb was missing. I
replaced the faulty cable with a new one and switched the machine back on.
This time sda, sdb and sdc were all present, but the RAID went out-of-sync.

I'm pretty sure there were very few (if any) write operations while the array
was degraded, so I should be able to recover by rebuilding the old mirror
(sda) using the more recent ones (sdb and sdc).


>> I re-attached the disk, and removed a second. Stayed
>> running (this is raid6). Re-attached, still no indication from lvs. I ran
>> lvconvert --repair on the volume, it told me it was OK. Then I pulled a
>> third disk... and the machine died. Re-inserted it, rebooted, and am now
>> unsure how to fix.

So this is RAID6 rather than RAID5?
And you killed 3 disks in a RAID 6 array?

I have RAID5, not RAID6, but the principle is the same (as I explained in the
previous paragraph).


> If I had been using mdadm, I could have probably recovered the data using
> `mdadm --force --assemble`, but I was not able to achieve the same using
> the LVM tools.

LVM is very different. :-(

> I have tried to concatenate rmeta and rimage for each mirror and put them on
> three linear devices in order to feed them to the mdadm (because LVM
> leverages MD), but without success (`mdadm --examine` does not recognize the
> superblock), because it appears that the mdadm superblock format differs
> from the dm_raid superblock format (search for the "dm_raid_superblock").

Not only that, but (as far as I can tell), LVM RAID 6 parity (well,
syndrome) is calculated in a different manner to the older mdadm RAID;
it uses an industry-standard layout instead of the (more obvious?) md
layout.
I wrote a utility to parity-check the default LVM RAID6 layout with
the usual stripe size (easily adjusted) here:
https://drive.google.com/open?id=0B8dHrWSoVcaDbkY3WmkxSmpfSVE

You can use this to see to what degree the data in the image LVs are
in fact in/out of sync. I've not attempted to add sync functionality
to this.

Thanks, I used your raid5_parity_check.cc utility with the default stripe
size (64 * 1024), but it actually doesn't matter since you're just
calculating the total xor and the stripe size acts as a buffer size for
that.

I get three unsynced stripes out of 512 (32 MiB / 64 KiB), but I would like
to try to reconstruct test_rimage_1 using the other two. Just in
case, here are the bad stripe numbers: 16, 48, 49.


> I tried to understand how device-mapper RAID leverages MD, but was unable to
> find any documentation while the kernel code is quite complicated.
>
> I also tried to rebuild the mirror directly by using `dmsetup`, but it can't
> rebuild if metadata is out of sync.
>
> Overall, almost the only useful information I could find is RAIDing with LVM
> vs MDRAID - pros and cons? question on Unix & Linux SE.

Well, I would read through this as well (versions 6 and 7 also available):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/Logical_Volume_Manager_Administration/index.html

 Thanks, but nothing particularly relevant to my case there.

> The output of various commands is provided below.
>
>     # lvs -a -o +devices
>
>     test                           vg   rwi---r---  64.00m test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
>     [test_rimage_0]                vg   Iwi-a-r-r-  32.00m /dev/sdc2(1)
>     [test_rimage_1]                vg   Iwi-a-r-r-  32.00m /dev/sda2(238244)
>     [test_rimage_2]                vg   Iwi-a-r-r-  32.00m /dev/sdb2(148612)
>     [test_rmeta_0]                 vg   ewi-a-r-r-   4.00m /dev/sdc2(0)
>     [test_rmeta_1]                 vg   ewi-a-r-r-   4.00m /dev/sda2(238243)
>     [test_rmeta_2]                 vg   ewi-a-r-r-   4.00m /dev/sdb2(148611)
>
> I cannot activate the LV:
>
>     # lvchange -ay vg/test -v
>         Activating logical volume "test" exclusively.
>         activation/volume_list configuration setting not defined: Checking
> only host tags for vg/test.
>         Loading vg-test_rmeta_0 table (253:35)
>         Suppressed vg-test_rmeta_0 (253:35) identical table reload.
>         Loading vg-test_rimage_0 table (253:36)
>         Suppressed vg-test_rimage_0 (253:36) identical table reload.
>         Loading vg-test_rmeta_1 table (253:37)
>         Suppressed vg-test_rmeta_1 (253:37) identical table reload.
>         Loading vg-test_rimage_1 table (253:38)
>         Suppressed vg-test_rimage_1 (253:38) identical table reload.
>         Loading vg-test_rmeta_2 table (253:39)
>         Suppressed vg-test_rmeta_2 (253:39) identical table reload.
>         Loading vg-test_rimage_2 table (253:40)
>         Suppressed vg-test_rimage_2 (253:40) identical table reload.
>         Creating vg-test
>         Loading vg-test table (253:87)
>       device-mapper: reload ioctl on (253:87) failed: Invalid argument
>         Removing vg-test (253:87)
>
> While trying to activate I'm getting the following in the dmesg:
>
>     device-mapper: table: 253:87: raid: Cannot change device positions in
> RAID array
>     device-mapper: ioctl: error adding target to table

That's a new error message to me. I would try clearing out the dm
table (dmsetup remove /dev/vg/test_*) before trying again (-v -t,
first).

After removing the dm tables for test_* and retrying lvchange -ay, I get
practically the same:
# lvchange -ay vg/test -v
    Activating logical volume vg/test exclusively.
    activation/volume_list configuration setting not defined: Checking only
host tags for vg/test.
    Creating vg-test_rmeta_0
    Loading vg-test_rmeta_0 table (253:35)
    Resuming vg-test_rmeta_0 (253:35)
    Creating vg-test_rimage_0
    Loading vg-test_rimage_0 table (253:36)
    Resuming vg-test_rimage_0 (253:36)
    Creating vg-test_rmeta_1
    Loading vg-test_rmeta_1 table (253:37)
    Resuming vg-test_rmeta_1 (253:37)
    Creating vg-test_rimage_1
    Loading vg-test_rimage_1 table (253:38)
    Resuming vg-test_rimage_1 (253:38)
    Creating vg-test_rmeta_2
    Loading vg-test_rmeta_2 table (253:39)
    Resuming vg-test_rmeta_2 (253:39)
    Creating vg-test_rimage_2
    Loading vg-test_rimage_2 table (253:40)
    Resuming vg-test_rimage_2 (253:40)
    Creating vg-test
    Loading vg-test table (253:87)
  device-mapper: reload ioctl on (253:87) failed: Invalid argument
    Removing vg-test (253:87)

device-mapper: table: 253:87: raid: Cannot change device positions in RAID
array
device-mapper: ioctl: error adding target to table


> lvconvert only works on active LVs:
>     # lvconvert --repair vg/test
>       vg/test must be active to perform this operation.

And it requires new PVs ("replacement drives") to put the subLVs on.
It's probably not what you want.

> I have the following LVM version:
>
>     # lvm version
>       LVM version:     2.02.145(2) (2016-03-04)
>       Library version: 1.02.119 (2016-03-04)
>       Driver version:  4.34.0

I would update LVM to whatever is in Debian testing as there has been
a fair bit of change this year.

I've updated to 2.02.166 (the latest version):

# lvm version
  LVM version:     2.02.166(2) (2016-09-26)
  Library version: 1.02.135 (2016-09-26)
  Driver version:  4.34.0


> And the following kernel version:
>
>     Linux server 4.4.8-hardened-r1-1 #1 SMP

More useful would be the contents of /etc/lvm/backup/vg and the output
of vgs and pvs.

# pvs
  PV         VG Fmt  Attr PSize   PFree
  /dev/sda2  vg lvm2 a--    1.82t   1.10t
  /dev/sdb2  vg lvm2 a--    3.64t   1.42t
  /dev/sdc2  vg lvm2 a--  931.51g 195.18g

# vgs
  VG #PV #LV #SN Attr   VSize VFree
  vg   3  18   0 wz--n- 6.37t 2.71t

Here is the relevant content of /etc/lvm/archive (the archive is more recent
than the backup):
        test {
            id = "JjiPmi-esfx-vdeF-5zMv-TsJC-6vFf-qNgNnZ"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 16    # 64 Megabytes

                type = "raid5"
                device_count = 3
                stripe_size = 128
                region_size = 1024

                raids = [
                    "test_rmeta_0", "test_rimage_0",
                    "test_rmeta_1", "test_rimage_1",
                    "test_rmeta_2", "test_rimage_2"
                ]
            }
        }

        test_rmeta_0 {
            id = "WE3CUg-ayo8-lp1Y-9S2v-zRGi-mV1s-DWYoST"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 1    # 4 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 0
                ]
            }
        }

        test_rmeta_1 {
            id = "Apk3mc-zy4q-c05I-hiIO-1Kae-9yB6-Cl5lfJ"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 1    # 4 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv1", 238243
                ]
            }
        }

        test_rmeta_2 {
            id = "j2Waf3-A77y-pvfd-foGK-Hq7B-rHe8-YKzQY0"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 1    # 4 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv2", 148611
                ]
            }
        }

        test_rimage_0 {
            id = "zaGgJx-YSIl-o2oq-UN9l-02Q8-IS5u-sz4RhQ"
            status = ["READ", "WRITE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 8    # 32 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 1
                ]
            }
        }

        test_rimage_1 {
            id = "0mD5AL-GKj3-siFz-xQmO-ZtQo-L3MM-Ro2SG2"
            status = ["READ", "WRITE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 8    # 32 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv1", 238244
                ]
            }
        }

        test_rimage_2 {
            id = "4FxiHV-j637-ENml-Okm3-uL1p-fuZ0-y9dE8Y"
            status = ["READ", "WRITE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 8    # 32 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv2", 148612
                ]
            }
        }

--
Best regards,
Slava Prisivko.






* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-04 22:14   ` Slava Prisivko
@ 2016-10-05 12:48     ` Giuliano Procida
  2016-10-07  6:43       ` Giuliano Procida
  2016-10-09 19:00       ` Slava Prisivko
  0 siblings, 2 replies; 12+ messages in thread
From: Giuliano Procida @ 2016-10-05 12:48 UTC (permalink / raw)
  To: LVM general discussion and development

On 4 October 2016 at 23:14, Slava Prisivko <vprisivko@gmail.com> wrote:
>> vgextend --restoremissing
>
> I didn't have to, because all the PVs are present:
>
> # pvs
>   PV         VG Fmt  Attr PSize   PFree
>   /dev/sda2  vg lvm2 a--    1.82t   1.10t
>   /dev/sdb2  vg lvm2 a--    3.64t   1.42t
>   /dev/sdc2  vg lvm2 a--  931.51g 195.18g

Double-check in the metadata for MISSING. This is what I was hoping
might be in your /etc/lvm/backup file.

>> Actually, always run LVM commands with -v -t before really running them.
>
> Thanks! I had backed up the rmeta* and rimage*, so I didn't feel the need
> for using -t. Am I wrong?

Well, some nasty surprises may be avoidable (particularly if also using -f).

> Yes, I've noticed it. The problem was a faulty SATA cable (as I learned
> later), so when I switched the computer on for the first time, /dev/sda was
> missing (in the current device allocation). I switched off the computer,
> swapped the /dev/sda and /dev/sdb SATA cable (without thinking about the
> consequences) and switched it on. This time the /dev/sdb was missing. I
> replaced the faulty cable with a new one and switched the machine back on.
> This time sda, sdb and sdc were all present, but the RAID went out-of-sync.

In swapping the cables, you may have changed the sd{a,b,c} enumeration
but this will have no impact on the UUIDs that LVM uses to identify
the PVs.

> I'm pretty sure there were very few (if any) writing operations during the
> degraded operating mode, so the I could recover by rebuilding the old mirror
> (sda) using the more recent ones (sdb and sdc).

Agreed, based on your check below.

> Thanks, I used your raid5_parity_check.cc utility with the default stripe
> size (64 * 1024), but it actually doesn't matter since you're just
> calculating the total xor and the stripe size acts as a buffer size for
> that.

[I was a little surprised to discover that RAID 6 works as a byte erasure code.]

The stripe size and layout matter once you want to adapt the code
to extract or repair the data.

> I get three unsynced stripes out of 512 (32 mib / 64 kib), but I would like
> to try to reconstruct the test_rimage_1 using h the other two. Just in case,
> here are the bad stripe numbers: 16, 48, 49.

I've updated the utility (this is for raid5 = raid5_ls). Warning: not
tested on out-of-sync data.

https://drive.google.com/open?id=0B8dHrWSoVcaDYXlUWXEtZEMwX0E

# Assume the first sub LV has the out-of-date data and dump the correct(ed) LV content.
./foo stripe $((64*1024)) repair 0 /dev/${lv}_rimage_* | cmp - /dev/${lv}

>> > The output of various commands is provided below.
>> >
>> >     # lvs -a -o +devices
>> >
>> >     test                           vg   rwi---r---  64.00m
>> > test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
>> >     [test_rimage_0]                vg   Iwi-a-r-r-  32.00m /dev/sdc2(1)
>> >     [test_rimage_1]                vg   Iwi-a-r-r-  32.00m
>> > /dev/sda2(238244)
>> >     [test_rimage_2]                vg   Iwi-a-r-r-  32.00m
>> > /dev/sdb2(148612)
>> >     [test_rmeta_0]                 vg   ewi-a-r-r-   4.00m /dev/sdc2(0)
>> >     [test_rmeta_1]                 vg   ewi-a-r-r-   4.00m
>> > /dev/sda2(238243)
>> >     [test_rmeta_2]                 vg   ewi-a-r-r-   4.00m
>> > /dev/sdb2(148611)

The extra r(efresh) attributes suggest trying a resync operation, which
may not be possible on an inactive LV.
I missed that the RAID device is actually in the list.

> After cleaning the dmsetup table of test_* and trying to lvchange -ay I get
> practically the same:
> # lvchange -ay vg/test -v
[snip]
>   device-mapper: reload ioctl on (253:87) failed: Invalid argument
>     Removing vg-test (253:87)
>
> device-mapper: table: 253:87: raid: Cannot change device positions in RAID
> array
> device-mapper: ioctl: error adding target to table

This error occurs when the sub LV metadata says "I am device X in this
array" but dmsetup is being asked to put the sub LV at a different
position Y (alas, neither value is logged). With lots of -v and -d flags
you can get lvchange to include the dm table entries in the
diagnostics.
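For example:

  lvchange -ay vg/test -vvvv -dddd 2>&1 | less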

You can check the rmeta superblocks with
https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg

> Here is the relevant /etc/lvm/archive (archive is more recent that backup)

That looks sane, but you omitted the physical volumes section so there
is no way to cross-check UUIDs and devices or see if there are MISSING
flags.

If you use
https://drive.google.com/open?id=0B8dHrWSoVcaDQkU5NG1sLWc5cjg
directly, you can get metadata that LVM is reading off the PVs and
double-check for discrepancies.
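Failing that, the text metadata can usually be eyeballed straight off a PV
with standard tools, since by default the metadata area sits in the first
mebibyte of the PV (a rough sketch; the offsets are the defaults, not
guaranteed):

  dd if=/dev/sda2 bs=1M count=1 2>/dev/null | strings -n 8 | less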


* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-05 12:48     ` Giuliano Procida
@ 2016-10-07  6:43       ` Giuliano Procida
  2016-10-09 19:00         ` Slava Prisivko
  2016-10-09 19:00       ` Slava Prisivko
  1 sibling, 1 reply; 12+ messages in thread
From: Giuliano Procida @ 2016-10-07  6:43 UTC (permalink / raw)
  To: LVM general discussion and development

Slava, the main problem I had was that LVM forbade many operations
while I had a PV missing.

In your case, apparently, all PVs are present. So I suggest the following:

1. examine the recent history in /etc/lvm/archive
2. diff each transition and see if you can understand what has
happened at each stage
3. vgcfgrestore the most recent version that you think will allow you
to activate your array; you can work backwards incrementally (see the
example after this list)
4. check kernel logs!
5. scrub (resync) the array if needed
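For step 3, the restore itself is along these lines (the archive file name
below is only an example; list the real candidates first):

  vgcfgrestore --list vg
  vgcfgrestore -t -v -f /etc/lvm/archive/vg_00517-xxxxxxxxxx.vg vg    # dry run
  vgcfgrestore -f /etc/lvm/archive/vg_00517-xxxxxxxxxx.vg vg          # then for real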

Hope this helps,
Giuliano.


* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-05 12:48     ` Giuliano Procida
  2016-10-07  6:43       ` Giuliano Procida
@ 2016-10-09 19:00       ` Slava Prisivko
  2016-10-12  6:57         ` Giuliano Procida
  1 sibling, 1 reply; 12+ messages in thread
From: Slava Prisivko @ 2016-10-09 19:00 UTC (permalink / raw)
  To: LVM general discussion and development


Hi!

On Wed, Oct 5, 2016 at 3:53 PM Giuliano Procida <giuliano.procida@gmail.com>
wrote:

> On 4 October 2016 at 23:14, Slava Prisivko <vprisivko@gmail.com> wrote:
> >> vgextend --restoremissing
> >
> > I didn't have to, because all the PVs are present:
> >
> > # pvs
> >   PV         VG Fmt  Attr PSize   PFree
> >   /dev/sda2  vg lvm2 a--    1.82t   1.10t
> >   /dev/sdb2  vg lvm2 a--    3.64t   1.42t
> >   /dev/sdc2  vg lvm2 a--  931.51g 195.18g
>
> Double-check in the metadata for MISSING. This is what I was hoping
> might be in your /etc/lvm/backup file.
>
> >> Actually, always run LVM commands with -v -t before really running them.
> >
> > Thanks! I had backed up the rmeta* and rimage*, so I didn't feel the need
> > for using -t. Am I wrong?
>
> Well, some nasty surprises may be avoidable (particularly if also using
> -f).
>
> > Yes, I've noticed it. The problem was a faulty SATA cable (as I learned
> > later), so when I switched the computer on for the first time, /dev/sda was
> > missing (in the current device allocation). I switched off the computer,
> > swapped the /dev/sda and /dev/sdb SATA cable (without thinking about the
> > consequences) and switched it on. This time the /dev/sdb was missing. I
> > replaced the faulty cable with a new one and switched the machine back on.
> > This time sda, sdb and sdc were all present, but the RAID went out-of-sync.
>
> In swapping the cables, you may have changed the sd{a,b,c} enumeration
> but this will have no impact on the UUIDs that LVM uses to identify
> the PVs.
>
That's right, but the images went out of sync because during the first boot
only sdb and sdc were present (so sda's content had to be reconstructed from
the other two), during the second boot only sda and sdc were present (so
sdb's content had to be reconstructed), and when I replaced the cable all
three were present but mutually inconsistent.

>
> > I'm pretty sure there were very few (if any) writing operations during
> the
> > degraded operating mode, so the I could recover by rebuilding the old
> mirror
> > (sda) using the more recent ones (sdb and sdc).
>
> Agreed, based on your check below.
>
> > Thanks, I used your raid5_parity_check.cc utility with the default stripe
> > size (64 * 1024), but it actually doesn't matter since you're just
> > calculating the total xor and the stripe size acts as a buffer size for
> > that.
>
> [I was little surprised to discover that RAID 6 works as a byte erasure
> code.]
>
> The stripe size and layout matters once if you want to adapt the code
> to extract or repair the data.
>
> > I get three unsynced stripes out of 512 (32 mib / 64 kib), but I would
> like
> > to try to reconstruct the test_rimage_1 using h the other two. Just in
> case,
> > here are the bad stripe numbers: 16, 48, 49.
>
> I've updated the utility (this is for raid5 = raid5_ls). Warning: not
> tested on out-of-sync data.
>
> https://drive.google.com/open?id=0B8dHrWSoVcaDYXlUWXEtZEMwX0E


>
> # Assume the first sub LV has the out-of-date data and dump the
> correct(ed) LV content.
> ./foo stripe $((64*1024)) repair 0 /dev/${lv}_rimage_* | cmp - /dev/${lv}
>
Thanks!

I tried to reassemble the array using the three different pairs of LV
images, but it doesn't work (I am sure of this because I cannot luksOpen the
LUKS volume stored in the LV, and that failure almost certainly means the
reconstructed data is still wrong).

>
> >> > The output of various commands is provided below.
> >> >
> >> >     # lvs -a -o +devices
> >> >
> >> >     test                           vg   rwi---r---  64.00m
> >> > test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
> >> >     [test_rimage_0]                vg   Iwi-a-r-r-  32.00m
> /dev/sdc2(1)
> >> >     [test_rimage_1]                vg   Iwi-a-r-r-  32.00m
> >> > /dev/sda2(238244)
> >> >     [test_rimage_2]                vg   Iwi-a-r-r-  32.00m
> >> > /dev/sdb2(148612)
> >> >     [test_rmeta_0]                 vg   ewi-a-r-r-   4.00m
> /dev/sdc2(0)
> >> >     [test_rmeta_1]                 vg   ewi-a-r-r-   4.00m
> >> > /dev/sda2(238243)
> >> >     [test_rmeta_2]                 vg   ewi-a-r-r-   4.00m
> >> > /dev/sdb2(148611)
>
> The extra r(efresh) attributes suggest trying a resync operation which
> may not be possible on inactive LV.
> I missed that the RAID device is actually in the list.
>
> > After cleaning the dmsetup table of test_* and trying to lvchange -ay I
> get
> > practically the same:
> > # lvchange -ay vg/test -v
> [snip]
> >   device-mapper: reload ioctl on (253:87) failed: Invalid argument
> >     Removing vg-test (253:87)
> >
> > device-mapper: table: 253:87: raid: Cannot change device positions in
> RAID
> > array
> > device-mapper: ioctl: error adding target to table
>
> This error occurs when the sub LV metadata says "I am device X in this
> array" but dmsetup is being asked to put the sub LV at different
> position Y (alas, neither are logged). With lots of -v and -d flags
> you can get lvchange to include the dm table entries in the
> diagnostics.
>
This is as useful as it gets (-vvvv -dddd):
    Loading vg-test_rmeta_0 table (253:35)
        Adding target to (253:35): 0 8192 linear 8:34 2048
        dm table   (253:35) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rmeta_0 (253:35) identical table reload.
    Loading vg-test_rimage_0 table (253:36)
        Adding target to (253:36): 0 65536 linear 8:34 10240
        dm table   (253:36) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rimage_0 (253:36) identical table reload.
    Loading vg-test_rmeta_1 table (253:37)
        Adding target to (253:37): 0 8192 linear 8:2 1951688704
        dm table   (253:37) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rmeta_1 (253:37) identical table reload.
    Loading vg-test_rimage_1 table (253:38)
        Adding target to (253:38): 0 65536 linear 8:2 1951696896
        dm table   (253:38) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rimage_1 (253:38) identical table reload.
    Loading vg-test_rmeta_2 table (253:39)
        Adding target to (253:39): 0 8192 linear 8:18 1217423360
        dm table   (253:39) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rmeta_2 (253:39) identical table reload.
    Loading vg-test_rimage_2 table (253:40)
        Adding target to (253:40): 0 65536 linear 8:18 1217431552
        dm table   (253:40) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rimage_2 (253:40) identical table reload.
    Creating vg-test
        dm create vg-test
LVM-Pgjp5f2PRJipxvoNdsYmq0olg9iWwY5pJjiPmiesfxvdeF5zMvTsJC6vFfqNgNnZ [
noopencount flush ]   [16384] (*1)
    Loading vg-test table (253:84)
        Adding target to (253:84): 0 131072 raid raid5_ls 3 128 region_size 1024 3 253:35 253:36 253:37 253:38 253:39 253:40
        dm table   (253:84) [ opencount flush ]   [16384] (*1)
        dm reload   (253:84) [ noopencount flush ]   [16384] (*1)
  device-mapper: reload ioctl on (253:84) failed: Invalid argument

I don't see any problems here.

>
> You can check the rmeta superblocks with
> https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg

Thanks, it's very useful!

/dev/mapper/vg-test_rmeta_0
found RAID superblock at offset 0
 magic=1683123524
 features=0
 num_devices=3
 array_position=0
 events=56
 failed_devices=0
 disk_recovery_offset=18446744073709551615
 array_resync_offset=18446744073709551615
 level=5
 layout=2
 stripe_sectors=128
found bitmap file superblock at offset 4096:
         magic: 6d746962
       version: 4
          uuid: 00000000.00000000.00000000.00000000
        events: 56
events cleared: 33
         state: 00000000
     chunksize: 524288 B
  daemon sleep: 5s
     sync size: 32768 KB
max write behind: 0

/dev/mapper/vg-test_rmeta_1
found RAID superblock at offset 0
 magic=1683123524
 features=0
 num_devices=3
 array_position=4294967295
 events=62
 failed_devices=1
 disk_recovery_offset=0
 array_resync_offset=18446744073709551615
 level=5
 layout=2
 stripe_sectors=128
found bitmap file superblock at offset 4096:
         magic: 6d746962
       version: 4
          uuid: 00000000.00000000.00000000.00000000
        events: 60
events cleared: 33
         state: 00000000
     chunksize: 524288 B
  daemon sleep: 5s
     sync size: 32768 KB
max write behind: 0

/dev/mapper/vg-test_rmeta_2
found RAID superblock at offset 0
 magic=1683123524
 features=0
 num_devices=3
 array_position=2
 events=62
 failed_devices=1
 disk_recovery_offset=18446744073709551615
 array_resync_offset=18446744073709551615
 level=5
 layout=2
 stripe_sectors=128
found bitmap file superblock at offset 4096:
         magic: 6d746962
       version: 4
          uuid: 00000000.00000000.00000000.00000000
        events: 62
events cleared: 33
         state: 00000000
     chunksize: 524288 B
  daemon sleep: 5s
     sync size: 32768 KB
max write behind: 0

The problem I see here is that the event counts differ between the three
rmetas.

>
>
> > Here is the relevant /etc/lvm/archive (archive is more recent that
> backup)
>
> That looks sane, but you omitted the physical volumes section so there
> is no way to cross-check UUIDs and devices or see if there are MISSING
> flags.
>
The IDs are the same and there are no MISSING flags.

>
> If you use
> https://drive.google.com/open?id=0B8dHrWSoVcaDQkU5NG1sLWc5cjg
> directly, you can get metadata that LVM is reading off the PVs and
> double-check for discrepancies.





* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-07  6:43       ` Giuliano Procida
@ 2016-10-09 19:00         ` Slava Prisivko
  0 siblings, 0 replies; 12+ messages in thread
From: Slava Prisivko @ 2016-10-09 19:00 UTC (permalink / raw)
  To: LVM general discussion and development


Hi!

On Fri, Oct 7, 2016 at 9:49 AM Giuliano Procida <giuliano.procida@gmail.com>
wrote:

> Slava, the main problem I had was that LVM forbade many operations
> while I had a PV missing.
>
> In your case, apparently, all PVs are present. So I suggest the following:
>
> 1. examine the recent history in /etc/lvm/archive
> 2. diff each transition and see if you can understand what has
> happened at each stage
> 3. vgcfgrestore the most recent version that you think will allow you
> to activate your array, you can work backwards incrementally
> 4. check kernel logs!
> 5. scrub (resync) the array if needed
>

Since I cannot resync manually using your code, there are no MISSING flags
in /etc/lvm/archive, and the event counts differ between the three rmetas, it
seems this would not be helpful, would it?

I diffed the current state of /etc/lvm/archive against the state from before
the trouble: there is no significant difference, no difference in IDs, and no
MISSING flags.

>
> Hope this helps,
> Giuliano.
>



* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-09 19:00       ` Slava Prisivko
@ 2016-10-12  6:57         ` Giuliano Procida
  2016-10-12  7:15           ` Giuliano Procida
  2016-10-13 20:44           ` Slava Prisivko
  0 siblings, 2 replies; 12+ messages in thread
From: Giuliano Procida @ 2016-10-12  6:57 UTC (permalink / raw)
  To: LVM general discussion and development

On 9 October 2016 at 20:00, Slava Prisivko <vprisivko@gmail.com> wrote:
> I tried to reassemble the array using 3 different pairs of correct LV
> images, but it doesn't work (I am sure because I cannot luksOpen a LUKS
> image which is in the LV, which is almost surely uncorrectable).

I would hope that a luks volume would at least be recognisable using
file -s. If you extract the image data into a regular file you should
be able to losetup that and then luksOpen the loop device.
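Roughly (a sketch; the file and mapping names are examples, and "repair 1"
assumes leg 1 is the stale one):

  ./foo stripe $((64*1024)) repair 1 /dev/mapper/vg-test_rimage_* > /tmp/test_lv.img
  losetup -f --show /tmp/test_lv.img            # prints the allocated loop device, e.g. /dev/loop0
  cryptsetup luksOpen /dev/loop0 test_recovered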

> This is as useful as it gets (-vvvv -dddd):
>     Loading vg-test_rmeta_0 table (253:35)
>         Adding target to (253:35): 0 8192 linear 8:34 2048
>         dm table   (253:35) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rmeta_0 (253:35) identical table reload.
>     Loading vg-test_rimage_0 table (253:36)
>         Adding target to (253:36): 0 65536 linear 8:34 10240
>         dm table   (253:36) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rimage_0 (253:36) identical table reload.
>     Loading vg-test_rmeta_1 table (253:37)
>         Adding target to (253:37): 0 8192 linear 8:2 1951688704
>         dm table   (253:37) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rmeta_1 (253:37) identical table reload.
>     Loading vg-test_rimage_1 table (253:38)
>         Adding target to (253:38): 0 65536 linear 8:2 1951696896
>         dm table   (253:38) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rimage_1 (253:38) identical table reload.
>     Loading vg-test_rmeta_2 table (253:39)
>         Adding target to (253:39): 0 8192 linear 8:18 1217423360
>         dm table   (253:39) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rmeta_2 (253:39) identical table reload.
>     Loading vg-test_rimage_2 table (253:40)
>         Adding target to (253:40): 0 65536 linear 8:18 1217431552
>         dm table   (253:40) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rimage_2 (253:40) identical table reload.
>     Creating vg-test
>         dm create vg-test
> LVM-Pgjp5f2PRJipxvoNdsYmq0olg9iWwY5pJjiPmiesfxvdeF5zMvTsJC6vFfqNgNnZ [
> noopencount flush ]   [16384] (*1)
>     Loading vg-test table (253:84)
>         Adding target to (253:84): 0 131072 raid raid5_ls 3 128 region_size
> 1024 3 253:35 253:36 253:37 253:38 253:39 253:40
>         dm table   (253:84) [ opencount flush ]   [16384] (*1)
>         dm reload   (253:84) [ noopencount flush ]   [16384] (*1)
>   device-mapper: reload ioctl on (253:84) failed: Invalid argument
>
> I don't see any problems here.

In my case I got (for example, and Gmail is going to fold the lines, sorry):

[...]
    Loading vg0-photos table (254:45)
        Adding target to (254:45): 0 1258291200 raid raid6_zr 3 128
region_size 1024 5 254:73 254:74 254:37 254:38 254:39 254:40 254:41
254:42 254:43 254:44
        dm table   (254:45) [ opencount flush ]   [16384] (*1)
        dm reload   (254:45) [ noopencount flush ]   [16384] (*1)
  device-mapper: reload ioctl on (254:45) failed: Invalid argument

The actual errors are in the kernel logs:

[...]
[144855.931712] device-mapper: raid: New device injected into existing
array without 'rebuild' parameter specified
[144855.935523] device-mapper: table: 254:45: raid: Unable to assemble
array: Invalid superblocks
[144855.939290] device-mapper: ioctl: error adding target to table

128 means 128 sectors of 512 bytes, i.e. a 64 KiB chunk, as in your case. I
was able to verify that my extracted images matched the RAID device. My
problem was not assembling the array, it was that the array would be rebuilt
on every subsequent use:

    Loading vg0-var table (254:21)
        Adding target to (254:21): 0 52428800 raid raid5_ls 5 128
region_size 1024 rebuild 0 5 254:11 254:12 254:13 254:14 254:15 254:16
254:17 254:18 254:19 254:20
        dm table   (254:21) [ opencount flush ]   [16384] (*1)
        dm reload   (254:21) [ noopencount flush ]   [16384] (*1)
        Table size changed from 0 to 52428800 for vg0-var (254:21).

>> You can check the rmeta superblocks with
>> https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg
>
> Thanks, it's very useful!
>
> /dev/mapper/vg-test_rmeta_0
> found RAID superblock at offset 0
>  magic=1683123524
>  features=0
>  num_devices=3
>  array_position=0
>  events=56
>  failed_devices=0
>  disk_recovery_offset=18446744073709551615
>  array_resync_offset=18446744073709551615
>  level=5
>  layout=2
>  stripe_sectors=128
> found bitmap file superblock at offset 4096:
>          magic: 6d746962
>        version: 4
>           uuid: 00000000.00000000.00000000.00000000
>         events: 56
> events cleared: 33
>          state: 00000000
>      chunksize: 524288 B
>   daemon sleep: 5s
>      sync size: 32768 KB
> max write behind: 0
>
> /dev/mapper/vg-test_rmeta_1
> found RAID superblock at offset 0
>  magic=1683123524
>  features=0
>  num_devices=3
>  array_position=4294967295
>  events=62
>  failed_devices=1
>  disk_recovery_offset=0
>  array_resync_offset=18446744073709551615
>  level=5
>  layout=2
>  stripe_sectors=128
> found bitmap file superblock at offset 4096:
>          magic: 6d746962
>        version: 4
>           uuid: 00000000.00000000.00000000.00000000
>         events: 60
> events cleared: 33
>          state: 00000000
>      chunksize: 524288 B
>   daemon sleep: 5s
>      sync size: 32768 KB
> max write behind: 0
>
> /dev/mapper/vg-test_rmeta_2
> found RAID superblock at offset 0
>  magic=1683123524
>  features=0
>  num_devices=3
>  array_position=2
>  events=62
>  failed_devices=1
>  disk_recovery_offset=18446744073709551615
>  array_resync_offset=18446744073709551615
>  level=5
>  layout=2
>  stripe_sectors=128
> found bitmap file superblock at offset 4096:
>          magic: 6d746962
>        version: 4
>           uuid: 00000000.00000000.00000000.00000000
>         events: 62
> events cleared: 33
>          state: 00000000
>      chunksize: 524288 B
>   daemon sleep: 5s
>      sync size: 32768 KB
> max write behind: 0
>
> The problem I see here is that events count is different for the three
> rmetas.

The event counts relate to the intent bitmap (I believe).

That looks OK, because failed devices is 1, meaning 0b0...01; i.e.,
device 0 of the array is "failed". The real problem is device 1 which
has

>  array_position=4294967295

This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
that it has special significance in kernel or LVM code. I've not
checked beyond noticing one test: role < 0.

I recommend using diff3 or pairwise diff on the metadata dumps to
ensure you have not missed any other differences.

One possible way forward:

(Optionally) adapt my resync code so it writes back to the original
files instead of outputting corrected linear data.

Modify the rmeta data to remove the failed flag and reset the bad
position to the correct value, then sync and power off (or otherwise
prevent the device mapper from writing back bad data).
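To be explicit, and only as a sketch: the offsets below assume the struct
dm_raid_superblock layout in dm-raid.c of that era (magic, features,
num_devices and array_position as le32 at byte offsets 0/4/8/12, then events
and failed_devices as le64 at 16/24). Double-check them against your kernel
source before writing anything, back up the rmeta sub LVs first, and only do
this while the top-level RAID LV is not active.

  # set array_position of leg 1 back to 1 (le32 at offset 12)
  printf '\001\000\000\000' | dd of=/dev/mapper/vg-test_rmeta_1 bs=1 seek=12 conv=notrunc
  # clear failed_devices (le64 at offset 24) on each rmeta sub LV
  for m in 0 1 2; do
      printf '\000\000\000\000\000\000\000\000' | \
          dd of=/dev/mapper/vg-test_rmeta_$m bs=1 seek=24 conv=notrunc
  done
  sync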

It's possible the RAID volume will fail to sync due to bitmap
inconsistencies. I don't know how to re-write the superblocks to say
"trust me, all data are in sync".


* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-12  6:57         ` Giuliano Procida
@ 2016-10-12  7:15           ` Giuliano Procida
  2016-10-14 23:19             ` Heinz Mauelshagen
  2016-10-13 20:44           ` Slava Prisivko
  1 sibling, 1 reply; 12+ messages in thread
From: Giuliano Procida @ 2016-10-12  7:15 UTC (permalink / raw)
  To: LVM general discussion and development

On 12 October 2016 at 07:57, Giuliano Procida
<giuliano.procida@gmail.com> wrote:
>>  array_position=4294967295
>
> This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
> that it has special significance in kernel or LVM code. I've not
> checked beyond noticing one test: role < 0.

http://lxr.free-electrons.com/source/drivers/md/dm-raid.c

Now role is an int and the RHS of the assignment is le32_to_cpu(...)
which returns a u32. Testing < 0 will *never* succeed on a 64-bit
architecture. This is a kernel bug. If the code is changed so that
role is also u32 and the test is against ~0, it's possible that
different, better things will happen. Please try reporting this to the
dm-devel people.

I still don't know what wrote that value to the superblock though.


* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-12  6:57         ` Giuliano Procida
  2016-10-12  7:15           ` Giuliano Procida
@ 2016-10-13 20:44           ` Slava Prisivko
  1 sibling, 0 replies; 12+ messages in thread
From: Slava Prisivko @ 2016-10-13 20:44 UTC (permalink / raw)
  To: LVM general discussion and development


On Wed, Oct 12, 2016 at 10:02 AM Giuliano Procida <giuliano.procida@gmail.com> wrote:

> On 9 October 2016 at 20:00, Slava Prisivko <vprisivko@gmail.com> wrote:
> > I tried to reassemble the array using 3 different pairs of correct LV
> > images, but it doesn't work (I am sure because I cannot luksOpen a LUKS
> > image which is in the LV, which is almost surely uncorrectable).
>
> I would hope that a luks volume would at least be recognisable using
> file -s. If you extract the image data into a regular file you should
> be able to losetup that and then luksOpen the loop device.
>
Yes, it's recognizable. I can run luksDump and luksOpen, but for the latter
the password just doesn't work. Also, cryptsetup works on files just as well
as on devices, so the loop device doesn't really help; I tried it anyway just
to be sure and, quite naturally, it doesn't work either.

> > This is as useful as it gets (-vvvv -dddd):
> >     Loading vg-test_rmeta_0 table (253:35)
> >         Adding target to (253:35): 0 8192 linear 8:34 2048
> >         dm table   (253:35) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rmeta_0 (253:35) identical table reload.
> >     Loading vg-test_rimage_0 table (253:36)
> >         Adding target to (253:36): 0 65536 linear 8:34 10240
> >         dm table   (253:36) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rimage_0 (253:36) identical table reload.
> >     Loading vg-test_rmeta_1 table (253:37)
> >         Adding target to (253:37): 0 8192 linear 8:2 1951688704
> >         dm table   (253:37) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rmeta_1 (253:37) identical table reload.
> >     Loading vg-test_rimage_1 table (253:38)
> >         Adding target to (253:38): 0 65536 linear 8:2 1951696896
> >         dm table   (253:38) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rimage_1 (253:38) identical table reload.
> >     Loading vg-test_rmeta_2 table (253:39)
> >         Adding target to (253:39): 0 8192 linear 8:18 1217423360
> >         dm table   (253:39) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rmeta_2 (253:39) identical table reload.
> >     Loading vg-test_rimage_2 table (253:40)
> >         Adding target to (253:40): 0 65536 linear 8:18 1217431552
> >         dm table   (253:40) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rimage_2 (253:40) identical table reload.
> >     Creating vg-test
> >         dm create vg-test
> > LVM-Pgjp5f2PRJipxvoNdsYmq0olg9iWwY5pJjiPmiesfxvdeF5zMvTsJC6vFfqNgNnZ [
> > noopencount flush ]   [16384] (*1)
> >     Loading vg-test table (253:84)
> >         Adding target to (253:84): 0 131072 raid raid5_ls 3 128 region_size 1024 3 253:35 253:36 253:37 253:38 253:39 253:40
> >         dm table   (253:84) [ opencount flush ]   [16384] (*1)
> >         dm reload   (253:84) [ noopencount flush ]   [16384] (*1)
> >   device-mapper: reload ioctl on (253:84) failed: Invalid argument
> >
> > I don't see any problems here.
>
> In my case I got (for example, and Gmail is going to fold the lines, sorry):
>
> [...]
>     Loading vg0-photos table (254:45)
>         Adding target to (254:45): 0 1258291200 raid raid6_zr 3 128 region_size 1024 5 254:73 254:74 254:37 254:38 254:39 254:40 254:41 254:42 254:43 254:44
>         dm table   (254:45) [ opencount flush ]   [16384] (*1)
>         dm reload   (254:45) [ noopencount flush ]   [16384] (*1)
>   device-mapper: reload ioctl on (254:45) failed: Invalid argument
>
> The actual errors are in the kernel logs:
>
> [...]
> [144855.931712] device-mapper: raid: New device injected into existing
> array without 'rebuild' parameter specified
> [144855.935523] device-mapper: table: 254:45: raid: Unable to assemble
> array: Invalid superblocks
> [144855.939290] device-mapper: ioctl: error adding target to table

I had the following the first time:
[   74.743051] device-mapper: raid: Failed to read superblock of device at
position 1
[   74.761094] md/raid:mdX: device dm-73 operational as raid disk 2
[   74.765707] md/raid:mdX: device dm-67 operational as raid disk 0
[   74.770911] md/raid:mdX: allocated 3219kB
[   74.773571] md/raid:mdX: raid level 5 active with 2 out of 3 devices,
algorithm 2
[   74.775964] RAID conf printout:
[   74.775968]  --- level:5 rd:3 wd:2
[   74.775971]  disk 0, o:1, dev:dm-67
[   74.775973]  disk 2, o:1, dev:dm-73
[   74.793120] created bitmap (1 pages) for device mdX
[   74.822333] mdX: bitmap initialized from disk: read 1 pages, set 2 of 64
bits

After that I had only the previously mentioned errors in the kernel log:

device-mapper: table: 253:84: raid: Cannot change device positions in RAID
array
device-mapper: ioctl: error adding target to table
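
(Aside: the table line LVM is trying to load can also be replayed by hand,
which makes it easier to experiment outside of lvchange. A sketch using the
minor numbers from the -vvvv output above; they are not stable across
reboots, so adjust them to whatever dmsetup ls reports:)

    # try to assemble the raid5 mapping manually from the existing
    # rmeta/rimage sub-LVs; the table string is copied from the
    # "Adding target to (253:84)" line above
    dmsetup create test-manual --table \
        '0 131072 raid raid5_ls 3 128 region_size 1024 3 253:35 253:36 253:37 253:38 253:39 253:40'
    dmsetup status test-manual    # per-device health and sync ratio, if the load succeeds
    dmsetup remove test-manual    # tear the manual mapping down again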

> 128 means 128*512 so this is 64k as in your case. I was able to verify
> that my extracted images matched the RAID device. My problem was not
> assembling the array, it was that the array would be rebuilt on every
> subsequent use:
>
>     Loading vg0-var table (254:21)
>         Adding target to (254:21): 0 52428800 raid raid5_ls 5 128 region_size 1024 rebuild 0 5 254:11 254:12 254:13 254:14 254:15 254:16 254:17 254:18 254:19 254:20
>         dm table   (254:21) [ opencount flush ]   [16384] (*1)
>         dm reload   (254:21) [ noopencount flush ]   [16384] (*1)
>         Table size changed from 0 to 52428800 for vg0-var (254:21).
>
> >> You can check the rmeta superblocks with
> >> https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg
> >
> > Thanks, it's very useful!
> >
> > /dev/mapper/vg-test_rmeta_0
> > found RAID superblock at offset 0
> >  magic=1683123524
> >  features=0
> >  num_devices=3
> >  array_position=0
> >  events=56
> >  failed_devices=0
> >  disk_recovery_offset=18446744073709551615
> >  array_resync_offset=18446744073709551615
> >  level=5
> >  layout=2
> >  stripe_sectors=128
> > found bitmap file superblock at offset 4096:
> >          magic: 6d746962
> >        version: 4
> >           uuid: 00000000.00000000.00000000.00000000
> >         events: 56
> > events cleared: 33
> >          state: 00000000
> >      chunksize: 524288 B
> >   daemon sleep: 5s
> >      sync size: 32768 KB
> > max write behind: 0
> >
> > /dev/mapper/vg-test_rmeta_1
> > found RAID superblock at offset 0
> >  magic=1683123524
> >  features=0
> >  num_devices=3
> >  array_position=4294967295
> >  events=62
> >  failed_devices=1
> >  disk_recovery_offset=0
> >  array_resync_offset=18446744073709551615
> >  level=5
> >  layout=2
> >  stripe_sectors=128
> > found bitmap file superblock at offset 4096:
> >          magic: 6d746962
> >        version: 4
> >           uuid: 00000000.00000000.00000000.00000000
> >         events: 60
> > events cleared: 33
> >          state: 00000000
> >      chunksize: 524288 B
> >   daemon sleep: 5s
> >      sync size: 32768 KB
> > max write behind: 0
> >
> > /dev/mapper/vg-test_rmeta_2
> > found RAID superblock at offset 0
> >  magic=1683123524
> >  features=0
> >  num_devices=3
> >  array_position=2
> >  events=62
> >  failed_devices=1
> >  disk_recovery_offset=18446744073709551615
> >  array_resync_offset=18446744073709551615
> >  level=5
> >  layout=2
> >  stripe_sectors=128
> > found bitmap file superblock at offset 4096:
> >          magic: 6d746962
> >        version: 4
> >           uuid: 00000000.00000000.00000000.00000000
> >         events: 62
> > events cleared: 33
> >          state: 00000000
> >      chunksize: 524288 B
> >   daemon sleep: 5s
> >      sync size: 32768 KB
> > max write behind: 0
> >
> > The problem I see here is that events count is different for the three
> > rmetas.
>
> The event counts relate to the intent bitmap (I believe).
>
> That looks OK, because failed devices is 1, meaning 0b0...01; i.e.,
> device 0 of the array is "failed". The real problem is device 1 which
> has
>
> >  array_position=4294967295
>
> This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
> that it has special significance in kernel or LVM code. I've not
> checked beyond noticing one test: role < 0.
>
> I recommend using diff3 or pairwise diff on the metadata dumps to
> ensure you have not missed any other differences.
> One possible way forward:
>
> (Optionally) adapt my resync code so it writes back to the original
> files instead of outputting corrected linear data.
>
> Modify the rmeta data to remove the failed flag and reset the bad
> position to the correct value. sync and power off (or otherwise
> prevent the device mapper from writing back bad data).
>
> It's possible the RAID volume will fail to sync due to bitmap
> inconsistencies. I don't know how to re-write the superblocks to say
> "trust me, all data are in sync".
>
Thanks for the tip! But would that help if the manual data reassembly using
your code already doesn't work? I don't see how fixing the metadata could
change that.
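
(For the record, "modify the rmeta data" would amount to something like the
sketch below. It assumes the on-disk layout matches struct
dm_raid_superblock in dm-raid.c of this era, with magic, compat_features,
num_devices and array_position as consecutive little-endian u32s, so
array_position would sit at byte offset 12 and the u64 failed_devices at
offset 24. The offsets and the whole approach are unverified, so only ever
work on copies of the rmeta LVs, never on the live devices:)

    # copy the first 4 KiB (the dm-raid superblock) of the suspect rmeta LV
    dd if=/dev/mapper/vg-test_rmeta_1 of=rmeta_1.sb bs=4k count=1
    # set array_position (assumed little-endian u32 at offset 12) back to 1
    printf '\x01\x00\x00\x00' | dd of=rmeta_1.sb bs=1 seek=12 count=4 conv=notrunc
    # clear failed_devices (assumed little-endian u64 at offset 24)
    printf '\x00\x00\x00\x00\x00\x00\x00\x00' | dd of=rmeta_1.sb bs=1 seek=24 count=8 conv=notrunc
    # compare against the other rmeta superblocks before writing anything back
    cmp -l rmeta_0.sb rmeta_1.sb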

> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

[-- Attachment #2: Type: text/html, Size: 15989 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-12  7:15           ` Giuliano Procida
@ 2016-10-14 23:19             ` Heinz Mauelshagen
  2016-10-15  7:21               ` Giuliano Procida
  0 siblings, 1 reply; 12+ messages in thread
From: Heinz Mauelshagen @ 2016-10-14 23:19 UTC (permalink / raw)
  To: LVM general discussion and development

On 10/12/2016 09:15 AM, Giuliano Procida wrote:
> On 12 October 2016 at 07:57, Giuliano Procida
> <giuliano.procida@gmail.com> wrote:
>>>   array_position=4294967295
>> This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
>> that it has special significance in kernel or LVM code. I've not
>> checked beyond noticing one test: role < 0.
> http://lxr.free-electrons.com/source/drivers/md/dm-raid.c
>
> Now role is an int and the RHS of the assignment is le32_to_cpu(...)
> which returns a u32. Testing < 0 will *never* succeed on a 64-bit
> architecture.
Well, the result of le32_to_cpu is assigned to a 32-bit int even on a
64-bit arch.

The 4294967295 reported by parse_rmeta will thus end up as signed int
role = -1, which is what is used for the comparison.

The tool should rather report array_position as signed to avoid the
irritation.



>   This is a kernel bug. If the code is changed so that
> role is also u32 and the test is against ~0, it's possible that
> different, better things will happen. Please try reporting this to the
> dm-devel people.
>
> I still don't know what wrote that value to the superblock though.

The position must have been set to -1 to indicate a failure to hot-add the
image LV back in, and MD then wrote that value to the superblock.

Did you spot any "Faulty.*device #.*has readable super block.\n 
Attempting to revive it."
messages from dm-raid in the kernel log by chance?
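
(For example, something along these lines, assuming the kernel log from the
boot in question is still available:)

    # look for dm-raid revive attempts in the kernel log
    dmesg | grep -iE 'has readable super block|attempting to revive'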

Heinz

>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-14 23:19             ` Heinz Mauelshagen
@ 2016-10-15  7:21               ` Giuliano Procida
  0 siblings, 0 replies; 12+ messages in thread
From: Giuliano Procida @ 2016-10-15  7:21 UTC (permalink / raw)
  To: LVM general discussion and development

On 15 October 2016 at 00:19, Heinz Mauelshagen <heinzm@redhat.com> wrote:
> On 10/12/2016 09:15 AM, Giuliano Procida wrote:
>>
>> On 12 October 2016 at 07:57, Giuliano Procida
>> [lies]
>
> Well, the result of le32_to_cpu is assigned to a 32 bit int on 64 bit arch.
>
> The 4294967295 reported by parse_rmeta will thus result in signed int role =
> -1
> which is used for comparision.
>
> The tool should rather report the array_position signed to avoid iritation.

Oops, yes. sizeof(int) == 4 in the kernel, even on a 64-bit architecture.
I'd forgotten this. It is surprising, though: the compiler would likely
warn about this, but the warning must have been ignored.

>> I still don't know what wrote that value to the superblock though.
>
> Must have been the position gotten set to -1 indicating failure to hot add
> the image LV back in and thus MD has written it to that superblock.

I didn't find the code that did this when I last looked.

> Did you spot any "Faulty.*device #.*has readable super block.\n Attempting
> to revive it."
> messages from dm-raid in the kernel log by chance?

That's a question for Slava.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-10-15  7:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-03 23:49 [linux-lvm] LVM RAID5 out-of-sync recovery Slava Prisivko
2016-10-04  9:45 ` Giuliano Procida
2016-10-04 22:14   ` Slava Prisivko
2016-10-05 12:48     ` Giuliano Procida
2016-10-07  6:43       ` Giuliano Procida
2016-10-09 19:00         ` Slava Prisivko
2016-10-09 19:00       ` Slava Prisivko
2016-10-12  6:57         ` Giuliano Procida
2016-10-12  7:15           ` Giuliano Procida
2016-10-14 23:19             ` Heinz Mauelshagen
2016-10-15  7:21               ` Giuliano Procida
2016-10-13 20:44           ` Slava Prisivko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).