linux-lvm.redhat.com archive mirror
* [linux-lvm] LVM RAID5 out-of-sync recovery
@ 2016-10-03 23:49 Slava Prisivko
  2016-10-04  9:45 ` Giuliano Procida
  0 siblings, 1 reply; 12+ messages in thread
From: Slava Prisivko @ 2016-10-03 23:49 UTC (permalink / raw)
  To: linux-lvm


Hi,

In order to mitigate cross-posting, here's the original question on
Serverfault.SE: LVM RAID5 out-of-sync recovery
<https://serverfault.com/questions/806805/lvm-raid5-out-of-sync-recovery>,
but feel free to answer wherever you deem appropriate.

How can one recover from an LVM RAID5 out-of-sync?

I have an LVM RAID5 configuration (RAID5 using the LVM tools).

However, because of a technical problem the mirrors went out of sync. You can
reproduce this as explained in this Unix & Linux question
<http://unix.stackexchange.com/a/182503>:

> Playing with my Jessie VM, I disconnected (virtually) one disk. That
worked, the machine stayed running. lvs, though, gave no indication the
arrays were degraded. I re-attached the disk, and removed a second. Stayed
running (this is raid6). Re-attached, still no indication from lvs. I ran
lvconvert --repair on the volume, it told me it was OK. Then I pulled a
third disk... and the machine died. Re-inserted it, rebooted, and am now
unsure how to fix.

If I had been using mdadm, I could have probably recovered the data using
`mdadm --force --assemble`, but I was not able to achieve the same using
the LVM tools.

I have tried to concatenate rmeta and rimage for each mirror and put them
on three linear devices in order to feed them to mdadm (because LVM
leverages MD), but without success: `mdadm --examine` does not recognize
the superblock, because the mdadm superblock format
<https://raid.wiki.kernel.org/index.php/RAID_superblock_formats> differs
from the dm_raid superblock format (search for "dm_raid_superblock" in
<https://github.com/torvalds/linux/blob/43f4d36cbf82428374966568ea57a0bc0d664a20/drivers/md/dm-raid.c>).
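Concretely, the kind of stacking I tried looks roughly like this (a sketch;
the 8192/65536 sector counts are the rmeta/rimage sizes visible in the dm
tables later in this thread, and the device names are illustrative):

    # build one "member" device per leg by appending rimage after rmeta
    printf '%s\n' \
      '0 8192 linear /dev/mapper/vg-test_rmeta_0 0' \
      '8192 65536 linear /dev/mapper/vg-test_rimage_0 0' \
      | dmsetup create md_leg0
    mdadm --examine /dev/mapper/md_leg0    # does not recognise any superblock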

I tried to understand how device-mapper RAID
<https://www.kernel.org/doc/Documentation/device-mapper/dm-raid.txt>
leverages MD, but was unable to find any documentation, and the kernel code
<https://github.com/torvalds/linux/blob/43f4d36cbf82428374966568ea57a0bc0d664a20/drivers/md/dm-raid.c>
is quite complicated.

I also tried to rebuild the mirror directly by using `dmsetup`, but it
can't rebuild if metadata is out of sync.
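What I attempted looked roughly like the following (a sketch; the table
follows the dm-raid format from the documentation above, the sizes and minor
numbers are the ones from the verbose lvchange output further below, and
"rebuild 1" is meant to mark leg 1 for reconstruction):

    dmsetup create vg-test --table '0 131072 raid raid5_ls 5 128 region_size 1024 rebuild 1 3 253:35 253:36 253:37 253:38 253:39 253:40'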

Overall, almost the only useful information I could find is the "RAIDing with
LVM vs MDRAID - pros and cons?" question on Unix & Linux SE
<http://unix.stackexchange.com/a/182503>.

The output of various commands is provided below.

    # lvs -a -o +devices

    test                           vg   rwi---r---  64.00m test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
    [test_rimage_0]                vg   Iwi-a-r-r-  32.00m /dev/sdc2(1)
    [test_rimage_1]                vg   Iwi-a-r-r-  32.00m /dev/sda2(238244)
    [test_rimage_2]                vg   Iwi-a-r-r-  32.00m /dev/sdb2(148612)
    [test_rmeta_0]                 vg   ewi-a-r-r-   4.00m /dev/sdc2(0)
    [test_rmeta_1]                 vg   ewi-a-r-r-   4.00m /dev/sda2(238243)
    [test_rmeta_2]                 vg   ewi-a-r-r-   4.00m /dev/sdb2(148611)

I cannot activate the LV:

    # lvchange -ay vg/test -v
        Activating logical volume "test" exclusively.
        activation/volume_list configuration setting not defined: Checking only host tags for vg/test.
        Loading vg-test_rmeta_0 table (253:35)
        Suppressed vg-test_rmeta_0 (253:35) identical table reload.
        Loading vg-test_rimage_0 table (253:36)
        Suppressed vg-test_rimage_0 (253:36) identical table reload.
        Loading vg-test_rmeta_1 table (253:37)
        Suppressed vg-test_rmeta_1 (253:37) identical table reload.
        Loading vg-test_rimage_1 table (253:38)
        Suppressed vg-test_rimage_1 (253:38) identical table reload.
        Loading vg-test_rmeta_2 table (253:39)
        Suppressed vg-test_rmeta_2 (253:39) identical table reload.
        Loading vg-test_rimage_2 table (253:40)
        Suppressed vg-test_rimage_2 (253:40) identical table reload.
        Creating vg-test
        Loading vg-test table (253:87)
      device-mapper: reload ioctl on (253:87) failed: Invalid argument
        Removing vg-test (253:87)

While trying to activate, I'm getting the following in dmesg:

    device-mapper: table: 253:87: raid: Cannot change device positions in RAID array
    device-mapper: ioctl: error adding target to table

lvconvert only works on active LVs:
    # lvconvert --repair vg/test
      vg/test must be active to perform this operation.

I have the following LVM version:

    # lvm version
      LVM version:     2.02.145(2) (2016-03-04)
      Library version: 1.02.119 (2016-03-04)
      Driver version:  4.34.0

And the following kernel version:

    Linux server 4.4.8-hardened-r1-1 #1 SMP

--
Best regards,
Slava Prisivko.



* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-03 23:49 [linux-lvm] LVM RAID5 out-of-sync recovery Slava Prisivko
@ 2016-10-04  9:45 ` Giuliano Procida
  2016-10-04 22:14   ` Slava Prisivko
  0 siblings, 1 reply; 12+ messages in thread
From: Giuliano Procida @ 2016-10-04  9:45 UTC (permalink / raw)
  To: LVM general discussion and development

Before anything else, I would have suggested backing up the image and
meta sub LVs, but it looks like you are just testing.

Clear down any odd state with dmsetup remove /dev/vg/... and then run:

vgextend --restoremissing

Actually, always run LVM commands with -v -t before really running them.
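Something along these lines (illustrative; substitute your own VG, LV and PV names):

  dmsetup remove /dev/vg/test            # plus any leftover test_rmeta_*/test_rimage_* nodes
  vgextend -t -v --restoremissing vg /dev/sda2    # dry run first, then repeat without -t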

On 4 October 2016 at 00:49, Slava Prisivko <vprisivko@gmail.com> wrote:
> In order to mitigate cross-posting, here's the original question on
> Serverfault.SE: LVM RAID5 out-of-sync recovery, but feel free to answer
> wherever you deem appropriate.
>
> How can one recover from an LVM RAID5 out-of-sync?

I suppose it's supposed to recover mostly automatically.
*If* your array is assembled (or whatever the LVM-equivalent
terminology is) then you can force a given subset of PVs to be
resynced.
http://man7.org/linux/man-pages/man8/lvchange.8.html - look for rebuild
However, this does not seem to be your problem.
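For reference, the invocation is along these lines (replace /dev/sdX2 with
the PV carrying the stale leg):

  lvchange --rebuild /dev/sdX2 vg/test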

> I have an LVM RAID5 configuration (RAID5 using the LVM tools).
>
> However, because of a technical problem mirrors went out of sync. You can
> reproduce this as explained in this Unix & Linux question:
>
>> Playing with my Jessie VM, I disconnected (virtually) one disk. That
>> worked, the machine stayed running. lvs, though, gave no indication the
>> arrays were degraded.

You should have noticed something in the kernel logs. Also, lvs should
have reported that the array was now (p)artial.

>> I re-attached the disk, and removed a second. Stayed
>> running (this is raid6). Re-attached, still no indication from lvs. I ran
>> lvconvert --repair on the volume, it told me it was OK. Then I pulled a
>> third disk... and the machine died. Re-inserted it, rebooted, and am now
>> unsure how to fix.

So this is RAID6 rather than RAID5?
And you killed 3 disks in a RAID 6 array?

> If I had been using mdadm, I could have probably recovered the data using
> `mdadm --force --assemble`, but I was not able to achieve the same using the
> LVM tools.

LVM is very different. :-(

> I have tried to concatenate rmeta and rimage for each mirror and put them on
> three linear devices in order to feed them to the mdadm (because LVM
> leverages MD), but without success (`mdadm --examine` does not recognize the
> superblock), because it appears that the mdadm superblock format differs
> from the dm_raid superblock format (search for the "dm_raid_superblock").

Not only that, but (as far as I can tell), LVM RAID 6 parity (well,
syndrome) is calculated in a different manner to the older mdadm RAID;
it uses an industry-standard layout instead of the (more obvious?) md
layout.
I wrote a utility to parity-check the default LVM RAID6 layout with
the usual stripe size (easily adjusted) here:
https://drive.google.com/open?id=0B8dHrWSoVcaDbkY3WmkxSmpfSVE

You can use this to see to what degree the data in the image LVs are
in fact in/out of sync. I've not attempted to add sync functionality
to this.

> I tried to understand how device-mapper RAID leverages MD, but was unable to
> find any documentation while the kernel code is quite complicated.
>
> I also tried to rebuild the mirror directly by using `dmsetup`, but it can't
> rebuild if metadata is out of sync.
>
> Overall, almost the only useful information I could find is RAIDing with LVM
> vs MDRAID - pros and cons? question on Unix & Linux SE.

Well, I would read through this as well (versions 6 and 7 also available):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/Logical_Volume_Manager_Administration/index.html

> The output of various commands is provided below.
>
>     # lvs -a -o +devices
>
>     test                           vg   rwi---r---  64.00m
> test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
>     [test_rimage_0]                vg   Iwi-a-r-r-  32.00m /dev/sdc2(1)
>     [test_rimage_1]                vg   Iwi-a-r-r-  32.00m /dev/sda2(238244)
>     [test_rimage_2]                vg   Iwi-a-r-r-  32.00m /dev/sdb2(148612)
>     [test_rmeta_0]                 vg   ewi-a-r-r-   4.00m /dev/sdc2(0)
>     [test_rmeta_1]                 vg   ewi-a-r-r-   4.00m /dev/sda2(238243)
>     [test_rmeta_2]                 vg   ewi-a-r-r-   4.00m /dev/sdb2(148611)
>
> I cannot activate the LV:
>
>     # lvchange -ay vg/test -v
>         Activating logical volume "test" exclusively.
>         activation/volume_list configuration setting not defined: Checking
> only host tags for vg/test.
>         Loading vg-test_rmeta_0 table (253:35)
>         Suppressed vg-test_rmeta_0 (253:35) identical table reload.
>         Loading vg-test_rimage_0 table (253:36)
>         Suppressed vg-test_rimage_0 (253:36) identical table reload.
>         Loading vg-test_rmeta_1 table (253:37)
>         Suppressed vg-test_rmeta_1 (253:37) identical table reload.
>         Loading vg-test_rimage_1 table (253:38)
>         Suppressed vg-test_rimage_1 (253:38) identical table reload.
>         Loading vg-test_rmeta_2 table (253:39)
>         Suppressed vg-test_rmeta_2 (253:39) identical table reload.
>         Loading vg-test_rimage_2 table (253:40)
>         Suppressed vg-test_rimage_2 (253:40) identical table reload.
>         Creating vg-test
>         Loading vg-test table (253:87)
>       device-mapper: reload ioctl on (253:87) failed: Invalid argument
>         Removing vg-test (253:87)
>
> While trying to activate I'm getting the following in the dmesg:
>
>     device-mapper: table: 253:87: raid: Cannot change device positions in
> RAID array
>     device-mapper: ioctl: error adding target to table

That's a new error message to me. I would try clearing out the dm
table (dmsetup remove /dev/vg/test_*) before trying again (-v -t,
first).

> lvconvert only works on active LVs:
>     # lvconvert --repair vg/test
>       vg/test must be active to perform this operation.

And it requires new PVs ("replacement drives") to put the subLVs on.
It's probably not what you want.

> I have the following LVM version:
>
>     # lvm version
>       LVM version:     2.02.145(2) (2016-03-04)
>       Library version: 1.02.119 (2016-03-04)
>       Driver version:  4.34.0

I would update LVM to whatever is in Debian testing as there has been
a fair bit of change this year.

> And the following kernel version:
>
>     Linux server 4.4.8-hardened-r1-1 #1 SMP

More useful would be the contents of /etc/lvm/backup/vg and the output
of vgs and pvs.


* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-04  9:45 ` Giuliano Procida
@ 2016-10-04 22:14   ` Slava Prisivko
  2016-10-05 12:48     ` Giuliano Procida
  0 siblings, 1 reply; 12+ messages in thread
From: Slava Prisivko @ 2016-10-04 22:14 UTC (permalink / raw)
  To: LVM general discussion and development


Thanks!

On Tue, Oct 4, 2016 at 12:49 PM Giuliano Procida <giuliano.procida@gmail.com>
wrote:

Before anything else, I would have suggested backing up the image and
meta sub LVs, but it looks like you are just testing.

Already did. I'm not testing, I just renamed the LVs to "test_*" because
the previous name doesn't matter.

There is nothing particularly important there, but I would like to
understand whether I would be able to recover should something similar
happen in the future.


Clear down any odd state with dmsetup remove /dev/vg/... and then run:

vgextend --restoremissing

I didn't have to, because all the PVs are present:

# pvs
  PV         VG Fmt  Attr PSize   PFree
  /dev/sda2  vg lvm2 a--    1.82t   1.10t
  /dev/sdb2  vg lvm2 a--    3.64t   1.42t
  /dev/sdc2  vg lvm2 a--  931.51g 195.18g


Actually, always run LVM commands with -v -t before really running them.

Thanks! I had backed up the rmeta* and rimage* sub-LVs, so I didn't feel the
need to use -t. Am I wrong?
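For reference, a backup of the sub-LVs can be as simple as this (the
destination path is just an example):

    for lv in test_rmeta_0 test_rmeta_1 test_rmeta_2 test_rimage_0 test_rimage_1 test_rimage_2; do
        dd if=/dev/mapper/vg-$lv of=/root/$lv.bak bs=1M
    done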


On 4 October 2016 at 00:49, Slava Prisivko <vprisivko@gmail.com> wrote:
> In order to mitigate cross-posting, here's the original question on
> Serverfault.SE: LVM RAID5 out-of-sync recovery, but feel free to answer
> wherever you deem appropriate.
>
> How can one recover from an LVM RAID5 out-of-sync?

I suppose it's supposed to recover mostly automatically.
*If* your array is assembled (or whatever the LVM-equivalent
terminology is) then you can force a given subset of PVs to be
resynced.
http://man7.org/linux/man-pages/man8/lvchange.8.html - look for rebuild
However, this does not seem to be your problem.

Yeah, I tried, but in vain:
# lvchange --rebuild /dev/sda2 vg/test -v
    Archiving volume group "vg" metadata (seqno 518).
Do you really want to rebuild 1 PVs of logical volume vg/test [y/n]: y
    Accepted input: [y]
  vg/test must be active to perform this operation.

> I have an LVM RAID5 configuration (RAID5 using the LVM tools).
>
> However, because of a technical problem mirrors went out of sync. You can
> reproduce this as explained in this Unix & Linux question:
>
>> Playing with my Jessie VM, I disconnected (virtually) one disk. That
>> worked, the machine stayed running. lvs, though, gave no indication the
>> arrays were degraded.

You should have noticed something in the kernel logs. Also, lvs should
have reported that the array was now (p)artial.

Yes, I've noticed it. The problem was a faulty SATA cable (as I learned
later), so when I switched the computer on for the first time, /dev/sda was
missing (in the current device allocation). I switched off the computer,
swapped the /dev/sda and /dev/sdb SATA cable (without thinking about the
consequences) and switched it on. This time the /dev/sdb was missing. I
replaced the faulty cable with a new one and switched the machine back on.
This time sda, sdb and sdc were all present, but the RAID went out-of-sync.

I'm pretty sure there were very few (if any) write operations while the array
was degraded, so I should be able to recover by rebuilding the old mirror
(sda) using the more recent ones (sdb and sdc).


>> I re-attached the disk, and removed a second. Stayed
>> running (this is raid6). Re-attached, still no indication from lvs. I ran
>> lvconvert --repair on the volume, it told me it was OK. Then I pulled a
>> third disk... and the machine died. Re-inserted it, rebooted, and am now
>> unsure how to fix.

So this is RAID6 rather than RAID5?
And you killed 3 disks in a RAID 6 array?

I have RAID5, not RAID6, but the principle is the same (as I explained in the
previous paragraph).


> If I had been using mdadm, I could have probably recovered the data using
> `mdadm --force --assemble`, but I was not able to achieve the same using
> the LVM tools.

LVM is very different. :-(

> I have tried to concatenate rmeta and rimage for each mirror and put them on
> three linear devices in order to feed them to the mdadm (because LVM
> leverages MD), but without success (`mdadm --examine` does not recognize the
> superblock), because it appears that the mdadm superblock format differs
> from the dm_raid superblock format (search for the "dm_raid_superblock").

Not only that, but (as far as I can tell), LVM RAID 6 parity (well,
syndrome) is calculated in a different manner to the older mdadm RAID;
it uses an industry-standard layout instead of the (more obvious?) md
layout.
I wrote a utility to parity-check the default LVM RAID6 layout with
the usual stripe size (easily adjusted) here:
https://drive.google.com/open?id=0B8dHrWSoVcaDbkY3WmkxSmpfSVE

You can use this to see to what degree the data in the image LVs are
in fact in/out of sync. I've not attempted to add sync functionality
to this.

Thanks, I used your raid5_parity_check.cc utility with the default stripe
size (64 * 1024), but it actually doesn't matter since you're just
calculating the total xor and the stripe size acts as a buffer size for
that.

I get three unsynced stripes out of 512 (32 MiB / 64 KiB), but I would like
to try to reconstruct test_rimage_1 using the other two. Just in
case, here are the bad stripe numbers: 16, 48, 49.


> I tried to understand how device-mapper RAID leverages MD, but was unable to
> find any documentation while the kernel code is quite complicated.
>
> I also tried to rebuild the mirror directly by using `dmsetup`, but it can't
> rebuild if metadata is out of sync.
>
> Overall, almost the only useful information I could find is RAIDing with LVM
> vs MDRAID - pros and cons? question on Unix & Linux SE.

Well, I would read through this as well (versions 6 and 7 also available):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/Logical_Volume_Manager_Administration/index.html

 Thanks, but nothing particularly relevant to my case there.

> The output of various commands is provided below.
>
>     # lvs -a -o +devices
>
>     test                           vg   rwi---r---  64.00m test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
>     [test_rimage_0]                vg   Iwi-a-r-r-  32.00m /dev/sdc2(1)
>     [test_rimage_1]                vg   Iwi-a-r-r-  32.00m /dev/sda2(238244)
>     [test_rimage_2]                vg   Iwi-a-r-r-  32.00m /dev/sdb2(148612)
>     [test_rmeta_0]                 vg   ewi-a-r-r-   4.00m /dev/sdc2(0)
>     [test_rmeta_1]                 vg   ewi-a-r-r-   4.00m /dev/sda2(238243)
>     [test_rmeta_2]                 vg   ewi-a-r-r-   4.00m /dev/sdb2(148611)
>
> I cannot activate the LV:
>
>     # lvchange -ay vg/test -v
>         Activating logical volume "test" exclusively.
>         activation/volume_list configuration setting not defined: Checking
> only host tags for vg/test.
>         Loading vg-test_rmeta_0 table (253:35)
>         Suppressed vg-test_rmeta_0 (253:35) identical table reload.
>         Loading vg-test_rimage_0 table (253:36)
>         Suppressed vg-test_rimage_0 (253:36) identical table reload.
>         Loading vg-test_rmeta_1 table (253:37)
>         Suppressed vg-test_rmeta_1 (253:37) identical table reload.
>         Loading vg-test_rimage_1 table (253:38)
>         Suppressed vg-test_rimage_1 (253:38) identical table reload.
>         Loading vg-test_rmeta_2 table (253:39)
>         Suppressed vg-test_rmeta_2 (253:39) identical table reload.
>         Loading vg-test_rimage_2 table (253:40)
>         Suppressed vg-test_rimage_2 (253:40) identical table reload.
>         Creating vg-test
>         Loading vg-test table (253:87)
>       device-mapper: reload ioctl on (253:87) failed: Invalid argument
>         Removing vg-test (253:87)
>
> While trying to activate I'm getting the following in the dmesg:
>
>     device-mapper: table: 253:87: raid: Cannot change device positions in
> RAID array
>     device-mapper: ioctl: error adding target to table

That's a new error message to me. I would try clearing out the dm
table (dmsetup remove /dev/vg/test_*) before trying again (-v -t,
first).

After removing the dm tables for test_* and retrying lvchange -ay, I get
practically the same:
# lvchange -ay vg/test -v
    Activating logical volume vg/test exclusively.
    activation/volume_list configuration setting not defined: Checking only
host tags for vg/test.
    Creating vg-test_rmeta_0
    Loading vg-test_rmeta_0 table (253:35)
    Resuming vg-test_rmeta_0 (253:35)
    Creating vg-test_rimage_0
    Loading vg-test_rimage_0 table (253:36)
    Resuming vg-test_rimage_0 (253:36)
    Creating vg-test_rmeta_1
    Loading vg-test_rmeta_1 table (253:37)
    Resuming vg-test_rmeta_1 (253:37)
    Creating vg-test_rimage_1
    Loading vg-test_rimage_1 table (253:38)
    Resuming vg-test_rimage_1 (253:38)
    Creating vg-test_rmeta_2
    Loading vg-test_rmeta_2 table (253:39)
    Resuming vg-test_rmeta_2 (253:39)
    Creating vg-test_rimage_2
    Loading vg-test_rimage_2 table (253:40)
    Resuming vg-test_rimage_2 (253:40)
    Creating vg-test
    Loading vg-test table (253:87)
  device-mapper: reload ioctl on (253:87) failed: Invalid argument
    Removing vg-test (253:87)

device-mapper: table: 253:87: raid: Cannot change device positions in RAID
array
device-mapper: ioctl: error adding target to table


> lvconvert only works on active LVs:
>     # lvconvert --repair vg/test
>       vg/test must be active to perform this operation.

And it requires new PVs ("replacement drives") to put the subLVs on.
It's probably not what you want.

> I have the following LVM version:
>
>     # lvm version
>       LVM version:     2.02.145(2) (2016-03-04)
>       Library version: 1.02.119 (2016-03-04)
>       Driver version:  4.34.0

I would update LVM to whatever is in Debian testing as there has been
a fair bit of change this year.

I've updated to 2.02.166 (the latest version):

# lvm version
  LVM version:     2.02.166(2) (2016-09-26)
  Library version: 1.02.135 (2016-09-26)
  Driver version:  4.34.0


> And the following kernel version:
>
>     Linux server 4.4.8-hardened-r1-1 #1 SMP

More useful would be the contents of /etc/lvm/backup/vg and the output
of vgs and pvs.

# pvs
  PV         VG Fmt  Attr PSize   PFree
  /dev/sda2  vg lvm2 a--    1.82t   1.10t
  /dev/sdb2  vg lvm2 a--    3.64t   1.42t
  /dev/sdc2  vg lvm2 a--  931.51g 195.18g

# vgs
  VG #PV #LV #SN Attr   VSize VFree
  vg   3  18   0 wz--n- 6.37t 2.71t

Here is the relevant content of /etc/lvm/archive (the archive is more recent
than the backup):
        test {
            id = "JjiPmi-esfx-vdeF-5zMv-TsJC-6vFf-qNgNnZ"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 16    # 64 Megabytes

                type = "raid5"
                device_count = 3
                stripe_size = 128
                region_size = 1024

                raids = [
                    "test_rmeta_0", "test_rimage_0",
                    "test_rmeta_1", "test_rimage_1",
                    "test_rmeta_2", "test_rimage_2"
                ]
            }
        }

        test_rmeta_0 {
            id = "WE3CUg-ayo8-lp1Y-9S2v-zRGi-mV1s-DWYoST"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 1    # 4 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 0
                ]
            }
        }

        test_rmeta_1 {
            id = "Apk3mc-zy4q-c05I-hiIO-1Kae-9yB6-Cl5lfJ"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 1    # 4 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv1", 238243
                ]
            }
        }

        test_rmeta_2 {
            id = "j2Waf3-A77y-pvfd-foGK-Hq7B-rHe8-YKzQY0"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 1    # 4 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv2", 148611
                ]
            }
        }

        test_rimage_0 {
            id = "zaGgJx-YSIl-o2oq-UN9l-02Q8-IS5u-sz4RhQ"
            status = ["READ", "WRITE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 8    # 32 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 1
                ]
            }
        }

        test_rimage_1 {
            id = "0mD5AL-GKj3-siFz-xQmO-ZtQo-L3MM-Ro2SG2"
            status = ["READ", "WRITE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 8    # 32 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv1", 238244
                ]
            }
        }

        test_rimage_2 {
            id = "4FxiHV-j637-ENml-Okm3-uL1p-fuZ0-y9dE8Y"
            status = ["READ", "WRITE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 8    # 32 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv2", 148612
                ]
            }
        }

--
Best regards,
Slava Prisivko.






* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-04 22:14   ` Slava Prisivko
@ 2016-10-05 12:48     ` Giuliano Procida
  2016-10-07  6:43       ` Giuliano Procida
  2016-10-09 19:00       ` Slava Prisivko
  0 siblings, 2 replies; 12+ messages in thread
From: Giuliano Procida @ 2016-10-05 12:48 UTC (permalink / raw)
  To: LVM general discussion and development

On 4 October 2016 at 23:14, Slava Prisivko <vprisivko@gmail.com> wrote:
>> vgextend --restoremissing
>
> I didn't have to, because all the PVs are present:
>
> # pvs
>   PV         VG Fmt  Attr PSize   PFree
>   /dev/sda2  vg lvm2 a--    1.82t   1.10t
>   /dev/sdb2  vg lvm2 a--    3.64t   1.42t
>   /dev/sdc2  vg lvm2 a--  931.51g 195.18g

Double-check in the metadata for MISSING. This is what I was hoping
might be in your /etc/lvm/backup file.

>> Actually, always run LVM commands with -v -t before really running them.
>
> Thanks! I had backed up the rmeta* and rimage*, so I didn't feel the need
> for using -t. Am I wrong?

Well, some nasty surprises may be avoidable (particularly if also using -f).

> Yes, I've noticed it. The problem was a faulty SATA cable (as I learned
> later), so when I switched the computer on for the first time, /dev/sda was
> missing (in the current device allocation). I switched off the computer,
> swapped the /dev/sda and /dev/sdb SATA cable (without thinking about the
> consequences) and switched it on. This time the /dev/sdb was missing. I
> replaced the faulty cable with a new one and switched the machine back on.
> This time sda, sdb and sdc were all present, but the RAID went out-of-sync.

In swapping the cables, you may have changed the sd{a,b,c} enumeration
but this will have no impact on the UUIDs that LVM uses to identify
the PVs.

> I'm pretty sure there were very few (if any) writing operations during the
> degraded operating mode, so the I could recover by rebuilding the old mirror
> (sda) using the more recent ones (sdb and sdc).

Agreed, based on your check below.

> Thanks, I used your raid5_parity_check.cc utility with the default stripe
> size (64 * 1024), but it actually doesn't matter since you're just
> calculating the total xor and the stripe size acts as a buffer size for
> that.

[I was a little surprised to discover that RAID 6 works as a byte erasure code.]

The stripe size and layout matter once you want to adapt the code
to extract or repair the data.

> I get three unsynced stripes out of 512 (32 mib / 64 kib), but I would like
> to try to reconstruct the test_rimage_1 using h the other two. Just in case,
> here are the bad stripe numbers: 16, 48, 49.

I've updated the utility (this is for raid5 = raid5_ls). Warning: not
tested on out-of-sync data.

https://drive.google.com/open?id=0B8dHrWSoVcaDYXlUWXEtZEMwX0E

# Assume the first sub LV has the out-of-date data and dump the correct(ed) LV content.
./foo stripe $((64*1024)) repair 0 /dev/${lv}_rimage_* | cmp - /dev/${lv}

>> > The output of various commands is provided below.
>> >
>> >     # lvs -a -o +devices
>> >
>> >     test                           vg   rwi---r---  64.00m
>> > test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
>> >     [test_rimage_0]                vg   Iwi-a-r-r-  32.00m /dev/sdc2(1)
>> >     [test_rimage_1]                vg   Iwi-a-r-r-  32.00m
>> > /dev/sda2(238244)
>> >     [test_rimage_2]                vg   Iwi-a-r-r-  32.00m
>> > /dev/sdb2(148612)
>> >     [test_rmeta_0]                 vg   ewi-a-r-r-   4.00m /dev/sdc2(0)
>> >     [test_rmeta_1]                 vg   ewi-a-r-r-   4.00m
>> > /dev/sda2(238243)
>> >     [test_rmeta_2]                 vg   ewi-a-r-r-   4.00m
>> > /dev/sdb2(148611)

The extra r(efresh) attributes suggest trying a resync operation, which
may not be possible on an inactive LV.
I missed that the RAID device is actually in the list.

> After cleaning the dmsetup table of test_* and trying to lvchange -ay I get
> practically the same:
> # lvchange -ay vg/test -v
[snip]
>   device-mapper: reload ioctl on (253:87) failed: Invalid argument
>     Removing vg-test (253:87)
>
> device-mapper: table: 253:87: raid: Cannot change device positions in RAID
> array
> device-mapper: ioctl: error adding target to table

This error occurs when the sub LV metadata says "I am device X in this
array" but dmsetup is being asked to put the sub LV at a different
position Y (alas, neither value is logged). With lots of -v and -d flags
you can get lvchange to include the dm table entries in the
diagnostics.
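For example:

  lvchange -ay vg/test -vvvv -dddd 2>&1 | less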

You can check the rmeta superblocks with
https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg

> Here is the relevant /etc/lvm/archive (archive is more recent that backup)

That looks sane, but you omitted the physical volumes section so there
is no way to cross-check UUIDs and devices or see if there are MISSING
flags.

If you use
https://drive.google.com/open?id=0B8dHrWSoVcaDQkU5NG1sLWc5cjg
directly, you can get metadata that LVM is reading off the PVs and
double-check for discrepancies.
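Failing that, the text metadata can usually be eyeballed straight off a PV
with standard tools, since by default the metadata area sits in the first
mebibyte of the PV (a rough sketch; the offsets are the defaults, not
guaranteed):

  dd if=/dev/sda2 bs=1M count=1 2>/dev/null | strings -n 8 | less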


* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-05 12:48     ` Giuliano Procida
@ 2016-10-07  6:43       ` Giuliano Procida
  2016-10-09 19:00         ` Slava Prisivko
  2016-10-09 19:00       ` Slava Prisivko
  1 sibling, 1 reply; 12+ messages in thread
From: Giuliano Procida @ 2016-10-07  6:43 UTC (permalink / raw)
  To: LVM general discussion and development

Slava, the main problem I had was that LVM forbade many operations
while I had a PV missing.

In your case, apparently, all PVs are present. So I suggest the following:

1. examine the recent history in /etc/lvm/archive
2. diff each transition and see if you can understand what has
happened at each stage
3. vgcfgrestore the most recent version that you think will allow you
to activate your array; you can work backwards incrementally (see the
example after this list)
4. check kernel logs!
5. scrub (resync) the array if needed
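For step 3, the restore itself is along these lines (the archive file name
below is only an example; list the real candidates first):

  vgcfgrestore --list vg
  vgcfgrestore -t -v -f /etc/lvm/archive/vg_00517-xxxxxxxxxx.vg vg    # dry run
  vgcfgrestore -f /etc/lvm/archive/vg_00517-xxxxxxxxxx.vg vg          # then for real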

Hope this helps,
Giuliano.


* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-05 12:48     ` Giuliano Procida
  2016-10-07  6:43       ` Giuliano Procida
@ 2016-10-09 19:00       ` Slava Prisivko
  2016-10-12  6:57         ` Giuliano Procida
  1 sibling, 1 reply; 12+ messages in thread
From: Slava Prisivko @ 2016-10-09 19:00 UTC (permalink / raw)
  To: LVM general discussion and development


Hi!

On Wed, Oct 5, 2016 at 3:53 PM Giuliano Procida <giuliano.procida@gmail.com>
wrote:

> On 4 October 2016 at 23:14, Slava Prisivko <vprisivko@gmail.com> wrote:
> >> vgextend --restoremissing
> >
> > I didn't have to, because all the PVs are present:
> >
> > # pvs
> >   PV         VG Fmt  Attr PSize   PFree
> >   /dev/sda2  vg lvm2 a--    1.82t   1.10t
> >   /dev/sdb2  vg lvm2 a--    3.64t   1.42t
> >   /dev/sdc2  vg lvm2 a--  931.51g 195.18g
>
> Double-check in the metadata for MISSING. This is what I was hoping
> might be in your /etc/lvm/backup file.
>
> >> Actually, always run LVM commands with -v -t before really running them.
> >
> > Thanks! I had backed up the rmeta* and rimage*, so I didn't feel the need
> > for using -t. Am I wrong?
>
> Well, some nasty surprises may be avoidable (particularly if also using
> -f).
>
> > Yes, I've noticed it. The problem was a faulty SATA cable (as I learned
> > later), so when I switched the computer on for the first time, /dev/sda was
> > missing (in the current device allocation). I switched off the computer,
> > swapped the /dev/sda and /dev/sdb SATA cable (without thinking about the
> > consequences) and switched it on. This time the /dev/sdb was missing. I
> > replaced the faulty cable with a new one and switched the machine back on.
> > This time sda, sdb and sdc were all present, but the RAID went out-of-sync.
>
> In swapping the cables, you may have changed the sd{a,b,c} enumeration
> but this will have no impact on the UUIDs that LVM uses to identify
> the PVs.
>
That's right, but the images went out of sync because during the first boot
only sdb and sdc were present (so sda's content had to be reconstructed from
the other two), during the second boot only sda and sdc were present (so
sdb's content had to be reconstructed), and when I replaced the cable all
three were present but mutually inconsistent.

>
> > I'm pretty sure there were very few (if any) writing operations during
> the
> > degraded operating mode, so the I could recover by rebuilding the old
> mirror
> > (sda) using the more recent ones (sdb and sdc).
>
> Agreed, based on your check below.
>
> > Thanks, I used your raid5_parity_check.cc utility with the default stripe
> > size (64 * 1024), but it actually doesn't matter since you're just
> > calculating the total xor and the stripe size acts as a buffer size for
> > that.
>
> [I was little surprised to discover that RAID 6 works as a byte erasure
> code.]
>
> The stripe size and layout matters once if you want to adapt the code
> to extract or repair the data.
>
> > I get three unsynced stripes out of 512 (32 mib / 64 kib), but I would
> like
> > to try to reconstruct the test_rimage_1 using h the other two. Just in
> case,
> > here are the bad stripe numbers: 16, 48, 49.
>
> I've updated the utility (this is for raid5 = raid5_ls). Warning: not
> tested on out-of-sync data.
>
> https://drive.google.com/open?id=0B8dHrWSoVcaDYXlUWXEtZEMwX0E


>
> # Assume the first sub LV has the out-of-date data and dump the
> correct(ed) LV content.
> ./foo stripe $((64*1024)) repair 0 /dev/${lv}_rimage_* | cmp - /dev/${lv}
>
Thanks!

I tried to reassemble the array using the three different pairs of LV
images, but it doesn't work (I am sure of this because I cannot luksOpen the
LUKS volume stored in the LV, and that failure almost certainly means the
reconstructed data is still wrong).

>
> >> > The output of various commands is provided below.
> >> >
> >> >     # lvs -a -o +devices
> >> >
> >> >     test                           vg   rwi---r---  64.00m
> >> > test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
> >> >     [test_rimage_0]                vg   Iwi-a-r-r-  32.00m
> /dev/sdc2(1)
> >> >     [test_rimage_1]                vg   Iwi-a-r-r-  32.00m
> >> > /dev/sda2(238244)
> >> >     [test_rimage_2]                vg   Iwi-a-r-r-  32.00m
> >> > /dev/sdb2(148612)
> >> >     [test_rmeta_0]                 vg   ewi-a-r-r-   4.00m
> /dev/sdc2(0)
> >> >     [test_rmeta_1]                 vg   ewi-a-r-r-   4.00m
> >> > /dev/sda2(238243)
> >> >     [test_rmeta_2]                 vg   ewi-a-r-r-   4.00m
> >> > /dev/sdb2(148611)
>
> The extra r(efresh) attributes suggest trying a resync operation which
> may not be possible on inactive LV.
> I missed that the RAID device is actually in the list.
>
> > After cleaning the dmsetup table of test_* and trying to lvchange -ay I
> get
> > practically the same:
> > # lvchange -ay vg/test -v
> [snip]
> >   device-mapper: reload ioctl on (253:87) failed: Invalid argument
> >     Removing vg-test (253:87)
> >
> > device-mapper: table: 253:87: raid: Cannot change device positions in
> RAID
> > array
> > device-mapper: ioctl: error adding target to table
>
> This error occurs when the sub LV metadata says "I am device X in this
> array" but dmsetup is being asked to put the sub LV at different
> position Y (alas, neither are logged). With lots of -v and -d flags
> you can get lvchange to include the dm table entries in the
> diagnostics.
>
This is as useful as it gets (-vvvv -dddd):
    Loading vg-test_rmeta_0 table (253:35)
        Adding target to (253:35): 0 8192 linear 8:34 2048
        dm table   (253:35) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rmeta_0 (253:35) identical table reload.
    Loading vg-test_rimage_0 table (253:36)
        Adding target to (253:36): 0 65536 linear 8:34 10240
        dm table   (253:36) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rimage_0 (253:36) identical table reload.
    Loading vg-test_rmeta_1 table (253:37)
        Adding target to (253:37): 0 8192 linear 8:2 1951688704
        dm table   (253:37) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rmeta_1 (253:37) identical table reload.
    Loading vg-test_rimage_1 table (253:38)
        Adding target to (253:38): 0 65536 linear 8:2 1951696896
        dm table   (253:38) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rimage_1 (253:38) identical table reload.
    Loading vg-test_rmeta_2 table (253:39)
        Adding target to (253:39): 0 8192 linear 8:18 1217423360
        dm table   (253:39) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rmeta_2 (253:39) identical table reload.
    Loading vg-test_rimage_2 table (253:40)
        Adding target to (253:40): 0 65536 linear 8:18 1217431552
        dm table   (253:40) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rimage_2 (253:40) identical table reload.
    Creating vg-test
        dm create vg-test
LVM-Pgjp5f2PRJipxvoNdsYmq0olg9iWwY5pJjiPmiesfxvdeF5zMvTsJC6vFfqNgNnZ [
noopencount flush ]   [16384] (*1)
    Loading vg-test table (253:84)
        Adding target to (253:84): 0 131072 raid raid5_ls 3 128 region_size 1024 3 253:35 253:36 253:37 253:38 253:39 253:40
        dm table   (253:84) [ opencount flush ]   [16384] (*1)
        dm reload   (253:84) [ noopencount flush ]   [16384] (*1)
  device-mapper: reload ioctl on (253:84) failed: Invalid argument

I don't see any problems here.

>
> You can check the rmeta superblocks with
> https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg

Thanks, it's very useful!

/dev/mapper/vg-test_rmeta_0
found RAID superblock at offset 0
 magic=1683123524
 features=0
 num_devices=3
 array_position=0
 events=56
 failed_devices=0
 disk_recovery_offset=18446744073709551615
 array_resync_offset=18446744073709551615
 level=5
 layout=2
 stripe_sectors=128
found bitmap file superblock at offset 4096:
         magic: 6d746962
       version: 4
          uuid: 00000000.00000000.00000000.00000000
        events: 56
events cleared: 33
         state: 00000000
     chunksize: 524288 B
  daemon sleep: 5s
     sync size: 32768 KB
max write behind: 0

/dev/mapper/vg-test_rmeta_1
found RAID superblock at offset 0
 magic=1683123524
 features=0
 num_devices=3
 array_position=4294967295
 events=62
 failed_devices=1
 disk_recovery_offset=0
 array_resync_offset=18446744073709551615
 level=5
 layout=2
 stripe_sectors=128
found bitmap file superblock at offset 4096:
         magic: 6d746962
       version: 4
          uuid: 00000000.00000000.00000000.00000000
        events: 60
events cleared: 33
         state: 00000000
     chunksize: 524288 B
  daemon sleep: 5s
     sync size: 32768 KB
max write behind: 0

/dev/mapper/vg-test_rmeta_2
found RAID superblock at offset 0
 magic=1683123524
 features=0
 num_devices=3
 array_position=2
 events=62
 failed_devices=1
 disk_recovery_offset=18446744073709551615
 array_resync_offset=18446744073709551615
 level=5
 layout=2
 stripe_sectors=128
found bitmap file superblock at offset 4096:
         magic: 6d746962
       version: 4
          uuid: 00000000.00000000.00000000.00000000
        events: 62
events cleared: 33
         state: 00000000
     chunksize: 524288 B
  daemon sleep: 5s
     sync size: 32768 KB
max write behind: 0

The problem I see here is that the event counts differ between the three
rmetas.

>
>
> > Here is the relevant /etc/lvm/archive (archive is more recent that
> backup)
>
> That looks sane, but you omitted the physical volumes section so there
> is no way to cross-check UUIDs and devices or see if there are MISSING
> flags.
>
The IDs are the same and there are no MISSING flags.

>
> If you use
> https://drive.google.com/open?id=0B8dHrWSoVcaDQkU5NG1sLWc5cjg
> directly, you can get metadata that LVM is reading off the PVs and
> double-check for discrepancies.





* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-07  6:43       ` Giuliano Procida
@ 2016-10-09 19:00         ` Slava Prisivko
  0 siblings, 0 replies; 12+ messages in thread
From: Slava Prisivko @ 2016-10-09 19:00 UTC (permalink / raw)
  To: LVM general discussion and development


Hi!

On Fri, Oct 7, 2016 at 9:49 AM Giuliano Procida <giuliano.procida@gmail.com>
wrote:

> Slava, the main problem I had was that LVM forbade many operations
> while I had a PV missing.
>
> In your case, apparently, all PVs are present. So I suggest the following:
>
> 1. examine the recent history in /etc/lvm/archive
> 2. diff each transition and see if you can understand what has
> happened at each stage
> 3. vgcfgrestore the most recent version that you think will allow you
> to activate your array, you can work backwards incrementally
> 4. check kernel logs!
> 5. scrub (resync) the array if needed
>

Since I cannot resync manually using your code, there are no MISSING flags
in /etc/lvm/archive, and the event counts differ between the three rmetas, it
seems this would not be helpful, would it?

I diffed the current state of /etc/lvm/archive against the state from before
the trouble: there is no significant difference, no difference in IDs, and no
MISSING flags.

>
> Hope this helps,
> Giuliano.
>



* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-09 19:00       ` Slava Prisivko
@ 2016-10-12  6:57         ` Giuliano Procida
  2016-10-12  7:15           ` Giuliano Procida
  2016-10-13 20:44           ` Slava Prisivko
  0 siblings, 2 replies; 12+ messages in thread
From: Giuliano Procida @ 2016-10-12  6:57 UTC (permalink / raw)
  To: LVM general discussion and development

On 9 October 2016 at 20:00, Slava Prisivko <vprisivko@gmail.com> wrote:
> I tried to reassemble the array using 3 different pairs of correct LV
> images, but it doesn't work (I am sure because I cannot luksOpen a LUKS
> image which is in the LV, which is almost surely uncorrectable).

I would hope that a luks volume would at least be recognisable using
file -s. If you extract the image data into a regular file you should
be able to losetup that and then luksOpen the loop device.
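Roughly (a sketch; the file and mapping names are examples, and "repair 1"
assumes leg 1 is the stale one):

  ./foo stripe $((64*1024)) repair 1 /dev/mapper/vg-test_rimage_* > /tmp/test_lv.img
  losetup -f --show /tmp/test_lv.img            # prints the allocated loop device, e.g. /dev/loop0
  cryptsetup luksOpen /dev/loop0 test_recovered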

> This is as useful as it gets (-vvvv -dddd):
>     Loading vg-test_rmeta_0 table (253:35)
>         Adding target to (253:35): 0 8192 linear 8:34 2048
>         dm table   (253:35) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rmeta_0 (253:35) identical table reload.
>     Loading vg-test_rimage_0 table (253:36)
>         Adding target to (253:36): 0 65536 linear 8:34 10240
>         dm table   (253:36) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rimage_0 (253:36) identical table reload.
>     Loading vg-test_rmeta_1 table (253:37)
>         Adding target to (253:37): 0 8192 linear 8:2 1951688704
>         dm table   (253:37) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rmeta_1 (253:37) identical table reload.
>     Loading vg-test_rimage_1 table (253:38)
>         Adding target to (253:38): 0 65536 linear 8:2 1951696896
>         dm table   (253:38) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rimage_1 (253:38) identical table reload.
>     Loading vg-test_rmeta_2 table (253:39)
>         Adding target to (253:39): 0 8192 linear 8:18 1217423360
>         dm table   (253:39) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rmeta_2 (253:39) identical table reload.
>     Loading vg-test_rimage_2 table (253:40)
>         Adding target to (253:40): 0 65536 linear 8:18 1217431552
>         dm table   (253:40) [ opencount flush ]   [16384] (*1)
>     Suppressed vg-test_rimage_2 (253:40) identical table reload.
>     Creating vg-test
>         dm create vg-test
> LVM-Pgjp5f2PRJipxvoNdsYmq0olg9iWwY5pJjiPmiesfxvdeF5zMvTsJC6vFfqNgNnZ [
> noopencount flush ]   [16384] (*1)
>     Loading vg-test table (253:84)
>         Adding target to (253:84): 0 131072 raid raid5_ls 3 128 region_size
> 1024 3 253:35 253:36 253:37 253:38 253:39 253:40
>         dm table   (253:84) [ opencount flush ]   [16384] (*1)
>         dm reload   (253:84) [ noopencount flush ]   [16384] (*1)
>   device-mapper: reload ioctl on (253:84) failed: Invalid argument
>
> I don't see any problems here.

In my case I got (for example, and Gmail is going to fold the lines, sorry):

[...]
    Loading vg0-photos table (254:45)
        Adding target to (254:45): 0 1258291200 raid raid6_zr 3 128
region_size 1024 5 254:73 254:74 254:37 254:38 254:39 254:40 254:41
254:42 254:43 254:44
        dm table   (254:45) [ opencount flush ]   [16384] (*1)
        dm reload   (254:45) [ noopencount flush ]   [16384] (*1)
  device-mapper: reload ioctl on (254:45) failed: Invalid argument

The actual errors are in the kernel logs:

[...]
[144855.931712] device-mapper: raid: New device injected into existing
array without 'rebuild' parameter specified
[144855.935523] device-mapper: table: 254:45: raid: Unable to assemble
array: Invalid superblocks
[144855.939290] device-mapper: ioctl: error adding target to table

128 means 128 sectors of 512 bytes, i.e. a 64 KiB chunk, as in your case. I
was able to verify that my extracted images matched the RAID device. My
problem was not assembling the array, it was that the array would be rebuilt
on every subsequent use:

    Loading vg0-var table (254:21)
        Adding target to (254:21): 0 52428800 raid raid5_ls 5 128
region_size 1024 rebuild 0 5 254:11 254:12 254:13 254:14 254:15 254:16
254:17 254:18 254:19 254:20
        dm table   (254:21) [ opencount flush ]   [16384] (*1)
        dm reload   (254:21) [ noopencount flush ]   [16384] (*1)
        Table size changed from 0 to 52428800 for vg0-var (254:21).

>> You can check the rmeta superblocks with
>> https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg
>
> Thanks, it's very useful!
>
> /dev/mapper/vg-test_rmeta_0
> found RAID superblock at offset 0
>  magic=1683123524
>  features=0
>  num_devices=3
>  array_position=0
>  events=56
>  failed_devices=0
>  disk_recovery_offset=18446744073709551615
>  array_resync_offset=18446744073709551615
>  level=5
>  layout=2
>  stripe_sectors=128
> found bitmap file superblock at offset 4096:
>          magic: 6d746962
>        version: 4
>           uuid: 00000000.00000000.00000000.00000000
>         events: 56
> events cleared: 33
>          state: 00000000
>      chunksize: 524288 B
>   daemon sleep: 5s
>      sync size: 32768 KB
> max write behind: 0
>
> /dev/mapper/vg-test_rmeta_1
> found RAID superblock at offset 0
>  magic=1683123524
>  features=0
>  num_devices=3
>  array_position=4294967295
>  events=62
>  failed_devices=1
>  disk_recovery_offset=0
>  array_resync_offset=18446744073709551615
>  level=5
>  layout=2
>  stripe_sectors=128
> found bitmap file superblock at offset 4096:
>          magic: 6d746962
>        version: 4
>           uuid: 00000000.00000000.00000000.00000000
>         events: 60
> events cleared: 33
>          state: 00000000
>      chunksize: 524288 B
>   daemon sleep: 5s
>      sync size: 32768 KB
> max write behind: 0
>
> /dev/mapper/vg-test_rmeta_2
> found RAID superblock at offset 0
>  magic=1683123524
>  features=0
>  num_devices=3
>  array_position=2
>  events=62
>  failed_devices=1
>  disk_recovery_offset=18446744073709551615
>  array_resync_offset=18446744073709551615
>  level=5
>  layout=2
>  stripe_sectors=128
> found bitmap file superblock at offset 4096:
>          magic: 6d746962
>        version: 4
>           uuid: 00000000.00000000.00000000.00000000
>         events: 62
> events cleared: 33
>          state: 00000000
>      chunksize: 524288 B
>   daemon sleep: 5s
>      sync size: 32768 KB
> max write behind: 0
>
> The problem I see here is that events count is different for the three
> rmetas.

The event counts relate to the intent bitmap (I believe).

That looks OK, because failed devices is 1, meaning 0b0...01; i.e.,
device 0 of the array is "failed". The real problem is device 1 which
has

>  array_position=4294967295

This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
that it has special significance in kernel or LVM code. I've not
checked beyond noticing one test: role < 0.

I recommend using diff3 or pairwise diff on the metadata dumps to
ensure you have not missed any other differences.

One possible way forward:

(Optionally) adapt my resync code so it writes back to the original
files instead of outputting corrected linear data.

Modify the rmeta data to remove the failed flag and reset the bad
position to the correct value, then sync and power off (or otherwise
prevent the device mapper from writing back bad data).
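To be explicit, and only as a sketch: the offsets below assume the struct
dm_raid_superblock layout in dm-raid.c of that era (magic, features,
num_devices and array_position as le32 at byte offsets 0/4/8/12, then events
and failed_devices as le64 at 16/24). Double-check them against your kernel
source before writing anything, back up the rmeta sub LVs first, and only do
this while the top-level RAID LV is not active.

  # set array_position of leg 1 back to 1 (le32 at offset 12)
  printf '\001\000\000\000' | dd of=/dev/mapper/vg-test_rmeta_1 bs=1 seek=12 conv=notrunc
  # clear failed_devices (le64 at offset 24) on each rmeta sub LV
  for m in 0 1 2; do
      printf '\000\000\000\000\000\000\000\000' | \
          dd of=/dev/mapper/vg-test_rmeta_$m bs=1 seek=24 conv=notrunc
  done
  sync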

It's possible the RAID volume will fail to sync due to bitmap
inconsistencies. I don't know how to re-write the superblocks to say
"trust me, all data are in sync".


* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-12  6:57         ` Giuliano Procida
@ 2016-10-12  7:15           ` Giuliano Procida
  2016-10-14 23:19             ` Heinz Mauelshagen
  2016-10-13 20:44           ` Slava Prisivko
  1 sibling, 1 reply; 12+ messages in thread
From: Giuliano Procida @ 2016-10-12  7:15 UTC (permalink / raw)
  To: LVM general discussion and development

On 12 October 2016 at 07:57, Giuliano Procida
<giuliano.procida@gmail.com> wrote:
>>  array_position=4294967295
>
> This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
> that it has special significance in kernel or LVM code. I've not
> checked beyond noticing one test: role < 0.

http://lxr.free-electrons.com/source/drivers/md/dm-raid.c

Now role is an int and the RHS of the assignment is le32_to_cpu(...)
which returns a u32. Testing < 0 will *never* succeed on a 64-bit
architecture. This is a kernel bug. If the code is changed so that
role is also u32 and the test is against ~0, it's possible that
different, better things will happen. Please try reporting this to the
dm-devel people.

I still don't know what wrote that value to the superblock though.


* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-12  6:57         ` Giuliano Procida
  2016-10-12  7:15           ` Giuliano Procida
@ 2016-10-13 20:44           ` Slava Prisivko
  1 sibling, 0 replies; 12+ messages in thread
From: Slava Prisivko @ 2016-10-13 20:44 UTC (permalink / raw)
  To: LVM general discussion and development


On Wed, Oct 12, 2016 at 10:02 AM Giuliano Procida <giuliano.procida@gmail.com> wrote:

> On 9 October 2016 at 20:00, Slava Prisivko <vprisivko@gmail.com> wrote:
> > I tried to reassemble the array using 3 different pairs of correct LV
> > images, but it doesn't work (I am sure because I cannot luksOpen a LUKS
> > image which is in the LV, which is almost surely uncorrectable).
>
> I would hope that a luks volume would at least be recognisable using
> file -s. If you extract the image data into a regular file you should
> be able to losetup that and then luksOpen the loop device.
>
Yes, it's recognizable. I can run luksDump and luksOpen, but for the latter
the password just doesn't work. Also, cryptsetup works on files just as well
as on devices, so the loop device doesn't really help; I tried it anyway just
to be sure and, quite naturally, it doesn't work either.

> > This is as useful as it gets (-vvvv -dddd):
> >     Loading vg-test_rmeta_0 table (253:35)
> >         Adding target to (253:35): 0 8192 linear 8:34 2048
> >         dm table   (253:35) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rmeta_0 (253:35) identical table reload.
> >     Loading vg-test_rimage_0 table (253:36)
> >         Adding target to (253:36): 0 65536 linear 8:34 10240
> >         dm table   (253:36) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rimage_0 (253:36) identical table reload.
> >     Loading vg-test_rmeta_1 table (253:37)
> >         Adding target to (253:37): 0 8192 linear 8:2 1951688704
> >         dm table   (253:37) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rmeta_1 (253:37) identical table reload.
> >     Loading vg-test_rimage_1 table (253:38)
> >         Adding target to (253:38): 0 65536 linear 8:2 1951696896
> >         dm table   (253:38) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rimage_1 (253:38) identical table reload.
> >     Loading vg-test_rmeta_2 table (253:39)
> >         Adding target to (253:39): 0 8192 linear 8:18 1217423360
> >         dm table   (253:39) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rmeta_2 (253:39) identical table reload.
> >     Loading vg-test_rimage_2 table (253:40)
> >         Adding target to (253:40): 0 65536 linear 8:18 1217431552
> >         dm table   (253:40) [ opencount flush ]   [16384] (*1)
> >     Suppressed vg-test_rimage_2 (253:40) identical table reload.
> >     Creating vg-test
> >         dm create vg-test
> > LVM-Pgjp5f2PRJipxvoNdsYmq0olg9iWwY5pJjiPmiesfxvdeF5zMvTsJC6vFfqNgNnZ [
> > noopencount flush ]   [16384] (*1)
> >     Loading vg-test table (253:84)
> >         Adding target to (253:84): 0 131072 raid raid5_ls 3 128 region_size 1024 3 253:35 253:36 253:37 253:38 253:39 253:40
> >         dm table   (253:84) [ opencount flush ]   [16384] (*1)
> >         dm reload   (253:84) [ noopencount flush ]   [16384] (*1)
> >   device-mapper: reload ioctl on (253:84) failed: Invalid argument
> >
> > I don't see any problems here.
>
> In my case I got (for example, and Gmail is going to fold the lines, sorry):
>
> [...]
>     Loading vg0-photos table (254:45)
>         Adding target to (254:45): 0 1258291200 raid raid6_zr 3 128 region_size 1024 5 254:73 254:74 254:37 254:38 254:39 254:40 254:41 254:42 254:43 254:44
>         dm table   (254:45) [ opencount flush ]   [16384] (*1)
>         dm reload   (254:45) [ noopencount flush ]   [16384] (*1)
>   device-mapper: reload ioctl on (254:45) failed: Invalid argument
>
> The actual errors are in the kernel logs:
>
> [...]
> [144855.931712] device-mapper: raid: New device injected into existing
> array without 'rebuild' parameter specified
> [144855.935523] device-mapper: table: 254:45: raid: Unable to assemble
> array: Invalid superblocks
> [144855.939290] device-mapper: ioctl: error adding target to table

I had the following the first time:
[   74.743051] device-mapper: raid: Failed to read superblock of device at
position 1
[   74.761094] md/raid:mdX: device dm-73 operational as raid disk 2
[   74.765707] md/raid:mdX: device dm-67 operational as raid disk 0
[   74.770911] md/raid:mdX: allocated 3219kB
[   74.773571] md/raid:mdX: raid level 5 active with 2 out of 3 devices,
algorithm 2
[   74.775964] RAID conf printout:
[   74.775968]  --- level:5 rd:3 wd:2
[   74.775971]  disk 0, o:1, dev:dm-67
[   74.775973]  disk 2, o:1, dev:dm-73
[   74.793120] created bitmap (1 pages) for device mdX
[   74.822333] mdX: bitmap initialized from disk: read 1 pages, set 2 of 64
bits

After that I had only the previously mentioned errors in the kernel log:

device-mapper: table: 253:84: raid: Cannot change device positions in RAID
array
device-mapper: ioctl: error adding target to table
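
(Aside: the table line LVM is trying to load can also be replayed by hand,
which makes it easier to experiment outside of lvchange. A sketch using the
minor numbers from the -vvvv output above; they are not stable across
reboots, so adjust them to whatever dmsetup ls reports:)

    # try to assemble the raid5 mapping manually from the existing
    # rmeta/rimage sub-LVs; the table string is copied from the
    # "Adding target to (253:84)" line above
    dmsetup create test-manual --table \
        '0 131072 raid raid5_ls 3 128 region_size 1024 3 253:35 253:36 253:37 253:38 253:39 253:40'
    dmsetup status test-manual    # per-device health and sync ratio, if the load succeeds
    dmsetup remove test-manual    # tear the manual mapping down again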

> 128 means 128*512 so this is 64k as in your case. I was able to verify
> that my extracted images matched the RAID device. My problem was not
> assembling the array, it was that the array would be rebuilt on every
> subsequent use:
>
>     Loading vg0-var table (254:21)
>         Adding target to (254:21): 0 52428800 raid raid5_ls 5 128 region_size 1024 rebuild 0 5 254:11 254:12 254:13 254:14 254:15 254:16 254:17 254:18 254:19 254:20
>         dm table   (254:21) [ opencount flush ]   [16384] (*1)
>         dm reload   (254:21) [ noopencount flush ]   [16384] (*1)
>         Table size changed from 0 to 52428800 for vg0-var (254:21).
>
> >> You can check the rmeta superblocks with
> >> https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg
> >
> > Thanks, it's very useful!
> >
> > /dev/mapper/vg-test_rmeta_0
> > found RAID superblock at offset 0
> >  magic=1683123524
> >  features=0
> >  num_devices=3
> >  array_position=0
> >  events=56
> >  failed_devices=0
> >  disk_recovery_offset=18446744073709551615
> >  array_resync_offset=18446744073709551615
> >  level=5
> >  layout=2
> >  stripe_sectors=128
> > found bitmap file superblock at offset 4096:
> >          magic: 6d746962
> >        version: 4
> >           uuid: 00000000.00000000.00000000.00000000
> >         events: 56
> > events cleared: 33
> >          state: 00000000
> >      chunksize: 524288 B
> >   daemon sleep: 5s
> >      sync size: 32768 KB
> > max write behind: 0
> >
> > /dev/mapper/vg-test_rmeta_1
> > found RAID superblock at offset 0
> >  magic=1683123524
> >  features=0
> >  num_devices=3
> >  array_position=4294967295
> >  events=62
> >  failed_devices=1
> >  disk_recovery_offset=0
> >  array_resync_offset=18446744073709551615
> >  level=5
> >  layout=2
> >  stripe_sectors=128
> > found bitmap file superblock at offset 4096:
> >          magic: 6d746962
> >        version: 4
> >           uuid: 00000000.00000000.00000000.00000000
> >         events: 60
> > events cleared: 33
> >          state: 00000000
> >      chunksize: 524288 B
> >   daemon sleep: 5s
> >      sync size: 32768 KB
> > max write behind: 0
> >
> > /dev/mapper/vg-test_rmeta_2
> > found RAID superblock at offset 0
> >  magic=1683123524
> >  features=0
> >  num_devices=3
> >  array_position=2
> >  events=62
> >  failed_devices=1
> >  disk_recovery_offset=18446744073709551615
> >  array_resync_offset=18446744073709551615
> >  level=5
> >  layout=2
> >  stripe_sectors=128
> > found bitmap file superblock at offset 4096:
> >          magic: 6d746962
> >        version: 4
> >           uuid: 00000000.00000000.00000000.00000000
> >         events: 62
> > events cleared: 33
> >          state: 00000000
> >      chunksize: 524288 B
> >   daemon sleep: 5s
> >      sync size: 32768 KB
> > max write behind: 0
> >
> > The problem I see here is that events count is different for the three
> > rmetas.
>
> The event counts relate to the intent bitmap (I believe).
>
> That looks OK, because failed devices is 1, meaning 0b0...01; i.e.,
> device 0 of the array is "failed". The real problem is device 1 which
> has
>
> >  array_position=4294967295
>
> This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
> that it has special significance in kernel or LVM code. I've not
> checked beyond noticing one test: role < 0.
>
> I recommend using diff3 or pairwise diff on the metadata dumps to
> ensure you have not missed any other differences.
> One possible way forward:
>
> (Optionally) adapt my resync code so it writes back to the original
> files instead of outputting corrected linear data.
>
> Modify the rmeta data to remove the failed flag and reset the bad
> position to the correct value. sync and power off (or otherwise
> prevent the device mapper from writing back bad data).
>
> It's possible the RAID volume will fail to sync due to bitmap
> inconsistencies. I don't know how to re-write the superblocks to say
> "trust me, all data are in sync".
>
Thanks for the tip! But would that help if the manual data reassembly using
your code already doesn't work? I don't see how fixing the metadata could
change that.
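
(For the record, "modify the rmeta data" would amount to something like the
sketch below. It assumes the on-disk layout matches struct
dm_raid_superblock in dm-raid.c of this era, with magic, compat_features,
num_devices and array_position as consecutive little-endian u32s, so
array_position would sit at byte offset 12 and the u64 failed_devices at
offset 24. The offsets and the whole approach are unverified, so only ever
work on copies of the rmeta LVs, never on the live devices:)

    # copy the first 4 KiB (the dm-raid superblock) of the suspect rmeta LV
    dd if=/dev/mapper/vg-test_rmeta_1 of=rmeta_1.sb bs=4k count=1
    # set array_position (assumed little-endian u32 at offset 12) back to 1
    printf '\x01\x00\x00\x00' | dd of=rmeta_1.sb bs=1 seek=12 count=4 conv=notrunc
    # clear failed_devices (assumed little-endian u64 at offset 24)
    printf '\x00\x00\x00\x00\x00\x00\x00\x00' | dd of=rmeta_1.sb bs=1 seek=24 count=8 conv=notrunc
    # compare against the other rmeta superblocks before writing anything back
    cmp -l rmeta_0.sb rmeta_1.sb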

> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

[-- Attachment #2: Type: text/html, Size: 15989 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-12  7:15           ` Giuliano Procida
@ 2016-10-14 23:19             ` Heinz Mauelshagen
  2016-10-15  7:21               ` Giuliano Procida
  0 siblings, 1 reply; 12+ messages in thread
From: Heinz Mauelshagen @ 2016-10-14 23:19 UTC (permalink / raw)
  To: LVM general discussion and development

On 10/12/2016 09:15 AM, Giuliano Procida wrote:
> On 12 October 2016 at 07:57, Giuliano Procida
> <giuliano.procida@gmail.com> wrote:
>>>   array_position=4294967295
>> This should be 1 instead. This is 32-bit unsigned 0xf...f. It may be
>> that it has special significance in kernel or LVM code. I've not
>> checked beyond noticing one test: role < 0.
> http://lxr.free-electrons.com/source/drivers/md/dm-raid.c
>
> Now role is an int and the RHS of the assignment is le32_to_cpu(...)
> which returns a u32. Testing < 0 will *never* succeed on a 64-bit
> architecture.
Well, the result of le32_to_cpu is assigned to a 32-bit int even on a
64-bit arch.

The 4294967295 reported by parse_rmeta will thus end up as signed int
role = -1, which is what is used for the comparison.

The tool should rather report array_position as signed to avoid the
irritation.



>   This is a kernel bug. If the code is changed so that
> role is also u32 and the test is against ~0, it's possible that
> different, better things will happen. Please try reporting this to the
> dm-devel people.
>
> I still don't know what wrote that value to the superblock though.

The position must have been set to -1 to indicate a failure to hot-add the
image LV back in, and MD then wrote that value to the superblock.

Did you spot any "Faulty.*device #.*has readable super block.\n 
Attempting to revive it."
messages from dm-raid in the kernel log by chance?
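
(For example, something along these lines, assuming the kernel log from the
boot in question is still available:)

    # look for dm-raid revive attempts in the kernel log
    dmesg | grep -iE 'has readable super block|attempting to revive'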

Heinz

>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] LVM RAID5 out-of-sync recovery
  2016-10-14 23:19             ` Heinz Mauelshagen
@ 2016-10-15  7:21               ` Giuliano Procida
  0 siblings, 0 replies; 12+ messages in thread
From: Giuliano Procida @ 2016-10-15  7:21 UTC (permalink / raw)
  To: LVM general discussion and development

On 15 October 2016 at 00:19, Heinz Mauelshagen <heinzm@redhat.com> wrote:
> On 10/12/2016 09:15 AM, Giuliano Procida wrote:
>>
>> On 12 October 2016 at 07:57, Giuliano Procida
>> [lies]
>
> Well, the result of le32_to_cpu is assigned to a 32 bit int on 64 bit arch.
>
> The 4294967295 reported by parse_rmeta will thus result in signed int role =
> -1
> which is used for comparision.
>
> The tool should rather report the array_position signed to avoid iritation.

Oops, yes. sizeof(int) == 4 in the kernel, even on a 64-bit architecture.
I'd forgotten this. It is surprising, though: the compiler would likely
warn about this, but the warning must have been ignored.

>> I still don't know what wrote that value to the superblock though.
>
> Must have been the position gotten set to -1 indicating failure to hot add
> the image LV back in and thus MD has written it to that superblock.

I didn't find the code that did this when I last looked.

> Did you spot any "Faulty.*device #.*has readable super block.\n Attempting
> to revive it."
> messages from dm-raid in the kernel log by chance?

That's a question for Slava.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-10-15  7:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-03 23:49 [linux-lvm] LVM RAID5 out-of-sync recovery Slava Prisivko
2016-10-04  9:45 ` Giuliano Procida
2016-10-04 22:14   ` Slava Prisivko
2016-10-05 12:48     ` Giuliano Procida
2016-10-07  6:43       ` Giuliano Procida
2016-10-09 19:00         ` Slava Prisivko
2016-10-09 19:00       ` Slava Prisivko
2016-10-12  6:57         ` Giuliano Procida
2016-10-12  7:15           ` Giuliano Procida
2016-10-14 23:19             ` Heinz Mauelshagen
2016-10-15  7:21               ` Giuliano Procida
2016-10-13 20:44           ` Slava Prisivko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).