* [linux-lvm] "md/raid:mdX: cannot start dirty degraded array."
From: Andreas Trottmann @ 2021-10-11 14:08 UTC (permalink / raw)
To: linux-lvm
Hello linux-lvm
( I originally sent this e-Mail to linux-raid@vger.kernel.org which
appears to have been the wrong place )
I am running a server that runs a number of virtual machines and manages
their virtual disks as logical volumes using lvmraid (so: individual SSDs
are used as PVs for LVM; the LVs are using RAID to create redundancy and
are created with commands such as "lvcreate --type raid5 --stripes 4
--stripesize 128 ...")
The server is running Debian 10 "buster" with latest updates and its
stock kernel: Linux (hostname) 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3
(2021-07-18) x86_64 GNU/Linux
Recently, the server had one of its SSDs serving as a PV fail.
After a restart, all of the logical volumes came back, except one.
As far as I remember, there were NO raid operations (resync, reshape or
the like) going on when the SSD failed.
Many LVs had _rmeta and _rimage volumes on the failed PV; all but one of
them are perfectly usable in a "degraded" state.
The one failed volume in question consists of four stripes and uses raid5.
When I'm trying to activate it, I get:
# lvchange -a y /dev/vg_ssds_0/host-home
Couldn't find device with uuid 8iz0p5-vh1c-kaxK-cTRC-1ryd-eQd1-wX1Yq9.
device-mapper: reload ioctl on (253:245) failed: Input/output error
dmesg shows:
device-mapper: raid: Failed to read superblock of device at position 1
md/raid:mdX: not clean -- starting background reconstruction
md/raid:mdX: device dm-50 operational as raid disk 0
md/raid:mdX: device dm-168 operational as raid disk 2
md/raid:mdX: device dm-230 operational as raid disk 3
md/raid:mdX: cannot start dirty degraded array.
md/raid:mdX: failed to run raid set.
md: pers->run() failed ...
device-mapper: table: 253:245: raid: Failed to run raid array
device-mapper: ioctl: error adding target to table
I can successfully activate and access three of the four _rmeta_X and
_rimage_X LVs: _0, _2 and _3.
_rmeta_1 and _rimage_1 were on the failed SSD.
This makes me think that the data should be recoverable; three out of
four RAID5 stripes should be enough.
I copied the entire data of all of the _rimage and _rmeta volumes onto a
safe space.
The _rmeta ones look like this:
# od -t xC /dev/vg_ssds_0/host-home_rmeta_0
0000000 44 6d 52 64 01 00 00 00 04 00 00 00 00 00 00 00
0000020 ce 0b 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0000060 05 00 00 00 02 00 00 00 00 01 00 00 00 00 00 00
0000100 ff ff ff ff ff ff ff ff 05 00 00 00 02 00 00 00
0000120 00 01 00 00 00 00 00 00 00 00 00 cb 01 00 00 00
0000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000160 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00
0000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0010000 62 69 74 6d 04 00 00 00 00 00 00 00 00 00 00 00
0010020 00 00 00 00 00 00 00 00 ce 0b 00 00 00 00 00 00
0010040 ce 0b 00 00 00 00 00 00 00 00 00 99 00 00 00 00
0010060 00 00 00 00 00 00 20 00 05 00 00 00 00 00 00 00
0010100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
20000000
The only difference between _rmeta_0 and _rmeta_2 or _rmeta_3 is a "2" or
a "3", respectively, at offset 12; this should be "array_position", and it
makes sense to me that _rmeta_0 contains 0, _rmeta_2 contains 2 and
_rmeta_3 contains 3.
I googled for the error message "md/raid:mdX: not clean -- starting
background", and found
https://forums.opensuse.org/showthread.php/497294-LVM-RAID5-broken-after-sata-link-error
In the case described there, the "failed_devices" field was not zero,
and zeroing it out using a hex editor made "vgchange -a y" do the right
thing again. However, in my _rmetas, it looks like the "failed_devices"
fields are already all zero:
44 6D 52 64              magic
01 00 00 00              compat_features (FEATURE_FLAG_SUPPORTS_V190)
04 00 00 00              num_devices
00 00 00 00              array_position
CE 0B 00 00 00 00 00 00  events
00 00 00 00 00 00 00 00  failed_devices (none)
FF FF FF FF FF FF FF FF  disk_recovery_offset
FF FF FF FF FF FF FF FF  array_resync_offset
05 00 00 00              level
02 00 00 00              layout
00 01 00 00              stripe_sectors
00 00 00 00              flags
FF FF FF FF FF FF FF FF  reshape_position
05 00 00 00              new_level
02 00 00 00              new_layout
00 01 00 00              new_stripe_sectors
00 00 00 00              delta_disks
00 00 00 CB 01 00 00 00  array_sectors (0x01CB000000)
00 00 00 00 00 00 00 00  data_offset
00 00 00 00 00 00 00 00  new_data_offset
00 00 00 80 00 00 00 00  sectors
00 00 00 00 00 00 00 00  extended_failed_devices (none)
(...)                    (more zero bytes skipped)
00 00 00 00              incompat_features
This all looks fine to me; the "array_sectors" value fits with the
actual size of the array.
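The fields discussed above can also be read straight from a copied _rmeta
image instead of decoding the od output by hand. The following is a minimal
sketch, assuming the offsets from the listing above (num_devices at byte 8,
array_position at byte 12, events at byte 16, failed_devices at byte 24) and
little-endian storage. The helper names read_le32 and show_rmeta are made up
for illustration and are not part of LVM or mdadm.

```shell
#!/bin/sh
# Print the unsigned 32-bit little-endian integer stored at byte OFFSET of FILE.
# od -t u1 emits one decimal number per byte, so host byte order does not matter.
read_le32() {
    set -- $(od -An -v -t u1 -j "$2" -N 4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Summarise a few header fields of one _rmeta image (hypothetical helper).
# Only the low 32 bits of the 64-bit fields are shown, which is enough
# for the small values seen in this thread.
show_rmeta() {
    echo "num_devices=$(read_le32 "$1" 8)"
    echo "array_position=$(read_le32 "$1" 12)"
    echo "events_low32=$(read_le32 "$1" 16)"
    echo "failed_devices_low32=$(read_le32 "$1" 24)"
}
```

It could be called as "show_rmeta /dev/vg_ssds_0/host-home_rmeta_0", or on a
file copy of that SubLV, once for each surviving _rmeta volume, to compare the
values side by side.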
I was not able to find the meaning of the block starting at offset
0x0010000 (62 69 74 6d; "bitm").
I now have two questions:
* is there anything I can do to those _rmeta blocks in order to make
"vgchange -a y" work again?
* if not: I successfully copied the "_rimage_" volumes into files. Is there
anything magical that I can do with losetup and mdadm to create a
new /dev/md/... device that I can access to copy data from?
Thank you very much in advance and kind regards
--
Andreas Trottmann
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
* Re: [linux-lvm] "md/raid:mdX: cannot start dirty degraded array."
From: Andreas Trottmann @ 2021-10-27 20:46 UTC (permalink / raw)
To: linux-lvm
Am 11.10.21 um 16:08 schrieb Andreas Trottmann:
> I am running a server that runs a number of virtual machines and manages their virtual disks as logical volumes using lvmraid (...)
> After a restart, all of the logical volumes came back, except one.
> When I'm trying to activate it, I get:
>
> # lvchange -a y /dev/vg_ssds_0/host-home
> Couldn't find device with uuid 8iz0p5-vh1c-kaxK-cTRC-1ryd-eQd1-wX1Yq9.
> device-mapper: reload ioctl on (253:245) failed: Input/output error
I am replying to my own e-mail here in order to document how I got the
data back, in case someone in a similar situation finds this mail when
searching for the symptoms.
First: I did *not* succeed in activating the lvmraid volume. No matter
how I tried to modify the _rmeta volumes, I always got "reload ioctl
(...) failed: Input/output error" from "lvchange", and "cannot start
dirty degraded array" in dmesg.
So, I used "lvchange -a y /dev/vg_ssds_0/host-home_rimage_0" (and
_rimage_2 and _rimage_3, as those were the ones that were *not* on the
failed PV) to get access to the individual RAID SubLVs. I then used "dd
if=/dev/vg_ssds_0/host-home_rimage_0 of=/mnt/space/rimage_0" to copy the
data to a file on a filesystem with enough space. I repeated this with 2
and 3 as well. I then used losetup to access /mnt/space/rimage_0 as
/dev/loop0, rimage_2 as loop2, and rimage_3 as loop3.
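Since everything that follows depends on these copies, each one can be
compared against its source before continuing. This is a minimal sketch,
assuming GNU dd (for the "bs=1M" suffix); the function name copy_and_verify
is hypothetical and was not part of the procedure described here.

```shell
#!/bin/sh
# Copy SRC (a SubLV device or a plain file) to DEST, then verify the copy.
# Returns non-zero if dd fails or if DEST differs from SRC.
copy_and_verify() {
    dd if="$1" of="$2" bs=1M 2>/dev/null || return 1
    # cmp -s is silent and reports the result only through its exit status.
    cmp -s "$1" "$2"
}
```

For the three surviving images this could be run as
"for i in 0 2 3; do copy_and_verify /dev/vg_ssds_0/host-home_rimage_$i /mnt/space/rimage_$i || echo "rimage_$i failed"; done".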
Now I wanted to use mdadm to "build" the RAID in the "array that doesn't
have per-device metadata (superblocks)" case:
# mdadm --build /dev/md0 -n 4 -c 128 -l 5 --assume-clean --readonly \
    /dev/loop0 missing /dev/loop2 /dev/loop3
However, this failed with "mdadm: Raid level 5 not permitted with --build".
("-c 128" was the chunk size used when creating the lvmraid, "-n 4" and
"-l 5" refer to the number of devices and the raid level)
I then read the man page about the "superblocks", and found out that the
"1.0" style of RAID metadata (selected with an mdadm "-e 1.0" option)
places a superblock at the end of the device. Some experimenting on
unused devices showed that the size used for actual data was the size of
the block device minus 144 KiB (possibly 144 KiB = 128 KiB (chunk size) +
8 KiB (size of superblock) + 8 KiB (size of bitmap)). So I added 147456
zero bytes at the end of each file:
# for i in 0 2 3; do head -c 147456 /dev/zero >> /mnt/space/rimage_$i; done
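The same padding could also be applied with truncate, which extends the file
without writing the zero bytes explicitly. This is a minimal sketch, assuming
GNU coreutils truncate and the 144 KiB figure found by experiment above; the
names PAD_BYTES and pad_image are hypothetical.

```shell
#!/bin/sh
# Bytes to append: 144 KiB, the figure found by experiment in this thread.
PAD_BYTES=$((144 * 1024))

# Extend FILE by PAD_BYTES; the added range reads back as zeros.
pad_image() {
    truncate -s "+$PAD_BYTES" "$1"
}
```

Either approach should leave each image 147456 bytes longer; "ls -l" before
and after shows whether the padding was applied exactly once.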
After detaching and re-attaching the loop devices, I ran
# mdadm --create /dev/md0 -n 4 -c 128 -l 5 -e 1.0 --assume-clean \
    /dev/loop0 missing /dev/loop2 /dev/loop3
(substituting "missing" in the place where the missing RAID SubLV would
have been)
And, voilà: /dev/md0 was perfectly readable, fsck showed no errors, and
it could be mounted correctly, with all data intact.
Kind regards
--
Andreas Trottmann
Werft22 AG
Tel +41 (0)56 210 91 32
Fax +41 (0)56 210 91 34
Mobile +41 (0)79 229 88 55
* Re: [linux-lvm] "md/raid:mdX: cannot start dirty degraded array."
From: Heinz Mauelshagen @ 2021-11-18 21:01 UTC (permalink / raw)
To: LVM general discussion and development
Andreas,
LVM RAID and MD RAID have different, hence incompatible superblock formats
so you can't switch to MD in this case.
Try activating your RaidLV with 'lvchange -ay --activationmode=degraded
/dev/vg_ssds_0/host-home', add a replacement PV of adequate size if none
available and run 'lvconvert --repair /dev/vg_ssds_0/host-home'.
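The sequence suggested above could be sketched as follows. It is an
illustration only: the replacement device /dev/sdX is a placeholder, the
helper names run and repair_raid_lv are hypothetical, and the script only
prints the commands unless DRY_RUN=0 is set. Whether vgextend accepts the
new PV while another PV is still missing is an assumption that has not been
verified here.

```shell
#!/bin/sh
# Print the commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# repair_raid_lv VG LV NEW_PV  (NEW_PV is a placeholder replacement device)
repair_raid_lv() {
    # Activate the RaidLV even though one of its legs is missing.
    run lvchange -ay --activationmode=degraded "$1/$2" || return 1
    # Add the replacement PV so the repair has space to allocate from.
    run vgextend "$1" "$3" || return 1
    # Rebuild the missing SubLVs on the available space.
    run lvconvert --repair "$1/$2"
}
```

A dry run would be "repair_raid_lv vg_ssds_0 host-home /dev/sdX", which
prints the three commands for review before anything is changed.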
Best,
Heinz