linux-lvm.redhat.com archive mirror
* [linux-lvm] LVM RAID behavior after losing physical disk
@ 2022-01-25 18:44 Andrei Rodionov
  2022-01-27 13:16 ` John Stoffel
  2022-01-27 16:14 ` Andrei Rodionov
  0 siblings, 2 replies; 4+ messages in thread
From: Andrei Rodionov @ 2022-01-25 18:44 UTC (permalink / raw)
  To: linux-lvm


Hello,

I've provisioned an LVM RAID 6 across 4 physical disks. I'm trying to
understand the RAID behavior after injecting the failure - removing
physical disk /dev/sdc.

pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf
vgcreate pool_vg /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
lvcreate -l +100%FREE -n pool_lv --type raid6 pool_vg
mkfs.xfs /dev/pool_vg/pool_lv
echo "/dev/mapper/pool_vg-pool_lv /mnt xfs
defaults,x-systemd.mount-timeout=30 0 0" >> /etc/fstab

Everything appears to be working fine:
# pvs --segments -o pv_name,pv_size,seg_size,vg_name,lv_name,lv_attr,lv_size,seg_pe_ranges
  PV         PSize    SSize   VG        LV                 Attr       LSize   PE Ranges
  /dev/sda3   <49.00g <24.50g ubuntu-vg ubuntu-lv          -wi-ao---- <24.50g /dev/sda3:0-6270
  /dev/sda3   <49.00g  24.50g ubuntu-vg                                    0
  /dev/sdc   <100.00g   4.00m pool_vg   [pool_lv_rmeta_0]  ewi-aor---   4.00m /dev/sdc:0-0
  /dev/sdc   <100.00g  99.99g pool_vg   [pool_lv_rimage_0] iwi-aor---  99.99g /dev/sdc:1-25598
  /dev/sdd   <100.00g   4.00m pool_vg   [pool_lv_rmeta_1]  ewi-aor---   4.00m /dev/sdd:0-0
  /dev/sdd   <100.00g  99.99g pool_vg   [pool_lv_rimage_1] iwi-aor---  99.99g /dev/sdd:1-25598
  /dev/sde   <100.00g   4.00m pool_vg   [pool_lv_rmeta_2]  ewi-aor---   4.00m /dev/sde:0-0
  /dev/sde   <100.00g  99.99g pool_vg   [pool_lv_rimage_2] iwi-aor---  99.99g /dev/sde:1-25598
  /dev/sdf   <100.00g   4.00m pool_vg   [pool_lv_rmeta_3]  ewi-aor---   4.00m /dev/sdf:0-0
  /dev/sdf   <100.00g  99.99g pool_vg   [pool_lv_rimage_3] iwi-aor---  99.99g /dev/sdf:1-25598
  /dev/sdg   <100.00g   4.00m pool_vg   [pool_lv_rmeta_4]  ewi-aor---   4.00m /dev/sdg:0-0
  /dev/sdg   <100.00g  99.99g pool_vg   [pool_lv_rimage_4] iwi-aor---  99.99g /dev/sdg:1-25598
# lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
  LV                 Attr       Cpy%Sync Health          Devices
  pool_lv            rwi-aor--- 100.00                   pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
  [pool_lv_rimage_0] iwi-aor---                          /dev/sdc(1)
  [pool_lv_rimage_1] iwi-aor---                          /dev/sdd(1)
  [pool_lv_rimage_2] iwi-aor---                          /dev/sde(1)
  [pool_lv_rimage_3] iwi-aor---                          /dev/sdf(1)
  [pool_lv_rimage_4] iwi-aor---                          /dev/sdg(1)
  [pool_lv_rmeta_0]  ewi-aor---                          /dev/sdc(0)
  [pool_lv_rmeta_1]  ewi-aor---                          /dev/sdd(0)
  [pool_lv_rmeta_2]  ewi-aor---                          /dev/sde(0)
  [pool_lv_rmeta_3]  ewi-aor---                          /dev/sdf(0)
  [pool_lv_rmeta_4]  ewi-aor---                          /dev/sdg(0)

After the /dev/sdc is removed and the system is rebooted, the RAID goes
into "partial" health state and is no longer accessible.

# lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
  WARNING: Couldn't find device with uuid 03KtEG-cJ5S-cMAD-RlL8-yBXM-jCav-EyD9I3.
  WARNING: VG pool_vg is missing PV 03KtEG-cJ5S-cMAD-RlL8-yBXM-jCav-EyD9I3 (last written to /dev/sdc).
  LV                 Attr       Cpy%Sync Health          Devices
  pool_lv            rwi---r-p-          partial         pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
  [pool_lv_rimage_0] Iwi---r-p-          partial         [unknown](1)
  [pool_lv_rimage_1] Iwi---r---                          /dev/sdd(1)
  [pool_lv_rimage_2] Iwi---r---                          /dev/sde(1)
  [pool_lv_rimage_3] Iwi---r---                          /dev/sdf(1)
  [pool_lv_rimage_4] Iwi---r---                          /dev/sdg(1)
  [pool_lv_rmeta_0]  ewi---r-p-          partial         [unknown](0)
  [pool_lv_rmeta_1]  ewi---r---                          /dev/sdd(0)
  [pool_lv_rmeta_2]  ewi---r---                          /dev/sde(0)
  [pool_lv_rmeta_3]  ewi---r---                          /dev/sdf(0)
  [pool_lv_rmeta_4]  ewi---r---                          /dev/sdg(0)


From what I understand, the RAID should be able to continue with a physical
disk loss and be in a "degraded" state, not "partial", because the data is
fully present on the surviving disks.

From /etc/lvm/lvm.conf:
        #   degraded
        #     Like complete, but additionally RAID LVs of segment type raid1,
        #     raid4, raid5, raid6 and raid10 will be activated if there is no
        #     data loss, i.e. they have sufficient redundancy to present the
        #     entire addressable range of the Logical Volume.
        #   partial
        #     Allows the activation of any LV even if a missing or failed PV
        #     could cause data loss with a portion of the LV inaccessible.
        #     This setting should not normally be used, but may sometimes
        #     assist with data recovery.
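
If I'm reading the man pages right, it should also be possible to force a
degraded activation by hand. A minimal sketch of what I would try (untested
here; the VG name is just the one from this setup):

    vgchange -ay --activationmode degraded pool_vg
    mount /dev/pool_vg/pool_lv /mnt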

From
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_logical_volumes/assembly_configure-mange-raid-configuring-and-managing-logical-volumes:

"RAID is not like traditional LVM mirroring. LVM mirroring required failed
devices to be removed or the mirrored logical volume would hang. RAID
arrays can keep on running with failed devices. In fact, for RAID types
other than RAID1, removing a device would mean converting to a lower level
RAID (for example, from RAID6 to RAID5, or from RAID4 or RAID5 to RAID0)."

However, in my case the RAID is not converted; it's simply not available.
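
For completeness, the replacement workflow I would expect to follow once the
LV can be activated degraded, sketched from my reading of lvmraid(7) (the
replacement disk name /dev/sdh is hypothetical):

    pvcreate /dev/sdh
    vgextend pool_vg /dev/sdh
    lvconvert --repair pool_vg/pool_lv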

This is running in a virtual machine on VMware ESXi 7, with LVM version
2.03.07(2) (2019-11-30).

Am I missing something obvious? Appreciate any insights.

Thanks,

Andrei

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [linux-lvm] LVM RAID behavior after losing physical disk
  2022-01-25 18:44 [linux-lvm] LVM RAID behavior after losing physical disk Andrei Rodionov
@ 2022-01-27 13:16 ` John Stoffel
  2022-01-27 16:14 ` Andrei Rodionov
  1 sibling, 0 replies; 4+ messages in thread
From: John Stoffel @ 2022-01-27 13:16 UTC (permalink / raw)
  To: LVM general discussion and development

>>>>> "Andrei" == Andrei Rodionov <andrei.rodionov@gmail.com> writes:

Andrei> I've provisioned an LVM RAID 6 across 4 physical disks. I'm
Andrei> trying to understand the RAID behavior after injecting the
Andrei> failure - removing physical disk /dev/sdc.

The docs state you need to use 5 devices for RAID6 under LVM, not
four.  And you do show 5 disks in your vgcreate, but not your lvcreate
command.

Maybe you could post the test script you use, so we can make sure
you're calling it correctly?
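
Something along these lines is what I had in mind, purely as a sketch
assembled from the commands you already posted (adjust device names to
whatever your test VM uses):

    #!/bin/bash
    set -ex
    pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
    vgcreate pool_vg /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
    lvcreate -l +100%FREE -n pool_lv --type raid6 pool_vg
    mkfs.xfs /dev/pool_vg/pool_lv
    mount /dev/pool_vg/pool_lv /mnt
    lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg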


Andrei> pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf
Andrei> vgcreate pool_vg /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
Andrei> lvcreate -l +100%FREE -n pool_lv --type raid6 pool_vg
Andrei> mkfs.xfs /dev/pool_vg/pool_lv
Andrei> echo "/dev/mapper/pool_vg-pool_lv /mnt xfs defaults,x-systemd.mount-timeout=30 0 0" >> /etc/fstab

This looks ok, but maybe you need to specify the explicit stripe count
and size?

  lvcreate --type raid6 -l 100%FREE --stripes 3 --stripesize 1 -n pool_lv pool_vg

Andrei> Everything appears to be working fine:
Andrei> # pvs --segments -o pv_name,pv_size,seg_size,vg_name,lv_name,lv_attr,lv_size,seg_pe_ranges
Andrei>   PV         PSize    SSize   VG        LV                 Attr       LSize   PE Ranges
Andrei>   /dev/sda3   <49.00g <24.50g ubuntu-vg ubuntu-lv          -wi-ao---- <24.50g /dev/sda3:0-6270
Andrei>   /dev/sda3   <49.00g  24.50g ubuntu-vg                                    0
Andrei>   /dev/sdc   <100.00g   4.00m pool_vg   [pool_lv_rmeta_0]  ewi-aor---   4.00m /dev/sdc:0-0
Andrei>   /dev/sdc   <100.00g  99.99g pool_vg   [pool_lv_rimage_0] iwi-aor---  99.99g /dev/sdc:1-25598
Andrei>   /dev/sdd   <100.00g   4.00m pool_vg   [pool_lv_rmeta_1]  ewi-aor---   4.00m /dev/sdd:0-0
Andrei>   /dev/sdd   <100.00g  99.99g pool_vg   [pool_lv_rimage_1] iwi-aor---  99.99g /dev/sdd:1-25598
Andrei>   /dev/sde   <100.00g   4.00m pool_vg   [pool_lv_rmeta_2]  ewi-aor---   4.00m /dev/sde:0-0
Andrei>   /dev/sde   <100.00g  99.99g pool_vg   [pool_lv_rimage_2] iwi-aor---  99.99g /dev/sde:1-25598
Andrei>   /dev/sdf   <100.00g   4.00m pool_vg   [pool_lv_rmeta_3]  ewi-aor---   4.00m /dev/sdf:0-0
Andrei>   /dev/sdf   <100.00g  99.99g pool_vg   [pool_lv_rimage_3] iwi-aor---  99.99g /dev/sdf:1-25598
Andrei>   /dev/sdg   <100.00g   4.00m pool_vg   [pool_lv_rmeta_4]  ewi-aor---   4.00m /dev/sdg:0-0
Andrei>   /dev/sdg   <100.00g  99.99g pool_vg   [pool_lv_rimage_4] iwi-aor---  99.99g /dev/sdg:1-25598
Andrei> # lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
Andrei>   LV                 Attr       Cpy%Sync Health          Devices
Andrei>   pool_lv            rwi-aor--- 100.00                   pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
Andrei>   [pool_lv_rimage_0] iwi-aor---                          /dev/sdc(1)
Andrei>   [pool_lv_rimage_1] iwi-aor---                          /dev/sdd(1)
Andrei>   [pool_lv_rimage_2] iwi-aor---                          /dev/sde(1)
Andrei>   [pool_lv_rimage_3] iwi-aor---                          /dev/sdf(1)
Andrei>   [pool_lv_rimage_4] iwi-aor---                          /dev/sdg(1)
Andrei>   [pool_lv_rmeta_0]  ewi-aor---                          /dev/sdc(0)
Andrei>   [pool_lv_rmeta_1]  ewi-aor---                          /dev/sdd(0)
Andrei>   [pool_lv_rmeta_2]  ewi-aor---                          /dev/sde(0)
Andrei>   [pool_lv_rmeta_3]  ewi-aor---                          /dev/sdf(0)
Andrei>   [pool_lv_rmeta_4]  ewi-aor---                          /dev/sdg(0)

Andrei> After the /dev/sdc is removed and the system is rebooted, the
Andrei> RAID goes into "partial" health state and is no longer
Andrei> accessible.

Just for grins, what happens if you re-add sdc and then reboot?
Does it re-find the array?

Andrei> # lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
Andrei>   WARNING: Couldn't find device with uuid 03KtEG-cJ5S-cMAD-RlL8-yBXM-jCav-EyD9I3.
Andrei>   WARNING: VG pool_vg is missing PV 03KtEG-cJ5S-cMAD-RlL8-yBXM-jCav-EyD9I3 (last written to /dev/sdc).
Andrei>   LV                 Attr       Cpy%Sync Health          Devices
Andrei>   pool_lv            rwi---r-p-          partial         pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
Andrei>   [pool_lv_rimage_0] Iwi---r-p-          partial         [unknown](1)
Andrei>   [pool_lv_rimage_1] Iwi---r---                          /dev/sdd(1)
Andrei>   [pool_lv_rimage_2] Iwi---r---                          /dev/sde(1)
Andrei>   [pool_lv_rimage_3] Iwi---r---                          /dev/sdf(1)
Andrei>   [pool_lv_rimage_4] Iwi---r---                          /dev/sdg(1)
Andrei>   [pool_lv_rmeta_0]  ewi---r-p-          partial         [unknown](0)
Andrei>   [pool_lv_rmeta_1]  ewi---r---                          /dev/sdd(0)
Andrei>   [pool_lv_rmeta_2]  ewi---r---                          /dev/sde(0)
Andrei>   [pool_lv_rmeta_3]  ewi---r---                          /dev/sdf(0)
Andrei>   [pool_lv_rmeta_4]  ewi---r---                          /dev/sdg(0)

Andrei> From what I understand, the RAID should be able to continue
Andrei> with a physical disk loss and be in a "degraded" state, not
Andrei> "partial", because the data is fully present on the surviving
Andrei> disks.

Andrei> From /etc/lvm/lvm.conf:
Andrei>         #   degraded
Andrei>         #     Like complete, but additionally RAID LVs of segment type raid1,
Andrei>         #     raid4, raid5, raid6 and raid10 will be activated if there is no
Andrei>         #     data loss, i.e. they have sufficient redundancy to present the
Andrei>         #     entire addressable range of the Logical Volume.
Andrei>         #   partial
Andrei>         #     Allows the activation of any LV even if a missing or failed PV
Andrei>         #     could cause data loss with a portion of the LV inaccessible.
Andrei>         #     This setting should not normally be used, but may sometimes
Andrei>         #     assist with data recovery.

What is your actual setting in /etc/lvm/lvm.conf for the block:

   activation {
     ...
     activation_mode = "degraded"
     ...
   }
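
You can double-check what lvm actually resolves at run time with something
like this (lvmconfig syntax as I remember it, so treat it as a sketch):

   lvmconfig activation/activation_mode
   lvmconfig --typeconfig full activation/activation_mode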

I'm on Debian, not RHEL 8, and I haven't tested this myself, but I
wonder whether you really needed to apply the '--stripes 3' value when
you built it?

John


_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [linux-lvm] LVM RAID behavior after losing physical disk
  2022-01-25 18:44 [linux-lvm] LVM RAID behavior after losing physical disk Andrei Rodionov
  2022-01-27 13:16 ` John Stoffel
@ 2022-01-27 16:14 ` Andrei Rodionov
  2022-01-29 21:46   ` John Stoffel
  1 sibling, 1 reply; 4+ messages in thread
From: Andrei Rodionov @ 2022-01-27 16:14 UTC (permalink / raw)
  To: linux-lvm

Hello,

I apologize for replying to my own message; I was subscribed in
digest mode...

I've been hitting my head against this one for a while now. I originally
discovered this on Ubuntu 20.04, but I'm seeing the same issue with
RHEL 8.5. The loss of a single disk leaves the RAID in "partial" mode,
when it should be "degraded".

I've tried to explicitly specify the number of stripes, but it did not
make a difference. After adding the missing disk back, the array is
healthy again. Please see below.

# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 (Ootpa)
# lvm version
  LVM version:     2.03.12(2)-RHEL8 (2021-05-19)
  Library version: 1.02.177-RHEL8 (2021-05-19)
  Driver version:  4.43.0

# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda             8:0    0   50G  0 disk
├─sda1          8:1    0    1G  0 part /boot
└─sda2          8:2    0   49G  0 part
  ├─rhel-root 253:0    0   44G  0 lvm  /
  └─rhel-swap 253:1    0    5G  0 lvm  [SWAP]
sdb             8:16   0   70G  0 disk
sdc             8:32   0  100G  0 disk
sdd             8:48   0  100G  0 disk
sde             8:64   0  100G  0 disk
sdf             8:80   0  100G  0 disk
sdg             8:96   0  100G  0 disk

# pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
# vgcreate pool_vg /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

# lvcreate -l +100%FREE -n pool_lv --type raid6 --stripes 3 --stripesize 1 pool_vg
  Invalid stripe size 1.00 KiB.
  Run `lvcreate --help' for more information.

# lvcreate -l +100%FREE -n pool_lv --type raid6 --stripes 3 --stripesize 4 pool_vg
  Logical volume "pool_lv" created.

# mkfs.xfs /dev/pool_vg/pool_lv
# echo "/dev/mapper/pool_vg-pool_lv /mnt xfs
defaults,x-systemd.mount-timeout=30 0 0" >> /etc/fstab
# mount -a
# touch /mnt/test
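
For what it's worth, the stripe parameters can also be confirmed directly
(field names as I read the lvs man page, so treat this as a sketch):

# lvs -o name,segtype,stripes,stripe_size pool_vg/pool_lv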

Note the RAID is correctly striped across all 5 disks:

# lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
  LV                 Attr       Cpy%Sync Health          Devices
  pool_lv            rwi-aor--- 100.00                   pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
  [pool_lv_rimage_0] iwi-aor---                          /dev/sdc(1)
  [pool_lv_rimage_1] iwi-aor---                          /dev/sdd(1)
  [pool_lv_rimage_2] iwi-aor---                          /dev/sde(1)
  [pool_lv_rimage_3] iwi-aor---                          /dev/sdf(1)
  [pool_lv_rimage_4] iwi-aor---                          /dev/sdg(1)
  [pool_lv_rmeta_0]  ewi-aor---                          /dev/sdc(0)
  [pool_lv_rmeta_1]  ewi-aor---                          /dev/sdd(0)
  [pool_lv_rmeta_2]  ewi-aor---                          /dev/sde(0)
  [pool_lv_rmeta_3]  ewi-aor---                          /dev/sdf(0)
  [pool_lv_rmeta_4]  ewi-aor---                          /dev/sdg(0)

After shutting down the OS and removing a disk, the reboot drops the
system into single-user mode because it cannot mount /mnt! The RAID
is now in "partial" mode, when it should only be "degraded".

# lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
  WARNING: Couldn't find device with uuid d5y3gp-taRv-2YMa-3mR0-94ZZ-72Od-IKF8Co.
  WARNING: VG pool_vg is missing PV d5y3gp-taRv-2YMa-3mR0-94ZZ-72Od-IKF8Co (last written to /dev/sdc).
  LV                 Attr       Cpy%Sync Health          Devices
  pool_lv            rwi---r-p-          partial         pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
  [pool_lv_rimage_0] Iwi---r-p-          partial         [unknown](1)
  [pool_lv_rimage_1] Iwi---r---                          /dev/sdc(1)
  [pool_lv_rimage_2] Iwi---r---                          /dev/sdd(1)
  [pool_lv_rimage_3] Iwi---r---                          /dev/sde(1)
  [pool_lv_rimage_4] Iwi---r---                          /dev/sdf(1)
  [pool_lv_rmeta_0]  ewi---r-p-          partial         [unknown](0)
  [pool_lv_rmeta_1]  ewi---r---                          /dev/sdc(0)
  [pool_lv_rmeta_2]  ewi---r---                          /dev/sdd(0)
  [pool_lv_rmeta_3]  ewi---r---                          /dev/sde(0)
  [pool_lv_rmeta_4]  ewi---r---                          /dev/sdf(0)

After adding the missing disk back, the system boots correctly and
there are no issues with the RAID:

# lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
  LV                 Attr       Cpy%Sync Health          Devices
  pool_lv            rwi-a-r--- 100.00                   pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
  [pool_lv_rimage_0] iwi-aor---                          /dev/sdc(1)
  [pool_lv_rimage_1] iwi-aor---                          /dev/sdd(1)
  [pool_lv_rimage_2] iwi-aor---                          /dev/sde(1)
  [pool_lv_rimage_3] iwi-aor---                          /dev/sdf(1)
  [pool_lv_rimage_4] iwi-aor---                          /dev/sdg(1)
  [pool_lv_rmeta_0]  ewi-aor---                          /dev/sdc(0)
  [pool_lv_rmeta_1]  ewi-aor---                          /dev/sdd(0)
  [pool_lv_rmeta_2]  ewi-aor---                          /dev/sde(0)
  [pool_lv_rmeta_3]  ewi-aor---                          /dev/sdf(0)
  [pool_lv_rmeta_4]  ewi-aor---                          /dev/sdg(0)


On Tue, Jan 25, 2022 at 11:44 AM Andrei Rodionov
<andrei.rodionov@gmail.com> wrote:
>
> Hello,
>
> I've provisioned an LVM RAID 6 across 5 physical disks. I'm trying to understand the RAID behavior after injecting the failure - removing physical disk /dev/sdc.


_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [linux-lvm] LVM RAID behavior after losing physical disk
  2022-01-27 16:14 ` Andrei Rodionov
@ 2022-01-29 21:46   ` John Stoffel
  0 siblings, 0 replies; 4+ messages in thread
From: John Stoffel @ 2022-01-29 21:46 UTC (permalink / raw)
  To: LVM general discussion and development


Andrei> I apologize for replying to my own message, I was subscribed in the
Andrei> digest mode...

No problem

Andrei> I've been hitting my head against this one for a while
Andrei> now. Originally discovered this on Ubuntu 20.04, but I'm
Andrei> seeing the same issue with RHEL 8.5. The loss of a single disk
Andrei> leaves the RAID in "partial" mode, when it should be
Andrei> "degraded".

Ouch, this isn't good.  But why aren't you using MD RAID on top of the
disks (partitioned, ideally, in my book), turning that MD device into a
PV in a VG, and then making your LVs in there?

Andrei> I've tried to explicitly specify the number of stripes, but it
Andrei> did not make a difference. After adding the missing disk back,
Andrei> the array is healthy again. Please see below.

I'm wondering if there's something set up in the defaults in
/etc/lvm/lvm.conf which makes a degraded array fail, instead of coming
up degraded.

But honestly, if you're looking for disk-level redundancy, then I'd
strongly recommend you use disks -> MD RAID6 -> LVM -> filesystem for
your data.

It's reliable, durable and well understood.
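
A minimal sketch of that stack, using the same five 100G disks from your
mails (device names illustrative, partitioning left out for brevity):

   mdadm --create /dev/md0 --level=6 --raid-devices=5 \
       /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
   pvcreate /dev/md0
   vgcreate pool_vg /dev/md0
   lvcreate -l 100%FREE -n pool_lv pool_vg
   mkfs.xfs /dev/pool_vg/pool_lv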

I know there's an attraction to running LVs and RAID all together,
since that should be easier to manage, right?  But I think not.

Have you tried to activate the LV using:

   lvchange -ay --activationmode degraded LV
   
as a test?  What does it say?  I'm looking at the lvmraid man page for
this suggestion.
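
With the names from your mails that would presumably be:

   lvchange -ay --activationmode degraded pool_vg/pool_lv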


Andrei> # cat /etc/redhat-release
Andrei> Red Hat Enterprise Linux release 8.5 (Ootpa)
Andrei> # lvm version
Andrei>   LVM version:     2.03.12(2)-RHEL8 (2021-05-19)
Andrei>   Library version: 1.02.177-RHEL8 (2021-05-19)
Andrei>   Driver version:  4.43.0

Andrei> # lsblk
Andrei> NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
Andrei> sda             8:0    0   50G  0 disk
Andrei> ├─sda1          8:1    0    1G  0 part /boot
Andrei> └─sda2          8:2    0   49G  0 part
Andrei>   ├─rhel-root 253:0    0   44G  0 lvm  /
Andrei>   └─rhel-swap 253:1    0    5G  0 lvm  [SWAP]
Andrei> sdb             8:16   0   70G  0 disk
Andrei> sdc             8:32   0  100G  0 disk
Andrei> sdd             8:48   0  100G  0 disk
Andrei> sde             8:64   0  100G  0 disk
Andrei> sdf             8:80   0  100G  0 disk
Andrei> sdg             8:96   0  100G  0 disk

Andrei> # pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
Andrei> # vgcreate pool_vg /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

Andrei> # lvcreate -l +100%FREE -n pool_lv --type raid6 --stripes 3 --stripesize 1 pool_vg
Andrei>   Invalid stripe size 1.00 KiB.
Andrei>   Run `lvcreate --help' for more information.

Andrei> # lvcreate -l +100%FREE -n pool_lv --type raid6 --stripes 3 --stripesize 4 pool_vg
Andrei>   Logical volume "pool_lv" created.

Andrei> # mkfs.xfs /dev/pool_vg/pool_lv
Andrei> # echo "/dev/mapper/pool_vg-pool_lv /mnt xfs
Andrei> defaults,x-systemd.mount-timeout=30 0 0" >> /etc/fstab
Andrei> # mount -a
Andrei> # touch /mnt/test

Andrei> Note the RAID is correctly striped across all 5 disks:

Andrei> # lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
Andrei>   LV                 Attr       Cpy%Sync Health          Devices
Andrei>   pool_lv            rwi-aor--- 100.00                   pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
Andrei>   [pool_lv_rimage_0] iwi-aor---                          /dev/sdc(1)
Andrei>   [pool_lv_rimage_1] iwi-aor---                          /dev/sdd(1)
Andrei>   [pool_lv_rimage_2] iwi-aor---                          /dev/sde(1)
Andrei>   [pool_lv_rimage_3] iwi-aor---                          /dev/sdf(1)
Andrei>   [pool_lv_rimage_4] iwi-aor---                          /dev/sdg(1)
Andrei>   [pool_lv_rmeta_0]  ewi-aor---                          /dev/sdc(0)
Andrei>   [pool_lv_rmeta_1]  ewi-aor---                          /dev/sdd(0)
Andrei>   [pool_lv_rmeta_2]  ewi-aor---                          /dev/sde(0)
Andrei>   [pool_lv_rmeta_3]  ewi-aor---                          /dev/sdf(0)
Andrei>   [pool_lv_rmeta_4]  ewi-aor---                          /dev/sdg(0)

Andrei> After shutting down the OS and removing a disk, the reboot drops the
Andrei> system into single-user mode because it cannot mount /mnt! The RAID
Andrei> is now in "partial" mode, when it should only be "degraded".

Andrei> # lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
Andrei>   WARNING: Couldn't find device with uuid d5y3gp-taRv-2YMa-3mR0-94ZZ-72Od-IKF8Co.
Andrei>   WARNING: VG pool_vg is missing PV d5y3gp-taRv-2YMa-3mR0-94ZZ-72Od-IKF8Co (last written to /dev/sdc).
Andrei>   LV                 Attr       Cpy%Sync Health          Devices
Andrei>   pool_lv            rwi---r-p-          partial         pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
Andrei>   [pool_lv_rimage_0] Iwi---r-p-          partial         [unknown](1)
Andrei>   [pool_lv_rimage_1] Iwi---r---                          /dev/sdc(1)
Andrei>   [pool_lv_rimage_2] Iwi---r---                          /dev/sdd(1)
Andrei>   [pool_lv_rimage_3] Iwi---r---                          /dev/sde(1)
Andrei>   [pool_lv_rimage_4] Iwi---r---                          /dev/sdf(1)
Andrei>   [pool_lv_rmeta_0]  ewi---r-p-          partial         [unknown](0)
Andrei>   [pool_lv_rmeta_1]  ewi---r---                          /dev/sdc(0)
Andrei>   [pool_lv_rmeta_2]  ewi---r---                          /dev/sdd(0)
Andrei>   [pool_lv_rmeta_3]  ewi---r---                          /dev/sde(0)
Andrei>   [pool_lv_rmeta_4]  ewi---r---                          /dev/sdf(0)

Andrei> After adding the missing disk back, the system boots correctly and
Andrei> there are no issues with the RAID:

Andrei> # lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
Andrei>   LV                 Attr       Cpy%Sync Health          Devices
Andrei>   pool_lv            rwi-a-r--- 100.00                   pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
Andrei>   [pool_lv_rimage_0] iwi-aor---                          /dev/sdc(1)
Andrei>   [pool_lv_rimage_1] iwi-aor---                          /dev/sdd(1)
Andrei>   [pool_lv_rimage_2] iwi-aor---                          /dev/sde(1)
Andrei>   [pool_lv_rimage_3] iwi-aor---                          /dev/sdf(1)
Andrei>   [pool_lv_rimage_4] iwi-aor---                          /dev/sdg(1)
Andrei>   [pool_lv_rmeta_0]  ewi-aor---                          /dev/sdc(0)
Andrei>   [pool_lv_rmeta_1]  ewi-aor---                          /dev/sdd(0)
Andrei>   [pool_lv_rmeta_2]  ewi-aor---                          /dev/sde(0)
Andrei>   [pool_lv_rmeta_3]  ewi-aor---                          /dev/sdf(0)
Andrei>   [pool_lv_rmeta_4]  ewi-aor---                          /dev/sdg(0)


Andrei> On Tue, Jan 25, 2022 at 11:44 AM Andrei Rodionov
Andrei> <andrei.rodionov@gmail.com> wrote:
>> 
>> Hello,
>> 
>> I've provisioned an LVM RAID 6 across 5 physical disks. I'm trying to understand the RAID behavior after injecting the failure - removing physical disk /dev/sdc.




_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-01-29 21:46 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-25 18:44 [linux-lvm] LVM RAID behavior after losing physical disk Andrei Rodionov
2022-01-27 13:16 ` John Stoffel
2022-01-27 16:14 ` Andrei Rodionov
2022-01-29 21:46   ` John Stoffel
