linux-lvm.redhat.com archive mirror
From: "John Stoffel" <john@stoffel.org>
To: LVM general discussion and development <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] raid10 with missing redundancy, but health status claims it is ok.
Date: Sat, 28 May 2022 12:15:48 -0400
Message-ID: <25234.19124.329350.465135@quad.stoffel.home>
In-Reply-To: <4ea715cc-3a58-bb47-af41-f40e630f93f3@syseleven.de>

>>>>> "Olaf" == Olaf Seibert <o.seibert@syseleven.de> writes:

I'm leaving for the rest of the weekend, but hopefully this will help you...

Olaf> Hi all, I'm new to this list. I hope somebody here can help me.

We will try!  But I would strongly urge you to take a backup of all
your data NOW, before you do anything else.  Copy it to another disk
which is separate from this system, just in case.

My first suggestion is that you post the output of the 'pvs', 'vgs'
and 'lvs' commands.  Also, which disk died?  And have you replaced it?
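For reference, here's the kind of listing that makes this easy to
diagnose (a sketch; the `run` wrapper only prints each command so you
can review it first -- drop the `echo` to actually run them as root):

```shell
# Print-only wrapper: shows each command instead of executing it.
run() { echo "# $*"; }

run pvs                                                # physical volumes, sizes, free space
run vgs                                                # volume group summary
run lvs -a -o +devices,segtype,lv_health_status nova   # all (sub)LVs with their backing devices
```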

My second suggestion is to use 'md' as the lower-level RAID1/10/5/6
layer underneath your LVM volumes.  A lot of people think it's better
to have it all in one tool (btrfs, zfs, others), but I strongly feel
that using clean layers helps keep things organized and reliable.

So if you can, add two new disks to your system, create a full-disk
partition on each starting at an offset of 1 MiB (maybe even leaving a
couple of MiB free at the end), and then create an MD pair on them:

   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdy1 /dev/sdz1
     
Now you can add that MD device to your nova VG with:

   vgextend nova /dev/md0

Then try to move the LV named 'lvname' onto the new MD PV:

   pvmove -n lvname /dev/<source_PV> /dev/md0

I think you really want to move the *entire* top-level LV onto new
storage.  Then you will know your data is safe.  And this can be done
while the volume is up and running.
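Putting those steps together (hypothetical device names: /dev/sdy1 and
/dev/sdz1 for the new disks, /dev/sdX1 for the PV you're evacuating --
substitute your own; the `run` wrapper only prints each command, so
nothing here executes by accident):

```shell
# Print-only wrapper: drop the 'echo' when you're ready to run for real.
run() { echo "# $*"; }

run mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdy1 /dev/sdz1
run pvcreate /dev/md0                       # initialize the MD pair as an LVM PV
run vgextend nova /dev/md0                  # add it to the VG
run pvmove -n lvname /dev/sdX1 /dev/md0     # move just that LV's extents
# or, to evacuate the old PV completely:
run pvmove /dev/sdX1 /dev/md0
run vgreduce nova /dev/sdX1                 # then drop the old PV from the VG
```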

But again: please take a backup (rsync onto a new LV, maybe?) of your
current data to make sure you don't lose anything.
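If there's room in the VG, that rsync-onto-a-new-LV backup could look
roughly like this (the LV name 'lvname-backup', the mount points, and
the source path are all hypothetical -- adjust them; again print-only,
so you can review before running anything):

```shell
# Print-only wrapper: drop the 'echo' when you're ready to run for real.
run() { echo "# $*"; }

run lvcreate -L 800G -n lvname-backup nova      # scratch LV to hold the copy
run mkfs.ext4 /dev/nova/lvname-backup
run mkdir -p /mnt/lvname-backup
run mount /dev/nova/lvname-backup /mnt/lvname-backup
# -aHAX: archive mode plus hardlinks, ACLs and extended attributes
run rsync -aHAX /srv/lvname-mountpoint/ /mnt/lvname-backup/
```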

Olaf> We had a disk go bad (disk commands timed out and took many
Olaf> seconds to do so) in our LVM installation with mirroring. With
Olaf> some trouble, we managed to pvremove the offending disk, and
Olaf> used `lvconvert --repair -y nova/$lv` to repair (restore
Olaf> redundancy) the logical volumes.

How many disks do you have in the system?  Please don't try to hide
names of disks and such unless you really need to.  It makes it much
harder to diagnose.  


Olaf> One logical volume still seems to have trouble though. In `lvs -o devices -a`
Olaf> it shows no devices for 2 of its subvolumes, and it has the weird 'v' status:


Olaf> ```
Olaf>   LV                VG   Attr       LSize   Cpy%Sync Devices
Olaf>   lvname            nova Rwi-aor--- 800.00g 100.00   lvname_rimage_0(0),lvname_rimage_1(0),lvname_rimage_2(0),lvname_rimage_3(0)
Olaf>   [lvname_rimage_0] nova iwi-aor--- 400.00g          /dev/sdc1(19605)
Olaf>   [lvname_rimage_1] nova iwi-aor--- 400.00g          /dev/sdi1(19605)
Olaf>   [lvname_rimage_2] nova vwi---r--- 400.00g
Olaf>   [lvname_rimage_3] nova iwi-aor--- 400.00g          /dev/sdj1(19605)
Olaf>   [lvname_rmeta_0]  nova ewi-aor---  64.00m          /dev/sdc1(19604)
Olaf>   [lvname_rmeta_1]  nova ewi-aor---  64.00m          /dev/sdi1(19604)
Olaf>   [lvname_rmeta_2]  nova ewi---r---  64.00m
Olaf>   [lvname_rmeta_3]  nova ewi-aor---  64.00m          /dev/sdj1(19604)

Olaf> ```
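About that 'v': the first character of the lvs Attr column is the
volume type.  A little decoder for the values seen above (paraphrasing
the lv_attr table in lvs(8) -- check the man page for the full list):

```shell
# First character of the lvs "Attr" column = volume type, per lvs(8).
attr_type() {
  case "$1" in
    R*) echo "raid, without initial sync" ;;
    r*) echo "raid" ;;
    I*) echo "raid/mirror image, out of sync" ;;
    i*) echo "raid/mirror image" ;;
    e*) echo "raid/pool metadata" ;;
    v*) echo "virtual: no real backing device (here, the 'error' segment)" ;;
    *)  echo "something else; see lvs(8)" ;;
  esac
}

attr_type "Rwi-aor---"   # lvname itself
attr_type "iwi-aor---"   # a healthy rimage
attr_type "vwi---r---"   # the broken rimage_2
```

So rimage_2/rmeta_2 aren't merely out of sync; they no longer point at
any physical storage at all.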
Olaf> and also according to `lvdisplay -am` there is a problem with
Olaf> `..._rimage_2` and `..._rmeta_2`:
Olaf> ```
Olaf>   --- Logical volume ---
Olaf>   Internal LV Name       lvname_rimage_2
Olaf>   VG Name                nova
Olaf>   LV UUID                xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Olaf>   LV Write Access        read/write
Olaf>   LV Creation host, time xxxxxxxxx, 2021-07-09 16:45:21 +0000
Olaf>   LV Status              NOT available
Olaf>   LV Size                400.00 GiB
Olaf>   Current LE             6400
Olaf>   Segments               1
Olaf>   Allocation             inherit
Olaf>   Read ahead sectors     auto

Olaf>   --- Segments ---
Olaf>   Virtual extents 0 to 6399:
Olaf>     Type                error

Olaf>   --- Logical volume ---
Olaf>   Internal LV Name       lvname_rmeta_2
Olaf>   VG Name                nova
Olaf>   LV UUID                xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Olaf>   LV Write Access        read/write
Olaf>   LV Creation host, time xxxxxxxxx, 2021-07-09 16:45:21 +0000
Olaf>   LV Status              NOT available
Olaf>   LV Size                64.00 MiB
Olaf>   Current LE             1
Olaf>   Segments               1
Olaf>   Allocation             inherit
Olaf>   Read ahead sectors     auto

Olaf>   --- Segments ---
Olaf>   Virtual extents 0 to 0:
Olaf>     Type                error
Olaf> ```

Olaf> Similarly, the metadata shows the same thing:

Olaf>                 lvname_rimage_2 {
Olaf>                         id = "..."
Olaf>                         status = ["READ", "WRITE"]
Olaf>                         flags = []
Olaf>                         creation_time = 1625849121      # 2021-07-09 16:45:21 +0000
Olaf>                         creation_host = "cbk130133"
Olaf>                         segment_count = 1

Olaf>                         segment1 {
Olaf>                                 start_extent = 0
Olaf>                                 extent_count = 6400     # 400 Gigabytes

Olaf>                                 type = "error"
Olaf>                         }
Olaf>                 }




Olaf> On the other hand, the health status appears to read out normal:

Olaf> [13:38:20] root@cbk130133:~# lvs -o +lv_health_status
Olaf>   LV     VG   Attr       LSize   Cpy%Sync Health
Olaf>   lvname nova Rwi-aor--- 800.00g 100.00



Olaf> We tried various combinations of `lvconvert --repair -y nova/$lv` and
Olaf> `lvchange --syncaction repair` on it without effect.
Olaf> `lvchange -ay` doesn't work either:

Olaf> $ sudo lvchange -ay   nova/lvname_rmeta_2
Olaf>   Operation not permitted on hidden LV nova/lvname_rmeta_2.
Olaf> $ sudo lvchange -ay   nova/lvname
Olaf> $ # (no effect)
Olaf> $ sudo lvconvert --repair nova/lvname_rimage_2
Olaf>   WARNING: Disabling lvmetad cache for repair command.
Olaf>   WARNING: Not using lvmetad because of repair.
Olaf>   Command on LV nova/lvname_rimage_2 does not accept LV type error.
Olaf>   Command not permitted on LV nova/lvname_rimage_2.
Olaf> $ sudo lvchange --resync nova/lvname_rimage_2
Olaf>   WARNING: Not using lvmetad because a repair command was run.
Olaf>   Command on LV nova/lvname_rimage_2 does not accept LV type error.
Olaf>   Command not permitted on LV nova/lvname_rimage_2.
Olaf> $ sudo lvchange --resync nova/lvname
Olaf>   WARNING: Not using lvmetad because a repair command was run.
Olaf>   Logical volume nova/lvname in use.
Olaf>   Can't resync open logical volume nova/lvname.
Olaf> $ lvchange --rebuild /dev/sdf1 nova/lvname
Olaf>   WARNING: Not using lvmetad because a repair command was run.
Olaf> Do you really want to rebuild 1 PVs of logical volume nova/lvname [y/n]: y
Olaf>   device-mapper: create ioctl on lvname_rmeta_2 LVM-blah failed: Device or resource busy
Olaf>   Failed to lock logical volume nova/lvname.
Olaf> $ lvchange --raidsyncaction repair nova/lvname
Olaf> # (took a long time to complete but didn't change anything)
Olaf> $ sudo lvconvert --mirrors +1 nova/lvname
Olaf>   Using default stripesize 64.00 KiB.
Olaf>   --mirrors/-m cannot be changed with raid10.



Olaf> Any idea how to restore redundancy on this logical volume? It is in
Olaf> continuous use, of course...
Olaf> It seems like somehow we must convince LVM to allocate some space for
Olaf> it, instead of using the error segment (there is plenty available in the
Olaf> volume group).

Olaf> Thanks in advance.

Olaf> -Olaf

Olaf> -- 
Olaf> SysEleven GmbH
Olaf> Boxhagener Straße 80
Olaf> 10245 Berlin

Olaf> T +49 30 233 2012 0
Olaf> F +49 30 616 7555 0

Olaf> http://www.syseleven.de
Olaf> http://www.facebook.com/SysEleven
Olaf> https://www.instagram.com/syseleven/

Olaf> Aktueller System-Status immer unter:
Olaf> http://www.twitter.com/syseleven

Olaf> Firmensitz: Berlin
Olaf> Registergericht: AG Berlin Charlottenburg, HRB 108571 B
Olaf> Geschäftsführer: Marc Korthaus, Jens Ihlenfeld, Andreas Hermann

Olaf> _______________________________________________
Olaf> linux-lvm mailing list
Olaf> linux-lvm@redhat.com
Olaf> https://listman.redhat.com/mailman/listinfo/linux-lvm
Olaf> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Thread overview: 7+ messages
2022-05-27 13:56 Olaf Seibert
2022-05-28 16:15 ` John Stoffel [this message]
2022-05-30  8:16   ` Olaf Seibert
2022-05-30  8:49     ` Olaf Seibert
2022-06-01 21:58       ` John Stoffel
2022-05-30 14:07     ` Demi Marie Obenour
2022-05-31 11:27       ` Olaf Seibert
