From: Olaf Seibert <o.seibert@syseleven.de>
To: linux-lvm@redhat.com
Subject: Re: [linux-lvm] raid10 with missing redundancy, but health status claims it is ok.
Date: Mon, 30 May 2022 10:16:27 +0200
Message-ID: <e86ef2bc-84f5-188c-cde4-5848a1c95648@syseleven.de>
In-Reply-To: <25234.19124.329350.465135@quad.stoffel.home>
First, John, thanks for your reply.
On 28.05.22 18:15, John Stoffel wrote:
>>>>>> "Olaf" == Olaf Seibert <o.seibert@syseleven.de> writes:
>
> I'm leaving for the rest of the weekend, but hopefully this will help you...
>
> Olaf> Hi all, I'm new to this list. I hope somebody here can help me.
>
> We will try! But I would strongly urge that you take backups of all
> your data NOW, before you do anything else. Copy to another disk
> which is separate from this system just in case.
Unfortunately there are some complicating factors that I left out so far.
The machine in question is a host for virtual machines run by customers.
So we can't even look at the data, never mind rsyncing it.
(The name "nova" might have given that away; it is the name of the
OpenStack compute service.)
> My next suggestion would be for you to provide the output of the
> 'pvs', 'vgs' and 'lvs' commands. Also, which disk died? And have
> you replaced it?
/dev/sde died. It is still in the machine.
$ sudo pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sda2  system lvm2 a--  445.22g 347.95g
  /dev/sdb2  system lvm2 a--  445.22g 347.94g
  /dev/sdc1  nova   lvm2 a--    1.75t 412.19g
  /dev/sdd1  nova   lvm2 a--    1.75t   1.75t
  /dev/sdf1  nova   lvm2 a--    1.75t 812.25g
  /dev/sdg1  nova   lvm2 a--    1.75t   1.75t
  /dev/sdh1  nova   lvm2 a--    1.75t   1.75t
  /dev/sdi1  nova   lvm2 a--    1.75t 412.19g
  /dev/sdj1  nova   lvm2 a--    1.75t 412.19g
$ sudo vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  nova     7  20   0 wz--n-  12.23t   7.24t
  system   2   2   0 wz--n- 890.45g 695.89g
$ sudo lvs
  LV   VG     Attr       LSize   Pool Origin Data% Meta% Move Log         Cpy%Sync Convert
  1b77 nova   Rwi-aor---  50.00g                                          100.00
  1c13 nova   Rwi-aor---  50.00g                                          100.00
  203f nova   Rwi-aor--- 800.00g                                          100.00
  3077 nova   Rwi-aor---  50.00g                                          100.00
  61a0 nova   Rwi-a-r---  50.00g                                          100.00
  63c1 nova   Rwi-aor---  50.00g                                          100.00
  8958 nova   Rwi-aor--- 800.00g                                          100.00
  8a4f nova   Rwi-aor---  50.00g                                          100.00
  965a nova   Rwi-aor--- 100.00g                                          100.00
  9d89 nova   Rwi-aor--- 200.00g                                          100.00
  9df4 nova   Rwi-a-r---  50.00g                                          100.00
  b41b nova   Rwi-aor---  50.00g                                          100.00
  c517 nova   Rwi-aor---  50.00g                                          100.00
  d36b nova   Rwi-aor---  50.00g                                          100.00
  dd1b nova   Rwi-a-r---  50.00g                                          100.00
  e2ed nova   Rwi-aor---  50.00g                                          100.00
  ef6c nova   Rwi-aor---  50.00g                                          100.00
  f5ce nova   Rwi-aor--- 100.00g                                          100.00
  f952 nova   Rwi-aor---  50.00g                                          100.00
  fbf6 nova   Rwi-aor---  50.00g                                          100.00
  boot system mwi-aom---   1.91g                          [boot_mlog]     100.00
  root system mwi-aom---  95.37g                          [root_mlog]     100.00
I am abbreviating the LV names since they are long boring UUIDs
related to customer data. "203f" is "lvname", the LV which has problems.
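In case it helps others reading along: the hidden sub-LVs behind a raid LV
can be listed with `lvs -a -o lv_name,segtype,devices nova`, which shows
directly where LVM has substituted its "error" target for the lost leg.
A minimal sketch of filtering such output for dead legs, using hypothetical
captured output (the rimage names and device placement here are made up for
illustration, not taken from the real system):

```shell
# Hypothetical excerpt of `sudo lvs -a -o lv_name,segtype,devices nova`;
# the real output will differ.
lvs_out='  203f            raid10  203f_rimage_0(0),203f_rimage_1(0),203f_rimage_2(0),203f_rimage_3(0)
  [203f_rimage_0] linear  /dev/sdc1(0)
  [203f_rimage_1] error
  [203f_rimage_2] linear  /dev/sdi1(0)
  [203f_rimage_3] linear  /dev/sdj1(0)'

# Any sub-LV whose segment type is "error" has lost its backing device.
printf '%s\n' "$lvs_out" | awk '$2 == "error" { print $1 }'
```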
> My second suggestion would be for you to use 'md' as the lower level
> RAID1/10/5/6 level underneath your LVM volumes. A lot of people think
> it's better to have it all in one tool (btrfs, zfs, others) but I
> strongly feel that using nice layers helps keep things organized and
> reliable.
>
> So if you can, add two new disks into your system, add a full-disk
> partition which starts at an offset of 1 MB or so, and maybe even leaves a
> couple of MBs of free space at the end, and then create an MD pair on
> them:
I am not sure if there are any free slots for more disks. We would need
to send somebody to the datacenter to put in any disks in any case.
I think I understand what you are getting at here, redundancy-wise.
But won't it confuse LVM? If it decides to store one side of any mirror on
this new md0, won't this result in 3 copies of the data for that volume?
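One way to check for exactly that situation would be to list which device
backs each raid image and look for legs placed on an md device, since any
such leg is mirrored again by md underneath LVM. A sketch on hypothetical
output (the placement shown is an assumption for illustration, not output
from the real system):

```shell
# Hypothetical `lvs -a -o lv_name,devices` excerpt after a pvmove onto md0;
# the real output will differ.
legs='[203f_rimage_0] /dev/sdc1(0)
[203f_rimage_1] /dev/md0(0)
[203f_rimage_2] /dev/sdi1(0)
[203f_rimage_3] /dev/sdj1(0)'

# List raid images whose extents sit on an md device; each of those legs
# is stored twice at the md layer on top of LVM's own mirroring.
printf '%s\n' "$legs" | awk '$2 ~ /^\/dev\/md/ { print $1 }'
```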
In the list of commands I tried, there was this one:
> Olaf> $ sudo lvchange --resync nova/lvname
> Olaf> WARNING: Not using lvmetad because a repair command was run.
> Olaf> Logical volume nova/lvname in use.
> Olaf> Can't resync open logical volume nova/lvname.
Any chance that this command might work, if we can ask the customer to
shut down their VM for a while?
On the other hand, some other commands took a while to run, and therefore
seemed to be doing something, but in the end they had no effect.
`lvconvert --repair`, for instance, apparently worked on the other LVs but
not on this one. It seems that this "error" segment (which appears to have
replaced the bad disk) is really confusing LVM.
> mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdy1 /dev/sdz1
>
> Now you can add that disk in your nova VG with:
>
> vgextend nova /dev/md0
>
> Then try to move your LV named 'lvname' onto the new MD PV.
>
> pvmove -n lvname /dev/<source_PV> /dev/md0
>
> I think you really want to move the *entire* top level LV onto new
> storage. Then you will know you have safe data. And this can be done
> while the volume is up and running.
>
> But again!!!!!! Please take a backup (rsync onto a new LV maybe?) of
> your current data to make sure you don't lose anything.
>
> Olaf> We had a disk go bad (disk commands timed out and took many
> Olaf> seconds to do so) in our LVM installation with mirroring. With
> Olaf> some trouble, we managed to pvremove the offending disk, and
> Olaf> used `lvconvert --repair -y nova/$lv` to repair (restore
> Olaf> redundancy) the logical volumes.
>
> How many disks do you have in the system? Please don't try to hide
> names of disks and such unless you really need to. It makes it much
> harder to diagnose.
There are 10 disks (sda-j) of which sde is broken and no longer listed.
> Olaf> It seems like somehow we must convince LVM to allocate some space for
> Olaf> it, instead of using the error segment (there is plenty available in the
> Olaf> volume group).
Thanks,
-Olaf.
--
SysEleven GmbH
Boxhagener Straße 80
10245 Berlin
T +49 30 233 2012 0
F +49 30 616 7555 0
http://www.syseleven.de
http://www.facebook.com/SysEleven
https://www.instagram.com/syseleven/
Aktueller System-Status immer unter:
http://www.twitter.com/syseleven
Firmensitz: Berlin
Registergericht: AG Berlin Charlottenburg, HRB 108571 B
Geschäftsführer: Marc Korthaus, Jens Ihlenfeld, Andreas Hermann
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/