From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mimecast-mx02.redhat.com (mimecast04.extmail.prod.ext.rdu2.redhat.com [10.11.55.20])
    by smtp.corp.redhat.com (Postfix) with ESMTPS id 6155C202348A
    for ; Fri, 2 Oct 2020 13:05:44 +0000 (UTC)
Received: from us-smtp-1.mimecast.com (us-smtp-1.mimecast.com [207.211.31.81])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits))
    (No client certificate requested)
    by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E2A55101A56A
    for ; Fri, 2 Oct 2020 13:05:43 +0000 (UTC)
MIME-Version: 1.0
References: <73d0ffcd-4ed5-38b1-0d17-a4b16c7863d6@redhat.com>
In-Reply-To:
From: Duncan Townsend
Date: Fri, 2 Oct 2020 08:05:28 -0500
Message-ID:
Content-Type: multipart/alternative; boundary="0000000000000ca46305b0afca6d"
Subject: Re: [linux-lvm] thin: pool target too small
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Zdenek Kabelac
Cc: LVM general discussion and development

--0000000000000ca46305b0afca6d
Content-Type: text/plain; charset="UTF-8"

On Wed, Sep 30, 2020, 1:00 PM Duncan Townsend wrote:

> On Tue, Sep 29, 2020, 10:54 AM Zdenek Kabelac wrote:
>
>> On 29. 09. 20 at 16:33, Duncan Townsend wrote:
>> > On Sat, Sep 26, 2020, 8:30 AM Duncan Townsend wrote:
>> >
>> > > > There were further error messages as further snapshots were
>> > > > attempted, but I was unable to capture them as my system went
>> > > > down. Upon reboot, the "transaction_id" message that I referred
>> > > > to in my previous message was repeated (but with increased
>> > > > transaction IDs).
>> > >
>> > > For better fix it would need to be better understood what has
>> > > happened in parallel while 'lvm' inside dmeventd was resizing
>> > > pool data.
>> >
>>
>> So the lvm2 has been fixed upstream to report more educative messages to
>> the user - although it still does require some experience in managing
>> thin-pool kernel metadata and lvm2 metadata.
>>
>
> That's good news! However, I believe I lack the requisite experience. Is
> there some documentation that I ought to read as a starting point? Or is it
> best to just read the source?
>
>> > To the best of my knowledge, no other LVM operations were in flight at
>> > the time. The script that I use issues LVM commands strictly
>>
>> In your case - dmeventd did 'unlocked' resize - while other command
>> was taking a snapshot - and it happened the sequence with 'snapshot' has
>> won - so until the reload of thin-pool - lvm2 has not spotted difference.
>> (which is simply a bad race caused by badly working locking on your
>> system)
>>
>
> After reading more about lvm locking, it looks like the original issue
> might have been that the locking directory lives on a lv instead of on a
> non-lvm-managed block device. (Although, the locking directory is on a
> different vg on a different pv from the one that had the error.)
>
> Is there a way to make dmeventd (or any other lvm program) abort if this
> locking fails? Should I switch to using a clustered locking daemon (even
> though I have only the single, non-virtualized host)?
>
>> > Would it be reasonable to use vgcfgrestore again on the
>> > manually-repaired metadata I used before? I'm not entirely sure what
>>
>> You will need to vgcfgrestore - but I think you've misused my passed
>> recovered piece, where I've specifically asked to only replace specific
>> segments of resized thin-pool within your latest VG metadata - since
>> those likely have all the proper mappings to thin LVs.
>>
>
> All I did was use vgcfgrestore to apply the metadata file attached to your
> previous private email. I had to edit the transaction number, as I noted
> previously. That was a single line change. Was that the wrong thing to do?
> I lack the experience with lvm/thin metadata, so I am flying a bit blind
> here. I apologize if I've made things worse.
>
>> While you have taken the metadata from 'resize' moment - you've lost all
>> the thinLV lvm2 metadata for later created one.
>>
>> I'll try to make one for you.
>>
>
> Thank you very much. I am extremely grateful that you've helped me so much
> in repairing my system.
>
>> > to look for while editing the XML from thin_dump, and I would very
>> > much like to avoid causing further damage to my system. (Also, FWIW,
>> > thin_dump appears to segfault when run with musl-libc instead of
>>
>> Well - lvm2 is glibc oriented project - so users of those 'esoteric'
>> distribution need to be expert on its own.
>>
>> If you can provide coredump or even better patch for crash - we might
>> replace the code with something better usable - but there is zero testing
>> with anything other than glibc...
>>
>
> Noted. I believe I'll be switching to glibc because there are a number of
> other packages that are broken for this distro.
>
> If you have an interest, this is the issue I've opened with my distro
> about the crash: https://github.com/void-linux/void-packages/issues/25125
> . I despair that this will receive much attention, given that not even gdb
> works properly.
>

Hello! Could somebody advise whether restoring the VG metadata is likely
to cause this system's condition to worsen? At this point, all I want to
do is get the data off this drive and then start over with something more
stable.

Thanks for the help!
--Duncan Townsend

P.S. This was written on mobile. Please forgive my typos.

--0000000000000ca46305b0afca6d
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000000ca46305b0afca6d--