On Wed, Sep 30, 2020, 1:00 PM Duncan Townsend <duncancmt@gmail.com> wrote:

> On Tue, Sep 29, 2020, 10:54 AM Zdenek Kabelac <zkabelac@redhat.com> wrote:
>
>> Dne 29. 09. 20 v 16:33 Duncan Townsend napsal(a):
>> > On Sat, Sep 26, 2020, 8:30 AM Duncan Townsend <duncancmt@gmail.com
>> > <mailto:duncancmt@gmail.com>> wrote:
>> >
>> >>      > > There were further error messages as further snapshots were
>> attempted,
>> >      > > but I was unable to capture them as my system went down. Upon
>> reboot,
>> >      > > the "transaction_id" message that I referred to in my previous
>> message
>> >      > > was repeated (but with increased transaction IDs).
>> >      >
>> >      > For better fix it would need to be better understood what has
>> happened
>> >      > in parallel while 'lvm' inside dmeventd was resizing pool data.
>> >
>>
>> So the lvm2 has been fixed upstream to report more educative messages to
>> the user - although it still does require some experience in managing
>> thin-pool kernel metadata and lvm2 metadata.
>>
>
> That's good news! However, I believe I lack the requisite experience. Is
> there some documentation that I ought to read as a starting point? Or is it
> best to just read the source?
>
> >     To the best of my knowledge, no other LVM operations were in flight
>> at
>> >     the time. The script that I use issues LVM commands strictly
>>
>> In your case - dmeventd did 'unlocked' resize - while other command
>> was taking a snapshot - and it happened the sequence with 'snapshot' has
>> won - so until the reload of thin-pool - lvm2 has not spotted difference.
>> (which is simply a bad race cause due to badly working locking on your
>> system)
>>
>
> After reading more about lvm locking, it looks like the original issue
> might have been that the locking directory lives on a lv instead of on a
> non-lvm-managed block device. (Although, the locking directory is on a
> different vg on a different pv from the one that had the error.)
>
> Is there a way to make dmeventd (or any other lvm program) abort if this
> locking fails? Should I switch to using a clustered locking daemon (even
> though I have only the single, non-virtualized host)?
>
> >     Would it be reasonable to use vgcfgrestore again on the
>> >     manually-repaired metadata I used before? I'm not entirely sure what
>>
>> You will need to vgcfgrestore - but I think you've misused my passed
>> recoverd
>> piece, where I've specifically asked to only replace specific segments of
>> resized thin-pool within your latest VG metadata - since those likely have
>> all the proper mappings to thin LVs.
>>
>
> All I did was use vgcfgrestore to apply the metadata file attached to your
> previous private email. I had to edit the transaction number, as I noted
> previously. That was a single line change. Was that the wrong thing to do?
> I lack the experience with lvm/thin metadata, so I am flying a bit blind
> here. I apologize if I've made things worse.
>
> While you have taken the metadata from 'resize' moment - you've lost all
>> the thinLV lvm2 metadata for later created one.
>>
>> I'll try to make one for you.
>>
>
> Thank you very much. I am extremely grateful that you've helped me so much
> in repairing my system.
>
> >     to look for while editing the XML from thin_dump, and I would very
>> >     much like to avoid causing further damage to my system. (Also, FWIW,
>> >     thin_dump appears to segfault when run with musl-libc instead of
>>
>> Well - lvm2 is glibc oriented project - so users of those 'esoteric'
>> distribution need to be expert on its own.
>>
>> If you can provide coredump or even better patch for crash - we might
>> replace the code with something better usable - but there is zero testing
>> with anything else then glibc...
>>
>
> Noted. I believe I'll be switching to glibc because there are a number of
> other packages that are broken for this distro.
>
> If you have an interest, this is the issue I've opened with my distro
> about the crash: https://github.com/void-linux/void-packages/issues/25125
> . I despair that this will receive much attention, given that not even gdb
> works properly.
>

Hello! Could somebody advise whether restoring the VG metadata is likely to
cause this system's condition to worsen? At this point, all I want is to do
is get the data off this drive and then start over with something more
stable.

Thanks for the help!
--Duncan Townsend

P.S. This was written on mobile. Please forgive my typos.

>