linux-lvm.redhat.com archive mirror
* [linux-lvm] LVM2 Metadata structure, extents ordering, metadata corruptions
@ 2022-09-27 10:10 Roberto Fastec
  2022-09-29 10:52 ` Zdenek Kabelac
  2022-09-29 11:48 ` Gionatan Danti
  0 siblings, 2 replies; 6+ messages in thread
From: Roberto Fastec @ 2022-09-27 10:10 UTC
  To: LVM general discussion and development


Dear friends of the LVM mailing list,

I suppose this question is for a real LVM2 guru, or even a developer.

I would like to ask three questions, based on three premises.

Premises
1. I'm a total noob regarding LVM2 low-level logic, so I'm sorry if the
questions sound silly :-)
2. The following applies to a whole md RAID (in my example, a RAID5 made
of 4 drives of 1TB each, so roughly 2.7TB of usable space).
3. I assign the whole 2.7TB to one single PV, one single VG, and one
single LV.

Questions
1. Given premise 3, are the corresponding LVM2 metadata/tables just a
(allow me the term) "grid" that maps that space in an ordered sequence,
so that as the RAID space is subsequently used (and filled) LVM2 "just
marks" the used cells and the free ones? Or can those grid cells end up
in a scrambled order?
To be explicit: in case of metadata corruption (always with respect to
premise 3), could we just generate a dummy metadata table with all the
extents marked as "used", so that we can access them anyway?
And can we expect them to be in order?

2. Is there a sort of "fsck" for the LVM2 metadata? We do technical
assistance, and recently, specifically with NAS devices that use LVM2,
we have seen metadata get corrupted very easily, either for no apparent
reason or because of a power interruption (which is really astonishing).
We mean no drive failures and no bad SMART values. Just corruption from
"nowhere" with "no cause".

Thank you for any hint

Robert
Fastec

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

* Re: [linux-lvm] LVM2 Metadata structure, extents ordering, metadata corruptions
From: Zdenek Kabelac @ 2022-09-29 10:52 UTC
  To: LVM general discussion and development, Roberto Fastec

On 27. 09. 22 at 12:10, Roberto Fastec wrote:
> Dear friends of the LVM mailing list
> 
> I suppose this question is for some real LVM2 guru or even developer
> 
> Here I kindly make three question with three premises
> 
> premises
> 1. I'm a total noob about LVM2 low level logic, so I'm sorry of the questions 
> will sound silly :-)
> 2. The following applies to a whole md RAID (in my example it will be a RAID5 
> made of 4 drives 1TB each so useful available space more or less 2.7TB)
> 3. I assign whole those 2.7TB to one single PV and one single VG and one 
> single LV.
> 
> questions
> 1. Given the premise 3. The corresponding LVM2 metadata/tables are and will be 
> just a (allow me the term) "grid" "mapping that space" in an ordered sequence 
> to in the subsequent use (and filling) of the RAID space "just mark" the used 
> ones and the free ones? Or those grid cells will/could be in a messed order ?
> And explicitly I mean. In case of metadata corruption (always with respect of 
> premise 3.) , could we just generate a dummy metadata table with all the 
> extents marked as "used" in such a way that we can anyway access them
> And can we expect to have them ordered?

lvm2 'metadata handling' is purely internal to the lvm2 codebase - you can't
rely on any 'witnessed/observed' logic.

There is a command-line API to access and manipulate metadata in most cases.

As a temporary measure you can, e.g., modify your current metadata with the
'vi' editor and then vgcfgrestore it - however, this is not a 'guaranteed'
operational mode, but rather a workaround for when the command-line interface
does not handle some error case well - and such a case should be reported as
an RFE to enhance lvm2.

> 
> 2. Does it exist a sort of "fsck" for the LVM2 metadata ? We do technical 
> assistance and recently, specifically with those NAS devices that make use of 

In general - lvm2 metadata on disk always has a CRC32 checksum - when it is
invalid -> the metadata is garbage.

Each loaded metadata copy with a correct CRC32 is then always fully validated -
yes, this can sometimes be a bit costly with very large metadata - but so far
there have been no big problems - CPUs keep getting faster as well... and
bigger setups tend to have more powerful hw too....
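
The "invalid checksum -> garbage" rule can be illustrated with a small generic
sketch. This is not lvm2's on-disk format: the 4-byte prefix layout, the
function names `seal` and `load`, and the use of zlib's CRC32 variant are all
assumptions made only for this illustration.

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte big-endian CRC32 of its contents."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def load(blob: bytes) -> bytes:
    """Return the payload if the stored CRC32 matches, else raise ValueError."""
    stored = int.from_bytes(blob[:4], "big")
    payload = blob[4:]
    if zlib.crc32(payload) != stored:
        # A reader following this rule refuses to interpret the content at all.
        raise ValueError("checksum mismatch: treat metadata as garbage")
    return payload
```

With this scheme a single flipped byte in the payload makes `load` fail, which
is why hand-editing checksummed data with a hex editor does not work unless
the checksum is recomputed as well.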

> LVM2, we have experienced really easy metadata corruption in occurence of just 
> nothing or because of a electric power interruption (which is really 
> astonishing). We mean no drives failures , no bad SMARTs . Just corruption 
> from "nowhere" and "nocause"


Corrupted metadata is always considered unusable - the user has to restore a
previous valid version (and here some combinations of errors might
eventually require 'vi editor' assistance - but again - only in very, very
unusual circumstances).

Metadata is archived in /etc/lvm/archive and is also kept in a ring buffer
present on all PVs in a VG - if there are too many PVs, the user can 'opt out'
and have only a subset of PVs hold metadata - e.g. 200 PVs with only
20 PVs holding metadata - but these are highly unusual configurations...
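
A minimal sketch of picking the most recent archived copy and building the
matching restore command. It assumes archive files are named
`<vgname>_<something>.vg`; the helper names `newest_archive` and
`restore_command` are hypothetical, and the sketch only builds the argument
list for `vgcfgrestore -f <file> <vg>` without running anything.

```python
from pathlib import Path
from typing import List, Optional


def newest_archive(archive_dir: str, vg_name: str) -> Optional[Path]:
    """Return the most recently modified archive file for vg_name, if any."""
    candidates = sorted(
        Path(archive_dir).glob(f"{vg_name}_*.vg"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


def restore_command(archive_file: Path, vg_name: str) -> List[str]:
    """Build (but do not run) a vgcfgrestore invocation for that file."""
    return ["vgcfgrestore", "-f", str(archive_file), vg_name]
```

For example, `restore_command(newest_archive("/etc/lvm/archive", "vg0"), "vg0")`
would give the argument list to review before running it by hand; the newest
file is not necessarily the right one, so it is worth reading the archive's
description field first.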

Regards


Zdenek


* Re: [linux-lvm] LVM2 Metadata structure, extents ordering, metadata corruptions
From: Roberto Fastec @ 2022-09-29 11:15 UTC
  To: Zdenek Kabelac; +Cc: LVM general discussion and development


Hello Zdenek
Thank you for the explanation

May I kindly ask which command-line API can be used to access and manipulate that metadata?

And when you say vi editor, do you mean directly editing HEX values in the raw metadata?

Thank you

If you have a link to some documentation, thank you even more

Though in this case it is not the configuration that got lost

Also, some additional info: we have now found that all the cases have thin provisioning active, and it looks like it uses additional/different metadata tables

So if these got messed up/corrupted...

On QNAP it looks like they have made some customizations, so the thin-provisioning LVM metadata is on a dedicated partition

We looked at the HEX in there and partially worked out the logic

About thin provisioning, again: is any "fsck"-like tool available? (I suppose not, but just as confirmation)

Thank you
R.





* Re: [linux-lvm] LVM2 Metadata structure, extents ordering, metadata corruptions
From: Zdenek Kabelac @ 2022-09-29 11:41 UTC
  To: Roberto Fastec; +Cc: LVM general discussion and development

On 29. 09. 22 at 13:15, Roberto Fastec wrote:
> Hello Zdenek
> Thank you for the explanation
> 
> May I kindly ask you what/which is the command line API to access and 
> manipulate those metadata?
> 

'command line API' in the sense of:

To create an LV --   'lvcreate'....
To remove an LV --   'lvremove'....


Note - many commands can actually work without physical interaction with the
DM layer (--driverloaded n) - however, in some cases some targets require the
presence of DM.

lvm2 commands are the proper way to change your metadata.


> And when you say vi editor, do you kindly mean direct edit of HEX values on 
> the raw metadata?

No way - you can't change metadata on disk - unless you were basically
replicating precisely what the lvm2 command does - so what would be the point ??

Simply use the lvm2 commands to do the job. Unless I'm missing some important
reason why you would need to work with lvm2 metadata but without lvm2 ??


> 
> Thank you
> 
> If you kindly may have some link to some documentation, thank you even more
> 
> Though here it is not the configuration that got lost

Well yeah - it will take some time - but e.g. the RHEL storage documentation
might be a good place to start going through it.



> Also, additional info, we now got that all the cases do have active the 
> thin-provisionin and looks like that these are additional/different metadata 
> tables

Thin-provisioning is handled by LVM2 only to the extent of providing the LVs
for metadata and data - and then the thinLVs to the user.

The physical block layout for thin-provisioning is stored entirely inside the
thin-pool's metadata device.

To explore those mappings you need to use tools like 'thin_dump', 'thin_ls'.
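
As a rough sketch of what exploring those mappings can look like, the snippet
below parses XML shaped like a thin_dump dump and lists each mapping. The
`SAMPLE` document is invented for illustration, and the element and attribute
names (`device`, `range_mapping`, `single_mapping`, `origin_begin`, ...) are
assumed to match the tool's XML output; check them against a real dump before
relying on this.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed thin_dump XML shape.
SAMPLE = """<superblock uuid="" time="0" transaction="1" data_block_size="128" nr_data_blocks="1000">
  <device dev_id="1" mapped_blocks="12" transaction="0" creation_time="0" snap_time="0">
    <range_mapping origin_begin="0" data_begin="100" length="10" time="0"/>
    <single_mapping origin_block="50" data_block="7" time="0"/>
    <single_mapping origin_block="51" data_block="300" time="0"/>
  </device>
</superblock>"""


def mappings(xml_text):
    """Yield (dev_id, virtual_block, data_block, length) for every mapping."""
    root = ET.fromstring(xml_text)
    for dev in root.iter("device"):
        dev_id = int(dev.get("dev_id"))
        for m in dev:
            if m.tag == "range_mapping":
                yield (dev_id, int(m.get("origin_begin")),
                       int(m.get("data_begin")), int(m.get("length")))
            elif m.tag == "single_mapping":
                yield (dev_id, int(m.get("origin_block")),
                       int(m.get("data_block")), 1)


if __name__ == "__main__":
    for dev_id, virt, data, length in mappings(SAMPLE):
        print(f"thin dev {dev_id}: virtual {virt} -> data {data} (x{length})")
```

In the sample, virtual blocks 50 and 51 sit next to each other but land on
data blocks 7 and 300, which shows why thin volumes cannot be assumed to be
laid out in order on the data device.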

> 
> So if these got messed/corrupted...
> 

If this thin-pool metadata gets corrupted, there is a tool: 'thin_repair'.

Note: corruption of some high-level B-tree nodes may result in severe damage
to the whole metadata structure -> i.e. lots of thinLVs being lost.

It's a good idea to keep such metadata on some resilient type of storage
(raid) and of course rule #1 - create regular backups of your thin
volumes...   (a snapshot of a thinLV is not a backup!).


> In QNAP looks they have made some customization and so thin-provision LVM 
> metadata are on a dedicated partition
> 
> we observed the HEX inside there and got partially the logic
> 
> About thin-provisioning, again, any "fsck"-like is available? (I suppose no, 
> but just as confirmation)

This tool is called 'thin_check'

(and this tool is in fact executed by lvm2 by default on every thin-pool
activation & deactivation)

Note: just like lvm2 metadata, the thin-pool's kernel metadata is also
checksummed (protected against disk bit corruption), so again there is zero
chance of manipulating it with any 'hex editor' - unless you 'recreated' the
thin-pool engine...


Regards

Zdenek



* Re: [linux-lvm] LVM2 Metadata structure, extents ordering, metadata corruptions
From: Gionatan Danti @ 2022-09-29 11:48 UTC
  To: LVM general discussion and development; +Cc: Roberto Fastec

On 2022-09-27 12:10, Roberto Fastec wrote:
> questions
> 1. Given the premise 3. The corresponding LVM2 metadata/tables are and
> will be just a (allow me the term) "grid" "mapping that space" in an
> ordered sequence to in the subsequent use (and filling) of the RAID
> space "just mark" the used ones and the free ones? Or those grid cells
> will/could be in a messed order ?

Classical linear LVM volumes (read: not lvmthin) are mostly concatenated
4MB-sized chunks of space, but this is not a given (especially if some
volumes changed in size).
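
To give a feel for the numbers, here is a small arithmetic sketch for the
array described in the question. It assumes 4 MiB extents and about 3 x 1TB
of usable RAID5 space, and it ignores the metadata area at the start of the
PV, so the result is only an upper bound. The names `extent_count` and
`extent_offset` are hypothetical.

```python
EXTENT_SIZE = 4 * 1024 * 1024  # 4 MiB, assumed from the "4MB-sized" chunks above


def extent_count(device_bytes, extent_size=EXTENT_SIZE):
    """Number of whole extents that fit in device_bytes."""
    return device_bytes // extent_size


def extent_offset(index, extent_size=EXTENT_SIZE):
    """Byte offset of extent `index`, relative to the start of the extent area."""
    return index * extent_size


if __name__ == "__main__":
    usable = 3 * 10**12  # three data drives' worth of a 4 x 1TB RAID5
    print(extent_count(usable))   # 715255
    print(extent_offset(1000))    # 4194304000
```

The simple `index * extent_size` rule holds only while the LV is a single
contiguous run of extents, which is the "not a given" part.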

> And explicitly I mean. In case of metadata corruption (always with
> respect of premise 3.) , could we just generate a dummy metadata table
> with all the extents marked as "used" in such a way that we can anyway
> access them

For linear volumes, one can try to set up a dmtable (or dummy metadata)
to read the data linearly but, as stated above, this is far from
reliable.
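
A minimal sketch of generating such a table, using the device-mapper `linear`
target line format `<start> <length> linear <device> <offset>` (all values in
512-byte sectors). The device path, the 2048-sector start of the extent area,
and the 8192-sector (4 MiB) extent size are assumptions; on a real system they
have to come from a surviving copy of the metadata, and guessing them wrong
yields a device full of misplaced data.

```python
def linear_table(segments, device, pe_start_sectors, extent_sectors):
    """Build device-mapper 'linear' table lines for an LV.

    segments: iterable of (lv_start_extent, extent_count, pv_start_extent).
    """
    lines = []
    for lv_start, count, pv_start in sorted(segments):
        start = lv_start * extent_sectors
        length = count * extent_sectors
        offset = pe_start_sectors + pv_start * extent_sectors
        lines.append(f"{start} {length} linear {device} {offset}")
    return "\n".join(lines)


if __name__ == "__main__":
    # One contiguous segment: the "everything in order" case from the question.
    print(linear_table([(0, 100, 0)], "/dev/md0", 2048, 8192))
    # An LV that was extended later, so its second segment lives elsewhere.
    print(linear_table([(0, 100, 0), (100, 50, 300)], "/dev/md0", 2048, 8192))
```

The first call prints `0 819200 linear /dev/md0 2048`. A table like this is the
kind of input `dmsetup create` accepts; mapping it read-only is the safer way
to test a guess.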

> 2. Does it exist a sort of "fsck" for the LVM2 metadata ? We do
> technical assistance and recently, specifically with those NAS devices
> that make use of LVM2, we have experienced really easy metadata
> corruption in occurence of just nothing or because of a electric power
> interruption (which is really astonishing). We mean no drives failures
> , no bad SMARTs . Just corruption from "nowhere" and "nocause"

For classical LVM, the metadata is actually backed up in ASCII format
under /etc/lvm. While LVM itself keeps a binary metadata representation,
it also accepts/stores the textual one, so you can use the latter to
restore your volumes.

Did you notice how I explicitly talked about *classical* volumes? This is
because thin volumes (man lvmthin) use completely different, and much
more complex, allocation strategies. Losing such metadata would kill the
entire thin pool, and this is the reason a backup metadata volume is
required for some operations. thin_check is effectively a sort of
"lvmthin fsck", but if you ever need to use it, be prepared for data loss
(ranging from small to massive).

I have seen various NAS devices that use custom-patched lvmthin volumes,
and I suppose this is the root of your issues. If it is acceptable for
your workload, try using classical LVM on these NAS devices.

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] LVM2 Metadata structure, extents ordering, metadata corruptions
From: Roberto Fastec @ 2022-09-29 12:12 UTC
  To: Zdenek Kabelac; +Cc: LVM general discussion and development


Thank you for all the details and for your kind replies

I will have a look at the utilities you kindly pointed out

Kind regards

