From: Zdenek Kabelac <zdenek.kabelac@gmail.com>
To: Demi Marie Obenour <demi@invisiblethingslab.com>
Cc: LVM general discussion and development <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] LVM performance vs direct dm-thin
Date: Sun, 30 Jan 2022 18:43:13 +0100
Message-ID: <849ab633-ec3d-a0a5-38bf-72b87bbba2c5@gmail.com>
In-Reply-To: <YfbApm+9ac6iBbbg@itl-email>

On 30. 01. 22 at 17:45, Demi Marie Obenour wrote:
> On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:
>>> On 30. 01. 22 at 1:32, Demi Marie Obenour wrote:
>>> On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:
>>>>> On 29. 01. 22 at 21:34, Demi Marie Obenour wrote:
>>>>> How much slower are operations on an LVM2 thin pool compared to manually
>>>>> managing a dm-thin target via ioctls?  I am mostly concerned about
>>>>> volume snapshot, creation, and destruction.  Data integrity is very
>>>>> important, so taking shortcuts that risk data loss is out of the
>>>>> question.  However, the application may have some additional information
>>>>> that LVM2 does not have.  For instance, it may know that the volume that
>>>>> it is snapshotting is not in use, or that a certain volume it is
>>>>> creating will never be used after power-off.
>>>>>
>>>
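For reference, the operations listed above map onto the thin-pool message
interface described in the kernel's thin-provisioning documentation. A
minimal sketch follows - pool name, device ids, and sector counts are
illustrative, and dmsetup is used here only as a thin wrapper over the
same DM ioctls a bespoke tool would issue directly:

  # create a thin device with id 0 inside an already-active pool
  dmsetup message /dev/mapper/pool 0 "create_thin 0"
  # activate it (length is in 512-byte sectors)
  dmsetup create thin0 --table "0 2097152 thin /dev/mapper/pool 0"
  # snapshot it as device id 1 (the origin must be suspended first)
  dmsetup suspend /dev/mapper/thin0
  dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
  dmsetup resume /dev/mapper/thin0
  # destroy a device that is not currently mapped (the snapshot, id 1,
  # was never activated here)
  dmsetup message /dev/mapper/pool 0 "delete 1"
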
>>>> So brave developers may always write their own management tools for
>>>> their constrained environments; such tools can be significantly
>>>> faster in terms of how many thins you can create per minute (note
>>>> that you will also need to consider dropping udev on such a system)
>>>
>>> What kind of constraints are you referring to?  Is it possible and safe
>>> to have udev running, but told to ignore the thins in question?
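One illustrative mechanism, assuming the thins get a predictable DM name
(this is not a recipe lvm2 documents): the stock device-mapper udev rules
honour the DM_UDEV_DISABLE_* flags, so a site-local rule can set them for
matching devices, and the management tool can skip udev synchronisation:

  # hypothetical rule file; the name pattern and the rule's ordering
  # relative to the stock 10-dm.rules are assumptions to verify per distro
  cat > /etc/udev/rules.d/12-ignore-qubes-thins.rules <<'EOF'
  ENV{DM_NAME}=="qubes--*", ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}="1"
  EOF
  # and avoid waiting on udev cookies from the tool itself
  dmsetup create vm1 --noudevsync --table "0 2097152 thin /dev/mapper/pool 0"
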
>>
>> Lvm2 is oriented more towards managing sets of different disks,
>> where the user is adding/removing/replacing them.  So it's more about
>> recoverability, good support for manual repair (ASCII metadata),
>> tracking the history of changes, backward compatibility, support for
>> conversion to different volume types (e.g. caching of thins, pvmove...),
>> support for no/udev & no/systemd, clusters, and nearly every Linux
>> distro available... So there is a lot - and it all adds considerable
>> complexity.
> 
> I am certain it does, and that makes a lot of sense.  Thanks for the
> hard work!  Those features are all useful for Qubes OS, too — just not
> in the VM startup/shutdown path.
> 
>> So once you scratch all this - say you only care about a single disk -
>> then you are able to use more efficient metadata formats, which you
>> could even keep permanently in memory for the whole lifetime - and this
>> all adds up to great performance.
>>
>> But it all depends on how much you can constrain your environment.
>>
>> It's worth mentioning that there is lvm2 support for 'external'
>> thin-volume creators - lvm2 only maintains the thin-pool data &
>> metadata LVs, while the creation, activation, and deactivation of thin
>> volumes is left to an external tool.
>> Docker used this for a while - later on they switched to OverlayFS, I
>> believe...
> 
> That indeed sounds like a good choice for Qubes OS.  It would allow the
> data and metadata LVs to be any volume type that lvm2 supports, and to
> be managed using all of lvm2’s features.  So one could still put the
> metadata on a RAID-10 volume while everything else is RAID-6, or set up
> a dm-cache volume to store the data (please correct me if I am wrong).
> Qubes OS has already moved to using a separate thin pool for virtual
> machines, as it prevents dom0 (privileged management VM) from being run
> out of disk space (by accident or malice).  That means that the thin
> pool used for guests is managed only by Qubes OS, and so the standard
> lvm2 tools do not need to touch it.
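A sketch of that split, with VG/LV names and sizes purely illustrative:
lvm2 owns the pool and its sub-LVs (so the usual conversion tooling still
applies), while the thins are driven externally. Which /dev/mapper node
carries the actual thin-pool target should be checked with 'dmsetup
table'; the hidden '-tpool' node shown below is an assumption:

  # metadata mirrored on RAID-10, data on RAID-6, then bound into a pool
  lvcreate --type raid6  -L 500G -n vmpool      vg
  lvcreate --type raid10 -L 2G   -n vmpool_meta vg
  lvconvert --type thin-pool --poolmetadata vg/vmpool_meta vg/vmpool
  lvchange -ay vg/vmpool
  # the external tool creates and activates the thins itself
  dmsetup message /dev/mapper/vg-vmpool-tpool 0 "create_thin 0"
  dmsetup create vm1 --table "0 20971520 thin /dev/mapper/vg-vmpool-tpool 0"
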
> 
> Is this a setup that you would recommend, and would be comfortable using
> in production?  As far as metadata is concerned, Qubes OS has its own
> XML file containing metadata about all qubes, which should suffice for
> this purpose.  To prevent races during updates and ensure automatic
> crash recovery, is it sufficient to store metadata for both new and old
> transaction IDs, and pick the correct one based on the device-mapper
> status line?  I have seen lvm2 get into an inconsistent state
> (transaction ID off by one) that required manual repair before, which
> is quite unnerving for a desktop OS.

My biased advice would be to stay with lvm2. There is a lot of work
involved, many things are not well documented, and getting everything
running correctly will take a lot of effort. (Docker in fact did not
manage to do it well and was incapable of providing any recoverability.)
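
Regarding the transaction-id question above: the pool's status line
reports the current id as the first field after the target name (per the
kernel thin-provisioning docs), so a crash-recovery scheme can compare it
against the ids stored with the new and old metadata copies. The output
below is illustrative and the trailing flags vary by kernel version:

  # dmsetup status vg-vmpool-tpool
  0 1048576000 thin-pool 42 406/30720 10240/128000 - rw discard_passdown queue_if_no_space -
  #                      ^^-- transaction id
  # advancing the id is atomic and the old value must match:
  dmsetup message /dev/mapper/vg-vmpool-tpool 0 "set_transaction_id 42 43"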

> One feature that would be nice is to be able to import an
> externally-provided mapping of thin pool device numbers to LV names, so
> that lvm2 could provide a (read-only, and not guaranteed fresh) view of
> system state for reporting purposes.

Once you have evidence that it's lvm2 causing a major issue, you could
consider whether it's worth stepping into a separate project.


>>>> It's worth mentioning that the more bullet-proof you want to make
>>>> your project, the closer you will get to the extra processing done
>>>> by lvm2.
>>>
>>> Why is this?  How does lvm2 compare to stratis, for example?
>>
>> Stratis is yet another volume manager, written in Rust and combined
>> with XFS for an easier user experience. That's all I'd probably say
>> about it...
> 
> That’s fine.  I guess my question is why making lvm2 bullet-proof needs
> so much overhead.

It's difficult - if you could distribute lvm2 with an exact kernel
version, udev, and systemd on a single Linux distro, that would eliminate
a huge set of troubles...

>>>> However, before you step into these waters, you should probably
>>>> evaluate whether thin-pool actually meets your needs, given your
>>>> high expectations for the number of supported volumes - so you do
>>>> not end up with hyper-fast snapshot creation while the actual usage
>>>> fails to meet your needs...
>>>
>>> What needs are you thinking of specifically?  Qubes OS needs block
>>> devices, so filesystem-backed storage would require the use of loop
>>> devices unless I use ZFS zvols.  Do you have any specific
>>> recommendations?
>>
>> As long as you live in a world without crashes, buggy kernels, buggy
>> apps, and failing hard drives, everything looks very simple.
> 
> Would you mind explaining further?  LVM2 RAID and cache volumes should
> provide most of the benefits that Qubes OS desires, unless I am missing
> something.

I'm not familiar with Qubes OS - but in many real-world cases we cannot
push the latest & greatest to our users, so we need to live with bugs
and add workarounds...

>> And every development costs quite some time & money.
> 
> That it does.
> 
>> Since you mentioned ZFS - you might want to focus on a 'ZFS-only'
>> solution. Combining ZFS or Btrfs with lvm2 is always going to be
>> painful, as those filesystems have their own volume management.
> 
> Absolutely!  That said, I do wonder what your thoughts on using loop
> devices for VM storage are.  I know they are slower than thin volumes,
> but they are also much easier to manage, since they are just ordinary
> disk files.  Any filesystem with reflink can provide the needed
> copy-on-write support.

A filesystem->block_layer->filesystem->block_layer chain is something
you most likely do not want to use for any well-performing solution...
But it's OK for testing...
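
For completeness, the file-backed approach described above, with the
image file name assumed: the reflink copy is the cheap copy-on-write
step, and the loop device reintroduces the extra layer just noted.

  cp --reflink=always vm.img vm-clone.img        # CoW clone on XFS/Btrfs
  dev=$(losetup --find --show --direct-io=on vm-clone.img)
  # hand "$dev" to the VM; tear down later with: losetup --detach "$dev"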

Regards

Zdenek


