* [linux-lvm] Higher than expected metadata usage?
@ 2018-03-27 7:44 Gionatan Danti
2018-03-27 8:30 ` Zdenek Kabelac
2018-03-27 10:39 ` Zdenek Kabelac
0 siblings, 2 replies; 9+ messages in thread
From: Gionatan Danti @ 2018-03-27 7:44 UTC (permalink / raw)
To: linux-lvm
Hi all,
I can't wrap my head around the following reported data vs. metadata usage
before/after a snapshot deletion.
The system is an up-to-date CentOS 7.4 x64.
BEFORE SNAP DEL:
[root@ ~]# lvs
  LV           VG         Attr       LSize  Pool         Origin  Data%  Meta%  Move Log Cpy%Sync Convert
  000-ThinPool vg_storage twi-aot---  7.21t                      80.26  56.88
  Storage      vg_storage Vwi-aot---  7.10t 000-ThinPool         76.13
  ZZZSnap      vg_storage Vwi---t--k  7.10t 000-ThinPool Storage
As you can see, an ~80% full data pool resulted in ~57% metadata usage.
AFTER SNAP DEL:
[root@ ~]# lvremove vg_storage/ZZZSnap
Logical volume "ZZZSnap" successfully removed
[root@ ~]# lvs
  LV           VG         Attr       LSize  Pool         Origin Data%  Meta%  Move Log Cpy%Sync Convert
  000-ThinPool vg_storage twi-aot---  7.21t                     74.95  36.94
  Storage      vg_storage Vwi-aot---  7.10t 000-ThinPool        76.13
Now data is at ~75% (5 points lower), but metadata is at only ~37%: a
whopping 20-point metadata difference for a mere 5 points of data freed.
This was unexpected: I thought there was a more or less linear relation
between data and metadata usage since, after all, the former is simply
the set of allocated chunks tracked by the latter. I know that snapshots
put additional overhead on metadata tracking, but based on previous
tests I expected this overhead to be much smaller. In this case, we are
talking about a 4X amplification for a single snapshot. This is
concerning because I want to *never* run out of metadata space.
If it helps: just after taking the snapshot I sparsified some files on
the mounted filesystem, *without* fstrimming it (so, from the lvmthin
standpoint, nothing changed in chunk allocation).
What am I missing? Is the "data%" field a measure of how many data
chunks are allocated, or does it also track "how full" these data
chunks are? The latter would benignly explain the observed discrepancy,
as a partially-full data chunk can be used to store other data without
any new metadata allocation.
Full LVM information:
[root@ ~]# lvs -a -o +chunk_size
  LV                   VG         Attr       LSize   Pool         Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  000-ThinPool         vg_storage twi-aot---   7.21t                     74.95  36.94                           4.00m
  [000-ThinPool_tdata] vg_storage Twi-ao----   7.21t                                                                0
  [000-ThinPool_tmeta] vg_storage ewi-ao---- 116.00m                                                                0
  Storage              vg_storage Vwi-aot---   7.10t 000-ThinPool       76.13                                        0
  [lvol0_pmspare]      vg_storage ewi------- 116.00m                                                                0
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Higher than expected metadata usage?
2018-03-27 7:44 [linux-lvm] Higher than expected metadata usage? Gionatan Danti
@ 2018-03-27 8:30 ` Zdenek Kabelac
2018-03-27 9:40 ` Gionatan Danti
2018-03-27 10:39 ` Zdenek Kabelac
1 sibling, 1 reply; 9+ messages in thread
From: Zdenek Kabelac @ 2018-03-27 8:30 UTC (permalink / raw)
To: LVM general discussion and development, Gionatan Danti
Dne 27.3.2018 v 09:44 Gionatan Danti napsal(a):
> Hi all,
> I can't wrap my head on the following reported data vs metadata usage
> before/after a snapshot deletion.
>
> System is an updated CentOS 7.4 x64
>
> BEFORE SNAP DEL:
> [root@ ~]# lvs
>   LV           VG         Attr       LSize  Pool         Origin  Data%  Meta%  Move Log Cpy%Sync Convert
>   000-ThinPool vg_storage twi-aot---  7.21t                      80.26  56.88
>   Storage      vg_storage Vwi-aot---  7.10t 000-ThinPool         76.13
>   ZZZSnap      vg_storage Vwi---t--k  7.10t 000-ThinPool Storage
>
> As you can see, a ~80% full data pool resulted in a ~57% metadata usage
>
> AFTER SNAP DEL:
> [root@ ~]# lvremove vg_storage/ZZZSnap
>   Logical volume "ZZZSnap" successfully removed
> [root@ ~]# lvs
>   LV           VG         Attr       LSize  Pool         Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   000-ThinPool vg_storage twi-aot---  7.21t                     74.95  36.94
>   Storage      vg_storage Vwi-aot---  7.10t 000-ThinPool        76.13
>
> Now data is at ~75 (5% lower), but metadata is at only ~37%: a whopping 20%
> metadata difference for a mere 5% data freed.
>
> This was unexpected: I thought there was a more or less linear relation
> between data and metadata usage as, after all, the first is about allocated
> chunks tracked by the latter. I know that snapshots pose additional overhead
> on metadata tracking, but based on previous tests I expected this overhead to
> be much smaller. In this case, we are speaking about a 4X amplification for a
> single snapshot. This is concerning because I want to *never* run out of
> metadata space.
>
> If it can help, just after taking the snapshot I sparsified some file on the
> mounted filesystem, *without* fstrimming it (so, from lvmthin standpoint,
> nothing changed on chunk allocation).
>
> What am I missing? Is the "data%" field a measure of how many data chunks are
> allocated, or does it even track "how full" are these data chunks? This would
> benignly explain the observed discrepancy, as a partially-full data chunks can
> be used to store other data without any new metadata allocation.
>
> Full LVM information:
>
> [root@ ~]# lvs -a -o +chunk_size
>   LV                   VG         Attr       LSize   Pool         Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
>   000-ThinPool         vg_storage twi-aot---   7.21t                     74.95  36.94                           4.00m
>   [000-ThinPool_tdata] vg_storage Twi-ao----   7.21t                                                                0
>   [000-ThinPool_tmeta] vg_storage ewi-ao---- 116.00m                                                                0
>   Storage              vg_storage Vwi-aot---   7.10t 000-ThinPool       76.13                                        0
>   [lvol0_pmspare]      vg_storage ewi------- 116.00m                                                                0
>
Hi
Well, just at first look - 116MB of metadata for 7.21TB is a *VERY* small
size. I'm not sure what the data 'chunk-size' is - but sooner or later
you will need to extend the pool's metadata considerably - I'd suggest at
least 2-4GB for this data size range.
Metadata itself is also allocated in internal chunks - so releasing a
thin volume doesn't necessarily free space across whole metadata chunks;
such chunks then remain allocated, and there is no finer-grained
free-space tracking, since space within a chunk is shared between
multiple thin volumes and tied to efficient storage of the b-trees...
There is no 'direct' connection between releasing space in the data and
metadata volumes - so it's quite natural that you will see different
percentages of free space on the two volumes after a thin volume removal.
The only problem would be if repeated operations led to some permanent
growth...
Regards
Zdenek
* Re: [linux-lvm] Higher than expected metadata usage?
2018-03-27 8:30 ` Zdenek Kabelac
@ 2018-03-27 9:40 ` Gionatan Danti
2018-03-27 10:18 ` Zdenek Kabelac
0 siblings, 1 reply; 9+ messages in thread
From: Gionatan Danti @ 2018-03-27 9:40 UTC (permalink / raw)
To: Zdenek Kabelac, LVM general discussion and development
On 27/03/2018 10:30, Zdenek Kabelac wrote:
> Hi
>
> Well just for the 1st. look - 116MB for metadata for 7.21TB is *VERY*
> small size. I'm not sure what is the data 'chunk-size' - but you will
> need to extend pool's metadata sooner or later considerably - I'd
> suggest at least 2-4GB for this data size range.
Hi Zdenek,
as shown by the last lvs command, the data chunk size is 4MB. Data chunk
size and metadata volume size were automatically selected at thin pool
creation - i.e., they are default values.
Indeed, running "thin_metadata_size -b4m -s7t -m1000 -um" shows
"thin_metadata_size - 60.80 mebibytes estimated metadata area size"
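A quick back-of-envelope check of what that estimate implies per mapped chunk (this only derives the implied per-mapping cost from the quoted output; thin_metadata_size's exact formula may differ):

```python
# 60.80 MiB estimated metadata for a 7 TiB pool with 4 MiB chunks:
# how many data chunks is that, and what per-chunk metadata cost
# does the estimate imply?
TiB = 1024**4
MiB = 1024**2

pool_size  = 7 * TiB
chunk_size = 4 * MiB
estimate   = 60.80 * MiB

chunks = pool_size // chunk_size          # number of mappable data chunks
bytes_per_mapping = estimate / chunks     # implied metadata cost per chunk

print(f"chunks: {chunks}")
print(f"~{bytes_per_mapping:.1f} bytes of metadata per mapped chunk")
```

So roughly 1.8 million chunks at ~35 bytes of metadata each, per the tool's own estimate.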
> Metadata itself are also allocated in some internal chunks - so
> releasing a thin-volume doesn't necessarily free space in the whole
> metadata chunks thus such chunk remains allocated and there is not a
> more detailed free-space tracking as space in chunks is shared between
> multiple thin volumes and is related to efficient storage of b-Trees...
Ok, so removing a snapshot/volume can free a smaller-than-expected
amount of metadata. I fully understand that. However, I saw the
*reverse*: removing a volume shrank metadata usage (much) more than
expected. This also means that snapshot creation and data writes on the
main volume caused a *much* larger-than-expected increase in metadata
usage.
> There is no 'direct' connection between releasing space in data and
> metadata volume - so it's quite natural you will see different
> percentage of free space after thin volume removal between those two
> volumes.
I understand that if data is shared between two or more volumes,
deleting a volume will not change much from a metadata standpoint.
However, this is true for the data pool too: it will continue to show
the same utilization. After all, removing a volume whose data is shared
only means that the data chunks remain mapped in another volume.
However, I was under the impression that a more or less direct
connection between allocated pool data chunks and metadata existed:
otherwise, a tool such as thin_metadata_size would lose its purpose.
So, where am I wrong?
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Higher than expected metadata usage?
2018-03-27 9:40 ` Gionatan Danti
@ 2018-03-27 10:18 ` Zdenek Kabelac
2018-03-27 10:58 ` Gionatan Danti
0 siblings, 1 reply; 9+ messages in thread
From: Zdenek Kabelac @ 2018-03-27 10:18 UTC (permalink / raw)
To: Gionatan Danti, LVM general discussion and development
Dne 27.3.2018 v 11:40 Gionatan Danti napsal(a):
> On 27/03/2018 10:30, Zdenek Kabelac wrote:
>> Hi
>>
>> Well just for the 1st. look - 116MB for metadata for 7.21TB is *VERY* small
>> size. I'm not sure what is the data 'chunk-size' - but you will need to
>> extend pool's metadata sooner or later considerably - I'd suggest at least
>> 2-4GB for this data size range.
>
> Hi Zdenek,
> as shown by the last lvs command, data chunk size is at 4MB. Data chunk size
> and metadata volume size where automatically selected at thin pool creation -
> ie: they are default values.
>
> Indeed, running "thin_metadata_size -b4m -s7t -m1000 -um" show
> "thin_metadata_size - 60.80 mebibytes estimated metadata area size"
>
>> Metadata itself are also allocated in some internal chunks - so releasing a
>> thin-volume doesn't necessarily free space in the whole metadata chunks thus
>> such chunk remains allocated and there is not a more detailed free-space
>> tracking as space in chunks is shared between multiple thin volumes and is
>> related to efficient storage of b-Trees...
>
> Ok, so removing a snapshot/volume can free a lower than expected metadata
> amount. I fully understand that. However, I saw the *reverse*: removing a
> volume shrunk metadata (much) more than expected. This also mean that snapshot
> creation and data writes on the main volume caused a *much* larger than
> expected increase in metadata usage.
As said - metadata usage is chunk-based and journal-driven (i.e. there
is never an in-place overwrite of valid data) - so the metadata storage
pattern always depends on the existing layout and its transition to the
new state.
>
>> There is no 'direct' connection between releasing space in data and metadata
>> volume - so it's quite natural you will see different percentage of free
>> space after thin volume removal between those two volumes.
>
> I understand that if data is shared between two or more volumes, deleting a
> volume will not change much from a metadata standpoint. However, this is true
> for the data pool also: it will continue to show the same utilization. After
> all, removing a shared volume only means that data chunk are mapped in another
> volume.
>
> However, I was under impression that a more or less direct connection between
> allocated pool data chunk and metadata existed: otherwise, a tool as
> thin_metadata_size lose its scope.
>
> So, where am I wrong?
The size-estimation tool gives a 'rough' first-guess number.
Metadata usage is driven by real-world data manipulation - so while it's
relatively easy to 'cap' a single thin LV's metadata usage, once there
is a lot of sharing between many different volumes the exact size
estimation becomes difficult, as it depends on the order in which the
'btree' has been constructed.
It is surely true that e.g. defragmentation of the thin pool could give
you a more compact tree consuming less space - but the amount of work
needed to get the thin pool into the most optimal configuration doesn't
pay off. So you have to live with cases where metadata usage behaves in
a somewhat unpredictable manner - speed is preferred over the smallest
possible footprint, which would be very pricey in terms of CPU and
memory usage.
So, as has been said - metadata is 'accounted' in chunks for userspace
apps (like lvm2, or what you get with 'dmsetup status') - but how much
free space is left within these individual chunks is kernel-internal...
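To make the accounting concrete: lvs' Data%/Meta% are derived from the kernel's thin-pool status, which counts used metadata in whole blocks regardless of how full each block is. A minimal sketch of that derivation - the sample status line below is illustrative (not captured from the poster's system), though the field layout matches the dm-thin target's documented status format:

```python
# dm-thin pool status format (simplified):
#   <transaction> <used_meta>/<total_meta> <used_data>/<total_data> ...
# Used metadata is counted in whole blocks, even partially-full ones -
# which is why freed b-tree entries don't translate 1:1 into freed Meta%.
sample = "4 10970/29696 1416599/1890058 - rw discard_passdown queue_if_no_space -"

fields = sample.split()
used_meta, total_meta = map(int, fields[1].split("/"))
used_data, total_data = map(int, fields[2].split("/"))

meta_pct = 100 * used_meta / total_meta
data_pct = 100 * used_data / total_data
print(f"Data%: {data_pct:.2f}  Meta%: {meta_pct:.2f}")
```

The sample numbers were chosen to reproduce the ~74.95/36.94 figures from the thread, assuming the usual 4 KiB metadata block size for the 116 MiB tmeta volume.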
Time to move on: you address 7TB and you 'extremely' care about a couple
of MB. Hint: try to investigate how much space is wasted in the
filesystem itself ;)
Regards
Zdenek
* Re: [linux-lvm] Higher than expected metadata usage?
2018-03-27 10:18 ` Zdenek Kabelac
@ 2018-03-27 10:58 ` Gionatan Danti
2018-03-27 11:06 ` Gionatan Danti
0 siblings, 1 reply; 9+ messages in thread
From: Gionatan Danti @ 2018-03-27 10:58 UTC (permalink / raw)
To: Zdenek Kabelac, LVM general discussion and development
On 27/03/2018 12:18, Zdenek Kabelac wrote:
> Tool for size estimation is giving some 'rough' first guess/first choice
> number.
>
> The metadata usage is based in real-word data manipulation - so while
> it's relatively easy to 'cap' a single thin LV metadata usage - once
> there is a lot of sharing between many different volumes - the exact
> size estimation
> is difficult - as it depend on the order how the 'btree' has been
> constructed.
>
> I.e. it is surely true the i.e. defragmentation of thin-pool may give
> you a more compact tree consuming less space - but the amount of work
> needed to get thin-pool into the most optimal configuration doesn't pay
> off. So you need to live with cases, where the metadata usage behaves
> in a bit unpredictable manner - since it's more preferred speed over the
> smallest consumed space - which could be very pricey in terms of CPU and
> memory usage.
>
> So as it has been said - metadata is 'accounted' in chunks for a
> userspace app (like lvm2 is or what you get with 'dmsetup status') - but
> how much free space is left in these individual chunks is kernel
> internal...
Ok, understood.
> It's time to move on, you address 7TB and you 'extremely' care about
> couple MB 'hint here' - try to investigate how much space is wasted in
> filesystem itself ;)
Mmm no, I am caring for the couple MBs themselves. I was concerned about
the possibility to get a full metadata device by writing far less data
than expected. But I now get the point.
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Higher than expected metadata usage?
2018-03-27 10:58 ` Gionatan Danti
@ 2018-03-27 11:06 ` Gionatan Danti
0 siblings, 0 replies; 9+ messages in thread
From: Gionatan Danti @ 2018-03-27 11:06 UTC (permalink / raw)
To: Zdenek Kabelac, LVM general discussion and development
On 27/03/2018 12:58, Gionatan Danti wrote:
> Mmm no, I am caring for the couple MBs themselves. I was concerned about
> the possibility to get a full metadata device by writing far less data
> than expected. But I now get the point.
Sorry, I really meant "I am NOT caring for the couple MBs themselves"
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Higher than expected metadata usage?
2018-03-27 7:44 [linux-lvm] Higher than expected metadata usage? Gionatan Danti
2018-03-27 8:30 ` Zdenek Kabelac
@ 2018-03-27 10:39 ` Zdenek Kabelac
2018-03-27 11:05 ` Gionatan Danti
1 sibling, 1 reply; 9+ messages in thread
From: Zdenek Kabelac @ 2018-03-27 10:39 UTC (permalink / raw)
To: LVM general discussion and development, Gionatan Danti
Dne 27.3.2018 v 09:44 Gionatan Danti napsal(a):
> What am I missing? Is the "data%" field a measure of how many data chunks are
> allocated, or does it even track "how full" are these data chunks? This would
> benignly explain the observed discrepancy, as a partially-full data chunks can
> be used to store other data without any new metadata allocation.
>
Hi
I forgot to mention there is a "thin_ls" tool (it comes with the
device-mapper-persistent-data package, together with thin_check) - for
those who want to know the precise amount of allocation, and how many
blocks are owned exclusively by a single thinLV versus shared.
It's worth noting that the numbers printed by 'lvs' are *JUST* rough
estimations of data usage for both the thin pool & thin volumes.
The kernel does not maintain the full data set - only the needed portion
of it - and since a 'detailed' precise evaluation is expensive, it is
deferred to the thin_ls tool...
And last but not least - the 4MB extent you pointed out is a relatively
huge chunk: for 'fstrim' to succeed, whole 4MB blocks aligned to
thin-pool chunks need to be fully released.
So e.g. if some 'sparse' filesystem metadata blocks are placed inside a
chunk, they may prevent TRIM from succeeding - so while your filesystem
may have a lot of free space for its data, the actual amount of
physically trimmed space can be much, much smaller.
So beware whether the 4MB chunk size is a good fit for this thin pool...
The smaller the chunk, the better the chance that TRIM succeeds...
For a heavily fragmented XFS, even 64K chunks might be a challenge...
Regards
Zdenek
* Re: [linux-lvm] Higher than expected metadata usage?
2018-03-27 10:39 ` Zdenek Kabelac
@ 2018-03-27 11:05 ` Gionatan Danti
2018-03-27 12:52 ` Zdenek Kabelac
0 siblings, 1 reply; 9+ messages in thread
From: Gionatan Danti @ 2018-03-27 11:05 UTC (permalink / raw)
To: Zdenek Kabelac, LVM general discussion and development
On 27/03/2018 12:39, Zdenek Kabelac wrote:
> Hi
>
> I've forget to mention there is "thin_ls" tool (comes with
> device-mapper-persistent-data package (with thin_check) - for those who
> want to know precise amount of allocation and what amount of blocks is
> owned exclusively by a single thinLV and what is shared.
>
> It's worth to note - numbers printed by 'lvs' are *JUST* really rough
> estimations of data usage for both thin_pool & thin_volumes.
>
> Kernel is not maintaining full data-set - only a needed portion of it -
> and since 'detailed' precise evaluation is expensive it's deferred to
> the tool thin_ls...
Ok, thanks for the reminder about "thin_ls" (I often forget about these
"minor" but very useful utilities...)
> And last but not least comment - when you pointed out 4MB extent usage
> - it's relatively huge chunk - and if the 'fstrim' wants to succeed -
> those 4MB blocks fitting thin-pool chunks needs to be fully released.
> So i.e. if there are some 'sparse' filesystem metadata blocks places -
> they may prevent TRIM to successeed - so while your filesystem may have
> a lot of free space for its data - the actually amount if physically
> trimmed space can be much much smaller.
>
> So beware if the 4MB chunk-size for a thin-pool is good fit here....
> The smaller the chunk is - the better change of TRIM there is...
Sure, I understand that. Anyway, please note that the 4MB chunk size was
*automatically* chosen by the system during pool creation. It seems to
me that the default is to constrain the metadata volume to be < 128 MB,
right?
> For heavily fragmented XFS even 64K chunks might be a challenge....
True, but chunk size is *always* a performance/efficiency tradeoff.
Making a 64K-chunked volume will end up with even more fragmentation for
the underlying disk subsystem. Obviously, if many snapshots are
expected, a small chunk size is the right choice (CoW filesystems such
as BTRFS and ZFS face similar problems, by the way).
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Higher than expected metadata usage?
2018-03-27 11:05 ` Gionatan Danti
@ 2018-03-27 12:52 ` Zdenek Kabelac
0 siblings, 0 replies; 9+ messages in thread
From: Zdenek Kabelac @ 2018-03-27 12:52 UTC (permalink / raw)
To: Gionatan Danti, LVM general discussion and development
Dne 27.3.2018 v 13:05 Gionatan Danti napsal(a):
> On 27/03/2018 12:39, Zdenek Kabelac wrote:
>> Hi
>>
>> And last but not least comment - when you pointed out 4MB extent usage -
>> it's relatively huge chunk - and if the 'fstrim' wants to succeed - those
>> 4MB blocks fitting thin-pool chunks needs to be fully released.
>> So i.e. if there are some 'sparse' filesystem metadata blocks places - they
>> may prevent TRIM to successeed - so while your filesystem may have a lot of
>> free space for its data - the actually amount if physically trimmed space
>> can be much much smaller.
>>
>> So beware if the 4MB chunk-size for a thin-pool is good fit here....
>> The smaller the chunk is - the better change of TRIM there is...
>
> Sure, I understand that. Anyway, please note that 4MB chunk size was
> *automatically* chosen by the system during pool creation. It seems to me that
> the default is to constrain the metadata volume to be < 128 MB, right?
Yes - by default lvm2 'targets' fitting the metadata into this 128MB
size.
Obviously there is nothing like 'one size fits all' - so it's really up
to the user to think about the use case and pick better parameters than
the defaults.
The 128MB size is picked so that the metadata easily fits in RAM.
>> For heavily fragmented XFS even 64K chunks might be a challenge....
>
> True, but chunk size *always* is a performance/efficiency tradeoff. Making a
> 64K chunk-sided volume will end with even more fragmentation for the
> underlying disk subsystem. Obviously, if many snapshot are expected, a small
> chunk size is the right choice (CoW filesystem as BTRFS and ZFS face similar
> problems, by the way).
Yep - the smaller the chunk, the smaller the 'max' supported data device
size, as there is a finite number of chunks you can address from the
maximal metadata size, which is ~16GB and can't get any bigger.
The bigger the chunk, the less sharing happens between snapshots, but
the fewer fragments there are.
Regards
Zdenek