* [linux-lvm] Why is the performance of my lvmthin snapshot so poor
@ 2022-06-13 8:49 Zhiyong Ye
2022-06-14 7:04 ` Gionatan Danti
0 siblings, 1 reply; 15+ messages in thread
From: Zhiyong Ye @ 2022-06-13 8:49 UTC (permalink / raw)
To: linux-lvm
Hi all,
I am new to lvmthin. When I create snapshots using lvmthin, the write
performance of the original LV becomes poor.
After creating a thin LV with zeroing disabled, I first write the whole
volume with fio, then create a snapshot, and finally test the write
performance of the volume with fio again. The performance after
creating a snapshot is very poor, only 10% of a thick LV, and also much
worse than the performance of the first write to the thin LV. The
random-write performance data from fio in my environment is as follows:
case                 IOPs
thick lv             63043
thin lv              42130
snapshotted thin lv   5245
The lvmthin man page mentions under "Chunk size" that the chunksize
has an impact on snapshot performance. So I tested the write
performance after creating snapshots with different chunksize values.
The data is shown below:
chunksize   IOPs
64k          5245
256k         2115
1024k         509
The performance degradation after snapshotting is expected, as writing
to a snapshotted LV involves reading the original data, writing it
elsewhere, and then writing the new data into the original chunk. But
the performance loss is much greater than I expected. Is there any way
to improve performance after creating a snapshot? Can I ask for your
help?
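Here is a back-of-envelope sketch of where the loss could come from
(my own model, not from the lvmthin docs; it assumes each first write
to a snapshotted chunk costs one chunk-sized read plus one chunk-sized
write on top of the 4k write itself):

```python
# Sketch: extra data moved by the first 4k write to a snapshotted chunk.
# Assumption: dm-thin reads the old 64k chunk and writes it to a newly
# allocated location before the application's 4k write completes.
CHUNK = 64 * 1024   # thin pool chunk size (bytes)
IO = 4 * 1024       # fio block size (bytes)

bytes_moved = CHUNK + CHUNK + IO   # chunk read + chunk copy + new data
amplification = bytes_moved / IO
print(f"~{amplification:.0f}x data moved per 4k first-touch write")
```

If each 4k write really moves roughly 33x its size, a large drop in
IOPs on the first pass after a snapshot is at least plausible.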
Regards,
Zhiyong Ye
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-13 8:49 [linux-lvm] Why is the performance of my lvmthin snapshot so poor Zhiyong Ye
@ 2022-06-14 7:04 ` Gionatan Danti
2022-06-14 10:16 ` Zhiyong Ye
0 siblings, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-14 7:04 UTC (permalink / raw)
To: LVM general discussion and development; +Cc: Zhiyong Ye
On 2022-06-13 10:49, Zhiyong Ye wrote:
> The performance degradation after snapshotting is expected, as writing
> to a snapshotted LV involves reading the original data, writing it
> elsewhere, and then writing the new data into the original chunk. But
> the performance loss is much greater than I expected. Is there any way
> to improve performance after creating a snapshot? Can I ask for your
> help?
This is the key point: when first writing to a new chunk, not only
does the chunk need to be allocated, but the old data must be copied.
This r/m/w operation transforms an async operation (the write) into a
sync one (the read), ruining performance. Subsequent writes to the same
chunk do not have the same issue.
The magnitude of the slowdown seems somewhat excessive, though. When
dealing with HDD pools, I remember a 3-5x impact on IOPs. Can you show
the exact fio command and the parameters of your thin pool (i.e. chunk
size) and storage subsystem (HDD vs SSD, SATA vs SAS vs NVMe)?
Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-14 7:04 ` Gionatan Danti
@ 2022-06-14 10:16 ` Zhiyong Ye
2022-06-14 12:56 ` Gionatan Danti
0 siblings, 1 reply; 15+ messages in thread
From: Zhiyong Ye @ 2022-06-14 10:16 UTC (permalink / raw)
To: Gionatan Danti, LVM general discussion and development
Hi Gionatan,
Thanks for your reply and detailed answer.
I actually use iSCSI for my underlying storage, and the bare disk
random-write IOPs is 65543. Some of the parameters of my iSCSI
initiator are as follows:
node.session.iscsi.FirstBurstLength = 524288
node.session.iscsi.MaxBurstLength = 33552384
node.session.cmds_max = 4096
node.session.queue_depth = 1024
After creating the PV and VG based on the iSCSI device, I created the
thin pool as follows:
lvcreate -n pool -L 1000G test-vg
lvcreate -n poolmeta -L 100G test-vg
lvconvert --type thin-pool --chunksize 64k --poolmetadata
test-vg/poolmeta test-vg/pool
lvchange -Z n test-vg/pool
Then I create thin lv in the thin pool:
lvcreate -n test-thin -V 500G --thinpool pool test-vg
And my command for creating snapshots:
lvcreate -n test-thin1s1 -s test-vg/test-thin
I have the following fio parameter and use it for all tests:
[global]
bs=4k
direct=1
iodepth=32
numjobs=8
ioengine=libaio
group_reporting
runtime=120
time_based
filename=/dev/vdb
[rand-write]
name=rand-write
rw=randwrite
stonewall
Thanks again!
Zhiyong Ye
On 6/14/22 3:04 PM, Gionatan Danti wrote:
> On 2022-06-13 10:49, Zhiyong Ye wrote:
>> The performance degradation after snapshotting is expected, as writing
>> to a snapshotted LV involves reading the original data, writing it
>> elsewhere, and then writing the new data into the original chunk. But
>> the performance loss is much greater than I expected. Is there any way
>> to improve performance after creating a snapshot? Can I ask for your
>> help?
>
> This is the key point: when first writing to a new chunk, not only
> does the chunk need to be allocated, but the old data must be copied.
> This r/m/w operation transforms an async operation (the write) into a
> sync one (the read), ruining performance. Subsequent writes to the
> same chunk do not have the same issue.
>
> The magnitude of the slowdown seems somewhat excessive, though. When
> dealing with HDD pools, I remember a 3-5x impact on IOPs. Can you show
> the exact fio command and the parameters of your thin pool (i.e. chunk
> size) and storage subsystem (HDD vs SSD, SATA vs SAS vs NVMe)?
>
> Regards.
>
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-14 10:16 ` Zhiyong Ye
@ 2022-06-14 12:56 ` Gionatan Danti
2022-06-14 13:29 ` Zhiyong Ye
0 siblings, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-14 12:56 UTC (permalink / raw)
To: Zhiyong Ye; +Cc: LVM general discussion and development
On 2022-06-14 12:16, Zhiyong Ye wrote:
> After creating the PV and VG based on the iSCSI device, I created the
> thin pool as follows:
> lvcreate -n pool -L 1000G test-vg
> lvcreate -n poolmeta -L 100G test-vg
> lvconvert --type thin-pool --chunksize 64k --poolmetadata
> test-vg/poolmeta test-vg/pool
> lvchange -Z n test-vg/pool
I did my performance test with a bigger chunk size, in the range of
128-512K. It can very well be that the overhead of a smaller chunk size
results in 10x lower IOPs for to-be-allocated-and-copied chunks. Can
you retry fio after increasing the chunk size?
As a side note, if I remember correctly, thin pool metadata is
hard-limited to 16 GB - no need to allocate 100 GB for it.
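The 16 GB metadata cap also ties the chunk size to the maximum pool
size the metadata can map. A rough sketch of the arithmetic (the
64-bytes-per-mapping figure is my assumption, chosen only so the
numbers line up with the sizes quoted in this thread; the real on-disk
metadata format differs):

```python
# Approximate maximum pool size a 16 GiB metadata LV can map, per
# chunk size, assuming ~64 bytes of metadata per mapped chunk.
META_MAX = 16 * 2**30      # dm-thin metadata hard limit (16 GiB)
BYTES_PER_MAPPING = 64     # assumed metadata cost per mapped chunk

for chunk_kib in (16, 64, 256, 1024):
    max_pool = (META_MAX // BYTES_PER_MAPPING) * chunk_kib * 1024
    print(f"{chunk_kib:>5}K chunks -> ~{max_pool / 2**40:.0f} TiB pool")
```

Under that assumption, 64K chunks map a ~16 TiB pool, which is why
shrinking the chunk size further shrinks the maximum pool size too.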
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-14 12:56 ` Gionatan Danti
@ 2022-06-14 13:29 ` Zhiyong Ye
2022-06-14 14:54 ` Gionatan Danti
0 siblings, 1 reply; 15+ messages in thread
From: Zhiyong Ye @ 2022-06-14 13:29 UTC (permalink / raw)
To: Gionatan Danti; +Cc: LVM general discussion and development
On 6/14/22 8:56 PM, Gionatan Danti wrote:
> On 2022-06-14 12:16, Zhiyong Ye wrote:
>> After creating the PV and VG based on the iSCSI device, I created the
>> thin pool as follows:
>> lvcreate -n pool -L 1000G test-vg
>> lvcreate -n poolmeta -L 100G test-vg
>> lvconvert --type thin-pool --chunksize 64k --poolmetadata
>> test-vg/poolmeta test-vg/pool
>> lvchange -Z n test-vg/pool
>
> I did my performance test with a bigger chunk size, in the range of
> 128-512K. It can very well be that the overhead of a smaller chunk size
> results in 10x lower IOPs for to-be-allocated-and-copied chunks. Can
> you retry fio after increasing the chunk size?
Yes, I also tested the write performance after creating snapshots with
different chunksize values. But the data shows that the larger the
chunksize, the worse the performance. The data is shown below:
chunksize   IOPs
64k          5245
256k         2115
1024k         509
The reason for this may be that once the volume has a snapshot, each
write to an existing block causes a COW (copy-on-write), and the COW
copies the entire chunk-sized data block. For example, when the
chunksize is 64k, even if only 4k of data is written, the entire 64k
data block will be copied. I'm not sure if I understand this correctly.
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-14 13:29 ` Zhiyong Ye
@ 2022-06-14 14:54 ` Gionatan Danti
2022-06-15 7:42 ` Zhiyong Ye
0 siblings, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-14 14:54 UTC (permalink / raw)
To: Zhiyong Ye; +Cc: LVM general discussion and development
On 2022-06-14 15:29, Zhiyong Ye wrote:
> The reason for this may be that once the volume has a snapshot, each
> write to an existing block causes a COW (copy-on-write), and the COW
> copies the entire chunk-sized data block. For example, when the
> chunksize is 64k, even if only 4k of data is written, the entire 64k
> data block will be copied. I'm not sure if I understand this correctly.
Yes, in your case the added copies are lowering the total available
IOPs. But note how the decrease is sub-linear (from 64K to 1M you have
a 16x increase in chunk size but "only" a 10x hit in IOPs): this is due
to the lowered metadata overhead.
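A small sketch of that sub-linearity, comparing the bytes copied per 4k
first-touch write (assuming one chunk read plus one chunk write per
COW, which is my simplification) against the IOPs measured earlier in
this thread:

```python
# Data moved per 4k write into a freshly snapshotted chunk, per chunk
# size, next to the IOPs measured with fio in this thread.
IO = 4 * 1024
measured_iops = {64: 5245, 256: 2115, 1024: 509}   # chunk KiB -> IOPs

for chunk_kib, iops in measured_iops.items():
    copied = 2 * chunk_kib * 1024 + IO   # chunk read + chunk copy + new 4k
    print(f"{chunk_kib:>5}K: ~{copied // IO:>3}x data moved, {iops} IOPs")
```

The copy cost grows ~16x from 64K to 1M chunks while the measured IOPs
drop only ~10x, consistent with the lowered-metadata-overhead
explanation.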
A last try: if you can, please regenerate your thin volume with 64K
chunks and set fio to execute 64K requests. Let's see if LVM is at
least smart enough to avoid copying a to-be-completely-overwritten
chunk.
Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-14 14:54 ` Gionatan Danti
@ 2022-06-15 7:42 ` Zhiyong Ye
2022-06-15 9:34 ` Gionatan Danti
2022-06-16 7:53 ` Demi Marie Obenour
0 siblings, 2 replies; 15+ messages in thread
From: Zhiyong Ye @ 2022-06-15 7:42 UTC (permalink / raw)
To: Gionatan Danti; +Cc: LVM general discussion and development
On 6/14/22 10:54 PM, Gionatan Danti wrote:
> On 2022-06-14 15:29, Zhiyong Ye wrote:
>> The reason for this may be that once the volume has a snapshot, each
>> write to an existing block causes a COW (copy-on-write), and the COW
>> copies the entire chunk-sized data block. For example, when the
>> chunksize is 64k, even if only 4k of data is written, the entire 64k
>> data block will be copied. I'm not sure if I understand this correctly.
>
> Yes, in your case, the added copies are lowering total available IOPs.
> But note how the decrease is sub-linear (from 64K to 1M you have a 16x
> increase in chunk size but "only" a 10x hit in IOPs): this is due to the
> lowered metadata overhead.
It seems that the cost of the COW copies when sending 4k requests is
much greater than the loss from metadata.
> A last try: if you can, please regenerate your thin volume with 64K
> chunks and set fio to execute 64K requests. Let's see if LVM is at
> least smart enough to avoid copying a to-be-completely-overwritten
> chunk.
I regenerated the thin volume with a chunksize of 64K, and the random-
write performance data tested with fio 64k requests is as follows:
case                 IOPs
thin lv              9381
snapshotted thin lv  8307
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-15 7:42 ` Zhiyong Ye
@ 2022-06-15 9:34 ` Gionatan Danti
2022-06-15 9:46 ` Zhiyong Ye
2022-06-16 7:53 ` Demi Marie Obenour
1 sibling, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-15 9:34 UTC (permalink / raw)
To: Zhiyong Ye; +Cc: LVM general discussion and development
On 2022-06-15 09:42, Zhiyong Ye wrote:
> I regenerated the thin volume with the chunksize of 64K and the random
> write performance data tested with fio 64k requests is as follows:
> case iops
> thin lv 9381
> snapshotted thin lv 8307
As expected, increasing the I/O size (to avoid r/m/w) greatly reduced
the issue (the ~11% hit is due to metadata allocation overhead).
I don't see anything wrong, so I think you have to live with the
previously recorded 10x performance hit when overwriting 4K blocks on a
64K chunk size thin volume...
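For reference, the ~11% figure falls out directly from the two
64k-request fio runs quoted above:

```python
# Relative IOPs hit after snapshotting, from the 64k-request fio runs.
thin_iops, snapshotted_iops = 9381, 8307
hit = (thin_iops - snapshotted_iops) / thin_iops
print(f"{hit:.1%} IOPs hit after snapshotting")   # ~11.4%
```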
Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-15 9:34 ` Gionatan Danti
@ 2022-06-15 9:46 ` Zhiyong Ye
2022-06-15 12:40 ` Gionatan Danti
0 siblings, 1 reply; 15+ messages in thread
From: Zhiyong Ye @ 2022-06-15 9:46 UTC (permalink / raw)
To: Gionatan Danti; +Cc: LVM general discussion and development
On 6/15/22 5:34 PM, Gionatan Danti wrote:
> On 2022-06-15 09:42, Zhiyong Ye wrote:
>> I regenerated the thin volume with the chunksize of 64K and the random
>> write performance data tested with fio 64k requests is as follows:
>> case iops
>> thin lv 9381
>> snapshotted thin lv 8307
>
> As expected, increasing I/O size (to avoid r/m/w) greatly reduced the
> issue (the ~11% hit is due to metadata allocation overhead).
>
> I don't see anything wrong, so I think you had to live with the
> previously recorded 10x performance hit when overwriting 4K blocks on a
> 64K chunk size thin volume...
I also think it meets expectations. But is there any other way to
optimize snapshot performance at the code level? Would it help to
reduce the chunksize in the code? I see in the help documentation that
the minimum chunksize is 64k.
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-15 9:46 ` Zhiyong Ye
@ 2022-06-15 12:40 ` Gionatan Danti
2022-06-15 16:39 ` Demi Marie Obenour
0 siblings, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-15 12:40 UTC (permalink / raw)
To: Zhiyong Ye; +Cc: LVM general discussion and development
On 2022-06-15 11:46, Zhiyong Ye wrote:
> I also think it meets expectations. But is there any other way to
> optimize snapshot performance at the code level? Would it help to
> reduce the chunksize in the code? I see in the help documentation
> that the minimum chunksize is 64k.
I don't think forcing the code to use a smaller chunk size is a good
idea. Considering the hard limit on metadata size (16 GB max), 64K
chunks are good for a ~16 TB thin pool - already relatively small.
A, say, 16K chunk size would be good for a 4 TB pool only, and so on.
Moreover, sequential performance will significantly suffer.
I think you have to accept the performance hit on first chunk
allocation & rewrite.
Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-15 12:40 ` Gionatan Danti
@ 2022-06-15 16:39 ` Demi Marie Obenour
0 siblings, 0 replies; 15+ messages in thread
From: Demi Marie Obenour @ 2022-06-15 16:39 UTC (permalink / raw)
To: LVM general discussion and development, Zhiyong Ye
On Wed, Jun 15, 2022 at 02:40:29PM +0200, Gionatan Danti wrote:
> On 2022-06-15 11:46, Zhiyong Ye wrote:
> > I also think it meets expectations. But is there any other way to
> > optimize snapshot performance at the code level? Would it help to
> > reduce the chunksize in the code? I see in the help documentation
> > that the minimum chunksize is 64k.
>
> I don't think forcing the code to use a smaller chunk size is a good
> idea. Considering the hard limit on metadata size (16 GB max), 64K
> chunks are good for a ~16 TB thin pool - already relatively small.
>
> A, say, 16K chunk size would be good for a 4 TB pool only, and so on.
> Moreover, sequential performance will significantly suffer.
>
> I think you have to accept the performance hit on first chunk
> allocation & rewrite.
I seriously hope this will be fixed in dm-thin v2. It’s a significant
problem for Qubes OS.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-15 7:42 ` Zhiyong Ye
2022-06-15 9:34 ` Gionatan Danti
@ 2022-06-16 7:53 ` Demi Marie Obenour
2022-06-16 13:22 ` Gionatan Danti
1 sibling, 1 reply; 15+ messages in thread
From: Demi Marie Obenour @ 2022-06-16 7:53 UTC (permalink / raw)
To: LVM general discussion and development, Gionatan Danti
On Wed, Jun 15, 2022 at 03:42:17PM +0800, Zhiyong Ye wrote:
>
>
> On 6/14/22 10:54 PM, Gionatan Danti wrote:
> > On 2022-06-14 15:29, Zhiyong Ye wrote:
> > > The reason for this may be that once the volume has a snapshot,
> > > each write to an existing block causes a COW (copy-on-write), and
> > > the COW copies the entire chunk-sized data block. For example,
> > > when the chunksize is 64k, even if only 4k of data is written, the
> > > entire 64k data block will be copied. I'm not sure if I understand
> > > this correctly.
> >
> > Yes, in your case, the added copies are lowering total available IOPs.
> > But note how the decrease is sub-linear (from 64K to 1M you have a 16x
> > increase in chunk size but "only" a 10x hit in IOPs): this is due to the
> > lowered metadata overhead.
>
> It seems that the cost of the COW copies when sending 4k requests is
> much greater than the loss from metadata.
>
> > A last try: if you can, please regenerate your thin volume with 64K
> > chunks and set fio to execute 64K requests. Let's see if LVM is at
> > least smart enough to avoid copying a to-be-completely-overwritten
> > chunk.
>
> I regenerated the thin volume with the chunksize of 64K and the random write
> performance data tested with fio 64k requests is as follows:
> case iops
> thin lv 9381
> snapshotted thin lv 8307
That seems reasonable. My conclusion is that dm-thin (which is what LVM
uses) is not a good fit for workloads with a lot of small random writes
and frequent snapshots, due to the 64k minimum chunk size. This also
explains why dm-thin does not allow smaller blocks: not only would it
support only very small thin pools, it would also have massive metadata
write overhead. Hopefully dm-thin v2 will improve the situation.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-16 7:53 ` Demi Marie Obenour
@ 2022-06-16 13:22 ` Gionatan Danti
2022-06-16 16:19 ` Demi Marie Obenour
0 siblings, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-16 13:22 UTC (permalink / raw)
To: Demi Marie Obenour; +Cc: LVM general discussion and development
On 2022-06-16 09:53, Demi Marie Obenour wrote:
> That seems reasonable. My conclusion is that dm-thin (which is what
> LVM
> uses) is not a good fit for workloads with a lot of small random writes
> and frequent snapshots, due to the 64k minimum chunk size. This also
> explains why dm-thin does not allow smaller blocks: not only would it
> only support very small thin pools, it would also have massive metadata
> write overhead. Hopefully dm-thin v2 will improve the situation.
I think that, in this case, no free lunch really exists. I tried the
following thin provisioning methods, each with its strong & weak
points:
lvmthin: probably the most flexible of the mainline kernel options. You
pay for r/m/w only when allocating a small block (say 4K) for the first
time after taking a snapshot. It is fast and well integrated with the
lvm command line. Con: bad behavior on out-of-space conditions
xfs + reflink: a great, simple to use tool when applicable. It has a
very small granularity (4K) with no r/m/w. Cons: requires fine tuning
for good performance when reflinking big files; IO freezes during
metadata copy for reflink; a very small granularity means sequential IO
is going to suffer heavily (see here for more details:
https://marc.info/?l=linux-xfs&m=157891132109888&w=2)
btrfs: very small granularity (4K) and many integrated features. Cons:
bad performance overall, especially on mechanical HDDs
vdo: provides small-granularity (4K) thin provisioning, compression
and deduplication. Cons: (still) out-of-tree; requires a
powerloss-protected writeback cache to maintain good performance; no
snapshot capability
zfs: designed from the ground up for pervasive CoW, with many features
and ARC/L2ARC. Cons: out-of-tree; using small granularity (4K) means
bad overall performance; using big granularity (128K by default) is a
necessary compromise for most HDD pools.
For what it is worth, I settled on ZFS when using out-of-tree modules
is not an issue, and lvmthin otherwise (but I plan to use xfs + reflink
more in the future).
Do you have any information to share about dm-thin v2? I heard about it
some years ago, but I found no recent info.
Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-16 13:22 ` Gionatan Danti
@ 2022-06-16 16:19 ` Demi Marie Obenour
2022-06-16 19:50 ` Gionatan Danti
0 siblings, 1 reply; 15+ messages in thread
From: Demi Marie Obenour @ 2022-06-16 16:19 UTC (permalink / raw)
To: LVM general discussion and development
On Thu, Jun 16, 2022 at 03:22:09PM +0200, Gionatan Danti wrote:
> On 2022-06-16 09:53, Demi Marie Obenour wrote:
> > That seems reasonable. My conclusion is that dm-thin (which is what LVM
> > uses) is not a good fit for workloads with a lot of small random writes
> > and frequent snapshots, due to the 64k minimum chunk size. This also
> > explains why dm-thin does not allow smaller blocks: not only would it
> > only support very small thin pools, it would also have massive metadata
> > write overhead. Hopefully dm-thin v2 will improve the situation.
>
> I think that, in this case, no free lunch really exists. I tried the
> following thin provisioning methods, each with its strong & weak points:
>
> lvmthin: probably the most flexible of the mainline kernel options. You pay
> for r/m/w only when allocating a small block (say 4K) for the first time
> after taking a snapshot. It is fast and well integrated with the lvm command
> line. Con: bad behavior on out-of-space conditions
Also, the LVM command line is slow, and there is very large write
amplification with lots of random writes immediately after taking a
snapshot. Furthermore, because of the mismatch between the dm-thin
block size and the filesystem block size, fstrim might not reclaim as
much space in the pool as one would expect.
> xfs + reflink: a great, simple to use tool when applicable. It has a very
> small granularity (4K) with no r/m/w. Cons: requires fine tuning for good
> performance when reflinking big files; IO freezes during metadata copy for
> reflink; a very small granularity means sequential IO is going to suffer
> heavily (see here for more details:
> https://marc.info/?l=linux-xfs&m=157891132109888&w=2)
Also heavy fragmentation can make journal replay very slow, to the point
of taking days on spinning hard drives. Dave Chinner explains this here:
https://lore.kernel.org/linux-xfs/20220509230918.GP1098723@dread.disaster.area/.
> btrfs: very small granularity (4K) and many integrated features. Cons: bad
> performance overall, especially when using mechanical HDD
Also poor out-of-space handling and unbounded worst-case latency.
> vdo: provides small-granularity (4K) thin provisioning, compression and
> deduplication. Cons: (still) out-of-tree; requires a powerloss-protected
> writeback cache to maintain good performance; no snapshot capability
>
> zfs: designed from the ground up for pervasive CoW, with many features and
> ARC/L2ARC. Cons: out-of-tree; using small granularity (4K) means bad overall
> performance; using big granularity (128K by default) is a necessary
> compromise for most HDD pools.
Is this still a problem on NVMe storage? HDDs will not really be fast
no matter what one does, at least unless there is a write-back cache
that can convert random I/O to sequential I/O. Even that only helps
much if your working set fits in cache, or if your workload is
write-mostly.
> For what it is worth, I settled on ZFS when using out-of-tree modules is not
> an issue and lvmthin otherwise (but I plan to use xfs + reflink more in the
> future).
>
> Do you have any information to share about dm-thin v2? I heard about it some
> years ago, but I found no recent info.
It does not exist yet. Joe Thornber would be the person to ask
regarding any plans to create it.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
2022-06-16 16:19 ` Demi Marie Obenour
@ 2022-06-16 19:50 ` Gionatan Danti
0 siblings, 0 replies; 15+ messages in thread
From: Gionatan Danti @ 2022-06-16 19:50 UTC (permalink / raw)
To: LVM general discussion and development; +Cc: Demi Marie Obenour
On 2022-06-16 18:19, Demi Marie Obenour wrote:
> Also heavy fragmentation can make journal replay very slow, to the
> point
> of taking days on spinning hard drives. Dave Chinner explains this
> here:
> https://lore.kernel.org/linux-xfs/20220509230918.GP1098723@dread.disaster.area/.
Thanks, the linked thread was very interesting.
> Also poor out-of-space handling and unbounded worst-case latency.
Very true.
> Is this still a problem on NVMe storage? HDDs will not really be fast
> no matter what one does, at least unless there is a write-back cache
> that can convert random I/O to sequential I/O. Even that only helps
> much if your working set fits in cache, or if your workload is
> write-mostly.
One of the key features of ZFS is to transform random writes into
sequential ones. With the right recordsize, and coupled with prefetch,
compressed ARC and L2ARC, even an HDD pool can be surprisingly usable.
For NVMe pools you should use a much lower recordsize to avoid
read/write amplification, but not lower than 16K, so as not to impair
compression efficiency (unless you are storing mostly incompressible
data). That said, for pure NVMe storage (no compression or other data
transformations) I think XFS, possibly with direct IO, is the fastest
choice by a factor of 2x.
> It does not exist yet. Joe Thornber would be the person to ask
> regarding any plans to create it.
OK - I was hoping I had missed something, but that is not the case.
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8