* [linux-lvm] Why is the performance of my lvmthin snapshot so poor
@ 2022-06-13  8:49 Zhiyong Ye
  2022-06-14  7:04 ` Gionatan Danti
  0 siblings, 1 reply; 15+ messages in thread
From: Zhiyong Ye @ 2022-06-13  8:49 UTC (permalink / raw)
  To: linux-lvm

Hi all,

I am new to lvmthin. When I create snapshots with lvmthin, the write 
performance of the origin LV becomes poor.

After creating a thin LV with zeroing disabled, I first write the whole 
volume with fio, then create a snapshot, and finally test the write 
performance of the same volume with fio again. The performance after 
creating the snapshot is very poor: only about 10% of a thick LV, and 
also much worse than the first write to the thin LV. The random-write 
performance from fio in my environment is as follows:
case                    iops
thick lv                63043
thin lv                 42130
snapshotted thin lv     5245

The lvmthin man page mentions under "Chunk size" that the chunk size has 
an impact on snapshot performance. So I tested write performance after 
creating snapshots with different chunk sizes. The data is shown below:
chunksize               iops
64k                     5245
256k                    2115
1024k                   509
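
For each row the pool was rebuilt from scratch, since the chunk size of 
an existing thin pool cannot be changed after creation. A sketch of one 
iteration (hypothetical volume names):
lvremove -y vg/pool
lvcreate -n pool -L 1000G vg
lvcreate -n poolmeta -L 16G vg
lvconvert --type thin-pool --chunksize 256k --poolmetadata vg/poolmeta vg/pool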

The performance degradation after snapshotting is expected, as writing to 
a snapshotted LV involves reading the original data, writing it 
elsewhere, and then writing the new data into the original chunk. But the 
performance loss was much more than I expected. Is there any way to 
improve performance after creating a snapshot? Can I ask for your help?

Regards,

Zhiyong Ye


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-13  8:49 [linux-lvm] Why is the performance of my lvmthin snapshot so poor Zhiyong Ye
@ 2022-06-14  7:04 ` Gionatan Danti
  2022-06-14 10:16   ` Zhiyong Ye
  0 siblings, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-14  7:04 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Zhiyong Ye

On 2022-06-13 10:49, Zhiyong Ye wrote:
> The performance degradation after snapshotting is expected, as writing
> to a snapshotted LV involves reading the original data, writing it
> elsewhere, and then writing the new data into the original chunk. But
> the performance loss was much more than I expected. Is there any way to
> improve performance after creating a snapshot? Can I ask for your
> help?

This is the key point: when first writing to a new chunk, not only does 
it need to be allocated, but the old data must be copied as well. This 
r/m/w operation turns an asynchronous operation (the write) into a 
synchronous one (a read followed by a write), ruining performance. 
Subsequent writes to the same chunk do not have the same issue.

The magnitude of the slowdown seems somewhat excessive, though. When 
dealing with HDD pools, I remember a 3-5x impact on IOPS. Can you show 
the exact fio command and the parameters of your thin pool (i.e. chunk 
size) and storage subsystem (HDD vs SSD, SATA vs SAS vs NVMe)?
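
For reference, something like the following should list the pool 
parameters in question (a sketch; substitute your VG name, and note the 
exact report field names can vary between LVM versions):
lvs -a -o +chunk_size,data_percent,metadata_percent <vgname>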

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-14  7:04 ` Gionatan Danti
@ 2022-06-14 10:16   ` Zhiyong Ye
  2022-06-14 12:56     ` Gionatan Danti
  0 siblings, 1 reply; 15+ messages in thread
From: Zhiyong Ye @ 2022-06-14 10:16 UTC (permalink / raw)
  To: Gionatan Danti, LVM general discussion and development

Hi Gionatan,

Thanks for your reply and detailed answer.

My underlying storage is actually iSCSI, and the bare disk delivers 65543 
random-write IOPS. Some of the parameters of my iSCSI initiator are as 
follows:
node.session.iscsi.FirstBurstLength = 524288
node.session.iscsi.MaxBurstLength = 33552384
node.session.cmds_max = 4096
node.session.queue_depth = 1024

After creating the PV and VG based on the iSCSI device, I created the 
thin pool as follows:
lvcreate -n pool -L 1000G test-vg
lvcreate -n poolmeta -L 100G test-vg
lvconvert --type thin-pool --chunksize 64k --poolmetadata 
test-vg/poolmeta test-vg/pool
lvchange -Z n test-vg/pool

Then I create a thin LV in the thin pool:
lvcreate -n test-thin -V 500G --thinpool pool test-vg

And my command for creating snapshots:
lvcreate -n test-thin1s1 -s test-vg/test-thin

I use the following fio job file for all tests:
[global]
bs=4k
direct=1
iodepth=32
numjobs=8
ioengine=libaio
group_reporting
runtime=120
time_based
filename=/dev/vdb

[rand-write]
name=rand-write
rw=randwrite
stonewall
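
Assuming the job file above is saved as rand-write.fio, each run is simply:
fio rand-write.fio
(/dev/vdb is the block device under test here; when pointing fio directly 
at the thin LV on the host, the filename would instead be something like 
/dev/test-vg/test-thin.)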

Thanks again!

Zhiyong Ye

On 6/14/22 3:04 PM, Gionatan Danti wrote:
> On 2022-06-13 10:49, Zhiyong Ye wrote:
>> The performance degradation after snapshotting is expected, as writing
>> to a snapshotted LV involves reading the original data, writing it
>> elsewhere, and then writing the new data into the original chunk. But
>> the performance loss was much more than I expected. Is there any way to
>> improve performance after creating a snapshot? Can I ask for your
>> help?
> 
> This is the key point: when first writing to a new chunk, not only does
> it need to be allocated, but the old data must be copied as well. This
> r/m/w operation turns an asynchronous operation (the write) into a
> synchronous one (a read followed by a write), ruining performance.
> Subsequent writes to the same chunk do not have the same issue.
> 
> The magnitude of the slowdown seems somewhat excessive, though. When
> dealing with HDD pools, I remember a 3-5x impact on IOPS. Can you show
> the exact fio command and the parameters of your thin pool (i.e. chunk
> size) and storage subsystem (HDD vs SSD, SATA vs SAS vs NVMe)?
> 
> Regards.
> 


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-14 10:16   ` Zhiyong Ye
@ 2022-06-14 12:56     ` Gionatan Danti
  2022-06-14 13:29       ` Zhiyong Ye
  0 siblings, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-14 12:56 UTC (permalink / raw)
  To: Zhiyong Ye; +Cc: LVM general discussion and development

On 2022-06-14 12:16, Zhiyong Ye wrote:
> After creating the PV and VG based on the iSCSI device, I created the
> thin pool as follows:
> lvcreate -n pool -L 1000G test-vg
> lvcreate -n poolmeta -L 100G test-vg
> lvconvert --type thin-pool --chunksize 64k --poolmetadata
> test-vg/poolmeta test-vg/pool
> lvchange -Z n test-vg/pool

I did my performance tests with a bigger chunk size, in the 128-512K 
range. It can very well be that the overhead of a smaller chunk size 
results in 10x lower IOPS for to-be-allocated-and-copied chunks. Can you 
retry fio after increasing the chunk size?

As a side note, if I remember correctly, thin pool metadata is hard 
limited to 16 GB - no need to allocate 100 GB for it.
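
As a quick check (a sketch, assuming thin-provisioning-tools is 
installed, which ships thin_metadata_size), the expected metadata 
footprint can be estimated with:
thin_metadata_size --block-size=64k --pool-size=1000g --max-thins=1000 --unit=g
which for a 1000 GB pool with 64k chunks should come out far below the 
16 GB ceiling.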

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-14 12:56     ` Gionatan Danti
@ 2022-06-14 13:29       ` Zhiyong Ye
  2022-06-14 14:54         ` Gionatan Danti
  0 siblings, 1 reply; 15+ messages in thread
From: Zhiyong Ye @ 2022-06-14 13:29 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: LVM general discussion and development


On 6/14/22 8:56 PM, Gionatan Danti wrote:
> On 2022-06-14 12:16, Zhiyong Ye wrote:
>> After creating the PV and VG based on the iSCSI device, I created the
>> thin pool as follows:
>> lvcreate -n pool -L 1000G test-vg
>> lvcreate -n poolmeta -L 100G test-vg
>> lvconvert --type thin-pool --chunksize 64k --poolmetadata
>> test-vg/poolmeta test-vg/pool
>> lvchange -Z n test-vg/pool
> 
> I did my performance tests with a bigger chunk size, in the 128-512K
> range. It can very well be that the overhead of a smaller chunk size
> results in 10x lower IOPS for to-be-allocated-and-copied chunks. Can you
> retry fio after increasing the chunk size?

Yes, I also tested write performance after creating snapshots with 
different chunk sizes. But the data shows that the larger the chunk 
size, the worse the performance. The data is shown below:
chunksize               iops
64k                     5245
256k                    2115
1024k                   509

The reason may be that once the volume has a snapshot, each write to an 
existing block triggers a COW (copy-on-write), and the COW copies an 
entire chunk-sized data block: for example, with a 64k chunk size, even 
if only 4k of data is written, the whole 64k block is copied. I'm not 
sure if I understand this correctly.
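
A rough back-of-the-envelope view of the first 4k write to a shared 
chunk (ignoring metadata I/O, and assuming the whole chunk is copied as 
described above):
64k chunk:    read 64k  + write 64k  ->  ~32x the 4k payload moved
256k chunk:   read 256k + write 256k -> ~128x
1024k chunk:  read 1M   + write 1M   -> ~512x
which matches the direction of the iops numbers above, even if not the 
exact ratios.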


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-14 13:29       ` Zhiyong Ye
@ 2022-06-14 14:54         ` Gionatan Danti
  2022-06-15  7:42           ` Zhiyong Ye
  0 siblings, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-14 14:54 UTC (permalink / raw)
  To: Zhiyong Ye; +Cc: LVM general discussion and development

On 2022-06-14 15:29, Zhiyong Ye wrote:
> The reason may be that once the volume has a snapshot, each write to an
> existing block triggers a COW (copy-on-write), and the COW copies an
> entire chunk-sized data block: for example, with a 64k chunk size, even
> if only 4k of data is written, the whole 64k block is copied. I'm not
> sure if I understand this correctly.

Yes, in your case, the added copies are lowering total available IOPS. 
But note how the decrease is sub-linear (from 64K to 1M you have a 16x 
increase in chunk size but "only" a 10x hit in IOPS): this is due to the 
lower metadata overhead.

A last try: if you can, please regenerate your thin volume with 64K 
chunks and set fio to execute 64K requests. Let's see if LVM is at least 
smart enough to avoid copying to-be-completely-overwritten chunks.
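
For example, keeping the job file posted earlier, only the global block 
size needs to change (a sketch):
; was bs=4k - with a 64k chunk size every write covers a whole chunk
bs=64k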

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-14 14:54         ` Gionatan Danti
@ 2022-06-15  7:42           ` Zhiyong Ye
  2022-06-15  9:34             ` Gionatan Danti
  2022-06-16  7:53             ` Demi Marie Obenour
  0 siblings, 2 replies; 15+ messages in thread
From: Zhiyong Ye @ 2022-06-15  7:42 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: LVM general discussion and development



On 6/14/22 10:54 PM, Gionatan Danti wrote:
> On 2022-06-14 15:29, Zhiyong Ye wrote:
>> The reason may be that once the volume has a snapshot, each write to an
>> existing block triggers a COW (copy-on-write), and the COW copies an
>> entire chunk-sized data block: for example, with a 64k chunk size, even
>> if only 4k of data is written, the whole 64k block is copied. I'm not
>> sure if I understand this correctly.
> 
> Yes, in your case, the added copies are lowering total available IOPS. 
> But note how the decrease is sub-linear (from 64K to 1M you have a 16x 
> increase in chunk size but "only" a 10x hit in IOPS): this is due to the 
> lower metadata overhead.

It seems that the cost of the COW copies for 4k requests is much greater 
than the loss from metadata overhead.

> A last try: if you can, please regenerate your thin volume with 64K 
> chunks and set fio to execute 64K requests. Let's see if LVM is at least 
> smart enough to avoid copying to-be-completely-overwritten chunks.

I regenerated the thin volume with a 64K chunk size, and the random-write 
performance with 64k fio requests is as follows:
case                    iops
thin lv                 9381
snapshotted thin lv     8307


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-15  7:42           ` Zhiyong Ye
@ 2022-06-15  9:34             ` Gionatan Danti
  2022-06-15  9:46               ` Zhiyong Ye
  2022-06-16  7:53             ` Demi Marie Obenour
  1 sibling, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-15  9:34 UTC (permalink / raw)
  To: Zhiyong Ye; +Cc: LVM general discussion and development

On 2022-06-15 09:42, Zhiyong Ye wrote:
> I regenerated the thin volume with a 64K chunk size, and the random-write
> performance with 64k fio requests is as follows:
> case                    iops
> thin lv                 9381
> snapshotted thin lv     8307

As expected, increasing I/O size (to avoid r/m/w) greatly reduced the 
issue (the ~11% hit is due to metadata allocation overhead).

I don't see anything wrong, so I think you have to live with the 
previously recorded 10x performance hit when overwriting 4K blocks on a 
64K chunk size thin volume...

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-15  9:34             ` Gionatan Danti
@ 2022-06-15  9:46               ` Zhiyong Ye
  2022-06-15 12:40                 ` Gionatan Danti
  0 siblings, 1 reply; 15+ messages in thread
From: Zhiyong Ye @ 2022-06-15  9:46 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: LVM general discussion and development



On 6/15/22 5:34 PM, Gionatan Danti wrote:
> On 2022-06-15 09:42, Zhiyong Ye wrote:
>> I regenerated the thin volume with a 64K chunk size, and the random-write
>> performance with 64k fio requests is as follows:
>> case                    iops
>> thin lv                 9381
>> snapshotted thin lv     8307
> 
> As expected, increasing I/O size (to avoid r/m/w) greatly reduced the 
> issue (the ~11% hit is due to metadata allocation overhead).
> 
> I don't see anything wrong, so I think you have to live with the 
> previously recorded 10x performance hit when overwriting 4K blocks on a 
> 64K chunk size thin volume...

I also think it meets expectations. But is there any other way to 
optimize snapshot performance at the code level? Would it help to reduce 
the chunk size in the code? I see in the documentation that the chunk 
size can only be 64k at minimum.


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-15  9:46               ` Zhiyong Ye
@ 2022-06-15 12:40                 ` Gionatan Danti
  2022-06-15 16:39                   ` Demi Marie Obenour
  0 siblings, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-15 12:40 UTC (permalink / raw)
  To: Zhiyong Ye; +Cc: LVM general discussion and development

On 2022-06-15 11:46, Zhiyong Ye wrote:
> I also think it meets expectations. But is there any other way to
> optimize snapshot performance at the code level? Would it help to reduce
> the chunk size in the code? I see in the documentation that the chunk
> size can only be 64k at minimum.

I don't think forcing the code to use a smaller chunk size is a good 
idea. Considering the hard limit on metadata size (16 GB max), 64K chunks 
are good for a ~16 TB thin pool - already relatively small.

A 16K chunk size, say, would be good for a 4 TB pool only, and so on. 
Moreover, sequential performance would suffer significantly.

I think you have to accept the performance hit on first chunk 
allocation & rewrite.
Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-15 12:40                 ` Gionatan Danti
@ 2022-06-15 16:39                   ` Demi Marie Obenour
  0 siblings, 0 replies; 15+ messages in thread
From: Demi Marie Obenour @ 2022-06-15 16:39 UTC (permalink / raw)
  To: LVM general discussion and development, Zhiyong Ye



On Wed, Jun 15, 2022 at 02:40:29PM +0200, Gionatan Danti wrote:
> On 2022-06-15 11:46, Zhiyong Ye wrote:
> > I also think it meets expectations. But is there any other way to
> > optimize snapshot performance at the code level? Would it help to reduce
> > the chunk size in the code? I see in the documentation that the chunk
> > size can only be 64k at minimum.
> 
> I don't think forcing the code to use a smaller chunk size is a good idea.
> Considering the hard limit on metadata size (16 GB max), 64K chunks are good
> for a ~16 TB thin pool - already relatively small.
> 
> A 16K chunk size, say, would be good for a 4 TB pool only, and so on.
> Moreover, sequential performance would suffer significantly.
> 
> I think you have to accept the performance hit on first chunk allocation &
> rewrite.

I seriously hope this will be fixed in dm-thin v2.  It’s a significant
problem for Qubes OS.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-15  7:42           ` Zhiyong Ye
  2022-06-15  9:34             ` Gionatan Danti
@ 2022-06-16  7:53             ` Demi Marie Obenour
  2022-06-16 13:22               ` Gionatan Danti
  1 sibling, 1 reply; 15+ messages in thread
From: Demi Marie Obenour @ 2022-06-16  7:53 UTC (permalink / raw)
  To: LVM general discussion and development, Gionatan Danti



On Wed, Jun 15, 2022 at 03:42:17PM +0800, Zhiyong Ye wrote:
> 
> 
> On 6/14/22 10:54 PM, Gionatan Danti wrote:
> > On 2022-06-14 15:29, Zhiyong Ye wrote:
> > > The reason may be that once the volume has a snapshot, each write to an
> > > existing block triggers a COW (copy-on-write), and the COW copies an
> > > entire chunk-sized data block: for example, with a 64k chunk size, even
> > > if only 4k of data is written, the whole 64k block is copied. I'm not
> > > sure if I understand this correctly.
> > 
> > Yes, in your case, the added copies are lowering total available IOPS.
> > But note how the decrease is sub-linear (from 64K to 1M you have a 16x
> > increase in chunk size but "only" a 10x hit in IOPS): this is due to the
> > lower metadata overhead.
> 
> It seems that the cost of the COW copies for 4k requests is much greater
> than the loss from metadata overhead.
> 
> > A last try: if you can, please regenerate your thin volume with 64K
> > chunks and set fio to execute 64K requests. Let's see if LVM is at least
> > smart enough to avoid copying to-be-completely-overwritten chunks.
> 
> I regenerated the thin volume with a 64K chunk size, and the random-write
> performance with 64k fio requests is as follows:
> case                    iops
> thin lv                 9381
> snapshotted thin lv     8307

That seems reasonable.  My conclusion is that dm-thin (which is what LVM
uses) is not a good fit for workloads with a lot of small random writes
and frequent snapshots, due to the 64k minimum chunk size.  This also
explains why dm-thin does not allow smaller blocks: not only would it
only support very small thin pools, it would also have massive metadata
write overhead.  Hopefully dm-thin v2 will improve the situation.
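
Putting rough numbers on that (reusing the ~64 bytes of metadata per 
mapped chunk assumed earlier in the thread):
16 GiB / 64 B  = 256 Mi mappings
256 Mi * 4 KiB = 1 TiB maximum pool size with 4k chunks
versus ~16 TiB with the current 64k minimum.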

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-16  7:53             ` Demi Marie Obenour
@ 2022-06-16 13:22               ` Gionatan Danti
  2022-06-16 16:19                 ` Demi Marie Obenour
  0 siblings, 1 reply; 15+ messages in thread
From: Gionatan Danti @ 2022-06-16 13:22 UTC (permalink / raw)
  To: Demi Marie Obenour; +Cc: LVM general discussion and development

On 2022-06-16 09:53, Demi Marie Obenour wrote:
> That seems reasonable.  My conclusion is that dm-thin (which is what 
> LVM
> uses) is not a good fit for workloads with a lot of small random writes
> and frequent snapshots, due to the 64k minimum chunk size.  This also
> explains why dm-thin does not allow smaller blocks: not only would it
> only support very small thin pools, it would also have massive metadata
> write overhead.  Hopefully dm-thin v2 will improve the situation.

I think that, in this case, no free lunch really exists. I tried the 
following thin provisioning methods, each with its strong & weak points:

lvmthin: probably the most flexible of the mainline kernel options. You 
pay for r/m/w only when allocating a small block (say 4K) the first time 
after taking a snapshot. It is fast and well integrated with the lvm 
command line. Con: bad behavior on out-of-space conditions

xfs + reflink: a great, simple to use tool when applicable (see the 
short example after this list). It has a very small granularity (4K) 
with no r/m/w. Cons: requires fine tuning 
for good performance when reflinking big files; IO freezes during 
metadata copy for reflink; a very small granularity means sequential IO 
is going to suffer heavily (see here for more details: 
https://marc.info/?l=linux-xfs&m=157891132109888&w=2)

btrfs: very small granularity (4K) and many integrated features. Cons: 
bad performance overall, especially when using mechanical HDD

vdo: provides small-granularity (4K) thin provisioning, compression 
and deduplication. Cons: (still) out-of-tree; requires a power-loss-
protected writeback cache to maintain good performance; no snapshot 
capability

zfs: designed from the ground up for pervasive CoW, with many features 
and ARC/L2ARC. Cons: out-of-tree; using small granularity (4K) means bad 
overall performance; using big granularity (128K by default) is a 
necessary compromise for most HDD pools.
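
A minimal reflink example for the xfs option above (a sketch; assumes a 
recent xfsprogs, where reflink=1 is the mkfs default, and the device and 
file names are made up):
mkfs.xfs -m reflink=1 /dev/vg/data
mount /dev/vg/data /mnt
cp --reflink=always /mnt/vm.img /mnt/vm-clone.img   # instant CoW copy, blocks shared until rewritten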

For what it is worth, I settled on ZFS when using out-of-tree modules is 
not an issue and lvmthin otherwise (but I plan to use xfs + reflink more 
in the future).

Do you have any information to share about dm-thin v2? I heard about it 
some years ago, but I found no recent info.

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-16 13:22               ` Gionatan Danti
@ 2022-06-16 16:19                 ` Demi Marie Obenour
  2022-06-16 19:50                   ` Gionatan Danti
  0 siblings, 1 reply; 15+ messages in thread
From: Demi Marie Obenour @ 2022-06-16 16:19 UTC (permalink / raw)
  To: LVM general discussion and development



On Thu, Jun 16, 2022 at 03:22:09PM +0200, Gionatan Danti wrote:
> On 2022-06-16 09:53, Demi Marie Obenour wrote:
> > That seems reasonable.  My conclusion is that dm-thin (which is what LVM
> > uses) is not a good fit for workloads with a lot of small random writes
> > and frequent snapshots, due to the 64k minimum chunk size.  This also
> > explains why dm-thin does not allow smaller blocks: not only would it
> > only support very small thin pools, it would also have massive metadata
> > write overhead.  Hopefully dm-thin v2 will improve the situation.
> 
> I think that, in this case, no free lunch really exists. I tried the
> following thin provisioning methods, each with its strong & weak points:
> 
> lvmthin: probably the most flexible of the mainline kernel options. You pay
> for r/m/w only when allocating a small block (say 4K) the first time after
> taking a snapshot. It is fast and well integrated with the lvm command line.
> Con: bad behavior on out-of-space conditions

Also, the LVM command line is slow, and there is very large write
amplification with lots of random writes immediately after taking a
snapshot.  Furthermore, because of the mismatch between the dm-thin
block size and the filesystem block size, fstrim might not reclaim as
much space in the pool as one would expect.
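
One way to see that mismatch (a sketch; device and mount point names are 
illustrative):
lsblk -D /dev/test-vg/test-thin   # DISC-GRAN reports the pool chunk size (64k or more)
fstrim -v /mnt                    # only chunks that are completely free return to the pool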

> xfs + reflink: a great, simple to use tool when applicable. It has a very
> small granularity (4K) with no r/m/w. Cons: requires fine tuning for good
> performance when reflinking big files; IO freezes during metadata copy for
> reflink; a very small granularity means sequential IO is going to suffer
> heavily (see here for more details:
> https://marc.info/?l=linux-xfs&m=157891132109888&w=2)

Also heavy fragmentation can make journal replay very slow, to the point
of taking days on spinning hard drives.  Dave Chinner explains this here:
https://lore.kernel.org/linux-xfs/20220509230918.GP1098723@dread.disaster.area/.

> btrfs: very small granularity (4K) and many integrated features. Cons: bad
> performance overall, especially when using mechanical HDD

Also poor out-of-space handling and unbounded worst-case latency.

> vdo: provides small-granularity (4K) thin provisioning, compression and
> deduplication. Cons: (still) out-of-tree; requires a power-loss-protected
> writeback cache to maintain good performance; no snapshot capability
> 
> zfs: designed from the ground up for pervasive CoW, with many features and
> ARC/L2ARC. Cons: out-of-tree; using small granularity (4K) means bad overall
> performance; using big granularity (128K by default) is a necessary
> compromise for most HDD pools.

Is this still a problem on NVMe storage?  HDDs will not really be fast
no matter what one does, at least unless there is a write-back cache
that can convert random I/O to sequential I/O.  Even that only helps
much if your working set fits in cache, or if your workload is
write-mostly.

> For what it is worth, I settled on ZFS when using out-of-tree modules is not
> an issue and lvmthin otherwise (but I plan to use xfs + reflink more in the
> future).
> 
> Do you have any information to share about dm-thin v2? I heard about it some
> years ago, but I found no recent info.

It does not exist yet.  Joe Thornber would be the person to ask
regarding any plans to create it.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-16 16:19                 ` Demi Marie Obenour
@ 2022-06-16 19:50                   ` Gionatan Danti
  0 siblings, 0 replies; 15+ messages in thread
From: Gionatan Danti @ 2022-06-16 19:50 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Demi Marie Obenour

On 2022-06-16 18:19, Demi Marie Obenour wrote:
> Also heavy fragmentation can make journal replay very slow, to the 
> point
> of taking days on spinning hard drives.  Dave Chinner explains this 
> here:
> https://lore.kernel.org/linux-xfs/20220509230918.GP1098723@dread.disaster.area/.

Thanks, the linked thread was very interesting.

> Also poor out-of-space handling and unbounded worst-case latency.

Very true.

> Is this still a problem on NVMe storage?  HDDs will not really be fast
> no matter what one does, at least unless there is a write-back cache
> that can convert random I/O to sequential I/O.  Even that only helps
> much if your working set fits in cache, or if your workload is
> write-mostly.

One of the key features of ZFS is to transform random writes into 
sequential ones. With the right recordsize, and coupled with prefetch, 
compressed ARC and L2ARC, even an HDD pool can be surprisingly usable.

For NVMe pools you should use a much lower recordsize to avoid 
read/write amplification, but not lower than 16K to not impair 
compression efficiency (unless you are storing mostly uncompressible 
stuff). That said, for pure NVMe storage (no compression or other data 
transformations) I think XFS, possibly with direct IO, is the fastest 
choice by a factor of 2x.
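
For example, tuning is just a per-dataset property change (a sketch; the 
pool/dataset names are made up):
zfs set recordsize=16K tank/nvme-vols
zfs get recordsize,compression tank/nvme-vols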

> It does not exist yet.  Joe Thornber would be the person to ask
> regarding any plans to create it.

OK - I was hoping I had just missed something, but that is not the case.
Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

