* [linux-lvm] Why is the performance of my lvmthin snapshot so poor
@ 2022-06-13  8:49 Zhiyong Ye
  2022-06-14  7:04 ` Gionatan Danti
  0 siblings, 1 reply; 15+ messages in thread

From: Zhiyong Ye @ 2022-06-13  8:49 UTC (permalink / raw)
To: linux-lvm

Hi all,

I am new to lvmthin. When I create snapshots using lvmthin, the write
performance of the origin LV is poor.

After creating a thin LV with zeroing disabled, I first write the whole
volume with fio, then create a snapshot, and finally test the write
performance of the volume with fio again. The performance after creating
a snapshot is very poor (only 10% of a thick LV), and also much worse
than the first write to the thin LV. The fio random write performance
data in my environment is as follows:

case                  iops
thick lv              63043
thin lv               42130
snapshotted thin lv    5245

The lvmthin man page mentions under "Chunk size" that the chunk size has
an impact on snapshot performance, so I tested the write performance
after creating snapshots with different chunk sizes. The data is shown
below:

chunksize   iops
64k          5245
256k         2115
1024k         509

Some performance degradation after snapshotting is expected, as writing
to a snapshotted LV involves reading the original data, writing it
elsewhere, and then writing the new data into the original chunk. But
the performance loss was much more than I expected. Is there any way to
improve performance after creating a snapshot? Can I ask for your help?

Regards,

Zhiyong Ye

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-13  8:49 [linux-lvm] Why is the performance of my lvmthin snapshot so poor Zhiyong Ye
@ 2022-06-14  7:04 ` Gionatan Danti
  2022-06-14 10:16   ` Zhiyong Ye
  0 siblings, 1 reply; 15+ messages in thread

From: Gionatan Danti @ 2022-06-14  7:04 UTC (permalink / raw)
To: LVM general discussion and development; +Cc: Zhiyong Ye

Il 2022-06-13 10:49 Zhiyong Ye ha scritto:
> Some performance degradation after snapshotting is expected, as writing
> to a snapshotted LV involves reading the original data, writing it
> elsewhere, and then writing the new data into the original chunk. But
> the performance loss was much more than I expected. Is there any way to
> improve performance after creating a snapshot? Can I ask for your
> help?

This is the key point: when first writing to a new chunk, not only does
it need to be allocated, but the old data must be copied. This r/m/w
operation turns an async operation (a write) into a sync one (a read,
then a write), ruining performance. Subsequent writes to the same chunk
do not have the same issue.

The magnitude of the slowdown seems somewhat excessive, though. When
dealing with HDD pools, I remember a 3-5x impact on IOPS. Can you show
the exact fio command and the parameters of your thin pool (ie: chunk
size) and storage subsystem (HDD vs SSD, SATA vs SAS vs NVMe)?

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-14  7:04 ` Gionatan Danti
@ 2022-06-14 10:16   ` Zhiyong Ye
  2022-06-14 12:56     ` Gionatan Danti
  0 siblings, 1 reply; 15+ messages in thread

From: Zhiyong Ye @ 2022-06-14 10:16 UTC (permalink / raw)
To: Gionatan Danti, LVM general discussion and development

Hi Gionatan,

Thanks for your reply and detailed answer.

I actually use iSCSI for my underlying storage, and the bare disk random
write IOPS is 65543. Some of the parameters of my iSCSI initiator are as
follows:

node.session.iscsi.FirstBurstLength = 524288
node.session.iscsi.MaxBurstLength = 33552384
node.session.cmds_max = 4096
node.session.queue_depth = 1024

After creating the PV and VG on the iSCSI device, I created the thin
pool as follows:

lvcreate -n pool -L 1000G test-vg
lvcreate -n poolmeta -L 100G test-vg
lvconvert --type thin-pool --chunksize 64k --poolmetadata test-vg/poolmeta test-vg/pool
lvchange -Z n test-vg/pool

Then I create a thin LV in the thin pool:

lvcreate -n test-thin -V 500G --thinpool pool test-vg

And my command for creating snapshots:

lvcreate -n test-thin1s1 -s test-vg/test-thin

I use the following fio job file for all tests:

[global]
bs=4k
direct=1
iodepth=32
numjobs=8
ioengine=libaio
group_reporting
runtime=120
time_based
filename=/dev/vdb

[rand-write]
name=rand-write
rw=randwrite
stonewall

Thanks again!

Zhiyong Ye

在 6/14/22 3:04 PM, Gionatan Danti 写道:
> Il 2022-06-13 10:49 Zhiyong Ye ha scritto:
>> Some performance degradation after snapshotting is expected, as
>> writing to a snapshotted LV involves reading the original data,
>> writing it elsewhere, and then writing the new data into the original
>> chunk. But the performance loss was much more than I expected. Is
>> there any way to improve performance after creating a snapshot? Can I
>> ask for your help?
>
> This is the key point: when first writing to a new chunk, not only
> does it need to be allocated, but the old data must be copied. This
> r/m/w operation turns an async operation (a write) into a sync one (a
> read, then a write), ruining performance. Subsequent writes to the
> same chunk do not have the same issue.
>
> The magnitude of the slowdown seems somewhat excessive, though. When
> dealing with HDD pools, I remember a 3-5x impact on IOPS. Can you show
> the exact fio command and the parameters of your thin pool (ie: chunk
> size) and storage subsystem (HDD vs SSD, SATA vs SAS vs NVMe)?
>
> Regards.
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-14 10:16 ` Zhiyong Ye
@ 2022-06-14 12:56   ` Gionatan Danti
  2022-06-14 13:29     ` Zhiyong Ye
  0 siblings, 1 reply; 15+ messages in thread

From: Gionatan Danti @ 2022-06-14 12:56 UTC (permalink / raw)
To: Zhiyong Ye; +Cc: LVM general discussion and development

Il 2022-06-14 12:16 Zhiyong Ye ha scritto:
> After creating the PV and VG on the iSCSI device, I created the thin
> pool as follows:
>
> lvcreate -n pool -L 1000G test-vg
> lvcreate -n poolmeta -L 100G test-vg
> lvconvert --type thin-pool --chunksize 64k --poolmetadata test-vg/poolmeta test-vg/pool
> lvchange -Z n test-vg/pool

I did my performance tests with bigger chunk sizes, in the range of
128-512K. It can very well be that the overhead of a smaller chunk size
results in 10x lower IOPS for to-be-allocated-and-copied chunks. Can you
retry fio after increasing the chunk size?

As a side note, if I remember correctly thin pool metadata is hard
limited to 16 GB - no need to allocate 100 GB for it.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-14 12:56 ` Gionatan Danti
@ 2022-06-14 13:29   ` Zhiyong Ye
  2022-06-14 14:54     ` Gionatan Danti
  0 siblings, 1 reply; 15+ messages in thread

From: Zhiyong Ye @ 2022-06-14 13:29 UTC (permalink / raw)
To: Gionatan Danti; +Cc: LVM general discussion and development

在 6/14/22 8:56 PM, Gionatan Danti 写道:
> Il 2022-06-14 12:16 Zhiyong Ye ha scritto:
>> After creating the PV and VG on the iSCSI device, I created the thin
>> pool as follows:
>>
>> lvcreate -n pool -L 1000G test-vg
>> lvcreate -n poolmeta -L 100G test-vg
>> lvconvert --type thin-pool --chunksize 64k --poolmetadata test-vg/poolmeta test-vg/pool
>> lvchange -Z n test-vg/pool
>
> I did my performance tests with bigger chunk sizes, in the range of
> 128-512K. It can very well be that the overhead of a smaller chunk
> size results in 10x lower IOPS for to-be-allocated-and-copied chunks.
> Can you retry fio after increasing the chunk size?

Yes, I also tested the write performance after creating snapshots with
different chunk sizes. But the data shows that the larger the chunk
size, the worse the performance:

chunksize   iops
64k          5245
256k         2115
1024k         509

The reason for this may be that when the volume has a snapshot, each
write to an existing block causes a COW (copy-on-write), and the COW
copies an entire chunk-sized data block. For example, when the chunk
size is 64k, even if only 4k of data is written, the entire 64k data
block will be copied. I'm not sure if I understand this correctly.
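[Editorial note: the per-write copy cost described in this message can be sketched with a quick calculation. This is a simplified model (first 4 KiB write to a still-shared chunk forces a copy of the whole chunk), not a measurement of dm-thin internals; the IOPS values are the ones quoted earlier in the thread.]

```python
# Simplified COW model: a 4 KiB write to a chunk still shared with a
# snapshot must first copy the whole chunk, so the data moved per write
# grows linearly with the chunk size.
WRITE_KIB = 4

# (chunk size in KiB, IOPS measured earlier in this thread)
for chunk_kib, measured_iops in [(64, 5245), (256, 2115), (1024, 509)]:
    amplification = chunk_kib // WRITE_KIB  # chunk bytes moved per 4K write
    print(f"chunk={chunk_kib:>4}K  copy amplification={amplification:>3}x  "
          f"measured iops={measured_iops}")
```

Note how the measured IOPS drop more slowly than the amplification grows (16x more copying costs "only" ~10x IOPS), which is the sub-linear behavior discussed in the next message.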
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-14 13:29 ` Zhiyong Ye
@ 2022-06-14 14:54   ` Gionatan Danti
  2022-06-15  7:42     ` Zhiyong Ye
  0 siblings, 1 reply; 15+ messages in thread

From: Gionatan Danti @ 2022-06-14 14:54 UTC (permalink / raw)
To: Zhiyong Ye; +Cc: LVM general discussion and development

Il 2022-06-14 15:29 Zhiyong Ye ha scritto:
> The reason for this may be that when the volume has a snapshot, each
> write to an existing block causes a COW (copy-on-write), and the COW
> copies an entire chunk-sized data block. For example, when the chunk
> size is 64k, even if only 4k of data is written, the entire 64k data
> block will be copied. I'm not sure if I understand this correctly.

Yes, in your case, the added copies are lowering total available IOPS.
But note how the decrease is sub-linear (from 64K to 1M you have a 16x
increase in chunk size but "only" a 10x hit in IOPS): this is due to the
lowered metadata overhead.

A last try: if you can, please regenerate your thin volume with 64K
chunks and set fio to execute 64K requests. Let's see if LVM is at least
smart enough to avoid copying a to-be-completely-overwritten chunk.

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-14 14:54 ` Gionatan Danti
@ 2022-06-15  7:42   ` Zhiyong Ye
  2022-06-15  9:34     ` Gionatan Danti
  2022-06-16  7:53     ` Demi Marie Obenour
  0 siblings, 2 replies; 15+ messages in thread

From: Zhiyong Ye @ 2022-06-15  7:42 UTC (permalink / raw)
To: Gionatan Danti; +Cc: LVM general discussion and development

在 6/14/22 10:54 PM, Gionatan Danti 写道:
> Il 2022-06-14 15:29 Zhiyong Ye ha scritto:
>> The reason for this may be that when the volume has a snapshot, each
>> write to an existing block causes a COW (copy-on-write), and the COW
>> copies an entire chunk-sized data block. For example, when the chunk
>> size is 64k, even if only 4k of data is written, the entire 64k data
>> block will be copied. I'm not sure if I understand this correctly.
>
> Yes, in your case, the added copies are lowering total available IOPS.
> But note how the decrease is sub-linear (from 64K to 1M you have a 16x
> increase in chunk size but "only" a 10x hit in IOPS): this is due to
> the lowered metadata overhead.

It seems that the cost of the COW copies when sending 4k requests is
much greater than the loss from metadata.

> A last try: if you can, please regenerate your thin volume with 64K
> chunks and set fio to execute 64K requests. Let's see if LVM is at
> least smart enough to avoid copying a to-be-completely-overwritten
> chunk.

I regenerated the thin volume with a chunk size of 64K, and the random
write performance data tested with fio 64k requests is as follows:

case                 iops
thin lv              9381
snapshotted thin lv  8307
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-15  7:42 ` Zhiyong Ye
@ 2022-06-15  9:34   ` Gionatan Danti
  2022-06-15  9:46     ` Zhiyong Ye
  1 sibling, 1 reply; 15+ messages in thread

From: Gionatan Danti @ 2022-06-15  9:34 UTC (permalink / raw)
To: Zhiyong Ye; +Cc: LVM general discussion and development

Il 2022-06-15 09:42 Zhiyong Ye ha scritto:
> I regenerated the thin volume with a chunk size of 64K, and the random
> write performance data tested with fio 64k requests is as follows:
>
> case                 iops
> thin lv              9381
> snapshotted thin lv  8307

As expected, increasing the I/O size (to avoid r/m/w) greatly reduced
the issue (the ~11% hit is due to metadata allocation overhead).

I don't see anything wrong, so I think you have to live with the
previously recorded 10x performance hit when overwriting 4K blocks on a
64K chunk size thin volume...

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
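[Editorial note: the "~11%" figure is just the relative IOPS drop computed from the two numbers quoted in this message, as a quick back-of-the-envelope check:]

```python
# IOPS values from the 64K-request fio run quoted above.
thin_iops = 9381  # thin lv, no snapshot
snap_iops = 8307  # same volume after taking a snapshot

# Relative overhead attributed to metadata allocation after the snapshot.
overhead = (thin_iops - snap_iops) / thin_iops
print(f"snapshot overhead at 64K requests: {overhead:.1%}")  # -> 11.4%
```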
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-15  9:34 ` Gionatan Danti
@ 2022-06-15  9:46   ` Zhiyong Ye
  2022-06-15 12:40     ` Gionatan Danti
  0 siblings, 1 reply; 15+ messages in thread

From: Zhiyong Ye @ 2022-06-15  9:46 UTC (permalink / raw)
To: Gionatan Danti; +Cc: LVM general discussion and development

在 6/15/22 5:34 PM, Gionatan Danti 写道:
> Il 2022-06-15 09:42 Zhiyong Ye ha scritto:
>> I regenerated the thin volume with a chunk size of 64K, and the
>> random write performance data tested with fio 64k requests is as
>> follows:
>>
>> case                 iops
>> thin lv              9381
>> snapshotted thin lv  8307
>
> As expected, increasing the I/O size (to avoid r/m/w) greatly reduced
> the issue (the ~11% hit is due to metadata allocation overhead).
>
> I don't see anything wrong, so I think you have to live with the
> previously recorded 10x performance hit when overwriting 4K blocks on
> a 64K chunk size thin volume...

I also think it meets expectations. But is there any other way to
optimize snapshot performance at the code level? Would it help to reduce
the chunk size in the code? I see in the documentation that the chunk
size can be 64k at minimum.
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-15  9:46 ` Zhiyong Ye
@ 2022-06-15 12:40   ` Gionatan Danti
  2022-06-15 16:39     ` Demi Marie Obenour
  0 siblings, 1 reply; 15+ messages in thread

From: Gionatan Danti @ 2022-06-15 12:40 UTC (permalink / raw)
To: Zhiyong Ye; +Cc: LVM general discussion and development

Il 2022-06-15 11:46 Zhiyong Ye ha scritto:
> I also think it meets expectations. But is there any other way to
> optimize snapshot performance at the code level? Would it help to
> reduce the chunk size in the code? I see in the documentation that the
> chunk size can be 64k at minimum.

I don't think forcing the code to use a smaller chunk size is a good
idea. Considering the hard limit on metadata size (16 GB max), 64K
chunks are good for a ~16 TB thin pool - already relatively small.

A, say, 16K chunk size would be good for a 4 TB pool only, and so on.
Moreover, sequential performance would suffer significantly.

I think you have to accept the performance hit on first chunk allocation
& rewrite.

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
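[Editorial note: the pool-size arithmetic behind this message can be sketched as below. The ~64 bytes of metadata per mapped chunk is an assumed approximation chosen to reproduce the figures quoted here; the lvmthin(7) man page gives the authoritative sizing formula.]

```python
# Rough thin-pool sizing: with a hard metadata limit of 16 GiB and an
# assumed ~64 bytes of metadata per mapped chunk, the maximum fully
# mapped pool size scales linearly with the chunk size.
META_LIMIT_BYTES = 16 * 2**30  # dm-thin metadata hard limit: 16 GiB
BYTES_PER_MAPPING = 64         # assumed per-chunk metadata cost

max_chunks = META_LIMIT_BYTES // BYTES_PER_MAPPING

for chunk_kib in (16, 64, 256):
    pool_tib = max_chunks * chunk_kib * 1024 // 2**40
    print(f"chunk={chunk_kib:>3}K -> fully mapped pool limit ~ {pool_tib} TiB")
```

With these assumptions, 16K chunks cap a fully mapped pool at about 4 TiB and 64K chunks at about 16 TiB, matching the figures quoted in the message above.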
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-15 12:40 ` Gionatan Danti
@ 2022-06-15 16:39   ` Demi Marie Obenour
  0 siblings, 0 replies; 15+ messages in thread

From: Demi Marie Obenour @ 2022-06-15 16:39 UTC (permalink / raw)
To: LVM general discussion and development, Zhiyong Ye

On Wed, Jun 15, 2022 at 02:40:29PM +0200, Gionatan Danti wrote:
> Il 2022-06-15 11:46 Zhiyong Ye ha scritto:
> > I also think it meets expectations. But is there any other way to
> > optimize snapshot performance at the code level? Would it help to
> > reduce the chunk size in the code? I see in the documentation that
> > the chunk size can be 64k at minimum.
>
> I don't think forcing the code to use a smaller chunk size is a good
> idea. Considering the hard limit on metadata size (16 GB max), 64K
> chunks are good for a ~16 TB thin pool - already relatively small.
>
> A, say, 16K chunk size would be good for a 4 TB pool only, and so on.
> Moreover, sequential performance would suffer significantly.
>
> I think you have to accept the performance hit on first chunk
> allocation & rewrite.

I seriously hope this will be fixed in dm-thin v2. It’s a significant
problem for Qubes OS.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-15  7:42 ` Zhiyong Ye
  2022-06-15  9:34   ` Gionatan Danti
@ 2022-06-16  7:53   ` Demi Marie Obenour
  2022-06-16 13:22     ` Gionatan Danti
  1 sibling, 1 reply; 15+ messages in thread

From: Demi Marie Obenour @ 2022-06-16  7:53 UTC (permalink / raw)
To: LVM general discussion and development, Gionatan Danti

On Wed, Jun 15, 2022 at 03:42:17PM +0800, Zhiyong Ye wrote:
> 在 6/14/22 10:54 PM, Gionatan Danti 写道:
> > Il 2022-06-14 15:29 Zhiyong Ye ha scritto:
> > > The reason for this may be that when the volume has a snapshot,
> > > each write to an existing block causes a COW (copy-on-write), and
> > > the COW copies an entire chunk-sized data block. For example, when
> > > the chunk size is 64k, even if only 4k of data is written, the
> > > entire 64k data block will be copied. I'm not sure if I understand
> > > this correctly.
> >
> > Yes, in your case, the added copies are lowering total available
> > IOPS. But note how the decrease is sub-linear (from 64K to 1M you
> > have a 16x increase in chunk size but "only" a 10x hit in IOPS):
> > this is due to the lowered metadata overhead.
>
> It seems that the cost of the COW copies when sending 4k requests is
> much greater than the loss from metadata.
>
> > A last try: if you can, please regenerate your thin volume with 64K
> > chunks and set fio to execute 64K requests. Let's see if LVM is at
> > least smart enough to avoid copying a to-be-completely-overwritten
> > chunk.
>
> I regenerated the thin volume with a chunk size of 64K, and the random
> write performance data tested with fio 64k requests is as follows:
>
> case                 iops
> thin lv              9381
> snapshotted thin lv  8307

That seems reasonable. My conclusion is that dm-thin (which is what LVM
uses) is not a good fit for workloads with a lot of small random writes
and frequent snapshots, due to the 64k minimum chunk size. This also
explains why dm-thin does not allow smaller blocks: not only would it
only support very small thin pools, it would also have massive metadata
write overhead. Hopefully dm-thin v2 will improve the situation.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-16  7:53 ` Demi Marie Obenour
@ 2022-06-16 13:22   ` Gionatan Danti
  2022-06-16 16:19     ` Demi Marie Obenour
  0 siblings, 1 reply; 15+ messages in thread

From: Gionatan Danti @ 2022-06-16 13:22 UTC (permalink / raw)
To: Demi Marie Obenour; +Cc: LVM general discussion and development

Il 2022-06-16 09:53 Demi Marie Obenour ha scritto:
> That seems reasonable. My conclusion is that dm-thin (which is what
> LVM uses) is not a good fit for workloads with a lot of small random
> writes and frequent snapshots, due to the 64k minimum chunk size. This
> also explains why dm-thin does not allow smaller blocks: not only
> would it only support very small thin pools, it would also have
> massive metadata write overhead. Hopefully dm-thin v2 will improve the
> situation.

I think that, in this case, no free lunch really exists. I tried the
following thin provisioning methods, each with its strong & weak points:

lvmthin: probably the most flexible of the mainline kernel options. You
pay for r/m/w only when allocating a small block (say 4K) for the first
time after taking a snapshot. It is fast and well integrated with the
lvm command line. Con: bad behavior on out-of-space conditions.

xfs + reflink: a great, simple-to-use tool when applicable. It has a
very small granularity (4K) with no r/m/w. Cons: requires fine tuning
for good performance when reflinking big files; IO freezes during the
metadata copy for a reflink; a very small granularity means sequential
IO is going to suffer heavily (see here for more details:
https://marc.info/?l=linux-xfs&m=157891132109888&w=2).

btrfs: very small granularity (4K) and many integrated features. Cons:
bad performance overall, especially when using mechanical HDDs.

vdo: it provides small-granularity (4K) thin provisioning, compression
and deduplication. Cons: (still) out-of-tree; requires a power-loss
protected writeback cache to maintain good performance; no snapshot
capability.

zfs: designed from the ground up for pervasive CoW, with many features
and ARC/L2ARC. Cons: out-of-tree; using a small granularity (4K) means
bad overall performance; using a big granularity (128K by default) is a
necessary compromise for most HDD pools.

For what it is worth, I settled on ZFS when using out-of-tree modules is
not an issue and lvmthin otherwise (but I plan to use xfs + reflink more
in the future).

Do you have any information to share about dm-thin v2? I heard about it
some years ago, but I found no recent info.

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-16 13:22 ` Gionatan Danti
@ 2022-06-16 16:19   ` Demi Marie Obenour
  2022-06-16 19:50     ` Gionatan Danti
  0 siblings, 1 reply; 15+ messages in thread

From: Demi Marie Obenour @ 2022-06-16 16:19 UTC (permalink / raw)
To: LVM general discussion and development

On Thu, Jun 16, 2022 at 03:22:09PM +0200, Gionatan Danti wrote:
> I think that, in this case, no free lunch really exists. I tried the
> following thin provisioning methods, each with its strong & weak
> points:
>
> lvmthin: probably the most flexible of the mainline kernel options.
> You pay for r/m/w only when allocating a small block (say 4K) for the
> first time after taking a snapshot. It is fast and well integrated
> with the lvm command line. Con: bad behavior on out-of-space
> conditions.

Also, the LVM command line is slow, and there is very large write
amplification with lots of random writes immediately after taking a
snapshot. Furthermore, because of the mismatch between the dm-thin block
size and the filesystem block size, fstrim might not reclaim as much
space in the pool as one would expect.

> xfs + reflink: a great, simple-to-use tool when applicable. It has a
> very small granularity (4K) with no r/m/w. Cons: requires fine tuning
> for good performance when reflinking big files; IO freezes during the
> metadata copy for a reflink; a very small granularity means sequential
> IO is going to suffer heavily (see here for more details:
> https://marc.info/?l=linux-xfs&m=157891132109888&w=2).

Also, heavy fragmentation can make journal replay very slow, to the
point of taking days on spinning hard drives. Dave Chinner explains this
here:
https://lore.kernel.org/linux-xfs/20220509230918.GP1098723@dread.disaster.area/

> btrfs: very small granularity (4K) and many integrated features. Cons:
> bad performance overall, especially when using mechanical HDDs.

Also poor out-of-space handling and unbounded worst-case latency.

> vdo: it provides small-granularity (4K) thin provisioning, compression
> and deduplication. Cons: (still) out-of-tree; requires a power-loss
> protected writeback cache to maintain good performance; no snapshot
> capability.
>
> zfs: designed from the ground up for pervasive CoW, with many features
> and ARC/L2ARC. Cons: out-of-tree; using a small granularity (4K) means
> bad overall performance; using a big granularity (128K by default) is
> a necessary compromise for most HDD pools.

Is this still a problem on NVMe storage? HDDs will not really be fast no
matter what one does, at least unless there is a write-back cache that
can convert random I/O to sequential I/O. Even that only helps much if
your working set fits in cache, or if your workload is write-mostly.

> For what it is worth, I settled on ZFS when using out-of-tree modules
> is not an issue and lvmthin otherwise (but I plan to use xfs + reflink
> more in the future).
>
> Do you have any information to share about dm-thin v2? I heard about
> it some years ago, but I found no recent info.

It does not exist yet. Joe Thornber would be the person to ask regarding
any plans to create it.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
* Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor
  2022-06-16 16:19 ` Demi Marie Obenour
@ 2022-06-16 19:50   ` Gionatan Danti
  0 siblings, 0 replies; 15+ messages in thread

From: Gionatan Danti @ 2022-06-16 19:50 UTC (permalink / raw)
To: LVM general discussion and development; +Cc: Demi Marie Obenour

Il 2022-06-16 18:19 Demi Marie Obenour ha scritto:
> Also, heavy fragmentation can make journal replay very slow, to the
> point of taking days on spinning hard drives. Dave Chinner explains
> this here:
> https://lore.kernel.org/linux-xfs/20220509230918.GP1098723@dread.disaster.area/

Thanks, the linked thread was very interesting.

> Also poor out-of-space handling and unbounded worst-case latency.

Very true.

> Is this still a problem on NVMe storage? HDDs will not really be fast
> no matter what one does, at least unless there is a write-back cache
> that can convert random I/O to sequential I/O. Even that only helps
> much if your working set fits in cache, or if your workload is
> write-mostly.

One of the key features of ZFS is to transform random writes into
sequential ones. With the right recordsize, and coupled with prefetch,
compressed ARC and L2ARC, even an HDD pool can be surprisingly usable.

For NVMe pools you should use a much lower recordsize to avoid
read/write amplification, but not lower than 16K so as not to impair
compression efficiency (unless you are storing mostly incompressible
stuff). That said, for pure NVMe storage (no compression or other data
transformations) I think XFS, possibly with direct IO, is the fastest
choice by a factor of 2x.

> It does not exist yet. Joe Thornber would be the person to ask
> regarding any plans to create it.

Ok - I was hoping I had missed something, but that is not the case.
Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
end of thread, other threads:[~2022-06-16 19:51 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-13  8:49 [linux-lvm] Why is the performance of my lvmthin snapshot so poor Zhiyong Ye
2022-06-14  7:04 ` Gionatan Danti
2022-06-14 10:16   ` Zhiyong Ye
2022-06-14 12:56     ` Gionatan Danti
2022-06-14 13:29       ` Zhiyong Ye
2022-06-14 14:54         ` Gionatan Danti
2022-06-15  7:42           ` Zhiyong Ye
2022-06-15  9:34             ` Gionatan Danti
2022-06-15  9:46               ` Zhiyong Ye
2022-06-15 12:40                 ` Gionatan Danti
2022-06-15 16:39                   ` Demi Marie Obenour
2022-06-16  7:53           ` Demi Marie Obenour
2022-06-16 13:22             ` Gionatan Danti
2022-06-16 16:19               ` Demi Marie Obenour
2022-06-16 19:50                 ` Gionatan Danti