From: Dalebjörk, Tomas
Date: Tue, 22 Oct 2019 12:47:30 +0200
Subject: [linux-lvm] exposing snapshot block device
To: linux-lvm@redhat.com

Hi

When you create a snapshot of a logical volume, a new virtual dm device is created holding the changed content from the origin.

This cow device can then be used to read the changed contents, etc.


In case of an incident, this cow device can be used to read back the changed content to its origin using the "lvmerge" command.


The question I have is whether there is a way to couple an external cow device to an empty, equally sized logical volume,

so that the empty logical volume is aware that all the changed content is placed on this attached cow device?

If that is possible, then it would help to make instant recovery of LV volumes from an external source possible, for example from a backup server, using the native lvmerge command.


[EMPTY LOGICAL VOLUME]
        ^
        |
        | lvmerge
        |

[ATTACHED COW DEVICE]

Regards Tomas
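
For reference, this origin/COW pairing is exactly what the device-mapper "snapshot" target expresses at the dm level; whether lvm2's own tooling can adopt such a hand-built snapshot is the open question here. A rough sketch, with example device names and the old default 4KiB (8-sector) chunk size:

  # example names only: the origin LV and an external device that already
  # holds COW data in the dm-snapshot persistent exception format
  ORIGIN=/dev/vg/testlv
  COW=/dev/backup/cow
  SECTORS=$(blockdev --getsz "$ORIGIN")

  # expose a snapshot view whose changed chunks live on the external COW
  # table format: <start> <length> snapshot <origin> <cow> <P|N> <chunksize>
  dmsetup create testlv_snap --table "0 $SECTORS snapshot $ORIGIN $COW P 8"

  # the snapshot-merge target takes the same parameters and would copy the
  # COW content back into the origin instead:
  # dmsetup create testlv_merge --table "0 $SECTORS snapshot-merge $ORIGIN $COW P 8"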

From: Zdenek Kabelac
Date: Tue, 22 Oct 2019 15:57:12 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

Dne 22. 10. 19 v 12:47 Dalebjörk, Tomas napsal(a):
> The question I have is whether there is a way to couple an external cow
> device to an empty, equally sized logical volume, so that the empty logical
> volume is aware that all the changed content is placed on this attached cow
> device?
>
> If that is possible, then it would help to make instant recovery of LV
> volumes from an external source possible, for example from a backup server,
> using the native lvmerge command.

For most info on how the old snapshot for so-called 'thick' LVs works - check these papers: http://people.redhat.com/agk/talks/

lvconvert --merge

is in fact an 'instant' operation - when it happens - you can immediately access the 'already merged' content while the merge is happening in the background (you can look at the copy percentage in the lvs command).

However, 'thick' LVs with old snapshots are rather 'dated' technology; you should probably check out the usage of thinly provisioned LVs.

Regards

Zdenek

From: Dalebjörk, Tomas
Date: Tue, 22 Oct 2019 17:29:24 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

Thanks for feedback,

I know that thick LV snapshots are outdated, and that one should use thin LV snapshots. But my understanding is that the dm-cow and dm-origin are still present and available in thin too?

Example of a scenario:
1. Create a snapshot of LV testlv with the name snaplv
2. Perform a full copy of the snaplv using for example dd to a block device
3. Delete the snapshot

Now I would like to re-attach this external block device as a snapshot again. After all, it is just dm and LVM config, right?

So for example:
1. create a snapshot of testlv with the name snaplv
2. re-create the -cow metadata device: ...
   Recreate this -cow metadata device by telling the origin that all data has been changed and is in the cow device (the raw device)
3.
If the above were possible to perform, then it would be possible to instantly get a copy of the LV data using the lvconvert --merge command.

I have already invented a way to perform "block level incremental forever" using the -cow device, and a possibility to reverse the blocks, to copy back only changed content from external devices.

But it would be better if the cow device could be recreated in a faster way, stating that all blocks are present on an external device, so that the LV volume can be restored much quicker using the "lvconvert --merge" command.

That would be super cool!

Imagine backing up multi-terabyte sized volumes in minutes to external destinations, and restoring the data in seconds using instant recovery, by re-creating or emulating the cow device and associating all blocks with an external device?

Regards Tomas

From: Zdenek Kabelac
Date: Tue, 22 Oct 2019 17:36:50 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

Dne 22. 10. 19 v 17:29 Dalebjörk, Tomas napsal(a):
> But it would be better if the cow device could be recreated in a faster way,
> stating that all blocks are present on an external device, so that the LV
> volume can be restored much quicker using the "lvconvert --merge" command.
>
> That would be super cool!
>
> Imagine backing up multi-terabyte sized volumes in minutes to external
> destinations, and restoring the data in seconds using instant recovery, by
> re-creating or emulating the cow device and associating all blocks with an
> external device?

Hi

I do not want to break your imagination here, but that is exactly the thing you can do with thin provisioning and the thin_delta tool.

You just work with the LV, take snapshot1, take snapshot2, send the delta between s1 -> s2 to the remote machine, remove s1, take s3, send the delta s2 -> s3...

It's just not automated by lvm2 ATM...

Using this with old snapshots would be insanely inefficient...

Regards

Zdenek

From: Dalebjörk, Tomas
Date: Tue, 22 Oct 2019 18:13:25 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

That is cool.

But are there any practical examples of how this could work in reality?

Eg:
lvcreate -s mysnap vg/testlv
thin_dump vg/mysnap > deltafile  # I assume that this should be the name of the snapshot?

But... How to recreate only the metadata, so that the metadata changes are associated with an external device?

thin_restore -i metadata < deltafile  # that will restore the metadata, but I also want the restored metadata to point out the location of the data, from for example a file or a raw device

I have created a way to perform block level incremental forever by reading the -cow device, and thin_dump would be a nice replacement for that.

This can also be reversed, so that thin_restore can be used to restore the metadata and the data at the same time (if I know the format of it).

But it would be much better if one could do the restoration in the background using the "lvconvert --merge" tool, by first restoring the metadata (I can understand that this part is needed) and associating all the data with an external raw disk or, much better, a file, so that all changes associated with this restored snapshot can be found in the file.

Not so good at explaining this, but I hope you understand how I am thinking.

A destroyed thin pool can then be restored instantly using a backup server as the cow-similar device.

Regards Tomas
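
For reference, the snapshot/delta cycle described above could look roughly like this with the thin tools driven by hand; all names are examples (a thin LV vg/testlv in a pool vg/pool), the <id1>/<id2> placeholders come from lvs, and this is a sketch rather than an official lvm2 procedure:

  # take two thin snapshots some time apart
  lvcreate -s -n snap1 vg/testlv
  # ... later ...
  lvcreate -s -n snap2 vg/testlv

  # find the thin device ids that thin_delta wants
  lvs -o lv_name,thin_id vg

  # reserve a consistent metadata snapshot on the live pool, dump only the
  # blocks that differ between the two snapshots, then release it again
  dmsetup message vg-pool-tpool 0 reserve_metadata_snap
  thin_delta -m --snap1 <id1> --snap2 <id2> /dev/mapper/vg-pool_tmeta > delta.xml
  dmsetup message vg-pool-tpool 0 release_metadata_snap

  # ship delta.xml (plus the data blocks it references) to the backup server,
  # drop the older snapshot and keep snap2 as the new baseline
  lvremove vg/snap1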

From: Stuart D. Gathman
Date: Tue, 22 Oct 2019 12:15:06 -0400 (EDT)
Subject: Re: [linux-lvm] exposing snapshot block device

On Tue, 22 Oct 2019, Zdenek Kabelac wrote:

> Dne 22. 10. 19 v 17:29 Dalebjörk, Tomas napsal(a):
>> But it would be better if the cow device could be recreated in a faster
>> way, stating that all blocks are present on an external device, so that
>> the LV volume can be restored much quicker using the "lvconvert --merge"
>> command.
>
> I do not want to break your imagination here, but that is exactly the thing
> you can do with thin provisioning and the thin_delta tool.

lvconvert --merge does a "rollback" to the point at which the snapshot was taken. The master LV already has current data. What Tomas wants is to be able to do a "rollforward" from the point at which the snapshot was taken. He also wants to be able to put the cow volume on an external/remote medium, and add a snapshot using an already existing cow.

This way, restoring means copying the full volume from backup, creating a snapshot using the existing external cow, then lvconvert --merge instantly logically applies the cow changes while updating the master LV.

Pros:

"Old" snapshots are exactly as efficient as thin when there is exactly one. They only get inefficient with multiple snapshots. On the other hand, thin volumes are as inefficient as an old LV with one snapshot. An old LV is as efficient, and as anti-fragile, as a partition. Thin volumes are much more flexible, but depend on much more fragile, database-like metadata.

For this reason, I always prefer "old" LVs when the functionality of thin LVs is not actually needed. I can even manually recover from trashed metadata by editing it, as it is human-readable text.

Updates to the external cow can be pipelined (but then properly handling reads becomes non-trivial - there are mature remote block device implementations for Linux that will do the job).

Cons:

For the external cow to be useful, updates to it must be *strictly* serialized. This is doable, but not as obvious or trivial as it might seem at first glance. (Remote block device software will take care of this as well.)

The "rollforward" must be applied to the backup image of the snapshot.
If the admin gets it paired with the wrong backup, massive corruption ensues. This could be automated. E.g. the full image backup and the external cow would have unique matching names, or the full image backup could compute an md5 in parallel, which would be stored with the cow. But none of those tools currently exist.

--
Stuart D. Gathman
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

From: Gionatan Danti
Date: Tue, 22 Oct 2019 23:38:34 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

Hi,

Il 22-10-2019 18:15 Stuart D. Gathman ha scritto:
> "Old" snapshots are exactly as efficient as thin when there is exactly
> one. They only get inefficient with multiple snapshots. On the other
> hand, thin volumes are as inefficient as an old LV with one snapshot.
> An old LV is as efficient, and as anti-fragile, as a partition. Thin
> volumes are much more flexible, but depend on much more fragile,
> database-like metadata.

this is both true and false: while in the single-snapshot case performance remains acceptable even for fat snapshots, the btree representation (and more modern code) of the "new" (7+ years old now) thin snapshots guarantees significantly higher performance, at least in my tests.

Note #1: I know that the old snapshot code uses 4K chunks by default, versus the 64K chunks of thinsnap. That said, I recorded higher thinsnap performance even when using a 64K chunk size for old fat snapshots.

Note #2: I generally disable thinpool zeroing (as I use a filesystem layer on top of thin volumes).

I 100% agree that the old LVM code, with its plain-text metadata and continuous plain-text backups, is extremely reliable and easy to fix/correct.

> For this reason, I always prefer "old" LVs when the functionality of
> thin LVs is not actually needed. I can even manually recover from
> trashed metadata by editing it, as it is human-readable text.

My main use of fat logical volumes is for boot and root filesystems, while thin vols (and zfs datasets, but this is another story...) are used for data partitions.

The main thing that somewhat scares me is that (if things have not changed) thinvol uses a single root btree node: losing it means losing *all* thin volumes of a specific thin pool. Coupled with the fact that metadata dumps are not as handy as with the old LVM code (no vgcfgrestore), it worries me.
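
For reference, a metadata dump of a live pool is possible today with the thin-provisioning-tools, though it is clearly clumsier than vgcfgbackup/vgcfgrestore. A sketch, with example names, assuming a pool named vg/pool:

  # reserve a metadata snapshot so the live metadata can be read consistently
  dmsetup message vg-pool-tpool 0 reserve_metadata_snap
  thin_dump -m /dev/mapper/vg-pool_tmeta > pool-metadata.xml
  dmsetup message vg-pool-tpool 0 release_metadata_snap

  # the XML dump can later be written back onto a metadata device:
  # thin_restore -i pool-metadata.xml -o <metadata device>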

> The "rollforward" must be applied to the backup image of the snapshot.
> If the admin gets it paired with the wrong backup, massive corruption
> ensues. This could be automated. E.g. the full image backup and the
> external cow would have unique matching names, or the full image backup
> could compute an md5 in parallel, which would be stored with the cow.
> But none of those tools currently exist.

This is the reason why I have not used thin_delta in production: an error on my part in recovering the volume (i.e. applying the wrong delta) would cause massive data corruption.

My current setup for instant recovery *and* added resilience is somewhat similar to that: RAID -> DRBD -> THINPOOL -> THINVOL w/periodic snapshots (with the DRBD layer replicating to a sibling machine).

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

From: Stuart D. Gathman
Date: Tue, 22 Oct 2019 18:53:41 -0400 (EDT)
Subject: Re: [linux-lvm] exposing snapshot block device

On Tue, 22 Oct 2019, Gionatan Danti wrote:

> The main thing that somewhat scares me is that (if things have not changed)
> thinvol uses a single root btree node: losing it means losing *all* thin
> volumes of a specific thin pool. Coupled with the fact that metadata dumps
> are not as handy as with the old LVM code (no vgcfgrestore), it worries me.

If you can find all the leaf nodes belonging to the root (in my btree database they are marked with the root id and can be found by a sequential scan of the volume), then reconstructing the btree data is straightforward - even in place.

I remember realizing this was the only way to recover a major customer's data - and had the utility written, tested, and applied in a 36-hour programming marathon (which I hope never to repeat). If this hasn't occurred to the thin pool programmers, I am happy to flesh out the procedure. Having such a utility available as a last resort would ratchet up the reliability of thin pools.
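
For reference, the last-resort path that exists today is thin_check/thin_repair, which lvm2 wraps as lvconvert --repair; it rebuilds what it can from the on-disk metadata, which is not the leaf-scan reconstruction described above. A rough sketch with example names (the pool is assumed inactive and its metadata LV made accessible):

  # lvm2 wrapper: repair vg/pool metadata into the spare metadata LV
  lvconvert --repair vg/pool

  # or, roughly, by hand with the thin-provisioning-tools:
  thin_check /dev/mapper/vg-poolmeta
  thin_repair -i /dev/mapper/vg-poolmeta -o /dev/mapper/vg-repairedmeta
  thin_dump /dev/mapper/vg-repairedmeta | head   # inspect the recovered mappings (XML)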

From: Gionatan Danti
Date: Wed, 23 Oct 2019 08:58:45 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

Il 23-10-2019 00:53 Stuart D. Gathman ha scritto:
> If you can find all the leaf nodes belonging to the root (in my btree
> database they are marked with the root id and can be found by a sequential
> scan of the volume), then reconstructing the btree data is
> straightforward - even in place.
>
> I remember realizing this was the only way to recover a major customer's
> data - and had the utility written, tested, and applied in a 36-hour
> programming marathon (which I hope never to repeat). Having such a utility
> available as a last resort would ratchet up the reliability of thin pools.

Very interesting. Can I ask you what product/database you recovered?
Anyway, giving a similar ability to thin vols would be awesome.
Thanks.

From: Tomas Dalebjörk
Date: Tue, 22 Oct 2019 19:02:14 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

Thanks for feedback.

Think of lvmsync as a tool which reads the block changes from the cow device:
<offset><chunk_size><data>...

Let's assume that I am able to recreate this cow format instantly back on the server, and present it as a file with the name "cowfile" on the file system, for simplicity.

Is it possible then, in some way, to use this cowfile to inform LVM about the location of the snapshot area, so that lvconvert --merge can be used to restore the data quicker, using this cowfile?

The cowfile will include all blocks for the logical volume.

Regards Tomas

Den tis 22 okt. 2019 kl 18:15 skrev Stuart D. Gathman <stuart@gathman.org>:
> lvconvert --merge does a "rollback" to the point at which the snapshot
> was taken. The master LV already has current data. What Tomas wants is to
> be able to do a "rollforward" from the point at which the snapshot was
> taken. He also wants to be able to put the cow volume on an
> external/remote medium, and add a snapshot using an already existing cow.
>
> This way, restoring means copying the full volume from backup, creating
> a snapshot using the existing external cow, then lvconvert --merge
> instantly logically applies the cow changes while updating the master LV.
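
At the plain device-mapper level the cowfile idea can at least be prototyped today. A sketch with example names, assuming the cowfile already uses the dm-snapshot persistent exception format with a matching chunk size, and that the origin is not in use through any other path (note that lvconvert --merge itself only drives snapshots that lvm2 created, so this bypasses lvm2):

  losetup -f --show /backup/cowfile            # -> e.g. /dev/loop0
  ORIGIN=/dev/vg/testlv                        # restored or empty origin
  SECTORS=$(blockdev --getsz "$ORIGIN")

  # roll the cowfile content forward into the origin via snapshot-merge
  dmsetup create testlv_rollforward \
      --table "0 $SECTORS snapshot-merge $ORIGIN /dev/loop0 P 8"

  dmsetup status testlv_rollforward            # watch the merge progress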

From: Zdenek Kabelac
Date: Wed, 23 Oct 2019 12:12:49 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

Dne 23. 10. 19 v 0:53 Stuart D. Gathman napsal(a):
> If you can find all the leaf nodes belonging to the root (in my btree
> database they are marked with the root id and can be found by a sequential
> scan of the volume), then reconstructing the btree data is
> straightforward - even in place.
>
> I remember realizing this was the only way to recover a major customer's
> data - and had the utility written, tested, and applied in a 36-hour
> programming marathon (which I hope never to repeat). If this hasn't
> occurred to the thin pool programmers, I am happy to flesh out the
> procedure. Having such a utility available as a last resort would ratchet
> up the reliability of thin pools.

There have been great enhancements made to the thin_repair tool (>= 0.8.5), but of course further fixes and extensions are always welcomed by Joe.

There are unfortunately some 'limitations' on what can be fixed with the current metadata format, but lots of the troubles we have witnessed in the past are now mostly 'covered' by the recent kernel driver. But if there is a known case causing trouble - please open a BZ so we can look at it.

Regards

Zdenek

From: Zdenek Kabelac
Date: Wed, 23 Oct 2019 12:26:40 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

Dne 22. 10. 19 v 18:13 Dalebjörk, Tomas napsal(a):
> That is cool.
>
> But are there any practical examples of how this could work in reality?

There is not yet a practical example available from our lvm2 team.
So we are only describing the 'model' & 'plan' we have ATM...

> I have created a way to perform block level incremental forever by reading
> the -cow device, and thin_dump would be a nice replacement for that.

COW is dead technology from our perspective - it can't cope with the performance of modern drives like NVMe...

So our plan is to focus on the thinp technology here.

Zdenek

From: Zdenek Kabelac
Date: Wed, 23 Oct 2019 12:46:23 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

Dne 22. 10. 19 v 18:15 Stuart D. Gathman napsal(a):
> "Old" snapshots are exactly as efficient as thin when there is exactly
> one. They only get inefficient with multiple snapshots. On the other
> hand, thin volumes are as inefficient as an old LV with one snapshot.
> An old LV is as efficient, and as anti-fragile, as a partition. Thin
> volumes are much more flexible, but depend on much more fragile,
> database-like metadata.

Just a few 'comments' - it's not really comparable - the efficiency of thin-pool metadata outperforms old snapshots in a BIG way (there is no point in talking about snapshots that take just a couple of MiB).

There is also a BIG difference in the usage of the old snapshot origin and the snapshot: COW of the old snapshot effectively cuts performance in half if you write to the origin.

> For this reason, I always prefer "old" LVs when the functionality of
> thin LVs is not actually needed. I can even manually recover from
> trashed metadata by editing it, as it is human-readable text.

On the other hand, you can lose the COW snapshot at any moment in time if your 'COW' storage is not big enough - this is very different from thin-pool...

Regards

Zdenek

From: Gionatan Danti
Date: Wed, 23 Oct 2019 13:08:37 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

On 23/10/19 12:46, Zdenek Kabelac wrote:
> Just a few 'comments' - it's not really comparable - the efficiency of
> thin-pool metadata outperforms old snapshots in a BIG way (there is no
> point in talking about snapshots that take just a couple of MiB)

Yes, this matches my experience.

> There is also a BIG difference in the usage of the old snapshot origin and
> the snapshot: COW of the old snapshot effectively cuts performance in half
> if you write to the origin.

If used without a non-volatile RAID controller, 1/2 is generous - I measured performance as low as 1/5 (with a fat snapshot).

Talking about thin snapshots, an obvious performance optimization which seems not to be implemented is to skip reading the source data when overwriting in larger-than-chunksize blocks.

For example, consider a completely filled 64k-chunk thin volume (with the thinpool having ample free space). Snapshotting it and writing a 4k block on the origin will obviously cause a read of the original 64k chunk, an in-memory change of the 4k block and a write of the entire modified 64k chunk to a new location. But writing, say, a 1 MB block should *not* cause the same read on the source: after all, the read data will be immediately discarded, overwritten by the changed 1 MB block.

However, my testing shows that source chunks are always read, even when completely overwritten.

Am I missing something?

From: Ilia Zykov
Date: Wed, 23 Oct 2019 15:20:53 +0300
Subject: Re: [linux-lvm] exposing snapshot block device

On 23.10.2019 14:08, Gionatan Danti wrote:
>
> For example, consider a completely filled 64k-chunk thin volume (with the
> thinpool having ample free space). Snapshotting it and writing a 4k block
> on the origin will obviously cause a read of the original 64k chunk, an
> in-memory change of the 4k block and a write of the entire modified 64k
> chunk to a new location. But writing, say, a 1 MB block should *not* cause
> the same read on the source: after all, the read data will be immediately
> discarded, overwritten by the changed 1 MB block.
>
> However, my testing shows that source chunks are always read, even when
> completely overwritten.

Not only read, but sometimes written. I watched it without a snapshot; only zeroing was enabled. Before new chunks were written with "dd bs=1048576 ...", the chunks were zeroed. But for security that is good. IMHO, the better choice in this case would be to first write the chunks to the disk and then hand those chunks to the volume.

> Am I missing something?
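
The zeroing behaviour mentioned above is a per-pool option; for reference, a sketch with example names of how it can be changed (whether zeroing is worth its cost is exactly the trade-off discussed below):

  # disable zeroing of newly provisioned chunks on an existing pool
  lvchange -Z n vg/pool

  # or create a pool with zeroing off from the start
  lvcreate -L 10G -T vg/pool2 -c 64k -Z n

  # the default comes from the allocation/thin_pool_zero setting in lvm.conf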

From: Zdenek Kabelac
Date: Wed, 23 Oct 2019 14:59:50 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

Dne 23. 10. 19 v 13:08 Gionatan Danti napsal(a):
> Talking about thin snapshots, an obvious performance optimization which
> seems not to be implemented is to skip reading the source data when
> overwriting in larger-than-chunksize blocks.

Hi

There is no such optimization possible for old snapshots.
You would need to write ONLY to snapshots.

As soon as you start to write to the origin - you have to 'read' the original data from the origin, copy it to the COW storage, and once this is finished, you can overwrite the origin data area with your writing I/O.

This is simply never going to work fast ;) - the fast way is the thin-pool...

Old snapshots were designed for 'short-lived' snapshots (so you can take a backup of a volume which is not being modified underneath).

Any idea for improving this old snapshot target is sooner or later going to end up at the thin-pool anyway :) (we've been in this river many, many years back in time...)

> For example, consider a completely filled 64k-chunk thin volume (with the
> thinpool having ample free space). Snapshotting it and writing a 4k block
> on the origin

There is no support for snapshots of snapshots with old snaps...
It would be extremely slow to use...

> However, my testing shows that source chunks are always read, even when
> completely overwritten.
>
> Am I missing something?

Yep - you would need to always jump to your 'snapshot' - so instead of keeping the 'origin' on major:minor - it would need to become a 'snapshot'...
A seriously complex concept to work with - especially when there is a thin-pool...

Regards

Zdenek

From: Zdenek Kabelac
Date: Wed, 23 Oct 2019 15:05:14 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

Dne 23. 10. 19 v 14:20 Ilia Zykov napsal(a):
> Not only read, but sometimes written. I watched it without a snapshot; only
> zeroing was enabled. Before new chunks were written with "dd bs=1048576 ...",
> the chunks were zeroed. But for security that is good. IMHO, the better
> choice in this case would be to first write the chunks to the disk and then
> hand those chunks to the volume.

Yep - we are recommending to disable zeroing as soon as the chunk size is >512K.

But for 'security' reasons it is up to the users to select the option that fits their needs in the best way - there is no 'one solution fits them all' in this case.

Clearly, when you put a modern filesystem (ext4, xfs...) on top of a thinLV - you can't read junk data - the filesystem knows very well about the written portions. But if you access the thinLV device at the 'block level' with the 'dd' command, you might see some old data trash if zeroing is disabled...

For smaller chunk sizes zeroing is usually not a big deal - with bigger chunks it slows down initial provisioning in a major way - but once the block is provisioned there are no further costs...

Regards

Zdenek

From: Gionatan Danti
Date: Wed, 23 Oct 2019 16:37:38 +0200
Subject: Re: [linux-lvm] exposing snapshot block device

On 23/10/19 14:59, Zdenek Kabelac wrote:
> There is no such optimization possible for old snapshots.
> You would need to write ONLY to snapshots.
>
> As soon as you start to write to the origin - you have to 'read' the
> original data from the origin, copy it to the COW storage, and once this is
> finished, you can overwrite the origin data area with your writing I/O.
>
> This is simply never going to work fast ;) - the fast way is the thin-pool...

Hi, I was speaking about *thin* snapshots here. Rewriting the example given above (for clarity):

"For example, consider a completely filled 64k-chunk thin volume (with the thinpool having ample free space). Snapshotting it and writing a 4k block on the origin will obviously cause a read of the original 64k chunk, an in-memory change of the 4k block and a write of the entire modified 64k chunk to a new location. But writing, say, a 1 MB block should *not* cause the same read on the source: after all, the read data will be immediately discarded, overwritten by the changed 1 MB block."

I would expect such a large-block *thin* snapshot rewrite not to cause a read/modify/write, but it really does.

Is this a low-hanging fruit, or are there more fundamental problems preventing the avoidance of read/modify/write in this case?

Thanks.
- www.assyoma.it email: g.danti@assyoma.it - info@assyoma.it GPG public key ID: FF5F32A8 From mboxrd@z Thu Jan 1 00:00:00 1970 References: <909d4cae-ddd2-3951-eee8-8dec8faa6f22@redhat.com> <8a27504b-1b3a-7126-7ade-18f57b819ead@redhat.com> From: Gionatan Danti Message-ID: <3e8184ac-4357-3b97-ac3c-d1133606f103@assyoma.it> Date: Wed, 23 Oct 2019 16:40:01 +0200 MIME-Version: 1.0 In-Reply-To: <8a27504b-1b3a-7126-7ade-18f57b819ead@redhat.com> Content-Language: en-US Content-Transfer-Encoding: quoted-printable Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="iso-8859-1"; format="flowed" To: LVM general discussion and development , Zdenek Kabelac , Ilia Zykov On 23/10/19 15:05, Zdenek Kabelac wrote: > Yep - we are recommending to disable zeroing as soon as chunksize >512K. >=20 > But for 'security' reason the option it's up to users to select what=20 > fits the needs in the best way - there is no=EF=BF=BD 'one solution fits = them=20 > all' in this case. Sure, but again: if writing a block larger than the underlying chunk,=20 zeroing can (and should) skipped. Yet I seem to remember that the new=20 block is zeroed in any case, even if it is going to be rewritten entirely. Do I remember wrongly? --=20 Danti Gionatan Supporto Tecnico Assyoma S.r.l. - www.assyoma.it email: g.danti@assyoma.it - info@assyoma.it GPG public key ID: FF5F32A8 From mboxrd@z Thu Jan 1 00:00:00 1970 References: <909d4cae-ddd2-3951-eee8-8dec8faa6f22@redhat.com> <0dd80515-d4db-273f-2ee4-78981c11b2c8@redhat.com> <0fde933f-1e8d-60de-cada-4a49a67129c4@assyoma.it> From: Zdenek Kabelac Message-ID: <13d450bf-f6ad-7db8-eec1-88c4eb473287@redhat.com> Date: Wed, 23 Oct 2019 17:37:22 +0200 MIME-Version: 1.0 In-Reply-To: <0fde933f-1e8d-60de-cada-4a49a67129c4@assyoma.it> Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: LVM general discussion and development , Gionatan Danti , "Stuart D. Gathman" Cc: =?UTF-8?Q?Dalebj=c3=b6rk=2c_Tomas?= Dne 23. 10. 19 v 16:37 Gionatan Danti napsal(a): > On 23/10/19 14:59, Zdenek Kabelac wrote: >> Dne 23. 10. 19 v 13:08 Gionatan Danti napsal(a): >>> Talking about thin snapshot, an obvious performance optimization which >>> seems to not be implemented is to skip reading source data when overwriting >>> in larger-than-chunksize blocks. > > "For example, consider a completely filled 64k chunk thin volume (with > thinpool having ample free space). Snapshotting it and writing a 4k block on > origin will obviously cause a read of the original 64k chunk, an in-memory > change of the 4k block and a write of the entire modified 64k block to a new > location. But writing, say, a 1 MB block should *not* cause the same read on > source: after all, the read data will be immediately discarded, overwritten by > the changed 1 MB block." > > I would expect that such large-block *thin* snapshot rewrite behavior would > not cause a read/modify/write, but it really does. > > Is this a low-hanging fruit or there are more fundamental problem avoiding > read/modify/write in this case? 
Hi If you use 1MiB chunksize for thin-pool and you use 'dd' with proper bs size and you write 'aligned' on 1MiB boundary (be sure you user directIO, so you are not a victim of some page cache flushing...) - there should not be any useless read. If you still do see such read - and you can easily reproduce this with latest kernel - report a bug please with your reproducer and results. Regards Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mx1.redhat.com (ext-mx10.extmail.prod.ext.phx2.redhat.com [10.5.110.39]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 1AE26196B2 for ; Wed, 23 Oct 2019 15:46:43 +0000 (UTC) Received: from mail.service4.ru (mail.service4.ru [95.217.67.65]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B9A6B5945B for ; Wed, 23 Oct 2019 15:46:40 +0000 (UTC) Received: from mail.service4.ru (localhost [127.0.0.1]) by mail.service4.ru (Postfix) with ESMTP id 3189740962 for ; Wed, 23 Oct 2019 18:46:39 +0300 (MSK) Received: from mail.service4.ru ([127.0.0.1]) by mail.service4.ru (mail.service4.ru [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id J33plE9pVy7E for ; Wed, 23 Oct 2019 18:46:38 +0300 (MSK) Received: from ilia (unknown [95.217.67.1]) by mail.service4.ru (Postfix) with ESMTPSA id 517C940519 for ; Wed, 23 Oct 2019 18:46:38 +0300 (MSK) References: <909d4cae-ddd2-3951-eee8-8dec8faa6f22@redhat.com> <8a27504b-1b3a-7126-7ade-18f57b819ead@redhat.com> <3e8184ac-4357-3b97-ac3c-d1133606f103@assyoma.it> From: Ilia Zykov Message-ID: Date: Wed, 23 Oct 2019 18:46:38 +0300 MIME-Version: 1.0 In-Reply-To: <3e8184ac-4357-3b97-ac3c-d1133606f103@assyoma.it> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-256; boundary="------------ms080807050508030906060401" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: LVM general discussion and development This is a cryptographically signed message in MIME format. --------------ms080807050508030906060401 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable On 23.10.2019 17:40, Gionatan Danti wrote: > On 23/10/19 15:05, Zdenek Kabelac wrote: >> Yep - we are recommending to disable zeroing as soon as chunksize >512= K. >> >> But for 'security' reason the option it's up to users to select what >> fits the needs in the best way - there is no=C2=A0 'one solution fits = them >> all' in this case. >=20 > Sure, but again: if writing a block larger than the underlying chunk, > zeroing can (and should) skipped. Yet I seem to remember that the new At this case if we get reset before a full chunk written, the tail of the chunk will be a foreign old data (if meta data already written) - little security problem. We need firstly write a data to the disk and then give the fully written chunk to the volume. But I think it's 'little' complicate matters. > block is zeroed in any case, even if it is going to be rewritten entire= ly. >=20 > Do I remember wrongly? 
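A minimal reproducer along the lines Zdenek describes above might look like this (VG/LV/device names are made up; the dd block size must match the pool chunk size and the writes must be aligned):

  lvcreate -L 20G -c 1m --thinpool pool vg
  lvcreate -V 4G -T vg/pool -n thinvol
  lvcreate -s vg/thinvol -n thinsnap        # thin snapshot, initially shares all chunks

  # overwrite whole 1MiB chunks, aligned, with direct I/O so the page cache
  # cannot split or reorder the writes
  dd if=/dev/urandom of=/dev/vg/thinvol bs=1M count=64 oflag=direct

  # in another terminal, watch the pool's data PV; aligned full-chunk
  # overwrites should show (almost) no reads
  dstat -d -D sdX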
--------------ms080807050508030906060401-- From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Date: Wed, 23 Oct 2019 19:16:05 +0200 From: Gionatan Danti In-Reply-To: <13d450bf-f6ad-7db8-eec1-88c4eb473287@redhat.com> References: <909d4cae-ddd2-3951-eee8-8dec8faa6f22@redhat.com> <0dd80515-d4db-273f-2ee4-78981c11b2c8@redhat.com> <0fde933f-1e8d-60de-cada-4a49a67129c4@assyoma.it> <13d450bf-f6ad-7db8-eec1-88c4eb473287@redhat.com> Message-ID: <65a61709788b69c722572dac76cf02a2@assyoma.it> Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: Zdenek Kabelac Cc: =?UTF-8?Q?Dalebj=C3=B6rk=2C_Tomas?= , LVM general discussion and development

Il 23-10-2019 17:37 Zdenek Kabelac ha scritto:
> Hi
>
> If you use 1MiB chunksize for thin-pool and you use 'dd' with proper
> bs size
> and you write 'aligned' on 1MiB boundary (be sure you user directIO,
> so you are not a victim of some page cache flushing...) - there should
> not be any useless read.
>
> If you still do see such read - and you can easily reproduce this with
> latest kernel - report a bug please with your reproducer and results.
>
> Regards
>
> Zdenek

OK, I triple-checked my numbers and you are right: on a fully updated CentOS 7.7 x86-64 box with kernel-3.10.0-1062.4.1 and lvm2-2.02.185-2, it seems that the behavior I observed on older (>2 years ago) is not present anymore. 
Take this original lvm setup:

[root@localhost ~]# lvs -o +chunk_size
  LV       VG     Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  root     centos -wi-ao----  <6.20g                                                              0
  swap     centos -wi-ao---- 512.00m                                                              0
  thinpool centos twi-aot---   1.00g                 25.00  14.16                            64.00k
  thinvol  centos Vwi-a-t--- 256.00m thinpool        100.00                                       0

Taking a snapshot (lvcreate -s /dev/centos/thinvol -n thinsnap) and overwriting 32 MB of data on the origin via "dd if=/dev/urandom of=/dev/centos/thinvol bs=1M count=32 oflag=direct" results in the following I/O to/from disk:

[root@localhost ~]# dstat -d -D sdc
  ---dsk/sdc---
   read  writ
  1036k   32M

As you can see, while about 1 MB was indeed read (metadata reads?), no other read amplification occurred.

Now I got curious to see whether zeroing behaves in the same manner. So I deleted thinsnap & thinvol, toggled zeroing on (lvchange -Zy centos/thinpool), and recreated thinvol:

[root@localhost ~]# lvs -o +chunk_size
  LV       VG     Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  root     centos -wi-ao----  <6.20g                                                              0
  swap     centos -wi-ao---- 512.00m                                                              0
  thinpool centos twi-aotz--   1.00g                 0.00   11.04                            64.00k
  thinvol  centos Vwi-a-tz-- 256.00m thinpool        0.00                                         0

[root@localhost ~]# dstat -d -D sdc
  --dsk/sdc--
   read  writ
     0    13M
   520k   19M

Again, no write amplification occurred. Kudos to all the team for optimizing lvmthin in this manner; it really is a flexible and great-performing tool.

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mx1.redhat.com (ext-mx01.extmail.prod.ext.phx2.redhat.com [10.5.110.25]) by smtp.corp.redhat.com (Postfix) with ESMTPS id B8D0A60C57 for ; Wed, 23 Oct 2019 10:06:43 +0000 (UTC) Received: from mail-lj1-f176.google.com (mail-lj1-f176.google.com [209.85.208.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E398581F01 for ; Wed, 23 Oct 2019 10:06:41 +0000 (UTC) Received: by mail-lj1-f176.google.com with SMTP id u4so6282529ljj.9 for ; Wed, 23 Oct 2019 03:06:41 -0700 (PDT) MIME-Version: 1.0 References: <909d4cae-ddd2-3951-eee8-8dec8faa6f22@redhat.com> <9e58accdc28692b3c8b2b09f37bce57c@assyoma.it> In-Reply-To: From: =?UTF-8?Q?Tomas_Dalebj=C3=B6rk?= Date: Wed, 23 Oct 2019 12:06:28 +0200 Message-ID: Content-Type: multipart/alternative; boundary="000000000000b6d6c80595911272" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: Gionatan Danti Cc: LVM general discussion and development

--000000000000b6d6c80595911272 Content-Type: text/plain; charset="UTF-8"

Many thanks for all the feedback.

The idea works for those applications that support snapshots. Like Sybase / SAP Adaptive Server Enterprise, Sybase / SAP IQ Server, DB2, MongoDB, MariaDB/MySQL, PostgreSQL etc..

Anyhow, back to the original question:
Is there a way to re-create the COW format, so that lvconvert --merge can be used?
Or by having lvconvert --merge accept reading from a "cow file"?

If that were possible, then instant recovery would be possible from an external source, like a backup server.

Regards Tomas

Den ons 23 okt. 2019 kl 08:58 skrev Gionatan Danti :
> Il 23-10-2019 00:53 Stuart D. 
Gathman ha scritto: > > If you can find all the leaf nodes belonging to the root (in my btree > > database they are marked with the root id and can be found by > > sequential > > scan of the volume), then reconstructing the btree data is > > straightforward - even in place. > > > > I remember realizing this was the only way to recover a major > > customer's > > data - and had the utility written, tested, and applied in a 36 hour > > programming marathon (which I hope to never repeat). If this hasn't > > occured to thin pool programmers, I am happy to flesh out the > > procedure. > > Having such a utility available as a last resort would ratchet up the > > reliability of thin pools. > > Very interesting. Can I ask you what product/database you recovered? > > Anyway, giving similar ability to thin Vols would be awesome. > > Thanks. > > -- > Danti Gionatan > Supporto Tecnico > Assyoma S.r.l. - www.assyoma.it > email: g.danti@assyoma.it - info@assyoma.it > GPG public key ID: FF5F32A8 > --000000000000b6d6c80595911272 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
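As background for the question about re-creating the COW format: one way to look at that format is to detach the COW area of an existing old-style snapshot and dump its first bytes. A rough sketch (scratch VG/LV names assumed; the dd step only exists to force some COW activity and overwrites the LV's contents, so use a throwaway volume):

  lvcreate -s -L 1G -n lv_snap vg/lv
  dd if=/dev/zero of=/dev/vg/lv bs=1M count=16 oflag=direct   # pushes chunks into the COW
  lvconvert --splitsnapshot vg/lv_snap      # lv_snap is now a plain LV holding the COW data
  hexdump -C /dev/vg/lv_snap | head
  # the COW starts with a small header followed by old-chunk/new-chunk
  # exception pairs and the copied data chunks, as described later in
  # this thread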
--000000000000b6d6c80595911272-- From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 References: <909d4cae-ddd2-3951-eee8-8dec8faa6f22@redhat.com> <83c4026c-5abe-e9e5-ac1d-6ed9e025e660@gmail.com> In-Reply-To: From: =?UTF-8?Q?Tomas_Dalebj=C3=B6rk?= Date: Wed, 23 Oct 2019 12:56:22 +0200 Message-ID: Content-Type: multipart/alternative; boundary="00000000000023dfdd059591c51e" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: Zdenek Kabelac Cc: LVM general discussion and development --00000000000023dfdd059591c51e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thanks, Ok, looking at thin than. Is there a way to adopt similarities using thin instead? Regards Tomas Den ons 23 okt. 2019 kl 12:26 skrev Zdenek Kabelac : > Dne 22. 10. 19 v 18:13 Dalebj=C3=B6rk, Tomas napsal(a): > > That is cool, > > > > But, are there any practical example how this could be working in > reality. > > > > There is not yet a practical example available from our lvm2 team yet. > > So we are only describing the 'model' & 'plan' we have ATM... > > > > > > I have created a way to perform block level incremental forever by > reading the > > -cow device, and thin_dump would be nice replacement for that. > > > COW is dead technology from our perspective - it can't cope with recent > performance of modern drives like NVMe... > > So our plan is to focus on thinp technology here. > > > Zdenek > > --00000000000023dfdd059591c51e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
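For the thin-pool route mentioned above, the thin-provisioning-tools side of such an "incremental forever" scheme could look roughly like this (pool and device names are placeholders; thin device ids can be listed with 'lvs -o+thin_id'; check thin_dump(8)/thin_delta(8) for the exact flags on your version):

  # take a metadata snapshot so the live pool metadata can be read consistently
  dmsetup message vg-pool-tpool 0 reserve_metadata_snap

  # dump all mappings, or just the differences between two thin devices
  # (e.g. yesterday's snapshot and today's snapshot), as XML
  thin_dump  --metadata-snap /dev/mapper/vg-pool_tmeta > mappings.xml
  thin_delta --metadata-snap --thin1 1 --thin2 2 /dev/mapper/vg-pool_tmeta > changed_blocks.xml

  # release the metadata snapshot when done
  dmsetup message vg-pool-tpool 0 release_metadata_snap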
--00000000000023dfdd059591c51e-- From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 References: <909d4cae-ddd2-3951-eee8-8dec8faa6f22@redhat.com> In-Reply-To: From: =?UTF-8?Q?Tomas_Dalebj=C3=B6rk?= Date: Wed, 23 Oct 2019 13:24:31 +0200 Message-ID: Content-Type: multipart/alternative; boundary="000000000000c6a6b505959229f0" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: Gionatan Danti Cc: Zdenek Kabelac , LVM general discussion and development --000000000000c6a6b505959229f0 Content-Type: text/plain; charset="UTF-8" I have tested FusionIO together with old thick snapshots. I created the thick snapshot on a separate old traditional SATA drive, just to check if that could be used as a snapshot target for high performance disks; like a Fusion IO card. For those who doesn't know about FusionIO; they can deal with 150-250,000 IOPS. And to be honest, I couldn't bottle neck the SATA disk I used as a thick snapshot target. The reason for why is simple: - thick snapshots uses sequential write techniques If I would have been using thin snapshots, than the writes would most likely be more randomized on disk, which would have required more spindles to coop with this. Anyhow; I am still eager to hear how to use an external device to import snapshots. And when I say "import"; I am not talking about copyback, more to use to read data from. Regards Tomas Den ons 23 okt. 2019 kl 13:08 skrev Gionatan Danti : > On 23/10/19 12:46, Zdenek Kabelac wrote: > > Just few 'comments' - it's not really comparable - the efficiency of > > thin-pool metadata outperforms old snapshot in BIG way (there is no > > point to talk about snapshots that takes just couple of MiB) > > Yes, this matches my experience. > > > There is also BIG difference about the usage of old snapshot origin and > > snapshot. > > > > COW of old snapshot effectively cuts performance 1/2 if you write to > > origin. > > If used without non-volatile RAID controller, 1/2 is generous - I > measured performance as low as 1/5 (with fat snapshot). > > Talking about thin snapshot, an obvious performance optimization which > seems to not be implemented is to skip reading source data when > overwriting in larger-than-chunksize blocks. > > For example, consider a completely filled 64k chunk thin volume (with > thinpool having ample free space). Snapshotting it and writing a 4k > block on origin will obviously cause a read of the original 64k chunk, > an in-memory change of the 4k block and a write of the entire modified > 64k block to a new location. But writing, say, a 1 MB block should *not* > cause the same read on source: after all, the read data will be > immediately discarded, overwritten by the changed 1 MB block. > > However, my testing shows that source chunks are always read, even when > completely overwritten. > > Am I missing something? > > -- > Danti Gionatan > Supporto Tecnico > Assyoma S.r.l. - www.assyoma.it > email: g.danti@assyoma.it - info@assyoma.it > GPG public key ID: FF5F32A8 > --000000000000c6a6b505959229f0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
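The origin-write penalty of old snapshots discussed here is easy to measure; a hedged sketch with fio (LV names are placeholders, and the fio run overwrites the LV, so use a scratch volume):

  lvcreate -s -L 10G -n lv_snap vg/scratch      # thick/old-style snapshot
  fio --name=cow-penalty --filename=/dev/vg/scratch --rw=randwrite --bs=4k \
      --direct=1 --ioengine=libaio --iodepth=16 --runtime=30 --time_based
  lvremove -y vg/lv_snap                        # then repeat the fio run without the
                                                # snapshot and compare the IOPS numbers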
--000000000000c6a6b505959229f0-- From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 References: <909d4cae-ddd2-3951-eee8-8dec8faa6f22@redhat.com> In-Reply-To: From: =?UTF-8?Q?Tomas_Dalebj=C3=B6rk?= Date: Wed, 23 Oct 2019 13:26:36 +0200 Message-ID: Content-Type: multipart/alternative; boundary="0000000000003bdd740595923115" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: Gionatan Danti Cc: Zdenek Kabelac , LVM general discussion and development --0000000000003bdd740595923115 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable And the block size for thick snapshots can be set when using the lvcreate command. And the automatic growing of a snapshot can be configured too in the lvm configuration. Same issues with both thin and thick, if you run out of space. //T Den ons 23 okt. 2019 kl 13:24 skrev Tomas Dalebj=C3=B6rk < tomas.dalebjork@gmail.com>: > I have tested FusionIO together with old thick snapshots. > I created the thick snapshot on a separate old traditional SATA drive, > just to check if that could be used as a snapshot target for high > performance disks; like a Fusion IO card. > For those who doesn't know about FusionIO; they can deal with 150-250,000 > IOPS. > > And to be honest, I couldn't bottle neck the SATA disk I used as a thick > snapshot target. > The reason for why is simple: > - thick snapshots uses sequential write techniques > > If I would have been using thin snapshots, than the writes would most > likely be more randomized on disk, which would have required more spindle= s > to coop with this. > > Anyhow; > I am still eager to hear how to use an external device to import snapshot= s. > And when I say "import"; I am not talking about copyback, more to use to > read data from. > > Regards Tomas > > Den ons 23 okt. 2019 kl 13:08 skrev Gionatan Danti : > >> On 23/10/19 12:46, Zdenek Kabelac wrote: >> > Just few 'comments' - it's not really comparable - the efficiency of >> > thin-pool metadata outperforms old snapshot in BIG way (there is no >> > point to talk about snapshots that takes just couple of MiB) >> >> Yes, this matches my experience. >> >> > There is also BIG difference about the usage of old snapshot origin an= d >> > snapshot. >> > >> > COW of old snapshot effectively cuts performance 1/2 if you write to >> > origin. >> >> If used without non-volatile RAID controller, 1/2 is generous - I >> measured performance as low as 1/5 (with fat snapshot). >> >> Talking about thin snapshot, an obvious performance optimization which >> seems to not be implemented is to skip reading source data when >> overwriting in larger-than-chunksize blocks. >> >> For example, consider a completely filled 64k chunk thin volume (with >> thinpool having ample free space). Snapshotting it and writing a 4k >> block on origin will obviously cause a read of the original 64k chunk, >> an in-memory change of the 4k block and a write of the entire modified >> 64k block to a new location. But writing, say, a 1 MB block should *not* >> cause the same read on source: after all, the read data will be >> immediately discarded, overwritten by the changed 1 MB block. >> >> However, my testing shows that source chunks are always read, even when >> completely overwritten. >> >> Am I missing something? >> >> -- >> Danti Gionatan >> Supporto Tecnico >> Assyoma S.r.l. 
- www.assyoma.it >> email: g.danti@assyoma.it - info@assyoma.it >> GPG public key ID: FF5F32A8 >> > --0000000000003bdd740595923115 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
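For reference, both knobs mentioned above live in the following places (values are only examples; dmeventd monitoring must be active for autoextend to trigger):

  # chunk size of an old-style snapshot is chosen at creation time
  lvcreate -s -L 20G -c 256k -n lv_snap vg/lv

  # automatic growing is configured in lvm.conf, e.g. grow by 20% at 70% full
  # (the thin-pool equivalents are thin_pool_autoextend_threshold/percent):
  #   activation {
  #       snapshot_autoextend_threshold = 70
  #       snapshot_autoextend_percent   = 20
  #   }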
--0000000000003bdd740595923115-- From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mx1.redhat.com (ext-mx20.extmail.prod.ext.phx2.redhat.com [10.5.110.49]) by smtp.corp.redhat.com (Postfix) with ESMTPS id ED7A75DE5B for ; Wed, 23 Oct 2019 12:12:56 +0000 (UTC) Received: from mail.service4.ru (mail.service4.ru [95.217.67.65]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A12B03086246 for ; Wed, 23 Oct 2019 12:12:53 +0000 (UTC) Received: from mail.service4.ru (localhost [127.0.0.1]) by mail.service4.ru (Postfix) with ESMTP id 0F9AC40962 for ; Wed, 23 Oct 2019 15:12:52 +0300 (MSK) Received: from mail.service4.ru ([127.0.0.1]) by mail.service4.ru (mail.service4.ru [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id ct591czGFMVq for ; Wed, 23 Oct 2019 15:12:51 +0300 (MSK) Received: from ilia (unknown [95.217.67.1]) by mail.service4.ru (Postfix) with ESMTPSA id 3F12540310 for ; Wed, 23 Oct 2019 15:12:51 +0300 (MSK) References: <909d4cae-ddd2-3951-eee8-8dec8faa6f22@redhat.com> From: Ilia Zykov Message-ID: <4daf9dc2-c2b4-7191-80c4-fa2073d7b5e9@service4.ru> Date: Wed, 23 Oct 2019 15:12:52 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-256; boundary="------------ms020701070205010904000503" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: linux-lvm@redhat.com This is a cryptographically signed message in MIME format. --------------ms020701070205010904000503 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable On 23.10.2019 14:08, Gionatan Danti wrote: >=20 > For example, consider a completely filled 64k chunk thin volume (with > thinpool having ample free space). Snapshotting it and writing a 4k > block on origin will obviously cause a read of the original 64k chunk, > an in-memory change of the 4k block and a write of the entire modified > 64k block to a new location. But writing, say, a 1 MB block should *not= * > cause the same read on source: after all, the read data will be > immediately discarded, overwritten by the changed 1 MB block. >=20 > However, my testing shows that source chunks are always read, even when= > completely overwritten. Not only read but sometimes write. I watched it without snapshot. Only zeroing was enabled. Before wrote new chunks "dd bs=3D1048576 ..." chunks were zeroed. But for security it'= s good. IMHO: In this case best choice firstly write chunks to the disk and then give this chunks to the volume. >=20 > Am I missing something? 
--------------ms020701070205010904000503-- From mboxrd@z Thu Jan 1 00:00:00 1970 References: <909d4cae-ddd2-3951-eee8-8dec8faa6f22@redhat.com> From: Zdenek Kabelac Message-ID: <35a99505-b5a9-a398-6ec2-b530733e974c@redhat.com> Date: Thu, 24 Oct 2019 18:01:25 +0200 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US Content-Transfer-Encoding: quoted-printable Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="iso-8859-1"; format="flowed" To: LVM general discussion and development , =?UTF-8?Q?Tomas_Dalebj=c3=b6rk?= , Gionatan Danti

Dne 23. 10. 19 v 13:24 Tomas Dalebjörk napsal(a):
> I have tested FusionIO together with old thick snapshots.
> I created the thick snapshot on a separate old traditional SATA drive, just to
> check if that could be used as a snapshot target for high performance disks;
> like a Fusion IO card.
> For those who doesn't know about FusionIO; they can deal with 150-250,000 IOPS.
>
> And to be honest, I couldn't bottle neck the SATA disk I used as a thick
> snapshot target.
> The reason for why is simple:
> - thick snapshots uses sequential write techniques
>
> If I would have been using thin snapshots, than the writes would most likely
> be more randomized on disk, which would have required more spindles to coop
> with this.
>
> Anyhow;
> I am still eager to hear how to use an external device to import snapshots.
> And when I say "import"; I am not talking about copyback, more to use to read
> data from. 
Format of 'on-disk' snapshot metadata for old snapshot is trivial - being s= ome header + pairs of dataoffset-TO-FROM - I think googling will reveal couple python tools playing with it. You can add pre-created COW image to LV with lvconvert --snapshot and to avoid 'zeroing' metadata use option -Zn (BTW in the same way you can detach snapshot from LV with --splitsnapshot s= o=20 you can look how the metadata looks like...) Although it's pretty unusual why would anyone create first the COW image wi= th=20 all the special layout and then merge it to LV - instead of directly=20 merging... There is only the 'little' advantage of minimizing 'offline' t= ime=20 of such device (and it's the reason why --split exists). Regards Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 References: <909d4cae-ddd2-3951-eee8-8dec8faa6f22@redhat.com> <35a99505-b5a9-a398-6ec2-b530733e974c@redhat.com> In-Reply-To: <35a99505-b5a9-a398-6ec2-b530733e974c@redhat.com> From: =?UTF-8?Q?Tomas_Dalebj=C3=B6rk?= Date: Fri, 25 Oct 2019 18:31:25 +0200 Message-ID: Content-Type: multipart/alternative; boundary="00000000000004b7980595beaf8e" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: Zdenek Kabelac Cc: Gionatan Danti , LVM general discussion and development --00000000000004b7980595beaf8e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Wow! Impressing. This will make history! If this is possible, than we are able to implement a solution, which can do= : - progressive block level incremental forever (always incremental on block level : this already exist) - instant recovery to point in time (using the mentioned methods you just described) For example, lets say that a client wants to restore a file system, or a logical volume to how it looked a like yesterday. Eventhough there are no snapshot, nor any data. Than the client (with some coding); can start from an empty volume, and re-attach a cow device, and convert that using lvconvert --merge, so that the copying can be done in background using the backup server. If you forget about "how we will re-create the cow device"; and just focusing on the LVM ideas of re-attaching a cow device. Do you think that I have understood it correctly? Den tors 24 okt. 2019 kl 18:01 skrev Zdenek Kabelac : > Dne 23. 10. 19 v 13:24 Tomas Dalebj=C3=B6rk napsal(a): > > I have tested FusionIO together with old thick snapshots. > > I created the thick snapshot on a separate old traditional SATA drive, > just to > > check if that could be used as a snapshot target for high performance > disks; > > like a Fusion IO card. > > For those who doesn't know about FusionIO; they can deal with > 150-250,000 IOPS. > > > > And to be honest, I couldn't bottle neck the SATA disk I used as a thic= k > > snapshot target. > > The reason for why is simple: > > - thick snapshots uses sequential write techniques > > > > If I would have been using thin snapshots, than the writes would most > likely > > be more randomized on disk, which would have required more spindles to > coop > > with this. > > > > Anyhow; > > I am still eager to hear how to use an external device to import > snapshots. > > And when I say "import"; I am not talking about copyback, more to use t= o > read > > data from. 
> > Format of 'on-disk' snapshot metadata for old snapshot is trivial - being > some > header + pairs of dataoffset-TO-FROM - I think googling will reveal coup= le > python tools playing with it. > > You can add pre-created COW image to LV with lvconvert --snapshot > and to avoid 'zeroing' metadata use option -Zn > (BTW in the same way you can detach snapshot from LV with --splitsnapshot > so > you can look how the metadata looks like...) > > Although it's pretty unusual why would anyone create first the COW image > with > all the special layout and then merge it to LV - instead of directly > merging... There is only the 'little' advantage of minimizing 'offline' > time > of such device (and it's the reason why --split exists). > > Regards > > Zdenek > > > --00000000000004b7980595beaf8e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
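Put together, the flow Zdenek describes above might look like this (LV names are placeholders; -Zn keeps lvconvert from wiping the prepared COW metadata, and the exact argument order for re-attaching is worth double-checking in lvconvert(8)):

  # detach the COW area of an existing snapshot into a standalone LV
  lvconvert --splitsnapshot vg/lv_snap

  # ... lv_snap can now be copied off, inspected, or rebuilt externally ...

  # re-attach a prepared COW LV to its origin without zeroing it, then merge
  lvconvert --snapshot -Zn vg/lv vg/lv_snap     # check argument order in lvconvert(8)
  lvconvert --merge -b vg/lv_snap               # merge runs in the background
  lvs -a vg                                     # progress is visible while it runs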
--00000000000004b7980595beaf8e-- From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/alternative; boundary=Apple-Mail-8E3B7A3B-A0AE-4BE2-8999-7368CD6F5ECE Content-Transfer-Encoding: 7bit From: =?utf-8?Q?Tomas_Dalebj=C3=B6rk?= Mime-Version: 1.0 (1.0) Date: Mon, 4 Nov 2019 06:54:29 +0100 Message-Id: <5BDF90CF-72E7-44A0-8C78-7854B2B8996A@gmail.com> References: In-Reply-To: Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: Zdenek Kabelac Cc: Gionatan Danti , LVM general discussion and development --Apple-Mail-8E3B7A3B-A0AE-4BE2-8999-7368CD6F5ECE Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi I have some additional questions related to this. regarding this statement: =E2=80=9C While the merge is in progress, reads or writes to the origin appe= ar as they were directed to the snapshot being merged. =E2=80=9D What exactly does that mean? Will that mean that before changes are being placed on the origin device, it= has to first: read the data from the snapshot back to origin, copy the data back from orig= in to the snapshot, and than after that allow changes to happen? if that is the case, does it keep track of that this block should not be cop= ied again? and will the ongoing merge priorities this block before the other background= copying? how about read operations ? will the requested read operations on the origin volume be prioritized befor= e the copying of snapshot data? I didn=E2=80=99t find much information about this, hence why I ask here assuming that someone has executed: lvconvert - - merge -b snapshot thanks for the feedback=20 Skickat fr=C3=A5n min iPhone > 25 okt. 2019 kl. 18:31 skrev Tomas Dalebj=C3=B6rk : >=20 > =EF=BB=BF > Wow! >=20 > Impressing. > This will make history! >=20 > If this is possible, than we are able to implement a solution, which can d= o: > - progressive block level incremental forever (always incremental on block= level : this already exist) > - instant recovery to point in time (using the mentioned methods you just d= escribed) >=20 > For example, lets say that a client wants to restore a file system, or a l= ogical volume to how it looked a like yesterday. > Eventhough there are no snapshot, nor any data. > Than the client (with some coding); can start from an empty volume, and re= -attach a cow device, and convert that using lvconvert --merge, so that the c= opying can be done in background using the backup server. >=20 > If you forget about "how we will re-create the cow device"; and just focus= ing on the LVM ideas of re-attaching a cow device. > Do you think that I have understood it correctly? >=20 >=20 > Den tors 24 okt. 2019 kl 18:01 skrev Zdenek Kabelac := >> Dne 23. 10. 19 v 13:24 Tomas Dalebj=C3=B6rk napsal(a): >> > I have tested FusionIO together with old thick snapshots. >> > I created the thick snapshot on a separate old traditional SATA drive, j= ust to=20 >> > check if that could be used as a snapshot target for high performance d= isks;=20 >> > like a Fusion IO card. >> > For those who doesn't know about FusionIO; they can deal with 150-250,0= 00 IOPS. >> >=20 >> > And to be honest, I couldn't bottle neck the SATA disk I used as a thic= k >> > snapshot target. 
>> > The reason for why is simple: >> > - thick snapshots uses sequential write techniques >> >=20 >> > If I would have been using thin snapshots, than the writes would most l= ikely=20 >> > be more randomized on disk, which would have required more spindles to c= oop=20 >> > with this. >> >=20 >> > Anyhow; >> > I am still eager to hear how to use an external device to import snapsh= ots. >> > And when I say "import"; I am not talking about copyback, more to use t= o read=20 >> > data from. >>=20 >> Format of 'on-disk' snapshot metadata for old snapshot is trivial - being= some >> header + pairs of dataoffset-TO-FROM - I think googling will reveal coup= le >> python tools playing with it. >>=20 >> You can add pre-created COW image to LV with lvconvert --snapshot >> and to avoid 'zeroing' metadata use option -Zn >> (BTW in the same way you can detach snapshot from LV with --splitsnapshot= so=20 >> you can look how the metadata looks like...) >>=20 >> Although it's pretty unusual why would anyone create first the COW image w= ith=20 >> all the special layout and then merge it to LV - instead of directly=20 >> merging... There is only the 'little' advantage of minimizing 'offline'= time=20 >> of such device (and it's the reason why --split exists). >>=20 >> Regards >>=20 >> Zdenek >>=20 >>=20 --Apple-Mail-8E3B7A3B-A0AE-4BE2-8999-7368CD6F5ECE Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi

= --Apple-Mail-8E3B7A3B-A0AE-4BE2-8999-7368CD6F5ECE-- From mboxrd@z Thu Jan 1 00:00:00 1970 References: <5BDF90CF-72E7-44A0-8C78-7854B2B8996A@gmail.com> From: Zdenek Kabelac Message-ID: <29ad8317-4a55-f017-6a0b-c06bf40ccab8@redhat.com> Date: Mon, 4 Nov 2019 11:07:01 +0100 MIME-Version: 1.0 In-Reply-To: <5BDF90CF-72E7-44A0-8C78-7854B2B8996A@gmail.com> Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="utf-8"; format="flowed" To: =?UTF-8?Q?Tomas_Dalebj=c3=b6rk?= Cc: LVM general discussion and development Dne 04. 11. 19 v 6:54 Tomas Dalebjörk napsal(a): > Hi > > I have some additional questions related to this. > regarding this statement: > “ While the merge is in progress, reads or writes to the origin appear as they > were directed to the snapshot being merged. ” > > What exactly does that mean? > > Will that mean that before changes are being placed on the origin device, it > has to first: > read the data from the snapshot back to origin, copy the data back from origin > to the snapshot, and than after that allow changes to happen? > if that is the case, does it keep track of that this block should not be > copied again? Hi When the 'merge' is in progress - your 'origin' is no longer accessible for your normal usage. It's hiddenly active and only usable by snapshot-merge target) So during 'merging' - you can already use you snapshot like if it would be and origin - and in the background there is a process that reads data from 'snapshot' COW device and copies them back to hidden origin. (this is what you can observe with 'lvs' and copy%) So any 'new' writes to such device lends at right place - reads are either from COW (if the block has not yet been merged) or from origin. Once all blocks from 'COW' are merged into origing - tables are remapped again so all 'supportive' devices are removed and only your 'now fully merged' origin becomes present for usage (while still being fully online) Hopefully it gets more clear. For more explanation how DM works - probably visit: http://people.redhat.com/agk/talks/ > and will the ongoing merge priorities this block before the other background > copying? > > how about read operations ? > will the requested read operations on the origin volume be prioritized before > the copying of snapshot data? The priority is that you always get proper block. Don't seek there the 'top most' performance - the correctness was always the priority there and for long time there is no much devel effort on this ancient target - since thin-pool usage is simply way more superior.... 1st. note - major difficulty comes from ONLINE usage. If you do NOT need device to be online (aka you keep 'reserve' copy of device) - you can merge things directly into a device - and I simply don't see why you would want to complicate this whole with extra step of transforming data into COW format first and the do online merge. 2nd. note - clearly one cannot start 'merge' of snapshot into origin while such origin device is in-use (i.e. mounted) - as that would lead to 'modification' of such filesystem under its hands. 
Regards Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 References: <5BDF90CF-72E7-44A0-8C78-7854B2B8996A@gmail.com> <29ad8317-4a55-f017-6a0b-c06bf40ccab8@redhat.com> From: Zdenek Kabelac Message-ID: <992d73bd-259e-a89c-f34f-60388d9e1c5b@redhat.com> Date: Mon, 4 Nov 2019 16:04:34 +0100 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US Content-Transfer-Encoding: quoted-printable Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="iso-8859-1"; format="flowed" To: =?UTF-8?Q?Tomas_Dalebj=c3=b6rk?= Cc: LVM general discussion and development Dne 04. 11. 19 v 15:40 Tomas Dalebj=C3=B6rk napsal(a): > Thanks for feedback. >=20 > > Scenario 2: A write comes want to write block LP 100, but lvconvert has n= ot=20 > yet copied that LP block (yes, I do understand that origin is hidden now) > Will lvconvery prioritize to copy data from /dev/vg00/lv00-snap to=20 > /dev/vg00/lv00 for that block, and let the requestor write the changes=20 > directly on the origin after the copying has been performed? > Or will the write be blocked until lvconvert has finished the copying of = the=20 > requested block, and than a write can be accepted to the origin? > Or where will the changes be written? Since the COW device contains not only 'data' but also 'metadata' blocks and during the 'merge' it's being updated so it 'knows' which data has been already merged back to origin (in other words during the merge the usa= ge=20 of COW is being reduced towards 0) - I assume your 'plan' stops right here and there is not much point to explore how much sub-optimal the rest of=20 merging process is (and as said - primary aspect was robustness - so if th= ere=20 is crash in any moment in time - data remain correct) Regards Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?utf-8?Q?Tomas_Dalebj=C3=B6rk?= Mime-Version: 1.0 (1.0) Date: Mon, 4 Nov 2019 18:28:23 +0100 Message-Id: <1F8EA03C-9EC5-4006-91A0-4588756F5698@gmail.com> References: <992d73bd-259e-a89c-f34f-60388d9e1c5b@redhat.com> In-Reply-To: <992d73bd-259e-a89c-f34f-60388d9e1c5b@redhat.com> Content-Transfer-Encoding: 8bit Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="utf-8" To: Zdenek Kabelac Cc: LVM general discussion and development thanks, I understand that meta data blocks needs to be update, that I can understand. how about the other questions? like : data write will happen towards which device? cow device or after the copying has been completed to the origin disk? Skickat från min iPhone > 4 nov. 2019 kl. 16:04 skrev Zdenek Kabelac : > > Dne 04. 11. 19 v 15:40 Tomas Dalebjörk napsal(a): >> Thanks for feedback. >> > >> Scenario 2: A write comes want to write block LP 100, but lvconvert has not yet copied that LP block (yes, I do understand that origin is hidden now) >> Will lvconvery prioritize to copy data from /dev/vg00/lv00-snap to /dev/vg00/lv00 for that block, and let the requestor write the changes directly on the origin after the copying has been performed? >> Or will the write be blocked until lvconvert has finished the copying of the requested block, and than a write can be accepted to the origin? 
>> Or where will the changes be written?
>
> Since the COW device contains not only 'data' but also 'metadata' blocks,
> and during the 'merge' it's being updated so it 'knows' which data has
> been already merged back to the origin (in other words, during the merge the usage
> of the COW is being reduced towards 0) - I assume your 'plan' stops right here,
> and there is not much point to explore how sub-optimal the rest of the
> merging process is (and as said - the primary aspect was robustness - so if there
> is a crash at any moment in time - the data remain correct)
>
> Regards
>
> Zdenek
>

From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <5BDF90CF-72E7-44A0-8C78-7854B2B8996A@gmail.com> <29ad8317-4a55-f017-6a0b-c06bf40ccab8@redhat.com>
In-Reply-To: <29ad8317-4a55-f017-6a0b-c06bf40ccab8@redhat.com>
From: Tomas Dalebjörk
Date: Mon, 4 Nov 2019 15:40:59 +0100
Message-ID:
Subject: Re: [linux-lvm] exposing snapshot block device
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development
To: Zdenek Kabelac
Cc: LVM general discussion and development
Content-Type: text/plain; charset="UTF-8"

Thanks for the feedback.

Let me try to type up different scenarios:

We have an origin volume, let's call it: /dev/vg00/lv00
We convert a snapshot volume to the origin volume, let's call it: /dev/vg00/lv00-snap
- all blocks have been changed, and are represented in /dev/vg00/lv00-snap, when we start the lvconvert process

I assume that something reads the data from /dev/vg00/lv00-snap and copies it to /dev/vg00/lv00.
It will most likely start from the first block and proceed to the last block to copy.
The block size is 1MB on /dev/vg00/lv00-snap, and for simplicity we have the same block size on the origin /dev/vg00/lv00.

Scenario 1: A read comes in wanting to read block LP 100, but lvconvert has not yet copied that LP block.
Will the read come from /dev/vg00/lv00-snap directly and be delivered to the requestor?
Or will lvconvert prioritize copying the data from /dev/vg00/lv00-snap to /dev/vg00/lv00 for that block, and let the requestor wait until the copying has completed, so that the read can then happen from the origin?
Or will the requestor have to wait until the copy of that block from /dev/vg00/lv00-snap to /dev/vg00/lv00 has completed, without any prioritization?

Scenario 2: A write comes in wanting to write block LP 100, but lvconvert has not yet copied that LP block (yes, I do understand that the origin is hidden now).
Will lvconvert prioritize copying the data from /dev/vg00/lv00-snap to /dev/vg00/lv00 for that block, and let the requestor write the changes directly on the origin after the copying has been performed?
Or will the write be blocked until lvconvert has finished copying the requested block, and then a write can be accepted to the origin?
Or where will the changes be written?

It is important for me to understand, as the backup device that I want to map as a COW device is a read-only target and is not allowed to be written to.
If reads happen from the backup COW device, and writes happen to the origin, then it is possible to create an instant recovery.
If writes happen to the backup COW device, then it is not that easy to implement an instant recovery solution, as the backup device is write protected.

Thanks in advance.

Den mån 4 nov.
2019 kl 11:07 skrev Zdenek Kabelac : > Dne 04. 11. 19 v 6:54 Tomas Dalebj=C3=B6rk napsal(a): > > Hi > > > > I have some additional questions related to this. > > regarding this statement: > > =E2=80=9C While the merge is in progress, reads or writes to the origin= appear > as they > > were directed to the snapshot being merged. =E2=80=9D > > > > What exactly does that mean? > > > > Will that mean that before changes are being placed on the origin > device, it > > has to first: > > read the data from the snapshot back to origin, copy the data back from > origin > > to the snapshot, and than after that allow changes to happen? > > if that is the case, does it keep track of that this block should not b= e > > copied again? > > Hi > > When the 'merge' is in progress - your 'origin' is no longer accessible > for your normal usage. It's hiddenly active and only usable by > snapshot-merge > target) > > So during 'merging' - you can already use you snapshot like if it would b= e > and > origin - and in the background there is a process that reads data from > 'snapshot' COW device and copies them back to hidden origin. > (this is what you can observe with 'lvs' and copy%) > > So any 'new' writes to such device lends at right place - reads are > either > from COW (if the block has not yet been merged) or from origin. > > Once all blocks from 'COW' are merged into origing - tables are remapped > again > so all 'supportive' devices are removed and only your 'now fully merged' > origin becomes present for usage (while still being fully online) > > Hopefully it gets more clear. > > > For more explanation how DM works - probably visit: > http://people.redhat.com/agk/talks/ > > > and will the ongoing merge priorities this block before the other > background > > copying? > > > > how about read operations ? > > will the requested read operations on the origin volume be prioritized > before > > the copying of snapshot data? > > The priority is that you always get proper block. > Don't seek there the 'top most' performance - the correctness was always > the > priority there and for long time there is no much devel effort on this > ancient > target - since thin-pool usage is simply way more superior.... > > 1st. note - major difficulty comes from ONLINE usage. If you do NOT need > device to be online (aka you keep 'reserve' copy of device) - you can > merge > things directly into a device - and I simply don't see why you would want > to > complicate this whole with extra step of transforming data into COW forma= t > first and the do online merge. > > 2nd. note - clearly one cannot start 'merge' of snapshot into origin whil= e > such origin device is in-use (i.e. mounted) - as that would lead to > 'modification' of such filesystem under its hands. > > Regards > > Zdenek > > --0000000000008bb4a50596864ecd Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
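For reference, a rough sketch of starting the merge being discussed here and watching it progress (using the same hypothetical /dev/vg00/lv00 and /dev/vg00/lv00-snap names as above; the exact columns and table output differ between lvm2 and kernel versions):

# umount /dev/vg00/lv00
  (the origin must not be mounted when the merge is started, as noted above)
# lvconvert --merge -b vg00/lv00-snap
# lvs -a vg00
  (the usage percentage of the merging snapshot shrinks toward 0 as chunks are
   copied back into the hidden origin)
# dmsetup table vg00-lv00
  (while merging, the public device maps to a snapshot-merge target referencing
   the hidden "-real" origin and the "-cow" device; once the merge completes,
   lvm2 swaps this back to a plain linear mapping)
# dmsetup status vg00-lv00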
--0000000000008bb4a50596864ecd-- From mboxrd@z Thu Jan 1 00:00:00 1970 References: <992d73bd-259e-a89c-f34f-60388d9e1c5b@redhat.com> <1F8EA03C-9EC5-4006-91A0-4588756F5698@gmail.com> From: Zdenek Kabelac Message-ID: Date: Tue, 5 Nov 2019 17:24:01 +0100 MIME-Version: 1.0 In-Reply-To: <1F8EA03C-9EC5-4006-91A0-4588756F5698@gmail.com> Content-Language: en-US Content-Transfer-Encoding: quoted-printable Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="iso-8859-1"; format="flowed" To: LVM general discussion and development , =?UTF-8?Q?Tomas_Dalebj=c3=b6rk?= Dne 04. 11. 19 v 18:28 Tomas Dalebj=C3=B6rk napsal(a): > thanks, I understand that meta data blocks needs to be update, that I can= understand. > how about the other questions? > like : data write will happen towards which device? cow device or after t= he copying has been completed to the origin disk? Hi I'd assume - if the block is still mapped in COW and the block is not yet=20 merged into origin - the 'write' needs to lend COW - as there is no 'extra'= =20 information about which 'portion' of the chunk has been already 'merged'. If you happen to 'write' your I/O to currently merged 'chunk' - you will wait till check gets merged and metadata are updated and then your I/O land= in=20 origin. But I don't think there are any optimization made - as it doesn't really=20 matter too much in terms of the actual merging speed - if couple I/O are=20 repeated - who cares - on the overall time of whole merging process it will= =20 have negligible impact - and as said - the preference was made towards=20 simplicity and correctness. For the most details - just feel free to take a look at: linux/drviers/md/dm-snap.c i.e. function snapshot_merge_next_chunks() The snapshot was designed to be small and map a very low percentage of orig= in=20 device - it's never been assumed to be used with 200GiB and similar snapsho= t=20 COW size.... Regads Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Tue, 5 Nov 2019 11:40:01 -0500 (EST) From: Mikulas Patocka In-Reply-To: Message-ID: References: <5BDF90CF-72E7-44A0-8C78-7854B2B8996A@gmail.com> <29ad8317-4a55-f017-6a0b-c06bf40ccab8@redhat.com> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="185206533-1230804827-1572972002=:14929" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: =?ISO-8859-15?Q?Tomas_Dalebj=F6rk?= Cc: LVM general discussion and development , Zdenek Kabelac This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --185206533-1230804827-1572972002=:14929 Content-Type: TEXT/PLAIN; charset="utf-8" Content-Transfer-Encoding: 8bit On Mon, 4 Nov 2019, Tomas Dalebjörk wrote: > Thanks for feedback. 
> > Let me try to type different scenarios: > > We have an origin volume, lets call it: /dev/vg00/lv00 > We convert a snapshot volume to origin volume, lets call it: /dev/vg00/lv00-snap > - all blocks has been changed, and are represented in the /dev/vg00/lv00-snap, when we start the lvconvert process > > I assume that something reads the data from /dev/vg00/lv00-snap and copy that to /dev/vg00/lv00 > It will most likely start from the first block, to the last block to copy. Merging starts from the last block on the lv00-snap device and it proceeds backward to the beginning. > The block size is 1MB on /dev/vg00/lv00-snap, and we have for simplicity the same block size on the origin /dev/vg00/lv00 > > Scenario 1: A read comes want to read block LP 100, but lvconvert has not yet copied that LP block. > Will the read comes from /dev/vg00/lv00-snap directly and delivered to requestor? Yes. > Or will lvconvert prioritize to copy data from /dev/vg00/lv00-snap to /dev/vg00/lv00 for that block, and let the requestor wait until the copying has been completed, so > that a read operation can happen from origin? > Or will the requestor have to wait until the copy data from /dev/vg00/lv00-snap to /dev/vg00/lv00 for that block has been completed, without any prioritization? It only waits if you attempt to read or write the block that is currently being copied. If you read data that hasn't been merged yet, it reads from the snapshot, if you read data that has been merged, it reads from the origin, if you read data that is currently being copied, it waits. > Scenario 2: A write comes want to write block LP 100, but lvconvert has not yet copied that LP block (yes, I do understand that origin is hidden now) > Will lvconvery prioritize to copy data from /dev/vg00/lv00-snap to /dev/vg00/lv00 for that block, and let the requestor write the changes directly on the origin after the > copying has been performed? No. > Or will the write be blocked until lvconvert has finished the copying of the requested block, and than a write can be accepted to the origin? > Or where will the changes be written? The changes will be written to the lv00-snap device. If you write data that hasn't been merged yet, the write is redirected to the lv00-snap device. If you write data that has already been merged, the write is directed to the origin device. If you write data that is currently being merged, it waits. > It is important for me to understand, as the backup device that I want to map as a COW device is a read only target, and is not allowed to be written to. You can't have read-only COW device. Both metadata and data on the COW device are updated during the merge. > If read happends from the backup COW device, and writes happends to the origin, than it is possible to create an instant recovery. > If writes happends to the backup COW device, than it not that easy to implement a instance reovery solution, as the backup device is write protected. > > Thanks in advance. 
Mikulas --185206533-1230804827-1572972002=:14929-- From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 References: <5BDF90CF-72E7-44A0-8C78-7854B2B8996A@gmail.com> <29ad8317-4a55-f017-6a0b-c06bf40ccab8@redhat.com> In-Reply-To: From: =?UTF-8?Q?Tomas_Dalebj=C3=B6rk?= Date: Tue, 5 Nov 2019 21:56:27 +0100 Message-ID: Content-Type: multipart/alternative; boundary="00000000000039ebaa05969faba9" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: Mikulas Patocka Cc: LVM general discussion and development , Zdenek Kabelac --00000000000039ebaa05969faba9 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thanks, That really helped me to understand how the snapshot works. Last question: - lets say that block 100 which is 1MB in size is in the cow device, and a write happen that wants to something or all data on that region of block 100. Than I assume; based on what have been previously said here, that the block in the cow device will be overwritten with the new changes. Regards Tomas Den tis 5 nov. 2019 kl 17:40 skrev Mikulas Patocka : > > > On Mon, 4 Nov 2019, Tomas Dalebj=C3=B6rk wrote: > > > Thanks for feedback. > > > > Let me try to type different scenarios: > > > > We have an origin volume, lets call it: /dev/vg00/lv00 > > We convert a snapshot volume to origin volume, lets call it: > /dev/vg00/lv00-snap > > - all blocks has been changed, and are represented in the > /dev/vg00/lv00-snap, when we start the lvconvert process > > > > I assume that something reads the data from /dev/vg00/lv00-snap and cop= y > that to /dev/vg00/lv00 > > It will most likely start from the first block, to the last block to > copy. > > Merging starts from the last block on the lv00-snap device and it proceed= s > backward to the beginning. > > > The block size is 1MB on /dev/vg00/lv00-snap, and we have for simplicit= y > the same block size on the origin /dev/vg00/lv00 > > > > Scenario 1: A read comes want to read block LP 100, but lvconvert has > not yet copied that LP block. > > Will the read comes from /dev/vg00/lv00-snap directly and delivered to > requestor? > > Yes. > > > Or will lvconvert prioritize to copy data from /dev/vg00/lv00-snap to > /dev/vg00/lv00 for that block, and let the requestor wait until the copyi= ng > has been completed, so > > that a read operation can happen from origin? > > Or will the requestor have to wait until the copy data from > /dev/vg00/lv00-snap to /dev/vg00/lv00 for that block has been completed, > without any prioritization? > > It only waits if you attempt to read or write the block that is currently > being copied. > > If you read data that hasn't been merged yet, it reads from the snapshot, > if you read data that has been merged, it reads from the origin, if you > read data that is currently being copied, it waits. > > > Scenario 2: A write comes want to write block LP 100, but lvconvert has > not yet copied that LP block (yes, I do understand that origin is hidden > now) > > Will lvconvery prioritize to copy data from /dev/vg00/lv00-snap to > /dev/vg00/lv00 for that block, and let the requestor write the changes > directly on the origin after the > > copying has been performed? > > No. > > > Or will the write be blocked until lvconvert has finished the copying o= f > the requested block, and than a write can be accepted to the origin? 
> > Or where will the changes be written? > > The changes will be written to the lv00-snap device. > > If you write data that hasn't been merged yet, the write is redirected to > the lv00-snap device. If you write data that has already been merged, the > write is directed to the origin device. If you write data that is > currently being merged, it waits. > > > It is important for me to understand, as the backup device that I want > to map as a COW device is a read only target, and is not allowed to be > written to. > > You can't have read-only COW device. Both metadata and data on the COW > device are updated during the merge. > > > If read happends from the backup COW device, and writes happends to the > origin, than it is possible to create an instant recovery. > > If writes happends to the backup COW device, than it not that easy to > implement a instance reovery solution, as the backup device is write > protected. > > > > Thanks in advance. > > Mikulas --00000000000039ebaa05969faba9 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
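Since both the data and the metadata on the COW device get written during a merge, a read-only backup target cannot be attached directly; one conceivable workaround is to stage a writable copy of the exported COW first. A sketch only, with made-up names and a placeholder /backup path:

# lvcreate -L 10G -n lv00_cow vg00
# dd if=/backup/lv00-snap.cow of=/dev/vg00/lv00_cow bs=1M conv=fsync
  (copy the exported COW image onto a writable LV; the read-only backup copy
   itself stays untouched)
# lvconvert -s -Zn vg00/lv00 vg00/lv00_cow
  (attach the staged copy as the snapshot of the origin, keeping its metadata)
# lvconvert --merge -b vg00/lv00_cow
  (the merge is then free to update the staged copy while it runs)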
--00000000000039ebaa05969faba9-- From mboxrd@z Thu Jan 1 00:00:00 1970 References: <5BDF90CF-72E7-44A0-8C78-7854B2B8996A@gmail.com> <29ad8317-4a55-f017-6a0b-c06bf40ccab8@redhat.com> From: Zdenek Kabelac Message-ID: Date: Wed, 6 Nov 2019 10:22:25 +0100 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US Content-Transfer-Encoding: quoted-printable Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="iso-8859-1"; format="flowed" To: =?UTF-8?Q?Tomas_Dalebj=c3=b6rk?= Cc: LVM general discussion and development Dne 05. 11. 19 v 21:56 Tomas Dalebj=C3=B6rk napsal(a): > Thanks, >=20 > That really helped me to understand how the snapshot works. > Last question: > - lets say that block 100 which is 1MB in size is in the cow device, and = a=20 > write happen that wants to something or all data on that region of block = 100. > Than I assume; based on what have been previously said here, that the blo= ck in=20 > the cow device will be overwritten with the new changes. Yes - it needs to be written to 'COW' device - since when the block will be= =20 merged - it would overwrite whatever would have been written in 'origin' (as said - there is nothing else in snapshot metadata then 'from->to' bloc= k=20 mapping table - so there is no way to store information about a portion of = 'chunk' being already written into origin) - and 'merge' needs to work=20 reliable in cases like 'power-off' in the middle of merge operation... Regards Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Thu, 7 Nov 2019 11:54:43 -0500 (EST) From: Mikulas Patocka In-Reply-To: Message-ID: References: <5BDF90CF-72E7-44A0-8C78-7854B2B8996A@gmail.com> <29ad8317-4a55-f017-6a0b-c06bf40ccab8@redhat.com> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="185206533-1804003513-1573145683=:31278" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: =?ISO-8859-15?Q?Tomas_Dalebj=F6rk?= Cc: LVM general discussion and development , Zdenek Kabelac This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --185206533-1804003513-1573145683=:31278 Content-Type: TEXT/PLAIN; charset="utf-8" Content-Transfer-Encoding: 8bit On Tue, 5 Nov 2019, Tomas Dalebjörk wrote: > Thanks, > > That really helped me to understand how the snapshot works. > Last question: > - lets say that block 100 which is 1MB in size is in the cow device, and a write happen that wants to something or all data on that region of block 100. > Than I assume; based on what have been previously said here, that the block in the cow device will be overwritten with the new changes. Yes, the block in the cow device will be overwritten. 
Mikulas > Regards Tomas --185206533-1804003513-1573145683=:31278-- From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 References: <5BDF90CF-72E7-44A0-8C78-7854B2B8996A@gmail.com> <29ad8317-4a55-f017-6a0b-c06bf40ccab8@redhat.com> In-Reply-To: From: =?UTF-8?Q?Tomas_Dalebj=C3=B6rk?= Date: Thu, 7 Nov 2019 18:29:21 +0100 Message-ID: Content-Type: multipart/alternative; boundary="00000000000037ecc90596c50289" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: Mikulas Patocka Cc: LVM general discussion and development , Zdenek Kabelac --00000000000037ecc90596c50289 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Great, thanks! Den tors 7 nov. 2019 kl 17:54 skrev Mikulas Patocka : > > > On Tue, 5 Nov 2019, Tomas Dalebj=C3=B6rk wrote: > > > Thanks, > > > > That really helped me to understand how the snapshot works. > > Last question: > > - lets say that block 100 which is 1MB in size is in the cow device, an= d > a write happen that wants to something or all data on that region of bloc= k > 100. > > Than I assume; based on what have been previously said here, that the > block in the cow device will be overwritten with the new changes. > > Yes, the block in the cow device will be overwritten. > > Mikulas > > > Regards Tomas --00000000000037ecc90596c50289 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
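One quick way to observe the behaviour confirmed above, namely that a chunk already present in the COW is overwritten in place rather than allocated again, is to watch the snapshot usage while writing the same region of the snapshot twice (a sketch with made-up names and sizes; exact percentages will vary):

# lvcreate -s -L 1G -n lv00-snap vg00/lv00
# dd if=/dev/urandom of=/dev/vg00/lv00-snap bs=1M count=10 oflag=direct
# lvs vg00
  (the Data% of lv00-snap grows: the touched chunks now have exceptions in the COW)
# dd if=/dev/urandom of=/dev/vg00/lv00-snap bs=1M count=10 oflag=direct
# lvs vg00
  (Data% stays roughly where it was: the same chunks are rewritten in place in
   the COW and no new exceptions are allocated)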
--00000000000037ecc90596c50289-- From mboxrd@z Thu Jan 1 00:00:00 1970 References: <0B002628-0740-4DA4-A9BD-320A743A7A30@gmail.com> From: Zdenek Kabelac Message-ID: <06b9d2f9-86eb-4b5d-6198-6608842e0448@redhat.com> Date: Fri, 4 Sep 2020 14:37:59 +0200 MIME-Version: 1.0 In-Reply-To: <0B002628-0740-4DA4-A9BD-320A743A7A30@gmail.com> Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="utf-8"; format="flowed" To: =?UTF-8?Q?Tomas_Dalebj=c3=b6rk?= Cc: LVM general discussion and development Dne 04. 09. 20 v 14:09 Tomas Dalebjörk napsal(a): > hi > > I tried to perform as suggested > # lvconvert —splitsnapshot vg/lv-snap > works fine > # lvconvert -s vg/lv vg/lv-snap > works fine too > > but... > if I try to converting cow data directly from the meta device, than it doesn’t > work > eg > # lvconvert -s vg/lv /dev/mycowdev > the tool doesn’t like the path > I tried to place a link in /dev/vg/mycowdev -> /dev/mycowdev Hi lvm2 does only support 'objects' within VG without any plan to support 'external' devices. So user may not take any 'random' device in a system and use it for commands like lvconvert. There is always very strict requirement to place block devices as VG member first (pvcreate, vgextend...) and then user can allocate space of this device for various LVs. > conclusion > even though the cow device is an exact copy of the cow device that I have > saved on /dev/mycowdev before the split, it wouldn’t work to use to convert > back as a lvm snapshot COW data needs to be simply stored on an LV for use with lvm2. You may of course use the 'dmsetup' command directly and arrange your snapshot setup in the way to combine various kinds of devices - but this is going completely without any lvm2 command involved - in this case you have to fully manipulate all devices in your device stack with this dmsetup command. Regards Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mimecast-mx02.redhat.com (mimecast06.extmail.prod.ext.rdu2.redhat.com [10.11.55.22]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 05E9110F038 for ; Fri, 4 Sep 2020 12:09:57 +0000 (UTC) Received: from us-smtp-1.mimecast.com (us-smtp-2.mimecast.com [205.139.110.61]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id AEC57185A78B for ; Fri, 4 Sep 2020 12:09:57 +0000 (UTC) From: =?utf-8?Q?Tomas_Dalebj=C3=B6rk?= Mime-Version: 1.0 (1.0) Date: Fri, 4 Sep 2020 14:09:47 +0200 Message-Id: <0B002628-0740-4DA4-A9BD-320A743A7A30@gmail.com> References: In-Reply-To: Content-Type: multipart/alternative; boundary=Apple-Mail-4BA69AAC-0ACB-41BC-8310-70990E9FF80E Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: Mikulas Patocka Cc: LVM general discussion and development , Zdenek Kabelac --Apple-Mail-4BA69AAC-0ACB-41BC-8310-70990E9FF80E Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable hi I tried to perform as suggested # lvconvert =E2=80=94splitsnapshot vg/lv-snap works fine # lvconvert -s vg/lv vg/lv-snap works fine too but... 
if I try to convert the cow data directly from the meta device, then it doesn't
work
eg
# lvconvert -s vg/lv /dev/mycowdev
the tool doesn't like the path
I tried to place a link in /dev/vg/mycowdev -> /dev/mycowdev
and retried the operation
# lvconvert -s vg/lv /dev/vg/mycowdev
but this doesn't work either

conclusion
even though the cow device is an exact copy of the cow device that I had
saved on /dev/mycowdev before the split, it doesn't work to use it to
convert back into an lvm snapshot

not sure if I understand the tool correctly, or if there are other things
that need to be done, such as creating virtual information about the lvm VGDA
data at the start of this virtual volume named /dev/mycowdev

let me know what more steps are needed

best regards Tomas

Sent from my iPhone

> On 7 Nov 2019, at 18:29, Tomas Dalebjörk wrote:
>
> Great, thanks!
>
> On Thu 7 Nov 2019 at 17:54, Mikulas Patocka wrote:
>>
>>
>> On Tue, 5 Nov 2019, Tomas Dalebjörk wrote:
>>
>> > Thanks,
>> >
>> > That really helped me to understand how the snapshot works.
>> > Last question:
>> > - let's say that block 100, which is 1MB in size, is in the cow device, and a write happens that wants to change some or all data in that region of block 100.
>> > Then I assume, based on what has previously been said here, that the block in the cow device will be overwritten with the new changes.
>>
>> Yes, the block in the cow device will be overwritten.
>>
>> Mikulas
>>
>> > Regards Tomas
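Zdenek's earlier suggestion to drive device-mapper directly, without lvm2 in the loop, could look roughly like the following; the target parameters follow the kernel's snapshot targets, but the names are placeholders and the sketch glosses over suspend/resume and activation ordering:

# SIZE=$(blockdev --getsz /dev/vg/lv)
# dmsetup create lv_cowsnap --table "0 $SIZE snapshot /dev/vg/lv /dev/mycowdev P 8"
  (snapshot <origin> <COW device> <persistent> <chunksize in 512-byte sectors>;
   the chunk size must match the one the existing COW was created with)
# dmsetup create lv_merge --table "0 $SIZE snapshot-merge /dev/vg/lv /dev/mycowdev P 8"
  (alternatively, a snapshot-merge table replays the COW back into the origin;
   note that the COW device gets written to in either case)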
--Apple-Mail-4BA69AAC-0ACB-41BC-8310-70990E9FF80E-- From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Mon, 7 Sep 2020 09:09:08 -0400 (EDT) From: Mikulas Patocka In-Reply-To: <0B002628-0740-4DA4-A9BD-320A743A7A30@gmail.com> Message-ID: References: <0B002628-0740-4DA4-A9BD-320A743A7A30@gmail.com> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="185206533-864684669-1599484148=:5779" Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: To: =?ISO-8859-15?Q?Tomas_Dalebj=F6rk?= Cc: LVM general discussion and development , Zdenek Kabelac This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --185206533-864684669-1599484148=:5779 Content-Type: TEXT/PLAIN; charset="utf-8" Content-Transfer-Encoding: 8bit On Fri, 4 Sep 2020, Tomas Dalebjörk wrote: > hi > I tried to perform as suggested > # lvconvert —splitsnapshot vg/lv-snap > works fine > # lvconvert -s vg/lv vg/lv-snap > works fine too > > but... > if I try to converting cow data directly from the meta device, than it doesn’t work > eg > # lvconvert -s vg/lv /dev/mycowdev > the tool doesn’t like the path > I tried to place a link in /dev/vg/mycowdev -> /dev/mycowdev > and retried the operations  > # lvconveet -s vg/lv /dev/vg/mycowdev > but this doesn’t work either > > conclusion  even though the cow device is an exact copy of the cow > device that I have saved on /dev/mycowdev before the split, it wouldn’t > work to use to convert back as a lvm snapshot  > > not sure if I understand the tool correctly, or if there are other > things needed to perform, such as creating virtual information about the > lvm VGDA data on the first of this virtual volume named /dev/mycowdev  AFAIK LVM doesn't support taking existing cow device and attaching it to an existing volume. When you create a snapshot, you start with am empty cow. Mikulas > let me know what more steps are needed > > beat regards Tomas > > Sent from my iPhone > > On 7 Nov 2019, at 18:29, Tomas Dalebjörk wrote: > > Great, thanks! > > Den tors 7 nov. 2019 kl 17:54 skrev Mikulas Patocka : > > > On Tue, 5 Nov 2019, Tomas Dalebjörk wrote: > > > Thanks, > > > > That really helped me to understand how the snapshot works. > > Last question: > > - lets say that block 100 which is 1MB in size is in the cow device, and a write happen that wants to something or all data on that region of block > 100. > > Than I assume; based on what have been previously said here, that the block in the cow device will be overwritten with the new changes. > > Yes, the block in the cow device will be overwritten. 
> > Mikulas > > > Regards Tomas > > > --185206533-864684669-1599484148=:5779-- From mboxrd@z Thu Jan 1 00:00:00 1970 References: <0B002628-0740-4DA4-A9BD-320A743A7A30@gmail.com> <30564577-1c0a-7405-f70e-81614b62dec0@gmail.com> From: Zdenek Kabelac Message-ID: Date: Mon, 7 Sep 2020 16:17:47 +0200 MIME-Version: 1.0 In-Reply-To: <30564577-1c0a-7405-f70e-81614b62dec0@gmail.com> Content-Language: en-US Content-Transfer-Encoding: quoted-printable Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="iso-8859-1"; format="flowed" To: =?UTF-8?Q?Dalebj=c3=b6rk=2c_Tomas?= , Mikulas Patocka Cc: LVM general discussion and development Dne 07. 09. 20 v 16:14 Dalebj=C3=B6rk, Tomas napsal(a): > Hi Mikulas, >=20 > Thanks for the replies >=20 > I am confused now with the last message? >=20 > LVM doesn't support taking existing cow device and attaching it to an exi= sting=20 > volume? >=20 > Isn't that what "lvconvert --splitsnapshot" & "lvconvert -s" is ment to b= e doing? >=20 > lets say that I create the snapshot on a different device using these ste= ps: >=20 > root@src# lvcreate -s -L 10GB -n lvsnap vg/lv /dev/sdh > root@src# lvconvert ---splitsnapshot vg/lvsnap > root@src# echo "I now move /dev/sdb to another server" > root@tgt# lvconvert -s newvg/newlv vg/lvsnap >=20 Hi This is only supported as long as you stay within VG. So newlv & lvsnap must be in a single VG. Note - you can 'vgreduce' PV from VG1 and vgextend to VG2. But it always work on whole PV base - you can't mix LV between VGs. Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mimecast-mx02.redhat.com (mimecast04.extmail.prod.ext.rdu2.redhat.com [10.11.55.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 9C8F730B84 for ; Mon, 7 Sep 2020 16:34:25 +0000 (UTC) Received: from us-smtp-1.mimecast.com (us-smtp-2.mimecast.com [205.139.110.61]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D6179101AA40 for ; Mon, 7 Sep 2020 16:34:25 +0000 (UTC) From: =?utf-8?Q?Tomas_Dalebj=C3=B6rk?= Mime-Version: 1.0 (1.0) Date: Mon, 7 Sep 2020 18:34:15 +0200 Message-Id: References: In-Reply-To: Content-Transfer-Encoding: 8bit Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="utf-8" To: Zdenek Kabelac Cc: LVM general discussion and development thanks for feedback so if I understand this correctly # fallocate -l 100M /tmp/pv1 # fallocate -l 100M /tmp/pv2 # fallocate -l 100M /tmp/pv3 # losetup —find —show /tmp/pv1 # losetup —find —show /tmp/pv2 # losetup —find —show /tmp/pv3 # vgcreate vg0 /dev/loop0 # lvcreate -n lv0 -l 1 vg0 # vgextend vg0 /dev/loop1 # lvcreate -s -l 1 -n lvsnap /dev/loop1 # vgchange -a n vg0 # lvconvert —splitsnapshot vg0/lvsnap # vgreduce vg0 /dev/loop1 # vgcreate vg1 /dev/loop2 # lvcreate -n lv0 -l 1 vg1 # vgextend vg1 /dev/loop1 # lvconvert -s vg1/lvsnap vg1/lv0 not sure if the steps are correct? regards Tomas Sent from my iPhone > On 7 Sep 2020, at 16:17, Zdenek Kabelac wrote: > > Dne 07. 09. 
20 v 16:14 Dalebjörk, Tomas napsal(a): >> Hi Mikulas, >> Thanks for the replies >> I am confused now with the last message? >> LVM doesn't support taking existing cow device and attaching it to an existing volume? >> Isn't that what "lvconvert --splitsnapshot" & "lvconvert -s" is ment to be doing? >> lets say that I create the snapshot on a different device using these steps: >> root@src# lvcreate -s -L 10GB -n lvsnap vg/lv /dev/sdh >> root@src# lvconvert ---splitsnapshot vg/lvsnap >> root@src# echo "I now move /dev/sdb to another server" >> root@tgt# lvconvert -s newvg/newlv vg/lvsnap > > Hi > > This is only supported as long as you stay within VG. > So newlv & lvsnap must be in a single VG. > > Note - you can 'vgreduce' PV from VG1 and vgextend to VG2. > But it always work on whole PV base - you can't mix > LV between VGs. > > Zdenek > From mboxrd@z Thu Jan 1 00:00:00 1970 References: From: Zdenek Kabelac Message-ID: <4081c11e-d2b0-ebfa-d1a0-92a4efc79e81@redhat.com> Date: Mon, 7 Sep 2020 18:42:08 +0200 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [linux-lvm] exposing snapshot block device Reply-To: LVM general discussion and development List-Id: LVM general discussion and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , List-Id: Content-Type: text/plain; charset="utf-8"; format="flowed" To: =?UTF-8?Q?Tomas_Dalebj=c3=b6rk?= Cc: LVM general discussion and development Dne 07. 09. 20 v 18:34 Tomas Dalebjörk napsal(a): > thanks for feedback > > so if I understand this correctly > # fallocate -l 100M /tmp/pv1 > # fallocate -l 100M /tmp/pv2 > # fallocate -l 100M /tmp/pv3 > > # losetup —find —show /tmp/pv1 > # losetup —find —show /tmp/pv2 > # losetup —find —show /tmp/pv3 > > # vgcreate vg0 /dev/loop0 > # lvcreate -n lv0 -l 1 vg0 > # vgextend vg0 /dev/loop1 > # lvcreate -s -l 1 -n lvsnap /dev/loop1 > # vgchange -a n vg0 > > # lvconvert —splitsnapshot vg0/lvsnap > > # vgreduce vg0 /dev/loop1 Hi Here you would need to use 'vgsplit' rather - otherwise you loose the mapping for whatever was living on /dev/loop1 > > # vgcreate vg1 /dev/loop2 > # lvcreate -n lv0 -l 1 vg1 > # vgextend vg1 /dev/loop1 And 'vgmerge' > # lvconvert -s vg1/lvsnap vg1/lv0 > > not sure if the steps are correct? > I hope you realize the content of vg1/lv0 must be exactly same as vg0/lv0. As snapshot COW volume contains only 'diff chunks' - so if you would attach snapshot to 'different' lv - you would get only mess. 
Zdenek

From: Tomas Dalebjörk
Date: Mon, 7 Sep 2020 19:37:41 +0200
Subject: Re: [linux-lvm] exposing snapshot block device
To: Zdenek Kabelac
Cc: LVM general discussion and development

thanks

ok
vgsplit/merge instead
and after that lvconvert -s

yes, I am aware of the issues with corruption
but if the cow device has all data, then no corruption will happen, right?

if COW has a copy of all blocks
then an lvconvert --merge, or a mount of the snapshot volume, will be without issues
right?

regards Tomas

Sent from my iPhone

> On 7 Sep 2020, at 18:42, Zdenek Kabelac wrote:
>
> On 07. 09. 20 at 18:34, Tomas Dalebjörk wrote:
>> thanks for feedback
>> so if I understand this correctly
>> # fallocate -l 100M /tmp/pv1
>> # fallocate -l 100M /tmp/pv2
>> # fallocate -l 100M /tmp/pv3
>> # losetup --find --show /tmp/pv1
>> # losetup --find --show /tmp/pv2
>> # losetup --find --show /tmp/pv3
>> # vgcreate vg0 /dev/loop0
>> # lvcreate -n lv0 -l 1 vg0
>> # vgextend vg0 /dev/loop1
>> # lvcreate -s -l 1 -n lvsnap /dev/loop1
>> # vgchange -a n vg0
>> # lvconvert --splitsnapshot vg0/lvsnap
>> # vgreduce vg0 /dev/loop1
>
> Hi
>
> Here you would need to use 'vgsplit' instead - otherwise you
> lose the mapping for whatever was living on /dev/loop1
>
>> # vgcreate vg1 /dev/loop2
>> # lvcreate -n lv0 -l 1 vg1
>> # vgextend vg1 /dev/loop1
>
> And 'vgmerge'
>
>> # lvconvert -s vg1/lvsnap vg1/lv0
>> not sure if the steps are correct?
>
> I hope you realize the content of vg1/lv0 must be exactly the same
> as vg0/lv0.
>
> As a snapshot COW volume contains only 'diff chunks' - if you
> attach the snapshot to a 'different' lv you get only a mess.
>
> Zdenek

From: Zdenek Kabelac
Date: Mon, 7 Sep 2020 19:50:35 +0200
Subject: Re: [linux-lvm] exposing snapshot block device
To: Tomas Dalebjörk
Cc: LVM general discussion and development

On 07. 09. 20 at 19:37, Tomas Dalebjörk wrote:
> thanks
>
> ok
> vgsplit/merge instead
> and after that lvconvert -s
>
> yes, I am aware of the issues with corruption
> but if the cow device has all data, then no corruption will happen, right?
>
> if COW has a copy of all blocks
> then an lvconvert --merge, or a mount of the snapshot volume, will be without issues

If the 'COW' has all the data - why do you need the snapshot then?
Why not transfer the whole LV instead of the snapshot?

Also - nowadays this old (so-called 'thick') snapshot is really slow
compared with thin provisioning - it might be good to check what kind of features
you would gain/lose if you switched to a thin pool
(clearly the whole thin pool (both data & metadata) would need to travel
between your VGs.)

Regards

Zdenek

From: Tomas Dalebjörk
Date: Mon, 7 Sep 2020 21:56:07 +0200
Subject: Re: [linux-lvm] exposing snapshot block device
To: Zdenek Kabelac
Cc: LVM general discussion and development

hi

I tried all these steps,
but when I associated the snapshot cow device back to an empty origin and typed the lvs command,
the data% output showed 0% instead of 37%?
so it looks like the lvconvert -s vg1/lvsnap vg1/lv0 loses the cow data?

perhaps you can guide me how this can be done?

btw, just to emulate a full copy, I executed
dd if=/dev/vg0/lv0 of=/dev/vg1/lv0
before the lvconvert -s, to make sure the latest data is there

and then I tried to mount vg1/lv0, which worked fine,
but the data was not the snapshot view
even mounting vg1/lvsnap works fine,
but with the wrong data

confused over how and why vgmerge should be used, as vgsplit does the work?

regards Tomas

Sent from my iPhone

From: Tomas Dalebjörk
Date: Mon, 7 Sep 2020 22:22:16 +0200
Subject: Re: [linux-lvm] exposing snapshot block device
To: Zdenek Kabelac
Cc: LVM general discussion and development

yes, we need the snapshot data, as it is provisioned from the backup target and can't be changed

we will definitely look into thin snapshots later, but first we want to make sure that we can reanimate the cow device as a device and associate it with an empty origin

we want, if possible, to be able to associate this cow with a new empty vg/lv using a new vgname/lvname

after all, it is just a virtual volume

Sent from my iPhone

From: Tomas Dalebjörk
Date: Mon, 7 Sep 2020 23:02:08 +0200
Subject: Re: [linux-lvm] exposing snapshot block device
To: Zdenek Kabelac
Cc: LVM general discussion and development

it worked
I missed the -Zn flag

Sent from my iPhone
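Read together, Zdenek's corrections and the -Zn detail noted just above turn the quoted recipe into something like the sketch below. The loop devices, VG names and LV names are the ones from the quoted message; vgtravel is a made-up name for the VG that carries the COW between the two sides, and the whole thing is an untested outline rather than a verified procedure.

# source side: origin lv0 on /dev/loop0, snapshot COW placed on /dev/loop1
vgcreate vg0 /dev/loop0
lvcreate -n lv0 -l 1 vg0
vgextend vg0 /dev/loop1
lvcreate -s -l 1 -n lvsnap vg0/lv0 /dev/loop1   # origin spelled out explicitly

# detach the COW and move the PV that carries it into its own VG
vgchange -a n vg0
lvconvert --splitsnapshot vg0/lvsnap
vgsplit vg0 vgtravel /dev/loop1                 # vgsplit instead of vgreduce

# destination side: an origin with exactly the same content as vg0/lv0
vgcreate vg1 /dev/loop2
lvcreate -n lv0 -l 1 vg1
dd if=/dev/vg0/lv0 of=/dev/vg1/lv0              # or any other identical copy
vgmerge vg1 vgtravel                            # vgmerge instead of vgextend

# reattach the COW to the new origin; -Zn keeps lvconvert from wiping it
lvconvert -Zn -s vg1/lvsnap vg1/lv0

On a real source/target pair the dd copy would of course be replaced by whatever mechanism restores the origin content on the target.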
From: Dalebjörk, Tomas
Date: Mon, 7 Sep 2020 16:14:16 +0200
Subject: Re: [linux-lvm] exposing snapshot block device
To: Mikulas Patocka
Cc: LVM general discussion and development, Zdenek Kabelac

Hi Mikulas,

Thanks for the replies

I am confused now by the last message.

LVM doesn't support taking an existing cow device and attaching it to an existing volume?

Isn't that what "lvconvert --splitsnapshot" & "lvconvert -s" are meant to do?

let's say that I create the snapshot on a different device using these steps:

root@src# lvcreate -s -L 10GB -n lvsnap vg/lv /dev/sdh
root@src# lvconvert --splitsnapshot vg/lvsnap
root@src# echo "I now move /dev/sdb to another server"
root@tgt# lvconvert -s newvg/newlv vg/lvsnap


Regards Tomas
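Before moving the split-off COW anywhere, it may be worth checking that it really carries the expected exception data. A small sanity check along these lines should do (vg/lvsnap as in the steps above; the exact lvs columns and the on-disk layout of the persistent exception store can vary between versions, so treat the dd probe as a rough check only):

# snapshot fill level before the split (Data% column)
lvs -o lv_name,origin,data_percent vg

# after --splitsnapshot the COW is a plain LV again; its first chunk
# should still begin with the persistent exception-store header
lvchange -a y vg/lvsnap
dd if=/dev/vg/lvsnap bs=4k count=1 2>/dev/null | xxd | head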

On 2020-09-07 at 15:09, Mikulas Patocka wrote:
>
> On Fri, 4 Sep 2020, Tomas Dalebjörk wrote:
>
>> hi
>> I tried to perform as suggested
>> # lvconvert --splitsnapshot vg/lv-snap
>> works fine
>> # lvconvert -s vg/lv vg/lv-snap
>> works fine too
>>
>> but...
>> if I try converting the cow data directly from the meta device, then it doesn't work
>> eg
>> # lvconvert -s vg/lv /dev/mycowdev
>> the tool doesn't like the path
>> I tried to place a link in /dev/vg/mycowdev -> /dev/mycowdev
>> and retried the operation
>> # lvconvert -s vg/lv /dev/vg/mycowdev
>> but this doesn't work either
>>
>> conclusion: even though the cow device is an exact copy of the cow
>> device that I have saved on /dev/mycowdev before the split, it wouldn't
>> work to use it to convert back into an lvm snapshot
>>
>> not sure if I understand the tool correctly, or if there are other
>> things needed, such as creating virtual information about the
>> lvm VGDA data at the start of this virtual volume named /dev/mycowdev
>
> AFAIK LVM doesn't support taking an existing cow device and attaching it to
> an existing volume. When you create a snapshot, you start with an empty
> cow.
>
> Mikulas
>
>> let me know what more steps are needed
>>
>> best regards Tomas
>>
>> Sent from my iPhone
>>
>> On 7 Nov 2019, at 18:29, Tomas Dalebjörk <tomas.dalebjork@gmail.com> wrote:
>>
>> Great, thanks!
>>
>> On Thu 7 Nov 2019 at 17:54, Mikulas Patocka <mpatocka@redhat.com> wrote:
>>
>> On Tue, 5 Nov 2019, Tomas Dalebjörk wrote:
>>
>> > Thanks,
>> >
>> > That really helped me to understand how the snapshot works.
>> > Last question:
>> > - let's say that block 100, which is 1MB in size, is in the cow device, and a write happens that wants to change some or all data in that region of block 100.
>> > Then I assume, based on what has previously been said here, that the block in the cow device will be overwritten with the new changes.
>>
>> Yes, the block in the cow device will be overwritten.
>>
>> Mikulas
>>
>> > Regards Tomas
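As an aside, what is being asked for in this exchange is essentially what the device-mapper snapshot target already does underneath LVM, so the idea can be prototyped with dmsetup directly. This is only a sketch: the device paths are placeholders, the chunk size (here 8 sectors, i.e. 4KiB) has to match the one the COW was created with, and 'P' selects the persistent exception store so that chunks already present in the COW are honoured when the table is loaded.

SIZE=$(blockdev --getsz /dev/vgX/origin)        # size in 512-byte sectors

# route writes to the origin through a snapshot-origin mapping
dmsetup create origin-real --table "0 $SIZE snapshot-origin /dev/vgX/origin"

# attach the pre-populated COW device as a persistent ('P') snapshot
dmsetup create restored-snap \
  --table "0 $SIZE snapshot /dev/vgX/origin /dev/path/to/cowdev P 8"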


From: Dalebjörk, Tomas
Date: Tue, 8 Sep 2020 14:32:23 +0200
Subject: Re: [linux-lvm] exposing snapshot block device
To: Zdenek Kabelac
Cc: LVM general discussion and development

Hi,


These are the steps that I did.

- the COW data exists on /dev/loop1, including space for the PV header + metadata

I created a fakevg template file with vgcfgbackup: /tmp/fakevg.bkp

( in the content of this file I created a fake uuid etc... )


I create a fake uuid for the PV

# pvcreate -ff -u fake-uidx-nrxx-xxxx --restorefile /tmp/fakevg.bkp


And created the metadata from the backup

# vgcfgrestore -f /tmp/fakevg.bkp fakevg
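The pvcreate + vgcfgrestore pair can be scripted so that the PV UUID is taken from the backup file itself instead of being typed twice. A rough sketch, assuming the backup file uses the usual pv0 section layout that vgcfgbackup writes (fakevg and /dev/loop1 as above):

BKP=/tmp/fakevg.bkp
PV=/dev/loop1

# pull the UUID of pv0 out of the backup file
PVUUID=$(awk '/pv0/ { inpv = 1 } inpv && /id =/ { gsub(/"/, "", $3); print $3; exit }' "$BKP")

pvcreate -ff --uuid "$PVUUID" --restorefile "$BKP" "$PV"
vgcfgrestore -f "$BKP" fakevg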


I can now see the lvsnap in fakevg

Perhaps the restore can be done directly to the destination vg? not sure...

Anyhow, I then used vgsplit to move the fakevg data to the destination vg

# vgsplit fakevg destvg /dev/loop1


I now have the lvsnap volume in the correct volume group

From here, I connected the lvsnap to a destination lv using

# lvconvert -Zn -s destvg/lvsnap destvg/destlv
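This is the point where the earlier 0%-versus-37% symptom would show up, so it is worth checking that the reattached snapshot still reports the allocation it had on the source side, for example:

# Data% for lvsnap should match the source side, not drop back to 0%
lvs -o lv_name,origin,data_percent,lv_attr destvg

# kernel view of the snapshot: "<allocated>/<total> <metadata>" sectors
dmsetup status destvg-lvsnap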


I now have a snapshot connected to the origin destlv

From here, I can either mount the snapshot and start using it, or revert to the snapshot

# lvchange -a n destvg/destlv
# lvconvert --merge -b destvg/lvsnap
# lvchange -a y destvg/destlv
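Since --merge is started with -b, it runs in the background once the origin is reactivated; one way to wait for it to finish before relying on the content is to poll lvs until the snapshot LV has been consumed and removed. A sketch, with the names used above (how merge progress is reported varies a little between lvm2 versions):

# after lvchange -a y the deferred merge proceeds in the background
while lvs --noheadings -o lv_name destvg | grep -qw lvsnap; do
    sleep 1
done
lvs -a destvg    # only the origin is left once the merge has completed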


Now to my questions...

is there any DBUS api that can perform the vgcfgrestore operation, that I can use from C?

or another way to recreate the metadata?

Right now I have to use two steps: pvcreate + vgcfgrestore, where what I actually need is to restore just the metadata (only vgcfgrestore)?

If I run vgcfgrestore without pvcreate, then vgcfgrestore will not find the pvid, and it can't be executed with a parameter like:

# vgcfgrestore -f vgXX.bkp /dev/nbd

Instead it has to be invoked with the vgXX parameter pointing out the volume group...


I can live with vgcfgrestore + pvcreate, but would prefer to use the libblockdev (DBUS) or another api from C directly.

What options do I have?
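On the D-Bus question: lvm2 ships an optional daemon, lvmdbusd, which exposes LVM operations on the system bus under the well-known name com.redhat.lvmdbus1, and any C binding (sd-bus, GDBus, libdbus) can talk to it. Whether it covers a vgcfgrestore-style operation is best checked by introspecting the running service; the Manager path below is the commonly documented one, but verify it with busctl tree first:

# list the objects lvmdbusd exposes on the system bus
busctl tree com.redhat.lvmdbus1

# list the methods on the manager object (path may differ between versions)
busctl introspect com.redhat.lvmdbus1 /com/redhat/lvmdbus1/Manager

If nothing suitable shows up there, forking vgcfgrestore from a C program remains the pragmatic fallback.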


Thanks for the excellent help

God Bless

Tomas



On 2020-09-07 at 19:50, Zdenek Kabelac wrote:
> On 07. 09. 20 at 19:37, Tomas Dalebjörk wrote:
>> thanks
>>
>> ok
>> vgsplit/merge instead
>> and after that lvconvert -s
>>
>> yes, I am aware of the issues with corruption
>> but if the cow device has all data, then no corruption will happen, right?
>>
>> if COW has a copy of all blocks
>> then an lvconvert --merge, or a mount of the snapshot volume, will be without issues
>
> If the 'COW' has all the data - why do you need the snapshot then?
> Why not transfer the whole LV instead of the snapshot?
>
> Also - nowadays this old (so-called 'thick') snapshot is really slow compared with thin provisioning - it might be good to check what kind of features
> you would gain/lose if you switched to a thin pool
> (clearly the whole thin pool (both data & metadata) would need to travel between your VGs.)
>
> Regards
>
> Zdenek
