* [Qemu-devel] [RFC] live snapshot, live merge, live block migration @ 2011-05-09 13:40 Dor Laor 2011-05-09 15:23 ` Anthony Liguori ` (3 more replies) 0 siblings, 4 replies; 28+ messages in thread
From: Dor Laor @ 2011-05-09 13:40 UTC (permalink / raw)
To: qemu-devel, Anthony Liguori, Avi Kivity, Marcelo Tosatti, jes sorensen, Kevin Wolf, Stefan Hajnoczi

No patch here (sorry), but a collection of thoughts about these features and their potential building blocks. Please review (also on http://wiki.qemu.org/Features/LiveBlockMigration).

Future qemu is expected to support these features (some already implemented):

* Live block copy

Ability to copy one or more virtual disks from the source backing file/block device to a new target that is accessible by the host. The copy is supposed to execute transparently while the VM runs.

Status: code exists today in qemu (by Marcelo) but needs refactoring due to a race condition at the end of the copy operation. We agreed that a re-implementation of the copy operation should take place that makes sure the image is completely mirrored until management decides which copy to keep.

* Live snapshots and live snapshot merge

Live snapshot is already incorporated in qemu (by Jes); it still needs qemu-agent work to freeze the guest FS.

Live snapshot merge is required in order to reduce the overhead caused by the additional snapshots (sometimes over a raw device). It is currently not implemented for a live running guest.

Possibility: enhance live copy to be used for live snapshot merge. It is almost the same mechanism.

* Copy on read (image streaming)

Ability to start guest execution while the parent image resides remotely, where each block access is replicated to a local copy (image format snapshot).

It would be nice to have a general mechanism that can be used for all image formats. What about the protocol to access these blocks over the net? We can reuse existing ones (nbd/iscsi).
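The copy-on-read idea can be sketched in a few lines. This is a hypothetical illustration (the names and structure are mine, not QEMU's actual implementation): reads fall through to the remote parent only on first access and populate a sparse local copy, while guest writes always land locally.

```python
class CopyOnReadImage:
    """Toy model of image streaming: remote parent + sparse local copy."""

    def __init__(self, remote_blocks):
        self.remote = remote_blocks   # parent image, e.g. reachable over NBD/iSCSI
        self.local = {}               # sparse local copy, block number -> data

    def read(self, block_no):
        if block_no not in self.local:
            # first access: replicate the block from the parent image
            self.local[block_no] = self.remote[block_no]
        return self.local[block_no]

    def write(self, block_no, data):
        # guest writes always land in the local copy, never the parent
        self.local[block_no] = data

    def fully_streamed(self):
        # once every parent block is local, the remote can be dropped
        return set(self.local) >= set(self.remote)
```

Once `fully_streamed()` is true, the dependency on the remote parent disappears, which is what makes this a candidate building block for post-copy block migration.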
Such functionality can be hooked together with live block migration instead of the 'post copy' method.

* Live block migration (pre/post)

Beyond live block copy we'll sometimes need to move both the storage and the guest. There are two main approaches here:

- pre copy
  First live-copy the image and only then live-migrate the VM. It is simple, but if the purpose of the whole live block migration was to balance the CPU load, it won't be practical to use, since copying an image of 100GB will take too long.

- post copy
  First live-migrate the VM, then live-copy its blocks. It's the better approach for HA/load balancing, but it might make management complex (we need to keep the source VM alive; what happens on failures?). Using copy on read might simplify it: post copy = live snapshot + copy on read.

In addition there are two cases for the storage access:

1. The source block device is shared and can be easily accessed by the destination qemu-kvm process. That's the easy case; no special protocol is needed for copying the block devices.

2. There is no shared storage at all. This means we should implement a block access protocol over the live migration fd :( We need to choose whether to implement a new one, or re-use NBD or iSCSI (target & initiator).

* Using external dirty block bitmap

FVD has an option to use an external dirty block bitmap file in addition to the regular mapping/data files.

We can consider using it for live block migration and live merge too. It can also allow additional usage by 3rd party tools to calculate diffs between the snapshots. There is a big downside, though, since it will make management complicated and there is the risk of the image and its bitmap file getting out of sync. It's a much better choice to have the qemu-img tool be the single interface to the dirty block bitmap data.
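As a rough sketch of how a dirty block bitmap drives a live copy (an illustration of the concept only, not FVD's or qemu's actual on-disk format): guest writes set bits, the copy loop clears bits as it mirrors blocks, and the copy converges when no bits remain set.

```python
class DirtyBitmap:
    """Tracks which blocks still need copying to the destination."""

    def __init__(self, nb_blocks):
        self.bits = [True] * nb_blocks   # initially every block needs copying

    def mark(self, block_no):
        # called for guest writes that race with the copy pass
        self.bits[block_no] = True

    def pop_dirty(self):
        # return and clear one dirty block, or None when converged
        for i, dirty in enumerate(self.bits):
            if dirty:
                self.bits[i] = False
                return i
        return None

def live_copy(src, dst, bitmap):
    # drain dirty blocks until source and destination converge
    while (blk := bitmap.pop_dirty()) is not None:
        dst[blk] = src[blk]
```

The sync hazard described above falls out of this picture: if the bitmap file and the image disagree (a `mark()` that never hit disk), the copy silently skips a block, which is why a single owner of the bitmap data (qemu-img) is attractive.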
Summary:
* We need Marcelo's new (to come) block copy implementation
  * should work in parallel to migration and hotplug
* General copy on read is desirable
* Live snapshot merge to be implemented using block copy
* Need to utilize a remote block access protocol (iscsi/nbd/other). Which one is the best?
* Keep qemu-img the single interface for dirty block mappings.
* Live block migration pre copy == live copy + block access protocol + live migration
* Live block migration post copy == live migration + block access protocol/copy on read

Comments?

Regards,
Dor

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-09 13:40 [Qemu-devel] [RFC] live snapshot, live merge, live block migration Dor Laor @ 2011-05-09 15:23 ` Anthony Liguori 2011-05-09 20:58 ` Dor Laor ` (2 more replies) 2011-05-10 14:13 ` Marcelo Tosatti ` (2 subsequent siblings) 3 siblings, 3 replies; 28+ messages in thread From: Anthony Liguori @ 2011-05-09 15:23 UTC (permalink / raw) To: dlaor Cc: Kevin Wolf, jes sorensen, Marcelo Tosatti, qemu-devel, Avi Kivity, Stefan Hajnoczi On 05/09/2011 08:40 AM, Dor Laor wrote: > No patch here (sorry) but collection of thoughts about these features > and their potential building blocks. Please review (also on > http://wiki.qemu.org/Features/LiveBlockMigration) > > Future qemu is expected to support these features (some already > implemented): > > * Live block copy > > Ability to copy 1+ virtual disk from the source backing file/block > device to a new target that is accessible by the host. The copy > supposed to be executed while the VM runs in a transparent way. > > Status: code exists (by Marcelo) today in qemu but needs refactoring > due to a race condition at the end of the copy operation. We agreed > that a re-implementation of the copy operation should take place > that makes sure the image is completely mirrored until management > decides what copy to keep. Live block copy is growing on me. It can actually be used (with an intermediate network storage) to do live block migration. > > * Live snapshots and live snapshot merge > > Live snapshot is already incorporated (by Jes) in qemu (still need > qemu-agent work to freeze the guest FS). Live snapshot is unfortunately not really "live". It runs a lot of operations synchronously which will cause the guest to incur downtime. We really need to refactor it to truly be live. 
> * Copy on read (image streaming) > Ability to start guest execution while the parent image reside > remotely and each block access is replicated to a local copy (image > format snapshot) > > It should be nice to have a general mechanism that will be used for > all image formats. What about the protocol to access these blocks > over the net? We can reuse existing ones (nbd/iscsi). I think the image format is really the best place to have this logic. Of course, if we have live snapshot merge, we could use a temporary QED/QCOW2 file and then merge afterwards. > * Using external dirty block bitmap > > FVD has an option to use external dirty block bitmap file in > addition to the regular mapping/data files. > > We can consider using it for live block migration and live merge too. > It can also allow additional usages of 3rd party tools to calculate > diffs between the snapshots. > There is a big down side thought since it will make management > complicated and there is the risky of the image and its bitmap file > get out of sync. It's much better choice to have qemu-img tool to be > the single interface to the dirty block bitmap data. Does the dirty block bitmap need to exist outside of QEMU? IOW, if it goes away after a guest shuts down, is that problematic? I think it potentially greatly simplifies the problem which makes it appealing from my perspective. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-09 15:23 ` Anthony Liguori @ 2011-05-09 20:58 ` Dor Laor 2011-05-12 14:18 ` Marcelo Tosatti 2011-05-12 15:37 ` Jes Sorensen 2 siblings, 0 replies; 28+ messages in thread
From: Dor Laor @ 2011-05-09 20:58 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, jes sorensen, Marcelo Tosatti, qemu-devel, Avi Kivity, Stefan Hajnoczi

On 05/09/2011 06:23 PM, Anthony Liguori wrote:
> On 05/09/2011 08:40 AM, Dor Laor wrote:
>> No patch here (sorry) but collection of thoughts about these features
>> and their potential building blocks. Please review (also on
>> http://wiki.qemu.org/Features/LiveBlockMigration)
>>
>> Future qemu is expected to support these features (some already
>> implemented):
>>
>> * Live block copy
>>
>> Ability to copy 1+ virtual disk from the source backing file/block
>> device to a new target that is accessible by the host. The copy
>> supposed to be executed while the VM runs in a transparent way.
>>
>> Status: code exists (by Marcelo) today in qemu but needs refactoring
>> due to a race condition at the end of the copy operation. We agreed
>> that a re-implementation of the copy operation should take place
>> that makes sure the image is completely mirrored until management
>> decides what copy to keep.
>
> Live block copy is growing on me. It can actually be used (with an
> intermediate network storage) to do live block migration.

I'm not sure that we can rely on such storage. While it looks like anyone can get such temporary storage, it makes failure cases complex, and it will need additional locking, security permissions, etc. That said, the main gap is the block copy protocol, and using qemu as an iSCSI target/initiator might be a good solution.

>
>>
>> * Live snapshots and live snapshot merge
>>
>> Live snapshot is already incorporated (by Jes) in qemu (still need
>> qemu-agent work to freeze the guest FS).
>
> Live snapshot is unfortunately not really "live".
> It runs a lot of operations synchronously which will cause the guest to incur downtime.
>
> We really need to refactor it to truly be live.

Well, live migration is not really live either. This can be thought of as an implementation detail and improved later on.

>
>> * Copy on read (image streaming)
>> Ability to start guest execution while the parent image reside
>> remotely and each block access is replicated to a local copy (image
>> format snapshot)
>>
>> It should be nice to have a general mechanism that will be used for
>> all image formats. What about the protocol to access these blocks
>> over the net? We can reuse existing ones (nbd/iscsi).
>
> I think the image format is really the best place to have this logic. Of
> course, if we have live snapshot merge, we could use a temporary
> QED/QCOW2 file and then merge afterwards.
>
>> * Using external dirty block bitmap
>>
>> FVD has an option to use external dirty block bitmap file in
>> addition to the regular mapping/data files.
>>
>> We can consider using it for live block migration and live merge too.
>> It can also allow additional usages of 3rd party tools to calculate
>> diffs between the snapshots.
>> There is a big down side thought since it will make management
>> complicated and there is the risky of the image and its bitmap file
>> get out of sync. It's much better choice to have qemu-img tool to be
>> the single interface to the dirty block bitmap data.
>
> Does the dirty block bitmap need to exist outside of QEMU?
>
> IOW, if it goes away after a guest shuts down, is that problematic?

I admit I didn't give it enough thought; I think that sharing the code with qemu-img should be enough for us. If we have a live block operation and the guest suddenly shuts down in the middle, we need to finish the block copy.

>
> I think it potentially greatly simplifies the problem which makes it
> appealing from my perspective.
>
> Regards,
>
> Anthony Liguori

^ permalink raw reply [flat|nested] 28+ messages in thread
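Dor's point about finishing a copy after a sudden shutdown is essentially a question of whether the remaining-blocks set is persisted anywhere. A toy sketch (the JSON progress file and function names here are invented purely for illustration): if progress survives on disk, a restarted process, or an offline tool like qemu-img, can finish only what is left.

```python
import json

def save_progress(path, remaining_blocks):
    # persist the set of blocks not yet copied to the destination
    with open(path, "w") as f:
        json.dump(sorted(remaining_blocks), f)

def resume_copy(path, src, dst):
    # after a restart, load the persisted set and finish the copy
    with open(path) as f:
        remaining = json.load(f)
    for blk in remaining:
        dst[blk] = src[blk]
    return len(remaining)
```

If the bitmap only lives in qemu's memory (Anthony's "goes away after a guest shuts down" case), the fallback is restarting the whole copy, which is simpler but wastes the work already done.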
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-09 15:23 ` Anthony Liguori 2011-05-09 20:58 ` Dor Laor @ 2011-05-12 14:18 ` Marcelo Tosatti 2011-05-12 15:37 ` Jes Sorensen 2 siblings, 0 replies; 28+ messages in thread From: Marcelo Tosatti @ 2011-05-12 14:18 UTC (permalink / raw) To: Anthony Liguori Cc: Kevin Wolf, jes sorensen, dlaor, qemu-devel, Avi Kivity, Stefan Hajnoczi On Mon, May 09, 2011 at 10:23:03AM -0500, Anthony Liguori wrote: > On 05/09/2011 08:40 AM, Dor Laor wrote: > >No patch here (sorry) but collection of thoughts about these features > >and their potential building blocks. Please review (also on > >http://wiki.qemu.org/Features/LiveBlockMigration) > > > >Future qemu is expected to support these features (some already > >implemented): > > > >* Live block copy > > > >Ability to copy 1+ virtual disk from the source backing file/block > >device to a new target that is accessible by the host. The copy > >supposed to be executed while the VM runs in a transparent way. > > > >Status: code exists (by Marcelo) today in qemu but needs refactoring > >due to a race condition at the end of the copy operation. We agreed > >that a re-implementation of the copy operation should take place > >that makes sure the image is completely mirrored until management > >decides what copy to keep. > > Live block copy is growing on me. It can actually be used (with an > intermediate network storage) to do live block migration. > > > > >* Live snapshots and live snapshot merge > > > >Live snapshot is already incorporated (by Jes) in qemu (still need > >qemu-agent work to freeze the guest FS). > > Live snapshot is unfortunately not really "live". It runs a lot of > operations synchronously which will cause the guest to incur > downtime. > > We really need to refactor it to truly be live. 
> >
> >* Copy on read (image streaming)
> >Ability to start guest execution while the parent image reside
> >remotely and each block access is replicated to a local copy (image
> >format snapshot)
> >
> >It should be nice to have a general mechanism that will be used for
> >all image formats. What about the protocol to access these blocks
> >over the net? We can reuse existing ones (nbd/iscsi).
>
> I think the image format is really the best place to have this
> logic. Of course, if we have live snapshot merge, we could use a
> temporary QED/QCOW2 file and then merge afterwards.
>
> >* Using external dirty block bitmap
> >
> >FVD has an option to use external dirty block bitmap file in
> >addition to the regular mapping/data files.
> >
> >We can consider using it for live block migration and live merge too.
> >It can also allow additional usages of 3rd party tools to calculate
> >diffs between the snapshots.
> >There is a big down side thought since it will make management
> >complicated and there is the risky of the image and its bitmap file
> >get out of sync. It's much better choice to have qemu-img tool to be
> >the single interface to the dirty block bitmap data.
>
> Does the dirty block bitmap need to exist outside of QEMU?
>
> IOW, if it goes away after a guest shuts down, is that problematic?
>
> I think it potentially greatly simplifies the problem which makes it
> appealing from my perspective.

One limitation of block copy is the need to rewrite data that differs from the base image on every "merge". But this is a limitation of qcow2 external snapshots represented as files, not of block copy itself (with external qcow2 snapshots, even a "live block merge" would require potentially copying large amounts of data). Data copying can be avoided only with snapshots internal to an image (and depending on the scenario, this can be a nasty limitation).

>
> Regards,
>
> Anthony Liguori

^ permalink raw reply [flat|nested] 28+ messages in thread
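The limitation Marcelo describes can be made concrete: with an external snapshot, the set of blocks a merge must rewrite is exactly the set allocated in the overlay, i.e. everything the guest touched since the snapshot was taken. A minimal model, with dicts standing in for image files (illustrative only, not qcow2 semantics in detail):

```python
def blocks_to_merge(overlay):
    # every block allocated in the overlay differs (or may differ)
    # from the base, so each one must be rewritten on merge
    return sorted(overlay)

def merge(base, overlay):
    # commit the overlay back into the base image
    for blk in blocks_to_merge(overlay):
        base[blk] = overlay[blk]   # the unavoidable data copy
    overlay.clear()                # overlay can now be discarded
```

With internal snapshots the merge can instead be a metadata update (repointing cluster references), which is why no bulk data copy is needed there.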
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-09 15:23 ` Anthony Liguori 2011-05-09 20:58 ` Dor Laor 2011-05-12 14:18 ` Marcelo Tosatti @ 2011-05-12 15:37 ` Jes Sorensen 2 siblings, 0 replies; 28+ messages in thread
From: Jes Sorensen @ 2011-05-12 15:37 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, dlaor, Marcelo Tosatti, qemu-devel, Avi Kivity, Stefan Hajnoczi

On 05/09/11 17:23, Anthony Liguori wrote:
>>
>> * Live snapshots and live snapshot merge
>>
>> Live snapshot is already incorporated (by Jes) in qemu (still need
>> qemu-agent work to freeze the guest FS).
>
> Live snapshot is unfortunately not really "live". It runs a lot of
> operations synchronously which will cause the guest to incur downtime.
>
> We really need to refactor it to truly be live.

We keep having this discussion, but as pointed out in my last reply on this, you can pre-create your image if you so desire. The actual snapshot operation then comes down to little more than one command. Yes, we can make it even nicer, but what we have now is far less bad than you make it out to be.

Cheers,
Jes

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-09 13:40 [Qemu-devel] [RFC] live snapshot, live merge, live block migration Dor Laor 2011-05-09 15:23 ` Anthony Liguori @ 2011-05-10 14:13 ` Marcelo Tosatti 2011-05-12 15:33 ` Jes Sorensen 2011-05-20 12:19 ` Stefan Hajnoczi 3 siblings, 0 replies; 28+ messages in thread From: Marcelo Tosatti @ 2011-05-10 14:13 UTC (permalink / raw) To: Dor Laor Cc: Kevin Wolf, Anthony Liguori, jes sorensen, qemu-devel, Avi Kivity, Stefan Hajnoczi On Mon, May 09, 2011 at 04:40:00PM +0300, Dor Laor wrote: > No patch here (sorry) but collection of thoughts about these > features and their potential building blocks. Please review (also on > http://wiki.qemu.org/Features/LiveBlockMigration) > > Future qemu is expected to support these features (some already > implemented): > > * Live block copy > > Ability to copy 1+ virtual disk from the source backing file/block > device to a new target that is accessible by the host. The copy > supposed to be executed while the VM runs in a transparent way. > > Status: code exists (by Marcelo) today in qemu but needs refactoring > due to a race condition at the end of the copy operation. We agreed > that a re-implementation of the copy operation should take place > that makes sure the image is completely mirrored until management > decides what copy to keep. > > * Live snapshots and live snapshot merge > > Live snapshot is already incorporated (by Jes) in qemu (still need > qemu-agent work to freeze the guest FS). > > Live snapshot merge is required in order of reducing the overhead > caused by the additional snapshots (sometimes over raw device). > Currently not implemented for a live running guest > > Possibility: enhance live copy to be used for live snapshot merge. > It is almost the same mechanism. The idea is to use live block copy to perform snapshot "live merges". 
The advantage is the simplicity, since there is no need to synchronize between live merge writes and guest writes. With live copy the guest is either using the old image or the new copy, so crash handling is relatively simple. > * Copy on read (image streaming) > Ability to start guest execution while the parent image reside > remotely and each block access is replicated to a local copy (image > format snapshot) > > It should be nice to have a general mechanism that will be used for > all image formats. What about the protocol to access these blocks > over the net? We can reuse existing ones (nbd/iscsi). > > Such functionality can be hooked together with live block migration > instead of the 'post copy' method. > > * Live block migration (pre/post) > > Beyond live block copy we'll sometimes need to move both the storage > and the guest. There are two main approached here: > - pre copy > First live copy the image and only then live migration the VM. > It is simple but if the purpose of the whole live block migration > was to balance the cpu load, it won't be practical to use since > copying an image of 100GB will take too long. > - post copy > First live migrate the VM, then live copy it's blocks. > It's better approach for HA/load balancing but it might make > management complex (need to keep the source VM alive, what happens > on failures?) > Using copy on read might simplify it - > post copy = live snapshot + copy on read. > > In addition there are two cases for the storage access: > 1. The source block device is shared and can be easily accessed by > the destination qemu-kvm process. > That's the easy case, no special protocol needed for the block > devices copying. > 2. There is no shared storage at all. 
> This means we should implement a block access protocol over the > live migration fd :( > > We need to chose whether to implement a new one, or re-use NBD or > iScsi (target&initiator) > > * Using external dirty block bitmap > > FVD has an option to use external dirty block bitmap file in > addition to the regular mapping/data files. > > We can consider using it for live block migration and live merge too. > It can also allow additional usages of 3rd party tools to calculate > diffs between the snapshots. > There is a big down side thought since it will make management > complicated and there is the risky of the image and its bitmap file > get out of sync. It's much better choice to have qemu-img tool to be > the single interface to the dirty block bitmap data. > > Summary: > * We need Marcelo's new (to come) block copy implementation > * should work in parallel to migration and hotplug > * General copy on read is desirable > * Live snapshot merge to be implemented using block copy > * Need to utilize a remote block access protocol (iscsi/nbd/other) > Which one is the best? > * Keep qemu-img the single interface for dirty block mappings. > * Live block migration pre copy == live copy + block access protocol > + live migration > * Live block migration post copy == live migration + block access > protocol/copy on read. > > Comments? > > Regards, > Dor ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-09 13:40 [Qemu-devel] [RFC] live snapshot, live merge, live block migration Dor Laor 2011-05-09 15:23 ` Anthony Liguori 2011-05-10 14:13 ` Marcelo Tosatti @ 2011-05-12 15:33 ` Jes Sorensen 2011-05-13 3:16 ` Jagane Sundar 2011-05-20 12:19 ` Stefan Hajnoczi 3 siblings, 1 reply; 28+ messages in thread From: Jes Sorensen @ 2011-05-12 15:33 UTC (permalink / raw) To: dlaor Cc: Kevin Wolf, Anthony Liguori, Jagane Sundar, Marcelo Tosatti, qemu-devel, Avi Kivity, Stefan Hajnoczi On 05/09/11 15:40, Dor Laor wrote: > Summary: > * We need Marcelo's new (to come) block copy implementation > * should work in parallel to migration and hotplug > * General copy on read is desirable > * Live snapshot merge to be implemented using block copy > * Need to utilize a remote block access protocol (iscsi/nbd/other) > Which one is the best? > * Keep qemu-img the single interface for dirty block mappings. > * Live block migration pre copy == live copy + block access protocol > + live migration > * Live block migration post copy == live migration + block access > protocol/copy on read. > > Comments? I think we should add Jagane Sundar's Livebackup to the watch list here. It looks very interesting as an alternative way to reach some of the same goals. Cheers, Jes ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-12 15:33 ` Jes Sorensen @ 2011-05-13 3:16 ` Jagane Sundar 2011-05-15 21:14 ` Dor Laor 0 siblings, 1 reply; 28+ messages in thread From: Jagane Sundar @ 2011-05-13 3:16 UTC (permalink / raw) To: Jes Sorensen Cc: Kevin Wolf, Anthony Liguori, dlaor, Marcelo Tosatti, qemu-devel, Avi Kivity, Stefan Hajnoczi On 5/12/2011 8:33 AM, Jes Sorensen wrote: > On 05/09/11 15:40, Dor Laor wrote: >> Summary: >> * We need Marcelo's new (to come) block copy implementation >> * should work in parallel to migration and hotplug >> * General copy on read is desirable >> * Live snapshot merge to be implemented using block copy >> * Need to utilize a remote block access protocol (iscsi/nbd/other) >> Which one is the best? >> * Keep qemu-img the single interface for dirty block mappings. >> * Live block migration pre copy == live copy + block access protocol >> + live migration >> * Live block migration post copy == live migration + block access >> protocol/copy on read. >> >> Comments? > I think we should add Jagane Sundar's Livebackup to the watch list here. > It looks very interesting as an alternative way to reach some of the > same goals. > > Cheers, > Jes Thanks for the intro, Jes. I am very interested in garnering support for Livebackup. You are correct in that Livebackup solves some, but not all, problems in the same space. Some comments about my code: It took me about two months of development before I connected with you on the list. Initially, I started off by doing a dynamic block transfer such that fewer and fewer blocks are dirty till there are no more dirty blocks and we declare the backup complete. The problem with this approach was that there was no real way to plug in a guest file system quiesce function. I then moved on to the snapshot technique. 
With this snapshot technique I am also able to test the livebackup function very thoroughly - I use a technique where I create an LVM snapshot simultaneously and do a cmp of the LVM snapshot and the livebackup backup image. With this mode of testing, I am very confident of the integrity of my solution.

I chose to invent a new protocol that is very simple and custom to Livebackup, because I needed livebackup-specific functions such as 'create snapshot', 'delete snapshot', etc. Also, I am currently implementing SSL-based encryption, with both the client authenticating to the server and the server authenticating to the client using a self-signed certificate. iSCSI or NBD would be more standards-compliant, I suppose.

My high level goal is to make this a natural solution for Infrastructure-as-a-Service (IaaS) cloud environments. I am looking carefully at integrating the management of the Livebackup function into OpenStack.

I would like to help in any way I can to make KVM be the *best* VM technology for IaaS clouds.

Thanks,
Jagane

^ permalink raw reply [flat|nested] 28+ messages in thread
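From the description above, the core of Livebackup's incremental transfer can be modeled roughly as follows. This is a sketch reconstructed from the prose; the class and method names are my invention, not Jagane's actual code or protocol:

```python
class LivebackupDisk:
    """Toy model: qemu tracks blocks modified since the last backup."""

    def __init__(self):
        self.blocks = {}
        self.modified = set()        # dirtied since the previous backup cycle

    def guest_write(self, blk, data):
        self.blocks[blk] = data
        self.modified.add(blk)

    def backup_cycle(self):
        # the backup client connects and receives only the blocks
        # modified since the previous cycle; the dirty set then resets
        delta = {blk: self.blocks[blk] for blk in self.modified}
        self.modified = set()
        return delta
```

This is where the "zero impact except during transfer" claim comes from: between cycles the only cost is maintaining the `modified` set on the write path.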
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-13 3:16 ` Jagane Sundar @ 2011-05-15 21:14 ` Dor Laor 2011-05-15 21:38 ` Jagane Sundar 0 siblings, 1 reply; 28+ messages in thread From: Dor Laor @ 2011-05-15 21:14 UTC (permalink / raw) To: Jagane Sundar Cc: Kevin Wolf, Anthony Liguori, Jes Sorensen, Marcelo Tosatti, qemu-devel, Avi Kivity, Stefan Hajnoczi On 05/13/2011 06:16 AM, Jagane Sundar wrote: > On 5/12/2011 8:33 AM, Jes Sorensen wrote: >> On 05/09/11 15:40, Dor Laor wrote: >>> Summary: >>> * We need Marcelo's new (to come) block copy implementation >>> * should work in parallel to migration and hotplug >>> * General copy on read is desirable >>> * Live snapshot merge to be implemented using block copy >>> * Need to utilize a remote block access protocol (iscsi/nbd/other) >>> Which one is the best? >>> * Keep qemu-img the single interface for dirty block mappings. >>> * Live block migration pre copy == live copy + block access protocol >>> + live migration >>> * Live block migration post copy == live migration + block access >>> protocol/copy on read. >>> >>> Comments? >> I think we should add Jagane Sundar's Livebackup to the watch list here. >> It looks very interesting as an alternative way to reach some of the >> same goals. >> >> Cheers, >> Jes > Thanks for the intro, Jes. I am very interested in garnering support for > Livebackup. > > You are correct in that Livebackup solves some, but not all, problems in > the same space. > > Some comments about my code: It took me about two months of development > before I connected with you on the list. > Initially, I started off by doing a dynamic block transfer such that > fewer and fewer blocks are dirty till there are no more dirty blocks and > we declare the backup complete. The problem with this approach was that > there was no real way to plug in a guest file system quiesce function. I > then moved on to the snapshot technique. 
> With this snapshot technique I
> am also able to test the livebackup function very thoroughly - I use a
> technique where I create a LVM snapshot simultaneously, and do a cmp of
> the LVM snapshot and the livebackup backup image.
>
> With this mode of testing, I am very confident of the integrity of my
> solution.
>
> I chose to invent a new protocol that is very simple, and custom to
> livebackup, because I needed livebackup specific functions such as
> 'create snapshot', 'delete snapshot', etc. Also, I am currently
> implementing SSL based encryption with both client authenticating to
> server and server authenticating to client using self signed certificate.
> iSCSI or NBD would be more standards compliant, I suppose.

+1 that iSCSI/NBD have better potential.

>
> My high level goal is to make this a natural solution for Infrastructure
> As A Cloud environments. I am looking carefully at integrating the
> management of the Livebackup function into Openstack.

One important advantage of live snapshot over live backup is support for multiple (consecutive) live snapshots, while there can be only a single live backup at a time.

This is why I tend to think that although live backup carries some benefit (no merge required), live snapshot + live merge is the more robust mechanism.

>
> I would like to help in any way I can to make KVM be the *best* VM
> technology for IaaS clouds.

:)

>
> Thanks,
> Jagane

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-15 21:14 ` Dor Laor @ 2011-05-15 21:38 ` Jagane Sundar 0 siblings, 0 replies; 28+ messages in thread
From: Jagane Sundar @ 2011-05-15 21:38 UTC (permalink / raw)
To: dlaor
Cc: Jes Sorensen, Kevin Wolf, Anthony Liguori, Marcelo Tosatti, qemu-devel, Avi Kivity, Stefan Hajnoczi, kvm

Hello Dor,

> One important advantage of live snapshot over live backup is support of
> multiple (consecutive) live snapshots while there can be only a single
> live backup at one time.
>
> This is why I tend to think that although live backup carry some benefit
> (no merge required), the live snapshot + live merge are more robust
> mechanism.
>

The two things that concern me regarding the live snapshot/live merge approach are:
1. Performance considerations of having multiple active snapshots.
2. Robustness of this solution in the face of errors in the disk, etc. If any one of the snapshot files were to get corrupted, the whole VM is adversely impacted.

The primary goal of the Livebackup architecture was to have zero performance impact on the running VM.

Livebackup impacts performance of the VM only when the backup client connects to qemu to transfer the modified blocks over, which should be, say, 15 minutes a day for a VM on a daily backup schedule.

One useful thing to do is to evaluate the important use cases for this technology, and then decide which approach makes most sense. As an example, let me state this use case:
- An IaaS cloud, where VMs are always on, running off of a local disk, and need to be backed up once a day or so.

Can you list some of the other use cases that live snapshot and live merge were designed to solve? Perhaps we can put up a single wiki page that describes all of these proposals.

Thanks,
Jagane

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] live snapshot, live merge, live block migration
From: Dor Laor @ 2011-05-16 7:53 UTC (permalink / raw)
To: Jagane Sundar
Cc: Kevin Wolf, Anthony Liguori, kvm, Jes Sorensen, Marcelo Tosatti, qemu-devel, Avi Kivity, Stefan Hajnoczi

On 05/16/2011 12:38 AM, Jagane Sundar wrote:
> The two things that concern me regarding the
> live snapshot/live merge approach are:
> 1. Performance considerations of having
> multiple active snapshots?

My description above was inaccurate; I only hinted that multiple
snapshots are possible, but they are done consecutively. Taking a live
snapshot takes practically no time - just the time for the guest
virtagent to freeze the guest FS and to create the snapshot (for qcow2
it is immediate). So if you would like to take another snapshot, say
five minutes after you issued the first one, there is no problem. The
new writes will go to the snapshot while the former base is marked read
only. Eventually you will want to (live) merge the snapshots together;
this can be done at any point in time.

> 2. Robustness of this solution in the face of
> errors in the disk, etc. If any one of the snapshot
> files were to get corrupted, the whole VM is
> adversely impacted.

Since the base images and any snapshot which is not a leaf are marked
read only, there is no such risk.

> The primary goal of Livebackup architecture was to have zero
> performance impact on the running VM.
>
> Livebackup impacts performance of the VM only when the
> backup client connects to qemu to transfer the modified
> blocks over, which should be, say 15 minutes a day, for a
> daily backup schedule VM.

In case there were lots of changes, for example an additional 50GB of
changes, it will take more time and there will be a performance hit.

> One useful thing to do is to evaluate the important use cases
> for this technology, and then decide which approach makes
> most sense. As an example, let me state this use case:
> - A IaaS cloud, where VMs are always on, running off of a local
> disk, and need to be backed up once a day or so.
>
> Can you list some of the other use cases that live snapshot and
> live merge were designed to solve. Perhaps we can put up a
> single wiki page that describes all of these proposals.

Both solutions can serve the same scenario. With live snapshot, the
backup is done as follows:

1. Take a live snapshot (s1) of image s0.
2. Newer writes go to the snapshot s1 while s0 is read only.
3. Backup software processes the s0 image. There are multiple ways of
   doing that:
   1. Use qemu-img and get the dirty blocks since the former backup.
      - Currently qemu-img does not support this.
      - Nevertheless, such a mechanism will work for lvm, btrfs and
        NetApp.
   2. Mount the s0 image in another guest that runs traditional backup
      software at the file system level and let it do the backup.
4. Live merge s1 -> s0. We'll use live copy for that, so each write is
   duplicated (like your live backup solution).
5. Delete s1.

As you can see, both approaches are very similar, while live snapshot is
more general and not tied to backup specifically.
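[Editorial note: the snapshot/merge cycle above can be sketched with a toy
copy-on-write model. This is plain Python for illustration, not qemu code;
the `Image` class and block contents are invented. Writes after the
snapshot land only in the overlay s1, the frozen base s0 is what the
backup reads, and the merge folds s1 back into s0.]

```python
# Toy copy-on-write sketch of the 5-step backup cycle (not qemu code).

class Image:
    """Toy block image: a dict mapping block number -> data."""
    def __init__(self, backing=None):
        self.blocks = {}
        self.backing = backing  # read-only parent, like a qcow2 backing file

    def read(self, n):
        if n in self.blocks:
            return self.blocks[n]
        return self.backing.read(n) if self.backing else b"\0"

    def write(self, n, data):
        self.blocks[n] = data

def merge(overlay):
    """Fold the overlay's blocks into its backing image (step 4)."""
    for n, data in overlay.blocks.items():
        overlay.backing.blocks[n] = data
    return overlay.backing

# Step 1: take a "snapshot" -- s0 becomes the read-only base of overlay s1.
s0 = Image()
s0.write(0, b"old")
s1 = Image(backing=s0)

# Step 2: new guest writes go to s1; s0 is untouched.
s1.write(0, b"new")
assert s0.blocks[0] == b"old"   # step 3: backup sees the frozen s0 state

# Steps 4/5: merge s1 -> s0 and drop s1.
s0 = merge(s1)
assert s0.read(0) == b"new"
```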
* Re: [RFC] live snapshot, live merge, live block migration
From: Jagane Sundar @ 2011-05-16 8:23 UTC (permalink / raw)
To: dlaor
Cc: Kevin Wolf, Anthony Liguori, kvm, Jes Sorensen, Marcelo Tosatti, qemu-devel, Avi Kivity, Stefan Hajnoczi

Hello Dor,

Let me see if I understand live snapshot correctly. If I want to
configure a VM for daily backup, then I would do the following:
- Create a snapshot s1. s0 is marked read-only.
- Do a full backup of s0 on day 0.
- On day 1, create a new snapshot s2, then copy over the snapshot s1,
  which is the incremental backup image from s0 to s1.
- After copying s1 over, I do not need that snapshot, so I would live
  merge s1 with s0, to create a new merged read-only image s1'.
- On day 2, create a new snapshot s3, then copy over s2, which is the
  incremental backup from s1' to s2.
- And so on...

With this sequence of operations, I would need to keep a snapshot active
at all times in order to enable the incremental backup capability,
right?

If the base image is s0 and there is a single snapshot s1, then a read
operation from the VM will first look in s1. If the block is not present
in s1, then it will read the block from s0, right? So most reads from
the VM will effectively translate into two reads, right? Isn't this a
continuous performance penalty for the VM, amounting to almost doubling
the read I/O from the VM?

Please read below for more comments:

>> 2. Robustness of this solution in the face of
>> errors in the disk, etc. If any one of the snapshot
>> files were to get corrupted, the whole VM is
>> adversely impacted.
> Since the base images and any snapshot which is not a leaf is marked as
> read only there is no such risk.

What happens when a VM host reboots while a live merge of s0 and s1 is
being done?

> In case there were lots of changing for example additional 50GB changes
> it will take more time and there will be a performance hit.

Of course, the performance hit is proportional to the amount of data
being copied over. However, the performance penalty is paid during the
backup operation, and not during normal VM operation.

> As you can see, both approaches are very similar, while live snapshot is
> more general and not tied to backup specifically.

As I explained at the head of this email, I believe that live snapshot
results in the VM read I/O paying a high penalty during normal operation
of the VM, whereas Livebackup results in this penalty being paid only
during the backup dirty block transfer operation.

Finally, I would like to bring up considerations of disk space. To
expand on my use case further, consider a cloud compute service with 100
VMs running on a host. If live snapshot is used to create snapshot COW
files, then potentially each VM could grow the COW snapshot file to the
size of the base file, which means the VM host needs to reserve space
for the snapshot that equals the size of the VMs - i.e. an 8GB VM would
require an additional 8GB of space to be reserved for the snapshot, so
that the service provider can safely guarantee that the snapshot will
not run out of space.

Contrast this with Livebackup, wherein the COW files are kept only while
the dirty block transfers are being done. This means that for a host
with 100 VMs, if the backup server connects to each of the 100 qemus one
by one and does a livebackup, the service provider would need to
provision spare disk for at most the COW size of one VM.

Thanks,
Jagane
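[Editorial note: the double-read concern raised above can be modelled with
a toy sketch - plain Python, not qemu's actual I/O path, which also caches
cluster metadata in memory. A read of a block that was never rewritten
after the snapshot probes the overlay, misses, and falls through to the
base, so one guest read costs a lookup in each layer.]

```python
# Toy model of reading through a one-deep backing chain (not qemu code).

class CountingImage:
    """Block image that counts how many lookups it serves."""
    def __init__(self, backing=None):
        self.blocks = {}
        self.backing = backing
        self.lookups = 0

    def read(self, n):
        self.lookups += 1
        if n in self.blocks:
            return self.blocks[n]
        return self.backing.read(n) if self.backing else None

s0 = CountingImage()
s0.blocks[7] = b"base data"        # block written before the snapshot
s1 = CountingImage(backing=s0)     # overlay created at snapshot time

data = s1.read(7)                  # block never rewritten since snapshot
assert data == b"base data"
assert (s1.lookups, s0.lookups) == (1, 1)   # one probe in each layer
```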
* Re: [RFC] live snapshot, live merge, live block migration
From: Dor Laor @ 2011-05-17 22:53 UTC (permalink / raw)
To: Jagane Sundar
Cc: Kevin Wolf, Anthony Liguori, kvm, Jes Sorensen, Marcelo Tosatti, qemu-devel, Ayal Baron, Avi Kivity, Stefan Hajnoczi

On 05/16/2011 11:23 AM, Jagane Sundar wrote:
> With this sequence of operations, I would need to keep a
> snapshot active at all times, in order to enable the
> incremental backup capability, right?

No and yes ;-)

For a regular, non-incremental backup you can have no snapshot active
most of the time:

- Create a snapshot s1. s0 is marked read-only.
- Do a full backup of s0 on day 0.
- Once the backup is finished, live merge s1 into s0 and make s0
  writeable again.

So this way there is no performance penalty. For incremental backup we
need an option to track dirty block bits (either in the internal format
or in an external file). This will be both efficient and get the job
done. But in order to be efficient in storage, we'll need to ask the
snapshot creation to only refer to these dirty blocks. Well, thinking
out loud, it turned into your solution :) Ok, I do see the value there
is with incremental backups.

I'm aware that there were requirements that the backup itself be done
from the guest filesystem level, where the incremental backup would be
done at the FS layer. Still, I do see the value in your solution.

Another option for us would be to keep the latest snapshots around and
let the guest IO go through them all the time. There is some performance
cost, but as the newer image formats develop, this cost is relatively
very low.

> What happens when a VM host reboots while a live merge of s0
> and s1 is being done?

Live merge uses live copy, which duplicates each write IO. On a host
crash, the merge will continue from the same point where it stopped.

I think I answered your other good comments above.

Thanks,
Dor
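[Editorial note: the dirty-block tracking proposed above can be
illustrated with a toy bitmap. Names and structure are invented for the
sketch; qemu's real dirty bitmaps live in the block layer and did not
exist in this form at the time of this thread. Only blocks written since
the last pass are copied on the next incremental backup.]

```python
# Toy dirty-block bitmap for incremental backup (not qemu code).

class DirtyTracker:
    def __init__(self, nblocks):
        self.data = bytearray(nblocks)   # the "disk"
        self.dirty = set()               # stands in for a dirty bitmap

    def write(self, n, byte):
        self.data[n] = byte
        self.dirty.add(n)                # mark block dirty on every write

    def incremental_backup(self, dest):
        """Copy only blocks dirtied since the previous backup pass."""
        for n in sorted(self.dirty):
            dest[n] = self.data[n]
        copied, self.dirty = len(self.dirty), set()   # reset the bitmap
        return copied

disk = DirtyTracker(8)
backup = bytearray(8)

disk.write(2, 0xAB)
disk.write(5, 0xCD)
assert disk.incremental_backup(backup) == 2   # only 2 of 8 blocks moved
assert backup[2] == 0xAB and backup[5] == 0xCD
```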
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-17 22:53 ` [Qemu-devel] " Dor Laor @ 2011-05-18 15:49 ` Jagane Sundar -1 siblings, 0 replies; 28+ messages in thread From: Jagane Sundar @ 2011-05-18 15:49 UTC (permalink / raw) To: dlaor Cc: Kevin Wolf, Anthony Liguori, kvm, Jes Sorensen, Marcelo Tosatti, qemu-devel, Avi Kivity, Stefan Hajnoczi, Ayal Baron Hello Dor, I'm glad I could convince you of the value of Livebackup. I think Livesnapshot/Livemerge, Livebackup and Block Migration all have very interesting use cases. For example: - Livesnapshot/Livemerge is very useful in development/QA environments where one might want to create a snapshot before trying out some new software and then committing. - Livebackup is useful in cloud environments where the Cloud Service Provider may want to offer regularly scheduled backed up VMs with no effort on the part of the customer - Block Migration with COR is useful in Cloud Service provider environments where an arbitrary VM may need to be migrated over to another VM server, even though the VM is on direct attached storage. The above is by no means an exhaustive list of use cases. I am sure qemu/qemu-kvm users can come up with more. Although there are some common concepts in these three technologies, I think we should support all three in base qemu. This would make qemu/qemu-kvm more feature rich than vmware, xen and hyper-v. Thanks, Jagane On 5/17/2011 3:53 PM, Dor Laor wrote: > On 05/16/2011 11:23 AM, Jagane Sundar wrote: >> Hello Dor, >> >> Let me see if I understand live snapshot correctly: >> If I want to configure a VM for daily backup, then I would do >> the following: >> - Create a snapshot s1. s0 is marked read-only. >> - Do a full backup of s0 on day 0. >> - On day 1, I would create a new snapshot s2, then >> copy over the snapshot s1, which is the incremental >> backup image from s0 to s1. 
>> - After copying s1 over, I do not need that snapshot, so >> I would live merge s1 with s0, to create a new merged >> read-only image s1'. >> - On day 2, I would create a new snapshot s3, then >> copy over s2, which is the incremental backup from >> s1' to s2 >> - And so on... >> >> With this sequence of operations, I would need to keep a >> snapshot active at all times, in order to enable the >> incremental backup capability, right? > No and yes ;-) > > For regular non-incremental backup you can have no snapshot active most > times: > > - Create a snapshot s1. s0 is marked read-only. > - Do a full backup of s0 on day 0. > - Once backup is finished, live merge s1 into s0 and make s0 writeable > again. > > So this way there is no performance penalty here. > Here we need an option to track dirty block bits (either as an internal > format or an external file). This will be both efficient and get the job done. > > But in order to be efficient in storage we'll need to ask the snapshot > creation to only refer to these dirty blocks. > Well, thinking out loud, it turned into your solution :) > > Ok, I do see the value there is with incremental backups. > > I'm aware that there were requirements that the backup itself > be done from the guest filesystem level, where incremental backup > would be done at the FS layer. > > Still I do see the value in your solution. > > Another option for us would be to keep the latest snapshots around > and let the guest IO go through them all the time. There is some > performance cost but as newer image formats develop, this cost is > relatively very low. > >> If the base image is s0 and there is a single snapshot s1, then a >> read operation from the VM will first look in s1. If the block is >> not present in s1, then it will read the block from s0, right? >> So most reads from the VM will effectively translate into two >> reads, right?
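The two-level read path being discussed, where a miss in the snapshot falls through to the base image, can be sketched as a toy model (hypothetical, for illustration only; this is not QEMU's actual block layer):

```python
# Minimal model of a COW snapshot read path (hypothetical; not QEMU's
# actual block layer). The snapshot holds only blocks written since it
# was taken; unallocated blocks fall through to the read-only base.

class BaseImage:
    def __init__(self, blocks):
        self.blocks = blocks  # block number -> data

    def read(self, n):
        return self.blocks.get(n, b"\0")

class Snapshot:
    def __init__(self, base):
        self.base = base
        self.cow = {}  # copy-on-write blocks (writes since snapshot)

    def write(self, n, data):
        self.cow[n] = data  # writes never touch the base

    def read(self, n):
        # This fallback is the "two reads" penalty discussed above:
        # a miss in the snapshot requires a second lookup in the base.
        if n in self.cow:
            return self.cow[n]
        return self.base.read(n)

base = BaseImage({0: b"old0", 1: b"old1"})
s1 = Snapshot(base)
s1.write(1, b"new1")
assert s1.read(0) == b"old0"   # falls through to base
assert s1.read(1) == b"new1"   # served from the snapshot
```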
>> >> Isn't this a continuous performance penalty for the VM, >> amounting to almost doubling the read I/O from the VM? >> >> Please read below for more comments: >>>> 2. Robustness of this solution in the face of >>>> errors in the disk, etc. If any one of the snapshot >>>> files were to get corrupted, the whole VM is >>>> adversely impacted. >>> Since the base image and any snapshot which is not a leaf are marked as >>> read-only, there is no such risk. >>> >> What happens when a VM host reboots while a live merge of s0 >> and s1 is being done? > Live merge uses live copy, which duplicates each write IO. > On a host crash, the merge will continue from the same point where it > stopped. > > I think I answered your other good comments above. > Thanks, > Dor > >>>> The primary goal of Livebackup architecture was to have zero >>>> performance impact on the running VM. >>>> >>>> Livebackup impacts performance of the VM only when the >>>> backup client connects to qemu to transfer the modified >>>> blocks over, which should be, say, 15 minutes a day, for a >>>> daily backup schedule VM. >>> In case there were lots of changes, for example an additional 50GB, >>> it will take more time and there will be a performance hit. >>> >> Of course, the performance hit is proportional to the amount of data >> being copied over. However, the performance penalty is paid during >> the backup operation, and not during normal VM operation. >> >>>> One useful thing to do is to evaluate the important use cases >>>> for this technology, and then decide which approach makes >>>> most sense. As an example, let me state this use case: >>>> - An IaaS cloud, where VMs are always on, running off of a local >>>> disk, and need to be backed up once a day or so. >>>> >>>> Can you list some of the other use cases that live snapshot and >>>> live merge were designed to solve? Perhaps we can put up a >>>> single wiki page that describes all of these proposals.
>>> Both solutions can serve for the same scenario: >>> With live snapshot the backup is done as follows: >>> >>> 1. Take a live snapshot (s1) of image s0. >>> 2. Newer writes go to the snapshot s1 while s0 is read-only. >>> 3. Backup software processes the s0 image. >>> There are multiple ways of doing that - >>> 1. Use qemu-img and get the dirty blocks from the former backup. >>> - Currently qemu-img does not support it. >>> - Nevertheless, such a mechanism will work for lvm, btrfs, NetApp >>> 2. Mount the s0 image on another guest that runs traditional backup >>> software at the file system level and let it do the backup. >>> 4. Live merge s1->s0 >>> We'll use live copy for that so each write is duplicated (like your >>> live backup solution). >>> 5. Delete s1 >>> >>> As you can see, both approaches are very similar, while live snapshot is >>> more general and not tied to backup specifically. >>> >> As I explained at the head of this email, I believe that live snapshot >> results in the VM read I/O paying a high penalty during normal operation >> of the VM, whereas Livebackup results in this penalty being paid only >> during the backup dirty block transfer operation. >> >> Finally, I would like to bring up considerations of disk space. To >> expand on >> my use case further, consider a Cloud Compute service with 100 VMs >> running on a host. If live snapshot is used to create snapshot COW files, >> then potentially each VM could grow the COW snapshot file to the size >> of the base file, which means the VM host needs to reserve space for >> the snapshot that equals the size of the VMs - i.e. an 8GB VM would >> require an additional 8GB of space to be reserved for the snapshot, >> so that the service provider could safely guarantee that the snapshot >> will not run out of space. >> Contrast this with livebackup, wherein the COW files are kept only when >> the dirty block transfers are being done.
This means that for a host with >> 100 VMs, if the backup server is connecting to each of the 100 QEMU instances >> one by one and doing a livebackup, the service provider would need >> to provision spare disk for at most the COW size of one VM. >> >> Thanks, >> Jagane >> >> ^ permalink raw reply [flat|nested] 28+ messages in thread
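The dirty-block tracking proposed in this thread for incremental backup can be illustrated with a toy model (hypothetical; as noted above, qemu-img had no such feature at the time): each write marks a block dirty, and an incremental backup copies and then clears exactly those blocks.

```python
# Toy model of incremental backup via dirty-block tracking. This
# mirrors the thread's proposal only in spirit; it is not any shipped
# qemu-img or livebackup mechanism.

class Disk:
    def __init__(self):
        self.blocks = {}
        self.dirty = set()  # blocks written since the last backup

    def write(self, n, data):
        self.blocks[n] = data
        self.dirty.add(n)

    def incremental_backup(self):
        # Copy only blocks touched since the previous backup, then
        # clear the bitmap so the next cycle sees a clean slate.
        delta = {n: self.blocks[n] for n in self.dirty}
        self.dirty.clear()
        return delta

d = Disk()
d.write(0, b"a"); d.write(1, b"b")
assert d.incremental_backup() == {0: b"a", 1: b"b"}  # first backup
d.write(1, b"B")
assert d.incremental_backup() == {1: b"B"}           # only the change
assert d.incremental_backup() == {}                  # nothing dirty
```

The space argument above follows directly: the dirty set (and any COW file backing it) only needs to exist between backups, not for the lifetime of the VM.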
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-09 13:40 [Qemu-devel] [RFC] live snapshot, live merge, live block migration Dor Laor ` (2 preceding siblings ...) 2011-05-12 15:33 ` Jes Sorensen @ 2011-05-20 12:19 ` Stefan Hajnoczi 2011-05-20 12:39 ` Jes Sorensen ` (2 more replies) 3 siblings, 3 replies; 28+ messages in thread From: Stefan Hajnoczi @ 2011-05-20 12:19 UTC (permalink / raw) To: dlaor Cc: Kevin Wolf, Anthony Liguori, jes sorensen, Marcelo Tosatti, qemu-devel, Mingming Cao, Avi Kivity, Stefan Hajnoczi, Badari Pulavarty I'm interested in what the API for snapshots would look like. Specifically, how does user software do the following: 1. Create a snapshot 2. Delete a snapshot 3. List snapshots 4. Access data from a snapshot 5. Restore a VM from a snapshot 6. Get the dirty blocks list (for incremental backup) We've discussed image format-level approaches but I think the scope of the API should cover several levels at which snapshots are implemented: 1. Image format - image file snapshot (Jes, Jagane) 2. Host file system - ext4 and btrfs snapshots 3. Storage system - LVM or SAN volume snapshots It will be hard to take advantage of more efficient host file system or storage system snapshots if they are not designed in now. Is anyone familiar enough with the libvirt storage APIs to draft an extension that adds snapshot support? I will take a stab at it if no one else wants to try it. Stefan ^ permalink raw reply [flat|nested] 28+ messages in thread
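The six operations Stefan enumerates could be exercised against a minimal in-memory sketch like the following (all names and semantics invented for illustration; this is not a proposed libvirt or QMP API):

```python
# Toy in-memory provider covering the six operations listed above.
# Everything here is hypothetical: a real backend would be an image
# format, a host filesystem, or an LVM/SAN volume.

class InMemorySnapshots:
    def __init__(self, data=b""):
        self.current = bytearray(data)
        self.snaps = {}     # snapshot name -> frozen bytes
        self.dirty = set()  # byte offsets written since last snapshot

    def create(self, name):                    # 1. create a snapshot
        self.snaps[name] = bytes(self.current)
        self.dirty.clear()

    def delete(self, name):                    # 2. delete a snapshot
        del self.snaps[name]

    def list(self):                            # 3. list snapshots
        return sorted(self.snaps)

    def read(self, name, off, length):         # 4. access snapshot data
        return self.snaps[name][off:off + length]

    def restore(self, name):                   # 5. roll back to a snapshot
        self.current = bytearray(self.snaps[name])
        self.dirty.clear()

    def write(self, off, data):
        self.current[off:off + len(data)] = data
        self.dirty.update(range(off, off + len(data)))

    def dirty_blocks(self):                    # 6. dirty list for incrementals
        return sorted(self.dirty)

s = InMemorySnapshots(b"abcd")
s.create("s1")
s.write(1, b"XY")
assert s.list() == ["s1"]
assert s.read("s1", 0, 4) == b"abcd"   # snapshot is unaffected by writes
assert s.dirty_blocks() == [1, 2]
s.restore("s1")
assert bytes(s.current) == b"abcd"
```

The point of the sketch is the shape of the interface: if each level (image format, host filesystem, storage system) implements these same operations, the higher-level API can stay backend-agnostic.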
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-20 12:19 ` Stefan Hajnoczi @ 2011-05-20 12:39 ` Jes Sorensen 2011-05-20 12:49 ` Stefan Hajnoczi 2011-05-22 9:52 ` Dor Laor 2011-05-23 5:42 ` Jagane Sundar 2 siblings, 1 reply; 28+ messages in thread From: Jes Sorensen @ 2011-05-20 12:39 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Anthony Liguori, dlaor, Marcelo Tosatti, qemu-devel, Mingming Cao, Avi Kivity, Stefan Hajnoczi, Badari Pulavarty, Jiri Denemark, Eric Blake On 05/20/11 14:19, Stefan Hajnoczi wrote: > I'm interested in what the API for snapshots would look like. I presume you're talking external snapshots here? The API is really what should be defined by libvirt, so you get a unified API that can work both on QEMU level snapshots as well as enterprise storage, host file system snapshots etc. > Specifically how does user software do the following: > 1. Create a snapshot There's a QMP patch out already that is still not applied, but it is pretty simple, similar to the hmp command. Alternatively you can do it the evil way by pre-creating the snapshot image file and feeding that to the snapshot command. In this case QEMU won't create the snapshot file. > 2. Delete a snapshot This is still to be defined. > 3. List snapshots Again this is tricky as it depends on the type of snapshot. For QEMU level ones they are files, so 'ls' is your friend :) > 4. Access data from a snapshot You boot the snapshot file. > 5. Restore a VM from a snapshot We're talking snapshots, not checkpointing, here, so you cannot restore a VM from a snapshot. > 6. Get the dirty blocks list (for incremental backup) Good question. > We've discussed image format-level approaches but I think the scope of > the API should cover several levels at which snapshots are > implemented: > 1. Image format - image file snapshot (Jes, Jagane) > 2. Host file system - ext4 and btrfs snapshots > 3.
Storage system - LVM or SAN volume snapshots > > It will be hard to take advantage of more efficient host file system > or storage system snapshots if they are not designed in now. > > Is anyone familiar enough with the libvirt storage APIs to draft an > extension that adds snapshot support? I will take a stab at it if no > one else want to try it. I believe the libvirt guys are already looking at this. Adding to the CC list. Cheers, Jes ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-20 12:39 ` Jes Sorensen @ 2011-05-20 12:49 ` Stefan Hajnoczi 2011-05-20 12:56 ` Jes Sorensen 0 siblings, 1 reply; 28+ messages in thread From: Stefan Hajnoczi @ 2011-05-20 12:49 UTC (permalink / raw) To: Jes Sorensen Cc: Kevin Wolf, Anthony Liguori, dlaor, Marcelo Tosatti, qemu-devel, Mingming Cao, Avi Kivity, Stefan Hajnoczi, Badari Pulavarty, Jiri Denemark, Eric Blake On Fri, May 20, 2011 at 1:39 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote: > On 05/20/11 14:19, Stefan Hajnoczi wrote: >> I'm interested in what the API for snapshots would look like. > > I presume you're talking external snapshots here? The API is really what > should be defined by libvirt, so you get a unified API that can work > both on QEMU level snapshots as well as enterprise storage, host file > system snapshots etc. Thanks for the pointers on external snapshots using image files. I'm really thinking about the libvirt API. Basically I'm not sure we'll implement the right things if we don't think through the API that the user sees first. Stefan ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-20 12:49 ` Stefan Hajnoczi @ 2011-05-20 12:56 ` Jes Sorensen 0 siblings, 0 replies; 28+ messages in thread From: Jes Sorensen @ 2011-05-20 12:56 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Anthony Liguori, dlaor, Marcelo Tosatti, qemu-devel, Mingming Cao, Avi Kivity, Stefan Hajnoczi, Badari Pulavarty, Jiri Denemark, Eric Blake On 05/20/11 14:49, Stefan Hajnoczi wrote: > On Fri, May 20, 2011 at 1:39 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote: >> On 05/20/11 14:19, Stefan Hajnoczi wrote: >>> I'm interested in what the API for snapshots would look like. >> >> I presume you're talking external snapshots here? The API is really what >> should be defined by libvirt, so you get a unified API that can work >> both on QEMU level snapshots as well as enterprise storage, host file >> system snapshots etc. > > Thanks for the pointers on external snapshots using image files. I'm > really thinking about the libvirt API. > > Basically I'm not sure we'll implement the right things if we don't > think through the API that the user sees first. Right, I agree. There's a lot of variables there, and they are not necessarily easy to map into a single namespace. I am not sure it should be done either...... Cheers, Jes ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-20 12:19 ` Stefan Hajnoczi 2011-05-20 12:39 ` Jes Sorensen @ 2011-05-22 9:52 ` Dor Laor 2011-05-23 13:02 ` Stefan Hajnoczi 2011-05-23 5:42 ` Jagane Sundar 2 siblings, 1 reply; 28+ messages in thread From: Dor Laor @ 2011-05-22 9:52 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Anthony Liguori, libvir-list, jes sorensen, Marcelo Tosatti, qemu-devel, Ayal Baron, Mingming Cao, Avi Kivity, Stefan Hajnoczi, Badari Pulavarty On 05/20/2011 03:19 PM, Stefan Hajnoczi wrote: > I'm interested in what the API for snapshots would look like. > Specifically how does user software do the following: > 1. Create a snapshot > 2. Delete a snapshot > 3. List snapshots > 4. Access data from a snapshot There are plenty of options there: - Run an (unrelated) VM and hotplug the snapshot as an additional disk - Use v2v (libguestfs) - Boot the VM w/ RO - Plenty more > 5. Restore a VM from a snapshot > 6. Get the dirty blocks list (for incremental backup) It might be needed for additional purposes like efficient delta sync across sites or any other storage operation (dedup, etc) > > We've discussed image format-level approaches but I think the scope of > the API should cover several levels at which snapshots are > implemented: > 1. Image format - image file snapshot (Jes, Jagane) > 2. Host file system - ext4 and btrfs snapshots > 3. Storage system - LVM or SAN volume snapshots > > It will be hard to take advantage of more efficient host file system > or storage system snapshots if they are not designed in now. I agree, but it can also be a chicken-and-egg problem. Actually 1/2/3/5 are already working today regardless of live snapshots. > Is anyone familiar enough with the libvirt storage APIs to draft an > extension that adds snapshot support? I will take a stab at it if no > one else want to try it. I added libvirt-list and Ayal Baron from vdsm.
What you're asking is even beyond snapshots: it's the whole management of VM images. Doing the above operations is simple, but for an enterprise virtualization solution you'll need to lock the NFS/SAN images, handle failures of VM/SAN/Mgmt, keep the snapshot info in the mgmt DB, etc. Today it is managed by a combination of rhev-m/vdsm and libvirt. I agree it would have been nice to get such a common single entry point interface. > > Stefan ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-22 9:52 ` Dor Laor @ 2011-05-23 13:02 ` Stefan Hajnoczi 2011-05-27 16:46 ` Stefan Hajnoczi 0 siblings, 1 reply; 28+ messages in thread From: Stefan Hajnoczi @ 2011-05-23 13:02 UTC (permalink / raw) To: dlaor Cc: Kevin Wolf, Anthony Liguori, libvir-list, jes sorensen, Marcelo Tosatti, qemu-devel, Ayal Baron, Mingming Cao, Avi Kivity, Stefan Hajnoczi, Badari Pulavarty On Sun, May 22, 2011 at 10:52 AM, Dor Laor <dlaor@redhat.com> wrote: > On 05/20/2011 03:19 PM, Stefan Hajnoczi wrote: >> >> I'm interested in what the API for snapshots would look like. >> Specifically how does user software do the following: >> 1. Create a snapshot >> 2. Delete a snapshot >> 3. List snapshots >> 4. Access data from a snapshot > > There are plenty of options there: > - Run a (unrelated) VM and hotplug the snapshot as additional disk This is the backup appliance VM model and makes it possible to move the backup application to where the data is (or not, if you have a SAN and decide to spin up the appliance VM on another host). This should be perfectly doable if snapshots are "volumes" at the libvirt level. A special-case of the backup appliance VM is using libguestfs to access the snapshot from the host. This includes both block-level and file system-level access along with OS detection APIs that libguestfs provides. If snapshots are "volumes" at the libvirt level, then it is also possible to use virStorageVolDownload() to stream the entire snapshot through libvirt: http://libvirt.org/html/libvirt-libvirt.html#virStorageVolDownload Summarizing, here are three access methods that integrate with libvirt and cover many use cases: 1. Backup appliance VM. Add a readonly snapshot volume to a backup appliance VM. If shared storage (e.g. SAN) is available then the appliance can be run on any host. Otherwise the appliance must run on the same host that the snapshot resides on. 2. Libguestfs client on host. 
Launch libguestfs with the readonly snapshot volume. The backup application runs directly on the host, it has both block and file system access to the snapshot. 3. Download the snapshot to a remote host for backup processing. Use the virStorageVolDownload() API to download the snapshot onto a libvirt client machine. Dirty block tracking is still useful here since the virStorageVolDownload() API supports <offset, length> arguments. >> 5. Restore a VM from a snapshot Simplest option: virStorageVolUpload(). >> 6. Get the dirty blocks list (for incremental backup) > > It might be needed for additional proposes like efficient delta sync across > sites or any other storage operation (dedup, etc) > >> >> We've discussed image format-level approaches but I think the scope of >> the API should cover several levels at which snapshots are >> implemented: >> 1. Image format - image file snapshot (Jes, Jagane) >> 2. Host file system - ext4 and btrfs snapshots >> 3. Storage system - LVM or SAN volume snapshots >> >> It will be hard to take advantage of more efficient host file system >> or storage system snapshots if they are not designed in now. > > I agree but it can also be a chicken and the egg problem. > Actually 1/2/3/5 are already working today regardless of live snapshots. > >> Is anyone familiar enough with the libvirt storage APIs to draft an >> extension that adds snapshot support? I will take a stab at it if no >> one else want to try it. > > I added libvirt-list and Ayal Baron from vdsm. > What you're asking is even beyond snapshots, it's the whole management of VM > images. Doing the above operations is simple but for enterprise > virtualization solution you'll need to lock the NFS/SAN images, handle > failures of VM/SAN/Mgmt, keep the snapshots info in mgmt DB, etc. > > Today it is managed by a combination of rhev-m/vdsm and libvirt. > I agree it would have been nice to get such common single entry point > interface. 
Okay, the user API seems to be one layer above libvirt. Stefan ^ permalink raw reply [flat|nested] 28+ messages in thread
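The <offset, length> arguments to virStorageVolDownload() mentioned above are what make dirty-block-driven incremental pulls possible: instead of streaming the whole snapshot, the client requests only the extents that changed. A toy sketch of that pattern (the `fetch` callable stands in for the real libvirt call; the coalescing logic is illustrative, not any existing API):

```python
# Toy sketch: pull only dirty extents of a snapshot, coalescing
# adjacent dirty blocks into a single <offset, length> request each.
# `fetch(offset, length)` is a placeholder for the real transfer call.

BLOCK = 512

def coalesce(dirty_blocks):
    """Merge dirty block numbers into (offset, length) byte extents."""
    extents = []
    for n in sorted(dirty_blocks):
        off = n * BLOCK
        if extents and extents[-1][0] + extents[-1][1] == off:
            # Adjacent to the previous extent: extend it.
            extents[-1] = (extents[-1][0], extents[-1][1] + BLOCK)
        else:
            extents.append((off, BLOCK))
    return extents

def incremental_pull(fetch, dirty_blocks):
    """Fetch each coalesced extent; returns {offset: data}."""
    return {off: fetch(off, length)
            for off, length in coalesce(dirty_blocks)}

assert coalesce({0, 1, 5}) == [(0, 1024), (2560, 512)]
```

Fewer, larger requests matter because each download call carries round-trip overhead; coalescing turns a sparse dirty bitmap into a short extent list.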
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-23 13:02 ` Stefan Hajnoczi @ 2011-05-27 16:46 ` Stefan Hajnoczi 2011-05-27 17:16 ` Jagane Sundar 0 siblings, 1 reply; 28+ messages in thread From: Stefan Hajnoczi @ 2011-05-27 16:46 UTC (permalink / raw) To: Jagane Sundar Cc: Kevin Wolf, Anthony Liguori, dlaor, libvir-list, jes sorensen, Marcelo Tosatti, qemu-devel, Ayal Baron, Mingming Cao, Avi Kivity, Stefan Hajnoczi, Badari Pulavarty On Mon, May 23, 2011 at 2:02 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > On Sun, May 22, 2011 at 10:52 AM, Dor Laor <dlaor@redhat.com> wrote: >> On 05/20/2011 03:19 PM, Stefan Hajnoczi wrote: >>> >>> I'm interested in what the API for snapshots would look like. >>> Specifically how does user software do the following: >>> 1. Create a snapshot >>> 2. Delete a snapshot >>> 3. List snapshots >>> 4. Access data from a snapshot >> >> There are plenty of options there: >> - Run a (unrelated) VM and hotplug the snapshot as additional disk > > This is the backup appliance VM model and makes it possible to move > the backup application to where the data is (or not, if you have a SAN > and decide to spin up the appliance VM on another host). This should > be perfectly doable if snapshots are "volumes" at the libvirt level. > > A special-case of the backup appliance VM is using libguestfs to > access the snapshot from the host. This includes both block-level and > file system-level access along with OS detection APIs that libguestfs > provides. > > If snapshots are "volumes" at the libvirt level, then it is also > possible to use virStorageVolDownload() to stream the entire snapshot > through libvirt: > http://libvirt.org/html/libvirt-libvirt.html#virStorageVolDownload > > Summarizing, here are three access methods that integrate with libvirt > and cover many use cases: > > 1. Backup appliance VM. Add a readonly snapshot volume to a backup > appliance VM. If shared storage (e.g. 
SAN) is available then the > appliance can be run on any host. Otherwise the appliance must run on > the same host that the snapshot resides on. > > 2. Libguestfs client on host. Launch libguestfs with the readonly > snapshot volume. The backup application runs directly on the host, it > has both block and file system access to the snapshot. > > 3. Download the snapshot to a remote host for backup processing. Use > the virStorageVolDownload() API to download the snapshot onto a > libvirt client machine. Dirty block tracking is still useful here > since the virStorageVolDownload() API supports <offset, length> > arguments. Jagane, What do you think about these access methods? What does your custom protocol integrate with today - do you have a custom non-libvirt KVM management stack? Stefan ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration 2011-05-27 16:46 ` Stefan Hajnoczi @ 2011-05-27 17:16 ` Jagane Sundar 0 siblings, 0 replies; 28+ messages in thread From: Jagane Sundar @ 2011-05-27 17:16 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Anthony Liguori, dlaor, libvir-list, jes sorensen, Marcelo Tosatti, qemu-devel, Ayal Baron, Mingming Cao, Avi Kivity, Stefan Hajnoczi, Badari Pulavarty On 5/27/2011 9:46 AM, Stefan Hajnoczi wrote: > On Mon, May 23, 2011 at 2:02 PM, Stefan Hajnoczi<stefanha@gmail.com> wrote: >> On Sun, May 22, 2011 at 10:52 AM, Dor Laor<dlaor@redhat.com> wrote: >>> On 05/20/2011 03:19 PM, Stefan Hajnoczi wrote: >>>> I'm interested in what the API for snapshots would look like. >>>> Specifically how does user software do the following: >>>> 1. Create a snapshot >>>> 2. Delete a snapshot >>>> 3. List snapshots >>>> 4. Access data from a snapshot >>> There are plenty of options there: >>> - Run a (unrelated) VM and hotplug the snapshot as additional disk >> This is the backup appliance VM model and makes it possible to move >> the backup application to where the data is (or not, if you have a SAN >> and decide to spin up the appliance VM on another host). This should >> be perfectly doable if snapshots are "volumes" at the libvirt level. >> >> A special-case of the backup appliance VM is using libguestfs to >> access the snapshot from the host. This includes both block-level and >> file system-level access along with OS detection APIs that libguestfs >> provides. >> >> If snapshots are "volumes" at the libvirt level, then it is also >> possible to use virStorageVolDownload() to stream the entire snapshot >> through libvirt: >> http://libvirt.org/html/libvirt-libvirt.html#virStorageVolDownload >> >> Summarizing, here are three access methods that integrate with libvirt >> and cover many use cases: >> >> 1. Backup appliance VM. Add a readonly snapshot volume to a backup >> appliance VM. 
If shared storage (e.g. SAN) is available then the >> appliance can be run on any host. Otherwise the appliance must run on >> the same host that the snapshot resides on. >> >> 2. Libguestfs client on host. Launch libguestfs with the readonly >> snapshot volume. The backup application runs directly on the host, it >> has both block and file system access to the snapshot. >> >> 3. Download the snapshot to a remote host for backup processing. Use >> the virStorageVolDownload() API to download the snapshot onto a >> libvirt client machine. Dirty block tracking is still useful here >> since the virStorageVolDownload() API supports <offset, length> >> arguments. > Jagane, > What do you think about these access methods? What does your custom > protocol integrate with today - do you have a custom non-libvirt KVM > management stack? > > Stefan Hello Stefan, The current livebackup_client simply creates a backup of the VM on the backup server. It can save the backup image as a complete image for quick start of the VM on the backup server, or as 'full + n number of incremental backup redo files'. The 'full + n incremental redo' is useful if you want to store the backup on tape. I don't have a full backup management stack yet. If livebackup_client were available as part of kvm, then that would turn into the command line utility that the backup management stack would use. My own interest is in using livebackup_client to integrate all management functions into openstack. All management built into openstack will be built with the intent of self-service. However, other Enterprise backup management stacks, such as that from Symantec, etc., can be enhanced to use livebackup_client to extract the backup from the VM Host. How does it apply to the above access mechanisms? Hmm. Let me see. 1. Backup appliance VM: A backup appliance VM can be started up and the livebackup images can be connected to it.
The limitation is that the backup appliance VM must be started up on the backup server, where the livebackup image resides on a local disk. 2. Libguestfs client on host. This too is possible. The restriction is that libguestfs must be on the backup server, and not on the VM Host. 3. Download the snapshot to a remote host for backup processing. This is the native method for livebackup. Thanks, Jagane ^ permalink raw reply [flat|nested] 28+ messages in thread
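Stefan's third access method depends on virStorageVolDownload() accepting <offset, length> arguments; to use it for an incremental pull, a client must first turn its dirty-cluster bitmap into a list of contiguous byte extents. The sketch below illustrates that conversion only. The bitmap representation and the 4K cluster size are assumptions for illustration, not livebackup's actual wire or on-disk format:

```python
# Hypothetical sketch: convert a dirty-cluster bitmap into <offset, length>
# byte extents that could be fed, one by one, to virStorageVolDownload().

CLUSTER_SIZE = 4096  # 4K clusters, matching the cluster size livebackup uses

def dirty_extents(bitmap):
    """Yield (offset, length) byte extents for each run of dirty clusters.

    bitmap is a sequence of booleans, one entry per cluster.
    """
    run_start = None
    for i, dirty in enumerate(bitmap):
        if dirty and run_start is None:
            run_start = i                      # a dirty run begins here
        elif not dirty and run_start is not None:
            yield (run_start * CLUSTER_SIZE,   # run ended: emit byte extent
                   (i - run_start) * CLUSTER_SIZE)
            run_start = None
    if run_start is not None:                  # run extends to end of disk
        yield (run_start * CLUSTER_SIZE,
               (len(bitmap) - run_start) * CLUSTER_SIZE)

# Example: clusters 1-2 and 5 are dirty
extents = list(dirty_extents([False, True, True, False, False, True]))
# -> [(4096, 8192), (20480, 4096)]
```

Downloading only these extents, rather than the whole volume, is what makes dirty block tracking pay off in access method 3.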
* Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration
  2011-05-20 12:19 ` Stefan Hajnoczi
  2011-05-20 12:39 ` Jes Sorensen
  2011-05-22  9:52 ` Dor Laor
@ 2011-05-23  5:42 ` Jagane Sundar
  2 siblings, 0 replies; 28+ messages in thread
From: Jagane Sundar @ 2011-05-23 5:42 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, jes sorensen, dlaor, qemu-devel, Mingming Cao, Avi Kivity, Stefan Hajnoczi, Badari Pulavarty

Hello Stefan,

I have been thinking about this since you sent out this message.

A quick look at the libvirt API indicates that their notion of a
snapshot often refers to a "disk+memory snapshot". It would be good to
provide feedback to the libvirt developers to make sure that proper
support for a 'disk only snapshot' capability is included. You might
have already seen this, but here's an email chain from the libvirt
mailing list that's relevant:
http://www.redhat.com/archives/libvir-list/2010-March/msg01389.html

I am very interested in enhancing libvirt to support the Livebackup
semantics, for the following reason: if libvirt can be enhanced to
support all the constructs required for full Livebackup functionality,
then I would like to remove the built-in livebackup network protocol and
rewrite the client so that it is a native program on the VM host, linked
with libvirt, that can perform a full or incremental backup using
libvirt. If a remote backup needs to be performed, the remote client
would ssh into the VM host, run the local backup, and pipe the result
back to the remote backup host. This way I would not need to deal with
authentication of the livebackup client and server, or with encryption
of the network connection.

Please see my feedback regarding the specific operations below:

On 5/20/2011 5:19 AM, Stefan Hajnoczi wrote:
> I'm interested in what the API for snapshots would look like.
> Specifically how does user software do the following:
> 1. Create a snapshot

For livebackup, one required parameter is whether the backup is 'full'
or 'incremental'. If the param is 'incremental', then only the blocks
that were modified since the last snapshot command was issued are part
of the snapshot. If the param is 'full', then the snapshot includes all
the blocks of all the disks in the VM.

> 2. Delete a snapshot

Simple for livebackup, since no more than one snapshot is allowed. Hence
naming is a non-issue, as is deleting.

> 3. List snapshots

Again, simple for livebackup, on account of the one-active-snapshot
restriction.

> 4. Access data from a snapshot

In traditional terms, access could mean many things. Some examples:
1. Access lists a set of files on the local file system of the VM host.
A small VM may be started up that mounts these snapshot files as a set
of secondary drives.
2. Publish the snapshot drives as iSCSI LUNs.
3. If the origin drives are on a NetApp filer, perhaps a filer snapshot
is created and a URL describing that snapshot is printed out.

Access, in Livebackup terms, is merely copying dirty blocks over from
qemu. Livebackup does not provide a random access mode - i.e. one where
a VM could be started using the snapshot.

Currently, Livebackup uses 4K clusters of 512-byte blocks. Dirty
clusters are transferred by the client supplying a 'cluster number'
param, with qemu returning the next 'n' contiguous dirty clusters. At
the end, qemu returns a 'no-more-dirty' error.

> 5. Restore a VM from a snapshot

Additional info for re-creating the VM needs to be saved when a snapshot
is saved. The origin VM's libvirt XML descriptor should probably be
saved along with the snapshot.

> 6. Get the dirty blocks list (for incremental backup)

Either a complete dump of the dirty blocks, or a way to iterate through
the dirty blocks and fetch them, needs to be provided. My preference is
the iterate-through-the-dirty-blocks approach, since that will enable
the client to pace the backup process and provide guarantees such as 'no
more than 10% of the network b/w will be utilized for backup'.

> We've discussed image format-level approaches but I think the scope of
> the API should cover several levels at which snapshots are
> implemented:
> 1. Image format - image file snapshot (Jes, Jagane)

Livebackup uses qcow2 to save the copy-on-write blocks that are dirtied
by the VM while the snapshot is active.

> 2. Host file system - ext4 and btrfs snapshots

I have tested with ext4 and raw LVM volumes for the origin virtual disk
files. The qcow2 COW files have only resided on ext4.

> 3. Storage system - LVM or SAN volume snapshots
>
> It will be hard to take advantage of more efficient host file system
> or storage system snapshots if they are not designed in now.

I agree. A snapshot and restore from backup should not result in the
virtual disk file getting inflated (going from sparse to fully
allocated, for example).

> Is anyone familiar enough with the libvirt storage APIs to draft an
> extension that adds snapshot support? I will take a stab at it if no
> one else wants to try it.

I have only looked at it briefly, after getting your email message. If
you can take a deeper look at it, I would be willing to work with you to
iron out the details.

Thanks,
Jagane

^ permalink raw reply	[flat|nested] 28+ messages in thread
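The iterate-through-dirty-clusters exchange Jagane describes (client supplies a starting cluster number; qemu returns the next run of contiguous dirty clusters, or 'no-more-dirty' once exhausted) can be sketched as a small simulation. This is an illustration of the iteration pattern only; the class and method names are invented, and the in-memory "server" stands in for qemu's livebackup side:

```python
# Hypothetical sketch of the livebackup dirty-cluster iteration protocol.

class FakeLivebackupServer:
    """Stands in for qemu's livebackup side; holds the set of dirty clusters."""

    def __init__(self, dirty_clusters):
        self.dirty = sorted(dirty_clusters)

    def next_dirty_run(self, start):
        """Return (first_cluster, count) for the next contiguous dirty run
        at or after 'start', or None to signal 'no-more-dirty'."""
        remaining = [c for c in self.dirty if c >= start]
        if not remaining:
            return None
        first = remaining[0]
        count = 1
        while first + count in remaining:  # extend over contiguous clusters
            count += 1
        return (first, count)

def fetch_all(server):
    """Client loop: pull runs until the server reports no-more-dirty.
    A real client could sleep between requests here, which is what lets
    it pace the backup and cap its network bandwidth use."""
    runs = []
    cursor = 0
    while True:
        run = server.next_dirty_run(cursor)
        if run is None:                    # 'no-more-dirty'
            break
        runs.append(run)
        cursor = run[0] + run[1]           # resume after this run
    return runs

server = FakeLivebackupServer({3, 4, 5, 9, 10, 20})
print(fetch_all(server))  # -> [(3, 3), (9, 2), (20, 1)]
```

Because the client controls the cursor and the request rate, throttling (e.g. the 10%-of-bandwidth guarantee mentioned above) falls out naturally from this pull model, whereas a server-driven dump would not allow it.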
end of thread, other threads:[~2011-05-27 17:17 UTC | newest]

Thread overview: 28+ messages
2011-05-09 13:40 [Qemu-devel] [RFC] live snapshot, live merge, live block migration Dor Laor
2011-05-09 15:23 ` Anthony Liguori
2011-05-09 20:58 ` Dor Laor
2011-05-12 14:18 ` Marcelo Tosatti
2011-05-12 15:37 ` Jes Sorensen
2011-05-10 14:13 ` Marcelo Tosatti
2011-05-12 15:33 ` Jes Sorensen
2011-05-13  3:16 ` Jagane Sundar
2011-05-15 21:14 ` Dor Laor
2011-05-15 21:38 ` Jagane Sundar
2011-05-16  7:53 ` Dor Laor
2011-05-16  8:23 ` Jagane Sundar
2011-05-17 22:53 ` Dor Laor
2011-05-18 15:49 ` Jagane Sundar
2011-05-20 12:19 ` Stefan Hajnoczi
2011-05-20 12:39 ` Jes Sorensen
2011-05-20 12:49 ` Stefan Hajnoczi
2011-05-20 12:56 ` Jes Sorensen
2011-05-22  9:52 ` Dor Laor
2011-05-23 13:02 ` Stefan Hajnoczi
2011-05-27 16:46 ` Stefan Hajnoczi
2011-05-27 17:16 ` Jagane Sundar
2011-05-23  5:42 ` Jagane Sundar