* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command @ 2013-09-02 12:57 Benoît Canet 2013-09-03 7:54 ` Stefan Hajnoczi 0 siblings, 1 reply; 24+ messages in thread From: Benoît Canet @ 2013-09-02 12:57 UTC (permalink / raw) To: dietmar; +Cc: pbonzini, qemu-devel, stefanha I don't see the point of using hashes. Using hashes means that at least one extra read will be done on the target to compute the candidate target hash. It's bad for a cloud provider where I/O count is a huge cost. Another structure to replace a bitmap (smaller in the common case) would be a block table as described in the Hystor paper: www.cse.ohio-state.edu/~fchen/paper/papers/ics11.pdf Best regards Benoît ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-09-02 12:57 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Benoît Canet @ 2013-09-03 7:54 ` Stefan Hajnoczi 0 siblings, 0 replies; 24+ messages in thread From: Stefan Hajnoczi @ 2013-09-03 7:54 UTC (permalink / raw) To: Benoît Canet; +Cc: pbonzini, dietmar, qemu-devel On Mon, Sep 02, 2013 at 02:57:23PM +0200, Benoît Canet wrote: > > I don't see the point of using hashes. > Using hashes means that at least one extra read will be done on the target to > compute the candidate target hash. > It's bad for a cloud provider where IOs count is a huge cost. > > Another structure to replace a bitmap (smaller on the canonical case) would be > a block table as described in the Hystor paper: > www.cse.ohio-state.edu/~fchen/paper/papers/ics11.pdf This is similar to syncing image formats that use a revision number for each cluster instead of a hash. The problem with counters is overflow. In the case of Hystor it is not necessary to preserve exact counts. A dirty bitmap must mark a block dirty if it has been modified; otherwise there is a risk of data loss. A bit more than just counters is necessary to implement a persistent dirty bitmap, but maybe it's possible with some additional state. Stefan ^ permalink raw reply [flat|nested] 24+ messages in thread
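The overflow concern with per-cluster revision counters can be illustrated with a toy sketch. The counter width here is deliberately absurd to force a wraparound quickly; all names are invented for illustration:

```python
# Toy illustration of the counter-overflow problem: if per-cluster
# revision counters wrap, a modified cluster can look unchanged to the
# sync logic that compares revisions.

COUNTER_BITS = 2          # absurdly small, to force a wraparound
MOD = 1 << COUNTER_BITS

def bump(rev: int) -> int:
    return (rev + 1) % MOD

seen = 1                  # replica last synced the cluster at revision 1
rev = 1
for _ in range(MOD):      # exactly 2**COUNTER_BITS writes later...
    rev = bump(rev)

assert rev == seen        # ...the counter has wrapped back to 1
# The sync logic would wrongly conclude the cluster is clean. A dirty
# *bit* cannot wrap: it stays set until the consumer clears it.
```

This is why a sticky dirty bit is the conservative choice for backup, even though a counter carries more information.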
* [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command @ 2013-05-15 14:34 Stefan Hajnoczi 2013-05-16 6:16 ` Wenchao Xia 0 siblings, 1 reply; 24+ messages in thread From: Stefan Hajnoczi @ 2013-05-15 14:34 UTC (permalink / raw) To: qemu-devel Cc: Kevin Wolf, Fam Zheng, dietmar, imain, Stefan Hajnoczi, Paolo Bonzini, xiawenc Note: These patches apply to my block-next tree. You can also grab the code from git here: git://github.com/stefanha/qemu.git block-backup-core This series adds a new QMP command, drive-backup, which takes a point-in-time snapshot of a block device. The snapshot is copied out to a target block device. A simple example is: drive-backup device=virtio0 format=qcow2 target=backup-20130401.qcow2 The original drive-backup blockjob was written by Dietmar Maurer <dietmar@proxmox.com>. He is currently busy but I feel the feature is worth pushing into QEMU since there has been interest. This is my version of his patch, plus the QMP command and qemu-iotests test case. QMP 'transaction' support is included since v3. It adds support for atomic snapshots of multiple block devices. I also added an 'abort' transaction to allow testing of the .abort()/.cleanup() code path. Thanks to Wenchao for making qmp_transaction() extensible. How is this different from block-stream and drive-mirror? --------------------------------------------------------- Both block-stream and drive-mirror do not provide immediate point-in-time snapshots. Instead they copy data into a new file and then switch to it. In other words, the point at which the "snapshot" is taken cannot be controlled directly. drive-backup intercepts guest writes and saves data into the target block device before it is overwritten. The target block device can be a raw image file, backing files are not used to implement this feature. How can drive-backup be used? ----------------------------- The simplest use-case is to copy a point-in-time snapshot to a local file. 
More advanced users may wish to make the target an NBD URL. The NBD server listening on the other side can process the backup writes any way it wishes. I previously posted an RFC series with a backup server that streamed Dietmar's VMA backup archive format. What's next for drive-backup? ----------------------------- 1. Sync modes like drive-mirror (top, full, none). This makes it possible to preserve the backing file chain. v3: * Rename to drive-backup for consistency with drive-mirror [kwolf] * Add QMP transaction support [kwolf] * Introduce bdrv_add_before_write_cb() to hook writes * Mention 'query-block-jobs' lists job of type 'backup' [eblake] * Rename rwlock to flush_rwlock [kwolf] * Fix space in block/backup.c comment [kwolf] v2: * s/block_backup/block-backup/ in commit message [eblake] * Avoid funny spacing in QMP docs [eblake] * Document query-block-jobs and block-job-cancel usage [eblake] Dietmar Maurer (1): block: add basic backup support to block driver Stefan Hajnoczi (7): block: add bdrv_add_before_write_cb() block: add drive-backup QMP command qemu-iotests: add 055 drive-backup test case blockdev: rename BlkTransactionStates to singular blockdev: add DriveBackup transaction blockdev: add Abort transaction qemu-iotests: test 'drive-backup' transaction in 055 block.c | 37 +++++ block/Makefile.objs | 1 + block/backup.c | 282 ++++++++++++++++++++++++++++++++++++ blockdev.c | 264 +++++++++++++++++++++++++++------- include/block/block_int.h | 48 +++++++ qapi-schema.json | 65 ++++++++- qmp-commands.hx | 6 + tests/qemu-iotests/055 | 348 +++++++++++++++++++++++++++++++++++++++++++++ tests/qemu-iotests/055.out | 5 + tests/qemu-iotests/group | 1 + 10 files changed, 1004 insertions(+), 53 deletions(-) create mode 100644 block/backup.c create mode 100755 tests/qemu-iotests/055 create mode 100644 tests/qemu-iotests/055.out -- 1.8.1.4 ^ permalink raw reply [flat|nested] 24+ messages in thread
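The copy-before-write behaviour the cover letter describes (old data is saved to the target before a guest write overwrites it, while a background loop copies out the rest) can be sketched roughly as follows. The class and method names are invented for illustration and do not reflect QEMU's actual C implementation:

```python
# Minimal sketch of the copy-before-write idea behind drive-backup:
# before a guest write lands on the source image, the old contents of
# the affected clusters are copied to the target.

CLUSTER_SIZE = 4  # toy cluster size in bytes

class BackupJob:
    def __init__(self, source: bytearray, target: bytearray):
        self.source = source
        self.target = target
        self.copied = set()  # clusters already saved to the target

    def _copy_cluster(self, c: int) -> None:
        if c in self.copied:
            return  # point-in-time contents already preserved
        start = c * CLUSTER_SIZE
        self.target[start:start + CLUSTER_SIZE] = \
            self.source[start:start + CLUSTER_SIZE]
        self.copied.add(c)

    def guest_write(self, offset: int, data: bytes) -> None:
        # Intercept: save the old data of every touched cluster first.
        first = offset // CLUSTER_SIZE
        last = (offset + len(data) - 1) // CLUSTER_SIZE
        for c in range(first, last + 1):
            self._copy_cluster(c)
        self.source[offset:offset + len(data)] = data

    def run_to_completion(self) -> None:
        # Background loop: copy remaining clusters while the guest runs.
        for c in range(len(self.source) // CLUSTER_SIZE):
            self._copy_cluster(c)

source = bytearray(b"AAAABBBBCCCC")
job = BackupJob(source, bytearray(len(source)))
job.guest_write(4, b"XXXX")   # old "BBBB" is saved to the target first
job.run_to_completion()
assert bytes(job.target) == b"AAAABBBBCCCC"  # point-in-time snapshot
assert bytes(source) == b"AAAAXXXXCCCC"      # guest sees its own write
```

The key property is that the target always reflects the moment the job started, no matter when each guest write arrives.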
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-15 14:34 Stefan Hajnoczi @ 2013-05-16 6:16 ` Wenchao Xia 2013-05-16 7:47 ` Stefan Hajnoczi 0 siblings, 1 reply; 24+ messages in thread From: Wenchao Xia @ 2013-05-16 6:16 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Paolo Bonzini, dietmar 于 2013-5-15 22:34, Stefan Hajnoczi 写道: > Note: These patches apply to my block-next tree. You can also grab the code > from git here: > git://github.com/stefanha/qemu.git block-backup-core > > This series adds a new QMP command, drive-backup, which takes a point-in-time > snapshot of a block device. The snapshot is copied out to a target block > device. A simple example is: > > drive-backup device=virtio0 format=qcow2 target=backup-20130401.qcow2 > > The original drive-backup blockjob was written by Dietmar Maurer > <dietmar@proxmox.com>. He is currently busy but I feel the feature is worth > pushing into QEMU since there has been interest. This is my version of his > patch, plus the QMP command and qemu-iotests test case. > > QMP 'transaction' support is included since v3. It adds support for atomic > snapshots of multiple block devices. I also added an 'abort' transaction to > allow testing of the .abort()/.cleanup() code path. Thanks to Wenchao for > making qmp_transaction() extensible. > > How is this different from block-stream and drive-mirror? > --------------------------------------------------------- > Both block-stream and drive-mirror do not provide immediate point-in-time > snapshots. Instead they copy data into a new file and then switch to it. In > other words, the point at which the "snapshot" is taken cannot be controlled > directly. > > drive-backup intercepts guest writes and saves data into the target block > device before it is overwritten. The target block device can be a raw image > file, backing files are not used to implement this feature. > > How can drive-backup be used? 
> ----------------------------- > The simplest use-case is to copy a point-in-time snapshot to a local file. > > More advanced users may wish to make the target an NBD URL. The NBD server > listening on the other side can process the backup writes any way it wishes. I > previously posted an RFC series with a backup server that streamed Dietmar's > VMA backup archive format. > > What's next for drive-backup? > ----------------------------- > 1. Sync modes like drive-mirror (top, full, none). This makes it possible to > preserve the backing file chain. > > v3: > * Rename to drive-backup for consistency with drive-mirror [kwolf] > * Add QMP transaction support [kwolf] > * Introduce bdrv_add_before_write_cb() to hook writes > * Mention 'query-block-jobs' lists job of type 'backup' [eblake] > * Rename rwlock to flush_rwlock [kwolf] > * Fix space in block/backup.c comment [kwolf] > > v2: > * s/block_backup/block-backup/ in commit message [eblake] > * Avoid funny spacing in QMP docs [eblake] > * Document query-block-jobs and block-job-cancel usage [eblake] After checking the code, I found it is possible to also add delta data backup support, if an additional dirty bitmap were added. Compared with the current solution, I think it is doing COW at the qemu device level:

    qemu device
         |
    general block layer
         |
    virtual format layer
         |
    -----------------------
       |          |
     qcow2      vmdk....

This will make things complicated as more work comes; a better place for block COW is under the general block layer. Maybe later we can adjust the block layer for it. -- Best Regards Wenchao Xia ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-16 6:16 ` Wenchao Xia @ 2013-05-16 7:47 ` Stefan Hajnoczi 2013-05-17 6:58 ` Wenchao Xia 2013-05-17 10:17 ` Paolo Bonzini 0 siblings, 2 replies; 24+ messages in thread From: Stefan Hajnoczi @ 2013-05-16 7:47 UTC (permalink / raw) To: Wenchao Xia Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Paolo Bonzini, dietmar On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote: > After checking the code, I found it possible to add delta data backup > support also, If an additional dirty bitmap was added. I've been thinking about this. Incremental backups need to know which blocks have changed, but keeping a persistent dirty bitmap is expensive and unnecessary. Backup applications need to support the full backup case anyway for their first run. Therefore we can keep a best-effort dirty bitmap which is persisted only when the guest is terminated cleanly. If the QEMU process crashes then the on-disk dirty bitmap will be invalid and the backup application needs to do a full backup next time. The advantage of this approach is that we don't need to fdatasync(2) before every guest write operation. > Compared with > current solution, I think it is doing COW at qemu device level: > > qemu device > | > general block layer > | > virtual format layer > | > ----------------------- > | | > qcow2 vmdk.... > > This will make things complicated when more works comes, a better > place for block COW, is under general block layer. Maybe later we > can adjust block for it. I don't consider block jobs to be "qemu device" layer. It sounds like you think the code should be in bdrv_co_do_writev()? The drive-backup operation doesn't really affect the source BlockDriverState, it just needs to intercept writes. Therefore it seems cleaner for the code to live separately (plus we reuse the code for the block job loop which copies out data while the guest is running). 
Otherwise we would squash all of the blockjob code into block.c and it would be an even bigger mess than it is today :-). ^ permalink raw reply [flat|nested] 24+ messages in thread
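The best-effort persistent dirty bitmap proposed above (persisted only when the guest terminates cleanly, invalid after a crash, forcing the backup application to fall back to a full backup) might look roughly like this. The JSON file format and all names are invented for illustration:

```python
# Sketch of a best-effort persistent dirty bitmap: written out only on
# clean shutdown. If QEMU crashes, no valid on-disk bitmap exists and
# the loader conservatively reports everything dirty (full backup).

import json
import os
import tempfile

class BestEffortBitmap:
    def __init__(self, nb_clusters: int):
        self.dirty = [False] * nb_clusters

    def mark_dirty(self, cluster: int) -> None:
        self.dirty[cluster] = True

    def save_clean_shutdown(self, path: str) -> None:
        # Only a clean shutdown produces a file marked valid; no
        # fdatasync() is needed per guest write during normal operation.
        with open(path, "w") as f:
            json.dump({"valid": True, "dirty": self.dirty}, f)

    @classmethod
    def load(cls, path: str, nb_clusters: int) -> "BestEffortBitmap":
        try:
            with open(path) as f:
                state = json.load(f)
            if not state.get("valid"):
                raise ValueError("bitmap not marked valid")
            bm = cls(nb_clusters)
            bm.dirty = state["dirty"]
            return bm
        except (OSError, ValueError, KeyError):
            # Crash or missing file: assume everything changed.
            bm = cls(nb_clusters)
            bm.dirty = [True] * nb_clusters
            return bm

path = os.path.join(tempfile.mkdtemp(), "bitmap.json")
bm = BestEffortBitmap(4)
bm.mark_dirty(2)
bm.save_clean_shutdown(path)
assert BestEffortBitmap.load(path, 4).dirty == [False, False, True, False]

os.unlink(path)  # simulate a crash: no valid bitmap survives
assert BestEffortBitmap.load(path, 4).dirty == [True, True, True, True]
```

The trade-off is exactly the one stated in the thread: incremental backups become an optimization that is simply lost after an unclean exit.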
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-16 7:47 ` Stefan Hajnoczi @ 2013-05-17 6:58 ` Wenchao Xia 2013-05-17 9:14 ` Stefan Hajnoczi 2013-05-17 10:17 ` Paolo Bonzini 1 sibling, 1 reply; 24+ messages in thread From: Wenchao Xia @ 2013-05-17 6:58 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Paolo Bonzini, dietmar 于 2013-5-16 15:47, Stefan Hajnoczi 写道: > On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote: >> After checking the code, I found it possible to add delta data backup >> support also, If an additional dirty bitmap was added. > > I've been thinking about this. Incremental backups need to know which > blocks have changed, but keeping a persistent dirty bitmap is expensive > and unnecessary. > Yes, it would be likely another block layer, so hope not do that. > Backup applications need to support the full backup case anyway for > their first run. Therefore we can keep a best-effort dirty bitmap which > is persisted only when the guest is terminated cleanly. If the QEMU > process crashes then the on-disk dirty bitmap will be invalid and the > backup application needs to do a full backup next time. > > The advantage of this approach is that we don't need to fdatasync(2) > before every guest write operation. > >> Compared with >> current solution, I think it is doing COW at qemu device level: >> >> qemu device >> | >> general block layer >> | >> virtual format layer >> | >> ----------------------- >> | | >> qcow2 vmdk.... >> >> This will make things complicated when more works comes, a better >> place for block COW, is under general block layer. Maybe later we >> can adjust block for it. > > I don't consider block jobs to be "qemu device" layer. It sounds like > you think the code should be in bdrv_co_do_writev()? 
> I feel a trend of growing fragility from the different solutions, and COW is a key feature that the block layer provides, so I wonder whether it can be moved under the block layer later, leaving an abstract API for it. Some other operations, such as commit and stream, could also be hidden under the block layer:

    qemu general      testcase      other user
         |                |             |
    --------------------------------------------------
                          |
    core block abstract layer (COW, zero R/W, image dup/backup)
                          |
              ---------------------
              |                   |
       qemu's implement       3'rd party
              |                   |
        -------------        --------------
        |           |        |            |
      qcow2       vmdk      lvm2     Enterprise storage integration

It is not directly related to this series, but I feel some effort should be spent on it when time allows, before things become complicated. > The drive-backup operation doesn't really affect the source > BlockDriverState, it just needs to intercept writes. Therefore it seems > cleaner for the code to live separately (plus we reuse the code for the > block job loop which copies out data while the guest is running). > Otherwise we would squash all of the blockjob code into block.c and it > would be an even bigger mess than it is today :-). > -- Best Regards Wenchao Xia ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-17 6:58 ` Wenchao Xia @ 2013-05-17 9:14 ` Stefan Hajnoczi 2013-05-21 3:25 ` Wenchao Xia 0 siblings, 1 reply; 24+ messages in thread From: Stefan Hajnoczi @ 2013-05-17 9:14 UTC (permalink / raw) To: Wenchao Xia Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi, Paolo Bonzini, dietmar On Fri, May 17, 2013 at 02:58:57PM +0800, Wenchao Xia wrote: > 于 2013-5-16 15:47, Stefan Hajnoczi 写道: > >On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote: > >> After checking the code, I found it possible to add delta data backup > >>support also, If an additional dirty bitmap was added. > > > >I've been thinking about this. Incremental backups need to know which > >blocks have changed, but keeping a persistent dirty bitmap is expensive > >and unnecessary. > > > Yes, it would be likely another block layer, so hope not do that. Not at all, persistent dirty bitmaps need to be part of the block layer since they need to support any image type - qcow2, Gluster, raw LVM, etc. > >I don't consider block jobs to be "qemu device" layer. It sounds like > >you think the code should be in bdrv_co_do_writev()? > > > I feel a trend of becoming fragility from different solutions, > and COW is a key feature that block layer provide, so I wonder if it > can be adjusted under block layer later The generic block layer includes more than just block.c. It also includes block jobs and the qcow2 metadata cache that Dong Xu has extracted recently, for example. Therefore you need to be more specific about "what" and "why". This copy-on-write backup approach is available as a block job which runs on top of any BlockDriverState. What concrete change are you proposing? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-17 9:14 ` Stefan Hajnoczi @ 2013-05-21 3:25 ` Wenchao Xia 2013-05-21 7:34 ` Stefan Hajnoczi 0 siblings, 1 reply; 24+ messages in thread From: Wenchao Xia @ 2013-05-21 3:25 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi, Paolo Bonzini, dietmar 于 2013-5-17 17:14, Stefan Hajnoczi 写道: > On Fri, May 17, 2013 at 02:58:57PM +0800, Wenchao Xia wrote: >> 于 2013-5-16 15:47, Stefan Hajnoczi 写道: >>> On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote: >>>> After checking the code, I found it possible to add delta data backup >>>> support also, If an additional dirty bitmap was added. >>> >>> I've been thinking about this. Incremental backups need to know which >>> blocks have changed, but keeping a persistent dirty bitmap is expensive >>> and unnecessary. >>> >> Yes, it would be likely another block layer, so hope not do that. > > Not at all, persistent dirty bitmaps need to be part of the block layer > since they need to support any image type - qcow2, Gluster, raw LVM, > etc. > >>> I don't consider block jobs to be "qemu device" layer. It sounds like >>> you think the code should be in bdrv_co_do_writev()? >>> >> I feel a trend of becoming fragility from different solutions, >> and COW is a key feature that block layer provide, so I wonder if it >> can be adjusted under block layer later > > The generic block layer includes more than just block.c. It also > includes block jobs and the qcow2 metadata cache that Dong Xu has > extracted recently, for example. Therefore you need to be more specific > about "what" and "why". > > This copy-on-write backup approach is available as a block job which > runs on top of any BlockDriverState. What concrete change are you > proposing? 
> Since it is hard to hide BlockDriverState now, I suggest adding some documentation in qemu about the three snapshot types: qcow2 internal, backing chain, and drive-backup, which are all qemu software-based snapshot implementations, so that users can understand the differences more easily. In the long term, I hope to form a library that exposes these in a unified format, perhaps calling qmp_transaction internally, and to make them easier to offload where possible, hence the hope for an abstract-driver structure. -- Best Regards Wenchao Xia ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-21 3:25 ` Wenchao Xia @ 2013-05-21 7:34 ` Stefan Hajnoczi 0 siblings, 0 replies; 24+ messages in thread From: Stefan Hajnoczi @ 2013-05-21 7:34 UTC (permalink / raw) To: Wenchao Xia Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain, Paolo Bonzini, dietmar On Tue, May 21, 2013 at 11:25:01AM +0800, Wenchao Xia wrote: > 于 2013-5-17 17:14, Stefan Hajnoczi 写道: > >On Fri, May 17, 2013 at 02:58:57PM +0800, Wenchao Xia wrote: > >>于 2013-5-16 15:47, Stefan Hajnoczi 写道: > >>>On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote: > >>>> After checking the code, I found it possible to add delta data backup > >>>>support also, If an additional dirty bitmap was added. > >>> > >>>I've been thinking about this. Incremental backups need to know which > >>>blocks have changed, but keeping a persistent dirty bitmap is expensive > >>>and unnecessary. > >>> > >> Yes, it would be likely another block layer, so hope not do that. > > > >Not at all, persistent dirty bitmaps need to be part of the block layer > >since they need to support any image type - qcow2, Gluster, raw LVM, > >etc. > > > >>>I don't consider block jobs to be "qemu device" layer. It sounds like > >>>you think the code should be in bdrv_co_do_writev()? > >>> > >> I feel a trend of becoming fragility from different solutions, > >>and COW is a key feature that block layer provide, so I wonder if it > >>can be adjusted under block layer later > > > >The generic block layer includes more than just block.c. It also > >includes block jobs and the qcow2 metadata cache that Dong Xu has > >extracted recently, for example. Therefore you need to be more specific > >about "what" and "why". > > > >This copy-on-write backup approach is available as a block job which > >runs on top of any BlockDriverState. What concrete change are you > >proposing? 
> > > Since hard to hide it BlockDriverState now, suggest add some > document in qemu about the three snapshot types: qcow2 internal, > backing chain, drive-backup, which are all qemu software based snapshot > implemention, then user can know the difference with it eaiser. > > In long term, I hope to form a library expose those in a unified > format, perhaps it calls qmp_transaction internally, and make it > easier to be offloaded if possible, so hope a abstract-driver structure. Okay, just keep in mind they have different behavior. That means these snapshot types solve different problems and may be inappropriate for some use cases. Stefan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-16 7:47 ` Stefan Hajnoczi 2013-05-17 6:58 ` Wenchao Xia @ 2013-05-17 10:17 ` Paolo Bonzini 2013-05-20 6:24 ` Stefan Hajnoczi 1 sibling, 1 reply; 24+ messages in thread From: Paolo Bonzini @ 2013-05-17 10:17 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Fam Zheng, qemu-devel, dietmar, imain, Wenchao Xia Il 16/05/2013 09:47, Stefan Hajnoczi ha scritto: > On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote: >> After checking the code, I found it possible to add delta data backup >> support also, If an additional dirty bitmap was added. > > I've been thinking about this. Incremental backups need to know which > blocks have changed, but keeping a persistent dirty bitmap is expensive > and unnecessary. > > Backup applications need to support the full backup case anyway for > their first run. Therefore we can keep a best-effort dirty bitmap which > is persisted only when the guest is terminated cleanly. If the QEMU > process crashes then the on-disk dirty bitmap will be invalid and the > backup application needs to do a full backup next time. > > The advantage of this approach is that we don't need to fdatasync(2) > before every guest write operation. You only need to fdatasync() before every guest flush, no? Paolo >> Compared with >> current solution, I think it is doing COW at qemu device level: >> >> qemu device >> | >> general block layer >> | >> virtual format layer >> | >> ----------------------- >> | | >> qcow2 vmdk.... >> >> This will make things complicated when more works comes, a better >> place for block COW, is under general block layer. Maybe later we >> can adjust block for it. > > I don't consider block jobs to be "qemu device" layer. It sounds like > you think the code should be in bdrv_co_do_writev()? > > The drive-backup operation doesn't really affect the source > BlockDriverState, it just needs to intercept writes. 
Therefore it seems > cleaner for the code to live separately (plus we reuse the code for the > block job loop which copies out data while the guest is running). > Otherwise we would squash all of the blockjob code into block.c and it > would be an even bigger mess than it is today :-). > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-17 10:17 ` Paolo Bonzini @ 2013-05-20 6:24 ` Stefan Hajnoczi 2013-05-20 7:23 ` Paolo Bonzini 0 siblings, 1 reply; 24+ messages in thread From: Stefan Hajnoczi @ 2013-05-20 6:24 UTC (permalink / raw) To: Paolo Bonzini Cc: Kevin Wolf, Fam Zheng, qemu-devel, Wenchao Xia, imain, Stefan Hajnoczi, Dietmar Maurer On Fri, May 17, 2013 at 12:17 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 16/05/2013 09:47, Stefan Hajnoczi ha scritto: >> On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote: >>> After checking the code, I found it possible to add delta data backup >>> support also, If an additional dirty bitmap was added. >> >> I've been thinking about this. Incremental backups need to know which >> blocks have changed, but keeping a persistent dirty bitmap is expensive >> and unnecessary. >> >> Backup applications need to support the full backup case anyway for >> their first run. Therefore we can keep a best-effort dirty bitmap which >> is persisted only when the guest is terminated cleanly. If the QEMU >> process crashes then the on-disk dirty bitmap will be invalid and the >> backup application needs to do a full backup next time. >> >> The advantage of this approach is that we don't need to fdatasync(2) >> before every guest write operation. > > You only need to fdatasync() before every guest flush, no? No, you need to set the dirty bit before issuing the write on the host. Otherwise the image data may be modified without setting the appropriate dirty bit. That would allow data modifications to go undetected! Stefan ^ permalink raw reply [flat|nested] 24+ messages in thread
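The ordering constraint Stefan describes (make the dirty bit durable before issuing the data write, so the on-disk bitmap is always conservative) can be modelled with a toy operation log: crash at any prefix of the log, and the bitmap must never under-report modified clusters. All names are invented for illustration:

```python
# Model the persistence ordering as a log of operations. A crash is a
# truncation of the log at an arbitrary point; the invariant is that no
# surviving data write lacks its dirty bit.

def write_with_bitmap(log: list, cluster: int) -> None:
    # Correct order: persist the dirty bit first (fdatasync() in real
    # life), then issue the data write.
    log.append(("set_dirty", cluster))
    log.append(("write_data", cluster))

def bitmap_is_conservative(log_prefix: list) -> bool:
    dirty = set()
    for op, cluster in log_prefix:
        if op == "set_dirty":
            dirty.add(cluster)
        elif op == "write_data" and cluster not in dirty:
            return False  # data changed without the bitmap knowing
    return True

log = []
write_with_bitmap(log, 7)
write_with_bitmap(log, 3)
# Whatever prefix survives a crash, the bitmap never misses a write.
assert all(bitmap_is_conservative(log[:i]) for i in range(len(log) + 1))

# The reverse order (data first, bit second) breaks the invariant:
bad = [("write_data", 7), ("set_dirty", 7)]
assert not bitmap_is_conservative(bad[:1])
```

Over-reporting (a dirty bit set for a write that never hit the disk) is harmless; the backup just copies an unchanged cluster. Under-reporting silently corrupts the backup.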
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-20 6:24 ` Stefan Hajnoczi @ 2013-05-20 7:23 ` Paolo Bonzini 2013-05-21 7:31 ` Stefan Hajnoczi 0 siblings, 1 reply; 24+ messages in thread From: Paolo Bonzini @ 2013-05-20 7:23 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Fam Zheng, qemu-devel, Dietmar Maurer, imain, Stefan Hajnoczi, Wenchao Xia Il 20/05/2013 08:24, Stefan Hajnoczi ha scritto: >> > You only need to fdatasync() before every guest flush, no? > No, you need to set the dirty bit before issuing the write on the > host. Otherwise the image data may be modified without setting the > appropriate dirty bit. That would allow data modifications to go > undetected! But data modifications can go undetected until the guest flush returns, can't they? Paolo ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-20 7:23 ` Paolo Bonzini @ 2013-05-21 7:31 ` Stefan Hajnoczi 2013-05-21 8:30 ` Paolo Bonzini 0 siblings, 1 reply; 24+ messages in thread From: Stefan Hajnoczi @ 2013-05-21 7:31 UTC (permalink / raw) To: Paolo Bonzini Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, Dietmar Maurer, imain, Wenchao Xia On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote: > Il 20/05/2013 08:24, Stefan Hajnoczi ha scritto: > >> > You only need to fdatasync() before every guest flush, no? > > No, you need to set the dirty bit before issuing the write on the > > host. Otherwise the image data may be modified without setting the > > appropriate dirty bit. That would allow data modifications to go > > undetected! > > But data modifications can go undetected until the guest flush returns, > can't they? You are thinking about it from the guest perspective - if a flush has not completed yet then there is no guarantee that the write has reached disk. But from a host perspective the dirty bitmap should be conservative so that the backup application can always restore a bit-for-bit identical copy of the disk image. It would be weird if writes can sneak in unnoticed. Stefan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-21 7:31 ` Stefan Hajnoczi @ 2013-05-21 8:30 ` Paolo Bonzini 2013-05-21 10:34 ` Stefan Hajnoczi 0 siblings, 1 reply; 24+ messages in thread From: Paolo Bonzini @ 2013-05-21 8:30 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, Dietmar Maurer, imain, Wenchao Xia Il 21/05/2013 09:31, Stefan Hajnoczi ha scritto: > On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote: >> Il 20/05/2013 08:24, Stefan Hajnoczi ha scritto: >>>>> You only need to fdatasync() before every guest flush, no? >>> No, you need to set the dirty bit before issuing the write on the >>> host. Otherwise the image data may be modified without setting the >>> appropriate dirty bit. That would allow data modifications to go >>> undetected! >> >> But data modifications can go undetected until the guest flush returns, >> can't they? > > You are thinking about it from the guest perspective - if a flush has > not completed yet then there is no guarantee that the write has reached > disk. > > But from a host perspective the dirty bitmap should be conservative so > that the backup application can always restore a bit-for-bit identical > copy of the disk image. It would be weird if writes can sneak in > unnoticed. True, but that would happen only in case the host crashes. Even for a QEMU crash the changes would be safe, I think. They would be written back when the persistent dirty bitmap's mmap() area is unmapped, during process exit. Paolo ^ permalink raw reply [flat|nested] 24+ messages in thread
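Paolo's mmap() point can be illustrated with a small sketch: with a shared file mapping, dirtied pages survive the death of the process because the kernel writes them back from the page cache, so only a host power failure (or crash) invalidates the bitmap. The file layout is invented for illustration:

```python
# Sketch of an mmap()-backed dirty bitmap. Stores to the mapping need no
# per-write fdatasync(): the kernel eventually writes the shared pages
# back even if the process is killed. Only a host power failure can lose
# updates, which is when an explicit flush (msync) would be required.

import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bitmap.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)  # one page worth of bitmap, all clean

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 4096)  # shared, writable mapping
    m[10] = 1  # set a dirty bit with a plain memory store
    # No m.flush() here: that would only matter for power-failure safety.
    m.close()

with open(path, "rb") as f:
    data = f.read()
assert data[10] == 1  # the store is visible through the file
```

This matches the trade-off discussed above: cheap updates while running, but management must discard the file after anything resembling a power failure.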
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command 2013-05-21 8:30 ` Paolo Bonzini @ 2013-05-21 10:34 ` Stefan Hajnoczi 2013-05-21 10:36 ` Paolo Bonzini 0 siblings, 1 reply; 24+ messages in thread From: Stefan Hajnoczi @ 2013-05-21 10:34 UTC (permalink / raw) To: Paolo Bonzini Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, Dietmar Maurer, imain, Wenchao Xia On Tue, May 21, 2013 at 10:30:22AM +0200, Paolo Bonzini wrote: > Il 21/05/2013 09:31, Stefan Hajnoczi ha scritto: > > On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote: > >> Il 20/05/2013 08:24, Stefan Hajnoczi ha scritto: > >>>>> You only need to fdatasync() before every guest flush, no? > >>> No, you need to set the dirty bit before issuing the write on the > >>> host. Otherwise the image data may be modified without setting the > >>> appropriate dirty bit. That would allow data modifications to go > >>> undetected! > >> > >> But data modifications can go undetected until the guest flush returns, > >> can't they? > > > > You are thinking about it from the guest perspective - if a flush has > > not completed yet then there is no guarantee that the write has reached > > disk. > > > > But from a host perspective the dirty bitmap should be conservative so > > that the backup application can always restore a bit-for-bit identical > > copy of the disk image. It would be weird if writes can sneak in > > unnoticed. > > True, but that would happen only in case the host crashes. Even for a > QEMU crash the changes would be safe, I think. They would be written > back when the persistent dirty bitmap's mmap() area is unmapped, during > process exit. I'd err on the side of caution, mark the persistent dirty bitmap while QEMU is running. Discard the file if there was a power failure. It really depends what the dirty bitmap users are doing. It could be okay to have a tiny chance of missing a modification but it might not. 
Stefan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-21 10:34 ` Stefan Hajnoczi
@ 2013-05-21 10:36 ` Paolo Bonzini
  2013-05-21 10:58   ` Dietmar Maurer
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2013-05-21 10:36 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, Dietmar Maurer,
	imain, Wenchao Xia

Il 21/05/2013 12:34, Stefan Hajnoczi ha scritto:
> On Tue, May 21, 2013 at 10:30:22AM +0200, Paolo Bonzini wrote:
>> Il 21/05/2013 09:31, Stefan Hajnoczi ha scritto:
>>> On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote:
>>>> Il 20/05/2013 08:24, Stefan Hajnoczi ha scritto:
>>>>>>> You only need to fdatasync() before every guest flush, no?
>>>>> No, you need to set the dirty bit before issuing the write on the
>>>>> host. Otherwise the image data may be modified without setting the
>>>>> appropriate dirty bit. That would allow data modifications to go
>>>>> undetected!
>>>>
>>>> But data modifications can go undetected until the guest flush
>>>> returns, can't they?
>>>
>>> You are thinking about it from the guest perspective - if a flush has
>>> not completed yet then there is no guarantee that the write has
>>> reached disk.
>>>
>>> But from a host perspective the dirty bitmap should be conservative
>>> so that the backup application can always restore a bit-for-bit
>>> identical copy of the disk image. It would be weird if writes can
>>> sneak in unnoticed.
>>
>> True, but that would happen only in case the host crashes. Even for a
>> QEMU crash the changes would be safe, I think. They would be written
>> back when the persistent dirty bitmap's mmap() area is unmapped,
>> during process exit.
>
> I'd err on the side of caution, mark the persistent dirty bitmap while
> QEMU is running. Discard the file if there was a power failure.

Agreed. Though this is something that management must do manually,
isn't it? QEMU cannot distinguish a SIGKILL from a power failure, while
management can afford treating SIGKILL as a power failure.

> It really depends what the dirty bitmap users are doing. It could be
> okay to have a tiny chance of missing a modification but it might not.

Paolo
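[Editorial note: the write ordering debated above (set the dirty bit before issuing the host write, so that a crash can only over-report dirty clusters, never under-report them) can be sketched outside QEMU. The following is a toy Python model, not QEMU code; the class name, the `guest_write` helper, and the 64 KiB cluster size are illustrative assumptions.]

```python
import io
import mmap

CLUSTER_SIZE = 64 * 1024  # assumed cluster granularity for illustration

class PersistentDirtyBitmap:
    """Toy persistent dirty bitmap backed by an mmap()ed file."""

    def __init__(self, path, num_clusters):
        size = (num_clusters + 7) // 8
        with open(path, "wb") as f:
            f.write(b"\x00" * size)          # start with a clean bitmap
        self._f = open(path, "r+b")
        self._map = mmap.mmap(self._f.fileno(), size)

    def mark_dirty(self, cluster):
        self._map[cluster // 8] |= 1 << (cluster % 8)
        # Conservative variant: push the bit to disk before the data write.
        self._map.flush()

    def is_dirty(self, cluster):
        return bool(self._map[cluster // 8] & (1 << (cluster % 8)))

def guest_write(bitmap, image, offset, data):
    # Mark dirty *first*, then write - never the other way around.
    # After a crash the bitmap may claim too much is dirty (harmless),
    # but never too little (data loss for the backup).
    first = offset // CLUSTER_SIZE
    last = (offset + len(data) - 1) // CLUSTER_SIZE
    for c in range(first, last + 1):
        bitmap.mark_dirty(c)
    image.seek(offset)
    image.write(data)
```

The point of the ordering is visible in the failure modes: if the process dies between `mark_dirty` and `image.write`, the backup copies one unchanged cluster for nothing; with the reverse order, a modified cluster could be silently skipped.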
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-21 10:36 ` Paolo Bonzini
@ 2013-05-21 10:58   ` Dietmar Maurer
  2013-05-22 13:43     ` Stefan Hajnoczi
  0 siblings, 1 reply; 24+ messages in thread
From: Dietmar Maurer @ 2013-05-21 10:58 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Hajnoczi
Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain, Wenchao Xia

> >> True, but that would happen only in case the host crashes. Even for
> >> a QEMU crash the changes would be safe, I think. They would be
> >> written back when the persistent dirty bitmap's mmap() area is
> >> unmapped, during process exit.
> >
> > I'd err on the side of caution, mark the persistent dirty bitmap while
> > QEMU is running. Discard the file if there was a power failure.
>
> Agreed. Though this is something that management must do manually, isn't it?
> QEMU cannot distinguish a SIGKILL from a power failure, while management
> can afford treating SIGKILL as a power failure.
>
> > It really depends what the dirty bitmap users are doing. It could be
> > okay to have a tiny chance of missing a modification but it might not.

I just want to mention that there is another way to do incremental
backups. Instead of using a dirty bitmap, you can compare the content,
usually using a digest (SHA1) on clusters.

That way you can also implement async replication to a remote site
(like MS do).
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-21 10:58 ` Dietmar Maurer
@ 2013-05-22 13:43   ` Stefan Hajnoczi
  2013-05-22 15:10     ` Dietmar Maurer
  2013-05-22 15:34     ` Dietmar Maurer
  0 siblings, 2 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2013-05-22 13:43 UTC (permalink / raw)
To: Dietmar Maurer
Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Paolo Bonzini, Wenchao Xia

On Tue, May 21, 2013 at 10:58:47AM +0000, Dietmar Maurer wrote:
> > >> True, but that would happen only in case the host crashes. Even for
> > >> a QEMU crash the changes would be safe, I think. They would be
> > >> written back when the persistent dirty bitmap's mmap() area is
> > >> unmapped, during process exit.
> > >
> > > I'd err on the side of caution, mark the persistent dirty bitmap while
> > > QEMU is running. Discard the file if there was a power failure.
> >
> > Agreed. Though this is something that management must do manually, isn't it?
> > QEMU cannot distinguish a SIGKILL from a power failure, while management
> > can afford treating SIGKILL as a power failure.
> >
> > > It really depends what the dirty bitmap users are doing. It could be
> > > okay to have a tiny chance of missing a modification but it might not.
>
> I just want to mention that there is another way to do incremental
> backups. Instead of using a dirty bitmap, you can compare the content,
> usually using a digest (SHA1) on clusters.

Reading gigabytes of data from disk is expensive though. I guess they
keep a Merkle tree so it's easy to find out which parts of the image
must be transferred without re-reading the entire image.

That sounds like more work than a persistent dirty bitmap. The
advantage is that while dirty bitmaps are consumed by a single user,
the Merkle tree can be used to sync up any number of replicas.

> That way you can also implement async replication to a remote site
> (like MS do).

Sounds like rsync.

Stefan
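[Editorial note: to make the Merkle-tree idea concrete, here is a minimal sketch: hash each cluster, build a tree of hashes over them, then walk two trees top-down so that identical subtrees are skipped without ever reading their leaves. All names (`build_merkle`, `changed_leaves`), SHA-1, and the 4 KiB cluster size are assumptions for illustration, not anything from the patch series or from Microsoft's implementation.]

```python
import hashlib

CLUSTER = 4096  # assumed cluster size

def cluster_hashes(image):
    """Leaf level: one digest per cluster of the image."""
    return [hashlib.sha1(image[i:i + CLUSTER]).digest()
            for i in range(0, len(image), CLUSTER)]

def build_merkle(leaves):
    """Return the tree as a list of levels, leaves first, root last."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = []
        for i in range(0, len(prev), 2):
            right = prev[i + 1] if i + 1 < len(prev) else b""
            nxt.append(hashlib.sha1(prev[i] + right).digest())
        levels.append(nxt)
    return levels

def changed_leaves(ta, tb):
    """Indices of clusters that differ between two same-shaped trees.
    Equal node hashes mean the whole subtree is identical, so only
    the changed paths are descended - that is the bandwidth win."""
    out = []
    def walk(level, idx):
        if ta[level][idx] == tb[level][idx]:
            return
        if level == 0:
            out.append(idx)
            return
        for child in (2 * idx, 2 * idx + 1):
            if child < len(ta[level - 1]):
                walk(level - 1, child)
    walk(len(ta) - 1, 0)
    return out
```

A replica only needs to exchange O(log n) hashes per changed cluster to discover what to transfer, which is why the same tree can serve any number of replicas, unlike a dirty bitmap that is consumed by its single reader.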
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-22 13:43 ` Stefan Hajnoczi
@ 2013-05-22 15:10   ` Dietmar Maurer
  0 siblings, 0 replies; 24+ messages in thread
From: Dietmar Maurer @ 2013-05-22 15:10 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Paolo Bonzini, Wenchao Xia

> > That way you can also implement async replication to a remote site
> > (like MS do).
>
> Sounds like rsync.

Yes, but we need 'snapshots' and something more optimized (rsync
compares the whole files). I think this can be implemented using the
backup job with a specialized backup driver.
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-22 13:43 ` Stefan Hajnoczi
  2013-05-22 15:10 ` Dietmar Maurer
@ 2013-05-22 15:34 ` Dietmar Maurer
  2013-05-23  8:04   ` Stefan Hajnoczi
  1 sibling, 1 reply; 24+ messages in thread
From: Dietmar Maurer @ 2013-05-22 15:34 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Paolo Bonzini, Wenchao Xia

> That sounds like more work than a persistent dirty bitmap. The
> advantage is that while dirty bitmaps are consumed by a single user,
> the Merkle tree can be used to sync up any number of replicas.

I also consider it safer, because you make sure the data exists (using
hash keys like SHA1).

I am unsure how you can check whether a dirty bitmap contains errors,
or is out of date?

Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is
always related to a single image (consider the case where the user
removes the latest backup from the backup target).
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-22 15:34 ` Dietmar Maurer
@ 2013-05-23  8:04   ` Stefan Hajnoczi
  2013-05-23  8:11     ` Dietmar Maurer
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2013-05-23  8:04 UTC (permalink / raw)
To: Dietmar Maurer
Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi,
	Paolo Bonzini, Wenchao Xia

On Wed, May 22, 2013 at 03:34:18PM +0000, Dietmar Maurer wrote:
> > That sounds like more work than a persistent dirty bitmap. The
> > advantage is that while dirty bitmaps are consumed by a single user,
> > the Merkle tree can be used to sync up any number of replicas.
>
> I also consider it safer, because you make sure the data exists (using
> hash keys like SHA1).
>
> I am unsure how you can check if a dirty bitmap contains errors, or is
> out of date?
>
> Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is
> always related to a single image.
> (consider the user remove the latest backup from the backup target).

One disadvantage of Merkle trees is that the client becomes stateful -
the client needs to store its own Merkle tree and this requires fancier
client-side code.

It is also more expensive to update hashes than a dirty bitmap. Not
because you need to hash data but because a small write (e.g. 1 sector)
requires that you read the surrounding sectors to recompute a hash for
the cluster. Therefore you can expect worse guest I/O performance than
with a dirty bitmap.

I still think it's a cool idea. Making it work well will require a lot
more effort than a dirty bitmap.

Stefan
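[Editorial note: Stefan's cost argument is easy to demonstrate. In the sketch below (illustrative Python, with assumed 512-byte sectors and 64 KiB clusters; not QEMU code), a one-sector write on the hash-based path forces a full cluster read-back to recompute the digest, while the bitmap path only flips a bit.]

```python
import hashlib

SECTOR = 512
CLUSTER = 64 * 1024  # one digest / one dirty bit per cluster (assumed)

def write_with_hashes(image, hashes, offset, data):
    """Hash-tracking write path: each touched cluster must be read
    back in full so its digest can be recomputed - extra guest I/O."""
    image[offset:offset + len(data)] = data
    first = offset // CLUSTER
    last = (offset + len(data) - 1) // CLUSTER
    for c in range(first, last + 1):
        cluster = bytes(image[c * CLUSTER:(c + 1) * CLUSTER])  # read-back
        hashes[c] = hashlib.sha1(cluster).digest()

def write_with_bitmap(image, bitmap, offset, data):
    """Dirty-bitmap write path: no read-back, just set the bits."""
    image[offset:offset + len(data)] = data
    first = offset // CLUSTER
    last = (offset + len(data) - 1) // CLUSTER
    for c in range(first, last + 1):
        bitmap[c] = True
```

For a 512-byte write, the hash path reads 64 KiB and hashes it (a 128x read amplification at these sizes), which is where the worse guest I/O performance comes from.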
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-23  8:04 ` Stefan Hajnoczi
@ 2013-05-23  8:11   ` Dietmar Maurer
  2013-05-24  8:38     ` Stefan Hajnoczi
  0 siblings, 1 reply; 24+ messages in thread
From: Dietmar Maurer @ 2013-05-23  8:11 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi,
	Paolo Bonzini, Wenchao Xia

> > I also consider it safer, because you make sure the data exists
> > (using hash keys like SHA1).
> >
> > I am unsure how you can check if a dirty bitmap contains errors, or
> > is out of date?
> >
> > Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap
> > is always related to a single image.
> > (consider the user remove the latest backup from the backup target).
>
> One disadvantage of Merkle trees is that the client becomes stateful -
> the client needs to store its own Merkle tree and this requires fancier
> client-side code.

What 'client' do you talk about here?

But sure, the code gets more complex, and needs a considerable amount
of RAM to store the hash keys.

> It is also more expensive to update hashes than a dirty bitmap. Not
> because you need to hash data but because a small write (e.g. 1 sector)
> requires that you read the surrounding sectors to recompute a hash for
> the cluster. Therefore you can expect worse guest I/O performance than
> with a dirty bitmap.

There is no need to update any hash - you only need to do that on
backup. In fact, all those things can be done by the backup driver.

> I still think it's a cool idea. Making it work well will require a lot
> more effort than a dirty bitmap.

How do you re-generate a dirty bitmap after a server crash?
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-23  8:11 ` Dietmar Maurer
@ 2013-05-24  8:38   ` Stefan Hajnoczi
  2013-05-24  9:53     ` Dietmar Maurer
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2013-05-24  8:38 UTC (permalink / raw)
To: Dietmar Maurer
Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi,
	Paolo Bonzini, Wenchao Xia

On Thu, May 23, 2013 at 08:11:42AM +0000, Dietmar Maurer wrote:
> > One disadvantage of Merkle trees is that the client becomes stateful -
> > the client needs to store its own Merkle tree and this requires
> > fancier client-side code.
>
> What 'client' do you talk about here?

A backup application, for example. Previously it could simply use
api.getDirtyBlocks() -> [Sector] and it would translate into a single
QMP API call. Now a Merkle tree needs to be stored on the client side
and synced with the server. The client-side library becomes more
complex.

> But sure, the code gets more complex, and needs considerable amount of
> RAM to store the hash keys.
>
> > It is also more expensive to update hashes than a dirty bitmap. Not
> > because you need to hash data but because a small write (e.g. 1
> > sector) requires that you read the surrounding sectors to recompute a
> > hash for the cluster. Therefore you can expect worse guest I/O
> > performance than with a dirty bitmap.
>
> There is no need to update any hash - You only need to do that on
> backup - in fact, all those things can be done by the backup driver.

The problem is that if you leave hash calculation until backup time then
you need to read in the entire disk image (100s of GB) from disk. That
is slow and drains I/O resources.

Maybe the best approach is to maintain a dirty bitmap while the guest is
running, which is fairly cheap. Then you can use the dirty bitmap to
only hash modified clusters when building the Merkle tree - this avoids
reading the entire disk image.

> > I still think it's a cool idea. Making it work well will require a
> > lot more effort than a dirty bitmap.
>
> How do you re-generate a dirty bitmap after a server crash?

The dirty bitmap is invalid after crash. A full backup is required, all
clusters are considered dirty.

The simplest way to implement this is to mark the persistent bitmap
"invalid" upon the first guest write. When QEMU is terminated cleanly,
flush all dirty bitmap updates to disk and then mark the file "valid"
again. If QEMU finds the file is "invalid" on startup, start from
scratch.

Stefan
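[Editorial note: the valid/invalid protocol Stefan outlines can be modeled in a few lines. This is a hedged sketch (JSON instead of a binary on-disk format, all names invented; real persistent bitmaps would use a compact binary layout): the on-disk bitmap is trusted only if the previous run ended with a clean shutdown, otherwise every cluster is treated as dirty and a full backup is forced.]

```python
import json
import os

class BitmapFile:
    """Toy persistent dirty bitmap with a validity flag."""

    def __init__(self, path, num_clusters):
        self.path = path
        state = self._load()
        if state is None or not state["valid"]:
            # Missing or "invalid" file (crash): bitmap is untrustworthy,
            # so conservatively consider every cluster dirty.
            self.bits = [True] * num_clusters
        else:
            self.bits = state["bits"]
        self._written = False

    def _load(self):
        try:
            with open(self.path) as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def _save(self, valid):
        with open(self.path, "w") as f:
            json.dump({"valid": valid, "bits": self.bits}, f)
            f.flush()
            os.fsync(f.fileno())

    def guest_write(self, cluster):
        if not self._written:
            self._save(valid=False)  # first write: mark the file invalid
            self._written = True
        self.bits[cluster] = True

    def clean_shutdown(self):
        self._save(valid=True)       # flush bits, mark valid again
```

The invariant is the same as in the thread: the file only ever claims to be "valid" when every in-memory dirty bit has reached disk, so a crash at any other moment degrades to a full backup rather than a silently incomplete one.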
* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-24  8:38 ` Stefan Hajnoczi
@ 2013-05-24  9:53   ` Dietmar Maurer
  0 siblings, 0 replies; 24+ messages in thread
From: Dietmar Maurer @ 2013-05-24  9:53 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi,
	Paolo Bonzini, Wenchao Xia

> Maybe the best approach is to maintain a dirty bitmap while the guest
> is running, which is fairly cheap. Then you can use the dirty bitmap
> to only hash modified clusters when building the Merkle tree - this
> avoids reading the entire disk image.

Yes, this is a good optimization.

> > > I still think it's a cool idea. Making it work well will require a
> > > lot more effort than a dirty bitmap.
> >
> > How do you re-generate a dirty bitmap after a server crash?
>
> The dirty bitmap is invalid after crash. A full backup is required,
> all clusters are considered dirty.
>
> The simplest way to implement this is to mark the persistent bitmap
> "invalid" upon the first guest write. When QEMU is terminated cleanly,
> flush all dirty bitmap updates to disk and then mark the file "valid"
> again. If QEMU finds the file is "invalid" on startup, start from
> scratch.

Or you could compare the hash keys in that case? Although I guess
computing all those SHA1 checksums needs a considerable amount of CPU
time.
end of thread, other threads:[~2013-09-03  7:54 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-02 12:57 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Benoît Canet
2013-09-03  7:54 ` Stefan Hajnoczi
  -- strict thread matches above, loose matches on Subject: below --
2013-05-15 14:34 Stefan Hajnoczi
2013-05-16  6:16 ` Wenchao Xia
2013-05-16  7:47   ` Stefan Hajnoczi
2013-05-17  6:58     ` Wenchao Xia
2013-05-17  9:14       ` Stefan Hajnoczi
2013-05-21  3:25         ` Wenchao Xia
2013-05-21  7:34           ` Stefan Hajnoczi
2013-05-17 10:17 ` Paolo Bonzini
2013-05-20  6:24   ` Stefan Hajnoczi
2013-05-20  7:23     ` Paolo Bonzini
2013-05-21  7:31       ` Stefan Hajnoczi
2013-05-21  8:30         ` Paolo Bonzini
2013-05-21 10:34           ` Stefan Hajnoczi
2013-05-21 10:36             ` Paolo Bonzini
2013-05-21 10:58               ` Dietmar Maurer
2013-05-22 13:43                 ` Stefan Hajnoczi
2013-05-22 15:10                   ` Dietmar Maurer
2013-05-22 15:34                   ` Dietmar Maurer
2013-05-23  8:04                     ` Stefan Hajnoczi
2013-05-23  8:11                       ` Dietmar Maurer
2013-05-24  8:38                         ` Stefan Hajnoczi
2013-05-24  9:53                           ` Dietmar Maurer