Message-ID: <56BA0BA0.2060302@virtuozzo.com>
Date: Tue, 9 Feb 2016 18:54:08 +0300
From: Vladimir Sementsov-Ogievskiy
References: <1453482459-80179-1-git-send-email-vsementsov@virtuozzo.com> <56B4FC80.1070402@redhat.com> <56B5BA9B.8050109@virtuozzo.com> <56B90550.7080008@redhat.com>
In-Reply-To: <56B90550.7080008@redhat.com>
Subject: Re: [Qemu-devel] [PATCH RFC] external backup api
To: John Snow, qemu-devel@nongnu.org
Cc: famz@redhat.com, den@virtuozzo.com, Stefan Hajnoczi

On 09.02.2016 00:14, John Snow wrote:
>
> On 02/06/2016 04:19 AM, Vladimir Sementsov-Ogievskiy wrote:
>> On 05.02.2016 22:48, John Snow wrote:
>>> On 01/22/2016 12:07 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi all.
>>>>
>>>> This is the early beginning of a series which aims to add an external
>>>> backup API. This is needed to allow backup software to use our dirty
>>>> bitmaps.
>>>>
>>>> VMware and Parallels Cloud Server have this feature.
>>>>
>>> Have a link to the equivalent feature that VMware exposes? (Or Parallels
>>> Cloud Server) ... I'm curious about what the API there looks like.
>> For VMware you need their Virtual Disk API Programming Guide:
>> http://pubs.vmware.com/vsphere-60/topic/com.vmware.ICbase/PDF/vddk60_programming.pdf
>>
> Great, thanks!
>
>> Look at Changed Block Tracking (CBT), Backup and Restore.
>>
>> For PCS, here is the part of the SDK header related to the topic:
>>
>> ====================================
>> /*
>>  * Builds a map of the disk contents changes between 2 PITs.
>>  Parameters
>>    hDisk     : A handle of type PHT_VIRTUAL_DISK identifying
>>                the virtual disk.
>>    sPit1Uuid : Uuid of the older PIT.
>>    sPit2Uuid : Uuid of the later PIT.
>>    phMap     : A pointer to a variable which receives the
>>                result (a handle of type PHT_VIRTUAL_DISK_MAP).
>>  Returns
>>    PRL_RESULT.
>> */
>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>                  PrlDisk_GetChangesMap_Local, (
>>                  PRL_HANDLE hDisk,
>>                  PRL_CONST_STR sPit1Uuid,
>>                  PRL_CONST_STR sPit2Uuid,
>>                  PRL_HANDLE_PTR phMap) );
>>
> Effectively giving you a dirty bitmap diff between two snapshots.
> Something we don't currently genuinely support in QEMU.

Just start a dirty bitmap at point A and stop it at point B.

>
>> /*
>>  * Reports the number of significant bits in the map.
>>  Parameters
>>    hMap   : A handle of type PHT_VIRTUAL_DISK_MAP identifying
>>             the changes map.
>>    pnSize : A pointer to a variable which receives the
>>             result.
>>  Returns
>>    PRL_RESULT.
>> */
>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>                  PrlDiskMap_GetSize, (
>>                  PRL_HANDLE hMap,
>>                  PRL_UINT32_PTR pnSize) );
>>
> I assume this is roughly the dirty bit count; for us, this would be
> dirty clusters. (Or whatever granularity you specified, but usually
> clusters.)
>
>> /*
>>  * Reports the size (in bytes) of a block mapped by a single bit
>>  * in the map.
>>  Parameters
>>    hMap   : A handle of type PHT_VIRTUAL_DISK_MAP identifying
>>             the changes map.
>>    pnSize : A pointer to a variable which receives the
>>             result.
>>  Returns
>>    PRL_RESULT.
>> */
>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>                  PrlDiskMap_GetGranularity, (
>>                  PRL_HANDLE hMap,
>>                  PRL_UINT32_PTR pnSize) );
>>
> Basically a granularity query.
>
>> /*
>>  * Returns bits from the blocks map.
>>  Parameters
>>    hMap       : A handle of type PHT_VIRTUAL_DISK_MAP identifying
>>                 the changes map.
>>    pBuffer    : A pointer to a store.
>>    pnCapacity : A pointer to a variable holding the size
>>                 of the buffer and receiving the number of
>>                 bytes actually written.
>>  Returns
>>    PRL_RESULT.
>> */
>> PRL_METHOD_DECL( PARALLELS_API_VER_5,
>>                  PrlDiskMap_Read, (
>>                  PRL_HANDLE hMap,
>>                  PRL_VOID_PTR pBuffer,
>>                  PRL_UINT32_PTR pnCapacity) );
>>
> And this would be a direct bitmap query.
>
> Is the expected usage here that the third party client will use this
> bitmap to read the source image? Or do you query for the data from API?

From the API.

>
> I think the thought among block devs would be to opt for more of the
> second option, and less allowing clients to directly interface with the
> image files.
>
>> =======================================
>>
>>
>>>> There is only one patch here, about querying a dirty bitmap from qemu
>>>> by qmp command. It is just an updated and clipped (hmp command removed)
>>>> version of my old patch
>>>> "[PATCH RFC v3 01/14] qmp: add query-block-dirty-bitmap".
>>>>
>>>> Before writing the whole thing I'd like to discuss the details. Or
>>>> maybe there are existing plans on this topic, or maybe someone is
>>>> already working on it?
>>>>
>>>> I see it like this:
>>>>
>>>> =====
>>>>
>>>> - add qmp commands for dirty-bitmap functions: create_successor,
>>>>   abdicate, reclaim.
>>> Hm, why do we need such low-level control over splitting and merging
>>> bitmaps from an external client?
>>>
>>>> - make create-successor command transaction-able
>>>> - add query-block-dirty-bitmap qmp command
>>>>
>>>> then, external backup:
>>>>
>>>> qmp transaction {
>>>>     external-snapshot
>>>>     bitmap-create-successor
>>>> }
>>>>
>>>> qmp query frozen bitmap, not acquiring aio context.
>>>>
>>>> do external backup, using snapshot and bitmap
>>>>
>>>> if (success backup)
>>>>     qmp bitmap-abdicate
>>>> else
>>>>     qmp bitmap-reclaim
>>>>
>>>> qmp merge snapshot
>>>> =====
>>>>
>>> Hm, I see -- so you're hoping to manage the backup *entirely*
>>> externally, so you want to be able to reach inside of QEMU and control
>>> some status conditions to guarantee it'll be safe.
>>>
>>> I'm not convinced QEMU can guarantee such things -- due to various flush
>>> properties, race conditions on write, etc. QEMU handles all of this
>>> internally in a non-public way at the moment.
>> Hm, can you be more concrete? What operations are dangerous? We can do
>> them in paused state for example.
>>
> I suppose if you're going to pause the VM, then it should be reasonably
> safe, but recently there have been endeavors to augment the .qcow2
> format to prohibit concurrent access, which might include a paused VM as
> well, I'm not clear on the implementation.
>
> If you do it via paused only, then you also don't need to expose the
> freeze/rollback mechanisms: the existing clear mechanism alone is
> sufficient:
>
> (A) The frozen backup fails. Nothing new has been written, so we don't
> need to adjust anything, we can just try again.
> (B) The frozen backup succeeds. We can just clear the bitmap before
> unfreezing.

We can't query the bitmap in the paused state: it may take too much time.

>
> I definitely have reservations about using this as a live fleecing
> mechanism -- the backup block job uses a write-notifier to make
> just-in-time backups of data before it is altered, leaving it the only
> "safe" live backup mechanism in QEMU currently. (Alongside mirror.)
>
> I actually have some patches from Fam to introduce a live fleecing
> mechanism into QEMU (The idea being you create a point-in-time drive you
> can get data from via NBD, then delete it when done) that might be more
> appropriate, but I ran into a lot of problems with the patch. I'll post
> the WIP for that patch to try to solicit comments on the best way forward.

After adding

=============
--- a/block.c
+++ b/block.c
@@ -1276,6 +1276,9 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
     /* Otherwise we won't be able to commit due to check in bdrv_commit */
     bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
                     bs->backing_blocker);
+
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                    bs->backing_blocker);
 out:
     bdrv_refresh_limits(bs, NULL);
 }
==============

and a tiny fix for the qemu_io interface in the iotest, Fam's
"qemu-iotests: Image fleecing test case 089" works for me. Isn't it enough?

>
> Otherwise, my biggest question here is:
> "What does fleecing a backup externally provide as a benefit over
> backing up to an NBD target?"

Look at our answers on v2 of this series:

On 05.02.2016 11:28, Denis V. Lunev wrote:
> On 02/03/2016 11:14 AM, Fam Zheng wrote:
>> On Sat, 01/30 13:56, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all.
>>>
>>> This series aims to add an external backup API. This is needed
>>> to allow backup software to use our dirty bitmaps.
>>>
>>> VMware and Parallels Cloud Server have this feature.
>> What is the advantage of this approach over "drive-backup
>> sync=incremental ..."?
>
> This will allow third-party vendors to back up QEMU VMs into
> their own formats or to the cloud etc.

>
> You can already today perform incremental backups to an NBD target to
> copy the data out via an external mechanism, is this not sufficient for
> Parallels? If not, why?
>
>>>> In the following patch query-bitmap acquires aio context. This must,
>>>> of course, be dropped for a frozen bitmap.
>>>> But to do it properly, I think I should somehow check that this is
>>>> not just any frozen bitmap, but a bitmap frozen by the qmp command, to
>>>> avoid incorrectly querying a bitmap frozen by an internal backup (or
>>>> another mechanism). Maybe it is not necessary.
>>>>
>>>>
>>>>
>>>

--
Best regards,
Vladimir
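
For readers following the thread: the SDK calls quoted above hand the client a raw bitmap plus a granularity, and the client then has to turn set bits into byte ranges to copy. A minimal sketch of that step in C is below. The names DirtyExtent and dirty_map_to_extents are hypothetical and belong to neither QEMU nor the PCS SDK, and the bit order (least significant bit first within each byte) is an assumption, since the header excerpt does not specify it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous changed region of the disk, in bytes. */
typedef struct {
    uint64_t offset;
    uint64_t length;
} DirtyExtent;

/*
 * Hypothetical helper (not part of QEMU or the PCS SDK): coalesce the set
 * bits of a changes map into byte extents.
 *
 * Assumptions: bit i describes the block starting at i * granularity, and
 * bits are packed least-significant-bit first within each byte.
 *
 * Stores at most max_out extents in out and returns the total number of
 * extents found, so a return value above max_out signals truncation.
 */
size_t dirty_map_to_extents(const uint8_t *map, size_t nbits,
                            uint32_t granularity,
                            DirtyExtent *out, size_t max_out)
{
    size_t count = 0;
    size_t i = 0;

    assert(map != NULL && granularity > 0);

    while (i < nbits) {
        /* Skip clean blocks. */
        if (!((map[i / 8] >> (i % 8)) & 1)) {
            i++;
            continue;
        }
        /* Extend the run over consecutive dirty blocks. */
        size_t start = i;
        while (i < nbits && ((map[i / 8] >> (i % 8)) & 1)) {
            i++;
        }
        if (count < max_out) {
            out[count].offset = (uint64_t)start * granularity;
            out[count].length = (uint64_t)(i - start) * granularity;
        }
        count++;
    }
    return count;
}
```

A client would call this with the buffer filled by the map read call, the bit count from the size query and the block size from the granularity query, then read each extent from the snapshot. Merging adjacent dirty blocks into one extent keeps the number of read requests small when changes are clustered.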