On Fri, Apr 17, 2020 at 5:34 PM John Snow wrote: > > > On 4/17/20 6:57 PM, Leo Luan wrote: > > On Fri, Apr 17, 2020 at 1:24 PM Eric Blake > > wrote: > > > > On 4/17/20 3:11 PM, John Snow wrote: > > > > >> + > > >> + if (s->sync_mode == MIRROR_SYNC_MODE_FULL && > > >> + s->bcs->target->bs->drv != NULL && > > >> + strncmp(s->bcs->target->bs->drv->format_name, "qcow2", 5) > > == 0 && > > >> + s->bcs->source->bs->backing_file[0] == '\0') > > > > > > This isn't going to suffice upstream; the backup job can't be > > performing > > > format introspection to determine behavior on the fly. > > > > Agreed. The idea is right (we NEED to make backup operations smarter > > based on knowledge about both source and destination block status), > but > > the implementation is not (a check for strcncmp("qcow2") is not > ideal). > > > > > > I see/agree that using strncmp("qcow2") is not general enough for the > > upstream. Would changing it to bdrv_unallocated_blocks_are_zero() > suffice? > > > > I don't know, to be really honest with you. Vladimir reworked the backup > code recently and Virtuozzo et al have shown a very aggressive interest > in optimizing the backup loop. I haven't really worked on that code > since their rewrite. > > Dropping unallocated regions from the backup manifest is one strategy, > but I think there will be cases where we won't be able to treat it like > "TOP", but may still have unallocated regions we don't want to copy (We > have a backing file which is itself unallocated.) > > I'm interested in a more general purpose mechanism for efficient > copying. I think that instead of the backup job itself doing this in > backup.c by populating the copy manifest, that it's also appropriate to > try to copy every last block and have the backup loop implementation > decide it doesn't actually need to copy that block. > > That way, the copy optimizations can be shared by any implementation > that needs to do efficient copying, and we can avoid special format and > graph-inspection code in the backup job main interface code. > > To be clear, I see these as identical amounts of work: > > - backup job runs a loop to inspect every cluster to see if it is > allocated or not, and modifies its cluster backup manifest accordingly > This inspection can detect more than 1GB of unallocated (64KB) clusters per loop and it's a shallower path. > > - backup job loops through the entire block and calls a smart_copy() > function that might degrade into a no-op if the right conditions are met > (source is unallocated, explicit zeroes are not needed on the destination) > If I am not mistaken, the copy loop does one cluster per iteration using a twice deeper call path (trying to copy and eventually finding unallocated clusters). So with 64KB cluster size, it's 2 * 1G/64K ~= 32 million times less efficient with the CPU cycles for large sparse virtual disks. > > Either way, you're looping and interrogating the disk, but in one case > the efficiencies go deeper than *just* the backup code. > I think the early stop of inefficiency can help minimize the CPU impact of the backup job on the VM instance. > I think Vladimir has put a lot of work into making the backup code > highly optimized, so I would consult with him to find out where the best > place to put new optimizations are, if any -- he'll know! > Yes, hope that he will chime in. Thanks! > > --js > > > > > > > > > > I think what you're really after is something like > > > bdrv_unallocated_blocks_are_zero(). > > > > The fact that qemu-img already has a lot of optimizations makes me > > wonder what we can salvage from there into reusable code that both > > qemu-img and block backup can share, so that we're not reimplementing > > block status handling in multiple places. > > > > > > A general fix reusing some existing code would be great. When will it > > appear in the upstream? We are hoping to avoid needing to use a private > > branch if possible. > > > > > > > So the basic premise is that if you are copying a qcow2 file and > the > > > unallocated portions as defined by the qcow2 metadata are zero, > it's > > > safe to skip those, so you can treat it like SYNC_MODE_TOP. > > > > > > I think you *also* have to know if the *source* needs those regions > > > explicitly zeroed, and it's not always safe to just skip them at > the > > > manifest level. > > > > > > I thought there was code that handled this to some extent already, > > but I > > > don't know. I think Vladimir has worked on it recently and can > > probably > > > let you know where I am mistaken :) > > > > Yes, I'm hoping Vladimir (or his other buddies at Virtuozzo) can > chime > > in. Meanwhile, I've working on v2 of some patches that will improve > > qemu's ability to tell if a destination qcow2 file already reads as > all > > zeroes, and we already have bdrv_block_status() for telling which > > portions of a source image already read as all zeroes (whether or > > not it > > is due to not being allocated, the goal here is that we should NOT > have > > to copy anything that reads as zero on the source over to the > > destination if the destination already starts life as reading all > zero). > > > > > > Can the eventual/optimal solution allow unallocated clusters to be > > skipped entirely in the backup loop and make the detection of allocated > > zeroes an option, not forcing the backup thread to loop through a > > potentially huge empty virtual disk? > > > > I mean, using the TOP code is doing the same thing, really: it's looking > at allocation status and marking those blocks as "already copied", more > or less. > > > > > And if nothing else, qemu 5.0 just added 'qemu-img convert > > --target-is-zero' as a last-ditch means of telling qemu to assume the > > destination reads as all zeroes, even if it cannot quickly prove it; > we > > probably want to add a similar knob into the QMP commands for > > initiating > > block backup, for the same reasons. > > > > > > This seems a good way of assuring the status of the target file. > > > > Thanks! > > > >