Re: Avoid copying unallocated clusters during full backup

From: Leo Luan <leoluan@gmail.com>
To: John Snow <jsnow@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel@nongnu.org, Qemu-block <qemu-block@nongnu.org>,
	Max Reitz <mreitz@redhat.com>
Subject: Re: Avoid copying unallocated clusters during full backup
Date: Fri, 17 Apr 2020 18:43:04 -0700	[thread overview]
Message-ID: <CAD_3s9+pSepV76-wp_AQTB-EQGwmga1FS+fT-Cb4n0Z48Zhxpg@mail.gmail.com> (raw)
In-Reply-To: <93052f9b-6539-0d4a-c922-fff7618c542d@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 6347 bytes --]

On Fri, Apr 17, 2020 at 5:34 PM John Snow <jsnow@redhat.com> wrote:

>
>
> On 4/17/20 6:57 PM, Leo Luan wrote:
> > On Fri, Apr 17, 2020 at 1:24 PM Eric Blake <eblake@redhat.com
> > <mailto:eblake@redhat.com>> wrote:
> >
> >     On 4/17/20 3:11 PM, John Snow wrote:
> >
> >     >> +
> >     >> +    if (s->sync_mode == MIRROR_SYNC_MODE_FULL &&
> >     >> +       s->bcs->target->bs->drv != NULL &&
> >     >> +       strncmp(s->bcs->target->bs->drv->format_name, "qcow2", 5)
> >     == 0 &&
> >     >> +       s->bcs->source->bs->backing_file[0] == '\0')
> >     >
> >     > This isn't going to suffice upstream; the backup job can't be
> >     performing
> >     > format introspection to determine behavior on the fly.
> >
> >     Agreed.  The idea is right (we NEED to make backup operations smarter
> >     based on knowledge about both source and destination block status),
> but
> >     the implementation is not (a check for strcncmp("qcow2") is not
> ideal).
> >
> >
> > I see/agree that using strncmp("qcow2") is not general enough for the
> > upstream.  Would changing it to bdrv_unallocated_blocks_are_zero()
> suffice?
> >
>
> I don't know, to be really honest with you. Vladimir reworked the backup
> code recently and Virtuozzo et al have shown a very aggressive interest
> in optimizing the backup loop. I haven't really worked on that code
> since their rewrite.
>
> Dropping unallocated regions from the backup manifest is one strategy,
> but I think there will be cases where we won't be able to treat it like
> "TOP", but may still have unallocated regions we don't want to copy (We
> have a backing file which is itself unallocated.)
>
> I'm interested in a more general purpose mechanism for efficient
> copying. I think that instead of the backup job itself doing this in
> backup.c by populating the copy manifest, that it's also appropriate to
> try to copy every last block and have the backup loop implementation
> decide it doesn't actually need to copy that block.
>
> That way, the copy optimizations can be shared by any implementation
> that needs to do efficient copying, and we can avoid special format and
> graph-inspection code in the backup job main interface code.
>
> To be clear, I see these as identical amounts of work:
>
> - backup job runs a loop to inspect every cluster to see if it is
> allocated or not, and modifies its cluster backup manifest accordingly
>

This inspection can detect more than 1GB of unallocated (64KB) clusters per
loop and it's a shallower path.

>
> - backup job loops through the entire block and calls a smart_copy()
> function that might degrade into a no-op if the right conditions are met
> (source is unallocated, explicit zeroes are not needed on the destination)
>

If I am not mistaken, the copy loop does one cluster per iteration using a
twice deeper call path (trying to copy and eventually finding unallocated
clusters).  So with 64KB cluster size, it's 2 * 1G/64K ~= 32 million times
less efficient with the CPU cycles for large sparse virtual disks.

>
> Either way, you're looping and interrogating the disk, but in one case
> the efficiencies go deeper than *just* the backup code.
>

I think the early stop of inefficiency can help minimize the CPU impact of
the backup job on the VM instance.

> I think Vladimir has put a lot of work into making the backup code
> highly optimized, so I would consult with him to find out where the best
> place to put new optimizations are, if any -- he'll know!
>

Yes, hope that he will chime in.

Thanks!

>
> --js
>
>
> >
> >     >
> >     > I think what you're really after is something like
> >     > bdrv_unallocated_blocks_are_zero().
> >
> >     The fact that qemu-img already has a lot of optimizations makes me
> >     wonder what we can salvage from there into reusable code that both
> >     qemu-img and block backup can share, so that we're not reimplementing
> >     block status handling in multiple places.
> >
> >
> > A general fix reusing some existing code would be great.  When will it
> > appear in the upstream?  We are hoping to avoid needing to use a private
> > branch if possible.
> >
> >
> >     > So the basic premise is that if you are copying a qcow2 file and
> the
> >     > unallocated portions as defined by the qcow2 metadata are zero,
> it's
> >     > safe to skip those, so you can treat it like SYNC_MODE_TOP.
> >     >
> >     > I think you *also* have to know if the *source* needs those regions
> >     > explicitly zeroed, and it's not always safe to just skip them at
> the
> >     > manifest level.
> >     >
> >     > I thought there was code that handled this to some extent already,
> >     but I
> >     > don't know. I think Vladimir has worked on it recently and can
> >     probably
> >     > let you know where I am mistaken :)
> >
> >     Yes, I'm hoping Vladimir (or his other buddies at Virtuozzo) can
> chime
> >     in.  Meanwhile, I've working on v2 of some patches that will improve
> >     qemu's ability to tell if a destination qcow2 file already reads as
> all
> >     zeroes, and we already have bdrv_block_status() for telling which
> >     portions of a source image already read as all zeroes (whether or
> >     not it
> >     is due to not being allocated, the goal here is that we should NOT
> have
> >     to copy anything that reads as zero on the source over to the
> >     destination if the destination already starts life as reading all
> zero).
> >
> >
> > Can the eventual/optimal solution allow unallocated clusters to be
> > skipped entirely in the backup loop and make the detection of allocated
> > zeroes an option, not forcing the backup thread to loop through a
> > potentially huge empty virtual disk?
> >
>
> I mean, using the TOP code is doing the same thing, really: it's looking
> at allocation status and marking those blocks as "already copied", more
> or less.
>
> >
> >     And if nothing else, qemu 5.0 just added 'qemu-img convert
> >     --target-is-zero' as a last-ditch means of telling qemu to assume the
> >     destination reads as all zeroes, even if it cannot quickly prove it;
> we
> >     probably want to add a similar knob into the QMP commands for
> >     initiating
> >     block backup, for the same reasons.
> >
> >
> > This seems a good way of assuring the status of the target file.
> >
> > Thanks!
> >
>
>

[-- Attachment #2: Type: text/html, Size: 8568 bytes --]