All of lore.kernel.org
 help / color / mirror / Atom feed
From: Leo Luan <leoluan@gmail.com>
To: John Snow <jsnow@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel@nongnu.org, Qemu-block <qemu-block@nongnu.org>,
	Max Reitz <mreitz@redhat.com>
Subject: Re: Avoid copying unallocated clusters during full backup
Date: Fri, 17 Apr 2020 15:31:06 -0700	[thread overview]
Message-ID: <CAD_3s9LdScjB8ySrGGn9i-2PiDpby4YZo558PXZV4TaMip6pvQ@mail.gmail.com> (raw)
In-Reply-To: <ba8ff0c2-2e56-c8d7-a13a-4af48372f172@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4766 bytes --]

On Fri, Apr 17, 2020 at 1:11 PM John Snow <jsnow@redhat.com> wrote:

>
>
> On 4/17/20 2:33 PM, Leo Luan wrote:
> > When doing a full backup from a single layer qcow2 disk file to a new
> > qcow2 file, the backup_run function does not unset unallocated parts in
> > the copy bit map.  The subsequent backup_loop call goes through these
> > unallocated clusters unnecessarily.  In the case when the target and
> > source reside in different file systems, an EXDEV error would cause
> > zeroes to be actually copied into the target and that causes a target
> > file size explosion to the full virtual disk size.
> >
>
> I think the idea, generally, is to leave the detection of unallocated
> portions to the format (qcow2) and the protocol (posix file) respectively.
>
> As far as I know, it is incorrect to assume that unallocated data
> can/will/should always be read as zeroes; so it may not be the case that
> it is "safe" to skip this data, because the target may or may not need
> explicit zeroing.
>

Thanks for pointing this out.  Would it be safe to skip unallocated
clusters if both source and target's bdrv_unallocated_blocks_are_zero()
returns true?

> This patch aims to unset the unallocated parts in the copy bitmap when
> > it is safe to do so, thereby avoid dealing with unallocated clusters in
> > the backup loop to prevent significant performance or storage efficiency
> > impacts when running full backup jobs.
> >
> > Any insights or corrections?
> >
> > diff --git a/block/backup.c b/block/backup.c
> > index cf62b1a38c..609d551b1e 100644
> > --- a/block/backup.c
> > +++ b/block/backup.c
> > @@ -139,6 +139,29 @@ static void backup_clean(Job *job)
> >      bdrv_backup_top_drop(s->backup_top);
> >  }
> >
> > +static bool backup_ok_to_skip_unallocated(BackupBlockJob *s)
> > +{
> > +    /* Checks whether this backup job can avoid copying or dealing with
> > +       unallocated clusters in the backup loop and their associated
> > +       performance and storage effciency impacts. Check for the
> condition
> > +       when it's safe to skip copying unallocated clusters that allows
> the
> > +       corresponding bits in the copy bitmap to be unset.  The
> assumption
> > +       here is that it is ok to do so when we are doing a full backup,
> > +       the target file is a qcow2, and the source is single layer.
> > +       Do we need to add additional checks (so that it does not break
> > +       something) or add addtional conditions to optimize additional use
> > +       cases?
> > +     */
> > +
> > +    if (s->sync_mode == MIRROR_SYNC_MODE_FULL &&
> > +       s->bcs->target->bs->drv != NULL &&
> > +       strncmp(s->bcs->target->bs->drv->format_name, "qcow2", 5) == 0 &&
> > +       s->bcs->source->bs->backing_file[0] == '\0')
>
> This isn't going to suffice upstream; the backup job can't be performing
> format introspection to determine behavior on the fly.
>
> I think what you're really after is something like
> bdrv_unallocated_blocks_are_zero().
>

Thanks for this pointer.


>
> > +       return true;
> > +    else
> > +        return false;
> > +}
> > +
> >  void backup_do_checkpoint(BlockJob *job, Error **errp)
> >  {
> >      BackupBlockJob *backup_job = container_of(job, BackupBlockJob,
> common);
> > @@ -248,7 +271,7 @@ static int coroutine_fn backup_run(Job *job, Error
> > **errp)
> >
> >      backup_init_copy_bitmap(s);
> >
> > -    if (s->sync_mode == MIRROR_SYNC_MODE_TOP) {
> > +    if (s->sync_mode == MIRROR_SYNC_MODE_TOP ||
>
> So the basic premise is that if you are copying a qcow2 file and the
> unallocated portions as defined by the qcow2 metadata are zero, it's
> safe to skip those, so you can treat it like SYNC_MODE_TOP.
>

In the MIRROR_SYNC_MODE_TOP case, the check for unallocated clusters does
not go all the way to the base level.  So it would be incorrect to treat
the MIRROR_SYNC_MODE_FULL the same as MIRROR_SYNC_MODE_TOP unless the
source does not have a backing and the base itself.  That's why I added a
check for the backing_file field of the source.  I guess the code can be
potentially extended with a new flag to do the block status check all the
way into the base level for the case of the FULL mode?

I think you *also* have to know if the *source* needs those regions
> explicitly zeroed, and it's not always safe to just skip them at the
> manifest level.
>

Do you mean some operation changing the target into non-sparse?

>
> I thought there was code that handled this to some extent already, but I
> don't know. I think Vladimir has worked on it recently and can probably
> let you know where I am mistaken :)
>

Thanks for the reply!


> --js
>
> > backup_ok_to_skip_unallocated(s)) {
> >          int64_t offset = 0;
> >          int64_t count;
> >
>
> John Snow

[-- Attachment #2: Type: text/html, Size: 11924 bytes --]

      parent reply	other threads:[~2020-04-17 22:32 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-17 18:33 Avoid copying unallocated clusters during full backup Leo Luan
2020-04-17 20:11 ` John Snow
2020-04-17 20:24   ` Eric Blake
2020-04-17 22:57     ` Leo Luan
2020-04-18  0:34       ` John Snow
2020-04-18  1:43         ` Leo Luan
2020-04-20 10:56           ` Vladimir Sementsov-Ogievskiy
2020-04-20 14:31             ` Bryan S Rosenburg
2020-04-20 15:04               ` Vladimir Sementsov-Ogievskiy
2020-04-21 14:41                 ` Bryan S Rosenburg
2020-04-17 22:31   ` Leo Luan [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAD_3s9LdScjB8ySrGGn9i-2PiDpby4YZo558PXZV4TaMip6pvQ@mail.gmail.com \
    --to=leoluan@gmail.com \
    --cc=jsnow@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.