On 20.06.19 20:44, John Snow wrote:
> 
> 
> On 6/20/19 1:00 PM, Max Reitz wrote:
>> On 20.06.19 03:03, John Snow wrote:
>>> This adds an "always" policy for bitmap synchronization. Regardless of if
>>> the job succeeds or fails, the bitmap is *always* synchronized. This means
>>> that for backups that fail part-way through, the bitmap retains a record of
>>> which sectors need to be copied out to accomplish a new backup using the
>>> old, partial result.
>>>
>>> In effect, this allows us to "resume" a failed backup; however the new backup
>>> will be from the new point in time, so it isn't a "resume" as much as it is
>>> an "incremental retry." This can be useful in the case of extremely large
>>> backups that fail considerably through the operation and we'd like to not waste
>>> the work that was already performed.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>  qapi/block-core.json |  5 ++++-
>>>  block/backup.c       | 10 ++++++----
>>>  2 files changed, 10 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 0332dcaabc..58d267f1f5 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -1143,6 +1143,9 @@
>>>  # An enumeration of possible behaviors for the synchronization of a bitmap
>>>  # when used for data copy operations.
>>>  #
>>> +# @always: The bitmap is always synchronized with remaining blocks to copy,
>>> +#          whether or not the operation has completed successfully or not.
>>> +#
>>>  # @conditional: The bitmap is only synchronized when the operation is successul.
>>>  #               This is useful for Incremental semantics.
>>>  #
>>> @@ -1153,7 +1156,7 @@
>>>  # Since: 4.1
>>>  ##
>>>  { 'enum': 'BitmapSyncMode',
>>> -  'data': ['conditional', 'never'] }
>>> +  'data': ['always', 'conditional', 'never'] }
>>>  
>>>  ##
>>>  # @MirrorCopyMode:
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 627f724b68..beb2078696 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -266,15 +266,17 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>>>      BlockDriverState *bs = blk_bs(job->common.blk);
>>>  
>>>      if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
>>> -        /* Failure, or we don't want to synchronize the bitmap.
>>> -         * Merge the successor back into the parent, delete nothing. */
>>> +        /* Failure, or we don't want to synchronize the bitmap. */
>>> +        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
>>> +            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
>>
>> Hmm...  OK, bitmaps in backup always confuse me, so bear with me, please.
>>
> 
> I realize this is an extremely dense section that actually covers a
> *lot* of pathways.
> 
>> (Hi, I’m a time traveler from the end of this section and I can tell you
>> that everything is fine.  I was just confused.  I’ll still keep this
>> here, because it was so much work.)
>>
>> The copy_bitmap is copied from the sync_bitmap at the beginning, so the
>> sync_bitmap can continue to be dirtied, but that won’t affect the job.
>> In normal incremental mode, this means that the sync point is always at
>> the beginning of the job.  (Well, naturally, because that’s how backup
>> is supposed to go.)
>>
> 
> sync_bitmap: This is used as an initial manifest for which sectors to
> copy out. It is the user-provided bitmap. We actually *never* edit this
> bitmap in the body of the job.
> 
> copy_bitmap: This is the manifest for which blocks remain to be copied
> out. We clear bits in this as we go, because we use it as our loop
> condition.
> 
> So what you say is actually only half-true: the sync_bitmap actually
> remains static during the duration of the job, and it has an anonymous
> child that accrues new writes. This is a holdover from before we had a
> copy_bitmap, and we used to use a sync_bitmap directly as our loop
> condition.
> 
> (This could be simplified upstream at present; but after this patch it
> cannot be for reasons explained below. We do wish to maintain three
> distinct sets of bits:
> 1. The bits at the start of the operation,
> 2. The bits accrued during the operation, and
> 3. The bits that remain to be, or were not, copied during the operation.)
> 
> So there's actually three bitmaps:
> 
> - sync_bitmap: actually just static and read-only
> - sync_bitmap's anonymous child: accrues new writes.

Ah, right...  Thanks for writing that up.

> - copy_bitmap: loop conditional.
> 
>> But then replacing the sync_bitmap with the copy_bitmap here means that
>> all of these dirtyings that happened during the job are lost.  Hmm, but
>> that doesn’t matter, does it?  Because whenever something was dirtied in
>> sync_bitmap, the corresponding area must have been copied to the backup
>> due to the job.
>>
> 
> The new dirty bits were accrued very secretly in the anonymous child.
> The new dirty bits are merged in via the reclaim() function.
> 
> So, what happens is:
> 
> - Sync_bitmap gets the bit pattern of copy_bitmap (one way or another)
> - Sync_bitmap reclaims (merges with) its anonymous child.
> 
>> Ah, yes, it would actually be wrong to keep the new dirty bits, because
>> in this mode, sync_bitmap should (on failure) reflect what is left to
>> copy to make the backup complete.  Copying these newly dirtied sectors
>> would be wrong.  (Yes, I know you wrote that in the documentation of
>> @always.  I just tried to get a different perspective.)
>>
>> Yes, yes, and copy_bitmap is always set whenever a CBW to the target
>> fails before the source can be updated.  Good, good.
>>
> 
> You might have slightly the wrong idea; it's important to keep track of
> what was dirtied during the operation because that data is important for
> the next bitmap backup.
> 
> The merging of "sectors left to copy" (in the case of a failed backup)
> and "sectors dirtied since we started the operation" forms the actual
> minimal set needed to re-write to this target to achieve a new
> functioning point in time. This is what you get with the "always" mode
> in a failure case.
> 
> In a success case, it just so happens that "sectors left to copy" is the
> empty set.
> 
> It's like an incremental on top of the incremental.
> 
> Consider this:
> 
> We have a 4TB drive and we have dirtied 3TB of it since our full backup.
> We copy out 2TB as part of a new incremental backup before suffering
> some kind of failure.
> 
> Today, you'd need to start a new incremental backup that copies that
> entire 3TB *plus* whatever was dirtied since the job failed.
> 
> With this mode, you'd only need to copy the remaining 1TB + whatever was
> dirtied since.
> 
> So, what this logic is really doing is:
> 
> If we failed, OR if we want the "never" sync policy:
> 
> Merge the anonymous child (bits written during op) back into sync_bitmap
> (bits we were instructed to copy), leaving us as if we have never
> started this operation.
> 
> If, however, we failed and we have the "always" sync policy, we destroy
> the sync_bitmap (bits we were instructed to copy) and replace it with
> the copy_bitmap (bits remaining to copy). Then, we merge that with the
> anonymous child (bits written during op).

Oh, so that’s the way it works.  I thought “always” meant that you can
repeat the backup.  But it just means you keep your partial backup and
pretend it’s a full incremental one.

Now that I think about it again...  Yeah, you can’t repeat a backup at a
later point, of course.  If data is gone in the meantime, it’s gone.

So, uh, I was wrong that it’s all good, because it would have been
wrong?  But thankfully I was just wrong myself, and so it is all good
after all?  My confusion with bitmaps as lifted, now I’m just confused
with myself.

I revoke my R-b and give a new one:

Reviewed-by: Max Reitz <mreitz@redhat.com>

Or something like that.

Again, thanks a lot for clarifying.

Max

> Or, in success cases (when sync policy is not never), we simply delete
> the sync_bitmap (bits we were instructed to copy) and replace it with
> its anonymous child (bits written during op).