On 11.07.19 05:21, John Snow wrote:
> 
> On 7/10/19 4:46 PM, Max Reitz wrote:
>> On 10.07.19 21:00, John Snow wrote:
>>> On 7/10/19 1:14 PM, Max Reitz wrote:
>>>> On 10.07.19 03:05, John Snow wrote:
>>>>
>>>> Hm.  How useful is bitmap support for 'top' then, anyway?  That means
>>>> that if you want to resume a top backup, you always have to resume it
>>>> like it was a full backup.  Which sounds kind of useless.
>>>>
>>>> Max
>>>>
>>>
>>> Good point!
>>>
>>> I think this can be fixed by doing an initialization pass of the
>>> copy_bitmap when sync=top to set only the allocated regions in the bitmap.
>>>
>>> This means that the write notifier won't copy out regions that are
>>> written to that weren't already in the top layer. I believe this is
>>> actually a bugfix; the data we'd copy out in such cases is actually in
>>> the backing layer and shouldn't be copied with sync=top.
>>
>> Now that you mention it...  I didn’t realize that.  Yes, you’re right.
>>
>>> So this would have two effects:
>>> (1) sync=top gets a little more judicious about what it copies out on
>>> sync=top, and
>>> (2) the bitmap return value is more meaningful again.
>>>
> 
> This might be harder than I first thought.
> 
> initializing the copy_bitmap generally happens before we install the
> write notifier, which means that it occurs before the first yield.
> 
> However, checking the allocation status can potentially be very slow,
> can't it? I can't just hog the thread while I check.

I was thinking about that myself.  It isn’t that bad, because you aren’t
doing the full block_status dance but just checking allocation status,
which is reasonably quick (it just needs to look at the image format
metadata, it doesn’t go down to the protocol layer).

But it’s probably not so good to halt the monitor for this, yes.

> There are ways to cooperatively process write notifier interrupts and
> continue to check allocated status once we enter the main loop, but the
> problem there becomes: if we fail early, how can we tell if the backup
> is worth resuming?
> 
> We might not have reached a convergence point for the copy_bitmap before
> we failed, and still have a lot of extra bits set.

Is that so bad?

> I suppose at least in the case where we aren't trying to save the
> copy_bitmap and need it to mean something specific, this is a reasonable
> approach to fixing sync=TOP.
> 
> As far as resume is concerned, I don't think I have good ideas. I could
> emit an event or something if you're using sync=top with a bitmap for
> output, but that feels *so* specialized for a niche(?) command that I
> don't know if it's worth pursuing.
> 
> (Plus, even then, what do you do if it fails before you see that event?
> You just have to give up on what we copied out? That seems like a waste
> and not the point of this exercise.)

Before that event, the bitmap can still be usable, as long as all
“unknown” areas are set to dirty.  Sure, your resumed backup will then
copy too much data.  But who cares.

So I don’t think you even need an event.

> The only way I can think of at all to get a retry on sync=top is to take
> an always policy, and to allow a special invocation with something like
> mode=bitmap+top:

Yes, that was my first idea, too.  But I didn’t even write about it,
because of...

> "Assume we need to copy anything set in the bitmap, unless it's not in
> the top layer, and then skip it."
> 
> Which seems awful, because it would be a specialty mode for the
> exclusive purpose of re-trying sync=top backups.

...exactly this.

> Meh.

I don’t think it’s all that bad.

Max