On 05.02.20 16:14, Vladimir Sementsov-Ogievskiy wrote:
> 05.02.2020 17:47, Vladimir Sementsov-Ogievskiy wrote:
>> 05.02.2020 17:26, Eric Blake wrote:
>>> On 2/5/20 3:25 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>>>> 3. For qcow2
>>>>> Hmm. Here, as I understand it, the main case is a freshly created
>>>>> qcow2, which is fully unallocated. To see that it is empty, we only
>>>>> need to check all L1 entries, and for an empty L1 table that is fast.
>>>>> So we don't need any qcow2 format improvement to check it.
>>>>>
>>>>
>>>> Ah yes, I forgot about the preallocated case. Hmm. For preallocated
>>>> clusters, we have zero bits in the L2 entries. And with them, we don't
>>>> even need the preallocated clusters to be filled with zeros, as we
>>>> never read them (we just return zeros on read).
>>>
>>> Scanning all L2 entries is O(n), while an autoclear bit properly
>>> maintained is O(1).
>>>
>>>> Then, maybe we want a similar flag for L1 entries (this would enable
>>>> large, fast write-zero requests). And maybe we want a flag which marks
>>>> the whole image as read-zero (that's your flag). So now I think my
>>>> previous idea of "all allocated is zero" is worse. For fully
>>>> preallocated images we are sure that all clusters are allocated, and
>>>> it is more natural to have flags similar to the ZERO bit in an L2
>>>> entry.
>>>
>>> Right now, we don't have any L1 entry flags.  Adding one would
>>> require adding an incompatible feature flag (if older qemu would
>>> choke on seeing unexpected flags in an L1 entry), or at best an
>>> autoclear feature flag (if the autoclear bit gets cleared because an
>>> older qemu opened the image and couldn't maintain L1 entry flags
>>> correctly, then newer qemu knows it cannot trust those L1 entry
>>> flags).
>>> But as soon as you are talking about adding a feature bit,
>>> then why add one that still requires O(n) traversal to check (true,
>>> the 'n' in an O(n) traversal of L1 tables is much smaller than the
>>> 'n' in an O(n) traversal of L2 tables), when you can instead just add
>>> an O(1) autoclear bit that maintains all_zero status for the image as
>>> a whole?
>>>
>>
>> My suggestion about an L1 entry flag is a side point; I understand the
>> difference between O(n) and O(1) :) Still, an additional L1 entry flag
>> would help to make large block-status and write-zero requests
>> efficient.
>>
>> And I agree that we need a top-level flag. I am just trying to say that
>> it seems good to make it similar to the existing L2 flag. But yes, it
>> would be an incompatible change, as it marks all clusters as ZERO, and
>> older Qemu cannot understand it and may treat all clusters as
>> unallocated.
>>
>
> Still, how long is this O(n)? We load the whole L1 table into memory
> anyway. For example, for a 16 TB disk with 64K granularity, we'll have
> 32768 L1 entries. Will we get a sensible performance benefit from an
> extension? I doubt it now. And anyway, even with an extension, we would
> have to fall back to this O(n) scan whenever the flag is not set.

(Sorry, it's late and I haven't followed this particular conversation too
closely, but:)

Keep in mind that the default metadata overlap protection mode causes all
L1 entries to be scanned on each I/O write. So it can't be that bad.

Max
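[Editor's note: a minimal sketch of the arithmetic and the O(n) check discussed above. This is illustrative Python, not QEMU code; the function names are invented for this example. It assumes qcow2's 8-byte L2 entries and one L2 table per cluster, so with 64 KiB clusters each L1 entry covers 8192 x 64 KiB = 512 MiB, which gives the 32768 L1 entries cited for a 16 TiB image.]

```python
# Illustrative sketch (not QEMU code): the L1-size arithmetic from the
# thread, plus the O(n) "is the image fully unallocated?" scan over L1.

def l1_entries_for(disk_size: int, cluster_bits: int = 16) -> int:
    """Number of L1 entries needed: each 8-byte L2 entry maps one
    cluster, and one L2 table occupies exactly one cluster."""
    cluster_size = 1 << cluster_bits                 # 64 KiB by default
    l2_entries = cluster_size // 8                   # 8192 entries per L2 table
    bytes_per_l1_entry = l2_entries * cluster_size   # 512 MiB mapped per L1 entry
    return -(-disk_size // bytes_per_l1_entry)       # ceiling division

def image_is_fully_unallocated(l1_table: list[int]) -> bool:
    """O(n) scan: an all-zero L1 table means no L2 table (and hence no
    data cluster) was ever allocated, so every read returns zeros."""
    return all(entry == 0 for entry in l1_table)

# 16 TiB image with 64 KiB clusters -> 32768 L1 entries, as in the thread.
print(l1_entries_for(16 * 1024**4))  # 32768
```

This is the scan Vladimir argues is cheap enough in practice: 32768 in-memory comparisons, versus an O(1) header-flag check for the autoclear-bit approach.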