On 05.02.20 16:14, Vladimir Sementsov-Ogievskiy wrote:
> 05.02.2020 17:47, Vladimir Sementsov-Ogievskiy wrote:
>> 05.02.2020 17:26, Eric Blake wrote:
>>> On 2/5/20 3:25 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>>>> 3. For qcow2
>>>>> Hmm. Here, as I understand it, the main case is a freshly created
>>>>> qcow2, which is fully unallocated. To see that it is empty, we only
>>>>> need to check all L1 entries, and for an empty L1 table that is fast.
>>>>> So we don't need any qcow2 format improvement to check it.
>>>>>
>>>>
>>>> Ah yes, I forgot about the preallocated case. Hmm. For preallocated
>>>> clusters, we have zero bits in the L2 entries. And with them, we don't
>>>> even need the preallocated clusters to be filled with zeros, as we
>>>> never read them (we just return zeros on read).
>>>
>>> Scanning all L2 entries is O(n), while an autoclear bit properly
>>> maintained is O(1).
>>>
>>>> Then, maybe we want a similar flag for L1 entries (this would enable
>>>> large, fast write-zero requests). And maybe we want a flag which marks
>>>> the whole image as read-zero (that's your flag). So now I think my
>>>> previous idea of "all allocated is zero" is worse. For fully
>>>> preallocated images we are sure that all clusters are allocated, and
>>>> it is more natural to have flags similar to the ZERO bit in an L2
>>>> entry.
>>>
>>> Right now, we don't have any L1 entry flags.  Adding one would
>>> require adding an incompatible feature flag (if older qemu would
>>> choke on seeing unexpected flags in an L1 entry), or at best an
>>> autoclear feature flag (if the autoclear bit gets cleared because an
>>> older qemu opened the image and couldn't maintain L1 entry flags
>>> correctly, then newer qemu knows it cannot trust those L1 entry
>>> flags).
>>> But as soon as you are talking about adding a feature bit,
>>> then why add one that still requires O(n) traversal to check (true,
>>> the 'n' in an O(n) traversal of L1 tables is much smaller than the
>>> 'n' in an O(n) traversal of L2 tables), when you can instead just add
>>> an O(1) autoclear bit that maintains all_zero status for the image as
>>> a whole?
>>>
>>
>> My suggestion about an L1 entry flag is a side point; I understand the
>> difference between O(n) and O(1) :) Still, an additional L1 entry flag
>> would help to make large block-status and write-zero requests
>> efficient.
>>
>> And I agree that we need a top-level flag. I am just trying to say that
>> it seems good to make it similar to the existing L2 flag. But yes, it
>> would be an incompatible change, as it marks all clusters as ZERO, and
>> older Qemu cannot understand it and may treat all clusters as
>> unallocated.
>>
>
> Still, how long is this O(n)? We load the whole L1 table into memory
> anyway. For example, for a 16 TB disk with 64K granularity, we'll have
> 32768 L1 entries. Will we get a sensible performance benefit from an
> extension? I doubt it now. And anyway, even with an extension, we would
> have to fall back to this O(n) scan whenever the flag is not set.

(Sorry, it's late and I haven't followed this particular conversation too
closely, but:)

Keep in mind that the default metadata overlap protection mode causes all
L1 entries to be scanned on each I/O write. So it can't be that bad.

Max
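[Editor's note: a minimal sketch of the arithmetic and the O(n) check discussed above. This is illustrative Python, not QEMU code; the function names are invented for this example. It assumes qcow2's 8-byte L2 entries and one L2 table per cluster, so with 64 KiB clusters each L1 entry covers 8192 x 64 KiB = 512 MiB, which gives the 32768 L1 entries cited for a 16 TiB image.]

```python
# Illustrative sketch (not QEMU code): the L1-size arithmetic from the
# thread, plus the O(n) "is the image fully unallocated?" scan over L1.

def l1_entries_for(disk_size: int, cluster_bits: int = 16) -> int:
    """Number of L1 entries needed: each 8-byte L2 entry maps one
    cluster, and one L2 table occupies exactly one cluster."""
    cluster_size = 1 << cluster_bits                 # 64 KiB by default
    l2_entries = cluster_size // 8                   # 8192 entries per L2 table
    bytes_per_l1_entry = l2_entries * cluster_size   # 512 MiB mapped per L1 entry
    return -(-disk_size // bytes_per_l1_entry)       # ceiling division

def image_is_fully_unallocated(l1_table: list[int]) -> bool:
    """O(n) scan: an all-zero L1 table means no L2 table (and hence no
    data cluster) was ever allocated, so every read returns zeros."""
    return all(entry == 0 for entry in l1_table)

# 16 TiB image with 64 KiB clusters -> 32768 L1 entries, as in the thread.
print(l1_entries_for(16 * 1024**4))  # 32768
```

This is the scan Vladimir argues is cheap enough in practice: 32768 in-memory comparisons, versus an O(1) header-flag check for the autoclear-bit approach.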