On 2018-05-02 17:01, Eric Blake wrote:
> On 05/02/2018 09:37 AM, Max Reitz wrote:
>> On 2018-05-02 15:34, Ivan Ren wrote:
>>> qemu-img info with a block device which has a qcow2 format always
>>> returns 0 for the disk size, which reflects neither the qcow2 size
>>> nor the used space of the block device.  This patch returns the
>>> allocated size of the qcow2 image as the disk size.
>>
>> I'm not quite sure whether you really need this information for block
>> devices (I tend to agree with Eric that wr_highest_offset is the more
>> important information there), but I can imagine it just being nice to
>> have.
>
> Hmm, so in an extreme case, if you create an internal snapshot, then
> the guest makes edits, then you remove the internal snapshot, we have
> a wr_highest_offset that has advanced (because the guest changes had
> to allocate new clusters due to COW of refcount=2 clusters); but the
> deleted snapshot now means we have a lot of unused clusters earlier in
> the image (deleting the snapshot took refcount=2 clusters back to 1,
> and any COW'd clusters edited after the internal snapshot means the
> snapshot version is now back to refcount=0, whether or not we also try
> to punch a hole in the protocol layer for those freed clusters).
> Thus, the highest written cluster is a larger number than the number
> of clusters that are actually in use, and both numbers might be useful
> to know (how big do I have to size my block device, and how utilized
> is my block device), especially if we add code for online compaction
> or defragmentation of a qcow2 image so that we can move higher offsets
> into holes left earlier in the image.

In any case, since a block device is linear, you really need to know the
highest offset that is in use.  Sure, knowing how much space is wasted
in holes in the middle of the file is nice, but it doesn't tell you when
you need to grow your block device.  (It does tell you when to consider
qemu-img convert or a mirror job to defragment the image, though...)

> If you don't use internal snapshots, the only way to get holes of
> unallocated clusters earlier in the image is if the guest uses TRIM
> operations, and I'm not sure if that's easier or harder to trigger, nor
> which approach (internal snapshots vs. guest TRIM operations) is likely
> to leave more holes of unallocated clusters.

And I think we (at least used to) have quirks in qcow2's allocation
algorithm that meant it could leave some clusters unallocated in the
middle of the image.  (I think that happened when you allocated more
than a single cluster (e.g. for an L1 table) but then also needed a new
refblock: the refblock was allocated first, and the actual allocation
continued only after that refblock, even though there might still have
been free space in front of it.)

>> The whole implementation reminds me a lot of qcow2's check function,
>> which basically just recalculates the refcounts.  So I'm wondering
>> whether you could just count how many clusters with a non-zero
>> refcount there are and thus simplify the implementation dramatically.
>
> We also recently added 'qemu-img measure', which DOES report how many
> clusters are in use.  Is any of that reusable here?

It only tells you that information for a hypothetical new image, though,
doesn't it?

(I've appended a rough sketch below my signature of what I meant by just
counting the clusters with a non-zero refcount.)

Max
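
PS: Here is the sketch.  It is untested, the function name is made up,
and locking and error handling are only hinted at; it just reuses
qcow2_get_refcount() from block/qcow2-refcount.c, which already reports
a refcount of 0 for clusters beyond the refcount table:

static int64_t qcow2_allocated_size(BlockDriverState *bs)
{
    BDRVQcow2State *s = bs->opaque;
    int64_t file_size = bdrv_getlength(bs->file->bs);
    int64_t nb_clusters, i, allocated = 0;

    if (file_size < 0) {
        return file_size;
    }

    /* Walk all host clusters covered by the image file and count
     * those that are in use (i.e. have a non-zero refcount) */
    nb_clusters = size_to_clusters(s, file_size);
    for (i = 0; i < nb_clusters; i++) {
        uint64_t refcount;
        int ret = qcow2_get_refcount(bs, i, &refcount);
        if (ret < 0) {
            return ret;
        }
        if (refcount > 0) {
            allocated++;
        }
    }

    /* Number of bytes actually allocated in the qcow2 image */
    return allocated * s->cluster_size;
}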