On 12.08.19 22:26, John Snow wrote:
> 
> 
> On 7/25/19 11:57 AM, Max Reitz wrote:
>> Compressed writes generally have to write full clusters, not just in
>> theory but also in practice when it comes to vmdk's streamOptimized
>> subformat.  It currently is just silently broken for writes with
>> non-zero in-cluster offsets:
>> 
>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>> $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
>> wrote 4096/4096 bytes at offset 4096
>> 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
>> read failed: Invalid argument
>> 
>> (The technical reason is that vmdk_write_extent() just writes the
>> incomplete compressed data actually to offset 4k.  When reading the
>> data, vmdk_read_extent() looks at offset 0 and finds the compressed
>> data size to be 0, because that is what it reads from there.  This
>> yields an error.)
>> 
>> For incomplete writes with zero in-cluster offsets, the error path
>> when reading the rest of the cluster is a bit different, but the
>> result is the same:
>> 
>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>> $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
>> wrote 4096/4096 bytes at offset 0
>> 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
>> read failed: Invalid argument
>> 
>> (Here, vmdk_read_extent() finds the data and then sees that the
>> uncompressed data is short.)
>> 
>> It is better to reject invalid writes than to make the user believe
>> they might have succeeded and then fail when trying to read it back.
>> 
>> Signed-off-by: Max Reitz
>> ---
>>  block/vmdk.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>> 
>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index db6acfc31e..641acacfe0 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -1731,6 +1731,16 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
>>      if (extent->compressed) {
>>          void *compressed_data;
>>  
>> +        /* Only whole clusters */
>> +        if (offset_in_cluster ||
>> +            n_bytes > (extent->cluster_sectors * SECTOR_SIZE) ||
>> +            (n_bytes < (extent->cluster_sectors * SECTOR_SIZE) &&
>> +             offset + n_bytes != extent->end_sector * SECTOR_SIZE))
>> +        {
>> +            ret = -EINVAL;
>> +            goto out;
>> +        }
>> +
>>          if (!extent->has_marker) {
>>              ret = -EINVAL;
>>              goto out;
>> 
> 
> What does this look like from a guest's perspective? Is there something
> that enforces the alignment in the graph for us?
> 
> Or is it the case that indeed guests (or users via qemu-io) can request
> invalid writes and we will halt the VM in those cases (in preference to
> corrupting the disk)?

Have you ever tried using a streamOptimized VMDK disk with a guest?  I
haven't, but I know that it won't work. O:-)

If you try to write to an already allocated cluster, you'll get an EIO
and an error message via error_report() ("Could not write to allocated
cluster for streamOptimized").

So really, the only use of streamOptimized is as a qemu-img convert
source/target, or as a backup/mirror target.  (Just like compressed
clusters in qcow2 images.)

I suppose if I introduced streamOptimized support today, I wouldn't just
forward vmdk_co_pwritev_compressed() to vmdk_co_pwritev(), but instead
make vmdk_co_pwritev_compressed() only work on streamOptimized images,
and vmdk_co_pwritev() only on everything else.  Then it would be
clearer.

Hm.  In fact, that's a bug, isn't it?  vmdk will accept compressed
writes for any subformat, even if it doesn't support compression.
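The guard I have in mind would be roughly this (just an untested sketch;
vmdk_all_extents_compressed() is a helper I'm making up here purely for
illustration, and the existing special cases in the compressed path
would of course stay):

static bool vmdk_all_extents_compressed(BDRVVmdkState *s)
{
    int i;

    /* "streamOptimized" in practice means every extent is compressed */
    for (i = 0; i < s->num_extents; i++) {
        if (!s->extents[i].compressed) {
            return false;
        }
    }
    return s->num_extents > 0;
}

static int coroutine_fn
vmdk_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
                           uint64_t bytes, QEMUIOVector *qiov)
{
    BDRVVmdkState *s = bs->opaque;

    if (!vmdk_all_extents_compressed(s)) {
        /* Right now such a write just ends up uncompressed on disk;
         * better to reject it outright. */
        return -ENOTSUP;
    }

    /* ... existing special cases (e.g. the zero-length EOF write) ... */

    return vmdk_co_pwritev(bs, offset, bytes, qiov, 0);
}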
Right now, if you use -c and convert to vmdk, it will succeed, but the
result just won't be compressed.

It's also a bit weird to accept normal writes for streamOptimized, but
I'm not sure whether that's really a bug.  In any case, changing this
behavior would not be backwards-compatible...

Should we deprecate normal writes to streamOptimized?
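If we do, I suppose the soft way to start would be a warning in the
normal write path, reusing the made-up helper from above (again just a
sketch; warn_report_once() or whatever reporting function fits best):

    /* at the top of vmdk_co_pwritev(), where s = bs->opaque */
    if (vmdk_all_extents_compressed(s)) {
        /* Normal writes only work until a cluster is rewritten anyway
         * ("Could not write to allocated cluster for streamOptimized"),
         * so warn before we ever drop them. */
        warn_report_once("vmdk: normal (uncompressed) writes to "
                         "streamOptimized images are deprecated");
    }

Max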