Re: [Qemu-devel] [PATCH 2/2] vmdk: Add read-only support for seSparse snapshots

From: Max Reitz <mreitz@redhat.com>
To: Sam Eiderman <shmuel.eiderman@oracle.com>
Cc: fam@euphon.net, kwolf@redhat.com, eyal.moscovici@oracle.com,
	qemu-block@nongnu.org, arbel.moshe@oracle.com,
	qemu-devel@nongnu.org, liran.alon@oracle.com,
	Karl Heubaum <karl.heubaum@oracle.com>
Subject: Re: [Qemu-devel] [PATCH 2/2] vmdk: Add read-only support for seSparse snapshots
Date: Tue, 28 May 2019 14:17:09 +0200	[thread overview]
Message-ID: <dd62f69d-8984-6f5e-0953-e312e469bc03@redhat.com> (raw)
In-Reply-To: <AF0444E6-AE0A-4ADC-A1D5-487775E076A6@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 6229 bytes --]

On 28.05.19 09:57, Sam Eiderman wrote:
> Comments inline
> 
>> On 27 May 2019, at 20:39, Max Reitz <mreitz@redhat.com
>> <mailto:mreitz@redhat.com>> wrote:
>>
>> On 24.04.19 09:49, Sam Eiderman wrote:

[...]

>>> @@ -498,10 +543,15 @@ static int vmdk_init_tables(BlockDriverState
>>> *bs, VmdkExtent *extent,
>>>         goto fail_l1;
>>>     }
>>>     for (i = 0; i < extent->l1_size; i++) {
>>> -        le32_to_cpus(&extent->l1_table[i]);
>>> +        if (extent->sesparse) {
>>
>> Shouldn’t matter in practice, but I think evaluating extent->entry_size
>> would be more natural.
> 
> I just wanted to make it more clear that we are dealing with the
> seSparse format.
> But maybe in this case I should evaluate extent->entry_size and add
> assert on
> extent->sesparse inside the uint64_t case?
> WDYT?

I think that this code works independently of whether it’s dealing with
the seSparse format or something else.  All that matters is the L1 entry
size.  If some day there is another VMDK sub format that has 64-bit L1
entries, too, this code will be ready for it.

(It’s a different matter when interpreting e.g. L2 entries, because
their encoding may differ depending on the sub format.)

>>> +            le64_to_cpus((uint64_t *)extent->l1_table + i);
>>> +        } else {
>>> +            le32_to_cpus((uint32_t *)extent->l1_table + i);
>>> +        }
>>>     }
>>>

[...]

>>> +    if (header->journal_header_offset != 2) {
>>> +        error_setg(errp, "Unsupported journal header offset: %" PRIu64,
>>> +                   header->journal_header_offset);
>>> +        return -ENOTSUP;
>>> +    }
>>> +
>>> +    if (header->journal_header_size != 2) {
>>> +        error_setg(errp, "Unsupported journal header size: %" PRIu64,
>>> +                   header->journal_header_size);
>>> +        return -ENOTSUP;
>>> +    }
>>> +
>>> +    if (header->journal_offset != 2048) {
>>> +        error_setg(errp, "Unsupported journal offset: %" PRIu64,
>>> +                   header->journal_offset);
>>> +        return -ENOTSUP;
>>> +    }
>>> +
>>> +    if (header->journal_size != 2048) {
>>> +        error_setg(errp, "Unsupported journal size: %" PRIu64,
>>> +                   header->journal_size);
>>> +        return -ENOTSUP;
>>> +    }
>>> +
>>> +    if (header->grain_dir_offset != 4096) {
>>> +        error_setg(errp, "Unsupported grain directory offset: %" PRIu64,
>>> +                   header->grain_dir_offset);
>>> +        return -ENOTSUP;
>>> +    }
>>
>> Do these values (metadata structure offsets and sizes) really matter?
> 
> Not really, these are the values that I saw appear in 10MB, 150GB and 6TB
> snapshots, they never change.
> All of the checks in this function are just for “strictness” and can be
> removed.
> I just wanted to include them at least in the v1 series to make the
> format more
> clear and to not support vmdk's that give other values under the assumption
> that the format might misbehave in other parts too.
> They can be removed (maybe I’ll keep the version check in any case) or we
> can keep them and remove them at any later point.
> WDYT?

I’d remove them.  Because this patch only adds read-only support, there
is no danger of corrupting data.  The worst that can happen is reading
wrong data from some file that with these checks we just wouldn’t read
at all.

>>> +    /* in sectors */
>>> +    grain_table_coverage = ((header->grain_table_size *
>>
>> Hm, if header->grain_table_size is measured in sectors, it’d probably be
>> better to rename it to grain_table_sectors or something.
>>
>> (“size” is usually in bytes; sometimes it’s the number of entries, but
>> that’s already kind of weird, I think)
> 
> grain_table_sectors is not a good name since it is not the number of sectors
> a grain table uses but the number of sectors it covers.

I meant renaming header->grain_table_size to
header->grain_table_sectors.  grain_table_coverage seems good to me.

But from your answer about the volatile header it appears like
header->grain_table_size might be the official name.  If so, I guess
it’ll be fine as it is.

> grain_table_coverage_sectors is a better name but a bit too long.
> I can change grain_table_* to gt_* and grain_dir_* to gd_* but they seem too
> similar and confusing.
> WDYT?
> 
>>
>>> +                             BDRV_SECTOR_SIZE / sizeof(uint64_t)) *
>>> +                            header->grain_size);
>>> +    needed_grain_tables = header->capacity / grain_table_coverage;
>>> +    needed_grain_dir_size = DIV_ROUND_UP(needed_grain_tables *
>>> sizeof(uint64_t),
>>> +                                         BDRV_SECTOR_SIZE);
>>> +    needed_grain_dir_size = ROUND_UP(needed_grain_dir_size, 2048);
>>> +
>>> +    if (header->grain_dir_size != needed_grain_dir_size) {
>>
>> Wouldn’t it suffice to check that header->grain_dir_size >=
>> needed_grain_dir_size?  (The ROUND_UP() would become unnecessary, then.)
>>
>> (And also, maybe s/grain_dir_size/grain_dir_sectors/)
> 
> This is how VMware works - they round up the size to 2048 sectors for the
> needed_grain_dir_sectors - so I just make sure the math adds up.
> As said before, all of these checks are just for “strictness”.

The thing is, I don’t know whether this is part of the specification of
seSparse or just part of VMware’s implementation.  When qemu increases
the qcow2 L1 table size, it has a specific algorithm for that
(current size * 1.5).  But that doesn’t make it part of the qcow2
specification, any size is fine.  Similarly, I can imagine that VMware
simply chose to increase the L1 size in units of 1 MB (so it doesn’t
need to be grown very often), but that it can work with any size.

Max

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]