* [Qemu-devel] [PATCH v2 0/3] vmdk: Add read-only support for the new seSparse format

From: Sam Eiderman @ 2019-06-05 12:17 UTC
To: kwolf, qemu-block, qemu-devel, mreitz
Cc: liran.alon, arbel.moshe, shmuel.eiderman, eyal.moscovici, karl.heubaum

v1:

VMware introduced a new snapshot format in VMFS6 - seSparse (Space
Efficient Sparse) - which is the default format available in ESXi 6.7.
Add read-only support for the new snapshot format.

v2:

Fixed after Max's review:

* Removed strict sesparse checks
* Reduced maximal L1 table size
* Added non-write mode check in vmdk_open() on sesparse

Sam Eiderman (3):
  vmdk: Fix comment regarding max l1_size coverage
  vmdk: Reduce the max bound for L1 table size
  vmdk: Add read-only support for seSparse snapshots

 block/vmdk.c               | 371 ++++++++++++++++++++++++++++++++++++++++++---
 tests/qemu-iotests/059.out |   2 +-
 2 files changed, 352 insertions(+), 21 deletions(-)

--
2.13.3
* [Qemu-devel] [PATCH v2 1/3] vmdk: Fix comment regarding max l1_size coverage

From: Sam Eiderman @ 2019-06-05 12:17 UTC
To: kwolf, qemu-block, qemu-devel, mreitz
Cc: liran.alon, arbel.moshe, shmuel.eiderman, eyal.moscovici, karl.heubaum

Commit b0651b8c246d ("vmdk: Move l1_size check into vmdk_add_extent")
extended the l1_size check from VMDK4 to VMDK3 but did not update the
default coverage in the moved comment.

The previous vmdk4 calculation:

    (512 * 1024 * 1024) * 512 (l2 entries) * 65536 (grain)  = 16PB

The added vmdk3 calculation:

    (512 * 1024 * 1024) * 4096 (l2 entries) * 512 (grain)   = 1PB

Add the vmdk3 calculation to the comment.

In any case, VMware does not offer virtual disks larger than 2TB for
vmdk4/vmdk3, or 64TB for the new undocumented seSparse format, which is
not implemented yet in qemu.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
---
 block/vmdk.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 51067c774f..0f2e453bf5 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -426,10 +426,15 @@ static int vmdk_add_extent(BlockDriverState *bs,
         return -EFBIG;
     }
     if (l1_size > 512 * 1024 * 1024) {
-        /* Although with big capacity and small l1_entry_sectors, we can get a
+        /*
+         * Although with big capacity and small l1_entry_sectors, we can get a
          * big l1_size, we don't want unbounded value to allocate the table.
-         * Limit it to 512M, which is 16PB for default cluster and L2 table
-         * size */
+         * Limit it to 512M, which is:
+         *     16PB - for default "Hosted Sparse Extent" (VMDK4)
+         *            cluster size: 64KB, L2 table size: 512 entries
+         *     1PB  - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
+         *            cluster size: 512B, L2 table size: 4096 entries
+         */
         error_setg(errp, "L1 size too big");
         return -EFBIG;
     }
--
2.13.3
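
(Aside: the coverage figures in the commit message above can be sanity-checked
with a few lines of C. This is a minimal standalone sketch, not part of the
patch; the L2 entry counts and grain sizes are the defaults quoted in the
commit message.)

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* the old bound on the number of L1 entries */
        const uint64_t max_l1_entries = 512ULL * 1024 * 1024;

        /* VMDK4 "Hosted Sparse Extent": 512-entry L2 tables, 64KB grains */
        uint64_t vmdk4 = max_l1_entries * 512 * 65536;

        /* VMDK3 "ESXi Host Sparse Extent": 4096-entry L2 tables, 512B grains */
        uint64_t vmdk3 = max_l1_entries * 4096 * 512;

        printf("VMDK4 coverage: %llu PB\n", (unsigned long long)(vmdk4 >> 50));
        printf("VMDK3 coverage: %llu PB\n", (unsigned long long)(vmdk3 >> 50));
        return 0;
    }

Running it prints 16 PB and 1 PB, matching the comment being fixed.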
* Re: [Qemu-devel] [PATCH v2 1/3] vmdk: Fix comment regarding max l1_size coverage

From: Max Reitz @ 2019-06-19 17:10 UTC
To: Sam Eiderman, kwolf, qemu-block, qemu-devel
Cc: arbel.moshe, liran.alon, eyal.moscovici, karl.heubaum

On 05.06.19 14:17, Sam Eiderman wrote:
> Commit b0651b8c246d ("vmdk: Move l1_size check into vmdk_add_extent")
> extended the l1_size check from VMDK4 to VMDK3 but did not update the
> default coverage in the moved comment.
>
> The previous vmdk4 calculation:
>
>     (512 * 1024 * 1024) * 512 (l2 entries) * 65536 (grain)  = 16PB
>
> The added vmdk3 calculation:
>
>     (512 * 1024 * 1024) * 4096 (l2 entries) * 512 (grain)   = 1PB
>
> Add the vmdk3 calculation to the comment.
>
> In any case, VMware does not offer virtual disks larger than 2TB for
> vmdk4/vmdk3, or 64TB for the new undocumented seSparse format, which is
> not implemented yet in qemu.
>
> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
> Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
> ---
>  block/vmdk.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>
* [Qemu-devel] [PATCH v2 2/3] vmdk: Reduce the max bound for L1 table size

From: Sam Eiderman @ 2019-06-05 12:17 UTC
To: kwolf, qemu-block, qemu-devel, mreitz
Cc: liran.alon, arbel.moshe, shmuel.eiderman, eyal.moscovici, karl.heubaum

512M of L1 entries is a very loose bound: only 32M are required to store
the maximal supported VMDK file size of 2TB.

Fixed qemu-iotest 059 - the failure now occurs earlier, on the
impossible L1 table size.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
---
 block/vmdk.c               | 13 +++++++------
 tests/qemu-iotests/059.out |  2 +-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 0f2e453bf5..931eb2759c 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -425,15 +425,16 @@ static int vmdk_add_extent(BlockDriverState *bs,
         error_setg(errp, "Invalid granularity, image may be corrupt");
         return -EFBIG;
     }
-    if (l1_size > 512 * 1024 * 1024) {
+    if (l1_size > 32 * 1024 * 1024) {
         /*
          * Although with big capacity and small l1_entry_sectors, we can get a
          * big l1_size, we don't want unbounded value to allocate the table.
-         * Limit it to 512M, which is:
-         *     16PB - for default "Hosted Sparse Extent" (VMDK4)
-         *            cluster size: 64KB, L2 table size: 512 entries
-         *     1PB  - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
-         *            cluster size: 512B, L2 table size: 4096 entries
+         * Limit it to 32M, which is enough to store:
+         *     8TB  - for both VMDK3 & VMDK4 with
+         *            minimal cluster size: 512B
+         *            minimal L2 table size: 512 entries
+         *     8 TB is still more than the maximal value supported for
+         *     VMDK3 & VMDK4 which is 2TB.
          */
         error_setg(errp, "L1 size too big");
         return -EFBIG;
diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index f51394ae8e..4fab42a28c 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -2358,5 +2358,5 @@ Offset          Length          Mapped to       File
 0x140000000     0x10000         0x50000         TEST_DIR/t-s003.vmdk

 === Testing afl image with a very large capacity ===
-qemu-img: Can't get image size 'TEST_DIR/afl9.IMGFMT': File too large
+qemu-img: Could not open 'TEST_DIR/afl9.IMGFMT': L1 size too big
 *** done
--
2.13.3
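
(Aside: the derivation behind the new 32M bound, spelled out. A minimal
standalone sketch, not part of the patch; the worst case uses the smallest
cluster size (512B) and the smallest L2 table (512 entries), so each L1 entry
covers 512 * 512 = 256KB.)

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        const uint64_t max_l1_entries   = 32ULL * 1024 * 1024;
        const uint64_t min_l2_entries   = 512;
        const uint64_t min_cluster_size = 512;

        uint64_t coverage = max_l1_entries * min_l2_entries * min_cluster_size;

        assert(coverage == 8ULL << 40);  /* 8TB with the new 32M bound */
        assert(coverage > 2ULL << 40);   /* still above the 2TB VMDK3/4 max */
        return 0;
    }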
* Re: [Qemu-devel] [PATCH v2 2/3] vmdk: Reduce the max bound for L1 table size

From: Max Reitz @ 2019-06-19 17:09 UTC
To: Sam Eiderman, kwolf, qemu-block, qemu-devel
Cc: arbel.moshe, liran.alon, eyal.moscovici, karl.heubaum

On 05.06.19 14:17, Sam Eiderman wrote:
> 512M of L1 entries is a very loose bound: only 32M are required to store
> the maximal supported VMDK file size of 2TB.
>
> Fixed qemu-iotest 059 - the failure now occurs earlier, on the
> impossible L1 table size.
>
> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
> Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
> ---
>  block/vmdk.c               | 13 +++++++------
>  tests/qemu-iotests/059.out |  2 +-
>  2 files changed, 8 insertions(+), 7 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>
* [Qemu-devel] [PATCH v2 3/3] vmdk: Add read-only support for seSparse snapshots

From: Sam Eiderman @ 2019-06-05 12:17 UTC
To: kwolf, qemu-block, qemu-devel, mreitz
Cc: liran.alon, arbel.moshe, shmuel.eiderman, eyal.moscovici, karl.heubaum

Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
QEMU).

This format was lacking in the following:

    * Grain directory (L1) and grain table (L2) entries were 32-bit,
      allowing access to only 2TB (slightly less) of data.
    * The grain size (default) was 512 bytes - leading to data
      fragmentation and many grain tables.
    * For space reclamation purposes, it was necessary to find all the
      grains which are not pointed to by any grain table - so a reverse
      mapping of "offset of grain in vmdk" to "grain table" must be
      constructed - which takes large amounts of CPU/RAM.

The format specification can be found in VMware's documentation:
https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf

In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
introduced: SESparse (Space Efficient).

This format fixes the above issues:

    * All entries are now 64-bit.
    * The grain size (default) is 4KB.
    * Grain directory and grain tables are now located at the beginning
      of the file.
      + seSparse format reserves space for all grain tables.
      + Grain tables can be addressed using an index.
      + Grains are located at the end of the file and can also be
        addressed with an index.
      - seSparse vmdks of large disks (64TB) have huge preallocated
        headers - mainly due to L2 tables, even for empty snapshots.
    * The header contains a reverse mapping ("backmap") of "offset of
      grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
      specifies for each grain whether it is allocated or not.
      Using these data structures we can implement space reclamation
      efficiently.
    * Because the header now maintains two mappings:
        * The regular one (grain directory & grain tables)
        * A reverse one (backmap and free bitmap)
      these data structures can lose consistency upon crash and result
      in a corrupted VMDK. Therefore, a journal is also added to the
      VMDK and is replayed when VMware reopens the file after a crash.

Since ESXi 6.7, SESparse is the only snapshot format available.

Unfortunately, VMware does not provide documentation regarding the new
seSparse format.

This commit is based on black-box research of the seSparse format.
Various in-guest block operations and their effect on the snapshot file
were tested.

The only VMware-provided source of information (regarding the
underlying implementation) was a log file on the ESXi:

    /var/log/hostd.log

Whenever an seSparse snapshot is created, the log is populated with
seSparse records.

Relevant log records are of the form:

    [...] Const Header:
    [...]  constMagic     = 0xcafebabe
    [...]  version        = 2.1
    [...]  capacity       = 204800
    [...]  grainSize      = 8
    [...]  grainTableSize = 64
    [...]  flags          = 0
    [...] Extents:
    [...]  Header         : <1 : 1>
    [...]  JournalHdr     : <2 : 2>
    [...]  Journal        : <2048 : 2048>
    [...]  GrainDirectory : <4096 : 2048>
    [...]  GrainTables    : <6144 : 2048>
    [...]  FreeBitmap     : <8192 : 2048>
    [...]  BackMap        : <10240 : 2048>
    [...]  Grain          : <12288 : 204800>
    [...] Volatile Header:
    [...] volatileMagic     = 0xcafecafe
    [...] FreeGTNumber      = 0
    [...] nextTxnSeqNumber  = 0
    [...] replayJournal     = 0

The sizes seen in the log file are in sectors.
Extents are of the following format: <offset : size>

This commit is a strict implementation which enforces:
    * magics
    * version number 2.1
    * grain size of 8 sectors (4KB)
    * grain table size of 64 sectors
    * zero flags
    * extent locations

Additionally, this commit provides only a subset of the functionality
offered by the seSparse format:
    * Read-only
    * No journal replay
    * No space reclamation
    * No unmap support

Hence, the journal header, journal, free bitmap and backmap extents are
unused; only the "classic" (L1 -> L2 -> data) grain access is
implemented.

However, there are several differences in the grain access itself.
Grain directory (L1):
    * Grain directory entries are indexes (not offsets) to grain
      tables.
    * Valid grain directory entries have their highest nibble set to
      0x1.
    * Since grain tables are always located at the beginning of the
      file, the index can fit into 32 bits, so we can use its low part
      if it's valid.
Grain table (L2):
    * Grain table entries are indexes (not offsets) to grains.
    * If the highest nibble of the entry is:
      0x0:
          The grain is not allocated.
          The rest of the bytes are 0.
      0x1:
          The grain is unmapped - the guest sees a zero grain.
          The rest of the bits point to the previously mapped grain,
          see the 0x3 case.
      0x2:
          The grain is zero.
      0x3:
          The grain is allocated - to get the index calculate:
          ((entry & 0x0fff000000000000) >> 48) |
          ((entry & 0x0000ffffffffffff) << 12)
    * The difference between 0x1 and 0x2 is that 0x1 is an unallocated
      grain which results from the guest using sg_unmap to unmap the
      grain - but the grain itself still exists in the grain extent - a
      space reclamation procedure should delete it.
      Unmapping a zero grain has no effect (0x2 will not change to 0x1)
      but unmapping an unallocated grain will (0x0 to 0x1) - naturally.

In order to implement seSparse, some fields had to be changed to
support both 32-bit and 64-bit entry sizes.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
---
 block/vmdk.c | 357 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 341 insertions(+), 16 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 931eb2759c..4377779635 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -91,6 +91,44 @@ typedef struct {
     uint16_t compressAlgorithm;
 } QEMU_PACKED VMDK4Header;

+typedef struct VMDKSESparseConstHeader {
+    uint64_t magic;
+    uint64_t version;
+    uint64_t capacity;
+    uint64_t grain_size;
+    uint64_t grain_table_size;
+    uint64_t flags;
+    uint64_t reserved1;
+    uint64_t reserved2;
+    uint64_t reserved3;
+    uint64_t reserved4;
+    uint64_t volatile_header_offset;
+    uint64_t volatile_header_size;
+    uint64_t journal_header_offset;
+    uint64_t journal_header_size;
+    uint64_t journal_offset;
+    uint64_t journal_size;
+    uint64_t grain_dir_offset;
+    uint64_t grain_dir_size;
+    uint64_t grain_tables_offset;
+    uint64_t grain_tables_size;
+    uint64_t free_bitmap_offset;
+    uint64_t free_bitmap_size;
+    uint64_t backmap_offset;
+    uint64_t backmap_size;
+    uint64_t grains_offset;
+    uint64_t grains_size;
+    uint8_t pad[304];
+} QEMU_PACKED VMDKSESparseConstHeader;
+
+typedef struct VMDKSESparseVolatileHeader {
+    uint64_t magic;
+    uint64_t free_gt_number;
+    uint64_t next_txn_seq_number;
+    uint64_t replay_journal;
+    uint8_t pad[480];
+} QEMU_PACKED VMDKSESparseVolatileHeader;
+
 #define L2_CACHE_SIZE 16

 typedef struct VmdkExtent {
@@ -99,19 +137,23 @@ typedef struct VmdkExtent {
     bool compressed;
     bool has_marker;
     bool has_zero_grain;
+    bool sesparse;
+    uint64_t sesparse_l2_tables_offset;
+    uint64_t sesparse_clusters_offset;
+    int32_t entry_size;
     int version;
     int64_t sectors;
     int64_t end_sector;
     int64_t flat_start_offset;
     int64_t l1_table_offset;
     int64_t l1_backup_table_offset;
-    uint32_t *l1_table;
+    void *l1_table;
     uint32_t *l1_backup_table;
     unsigned int l1_size;
     uint32_t l1_entry_sectors;

     unsigned int l2_size;
-    uint32_t *l2_cache;
+    void *l2_cache;
     uint32_t l2_cache_offsets[L2_CACHE_SIZE];
     uint32_t l2_cache_counts[L2_CACHE_SIZE];

@@ -435,6 +477,11 @@ static int vmdk_add_extent(BlockDriverState *bs,
          *            minimal L2 table size: 512 entries
          *     8 TB is still more than the maximal value supported for
          *     VMDK3 & VMDK4 which is 2TB.
+         *     64TB - for "ESXi seSparse Extent"
+         *            minimal cluster size: 512B (default is 4KB)
+         *            L2 table size: 4096 entries (const).
+         *     64TB is more than the maximal value supported for
+         *     seSparse VMDKs (which is slightly less than 64TB)
          */
         error_setg(errp, "L1 size too big");
         return -EFBIG;
@@ -460,6 +507,7 @@ static int vmdk_add_extent(BlockDriverState *bs,
     extent->l2_size = l2_size;
     extent->cluster_sectors = flat ? sectors : cluster_sectors;
     extent->next_cluster_sector = ROUND_UP(nb_sectors, cluster_sectors);
+    extent->entry_size = sizeof(uint32_t);

     if (s->num_extents > 1) {
         extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
@@ -481,7 +529,7 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
     int i;

     /* read the L1 table */
-    l1_size = extent->l1_size * sizeof(uint32_t);
+    l1_size = extent->l1_size * extent->entry_size;
     extent->l1_table = g_try_malloc(l1_size);
     if (l1_size && extent->l1_table == NULL) {
         return -ENOMEM;
@@ -499,10 +547,16 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
         goto fail_l1;
     }
     for (i = 0; i < extent->l1_size; i++) {
-        le32_to_cpus(&extent->l1_table[i]);
+        if (extent->entry_size == sizeof(uint64_t)) {
+            le64_to_cpus((uint64_t *)extent->l1_table + i);
+        } else {
+            assert(extent->entry_size == sizeof(uint32_t));
+            le32_to_cpus((uint32_t *)extent->l1_table + i);
+        }
     }

     if (extent->l1_backup_table_offset) {
+        assert(!extent->sesparse);
         extent->l1_backup_table = g_try_malloc(l1_size);
         if (l1_size && extent->l1_backup_table == NULL) {
             ret = -ENOMEM;
@@ -525,7 +579,7 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
     }

     extent->l2_cache =
-        g_new(uint32_t, extent->l2_size * L2_CACHE_SIZE);
+        g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
     return 0;
  fail_l1b:
     g_free(extent->l1_backup_table);
@@ -571,6 +625,204 @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
     return ret;
 }

+#define SESPARSE_CONST_HEADER_MAGIC UINT64_C(0x00000000cafebabe)
+#define SESPARSE_VOLATILE_HEADER_MAGIC UINT64_C(0x00000000cafecafe)
+
+/* Strict checks - format not officially documented */
+static int check_se_sparse_const_header(VMDKSESparseConstHeader *header,
+                                        Error **errp)
+{
+    header->magic = le64_to_cpu(header->magic);
+    header->version = le64_to_cpu(header->version);
+    header->grain_size = le64_to_cpu(header->grain_size);
+    header->grain_table_size = le64_to_cpu(header->grain_table_size);
+    header->flags = le64_to_cpu(header->flags);
+    header->reserved1 = le64_to_cpu(header->reserved1);
+    header->reserved2 = le64_to_cpu(header->reserved2);
+    header->reserved3 = le64_to_cpu(header->reserved3);
+    header->reserved4 = le64_to_cpu(header->reserved4);
+
+    header->volatile_header_offset =
+        le64_to_cpu(header->volatile_header_offset);
+    header->volatile_header_size = le64_to_cpu(header->volatile_header_size);
+
+    header->journal_header_offset = le64_to_cpu(header->journal_header_offset);
+    header->journal_header_size = le64_to_cpu(header->journal_header_size);
+
+    header->journal_offset = le64_to_cpu(header->journal_offset);
+    header->journal_size = le64_to_cpu(header->journal_size);
+
+    header->grain_dir_offset = le64_to_cpu(header->grain_dir_offset);
+    header->grain_dir_size = le64_to_cpu(header->grain_dir_size);
+
+    header->grain_tables_offset = le64_to_cpu(header->grain_tables_offset);
+    header->grain_tables_size = le64_to_cpu(header->grain_tables_size);
+
+    header->free_bitmap_offset = le64_to_cpu(header->free_bitmap_offset);
+    header->free_bitmap_size = le64_to_cpu(header->free_bitmap_size);
+
+    header->backmap_offset = le64_to_cpu(header->backmap_offset);
+    header->backmap_size = le64_to_cpu(header->backmap_size);
+
+    header->grains_offset = le64_to_cpu(header->grains_offset);
+    header->grains_size = le64_to_cpu(header->grains_size);
+
+    if (header->magic != SESPARSE_CONST_HEADER_MAGIC) {
+        error_setg(errp, "Bad const header magic: 0x%016" PRIx64,
+                   header->magic);
+        return -EINVAL;
+    }
+
+    if (header->version != 0x0000000200000001) {
+        error_setg(errp, "Unsupported version: 0x%016" PRIx64,
+                   header->version);
+        return -ENOTSUP;
+    }
+
+    if (header->grain_size != 8) {
+        error_setg(errp, "Unsupported grain size: %" PRIu64,
+                   header->grain_size);
+        return -ENOTSUP;
+    }
+
+    if (header->grain_table_size != 64) {
+        error_setg(errp, "Unsupported grain table size: %" PRIu64,
+                   header->grain_table_size);
+        return -ENOTSUP;
+    }
+
+    if (header->flags != 0) {
+        error_setg(errp, "Unsupported flags: 0x%016" PRIx64,
+                   header->flags);
+        return -ENOTSUP;
+    }
+
+    if (header->reserved1 != 0 || header->reserved2 != 0 ||
+        header->reserved3 != 0 || header->reserved4 != 0) {
+        error_setg(errp, "Unsupported reserved bits:"
+                   " 0x%016" PRIx64 " 0x%016" PRIx64
+                   " 0x%016" PRIx64 " 0x%016" PRIx64,
+                   header->reserved1, header->reserved2,
+                   header->reserved3, header->reserved4);
+        return -ENOTSUP;
+    }
+
+    /* check that padding is 0 */
+    if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
+        error_setg(errp, "Unsupported non-zero const header padding");
+        return -ENOTSUP;
+    }
+
+    return 0;
+}
+
+static int check_se_sparse_volatile_header(VMDKSESparseVolatileHeader *header,
+                                           Error **errp)
+{
+    header->magic = le64_to_cpu(header->magic);
+    header->free_gt_number = le64_to_cpu(header->free_gt_number);
+    header->next_txn_seq_number = le64_to_cpu(header->next_txn_seq_number);
+    header->replay_journal = le64_to_cpu(header->replay_journal);
+
+    if (header->magic != SESPARSE_VOLATILE_HEADER_MAGIC) {
+        error_setg(errp, "Bad volatile header magic: 0x%016" PRIx64,
+                   header->magic);
+        return -EINVAL;
+    }
+
+    if (header->replay_journal) {
+        error_setg(errp, "Image is dirty, Replaying journal not supported");
+        return -ENOTSUP;
+    }
+
+    /* check that padding is 0 */
+    if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
+        error_setg(errp, "Unsupported non-zero volatile header padding");
+        return -ENOTSUP;
+    }
+
+    return 0;
+}
+
+static int vmdk_open_se_sparse(BlockDriverState *bs,
+                               BdrvChild *file,
+                               int flags, Error **errp)
+{
+    int ret;
+    VMDKSESparseConstHeader const_header;
+    VMDKSESparseVolatileHeader volatile_header;
+    VmdkExtent *extent;
+
+    if (flags & BDRV_O_RDWR) {
+        error_setg(errp, "No write support for seSparse images available");
+        return -ENOTSUP;
+    }
+
+    assert(sizeof(const_header) == SECTOR_SIZE);
+
+    ret = bdrv_pread(file, 0, &const_header, sizeof(const_header));
+    if (ret < 0) {
+        bdrv_refresh_filename(file->bs);
+        error_setg_errno(errp, -ret,
+                         "Could not read const header from file '%s'",
+                         file->bs->filename);
+        return ret;
+    }
+
+    /* check const header */
+    ret = check_se_sparse_const_header(&const_header, errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    assert(sizeof(volatile_header) == SECTOR_SIZE);
+
+    ret = bdrv_pread(file,
+                     const_header.volatile_header_offset * SECTOR_SIZE,
+                     &volatile_header, sizeof(volatile_header));
+    if (ret < 0) {
+        bdrv_refresh_filename(file->bs);
+        error_setg_errno(errp, -ret,
+                         "Could not read volatile header from file '%s'",
+                         file->bs->filename);
+        return ret;
+    }
+
+    /* check volatile header */
+    ret = check_se_sparse_volatile_header(&volatile_header, errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = vmdk_add_extent(bs, file, false,
+                          const_header.capacity,
+                          const_header.grain_dir_offset * SECTOR_SIZE,
+                          0,
+                          const_header.grain_dir_size *
+                          SECTOR_SIZE / sizeof(uint64_t),
+                          const_header.grain_table_size *
+                          SECTOR_SIZE / sizeof(uint64_t),
+                          const_header.grain_size,
+                          &extent,
+                          errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    extent->sesparse = true;
+    extent->sesparse_l2_tables_offset = const_header.grain_tables_offset;
+    extent->sesparse_clusters_offset = const_header.grains_offset;
+    extent->entry_size = sizeof(uint64_t);
+
+    ret = vmdk_init_tables(bs, extent, errp);
+    if (ret) {
+        /* free extent allocated by vmdk_add_extent */
+        vmdk_free_last_extent(bs);
+    }
+
+    return ret;
+}
+
 static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
                                QDict *options, Error **errp);

@@ -848,6 +1100,7 @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
          * RW [size in sectors] SPARSE "file-name.vmdk"
          * RW [size in sectors] VMFS "file-name.vmdk"
          * RW [size in sectors] VMFSSPARSE "file-name.vmdk"
+         * RW [size in sectors] SESPARSE "file-name.vmdk"
          */
         flat_offset = -1;
         matches = sscanf(p, "%10s %" SCNd64 " %10s \"%511[^\n\r\"]\" %" SCNd64,
@@ -870,7 +1123,8 @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,

         if (sectors <= 0 ||
             (strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
-             strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE")) ||
+             strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE") &&
+             strcmp(type, "SESPARSE")) ||
             (strcmp(access, "RW"))) {
             continue;
         }
@@ -923,6 +1177,13 @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
                 return ret;
             }
             extent = &s->extents[s->num_extents - 1];
+        } else if (!strcmp(type, "SESPARSE")) {
+            ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
+            if (ret) {
+                bdrv_unref_child(bs, extent_file);
+                return ret;
+            }
+            extent = &s->extents[s->num_extents - 1];
         } else {
             error_setg(errp, "Unsupported extent type '%s'", type);
             bdrv_unref_child(bs, extent_file);
@@ -957,6 +1218,7 @@ static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
     if (strcmp(ct, "monolithicFlat") &&
         strcmp(ct, "vmfs") &&
         strcmp(ct, "vmfsSparse") &&
+        strcmp(ct, "seSparse") &&
         strcmp(ct, "twoGbMaxExtentSparse") &&
         strcmp(ct, "twoGbMaxExtentFlat")) {
         error_setg(errp, "Unsupported image type '%s'", ct);
@@ -1207,10 +1469,12 @@ static int get_cluster_offset(BlockDriverState *bs,
 {
     unsigned int l1_index, l2_offset, l2_index;
     int min_index, i, j;
-    uint32_t min_count, *l2_table;
+    uint32_t min_count;
+    void *l2_table;
     bool zeroed = false;
     int64_t ret;
     int64_t cluster_sector;
+    unsigned int l2_size_bytes = extent->l2_size * extent->entry_size;

     if (m_data) {
         m_data->valid = 0;
@@ -1225,7 +1489,36 @@ static int get_cluster_offset(BlockDriverState *bs,
     if (l1_index >= extent->l1_size) {
         return VMDK_ERROR;
     }
-    l2_offset = extent->l1_table[l1_index];
+    if (extent->sesparse) {
+        uint64_t l2_offset_u64;
+
+        assert(extent->entry_size == sizeof(uint64_t));
+
+        l2_offset_u64 = ((uint64_t *)extent->l1_table)[l1_index];
+        if (l2_offset_u64 == 0) {
+            l2_offset = 0;
+        } else if ((l2_offset_u64 & 0xffffffff00000000) != 0x1000000000000000) {
+            /*
+             * Top most nibble is 0x1 if grain table is allocated.
+             * strict check - top most 4 bytes must be 0x10000000 since max
+             * supported size is 64TB for disk - so no more than 64TB / 16MB
+             * grain directories which is smaller than uint32,
+             * where 16MB is the only supported default grain table coverage.
+             */
+            return VMDK_ERROR;
+        } else {
+            l2_offset_u64 = l2_offset_u64 & 0x00000000ffffffff;
+            l2_offset_u64 = extent->sesparse_l2_tables_offset +
+                l2_offset_u64 * l2_size_bytes / SECTOR_SIZE;
+            if (l2_offset_u64 > 0x00000000ffffffff) {
+                return VMDK_ERROR;
+            }
+            l2_offset = (unsigned int)(l2_offset_u64);
+        }
+    } else {
+        assert(extent->entry_size == sizeof(uint32_t));
+        l2_offset = ((uint32_t *)extent->l1_table)[l1_index];
+    }
     if (!l2_offset) {
         return VMDK_UNALLOC;
     }
@@ -1237,7 +1530,7 @@ static int get_cluster_offset(BlockDriverState *bs,
                     extent->l2_cache_counts[j] >>= 1;
                 }
             }
-            l2_table = extent->l2_cache + (i * extent->l2_size);
+            l2_table = (char *)extent->l2_cache + (i * l2_size_bytes);
             goto found;
         }
     }
@@ -1250,13 +1543,13 @@ static int get_cluster_offset(BlockDriverState *bs,
             min_index = i;
         }
     }
-    l2_table = extent->l2_cache + (min_index * extent->l2_size);
+    l2_table = (char *)extent->l2_cache + (min_index * l2_size_bytes);
     BLKDBG_EVENT(extent->file, BLKDBG_L2_LOAD);
     if (bdrv_pread(extent->file,
                 (int64_t)l2_offset * 512,
                 l2_table,
-                extent->l2_size * sizeof(uint32_t)
-            ) != extent->l2_size * sizeof(uint32_t)) {
+                l2_size_bytes
+            ) != l2_size_bytes) {
         return VMDK_ERROR;
     }

@@ -1264,16 +1557,45 @@ static int get_cluster_offset(BlockDriverState *bs,
     extent->l2_cache_counts[min_index] = 1;
  found:
     l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
-    cluster_sector = le32_to_cpu(l2_table[l2_index]);

-    if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
-        zeroed = true;
+    if (extent->sesparse) {
+        cluster_sector = le64_to_cpu(((uint64_t *)l2_table)[l2_index]);
+        switch (cluster_sector & 0xf000000000000000) {
+        case 0x0000000000000000:
+            /* unallocated grain */
+            if (cluster_sector != 0) {
+                return VMDK_ERROR;
+            }
+            break;
+        case 0x1000000000000000:
+            /* scsi-unmapped grain - fallthrough */
+        case 0x2000000000000000:
+            /* zero grain */
+            zeroed = true;
+            break;
+        case 0x3000000000000000:
+            /* allocated grain */
+            cluster_sector = (((cluster_sector & 0x0fff000000000000) >> 48) |
+                              ((cluster_sector & 0x0000ffffffffffff) << 12));
+            cluster_sector = extent->sesparse_clusters_offset +
+                cluster_sector * extent->cluster_sectors;
+            break;
+        default:
+            return VMDK_ERROR;
+        }
+    } else {
+        cluster_sector = le32_to_cpu(((uint32_t *)l2_table)[l2_index]);
+
+        if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
+            zeroed = true;
+        }
     }

     if (!cluster_sector || zeroed) {
         if (!allocate) {
             return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
         }
+        assert(!extent->sesparse);

         if (extent->next_cluster_sector >= VMDK_EXTENT_MAX_SECTORS) {
             return VMDK_ERROR;
@@ -1297,7 +1619,7 @@ static int get_cluster_offset(BlockDriverState *bs,
             m_data->l1_index = l1_index;
             m_data->l2_index = l2_index;
             m_data->l2_offset = l2_offset;
-            m_data->l2_cache_entry = &l2_table[l2_index];
+            m_data->l2_cache_entry = ((uint32_t *)l2_table) + l2_index;
         }
     }
     *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
@@ -1623,6 +1945,9 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
         if (!extent) {
             return -EIO;
         }
+        if (extent->sesparse) {
+            return -ENOTSUP;
+        }
         offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
         n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
                              - offset_in_cluster);
--
2.13.3
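
(Aside: to make the entry encodings described in the commit message concrete,
here is a condensed, standalone decoder sketch. It mirrors the lookup logic
the patch adds to get_cluster_offset(), but it is a simplified illustration
with made-up names - not code taken from the patch.)

    #include <stdio.h>
    #include <stdint.h>

    enum se_grain_state {
        SE_GRAIN_UNALLOCATED,   /* top nibble 0x0 - never written */
        SE_GRAIN_ZERO,          /* top nibble 0x1 (unmapped) or 0x2 (zero) */
        SE_GRAIN_ALLOCATED,     /* top nibble 0x3 - points to a data grain */
        SE_GRAIN_INVALID,
    };

    /*
     * Grain directory entry: valid entries have their top nibble set to
     * 0x1 and carry the grain table *index* in the low 32 bits.
     * Returns the index, 0 for "no grain table", or -1 if invalid.
     */
    static int64_t se_decode_gde(uint64_t entry)
    {
        if (entry == 0) {
            return 0; /* no grain table allocated */
        }
        if ((entry & 0xffffffff00000000ULL) != 0x1000000000000000ULL) {
            return -1;
        }
        return (int64_t)(entry & 0x00000000ffffffffULL);
    }

    /* Grain table entry: decode the state and, if allocated, the grain index. */
    static enum se_grain_state se_decode_gte(uint64_t entry, uint64_t *index)
    {
        switch (entry & 0xf000000000000000ULL) {
        case 0x0000000000000000ULL:
            /* the remaining bits must all be zero */
            return entry ? SE_GRAIN_INVALID : SE_GRAIN_UNALLOCATED;
        case 0x1000000000000000ULL: /* SCSI-unmapped - reads back as zeroes */
        case 0x2000000000000000ULL: /* zero grain */
            return SE_GRAIN_ZERO;
        case 0x3000000000000000ULL:
            /* allocated: the index is split across the entry's bit fields */
            *index = ((entry & 0x0fff000000000000ULL) >> 48) |
                     ((entry & 0x0000ffffffffffffULL) << 12);
            return SE_GRAIN_ALLOCATED;
        default:
            return SE_GRAIN_INVALID;
        }
    }

    int main(void)
    {
        uint64_t index = 0;
        uint64_t entry = 0x3000000000000000ULL; /* allocated grain, index 0 */

        (void)se_decode_gde(0x1000000000000000ULL); /* grain table index 0 */
        if (se_decode_gte(entry, &index) == SE_GRAIN_ALLOCATED) {
            printf("grain index: %llu\n", (unsigned long long)index);
        }
        return 0;
    }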
* Re: [Qemu-devel] [PATCH v2 3/3] vmdk: Add read-only support for seSparse snapshots

From: Max Reitz @ 2019-06-19 17:12 UTC
To: Sam Eiderman, kwolf, qemu-block, qemu-devel
Cc: arbel.moshe, liran.alon, eyal.moscovici, karl.heubaum

On 05.06.19 14:17, Sam Eiderman wrote:
> Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
> QEMU).
>
> [...]
>
> In order to implement seSparse, some fields had to be changed to
> support both 32-bit and 64-bit entry sizes.
>
> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
> Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
> ---
>  block/vmdk.c | 357 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 341 insertions(+), 16 deletions(-)
>
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 931eb2759c..4377779635 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c

[...]

> +static int vmdk_open_se_sparse(BlockDriverState *bs,
> +                               BdrvChild *file,
> +                               int flags, Error **errp)
> +{
> +    int ret;
> +    VMDKSESparseConstHeader const_header;
> +    VMDKSESparseVolatileHeader volatile_header;
> +    VmdkExtent *extent;
> +
> +    if (flags & BDRV_O_RDWR) {
> +        error_setg(errp, "No write support for seSparse images available");
> +        return -ENOTSUP;
> +    }

Kind of works for me, but why not bdrv_apply_auto_read_only() like I had
proposed? The advantage is that this would make the node read-only if
the user has specified auto-read-only=on instead of failing.

Max
* Re: [Qemu-devel] [PATCH v2 3/3] vmdk: Add read-only support for seSparse snapshots

From: Sam Eiderman @ 2019-06-20 8:48 UTC
To: Max Reitz
Cc: kwolf, eyal.moscovici, qemu-block, arbel.moshe, qemu-devel, liran.alon, karl.heubaum

> On 19 Jun 2019, at 20:12, Max Reitz <mreitz@redhat.com> wrote:
>
> On 05.06.19 14:17, Sam Eiderman wrote:
>> Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
>> QEMU).
>>
>> [...]
>>
>> +static int vmdk_open_se_sparse(BlockDriverState *bs,
>> +                               BdrvChild *file,
>> +                               int flags, Error **errp)
>> +{
>> +    int ret;
>> +    VMDKSESparseConstHeader const_header;
>> +    VMDKSESparseVolatileHeader volatile_header;
>> +    VmdkExtent *extent;
>> +
>> +    if (flags & BDRV_O_RDWR) {
>> +        error_setg(errp, "No write support for seSparse images available");
>> +        return -ENOTSUP;
>> +    }
>
> Kind of works for me, but why not bdrv_apply_auto_read_only() like I had
> proposed? The advantage is that this would make the node read-only if
> the user has specified auto-read-only=on instead of failing.

Ah, I had not realized that bdrv_apply_auto_read_only() is preferred.
I'll send a v3.

Sam

> Max
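
(Aside: the v3 change Sam promises here would presumably replace the
open-flag check with bdrv_apply_auto_read_only(), which either downgrades the
node to read-only - when the user allows it via auto-read-only=on - or fails
the open. A sketch of what that might look like, assuming the helper's
signature at the time (bs, error message, errp) and a negative errno return
on failure:)

    -    if (flags & BDRV_O_RDWR) {
    -        error_setg(errp, "No write support for seSparse images available");
    -        return -ENOTSUP;
    -    }
    +    ret = bdrv_apply_auto_read_only(bs,
    +            "No write support for seSparse images available", errp);
    +    if (ret < 0) {
    +        return ret;
    +    }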
* Re: [Qemu-devel] [PATCH v2 0/3] vmdk: Add read-only support for the new seSparse format

From: Sam Eiderman @ 2019-06-19 9:31 UTC
To: kwolf, qemu-block, QEMU, Max Reitz
Cc: arbel.moshe, liran.alon, Eyal Moscovici, karl.heubaum

Gentle ping

> On 5 Jun 2019, at 15:17, Sam Eiderman <shmuel.eiderman@oracle.com> wrote:
>
> v1:
>
> VMware introduced a new snapshot format in VMFS6 - seSparse (Space
> Efficient Sparse) - which is the default format available in ESXi 6.7.
> Add read-only support for the new snapshot format.
>
> v2:
>
> Fixed after Max's review:
>
> * Removed strict sesparse checks
> * Reduced maximal L1 table size
> * Added non-write mode check in vmdk_open() on sesparse
>
> Sam Eiderman (3):
>  vmdk: Fix comment regarding max l1_size coverage
>  vmdk: Reduce the max bound for L1 table size
>  vmdk: Add read-only support for seSparse snapshots
>
> block/vmdk.c               | 371 ++++++++++++++++++++++++++++++++++++++++++---
> tests/qemu-iotests/059.out |   2 +-
> 2 files changed, 352 insertions(+), 21 deletions(-)
>
> --
> 2.13.3
end of thread, other threads:[~2019-06-20  9:29 UTC | newest]

Thread overview: 9+ messages
2019-06-05 12:17 [Qemu-devel] [PATCH v2 0/3] vmdk: Add read-only support for the new seSparse format Sam Eiderman
2019-06-05 12:17 ` [Qemu-devel] [PATCH v2 1/3] vmdk: Fix comment regarding max l1_size coverage Sam Eiderman
2019-06-19 17:10   ` Max Reitz
2019-06-05 12:17 ` [Qemu-devel] [PATCH v2 2/3] vmdk: Reduce the max bound for L1 table size Sam Eiderman
2019-06-19 17:09   ` Max Reitz
2019-06-05 12:17 ` [Qemu-devel] [PATCH v2 3/3] vmdk: Add read-only support for seSparse snapshots Sam Eiderman
2019-06-19 17:12   ` Max Reitz
2019-06-20  8:48     ` Sam Eiderman
2019-06-19  9:31 ` [Qemu-devel] [PATCH v2 0/3] vmdk: Add read-only support for the new seSparse format Sam Eiderman