All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v6 0/8] Optimize VMDK I/O by allocating multiple clusters
@ 2017-06-05  7:52 Ashijeet Acharya
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 1/8] vmdk: Move vmdk_find_offset_in_cluster() to the top Ashijeet Acharya
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Ashijeet Acharya @ 2017-06-05  7:52 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block, Ashijeet Acharya

Previously posted series patches:
v1 - http://lists.nongnu.org/archive/html/qemu-devel/2017-03/msg02044.html
v2 - http://lists.nongnu.org/archive/html/qemu-devel/2017-03/msg05080.html
v3 - http://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg00074.html
v4 - http://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg03851.html
v5 - http://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg00929.html

This series helps to optimize the I/O performance of VMDK driver.

Patch 1 helps us to move vmdk_find_offset_in_cluster.

Patch 2 & 3 perform a simple function re-naming tasks.

Patch 4 is used to factor out metadata loading code and implement it in separate
functions. This will help us to avoid code duplication in future patches of this
series.

Patch 5 helps to set the upper limit of the bytes handled in one cycle.

Patch 6 adds new functions to help us allocate multiple clusters according to
the size requested, perform COW if required and return the offset of the first
newly allocated cluster.

Patch 7 changes the metadata update code to update the L2 tables for multiple
clusters at once.

Patch 8 helps us to finally change vmdk_get_cluster_offset() to find cluster offset
only as cluster allocation task is now handled by vmdk_alloc_clusters()

Optimization test results:

This patch series improves 128 KB sequential write performance to an
empty VMDK file by 54%

Benchmark command: ./qemu-img bench -w -c 1024 -s 128K -d 1 -t none -f
vmdk test.vmdk

Changes in v6:
- rename total_alloc_clusters as alloc_clusters_counter (fam)

Changes in v5:
- fix commit message and comment in patch 4 (fam)
- add vmdk_ prefix to handle_alloc() (fam)
- fix alignment issue in patch 6 (fam)
- use BDRV_SECTOR_BITS (fam)
- fix endianness calculation in patch 7 (fam)

Changes in v4:
- fix commit message in patch 1 (fam)
- drop size_to_clusters() function (fam)
- fix grammatical errors in function documentations (fam)
- factor out metadata loading coding in a separate patch (patch 4) (fam)
- rename vmdk_alloc_cluster_offset() to vmdk_alloc_clusters() (fam)
- break patch 4(in v3) into separate patches (patch 3 and 8) (fam)
- rename extent_size to extent_end (fam)
- use QEMU_ALIGN_UP instead of vmdk_align_offset. (fam)
- drop next and simply do m_data = m_data->next (fam)

Changes in v3:
- move size_to_clusters() from patch 1 to 3 (fam)
- use DIV_ROUND_UP in size_to_clusters (fam)
- make patch 2 compilable (fam)
- rename vmdk_L2update as vmdk_l2update and use UINT32_MAX (fam)
- combine patch 3 and patch 4 (as in v2) to make them compilable (fam)
- call bdrv_pwrite_sync() for batches of atmost 512 clusters at once (fam)

Changes in v2:
- segregate the ugly Patch 1 in v1 into 6 readable and sensible patches
- include benchmark test results in v2

Ashijeet Acharya (8):
  vmdk: Move vmdk_find_offset_in_cluster() to the top
  vmdk: Rename get_whole_cluster() to vmdk_perform_cow()
  vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset()
  vmdk: Factor out metadata loading code out of
    vmdk_get_cluster_offset()
  vmdk: Set maximum bytes allocated in one cycle
  vmdk: New functions to assist allocating multiple clusters
  vmdk: Update metadata for multiple clusters
  vmdk: Make vmdk_get_cluster_offset() return cluster offset only

 block/vmdk.c | 529 +++++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 407 insertions(+), 122 deletions(-)

-- 
2.6.2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v6 1/8] vmdk: Move vmdk_find_offset_in_cluster() to the top
  2017-06-05  7:52 [Qemu-devel] [PATCH v6 0/8] Optimize VMDK I/O by allocating multiple clusters Ashijeet Acharya
@ 2017-06-05  7:52 ` Ashijeet Acharya
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 2/8] vmdk: Rename get_whole_cluster() to vmdk_perform_cow() Ashijeet Acharya
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Ashijeet Acharya @ 2017-06-05  7:52 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block, Ashijeet Acharya

Move the existing vmdk_find_offset_in_cluster() function to the top of
the driver.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 block/vmdk.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a9bd22b..22be887 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -242,6 +242,18 @@ static void vmdk_free_last_extent(BlockDriverState *bs)
     s->extents = g_renew(VmdkExtent, s->extents, s->num_extents);
 }
 
+static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
+                                                   int64_t offset)
+{
+    uint64_t extent_begin_offset, extent_relative_offset;
+    uint64_t cluster_size = extent->cluster_sectors * BDRV_SECTOR_SIZE;
+
+    extent_begin_offset =
+        (extent->end_sector - extent->sectors) * BDRV_SECTOR_SIZE;
+    extent_relative_offset = offset - extent_begin_offset;
+    return extent_relative_offset % cluster_size;
+}
+
 static uint32_t vmdk_read_cid(BlockDriverState *bs, int parent)
 {
     char *desc;
@@ -1266,18 +1278,6 @@ static VmdkExtent *find_extent(BDRVVmdkState *s,
     return NULL;
 }
 
-static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
-                                                   int64_t offset)
-{
-    uint64_t extent_begin_offset, extent_relative_offset;
-    uint64_t cluster_size = extent->cluster_sectors * BDRV_SECTOR_SIZE;
-
-    extent_begin_offset =
-        (extent->end_sector - extent->sectors) * BDRV_SECTOR_SIZE;
-    extent_relative_offset = offset - extent_begin_offset;
-    return extent_relative_offset % cluster_size;
-}
-
 static inline uint64_t vmdk_find_index_in_cluster(VmdkExtent *extent,
                                                   int64_t sector_num)
 {
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v6 2/8] vmdk: Rename get_whole_cluster() to vmdk_perform_cow()
  2017-06-05  7:52 [Qemu-devel] [PATCH v6 0/8] Optimize VMDK I/O by allocating multiple clusters Ashijeet Acharya
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 1/8] vmdk: Move vmdk_find_offset_in_cluster() to the top Ashijeet Acharya
@ 2017-06-05  7:52 ` Ashijeet Acharya
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 3/8] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset() Ashijeet Acharya
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Ashijeet Acharya @ 2017-06-05  7:52 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block, Ashijeet Acharya

Rename the existing function get_whole_cluster() to vmdk_perform_cow()
as its sole purpose is to perform COW for the first and the last
allocated clusters if needed.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 block/vmdk.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 22be887..73ae786 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1028,8 +1028,8 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
     }
 }
 
-/**
- * get_whole_cluster
+/*
+ * vmdk_perform_cow
  *
  * Copy backing file's cluster that covers @sector_num, otherwise write zero,
  * to the cluster at @cluster_sector_num.
@@ -1037,13 +1037,18 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)
  * If @skip_start_sector < @skip_end_sector, the relative range
  * [@skip_start_sector, @skip_end_sector) is not copied or written, and leave
  * it for call to write user data in the request.
+ *
+ * Returns:
+ *   VMDK_OK:       on success
+ *
+ *   VMDK_ERROR:    in error cases
  */
-static int get_whole_cluster(BlockDriverState *bs,
-                             VmdkExtent *extent,
-                             uint64_t cluster_offset,
-                             uint64_t offset,
-                             uint64_t skip_start_bytes,
-                             uint64_t skip_end_bytes)
+static int vmdk_perform_cow(BlockDriverState *bs,
+                            VmdkExtent *extent,
+                            uint64_t cluster_offset,
+                            uint64_t offset,
+                            uint64_t skip_start_bytes,
+                            uint64_t skip_end_bytes)
 {
     int ret = VMDK_OK;
     int64_t cluster_bytes;
@@ -1244,7 +1249,7 @@ static int get_cluster_offset(BlockDriverState *bs,
          * This problem may occur because of insufficient space on host disk
          * or inappropriate VM shutdown.
          */
-        ret = get_whole_cluster(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
+        ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
                                 offset, skip_start_bytes, skip_end_bytes);
         if (ret) {
             return ret;
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v6 3/8] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset()
  2017-06-05  7:52 [Qemu-devel] [PATCH v6 0/8] Optimize VMDK I/O by allocating multiple clusters Ashijeet Acharya
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 1/8] vmdk: Move vmdk_find_offset_in_cluster() to the top Ashijeet Acharya
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 2/8] vmdk: Rename get_whole_cluster() to vmdk_perform_cow() Ashijeet Acharya
@ 2017-06-05  7:52 ` Ashijeet Acharya
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 4/8] vmdk: Factor out metadata loading code out of vmdk_get_cluster_offset() Ashijeet Acharya
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Ashijeet Acharya @ 2017-06-05  7:52 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block, Ashijeet Acharya

Rename the existing get_cluster_offset() to vmdk_get_cluster_offset()
and update name in all the callers accordingly.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 block/vmdk.c | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 73ae786..f403981 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1144,7 +1144,7 @@ static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
 }
 
 /**
- * get_cluster_offset
+ * vmdk_get_cluster_offset
  *
  * Look up cluster offset in extent file by sector number, and store in
  * @cluster_offset.
@@ -1163,14 +1163,14 @@ static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
  *          VMDK_UNALLOC if cluster is not mapped and @allocate is false.
  *          VMDK_ERROR if failed.
  */
-static int get_cluster_offset(BlockDriverState *bs,
-                              VmdkExtent *extent,
-                              VmdkMetaData *m_data,
-                              uint64_t offset,
-                              bool allocate,
-                              uint64_t *cluster_offset,
-                              uint64_t skip_start_bytes,
-                              uint64_t skip_end_bytes)
+static int vmdk_get_cluster_offset(BlockDriverState *bs,
+                                   VmdkExtent *extent,
+                                   VmdkMetaData *m_data,
+                                   uint64_t offset,
+                                   bool allocate,
+                                   uint64_t *cluster_offset,
+                                   uint64_t skip_start_bytes,
+                                   uint64_t skip_end_bytes)
 {
     unsigned int l1_index, l2_offset, l2_index;
     int min_index, i, j;
@@ -1304,9 +1304,9 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
         return 0;
     }
     qemu_co_mutex_lock(&s->lock);
-    ret = get_cluster_offset(bs, extent, NULL,
-                             sector_num * 512, false, &offset,
-                             0, 0);
+    ret = vmdk_get_cluster_offset(bs, extent, NULL,
+                                  sector_num * 512, false, &offset,
+                                  0, 0);
     qemu_co_mutex_unlock(&s->lock);
 
     index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
@@ -1497,8 +1497,8 @@ vmdk_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
             ret = -EIO;
             goto fail;
         }
-        ret = get_cluster_offset(bs, extent, NULL,
-                                 offset, false, &cluster_offset, 0, 0);
+        ret = vmdk_get_cluster_offset(bs, extent, NULL,
+                                      offset, false, &cluster_offset, 0, 0);
         offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
 
         n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
@@ -1584,10 +1584,10 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
         n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
                              - offset_in_cluster);
 
-        ret = get_cluster_offset(bs, extent, &m_data, offset,
-                                 !(extent->compressed || zeroed),
-                                 &cluster_offset, offset_in_cluster,
-                                 offset_in_cluster + n_bytes);
+        ret = vmdk_get_cluster_offset(bs, extent, &m_data, offset,
+                                      !(extent->compressed || zeroed),
+                                      &cluster_offset, offset_in_cluster,
+                                      offset_in_cluster + n_bytes);
         if (extent->compressed) {
             if (ret == VMDK_OK) {
                 /* Refuse write to allocated cluster for streamOptimized */
@@ -1596,8 +1596,8 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
                 return -EIO;
             } else {
                 /* allocate */
-                ret = get_cluster_offset(bs, extent, &m_data, offset,
-                                         true, &cluster_offset, 0, 0);
+                ret = vmdk_get_cluster_offset(bs, extent, &m_data, offset,
+                                              true, &cluster_offset, 0, 0);
             }
         }
         if (ret == VMDK_ERROR) {
@@ -2229,9 +2229,9 @@ static int vmdk_check(BlockDriverState *bs, BdrvCheckResult *result,
                     sector_num);
             break;
         }
-        ret = get_cluster_offset(bs, extent, NULL,
-                                 sector_num << BDRV_SECTOR_BITS,
-                                 false, &cluster_offset, 0, 0);
+        ret = vmdk_get_cluster_offset(bs, extent, NULL,
+                                      sector_num << BDRV_SECTOR_BITS,
+                                      false, &cluster_offset, 0, 0);
         if (ret == VMDK_ERROR) {
             fprintf(stderr,
                     "ERROR: could not get cluster_offset for sector %"
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v6 4/8] vmdk: Factor out metadata loading code out of vmdk_get_cluster_offset()
  2017-06-05  7:52 [Qemu-devel] [PATCH v6 0/8] Optimize VMDK I/O by allocating multiple clusters Ashijeet Acharya
                   ` (2 preceding siblings ...)
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 3/8] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset() Ashijeet Acharya
@ 2017-06-05  7:52 ` Ashijeet Acharya
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 5/8] vmdk: Set maximum bytes allocated in one cycle Ashijeet Acharya
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Ashijeet Acharya @ 2017-06-05  7:52 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block, Ashijeet Acharya

Move the cluster tables loading code out of the existing
vmdk_get_cluster_offset() function and implement it in separate
get_cluster_table() and vmdk_l2load() functions.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 block/vmdk.c | 153 ++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 105 insertions(+), 48 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index f403981..5647f53 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1143,6 +1143,105 @@ static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
     return VMDK_OK;
 }
 
+/*
+ * vmdk_l2load
+ *
+ * Load a new L2 table into memory. If the table is in the cache, the cache
+ * is used; otherwise the L2 table is loaded from the image file.
+ *
+ * Returns:
+ *   VMDK_OK:       on success
+ *   VMDK_ERROR:    in error cases
+ */
+static int vmdk_l2load(VmdkExtent *extent, uint64_t offset, int l2_offset,
+                       uint32_t **new_l2_table, int *new_l2_index)
+{
+    int min_index, i, j;
+    uint32_t *l2_table;
+    uint32_t min_count;
+
+    for (i = 0; i < L2_CACHE_SIZE; i++) {
+        if (l2_offset == extent->l2_cache_offsets[i]) {
+            /* increment the hit count */
+            if (++extent->l2_cache_counts[i] == UINT32_MAX) {
+                for (j = 0; j < L2_CACHE_SIZE; j++) {
+                    extent->l2_cache_counts[j] >>= 1;
+                }
+            }
+            l2_table = extent->l2_cache + (i * extent->l2_size);
+            goto found;
+        }
+    }
+    /* not found: load a new entry in the least used one */
+    min_index = 0;
+    min_count = UINT32_MAX;
+    for (i = 0; i < L2_CACHE_SIZE; i++) {
+        if (extent->l2_cache_counts[i] < min_count) {
+            min_count = extent->l2_cache_counts[i];
+            min_index = i;
+        }
+    }
+    l2_table = extent->l2_cache + (min_index * extent->l2_size);
+    if (bdrv_pread(extent->file,
+                (int64_t)l2_offset * 512,
+                l2_table,
+                extent->l2_size * sizeof(uint32_t)
+            ) != extent->l2_size * sizeof(uint32_t)) {
+        return VMDK_ERROR;
+    }
+
+    extent->l2_cache_offsets[min_index] = l2_offset;
+    extent->l2_cache_counts[min_index] = 1;
+found:
+    *new_l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
+    *new_l2_table = l2_table;
+
+    return VMDK_OK;
+}
+
+/*
+ * get_cluster_table
+ *
+ * For a given offset, load (and allocate if needed) the l2 table.
+ *
+ * Returns:
+ *   VMDK_OK:        on success
+ *
+ *   VMDK_UNALLOC:   if cluster is not mapped
+ *
+ *   VMDK_ERROR:     in error cases
+ */
+static int get_cluster_table(VmdkExtent *extent, uint64_t offset,
+                             int *new_l1_index, int *new_l2_offset,
+                             int *new_l2_index, uint32_t **new_l2_table)
+{
+    int l1_index, l2_offset, l2_index;
+    uint32_t *l2_table;
+    int ret;
+
+    offset -= (extent->end_sector - extent->sectors) * SECTOR_SIZE;
+    l1_index = (offset >> 9) / extent->l1_entry_sectors;
+    if (l1_index >= extent->l1_size) {
+        return VMDK_ERROR;
+    }
+    l2_offset = extent->l1_table[l1_index];
+    if (!l2_offset) {
+        return VMDK_UNALLOC;
+    }
+
+    ret = vmdk_l2load(extent, offset, l2_offset, &l2_table, &l2_index);
+    if (ret < 0) {
+        return ret;
+    }
+
+    *new_l1_index = l1_index;
+    *new_l2_offset = l2_offset;
+    *new_l2_index = l2_index;
+    *new_l2_table = l2_table;
+
+    return VMDK_OK;
+}
+
 /**
  * vmdk_get_cluster_offset
  *
@@ -1172,66 +1271,24 @@ static int vmdk_get_cluster_offset(BlockDriverState *bs,
                                    uint64_t skip_start_bytes,
                                    uint64_t skip_end_bytes)
 {
-    unsigned int l1_index, l2_offset, l2_index;
-    int min_index, i, j;
-    uint32_t min_count, *l2_table;
+    int l1_index, l2_offset, l2_index;
+    uint32_t *l2_table;
     bool zeroed = false;
     int64_t ret;
     int64_t cluster_sector;
 
-    if (m_data) {
-        m_data->valid = 0;
-    }
     if (extent->flat) {
         *cluster_offset = extent->flat_start_offset;
         return VMDK_OK;
     }
 
-    offset -= (extent->end_sector - extent->sectors) * SECTOR_SIZE;
-    l1_index = (offset >> 9) / extent->l1_entry_sectors;
-    if (l1_index >= extent->l1_size) {
-        return VMDK_ERROR;
-    }
-    l2_offset = extent->l1_table[l1_index];
-    if (!l2_offset) {
-        return VMDK_UNALLOC;
-    }
-    for (i = 0; i < L2_CACHE_SIZE; i++) {
-        if (l2_offset == extent->l2_cache_offsets[i]) {
-            /* increment the hit count */
-            if (++extent->l2_cache_counts[i] == 0xffffffff) {
-                for (j = 0; j < L2_CACHE_SIZE; j++) {
-                    extent->l2_cache_counts[j] >>= 1;
-                }
-            }
-            l2_table = extent->l2_cache + (i * extent->l2_size);
-            goto found;
-        }
-    }
-    /* not found: load a new entry in the least used one */
-    min_index = 0;
-    min_count = 0xffffffff;
-    for (i = 0; i < L2_CACHE_SIZE; i++) {
-        if (extent->l2_cache_counts[i] < min_count) {
-            min_count = extent->l2_cache_counts[i];
-            min_index = i;
-        }
-    }
-    l2_table = extent->l2_cache + (min_index * extent->l2_size);
-    if (bdrv_pread(extent->file,
-                (int64_t)l2_offset * 512,
-                l2_table,
-                extent->l2_size * sizeof(uint32_t)
-            ) != extent->l2_size * sizeof(uint32_t)) {
-        return VMDK_ERROR;
+    ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
+                            &l2_index, &l2_table);
+    if (ret < 0) {
+        return ret;
     }
 
-    extent->l2_cache_offsets[min_index] = l2_offset;
-    extent->l2_cache_counts[min_index] = 1;
- found:
-    l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
     cluster_sector = le32_to_cpu(l2_table[l2_index]);
-
     if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
         zeroed = true;
     }
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v6 5/8] vmdk: Set maximum bytes allocated in one cycle
  2017-06-05  7:52 [Qemu-devel] [PATCH v6 0/8] Optimize VMDK I/O by allocating multiple clusters Ashijeet Acharya
                   ` (3 preceding siblings ...)
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 4/8] vmdk: Factor out metadata loading code out of vmdk_get_cluster_offset() Ashijeet Acharya
@ 2017-06-05  7:52 ` Ashijeet Acharya
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 6/8] vmdk: New functions to assist allocating multiple clusters Ashijeet Acharya
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Ashijeet Acharya @ 2017-06-05  7:52 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block, Ashijeet Acharya

Set the maximum bytes allowed to get allocated at once to be not more
than the extent size boundary to handle writes at two separate extents
appropriately.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 block/vmdk.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 5647f53..fe2046b 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1624,6 +1624,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
     uint64_t cluster_offset;
     uint64_t bytes_done = 0;
     VmdkMetaData m_data;
+    uint64_t extent_end;
 
     if (DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) > bs->total_sectors) {
         error_report("Wrong offset: offset=0x%" PRIx64
@@ -1637,9 +1638,17 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
         if (!extent) {
             return -EIO;
         }
+        extent_end = extent->end_sector * BDRV_SECTOR_SIZE;
+
         offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
-        n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
-                             - offset_in_cluster);
+
+        /* truncate n_bytes to first cluster because we need to perform COW */
+        if (offset_in_cluster > 0) {
+            n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
+                                 - offset_in_cluster);
+        } else {
+            n_bytes = MIN(bytes, extent_end - offset);
+        }
 
         ret = vmdk_get_cluster_offset(bs, extent, &m_data, offset,
                                       !(extent->compressed || zeroed),
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v6 6/8] vmdk: New functions to assist allocating multiple clusters
  2017-06-05  7:52 [Qemu-devel] [PATCH v6 0/8] Optimize VMDK I/O by allocating multiple clusters Ashijeet Acharya
                   ` (4 preceding siblings ...)
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 5/8] vmdk: Set maximum bytes allocated in one cycle Ashijeet Acharya
@ 2017-06-05  7:52 ` Ashijeet Acharya
  2017-06-27  8:02   ` Fam Zheng
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for " Ashijeet Acharya
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 8/8] vmdk: Make vmdk_get_cluster_offset() return cluster offset only Ashijeet Acharya
  7 siblings, 1 reply; 15+ messages in thread
From: Ashijeet Acharya @ 2017-06-05  7:52 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block, Ashijeet Acharya

Introduce two new helper functions handle_alloc() and
vmdk_alloc_cluster_offset(). handle_alloc() helps to allocate multiple
clusters at once starting from a given offset on disk and performs COW
if necessary for first and last allocated clusters.
vmdk_alloc_cluster_offset() helps to return the offset of the first of
the many newly allocated clusters. Also, provide proper documentation
for both.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
---
 block/vmdk.c | 192 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 182 insertions(+), 10 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index fe2046b..b671dc9 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -136,6 +136,7 @@ typedef struct VmdkMetaData {
     unsigned int l2_offset;
     int valid;
     uint32_t *l2_cache_entry;
+    uint32_t nb_clusters;
 } VmdkMetaData;
 
 typedef struct VmdkGrainMarker {
@@ -1242,6 +1243,174 @@ static int get_cluster_table(VmdkExtent *extent, uint64_t offset,
     return VMDK_OK;
 }
 
+/*
+ * vmdk_handle_alloc
+ *
+ * Allocate new clusters for an area that either is yet unallocated or needs a
+ * copy on write. If *cluster_offset is non_zero, clusters are only allocated if
+ * the new allocation can match the specified host offset.
+ *
+ * Returns:
+ *   VMDK_OK:       if new clusters were allocated, *bytes may be decreased if
+ *                  the new allocation doesn't cover all of the requested area.
+ *                  *cluster_offset is updated to contain the offset of the
+ *                  first newly allocated cluster.
+ *
+ *   VMDK_UNALLOC:  if no clusters could be allocated. *cluster_offset is left
+ *                  unchanged.
+ *
+ *   VMDK_ERROR:    in error cases
+ */
+static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
+                             uint64_t offset, uint64_t *cluster_offset,
+                             int64_t *bytes, VmdkMetaData *m_data,
+                             bool allocate, uint32_t *alloc_clusters_counter)
+{
+    int l1_index, l2_offset, l2_index;
+    uint32_t *l2_table;
+    uint32_t cluster_sector;
+    uint32_t nb_clusters;
+    bool zeroed = false;
+    uint64_t skip_start_bytes, skip_end_bytes;
+    int ret;
+
+    ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
+                            &l2_index, &l2_table);
+    if (ret < 0) {
+        return ret;
+    }
+
+    cluster_sector = le32_to_cpu(l2_table[l2_index]);
+
+    skip_start_bytes = vmdk_find_offset_in_cluster(extent, offset);
+    /* Calculate the number of clusters to look for. Here we truncate the last
+     * cluster, i.e. 1 less than the actual value calculated as we may need to
+     * perform COW for the last one. */
+    nb_clusters = DIV_ROUND_UP(skip_start_bytes + *bytes,
+                               extent->cluster_sectors << BDRV_SECTOR_BITS) - 1;
+
+    nb_clusters = MIN(nb_clusters, extent->l2_size - l2_index);
+    assert(nb_clusters <= INT_MAX);
+
+    /* update bytes according to final nb_clusters value */
+    if (nb_clusters != 0) {
+        *bytes = ((nb_clusters * extent->cluster_sectors) << BDRV_SECTOR_BITS)
+                 - skip_start_bytes;
+    } else {
+        nb_clusters = 1;
+    }
+    *alloc_clusters_counter += nb_clusters;
+    skip_end_bytes = skip_start_bytes + MIN(*bytes,
+                     extent->cluster_sectors * BDRV_SECTOR_SIZE
+                                    - skip_start_bytes);
+
+    if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
+        zeroed = true;
+    }
+
+    if (!cluster_sector || zeroed) {
+        if (!allocate) {
+            return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
+        }
+
+        cluster_sector = extent->next_cluster_sector;
+        extent->next_cluster_sector += extent->cluster_sectors
+                                                * nb_clusters;
+
+        ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
+                               offset, skip_start_bytes,
+                               skip_end_bytes);
+        if (ret < 0) {
+            return ret;
+        }
+        if (m_data) {
+            m_data->valid = 1;
+            m_data->l1_index = l1_index;
+            m_data->l2_index = l2_index;
+            m_data->l2_offset = l2_offset;
+            m_data->l2_cache_entry = &l2_table[l2_index];
+            m_data->nb_clusters = nb_clusters;
+        }
+    }
+    *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
+    return VMDK_OK;
+}
+
+/*
+ * vmdk_alloc_clusters
+ *
+ * For a given offset on the virtual disk, find the cluster offset in vmdk
+ * file. If the offset is not found, allocate a new cluster.
+ *
+ * If the cluster is newly allocated, m_data->nb_clusters is set to the number
+ * of contiguous clusters that have been allocated. In this case, the other
+ * fields of m_data are valid and contain information about the first allocated
+ * cluster.
+ *
+ * Returns:
+ *
+ *   VMDK_OK:           on success and @cluster_offset was set
+ *
+ *   VMDK_UNALLOC:      if no clusters were allocated and @cluster_offset is
+ *                      set to zero
+ *
+ *   VMDK_ERROR:        in error cases
+ */
+static int vmdk_alloc_clusters(BlockDriverState *bs,
+                               VmdkExtent *extent,
+                               VmdkMetaData *m_data, uint64_t offset,
+                               bool allocate, uint64_t *cluster_offset,
+                               int64_t bytes,
+                               uint32_t *total_alloc_clusters)
+{
+    uint64_t start, remaining;
+    uint64_t new_cluster_offset;
+    int64_t n_bytes;
+    int ret;
+
+    if (extent->flat) {
+        *cluster_offset = extent->flat_start_offset;
+        return VMDK_OK;
+    }
+
+    start = offset;
+    remaining = bytes;
+    new_cluster_offset = 0;
+    *cluster_offset = 0;
+    n_bytes = 0;
+    if (m_data) {
+        m_data->valid = 0;
+    }
+
+    /* due to L2 table margins all bytes may not get allocated at once */
+    while (true) {
+
+        if (!*cluster_offset) {
+            *cluster_offset = new_cluster_offset;
+        }
+
+        start              += n_bytes;
+        remaining          -= n_bytes;
+        new_cluster_offset += n_bytes;
+
+        if (remaining == 0) {
+            break;
+        }
+
+        n_bytes = remaining;
+
+        ret = vmdk_handle_alloc(bs, extent, start, &new_cluster_offset, &n_bytes,
+                                m_data, allocate, total_alloc_clusters);
+
+        if (ret < 0) {
+            return ret;
+
+        }
+    }
+
+    return VMDK_OK;
+}
+
 /**
  * vmdk_get_cluster_offset
  *
@@ -1625,6 +1794,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
     uint64_t bytes_done = 0;
     VmdkMetaData m_data;
     uint64_t extent_end;
+    uint32_t total_alloc_clusters = 0;
 
     if (DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) > bs->total_sectors) {
         error_report("Wrong offset: offset=0x%" PRIx64
@@ -1650,10 +1820,10 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
             n_bytes = MIN(bytes, extent_end - offset);
         }
 
-        ret = vmdk_get_cluster_offset(bs, extent, &m_data, offset,
-                                      !(extent->compressed || zeroed),
-                                      &cluster_offset, offset_in_cluster,
-                                      offset_in_cluster + n_bytes);
+        ret = vmdk_alloc_clusters(bs, extent, &m_data, offset,
+                                  !(extent->compressed || zeroed),
+                                  &cluster_offset, n_bytes,
+                                  &total_alloc_clusters);
         if (extent->compressed) {
             if (ret == VMDK_OK) {
                 /* Refuse write to allocated cluster for streamOptimized */
@@ -1662,8 +1832,9 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
                 return -EIO;
             } else {
                 /* allocate */
-                ret = vmdk_get_cluster_offset(bs, extent, &m_data, offset,
-                                              true, &cluster_offset, 0, 0);
+                ret = vmdk_alloc_clusters(bs, extent, &m_data, offset,
+                                          true, &cluster_offset, n_bytes,
+                                          &total_alloc_clusters);
             }
         }
         if (ret == VMDK_ERROR) {
@@ -1671,10 +1842,11 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
         }
         if (zeroed) {
             /* Do zeroed write, buf is ignored */
-            if (extent->has_zero_grain &&
-                    offset_in_cluster == 0 &&
-                    n_bytes >= extent->cluster_sectors * BDRV_SECTOR_SIZE) {
-                n_bytes = extent->cluster_sectors * BDRV_SECTOR_SIZE;
+            if (extent->has_zero_grain && offset_in_cluster == 0 &&
+                    n_bytes >= extent->cluster_sectors * BDRV_SECTOR_SIZE *
+                        total_alloc_clusters) {
+                n_bytes = extent->cluster_sectors * BDRV_SECTOR_SIZE *
+                                        total_alloc_clusters;
                 if (!zero_dry_run) {
                     /* update L2 tables */
                     if (vmdk_L2update(extent, &m_data, VMDK_GTE_ZEROED)
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for multiple clusters
  2017-06-05  7:52 [Qemu-devel] [PATCH v6 0/8] Optimize VMDK I/O by allocating multiple clusters Ashijeet Acharya
                   ` (5 preceding siblings ...)
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 6/8] vmdk: New functions to assist allocating multiple clusters Ashijeet Acharya
@ 2017-06-05  7:52 ` Ashijeet Acharya
  2017-06-27  8:04   ` Fam Zheng
  2017-06-27  8:14   ` Fam Zheng
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 8/8] vmdk: Make vmdk_get_cluster_offset() return cluster offset only Ashijeet Acharya
  7 siblings, 2 replies; 15+ messages in thread
From: Ashijeet Acharya @ 2017-06-05  7:52 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block, Ashijeet Acharya

Include a next pointer in VmdkMetaData struct to point to the previous
allocated L2 table. Modify vmdk_L2update to start updating metadata for
allocation of multiple clusters at once.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
---
 block/vmdk.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 101 insertions(+), 27 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index b671dc9..9fa2414 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -137,6 +137,8 @@ typedef struct VmdkMetaData {
     int valid;
     uint32_t *l2_cache_entry;
     uint32_t nb_clusters;
+    uint32_t offset;
+    struct VmdkMetaData *next;
 } VmdkMetaData;
 
 typedef struct VmdkGrainMarker {
@@ -1116,34 +1118,87 @@ exit:
     return ret;
 }
 
-static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
-                         uint32_t offset)
+static int vmdk_alloc_cluster_link_l2(VmdkExtent *extent,
+                                      VmdkMetaData *m_data, bool zeroed)
 {
-    offset = cpu_to_le32(offset);
+    int i;
+    uint32_t offset, temp_offset;
+    int *l2_table_array;
+    int l2_array_size;
+
+    if (zeroed) {
+        temp_offset = VMDK_GTE_ZEROED;
+    } else {
+        temp_offset = m_data->offset;
+    }
+
+    l2_array_size = sizeof(uint32_t) * m_data->nb_clusters;
+    l2_table_array = qemu_try_blockalign(extent->file->bs,
+                                         QEMU_ALIGN_UP(l2_array_size,
+                                                       BDRV_SECTOR_SIZE));
+    if (l2_table_array == NULL) {
+        return VMDK_ERROR;
+    }
+    memset(l2_table_array, 0, QEMU_ALIGN_UP(l2_array_size, BDRV_SECTOR_SIZE));
     /* update L2 table */
+    offset = temp_offset;
+    for (i = 0; i < m_data->nb_clusters; i++) {
+        l2_table_array[i] = cpu_to_le32(offset);
+        if (!zeroed) {
+            offset += 128;
+        }
+    }
     if (bdrv_pwrite_sync(extent->file,
-                ((int64_t)m_data->l2_offset * 512)
-                    + (m_data->l2_index * sizeof(offset)),
-                &offset, sizeof(offset)) < 0) {
+                         ((int64_t)m_data->l2_offset * 512)
+                         + ((m_data->l2_index) * sizeof(offset)),
+                         l2_table_array, l2_array_size) < 0) {
         return VMDK_ERROR;
     }
     /* update backup L2 table */
     if (extent->l1_backup_table_offset != 0) {
         m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
         if (bdrv_pwrite_sync(extent->file,
-                    ((int64_t)m_data->l2_offset * 512)
-                        + (m_data->l2_index * sizeof(offset)),
-                    &offset, sizeof(offset)) < 0) {
+                             ((int64_t)m_data->l2_offset * 512)
+                             + ((m_data->l2_index) * sizeof(offset)),
+                             l2_table_array, l2_array_size) < 0) {
             return VMDK_ERROR;
         }
     }
+
+    offset = temp_offset;
     if (m_data->l2_cache_entry) {
-        *m_data->l2_cache_entry = offset;
+        for (i = 0; i < m_data->nb_clusters; i++) {
+            *m_data->l2_cache_entry = cpu_to_le32(offset);
+            m_data->l2_cache_entry++;
+
+            if (!zeroed) {
+                offset += 128;
+            }
+        }
     }
 
+    qemu_vfree(l2_table_array);
     return VMDK_OK;
 }
 
+static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
+                         bool zeroed)
+{
+    int ret;
+
+    while (m_data->next != NULL) {
+
+        ret = vmdk_alloc_cluster_link_l2(extent, m_data, zeroed);
+        if (ret < 0) {
+            return ret;
+        }
+
+        m_data = m_data->next;
+     }
+
+     return VMDK_OK;
+}
+
 /*
  * vmdk_l2load
  *
@@ -1261,9 +1316,10 @@ static int get_cluster_table(VmdkExtent *extent, uint64_t offset,
  *
  *   VMDK_ERROR:    in error cases
  */
+
 static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
                              uint64_t offset, uint64_t *cluster_offset,
-                             int64_t *bytes, VmdkMetaData *m_data,
+                             int64_t *bytes, VmdkMetaData **m_data,
                              bool allocate, uint32_t *alloc_clusters_counter)
 {
     int l1_index, l2_offset, l2_index;
@@ -1272,6 +1328,7 @@ static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
     uint32_t nb_clusters;
     bool zeroed = false;
     uint64_t skip_start_bytes, skip_end_bytes;
+    VmdkMetaData *old_m_data;
     int ret;
 
     ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
@@ -1323,13 +1380,21 @@ static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
         if (ret < 0) {
             return ret;
         }
-        if (m_data) {
-            m_data->valid = 1;
-            m_data->l1_index = l1_index;
-            m_data->l2_index = l2_index;
-            m_data->l2_offset = l2_offset;
-            m_data->l2_cache_entry = &l2_table[l2_index];
-            m_data->nb_clusters = nb_clusters;
+
+        if (*m_data) {
+            old_m_data = *m_data;
+            *m_data = g_malloc0(sizeof(**m_data));
+
+            **m_data = (VmdkMetaData) {
+                .valid            =    1,
+                .l1_index         =    l1_index,
+                .l2_index         =    l2_index,
+                .l2_offset        =    l2_offset,
+                .l2_cache_entry   =    &l2_table[l2_index],
+                .nb_clusters      =    nb_clusters,
+                .offset           =    cluster_sector,
+                .next             =    old_m_data,
+            };
         }
     }
     *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
@@ -1358,7 +1423,7 @@ static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
  */
 static int vmdk_alloc_clusters(BlockDriverState *bs,
                                VmdkExtent *extent,
-                               VmdkMetaData *m_data, uint64_t offset,
+                               VmdkMetaData **m_data, uint64_t offset,
                                bool allocate, uint64_t *cluster_offset,
                                int64_t bytes,
                                uint32_t *total_alloc_clusters)
@@ -1378,8 +1443,8 @@ static int vmdk_alloc_clusters(BlockDriverState *bs,
     new_cluster_offset = 0;
     *cluster_offset = 0;
     n_bytes = 0;
-    if (m_data) {
-        m_data->valid = 0;
+    if (*m_data) {
+        (*m_data)->valid = 0;
     }
 
     /* due to L2 table margins all bytes may not get allocated at once */
@@ -1792,10 +1857,12 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
     int64_t offset_in_cluster, n_bytes;
     uint64_t cluster_offset;
     uint64_t bytes_done = 0;
-    VmdkMetaData m_data;
     uint64_t extent_end;
+    VmdkMetaData *m_data;
     uint32_t total_alloc_clusters = 0;
 
+    m_data = g_malloc0(sizeof(*m_data));
+
     if (DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) > bs->total_sectors) {
         error_report("Wrong offset: offset=0x%" PRIx64
                      " total_sectors=0x%" PRIx64,
@@ -1804,6 +1871,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
     }
 
     while (bytes > 0) {
+        m_data->next = NULL;
         extent = find_extent(s, offset >> BDRV_SECTOR_BITS, extent);
         if (!extent) {
             return -EIO;
@@ -1849,7 +1917,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
                                         total_alloc_clusters;
                 if (!zero_dry_run) {
                     /* update L2 tables */
-                    if (vmdk_L2update(extent, &m_data, VMDK_GTE_ZEROED)
+                    if (vmdk_L2update(extent, m_data, zeroed)
                             != VMDK_OK) {
                         return -EIO;
                     }
@@ -1863,11 +1931,9 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
             if (ret) {
                 return ret;
             }
-            if (m_data.valid) {
+            if (m_data->valid) {
                 /* update L2 tables */
-                if (vmdk_L2update(extent, &m_data,
-                                  cluster_offset >> BDRV_SECTOR_BITS)
-                        != VMDK_OK) {
+                if (vmdk_L2update(extent, m_data, zeroed) != VMDK_OK) {
                     return -EIO;
                 }
             }
@@ -1876,6 +1942,13 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
         offset += n_bytes;
         bytes_done += n_bytes;
 
+        while (m_data->next != NULL) {
+            VmdkMetaData *next;
+            next = m_data->next;
+            g_free(m_data);
+            m_data = next;
+        }
+
         /* update CID on the first write every time the virtual disk is
          * opened */
         if (!s->cid_updated) {
@@ -1886,6 +1959,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
             s->cid_updated = true;
         }
     }
+    g_free(m_data);
     return 0;
 }
 
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v6 8/8] vmdk: Make vmdk_get_cluster_offset() return cluster offset only
  2017-06-05  7:52 [Qemu-devel] [PATCH v6 0/8] Optimize VMDK I/O by allocating multiple clusters Ashijeet Acharya
                   ` (6 preceding siblings ...)
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for " Ashijeet Acharya
@ 2017-06-05  7:52 ` Ashijeet Acharya
  2017-06-27  8:15   ` Fam Zheng
  7 siblings, 1 reply; 15+ messages in thread
From: Ashijeet Acharya @ 2017-06-05  7:52 UTC (permalink / raw)
  To: famz
  Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block, Ashijeet Acharya

vmdk_alloc_clusters() introduced earlier now handles the task of
allocating clusters and performing COW when needed. Thus we can change
vmdk_get_cluster_offset() to stick to the sole purpose of returning
cluster offset using sector number. Update the changes at all call
sites.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
---
 block/vmdk.c | 56 ++++++++++++--------------------------------------------
 1 file changed, 12 insertions(+), 44 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 9fa2414..accf1c3 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1485,25 +1485,16 @@ static int vmdk_alloc_clusters(BlockDriverState *bs,
  * For flat extents, the start offset as parsed from the description file is
  * returned.
  *
- * For sparse extents, look up in L1, L2 table. If allocate is true, return an
- * offset for a new cluster and update L2 cache. If there is a backing file,
- * COW is done before returning; otherwise, zeroes are written to the allocated
- * cluster. Both COW and zero writing skips the sector range
- * [@skip_start_sector, @skip_end_sector) passed in by caller, because caller
- * has new data to write there.
+ * For sparse extents, look up the L1, L2 table.
  *
  * Returns: VMDK_OK if cluster exists and mapped in the image.
- *          VMDK_UNALLOC if cluster is not mapped and @allocate is false.
- *          VMDK_ERROR if failed.
+ *          VMDK_UNALLOC if cluster is not mapped.
+ *          VMDK_ERROR if failed
  */
 static int vmdk_get_cluster_offset(BlockDriverState *bs,
                                    VmdkExtent *extent,
-                                   VmdkMetaData *m_data,
                                    uint64_t offset,
-                                   bool allocate,
-                                   uint64_t *cluster_offset,
-                                   uint64_t skip_start_bytes,
-                                   uint64_t skip_end_bytes)
+                                   uint64_t *cluster_offset)
 {
     int l1_index, l2_offset, l2_index;
     uint32_t *l2_table;
@@ -1528,31 +1519,9 @@ static int vmdk_get_cluster_offset(BlockDriverState *bs,
     }
 
     if (!cluster_sector || zeroed) {
-        if (!allocate) {
-            return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
-        }
-
-        cluster_sector = extent->next_cluster_sector;
-        extent->next_cluster_sector += extent->cluster_sectors;
-
-        /* First of all we write grain itself, to avoid race condition
-         * that may to corrupt the image.
-         * This problem may occur because of insufficient space on host disk
-         * or inappropriate VM shutdown.
-         */
-        ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
-                                offset, skip_start_bytes, skip_end_bytes);
-        if (ret) {
-            return ret;
-        }
-        if (m_data) {
-            m_data->valid = 1;
-            m_data->l1_index = l1_index;
-            m_data->l2_index = l2_index;
-            m_data->l2_offset = l2_offset;
-            m_data->l2_cache_entry = &l2_table[l2_index];
-        }
+        return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
     }
+
     *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
     return VMDK_OK;
 }
@@ -1595,9 +1564,7 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
         return 0;
     }
     qemu_co_mutex_lock(&s->lock);
-    ret = vmdk_get_cluster_offset(bs, extent, NULL,
-                                  sector_num * 512, false, &offset,
-                                  0, 0);
+    ret = vmdk_get_cluster_offset(bs, extent, sector_num * 512, &offset);
     qemu_co_mutex_unlock(&s->lock);
 
     index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
@@ -1788,13 +1755,14 @@ vmdk_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
             ret = -EIO;
             goto fail;
         }
-        ret = vmdk_get_cluster_offset(bs, extent, NULL,
-                                      offset, false, &cluster_offset, 0, 0);
+
         offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
 
         n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
                              - offset_in_cluster);
 
+        ret = vmdk_get_cluster_offset(bs, extent, offset, &cluster_offset);
+
         if (ret != VMDK_OK) {
             /* if not allocated, try to read from parent image, if exist */
             if (bs->backing && ret != VMDK_ZEROED) {
@@ -2541,9 +2509,9 @@ static int vmdk_check(BlockDriverState *bs, BdrvCheckResult *result,
                     sector_num);
             break;
         }
-        ret = vmdk_get_cluster_offset(bs, extent, NULL,
+        ret = vmdk_get_cluster_offset(bs, extent,
                                       sector_num << BDRV_SECTOR_BITS,
-                                      false, &cluster_offset, 0, 0);
+                                      &cluster_offset);
         if (ret == VMDK_ERROR) {
             fprintf(stderr,
                     "ERROR: could not get cluster_offset for sector %"
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v6 6/8] vmdk: New functions to assist allocating multiple clusters
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 6/8] vmdk: New functions to assist allocating multiple clusters Ashijeet Acharya
@ 2017-06-27  8:02   ` Fam Zheng
  2017-06-29  7:42     ` Ashijeet Acharya
  0 siblings, 1 reply; 15+ messages in thread
From: Fam Zheng @ 2017-06-27  8:02 UTC (permalink / raw)
  To: Ashijeet Acharya; +Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block

On Mon, 06/05 13:22, Ashijeet Acharya wrote:
> +/*
> + * vmdk_handle_alloc
> + *
> + * Allocate new clusters for an area that either is yet unallocated or needs a
> + * copy on write. If *cluster_offset is non_zero, clusters are only allocated if
> + * the new allocation can match the specified host offset.

I don't think this matches the function body, the passed in *cluster_offset
value is ignored.

> + *
> + * Returns:
> + *   VMDK_OK:       if new clusters were allocated, *bytes may be decreased if
> + *                  the new allocation doesn't cover all of the requested area.
> + *                  *cluster_offset is updated to contain the offset of the
> + *                  first newly allocated cluster.
> + *
> + *   VMDK_UNALLOC:  if no clusters could be allocated. *cluster_offset is left
> + *                  unchanged.
> + *
> + *   VMDK_ERROR:    in error cases
> + */
> +static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
> +                             uint64_t offset, uint64_t *cluster_offset,
> +                             int64_t *bytes, VmdkMetaData *m_data,
> +                             bool allocate, uint32_t *alloc_clusters_counter)
> +{
> +    int l1_index, l2_offset, l2_index;
> +    uint32_t *l2_table;
> +    uint32_t cluster_sector;
> +    uint32_t nb_clusters;
> +    bool zeroed = false;
> +    uint64_t skip_start_bytes, skip_end_bytes;
> +    int ret;
> +
> +    ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
> +                            &l2_index, &l2_table);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    cluster_sector = le32_to_cpu(l2_table[l2_index]);
> +
> +    skip_start_bytes = vmdk_find_offset_in_cluster(extent, offset);
> +    /* Calculate the number of clusters to look for. Here we truncate the last
> +     * cluster, i.e. 1 less than the actual value calculated as we may need to
> +     * perform COW for the last one. */
> +    nb_clusters = DIV_ROUND_UP(skip_start_bytes + *bytes,
> +                               extent->cluster_sectors << BDRV_SECTOR_BITS) - 1;
> +
> +    nb_clusters = MIN(nb_clusters, extent->l2_size - l2_index);
> +    assert(nb_clusters <= INT_MAX);
> +
> +    /* update bytes according to final nb_clusters value */
> +    if (nb_clusters != 0) {
> +        *bytes = ((nb_clusters * extent->cluster_sectors) << BDRV_SECTOR_BITS)
> +                 - skip_start_bytes;
> +    } else {
> +        nb_clusters = 1;
> +    }
> +    *alloc_clusters_counter += nb_clusters;
> +    skip_end_bytes = skip_start_bytes + MIN(*bytes,
> +                     extent->cluster_sectors * BDRV_SECTOR_SIZE
> +                                    - skip_start_bytes);

I don't understand the MIN part, shouldn't skip_end_bytes simply be
skip_start_bytes + *bytes?

> +
> +    if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
> +        zeroed = true;
> +    }
> +
> +    if (!cluster_sector || zeroed) {
> +        if (!allocate) {
> +            return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
> +        }
> +
> +        cluster_sector = extent->next_cluster_sector;
> +        extent->next_cluster_sector += extent->cluster_sectors
> +                                                * nb_clusters;
> +
> +        ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
> +                               offset, skip_start_bytes,
> +                               skip_end_bytes);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        if (m_data) {
> +            m_data->valid = 1;
> +            m_data->l1_index = l1_index;
> +            m_data->l2_index = l2_index;
> +            m_data->l2_offset = l2_offset;
> +            m_data->l2_cache_entry = &l2_table[l2_index];
> +            m_data->nb_clusters = nb_clusters;
> +        }
> +    }
> +    *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
> +    return VMDK_OK;
> +}
> +
> +/*
> + * vmdk_alloc_clusters
> + *
> + * For a given offset on the virtual disk, find the cluster offset in vmdk
> + * file. If the offset is not found, allocate a new cluster.
> + *
> + * If the cluster is newly allocated, m_data->nb_clusters is set to the number
> + * of contiguous clusters that have been allocated. In this case, the other
> + * fields of m_data are valid and contain information about the first allocated
> + * cluster.
> + *
> + * Returns:
> + *
> + *   VMDK_OK:           on success and @cluster_offset was set
> + *
> + *   VMDK_UNALLOC:      if no clusters were allocated and @cluster_offset is
> + *                      set to zero
> + *
> + *   VMDK_ERROR:        in error cases
> + */
> +static int vmdk_alloc_clusters(BlockDriverState *bs,
> +                               VmdkExtent *extent,
> +                               VmdkMetaData *m_data, uint64_t offset,
> +                               bool allocate, uint64_t *cluster_offset,
> +                               int64_t bytes,
> +                               uint32_t *total_alloc_clusters)
> +{
> +    uint64_t start, remaining;
> +    uint64_t new_cluster_offset;
> +    int64_t n_bytes;
> +    int ret;
> +
> +    if (extent->flat) {
> +        *cluster_offset = extent->flat_start_offset;
> +        return VMDK_OK;
> +    }
> +
> +    start = offset;
> +    remaining = bytes;
> +    new_cluster_offset = 0;
> +    *cluster_offset = 0;
> +    n_bytes = 0;
> +    if (m_data) {
> +        m_data->valid = 0;
> +    }
> +
> +    /* due to L2 table margins all bytes may not get allocated at once */
> +    while (true) {
> +
> +        if (!*cluster_offset) {
> +            *cluster_offset = new_cluster_offset;
> +        }
> +
> +        start              += n_bytes;
> +        remaining          -= n_bytes;
> +        new_cluster_offset += n_bytes;

Like said above, even though you increment new_cluster_offset by n_bytes, it has
no effect inside vmdk_handle_alloc. Is this intended?

> +
> +        if (remaining == 0) {
> +            break;
> +        }
> +
> +        n_bytes = remaining;
> +
> +        ret = vmdk_handle_alloc(bs, extent, start, &new_cluster_offset, &n_bytes,
> +                                m_data, allocate, total_alloc_clusters);
> +
> +        if (ret < 0) {
> +            return ret;
> +
> +        }
> +    }
> +
> +    return VMDK_OK;
> +}
> +
>  /**
>   * vmdk_get_cluster_offset
>   *
> @@ -1625,6 +1794,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>      uint64_t bytes_done = 0;
>      VmdkMetaData m_data;
>      uint64_t extent_end;
> +    uint32_t total_alloc_clusters = 0;
>  
>      if (DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) > bs->total_sectors) {
>          error_report("Wrong offset: offset=0x%" PRIx64
> @@ -1650,10 +1820,10 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>              n_bytes = MIN(bytes, extent_end - offset);
>          }
>  
> -        ret = vmdk_get_cluster_offset(bs, extent, &m_data, offset,
> -                                      !(extent->compressed || zeroed),
> -                                      &cluster_offset, offset_in_cluster,
> -                                      offset_in_cluster + n_bytes);
> +        ret = vmdk_alloc_clusters(bs, extent, &m_data, offset,
> +                                  !(extent->compressed || zeroed),
> +                                  &cluster_offset, n_bytes,
> +                                  &total_alloc_clusters);
>          if (extent->compressed) {
>              if (ret == VMDK_OK) {
>                  /* Refuse write to allocated cluster for streamOptimized */
> @@ -1662,8 +1832,9 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>                  return -EIO;
>              } else {
>                  /* allocate */
> -                ret = vmdk_get_cluster_offset(bs, extent, &m_data, offset,
> -                                              true, &cluster_offset, 0, 0);
> +                ret = vmdk_alloc_clusters(bs, extent, &m_data, offset,
> +                                          true, &cluster_offset, n_bytes,
> +                                          &total_alloc_clusters);
>              }
>          }
>          if (ret == VMDK_ERROR) {
> @@ -1671,10 +1842,11 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>          }
>          if (zeroed) {
>              /* Do zeroed write, buf is ignored */
> -            if (extent->has_zero_grain &&
> -                    offset_in_cluster == 0 &&
> -                    n_bytes >= extent->cluster_sectors * BDRV_SECTOR_SIZE) {
> -                n_bytes = extent->cluster_sectors * BDRV_SECTOR_SIZE;
> +            if (extent->has_zero_grain && offset_in_cluster == 0 &&
> +                    n_bytes >= extent->cluster_sectors * BDRV_SECTOR_SIZE *
> +                        total_alloc_clusters) {
> +                n_bytes = extent->cluster_sectors * BDRV_SECTOR_SIZE *
> +                                        total_alloc_clusters;
>                  if (!zero_dry_run) {
>                      /* update L2 tables */
>                      if (vmdk_L2update(extent, &m_data, VMDK_GTE_ZEROED)
> -- 
> 2.6.2
> 

Fam

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for multiple clusters
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for " Ashijeet Acharya
@ 2017-06-27  8:04   ` Fam Zheng
  2017-06-29  8:48     ` Ashijeet Acharya
  2017-06-27  8:14   ` Fam Zheng
  1 sibling, 1 reply; 15+ messages in thread
From: Fam Zheng @ 2017-06-27  8:04 UTC (permalink / raw)
  To: Ashijeet Acharya; +Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block

On Mon, 06/05 13:22, Ashijeet Acharya wrote:
> @@ -1876,6 +1942,13 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>          offset += n_bytes;
>          bytes_done += n_bytes;
>  
> +        while (m_data->next != NULL) {

If you do

           while (m_data) {

> +            VmdkMetaData *next;
> +            next = m_data->next;
> +            g_free(m_data);
> +            m_data = next;
> +        }
> +
>          /* update CID on the first write every time the virtual disk is
>           * opened */
>          if (!s->cid_updated) {
> @@ -1886,6 +1959,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>              s->cid_updated = true;
>          }
>      }
> +    g_free(m_data);

then you can remove this line.

>      return 0;
>  }
>  
> -- 
> 2.6.2
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for multiple clusters
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for " Ashijeet Acharya
  2017-06-27  8:04   ` Fam Zheng
@ 2017-06-27  8:14   ` Fam Zheng
  1 sibling, 0 replies; 15+ messages in thread
From: Fam Zheng @ 2017-06-27  8:14 UTC (permalink / raw)
  To: Ashijeet Acharya; +Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block

On Mon, 06/05 13:22, Ashijeet Acharya wrote:
> Include a next pointer in VmdkMetaData struct to point to the previous
> allocated L2 table. Modify vmdk_L2update to start updating metadata for
> allocation of multiple clusters at once.
> 
> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
> ---
>  block/vmdk.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 101 insertions(+), 27 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index b671dc9..9fa2414 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -137,6 +137,8 @@ typedef struct VmdkMetaData {
>      int valid;
>      uint32_t *l2_cache_entry;
>      uint32_t nb_clusters;
> +    uint32_t offset;
> +    struct VmdkMetaData *next;
>  } VmdkMetaData;
>  
>  typedef struct VmdkGrainMarker {
> @@ -1116,34 +1118,87 @@ exit:
>      return ret;
>  }
>  
> -static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
> -                         uint32_t offset)
> +static int vmdk_alloc_cluster_link_l2(VmdkExtent *extent,
> +                                      VmdkMetaData *m_data, bool zeroed)
>  {
> -    offset = cpu_to_le32(offset);
> +    int i;
> +    uint32_t offset, temp_offset;
> +    int *l2_table_array;
> +    int l2_array_size;
> +
> +    if (zeroed) {
> +        temp_offset = VMDK_GTE_ZEROED;
> +    } else {
> +        temp_offset = m_data->offset;
> +    }
> +
> +    l2_array_size = sizeof(uint32_t) * m_data->nb_clusters;
> +    l2_table_array = qemu_try_blockalign(extent->file->bs,
> +                                         QEMU_ALIGN_UP(l2_array_size,
> +                                                       BDRV_SECTOR_SIZE));
> +    if (l2_table_array == NULL) {
> +        return VMDK_ERROR;
> +    }
> +    memset(l2_table_array, 0, QEMU_ALIGN_UP(l2_array_size, BDRV_SECTOR_SIZE));
>      /* update L2 table */
> +    offset = temp_offset;
> +    for (i = 0; i < m_data->nb_clusters; i++) {
> +        l2_table_array[i] = cpu_to_le32(offset);
> +        if (!zeroed) {
> +            offset += 128;

s/128/extent->cluster_sectors/?

> +        }
> +    }
>      if (bdrv_pwrite_sync(extent->file,
> -                ((int64_t)m_data->l2_offset * 512)
> -                    + (m_data->l2_index * sizeof(offset)),
> -                &offset, sizeof(offset)) < 0) {
> +                         ((int64_t)m_data->l2_offset * 512)
> +                         + ((m_data->l2_index) * sizeof(offset)),
> +                         l2_table_array, l2_array_size) < 0) {
>          return VMDK_ERROR;
>      }
>      /* update backup L2 table */
>      if (extent->l1_backup_table_offset != 0) {
>          m_data->l2_offset = extent->l1_backup_table[m_data->l1_index];
>          if (bdrv_pwrite_sync(extent->file,
> -                    ((int64_t)m_data->l2_offset * 512)
> -                        + (m_data->l2_index * sizeof(offset)),
> -                    &offset, sizeof(offset)) < 0) {
> +                             ((int64_t)m_data->l2_offset * 512)
> +                             + ((m_data->l2_index) * sizeof(offset)),
> +                             l2_table_array, l2_array_size) < 0) {
>              return VMDK_ERROR;
>          }
>      }
> +
> +    offset = temp_offset;
>      if (m_data->l2_cache_entry) {
> -        *m_data->l2_cache_entry = offset;
> +        for (i = 0; i < m_data->nb_clusters; i++) {
> +            *m_data->l2_cache_entry = cpu_to_le32(offset);
> +            m_data->l2_cache_entry++;
> +
> +            if (!zeroed) {
> +                offset += 128;

Ditto.

> +            }
> +        }
>      }
>  
> +    qemu_vfree(l2_table_array);
>      return VMDK_OK;
>  }
>  
> +static int vmdk_L2update(VmdkExtent *extent, VmdkMetaData *m_data,
> +                         bool zeroed)
> +{
> +    int ret;
> +
> +    while (m_data->next != NULL) {
> +
> +        ret = vmdk_alloc_cluster_link_l2(extent, m_data, zeroed);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        m_data = m_data->next;
> +     }
> +
> +     return VMDK_OK;
> +}
> +
>  /*
>   * vmdk_l2load
>   *
> @@ -1261,9 +1316,10 @@ static int get_cluster_table(VmdkExtent *extent, uint64_t offset,
>   *
>   *   VMDK_ERROR:    in error cases
>   */
> +
>  static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>                               uint64_t offset, uint64_t *cluster_offset,
> -                             int64_t *bytes, VmdkMetaData *m_data,
> +                             int64_t *bytes, VmdkMetaData **m_data,
>                               bool allocate, uint32_t *alloc_clusters_counter)
>  {
>      int l1_index, l2_offset, l2_index;
> @@ -1272,6 +1328,7 @@ static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>      uint32_t nb_clusters;
>      bool zeroed = false;
>      uint64_t skip_start_bytes, skip_end_bytes;
> +    VmdkMetaData *old_m_data;
>      int ret;
>  
>      ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
> @@ -1323,13 +1380,21 @@ static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>          if (ret < 0) {
>              return ret;
>          }
> -        if (m_data) {
> -            m_data->valid = 1;
> -            m_data->l1_index = l1_index;
> -            m_data->l2_index = l2_index;
> -            m_data->l2_offset = l2_offset;
> -            m_data->l2_cache_entry = &l2_table[l2_index];
> -            m_data->nb_clusters = nb_clusters;
> +
> +        if (*m_data) {

Please check m_data != NULL if it's possible, otherwise assert it.

> +            old_m_data = *m_data;
> +            *m_data = g_malloc0(sizeof(**m_data));
> +
> +            **m_data = (VmdkMetaData) {
> +                .valid            =    1,
> +                .l1_index         =    l1_index,
> +                .l2_index         =    l2_index,
> +                .l2_offset        =    l2_offset,
> +                .l2_cache_entry   =    &l2_table[l2_index],
> +                .nb_clusters      =    nb_clusters,
> +                .offset           =    cluster_sector,
> +                .next             =    old_m_data,
> +            };
>          }
>      }
>      *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
> @@ -1792,10 +1857,12 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>      int64_t offset_in_cluster, n_bytes;
>      uint64_t cluster_offset;
>      uint64_t bytes_done = 0;
> -    VmdkMetaData m_data;
>      uint64_t extent_end;
> +    VmdkMetaData *m_data;
>      uint32_t total_alloc_clusters = 0;
>  
> +    m_data = g_malloc0(sizeof(*m_data));

You can use "g_new0(VmdkMetaData, 1)" which is more readable.

> +
>      if (DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) > bs->total_sectors) {
>          error_report("Wrong offset: offset=0x%" PRIx64
>                       " total_sectors=0x%" PRIx64,

Fam

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v6 8/8] vmdk: Make vmdk_get_cluster_offset() return cluster offset only
  2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 8/8] vmdk: Make vmdk_get_cluster_offset() return cluster offset only Ashijeet Acharya
@ 2017-06-27  8:15   ` Fam Zheng
  0 siblings, 0 replies; 15+ messages in thread
From: Fam Zheng @ 2017-06-27  8:15 UTC (permalink / raw)
  To: Ashijeet Acharya; +Cc: kwolf, jsnow, mreitz, stefanha, qemu-devel, qemu-block

On Mon, 06/05 13:22, Ashijeet Acharya wrote:
> vmdk_alloc_clusters() introduced earlier now handles the task of
> allocating clusters and performing COW when needed. Thus we can change
> vmdk_get_cluster_offset() to stick to the sole purpose of returning
> cluster offset using sector number. Update the changes at all call
> sites.
> 
> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
> ---
>  block/vmdk.c | 56 ++++++++++++--------------------------------------------
>  1 file changed, 12 insertions(+), 44 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 9fa2414..accf1c3 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1485,25 +1485,16 @@ static int vmdk_alloc_clusters(BlockDriverState *bs,
>   * For flat extents, the start offset as parsed from the description file is
>   * returned.
>   *
> - * For sparse extents, look up in L1, L2 table. If allocate is true, return an
> - * offset for a new cluster and update L2 cache. If there is a backing file,
> - * COW is done before returning; otherwise, zeroes are written to the allocated
> - * cluster. Both COW and zero writing skips the sector range
> - * [@skip_start_sector, @skip_end_sector) passed in by caller, because caller
> - * has new data to write there.
> + * For sparse extents, look up the L1, L2 table.
>   *
>   * Returns: VMDK_OK if cluster exists and mapped in the image.
> - *          VMDK_UNALLOC if cluster is not mapped and @allocate is false.
> - *          VMDK_ERROR if failed.
> + *          VMDK_UNALLOC if cluster is not mapped.
> + *          VMDK_ERROR if failed
>   */
>  static int vmdk_get_cluster_offset(BlockDriverState *bs,
>                                     VmdkExtent *extent,
> -                                   VmdkMetaData *m_data,
>                                     uint64_t offset,
> -                                   bool allocate,
> -                                   uint64_t *cluster_offset,
> -                                   uint64_t skip_start_bytes,
> -                                   uint64_t skip_end_bytes)
> +                                   uint64_t *cluster_offset)
>  {
>      int l1_index, l2_offset, l2_index;
>      uint32_t *l2_table;
> @@ -1528,31 +1519,9 @@ static int vmdk_get_cluster_offset(BlockDriverState *bs,
>      }
>  
>      if (!cluster_sector || zeroed) {
> -        if (!allocate) {
> -            return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
> -        }
> -
> -        cluster_sector = extent->next_cluster_sector;
> -        extent->next_cluster_sector += extent->cluster_sectors;
> -
> -        /* First of all we write grain itself, to avoid race condition
> -         * that may to corrupt the image.
> -         * This problem may occur because of insufficient space on host disk
> -         * or inappropriate VM shutdown.
> -         */
> -        ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
> -                                offset, skip_start_bytes, skip_end_bytes);
> -        if (ret) {
> -            return ret;
> -        }
> -        if (m_data) {
> -            m_data->valid = 1;
> -            m_data->l1_index = l1_index;
> -            m_data->l2_index = l2_index;
> -            m_data->l2_offset = l2_offset;
> -            m_data->l2_cache_entry = &l2_table[l2_index];
> -        }
> +        return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
>      }
> +
>      *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
>      return VMDK_OK;
>  }
> @@ -1595,9 +1564,7 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
>          return 0;
>      }
>      qemu_co_mutex_lock(&s->lock);
> -    ret = vmdk_get_cluster_offset(bs, extent, NULL,
> -                                  sector_num * 512, false, &offset,
> -                                  0, 0);
> +    ret = vmdk_get_cluster_offset(bs, extent, sector_num * 512, &offset);
>      qemu_co_mutex_unlock(&s->lock);
>  
>      index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
> @@ -1788,13 +1755,14 @@ vmdk_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>              ret = -EIO;
>              goto fail;
>          }
> -        ret = vmdk_get_cluster_offset(bs, extent, NULL,
> -                                      offset, false, &cluster_offset, 0, 0);
> +
>          offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
>  
>          n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
>                               - offset_in_cluster);
>  
> +        ret = vmdk_get_cluster_offset(bs, extent, offset, &cluster_offset);
> +
>          if (ret != VMDK_OK) {
>              /* if not allocated, try to read from parent image, if exist */
>              if (bs->backing && ret != VMDK_ZEROED) {
> @@ -2541,9 +2509,9 @@ static int vmdk_check(BlockDriverState *bs, BdrvCheckResult *result,
>                      sector_num);
>              break;
>          }
> -        ret = vmdk_get_cluster_offset(bs, extent, NULL,
> +        ret = vmdk_get_cluster_offset(bs, extent,
>                                        sector_num << BDRV_SECTOR_BITS,
> -                                      false, &cluster_offset, 0, 0);
> +                                      &cluster_offset);
>          if (ret == VMDK_ERROR) {
>              fprintf(stderr,
>                      "ERROR: could not get cluster_offset for sector %"
> -- 
> 2.6.2
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v6 6/8] vmdk: New functions to assist allocating multiple clusters
  2017-06-27  8:02   ` Fam Zheng
@ 2017-06-29  7:42     ` Ashijeet Acharya
  0 siblings, 0 replies; 15+ messages in thread
From: Ashijeet Acharya @ 2017-06-29  7:42 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, John Snow, Max Reitz, Stefan Hajnoczi,
	QEMU Developers, qemu block

On Tue, Jun 27, 2017 at 1:32 PM, Fam Zheng <famz@redhat.com> wrote:
> On Mon, 06/05 13:22, Ashijeet Acharya wrote:
>> +/*
>> + * vmdk_handle_alloc
>> + *
>> + * Allocate new clusters for an area that either is yet unallocated or needs a
>> + * copy on write. If *cluster_offset is non_zero, clusters are only allocated if
>> + * the new allocation can match the specified host offset.
>
> I don't think this matches the function body, the passed in *cluster_offset
> value is ignored.
>
>> + *
>> + * Returns:
>> + *   VMDK_OK:       if new clusters were allocated, *bytes may be decreased if
>> + *                  the new allocation doesn't cover all of the requested area.
>> + *                  *cluster_offset is updated to contain the offset of the
>> + *                  first newly allocated cluster.
>> + *
>> + *   VMDK_UNALLOC:  if no clusters could be allocated. *cluster_offset is left
>> + *                  unchanged.
>> + *
>> + *   VMDK_ERROR:    in error cases
>> + */
>> +static int vmdk_handle_alloc(BlockDriverState *bs, VmdkExtent *extent,
>> +                             uint64_t offset, uint64_t *cluster_offset,
>> +                             int64_t *bytes, VmdkMetaData *m_data,
>> +                             bool allocate, uint32_t *alloc_clusters_counter)
>> +{
>> +    int l1_index, l2_offset, l2_index;
>> +    uint32_t *l2_table;
>> +    uint32_t cluster_sector;
>> +    uint32_t nb_clusters;
>> +    bool zeroed = false;
>> +    uint64_t skip_start_bytes, skip_end_bytes;
>> +    int ret;
>> +
>> +    ret = get_cluster_table(extent, offset, &l1_index, &l2_offset,
>> +                            &l2_index, &l2_table);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    cluster_sector = le32_to_cpu(l2_table[l2_index]);
>> +
>> +    skip_start_bytes = vmdk_find_offset_in_cluster(extent, offset);
>> +    /* Calculate the number of clusters to look for. Here we truncate the last
>> +     * cluster, i.e. 1 less than the actual value calculated as we may need to
>> +     * perform COW for the last one. */
>> +    nb_clusters = DIV_ROUND_UP(skip_start_bytes + *bytes,
>> +                               extent->cluster_sectors << BDRV_SECTOR_BITS) - 1;
>> +
>> +    nb_clusters = MIN(nb_clusters, extent->l2_size - l2_index);
>> +    assert(nb_clusters <= INT_MAX);
>> +
>> +    /* update bytes according to final nb_clusters value */
>> +    if (nb_clusters != 0) {
>> +        *bytes = ((nb_clusters * extent->cluster_sectors) << BDRV_SECTOR_BITS)
>> +                 - skip_start_bytes;
>> +    } else {
>> +        nb_clusters = 1;
>> +    }
>> +    *alloc_clusters_counter += nb_clusters;
>> +    skip_end_bytes = skip_start_bytes + MIN(*bytes,
>> +                     extent->cluster_sectors * BDRV_SECTOR_SIZE
>> +                                    - skip_start_bytes);
>
> I don't understand the MIN part, shouldn't skip_end_bytes simply be
> skip_start_bytes + *bytes?
>
>> +
>> +    if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
>> +        zeroed = true;
>> +    }
>> +
>> +    if (!cluster_sector || zeroed) {
>> +        if (!allocate) {
>> +            return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
>> +        }
>> +
>> +        cluster_sector = extent->next_cluster_sector;
>> +        extent->next_cluster_sector += extent->cluster_sectors
>> +                                                * nb_clusters;
>> +
>> +        ret = vmdk_perform_cow(bs, extent, cluster_sector * BDRV_SECTOR_SIZE,
>> +                               offset, skip_start_bytes,
>> +                               skip_end_bytes);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +        if (m_data) {
>> +            m_data->valid = 1;
>> +            m_data->l1_index = l1_index;
>> +            m_data->l2_index = l2_index;
>> +            m_data->l2_offset = l2_offset;
>> +            m_data->l2_cache_entry = &l2_table[l2_index];
>> +            m_data->nb_clusters = nb_clusters;
>> +        }
>> +    }
>> +    *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
>> +    return VMDK_OK;
>> +}
>> +
>> +/*
>> + * vmdk_alloc_clusters
>> + *
>> + * For a given offset on the virtual disk, find the cluster offset in vmdk
>> + * file. If the offset is not found, allocate a new cluster.
>> + *
>> + * If the cluster is newly allocated, m_data->nb_clusters is set to the number
>> + * of contiguous clusters that have been allocated. In this case, the other
>> + * fields of m_data are valid and contain information about the first allocated
>> + * cluster.
>> + *
>> + * Returns:
>> + *
>> + *   VMDK_OK:           on success and @cluster_offset was set
>> + *
>> + *   VMDK_UNALLOC:      if no clusters were allocated and @cluster_offset is
>> + *                      set to zero
>> + *
>> + *   VMDK_ERROR:        in error cases
>> + */
>> +static int vmdk_alloc_clusters(BlockDriverState *bs,
>> +                               VmdkExtent *extent,
>> +                               VmdkMetaData *m_data, uint64_t offset,
>> +                               bool allocate, uint64_t *cluster_offset,
>> +                               int64_t bytes,
>> +                               uint32_t *total_alloc_clusters)
>> +{
>> +    uint64_t start, remaining;
>> +    uint64_t new_cluster_offset;
>> +    int64_t n_bytes;
>> +    int ret;
>> +
>> +    if (extent->flat) {
>> +        *cluster_offset = extent->flat_start_offset;
>> +        return VMDK_OK;
>> +    }
>> +
>> +    start = offset;
>> +    remaining = bytes;
>> +    new_cluster_offset = 0;
>> +    *cluster_offset = 0;
>> +    n_bytes = 0;
>> +    if (m_data) {
>> +        m_data->valid = 0;
>> +    }
>> +
>> +    /* due to L2 table margins all bytes may not get allocated at once */
>> +    while (true) {
>> +
>> +        if (!*cluster_offset) {
>> +            *cluster_offset = new_cluster_offset;
>> +        }
>> +
>> +        start              += n_bytes;
>> +        remaining          -= n_bytes;
>> +        new_cluster_offset += n_bytes;
>
> Like said above, even though you increment new_cluster_offset by n_bytes, it has
> no effect inside vmdk_handle_alloc. Is this intended?

I think that comment may have caused a confusion as it was valid only
till v2/v3 and not now. I will remove it.

Ashijeet

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for multiple clusters
  2017-06-27  8:04   ` Fam Zheng
@ 2017-06-29  8:48     ` Ashijeet Acharya
  0 siblings, 0 replies; 15+ messages in thread
From: Ashijeet Acharya @ 2017-06-29  8:48 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, John Snow, Max Reitz, Stefan Hajnoczi,
	QEMU Developers, qemu block

On Tue, Jun 27, 2017 at 1:34 PM, Fam Zheng <famz@redhat.com> wrote:
> On Mon, 06/05 13:22, Ashijeet Acharya wrote:
>> @@ -1876,6 +1942,13 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>          offset += n_bytes;
>>          bytes_done += n_bytes;
>>
>> +        while (m_data->next != NULL) {
>
> If you do
>
>            while (m_data) {
>
>> +            VmdkMetaData *next;
>> +            next = m_data->next;
>> +            g_free(m_data);
>> +            m_data = next;
>> +        }
>> +
>>          /* update CID on the first write every time the virtual disk is
>>           * opened */
>>          if (!s->cid_updated) {
>> @@ -1886,6 +1959,7 @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
>>              s->cid_updated = true;
>>          }
>>      }
>> +    g_free(m_data);
>
> then you can remove this line.

As I explained last time, I can't do this because I am reusing the
first allocated m_data. If I am to do it the way you suggest, I will
have to move the allocation of first m_data (m_data =
g_new0(VmdkMetaData, 1)) inside the outer while loop, otherwise things
will segfault.

Ashijeet

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-06-29  8:48 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-05  7:52 [Qemu-devel] [PATCH v6 0/8] Optimize VMDK I/O by allocating multiple clusters Ashijeet Acharya
2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 1/8] vmdk: Move vmdk_find_offset_in_cluster() to the top Ashijeet Acharya
2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 2/8] vmdk: Rename get_whole_cluster() to vmdk_perform_cow() Ashijeet Acharya
2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 3/8] vmdk: Rename get_cluster_offset() to vmdk_get_cluster_offset() Ashijeet Acharya
2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 4/8] vmdk: Factor out metadata loading code out of vmdk_get_cluster_offset() Ashijeet Acharya
2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 5/8] vmdk: Set maximum bytes allocated in one cycle Ashijeet Acharya
2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 6/8] vmdk: New functions to assist allocating multiple clusters Ashijeet Acharya
2017-06-27  8:02   ` Fam Zheng
2017-06-29  7:42     ` Ashijeet Acharya
2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 7/8] vmdk: Update metadata for " Ashijeet Acharya
2017-06-27  8:04   ` Fam Zheng
2017-06-29  8:48     ` Ashijeet Acharya
2017-06-27  8:14   ` Fam Zheng
2017-06-05  7:52 ` [Qemu-devel] [PATCH v6 8/8] vmdk: Make vmdk_get_cluster_offset() return cluster offset only Ashijeet Acharya
2017-06-27  8:15   ` Fam Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.