All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
@ 2017-05-31 14:43 Pavel Butsykin
  2017-05-31 14:43 ` [Qemu-devel] [PATCH 1/2] qcow2: add reduce image support Pavel Butsykin
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Pavel Butsykin @ 2017-05-31 14:43 UTC (permalink / raw)
  To: qemu-block, qemu-devel; +Cc: kwolf, mreitz, armbru, eblake, pbutsykin

This patch adds the reduction of the image file for qcow2. As a result, this
allows us to reduce the virtual image size and free up space on the disk without
copying the image. Image can be fragmented and reduction is done by punching
holes in the image file.

# ./qemu-img create -f qcow2 -o size=4G image.qcow2
Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.59 (395.180 MiB/sec and 0.3859 ops/sec)

# ./qemu-img resize image.qcow2 128M
Image resized.

# qemu-img info image.qcow2 
image: image.qcow2
file format: qcow2
virtual size: 128M (134217728 bytes)
disk size: 128M
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# du -h image.qcow2 
129M    image.qcow2

Pavel Butsykin (2):
  qcow2: add reduce image support
  qemu-iotests: add reducing image test in 025

 block/qcow2-cache.c        |  8 +++++
 block/qcow2-cluster.c      | 83 ++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2-refcount.c     | 65 ++++++++++++++++++++++++++++++++++++
 block/qcow2.c              | 40 ++++++++++++++++------
 block/qcow2.h              |  4 +++
 qapi/block-core.json       |  4 ++-
 tests/qemu-iotests/025     | 19 +++++++++--
 tests/qemu-iotests/025.out | 12 ++++++-
 8 files changed, 221 insertions(+), 14 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 1/2] qcow2: add reduce image support
  2017-05-31 14:43 [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2 Pavel Butsykin
@ 2017-05-31 14:43 ` Pavel Butsykin
  2017-06-01 14:41   ` Kevin Wolf
  2017-05-31 14:43 ` [Qemu-devel] [PATCH 2/2] qemu-iotests: add reducing image test in 025 Pavel Butsykin
  2017-05-31 15:03 ` [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2 Eric Blake
  2 siblings, 1 reply; 19+ messages in thread
From: Pavel Butsykin @ 2017-05-31 14:43 UTC (permalink / raw)
  To: qemu-block, qemu-devel; +Cc: kwolf, mreitz, armbru, eblake, pbutsykin

This patch adds the reduction of the image file for qcow2. As a result, this
allows us to reduce the virtual image size and free up space on the disk without
copying the image. Image can be fragmented and reduction is done by punching
holes in the image file.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
---
 block/qcow2-cache.c    |  8 +++++
 block/qcow2-cluster.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2-refcount.c | 65 +++++++++++++++++++++++++++++++++++++++
 block/qcow2.c          | 40 ++++++++++++++++++------
 block/qcow2.h          |  4 +++
 qapi/block-core.json   |  4 ++-
 6 files changed, 193 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147392..da55118ca7 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -411,3 +411,11 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
     assert(c->entries[i].offset != 0);
     c->entries[i].dirty = true;
 }
+
+void qcow2_cache_entry_mark_clean(BlockDriverState *bs, Qcow2Cache *c,
+     void *table)
+{
+    int i = qcow2_cache_get_table_idx(bs, c, table);
+    assert(c->entries[i].offset != 0);
+    c->entries[i].dirty = false;
+}
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 347d94b0d2..47e04d7317 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,89 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int qcow2_reduce_l1_table(BlockDriverState *bs, uint64_t max_size)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int64_t new_l1_size_bytes, free_l1_clusters;
+    uint64_t *new_l1_table;
+    int new_l1_size, i, ret;
+
+    if (max_size >= s->l1_size) {
+        return 0;
+    }
+
+    new_l1_size = max_size;
+
+#ifdef DEBUG_ALLOC2
+    fprintf(stderr, "reduce l1_table from %d to %" PRId64 "\n",
+            s->l1_size, new_l1_size);
+#endif
+
+    ret = qcow2_cache_flush(bs, s->l2_table_cache);
+    if (ret < 0) {
+        return ret;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_FREE_L2_CLUSTERS);
+    for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+        if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+            continue;
+        }
+        qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+                            s->l2_size * sizeof(uint64_t),
+                            QCOW2_DISCARD_ALWAYS);
+    }
+
+    new_l1_size_bytes = sizeof(uint64_t) * new_l1_size;
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_WRITE_TABLE);
+    ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset + new_l1_size_bytes,
+                             s->l1_size * sizeof(uint64_t) - new_l1_size_bytes,
+                             0);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_flush(bs->file->bs);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* set new table size */
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_ACTIVATE_TABLE);
+    new_l1_size = cpu_to_be32(new_l1_size);
+    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size),
+                           &new_l1_size, sizeof(new_l1_size));
+    new_l1_size = be32_to_cpu(new_l1_size);
+    if (ret < 0) {
+        return ret;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_FREE_L1_CLUSTERS);
+    free_l1_clusters =
+        DIV_ROUND_UP(s->l1_size * sizeof(uint64_t), s->cluster_size) -
+        DIV_ROUND_UP(new_l1_size_bytes, s->cluster_size);
+    if (free_l1_clusters) {
+        qcow2_free_clusters(bs, s->l1_table_offset +
+                                ROUND_UP(new_l1_size_bytes, s->cluster_size),
+                            free_l1_clusters << s->cluster_bits,
+                            QCOW2_DISCARD_ALWAYS);
+    }
+
+    new_l1_table = qemu_try_blockalign(bs->file->bs,
+                                       align_offset(new_l1_size_bytes, 512));
+    if (new_l1_table == NULL) {
+        return -ENOMEM;
+    }
+    memcpy(new_l1_table, s->l1_table, new_l1_size_bytes);
+
+    qemu_vfree(s->l1_table);
+    s->l1_table = new_l1_table;
+    s->l1_size = new_l1_size;
+
+    return 0;
+}
+
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
                         bool exact_size)
 {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7c06061aae..5481b623cd 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -29,6 +29,7 @@
 #include "block/qcow2.h"
 #include "qemu/range.h"
 #include "qemu/bswap.h"
+#include "qemu/cutils.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -2931,3 +2932,67 @@ done:
     qemu_vfree(new_refblock);
     return ret;
 }
+
+int qcow2_reftable_shrink(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int i, ret;
+
+    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    if (ret < 0) {
+        return ret;
+    }
+
+    for (i = 0; i < s->refcount_table_size; i++) {
+        int64_t refblock_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
+        void *refblock;
+        bool unused_block;
+
+        if (refblock_offs == 0) {
+            continue;
+        }
+        ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
+                              &refblock);
+        if (ret < 0) {
+            return ret;
+        }
+
+        /* the refblock has own reference */
+        if (i == refblock_offs >> (s->refcount_block_bits + s->cluster_bits)) {
+            uint64_t blk_index = (refblock_offs >> s->cluster_bits) &
+                                 (s->refcount_block_size - 1);
+            uint64_t refcount = s->get_refcount(refblock, blk_index);
+
+            s->set_refcount(refblock, blk_index, 0);
+
+            unused_block = buffer_is_zero(refblock, s->refcount_block_size);
+
+            s->set_refcount(refblock, blk_index, refcount);
+        } else {
+            unused_block = buffer_is_zero(refblock, s->refcount_block_size);
+        }
+
+        if (unused_block) {
+            qcow2_free_clusters(bs, refblock_offs, s->cluster_size,
+                                QCOW2_DISCARD_ALWAYS);
+            qcow2_cache_entry_mark_clean(bs, s->refcount_block_cache, refblock);
+            s->refcount_table[i] = 0;
+        }
+        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+    }
+
+    for (i = 0; i < s->refcount_table_size; i++) {
+        s->refcount_table[i] = cpu_to_be64(s->refcount_table[i]);
+    }
+    ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset,
+                            s->refcount_table,
+                            sizeof(uint64_t) * s->refcount_table_size);
+    if (ret < 0) {
+        return ret;
+    }
+    for (i = 0; i < s->refcount_table_size; i++) {
+        s->refcount_table[i] = be64_to_cpu(s->refcount_table[i]);
+    }
+
+    return 0;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index a8d61f0981..4da8bc85d1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2545,6 +2545,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;
     int64_t new_l1_size;
+    uint64_t total_size;
     int ret;
 
     if (offset & 511) {
@@ -2558,17 +2559,36 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
         return -ENOTSUP;
     }
 
-    /* shrinking is currently not supported */
-    if (offset < bs->total_sectors * 512) {
-        error_setg(errp, "qcow2 doesn't support shrinking images yet");
-        return -ENOTSUP;
-    }
-
     new_l1_size = size_to_l1(s, offset);
-    ret = qcow2_grow_l1_table(bs, new_l1_size, true);
-    if (ret < 0) {
-        error_setg_errno(errp, -ret, "Failed to grow the L1 table");
-        return ret;
+    total_size = bs->total_sectors << BDRV_SECTOR_BITS;
+
+    if (offset < total_size) {
+        ret = qcow2_cluster_discard(bs, ROUND_UP(offset, s->cluster_size),
+                                    total_size - ROUND_UP(offset,
+                                                          s->cluster_size),
+                                    QCOW2_DISCARD_ALWAYS, true);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to discard reduced clasters");
+            return ret;
+        }
+
+        ret = qcow2_reduce_l1_table(bs, new_l1_size);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to reduce the L1 table");
+            return ret;
+        }
+
+        ret = qcow2_reftable_shrink(bs);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to shrink the refcount table");
+            return ret;
+        }
+    } else {
+        ret = qcow2_grow_l1_table(bs, new_l1_size, true);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to grow the L1 table");
+            return ret;
+        }
     }
 
     /* write updated header.size */
diff --git a/block/qcow2.h b/block/qcow2.h
index 1801dc30dc..03cebabb3d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -531,10 +531,12 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
 int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
                                 BlockDriverAmendStatusCB *status_cb,
                                 void *cb_opaque, Error **errp);
+int qcow2_reftable_shrink(BlockDriverState *bs);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
                         bool exact_size);
+int qcow2_reduce_l1_table(BlockDriverState *bs, uint64_t max_size);
 int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index);
 int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
 int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
@@ -583,6 +585,8 @@ int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
 
 void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
      void *table);
+void qcow2_cache_entry_mark_clean(BlockDriverState *bs, Qcow2Cache *c,
+     void *table);
 int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
 int qcow2_cache_write(BlockDriverState *bs, Qcow2Cache *c);
 int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6b974b952f..dcd2d0241f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2371,7 +2371,9 @@
             'cluster_alloc_bytes', 'cluster_free', 'flush_to_os',
             'flush_to_disk', 'pwritev_rmw_head', 'pwritev_rmw_after_head',
             'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
-            'pwritev_zero', 'pwritev_done', 'empty_image_prepare' ] }
+            'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
+            'l1_reduce_write_table', 'l1_reduce_activate_table',
+            'l1_reduce_free_l2_clusters', 'l1_reduce_free_l1_clusters' ] }
 
 ##
 # @BlkdebugInjectErrorOptions:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 2/2] qemu-iotests: add reducing image test in 025
  2017-05-31 14:43 [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2 Pavel Butsykin
  2017-05-31 14:43 ` [Qemu-devel] [PATCH 1/2] qcow2: add reduce image support Pavel Butsykin
@ 2017-05-31 14:43 ` Pavel Butsykin
  2017-05-31 14:54   ` Pavel Butsykin
  2017-05-31 15:03 ` [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2 Eric Blake
  2 siblings, 1 reply; 19+ messages in thread
From: Pavel Butsykin @ 2017-05-31 14:43 UTC (permalink / raw)
  To: qemu-block, qemu-devel; +Cc: kwolf, mreitz, armbru, eblake, pbutsykin

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
---
 tests/qemu-iotests/025     | 19 +++++++++++++++++--
 tests/qemu-iotests/025.out | 12 +++++++++++-
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/025 b/tests/qemu-iotests/025
index f5e672e6b3..658601579b 100755
--- a/tests/qemu-iotests/025
+++ b/tests/qemu-iotests/025
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.filter
 . ./common.pattern
 
-_supported_fmt raw qcow2 qed
+_supported_fmt raw qcow2
 _supported_proto file sheepdog rbd nfs
 _supported_os Linux
 
@@ -46,6 +46,7 @@ echo "=== Creating image"
 echo
 small_size=$((128 * 1024 * 1024))
 big_size=$((384 * 1024 * 1024))
+bigger_size=$((512 * 1024 * 1024))
 _make_test_img $small_size
 
 echo
@@ -54,7 +55,20 @@ io_pattern write 0 $small_size 0 1 0xc5
 _check_test_img
 
 echo
-echo "=== Resizing image"
+echo "=== Growing image"
+$QEMU_IO "$TEST_IMG" <<EOF | _filter_qemu_io
+length
+truncate $bigger_size
+length
+EOF
+_check_test_img
+
+echo
+echo "=== Verifying image size after reopen"
+$QEMU_IO -c "length" "$TEST_IMG"
+
+echo
+echo "=== Reducing image"
 $QEMU_IO "$TEST_IMG" <<EOF | _filter_qemu_io
 length
 truncate $big_size
@@ -70,6 +84,7 @@ echo
 echo "=== Verifying resized image"
 io_pattern read 0 $small_size 0 1 0xc5
 io_pattern read $small_size $(($big_size - $small_size)) 0 1 0
+io_pattern read $big_size $(($bigger_size - $big_size)) 0 1 0
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/025.out b/tests/qemu-iotests/025.out
index f13fc2863c..a0293711c7 100644
--- a/tests/qemu-iotests/025.out
+++ b/tests/qemu-iotests/025.out
@@ -9,8 +9,16 @@ wrote 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 No errors were found on the image.
 
-=== Resizing image
+=== Growing image
 128 MiB
+512 MiB
+No errors were found on the image.
+
+=== Verifying image size after reopen
+512 MiB
+
+=== Reducing image
+512 MiB
 384 MiB
 No errors were found on the image.
 
@@ -24,4 +32,6 @@ read 134217728/134217728 bytes at offset 0
 === IO: pattern 0
 read 268435456/268435456 bytes at offset 134217728
 256 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+=== IO: pattern 0
+read failed: Input/output error
 *** done
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] qemu-iotests: add reducing image test in 025
  2017-05-31 14:43 ` [Qemu-devel] [PATCH 2/2] qemu-iotests: add reducing image test in 025 Pavel Butsykin
@ 2017-05-31 14:54   ` Pavel Butsykin
  2017-06-01  9:14     ` Kevin Wolf
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Butsykin @ 2017-05-31 14:54 UTC (permalink / raw)
  To: qemu-block, qemu-devel; +Cc: kwolf, mreitz, armbru, eblake



On 31.05.2017 17:43, Pavel Butsykin wrote:
> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
> ---
>   tests/qemu-iotests/025     | 19 +++++++++++++++++--
>   tests/qemu-iotests/025.out | 12 +++++++++++-
>   2 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/qemu-iotests/025 b/tests/qemu-iotests/025
> index f5e672e6b3..658601579b 100755
> --- a/tests/qemu-iotests/025
> +++ b/tests/qemu-iotests/025
> @@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>   . ./common.filter
>   . ./common.pattern
>   
> -_supported_fmt raw qcow2 qed
> +_supported_fmt raw qcow2

I'm not sure, can I so blatantly drop QED here. But this place is very
suitable for reduce image case. Perhaps the alternative would be adding
a new test, I just didn't want to copy the tests, which are testing
almost the same thing.

>   _supported_proto file sheepdog rbd nfs
>   _supported_os Linux
>   
> @@ -46,6 +46,7 @@ echo "=== Creating image"
>   echo
>   small_size=$((128 * 1024 * 1024))
>   big_size=$((384 * 1024 * 1024))
> +bigger_size=$((512 * 1024 * 1024))
>   _make_test_img $small_size
>   
>   echo
> @@ -54,7 +55,20 @@ io_pattern write 0 $small_size 0 1 0xc5
>   _check_test_img
>   
>   echo
> -echo "=== Resizing image"
> +echo "=== Growing image"
> +$QEMU_IO "$TEST_IMG" <<EOF | _filter_qemu_io
> +length
> +truncate $bigger_size
> +length
> +EOF
> +_check_test_img
> +
> +echo
> +echo "=== Verifying image size after reopen"
> +$QEMU_IO -c "length" "$TEST_IMG"
> +
> +echo
> +echo "=== Reducing image"
>   $QEMU_IO "$TEST_IMG" <<EOF | _filter_qemu_io
>   length
>   truncate $big_size
> @@ -70,6 +84,7 @@ echo
>   echo "=== Verifying resized image"
>   io_pattern read 0 $small_size 0 1 0xc5
>   io_pattern read $small_size $(($big_size - $small_size)) 0 1 0
> +io_pattern read $big_size $(($bigger_size - $big_size)) 0 1 0
>   
>   # success, all done
>   echo "*** done"
> diff --git a/tests/qemu-iotests/025.out b/tests/qemu-iotests/025.out
> index f13fc2863c..a0293711c7 100644
> --- a/tests/qemu-iotests/025.out
> +++ b/tests/qemu-iotests/025.out
> @@ -9,8 +9,16 @@ wrote 134217728/134217728 bytes at offset 0
>   128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>   No errors were found on the image.
>   
> -=== Resizing image
> +=== Growing image
>   128 MiB
> +512 MiB
> +No errors were found on the image.
> +
> +=== Verifying image size after reopen
> +512 MiB
> +
> +=== Reducing image
> +512 MiB
>   384 MiB
>   No errors were found on the image.
>   
> @@ -24,4 +32,6 @@ read 134217728/134217728 bytes at offset 0
>   === IO: pattern 0
>   read 268435456/268435456 bytes at offset 134217728
>   256 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +=== IO: pattern 0
> +read failed: Input/output error
>   *** done
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
  2017-05-31 14:43 [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2 Pavel Butsykin
  2017-05-31 14:43 ` [Qemu-devel] [PATCH 1/2] qcow2: add reduce image support Pavel Butsykin
  2017-05-31 14:43 ` [Qemu-devel] [PATCH 2/2] qemu-iotests: add reducing image test in 025 Pavel Butsykin
@ 2017-05-31 15:03 ` Eric Blake
  2017-05-31 15:54   ` Pavel Butsykin
  2017-06-01  9:12   ` Kevin Wolf
  2 siblings, 2 replies; 19+ messages in thread
From: Eric Blake @ 2017-05-31 15:03 UTC (permalink / raw)
  To: Pavel Butsykin, qemu-block, qemu-devel; +Cc: kwolf, mreitz, armbru

[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]

On 05/31/2017 09:43 AM, Pavel Butsykin wrote:
> This patch adds the reduction of the image file for qcow2. As a result, this
> allows us to reduce the virtual image size and free up space on the disk without
> copying the image. Image can be fragmented and reduction is done by punching
> holes in the image file.
> 
> # ./qemu-img create -f qcow2 -o size=4G image.qcow2
> Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
> 
> # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2

So this is 1G of guest-visible data...

> # ./qemu-img resize image.qcow2 128M
> Image resized.

...and you are truncating the image by throwing away guest-visible
content, with no warning or double-checking (such as requiring a -f
force parameter or something) about the data loss.  Shrinking images is
something we should allow, but it feels like is a rare enough operation
that you don't want it to be casually easy to throw away data.

Is it feasible to require that a shrink operation will not be performed
unless all clusters being eliminated have been previously discarded (or
maybe written to zero), as an assurance that the guest did not care
about the tail of the image?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
  2017-05-31 15:03 ` [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2 Eric Blake
@ 2017-05-31 15:54   ` Pavel Butsykin
  2017-05-31 16:03     ` Max Reitz
  2017-05-31 16:10     ` Richard W.M. Jones
  2017-06-01  9:12   ` Kevin Wolf
  1 sibling, 2 replies; 19+ messages in thread
From: Pavel Butsykin @ 2017-05-31 15:54 UTC (permalink / raw)
  To: Eric Blake, qemu-block, qemu-devel; +Cc: kwolf, mreitz, armbru

On 31.05.2017 18:03, Eric Blake wrote:
> On 05/31/2017 09:43 AM, Pavel Butsykin wrote:
>> This patch adds the reduction of the image file for qcow2. As a result, this
>> allows us to reduce the virtual image size and free up space on the disk without
>> copying the image. Image can be fragmented and reduction is done by punching
>> holes in the image file.
>>
>> # ./qemu-img create -f qcow2 -o size=4G image.qcow2
>> Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
>>
>> # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
> 
> So this is 1G of guest-visible data...
> 
>> # ./qemu-img resize image.qcow2 128M
>> Image resized.
> 
> ...and you are truncating the image by throwing away guest-visible
> content, with no warning or double-checking (such as requiring a -f
> force parameter or something) about the data loss.  Shrinking images is
> something we should allow, but it feels like is a rare enough operation
> that you don't want it to be casually easy to throw away data.

It is assumed that the user has already made a preparatory with the
image:
1. freeing space at the end of the image
2. reducing the last partition on the disk
3. rebuilding fs
Only after these steps, the user can reduce the image by qemu-img.

I think it's not so rare case, sometimes people run out of disk space 
and this is another way to solve the problem (along with the use of
trim).

We already have all the interfaces, left them only to support :)

> Is it feasible to require that a shrink operation will not be performed
> unless all clusters being eliminated have been previously discarded (or
> maybe written to zero), as an assurance that the guest did not care
> about the tail of the image?
> 

Yes.

# ./qemu-img create -f qcow2 -o size=4G image.qcow2

# ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
# ./qemu-io -c "write -P 0x22 1G 1G" image.qcow2

# qemu-img map ./image.qcow2
Offset          Length          Mapped to       File
0               0x20000000      0x50000         ./image.qcow2
0x20000000      0x20000000      0x20060000      ./image.qcow2
0x40000000      0x20000000      0x40070000      ./image.qcow2
0x60000000      0x20000000      0x60080000      ./image.qcow2

# ./qemu-io -c "discard 1G 1G" ./image.qcow2

# qemu-img map ./image.qcow2
Offset          Length          Mapped to       File0 
0x20000000      0x50000         ./image.qcow2
0x20000000      0x20000000      0x20060000      ./image.qcow2

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
  2017-05-31 15:54   ` Pavel Butsykin
@ 2017-05-31 16:03     ` Max Reitz
  2017-05-31 17:01       ` Pavel Butsykin
  2017-05-31 16:10     ` Richard W.M. Jones
  1 sibling, 1 reply; 19+ messages in thread
From: Max Reitz @ 2017-05-31 16:03 UTC (permalink / raw)
  To: Pavel Butsykin, Eric Blake, qemu-block, qemu-devel; +Cc: kwolf, armbru

[-- Attachment #1: Type: text/plain, Size: 3638 bytes --]

On 2017-05-31 17:54, Pavel Butsykin wrote:
> On 31.05.2017 18:03, Eric Blake wrote:
>> On 05/31/2017 09:43 AM, Pavel Butsykin wrote:
>>> This patch adds the reduction of the image file for qcow2. As a
>>> result, this
>>> allows us to reduce the virtual image size and free up space on the
>>> disk without
>>> copying the image. Image can be fragmented and reduction is done by
>>> punching
>>> holes in the image file.
>>>
>>> # ./qemu-img create -f qcow2 -o size=4G image.qcow2
>>> Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off
>>> cluster_size=65536 lazy_refcounts=off refcount_bits=16
>>>
>>> # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
>>
>> So this is 1G of guest-visible data...
>>
>>> # ./qemu-img resize image.qcow2 128M
>>> Image resized.
>>
>> ...and you are truncating the image by throwing away guest-visible
>> content, with no warning or double-checking (such as requiring a -f
>> force parameter or something) about the data loss.  Shrinking images is
>> something we should allow, but it feels like is a rare enough operation
>> that you don't want it to be casually easy to throw away data.
> 
> It is assumed that the user has already made a preparatory with the
> image:
> 1. freeing space at the end of the image
> 2. reducing the last partition on the disk
> 3. rebuilding fs
> Only after these steps, the user can reduce the image by qemu-img.

But your implementation allows the user to reduce it anyway, even if
these steps have not been performed.

I very much agree that we have to be careful because otherwise you might
ruin all your data if you hand-type a resize command and drop a digit.

> I think it's not so rare case, sometimes people run out of disk space
> and this is another way to solve the problem (along with the use of
> trim).
> 
> We already have all the interfaces, left them only to support :)
> 
>> Is it feasible to require that a shrink operation will not be performed
>> unless all clusters being eliminated have been previously discarded (or
>> maybe written to zero), as an assurance that the guest did not care
>> about the tail of the image?
>>
> 
> Yes.
> 
> # ./qemu-img create -f qcow2 -o size=4G image.qcow2
> 
> # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
> # ./qemu-io -c "write -P 0x22 1G 1G" image.qcow2
> 
> # qemu-img map ./image.qcow2
> Offset          Length          Mapped to       File
> 0               0x20000000      0x50000         ./image.qcow2
> 0x20000000      0x20000000      0x20060000      ./image.qcow2
> 0x40000000      0x20000000      0x40070000      ./image.qcow2
> 0x60000000      0x20000000      0x60080000      ./image.qcow2
> 
> # ./qemu-io -c "discard 1G 1G" ./image.qcow2
> 
> # qemu-img map ./image.qcow2
> Offset          Length          Mapped to       File0 0x20000000     
> 0x50000         ./image.qcow2
> 0x20000000      0x20000000      0x20060000      ./image.qcow2

No, it isn't, because qemu-io is a debugging tool and a debugging tool only.

We could require the user to perform a trim operation or something in
the guest instead -- but I'd prefer a plain new flag for qemu-img resize
that says the user is OK with shrinking the image and thus throwing data
way.

I think it's fine to have this flag only as part of the qemu-img
interface, not e.g. for the block-resize QMP command. I think it's
reasonable to assume someone sending a QMP command (i.e. usually the
management layer) to know exactly what they are doing. OTOH, I wouldn't
oppose a flag there, though, I just don't think it's absolutely necessary.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 498 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
  2017-05-31 15:54   ` Pavel Butsykin
  2017-05-31 16:03     ` Max Reitz
@ 2017-05-31 16:10     ` Richard W.M. Jones
  2017-05-31 17:39       ` Pavel Butsykin
  1 sibling, 1 reply; 19+ messages in thread
From: Richard W.M. Jones @ 2017-05-31 16:10 UTC (permalink / raw)
  To: Pavel Butsykin; +Cc: Eric Blake, qemu-block, qemu-devel, kwolf, armbru, mreitz

On Wed, May 31, 2017 at 06:54:33PM +0300, Pavel Butsykin wrote:
> It is assumed that the user has already made a preparatory with the
> image:
> 1. freeing space at the end of the image
> 2. reducing the last partition on the disk
> 3. rebuilding fs
> Only after these steps, the user can reduce the image by qemu-img.

It's tricky with GPT, as GPT puts a second copy of the header in the
last few sectors of the disk.  I always advise people to use trim (or
virt-sparsify) instead of trying to adjust virtual disk size.

Nevertheless I don't think we should prevent shrinking qcow2 virtual
size.  It's likely useful to someone.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
  2017-05-31 16:03     ` Max Reitz
@ 2017-05-31 17:01       ` Pavel Butsykin
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel Butsykin @ 2017-05-31 17:01 UTC (permalink / raw)
  To: Max Reitz, Eric Blake, qemu-block, qemu-devel; +Cc: kwolf, armbru

On 31.05.2017 19:03, Max Reitz wrote:
> On 2017-05-31 17:54, Pavel Butsykin wrote:
>> On 31.05.2017 18:03, Eric Blake wrote:
>>> On 05/31/2017 09:43 AM, Pavel Butsykin wrote:
>>>> This patch adds the reduction of the image file for qcow2. As a
>>>> result, this
>>>> allows us to reduce the virtual image size and free up space on the
>>>> disk without
>>>> copying the image. Image can be fragmented and reduction is done by
>>>> punching
>>>> holes in the image file.
>>>>
>>>> # ./qemu-img create -f qcow2 -o size=4G image.qcow2
>>>> Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off
>>>> cluster_size=65536 lazy_refcounts=off refcount_bits=16
>>>>
>>>> # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
>>>
>>> So this is 1G of guest-visible data...
>>>
>>>> # ./qemu-img resize image.qcow2 128M
>>>> Image resized.
>>>
>>> ...and you are truncating the image by throwing away guest-visible
>>> content, with no warning or double-checking (such as requiring a -f
>>> force parameter or something) about the data loss.  Shrinking images is
>>> something we should allow, but it feels like is a rare enough operation
>>> that you don't want it to be casually easy to throw away data.
>>
>> It is assumed that the user has already made a preparatory with the
>> image:
>> 1. freeing space at the end of the image
>> 2. reducing the last partition on the disk
>> 3. rebuilding fs
>> Only after these steps, the user can reduce the image by qemu-img.
> 
> But your implementation allows the user to reduce it anyway, even if
> these steps have not been performed.
> 
> I very much agree that we have to be careful because otherwise you might
> ruin all your data if you hand-type a resize command and drop a digit.
>

We could check that the shrinking part of the image doesn't contain
non-zero clusters and print just a warning. But on the other hand, if
the user has not performed trim, at the end of the disk will still be
dirty cluster and we can't force users to do trim :)

We can add a flag --force and without flag just print a warning.

>> I think it's not so rare case, sometimes people run out of disk space
>> and this is another way to solve the problem (along with the use of
>> trim).
>>
>> We already have all the interfaces, left them only to support :)
>>
>>> Is it feasible to require that a shrink operation will not be performed
>>> unless all clusters being eliminated have been previously discarded (or
>>> maybe written to zero), as an assurance that the guest did not care
>>> about the tail of the image?
>>>
>>
>> Yes.
>>
>> # ./qemu-img create -f qcow2 -o size=4G image.qcow2
>>
>> # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
>> # ./qemu-io -c "write -P 0x22 1G 1G" image.qcow2
>>
>> # qemu-img map ./image.qcow2
>> Offset          Length          Mapped to       File
>> 0               0x20000000      0x50000         ./image.qcow2
>> 0x20000000      0x20000000      0x20060000      ./image.qcow2
>> 0x40000000      0x20000000      0x40070000      ./image.qcow2
>> 0x60000000      0x20000000      0x60080000      ./image.qcow2
>>
>> # ./qemu-io -c "discard 1G 1G" ./image.qcow2
>>
>> # qemu-img map ./image.qcow2
>> Offset          Length          Mapped to       File0 0x20000000
>> 0x50000         ./image.qcow2
>> 0x20000000      0x20000000      0x20060000      ./image.qcow2
> 
> No, it isn't, because qemu-io is a debugging tool and a debugging tool only.
> 
> We could require the user to perform a trim operation or something in
> the guest instead -- but I'd prefer a plain new flag for qemu-img resize
> that says the user is OK with shrinking the image and thus throwing data
> way.
> 
> I think it's fine to have this flag only as part of the qemu-img
> interface, not e.g. for the block-resize QMP command. I think it's
> reasonable to assume someone sending a QMP command (i.e. usually the
> management layer) to know exactly what they are doing. OTOH, I wouldn't
> oppose a flag there, though, I just don't think it's absolutely necessary.
>

I agree that the flag can be only as protection from the human factor.

> Max
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
  2017-05-31 16:10     ` Richard W.M. Jones
@ 2017-05-31 17:39       ` Pavel Butsykin
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel Butsykin @ 2017-05-31 17:39 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Eric Blake, qemu-block, qemu-devel, kwolf, armbru, mreitz


On 31.05.2017 19:10, Richard W.M. Jones wrote:
> On Wed, May 31, 2017 at 06:54:33PM +0300, Pavel Butsykin wrote:
>> It is assumed that the user has already made a preparatory with the
>> image:
>> 1. freeing space at the end of the image
>> 2. reducing the last partition on the disk
>> 3. rebuilding fs
>> Only after these steps, the user can reduce the image by qemu-img.
> 
> It's tricky with GPT, as GPT puts a second copy of the header in the
> last few sectors of the disk.  I always advise people to use trim (or
> virt-sparsify) instead of trying to adjust virtual disk size.

virt-sparsify is a very useful utility, I use it too :) I agree that the
trim is much better if the goal is only to free up space on the host
disk. But there's another goal is to limit further growth of the image.

> 
> Nevertheless I don't think we should prevent shrinking qcow2 virtual
> size.  It's likely useful to someone.
> 
> Rich.
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
  2017-05-31 15:03 ` [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2 Eric Blake
  2017-05-31 15:54   ` Pavel Butsykin
@ 2017-06-01  9:12   ` Kevin Wolf
  2017-06-01 11:11     ` Denis V. Lunev
  1 sibling, 1 reply; 19+ messages in thread
From: Kevin Wolf @ 2017-06-01  9:12 UTC (permalink / raw)
  To: Eric Blake; +Cc: Pavel Butsykin, qemu-block, qemu-devel, mreitz, armbru

[-- Attachment #1: Type: text/plain, Size: 2073 bytes --]

Am 31.05.2017 um 17:03 hat Eric Blake geschrieben:
> On 05/31/2017 09:43 AM, Pavel Butsykin wrote:
> > This patch adds the reduction of the image file for qcow2. As a result, this
> > allows us to reduce the virtual image size and free up space on the disk without
> > copying the image. Image can be fragmented and reduction is done by punching
> > holes in the image file.
> > 
> > # ./qemu-img create -f qcow2 -o size=4G image.qcow2
> > Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
> > 
> > # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
> 
> So this is 1G of guest-visible data...
> 
> > # ./qemu-img resize image.qcow2 128M
> > Image resized.
> 
> ...and you are truncating the image by throwing away guest-visible
> content, with no warning or double-checking (such as requiring a -f
> force parameter or something) about the data loss.  Shrinking images is
> something we should allow, but it feels like is a rare enough operation
> that you don't want it to be casually easy to throw away data.
> 
> Is it feasible to require that a shrink operation will not be performed
> unless all clusters being eliminated have been previously discarded (or
> maybe written to zero), as an assurance that the guest did not care
> about the tail of the image?

I think that ship has sailed long ago because raw images have always
supported shrinking images without any special checks or options. We
want consistency between raw and qcow2, and we also need to provide
backwards compatibility.

The only thing I can imagine we could do now is to introduce a --shrink
option in qemu-img, print a deprecation warning for shrinking without
this option, and then make it mandatory in a few years.

If we want to distinguish based on the block driver, so that we can
require --shrink for all drivers that didn't support shrinking until
now, we'd have to check the .bdrv_truncate() implementations of all
drivers to see whether it already allowed shrinking.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] qemu-iotests: add reducing image test in 025
  2017-05-31 14:54   ` Pavel Butsykin
@ 2017-06-01  9:14     ` Kevin Wolf
  0 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2017-06-01  9:14 UTC (permalink / raw)
  To: Pavel Butsykin; +Cc: qemu-block, qemu-devel, mreitz, armbru, eblake

Am 31.05.2017 um 16:54 hat Pavel Butsykin geschrieben:
> On 31.05.2017 17:43, Pavel Butsykin wrote:
> >Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
> >---
> >  tests/qemu-iotests/025     | 19 +++++++++++++++++--
> >  tests/qemu-iotests/025.out | 12 +++++++++++-
> >  2 files changed, 28 insertions(+), 3 deletions(-)
> >
> >diff --git a/tests/qemu-iotests/025 b/tests/qemu-iotests/025
> >index f5e672e6b3..658601579b 100755
> >--- a/tests/qemu-iotests/025
> >+++ b/tests/qemu-iotests/025
> >@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
> >  . ./common.filter
> >  . ./common.pattern
> >-_supported_fmt raw qcow2 qed
> >+_supported_fmt raw qcow2
> 
> I'm not sure, can I so blatantly drop QED here. But this place is very
> suitable for reduce image case. Perhaps the alternative would be adding
> a new test, I just didn't want to copy the tests, which are testing
> almost the same thing.

I think it's better to have a separate test case for shrinking, so that
we don't damage the test coverage of qed. The resulting new patch would
be a very small one, but that's okay.

Kevin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
  2017-06-01  9:12   ` Kevin Wolf
@ 2017-06-01 11:11     ` Denis V. Lunev
  2017-06-01 11:31       ` Kevin Wolf
  2017-06-07 13:37       ` Max Reitz
  0 siblings, 2 replies; 19+ messages in thread
From: Denis V. Lunev @ 2017-06-01 11:11 UTC (permalink / raw)
  To: Kevin Wolf, Eric Blake
  Cc: armbru, mreitz, qemu-devel, qemu-block, Pavel Butsykin

On 06/01/2017 12:12 PM, Kevin Wolf wrote:
> Am 31.05.2017 um 17:03 hat Eric Blake geschrieben:
>> On 05/31/2017 09:43 AM, Pavel Butsykin wrote:
>>> This patch adds the reduction of the image file for qcow2. As a result, this
>>> allows us to reduce the virtual image size and free up space on the disk without
>>> copying the image. Image can be fragmented and reduction is done by punching
>>> holes in the image file.
>>>
>>> # ./qemu-img create -f qcow2 -o size=4G image.qcow2
>>> Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
>>>
>>> # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
>> So this is 1G of guest-visible data...
>>
>>> # ./qemu-img resize image.qcow2 128M
>>> Image resized.
>> ...and you are truncating the image by throwing away guest-visible
>> content, with no warning or double-checking (such as requiring a -f
>> force parameter or something) about the data loss.  Shrinking images is
>> something we should allow, but it feels like is a rare enough operation
>> that you don't want it to be casually easy to throw away data.
>>
>> Is it feasible to require that a shrink operation will not be performed
>> unless all clusters being eliminated have been previously discarded (or
>> maybe written to zero), as an assurance that the guest did not care
>> about the tail of the image?
> I think that ship has sailed long ago because raw images have always
> supported shrinking images without any special checks or options. We
> want consistency between raw and qcow2, and we also need to provide
> backwards compatibility.
>
> The only thing I can imagine we could do now is to introduce a --shrink
> option in qemu-img, print a deprecation warning for shrinking without
> this option, and then make it mandatory in a few years.
>
> If we want to distinguish based on the block driver, so that we can
> require --shrink for all drivers that didn't support shrinking until
> now, we'd have to check the .bdrv_truncate() implementations of all
> drivers to see whether it already allowed shrinking.
>
> Kevin
Will the solution proposed by Pavel in the answer to Max work:
- if there are no data blocks in the truncated space, truncate is allowed
  without additional options
- if there are data blocks in the truncated space, truncate is allowed
  with the flag --force or error is generated

Den

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
  2017-06-01 11:11     ` Denis V. Lunev
@ 2017-06-01 11:31       ` Kevin Wolf
  2017-06-07 13:37       ` Max Reitz
  1 sibling, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2017-06-01 11:31 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Eric Blake, armbru, mreitz, qemu-devel, qemu-block, Pavel Butsykin

Am 01.06.2017 um 13:11 hat Denis V. Lunev geschrieben:
> On 06/01/2017 12:12 PM, Kevin Wolf wrote:
> > Am 31.05.2017 um 17:03 hat Eric Blake geschrieben:
> >> On 05/31/2017 09:43 AM, Pavel Butsykin wrote:
> >>> This patch adds the reduction of the image file for qcow2. As a result, this
> >>> allows us to reduce the virtual image size and free up space on the disk without
> >>> copying the image. Image can be fragmented and reduction is done by punching
> >>> holes in the image file.
> >>>
> >>> # ./qemu-img create -f qcow2 -o size=4G image.qcow2
> >>> Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
> >>>
> >>> # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
> >> So this is 1G of guest-visible data...
> >>
> >>> # ./qemu-img resize image.qcow2 128M
> >>> Image resized.
> >> ...and you are truncating the image by throwing away guest-visible
> >> content, with no warning or double-checking (such as requiring a -f
> >> force parameter or something) about the data loss.  Shrinking images is
> >> something we should allow, but it feels like is a rare enough operation
> >> that you don't want it to be casually easy to throw away data.
> >>
> >> Is it feasible to require that a shrink operation will not be performed
> >> unless all clusters being eliminated have been previously discarded (or
> >> maybe written to zero), as an assurance that the guest did not care
> >> about the tail of the image?
> > I think that ship has sailed long ago because raw images have always
> > supported shrinking images without any special checks or options. We
> > want consistency between raw and qcow2, and we also need to provide
> > backwards compatibility.
> >
> > The only thing I can imagine we could do now is to introduce a --shrink
> > option in qemu-img, print a deprecation warning for shrinking without
> > this option, and then make it mandatory in a few years.
> >
> > If we want to distinguish based on the block driver, so that we can
> > require --shrink for all drivers that didn't support shrinking until
> > now, we'd have to check the .bdrv_truncate() implementations of all
> > drivers to see whether it already allowed shrinking.
> >
> > Kevin
> Will the solution proposed by Pavel in the answer to Max work:
> - if there are no data blocks in the truncated space, truncate is allowed
>   without additional options
> - if there are data blocks in the truncated space, truncate is allowed
>   with the flag --force or error is generated

I don't think that's a great user interface because it means that you
can only shrink images without --force if you use a guest device that
supports discard (i.e. not virtio-blk) and your guest OS supports it,
too. In fact, even with a SCSI device and Linux, I wouldn't be sure
offhand how to discard unused space after the last partition. I'd rather
call the switch --shrink and require it always.

In any case, however, I think 'qemu-img resize' should work consistently
across different formats. That is, if we require that there be no data
blocks in the truncated space for qcow2, we must require the same for
raw. And that's where we come back to backwards compatibility, so it's
the same case as I mentioned above: We would have to print a deprecation
warning for some years and only then we could make it mandatory.

Kevin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] qcow2: add reduce image support
  2017-05-31 14:43 ` [Qemu-devel] [PATCH 1/2] qcow2: add reduce image support Pavel Butsykin
@ 2017-06-01 14:41   ` Kevin Wolf
  2017-06-02  9:53     ` Pavel Butsykin
  0 siblings, 1 reply; 19+ messages in thread
From: Kevin Wolf @ 2017-06-01 14:41 UTC (permalink / raw)
  To: Pavel Butsykin; +Cc: qemu-block, qemu-devel, mreitz, armbru, eblake

Am 31.05.2017 um 16:43 hat Pavel Butsykin geschrieben:
> This patch adds the reduction of the image file for qcow2. As a result, this
> allows us to reduce the virtual image size and free up space on the disk without
> copying the image. Image can be fragmented and reduction is done by punching
> holes in the image file.
> 
> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
> ---
>  block/qcow2-cache.c    |  8 +++++
>  block/qcow2-cluster.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2-refcount.c | 65 +++++++++++++++++++++++++++++++++++++++
>  block/qcow2.c          | 40 ++++++++++++++++++------
>  block/qcow2.h          |  4 +++
>  qapi/block-core.json   |  4 ++-
>  6 files changed, 193 insertions(+), 11 deletions(-)
> 
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index 1d25147392..da55118ca7 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -411,3 +411,11 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
>      assert(c->entries[i].offset != 0);
>      c->entries[i].dirty = true;
>  }
> +
> +void qcow2_cache_entry_mark_clean(BlockDriverState *bs, Qcow2Cache *c,
> +     void *table)
> +{
> +    int i = qcow2_cache_get_table_idx(bs, c, table);
> +    assert(c->entries[i].offset != 0);
> +    c->entries[i].dirty = false;
> +}

This is an interesting function. We can use it whenever we're not
interested in the content of the table any more. However, we still keep
that data in the cache and may even evict other tables before this one.
The data in the cache also becomes inconsistent with the data in the
file, which should not be a problem in theory (because nobody should be
using it), but it surely could be confusing when debugging something in
the cache.

We can easily improve this a little: Make it qcow2_cache_discard(), a
function that gets a cluster offset, asserts that a table at this
offset isn't in use (not cached or ref == 0), and then just directly
drops it from the cache. This can be called from update_refcount()
whenever a refcount goes to 0, immediately before or after calling
update_refcount_discard() - those two are closely related. Then this
would automatically also be used for L2 tables.

Adding this mechanism could be a patch of its own.

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 347d94b0d2..47e04d7317 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -32,6 +32,89 @@
>  #include "qemu/bswap.h"
>  #include "trace.h"
>  
> +int qcow2_reduce_l1_table(BlockDriverState *bs, uint64_t max_size)

Let's call this shrink, it's easier to understand and consistent with
qcow2_reftable_shrink() that this patch adds, too.

> +{
> +    BDRVQcow2State *s = bs->opaque;
> +    int64_t new_l1_size_bytes, free_l1_clusters;
> +    uint64_t *new_l1_table;
> +    int new_l1_size, i, ret;
> +
> +    if (max_size >= s->l1_size) {
> +        return 0;
> +    }
> +
> +    new_l1_size = max_size;
> +
> +#ifdef DEBUG_ALLOC2
> +    fprintf(stderr, "reduce l1_table from %d to %" PRId64 "\n",
> +            s->l1_size, new_l1_size);
> +#endif
> +
> +    ret = qcow2_cache_flush(bs, s->l2_table_cache);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_FREE_L2_CLUSTERS);
> +    for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
> +        if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
> +            continue;
> +        }
> +        qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
> +                            s->l2_size * sizeof(uint64_t),
> +                            QCOW2_DISCARD_ALWAYS);
> +    }
> +
> +    new_l1_size_bytes = sizeof(uint64_t) * new_l1_size;

On 32 bit hosts, this does a 32 bit calculation and assigns it to a 64
bit value. I think this is still correct because new_l1_size_bytes is
limited by QCOW_MAX_L1_SIZE (0x2000000).

If this is the intention, maybe it would be more obvious to use a
normal int for new_l1_size_bytes, or to assert(new_l1_size <=
QCOW_MAX_L1_SIZE / sizeof(uint64_t)) before this line.

> +    BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_WRITE_TABLE);
> +    ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset + new_l1_size_bytes,
> +                             s->l1_size * sizeof(uint64_t) - new_l1_size_bytes,
> +                             0);

s->l1_table and the on-disk content are out of sync now. Error paths
must bring them back into sync from now on.

This is easier with the approach of qcow2_grow_l1_table(), which creates
a completely new L1 table and then atomically switches from old to new
with a header update.

> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = bdrv_flush(bs->file->bs);
> +    if (ret < 0) {
> +        return ret;
> +    }

In both of these error cases, we don't know the actual state of the L1
table on disk.

> +    /* set new table size */
> +    BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_ACTIVATE_TABLE);
> +    new_l1_size = cpu_to_be32(new_l1_size);
> +    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size),
> +                           &new_l1_size, sizeof(new_l1_size));
> +    new_l1_size = be32_to_cpu(new_l1_size);
> +    if (ret < 0) {
> +        return ret;
> +    }

Maybe we can salvage the error handling if we move this first. If this
update fails, we simply have to keep using the old L1 table. If it
succeeds, we have successfully shrunk the L1 table and the contents of
the old entries doesn't strictly matter. We can zero it out just to be
nice, but correctness isn't affected by it.

> +    BLKDBG_EVENT(bs->file, BLKDBG_L1_REDUCE_FREE_L1_CLUSTERS);
> +    free_l1_clusters =
> +        DIV_ROUND_UP(s->l1_size * sizeof(uint64_t), s->cluster_size) -
> +        DIV_ROUND_UP(new_l1_size_bytes, s->cluster_size);
> +    if (free_l1_clusters) {
> +        qcow2_free_clusters(bs, s->l1_table_offset +
> +                                ROUND_UP(new_l1_size_bytes, s->cluster_size),
> +                            free_l1_clusters << s->cluster_bits,
> +                            QCOW2_DISCARD_ALWAYS);
> +    }
> +
> +    new_l1_table = qemu_try_blockalign(bs->file->bs,
> +                                       align_offset(new_l1_size_bytes, 512));
> +    if (new_l1_table == NULL) {
> +        return -ENOMEM;

Now the disk has a shortened L1 size, but our in-memory representation
still has the old size. This will cause corruption when the L1 table is
grown again.

> +    }
> +    memcpy(new_l1_table, s->l1_table, new_l1_size_bytes);
> +
> +    qemu_vfree(s->l1_table);
> +    s->l1_table = new_l1_table;
> +    s->l1_size = new_l1_size;
> +
> +    return 0;
> +}

Another thought: Is resizing the L1 table actually worth it given how
easy it is to get the error paths wrong?

With 64k clusters, you get another L1 table cluster every 4 TB. So
leaving 64k in the image file uselessly allocated for resizing an image
from 8 TB to 4 TB sounds like a waste that is totally acceptable if we
use it to get some more confidence that we won't corrupt images in error
cases.

We could just free the now unused L2 tables and overwrite their L1
entries with 0 without actually resizing the L1 table.

> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 7c06061aae..5481b623cd 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c

Skipping the review for refcount tables for now. The same argument
as for L1 tables applies, except that each cluster of the refcount table
covers 16 TB instead of 4 TB here.

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 6b974b952f..dcd2d0241f 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2371,7 +2371,9 @@
>              'cluster_alloc_bytes', 'cluster_free', 'flush_to_os',
>              'flush_to_disk', 'pwritev_rmw_head', 'pwritev_rmw_after_head',
>              'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
> -            'pwritev_zero', 'pwritev_done', 'empty_image_prepare' ] }
> +            'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
> +            'l1_reduce_write_table', 'l1_reduce_activate_table',
> +            'l1_reduce_free_l2_clusters', 'l1_reduce_free_l1_clusters' ] }

If you rename the function above, s/reduce/shrink/ here as well.

Kevin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] qcow2: add reduce image support
  2017-06-01 14:41   ` Kevin Wolf
@ 2017-06-02  9:53     ` Pavel Butsykin
  2017-06-02 13:33       ` Kevin Wolf
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Butsykin @ 2017-06-02  9:53 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, qemu-devel, mreitz, armbru, eblake



On 01.06.2017 17:41, Kevin Wolf wrote:
> Am 31.05.2017 um 16:43 hat Pavel Butsykin geschrieben:
>> This patch adds the reduction of the image file for qcow2. As a result, this
>> allows us to reduce the virtual image size and free up space on the disk without
>> copying the image. Image can be fragmented and reduction is done by punching
>> holes in the image file.
>>
>> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
>> ---
>>   block/qcow2-cache.c    |  8 +++++
>>   block/qcow2-cluster.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/qcow2-refcount.c | 65 +++++++++++++++++++++++++++++++++++++++
>>   block/qcow2.c          | 40 ++++++++++++++++++------
>>   block/qcow2.h          |  4 +++
>>   qapi/block-core.json   |  4 ++-
>>   6 files changed, 193 insertions(+), 11 deletions(-)
>>
>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
>> index 1d25147392..da55118ca7 100644
>> --- a/block/qcow2-cache.c
>> +++ b/block/qcow2-cache.c
>> @@ -411,3 +411,11 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
>>       assert(c->entries[i].offset != 0);
>>       c->entries[i].dirty = true;
>>   }
>> +
>> +void qcow2_cache_entry_mark_clean(BlockDriverState *bs, Qcow2Cache *c,
>> +     void *table)
>> +{
>> +    int i = qcow2_cache_get_table_idx(bs, c, table);
>> +    assert(c->entries[i].offset != 0);
>> +    c->entries[i].dirty = false;
>> +}
> 
> This is an interesting function. We can use it whenever we're not
> interested in the content of the table any more. However, we still keep
> that data in the cache and may even evict other tables before this one.
> The data in the cache also becomes inconsistent with the data in the
> file, which should not be a problem in theory (because nobody should be
> using it), but it surely could be confusing when debugging something in
> the cache.
> 

Good idea!

> We can easily improve this a little: Make it qcow2_cache_discard(), a
> function that gets a cluster offset, asserts that a table at this
> offset isn't in use (not cached or ref == 0), and then just directly
> drops it from the cache. This can be called from update_refcount()
> whenever a refcount goes to 0, immediately before or after calling
> update_refcount_discard() - those two are closely related. Then this
> would automatically also be used for L2 tables.
> 

Did I understand correctly? Every time we need to check the incoming
offset to make sure it is offset to L2/refcount table (not to the guest 
data) ?

> Adding this mechanism could be a patch of its own
...
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] qcow2: add reduce image support
  2017-06-02  9:53     ` Pavel Butsykin
@ 2017-06-02 13:33       ` Kevin Wolf
  0 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2017-06-02 13:33 UTC (permalink / raw)
  To: Pavel Butsykin; +Cc: qemu-block, qemu-devel, mreitz, armbru, eblake

Am 02.06.2017 um 11:53 hat Pavel Butsykin geschrieben:
> On 01.06.2017 17:41, Kevin Wolf wrote:
> >Am 31.05.2017 um 16:43 hat Pavel Butsykin geschrieben:
> >>This patch adds the reduction of the image file for qcow2. As a result, this
> >>allows us to reduce the virtual image size and free up space on the disk without
> >>copying the image. Image can be fragmented and reduction is done by punching
> >>holes in the image file.
> >>
> >>Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
> >>---
> >>  block/qcow2-cache.c    |  8 +++++
> >>  block/qcow2-cluster.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  block/qcow2-refcount.c | 65 +++++++++++++++++++++++++++++++++++++++
> >>  block/qcow2.c          | 40 ++++++++++++++++++------
> >>  block/qcow2.h          |  4 +++
> >>  qapi/block-core.json   |  4 ++-
> >>  6 files changed, 193 insertions(+), 11 deletions(-)
> >>
> >>diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> >>index 1d25147392..da55118ca7 100644
> >>--- a/block/qcow2-cache.c
> >>+++ b/block/qcow2-cache.c
> >>@@ -411,3 +411,11 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
> >>      assert(c->entries[i].offset != 0);
> >>      c->entries[i].dirty = true;
> >>  }
> >>+
> >>+void qcow2_cache_entry_mark_clean(BlockDriverState *bs, Qcow2Cache *c,
> >>+     void *table)
> >>+{
> >>+    int i = qcow2_cache_get_table_idx(bs, c, table);
> >>+    assert(c->entries[i].offset != 0);
> >>+    c->entries[i].dirty = false;
> >>+}
> >
> >This is an interesting function. We can use it whenever we're not
> >interested in the content of the table any more. However, we still keep
> >that data in the cache and may even evict other tables before this one.
> >The data in the cache also becomes inconsistent with the data in the
> >file, which should not be a problem in theory (because nobody should be
> >using it), but it surely could be confusing when debugging something in
> >the cache.
> >
> 
> Good idea!
> 
> >We can easily improve this a little: Make it qcow2_cache_discard(), a
> >function that gets a cluster offset, asserts that a table at this
> >offset isn't in use (not cached or ref == 0), and then just directly
> >drops it from the cache. This can be called from update_refcount()
> >whenever a refcount goes to 0, immediately before or after calling
> >update_refcount_discard() - those two are closely related. Then this
> >would automatically also be used for L2 tables.
> >
> 
> Did I understand correctly? Every time we need to check the incoming
> offset to make sure it is offset to L2/refcount table (not to the
> guest data) ?

Yes. Basically, whenever the refcount of a cluster becomes 0 and it is
in a cache, remove it from the cache.

Kevin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
  2017-06-01 11:11     ` Denis V. Lunev
  2017-06-01 11:31       ` Kevin Wolf
@ 2017-06-07 13:37       ` Max Reitz
  2017-06-07 15:51         ` Kevin Wolf
  1 sibling, 1 reply; 19+ messages in thread
From: Max Reitz @ 2017-06-07 13:37 UTC (permalink / raw)
  To: Denis V. Lunev, Kevin Wolf, Eric Blake
  Cc: armbru, qemu-devel, qemu-block, Pavel Butsykin

[-- Attachment #1: Type: text/plain, Size: 3157 bytes --]

On 2017-06-01 13:11, Denis V. Lunev wrote:
> On 06/01/2017 12:12 PM, Kevin Wolf wrote:
>> Am 31.05.2017 um 17:03 hat Eric Blake geschrieben:
>>> On 05/31/2017 09:43 AM, Pavel Butsykin wrote:
>>>> This patch adds the reduction of the image file for qcow2. As a result, this
>>>> allows us to reduce the virtual image size and free up space on the disk without
>>>> copying the image. Image can be fragmented and reduction is done by punching
>>>> holes in the image file.
>>>>
>>>> # ./qemu-img create -f qcow2 -o size=4G image.qcow2
>>>> Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
>>>>
>>>> # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
>>> So this is 1G of guest-visible data...
>>>
>>>> # ./qemu-img resize image.qcow2 128M
>>>> Image resized.
>>> ...and you are truncating the image by throwing away guest-visible
>>> content, with no warning or double-checking (such as requiring a -f
>>> force parameter or something) about the data loss.  Shrinking images is
>>> something we should allow, but it feels like is a rare enough operation
>>> that you don't want it to be casually easy to throw away data.
>>>
>>> Is it feasible to require that a shrink operation will not be performed
>>> unless all clusters being eliminated have been previously discarded (or
>>> maybe written to zero), as an assurance that the guest did not care
>>> about the tail of the image?
>> I think that ship has sailed long ago because raw images have always
>> supported shrinking images without any special checks or options. We
>> want consistency between raw and qcow2, and we also need to provide
>> backwards compatibility.
>>
>> The only thing I can imagine we could do now is to introduce a --shrink
>> option in qemu-img, print a deprecation warning for shrinking without
>> this option, and then make it mandatory in a few years.

Do I hear a "3.0"?

>> If we want to distinguish based on the block driver, so that we can
>> require --shrink for all drivers that didn't support shrinking until
>> now, we'd have to check the .bdrv_truncate() implementations of all
>> drivers to see whether it already allowed shrinking.

We could have an ugly raw-only hack directly in qemu-img (and
qmp_block_resize) until 3.0.

I'm really concerned about someone mistyping something and accidentally
dropping a digit.

>> Kevin
> Will the solution proposed by Pavel in the answer to Max work:
> - if there are no data blocks in the truncated space, truncate is allowed
>   without additional options
> - if there are data blocks in the truncated space, truncate is allowed
>   with the flag --force or error is generated

I'd be OK with that, but I'm not sure we really need it. It would be
nice to have for convenience, but I don't think it solves the
backwards-compatibility problem (as said by Kevin), and I'm not sure it
makes things that much more convenient: Just specifying --force is
easier than to manually trim the device in the guest (and then maybe
having to specify --force anyway because you didn't do it right).

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 498 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2
  2017-06-07 13:37       ` Max Reitz
@ 2017-06-07 15:51         ` Kevin Wolf
  0 siblings, 0 replies; 19+ messages in thread
From: Kevin Wolf @ 2017-06-07 15:51 UTC (permalink / raw)
  To: Max Reitz
  Cc: Denis V. Lunev, Eric Blake, armbru, qemu-devel, qemu-block,
	Pavel Butsykin

[-- Attachment #1: Type: text/plain, Size: 3933 bytes --]

Am 07.06.2017 um 15:37 hat Max Reitz geschrieben:
> On 2017-06-01 13:11, Denis V. Lunev wrote:
> > On 06/01/2017 12:12 PM, Kevin Wolf wrote:
> >> Am 31.05.2017 um 17:03 hat Eric Blake geschrieben:
> >>> On 05/31/2017 09:43 AM, Pavel Butsykin wrote:
> >>>> This patch adds the reduction of the image file for qcow2. As a result, this
> >>>> allows us to reduce the virtual image size and free up space on the disk without
> >>>> copying the image. Image can be fragmented and reduction is done by punching
> >>>> holes in the image file.
> >>>>
> >>>> # ./qemu-img create -f qcow2 -o size=4G image.qcow2
> >>>> Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
> >>>>
> >>>> # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
> >>> So this is 1G of guest-visible data...
> >>>
> >>>> # ./qemu-img resize image.qcow2 128M
> >>>> Image resized.
> >>> ...and you are truncating the image by throwing away guest-visible
> >>> content, with no warning or double-checking (such as requiring a -f
> >>> force parameter or something) about the data loss.  Shrinking images is
> >>> something we should allow, but it feels like is a rare enough operation
> >>> that you don't want it to be casually easy to throw away data.
> >>>
> >>> Is it feasible to require that a shrink operation will not be performed
> >>> unless all clusters being eliminated have been previously discarded (or
> >>> maybe written to zero), as an assurance that the guest did not care
> >>> about the tail of the image?
> >> I think that ship has sailed long ago because raw images have always
> >> supported shrinking images without any special checks or options. We
> >> want consistency between raw and qcow2, and we also need to provide
> >> backwards compatibility.
> >>
> >> The only thing I can imagine we could do now is to introduce a --shrink
> >> option in qemu-img, print a deprecation warning for shrinking without
> >> this option, and then make it mandatory in a few years.
> 
> Do I hear a "3.0"?

You do.

> >> If we want to distinguish based on the block driver, so that we can
> >> require --shrink for all drivers that didn't support shrinking until
> >> now, we'd have to check the .bdrv_truncate() implementations of all
> >> drivers to see whether it already allowed shrinking.
> 
> We could have an ugly raw-only hack directly in qemu-img (and
> qmp_block_resize) until 3.0.
> 
> I'm really concerned about someone mistyping something and accidentally
> dropping a digit.

Me too. But still we can't break working command lines (at least before
3.0).

I'm okay with temporary hacks in qemu-img, but did you check whether
it's really raw-only? We know that raw can shrink currently and qcow2
can't, but there are 12 drivers implementing .bdrv_truncate, not only
two.

> > Will the solution proposed by Pavel in the answer to Max work:
> > - if there are no data blocks in the truncated space, truncate is allowed
> >   without additional options
> > - if there are data blocks in the truncated space, truncate is allowed
> >   with the flag --force or error is generated
> 
> I'd be OK with that, but I'm not sure we really need it.

Agreed. It would add a lot of complexity for little use. As I wrote in a
previous mail: I don't even know how I would discard the unpartitioned
space after shrinking my guest filesystem.

> It would be nice to have for convenience, but I don't think it solves
> the backwards-compatibility problem (as said by Kevin), and I'm not
> sure it makes things that much more convenient: Just specifying
> --force is easier than to manually trim the device in the guest (and
> then maybe having to specify --force anyway because you didn't do it
> right).

I think it should be specifically --shrink rather than an unspecific
--force that could mean anything.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-06-07 15:51 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-31 14:43 [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2 Pavel Butsykin
2017-05-31 14:43 ` [Qemu-devel] [PATCH 1/2] qcow2: add reduce image support Pavel Butsykin
2017-06-01 14:41   ` Kevin Wolf
2017-06-02  9:53     ` Pavel Butsykin
2017-06-02 13:33       ` Kevin Wolf
2017-05-31 14:43 ` [Qemu-devel] [PATCH 2/2] qemu-iotests: add reducing image test in 025 Pavel Butsykin
2017-05-31 14:54   ` Pavel Butsykin
2017-06-01  9:14     ` Kevin Wolf
2017-05-31 15:03 ` [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2 Eric Blake
2017-05-31 15:54   ` Pavel Butsykin
2017-05-31 16:03     ` Max Reitz
2017-05-31 17:01       ` Pavel Butsykin
2017-05-31 16:10     ` Richard W.M. Jones
2017-05-31 17:39       ` Pavel Butsykin
2017-06-01  9:12   ` Kevin Wolf
2017-06-01 11:11     ` Denis V. Lunev
2017-06-01 11:31       ` Kevin Wolf
2017-06-07 13:37       ` Max Reitz
2017-06-07 15:51         ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.