All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 1/2] qemu-img: Add 'resize' command to grow/shrink disk images
@ 2010-04-24  8:12 Stefan Hajnoczi
  2010-04-24  8:12 ` [Qemu-devel] [PATCH 2/2] qcow2: Implement bdrv_truncate() for growing images Stefan Hajnoczi
  2010-04-26 11:52 ` [Qemu-devel] Re: [PATCH 1/2] qemu-img: Add 'resize' command to grow/shrink disk images Kevin Wolf
  0 siblings, 2 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2010-04-24  8:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

This patch adds a 'resize' command to grow/shrink disk images.  This
allows changing the size of disk images without copying to a new image
file.  Currently only raw files support resize.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
This applies to kevin/block.

I am also sending a qemu-iotests patch to test bdrv_truncate().

 qemu-img-cmds.hx |    6 +++
 qemu-img.c       |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-img.texi    |   12 +++++++
 3 files changed, 110 insertions(+), 0 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index f96876a..c079019 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -49,5 +49,11 @@ DEF("rebase", img_rebase,
     "rebase [-f fmt] [-u] -b backing_file [-F backing_fmt] filename")
 STEXI
 @item rebase [-f @var{fmt}] [-u] -b @var{backing_file} [-F @var{backing_fmt}] @var{filename}
+ETEXI
+
+DEF("resize", img_resize,
+    "resize filename [+ | -]size")
+STEXI
+@item rebase @var{filename} [+ | -]@var{size}
 @end table
 ETEXI
diff --git a/qemu-img.c b/qemu-img.c
index 74311a5..c21d999 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1225,6 +1225,98 @@ static int img_rebase(int argc, char **argv)
     return 0;
 }
 
+static int img_resize(int argc, char **argv)
+{
+    int c, ret, relative;
+    const char *filename, *fmt, *size;
+    int64_t n, total_size;
+    BlockDriverState *bs;
+    QEMUOptionParameter *param;
+    QEMUOptionParameter resize_options[] = {
+        {
+            .name = BLOCK_OPT_SIZE,
+            .type = OPT_SIZE,
+            .help = "Virtual disk size"
+        },
+        { NULL }
+    };
+
+    fmt = NULL;
+    for(;;) {
+        c = getopt(argc, argv, "f:h");
+        if (c == -1) {
+            break;
+        }
+        switch(c) {
+        case 'h':
+            help();
+            break;
+        case 'f':
+            fmt = optarg;
+            break;
+        }
+    }
+    if (optind + 1 >= argc) {
+        help();
+    }
+    filename = argv[optind++];
+    size = argv[optind++];
+
+    /* Choose grow, shrink, or absolute resize mode */
+    switch (size[0]) {
+    case '+':
+        relative = 1;
+        size++;
+        break;
+    case '-':
+        relative = -1;
+        size++;
+        break;
+    default:
+        relative = 0;
+        break;
+    }
+
+    /* Parse size */
+    param = parse_option_parameters("", resize_options, NULL);
+    if (set_option_parameter(param, BLOCK_OPT_SIZE, size)) {
+        /* Error message already printed when size parsing fails */
+        exit(1);
+    }
+    n = get_option_parameter(param, BLOCK_OPT_SIZE)->value.n;
+    free_option_parameters(param);
+
+    bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS | BDRV_O_RDWR);
+
+    if (relative) {
+        total_size = bdrv_getlength(bs) + n * relative;
+    } else {
+        total_size = n;
+    }
+    if (total_size <= 0) {
+        error("New image size must be positive");
+    }
+
+    ret = bdrv_truncate(bs, total_size);
+    switch (ret) {
+    case 0:
+        printf("Image resized.\n");
+        break;
+    case -ENOTSUP:
+        error("This image format does not support resize");
+        break;
+    case -EACCES:
+        error("Image is read-only");
+        break;
+    default:
+        error("Error resizing image (%d)", -ret);
+        break;
+    }
+
+    bdrv_delete(bs);
+    return 0;
+}
+
 static const img_cmd_t img_cmds[] = {
 #define DEF(option, callback, arg_string)        \
     { option, callback },
diff --git a/qemu-img.texi b/qemu-img.texi
index ac97854..c1b1f27 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -106,6 +106,18 @@ they are displayed too.
 @item snapshot [-l | -a @var{snapshot} | -c @var{snapshot} | -d @var{snapshot} ] @var{filename}
 
 List, apply, create or delete snapshots in image @var{filename}.
+
+@item resize @var{filename} [+ | -]@var{size}
+
+Change the disk image as if it had been created with @var{size}.
+
+Before using this command to shrink a disk image, you MUST use file system and
+partitioning tools inside the VM to reduce allocated file systems and partition
+sizes accordingly.  Failure to do so will result in data loss!
+
+After using this command to grow a disk image, you must use file system and
+partitioning tools inside the VM to actually begin using the new space on the
+device.
 @end table
 
 Supported image file formats:
-- 
1.7.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Qemu-devel] [PATCH 2/2] qcow2: Implement bdrv_truncate() for growing images
  2010-04-24  8:12 [Qemu-devel] [PATCH 1/2] qemu-img: Add 'resize' command to grow/shrink disk images Stefan Hajnoczi
@ 2010-04-24  8:12 ` Stefan Hajnoczi
  2010-04-26 11:46   ` [Qemu-devel] " Kevin Wolf
  2010-04-26 11:52 ` [Qemu-devel] Re: [PATCH 1/2] qemu-img: Add 'resize' command to grow/shrink disk images Kevin Wolf
  1 sibling, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2010-04-24  8:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

This patch adds the ability to grow qcow2 images in-place using
bdrv_truncate().  This enables qemu-img resize command support for
qcow2.

Snapshots are not supported and bdrv_truncate() will return -ENOTSUP.
The notion of resizing an image with snapshots could lead to confusion:
users may expect snapshots to remain unchanged, but this is not possible
with the current qcow2 on-disk format where the header.size field is
global instead of per-snapshot.  Others may expect snapshots to change
size along with the current image data.  I think it is safest to not
support snapshots and perhaps add behavior later if there is a
consensus.

Backing images continue to work.  If the image is now larger than its
backing image, zeroes are read when accessing beyond the end of the
backing image.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
This applies to kevin/block.

 block/qcow2-cluster.c  |   64 ++++++++++++++++++++++++++++++++++++++---------
 block/qcow2-snapshot.c |    2 +-
 block/qcow2.c          |   43 +++++++++++++++++++++++++++++---
 block/qcow2.h          |    9 ++++++-
 4 files changed, 99 insertions(+), 19 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c11680d..20c8426 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -28,30 +28,39 @@
 #include "block_int.h"
 #include "block/qcow2.h"
 
-int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
+/*
+ * qcow2_grow_l1_table_common
+ *
+ * Grows the L1 table and updates the header on disk.
+ *
+ * Setting new_l1_vm_state_index to s->l1_vm_state_index grows the vm state
+ * area.
+ *
+ * Setting new_l1_vm_state_index to the new end of image grows the image data
+ * area.
+ *
+ * Returns 0 on success, -errno in failure case.
+ */
+static int qcow2_grow_l1_table_common(BlockDriverState *bs,
+                                      int new_l1_vm_state_index,
+                                      int new_l1_size)
 {
     BDRVQcowState *s = bs->opaque;
-    int new_l1_size, new_l1_size2, ret, i;
+    int new_l1_size2, ret, i;
     uint64_t *new_l1_table;
     int64_t new_l1_table_offset;
     uint8_t data[12];
 
-    new_l1_size = s->l1_size;
-    if (min_size <= new_l1_size)
-        return 0;
-    if (new_l1_size == 0) {
-        new_l1_size = 1;
-    }
-    while (min_size > new_l1_size) {
-        new_l1_size = (new_l1_size * 3 + 1) / 2;
-    }
 #ifdef DEBUG_ALLOC2
     printf("grow l1_table from %d to %d\n", s->l1_size, new_l1_size);
 #endif
 
     new_l1_size2 = sizeof(uint64_t) * new_l1_size;
     new_l1_table = qemu_mallocz(align_offset(new_l1_size2, 512));
-    memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
+    memcpy(new_l1_table, s->l1_table, s->l1_vm_state_index * sizeof(uint64_t));
+    memcpy(&new_l1_table[new_l1_vm_state_index],
+           &s->l1_table[s->l1_vm_state_index],
+           (s->l1_size - s->l1_vm_state_index) * sizeof(uint64_t));
 
     /* write new table (align to cluster) */
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ALLOC_TABLE);
@@ -83,6 +92,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
     s->l1_table_offset = new_l1_table_offset;
     s->l1_table = new_l1_table;
     s->l1_size = new_l1_size;
+    s->l1_vm_state_index = new_l1_vm_state_index;
     return 0;
  fail:
     qemu_free(new_l1_table);
@@ -90,6 +100,34 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
     return ret < 0 ? ret : -EIO;
 }
 
+int qcow2_grow_l1_vm_state(BlockDriverState *bs, int min_size)
+{
+    BDRVQcowState *s = bs->opaque;
+    int new_l1_size;
+
+    new_l1_size = s->l1_size;
+    if (min_size <= new_l1_size)
+        return 0;
+    if (new_l1_size == 0) {
+        new_l1_size = 1;
+    }
+    while (min_size > new_l1_size) {
+        new_l1_size = (new_l1_size * 3 + 1) / 2;
+    }
+
+    return qcow2_grow_l1_table_common(bs, s->l1_vm_state_index, new_l1_size);
+}
+
+int qcow2_grow_l1_image_data(BlockDriverState *bs, int new_l1_size)
+{
+    BDRVQcowState *s = bs->opaque;
+    int new_l1_vm_state_index;
+
+    new_l1_vm_state_index = new_l1_size;
+    new_l1_size += s->l1_size - s->l1_vm_state_index;
+    return qcow2_grow_l1_table_common(bs, new_l1_vm_state_index, new_l1_size);
+}
+
 void qcow2_l2_cache_reset(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
@@ -524,7 +562,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
 
     l1_index = offset >> (s->l2_bits + s->cluster_bits);
     if (l1_index >= s->l1_size) {
-        ret = qcow2_grow_l1_table(bs, l1_index + 1);
+        ret = qcow2_grow_l1_vm_state(bs, l1_index + 1);
         if (ret < 0) {
             return ret;
         }
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 2a21c17..7f0d810 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -326,7 +326,7 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
     if (qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, -1) < 0)
         goto fail;
 
-    if (qcow2_grow_l1_table(bs, sn->l1_size) < 0)
+    if (qcow2_grow_l1_vm_state(bs, sn->l1_size) < 0)
         goto fail;
 
     s->l1_size = sn->l1_size;
diff --git a/block/qcow2.c b/block/qcow2.c
index 4a7ab66..ab622a2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -140,7 +140,7 @@ static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 static int qcow_open(BlockDriverState *bs, int flags)
 {
     BDRVQcowState *s = bs->opaque;
-    int len, i, shift;
+    int len, i;
     QCowHeader header;
     uint64_t ext_end;
 
@@ -188,8 +188,7 @@ static int qcow_open(BlockDriverState *bs, int flags)
 
     /* read the level 1 table */
     s->l1_size = header.l1_size;
-    shift = s->cluster_bits + s->l2_bits;
-    s->l1_vm_state_index = (header.size + (1LL << shift) - 1) >> shift;
+    s->l1_vm_state_index = size_to_l1(s, header.size);
     /* the L1 table must contain at least enough entries to put
        header.size bytes */
     if (s->l1_size < s->l1_vm_state_index)
@@ -1095,6 +1094,40 @@ static int qcow_make_empty(BlockDriverState *bs)
     return 0;
 }
 
+static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret, new_l1_size;
+
+    if (offset & 511) {
+        return -EINVAL;
+    }
+
+    /* cannot proceed if image has snapshots */
+    if (s->nb_snapshots) {
+        return -ENOTSUP;
+    }
+
+    /* shrinking is currently not supported */
+    if (offset < bs->total_sectors << BDRV_SECTOR_BITS) {
+        return -ENOTSUP;
+    }
+
+    new_l1_size = size_to_l1(s, offset);
+    ret = qcow2_grow_l1_image_data(bs, new_l1_size);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* write updated header.size */
+    offset = cpu_to_be64(offset);
+    if (bdrv_pwrite(bs->file, offsetof(QCowHeader, size), &offset,
+                    sizeof(uint64_t)) != sizeof(uint64_t)) {
+        return -EIO;
+    }
+    return 0;
+}
+
 /* XXX: put compressed sectors first, then all the cluster aligned
    tables to avoid losing bytes in alignment */
 static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
@@ -1294,7 +1327,9 @@ static BlockDriver bdrv_qcow2 = {
     .bdrv_aio_readv	= qcow_aio_readv,
     .bdrv_aio_writev	= qcow_aio_writev,
     .bdrv_aio_flush	= qcow_aio_flush,
-    .bdrv_write_compressed = qcow_write_compressed,
+
+    .bdrv_truncate          = qcow2_truncate,
+    .bdrv_write_compressed  = qcow_write_compressed,
 
     .bdrv_snapshot_create   = qcow2_snapshot_create,
     .bdrv_snapshot_goto     = qcow2_snapshot_goto,
diff --git a/block/qcow2.h b/block/qcow2.h
index 5bd08db..c328248 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -150,6 +150,12 @@ static inline int size_to_clusters(BDRVQcowState *s, int64_t size)
     return (size + (s->cluster_size - 1)) >> s->cluster_bits;
 }
 
+static inline int size_to_l1(BDRVQcowState *s, int64_t size)
+{
+    int shift = s->cluster_bits + s->l2_bits;
+    return (size + (1ULL << shift) - 1) >> shift;
+}
+
 static inline int64_t align_offset(int64_t offset, int n)
 {
     offset = (offset + n - 1) & ~(n - 1);
@@ -182,7 +188,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 int qcow2_check_refcounts(BlockDriverState *bs);
 
 /* qcow2-cluster.c functions */
-int qcow2_grow_l1_table(BlockDriverState *bs, int min_size);
+int qcow2_grow_l1_vm_state(BlockDriverState *bs, int min_size);
+int qcow2_grow_l1_image_data(BlockDriverState *bs, int new_l1_size);
 void qcow2_l2_cache_reset(BlockDriverState *bs);
 int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
 void qcow2_encrypt_sectors(BDRVQcowState *s, int64_t sector_num,
-- 
1.7.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Qemu-devel] Re: [PATCH 2/2] qcow2: Implement bdrv_truncate() for growing images
  2010-04-24  8:12 ` [Qemu-devel] [PATCH 2/2] qcow2: Implement bdrv_truncate() for growing images Stefan Hajnoczi
@ 2010-04-26 11:46   ` Kevin Wolf
  2010-04-26 14:01     ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2010-04-26 11:46 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

Am 24.04.2010 10:12, schrieb Stefan Hajnoczi:
> This patch adds the ability to grow qcow2 images in-place using
> bdrv_truncate().  This enables qemu-img resize command support for
> qcow2.
> 
> Snapshots are not supported and bdrv_truncate() will return -ENOTSUP.
> The notion of resizing an image with snapshots could lead to confusion:
> users may expect snapshots to remain unchanged, but this is not possible
> with the current qcow2 on-disk format where the header.size field is
> global instead of per-snapshot.  Others may expect snapshots to change
> size along with the current image data.  I think it is safest to not
> support snapshots and perhaps add behavior later if there is a
> consensus.

I think it would make most sense if we kept the disk size per snapshot
(so your first option). The snapshot header is extensible, so it should
be possible.

Just disabling it is okay with me, too, but in that case this patch is
much more complicated than it needs to be: If there are no snapshots,
there is no vmstate area.

> Backing images continue to work.  If the image is now larger than its
> backing image, zeroes are read when accessing beyond the end of the
> backing image.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
> This applies to kevin/block.
> 
>  block/qcow2-cluster.c  |   64 ++++++++++++++++++++++++++++++++++++++---------
>  block/qcow2-snapshot.c |    2 +-
>  block/qcow2.c          |   43 +++++++++++++++++++++++++++++---
>  block/qcow2.h          |    9 ++++++-
>  4 files changed, 99 insertions(+), 19 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index c11680d..20c8426 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -28,30 +28,39 @@
>  #include "block_int.h"
>  #include "block/qcow2.h"
>  
> -int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
> +/*
> + * qcow2_grow_l1_table_common
> + *
> + * Grows the L1 table and updates the header on disk.
> + *
> + * Setting new_l1_vm_state_index to s->l1_vm_state_index grows the vm state
> + * area.
> + *
> + * Setting new_l1_vm_state_index to the new end of image grows the image data
> + * area.

I don't understand this distinction. I'm sure you describe something
trivial here, but it sounds like magic.

There's really not much difference between data and vmstate clusters and
callers don't know on which of both they are working.

> + *
> + * Returns 0 on success, -errno in failure case.
> + */
> +static int qcow2_grow_l1_table_common(BlockDriverState *bs,
> +                                      int new_l1_vm_state_index,
> +                                      int new_l1_size)
>  {
>      BDRVQcowState *s = bs->opaque;
> -    int new_l1_size, new_l1_size2, ret, i;
> +    int new_l1_size2, ret, i;
>      uint64_t *new_l1_table;
>      int64_t new_l1_table_offset;
>      uint8_t data[12];
>  
> -    new_l1_size = s->l1_size;
> -    if (min_size <= new_l1_size)
> -        return 0;
> -    if (new_l1_size == 0) {
> -        new_l1_size = 1;
> -    }
> -    while (min_size > new_l1_size) {
> -        new_l1_size = (new_l1_size * 3 + 1) / 2;
> -    }
>  #ifdef DEBUG_ALLOC2
>      printf("grow l1_table from %d to %d\n", s->l1_size, new_l1_size);
>  #endif
>  
>      new_l1_size2 = sizeof(uint64_t) * new_l1_size;
>      new_l1_table = qemu_mallocz(align_offset(new_l1_size2, 512));
> -    memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
> +    memcpy(new_l1_table, s->l1_table, s->l1_vm_state_index * sizeof(uint64_t));

This change smells like a buffer overflow. It both can read beyond the
end of the old L1 table and can overflow the new one if the L1 table
hasn't yet reached its full size.

I think this is the main problem of this patch: You seem to assume that
the L1 table is created with the full size needed to cover all of the
data area. Which seems to be true at least for images created by
qemu-img, but I don't think we can rely on it (nor do I think we should,
because it introduces a special case where there is none).

I rather consider this preallocation a performance optimization. If we
stop treating it as such, we would need to refuse opening any images
that don't fulfill this requirement.

> +    memcpy(&new_l1_table[new_l1_vm_state_index],
> +           &s->l1_table[s->l1_vm_state_index],
> +           (s->l1_size - s->l1_vm_state_index) * sizeof(uint64_t));
>  
>      /* write new table (align to cluster) */
>      BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ALLOC_TABLE);
> @@ -83,6 +92,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
>      s->l1_table_offset = new_l1_table_offset;
>      s->l1_table = new_l1_table;
>      s->l1_size = new_l1_size;
> +    s->l1_vm_state_index = new_l1_vm_state_index;
>      return 0;
>   fail:
>      qemu_free(new_l1_table);
> @@ -90,6 +100,34 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
>      return ret < 0 ? ret : -EIO;
>  }
>  
> +int qcow2_grow_l1_vm_state(BlockDriverState *bs, int min_size)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int new_l1_size;
> +
> +    new_l1_size = s->l1_size;
> +    if (min_size <= new_l1_size)
> +        return 0;
> +    if (new_l1_size == 0) {
> +        new_l1_size = 1;
> +    }
> +    while (min_size > new_l1_size) {
> +        new_l1_size = (new_l1_size * 3 + 1) / 2;
> +    }
> +
> +    return qcow2_grow_l1_table_common(bs, s->l1_vm_state_index, new_l1_size);
> +}
> +
> +int qcow2_grow_l1_image_data(BlockDriverState *bs, int new_l1_size)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int new_l1_vm_state_index;
> +
> +    new_l1_vm_state_index = new_l1_size;
> +    new_l1_size += s->l1_size - s->l1_vm_state_index;

Same assumption as above. This might turn into a negative number added
to new_l1_size.

> +    return qcow2_grow_l1_table_common(bs, new_l1_vm_state_index, new_l1_size);
> +}
> +
>  void qcow2_l2_cache_reset(BlockDriverState *bs)
>  {
>      BDRVQcowState *s = bs->opaque;
> @@ -524,7 +562,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
>  
>      l1_index = offset >> (s->l2_bits + s->cluster_bits);
>      if (l1_index >= s->l1_size) {qcow2_grow_l1_vm_state(
> -        ret = qcow2_grow_l1_table(bs, l1_index + 1);
> +        ret = qcow2_grow_l1_vm_state(bs, l1_index + 1);

Once you agree that get_cluster_table can't know if it's working on data
or vmstate, qcow2_grow_l1_vm_state is a misnomer at least.

>          if (ret < 0) {
>              return ret;
>          }
> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
> index 2a21c17..7f0d810 100644
> --- a/block/qcow2-snapshot.c
> +++ b/block/qcow2-snapshot.c
> @@ -326,7 +326,7 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
>      if (qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, -1) < 0)
>          goto fail;
>  
> -    if (qcow2_grow_l1_table(bs, sn->l1_size) < 0)
> +    if (qcow2_grow_l1_vm_state(bs, sn->l1_size) < 0)
>          goto fail;
>  
>      s->l1_size = sn->l1_size;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 4a7ab66..ab622a2 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -140,7 +140,7 @@ static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>  static int qcow_open(BlockDriverState *bs, int flags)
>  {
>      BDRVQcowState *s = bs->opaque;
> -    int len, i, shift;
> +    int len, i;
>      QCowHeader header;
>      uint64_t ext_end;
>  
> @@ -188,8 +188,7 @@ static int qcow_open(BlockDriverState *bs, int flags)
>  
>      /* read the level 1 table */
>      s->l1_size = header.l1_size;
> -    shift = s->cluster_bits + s->l2_bits;
> -    s->l1_vm_state_index = (header.size + (1LL << shift) - 1) >> shift;
> +    s->l1_vm_state_index = size_to_l1(s, header.size);
>      /* the L1 table must contain at least enough entries to put
>         header.size bytes */
>      if (s->l1_size < s->l1_vm_state_index)
> @@ -1095,6 +1094,40 @@ static int qcow_make_empty(BlockDriverState *bs)
>      return 0;
>  }
>  
> +static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int ret, new_l1_size;
> +
> +    if (offset & 511) {
> +        return -EINVAL;
> +    }
> +
> +    /* cannot proceed if image has snapshots */
> +    if (s->nb_snapshots) {
> +        return -ENOTSUP;
> +    }
> +
> +    /* shrinking is currently not supported */
> +    if (offset < bs->total_sectors << BDRV_SECTOR_BITS) {
> +        return -ENOTSUP;
> +    }
> +
> +    new_l1_size = size_to_l1(s, offset);
> +    ret = qcow2_grow_l1_image_data(bs, new_l1_size);
> +    if (ret < 0) {
> +        return ret;
> +    }

Instead of the confusing changes above you could just increase the L1
table size using the old function and keep the data/vmstate thing local
to qcow2_truncate (or maybe better a qcow2_move_vmstate(bs, offset)
which internally grows the L1 table as needed)

Actually, I think this is not that different from your patch (you called
the almost same function qcow2_grow_l1_image_data and avoided the normal
calculation of the next L1 table size for some reason), but probably a
lot less confusing.

> +
> +    /* write updated header.size */
> +    offset = cpu_to_be64(offset);
> +    if (bdrv_pwrite(bs->file, offsetof(QCowHeader, size), &offset,
> +                    sizeof(uint64_t)) != sizeof(uint64_t)) {
> +        return -EIO;
> +    }

The vmstate_offset field needs to be updated somewhere, too. In my
suggestion this would be in qcow2_move_vmstate.

Kevin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Qemu-devel] Re: [PATCH 1/2] qemu-img: Add 'resize' command to grow/shrink disk images
  2010-04-24  8:12 [Qemu-devel] [PATCH 1/2] qemu-img: Add 'resize' command to grow/shrink disk images Stefan Hajnoczi
  2010-04-24  8:12 ` [Qemu-devel] [PATCH 2/2] qcow2: Implement bdrv_truncate() for growing images Stefan Hajnoczi
@ 2010-04-26 11:52 ` Kevin Wolf
  1 sibling, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2010-04-26 11:52 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

Am 24.04.2010 10:12, schrieb Stefan Hajnoczi:
> This patch adds a 'resize' command to grow/shrink disk images.  This
> allows changing the size of disk images without copying to a new image
> file.  Currently only raw files support resize.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

Thanks, applied to the block branch.

Kevin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] qcow2: Implement bdrv_truncate() for growing images
  2010-04-26 11:46   ` [Qemu-devel] " Kevin Wolf
@ 2010-04-26 14:01     ` Stefan Hajnoczi
  2010-04-26 15:01       ` Kevin Wolf
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2010-04-26 14:01 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel

On Mon, Apr 26, 2010 at 12:46 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 24.04.2010 10:12, schrieb Stefan Hajnoczi:
>> This patch adds the ability to grow qcow2 images in-place using
>> bdrv_truncate().  This enables qemu-img resize command support for
>> qcow2.
>>
>> Snapshots are not supported and bdrv_truncate() will return -ENOTSUP.
>> The notion of resizing an image with snapshots could lead to confusion:
>> users may expect snapshots to remain unchanged, but this is not possible
>> with the current qcow2 on-disk format where the header.size field is
>> global instead of per-snapshot.  Others may expect snapshots to change
>> size along with the current image data.  I think it is safest to not
>> support snapshots and perhaps add behavior later if there is a
>> consensus.
>
> I think it would make most sense if we kept the disk size per snapshot
> (so your first option). The snapshot header is extensible, so it should
> be possible.
>
> Just disabling it is okay with me, too, but in that case this patch is
> much more complicated than it needs to be: If there are no snapshots,
> there is no vmstate area.

Good point, things are simplified greatly if there is no vm state
data.  I haven't read the save/load vm yet and failed to make the
connection here ;).

I will eliminate most of the code as you explained and send a v2
patch.  I have responded below because I hope to learn more by
discussing the comments with you, but most of the issues will not
apply to a slimmed down v2 so feel free to ignore if you are busy.

>> Backing images continue to work.  If the image is now larger than its
>> backing image, zeroes are read when accessing beyond the end of the
>> backing image.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> ---
>> This applies to kevin/block.
>>
>>  block/qcow2-cluster.c  |   64 ++++++++++++++++++++++++++++++++++++++---------
>>  block/qcow2-snapshot.c |    2 +-
>>  block/qcow2.c          |   43 +++++++++++++++++++++++++++++---
>>  block/qcow2.h          |    9 ++++++-
>>  4 files changed, 99 insertions(+), 19 deletions(-)
>>
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index c11680d..20c8426 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -28,30 +28,39 @@
>>  #include "block_int.h"
>>  #include "block/qcow2.h"
>>
>> -int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
>> +/*
>> + * qcow2_grow_l1_table_common
>> + *
>> + * Grows the L1 table and updates the header on disk.
>> + *
>> + * Setting new_l1_vm_state_index to s->l1_vm_state_index grows the vm state
>> + * area.
>> + *
>> + * Setting new_l1_vm_state_index to the new end of image grows the image data
>> + * area.
>
> I don't understand this distinction. I'm sure you describe something
> trivial here, but it sounds like magic.
>
> There's really not much difference between data and vmstate clusters and
> callers don't know on which of both they are working.

The new_l1_vm_state_index argument is used to position the vm state
data in the new L1 table.  If we're growing the L1 table to make room
for more vm state data, then new_l1_vm_state_index should keep its old
value (s->l1_vm_state_index).  If we're growing the image data and
need to move the vm state out of the way, then new_l1_vm_state_index
should be recalculated for the new image size.

The function allocates the new L1 table and performs two separate
copies: one of the image data L1 entries and one for the vm state L1
entries.  The new_l1_vm_state_index argument controls where the vm
state data entries are copied; it provides the ability the move the vm
state data to make room for new image data entries.

>> + *
>> + * Returns 0 on success, -errno in failure case.
>> + */
>> +static int qcow2_grow_l1_table_common(BlockDriverState *bs,
>> +                                      int new_l1_vm_state_index,
>> +                                      int new_l1_size)
>>  {
>>      BDRVQcowState *s = bs->opaque;
>> -    int new_l1_size, new_l1_size2, ret, i;
>> +    int new_l1_size2, ret, i;
>>      uint64_t *new_l1_table;
>>      int64_t new_l1_table_offset;
>>      uint8_t data[12];
>>
>> -    new_l1_size = s->l1_size;
>> -    if (min_size <= new_l1_size)
>> -        return 0;
>> -    if (new_l1_size == 0) {
>> -        new_l1_size = 1;
>> -    }
>> -    while (min_size > new_l1_size) {
>> -        new_l1_size = (new_l1_size * 3 + 1) / 2;
>> -    }
>>  #ifdef DEBUG_ALLOC2
>>      printf("grow l1_table from %d to %d\n", s->l1_size, new_l1_size);
>>  #endif
>>
>>      new_l1_size2 = sizeof(uint64_t) * new_l1_size;
>>      new_l1_table = qemu_mallocz(align_offset(new_l1_size2, 512));
>> -    memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
>> +    memcpy(new_l1_table, s->l1_table, s->l1_vm_state_index * sizeof(uint64_t));
>
> This change smells like a buffer overflow. It both can read beyond the
> end of the old L1 table and can overflow the new one if the L1 table
> hasn't yet reached its full size.
>
> I think this is the main problem of this patch: You seem to assume that
> the L1 table is created with the full size needed to cover all of the
> data area. Which seems to be true at least for images created by
> qemu-img, but I don't think we can rely on it (nor do I think we should,
> because it introduces a special case where there is none).
>
> I rather consider this preallocation a performance optimization. If we
> stop treating it as such, we would need to refuse opening any images
> that don't fulfill this requirement.

>From qcow_open():

/* read the level 1 table */
s->l1_size = header.l1_size;
shift = s->cluster_bits + s->l2_bits;
s->l1_vm_state_index = (header.size + (1LL << shift) - 1) >> shift;
/* the L1 table must contain at least enough entries to put
   header.size bytes */
if (s->l1_size < s->l1_vm_state_index)
    goto fail;

Images where L1 size is not at least l1_vm_state_index cannot be
opened by QEMU.  Right now the special case exists and perhaps some
qcow2 code already relies on this assumption.

>> +    memcpy(&new_l1_table[new_l1_vm_state_index],
>> +           &s->l1_table[s->l1_vm_state_index],
>> +           (s->l1_size - s->l1_vm_state_index) * sizeof(uint64_t));
>>
>>      /* write new table (align to cluster) */
>>      BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ALLOC_TABLE);
>> @@ -83,6 +92,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
>>      s->l1_table_offset = new_l1_table_offset;
>>      s->l1_table = new_l1_table;
>>      s->l1_size = new_l1_size;
>> +    s->l1_vm_state_index = new_l1_vm_state_index;
>>      return 0;
>>   fail:
>>      qemu_free(new_l1_table);
>> @@ -90,6 +100,34 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
>>      return ret < 0 ? ret : -EIO;
>>  }
>>
>> +int qcow2_grow_l1_vm_state(BlockDriverState *bs, int min_size)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int new_l1_size;
>> +
>> +    new_l1_size = s->l1_size;
>> +    if (min_size <= new_l1_size)
>> +        return 0;
>> +    if (new_l1_size == 0) {
>> +        new_l1_size = 1;
>> +    }
>> +    while (min_size > new_l1_size) {
>> +        new_l1_size = (new_l1_size * 3 + 1) / 2;
>> +    }
>> +
>> +    return qcow2_grow_l1_table_common(bs, s->l1_vm_state_index, new_l1_size);
>> +}
>> +
>> +int qcow2_grow_l1_image_data(BlockDriverState *bs, int new_l1_size)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int new_l1_vm_state_index;
>> +
>> +    new_l1_vm_state_index = new_l1_size;
>> +    new_l1_size += s->l1_size - s->l1_vm_state_index;
>
> Same assumption as above. This might turn into a negative number added
> to new_l1_size.
>
>> +    return qcow2_grow_l1_table_common(bs, new_l1_vm_state_index, new_l1_size);
>> +}
>> +
>>  void qcow2_l2_cache_reset(BlockDriverState *bs)
>>  {
>>      BDRVQcowState *s = bs->opaque;
>> @@ -524,7 +562,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
>>
>>      l1_index = offset >> (s->l2_bits + s->cluster_bits);
>>      if (l1_index >= s->l1_size) {qcow2_grow_l1_vm_state(
>> -        ret = qcow2_grow_l1_table(bs, l1_index + 1);
>> +        ret = qcow2_grow_l1_vm_state(bs, l1_index + 1);
>
> Once you agree that get_cluster_table can't know if it's working on data
> or vmstate, qcow2_grow_l1_vm_state is a misnomer at least.
>
>>          if (ret < 0) {
>>              return ret;
>>          }
>> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
>> index 2a21c17..7f0d810 100644
>> --- a/block/qcow2-snapshot.c
>> +++ b/block/qcow2-snapshot.c
>> @@ -326,7 +326,7 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
>>      if (qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, -1) < 0)
>>          goto fail;
>>
>> -    if (qcow2_grow_l1_table(bs, sn->l1_size) < 0)
>> +    if (qcow2_grow_l1_vm_state(bs, sn->l1_size) < 0)
>>          goto fail;
>>
>>      s->l1_size = sn->l1_size;
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 4a7ab66..ab622a2 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -140,7 +140,7 @@ static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>>  static int qcow_open(BlockDriverState *bs, int flags)
>>  {
>>      BDRVQcowState *s = bs->opaque;
>> -    int len, i, shift;
>> +    int len, i;
>>      QCowHeader header;
>>      uint64_t ext_end;
>>
>> @@ -188,8 +188,7 @@ static int qcow_open(BlockDriverState *bs, int flags)
>>
>>      /* read the level 1 table */
>>      s->l1_size = header.l1_size;
>> -    shift = s->cluster_bits + s->l2_bits;
>> -    s->l1_vm_state_index = (header.size + (1LL << shift) - 1) >> shift;
>> +    s->l1_vm_state_index = size_to_l1(s, header.size);
>>      /* the L1 table must contain at least enough entries to put
>>         header.size bytes */
>>      if (s->l1_size < s->l1_vm_state_index)
>> @@ -1095,6 +1094,40 @@ static int qcow_make_empty(BlockDriverState *bs)
>>      return 0;
>>  }
>>
>> +static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int ret, new_l1_size;
>> +
>> +    if (offset & 511) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    /* cannot proceed if image has snapshots */
>> +    if (s->nb_snapshots) {
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    /* shrinking is currently not supported */
>> +    if (offset < bs->total_sectors << BDRV_SECTOR_BITS) {
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    new_l1_size = size_to_l1(s, offset);
>> +    ret = qcow2_grow_l1_image_data(bs, new_l1_size);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>
> Instead of the confusing changes above you could just increase the L1
> table size using the old function and keep the data/vmstate thing local
> to qcow2_truncate (or maybe better a qcow2_move_vmstate(bs, offset)
> which internally grows the L1 table as needed)
>
> Actually, I think this is not that different from your patch (you called
> the almost same function qcow2_grow_l1_image_data and avoided the normal
> calculation of the next L1 table size for some reason), but probably a
> lot less confusing.

I like the qcow2_move_vmstate() name.  It is clearer than
qcow2_grow_l1_image_data().

If I understand correctly, the next L1 table size calculation tries to
grow the table in steps large enough so that the grow operation does
not need to be performed frequently.  This makes sense when appending
arbitrary vm state data, but the truncate operation knows the exact
new size of the image and doesn't need to speculatively allocate more
L1 table space.

>> +
>> +    /* write updated header.size */
>> +    offset = cpu_to_be64(offset);
>> +    if (bdrv_pwrite(bs->file, offsetof(QCowHeader, size), &offset,
>> +                    sizeof(uint64_t)) != sizeof(uint64_t)) {
>> +        return -EIO;
>> +    }
>
> The vmstate_offset field needs to be updated somewhere, too. In my
> suggestion this would be in qcow2_move_vmstate.

Which field (I can't find a vmstate_offset field)?  The patch updates
s->l1_vm_state_index in qcow2_grow_l1_table_common(), is that the one
you mean?

Stefan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] qcow2: Implement bdrv_truncate() for  growing images
  2010-04-26 14:01     ` Stefan Hajnoczi
@ 2010-04-26 15:01       ` Kevin Wolf
  0 siblings, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2010-04-26 15:01 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Stefan Hajnoczi, qemu-devel

Am 26.04.2010 16:01, schrieb Stefan Hajnoczi:
> From qcow_open():
> 
> /* read the level 1 table */
> s->l1_size = header.l1_size;
> shift = s->cluster_bits + s->l2_bits;
> s->l1_vm_state_index = (header.size + (1LL << shift) - 1) >> shift;
> /* the L1 table must contain at least enough entries to put
>    header.size bytes */
> if (s->l1_size < s->l1_vm_state_index)
>     goto fail;
> 
> Images where L1 size is not at least l1_vm_state_index cannot be
> opened by QEMU.  Right now the special case exists and perhaps some
> qcow2 code already relies on this assumption.

Hm, yes. You win.

>>> @@ -1095,6 +1094,40 @@ static int qcow_make_empty(BlockDriverState *bs)
>>>      return 0;
>>>  }
>>>
>>> +static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
>>> +{
>>> +    BDRVQcowState *s = bs->opaque;
>>> +    int ret, new_l1_size;
>>> +
>>> +    if (offset & 511) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    /* cannot proceed if image has snapshots */
>>> +    if (s->nb_snapshots) {
>>> +        return -ENOTSUP;
>>> +    }
>>> +
>>> +    /* shrinking is currently not supported */
>>> +    if (offset < bs->total_sectors << BDRV_SECTOR_BITS) {
>>> +        return -ENOTSUP;
>>> +    }
>>> +
>>> +    new_l1_size = size_to_l1(s, offset);
>>> +    ret = qcow2_grow_l1_image_data(bs, new_l1_size);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>
>> Instead of the confusing changes above you could just increase the L1
>> table size using the old function and keep the data/vmstate thing local
>> to qcow2_truncate (or maybe better a qcow2_move_vmstate(bs, offset)
>> which internally grows the L1 table as needed)
>>
>> Actually, I think this is not that different from your patch (you called
>> the almost same function qcow2_grow_l1_image_data and avoided the normal
>> calculation of the next L1 table size for some reason), but probably a
>> lot less confusing.
> 
> I like the qcow2_move_vmstate() name.  It is clearer than
> qcow2_grow_l1_image_data().
> 
> If I understand correctly, the next L1 table size calculation tries to
> grow the table in steps large enough so that the grow operation does
> not need to be performed frequently.  This makes sense when appending
> arbitrary vm state data, but the truncate operation knows the exact
> new size of the image and doesn't need to speculatively allocate more
> L1 table space.

Agreed. I just doubt that saving a few bytes is worth splitting up
qcow2_grow_l1_table when rounding up doesn't really hurt.

Maybe most of my real problem was really just naming, though. It made
everything look more complicated than it actually is. And the second
thing is probably that grow_l1_table started to do more than just
growing the L1 table.

>>> +
>>> +    /* write updated header.size */
>>> +    offset = cpu_to_be64(offset);
>>> +    if (bdrv_pwrite(bs->file, offsetof(QCowHeader, size), &offset,
>>> +                    sizeof(uint64_t)) != sizeof(uint64_t)) {
>>> +        return -EIO;
>>> +    }
>>
>> The vmstate_offset field needs to be updated somewhere, too. In my
>> suggestion this would be in qcow2_move_vmstate.
> 
> Which field (I can't find a vmstate_offset field)?  The patch updates
> s->l1_vm_state_index in qcow2_grow_l1_table_common(), is that the one
> you mean?

I meant in the header. And you can't find it because it doesn't exist,
s->l1_vm_state_index is calculated in qcow_open. Never let me review
patches when I'm tired. :-)

Kevin

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-04-26 15:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-24  8:12 [Qemu-devel] [PATCH 1/2] qemu-img: Add 'resize' command to grow/shrink disk images Stefan Hajnoczi
2010-04-24  8:12 ` [Qemu-devel] [PATCH 2/2] qcow2: Implement bdrv_truncate() for growing images Stefan Hajnoczi
2010-04-26 11:46   ` [Qemu-devel] " Kevin Wolf
2010-04-26 14:01     ` Stefan Hajnoczi
2010-04-26 15:01       ` Kevin Wolf
2010-04-26 11:52 ` [Qemu-devel] Re: [PATCH 1/2] qemu-img: Add 'resize' command to grow/shrink disk images Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.