All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [patch 0/5] block streaming base support
@ 2011-12-30 10:03 Marcelo Tosatti
  2011-12-30 10:03 ` [Qemu-devel] [patch 1/5] block: add bdrv_find_backing_image Marcelo Tosatti
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2011-12-30 10:03 UTC (permalink / raw)
  To: stefanha, kwolf, qemu-devel

Add support for streaming data from an intermediate section of the
image chain.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [patch 1/5] block: add bdrv_find_backing_image
  2011-12-30 10:03 [Qemu-devel] [patch 0/5] block streaming base support Marcelo Tosatti
@ 2011-12-30 10:03 ` Marcelo Tosatti
  2011-12-30 10:03 ` [Qemu-devel] [patch 2/5] block: implement bdrv_find_backing_image in qcow2 Marcelo Tosatti
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2011-12-30 10:03 UTC (permalink / raw)
  To: stefanha, kwolf, qemu-devel; +Cc: Marcelo Tosatti

[-- Attachment #1: bdrv-stream-shared-base-helpers --]
[-- Type: text/plain, Size: 2005 bytes --]

Add bdrv_find_backing_image: given a BlockDriverState pointer, and an id,
traverse the backing image chain to locate the id.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: stefanha/block.c
===================================================================
--- stefanha.orig/block.c
+++ stefanha/block.c
@@ -2580,6 +2580,23 @@ int bdrv_snapshot_load_tmp(BlockDriverSt
     return -ENOTSUP;
 }
 
+/*
+ * Return the BlockDriverState pointer for a backing image
+ * with 'id'.
+ */
+BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
+                                          const char *id)
+{
+    BlockDriver *drv = bs->drv;
+    if (!drv) {
+        return NULL;
+    }
+    if (drv->bdrv_find_backing_image) {
+            return drv->bdrv_find_backing_image(bs, id);
+    }
+    return NULL;
+}
+
 #define NB_SUFFIXES 4
 
 char *get_human_readable_size(char *buf, int buf_size, int64_t size)
Index: stefanha/block.h
===================================================================
--- stefanha.orig/block.h
+++ stefanha/block.h
@@ -146,6 +146,7 @@ int coroutine_fn bdrv_co_writev(BlockDri
     int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, int *pnum);
+BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs, const char *id);
 int bdrv_truncate(BlockDriverState *bs, int64_t offset);
 int64_t bdrv_getlength(BlockDriverState *bs);
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
Index: stefanha/block_int.h
===================================================================
--- stefanha.orig/block_int.h
+++ stefanha/block_int.h
@@ -136,6 +136,9 @@ struct BlockDriver {
     int coroutine_fn (*bdrv_co_is_allocated)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum);
 
+    BlockDriverState *(*bdrv_find_backing_image)(BlockDriverState *bs,
+        const char *id);
+
     /*
      * Invalidate any cached meta-data.
      */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [patch 2/5] block: implement bdrv_find_backing_image in qcow2
  2011-12-30 10:03 [Qemu-devel] [patch 0/5] block streaming base support Marcelo Tosatti
  2011-12-30 10:03 ` [Qemu-devel] [patch 1/5] block: add bdrv_find_backing_image Marcelo Tosatti
@ 2011-12-30 10:03 ` Marcelo Tosatti
  2012-01-03 13:44   ` Stefan Hajnoczi
  2011-12-30 10:03 ` [Qemu-devel] [patch 3/5] add QERR_BASE_ID_NOT_FOUND Marcelo Tosatti
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2011-12-30 10:03 UTC (permalink / raw)
  To: stefanha, kwolf, qemu-devel; +Cc: Marcelo Tosatti

[-- Attachment #1: bdrv-stream-shared-base-helper-backend --]
[-- Type: text/plain, Size: 1052 bytes --]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: stefanha/block/qcow2.c
===================================================================
--- stefanha.orig/block/qcow2.c
+++ stefanha/block/qcow2.c
@@ -767,6 +767,20 @@ static int qcow2_change_backing_file(Blo
     return qcow2_update_ext_header(bs, backing_file, backing_fmt);
 }
 
+static BlockDriverState *qcow2_find_backing_image(BlockDriverState *bs,
+                                                  const char *id)
+{
+
+    do {
+        if (!strncmp(bs->backing_file, id, sizeof(bs->backing_file)))
+            return bs->backing_hd;
+
+        bs = bs->backing_hd;
+    } while (bs);
+
+    return NULL;
+}
+
 static int preallocate(BlockDriverState *bs)
 {
     uint64_t nb_sectors;
@@ -1304,6 +1318,7 @@ static BlockDriver bdrv_qcow2 = {
     .bdrv_load_vmstate    = qcow2_load_vmstate,
 
     .bdrv_change_backing_file   = qcow2_change_backing_file,
+    .bdrv_find_backing_image    = qcow2_find_backing_image,
 
     .bdrv_invalidate_cache      = qcow2_invalidate_cache,
 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [patch 3/5] add QERR_BASE_ID_NOT_FOUND
  2011-12-30 10:03 [Qemu-devel] [patch 0/5] block streaming base support Marcelo Tosatti
  2011-12-30 10:03 ` [Qemu-devel] [patch 1/5] block: add bdrv_find_backing_image Marcelo Tosatti
  2011-12-30 10:03 ` [Qemu-devel] [patch 2/5] block: implement bdrv_find_backing_image in qcow2 Marcelo Tosatti
@ 2011-12-30 10:03 ` Marcelo Tosatti
  2011-12-30 10:03 ` [Qemu-devel] [patch 4/5] block stream: add support for partial streaming Marcelo Tosatti
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2011-12-30 10:03 UTC (permalink / raw)
  To: stefanha, kwolf, qemu-devel; +Cc: Marcelo Tosatti

[-- Attachment #1: qerr-base-id-not-found --]
[-- Type: text/plain, Size: 956 bytes --]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: stefanha/qerror.c
===================================================================
--- stefanha.orig/qerror.c
+++ stefanha/qerror.c
@@ -254,6 +254,10 @@ static const QErrorStringTable qerror_ta
         .error_fmt = QERR_INVALID_PARAMETER_COMBINATION,
         .desc      = "Invalid paramter combination",
     },
+    {
+        .error_fmt = QERR_BASE_ID_NOT_FOUND,
+        .desc      = "The base id %(base_id) has not been found",
+    },
     {}
 };
 
Index: stefanha/qerror.h
===================================================================
--- stefanha.orig/qerror.h
+++ stefanha/qerror.h
@@ -210,4 +210,7 @@ QError *qobject_to_qerror(const QObject 
 #define QERR_INVALID_PARAMETER_COMBINATION \
     "{ 'class': 'InvalidParameterCombination', 'data': {} }"
 
+#define QERR_BASE_ID_NOT_FOUND \
+    "{ 'class': 'BaseIdNotFound', 'data': { 'base_id': %s } }"
+
 #endif /* QERROR_H */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [patch 4/5] block stream: add support for partial streaming
  2011-12-30 10:03 [Qemu-devel] [patch 0/5] block streaming base support Marcelo Tosatti
                   ` (2 preceding siblings ...)
  2011-12-30 10:03 ` [Qemu-devel] [patch 3/5] add QERR_BASE_ID_NOT_FOUND Marcelo Tosatti
@ 2011-12-30 10:03 ` Marcelo Tosatti
  2012-01-04 12:39   ` Stefan Hajnoczi
  2011-12-30 10:03 ` [Qemu-devel] [patch 5/5] add doc to describe live block operations Marcelo Tosatti
  2012-01-04 14:08 ` [Qemu-devel] [patch 0/4] block streaming base support (v2) Marcelo Tosatti
  5 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2011-12-30 10:03 UTC (permalink / raw)
  To: stefanha, kwolf, qemu-devel; +Cc: Marcelo Tosatti

[-- Attachment #1: block-stream-base --]
[-- Type: text/plain, Size: 6913 bytes --]

Add support for streaming data from an intermediate section of the 
image chain (see patch and documentation for details).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: stefanha/block.c
===================================================================
--- stefanha.orig/block.c
+++ stefanha/block.c
@@ -2229,6 +2229,70 @@ int bdrv_is_allocated(BlockDriverState *
     return data.ret;
 }
 
+/*
+ * Given an image chain: [BASE] -> [INTER1] -> [INTER2] -> [TOP]
+ *
+ * Return true if the given sector is allocated in top or base.
+ * Return false if the given sector is allocated in intermediate images.
+ *
+ * 'pnum' is set to the number of sectors (including and immediately following
+ *  the specified sector) that are known to be in the same
+ *  allocated/unallocated state.
+ *
+ */
+int bdrv_is_allocated_base(BlockDriverState *top,
+                           BlockDriverState *base,
+                           int64_t sector_num,
+                           int nb_sectors, int *pnum)
+{
+    BlockDriverState *intermediate;
+    int ret, n;
+
+    ret = bdrv_co_is_allocated(top, sector_num, nb_sectors, &n);
+    if (ret) {
+        *pnum = n;
+        return ret;
+    }
+
+    /*
+     * Is the unallocated chunk [sector_num, n] also
+     * unallocated between base and top?
+     */
+    intermediate = top->backing_hd;
+
+    while (intermediate) {
+        int pnum_inter;
+
+        /* reached base */
+        if (intermediate == base) {
+            *pnum = n;
+            return 1;
+        }
+        ret = bdrv_co_is_allocated(intermediate, sector_num, nb_sectors,
+                                   &pnum_inter);
+        if (ret < 0) {
+            return ret;
+        } else if (ret) {
+            *pnum = pnum_inter;
+            return 0;
+        }
+
+        /*
+         * [sector_num, nb_sectors] is unallocated on top but intermediate
+         * might have
+         *
+         * [sector_num+x, nr_sectors] allocated.
+         */
+        if (n > pnum_inter) {
+            n = pnum_inter;
+        }
+
+        intermediate = intermediate->backing_hd;
+    }
+
+    return 1;
+}
+
 void bdrv_mon_event(const BlockDriverState *bdrv,
                     BlockMonEventAction action, int is_read)
 {
Index: stefanha/block.h
===================================================================
--- stefanha.orig/block.h
+++ stefanha/block.h
@@ -222,6 +222,8 @@ int bdrv_co_discard(BlockDriverState *bs
 int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
                       int *pnum);
+int bdrv_is_allocated_base(BlockDriverState *top, BlockDriverState *base,
+                           int64_t sector_num, int nb_sectors, int *pnum);
 
 #define BIOS_ATA_TRANSLATION_AUTO   0
 #define BIOS_ATA_TRANSLATION_NONE   1
Index: stefanha/block/stream.c
===================================================================
--- stefanha.orig/block/stream.c
+++ stefanha/block/stream.c
@@ -57,6 +57,7 @@ typedef struct StreamBlockJob {
     BlockJob common;
     RateLimit limit;
     BlockDriverState *base;
+    char backing_file_id[1024];
 } StreamBlockJob;
 
 static int coroutine_fn stream_populate(BlockDriverState *bs,
@@ -79,6 +80,7 @@ static void coroutine_fn stream_run(void
 {
     StreamBlockJob *s = opaque;
     BlockDriverState *bs = s->common.bs;
+    BlockDriverState *base = s->base;
     int64_t sector_num, end;
     int ret = 0;
     int n;
@@ -97,8 +99,17 @@ retry:
             break;
         }
 
-        ret = bdrv_co_is_allocated(bs, sector_num,
-                                   STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
+
+        if (base) {
+            ret = bdrv_is_allocated_base(bs, base, sector_num,
+                                            STREAM_BUFFER_SIZE /
+                                            BDRV_SECTOR_SIZE,
+                                            &n);
+        } else {
+            ret = bdrv_co_is_allocated(bs, sector_num,
+                                       STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE,
+                                       &n);
+        }
         trace_stream_one_iteration(s, sector_num, n, ret);
         if (ret == 0) {
             if (s->common.speed) {
@@ -115,6 +126,7 @@ retry:
         if (ret < 0) {
             break;
         }
+        ret = 0;
 
         /* Publish progress */
         s->common.offset += n * BDRV_SECTOR_SIZE;
@@ -129,7 +141,10 @@ retry:
     bdrv_disable_zero_detection(bs);
 
     if (sector_num == end && ret == 0) {
-        bdrv_change_backing_file(bs, NULL, NULL);
+        const char *base_id = NULL;
+        if (base)
+            base_id = s->backing_file_id;
+        bdrv_change_backing_file(bs, base_id, NULL);
     }
 
     qemu_vfree(buf);
@@ -155,7 +170,8 @@ static BlockJobType stream_job_type = {
 };
 
 int stream_start(BlockDriverState *bs, BlockDriverState *base,
-                 BlockDriverCompletionFunc *cb, void *opaque)
+                 const char *base_id, BlockDriverCompletionFunc *cb,
+                 void *opaque)
 {
     StreamBlockJob *s;
     Coroutine *co;
@@ -166,6 +182,8 @@ int stream_start(BlockDriverState *bs, B
 
     s = block_job_create(&stream_job_type, bs, cb, opaque);
     s->base = base;
+    if (base_id)
+        strncpy(s->backing_file_id, base_id, sizeof(s->backing_file_id) - 1);
 
     co = qemu_coroutine_create(stream_run);
     trace_stream_start(bs, base, s, co, opaque);
Index: stefanha/blockdev.c
===================================================================
--- stefanha.orig/blockdev.c
+++ stefanha/blockdev.c
@@ -931,6 +931,7 @@ void qmp_block_stream(const char *device
                       const char *base, Error **errp)
 {
     BlockDriverState *bs;
+    BlockDriverState *base_bs;
     int ret;
 
     bs = bdrv_find(device);
@@ -939,13 +940,15 @@ void qmp_block_stream(const char *device
         return;
     }
 
-    /* Base device not supported */
     if (base) {
-        error_set(errp, QERR_NOT_SUPPORTED);
-        return;
+        base_bs = bdrv_find_backing_image(bs, base);
+        if (base_bs == NULL) {
+            error_set(errp, QERR_BASE_ID_NOT_FOUND, base);
+            return;
+        }
     }
 
-    ret = stream_start(bs, NULL, block_stream_cb, bs);
+    ret = stream_start(bs, base_bs, base, block_stream_cb, bs);
     if (ret < 0) {
         switch (ret) {
         case -EBUSY:
Index: stefanha/block_int.h
===================================================================
--- stefanha.orig/block_int.h
+++ stefanha/block_int.h
@@ -380,6 +380,7 @@ static inline bool block_job_is_cancelle
 }
 
 int stream_start(BlockDriverState *bs, BlockDriverState *base,
-                 BlockDriverCompletionFunc *cb, void *opaque);
+                 const char *base_id, BlockDriverCompletionFunc *cb,
+                 void *opaque);
 
 #endif /* BLOCK_INT_H */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [patch 5/5] add doc to describe live block operations
  2011-12-30 10:03 [Qemu-devel] [patch 0/5] block streaming base support Marcelo Tosatti
                   ` (3 preceding siblings ...)
  2011-12-30 10:03 ` [Qemu-devel] [patch 4/5] block stream: add support for partial streaming Marcelo Tosatti
@ 2011-12-30 10:03 ` Marcelo Tosatti
  2012-01-04 14:08 ` [Qemu-devel] [patch 0/4] block streaming base support (v2) Marcelo Tosatti
  5 siblings, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2011-12-30 10:03 UTC (permalink / raw)
  To: stefanha, kwolf, qemu-devel; +Cc: Marcelo Tosatti

[-- Attachment #1: live-block-operations --]
[-- Type: text/plain, Size: 2108 bytes --]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: stefanha/docs/live-block-ops.txt
===================================================================
--- /dev/null
+++ stefanha/docs/live-block-ops.txt
@@ -0,0 +1,58 @@
+LIVE BLOCK OPERATIONS
+=====================
+
+High level description of live block operations. Note these are not
+supported for use with the raw format at the moment.
+
+Snapshot live merge
+===================
+
+Given a snapshot chain, described in this document in the following
+format:
+
+[A] -> [B] -> [C] -> [D]
+
+Where the rightmost object ([D] in the example) described is the current
+image which the guest OS has write access to. To the left of it is its base
+image, and so on accordingly until the leftmost image, which has no
+base.
+
+The snapshot live merge operation transforms such a chain into a
+smaller one with fewer elements, such as this transformation relative
+to the first example:
+
+[A] -> [D]
+
+Currently only forward merge with target being the active image is
+supported, that is, data copy is performed in the right direction with
+destination being the rightmost image.
+
+The operation is implemented in QEMU through image streaming facilities.
+
+The basic idea is to execute 'block_stream virtio0' while the guest is
+running. Progress can be monitored using 'info block-jobs'. When the
+streaming operation completes it raises a QMP event. 'block_stream'
+copies data from the backing file(s) into the active image. When finished,
+it adjusts the backing file pointer.
+
+The 'base' parameter specifies an image which data need not be streamed from.
+This image will be used as the backing file for the active image when the
+operation is finished.
+
+In the example above, the command would be:
+
+(qemu) block_stream virtio0 A
+
+
+Live block copy
+===============
+
+To copy an in use image to another destination in the filesystem, one
+should create a live snapshot in the desired destination, then stream
+into that image. Example:
+
+(qemu) snapshot_blkdev ide0-hd0 /new-path/disk.img qcow2
+
+(qemu) block_stream ide0-hd0
+
+

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [patch 2/5] block: implement bdrv_find_backing_image in qcow2
  2011-12-30 10:03 ` [Qemu-devel] [patch 2/5] block: implement bdrv_find_backing_image in qcow2 Marcelo Tosatti
@ 2012-01-03 13:44   ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2012-01-03 13:44 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kwolf, stefanha, qemu-devel

On Fri, Dec 30, 2011 at 10:03 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: stefanha/block/qcow2.c
> ===================================================================
> --- stefanha.orig/block/qcow2.c
> +++ stefanha/block/qcow2.c
> @@ -767,6 +767,20 @@ static int qcow2_change_backing_file(Blo
>     return qcow2_update_ext_header(bs, backing_file, backing_fmt);
>  }
>
> +static BlockDriverState *qcow2_find_backing_image(BlockDriverState *bs,
> +                                                  const char *id)
> +{
> +
> +    do {
> +        if (!strncmp(bs->backing_file, id, sizeof(bs->backing_file)))
> +            return bs->backing_hd;

Coding style uses {} always.

> +
> +        bs = bs->backing_hd;
> +    } while (bs);
> +
> +    return NULL;
> +}

The backing file may not be qcow2, so we cannot loop over
bs->backing_hd.  We need to recurse instead.

That said, any image format which uses bs->backing_file will use
bs->backing_hd.  For example, QED could use the exact same
.bdrv_find_backing_file() implementation.

Perhaps instead we need a generic implementation which does something like:

BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs, const char *id)
{
    if (!bs->drv) {
        return NULL;
    }

    if (bs->backing_hd) {
        if (strcmp(id, bs->backing_file) == 0) {
            return bs->backing_hd;
        } else {
            return bdrv_find_backing_file(bs->backing_hd, id);
        }
    }

    if (bs->drv->bdrv_find_backing_file) {
        return bs->drv->bdrv_find_backing_file(bs, id);
    }
    return NULL;
}

Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [patch 4/5] block stream: add support for partial streaming
  2011-12-30 10:03 ` [Qemu-devel] [patch 4/5] block stream: add support for partial streaming Marcelo Tosatti
@ 2012-01-04 12:39   ` Stefan Hajnoczi
  2012-01-04 13:52     ` Marcelo Tosatti
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2012-01-04 12:39 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kwolf, stefanha, qemu-devel

On Fri, Dec 30, 2011 at 10:03 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> +int bdrv_is_allocated_base(BlockDriverState *top,
> +                           BlockDriverState *base,
> +                           int64_t sector_num,
> +                           int nb_sectors, int *pnum)

Since this function runs in coroutine context it should be marked:
int coroutine_fn bdrv_co_is_allocated_base(...)

The _co_ in the filename is just a convention to identify block layer
functions that execute in coroutine context.  The coroutine_fn
annotation is useful if we ever want static checker support that
verifies that coroutine functions are always called from coroutine
context.

> +{
> +    BlockDriverState *intermediate;
> +    int ret, n;
> +
> +    ret = bdrv_co_is_allocated(top, sector_num, nb_sectors, &n);
> +    if (ret) {
> +        *pnum = n;
> +        return ret;
> +    }
> +
> +    /*
> +     * Is the unallocated chunk [sector_num, n] also
> +     * unallocated between base and top?
> +     */
> +    intermediate = top->backing_hd;

This reaches into BlockDriverState->backing_hd.  In practice this is
probably the simplest and best way to do it.  But if we're going to do
this then do we even need the BlockDriver .bdrv_find_backing_file()
abstraction?  We could implement generic code which traverses
->backing_hd in block.c and if you don't use
BlockDriverState->backing_hd then you're out of luck.

> @@ -129,7 +141,10 @@ retry:
>     bdrv_disable_zero_detection(bs);
>
>     if (sector_num == end && ret == 0) {
> -        bdrv_change_backing_file(bs, NULL, NULL);
> +        const char *base_id = NULL;
> +        if (base)
> +            base_id = s->backing_file_id;
> +        bdrv_change_backing_file(bs, base_id, NULL);

This makes me want to move the bdrv_change_backing_file() call out to
blockdev.c where we would perform it on successful completion.  That
way we don't need to pass base_id into stream.c at all.  A streaming
block job would no longer automatically change the backing file on
completion.

> @@ -166,6 +182,8 @@ int stream_start(BlockDriverState *bs, B
>
>     s = block_job_create(&stream_job_type, bs, cb, opaque);
>     s->base = base;
> +    if (base_id)
> +        strncpy(s->backing_file_id, base_id, sizeof(s->backing_file_id) - 1);

cutils.c:pstrcpy() is nicer than strncpy(), it does not need the size - 1.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [patch 4/5] block stream: add support for partial streaming
  2012-01-04 12:39   ` Stefan Hajnoczi
@ 2012-01-04 13:52     ` Marcelo Tosatti
  0 siblings, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2012-01-04 13:52 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, stefanha, qemu-devel

On Wed, Jan 04, 2012 at 12:39:57PM +0000, Stefan Hajnoczi wrote:
> On Fri, Dec 30, 2011 at 10:03 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > +int bdrv_is_allocated_base(BlockDriverState *top,
> > +                           BlockDriverState *base,
> > +                           int64_t sector_num,
> > +                           int nb_sectors, int *pnum)
> 
> Since this function runs in coroutine context it should be marked:
> int coroutine_fn bdrv_co_is_allocated_base(...)
> 
> The _co_ in the filename is just a convention to identify block layer
> functions that execute in coroutine context.  The coroutine_fn
> annotation is useful if we ever want static checker support that
> verifies that coroutine functions are always called from coroutine
> context.

Done.

> > +{
> > +    BlockDriverState *intermediate;
> > +    int ret, n;
> > +
> > +    ret = bdrv_co_is_allocated(top, sector_num, nb_sectors, &n);
> > +    if (ret) {
> > +        *pnum = n;
> > +        return ret;
> > +    }
> > +
> > +    /*
> > +     * Is the unallocated chunk [sector_num, n] also
> > +     * unallocated between base and top?
> > +     */
> > +    intermediate = top->backing_hd;
> 
> This reaches into BlockDriverState->backing_hd.  In practice this is
> probably the simplest and best way to do it.  But if we're going to do
> this then do we even need the BlockDriver .bdrv_find_backing_file()
> abstraction?  We could implement generic code which traverses
> ->backing_hd in block.c and if you don't use
> BlockDriverState->backing_hd then you're out of luck.

Right. Then it becomes necessary for drivers to fill in 
->backing_hd and ->backing_file properly, which is reasonable.

Will drop the abstractions.

> > @@ -129,7 +141,10 @@ retry:
> >     bdrv_disable_zero_detection(bs);
> >
> >     if (sector_num == end && ret == 0) {
> > -        bdrv_change_backing_file(bs, NULL, NULL);
> > +        const char *base_id = NULL;
> > +        if (base)
> > +            base_id = s->backing_file_id;
> > +        bdrv_change_backing_file(bs, base_id, NULL);
> 
> This makes me want to move the bdrv_change_backing_file() call out to
> blockdev.c where we would perform it on successful completion.  That
> way we don't need to pass base_id into stream.c at all.  A streaming
> block job would no longer automatically change the backing file on
> completion.

Well, it is safer to perform the backing file change under refcounting 
protection (i don't see the advantage of your suggestion
other than the cosmetic aspect of saving a variable).

> > @@ -166,6 +182,8 @@ int stream_start(BlockDriverState *bs, B
> >
> >     s = block_job_create(&stream_job_type, bs, cb, opaque);
> >     s->base = base;
> > +    if (base_id)
> > +        strncpy(s->backing_file_id, base_id, sizeof(s->backing_file_id) - 1);
> 
> cutils.c:pstrcpy() is nicer than strncpy(), it does not need the size - 1.

Done.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [patch 0/4] block streaming base support (v2)
  2011-12-30 10:03 [Qemu-devel] [patch 0/5] block streaming base support Marcelo Tosatti
                   ` (4 preceding siblings ...)
  2011-12-30 10:03 ` [Qemu-devel] [patch 5/5] add doc to describe live block operations Marcelo Tosatti
@ 2012-01-04 14:08 ` Marcelo Tosatti
  2012-01-04 14:08   ` [Qemu-devel] [patch 1/4] block: add bdrv_find_backing_image Marcelo Tosatti
                     ` (3 more replies)
  5 siblings, 4 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2012-01-04 14:08 UTC (permalink / raw)
  To: stefanha, kwolf, qemu-devel


Add support for streaming data from an intermediate section of the
image chain.

v2: address comments from Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [patch 1/4] block: add bdrv_find_backing_image
  2012-01-04 14:08 ` [Qemu-devel] [patch 0/4] block streaming base support (v2) Marcelo Tosatti
@ 2012-01-04 14:08   ` Marcelo Tosatti
  2012-01-04 14:08   ` [Qemu-devel] [patch 2/4] add QERR_BASE_ID_NOT_FOUND Marcelo Tosatti
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2012-01-04 14:08 UTC (permalink / raw)
  To: stefanha, kwolf, qemu-devel; +Cc: Marcelo Tosatti

[-- Attachment #1: bdrv-stream-shared-base-helpers --]
[-- Type: text/plain, Size: 1513 bytes --]

Add bdrv_find_backing_image: given a BlockDriverState pointer, and an id,
traverse the backing image chain to locate the id.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: stefanha/block.c
===================================================================
--- stefanha.orig/block.c
+++ stefanha/block.c
@@ -2580,6 +2580,24 @@ int bdrv_snapshot_load_tmp(BlockDriverSt
     return -ENOTSUP;
 }
 
+BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs, const char *id)
+{
+    if (!bs->drv) {
+        return NULL;
+    }
+
+    if (bs->backing_hd) {
+        if (strncmp(bs->backing_file, id, sizeof(bs->backing_file)) == 0) {
+            return bs->backing_hd;
+        } else {
+            return bdrv_find_backing_image(bs->backing_hd, id);
+        }
+    }
+
+    return NULL;
+}
+
+
 #define NB_SUFFIXES 4
 
 char *get_human_readable_size(char *buf, int buf_size, int64_t size)
Index: stefanha/block.h
===================================================================
--- stefanha.orig/block.h
+++ stefanha/block.h
@@ -146,6 +146,7 @@ int coroutine_fn bdrv_co_writev(BlockDri
     int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, int *pnum);
+BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs, const char *id);
 int bdrv_truncate(BlockDriverState *bs, int64_t offset);
 int64_t bdrv_getlength(BlockDriverState *bs);
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [patch 2/4] add QERR_BASE_ID_NOT_FOUND
  2012-01-04 14:08 ` [Qemu-devel] [patch 0/4] block streaming base support (v2) Marcelo Tosatti
  2012-01-04 14:08   ` [Qemu-devel] [patch 1/4] block: add bdrv_find_backing_image Marcelo Tosatti
@ 2012-01-04 14:08   ` Marcelo Tosatti
  2012-01-04 14:08   ` [Qemu-devel] [patch 3/4] block stream: add support for partial streaming Marcelo Tosatti
  2012-01-04 14:08   ` [Qemu-devel] [patch 4/4] add doc to describe live block operations Marcelo Tosatti
  3 siblings, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2012-01-04 14:08 UTC (permalink / raw)
  To: stefanha, kwolf, qemu-devel; +Cc: Marcelo Tosatti

[-- Attachment #1: qerr-base-id-not-found --]
[-- Type: text/plain, Size: 956 bytes --]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: stefanha/qerror.c
===================================================================
--- stefanha.orig/qerror.c
+++ stefanha/qerror.c
@@ -254,6 +254,10 @@ static const QErrorStringTable qerror_ta
         .error_fmt = QERR_INVALID_PARAMETER_COMBINATION,
         .desc      = "Invalid paramter combination",
     },
+    {
+        .error_fmt = QERR_BASE_ID_NOT_FOUND,
+        .desc      = "The base id %(base_id) has not been found",
+    },
     {}
 };
 
Index: stefanha/qerror.h
===================================================================
--- stefanha.orig/qerror.h
+++ stefanha/qerror.h
@@ -210,4 +210,7 @@ QError *qobject_to_qerror(const QObject 
 #define QERR_INVALID_PARAMETER_COMBINATION \
     "{ 'class': 'InvalidParameterCombination', 'data': {} }"
 
+#define QERR_BASE_ID_NOT_FOUND \
+    "{ 'class': 'BaseIdNotFound', 'data': { 'base_id': %s } }"
+
 #endif /* QERROR_H */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [patch 3/4] block stream: add support for partial streaming
  2012-01-04 14:08 ` [Qemu-devel] [patch 0/4] block streaming base support (v2) Marcelo Tosatti
  2012-01-04 14:08   ` [Qemu-devel] [patch 1/4] block: add bdrv_find_backing_image Marcelo Tosatti
  2012-01-04 14:08   ` [Qemu-devel] [patch 2/4] add QERR_BASE_ID_NOT_FOUND Marcelo Tosatti
@ 2012-01-04 14:08   ` Marcelo Tosatti
  2012-01-04 16:02     ` Eric Blake
  2012-01-04 14:08   ` [Qemu-devel] [patch 4/4] add doc to describe live block operations Marcelo Tosatti
  3 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2012-01-04 14:08 UTC (permalink / raw)
  To: stefanha, kwolf, qemu-devel; +Cc: Marcelo Tosatti

[-- Attachment #1: block-stream-base --]
[-- Type: text/plain, Size: 7104 bytes --]

Add support for streaming data from an intermediate section of the 
image chain (see patch and documentation for details).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: stefanha/block.c
===================================================================
--- stefanha.orig/block.c
+++ stefanha/block.c
@@ -2229,6 +2229,70 @@ int bdrv_is_allocated(BlockDriverState *
     return data.ret;
 }
 
+/*
+ * Given an image chain: [BASE] -> [INTER1] -> [INTER2] -> [TOP]
+ *
+ * Return true if the given sector is allocated in top or base.
+ * Return false if the given sector is allocated in intermediate images.
+ *
+ * 'pnum' is set to the number of sectors (including and immediately following
+ *  the specified sector) that are known to be in the same
+ *  allocated/unallocated state.
+ *
+ */
+int coroutine_fn bdrv_co_is_allocated_base(BlockDriverState *top,
+                                           BlockDriverState *base,
+                                           int64_t sector_num,
+                                           int nb_sectors, int *pnum)
+{
+    BlockDriverState *intermediate;
+    int ret, n;
+
+    ret = bdrv_co_is_allocated(top, sector_num, nb_sectors, &n);
+    if (ret) {
+        *pnum = n;
+        return ret;
+    }
+
+    /*
+     * Is the unallocated chunk [sector_num, n] also
+     * unallocated between base and top?
+     */
+    intermediate = top->backing_hd;
+
+    while (intermediate) {
+        int pnum_inter;
+
+        /* reached base */
+        if (intermediate == base) {
+            *pnum = n;
+            return 1;
+        }
+        ret = bdrv_co_is_allocated(intermediate, sector_num, nb_sectors,
+                                   &pnum_inter);
+        if (ret < 0) {
+            return ret;
+        } else if (ret) {
+            *pnum = pnum_inter;
+            return 0;
+        }
+
+        /*
+         * [sector_num, nb_sectors] is unallocated on top but intermediate
+         * might have
+         *
+         * [sector_num+x, nr_sectors] allocated.
+         */
+        if (n > pnum_inter) {
+            n = pnum_inter;
+        }
+
+        intermediate = intermediate->backing_hd;
+    }
+
+    return 1;
+}
+
 void bdrv_mon_event(const BlockDriverState *bdrv,
                     BlockMonEventAction action, int is_read)
 {
Index: stefanha/block.h
===================================================================
--- stefanha.orig/block.h
+++ stefanha/block.h
@@ -222,6 +222,10 @@ int bdrv_co_discard(BlockDriverState *bs
 int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
                       int *pnum);
+int coroutine_fn bdrv_co_is_allocated_base(BlockDriverState *top,
+                                           BlockDriverState *base,
+                                           int64_t sector_num, int nb_sectors,
+                                           int *pnum);
 
 #define BIOS_ATA_TRANSLATION_AUTO   0
 #define BIOS_ATA_TRANSLATION_NONE   1
Index: stefanha/block/stream.c
===================================================================
--- stefanha.orig/block/stream.c
+++ stefanha/block/stream.c
@@ -57,6 +57,7 @@ typedef struct StreamBlockJob {
     BlockJob common;
     RateLimit limit;
     BlockDriverState *base;
+    char backing_file_id[1024];
 } StreamBlockJob;
 
 static int coroutine_fn stream_populate(BlockDriverState *bs,
@@ -79,6 +80,7 @@ static void coroutine_fn stream_run(void
 {
     StreamBlockJob *s = opaque;
     BlockDriverState *bs = s->common.bs;
+    BlockDriverState *base = s->base;
     int64_t sector_num, end;
     int ret = 0;
     int n;
@@ -97,8 +99,17 @@ retry:
             break;
         }
 
-        ret = bdrv_co_is_allocated(bs, sector_num,
-                                   STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
+
+        if (base) {
+            ret = bdrv_co_is_allocated_base(bs, base, sector_num,
+                                            STREAM_BUFFER_SIZE /
+                                            BDRV_SECTOR_SIZE,
+                                            &n);
+        } else {
+            ret = bdrv_co_is_allocated(bs, sector_num,
+                                       STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE,
+                                       &n);
+        }
         trace_stream_one_iteration(s, sector_num, n, ret);
         if (ret == 0) {
             if (s->common.speed) {
@@ -115,6 +126,7 @@ retry:
         if (ret < 0) {
             break;
         }
+        ret = 0;
 
         /* Publish progress */
         s->common.offset += n * BDRV_SECTOR_SIZE;
@@ -129,7 +141,10 @@ retry:
     bdrv_disable_zero_detection(bs);
 
     if (sector_num == end && ret == 0) {
-        bdrv_change_backing_file(bs, NULL, NULL);
+        const char *base_id = NULL;
+        if (base)
+            base_id = s->backing_file_id;
+        bdrv_change_backing_file(bs, base_id, NULL);
     }
 
     qemu_vfree(buf);
@@ -155,7 +170,8 @@ static BlockJobType stream_job_type = {
 };
 
 int stream_start(BlockDriverState *bs, BlockDriverState *base,
-                 BlockDriverCompletionFunc *cb, void *opaque)
+                 const char *base_id, BlockDriverCompletionFunc *cb,
+                 void *opaque)
 {
     StreamBlockJob *s;
     Coroutine *co;
@@ -166,6 +182,8 @@ int stream_start(BlockDriverState *bs, B
 
     s = block_job_create(&stream_job_type, bs, cb, opaque);
     s->base = base;
+    if (base_id)
+        pstrcpy(s->backing_file_id, sizeof(s->backing_file_id), base_id);
 
     co = qemu_coroutine_create(stream_run);
     trace_stream_start(bs, base, s, co, opaque);
Index: stefanha/blockdev.c
===================================================================
--- stefanha.orig/blockdev.c
+++ stefanha/blockdev.c
@@ -931,6 +931,7 @@ void qmp_block_stream(const char *device
                       const char *base, Error **errp)
 {
     BlockDriverState *bs;
+    BlockDriverState *base_bs = NULL;
     int ret;
 
     bs = bdrv_find(device);
@@ -939,13 +940,15 @@ void qmp_block_stream(const char *device
         return;
     }
 
-    /* Base device not supported */
     if (base) {
-        error_set(errp, QERR_NOT_SUPPORTED);
-        return;
+        base_bs = bdrv_find_backing_image(bs, base);
+        if (base_bs == NULL) {
+            error_set(errp, QERR_BASE_ID_NOT_FOUND, base);
+            return;
+        }
     }
 
-    ret = stream_start(bs, NULL, block_stream_cb, bs);
+    ret = stream_start(bs, base_bs, base, block_stream_cb, bs);
     if (ret < 0) {
         switch (ret) {
         case -EBUSY:
Index: stefanha/block_int.h
===================================================================
--- stefanha.orig/block_int.h
+++ stefanha/block_int.h
@@ -380,6 +380,7 @@ static inline bool block_job_is_cancelle
 }
 
 int stream_start(BlockDriverState *bs, BlockDriverState *base,
-                 BlockDriverCompletionFunc *cb, void *opaque);
+                 const char *base_id, BlockDriverCompletionFunc *cb,
+                 void *opaque);
 
 #endif /* BLOCK_INT_H */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [patch 4/4] add doc to describe live block operations
  2012-01-04 14:08 ` [Qemu-devel] [patch 0/4] block streaming base support (v2) Marcelo Tosatti
                     ` (2 preceding siblings ...)
  2012-01-04 14:08   ` [Qemu-devel] [patch 3/4] block stream: add support for partial streaming Marcelo Tosatti
@ 2012-01-04 14:08   ` Marcelo Tosatti
  3 siblings, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2012-01-04 14:08 UTC (permalink / raw)
  To: stefanha, kwolf, qemu-devel; +Cc: Marcelo Tosatti

[-- Attachment #1: live-block-operations --]
[-- Type: text/plain, Size: 2108 bytes --]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: stefanha/docs/live-block-ops.txt
===================================================================
--- /dev/null
+++ stefanha/docs/live-block-ops.txt
@@ -0,0 +1,58 @@
+LIVE BLOCK OPERATIONS
+=====================
+
+High level description of live block operations. Note these are not
+supported for use with the raw format at the moment.
+
+Snapshot live merge
+===================
+
+Given a snapshot chain, described in this document in the following
+format:
+
+[A] -> [B] -> [C] -> [D]
+
+Where the rightmost object ([D] in the example) described is the current
+image which the guest OS has write access to. To the left of it is its base
+image, and so on accordingly until the leftmost image, which has no
+base.
+
+The snapshot live merge operation transforms such a chain into a
+smaller one with fewer elements, such as this transformation relative
+to the first example:
+
+[A] -> [D]
+
+Currently only forward merge with target being the active image is
+supported, that is, data copy is performed in the right direction with
+destination being the rightmost image.
+
+The operation is implemented in QEMU through image streaming facilities.
+
+The basic idea is to execute 'block_stream virtio0' while the guest is
+running. Progress can be monitored using 'info block-jobs'. When the
+streaming operation completes it raises a QMP event. 'block_stream'
+copies data from the backing file(s) into the active image. When finished,
+it adjusts the backing file pointer.
+
+The 'base' parameter specifies an image which data need not be streamed from.
+This image will be used as the backing file for the active image when the
+operation is finished.
+
+In the example above, the command would be:
+
+(qemu) block_stream virtio0 A
+
+
+Live block copy
+===============
+
+To copy an in use image to another destination in the filesystem, one
+should create a live snapshot in the desired destination, then stream
+into that image. Example:
+
+(qemu) snapshot_blkdev ide0-hd0 /new-path/disk.img qcow2
+
+(qemu) block_stream ide0-hd0
+
+

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [patch 3/4] block stream: add support for partial streaming
  2012-01-04 14:08   ` [Qemu-devel] [patch 3/4] block stream: add support for partial streaming Marcelo Tosatti
@ 2012-01-04 16:02     ` Eric Blake
  2012-01-04 17:47       ` Marcelo Tosatti
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Blake @ 2012-01-04 16:02 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kwolf, stefanha, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1818 bytes --]

On 01/04/2012 07:08 AM, Marcelo Tosatti wrote:
> Add support for streaming data from an intermediate section of the 
> image chain (see patch and documentation for details).
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> Index: stefanha/block.c
> ===================================================================
> --- stefanha.orig/block.c
> +++ stefanha/block.c
> @@ -2229,6 +2229,70 @@ int bdrv_is_allocated(BlockDriverState *
>      return data.ret;
>  }
>  
> +/*
> + * Given an image chain: [BASE] -> [INTER1] -> [INTER2] -> [TOP]
> + *
> + * Return true if the given sector is allocated in top or base.
> + * Return false if the given sector is allocated in intermediate images.
> + *
> + * 'pnum' is set to the number of sectors (including and immediately following
> + *  the specified sector) that are known to be in the same
> + *  allocated/unallocated state.

Not a problem with this patch, per say, so much as a question about the
next steps:

How hard would it be to go one step further, and provide a monitor
command where qemu could dump the state of BASE, INTER1, or INTER2
without removing it from the image chain?  Libvirt would really like to
be able to have a command where the user can request to inspect to see
the contents of (a portion of) the disk at the time the snapshot was
created, all while qemu continues to run and the TOP file continues to
be adding deltas to that portion of the disk.

For that matter, I'm still missing out on the ability to extract the
contents of a qcow2 internal snapshot from an image that is in use by
qemu - we have the ability to delete internal snapshots but not to probe
their contents.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [patch 3/4] block stream: add support for partial streaming
  2012-01-04 16:02     ` Eric Blake
@ 2012-01-04 17:47       ` Marcelo Tosatti
  2012-01-04 18:03         ` Eric Blake
  0 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2012-01-04 17:47 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, stefanha, qemu-devel

On Wed, Jan 04, 2012 at 09:02:06AM -0700, Eric Blake wrote:
> On 01/04/2012 07:08 AM, Marcelo Tosatti wrote:
> > Add support for streaming data from an intermediate section of the 
> > image chain (see patch and documentation for details).
> > 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > Index: stefanha/block.c
> > ===================================================================
> > --- stefanha.orig/block.c
> > +++ stefanha/block.c
> > @@ -2229,6 +2229,70 @@ int bdrv_is_allocated(BlockDriverState *
> >      return data.ret;
> >  }
> >  
> > +/*
> > + * Given an image chain: [BASE] -> [INTER1] -> [INTER2] -> [TOP]
> > + *
> > + * Return true if the given sector is allocated in top or base.
> > + * Return false if the given sector is allocated in intermediate images.
> > + *
> > + * 'pnum' is set to the number of sectors (including and immediately following
> > + *  the specified sector) that are known to be in the same
> > + *  allocated/unallocated state.
> 
> Not a problem with this patch, per say, so much as a question about the
> next steps:
> 
> How hard would it be to go one step further, and provide a monitor
> command where qemu could dump the state of BASE, INTER1, or INTER2
> without removing it from the image chain?  Libvirt would really like to
> be able to have a command where the user can request to inspect to see
> the contents of (a portion of) the disk at the time the snapshot was
> created, all while qemu continues to run and the TOP file continues to
> be adding deltas to that portion of the disk.

What exactly do you mean "dump the state of"? You want access to
the contents of INTER2, INTER1, BASE, via libguestfs?

> For that matter, I'm still missing out on the ability to extract the
> contents of a qcow2 internal snapshot from an image that is in use by
> qemu - we have the ability to delete internal snapshots but not to probe
> their contents.

Same question (although i am not familiar with internal snapshots).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [patch 3/4] block stream: add support for partial streaming
  2012-01-04 17:47       ` Marcelo Tosatti
@ 2012-01-04 18:03         ` Eric Blake
  2012-01-04 19:22           ` Marcelo Tosatti
  2012-01-04 22:40           ` Stefan Hajnoczi
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Blake @ 2012-01-04 18:03 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kwolf, stefanha, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2736 bytes --]

On 01/04/2012 10:47 AM, Marcelo Tosatti wrote:
>>> +/*
>>> + * Given an image chain: [BASE] -> [INTER1] -> [INTER2] -> [TOP]
>>> + *
>>
>> How hard would it be to go one step further, and provide a monitor
>> command where qemu could dump the state of BASE, INTER1, or INTER2
>> without removing it from the image chain?  Libvirt would really like to
>> be able to have a command where the user can request to inspect to see
>> the contents of (a portion of) the disk at the time the snapshot was
>> created, all while qemu continues to run and the TOP file continues to
>> be adding deltas to that portion of the disk.
> 
> What exactly do you mean "dump the state of"? You want access to
> the contents of INTER2, INTER1, BASE, via libguestfs?

I want access via the qemu monitor (which can then be used by libvirt,
libguestfs, and others, to do whatever further management operations on
that snapshot as desired).

> 
>> For that matter, I'm still missing out on the ability to extract the
>> contents of a qcow2 internal snapshot from an image that is in use by
>> qemu - we have the ability to delete internal snapshots but not to probe
>> their contents.
> 
> Same question (although i am not familiar with internal snapshots).

With external snapshots, I know that once the external snapshot TOP is
created, then qemu is treating INTER2 as read-only; therefore, I can
then use qemu-img in parallel on INTER2 to probe the contents of the
snapshot; therefore, in libvirt, it would be possible for me to create a
raw image corresponding to the qcow2 contents of INTER2, or to create a
cloned qcow2 image corresponding to the raw contents of BASE, all while
TOP continues to be modified.

But with internal snapshots, both the snapshot and the current disk
state reside in the same qcow2 file, which is under current use by qemu,
and therefore, qemu-img cannot be safely used on that file.  The only
way I know of to extract the contents of that internal snapshot is via
qemu itself, but qemu does not currently expose that.  I envision
something similar to the memsave and pmemsave monitor commands, which
copy a (portion) of the guest's memory into a file (although copying
into an already-open fd passed via SCM_RIGHTS would be nicer than
requiring a file name, as is the current case with memsave).

And once we get qemu to expose the contents of an internal snapshot,
that same monitor command seems like it would be useful for exposing the
contents of an external snapshot such as INTER2 or BASE, rather than
having to use qemu-img in parallel on the external file.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [patch 3/4] block stream: add support for partial streaming
  2012-01-04 18:03         ` Eric Blake
@ 2012-01-04 19:22           ` Marcelo Tosatti
  2012-01-04 22:40           ` Stefan Hajnoczi
  1 sibling, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2012-01-04 19:22 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, stefanha, qemu-devel

On Wed, Jan 04, 2012 at 11:03:14AM -0700, Eric Blake wrote:
> On 01/04/2012 10:47 AM, Marcelo Tosatti wrote:
> >>> +/*
> >>> + * Given an image chain: [BASE] -> [INTER1] -> [INTER2] -> [TOP]
> >>> + *
> >>
> >> How hard would it be to go one step further, and provide a monitor
> >> command where qemu could dump the state of BASE, INTER1, or INTER2
> >> without removing it from the image chain?  Libvirt would really like to
> >> be able to have a command where the user can request to inspect to see
> >> the contents of (a portion of) the disk at the time the snapshot was
> >> created, all while qemu continues to run and the TOP file continues to
> >> be adding deltas to that portion of the disk.
> > 
> > What exactly do you mean "dump the state of"? You want access to
> > the contents of INTER2, INTER1, BASE, via libguestfs?
> 
> I want access via the qemu monitor (which can then be used by libvirt,
> libguestfs, and others, to do whatever further management operations on
> that snapshot as desired).
> 
> > 
> >> For that matter, I'm still missing out on the ability to extract the
> >> contents of a qcow2 internal snapshot from an image that is in use by
> >> qemu - we have the ability to delete internal snapshots but not to probe
> >> their contents.
> > 
> > Same question (although i am not familiar with internal snapshots).
> 
> With external snapshots, I know that once the external snapshot TOP is
> created, then qemu is treating INTER2 as read-only; therefore, I can
> then use qemu-img in parallel on INTER2 to probe the contents of the
> snapshot; therefore, in libvirt, it would be possible for me to create a
> raw image corresponding to the qcow2 contents of INTER2, or to create a
> cloned qcow2 image corresponding to the raw contents of BASE, all while
> TOP continues to be modified.

Correct.

> But with internal snapshots, both the snapshot and the current disk
> state reside in the same qcow2 file, which is under current use by qemu,
> and therefore, qemu-img cannot be safely used on that file.  The only
> way I know of to extract the contents of that internal snapshot is via
> qemu itself, but qemu does not currently expose that.  I envision
> something similar to the memsave and pmemsave monitor commands, which
> copy a (portion) of the guest's memory into a file (although copying
> into an already-open fd passed via SCM_RIGHTS would be nicer than
> requiring a file name, as is the current case with memsave).
> 
> And once we get qemu to expose the contents of an internal snapshot,
> that same monitor command seems like it would be useful for exposing the
> contents of an external snapshot such as INTER2 or BASE, rather than
> having to use qemu-img in parallel on the external file.

I'll defer to Kevin or Stefan.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [patch 3/4] block stream: add support for partial streaming
  2012-01-04 18:03         ` Eric Blake
  2012-01-04 19:22           ` Marcelo Tosatti
@ 2012-01-04 22:40           ` Stefan Hajnoczi
  2012-01-05  7:46             ` Paolo Bonzini
  2012-01-09 10:58             ` Kevin Wolf
  1 sibling, 2 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2012-01-04 22:40 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, Paolo Bonzini, Marcelo Tosatti, stefanha, qemu-devel

On Wed, Jan 4, 2012 at 6:03 PM, Eric Blake <eblake@redhat.com> wrote:
> On 01/04/2012 10:47 AM, Marcelo Tosatti wrote:
>>>> +/*
>>>> + * Given an image chain: [BASE] -> [INTER1] -> [INTER2] -> [TOP]
>>>> + *
>>>
>>> How hard would it be to go one step further, and provide a monitor
>>> command where qemu could dump the state of BASE, INTER1, or INTER2
>>> without removing it from the image chain?  Libvirt would really like to
>>> be able to have a command where the user can request to inspect to see
>>> the contents of (a portion of) the disk at the time the snapshot was
>>> created, all while qemu continues to run and the TOP file continues to
>>> be adding deltas to that portion of the disk.
>>
>> What exactly do you mean "dump the state of"? You want access to
>> the contents of INTER2, INTER1, BASE, via libguestfs?
>
> I want access via the qemu monitor (which can then be used by libvirt,
> libguestfs, and others, to do whatever further management operations on
> that snapshot as desired).
>
>>
>>> For that matter, I'm still missing out on the ability to extract the
>>> contents of a qcow2 internal snapshot from an image that is in use by
>>> qemu - we have the ability to delete internal snapshots but not to probe
>>> their contents.
>>
>> Same question (although i am not familiar with internal snapshots).
>
> With external snapshots, I know that once the external snapshot TOP is
> created, then qemu is treating INTER2 as read-only; therefore, I can
> then use qemu-img in parallel on INTER2 to probe the contents of the
> snapshot; therefore, in libvirt, it would be possible for me to create a
> raw image corresponding to the qcow2 contents of INTER2, or to create a
> cloned qcow2 image corresponding to the raw contents of BASE, all while
> TOP continues to be modified.
>
> But with internal snapshots, both the snapshot and the current disk
> state reside in the same qcow2 file, which is under current use by qemu,
> and therefore, qemu-img cannot be safely used on that file.  The only
> way I know of to extract the contents of that internal snapshot is via
> qemu itself, but qemu does not currently expose that.  I envision
> something similar to the memsave and pmemsave monitor commands, which
> copy a (portion) of the guest's memory into a file (although copying
> into an already-open fd passed via SCM_RIGHTS would be nicer than
> requiring a file name, as is the current case with memsave).
>
> And once we get qemu to expose the contents of an internal snapshot,
> that same monitor command seems like it would be useful for exposing the
> contents of an external snapshot such as INTER2 or BASE, rather than
> having to use qemu-img in parallel on the external file.

The qcow2 implementation never accesses snapshots directly.  Instead
there's the concept of the current L1 table, which means there is a
single global state of the disk.  Snapshots are immutable and are
never accessed directly, only copied into the current L1 table.  The
single global state makes it a little tricky to access a snapshot
while the VM is running.

That said, the file format itself doesn't prevent an implementation
from supporting read-only access to snapshots.  In theory we can
extend the qcow2 implementation to support this behavior.

What you want sounds almost like an NBD server that can be
launched/stopped while qemu is already running a VM.  This could be a
QEMU monitor command like:
nbd-start tcp::1234 virtio-disk0 --snapshot 20120104

It would be possible to stop the server using the same <socket, drive,
snapshot> tuple.  Note the server needs to provide read-only access,
allowing writes probably has little use and people will hose their
data.

Paolo: I haven't looked at the new and improved NBD server yet.  Does
this sound doable?

Kevin: I think we need something like qcow2_snapshot_load_tmp() but it
returns a full new BlockDriverState.  The hard thing is that duping a
read-only snapshot qcow2 state leads to sharing and lifecycle problems
- what if we want to close the original BlockDriverState, will the
read-only snapshot state prevent this?

Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [patch 3/4] block stream: add support for partial streaming
  2012-01-04 22:40           ` Stefan Hajnoczi
@ 2012-01-05  7:46             ` Paolo Bonzini
  2012-01-09 10:58             ` Kevin Wolf
  1 sibling, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2012-01-05  7:46 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, Marcelo Tosatti, Eric Blake, stefanha, qemu-devel

On 01/04/2012 11:40 PM, Stefan Hajnoczi wrote:
> What you want sounds almost like an NBD server that can be
> launched/stopped while qemu is already running a VM.  This could be a
> QEMU monitor command like:
> nbd-start tcp::1234 virtio-disk0 --snapshot 20120104
>
> It would be possible to stop the server using the same<socket, drive,
> snapshot>  tuple.  Note the server needs to provide read-only access,
> allowing writes probably has little use and people will hose their
> data.

That makes sense, just like most qemu-img commands have an equivalent in 
the monitor for online usage.

> Paolo: I haven't looked at the new and improved NBD server yet.  Does
> this sound doable?

Yes, indeed.  It should not be hard.  The NBD server is now entirely 
asynchronous, and by using the main loop the integration is very much 
simplified.

Briefly, nbd.c now has a simple server API:

     typedef struct NBDExport NBDExport;
     typedef struct NBDClient NBDClient;

     NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset,
                               off_t size, uint32_t nbdflags);
     void nbd_export_close(NBDExport *exp);
     NBDClient *nbd_client_new(NBDExport *exp, int csock,
                               void (*close)(NBDClient *));

... that takes care of everything except creating the server socket and 
accepting clients from it.  Which is actually even better, because 
instead of having a generic NBD server you could start one on a file 
descriptor that you pass via SCM_RIGHTS (aka getfd).

> Kevin: I think we need something like qcow2_snapshot_load_tmp() but it
> returns a full new BlockDriverState.  The hard thing is that duping a
> read-only snapshot qcow2 state leads to sharing and lifecycle problems
> - what if we want to close the original BlockDriverState, will the
> read-only snapshot state prevent this?

We can prevent closing the parent BDS until all its children are gone.

Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [patch 3/4] block stream: add support for partial streaming
  2012-01-04 22:40           ` Stefan Hajnoczi
  2012-01-05  7:46             ` Paolo Bonzini
@ 2012-01-09 10:58             ` Kevin Wolf
  2012-01-09 13:14               ` Stefan Hajnoczi
  1 sibling, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2012-01-09 10:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Paolo Bonzini, Marcelo Tosatti, Eric Blake, stefanha, qemu-devel

Am 04.01.2012 23:40, schrieb Stefan Hajnoczi:
> The qcow2 implementation never accesses snapshots directly.  Instead
> there's the concept of the current L1 table, which means there is a
> single global state of the disk.  Snapshots are immutable and are
> never accessed directly, only copied into the current L1 table.  The
> single global state makes it a little tricky to access a snapshot
> while the VM is running.
> 
> That said, the file format itself doesn't prevent an implementation
> from supporting read-only access to snapshots.  In theory we can
> extend the qcow2 implementation to support this behavior.

And I think in practice we should. I've been meaning to do something
like this for too long, but as you know everyone is focussed on external
snapshots, so the internal ones don't get the attention they would deserve.

> What you want sounds almost like an NBD server that can be
> launched/stopped while qemu is already running a VM.  This could be a
> QEMU monitor command like:
> nbd-start tcp::1234 virtio-disk0 --snapshot 20120104

I like this idea. :-)

> It would be possible to stop the server using the same <socket, drive,
> snapshot> tuple.  Note the server needs to provide read-only access,
> allowing writes probably has little use and people will hose their
> data.
> 
> Paolo: I haven't looked at the new and improved NBD server yet.  Does
> this sound doable?
> 
> Kevin: I think we need something like qcow2_snapshot_load_tmp() but it
> returns a full new BlockDriverState.  The hard thing is that duping a
> read-only snapshot qcow2 state leads to sharing and lifecycle problems
> - what if we want to close the original BlockDriverState, will the
> read-only snapshot state prevent this?

Yes, creating new read-only BlockDriverStates from one image is exactly
the thought that I had when reading this thread. The problem is that the
BlockDriverStates wouldn't be fully independent. What if you delete the
snapshot that is used by another BlockDriverState etc.?

I think the least you would need is to have a separation between one
BlockImage (which is a whole qcow2 file) and multiple BlockDriverStates
(which is the backend that devices/NBD servers/whatever use). Not sure
if such a fundamental block layer change is worth the effort. On the
other hand, if Anthony is serious about QOM we'll get a fundamental
design change anyway at some point...

Kevin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [patch 3/4] block stream: add support for partial streaming
  2012-01-09 10:58             ` Kevin Wolf
@ 2012-01-09 13:14               ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2012-01-09 13:14 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Paolo Bonzini, Marcelo Tosatti, Eric Blake, stefanha, qemu-devel

On Mon, Jan 9, 2012 at 10:58 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 04.01.2012 23:40, schrieb Stefan Hajnoczi:
>> Kevin: I think we need something like qcow2_snapshot_load_tmp() but it
>> returns a full new BlockDriverState.  The hard thing is that duping a
>> read-only snapshot qcow2 state leads to sharing and lifecycle problems
>> - what if we want to close the original BlockDriverState, will the
>> read-only snapshot state prevent this?
>
> Yes, creating new read-only BlockDriverStates from one image is exactly
> the thought that I had when reading this thread. The problem is that the
> BlockDriverStates wouldn't be fully independent. What if you delete the
> snapshot that is used by another BlockDriverState etc.?
>
> I think the least you would need is to have a separation between one
> BlockImage (which is a whole qcow2 file) and multiple BlockDriverStates
> (which is the backend that devices/NBD servers/whatever use). Not sure
> if such a fundamental block layer change is worth the effort.

If we want internal snapshots to be useful then I think this change is
necessary.

One thing that bothers me about an in-process NBD server is that
client access to data is tied to the qemu process lifetime.  From the
libvirt level this would mean being aware that the in-process NBD
server should be used when the VM is running and using qemu-nbd
instead when the VM is not running.  Transitions between the running
and not running state would not be very smooth unless it did something
crazy like an NBD proxy that multiplexes between the in-process and
the qemu-nbd servers.

Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-01-09 13:14 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-30 10:03 [Qemu-devel] [patch 0/5] block streaming base support Marcelo Tosatti
2011-12-30 10:03 ` [Qemu-devel] [patch 1/5] block: add bdrv_find_backing_image Marcelo Tosatti
2011-12-30 10:03 ` [Qemu-devel] [patch 2/5] block: implement bdrv_find_backing_image in qcow2 Marcelo Tosatti
2012-01-03 13:44   ` Stefan Hajnoczi
2011-12-30 10:03 ` [Qemu-devel] [patch 3/5] add QERR_BASE_ID_NOT_FOUND Marcelo Tosatti
2011-12-30 10:03 ` [Qemu-devel] [patch 4/5] block stream: add support for partial streaming Marcelo Tosatti
2012-01-04 12:39   ` Stefan Hajnoczi
2012-01-04 13:52     ` Marcelo Tosatti
2011-12-30 10:03 ` [Qemu-devel] [patch 5/5] add doc to describe live block operations Marcelo Tosatti
2012-01-04 14:08 ` [Qemu-devel] [patch 0/4] block streaming base support (v2) Marcelo Tosatti
2012-01-04 14:08   ` [Qemu-devel] [patch 1/4] block: add bdrv_find_backing_image Marcelo Tosatti
2012-01-04 14:08   ` [Qemu-devel] [patch 2/4] add QERR_BASE_ID_NOT_FOUND Marcelo Tosatti
2012-01-04 14:08   ` [Qemu-devel] [patch 3/4] block stream: add support for partial streaming Marcelo Tosatti
2012-01-04 16:02     ` Eric Blake
2012-01-04 17:47       ` Marcelo Tosatti
2012-01-04 18:03         ` Eric Blake
2012-01-04 19:22           ` Marcelo Tosatti
2012-01-04 22:40           ` Stefan Hajnoczi
2012-01-05  7:46             ` Paolo Bonzini
2012-01-09 10:58             ` Kevin Wolf
2012-01-09 13:14               ` Stefan Hajnoczi
2012-01-04 14:08   ` [Qemu-devel] [patch 4/4] add doc to describe live block operations Marcelo Tosatti

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.