All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
@ 2013-05-15 14:34 Stefan Hajnoczi
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 1/8] block: add bdrv_add_before_write_cb() Stefan Hajnoczi
                   ` (8 more replies)
  0 siblings, 9 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-15 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dietmar, imain, Stefan Hajnoczi,
	Paolo Bonzini, xiawenc

Note: These patches apply to my block-next tree.  You can also grab the code
from git here:
git://github.com/stefanha/qemu.git block-backup-core

This series adds a new QMP command, drive-backup, which takes a point-in-time
snapshot of a block device.  The snapshot is copied out to a target block
device.  A simple example is:

  drive-backup device=virtio0 format=qcow2 target=backup-20130401.qcow2

The original drive-backup blockjob was written by Dietmar Maurer
<dietmar@proxmox.com>.  He is currently busy but I feel the feature is worth
pushing into QEMU since there has been interest.  This is my version of his
patch, plus the QMP command and qemu-iotests test case.

QMP 'transaction' support is included since v3.  It adds support for atomic
snapshots of multiple block devices.  I also added an 'abort' transaction to
allow testing of the .abort()/.cleanup() code path.  Thanks to Wenchao for
making qmp_transaction() extensible.

How is this different from block-stream and drive-mirror?
---------------------------------------------------------
Both block-stream and drive-mirror do not provide immediate point-in-time
snapshots.  Instead they copy data into a new file and then switch to it.  In
other words, the point at which the "snapshot" is taken cannot be controlled
directly.

drive-backup intercepts guest writes and saves data into the target block
device before it is overwritten.  The target block device can be a raw image
file, backing files are not used to implement this feature.

How can drive-backup be used?
-----------------------------
The simplest use-case is to copy a point-in-time snapshot to a local file.

More advanced users may wish to make the target an NBD URL.  The NBD server
listening on the other side can process the backup writes any way it wishes.  I
previously posted an RFC series with a backup server that streamed Dietmar's
VMA backup archive format.

What's next for drive-backup?
-----------------------------
1. Sync modes like drive-mirror (top, full, none).  This makes it possible to
   preserve the backing file chain.

v3:
 * Rename to drive-backup for consistency with drive-mirror [kwolf]
 * Add QMP transaction support [kwolf]
 * Introduce bdrv_add_before_write_cb() to hook writes
 * Mention 'query-block-jobs' lists job of type 'backup' [eblake]
 * Rename rwlock to flush_rwlock [kwolf]
 * Fix space in block/backup.c comment [kwolf]

v2:
 * s/block_backup/block-backup/ in commit message [eblake]
 * Avoid funny spacing in QMP docs [eblake]
 * Document query-block-jobs and block-job-cancel usage [eblake]

Dietmar Maurer (1):
  block: add basic backup support to block driver

Stefan Hajnoczi (7):
  block: add bdrv_add_before_write_cb()
  block: add drive-backup QMP command
  qemu-iotests: add 055 drive-backup test case
  blockdev: rename BlkTransactionStates to singular
  blockdev: add DriveBackup transaction
  blockdev: add Abort transaction
  qemu-iotests: test 'drive-backup' transaction in 055

 block.c                    |  37 +++++
 block/Makefile.objs        |   1 +
 block/backup.c             | 282 ++++++++++++++++++++++++++++++++++++
 blockdev.c                 | 264 +++++++++++++++++++++++++++-------
 include/block/block_int.h  |  48 +++++++
 qapi-schema.json           |  65 ++++++++-
 qmp-commands.hx            |   6 +
 tests/qemu-iotests/055     | 348 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/055.out |   5 +
 tests/qemu-iotests/group   |   1 +
 10 files changed, 1004 insertions(+), 53 deletions(-)
 create mode 100644 block/backup.c
 create mode 100755 tests/qemu-iotests/055
 create mode 100644 tests/qemu-iotests/055.out

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Qemu-devel] [PATCH v3 1/8] block: add bdrv_add_before_write_cb()
  2013-05-15 14:34 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Stefan Hajnoczi
@ 2013-05-15 14:34 ` Stefan Hajnoczi
  2013-05-15 14:42   ` Paolo Bonzini
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver Stefan Hajnoczi
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-15 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dietmar, imain, Stefan Hajnoczi,
	Paolo Bonzini, xiawenc

The bdrv_add_before_write_cb() function installs a callback that is
invoked before a write request is processed.  This will be used to
implement copy-on-write point-in-time snapshots where we need to copy
out old data before overwriting it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c                   | 37 +++++++++++++++++++++++++++++++++++++
 include/block/block_int.h | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/block.c b/block.c
index 3f87489..0fd7167 100644
--- a/block.c
+++ b/block.c
@@ -308,6 +308,7 @@ BlockDriverState *bdrv_new(const char *device_name)
     }
     bdrv_iostatus_disable(bs);
     notifier_list_init(&bs->close_notifiers);
+    QTAILQ_INIT(&bs->before_write_cbs);
 
     return bs;
 }
@@ -1383,6 +1384,8 @@ void bdrv_close(BlockDriverState *bs)
         bs->growable = 0;
         QDECREF(bs->options);
         bs->options = NULL;
+        assert(QTAILQ_EMPTY(&bs->before_write_cbs));
+        QTAILQ_INIT(&bs->before_write_cbs);
 
         if (bs->file != NULL) {
             bdrv_delete(bs->file);
@@ -2587,6 +2590,22 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     return ret;
 }
 
+struct BDRVBeforeWrite {
+    BDRVBeforeWriteFunc *cb;
+    QTAILQ_ENTRY(BDRVBeforeWrite) list;
+};
+
+static void invoke_before_write_cb(BlockDriverState *bs, int64_t sector_num,
+                                   int nb_sectors, QEMUIOVector *qiov)
+{
+    BDRVBeforeWrite *before_write;
+    BDRVBeforeWrite *tmp;
+    QTAILQ_FOREACH_SAFE(before_write, &bs->before_write_cbs, list, tmp) {
+        before_write->cb(bs, sector_num, nb_sectors, qiov);
+    }
+}
+
+
 /*
  * Handle a write request in coroutine context
  */
@@ -2619,6 +2638,8 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
 
     tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
 
+    invoke_before_write_cb(bs, sector_num, nb_sectors, qiov);
+
     if (flags & BDRV_REQ_ZERO_WRITE) {
         ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors);
     } else {
@@ -4883,3 +4904,19 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs)
     /* Currently BlockDriverState always uses the main loop AioContext */
     return qemu_get_aio_context();
 }
+
+BDRVBeforeWrite *bdrv_add_before_write_cb(BlockDriverState *bs,
+                                          BDRVBeforeWriteFunc *cb)
+{
+    BDRVBeforeWrite *elem = g_slice_new(BDRVBeforeWrite);
+    elem->cb = cb;
+    QTAILQ_INSERT_TAIL(&bs->before_write_cbs, elem, list);
+    return elem;
+}
+
+void bdrv_remove_before_write_cb(BlockDriverState *bs,
+                                 BDRVBeforeWrite *before_write)
+{
+    QTAILQ_REMOVE(&bs->before_write_cbs, before_write, list);
+    g_slice_free(BDRVBeforeWrite, before_write);
+}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6078dd3..e2299df 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -211,6 +211,16 @@ struct BlockDriver {
     QLIST_ENTRY(BlockDriver) list;
 };
 
+/**
+ * BDRVBeforeWriteFunc:
+ *
+ * See #bdrv_add_before_write_cb().
+ */
+typedef void coroutine_fn BDRVBeforeWriteFunc(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+
+typedef struct BDRVBeforeWrite BDRVBeforeWrite;
+
 /*
  * Note: the function bdrv_append() copies and swaps contents of
  * BlockDriverStates, so if you add new fields to this struct, please
@@ -289,6 +299,9 @@ struct BlockDriverState {
     /* long-running background operation */
     BlockJob *job;
 
+    /* Callback before write request is processed */
+    QTAILQ_HEAD(, BDRVBeforeWrite) before_write_cbs;
+
     QDict *options;
 };
 
@@ -298,6 +311,25 @@ void bdrv_set_io_limits(BlockDriverState *bs,
                         BlockIOLimit *io_limits);
 
 /**
+ * bdrv_add_before_write_cb:
+ *
+ * Register a callback that is invoked before write requests are processed but
+ * after any throttling or waiting for overlapping requests.
+ *
+ * Returns: a #BDRVBeforeWrite to use with bdrv_remove_before_write_cb()
+ */
+BDRVBeforeWrite *bdrv_add_before_write_cb(BlockDriverState *bs,
+                                          BDRVBeforeWriteFunc *cb);
+
+/**
+ * bdrv_remove_before_write_cb:
+ *
+ * Unregister a before write callback.
+ */
+void bdrv_remove_before_write_cb(BlockDriverState *bs,
+                                 BDRVBeforeWrite *before_write);
+
+/**
  * bdrv_get_aio_context:
  *
  * Returns: the currently bound #AioContext
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-15 14:34 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Stefan Hajnoczi
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 1/8] block: add bdrv_add_before_write_cb() Stefan Hajnoczi
@ 2013-05-15 14:34 ` Stefan Hajnoczi
  2013-05-16  3:27   ` Wenchao Xia
  2013-05-20 11:30   ` Paolo Bonzini
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 3/8] block: add drive-backup QMP command Stefan Hajnoczi
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-15 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dietmar, imain, Stefan Hajnoczi,
	Paolo Bonzini, xiawenc

From: Dietmar Maurer <dietmar@proxmox.com>

backup_start() creates a block job that copies a point-in-time snapshot
of a block device to a target block device.

We call backup_do_cow() for each write during backup. That function
reads the original data from the block device before it gets
overwritten.  The data is then written to the target device.

Currently backup cluster size is hardcoded to 65536 bytes.

[I made a number of changes to Dietmar's original patch and folded them
in to make code review easy.  Here is the full list:

 * Drop BackupDumpFunc interface in favor of a target block device
 * Detect zero clusters with buffer_is_zero()
 * Don't write zero clusters to the target
 * Use 0 delay instead of 1us, like other block jobs
 * Unify creation/start functions into backup_start()
 * Simplify cleanup, free bitmap in backup_run() instead of cb
 * function
 * Use HBitmap to avoid duplicating bitmap code
 * Use bdrv_getlength() instead of accessing ->total_sectors
 * directly
 * Delete the backup.h header file, it is no longer necessary
 * Move ./backup.c to block/backup.c
 * Remove #ifdefed out code
 * Coding style and whitespace cleanups
 * Use bdrv_add_before_write_cb() instead of blockjob-specific hooks
 * Keep our own in-flight CowRequest list instead of using block.c
   tracked requests

-- stefanha]

Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/Makefile.objs       |   1 +
 block/backup.c            | 282 ++++++++++++++++++++++++++++++++++++++++++++++
 include/block/block_int.h |  16 +++
 3 files changed, 299 insertions(+)
 create mode 100644 block/backup.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 5f0358a..88bd101 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -20,5 +20,6 @@ endif
 common-obj-y += stream.o
 common-obj-y += commit.o
 common-obj-y += mirror.o
+common-obj-y += backup.o
 
 $(obj)/curl.o: QEMU_CFLAGS+=$(CURL_CFLAGS)
diff --git a/block/backup.c b/block/backup.c
new file mode 100644
index 0000000..01d2fd3
--- /dev/null
+++ b/block/backup.c
@@ -0,0 +1,282 @@
+/*
+ * QEMU backup
+ *
+ * Copyright (C) 2013 Proxmox Server Solutions
+ *
+ * Authors:
+ *  Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+
+#include "block/block.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "qemu/ratelimit.h"
+
+#define DEBUG_BACKUP 0
+
+#define DPRINTF(fmt, ...) \
+    do { \
+        if (DEBUG_BACKUP) { \
+            fprintf(stderr, "backup: " fmt, ## __VA_ARGS__); \
+        } \
+    } while (0)
+
+#define BACKUP_CLUSTER_BITS 16
+#define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
+#define BACKUP_BLOCKS_PER_CLUSTER (BACKUP_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+#define SLICE_TIME 100000000ULL /* ns */
+
+typedef struct CowRequest {
+    int64_t start;
+    int64_t end;
+    QLIST_ENTRY(CowRequest) list;
+    CoQueue wait_queue; /* coroutines blocked on this request */
+} CowRequest;
+
+typedef struct BackupBlockJob {
+    BlockJob common;
+    BlockDriverState *target;
+    RateLimit limit;
+    CoRwlock flush_rwlock;
+    uint64_t sectors_read;
+    HBitmap *bitmap;
+    QLIST_HEAD(, CowRequest) inflight_reqs;
+} BackupBlockJob;
+
+/* See if in-flight requests overlap and wait for them to complete */
+static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
+                                                       int64_t start,
+                                                       int64_t end)
+{
+    CowRequest *req;
+    bool retry;
+
+    do {
+        retry = false;
+        QLIST_FOREACH(req, &job->inflight_reqs, list) {
+            if (end > req->start && start < req->end) {
+                qemu_co_queue_wait(&req->wait_queue);
+                retry = true;
+                break;
+            }
+        }
+    } while (retry);
+}
+
+/* Keep track of an in-flight request */
+static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
+                                     int64_t start, int64_t end)
+{
+    req->start = start;
+    req->end = end;
+    qemu_co_queue_init(&req->wait_queue);
+    QLIST_INSERT_HEAD(&job->inflight_reqs, req, list);
+}
+
+/* Forget about a completed request */
+static void cow_request_end(CowRequest *req)
+{
+    QLIST_REMOVE(req, list);
+    qemu_co_queue_restart_all(&req->wait_queue);
+}
+
+static int coroutine_fn backup_do_cow(BlockDriverState *bs,
+                                      int64_t sector_num, int nb_sectors)
+{
+    BackupBlockJob *job = (BackupBlockJob *)bs->job;
+    CowRequest cow_request;
+    struct iovec iov;
+    QEMUIOVector bounce_qiov;
+    void *bounce_buffer = NULL;
+    int ret = 0;
+    int64_t start, end;
+
+    qemu_co_rwlock_rdlock(&job->flush_rwlock);
+
+    start = sector_num / BACKUP_BLOCKS_PER_CLUSTER;
+    end = DIV_ROUND_UP(sector_num + nb_sectors, BACKUP_BLOCKS_PER_CLUSTER);
+
+    DPRINTF("brdv_co_backup_cow enter %s C%" PRId64 " %" PRId64 " %d\n",
+            bdrv_get_device_name(bs), start, sector_num, nb_sectors);
+
+    wait_for_overlapping_requests(job, start, end);
+    cow_request_begin(&cow_request, job, start, end);
+
+    for (; start < end; start++) {
+        if (hbitmap_get(job->bitmap, start)) {
+            DPRINTF("brdv_co_backup_cow skip C%" PRId64 "\n", start);
+            continue; /* already copied */
+        }
+
+        /* immediately set bitmap (avoid coroutine race) */
+        hbitmap_set(job->bitmap, start, 1);
+
+        DPRINTF("brdv_co_backup_cow C%" PRId64 "\n", start);
+
+        if (!bounce_buffer) {
+            iov.iov_len = BACKUP_CLUSTER_SIZE;
+            iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
+            qemu_iovec_init_external(&bounce_qiov, &iov, 1);
+        }
+
+        ret = bdrv_co_readv(bs, start * BACKUP_BLOCKS_PER_CLUSTER,
+                            BACKUP_BLOCKS_PER_CLUSTER,
+                            &bounce_qiov);
+        if (ret < 0) {
+            DPRINTF("brdv_co_backup_cow bdrv_read C%" PRId64 " failed\n",
+                    start);
+            goto out;
+        }
+
+        job->sectors_read += BACKUP_BLOCKS_PER_CLUSTER;
+
+        if (!buffer_is_zero(bounce_buffer, BACKUP_CLUSTER_SIZE)) {
+            ret = bdrv_co_writev(job->target, start * BACKUP_BLOCKS_PER_CLUSTER,
+                                 BACKUP_BLOCKS_PER_CLUSTER,
+                                 &bounce_qiov);
+            if (ret < 0) {
+                DPRINTF("brdv_co_backup_cow dump_cluster_cb C%" PRId64
+                        " failed\n", start);
+                goto out;
+            }
+        }
+
+        DPRINTF("brdv_co_backup_cow done C%" PRId64 "\n", start);
+    }
+
+out:
+    if (bounce_buffer) {
+        qemu_vfree(bounce_buffer);
+    }
+
+    cow_request_end(&cow_request);
+
+    qemu_co_rwlock_unlock(&job->flush_rwlock);
+
+    return ret;
+}
+
+static void coroutine_fn backup_before_write(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             int nb_sectors,
+                                             QEMUIOVector *qiov)
+{
+    backup_do_cow(bs, sector_num, nb_sectors);
+}
+
+static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+
+    if (speed < 0) {
+        error_set(errp, QERR_INVALID_PARAMETER, "speed");
+        return;
+    }
+    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+}
+
+static BlockJobType backup_job_type = {
+    .instance_size = sizeof(BackupBlockJob),
+    .job_type = "backup",
+    .set_speed = backup_set_speed,
+};
+
+static void coroutine_fn backup_run(void *opaque)
+{
+    BackupBlockJob *job = opaque;
+    BlockDriverState *bs = job->common.bs;
+    BDRVBeforeWrite *before_write;
+    int64_t start, end;
+    int ret = 0;
+
+    QLIST_INIT(&job->inflight_reqs);
+    qemu_co_rwlock_init(&job->flush_rwlock);
+
+    start = 0;
+    end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
+                       BACKUP_BLOCKS_PER_CLUSTER);
+
+    job->bitmap = hbitmap_alloc(end, 0);
+
+    before_write = bdrv_add_before_write_cb(bs, backup_before_write);
+
+    DPRINTF("backup_run start %s %" PRId64 " %" PRId64 "\n",
+            bdrv_get_device_name(bs), start, end);
+
+    for (; start < end; start++) {
+        if (block_job_is_cancelled(&job->common)) {
+            break;
+        }
+
+        /* we need to yield so that qemu_aio_flush() returns.
+         * (without, VM does not reboot)
+         */
+        if (job->common.speed) {
+            uint64_t delay_ns = ratelimit_calculate_delay(
+                &job->limit, job->sectors_read);
+            job->sectors_read = 0;
+            block_job_sleep_ns(&job->common, rt_clock, delay_ns);
+        } else {
+            block_job_sleep_ns(&job->common, rt_clock, 0);
+        }
+
+        if (block_job_is_cancelled(&job->common)) {
+            break;
+        }
+
+        DPRINTF("backup_run loop C%" PRId64 "\n", start);
+
+        ret = backup_do_cow(bs, start * BACKUP_BLOCKS_PER_CLUSTER, 1);
+        if (ret < 0) {
+            break;
+        }
+
+        /* Publish progress */
+        job->common.offset += BACKUP_CLUSTER_SIZE;
+    }
+
+    bdrv_remove_before_write_cb(bs, before_write);
+
+    /* wait until pending backup_do_cow() calls have completed */
+    qemu_co_rwlock_wrlock(&job->flush_rwlock);
+    qemu_co_rwlock_unlock(&job->flush_rwlock);
+
+    hbitmap_free(job->bitmap);
+
+    bdrv_delete(job->target);
+
+    DPRINTF("backup_run complete %d\n", ret);
+    block_job_completed(&job->common, ret);
+}
+
+void backup_start(BlockDriverState *bs, BlockDriverState *target,
+                  int64_t speed,
+                  BlockDriverCompletionFunc *cb, void *opaque,
+                  Error **errp)
+{
+    assert(bs);
+    assert(target);
+    assert(cb);
+
+    DPRINTF("backup_start %s\n", bdrv_get_device_name(bs));
+
+    BackupBlockJob *job = block_job_create(&backup_job_type, bs, speed,
+                                           cb, opaque, errp);
+    if (!job) {
+        return;
+    }
+
+    job->target = target;
+    job->common.len = bdrv_getlength(bs);
+    job->common.co = qemu_coroutine_create(backup_run);
+    qemu_coroutine_enter(job->common.co, job);
+}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e2299df..e10ac50 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -409,4 +409,20 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
+/*
+ * backup_start:
+ * @bs: Block device to operate on.
+ * @target: Block device to write to.
+ * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ *
+ * Start a backup operation on @bs.  Clusters in @bs are written to @target
+ * until the job is cancelled or manually completed.
+ */
+void backup_start(BlockDriverState *bs, BlockDriverState *target,
+                  int64_t speed,
+                  BlockDriverCompletionFunc *cb, void *opaque,
+                  Error **errp);
+
 #endif /* BLOCK_INT_H */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Qemu-devel] [PATCH v3 3/8] block: add drive-backup QMP command
  2013-05-15 14:34 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Stefan Hajnoczi
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 1/8] block: add bdrv_add_before_write_cb() Stefan Hajnoczi
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver Stefan Hajnoczi
@ 2013-05-15 14:34 ` Stefan Hajnoczi
  2013-05-15 19:04   ` Eric Blake
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 4/8] qemu-iotests: add 055 drive-backup test case Stefan Hajnoczi
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-15 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dietmar, imain, Stefan Hajnoczi,
	Paolo Bonzini, xiawenc

@drive-backup

Start a point-in-time copy of a block device to a new destination.  The
status of ongoing drive-backup operations can be checked with
query-block-jobs where the BlockJobInfo.type field has the value 'backup'.
The operation can be stopped before it has completed using the
block-job-cancel command.

@device: the name of the device whose writes should be mirrored.

@target: the target of the new image. If the file exists, or if it
         is a device, the existing file/device will be used as the new
         destination.  If it does not exist, a new file will be created.

@format: #optional the format of the new destination, default is to
         probe if @mode is 'existing', else the format of the source

@mode: #optional whether and how QEMU should create a new image, default is
       'absolute-paths'.

@speed: #optional the maximum speed, in bytes per second

Returns: nothing on success
         If @device is not a valid block device, DeviceNotFound

Since 1.6

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockdev.c       | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 32 ++++++++++++++++++++
 qmp-commands.hx  |  6 ++++
 3 files changed, 130 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index d1ec99a..c7a5e1e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1353,6 +1353,98 @@ void qmp_block_commit(const char *device,
     drive_get_ref(drive_get_by_blockdev(bs));
 }
 
+void qmp_drive_backup(const char *device, const char *target,
+                      bool has_format, const char *format,
+                      bool has_mode, enum NewImageMode mode,
+                      bool has_speed, int64_t speed,
+                      Error **errp)
+{
+    BlockDriverState *bs;
+    BlockDriverState *target_bs;
+    BlockDriver *proto_drv;
+    BlockDriver *drv = NULL;
+    Error *local_err = NULL;
+    int flags;
+    uint64_t size;
+    int ret;
+
+    if (!has_speed) {
+        speed = 0;
+    }
+    if (!has_mode) {
+        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+    }
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+
+    if (!bdrv_is_inserted(bs)) {
+        error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
+        return;
+    }
+
+    if (!has_format) {
+        format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name;
+    }
+    if (format) {
+        drv = bdrv_find_format(format);
+        if (!drv) {
+            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+            return;
+        }
+    }
+
+    if (bdrv_in_use(bs)) {
+        error_set(errp, QERR_DEVICE_IN_USE, device);
+        return;
+    }
+
+    flags = bs->open_flags | BDRV_O_RDWR;
+
+    proto_drv = bdrv_find_protocol(target);
+    if (!proto_drv) {
+        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+        return;
+    }
+
+    bdrv_get_geometry(bs, &size);
+    size *= 512;
+    if (mode != NEW_IMAGE_MODE_EXISTING) {
+        assert(format && drv);
+        bdrv_img_create(target, format,
+                        NULL, NULL, NULL, size, flags, &local_err, false);
+    }
+
+    if (error_is_set(&local_err)) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    target_bs = bdrv_new("");
+    ret = bdrv_open(target_bs, target, NULL, flags, drv);
+
+    if (ret < 0) {
+        bdrv_delete(target_bs);
+        error_set(errp, QERR_OPEN_FILE_FAILED, target);
+        return;
+    }
+
+    backup_start(bs, target_bs, speed, block_job_cb, bs, &local_err);
+    if (local_err != NULL) {
+        bdrv_delete(target_bs);
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* Grab a reference so hotplug does not delete the BlockDriverState from
+     * underneath us.
+     */
+    drive_get_ref(drive_get_by_blockdev(bs));
+}
+
 #define DEFAULT_MIRROR_BUF_SIZE   (10 << 20)
 
 void qmp_drive_mirror(const char *device, const char *target,
diff --git a/qapi-schema.json b/qapi-schema.json
index 98cd81c..e716522 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1730,6 +1730,38 @@
             '*speed': 'int' } }
 
 ##
+# @drive-backup
+#
+# Start a point-in-time copy of a block device to a new destination.  The
+# status of ongoing drive-backup operations can be checked with
+# query-block-jobs where the BlockJobInfo.type field has the value 'backup'.
+# The operation can be stopped before it has completed using the
+# block-job-cancel command.
+#
+# @device: the name of the device whose writes should be mirrored.
+#
+# @target: the target of the new image. If the file exists, or if it
+#          is a device, the existing file/device will be used as the new
+#          destination.  If it does not exist, a new file will be created.
+#
+# @format: #optional the format of the new destination, default is to
+#          probe if @mode is 'existing', else the format of the source
+#
+# @mode: #optional whether and how QEMU should create a new image, default is
+#        'absolute-paths'.
+#
+# @speed: #optional the maximum speed, in bytes per second
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+#
+# Since 1.6
+##
+{ 'command': 'drive-backup',
+  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
+            '*mode': 'NewImageMode', '*speed': 'int' } }
+
+##
 # @drive-mirror
 #
 # Start mirroring a block device's writes to a new destination.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index ffd130e..ed7b9d4 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -912,6 +912,12 @@ EQMP
     },
 
     {
+        .name       = "drive-backup",
+        .args_type  = "device:B,target:s,speed:i?,mode:s?,format:s?",
+        .mhandler.cmd_new = qmp_marshal_input_drive_backup,
+    },
+
+    {
         .name       = "block-job-set-speed",
         .args_type  = "device:B,speed:o",
         .mhandler.cmd_new = qmp_marshal_input_block_job_set_speed,
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Qemu-devel] [PATCH v3 4/8] qemu-iotests: add 055 drive-backup test case
  2013-05-15 14:34 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 3/8] block: add drive-backup QMP command Stefan Hajnoczi
@ 2013-05-15 14:34 ` Stefan Hajnoczi
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 5/8] blockdev: rename BlkTransactionStates to singular Stefan Hajnoczi
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-15 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dietmar, imain, Stefan Hajnoczi,
	Paolo Bonzini, xiawenc

Testing drive-backup is similar to image streaming and drive mirroring.
This test case is based on 041.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/055     | 230 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/055.out |   5 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 236 insertions(+)
 create mode 100755 tests/qemu-iotests/055
 create mode 100644 tests/qemu-iotests/055.out

diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
new file mode 100755
index 0000000..bc2eebf
--- /dev/null
+++ b/tests/qemu-iotests/055
@@ -0,0 +1,230 @@
+#!/usr/bin/env python
+#
+# Tests for drive-backup
+#
+# Copyright (C) 2013 Red Hat, Inc.
+#
+# Based on 041.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import time
+import os
+import iotests
+from iotests import qemu_img, qemu_io
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+target_img = os.path.join(iotests.test_dir, 'target.img')
+
+class DriveBackupTestCase(iotests.QMPTestCase):
+    '''Abstract base class for drive-backup test cases'''
+
+    def assert_no_active_backups(self):
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return', [])
+
+    def cancel_and_wait(self, drive='drive0'):
+        '''Cancel a block job and wait for it to finish'''
+        result = self.vm.qmp('block-job-cancel', device=drive)
+        self.assert_qmp(result, 'return', {})
+
+        cancelled = False
+        while not cancelled:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED' or \
+                   event['event'] == 'BLOCK_JOB_CANCELLED':
+                    self.assert_qmp(event, 'data/type', 'backup')
+                    self.assert_qmp(event, 'data/device', drive)
+                    cancelled = True
+
+        self.assert_no_active_backups()
+
+    def complete_and_wait(self):
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'backup')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+        self.assert_no_active_backups()
+
+    def compare_images(self, img1, img2):
+        try:
+            qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img1, img1 + '.raw')
+            qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img2, img2 + '.raw')
+            file1 = open(img1 + '.raw', 'r')
+            file2 = open(img2 + '.raw', 'r')
+            return file1.read() == file2.read()
+        finally:
+            if file1 is not None:
+                file1.close()
+            if file2 is not None:
+                file2.close()
+            try:
+                os.remove(img1 + '.raw')
+            except OSError:
+                pass
+            try:
+                os.remove(img2 + '.raw')
+            except OSError:
+                pass
+
+class TestSingleDrive(DriveBackupTestCase):
+    image_len = 120 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(TestSingleDrive.image_len))
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        try:
+            os.remove(target_img)
+        except OSError:
+            pass
+
+    def test_complete(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('drive-backup', device='drive0',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after backup')
+
+    def test_cancel(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('drive-backup', device='drive0',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.cancel_and_wait()
+
+    def test_pause(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('drive-backup', device='drive0',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-pause', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        offset = self.dictpath(result, 'return[0]/offset')
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/offset', offset)
+
+        result = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after backup')
+
+    def test_medium_not_found(self):
+        result = self.vm.qmp('drive-backup', device='ide1-cd0',
+                             target=target_img)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+    def test_image_not_found(self):
+        result = self.vm.qmp('drive-backup', device='drive0',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+    def test_device_not_found(self):
+        result = self.vm.qmp('drive-backup', device='nonexistent',
+                             target=target_img)
+        self.assert_qmp(result, 'error/class', 'DeviceNotFound')
+
+class TestSetSpeed(DriveBackupTestCase):
+    image_len = 80 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(TestSetSpeed.image_len))
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(target_img)
+
+    def test_set_speed(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('drive-backup', device='drive0',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        # Default speed is 0
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 0)
+
+        result = self.vm.qmp('block-job-set-speed', device='drive0', speed=8 * 1024 * 1024)
+        self.assert_qmp(result, 'return', {})
+
+        # Ensure the speed we set was accepted
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 8 * 1024 * 1024)
+
+        self.cancel_and_wait()
+
+        # Check setting speed in drive-backup works
+        result = self.vm.qmp('drive-backup', device='drive0',
+                             target=target_img, speed=4*1024*1024)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 4 * 1024 * 1024)
+
+        self.cancel_and_wait()
+
+    def test_set_speed_invalid(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('drive-backup', device='drive0',
+                             target=target_img, speed=-1)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('drive-backup', device='drive0',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-set-speed', device='drive0', speed=-1)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        self.cancel_and_wait()
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['raw', 'qcow2'])
diff --git a/tests/qemu-iotests/055.out b/tests/qemu-iotests/055.out
new file mode 100644
index 0000000..594c16f
--- /dev/null
+++ b/tests/qemu-iotests/055.out
@@ -0,0 +1,5 @@
+........
+----------------------------------------------------------------------
+Ran 8 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 387b050..e5762f9 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -61,3 +61,4 @@
 052 rw auto backing
 053 rw auto
 054 rw auto
+055 rw auto
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Qemu-devel] [PATCH v3 5/8] blockdev: rename BlkTransactionStates to singular
  2013-05-15 14:34 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 4/8] qemu-iotests: add 055 drive-backup test case Stefan Hajnoczi
@ 2013-05-15 14:34 ` Stefan Hajnoczi
  2013-05-15 19:09   ` Eric Blake
  2013-05-16  2:28   ` Wenchao Xia
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 6/8] blockdev: add DriveBackup transaction Stefan Hajnoczi
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-15 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dietmar, imain, Stefan Hajnoczi,
	Paolo Bonzini, xiawenc

The QMP 'transaction' command keeps a list of in-flight transactions.
The transaction state structure is called BlkTransactionStates even
though it only deals with a single transaction.  The only plural thing
is the linked list of transaction states.

I find it confusing to call the single structure "States".  This patch
renames it to "State", just like BlockDriverState is singular.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockdev.c | 104 ++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 52 insertions(+), 52 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index c7a5e1e..b6109da 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -780,7 +780,7 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
 
 /* New and old BlockDriverState structs for group snapshots */
 
-typedef struct BlkTransactionStates BlkTransactionStates;
+typedef struct BlkTransactionState BlkTransactionState;
 
 /* Only prepare() may fail. In a single transaction, only one of commit() or
    abort() will be called, clean() will always be called if it present. */
@@ -788,13 +788,13 @@ typedef struct BdrvActionOps {
     /* Size of state struct, in bytes. */
     size_t instance_size;
     /* Prepare the work, must NOT be NULL. */
-    void (*prepare)(BlkTransactionStates *common, Error **errp);
+    void (*prepare)(BlkTransactionState *common, Error **errp);
     /* Commit the changes, must NOT be NULL. */
-    void (*commit)(BlkTransactionStates *common);
+    void (*commit)(BlkTransactionState *common);
     /* Abort the changes on fail, can be NULL. */
-    void (*abort)(BlkTransactionStates *common);
+    void (*abort)(BlkTransactionState *common);
     /* Clean up resource in the end, can be NULL. */
-    void (*clean)(BlkTransactionStates *common);
+    void (*clean)(BlkTransactionState *common);
 } BdrvActionOps;
 
 /*
@@ -802,20 +802,20 @@ typedef struct BdrvActionOps {
  * that compiler will also arrange it to the same address with parent instance.
  * Later it will be used in free().
  */
-struct BlkTransactionStates {
+struct BlkTransactionState {
     TransactionAction *action;
     const BdrvActionOps *ops;
-    QSIMPLEQ_ENTRY(BlkTransactionStates) entry;
+    QSIMPLEQ_ENTRY(BlkTransactionState) entry;
 };
 
 /* external snapshot private data */
-typedef struct ExternalSnapshotStates {
-    BlkTransactionStates common;
+typedef struct ExternalSnapshotState {
+    BlkTransactionState common;
     BlockDriverState *old_bs;
     BlockDriverState *new_bs;
-} ExternalSnapshotStates;
+} ExternalSnapshotState;
 
-static void external_snapshot_prepare(BlkTransactionStates *common,
+static void external_snapshot_prepare(BlkTransactionState *common,
                                       Error **errp)
 {
     BlockDriver *proto_drv;
@@ -826,8 +826,8 @@ static void external_snapshot_prepare(BlkTransactionStates *common,
     const char *new_image_file;
     const char *format = "qcow2";
     enum NewImageMode mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
-    ExternalSnapshotStates *states =
-                             DO_UPCAST(ExternalSnapshotStates, common, common);
+    ExternalSnapshotState *state =
+                             DO_UPCAST(ExternalSnapshotState, common, common);
     TransactionAction *action = common->action;
 
     /* get parameters */
@@ -849,30 +849,30 @@ static void external_snapshot_prepare(BlkTransactionStates *common,
         return;
     }
 
-    states->old_bs = bdrv_find(device);
-    if (!states->old_bs) {
+    state->old_bs = bdrv_find(device);
+    if (!state->old_bs) {
         error_set(errp, QERR_DEVICE_NOT_FOUND, device);
         return;
     }
 
-    if (!bdrv_is_inserted(states->old_bs)) {
+    if (!bdrv_is_inserted(state->old_bs)) {
         error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
         return;
     }
 
-    if (bdrv_in_use(states->old_bs)) {
+    if (bdrv_in_use(state->old_bs)) {
         error_set(errp, QERR_DEVICE_IN_USE, device);
         return;
     }
 
-    if (!bdrv_is_read_only(states->old_bs)) {
-        if (bdrv_flush(states->old_bs)) {
+    if (!bdrv_is_read_only(state->old_bs)) {
+        if (bdrv_flush(state->old_bs)) {
             error_set(errp, QERR_IO_ERROR);
             return;
         }
     }
 
-    flags = states->old_bs->open_flags;
+    flags = state->old_bs->open_flags;
 
     proto_drv = bdrv_find_protocol(new_image_file);
     if (!proto_drv) {
@@ -883,8 +883,8 @@ static void external_snapshot_prepare(BlkTransactionStates *common,
     /* create new image w/backing file */
     if (mode != NEW_IMAGE_MODE_EXISTING) {
         bdrv_img_create(new_image_file, format,
-                        states->old_bs->filename,
-                        states->old_bs->drv->format_name,
+                        state->old_bs->filename,
+                        state->old_bs->drv->format_name,
                         NULL, -1, flags, &local_err, false);
         if (error_is_set(&local_err)) {
             error_propagate(errp, local_err);
@@ -893,42 +893,42 @@ static void external_snapshot_prepare(BlkTransactionStates *common,
     }
 
     /* We will manually add the backing_hd field to the bs later */
-    states->new_bs = bdrv_new("");
+    state->new_bs = bdrv_new("");
     /* TODO Inherit bs->options or only take explicit options with an
      * extended QMP command? */
-    ret = bdrv_open(states->new_bs, new_image_file, NULL,
+    ret = bdrv_open(state->new_bs, new_image_file, NULL,
                     flags | BDRV_O_NO_BACKING, drv);
     if (ret != 0) {
         error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
     }
 }
 
-static void external_snapshot_commit(BlkTransactionStates *common)
+static void external_snapshot_commit(BlkTransactionState *common)
 {
-    ExternalSnapshotStates *states =
-                             DO_UPCAST(ExternalSnapshotStates, common, common);
+    ExternalSnapshotState *state =
+                             DO_UPCAST(ExternalSnapshotState, common, common);
 
-    /* This removes our old bs from the bdrv_states, and adds the new bs */
-    bdrv_append(states->new_bs, states->old_bs);
+    /* This removes our old bs and adds the new bs */
+    bdrv_append(state->new_bs, state->old_bs);
     /* We don't need (or want) to use the transactional
      * bdrv_reopen_multiple() across all the entries at once, because we
      * don't want to abort all of them if one of them fails the reopen */
-    bdrv_reopen(states->new_bs, states->new_bs->open_flags & ~BDRV_O_RDWR,
+    bdrv_reopen(state->new_bs, state->new_bs->open_flags & ~BDRV_O_RDWR,
                 NULL);
 }
 
-static void external_snapshot_abort(BlkTransactionStates *common)
+static void external_snapshot_abort(BlkTransactionState *common)
 {
-    ExternalSnapshotStates *states =
-                             DO_UPCAST(ExternalSnapshotStates, common, common);
-    if (states->new_bs) {
-        bdrv_delete(states->new_bs);
+    ExternalSnapshotState *state =
+                             DO_UPCAST(ExternalSnapshotState, common, common);
+    if (state->new_bs) {
+        bdrv_delete(state->new_bs);
     }
 }
 
 static const BdrvActionOps actions[] = {
     [TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = {
-        .instance_size = sizeof(ExternalSnapshotStates),
+        .instance_size = sizeof(ExternalSnapshotState),
         .prepare  = external_snapshot_prepare,
         .commit   = external_snapshot_commit,
         .abort = external_snapshot_abort,
@@ -943,10 +943,10 @@ static const BdrvActionOps actions[] = {
 void qmp_transaction(TransactionActionList *dev_list, Error **errp)
 {
     TransactionActionList *dev_entry = dev_list;
-    BlkTransactionStates *states, *next;
+    BlkTransactionState *state, *next;
     Error *local_err = NULL;
 
-    QSIMPLEQ_HEAD(snap_bdrv_states, BlkTransactionStates) snap_bdrv_states;
+    QSIMPLEQ_HEAD(snap_bdrv_states, BlkTransactionState) snap_bdrv_states;
     QSIMPLEQ_INIT(&snap_bdrv_states);
 
     /* drain all i/o before any snapshots */
@@ -963,20 +963,20 @@ void qmp_transaction(TransactionActionList *dev_list, Error **errp)
         assert(dev_info->kind < ARRAY_SIZE(actions));
 
         ops = &actions[dev_info->kind];
-        states = g_malloc0(ops->instance_size);
-        states->ops = ops;
-        states->action = dev_info;
-        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, states, entry);
+        state = g_malloc0(ops->instance_size);
+        state->ops = ops;
+        state->action = dev_info;
+        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, state, entry);
 
-        states->ops->prepare(states, &local_err);
+        state->ops->prepare(state, &local_err);
         if (error_is_set(&local_err)) {
             error_propagate(errp, local_err);
             goto delete_and_fail;
         }
     }
 
-    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
-        states->ops->commit(states);
+    QSIMPLEQ_FOREACH(state, &snap_bdrv_states, entry) {
+        state->ops->commit(state);
     }
 
     /* success */
@@ -987,17 +987,17 @@ delete_and_fail:
     * failure, and it is all-or-none; abandon each new bs, and keep using
     * the original bs for all images
     */
-    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
-        if (states->ops->abort) {
-            states->ops->abort(states);
+    QSIMPLEQ_FOREACH(state, &snap_bdrv_states, entry) {
+        if (state->ops->abort) {
+            state->ops->abort(state);
         }
     }
 exit:
-    QSIMPLEQ_FOREACH_SAFE(states, &snap_bdrv_states, entry, next) {
-        if (states->ops->clean) {
-            states->ops->clean(states);
+    QSIMPLEQ_FOREACH_SAFE(state, &snap_bdrv_states, entry, next) {
+        if (state->ops->clean) {
+            state->ops->clean(state);
         }
-        g_free(states);
+        g_free(state);
     }
 }
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Qemu-devel] [PATCH v3 6/8] blockdev: add DriveBackup transaction
  2013-05-15 14:34 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 5/8] blockdev: rename BlkTransactionStates to singular Stefan Hajnoczi
@ 2013-05-15 14:34 ` Stefan Hajnoczi
  2013-05-15 19:13   ` Eric Blake
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 7/8] blockdev: add Abort transaction Stefan Hajnoczi
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-15 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dietmar, imain, Stefan Hajnoczi,
	Paolo Bonzini, xiawenc

This patch adds a transactional version of the drive-backup QMP command.
It allows atomic snapshots of multiple drives along with automatic
cleanup if there is a failure to start one of the backup jobs.

Note that QMP events are emitted for block job completion/cancellation
and the block job will be listed by query-block-jobs.

@DriveBackup

@device: the name of the device whose writes should be mirrored.

@target: the target of the new image. If the file exists, or if it
         is a device, the existing file/device will be used as the new
         destination.  If it does not exist, a new file will be created.

@format: #optional the format of the new destination, default is to
         probe if @mode is 'existing', else the format of the source

@mode: #optional whether and how QEMU should create a new image, default is
       'absolute-paths'.

@speed: #optional the maximum speed, in bytes per second

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockdev.c       | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 24 +++++++++++++++++++++++-
 2 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index b6109da..c386bb6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -926,6 +926,53 @@ static void external_snapshot_abort(BlkTransactionState *common)
     }
 }
 
+typedef struct DriveBackupState {
+    BlkTransactionState common;
+    BlockDriverState *bs;
+    BlockJob *job;
+} DriveBackupState;
+
+static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
+{
+    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
+    DriveBackup *backup;
+    Error *local_err = NULL;
+
+    assert(common->action->kind == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
+    backup = common->action->drive_backup;
+
+    qmp_drive_backup(backup->device, backup->target,
+                     backup->has_format, backup->format,
+                     backup->has_mode, backup->mode,
+                     backup->has_speed, backup->speed,
+                     &local_err);
+    if (error_is_set(&local_err)) {
+        error_propagate(errp, local_err);
+        state->bs = NULL;
+        state->job = NULL;
+        return;
+    }
+
+    state->bs = bdrv_find(backup->device);
+    state->job = state->bs->job;
+}
+
+static void drive_backup_commit(BlkTransactionState *common)
+{
+    /* Block job has started, nothing to do here */
+}
+
+static void drive_backup_abort(BlkTransactionState *common)
+{
+    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
+    BlockDriverState *bs = state->bs;
+
+    /* Only cancel if it's the job we started */
+    if (bs && bs->job && bs->job == state->job) {
+        block_job_cancel_sync(bs->job);
+    }
+}
+
 static const BdrvActionOps actions[] = {
     [TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = {
         .instance_size = sizeof(ExternalSnapshotState),
@@ -933,6 +980,12 @@ static const BdrvActionOps actions[] = {
         .commit   = external_snapshot_commit,
         .abort = external_snapshot_abort,
     },
+    [TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
+        .instance_size = sizeof(DriveBackupState),
+        .prepare = drive_backup_prepare,
+        .commit = drive_backup_commit,
+        .abort = drive_backup_abort,
+    },
 };
 
 /*
diff --git a/qapi-schema.json b/qapi-schema.json
index e716522..9ca9352 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1609,6 +1609,27 @@
             '*mode': 'NewImageMode' } }
 
 ##
+# @DriveBackup
+#
+# @device: the name of the device whose writes should be mirrored.
+#
+# @target: the target of the new image. If the file exists, or if it
+#          is a device, the existing file/device will be used as the new
+#          destination.  If it does not exist, a new file will be created.
+#
+# @format: #optional the format of the new destination, default is to
+#          probe if @mode is 'existing', else the format of the source
+#
+# @mode: #optional whether and how QEMU should create a new image, default is
+#        'absolute-paths'.
+#
+# @speed: #optional the maximum speed, in bytes per second
+##
+{ 'type': 'DriveBackup',
+  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
+            '*mode': 'NewImageMode', '*speed': 'int' } }
+
+##
 # @TransactionAction
 #
 # A discriminated record of operations that can be performed with
@@ -1616,7 +1637,8 @@
 ##
 { 'union': 'TransactionAction',
   'data': {
-       'blockdev-snapshot-sync': 'BlockdevSnapshot'
+       'blockdev-snapshot-sync': 'BlockdevSnapshot',
+       'drive-backup': 'DriveBackup'
    } }
 
 ##
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Qemu-devel] [PATCH v3 7/8] blockdev: add Abort transaction
  2013-05-15 14:34 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 6/8] blockdev: add DriveBackup transaction Stefan Hajnoczi
@ 2013-05-15 14:34 ` Stefan Hajnoczi
  2013-05-15 19:01   ` Eric Blake
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 8/8] qemu-iotests: test 'drive-backup' transaction in 055 Stefan Hajnoczi
  2013-05-16  6:16 ` [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Wenchao Xia
  8 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-15 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dietmar, imain, Stefan Hajnoczi,
	Paolo Bonzini, xiawenc

The Abort action can be used to test QMP 'transaction' failure.  Add it
as the last action to exercise the .abort() and .cleanup() code paths
for all previous actions.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockdev.c       | 15 +++++++++++++++
 qapi-schema.json | 11 ++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index c386bb6..fcce219 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -973,6 +973,16 @@ static void drive_backup_abort(BlkTransactionState *common)
     }
 }
 
+static void abort_prepare(BlkTransactionState *common, Error **errp)
+{
+    error_setg(errp, "Transaction aborted using Abort action");
+}
+
+static void abort_commit(BlkTransactionState *common)
+{
+    assert(false); /* this action never succeeds */
+}
+
 static const BdrvActionOps actions[] = {
     [TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = {
         .instance_size = sizeof(ExternalSnapshotState),
@@ -986,6 +996,11 @@ static const BdrvActionOps actions[] = {
         .commit = drive_backup_commit,
         .abort = drive_backup_abort,
     },
+    [TRANSACTION_ACTION_KIND_ABORT] = {
+        .instance_size = sizeof(BlkTransactionState),
+        .prepare = abort_prepare,
+        .commit = abort_commit,
+    },
 };
 
 /*
diff --git a/qapi-schema.json b/qapi-schema.json
index 9ca9352..15221a5 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1630,6 +1630,14 @@
             '*mode': 'NewImageMode', '*speed': 'int' } }
 
 ##
+# @Abort
+#
+# This action can be used to test transaction failure.
+###
+{ 'type': 'Abort',
+  'data': { } }
+
+##
 # @TransactionAction
 #
 # A discriminated record of operations that can be performed with
@@ -1638,7 +1646,8 @@
 { 'union': 'TransactionAction',
   'data': {
        'blockdev-snapshot-sync': 'BlockdevSnapshot',
-       'drive-backup': 'DriveBackup'
+       'drive-backup': 'DriveBackup',
+       'abort': 'Abort'
    } }
 
 ##
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Qemu-devel] [PATCH v3 8/8] qemu-iotests: test 'drive-backup' transaction in 055
  2013-05-15 14:34 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 7/8] blockdev: add Abort transaction Stefan Hajnoczi
@ 2013-05-15 14:34 ` Stefan Hajnoczi
  2013-05-16  6:16 ` [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Wenchao Xia
  8 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-15 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, dietmar, imain, Stefan Hajnoczi,
	Paolo Bonzini, xiawenc

Ensure that the 'drive-backup' transaction works and check that a
transaction abort works properly.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/055     | 118 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/055.out |   4 +-
 2 files changed, 120 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
index bc2eebf..d61993f 100755
--- a/tests/qemu-iotests/055
+++ b/tests/qemu-iotests/055
@@ -226,5 +226,123 @@ class TestSetSpeed(DriveBackupTestCase):
 
         self.cancel_and_wait()
 
+class TestSingleTransaction(DriveBackupTestCase):
+    image_len = 120 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(TestSingleDrive.image_len))
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        try:
+            os.remove(target_img)
+        except OSError:
+            pass
+
+    def test_complete(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('transaction', actions=[{
+                'type': 'drive-backup',
+                'data': { 'device': 'drive0',
+                          'target': target_img },
+            }
+        ])
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after backup')
+
+    def test_cancel(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('transaction', actions=[{
+                'type': 'drive-backup',
+                'data': { 'device': 'drive0',
+                          'target': target_img },
+            }
+        ])
+        self.assert_qmp(result, 'return', {})
+
+        self.cancel_and_wait()
+
+    def test_pause(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('transaction', actions=[{
+                'type': 'drive-backup',
+                'data': { 'device': 'drive0',
+                          'target': target_img },
+            }
+        ])
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-pause', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        offset = self.dictpath(result, 'return[0]/offset')
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/offset', offset)
+
+        result = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after backup')
+
+    def test_medium_not_found(self):
+        result = self.vm.qmp('transaction', actions=[{
+                'type': 'drive-backup',
+                'data': { 'device': 'ide1-cd0',
+                          'target': target_img },
+            }
+        ])
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+    def test_image_not_found(self):
+        result = self.vm.qmp('transaction', actions=[{
+                'type': 'drive-backup',
+                'data': { 'device': 'drive0',
+                          'mode': 'existing',
+                          'target': target_img },
+            }
+        ])
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+    def test_device_not_found(self):
+        result = self.vm.qmp('transaction', actions=[{
+                'type': 'drive-backup',
+                'data': { 'device': 'nonexistent',
+                          'mode': 'existing',
+                          'target': target_img },
+            }
+        ])
+        self.assert_qmp(result, 'error/class', 'DeviceNotFound')
+
+    def test_abort(self):
+        result = self.vm.qmp('transaction', actions=[{
+                'type': 'drive-backup',
+                'data': { 'device': 'nonexistent',
+                          'mode': 'existing',
+                          'target': target_img },
+            }, {
+                'type': 'Abort',
+                'data': {},
+            }
+        ])
+        self.assert_qmp(result, 'error/class', 'GenericError')
+        self.assert_no_active_backups()
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['raw', 'qcow2'])
diff --git a/tests/qemu-iotests/055.out b/tests/qemu-iotests/055.out
index 594c16f..96961ed 100644
--- a/tests/qemu-iotests/055.out
+++ b/tests/qemu-iotests/055.out
@@ -1,5 +1,5 @@
-........
+...............
 ----------------------------------------------------------------------
-Ran 8 tests
+Ran 15 tests
 
 OK
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/8] block: add bdrv_add_before_write_cb()
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 1/8] block: add bdrv_add_before_write_cb() Stefan Hajnoczi
@ 2013-05-15 14:42   ` Paolo Bonzini
  2013-05-16  2:42     ` Wenchao Xia
  2013-05-16  8:11     ` Stefan Hajnoczi
  0 siblings, 2 replies; 57+ messages in thread
From: Paolo Bonzini @ 2013-05-15 14:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, xiawenc, imain, dietmar

Il 15/05/2013 16:34, Stefan Hajnoczi ha scritto:
> The bdrv_add_before_write_cb() function installs a callback that is
> invoked before a write request is processed.  This will be used to
> implement copy-on-write point-in-time snapshots where we need to copy
> out old data before overwriting it.

Perhaps a notifier list that receives the BdrvTrackedRequest?  (BTW we
should probably remove all the notifier_remove wrappers, they're useless).

The BdrvTrackedRequest pointer would also act as a unique id of the request.

> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c                   | 37 +++++++++++++++++++++++++++++++++++++
>  include/block/block_int.h | 32 ++++++++++++++++++++++++++++++++
>  2 files changed, 69 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 3f87489..0fd7167 100644
> --- a/block.c
> +++ b/block.c
> @@ -308,6 +308,7 @@ BlockDriverState *bdrv_new(const char *device_name)
>      }
>      bdrv_iostatus_disable(bs);
>      notifier_list_init(&bs->close_notifiers);
> +    QTAILQ_INIT(&bs->before_write_cbs);
>  
>      return bs;
>  }
> @@ -1383,6 +1384,8 @@ void bdrv_close(BlockDriverState *bs)
>          bs->growable = 0;
>          QDECREF(bs->options);
>          bs->options = NULL;
> +        assert(QTAILQ_EMPTY(&bs->before_write_cbs));
> +        QTAILQ_INIT(&bs->before_write_cbs);

INIT not needed if you assert before.

Paolo

>  
>          if (bs->file != NULL) {
>              bdrv_delete(bs->file);
> @@ -2587,6 +2590,22 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>      return ret;
>  }
>  
> +struct BDRVBeforeWrite {
> +    BDRVBeforeWriteFunc *cb;
> +    QTAILQ_ENTRY(BDRVBeforeWrite) list;
> +};
> +
> +static void invoke_before_write_cb(BlockDriverState *bs, int64_t sector_num,
> +                                   int nb_sectors, QEMUIOVector *qiov)
> +{
> +    BDRVBeforeWrite *before_write;
> +    BDRVBeforeWrite *tmp;
> +    QTAILQ_FOREACH_SAFE(before_write, &bs->before_write_cbs, list, tmp) {
> +        before_write->cb(bs, sector_num, nb_sectors, qiov);
> +    }
> +}
> +
> +
>  /*
>   * Handle a write request in coroutine context
>   */
> @@ -2619,6 +2638,8 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>  
>      tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
>  
> +    invoke_before_write_cb(bs, sector_num, nb_sectors, qiov);
> +
>      if (flags & BDRV_REQ_ZERO_WRITE) {
>          ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors);
>      } else {
> @@ -4883,3 +4904,19 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs)
>      /* Currently BlockDriverState always uses the main loop AioContext */
>      return qemu_get_aio_context();
>  }
> +
> +BDRVBeforeWrite *bdrv_add_before_write_cb(BlockDriverState *bs,
> +                                          BDRVBeforeWriteFunc *cb)
> +{
> +    BDRVBeforeWrite *elem = g_slice_new(BDRVBeforeWrite);
> +    elem->cb = cb;
> +    QTAILQ_INSERT_TAIL(&bs->before_write_cbs, elem, list);
> +    return elem;
> +}
> +
> +void bdrv_remove_before_write_cb(BlockDriverState *bs,
> +                                 BDRVBeforeWrite *before_write)
> +{
> +    QTAILQ_REMOVE(&bs->before_write_cbs, before_write, list);
> +    g_slice_free(BDRVBeforeWrite, before_write);
> +}
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 6078dd3..e2299df 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -211,6 +211,16 @@ struct BlockDriver {
>      QLIST_ENTRY(BlockDriver) list;
>  };
>  
> +/**
> + * BDRVBeforeWriteFunc:
> + *
> + * See #bdrv_add_before_write_cb().
> + */
> +typedef void coroutine_fn BDRVBeforeWriteFunc(BlockDriverState *bs,
> +        int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
> +
> +typedef struct BDRVBeforeWrite BDRVBeforeWrite;
> +
>  /*
>   * Note: the function bdrv_append() copies and swaps contents of
>   * BlockDriverStates, so if you add new fields to this struct, please
> @@ -289,6 +299,9 @@ struct BlockDriverState {
>      /* long-running background operation */
>      BlockJob *job;
>  
> +    /* Callback before write request is processed */
> +    QTAILQ_HEAD(, BDRVBeforeWrite) before_write_cbs;
> +
>      QDict *options;
>  };
>  
> @@ -298,6 +311,25 @@ void bdrv_set_io_limits(BlockDriverState *bs,
>                          BlockIOLimit *io_limits);
>  
>  /**
> + * bdrv_add_before_write_cb:
> + *
> + * Register a callback that is invoked before write requests are processed but
> + * after any throttling or waiting for overlapping requests.
> + *
> + * Returns: a #BDRVBeforeWrite to use with bdrv_remove_before_write_cb()
> + */
> +BDRVBeforeWrite *bdrv_add_before_write_cb(BlockDriverState *bs,
> +                                          BDRVBeforeWriteFunc *cb);
> +
> +/**
> + * bdrv_remove_before_write_cb:
> + *
> + * Unregister a before write callback.
> + */
> +void bdrv_remove_before_write_cb(BlockDriverState *bs,
> +                                 BDRVBeforeWrite *before_write);
> +
> +/**
>   * bdrv_get_aio_context:
>   *
>   * Returns: the currently bound #AioContext
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 7/8] blockdev: add Abort transaction
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 7/8] blockdev: add Abort transaction Stefan Hajnoczi
@ 2013-05-15 19:01   ` Eric Blake
  2013-05-16  2:26     ` Wenchao Xia
  2013-05-16  7:37     ` Stefan Hajnoczi
  0 siblings, 2 replies; 57+ messages in thread
From: Eric Blake @ 2013-05-15 19:01 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, dietmar, imain, Paolo Bonzini,
	xiawenc

[-- Attachment #1: Type: text/plain, Size: 1040 bytes --]

On 05/15/2013 08:34 AM, Stefan Hajnoczi wrote:
> The Abort action can be used to test QMP 'transaction' failure.  Add it
> as the last action to exercise the .abort() and .cleanup() code paths
> for all previous actions.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  blockdev.c       | 15 +++++++++++++++
>  qapi-schema.json | 11 ++++++++++-
>  2 files changed, 25 insertions(+), 1 deletion(-)

I like it!  Even though libvirt will never use it, I can see how having
it definitely helps development of other transaction actions.

> +++ b/qapi-schema.json
> @@ -1630,6 +1630,14 @@
>              '*mode': 'NewImageMode', '*speed': 'int' } }
>  
>  ##
> +# @Abort
> +#
> +# This action can be used to test transaction failure.
> +###
> +{ 'type': 'Abort',
> +  'data': { } }
> +

Probably should add a Since: 1.6 notation.  With that,

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/8] block: add drive-backup QMP command
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 3/8] block: add drive-backup QMP command Stefan Hajnoczi
@ 2013-05-15 19:04   ` Eric Blake
  2013-05-16  7:31     ` Stefan Hajnoczi
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Blake @ 2013-05-15 19:04 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, dietmar, imain, Paolo Bonzini,
	xiawenc

[-- Attachment #1: Type: text/plain, Size: 888 bytes --]

On 05/15/2013 08:34 AM, Stefan Hajnoczi wrote:
> @drive-backup
> 
> Start a point-in-time copy of a block device to a new destination.  The
> status of ongoing drive-backup operations can be checked with
> query-block-jobs where the BlockJobInfo.type field has the value 'backup'.
> The operation can be stopped before it has completed using the
> block-job-cancel command.
> 

> +++ b/qmp-commands.hx
> @@ -912,6 +912,12 @@ EQMP
>      },
>  
>      {
> +        .name       = "drive-backup",
> +        .args_type  = "device:B,target:s,speed:i?,mode:s?,format:s?",
> +        .mhandler.cmd_new = qmp_marshal_input_drive_backup,
> +    },

No example?  But I'm not going to block this patch because of that.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/8] blockdev: rename BlkTransactionStates to singular
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 5/8] blockdev: rename BlkTransactionStates to singular Stefan Hajnoczi
@ 2013-05-15 19:09   ` Eric Blake
  2013-05-16  2:28   ` Wenchao Xia
  1 sibling, 0 replies; 57+ messages in thread
From: Eric Blake @ 2013-05-15 19:09 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, dietmar, imain, Paolo Bonzini,
	xiawenc

[-- Attachment #1: Type: text/plain, Size: 995 bytes --]

On 05/15/2013 08:34 AM, Stefan Hajnoczi wrote:
> The QMP 'transaction' command keeps a list of in-flight transactions.
> The transaction state structure is called BlkTransactionStates even
> though it only deals with a single transaction.  The only plural thing
> is the linked list of transaction states.
> 
> I find it confusing to call the single structure "States".  This patch
> renames it to "State", just like BlockDriverState is singular.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  blockdev.c | 104 ++++++++++++++++++++++++++++++-------------------------------
>  1 file changed, 52 insertions(+), 52 deletions(-)

Mechanical, and singular makes sense to me (for a given callback, you
are dealing with the single state for that callback, not all the states
chained into the transaction).

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/8] blockdev: add DriveBackup transaction
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 6/8] blockdev: add DriveBackup transaction Stefan Hajnoczi
@ 2013-05-15 19:13   ` Eric Blake
  2013-05-16  7:36     ` Stefan Hajnoczi
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Blake @ 2013-05-15 19:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, dietmar, imain, Paolo Bonzini,
	xiawenc

[-- Attachment #1: Type: text/plain, Size: 2375 bytes --]

On 05/15/2013 08:34 AM, Stefan Hajnoczi wrote:
> This patch adds a transactional version of the drive-backup QMP command.
> It allows atomic snapshots of multiple drives along with automatic
> cleanup if there is a failure to start one of the backup jobs.
> 
> Note that QMP events are emitted for block job completion/cancellation
> and the block job will be listed by query-block-jobs.
> 

> +
> +static void drive_backup_abort(BlkTransactionState *common)
> +{
> +    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
> +    BlockDriverState *bs = state->bs;
> +
> +    /* Only cancel if it's the job we started */
> +    if (bs && bs->job && bs->job == state->job) {
> +        block_job_cancel_sync(bs->job);
> +    }

Question - if starting the job created the target file, should aborting
the job unlink() that file so that we aren't polluting the file system?

> +++ b/qapi-schema.json
> @@ -1609,6 +1609,27 @@
>              '*mode': 'NewImageMode' } }
>  
>  ##
> +# @DriveBackup
> +#
> +# @device: the name of the device whose writes should be mirrored.
> +#
> +# @target: the target of the new image. If the file exists, or if it
> +#          is a device, the existing file/device will be used as the new
> +#          destination.  If it does not exist, a new file will be created.
> +#
> +# @format: #optional the format of the new destination, default is to
> +#          probe if @mode is 'existing', else the format of the source
> +#
> +# @mode: #optional whether and how QEMU should create a new image, default is
> +#        'absolute-paths'.
> +#
> +# @speed: #optional the maximum speed, in bytes per second
> +##

Mention "Since: 1.6"

> +{ 'type': 'DriveBackup',
> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> +            '*mode': 'NewImageMode', '*speed': 'int' } }
> +
> +##
>  # @TransactionAction
>  #
>  # A discriminated record of operations that can be performed with
> @@ -1616,7 +1637,8 @@
>  ##
>  { 'union': 'TransactionAction',
>    'data': {
> -       'blockdev-snapshot-sync': 'BlockdevSnapshot'
> +       'blockdev-snapshot-sync': 'BlockdevSnapshot',
> +       'drive-backup': 'DriveBackup'
>     } }
>  
>  ##
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 7/8] blockdev: add Abort transaction
  2013-05-15 19:01   ` Eric Blake
@ 2013-05-16  2:26     ` Wenchao Xia
  2013-05-16  7:37     ` Stefan Hajnoczi
  1 sibling, 0 replies; 57+ messages in thread
From: Wenchao Xia @ 2013-05-16  2:26 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi,
	Paolo Bonzini, dietmar

于 2013-5-16 3:01, Eric Blake 写道:
> On 05/15/2013 08:34 AM, Stefan Hajnoczi wrote:
>> The Abort action can be used to test QMP 'transaction' failure.  Add it
>> as the last action to exercise the .abort() and .cleanup() code paths
>> for all previous actions.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>   blockdev.c       | 15 +++++++++++++++
>>   qapi-schema.json | 11 ++++++++++-
>>   2 files changed, 25 insertions(+), 1 deletion(-)
>
> I like it!  Even though libvirt will never use it, I can see how having
> it definitely helps development of other transaction actions.
>
>> +++ b/qapi-schema.json
>> @@ -1630,6 +1630,14 @@
>>               '*mode': 'NewImageMode', '*speed': 'int' } }
>>
>>   ##
>> +# @Abort
>> +#
>> +# This action can be used to test transaction failure.
>> +###
>> +{ 'type': 'Abort',
>> +  'data': { } }
>> +
>
> Probably should add a Since: 1.6 notation.  With that,
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
   Nice to have it for injecting error.

Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>

-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/8] blockdev: rename BlkTransactionStates to singular
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 5/8] blockdev: rename BlkTransactionStates to singular Stefan Hajnoczi
  2013-05-15 19:09   ` Eric Blake
@ 2013-05-16  2:28   ` Wenchao Xia
  1 sibling, 0 replies; 57+ messages in thread
From: Wenchao Xia @ 2013-05-16  2:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Paolo Bonzini, dietmar

> The QMP 'transaction' command keeps a list of in-flight transactions.
> The transaction state structure is called BlkTransactionStates even
> though it only deals with a single transaction.  The only plural thing
> is the linked list of transaction states.
> 
> I find it confusing to call the single structure "States".  This patch
> renames it to "State", just like BlockDriverState is singular.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   blockdev.c | 104 ++++++++++++++++++++++++++++++-------------------------------
>   1 file changed, 52 insertions(+), 52 deletions(-)
> 

Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>




-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/8] block: add bdrv_add_before_write_cb()
  2013-05-15 14:42   ` Paolo Bonzini
@ 2013-05-16  2:42     ` Wenchao Xia
  2013-05-16  8:11     ` Stefan Hajnoczi
  1 sibling, 0 replies; 57+ messages in thread
From: Wenchao Xia @ 2013-05-16  2:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi, dietmar

Reviewed the code, except Paolo's comments, function seems fine.

> Il 15/05/2013 16:34, Stefan Hajnoczi ha scritto:
>> The bdrv_add_before_write_cb() function installs a callback that is
>> invoked before a write request is processed.  This will be used to
>> implement copy-on-write point-in-time snapshots where we need to copy
>> out old data before overwriting it.
>
> Perhaps a notifier list that receives the BdrvTrackedRequest?  (BTW we
> should probably remove all the notifier_remove wrappers, they're useless).
>
> The BdrvTrackedRequest pointer would also act as a unique id of the request.
>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>   block.c                   | 37 +++++++++++++++++++++++++++++++++++++
>>   include/block/block_int.h | 32 ++++++++++++++++++++++++++++++++
>>   2 files changed, 69 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index 3f87489..0fd7167 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -308,6 +308,7 @@ BlockDriverState *bdrv_new(const char *device_name)
>>       }
>>       bdrv_iostatus_disable(bs);
>>       notifier_list_init(&bs->close_notifiers);
>> +    QTAILQ_INIT(&bs->before_write_cbs);
>>
>>       return bs;
>>   }
>> @@ -1383,6 +1384,8 @@ void bdrv_close(BlockDriverState *bs)
>>           bs->growable = 0;
>>           QDECREF(bs->options);
>>           bs->options = NULL;
>> +        assert(QTAILQ_EMPTY(&bs->before_write_cbs));
>> +        QTAILQ_INIT(&bs->before_write_cbs);
>
> INIT not needed if you assert before.
>
> Paolo
>


-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver Stefan Hajnoczi
@ 2013-05-16  3:27   ` Wenchao Xia
  2013-05-16  7:30     ` Stefan Hajnoczi
  2013-05-20 11:30   ` Paolo Bonzini
  1 sibling, 1 reply; 57+ messages in thread
From: Wenchao Xia @ 2013-05-16  3:27 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Paolo Bonzini, dietmar

> From: Dietmar Maurer <dietmar@proxmox.com>
> 
> backup_start() creates a block job that copies a point-in-time snapshot
> of a block device to a target block device.
> 
> We call backup_do_cow() for each write during backup. That function
> reads the original data from the block device before it gets
> overwritten.  The data is then written to the target device.
> 
> Currently backup cluster size is hardcoded to 65536 bytes.
> 
> [I made a number of changes to Dietmar's original patch and folded them
> in to make code review easy.  Here is the full list:
> 
>   * Drop BackupDumpFunc interface in favor of a target block device
>   * Detect zero clusters with buffer_is_zero()
>   * Don't write zero clusters to the target
>   * Use 0 delay instead of 1us, like other block jobs
>   * Unify creation/start functions into backup_start()
>   * Simplify cleanup, free bitmap in backup_run() instead of cb
>   * function
>   * Use HBitmap to avoid duplicating bitmap code
>   * Use bdrv_getlength() instead of accessing ->total_sectors
>   * directly
>   * Delete the backup.h header file, it is no longer necessary
>   * Move ./backup.c to block/backup.c
>   * Remove #ifdefed out code
>   * Coding style and whitespace cleanups
>   * Use bdrv_add_before_write_cb() instead of blockjob-specific hooks
>   * Keep our own in-flight CowRequest list instead of using block.c
>     tracked requests
> 
> -- stefanha]
> 
> Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block/Makefile.objs       |   1 +
>   block/backup.c            | 282 ++++++++++++++++++++++++++++++++++++++++++++++
>   include/block/block_int.h |  16 +++
>   3 files changed, 299 insertions(+)
>   create mode 100644 block/backup.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 5f0358a..88bd101 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -20,5 +20,6 @@ endif
>   common-obj-y += stream.o
>   common-obj-y += commit.o
>   common-obj-y += mirror.o
> +common-obj-y += backup.o
> 
>   $(obj)/curl.o: QEMU_CFLAGS+=$(CURL_CFLAGS)
> diff --git a/block/backup.c b/block/backup.c
> new file mode 100644
> index 0000000..01d2fd3
> --- /dev/null
> +++ b/block/backup.c
> @@ -0,0 +1,282 @@
> +/*
> + * QEMU backup
> + *
> + * Copyright (C) 2013 Proxmox Server Solutions
> + *
> + * Authors:
> + *  Dietmar Maurer (dietmar@proxmox.com)
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include <stdio.h>
> +#include <errno.h>
> +#include <unistd.h>
> +
> +#include "block/block.h"
> +#include "block/block_int.h"
> +#include "block/blockjob.h"
> +#include "qemu/ratelimit.h"
> +
> +#define DEBUG_BACKUP 0
> +
> +#define DPRINTF(fmt, ...) \
> +    do { \
> +        if (DEBUG_BACKUP) { \
> +            fprintf(stderr, "backup: " fmt, ## __VA_ARGS__); \
> +        } \
> +    } while (0)
> +
> +#define BACKUP_CLUSTER_BITS 16
> +#define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
> +#define BACKUP_BLOCKS_PER_CLUSTER (BACKUP_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
> +
> +#define SLICE_TIME 100000000ULL /* ns */
> +
> +typedef struct CowRequest {
> +    int64_t start;
> +    int64_t end;
> +    QLIST_ENTRY(CowRequest) list;
> +    CoQueue wait_queue; /* coroutines blocked on this request */
> +} CowRequest;
> +
> +typedef struct BackupBlockJob {
> +    BlockJob common;
> +    BlockDriverState *target;
> +    RateLimit limit;
> +    CoRwlock flush_rwlock;
> +    uint64_t sectors_read;
> +    HBitmap *bitmap;
> +    QLIST_HEAD(, CowRequest) inflight_reqs;
> +} BackupBlockJob;
> +
> +/* See if in-flight requests overlap and wait for them to complete */
> +static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
> +                                                       int64_t start,
> +                                                       int64_t end)
> +{
> +    CowRequest *req;
> +    bool retry;
> +
> +    do {
> +        retry = false;
> +        QLIST_FOREACH(req, &job->inflight_reqs, list) {
> +            if (end > req->start && start < req->end) {
> +                qemu_co_queue_wait(&req->wait_queue);
> +                retry = true;
> +                break;
> +            }
> +        }
> +    } while (retry);
> +}
> +

  In my understanding, there will be possible two program routines entering
here at same time since it holds read lock instead of write lock, and
they may also modify job->inflight_reqs, is it possible a race
condition here? I am not sure whether back-ground job will becomes a
thread.



-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-15 14:34 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 8/8] qemu-iotests: test 'drive-backup' transaction in 055 Stefan Hajnoczi
@ 2013-05-16  6:16 ` Wenchao Xia
  2013-05-16  7:47   ` Stefan Hajnoczi
  8 siblings, 1 reply; 57+ messages in thread
From: Wenchao Xia @ 2013-05-16  6:16 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Paolo Bonzini, dietmar

于 2013-5-15 22:34, Stefan Hajnoczi 写道:
> Note: These patches apply to my block-next tree.  You can also grab the code
> from git here:
> git://github.com/stefanha/qemu.git block-backup-core
> 
> This series adds a new QMP command, drive-backup, which takes a point-in-time
> snapshot of a block device.  The snapshot is copied out to a target block
> device.  A simple example is:
> 
>    drive-backup device=virtio0 format=qcow2 target=backup-20130401.qcow2
> 
> The original drive-backup blockjob was written by Dietmar Maurer
> <dietmar@proxmox.com>.  He is currently busy but I feel the feature is worth
> pushing into QEMU since there has been interest.  This is my version of his
> patch, plus the QMP command and qemu-iotests test case.
> 
> QMP 'transaction' support is included since v3.  It adds support for atomic
> snapshots of multiple block devices.  I also added an 'abort' transaction to
> allow testing of the .abort()/.cleanup() code path.  Thanks to Wenchao for
> making qmp_transaction() extensible.
> 
> How is this different from block-stream and drive-mirror?
> ---------------------------------------------------------
> Both block-stream and drive-mirror do not provide immediate point-in-time
> snapshots.  Instead they copy data into a new file and then switch to it.  In
> other words, the point at which the "snapshot" is taken cannot be controlled
> directly.
> 
> drive-backup intercepts guest writes and saves data into the target block
> device before it is overwritten.  The target block device can be a raw image
> file, backing files are not used to implement this feature.
> 
> How can drive-backup be used?
> -----------------------------
> The simplest use-case is to copy a point-in-time snapshot to a local file.
> 
> More advanced users may wish to make the target an NBD URL.  The NBD server
> listening on the other side can process the backup writes any way it wishes.  I
> previously posted an RFC series with a backup server that streamed Dietmar's
> VMA backup archive format.
> 
> What's next for drive-backup?
> -----------------------------
> 1. Sync modes like drive-mirror (top, full, none).  This makes it possible to
>     preserve the backing file chain.
> 
> v3:
>   * Rename to drive-backup for consistency with drive-mirror [kwolf]
>   * Add QMP transaction support [kwolf]
>   * Introduce bdrv_add_before_write_cb() to hook writes
>   * Mention 'query-block-jobs' lists job of type 'backup' [eblake]
>   * Rename rwlock to flush_rwlock [kwolf]
>   * Fix space in block/backup.c comment [kwolf]
> 
> v2:
>   * s/block_backup/block-backup/ in commit message [eblake]
>   * Avoid funny spacing in QMP docs [eblake]
>   * Document query-block-jobs and block-job-cancel usage [eblake]

  After checking the code, I found it possible to add delta data backup
support also, If an additional dirty bitmap was added. Compared with
current solution, I think it is doing COW at qemu device level:

    qemu device
        |
general block layer
        |
virtual format layer
        |
-----------------------
|                     |
qcow2             vmdk....

  This will make things complicated when more works comes, a better
place for block COW, is under general block layer. Maybe later we
can adjust block for it.


-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-16  3:27   ` Wenchao Xia
@ 2013-05-16  7:30     ` Stefan Hajnoczi
  0 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-16  7:30 UTC (permalink / raw)
  To: Wenchao Xia
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Paolo Bonzini, dietmar

On Thu, May 16, 2013 at 11:27:38AM +0800, Wenchao Xia wrote:
> > +/* See if in-flight requests overlap and wait for them to complete */
> > +static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
> > +                                                       int64_t start,
> > +                                                       int64_t end)
> > +{
> > +    CowRequest *req;
> > +    bool retry;
> > +
> > +    do {
> > +        retry = false;
> > +        QLIST_FOREACH(req, &job->inflight_reqs, list) {
> > +            if (end > req->start && start < req->end) {
> > +                qemu_co_queue_wait(&req->wait_queue);
> > +                retry = true;
> > +                break;
> > +            }
> > +        }
> > +    } while (retry);
> > +}
> > +
> 
>   In my understanding, there will be possible two program routines entering
> here at same time since it holds read lock instead of write lock, and
> they may also modify job->inflight_reqs, is it possible a race
> condition here? I am not sure whether back-ground job will becomes a
> thread.

No, all operations on a BlockDriverState execute in the same event loop
thread.  Only coroutine synchronization is necessary, which is provided
in these patches.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/8] block: add drive-backup QMP command
  2013-05-15 19:04   ` Eric Blake
@ 2013-05-16  7:31     ` Stefan Hajnoczi
  0 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-16  7:31 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, dietmar, imain, Paolo Bonzini,
	xiawenc

On Wed, May 15, 2013 at 01:04:55PM -0600, Eric Blake wrote:
> On 05/15/2013 08:34 AM, Stefan Hajnoczi wrote:
> > @drive-backup
> > 
> > Start a point-in-time copy of a block device to a new destination.  The
> > status of ongoing drive-backup operations can be checked with
> > query-block-jobs where the BlockJobInfo.type field has the value 'backup'.
> > The operation can be stopped before it has completed using the
> > block-job-cancel command.
> > 
> 
> > +++ b/qmp-commands.hx
> > @@ -912,6 +912,12 @@ EQMP
> >      },
> >  
> >      {
> > +        .name       = "drive-backup",
> > +        .args_type  = "device:B,target:s,speed:i?,mode:s?,format:s?",
> > +        .mhandler.cmd_new = qmp_marshal_input_drive_backup,
> > +    },
> 
> No example?  But I'm not going to block this patch because of that.

I'll add an example in v4.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/8] blockdev: add DriveBackup transaction
  2013-05-15 19:13   ` Eric Blake
@ 2013-05-16  7:36     ` Stefan Hajnoczi
  0 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-16  7:36 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, dietmar, imain, Paolo Bonzini,
	xiawenc

On Wed, May 15, 2013 at 01:13:26PM -0600, Eric Blake wrote:
> On 05/15/2013 08:34 AM, Stefan Hajnoczi wrote:
> > This patch adds a transactional version of the drive-backup QMP command.
> > It allows atomic snapshots of multiple drives along with automatic
> > cleanup if there is a failure to start one of the backup jobs.
> > 
> > Note that QMP events are emitted for block job completion/cancellation
> > and the block job will be listed by query-block-jobs.
> > 
> 
> > +
> > +static void drive_backup_abort(BlkTransactionState *common)
> > +{
> > +    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
> > +    BlockDriverState *bs = state->bs;
> > +
> > +    /* Only cancel if it's the job we started */
> > +    if (bs && bs->job && bs->job == state->job) {
> > +        block_job_cancel_sync(bs->job);
> > +    }
> 
> Question - if starting the job created the target file, should aborting
> the job unlink() that file so that we aren't polluting the file system?

blockdev-snapshot-sync action does not delete the file on abort.

> > +++ b/qapi-schema.json
> > @@ -1609,6 +1609,27 @@
> >              '*mode': 'NewImageMode' } }
> >  
> >  ##
> > +# @DriveBackup
> > +#
> > +# @device: the name of the device whose writes should be mirrored.
> > +#
> > +# @target: the target of the new image. If the file exists, or if it
> > +#          is a device, the existing file/device will be used as the new
> > +#          destination.  If it does not exist, a new file will be created.
> > +#
> > +# @format: #optional the format of the new destination, default is to
> > +#          probe if @mode is 'existing', else the format of the source
> > +#
> > +# @mode: #optional whether and how QEMU should create a new image, default is
> > +#        'absolute-paths'.
> > +#
> > +# @speed: #optional the maximum speed, in bytes per second
> > +##
> 
> Mention "Since: 1.6"

Will add in v4.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 7/8] blockdev: add Abort transaction
  2013-05-15 19:01   ` Eric Blake
  2013-05-16  2:26     ` Wenchao Xia
@ 2013-05-16  7:37     ` Stefan Hajnoczi
  1 sibling, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-16  7:37 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, dietmar, imain, Paolo Bonzini,
	xiawenc

On Wed, May 15, 2013 at 01:01:15PM -0600, Eric Blake wrote:
> On 05/15/2013 08:34 AM, Stefan Hajnoczi wrote:
> > +++ b/qapi-schema.json
> > @@ -1630,6 +1630,14 @@
> >              '*mode': 'NewImageMode', '*speed': 'int' } }
> >  
> >  ##
> > +# @Abort
> > +#
> > +# This action can be used to test transaction failure.
> > +###
> > +{ 'type': 'Abort',
> > +  'data': { } }
> > +
> 
> Probably should add a Since: 1.6 notation.  With that,

Will add in v4.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-16  6:16 ` [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Wenchao Xia
@ 2013-05-16  7:47   ` Stefan Hajnoczi
  2013-05-17  6:58     ` Wenchao Xia
  2013-05-17 10:17     ` Paolo Bonzini
  0 siblings, 2 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-16  7:47 UTC (permalink / raw)
  To: Wenchao Xia
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Paolo Bonzini, dietmar

On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
>   After checking the code, I found it possible to add delta data backup
> support also, If an additional dirty bitmap was added.

I've been thinking about this.  Incremental backups need to know which
blocks have changed, but keeping a persistent dirty bitmap is expensive
and unnecessary.

Backup applications need to support the full backup case anyway for
their first run.  Therefore we can keep a best-effort dirty bitmap which
is persisted only when the guest is terminated cleanly.  If the QEMU
process crashes then the on-disk dirty bitmap will be invalid and the
backup application needs to do a full backup next time.

The advantage of this approach is that we don't need to fdatasync(2)
before every guest write operation.

> Compared with
> current solution, I think it is doing COW at qemu device level:
> 
>     qemu device
>         |
> general block layer
>         |
> virtual format layer
>         |
> -----------------------
> |                     |
> qcow2             vmdk....
> 
>   This will make things complicated when more works comes, a better
> place for block COW, is under general block layer. Maybe later we
> can adjust block for it.

I don't consider block jobs to be "qemu device" layer.  It sounds like
you think the code should be in bdrv_co_do_writev()?

The drive-backup operation doesn't really affect the source
BlockDriverState, it just needs to intercept writes.  Therefore it seems
cleaner for the code to live separately (plus we reuse the code for the
block job loop which copies out data while the guest is running).
Otherwise we would squash all of the blockjob code into block.c and it
would be an even bigger mess than it is today :-).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/8] block: add bdrv_add_before_write_cb()
  2013-05-15 14:42   ` Paolo Bonzini
  2013-05-16  2:42     ` Wenchao Xia
@ 2013-05-16  8:11     ` Stefan Hajnoczi
  1 sibling, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-16  8:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Fam Zheng, qemu-devel, xiawenc, imain, dietmar

On Wed, May 15, 2013 at 04:42:57PM +0200, Paolo Bonzini wrote:
> Il 15/05/2013 16:34, Stefan Hajnoczi ha scritto:
> > The bdrv_add_before_write_cb() function installs a callback that is
> > invoked before a write request is processed.  This will be used to
> > implement copy-on-write point-in-time snapshots where we need to copy
> > out old data before overwriting it.
> 
> Perhaps a notifier list that receives the BdrvTrackedRequest?  (BTW we
> should probably remove all the notifier_remove wrappers, they're useless).

Nice idea, done in v4.  I originally rejected NotifierList because it
only has a void * argument, but BdrvRequest has the information we need.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-16  7:47   ` Stefan Hajnoczi
@ 2013-05-17  6:58     ` Wenchao Xia
  2013-05-17  9:14       ` Stefan Hajnoczi
  2013-05-17 10:17     ` Paolo Bonzini
  1 sibling, 1 reply; 57+ messages in thread
From: Wenchao Xia @ 2013-05-17  6:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Paolo Bonzini, dietmar

于 2013-5-16 15:47, Stefan Hajnoczi 写道:
> On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
>>    After checking the code, I found it possible to add delta data backup
>> support also, If an additional dirty bitmap was added.
>
> I've been thinking about this.  Incremental backups need to know which
> blocks have changed, but keeping a persistent dirty bitmap is expensive
> and unnecessary.
>
   Yes, it would be likely another block layer, so hope not do that.

> Backup applications need to support the full backup case anyway for
> their first run.  Therefore we can keep a best-effort dirty bitmap which
> is persisted only when the guest is terminated cleanly.  If the QEMU
> process crashes then the on-disk dirty bitmap will be invalid and the
> backup application needs to do a full backup next time.
>
> The advantage of this approach is that we don't need to fdatasync(2)
> before every guest write operation.
>
>> Compared with
>> current solution, I think it is doing COW at qemu device level:
>>
>>      qemu device
>>          |
>> general block layer
>>          |
>> virtual format layer
>>          |
>> -----------------------
>> |                     |
>> qcow2             vmdk....
>>
>>    This will make things complicated when more works comes, a better
>> place for block COW, is under general block layer. Maybe later we
>> can adjust block for it.
>
> I don't consider block jobs to be "qemu device" layer.  It sounds like
> you think the code should be in bdrv_co_do_writev()?
>
   I feel a trend of becoming fragility from different solutions,
and COW is a key feature that block layer provide, so I wonder if it
can be adjusted under block layer later, and leaves an abstract API for
it. Some other operation such as commit, stream, could be also hide
under block.

qemu                 general testcase        other user
    |                       |                        |
    --------------------------------------------------
                     |
         core block abstract layer(COW, zero R/W, image dup/backup)
                     |
           ---------------------
           |                    |
     qemu's implement       3'rd party
           |                    |
     -------------        --------------
     |           |        |            |
   qcow2        vmdk     lvm2    Enterprise storage integration

   It is not directly related to this serial, but I feel some effort
should be paid when time allows, before things become complicated.

> The drive-backup operation doesn't really affect the source
> BlockDriverState, it just needs to intercept writes.  Therefore it seems
> cleaner for the code to live separately (plus we reuse the code for the
> block job loop which copies out data while the guest is running).
> Otherwise we would squash all of the blockjob code into block.c and it
> would be an even bigger mess than it is today :-).
>


-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-17  6:58     ` Wenchao Xia
@ 2013-05-17  9:14       ` Stefan Hajnoczi
  2013-05-21  3:25         ` Wenchao Xia
  0 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-17  9:14 UTC (permalink / raw)
  To: Wenchao Xia
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi,
	Paolo Bonzini, dietmar

On Fri, May 17, 2013 at 02:58:57PM +0800, Wenchao Xia wrote:
> 于 2013-5-16 15:47, Stefan Hajnoczi 写道:
> >On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
> >>   After checking the code, I found it possible to add delta data backup
> >>support also, If an additional dirty bitmap was added.
> >
> >I've been thinking about this.  Incremental backups need to know which
> >blocks have changed, but keeping a persistent dirty bitmap is expensive
> >and unnecessary.
> >
>   Yes, it would be likely another block layer, so hope not do that.

Not at all, persistent dirty bitmaps need to be part of the block layer
since they need to support any image type - qcow2, Gluster, raw LVM,
etc.

> >I don't consider block jobs to be "qemu device" layer.  It sounds like
> >you think the code should be in bdrv_co_do_writev()?
> >
>   I feel a trend of becoming fragility from different solutions,
> and COW is a key feature that block layer provide, so I wonder if it
> can be adjusted under block layer later

The generic block layer includes more than just block.c.  It also
includes block jobs and the qcow2 metadata cache that Dong Xu has
extracted recently, for example.  Therefore you need to be more specific
about "what" and "why".

This copy-on-write backup approach is available as a block job which
runs on top of any BlockDriverState.  What concrete change are you
proposing?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-16  7:47   ` Stefan Hajnoczi
  2013-05-17  6:58     ` Wenchao Xia
@ 2013-05-17 10:17     ` Paolo Bonzini
  2013-05-20  6:24       ` Stefan Hajnoczi
  1 sibling, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2013-05-17 10:17 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, dietmar, imain, Wenchao Xia

Il 16/05/2013 09:47, Stefan Hajnoczi ha scritto:
> On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
>>   After checking the code, I found it possible to add delta data backup
>> support also, If an additional dirty bitmap was added.
> 
> I've been thinking about this.  Incremental backups need to know which
> blocks have changed, but keeping a persistent dirty bitmap is expensive
> and unnecessary.
> 
> Backup applications need to support the full backup case anyway for
> their first run.  Therefore we can keep a best-effort dirty bitmap which
> is persisted only when the guest is terminated cleanly.  If the QEMU
> process crashes then the on-disk dirty bitmap will be invalid and the
> backup application needs to do a full backup next time.
> 
> The advantage of this approach is that we don't need to fdatasync(2)
> before every guest write operation.

You only need to fdatasync() before every guest flush, no?

Paolo

>> Compared with
>> current solution, I think it is doing COW at qemu device level:
>>
>>     qemu device
>>         |
>> general block layer
>>         |
>> virtual format layer
>>         |
>> -----------------------
>> |                     |
>> qcow2             vmdk....
>>
>>   This will make things complicated when more works comes, a better
>> place for block COW, is under general block layer. Maybe later we
>> can adjust block for it.
> 
> I don't consider block jobs to be "qemu device" layer.  It sounds like
> you think the code should be in bdrv_co_do_writev()?
> 
> The drive-backup operation doesn't really affect the source
> BlockDriverState, it just needs to intercept writes.  Therefore it seems
> cleaner for the code to live separately (plus we reuse the code for the
> block job loop which copies out data while the guest is running).
> Otherwise we would squash all of the blockjob code into block.c and it
> would be an even bigger mess than it is today :-).
> 
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-17 10:17     ` Paolo Bonzini
@ 2013-05-20  6:24       ` Stefan Hajnoczi
  2013-05-20  7:23         ` Paolo Bonzini
  0 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-20  6:24 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Wenchao Xia, imain,
	Stefan Hajnoczi, Dietmar Maurer

On Fri, May 17, 2013 at 12:17 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 16/05/2013 09:47, Stefan Hajnoczi ha scritto:
>> On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
>>>   After checking the code, I found it possible to add delta data backup
>>> support also, If an additional dirty bitmap was added.
>>
>> I've been thinking about this.  Incremental backups need to know which
>> blocks have changed, but keeping a persistent dirty bitmap is expensive
>> and unnecessary.
>>
>> Backup applications need to support the full backup case anyway for
>> their first run.  Therefore we can keep a best-effort dirty bitmap which
>> is persisted only when the guest is terminated cleanly.  If the QEMU
>> process crashes then the on-disk dirty bitmap will be invalid and the
>> backup application needs to do a full backup next time.
>>
>> The advantage of this approach is that we don't need to fdatasync(2)
>> before every guest write operation.
>
> You only need to fdatasync() before every guest flush, no?

No, you need to set the dirty bit before issuing the write on the
host.  Otherwise the image data may be modified without setting the
appropriate dirty bit.  That would allow data modifications to go
undetected!

Stefan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-20  6:24       ` Stefan Hajnoczi
@ 2013-05-20  7:23         ` Paolo Bonzini
  2013-05-21  7:31           ` Stefan Hajnoczi
  0 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2013-05-20  7:23 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Dietmar Maurer, imain,
	Stefan Hajnoczi, Wenchao Xia

Il 20/05/2013 08:24, Stefan Hajnoczi ha scritto:
>> > You only need to fdatasync() before every guest flush, no?
> No, you need to set the dirty bit before issuing the write on the
> host.  Otherwise the image data may be modified without setting the
> appropriate dirty bit.  That would allow data modifications to go
> undetected!

But data modifications can go undetected until the guest flush returns,
can't they?

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver Stefan Hajnoczi
  2013-05-16  3:27   ` Wenchao Xia
@ 2013-05-20 11:30   ` Paolo Bonzini
  2013-05-21 13:34     ` Stefan Hajnoczi
  1 sibling, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2013-05-20 11:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, dietmar, imain, xiawenc

Il 15/05/2013 16:34, Stefan Hajnoczi ha scritto:
> +    wait_for_overlapping_requests(job, start, end);
> +    cow_request_begin(&cow_request, job, start, end);
> +
> +    for (; start < end; start++) {
> +        if (hbitmap_get(job->bitmap, start)) {
> +            DPRINTF("brdv_co_backup_cow skip C%" PRId64 "\n", start);
> +            continue; /* already copied */
> +        }
> +
> +        /* immediately set bitmap (avoid coroutine race) */
> +        hbitmap_set(job->bitmap, start, 1);
> +

HBitmap already has code for finding the next set bit, but you're not
using it.

If you reverse the direction of the bitmap, you can use an HBitmapIter here:

> 
> +    start = 0;
> +    end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
> +                       BACKUP_BLOCKS_PER_CLUSTER);
> +
> +    job->bitmap = hbitmap_alloc(end, 0);
> +
> +    before_write = bdrv_add_before_write_cb(bs, backup_before_write);
> +
> +    DPRINTF("backup_run start %s %" PRId64 " %" PRId64 "\n",
> +            bdrv_get_device_name(bs), start, end);
> +
> +    for (; start < end; start++) {

... instead of iterating through each item manually.

Paolo

> +        if (block_job_is_cancelled(&job->common)) {
> +            break;
> +        }
> +
> +        /* we need to yield so that qemu_aio_flush() returns.
> +         * (without, VM does not reboot)
> +         */
> +        if (job->common.speed) {
> +            uint64_t delay_ns = ratelimit_calculate_delay(
> +                &job->limit, job->sectors_read);
> +            job->sectors_read = 0;
> +            block_job_sleep_ns(&job->common, rt_clock, delay_ns);
> +        } else {
> +            block_job_sleep_ns(&job->common, rt_clock, 0);
> +        }
> +
> +        if (block_job_is_cancelled(&job->common)) {
> +            break;
> +        }
> +
> +        DPRINTF("backup_run loop C%" PRId64 "\n", start);
> +
> +        ret = backup_do_cow(bs, start * BACKUP_BLOCKS_PER_CLUSTER, 1);
> +        if (ret < 0) {
> +            break;
> +        }
> +
> +        /* Publish progress */
> +        job->common.offset += BACKUP_CLUSTER_SIZE;
> +    }

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-17  9:14       ` Stefan Hajnoczi
@ 2013-05-21  3:25         ` Wenchao Xia
  2013-05-21  7:34           ` Stefan Hajnoczi
  0 siblings, 1 reply; 57+ messages in thread
From: Wenchao Xia @ 2013-05-21  3:25 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi,
	Paolo Bonzini, dietmar

于 2013-5-17 17:14, Stefan Hajnoczi 写道:
> On Fri, May 17, 2013 at 02:58:57PM +0800, Wenchao Xia wrote:
>> 于 2013-5-16 15:47, Stefan Hajnoczi 写道:
>>> On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
>>>>    After checking the code, I found it possible to add delta data backup
>>>> support also, If an additional dirty bitmap was added.
>>>
>>> I've been thinking about this.  Incremental backups need to know which
>>> blocks have changed, but keeping a persistent dirty bitmap is expensive
>>> and unnecessary.
>>>
>>    Yes, it would be likely another block layer, so hope not do that.
>
> Not at all, persistent dirty bitmaps need to be part of the block layer
> since they need to support any image type - qcow2, Gluster, raw LVM,
> etc.
>
>>> I don't consider block jobs to be "qemu device" layer.  It sounds like
>>> you think the code should be in bdrv_co_do_writev()?
>>>
>>    I feel a trend of becoming fragility from different solutions,
>> and COW is a key feature that block layer provide, so I wonder if it
>> can be adjusted under block layer later
>
> The generic block layer includes more than just block.c.  It also
> includes block jobs and the qcow2 metadata cache that Dong Xu has
> extracted recently, for example.  Therefore you need to be more specific
> about "what" and "why".
>
> This copy-on-write backup approach is available as a block job which
> runs on top of any BlockDriverState.  What concrete change are you
> proposing?
>
   Since hard to hide it BlockDriverState now, suggest add some
document in qemu about the three snapshot types: qcow2 internal,
backing chain, drive-backup, which are all qemu software based snapshot
implemention, then user can know the difference with it eaiser.

   In long term, I hope to form a library expose those in a unified
format, perhaps it calls qmp_transaction internally, and make it
easier to be offloaded if possible, so hope a abstract-driver structure.
-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-20  7:23         ` Paolo Bonzini
@ 2013-05-21  7:31           ` Stefan Hajnoczi
  2013-05-21  8:30             ` Paolo Bonzini
  0 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-21  7:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel,
	Dietmar Maurer, imain, Wenchao Xia

On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote:
> Il 20/05/2013 08:24, Stefan Hajnoczi ha scritto:
> >> > You only need to fdatasync() before every guest flush, no?
> > No, you need to set the dirty bit before issuing the write on the
> > host.  Otherwise the image data may be modified without setting the
> > appropriate dirty bit.  That would allow data modifications to go
> > undetected!
> 
> But data modifications can go undetected until the guest flush returns,
> can't they?

You are thinking about it from the guest perspective - if a flush has
not completed yet then there is no guarantee that the write has reached
disk.

But from a host perspective the dirty bitmap should be conservative so
that the backup application can always restore a bit-for-bit identical
copy of the disk image.  It would be weird if writes can sneak in
unnoticed.

Stefan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-21  3:25         ` Wenchao Xia
@ 2013-05-21  7:34           ` Stefan Hajnoczi
  0 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-21  7:34 UTC (permalink / raw)
  To: Wenchao Xia
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Paolo Bonzini, dietmar

On Tue, May 21, 2013 at 11:25:01AM +0800, Wenchao Xia wrote:
> 于 2013-5-17 17:14, Stefan Hajnoczi 写道:
> >On Fri, May 17, 2013 at 02:58:57PM +0800, Wenchao Xia wrote:
> >>于 2013-5-16 15:47, Stefan Hajnoczi 写道:
> >>>On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
> >>>>   After checking the code, I found it possible to add delta data backup
> >>>>support also, If an additional dirty bitmap was added.
> >>>
> >>>I've been thinking about this.  Incremental backups need to know which
> >>>blocks have changed, but keeping a persistent dirty bitmap is expensive
> >>>and unnecessary.
> >>>
> >>   Yes, it would be likely another block layer, so hope not do that.
> >
> >Not at all, persistent dirty bitmaps need to be part of the block layer
> >since they need to support any image type - qcow2, Gluster, raw LVM,
> >etc.
> >
> >>>I don't consider block jobs to be "qemu device" layer.  It sounds like
> >>>you think the code should be in bdrv_co_do_writev()?
> >>>
> >>   I feel a trend of becoming fragility from different solutions,
> >>and COW is a key feature that block layer provide, so I wonder if it
> >>can be adjusted under block layer later
> >
> >The generic block layer includes more than just block.c.  It also
> >includes block jobs and the qcow2 metadata cache that Dong Xu has
> >extracted recently, for example.  Therefore you need to be more specific
> >about "what" and "why".
> >
> >This copy-on-write backup approach is available as a block job which
> >runs on top of any BlockDriverState.  What concrete change are you
> >proposing?
> >
>   Since hard to hide it BlockDriverState now, suggest add some
> document in qemu about the three snapshot types: qcow2 internal,
> backing chain, drive-backup, which are all qemu software based snapshot
> implemention, then user can know the difference with it eaiser.
> 
>   In long term, I hope to form a library expose those in a unified
> format, perhaps it calls qmp_transaction internally, and make it
> easier to be offloaded if possible, so hope a abstract-driver structure.

Okay, just keep in mind they have different behavior.  That means these
snapshot types solve different problems and may be inappropriate for
some use cases.

Stefan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-21  7:31           ` Stefan Hajnoczi
@ 2013-05-21  8:30             ` Paolo Bonzini
  2013-05-21 10:34               ` Stefan Hajnoczi
  0 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2013-05-21  8:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel,
	Dietmar Maurer, imain, Wenchao Xia

Il 21/05/2013 09:31, Stefan Hajnoczi ha scritto:
> On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote:
>> Il 20/05/2013 08:24, Stefan Hajnoczi ha scritto:
>>>>> You only need to fdatasync() before every guest flush, no?
>>> No, you need to set the dirty bit before issuing the write on the
>>> host.  Otherwise the image data may be modified without setting the
>>> appropriate dirty bit.  That would allow data modifications to go
>>> undetected!
>>
>> But data modifications can go undetected until the guest flush returns,
>> can't they?
> 
> You are thinking about it from the guest perspective - if a flush has
> not completed yet then there is no guarantee that the write has reached
> disk.
> 
> But from a host perspective the dirty bitmap should be conservative so
> that the backup application can always restore a bit-for-bit identical
> copy of the disk image.  It would be weird if writes can sneak in
> unnoticed.

True, but that would happen only in case the host crashes.  Even for a
QEMU crash the changes would be safe, I think.  They would be written
back when the persistent dirty bitmap's mmap() area is unmapped, during
process exit.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-21  8:30             ` Paolo Bonzini
@ 2013-05-21 10:34               ` Stefan Hajnoczi
  2013-05-21 10:36                 ` Paolo Bonzini
  0 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-21 10:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel,
	Dietmar Maurer, imain, Wenchao Xia

On Tue, May 21, 2013 at 10:30:22AM +0200, Paolo Bonzini wrote:
> Il 21/05/2013 09:31, Stefan Hajnoczi ha scritto:
> > On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote:
> >> Il 20/05/2013 08:24, Stefan Hajnoczi ha scritto:
> >>>>> You only need to fdatasync() before every guest flush, no?
> >>> No, you need to set the dirty bit before issuing the write on the
> >>> host.  Otherwise the image data may be modified without setting the
> >>> appropriate dirty bit.  That would allow data modifications to go
> >>> undetected!
> >>
> >> But data modifications can go undetected until the guest flush returns,
> >> can't they?
> > 
> > You are thinking about it from the guest perspective - if a flush has
> > not completed yet then there is no guarantee that the write has reached
> > disk.
> > 
> > But from a host perspective the dirty bitmap should be conservative so
> > that the backup application can always restore a bit-for-bit identical
> > copy of the disk image.  It would be weird if writes can sneak in
> > unnoticed.
> 
> True, but that would happen only in case the host crashes.  Even for a
> QEMU crash the changes would be safe, I think.  They would be written
> back when the persistent dirty bitmap's mmap() area is unmapped, during
> process exit.

I'd err on the side of caution, mark the persistent dirty bitmap while
QEMU is running.  Discard the file if there was a power failure.

It really depends what the dirty bitmap users are doing.  It could be
okay to have a tiny chance of missing a modification but it might not.

Stefan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-21 10:34               ` Stefan Hajnoczi
@ 2013-05-21 10:36                 ` Paolo Bonzini
  2013-05-21 10:58                   ` Dietmar Maurer
  0 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2013-05-21 10:36 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel,
	Dietmar Maurer, imain, Wenchao Xia

Il 21/05/2013 12:34, Stefan Hajnoczi ha scritto:
> On Tue, May 21, 2013 at 10:30:22AM +0200, Paolo Bonzini wrote:
>> Il 21/05/2013 09:31, Stefan Hajnoczi ha scritto:
>>> On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote:
>>>> Il 20/05/2013 08:24, Stefan Hajnoczi ha scritto:
>>>>>>> You only need to fdatasync() before every guest flush, no?
>>>>> No, you need to set the dirty bit before issuing the write on the
>>>>> host.  Otherwise the image data may be modified without setting the
>>>>> appropriate dirty bit.  That would allow data modifications to go
>>>>> undetected!
>>>>
>>>> But data modifications can go undetected until the guest flush returns,
>>>> can't they?
>>>
>>> You are thinking about it from the guest perspective - if a flush has
>>> not completed yet then there is no guarantee that the write has reached
>>> disk.
>>>
>>> But from a host perspective the dirty bitmap should be conservative so
>>> that the backup application can always restore a bit-for-bit identical
>>> copy of the disk image.  It would be weird if writes can sneak in
>>> unnoticed.
>>
>> True, but that would happen only in case the host crashes.  Even for a
>> QEMU crash the changes would be safe, I think.  They would be written
>> back when the persistent dirty bitmap's mmap() area is unmapped, during
>> process exit.
> 
> I'd err on the side of caution, mark the persistent dirty bitmap while
> QEMU is running.  Discard the file if there was a power failure.

Agreed.  Though this is something that management must do manually,
isn't it?  QEMU cannot distinguish a SIGKILL from a power failure, while
management can afford treating SIGKILL as a power failure.

> It really depends what the dirty bitmap users are doing.  It could be
> okay to have a tiny chance of missing a modification but it might not.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-21 10:36                 ` Paolo Bonzini
@ 2013-05-21 10:58                   ` Dietmar Maurer
  2013-05-22 13:43                     ` Stefan Hajnoczi
  0 siblings, 1 reply; 57+ messages in thread
From: Dietmar Maurer @ 2013-05-21 10:58 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain, Wenchao Xia

> >> True, but that would happen only in case the host crashes.  Even for
> >> a QEMU crash the changes would be safe, I think.  They would be
> >> written back when the persistent dirty bitmap's mmap() area is
> >> unmapped, during process exit.
> >
> > I'd err on the side of caution, mark the persistent dirty bitmap while
> > QEMU is running.  Discard the file if there was a power failure.
> 
> Agreed.  Though this is something that management must do manually, isn't it?
> QEMU cannot distinguish a SIGKILL from a power failure, while management
> can afford treating SIGKILL as a power failure.
> 
> > It really depends what the dirty bitmap users are doing.  It could be
> > okay to have a tiny chance of missing a modification but it might not.

I just want to mention that there is another way to do incremental backups. Instead
of using a dirty bitmap, you can compare the content, usually using a digest (SHA1) on clusters.

That way you can also implement async replication to a remote site (like MS do).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-20 11:30   ` Paolo Bonzini
@ 2013-05-21 13:34     ` Stefan Hajnoczi
  2013-05-21 15:11       ` Paolo Bonzini
  0 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-21 13:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, xiawenc, imain,
	Stefan Hajnoczi, dietmar

On Mon, May 20, 2013 at 01:30:37PM +0200, Paolo Bonzini wrote:
> Il 15/05/2013 16:34, Stefan Hajnoczi ha scritto:
> > +    wait_for_overlapping_requests(job, start, end);
> > +    cow_request_begin(&cow_request, job, start, end);
> > +
> > +    for (; start < end; start++) {
> > +        if (hbitmap_get(job->bitmap, start)) {
> > +            DPRINTF("brdv_co_backup_cow skip C%" PRId64 "\n", start);
> > +            continue; /* already copied */
> > +        }
> > +
> > +        /* immediately set bitmap (avoid coroutine race) */
> > +        hbitmap_set(job->bitmap, start, 1);
> > +
> 
> HBitmap already has code for finding the next set bit, but you're not
> using it.
> 
> If you reverse the direction of the bitmap, you can use an HBitmapIter here:
> 
> > 
> > +    start = 0;
> > +    end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
> > +                       BACKUP_BLOCKS_PER_CLUSTER);
> > +
> > +    job->bitmap = hbitmap_alloc(end, 0);
> > +
> > +    before_write = bdrv_add_before_write_cb(bs, backup_before_write);
> > +
> > +    DPRINTF("backup_run start %s %" PRId64 " %" PRId64 "\n",
> > +            bdrv_get_device_name(bs), start, end);
> > +
> > +    for (; start < end; start++) {
> 
> ... instead of iterating through each item manually.

/**
 * hbitmap_iter_init:
[...]
 * position of the iterator is also okay.  However, concurrent resetting of
 * bits can lead to unexpected behavior if the iterator has not yet reached
 * those bits. 
 */ 
void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first);

This worries me.  We would initialize the bitmap to all 1s.  Backing up
a cluster resets the bit.  But the documentation says it is not safe to
reset bits while iterating?

Stefan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-21 13:34     ` Stefan Hajnoczi
@ 2013-05-21 15:11       ` Paolo Bonzini
  2013-05-21 15:25         ` Dietmar Maurer
  2013-05-21 16:26         ` Dietmar Maurer
  0 siblings, 2 replies; 57+ messages in thread
From: Paolo Bonzini @ 2013-05-21 15:11 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, xiawenc, imain,
	Stefan Hajnoczi, dietmar

Il 21/05/2013 15:34, Stefan Hajnoczi ha scritto:
> /**
>  * hbitmap_iter_init:
> [...]
>  * position of the iterator is also okay.  However, concurrent resetting of
>  * bits can lead to unexpected behavior if the iterator has not yet reached
>  * those bits. 
>  */ 
> void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first);
> 
> This worries me.  We would initialize the bitmap to all 1s.  Backing up
> a cluster resets the bit.  But the documentation says it is not safe to
> reset bits while iterating?

Hmm, right.  But do we need the bitmap at all?  We can just use
bdrv_is_allocated like bdrv_co_do_readv does.

The code has a comment:

> +        /* immediately set bitmap (avoid coroutine race) */
> +        hbitmap_set(job->bitmap, start, 1);

but wouldn't this be avoided anyway, because of the mutual exclusion
between overlapping requests?

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-21 15:11       ` Paolo Bonzini
@ 2013-05-21 15:25         ` Dietmar Maurer
  2013-05-21 15:35           ` Paolo Bonzini
  2013-05-21 16:26         ` Dietmar Maurer
  1 sibling, 1 reply; 57+ messages in thread
From: Dietmar Maurer @ 2013-05-21 15:25 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi, xiawenc

> Hmm, right.  But do we need the bitmap at all?  We can just use
> bdrv_is_allocated like bdrv_co_do_readv does.

If a write occur, we read and backup that cluster immediately (out of order). So
I am quite sure we need the bitmap.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-21 15:25         ` Dietmar Maurer
@ 2013-05-21 15:35           ` Paolo Bonzini
  2013-05-21 15:54             ` Dietmar Maurer
  0 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2013-05-21 15:35 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Stefan Hajnoczi, xiawenc

Il 21/05/2013 17:25, Dietmar Maurer ha scritto:
>> Hmm, right.  But do we need the bitmap at all?  We can just use
>> > bdrv_is_allocated like bdrv_co_do_readv does.
> If a write occur, we read and backup that cluster immediately (out of order). So
> I am quite sure we need the bitmap.

This is the same as how copy-on-read happens during image streaming, and
it doesn't use a bitmap.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-21 15:35           ` Paolo Bonzini
@ 2013-05-21 15:54             ` Dietmar Maurer
  2013-05-21 16:03               ` Paolo Bonzini
  0 siblings, 1 reply; 57+ messages in thread
From: Dietmar Maurer @ 2013-05-21 15:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Stefan Hajnoczi, xiawenc

> >> Hmm, right.  But do we need the bitmap at all?  We can just use
> >> > bdrv_is_allocated like bdrv_co_do_readv does.
> > If a write occur, we read and backup that cluster immediately (out of
> > order). So I am quite sure we need the bitmap.
> 
> This is the same as how copy-on-read happens during image streaming, and it
> doesn't use a bitmap.

But a read does not modify the content.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-21 15:54             ` Dietmar Maurer
@ 2013-05-21 16:03               ` Paolo Bonzini
  2013-05-21 16:15                 ` Dietmar Maurer
  0 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2013-05-21 16:03 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Stefan Hajnoczi, xiawenc

Il 21/05/2013 17:54, Dietmar Maurer ha scritto:
>>>> Hmm, right.  But do we need the bitmap at all?  We can just use
>>>>> bdrv_is_allocated like bdrv_co_do_readv does.
>>> If a write occur, we read and backup that cluster immediately (out of
>>> order). So I am quite sure we need the bitmap.
>>
>> This is the same as how copy-on-read happens during image streaming, and it
>> doesn't use a bitmap.
> 
> But a read does not modify the content.

Copy-on-read modifies the topmost image even without changing the
content, just like copy-before-write modifies the backup image.

But writes to the backup copy are serialized anyway via
wait_for_overlapping_requests, so there is no problem here.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-21 16:03               ` Paolo Bonzini
@ 2013-05-21 16:15                 ` Dietmar Maurer
  0 siblings, 0 replies; 57+ messages in thread
From: Dietmar Maurer @ 2013-05-21 16:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Stefan Hajnoczi, xiawenc

> Copy-on-read modifies the topmost image even without changing the content,
> just like copy-before-write modifies the backup image.

Ah, got it. But this only works if you use a BS as target. This will not work with my
old backup driver patches (I am still not convinced about using a BS as backup target)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-21 15:11       ` Paolo Bonzini
  2013-05-21 15:25         ` Dietmar Maurer
@ 2013-05-21 16:26         ` Dietmar Maurer
  2013-05-21 16:46           ` Paolo Bonzini
  1 sibling, 1 reply; 57+ messages in thread
From: Dietmar Maurer @ 2013-05-21 16:26 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi, xiawenc

> Hmm, right.  But do we need the bitmap at all?  We can just use
> bdrv_is_allocated like bdrv_co_do_readv does.

Does that works with a nbd driver? Or does that add another RPC call (slow down)?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-21 16:26         ` Dietmar Maurer
@ 2013-05-21 16:46           ` Paolo Bonzini
  2013-05-22 13:37             ` Stefan Hajnoczi
  0 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2013-05-21 16:46 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Stefan Hajnoczi, xiawenc

Il 21/05/2013 18:26, Dietmar Maurer ha scritto:
>> Hmm, right.  But do we need the bitmap at all?  We can just use
>> > bdrv_is_allocated like bdrv_co_do_readv does.
> Does that works with a nbd driver?

Ah, right.  That's the answer.

> Or does that add another RPC call (slow down)?

It doesn't work at all.  Thus Stefan's patch is fine (except if you
don't use HBitmapIter, there's no advantage in using HBitmap; you can
use qemu/bitmap.h instead).

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
  2013-05-21 16:46           ` Paolo Bonzini
@ 2013-05-22 13:37             ` Stefan Hajnoczi
  0 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-22 13:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, xiawenc,
	imain, Dietmar Maurer

On Tue, May 21, 2013 at 06:46:39PM +0200, Paolo Bonzini wrote:
> Il 21/05/2013 18:26, Dietmar Maurer ha scritto:
> >> Hmm, right.  But do we need the bitmap at all?  We can just use
> >> > bdrv_is_allocated like bdrv_co_do_readv does.
> > Does that works with a nbd driver?
> 
> Ah, right.  That's the answer.

Yes, the target may not supported allocation info.

> > Or does that add another RPC call (slow down)?
> 
> It doesn't work at all.  Thus Stefan's patch is fine (except if you
> don't use HBitmapIter, there's no advantage in using HBitmap; you can
> use qemu/bitmap.h instead).

I chose HBitmap because bitmap.h uses the int type instead of uint64_t,
which makes it risky to use in the block layer.

Stefan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-21 10:58                   ` Dietmar Maurer
@ 2013-05-22 13:43                     ` Stefan Hajnoczi
  2013-05-22 15:10                       ` Dietmar Maurer
  2013-05-22 15:34                       ` Dietmar Maurer
  0 siblings, 2 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-22 13:43 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Paolo Bonzini, Wenchao Xia

On Tue, May 21, 2013 at 10:58:47AM +0000, Dietmar Maurer wrote:
> > >> True, but that would happen only in case the host crashes.  Even for
> > >> a QEMU crash the changes would be safe, I think.  They would be
> > >> written back when the persistent dirty bitmap's mmap() area is
> > >> unmapped, during process exit.
> > >
> > > I'd err on the side of caution, mark the persistent dirty bitmap while
> > > QEMU is running.  Discard the file if there was a power failure.
> > 
> > Agreed.  Though this is something that management must do manually, isn't it?
> > QEMU cannot distinguish a SIGKILL from a power failure, while management
> > can afford treating SIGKILL as a power failure.
> > 
> > > It really depends what the dirty bitmap users are doing.  It could be
> > > okay to have a tiny chance of missing a modification but it might not.
> 
> I just want to mention that there is another way to do incremental backups. Instead
> of using a dirty bitmap, you can compare the content, usually using a digest (SHA1) on clusters.

Reading gigabytes of data from disk is expensive though.  I guess they
keep a Merkle tree so it's easy to find out which parts of the image
must be transferred without re-reading the entire image.

That sounds like more work than a persistent dirty bitmap.  The
advantage is that while dirty bitmaps are consumed by a single user, the
Merkle tree can be used to sync up any number of replicas.

> That way you can also implement async replication to a remote site (like MS do).

Sounds like rsync.

Stefan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-22 13:43                     ` Stefan Hajnoczi
@ 2013-05-22 15:10                       ` Dietmar Maurer
  2013-05-22 15:34                       ` Dietmar Maurer
  1 sibling, 0 replies; 57+ messages in thread
From: Dietmar Maurer @ 2013-05-22 15:10 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Paolo Bonzini, Wenchao Xia

> > That way you can also implement async replication to a remote site (like MS
> do).
> 
> Sounds like rsync.

yes, but we need 'snapshots' and something more optimized (rsync compared the whole files). 
I think this can be implemented using the backup job with a specialized backup driver.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-22 13:43                     ` Stefan Hajnoczi
  2013-05-22 15:10                       ` Dietmar Maurer
@ 2013-05-22 15:34                       ` Dietmar Maurer
  2013-05-23  8:04                         ` Stefan Hajnoczi
  1 sibling, 1 reply; 57+ messages in thread
From: Dietmar Maurer @ 2013-05-22 15:34 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, imain,
	Paolo Bonzini, Wenchao Xia

> That sounds like more work than a persistent dirty bitmap.  The advantage is that
> while dirty bitmaps are consumed by a single user, the Merkle tree can be used
> to sync up any number of replicas.

I also consider it safer, because you make sure the data exists (using hash keys like SHA1).

I am unsure how you can check if a dirty bitmap contains errors, or is out of date?

Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is always related to a single image.
(consider the user remove the latest backup from the backup target). 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-22 15:34                       ` Dietmar Maurer
@ 2013-05-23  8:04                         ` Stefan Hajnoczi
  2013-05-23  8:11                           ` Dietmar Maurer
  0 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-23  8:04 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi,
	Paolo Bonzini, Wenchao Xia

On Wed, May 22, 2013 at 03:34:18PM +0000, Dietmar Maurer wrote:
> > That sounds like more work than a persistent dirty bitmap.  The advantage is that
> > while dirty bitmaps are consumed by a single user, the Merkle tree can be used
> > to sync up any number of replicas.
> 
> I also consider it safer, because you make sure the data exists (using hash keys like SHA1).
> 
> I am unsure how you can check if a dirty bitmap contains errors, or is out of date?
> 
> Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is always related to a single image.
> (consider the user remove the latest backup from the backup target). 

One disadvantage of Merkle trees is that the client becomes stateful -
the client needs to store its own Merkle tree and this requires fancier
client-side code.

It is also more expensive to update hashes than a dirty bitmap.  Not
because you need to hash data but because a small write (e.g. 1 sector)
requires that you read the surrounding sectors to recompute a hash for
the cluster.  Therefore you can expect worse guest I/O performance than
with a dirty bitmap.

I still think it's a cool idea.  Making it work well will require a lot
more effort than a dirty bitmap.

Stefan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-23  8:04                         ` Stefan Hajnoczi
@ 2013-05-23  8:11                           ` Dietmar Maurer
  2013-05-24  8:38                             ` Stefan Hajnoczi
  0 siblings, 1 reply; 57+ messages in thread
From: Dietmar Maurer @ 2013-05-23  8:11 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi,
	Paolo Bonzini, Wenchao Xia

> > I also consider it safer, because you make sure the data exists (using hash keys
> like SHA1).
> >
> > I am unsure how you can check if a dirty bitmap contains errors, or is out of
> date?
> >
> > Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is always
> related to a single image.
> > (consider the user remove the latest backup from the backup target).
> 
> One disadvantage of Merkle trees is that the client becomes stateful - the client
> needs to store its own Merkle tree and this requires fancier client-side code.

What 'client' do you talk about here?

But sure, the code gets more complex, and needs considerable amount of RAM
to store the hash keys .
 
> It is also more expensive to update hashes than a dirty bitmap.  Not because you
> need to hash data but because a small write (e.g. 1 sector) requires that you
> read the surrounding sectors to recompute a hash for the cluster.  Therefore you
> can expect worse guest I/O performance than with a dirty bitmap.

There is no need to update any hash - You only need to do that on backup - in fact, all
those things can be done by the backup driver.
 
> I still think it's a cool idea.  Making it work well will require a lot more effort than
> a dirty bitmap.

How do you re-generate a dirty bitmap after a server crash?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-23  8:11                           ` Dietmar Maurer
@ 2013-05-24  8:38                             ` Stefan Hajnoczi
  2013-05-24  9:53                               ` Dietmar Maurer
  0 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-05-24  8:38 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi,
	Paolo Bonzini, Wenchao Xia

On Thu, May 23, 2013 at 08:11:42AM +0000, Dietmar Maurer wrote:
> > > I also consider it safer, because you make sure the data exists (using hash keys
> > like SHA1).
> > >
> > > I am unsure how you can check if a dirty bitmap contains errors, or is out of
> > date?
> > >
> > > Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is always
> > related to a single image.
> > > (consider the user remove the latest backup from the backup target).
> > 
> > One disadvantage of Merkle trees is that the client becomes stateful - the client
> > needs to store its own Merkle tree and this requires fancier client-side code.
> 
> What 'client' do you talk about here?

A backup application, for example.  Previously it could simply use
api.getDirtyBlocks() -> [Sector] and it would translate into a single
QMP API call.

Now a Merkle tree needs to be stored on the client side and synced with
the server.  The client-side library becomes more complex.

> But sure, the code gets more complex, and needs considerable amount of RAM
> to store the hash keys .
>  
> > It is also more expensive to update hashes than a dirty bitmap.  Not because you
> > need to hash data but because a small write (e.g. 1 sector) requires that you
> > read the surrounding sectors to recompute a hash for the cluster.  Therefore you
> > can expect worse guest I/O performance than with a dirty bitmap.
> 
> There is no need to update any hash - You only need to do that on backup - in fact, all
> those things can be done by the backup driver.

The problem is that if you leave hash calculation until backup time then
you need to read in the entire disk image (100s of GB) from disk.  That
is slow and drains I/O resources.

Maybe the best approach is to maintain a dirty bitmap while the guest is
running, which is fairly cheap.  Then you can use the dirty bitmap to
only hash modified clusters when building the Merkle tree - this avoids
reading the entire disk image.

> > I still think it's a cool idea.  Making it work well will require a lot more effort than
> > a dirty bitmap.
> 
> How do you re-generate a dirty bitmap after a server crash?

The dirty bitmap is invalid after crash.  A full backup is required, all
clusters are considered dirty.

The simplest way to implement this is to mark the persistent bitmap
"invalid" upon the first guest write.  When QEMU is terminated cleanly,
flush all dirty bitmap updates to disk and then mark the file "valid"
again.  If QEMU finds the file is "invalid" on startup, start from
scratch.

Stefan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-05-24  8:38                             ` Stefan Hajnoczi
@ 2013-05-24  9:53                               ` Dietmar Maurer
  0 siblings, 0 replies; 57+ messages in thread
From: Dietmar Maurer @ 2013-05-24  9:53 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, imain, Stefan Hajnoczi,
	Paolo Bonzini, Wenchao Xia

> Maybe the best approach is to maintain a dirty bitmap while the guest is
> running, which is fairly cheap.  Then you can use the dirty bitmap to only hash
> modified clusters when building the Merkle tree - this avoids reading the
> entire disk image.

Yes, this is an good optimization.

> > > I still think it's a cool idea.  Making it work well will require a
> > > lot more effort than a dirty bitmap.
> >
> > How do you re-generate a dirty bitmap after a server crash?
> 
> The dirty bitmap is invalid after crash.  A full backup is required, all clusters
> are considered dirty.
> 
> The simplest way to implement this is to mark the persistent bitmap "invalid"
> upon the first guest write.  When QEMU is terminated cleanly, flush all dirty
> bitmap updates to disk and then mark the file "valid"
> again.  If QEMU finds the file is "invalid" on startup, start from scratch.

Or you can compared the hash keys in that case?

Although I guess computing all those SHA1 checksums needs a considerable amount of CPU time.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
  2013-09-02 12:57 Benoît Canet
@ 2013-09-03  7:54 ` Stefan Hajnoczi
  0 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2013-09-03  7:54 UTC (permalink / raw)
  To: Benoît Canet; +Cc: pbonzini, dietmar, qemu-devel

On Mon, Sep 02, 2013 at 02:57:23PM +0200, Benoît Canet wrote:
> 
> I don't see the point of using hashes.
> Using hashes means that at least one extra read will be done on the target to
> compute the candidate target hash.
> It's bad for a cloud provider where IOs count is a huge cost.
> 
> Another structure to replace a bitmap (smaller on the canonical case) would be
> a block table as described in the Hystor paper:
> www.cse.ohio-state.edu/~fchen/paper/papers/ics11.pdf

This is similar to syncing image formats that use a revision number for
each cluster instead of a hash.

The problem with counters is overflow.  In the case of Hystor it is not
necessary to preserve exact counts.  A dirty bitmap must mark a block
dirty if it has been modified, otherwise there is a risk of data loss.

A bit more than just counters are necessary to implement a persistent
dirty bitmap, but maybe it's possible with some additional state.

Stefan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
@ 2013-09-02 12:57 Benoît Canet
  2013-09-03  7:54 ` Stefan Hajnoczi
  0 siblings, 1 reply; 57+ messages in thread
From: Benoît Canet @ 2013-09-02 12:57 UTC (permalink / raw)
  To: dietmar; +Cc: pbonzini, qemu-devel, stefanha


I don't see the point of using hashes.
Using hashes means that at least one extra read will be done on the target to
compute the candidate target hash.
It's bad for a cloud provider where IOs count is a huge cost.

Another structure to replace a bitmap (smaller on the canonical case) would be
a block table as described in the Hystor paper:
www.cse.ohio-state.edu/~fchen/paper/papers/ics11.pdf

Best regards

Benoît

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2013-09-03  7:54 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-15 14:34 [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Stefan Hajnoczi
2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 1/8] block: add bdrv_add_before_write_cb() Stefan Hajnoczi
2013-05-15 14:42   ` Paolo Bonzini
2013-05-16  2:42     ` Wenchao Xia
2013-05-16  8:11     ` Stefan Hajnoczi
2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver Stefan Hajnoczi
2013-05-16  3:27   ` Wenchao Xia
2013-05-16  7:30     ` Stefan Hajnoczi
2013-05-20 11:30   ` Paolo Bonzini
2013-05-21 13:34     ` Stefan Hajnoczi
2013-05-21 15:11       ` Paolo Bonzini
2013-05-21 15:25         ` Dietmar Maurer
2013-05-21 15:35           ` Paolo Bonzini
2013-05-21 15:54             ` Dietmar Maurer
2013-05-21 16:03               ` Paolo Bonzini
2013-05-21 16:15                 ` Dietmar Maurer
2013-05-21 16:26         ` Dietmar Maurer
2013-05-21 16:46           ` Paolo Bonzini
2013-05-22 13:37             ` Stefan Hajnoczi
2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 3/8] block: add drive-backup QMP command Stefan Hajnoczi
2013-05-15 19:04   ` Eric Blake
2013-05-16  7:31     ` Stefan Hajnoczi
2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 4/8] qemu-iotests: add 055 drive-backup test case Stefan Hajnoczi
2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 5/8] blockdev: rename BlkTransactionStates to singular Stefan Hajnoczi
2013-05-15 19:09   ` Eric Blake
2013-05-16  2:28   ` Wenchao Xia
2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 6/8] blockdev: add DriveBackup transaction Stefan Hajnoczi
2013-05-15 19:13   ` Eric Blake
2013-05-16  7:36     ` Stefan Hajnoczi
2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 7/8] blockdev: add Abort transaction Stefan Hajnoczi
2013-05-15 19:01   ` Eric Blake
2013-05-16  2:26     ` Wenchao Xia
2013-05-16  7:37     ` Stefan Hajnoczi
2013-05-15 14:34 ` [Qemu-devel] [PATCH v3 8/8] qemu-iotests: test 'drive-backup' transaction in 055 Stefan Hajnoczi
2013-05-16  6:16 ` [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command Wenchao Xia
2013-05-16  7:47   ` Stefan Hajnoczi
2013-05-17  6:58     ` Wenchao Xia
2013-05-17  9:14       ` Stefan Hajnoczi
2013-05-21  3:25         ` Wenchao Xia
2013-05-21  7:34           ` Stefan Hajnoczi
2013-05-17 10:17     ` Paolo Bonzini
2013-05-20  6:24       ` Stefan Hajnoczi
2013-05-20  7:23         ` Paolo Bonzini
2013-05-21  7:31           ` Stefan Hajnoczi
2013-05-21  8:30             ` Paolo Bonzini
2013-05-21 10:34               ` Stefan Hajnoczi
2013-05-21 10:36                 ` Paolo Bonzini
2013-05-21 10:58                   ` Dietmar Maurer
2013-05-22 13:43                     ` Stefan Hajnoczi
2013-05-22 15:10                       ` Dietmar Maurer
2013-05-22 15:34                       ` Dietmar Maurer
2013-05-23  8:04                         ` Stefan Hajnoczi
2013-05-23  8:11                           ` Dietmar Maurer
2013-05-24  8:38                             ` Stefan Hajnoczi
2013-05-24  9:53                               ` Dietmar Maurer
2013-09-02 12:57 Benoît Canet
2013-09-03  7:54 ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.