All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD
@ 2013-07-17  9:42 Fam Zheng
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard Fam Zheng
                   ` (11 more replies)
  0 siblings, 12 replies; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

This series adds for point-in-time snapshot NBD exporting based on
blockdev-backup (variant of drive-backup with existing device as target). This
patch is built on top of imain's sync mode patches for drive-backup.

We get a thin point-in-time snapshot by COW mechanism of drive-backup, and
export it through built in NBD server. The steps are as below:

 1. (SHELL) qemu-img create -f qcow2 BACKUP.qcow2 <source size here>

    (Alternatively we can use -o backing_file=RUNNING-VM.img to omit explicitly
    providing the size by ourselves, but it's risky because RUNNING-VM.qcow2 is
    used r/w by guest. Whether or not setting backing file in the image file
    doesn't matter, as we are going to override the backing hd in the next
    step)

 2. (HMP) drive_add backing=ide0-hd0,file=BACKUP.qcow2,id=target0,if=none

    (where ide0-hd0 is the running BlockDriverState name for
    RUNNING-VM.img. This patch implements "backing=" option to override
    backing_hd for added drive)

 3. (QMP) blockdev-backup device=ide0-hd0 sync=none target=target0

    (this is the QMP command introduced by this series, which use a named
    device as target of drive-backup)

 4. (QMP) nbd-server-add device=target0

When image fleecing done:

 1. (QMP) block-job-complete device=ide0-hd0

 2. (HMP) drive_del target0

 3. (SHELL) rm BACKUP.qcow2

v2:
    * Introduce soft and hard reference count, compared to a single type of ref count in v1.
    * Add "backing=" option to drive_add.
    * Introduce QMP command "blockdev-backup"

Fam Zheng (11):
  block: replace in_use with refcnt_soft and refcnt_hard
  block: use refcnt for bs->backing_hd and bs->file
  block: use refcnt for drive_init/drive_uninit
  block: use refcnt for device attach/detach
  migration: omit drive ref as we have bdrv_ref now
  xen_disk: simplify blk_disconnect with refcnt
  block: hold hard reference for backup/mirror target
  block: simplify bdrv_drop_intermediate
  block: add assertion to check refcount before deleting
  block: add option 'backing' to -drive options
  qmp: add command 'blockdev-backup'

 block-migration.c               |   6 +-
 block.c                         | 155 +++++++++++++++++++++-------------------
 block/backup.c                  |   3 +-
 block/blkdebug.c                |   1 +
 block/blkverify.c               |   1 +
 block/mirror.c                  |   4 +-
 block/snapshot.c                |   2 +-
 block/stream.c                  |   2 +-
 block/vvfat.c                   |   1 +
 blockdev.c                      |  77 +++++++++++++++++++-
 blockjob.c                      |   6 +-
 hw/block/dataplane/virtio-blk.c |   4 +-
 hw/block/xen_disk.c             |   7 +-
 include/block/block.h           |   5 +-
 include/block/block_int.h       |   3 +-
 qapi-schema.json                |  49 +++++++++++++
 qmp-commands.hx                 |  22 ++++++
 17 files changed, 248 insertions(+), 100 deletions(-)

-- 
1.8.3.2

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
@ 2013-07-17  9:42 ` Fam Zheng
  2013-07-17 12:26   ` Paolo Bonzini
  2013-07-23  9:36   ` Stefan Hajnoczi
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 02/11] block: use refcnt for bs->backing_hd and bs->file Fam Zheng
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
to BlockDriverState, since in_use mechanism cannot provide proper
management of lifecycle when a BDS is referenced in multiple places
(e.g. pointed to by another bs's backing_hd while also used as a block
job device, in the use case of image fleecing).

The original in_use case is considered a "hard reference" in this patch,
where the bs is busy and should not be used in other tasks that require
a hard reference. (However the interface doesn't force this, caller
still need to call bdrv_in_use() to check by itself.).

A soft reference is implemented but not used yet. It will be used in
following patches to manage the lifecycle together with hard reference.

If bdrv_ref() is called on a BDS, it must be released by exactly the
same numbers of bdrv_unref() with the same "soft/hard" type, and never
call bdrv_delete() directly. If the BDS is only used locally (unnamed),
bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block-migration.c               |  4 ++--
 block.c                         | 40 +++++++++++++++++++++++++++++++---------
 blockjob.c                      |  6 +++---
 hw/block/dataplane/virtio-blk.c |  4 ++--
 include/block/block.h           |  5 +++--
 include/block/block_int.h       |  3 ++-
 6 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 2fd7699..d558410 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -321,7 +321,7 @@ static void init_blk_migration_it(void *opaque, BlockDriverState *bs)
         bmds->shared_base = block_mig_state.shared_base;
         alloc_aio_bitmap(bmds);
         drive_get_ref(drive_get_by_blockdev(bs));
-        bdrv_set_in_use(bs, 1);
+        bdrv_ref(bs, true);
 
         block_mig_state.total_sector_sum += sectors;
 
@@ -557,7 +557,7 @@ static void blk_mig_cleanup(void)
     blk_mig_lock();
     while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
         QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry);
-        bdrv_set_in_use(bmds->bs, 0);
+        bdrv_unref(bmds->bs, true);
         drive_put_ref(drive_get_by_blockdev(bmds->bs));
         g_free(bmds->aio_bitmap);
         g_free(bmds);
diff --git a/block.c b/block.c
index b560241..4170ff6 100644
--- a/block.c
+++ b/block.c
@@ -1511,8 +1511,11 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
     /* dirty bitmap */
     bs_dest->dirty_bitmap       = bs_src->dirty_bitmap;
 
+    /* reference count */
+    bs_dest->refcnt_soft        = bs_src->refcnt_soft;
+    bs_dest->refcnt_hard        = bs_src->refcnt_hard;
+
     /* job */
-    bs_dest->in_use             = bs_src->in_use;
     bs_dest->job                = bs_src->job;
 
     /* keep the same entry in bdrv_states */
@@ -1542,7 +1545,6 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
     assert(bs_new->dirty_bitmap == NULL);
     assert(bs_new->job == NULL);
     assert(bs_new->dev == NULL);
-    assert(bs_new->in_use == 0);
     assert(bs_new->io_limits_enabled == false);
     assert(bs_new->block_timer == NULL);
 
@@ -1561,7 +1563,6 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
     /* Check a few fields that should remain attached to the device */
     assert(bs_new->dev == NULL);
     assert(bs_new->job == NULL);
-    assert(bs_new->in_use == 0);
     assert(bs_new->io_limits_enabled == false);
     assert(bs_new->block_timer == NULL);
 
@@ -1598,7 +1599,6 @@ void bdrv_delete(BlockDriverState *bs)
 {
     assert(!bs->dev);
     assert(!bs->job);
-    assert(!bs->in_use);
 
     /* remove from list, if necessary */
     bdrv_make_anon(bs);
@@ -4374,15 +4374,37 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs)
     }
 }
 
-void bdrv_set_in_use(BlockDriverState *bs, int in_use)
+/* Get a soft or hard reference to bs */
+void bdrv_ref(BlockDriverState *bs, bool is_hard)
+{
+    if (is_hard) {
+        bs->refcnt_hard++;
+    } else {
+        bs->refcnt_soft++;
+    }
+}
+
+/* Release a previously grabbed reference to bs, need to specify if it is hard
+ * or soft. If after this releasing, both soft and hard reference counts are
+ * zero, the BlockDriverState is deleted. */
+void bdrv_unref(BlockDriverState *bs, bool is_hard)
 {
-    assert(bs->in_use != in_use);
-    bs->in_use = in_use;
+    if (is_hard) {
+        assert(bs->refcnt_hard > 0);
+        bs->refcnt_hard--;
+    } else {
+        assert(bs->refcnt_soft > 0);
+        bs->refcnt_soft--;
+    }
+    if (bs->refcnt_hard == 0 && bs->refcnt_soft == 0) {
+        bdrv_close(bs);
+        bdrv_delete(bs);
+    }
 }
 
-int bdrv_in_use(BlockDriverState *bs)
+bool bdrv_in_use(BlockDriverState *bs)
 {
-    return bs->in_use;
+    return bs->refcnt_hard > 0;
 }
 
 void bdrv_iostatus_enable(BlockDriverState *bs)
diff --git a/blockjob.c b/blockjob.c
index ca80df1..24f07f9 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -45,7 +45,7 @@ void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
         error_set(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
         return NULL;
     }
-    bdrv_set_in_use(bs, 1);
+    bdrv_ref(bs, true);
 
     job = g_malloc0(job_type->instance_size);
     job->job_type      = job_type;
@@ -63,7 +63,7 @@ void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
         if (error_is_set(&local_err)) {
             bs->job = NULL;
             g_free(job);
-            bdrv_set_in_use(bs, 0);
+            bdrv_unref(bs, true);
             error_propagate(errp, local_err);
             return NULL;
         }
@@ -79,7 +79,7 @@ void block_job_completed(BlockJob *job, int ret)
     job->cb(job->opaque, ret);
     bs->job = NULL;
     g_free(job);
-    bdrv_set_in_use(bs, 0);
+    bdrv_unref(bs, true);
 }
 
 void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 0356665..3075e84 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -431,7 +431,7 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk,
     s->blk = blk;
 
     /* Prevent block operations that conflict with data plane thread */
-    bdrv_set_in_use(blk->conf.bs, 1);
+    bdrv_ref(blk->conf.bs, true);
 
     error_setg(&s->migration_blocker,
             "x-data-plane does not support migration");
@@ -450,7 +450,7 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
     virtio_blk_data_plane_stop(s);
     migrate_del_blocker(s->migration_blocker);
     error_free(s->migration_blocker);
-    bdrv_set_in_use(s->blk->conf.bs, 0);
+    bdrv_unref(s->blk->conf.bs, true);
     g_free(s);
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index b6b9014..8df8794 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -354,8 +354,9 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
-void bdrv_set_in_use(BlockDriverState *bs, int in_use);
-int bdrv_in_use(BlockDriverState *bs);
+void bdrv_ref(BlockDriverState *bs, bool is_hard);
+void bdrv_unref(BlockDriverState *bs, bool is_hard);
+bool bdrv_in_use(BlockDriverState *bs);
 
 #ifdef CONFIG_LINUX_AIO
 int raw_get_aio_fd(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c6ac871..9de5a8f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -294,7 +294,8 @@ struct BlockDriverState {
     BlockDeviceIoStatus iostatus;
     char device_name[32];
     HBitmap *dirty_bitmap;
-    int in_use; /* users other than guest access, eg. block migration */
+    int refcnt_soft;
+    int refcnt_hard;
     QTAILQ_ENTRY(BlockDriverState) list;
 
     QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v2 02/11] block: use refcnt for bs->backing_hd and bs->file
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard Fam Zheng
@ 2013-07-17  9:42 ` Fam Zheng
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 03/11] block: use refcnt for drive_init/drive_uninit Fam Zheng
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c           | 18 ++++++++++++++----
 block/blkdebug.c  |  1 +
 block/blkverify.c |  1 +
 block/snapshot.c  |  2 +-
 block/stream.c    |  2 +-
 block/vvfat.c     |  1 +
 6 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 4170ff6..7b46669 100644
--- a/block.c
+++ b/block.c
@@ -932,6 +932,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options)
         bs->open_flags |= BDRV_O_NO_BACKING;
         return ret;
     }
+    bdrv_ref(bs->backing_hd, false);
     return 0;
 }
 
@@ -1075,9 +1076,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 
     if (bs->file != file) {
         bdrv_delete(file);
-        file = NULL;
     }
 
+    file = NULL;
+    bdrv_ref(bs->file, false);
+
     /* If there is a backing file, use it */
     if ((flags & BDRV_O_NO_BACKING) == 0) {
         QDict *backing_options;
@@ -1375,7 +1378,7 @@ void bdrv_close(BlockDriverState *bs)
 
     if (bs->drv) {
         if (bs->backing_hd) {
-            bdrv_delete(bs->backing_hd);
+            bdrv_unref(bs->backing_hd, false);
             bs->backing_hd = NULL;
         }
         bs->drv->bdrv_close(bs);
@@ -1399,7 +1402,7 @@ void bdrv_close(BlockDriverState *bs)
         bs->options = NULL;
 
         if (bs->file != NULL) {
-            bdrv_delete(bs->file);
+            bdrv_unref(bs->file, false);
             bs->file = NULL;
         }
     }
@@ -1587,7 +1590,11 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
 
     /* The contents of 'tmp' will become bs_top, as we are
      * swapping bs_new and bs_top contents. */
+    if (bs_top->backing_hd) {
+        bdrv_unref(bs_top->backing_hd, false);
+    }
     bs_top->backing_hd = bs_new;
+    bdrv_ref(bs_new, false);
     bs_top->open_flags &= ~BDRV_O_NO_BACKING;
     pstrcpy(bs_top->backing_file, sizeof(bs_top->backing_file),
             bs_new->filename);
@@ -2108,8 +2115,11 @@ int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
     if (ret) {
         goto exit;
     }
+    if (new_top_bs->backing_hd) {
+        bdrv_unref(new_top_bs->backing_hd, false);
+    }
     new_top_bs->backing_hd = base_bs;
-
+    bdrv_ref(base_bs, false);
 
     QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
         /* so that bdrv_close() does not recursively close the chain */
diff --git a/block/blkdebug.c b/block/blkdebug.c
index ccb627a..c4ddbac 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -389,6 +389,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags)
     if (ret < 0) {
         goto fail;
     }
+    bdrv_ref(bs->file, false);
 
     ret = 0;
 fail:
diff --git a/block/blkverify.c b/block/blkverify.c
index 1d58cc3..5996045 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -144,6 +144,7 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags)
     if (ret < 0) {
         goto fail;
     }
+    bdrv_ref(bs->file, false);
 
     /* Open the test file */
     filename = qemu_opt_get(opts, "x-image");
diff --git a/block/snapshot.c b/block/snapshot.c
index 6c6d9de..4f145fa 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -99,7 +99,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
         ret = bdrv_snapshot_goto(bs->file, snapshot_id);
         open_ret = drv->bdrv_open(bs, NULL, bs->open_flags);
         if (open_ret < 0) {
-            bdrv_delete(bs->file);
+            bdrv_unref(bs->file, false);
             bs->drv = NULL;
             return open_ret;
         }
diff --git a/block/stream.c b/block/stream.c
index 7fe9e48..bcc8dc6 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -68,7 +68,7 @@ static void close_unused_images(BlockDriverState *top, BlockDriverState *base,
         unused = intermediate;
         intermediate = intermediate->backing_hd;
         unused->backing_hd = NULL;
-        bdrv_delete(unused);
+        bdrv_unref(unused, false);
     }
     top->backing_hd = base;
 }
diff --git a/block/vvfat.c b/block/vvfat.c
index 87b0279..353d4ac 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2949,6 +2949,7 @@ static int enable_write_target(BDRVVVFATState *s)
     s->bs->backing_hd->drv = &vvfat_write_target;
     s->bs->backing_hd->opaque = g_malloc(sizeof(void*));
     *(void**)s->bs->backing_hd->opaque = s;
+    bdrv_ref(s->bs->backing_hd, false);
 
     return 0;
 }
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v2 03/11] block: use refcnt for drive_init/drive_uninit
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard Fam Zheng
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 02/11] block: use refcnt for bs->backing_hd and bs->file Fam Zheng
@ 2013-07-17  9:42 ` Fam Zheng
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 04/11] block: use refcnt for device attach/detach Fam Zheng
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index c5abd65..bb986a1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -212,7 +212,7 @@ static void bdrv_format_print(void *opaque, const char *name)
 static void drive_uninit(DriveInfo *dinfo)
 {
     qemu_opts_del(dinfo->opts);
-    bdrv_delete(dinfo->bdrv);
+    bdrv_unref(dinfo->bdrv, false);
     g_free(dinfo->id);
     QTAILQ_REMOVE(&drives, dinfo, next);
     g_free(dinfo->serial);
@@ -709,8 +709,10 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
         goto err;
     }
 
-    if (bdrv_key_required(dinfo->bdrv))
+    bdrv_ref(dinfo->bdrv, false);
+    if (bdrv_key_required(dinfo->bdrv)) {
         autostart = 0;
+    }
 
     qemu_opts_del(opts);
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v2 04/11] block: use refcnt for device attach/detach
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
                   ` (2 preceding siblings ...)
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 03/11] block: use refcnt for drive_init/drive_uninit Fam Zheng
@ 2013-07-17  9:42 ` Fam Zheng
  2013-07-23  9:44   ` Stefan Hajnoczi
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 05/11] migration: omit drive ref as we have bdrv_ref now Fam Zheng
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block.c b/block.c
index 7b46669..57a3876 100644
--- a/block.c
+++ b/block.c
@@ -1622,6 +1622,7 @@ int bdrv_attach_dev(BlockDriverState *bs, void *dev)
         return -EBUSY;
     }
     bs->dev = dev;
+    bdrv_ref(bs, false);
     bdrv_iostatus_reset(bs);
     return 0;
 }
@@ -1639,6 +1640,7 @@ void bdrv_detach_dev(BlockDriverState *bs, void *dev)
 {
     assert(bs->dev == dev);
     bs->dev = NULL;
+    bdrv_unref(bs, false);
     bs->dev_ops = NULL;
     bs->dev_opaque = NULL;
     bs->buffer_alignment = 512;
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v2 05/11] migration: omit drive ref as we have bdrv_ref now
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
                   ` (3 preceding siblings ...)
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 04/11] block: use refcnt for device attach/detach Fam Zheng
@ 2013-07-17  9:42 ` Fam Zheng
  2013-07-23  9:49   ` Stefan Hajnoczi
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 06/11] xen_disk: simplify blk_disconnect with refcnt Fam Zheng
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block-migration.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index d558410..d14f4eb 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -320,7 +320,6 @@ static void init_blk_migration_it(void *opaque, BlockDriverState *bs)
         bmds->completed_sectors = 0;
         bmds->shared_base = block_mig_state.shared_base;
         alloc_aio_bitmap(bmds);
-        drive_get_ref(drive_get_by_blockdev(bs));
         bdrv_ref(bs, true);
 
         block_mig_state.total_sector_sum += sectors;
@@ -558,7 +557,6 @@ static void blk_mig_cleanup(void)
     while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
         QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry);
         bdrv_unref(bmds->bs, true);
-        drive_put_ref(drive_get_by_blockdev(bmds->bs));
         g_free(bmds->aio_bitmap);
         g_free(bmds);
     }
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v2 06/11] xen_disk: simplify blk_disconnect with refcnt
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
                   ` (4 preceding siblings ...)
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 05/11] migration: omit drive ref as we have bdrv_ref now Fam Zheng
@ 2013-07-17  9:42 ` Fam Zheng
  2013-07-23  9:50   ` Stefan Hajnoczi
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 07/11] block: hold hard reference for backup/mirror target Fam Zheng
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

We call bdrv_attach_dev when initializing whether or not bs is created
locally, so call bdrv_detach_dev and let the refcnt handle the
lifecycle.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 hw/block/xen_disk.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 247f32f..ae17acc 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -910,12 +910,7 @@ static void blk_disconnect(struct XenDevice *xendev)
     struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
 
     if (blkdev->bs) {
-        if (!blkdev->dinfo) {
-            /* close/delete only if we created it ourself */
-            bdrv_close(blkdev->bs);
-            bdrv_detach_dev(blkdev->bs, blkdev);
-            bdrv_delete(blkdev->bs);
-        }
+        bdrv_detach_dev(blkdev->bs, blkdev);
         blkdev->bs = NULL;
     }
     xen_be_unbind_evtchn(&blkdev->xendev);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v2 07/11] block: hold hard reference for backup/mirror target
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
                   ` (5 preceding siblings ...)
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 06/11] xen_disk: simplify blk_disconnect with refcnt Fam Zheng
@ 2013-07-17  9:42 ` Fam Zheng
  2013-07-23  9:52   ` Stefan Hajnoczi
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 08/11] block: simplify bdrv_drop_intermediate Fam Zheng
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/backup.c | 3 ++-
 block/mirror.c | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 16105d4..b82f601 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -294,7 +294,7 @@ static void coroutine_fn backup_run(void *opaque)
     hbitmap_free(job->bitmap);
 
     bdrv_iostatus_disable(target);
-    bdrv_delete(target);
+    bdrv_unref(target, true);
 
     block_job_completed(&job->common, ret);
 }
@@ -332,6 +332,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
         return;
     }
 
+    bdrv_ref(target, true);
     job->on_source_error = on_source_error;
     job->on_target_error = on_target_error;
     job->target = target;
diff --git a/block/mirror.c b/block/mirror.c
index bed4a7e..decdedb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -479,8 +479,7 @@ immediate_exit:
         }
         bdrv_swap(s->target, s->common.bs);
     }
-    bdrv_close(s->target);
-    bdrv_delete(s->target);
+    bdrv_unref(s->target, true);
     block_job_completed(&s->common, ret);
 }
 
@@ -574,6 +573,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     s->granularity = granularity;
     s->buf_size = MAX(buf_size, granularity);
 
+    bdrv_ref(target, true);
     bdrv_set_dirty_tracking(bs, granularity);
     bdrv_set_enable_write_cache(s->target, true);
     bdrv_set_on_error(s->target, on_target_error, on_target_error);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v2 08/11] block: simplify bdrv_drop_intermediate
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
                   ` (6 preceding siblings ...)
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 07/11] block: hold hard reference for backup/mirror target Fam Zheng
@ 2013-07-17  9:42 ` Fam Zheng
  2013-07-24 23:16   ` Jeff Cody
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 09/11] block: add assertion to check refcount before deleting Fam Zheng
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

bdrv_drop_intermediate used a local list to iterate through backing
chain and delete each BDS. It is simplified while adopting to refcount
mechanism.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c | 71 ++++++++++-------------------------------------------------------
 1 file changed, 11 insertions(+), 60 deletions(-)

diff --git a/block.c b/block.c
index 57a3876..499de22 100644
--- a/block.c
+++ b/block.c
@@ -2027,12 +2027,6 @@ BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
     return overlay;
 }
 
-typedef struct BlkIntermediateStates {
-    BlockDriverState *bs;
-    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
-} BlkIntermediateStates;
-
-
 /*
  * Drops images above 'base' up to and including 'top', and sets the image
  * above 'top' to have base as its backing file.
@@ -2062,15 +2056,9 @@ typedef struct BlkIntermediateStates {
 int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
                            BlockDriverState *base)
 {
-    BlockDriverState *intermediate;
-    BlockDriverState *base_bs = NULL;
     BlockDriverState *new_top_bs = NULL;
-    BlkIntermediateStates *intermediate_state, *next;
     int ret = -EIO;
 
-    QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
-    QSIMPLEQ_INIT(&states_to_delete);
-
     if (!top->drv || !base->drv) {
         goto exit;
     }
@@ -2082,58 +2070,21 @@ int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
         goto exit;
     }
 
-    /* special case of new_top_bs->backing_hd already pointing to base - nothing
-     * to do, no intermediate images */
-    if (new_top_bs->backing_hd == base) {
-        ret = 0;
-        goto exit;
-    }
-
-    intermediate = top;
-
-    /* now we will go down through the list, and add each BDS we find
-     * into our deletion queue, until we hit the 'base'
-     */
-    while (intermediate) {
-        intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
-        intermediate_state->bs = intermediate;
-        QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
-
-        if (intermediate->backing_hd == base) {
-            base_bs = intermediate->backing_hd;
-            break;
+    while (new_top_bs->backing_hd && new_top_bs->backing_hd != base) {
+        BlockDriverState *backing = new_top_bs->backing_hd;
+        if (backing == NULL) {
+            goto exit;
         }
-        intermediate = intermediate->backing_hd;
-    }
-    if (base_bs == NULL) {
-        /* something went wrong, we did not end at the base. safely
-         * unravel everything, and exit with error */
-        goto exit;
+        new_top_bs->backing_hd = backing->backing_hd;
+        /* break backing_hd chain before releasing bs, so we don't free all the
+         * way up the backing chain */
+        backing->backing_hd = NULL;
+        bdrv_unref(backing, false);
     }
 
-    /* success - we can delete the intermediate states, and link top->base */
-    ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
-                                   base_bs->drv ? base_bs->drv->format_name : "");
-    if (ret) {
-        goto exit;
-    }
-    if (new_top_bs->backing_hd) {
-        bdrv_unref(new_top_bs->backing_hd, false);
-    }
-    new_top_bs->backing_hd = base_bs;
-    bdrv_ref(base_bs, false);
-
-    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
-        /* so that bdrv_close() does not recursively close the chain */
-        intermediate_state->bs->backing_hd = NULL;
-        bdrv_delete(intermediate_state->bs);
-    }
-    ret = 0;
-
+    ret = bdrv_change_backing_file(new_top_bs, base->filename,
+                                   base->drv ? base->drv->format_name : "");
 exit:
-    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
-        g_free(intermediate_state);
-    }
     return ret;
 }
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v2 09/11] block: add assertion to check refcount before deleting
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
                   ` (7 preceding siblings ...)
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 08/11] block: simplify bdrv_drop_intermediate Fam Zheng
@ 2013-07-17  9:42 ` Fam Zheng
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options Fam Zheng
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

If BDS is managed with refcnt, we should not call bdrv_delete()
directly, instead bdrv_unref() should be used. Adding assertion to
ensure this.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block.c b/block.c
index 499de22..147a448 100644
--- a/block.c
+++ b/block.c
@@ -1606,6 +1606,8 @@ void bdrv_delete(BlockDriverState *bs)
 {
     assert(!bs->dev);
     assert(!bs->job);
+    assert(!bs->refcnt_soft);
+    assert(!bs->refcnt_hard);
 
     /* remove from list, if necessary */
     bdrv_make_anon(bs);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
                   ` (8 preceding siblings ...)
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 09/11] block: add assertion to check refcount before deleting Fam Zheng
@ 2013-07-17  9:42 ` Fam Zheng
  2013-07-17 12:36   ` Paolo Bonzini
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 11/11] qmp: add command 'blockdev-backup' Fam Zheng
  2013-07-19 10:41 ` [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Wenchao Xia
  11 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

This option allows overriding backing hd of drive. If the target drive
exists, it's referenced as the backing file and refcount incremented.

Example:
    qemu-system-x86_64 -drive \
        file.filename=foo.qcow2,if=none,id=foo \
        -drive file=bar.qcow2,backing=foo

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 147a448..31fab07 100644
--- a/block.c
+++ b/block.c
@@ -1083,12 +1083,30 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 
     /* If there is a backing file, use it */
     if ((flags & BDRV_O_NO_BACKING) == 0) {
-        QDict *backing_options;
-
-        extract_subqdict(options, &backing_options, "backing.");
-        ret = bdrv_open_backing_file(bs, backing_options);
-        if (ret < 0) {
-            goto close_and_fail;
+        const char *backing_drive;
+        backing_drive = qdict_get_try_str(options, "backing");
+        if (backing_drive) {
+            bs->backing_hd = bdrv_find(backing_drive);
+            if (bs->backing_hd) {
+                bdrv_ref(bs->backing_hd, false);
+                pstrcpy(bs->backing_file, sizeof(bs->backing_file),
+                        bs->backing_hd->filename);
+                pstrcpy(bs->backing_format, sizeof(bs->backing_format),
+                        bs->backing_hd->drv->format_name);
+            } else {
+                qerror_report(ERROR_CLASS_DEVICE_NOT_FOUND,
+                        "Can't find drive with name %s\n", backing_drive);
+                ret = -EINVAL;
+                goto close_and_fail;
+            }
+            qdict_del(options, "backing");
+        } else {
+            QDict *backing_options;
+            extract_subqdict(options, &backing_options, "backing.");
+            ret = bdrv_open_backing_file(bs, backing_options);
+            if (ret < 0) {
+                goto close_and_fail;
+            }
         }
     }
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v2 11/11] qmp: add command 'blockdev-backup'
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
                   ` (9 preceding siblings ...)
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options Fam Zheng
@ 2013-07-17  9:42 ` Fam Zheng
  2013-07-17 12:44   ` Eric Blake
  2013-07-19 10:41 ` [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Wenchao Xia
  11 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-17  9:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, hbrock, rjones, imain, stefanha, pbonzini

Similar to drive-backup, but this command uses a device id as target
instead of creating/opening an image file.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c       | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 49 ++++++++++++++++++++++++++++++++++++++
 qmp-commands.hx  | 22 ++++++++++++++++++
 3 files changed, 142 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index bb986a1..a3fa5d1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1517,7 +1517,78 @@ void qmp_drive_backup(const char *device, const char *target,
         error_propagate(errp, local_err);
         return;
     }
+    /* Grab a reference so hotplug does not delete the BlockDriverState from
+     * underneath us.
+     */
+    drive_get_ref(drive_get_by_blockdev(bs));
+}
+
+void qmp_blockdev_backup(const char *device, const char *target,
+                       enum MirrorSyncMode sync,
+                       bool has_speed, int64_t speed,
+                       bool has_on_source_error,
+                       BlockdevOnError on_source_error,
+                       bool has_on_target_error,
+                       BlockdevOnError on_target_error,
+                       Error **errp)
+{
+    BlockDriverState *bs;
+    BlockDriverState *target_bs;
+    Error *local_err = NULL;
+
+    if (sync != MIRROR_SYNC_MODE_FULL) {
+        error_setg(errp, "only sync mode 'full' is currently supported");
+        return;
+    }
+    if (!has_speed) {
+        speed = 0;
+    }
+    if (!has_on_source_error) {
+        on_source_error = BLOCKDEV_ON_ERROR_REPORT;
+    }
+    if (!has_on_target_error) {
+        on_target_error = BLOCKDEV_ON_ERROR_REPORT;
+    }
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+
+    if (!bdrv_is_inserted(bs)) {
+        error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
+        return;
+    }
+
+    if (bdrv_in_use(bs)) {
+        error_set(errp, QERR_DEVICE_IN_USE, device);
+        return;
+    }
+
+    target_bs = bdrv_find(target);
+    if (!target_bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, target);
+        return;
+    }
+
+    if (!bdrv_is_inserted(target_bs)) {
+        error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, target);
+        return;
+    }
+
+    if (bdrv_in_use(target_bs)) {
+        error_set(errp, QERR_DEVICE_IN_USE, target);
+        return;
+    }
 
+    backup_start(bs, target_bs, speed, on_source_error, on_target_error,
+                 block_job_cb, bs, &local_err);
+    if (local_err != NULL) {
+        bdrv_delete(target_bs);
+        error_propagate(errp, local_err);
+        return;
+    }
     /* Grab a reference so hotplug does not delete the BlockDriverState from
      * underneath us.
      */
diff --git a/qapi-schema.json b/qapi-schema.json
index 7b9fef1..3b7cfcc 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1665,6 +1665,40 @@
             '*on-target-error': 'BlockdevOnError' } }
 
 ##
+# @BlockdevBackup
+#
+# @device: the name of the device which should be copied.
+#
+# @target: the name of the backup target device
+#
+# @sync: what parts of the disk image should be copied to the destination
+#        (all the disk, only the sectors allocated in the topmost image, or
+#        only new I/O).
+#
+# @speed: #optional the maximum speed, in bytes per second
+#
+# @on-source-error: #optional the action to take on an error on the source,
+#                   default 'report'.  'stop' and 'enospc' can only be used
+#                   if the block device supports io-status (see BlockInfo).
+#
+# @on-target-error: #optional the action to take on an error on the target,
+#                   default 'report' (no limitations, since this applies to
+#                   a different block device than @device).
+#
+# Note that @on-source-error and @on-target-error only affect background I/O.
+# If an error occurs during a guest write request, the device's rerror/werror
+# actions will be used.
+#
+# Since: 1.6
+##
+{ 'type': 'BlockdevBackup',
+  'data': { 'device': 'str', 'target': 'str',
+            'sync': 'MirrorSyncMode',
+            '*speed': 'int',
+            '*on-source-error': 'BlockdevOnError',
+            '*on-target-error': 'BlockdevOnError' } }
+
+##
 # @Abort
 #
 # This action can be used to test transaction failure.
@@ -1806,6 +1840,21 @@
 { 'command': 'drive-backup', 'data': 'DriveBackup' }
 
 ##
+# @blockdev-backup
+#
+# Block device version for drive-backup command. Use existing device as target
+# of backup.
+#
+# For the arguments, see the documentation of BlockdevBackup.
+#
+# Returns: nothing on success
+#          If @device or @target is not a valid block device, DeviceNotFound
+#
+# Since 1.6
+##
+{ 'command': 'blockdev-backup', 'data': 'BlockdevBackup' }
+
+##
 # @drive-mirror
 #
 # Start mirroring a block device's writes to a new destination.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e075df4..ecd9f64 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -959,6 +959,28 @@ Example:
 -> { "execute": "drive-backup", "arguments": { "device": "drive0",
                                                "target": "backup.img" } }
 <- { "return": {} }
+
+EQMP
+
+    {
+        .name       = "blockdev-backup",
+        .args_type  = "sync:s,device:B,target:B,speed:i?,"
+                      "on-source-error:s?,on-target-error:s?",
+        .mhandler.cmd_new = qmp_marshal_input_blockdev_backup,
+    },
+
+SQMP
+blockdev-backup
+------------
+
+The device version of drive-backup: this command doesn't create new image for
+target, instead it uses a existing named device as target.
+
+Example:
+-> { "execute": "blockdev-backup", "arguments": { "device": "drive0",
+                                                  "target": "drive1" } }
+<- { "return": {} }
+
 EQMP
 
     {
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard Fam Zheng
@ 2013-07-17 12:26   ` Paolo Bonzini
  2013-07-18  4:53     ` Fam Zheng
  2013-07-23  9:36   ` Stefan Hajnoczi
  1 sibling, 1 reply; 46+ messages in thread
From: Paolo Bonzini @ 2013-07-17 12:26 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, stefanha

Il 17/07/2013 11:42, Fam Zheng ha scritto:
> Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
> to BlockDriverState, since in_use mechanism cannot provide proper
> management of lifecycle when a BDS is referenced in multiple places
> (e.g. pointed to by another bs's backing_hd while also used as a block
> job device, in the use case of image fleecing).
> 
> The original in_use case is considered a "hard reference" in this patch,
> where the bs is busy and should not be used in other tasks that require
> a hard reference. (However the interface doesn't force this, caller
> still need to call bdrv_in_use() to check by itself.).
> 
> A soft reference is implemented but not used yet. It will be used in
> following patches to manage the lifecycle together with hard reference.
> 
> If bdrv_ref() is called on a BDS, it must be released by exactly the
> same numbers of bdrv_unref() with the same "soft/hard" type, and never
> call bdrv_delete() directly. If the BDS is only used locally (unnamed),
> bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().

Pardon the stupid question: why do we need a "soft" reference?  We have
these behaviors:

- a sync:'none' backup job doesn't stop until cancelled

- cancelling or completing the job closes the target device, which in
turn stops the NBD server and removes the need to access the source
device via backing_hd

- ejecting the source device cancels the job, which in turn also removes
the need to access the source device via backing_hd

blockdev-backup can sipmly add a reference to the DriveInfo of the
target that it receives.  backup_start has to choose between using
drive_put_ref and bdrv_delete on the target, and can do so using
drive_get_by_blockdev.

backup_start can also mark the target in use, so that drive_del is
prevented while backup_start is running.  After the target device is
closed (and not in_use anymore), all you need to do is invoke drive_del
to delete the BDS.

I don't dislike the series, but I wonder if all this machinery is
actually needed in 1.6.

Paolo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options Fam Zheng
@ 2013-07-17 12:36   ` Paolo Bonzini
  2013-07-17 12:58     ` Kevin Wolf
  2013-07-22  6:07     ` Fam Zheng
  0 siblings, 2 replies; 46+ messages in thread
From: Paolo Bonzini @ 2013-07-17 12:36 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, stefanha

Il 17/07/2013 11:42, Fam Zheng ha scritto:
> This option allows overriding backing hd of drive. If the target drive
> exists, it's referenced as the backing file and refcount incremented.
> 
> Example:
>     qemu-system-x86_64 -drive \
>         file.filename=foo.qcow2,if=none,id=foo \
>         -drive file=bar.qcow2,backing=foo

I guess this is where we need the soft reference.

This has a _lot_ of potential for misuse, I think Kevin bashed me and
Federico very heavily when we tried to do something similar.

block/backup.c is the right place where we can override the backing hd
of the drive.  Perhaps we can add a way to open a file with
BDRV_O_NO_BACKING from the command line.

Paolo

> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c | 30 ++++++++++++++++++++++++------
>  1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 147a448..31fab07 100644
> --- a/block.c
> +++ b/block.c
> @@ -1083,12 +1083,30 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
>  
>      /* If there is a backing file, use it */
>      if ((flags & BDRV_O_NO_BACKING) == 0) {
> -        QDict *backing_options;
> -
> -        extract_subqdict(options, &backing_options, "backing.");
> -        ret = bdrv_open_backing_file(bs, backing_options);
> -        if (ret < 0) {
> -            goto close_and_fail;
> +        const char *backing_drive;
> +        backing_drive = qdict_get_try_str(options, "backing");
> +        if (backing_drive) {
> +            bs->backing_hd = bdrv_find(backing_drive);
> +            if (bs->backing_hd) {
> +                bdrv_ref(bs->backing_hd, false);
> +                pstrcpy(bs->backing_file, sizeof(bs->backing_file),
> +                        bs->backing_hd->filename);
> +                pstrcpy(bs->backing_format, sizeof(bs->backing_format),
> +                        bs->backing_hd->drv->format_name);
> +            } else {
> +                qerror_report(ERROR_CLASS_DEVICE_NOT_FOUND,
> +                        "Can't find drive with name %s\n", backing_drive);
> +                ret = -EINVAL;
> +                goto close_and_fail;
> +            }
> +            qdict_del(options, "backing");
> +        } else {
> +            QDict *backing_options;
> +            extract_subqdict(options, &backing_options, "backing.");
> +            ret = bdrv_open_backing_file(bs, backing_options);
> +            if (ret < 0) {
> +                goto close_and_fail;
> +            }
>          }
>      }
>  
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/11] qmp: add command 'blockdev-backup'
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 11/11] qmp: add command 'blockdev-backup' Fam Zheng
@ 2013-07-17 12:44   ` Eric Blake
  2013-07-18  4:41     ` Fam Zheng
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2013-07-17 12:44 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, stefanha, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1169 bytes --]

On 07/17/2013 03:42 AM, Fam Zheng wrote:
> Similar to drive-backup, but this command uses a device id as target
> instead of creating/opening an image file.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  blockdev.c       | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json | 49 ++++++++++++++++++++++++++++++++++++++
>  qmp-commands.hx  | 22 ++++++++++++++++++
>  3 files changed, 142 insertions(+)
> 

> +++ b/qapi-schema.json
> @@ -1665,6 +1665,40 @@
>              '*on-target-error': 'BlockdevOnError' } }
>  
>  ##
> +# @BlockdevBackup
> +#

> +{ 'type': 'BlockdevBackup',
> +  'data': { 'device': 'str', 'target': 'str',
> +            'sync': 'MirrorSyncMode',
> +            '*speed': 'int',
> +            '*on-source-error': 'BlockdevOnError',
> +            '*on-target-error': 'BlockdevOnError' } }

Seems okay.  But what is missing is the addition of this type into the
union used for 'transaction' - shouldn't it be possible to mix this with
other transaction capabilities?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options
  2013-07-17 12:36   ` Paolo Bonzini
@ 2013-07-17 12:58     ` Kevin Wolf
  2013-07-17 13:13       ` Paolo Bonzini
  2013-07-22  6:07     ` Fam Zheng
  1 sibling, 1 reply; 46+ messages in thread
From: Kevin Wolf @ 2013-07-17 12:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Fam Zheng, hbrock, qemu-devel, rjones, imain, stefanha

Am 17.07.2013 um 14:36 hat Paolo Bonzini geschrieben:
> Il 17/07/2013 11:42, Fam Zheng ha scritto:
> > This option allows overriding backing hd of drive. If the target drive
> > exists, it's referenced as the backing file and refcount incremented.
> > 
> > Example:
> >     qemu-system-x86_64 -drive \
> >         file.filename=foo.qcow2,if=none,id=foo \
> >         -drive file=bar.qcow2,backing=foo
> 
> I guess this is where we need the soft reference.
> 
> This has a _lot_ of potential for misuse, I think Kevin bashed me and
> Federico very heavily when we tried to do something similar.

Not sure what exactly I "bashed" you for, but I think this is really the
right thing to do, and it's the general direction we're headed for with
blockdev-add. The user manages his BlockDriverStates and connects them
as he sees fit. The defaults of qemu are only used when the user doesn't
override them (and libvirt is going to override most to use fd passing).

My expectation is, admittedly, that it's hard to get right from where we
stand today, because of the coupling of BlockDriverStates with guest
devices, but with the refcounting patches (which I haven't reviewed
yet), maybe one of the biggest concerns is addressed.

This is basically restarting the discussion where I suggested to give
the targets of a block job names so that they can be reused. It's about
the same kind of misuse that becomes possible and that we need to
protect against.

Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options
  2013-07-17 12:58     ` Kevin Wolf
@ 2013-07-17 13:13       ` Paolo Bonzini
  2013-07-17 13:48         ` Kevin Wolf
  0 siblings, 1 reply; 46+ messages in thread
From: Paolo Bonzini @ 2013-07-17 13:13 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Fam Zheng, hbrock, qemu-devel, rjones, imain, stefanha

Il 17/07/2013 14:58, Kevin Wolf ha scritto:
> Am 17.07.2013 um 14:36 hat Paolo Bonzini geschrieben:
>> Il 17/07/2013 11:42, Fam Zheng ha scritto:
>>> This option allows overriding backing hd of drive. If the target drive
>>> exists, it's referenced as the backing file and refcount incremented.
>>>
>>> Example:
>>>     qemu-system-x86_64 -drive \
>>>         file.filename=foo.qcow2,if=none,id=foo \
>>>         -drive file=bar.qcow2,backing=foo
>>
>> I guess this is where we need the soft reference.
>>
>> This has a _lot_ of potential for misuse, I think Kevin bashed me and
>> Federico very heavily when we tried to do something similar.
> 
> Not sure what exactly I "bashed" you for

Doing strange things with bs->backing_hd (blkmirror comes to mind).

> This is basically restarting the discussion where I suggested to give
> the targets of a block job names so that they can be reused. It's about
> the same kind of misuse that becomes possible and that we need to
> protect against.

Yes.  But then I'm not sure why we need to rush in blockdev-backup now.
 Instead we can simply make drive-backup optionally give a name to the
target.

I understand this is the right thing to do long term, but pre-opening of
the target is not really needed for fleecing.

Paolo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options
  2013-07-17 13:13       ` Paolo Bonzini
@ 2013-07-17 13:48         ` Kevin Wolf
  2013-07-17 14:16           ` Paolo Bonzini
  0 siblings, 1 reply; 46+ messages in thread
From: Kevin Wolf @ 2013-07-17 13:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Fam Zheng, hbrock, qemu-devel, rjones, imain, stefanha

Am 17.07.2013 um 15:13 hat Paolo Bonzini geschrieben:
> Il 17/07/2013 14:58, Kevin Wolf ha scritto:
> > Am 17.07.2013 um 14:36 hat Paolo Bonzini geschrieben:
> >> Il 17/07/2013 11:42, Fam Zheng ha scritto:
> >>> This option allows overriding backing hd of drive. If the target drive
> >>> exists, it's referenced as the backing file and refcount incremented.
> >>>
> >>> Example:
> >>>     qemu-system-x86_64 -drive \
> >>>         file.filename=foo.qcow2,if=none,id=foo \
> >>>         -drive file=bar.qcow2,backing=foo
> >>
> >> I guess this is where we need the soft reference.
> >>
> >> This has a _lot_ of potential for misuse, I think Kevin bashed me and
> >> Federico very heavily when we tried to do something similar.
> > 
> > Not sure what exactly I "bashed" you for
> 
> Doing strange things with bs->backing_hd (blkmirror comes to mind).
> 
> > This is basically restarting the discussion where I suggested to give
> > the targets of a block job names so that they can be reused. It's about
> > the same kind of misuse that becomes possible and that we need to
> > protect against.
> 
> Yes.  But then I'm not sure why we need to rush in blockdev-backup now.
>  Instead we can simply make drive-backup optionally give a name to the
> target.
> 
> I understand this is the right thing to do long term, but pre-opening of
> the target is not really needed for fleecing.

So for how much longer should we plan to procrastinate? (I know, not an
entirely fair question, but we have to make the step at some point)

I guess we can give a name to the target, and we can make drive-backup
automatically connect the target with the original as its backing file
(still needs the refcounting, by the way). But is giving a name to the
target not enough to allow "interesting" things to be done? I don't
remember the details from the mirroring discussion, but it seems it were
enough that you didn't want to do it.

And we'll want to reference existing BDSes as backing/protocol files in
blockdev-add soon anyway, so if we decide against it here, it's just
moving from Fam's to-do list to mine...

So no, I'm not totally comfortable with allowing it, but not allowing it
isn't really an option either.

Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options
  2013-07-17 13:48         ` Kevin Wolf
@ 2013-07-17 14:16           ` Paolo Bonzini
  2013-07-17 15:09             ` Kevin Wolf
  0 siblings, 1 reply; 46+ messages in thread
From: Paolo Bonzini @ 2013-07-17 14:16 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Fam Zheng, hbrock, qemu-devel, rjones, imain, stefanha

Il 17/07/2013 15:48, Kevin Wolf ha scritto:
>> I understand this is the right thing to do long term, but pre-opening of
>> the target is not really needed for fleecing.
> 
> So for how much longer should we plan to procrastinate? (I know, not an
> entirely fair question, but we have to make the step at some point)

If we bring it up during soft freeze, we will procrastinate it a lot. :)
 If we bring it up at the beginning of a release cycle, the wait could
be as short as 1 month...

> I guess we can give a name to the target, and we can make drive-backup
> automatically connect the target with the original as its backing file
> (still needs the refcounting, by the way).

No, it doesn't need the refcounting (see my reply to the cover letter).
 In his next submission of drive-backup sync modes, Ian is already going
to handle the automatic connection of the target with the original.

> But is giving a name to the
> target not enough to allow "interesting" things to be done? I don't
> remember the details from the mirroring discussion, but it seems it were
> enough that you didn't want to do it.

Yes, but this time we have to bite the bullet on that one at least,
because we have no other choice (we want to do at least one
"interesting" thing, namely connect to it with the NBD server).

> And we'll want to reference existing BDSes as backing/protocol files in
> blockdev-add soon anyway, so if we decide against it here, it's just
> moving from Fam's to-do list to mine...

We can reconsider these very patches in 1 month.  It's just the timing,
combined with the fact that this is not necessary for fleecing, that I'm
uncomfortable with.

Paolo

> So no, I'm not totally comfortable with allowing it, but not allowing it
> isn't really an option either.
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options
  2013-07-17 14:16           ` Paolo Bonzini
@ 2013-07-17 15:09             ` Kevin Wolf
  2013-07-17 15:23               ` Paolo Bonzini
  2013-07-23 20:07               ` Ian Main
  0 siblings, 2 replies; 46+ messages in thread
From: Kevin Wolf @ 2013-07-17 15:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Fam Zheng, hbrock, qemu-devel, rjones, imain, stefanha

Am 17.07.2013 um 16:16 hat Paolo Bonzini geschrieben:
> Il 17/07/2013 15:48, Kevin Wolf ha scritto:
> >> I understand this is the right thing to do long term, but pre-opening of
> >> the target is not really needed for fleecing.
> > 
> > So for how much longer should we plan to procrastinate? (I know, not an
> > entirely fair question, but we have to make the step at some point)
> 
> If we bring it up during soft freeze, we will procrastinate it a lot. :)
>  If we bring it up at the beginning of a release cycle, the wait could
> be as short as 1 month...
> 
> > And we'll want to reference existing BDSes as backing/protocol files in
> > blockdev-add soon anyway, so if we decide against it here, it's just
> > moving from Fam's to-do list to mine...
> 
> We can reconsider these very patches in 1 month.  It's just the timing,
> combined with the fact that this is not necessary for fleecing, that I'm
> uncomfortable with.

Okay, I see where this is going. Let me reinforce one fundamental policy
that you may not like. I've rejected patches based on it before and I'll
likely have to do it again in the future. This is it:

    I'm not going to compromise on upstream APIs to accommodate
    downstream schedules.

If doing things properly means moving fleecing to 1.7, so be it. There's
no pressure to have it in upstream 1.6. It might mean downstream patches
for some distro, but that's not a valid point in upstream design
discussions.

Now is the right time to discuss design changes for 1.7, so that as soon
as block-next starts to come to life (i.e. beginning of August), we can
start with implementing this at the earliest possible point in the 1.7
release cycle.

> > I guess we can give a name to the target, and we can make drive-backup
> > automatically connect the target with the original as its backing file
> > (still needs the refcounting, by the way).
> 
> No, it doesn't need the refcounting (see my reply to the cover letter).
>  In his next submission of drive-backup sync modes, Ian is already going
> to handle the automatic connection of the target with the original.

Okay, I'll have a look, but I can't imagine how it work without
refcounting. As soon as it has a name, you can attach the target to a
guest, nbd server, start block jobs and do all kinds of fun with it so
that taking it away when the source goes away becomes problematic. But
I'll have a look at the patches and hope that the commit messages
explain the details.

> > But is giving a name to the
> > target not enough to allow "interesting" things to be done? I don't
> > remember the details from the mirroring discussion, but it seems it were
> > enough that you didn't want to do it.
> 
> Yes, but this time we have to bite the bullet on that one at least,
> because we have no other choice (we want to do at least one
> "interesting" thing, namely connect to it with the NBD server).

Yes, like I said, we might not feel comfortable with enabling these
cases, but not enabling them isn't an option either. So now is the time
to do the real thing.

Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options
  2013-07-17 15:09             ` Kevin Wolf
@ 2013-07-17 15:23               ` Paolo Bonzini
  2013-07-23 20:07               ` Ian Main
  1 sibling, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2013-07-17 15:23 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Fam Zheng, hbrock, qemu-devel, rjones, imain, stefanha

Il 17/07/2013 17:09, Kevin Wolf ha scritto:
> Am 17.07.2013 um 16:16 hat Paolo Bonzini geschrieben:
>> Il 17/07/2013 15:48, Kevin Wolf ha scritto:
>>>> I understand this is the right thing to do long term, but pre-opening of
>>>> the target is not really needed for fleecing.
>>>
>>> So for how much longer should we plan to procrastinate? (I know, not an
>>> entirely fair question, but we have to make the step at some point)
>>
>> If we bring it up during soft freeze, we will procrastinate it a lot. :)
>>  If we bring it up at the beginning of a release cycle, the wait could
>> be as short as 1 month...
>>
>>> And we'll want to reference existing BDSes as backing/protocol files in
>>> blockdev-add soon anyway, so if we decide against it here, it's just
>>> moving from Fam's to-do list to mine...
>>
>> We can reconsider these very patches in 1 month.  It's just the timing,
>> combined with the fact that this is not necessary for fleecing, that I'm
>> uncomfortable with.
> 
> Okay, I see where this is going. Let me reinforce one fundamental policy
> that you may not like.

No, I like it actually.  In no way I want to force fleecing in 1.6.  And
this is by no means about downstream schedules, I don't care about them
(not even sure which downstream you're referring to :)).

However, this API seems (a) overengineered for the purposes of fleecing;
(b) ignoring what Ian is doing to support sync modes in block/backup.c.

>>> I guess we can give a name to the target, and we can make drive-backup
>>> automatically connect the target with the original as its backing file
>>> (still needs the refcounting, by the way).
>>
>> No, it doesn't need the refcounting (see my reply to the cover letter).
>>  In his next submission of drive-backup sync modes, Ian is already going
>> to handle the automatic connection of the target with the original.
> 
> Okay, I'll have a look, but I can't imagine how it work without
> refcounting.  As soon as it has a name, you can attach the target to a
> guest, nbd server, start block jobs and do all kinds of fun with it so
> that taking it away when the source goes away becomes problematic.

- Attach the target to a guest: yes, though I wouldn't do it because the
backup job can bdrv_close it under the guest's feet.  Deletion would be
protected by DriveInfo's refcount.

- NBD server: handles closing just fine

- start block jobs: job will be cancelled as soon as the backup job
bdrv_close's the target

- all kinds of fun with it so that taking it away when the source goes
away becomes problematic: taking away the source takes away the target
too, so it is in the same ballpark as attaching the target to a guest
(i.e. be prepared to have I/O that starts to fail, and remember to use
drive_get/put_ref).

>> Yes, but this time we have to bite the bullet on that one at least,
>> because we have no other choice (we want to do at least one
>> "interesting" thing, namely connect to it with the NBD server).
> 
> Yes, like I said, we might not feel comfortable with enabling these
> cases, but not enabling them isn't an option either. So now is the time
> to do the real thing.

I agree and understand.  At the same time, we already have an (ugly)
mechanism for "soft" reference counts, namely DriveInfo.  So I don't
care about the order between "move refcnt from DriveInfo to BDS" and
"support fleecing with the backup job", but they should be two
completely separate series.  Also, the right API shouldn't be influenced
by which part goes in first.

Paolo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/11] qmp: add command 'blockdev-backup'
  2013-07-17 12:44   ` Eric Blake
@ 2013-07-18  4:41     ` Fam Zheng
  2013-07-19 10:16       ` Wenchao Xia
  0 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-18  4:41 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, stefanha, pbonzini

On Wed, 07/17 06:44, Eric Blake wrote:
> On 07/17/2013 03:42 AM, Fam Zheng wrote:
> > Similar to drive-backup, but this command uses a device id as target
> > instead of creating/opening an image file.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  blockdev.c       | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  qapi-schema.json | 49 ++++++++++++++++++++++++++++++++++++++
> >  qmp-commands.hx  | 22 ++++++++++++++++++
> >  3 files changed, 142 insertions(+)
> > 
> 
> > +++ b/qapi-schema.json
> > @@ -1665,6 +1665,40 @@
> >              '*on-target-error': 'BlockdevOnError' } }
> >  
> >  ##
> > +# @BlockdevBackup
> > +#
> 
> > +{ 'type': 'BlockdevBackup',
> > +  'data': { 'device': 'str', 'target': 'str',
> > +            'sync': 'MirrorSyncMode',
> > +            '*speed': 'int',
> > +            '*on-source-error': 'BlockdevOnError',
> > +            '*on-target-error': 'BlockdevOnError' } }
> 
> Seems okay.  But what is missing is the addition of this type into the
> union used for 'transaction' - shouldn't it be possible to mix this with
> other transaction capabilities?
> 

Should be possible, as users may want point-in-time snapshot of multiple
disks. If this API looks OK, I will include it into transaction in the
next version.

-- 
Fam

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
  2013-07-17 12:26   ` Paolo Bonzini
@ 2013-07-18  4:53     ` Fam Zheng
  0 siblings, 0 replies; 46+ messages in thread
From: Fam Zheng @ 2013-07-18  4:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, stefanha

On Wed, 07/17 14:26, Paolo Bonzini wrote:
> Il 17/07/2013 11:42, Fam Zheng ha scritto:
> > Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
> > to BlockDriverState, since in_use mechanism cannot provide proper
> > management of lifecycle when a BDS is referenced in multiple places
> > (e.g. pointed to by another bs's backing_hd while also used as a block
> > job device, in the use case of image fleecing).
> > 
> > The original in_use case is considered a "hard reference" in this patch,
> > where the bs is busy and should not be used in other tasks that require
> > a hard reference. (However the interface doesn't force this, caller
> > still need to call bdrv_in_use() to check by itself.).
> > 
> > A soft reference is implemented but not used yet. It will be used in
> > following patches to manage the lifecycle together with hard reference.
> > 
> > If bdrv_ref() is called on a BDS, it must be released by exactly the
> > same numbers of bdrv_unref() with the same "soft/hard" type, and never
> > call bdrv_delete() directly. If the BDS is only used locally (unnamed),
> > bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().
> 
> Pardon the stupid question: why do we need a "soft" reference?

Added in other patch of the series:

    drive_add if=none,file=foo.qcow2,backing=bar
                                     ^^^^^^^^^^^

This is the reason, with it we can add arbitrary devices referencing bar
as backing_hd, which we can't backtrack from "bar" when user runs
drive_del, so we can't simply delete. Either we close all children (and
descendants), or we keep it until no one references it.

In this context, it's solved with reference count. Why not hard? Becasue
hard ref blocks block job.

-- 
Fam

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/11] qmp: add command 'blockdev-backup'
  2013-07-18  4:41     ` Fam Zheng
@ 2013-07-19 10:16       ` Wenchao Xia
  2013-07-23 10:10         ` Stefan Hajnoczi
  0 siblings, 1 reply; 46+ messages in thread
From: Wenchao Xia @ 2013-07-19 10:16 UTC (permalink / raw)
  To: famz; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, stefanha, pbonzini

于 2013-7-18 12:41, Fam Zheng 写道:
> On Wed, 07/17 06:44, Eric Blake wrote:
>> On 07/17/2013 03:42 AM, Fam Zheng wrote:
>>> Similar to drive-backup, but this command uses a device id as target
>>> instead of creating/opening an image file.
>>>
>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>> ---
>>>   blockdev.c       | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   qapi-schema.json | 49 ++++++++++++++++++++++++++++++++++++++
>>>   qmp-commands.hx  | 22 ++++++++++++++++++
>>>   3 files changed, 142 insertions(+)
>>>
>>
>>> +++ b/qapi-schema.json
>>> @@ -1665,6 +1665,40 @@
>>>               '*on-target-error': 'BlockdevOnError' } }
>>>
>>>   ##
>>> +# @BlockdevBackup
>>> +#
>>
>>> +{ 'type': 'BlockdevBackup',
>>> +  'data': { 'device': 'str', 'target': 'str',
>>> +            'sync': 'MirrorSyncMode',
>>> +            '*speed': 'int',
>>> +            '*on-source-error': 'BlockdevOnError',
>>> +            '*on-target-error': 'BlockdevOnError' } }
>>
>> Seems okay.  But what is missing is the addition of this type into the
>> union used for 'transaction' - shouldn't it be possible to mix this with
>> other transaction capabilities?
>>
>
> Should be possible, as users may want point-in-time snapshot of multiple
> disks. If this API looks OK, I will include it into transaction in the
> next version.
>
   Instead of add this new API, how about extend Driver-backup API? This
patch is basically doing similar things with driver-backup, extension
of old API will save trouble to do same things driver-backup already
does, such as support qmp_transaction.

-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD
  2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
                   ` (10 preceding siblings ...)
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 11/11] qmp: add command 'blockdev-backup' Fam Zheng
@ 2013-07-19 10:41 ` Wenchao Xia
  2013-07-23  1:52   ` Wenchao Xia
  11 siblings, 1 reply; 46+ messages in thread
From: Wenchao Xia @ 2013-07-19 10:41 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, stefanha, pbonzini

于 2013-7-17 17:42, Fam Zheng 写道:
> This series adds for point-in-time snapshot NBD exporting based on
> blockdev-backup (variant of drive-backup with existing device as target). This
> patch is built on top of imain's sync mode patches for drive-backup.
> 
> We get a thin point-in-time snapshot by COW mechanism of drive-backup, and
> export it through built in NBD server. The steps are as below:
> 
  Compared with driver-backup, this approach is more flexible in some way,
since user can read the content and copy out what he need, with a
sacrifice of write of old data to temporary file. Overall it seems good.

  Dirty change tracking is what miss for a full solution, which
naturally exist in backing chain snapshot.

> 


-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options
  2013-07-17 12:36   ` Paolo Bonzini
  2013-07-17 12:58     ` Kevin Wolf
@ 2013-07-22  6:07     ` Fam Zheng
  2013-07-23 19:57       ` Ian Main
  1 sibling, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-22  6:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, stefanha

On Wed, 07/17 14:36, Paolo Bonzini wrote:
> Il 17/07/2013 11:42, Fam Zheng ha scritto:
> > This option allows overriding backing hd of drive. If the target drive
> > exists, it's referenced as the backing file and refcount incremented.
> > 
> > Example:
> >     qemu-system-x86_64 -drive \
> >         file.filename=foo.qcow2,if=none,id=foo \
> >         -drive file=bar.qcow2,backing=foo
> 
> I guess this is where we need the soft reference.
> 
> This has a _lot_ of potential for misuse, I think Kevin bashed me and
> Federico very heavily when we tried to do something similar.
> 
> block/backup.c is the right place where we can override the backing hd
> of the drive.  Perhaps we can add a way to open a file with
> BDRV_O_NO_BACKING from the command line.

OK. If we get the override in block/backup.c, the only thing we need is
naming the target so we can add it to nbd server. If refcounting,
overriding backing with option and blockdev-backup are still good to
have, I can split them and targeting to future releases.

-- 
Fam

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD
  2013-07-19 10:41 ` [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Wenchao Xia
@ 2013-07-23  1:52   ` Wenchao Xia
  2013-07-23  6:35     ` Paolo Bonzini
  0 siblings, 1 reply; 46+ messages in thread
From: Wenchao Xia @ 2013-07-23  1:52 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, stefanha, pbonzini

于 2013-7-19 18:41, Wenchao Xia 写道:
> 于 2013-7-17 17:42, Fam Zheng 写道:
>> This series adds for point-in-time snapshot NBD exporting based on
>> blockdev-backup (variant of drive-backup with existing device as target). This
>> patch is built on top of imain's sync mode patches for drive-backup.
>>
>> We get a thin point-in-time snapshot by COW mechanism of drive-backup, and
>> export it through built in NBD server. The steps are as below:
>>
>    Compared with driver-backup, this approach is more flexible in some way,
> since user can read the content and copy out what he need, with a
> sacrifice of write of old data to temporary file. Overall it seems good.
> 
>    Dirty change tracking is what miss for a full solution, which
> naturally exist in backing chain snapshot.
> 
>>
> 
> 
  Any one is doing dirty tracking feature? If no, I'd like to draft a
version based on add-cow.

-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD
  2013-07-23  1:52   ` Wenchao Xia
@ 2013-07-23  6:35     ` Paolo Bonzini
  0 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2013-07-23  6:35 UTC (permalink / raw)
  To: Wenchao Xia; +Cc: kwolf, Fam Zheng, hbrock, qemu-devel, rjones, imain, stefanha

Il 23/07/2013 03:52, Wenchao Xia ha scritto:
> >    Dirty change tracking is what miss for a full solution, which
> > naturally exist in backing chain snapshot.
> 
>   Any one is doing dirty tracking feature? If no, I'd like to draft a
> version based on add-cow.

I had started it, I can dig it up.  Remind me if I don't write about it
in a couple of days.

Paolo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard Fam Zheng
  2013-07-17 12:26   ` Paolo Bonzini
@ 2013-07-23  9:36   ` Stefan Hajnoczi
  2013-07-23 10:32     ` Fam Zheng
  1 sibling, 1 reply; 46+ messages in thread
From: Stefan Hajnoczi @ 2013-07-23  9:36 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, pbonzini

On Wed, Jul 17, 2013 at 05:42:06PM +0800, Fam Zheng wrote:
> Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
> to BlockDriverState, since in_use mechanism cannot provide proper
> management of lifecycle when a BDS is referenced in multiple places
> (e.g. pointed to by another bs's backing_hd while also used as a block
> job device, in the use case of image fleecing).
> 
> The original in_use case is considered a "hard reference" in this patch,
> where the bs is busy and should not be used in other tasks that require
> a hard reference. (However the interface doesn't force this, caller
> still need to call bdrv_in_use() to check by itself.).
> 
> A soft reference is implemented but not used yet. It will be used in
> following patches to manage the lifecycle together with hard reference.
> 
> If bdrv_ref() is called on a BDS, it must be released by exactly the
> same numbers of bdrv_unref() with the same "soft/hard" type, and never
> call bdrv_delete() directly. If the BDS is only used locally (unnamed),
> bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().

It is risky to keep bdrv_delete() public.  I suggest replacing
bdrv_delete() callers with bdrv_unref() and then making bdrv_delete()
static in block.c.

This way it is impossible to make the mistake of calling bdrv_delete()
on a BDS which has refcnt > 1.

I don't really understand this patch.  There are now two separate
refcounts.  They must both reach 0 for deletion to occur.  I think
you plan to treat the "hard" refcount like the in_use counter (there
should only be 0 or 1 refs) but you don't enforce it.  It seems cleaner
to keep in_use separate: let in_use callers take a refcount and also set
in_use.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/11] block: use refcnt for device attach/detach
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 04/11] block: use refcnt for device attach/detach Fam Zheng
@ 2013-07-23  9:44   ` Stefan Hajnoczi
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2013-07-23  9:44 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, pbonzini

On Wed, Jul 17, 2013 at 05:42:09PM +0800, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 7b46669..57a3876 100644
> --- a/block.c
> +++ b/block.c
> @@ -1622,6 +1622,7 @@ int bdrv_attach_dev(BlockDriverState *bs, void *dev)
>          return -EBUSY;
>      }
>      bs->dev = dev;
> +    bdrv_ref(bs, false);
>      bdrv_iostatus_reset(bs);
>      return 0;
>  }
> @@ -1639,6 +1640,7 @@ void bdrv_detach_dev(BlockDriverState *bs, void *dev)
>  {
>      assert(bs->dev == dev);
>      bs->dev = NULL;
> +    bdrv_unref(bs, false);
>      bs->dev_ops = NULL;
>      bs->dev_opaque = NULL;
>      bs->buffer_alignment = 512;

I'm not 100% sure about this.  sd_init() and
ide_init2_with_non_qdev_drives() call bdrv_attach_dev_nofail() but I
don't see a bdrv_detach_dev() call.

It may be necessary to audit the lifecycle of emulated storage
controllers more closely before the refcounts work correctly.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/11] migration: omit drive ref as we have bdrv_ref now
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 05/11] migration: omit drive ref as we have bdrv_ref now Fam Zheng
@ 2013-07-23  9:49   ` Stefan Hajnoczi
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2013-07-23  9:49 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, pbonzini

On Wed, Jul 17, 2013 at 05:42:10PM +0800, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block-migration.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/block-migration.c b/block-migration.c
> index d558410..d14f4eb 100644
> --- a/block-migration.c
> +++ b/block-migration.c
> @@ -320,7 +320,6 @@ static void init_blk_migration_it(void *opaque, BlockDriverState *bs)
>          bmds->completed_sectors = 0;
>          bmds->shared_base = block_mig_state.shared_base;
>          alloc_aio_bitmap(bmds);
> -        drive_get_ref(drive_get_by_blockdev(bs));
>          bdrv_ref(bs, true);
>  
>          block_mig_state.total_sector_sum += sectors;
> @@ -558,7 +557,6 @@ static void blk_mig_cleanup(void)
>      while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
>          QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry);
>          bdrv_unref(bmds->bs, true);
> -        drive_put_ref(drive_get_by_blockdev(bmds->bs));
>          g_free(bmds->aio_bitmap);
>          g_free(bmds);
>      }
> -- 
> 1.8.3.2

The key information here is that block-migration.c does not actually use
DriveInfo anywhere.  Hence it's safe to drop this code since we really
only cared about referencing BDS.

I suggest including an explanation like this in the commit description.
I had to audit the code to check whether the DriveInfo was used anywhere
else.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/11] xen_disk: simplify blk_disconnect with refcnt
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 06/11] xen_disk: simplify blk_disconnect with refcnt Fam Zheng
@ 2013-07-23  9:50   ` Stefan Hajnoczi
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2013-07-23  9:50 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, pbonzini

On Wed, Jul 17, 2013 at 05:42:11PM +0800, Fam Zheng wrote:
> We call bdrv_attach_dev when initializing whether or not bs is created
> locally, so call bdrv_detach_dev and let the refcnt handle the
> lifecycle.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  hw/block/xen_disk.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 247f32f..ae17acc 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -910,12 +910,7 @@ static void blk_disconnect(struct XenDevice *xendev)
>      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
>  
>      if (blkdev->bs) {
> -        if (!blkdev->dinfo) {
> -            /* close/delete only if we created it ourself */
> -            bdrv_close(blkdev->bs);
> -            bdrv_detach_dev(blkdev->bs, blkdev);
> -            bdrv_delete(blkdev->bs);
> -        }
> +        bdrv_detach_dev(blkdev->bs, blkdev);
>          blkdev->bs = NULL;
>      }
>      xen_be_unbind_evtchn(&blkdev->xendev);

This reminds me that bdrv_detach_dev() needs a comment documenting that
it decrements the refcount; bs may be deleted when the function returns
and must not be accessed anymore.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/11] block: hold hard reference for backup/mirror target
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 07/11] block: hold hard reference for backup/mirror target Fam Zheng
@ 2013-07-23  9:52   ` Stefan Hajnoczi
  2013-07-25  6:08     ` Fam Zheng
  0 siblings, 1 reply; 46+ messages in thread
From: Stefan Hajnoczi @ 2013-07-23  9:52 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, pbonzini

On Wed, Jul 17, 2013 at 05:42:12PM +0800, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/backup.c | 3 ++-
>  block/mirror.c | 4 ++--
>  2 files changed, 4 insertions(+), 3 deletions(-)

Should we update the blockjob.c in_use code instead of adding
refcounting to specific block jobs?  This ought to be handled
generically for all block jobs.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/11] qmp: add command 'blockdev-backup'
  2013-07-19 10:16       ` Wenchao Xia
@ 2013-07-23 10:10         ` Stefan Hajnoczi
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2013-07-23 10:10 UTC (permalink / raw)
  To: Wenchao Xia; +Cc: kwolf, famz, hbrock, qemu-devel, rjones, imain, pbonzini

On Fri, Jul 19, 2013 at 06:16:55PM +0800, Wenchao Xia wrote:
> 于 2013-7-18 12:41, Fam Zheng 写道:
> >On Wed, 07/17 06:44, Eric Blake wrote:
> >>On 07/17/2013 03:42 AM, Fam Zheng wrote:
> >>>Similar to drive-backup, but this command uses a device id as target
> >>>instead of creating/opening an image file.
> >>>
> >>>Signed-off-by: Fam Zheng <famz@redhat.com>
> >>>---
> >>>  blockdev.c       | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  qapi-schema.json | 49 ++++++++++++++++++++++++++++++++++++++
> >>>  qmp-commands.hx  | 22 ++++++++++++++++++
> >>>  3 files changed, 142 insertions(+)
> >>>
> >>
> >>>+++ b/qapi-schema.json
> >>>@@ -1665,6 +1665,40 @@
> >>>              '*on-target-error': 'BlockdevOnError' } }
> >>>
> >>>  ##
> >>>+# @BlockdevBackup
> >>>+#
> >>
> >>>+{ 'type': 'BlockdevBackup',
> >>>+  'data': { 'device': 'str', 'target': 'str',
> >>>+            'sync': 'MirrorSyncMode',
> >>>+            '*speed': 'int',
> >>>+            '*on-source-error': 'BlockdevOnError',
> >>>+            '*on-target-error': 'BlockdevOnError' } }
> >>
> >>Seems okay.  But what is missing is the addition of this type into the
> >>union used for 'transaction' - shouldn't it be possible to mix this with
> >>other transaction capabilities?
> >>
> >
> >Should be possible, as users may want point-in-time snapshot of multiple
> >disks. If this API looks OK, I will include it into transaction in the
> >next version.
> >
>   Instead of add this new API, how about extend Driver-backup API? This
> patch is basically doing similar things with driver-backup, extension
> of old API will save trouble to do same things driver-backup already
> does, such as support qmp_transaction.

The rationale was that drive-* commands should be high-level and can
create image files.  blockdev-* commands should be low-level and work on
an existing -drive.

The reason for keeping two separate commands is to avoid adding in
parameters that work at different levels (filename vs drive name).

In terms of API design I think drive- + blockdev- really is cleaner.
Kevin can explain in more detail if I got something wrong.

Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
  2013-07-23  9:36   ` Stefan Hajnoczi
@ 2013-07-23 10:32     ` Fam Zheng
  2013-07-23 13:34       ` Stefan Hajnoczi
  0 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-23 10:32 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, pbonzini

On Tue, 07/23 11:36, Stefan Hajnoczi wrote:
> On Wed, Jul 17, 2013 at 05:42:06PM +0800, Fam Zheng wrote:
> > Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
> > to BlockDriverState, since in_use mechanism cannot provide proper
> > management of lifecycle when a BDS is referenced in multiple places
> > (e.g. pointed to by another bs's backing_hd while also used as a block
> > job device, in the use case of image fleecing).
> > 
> > The original in_use case is considered a "hard reference" in this patch,
> > where the bs is busy and should not be used in other tasks that require
> > a hard reference. (However the interface doesn't force this, caller
> > still need to call bdrv_in_use() to check by itself.).
> > 
> > A soft reference is implemented but not used yet. It will be used in
> > following patches to manage the lifecycle together with hard reference.
> > 
> > If bdrv_ref() is called on a BDS, it must be released by exactly the
> > same numbers of bdrv_unref() with the same "soft/hard" type, and never
> > call bdrv_delete() directly. If the BDS is only used locally (unnamed),
> > bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().
> 
> It is risky to keep bdrv_delete() public.  I suggest replacing
> bdrv_delete() callers with bdrv_unref() and then making bdrv_delete()
> static in block.c.
> 
> This way it is impossible to make the mistake of calling bdrv_delete()
> on a BDS which has refcnt > 1.
> 
> I don't really understand this patch.  There are now two separate
> refcounts.  They must both reach 0 for deletion to occur.  I think
> you plan to treat the "hard" refcount like the in_use counter (there
> should only be 0 or 1 refs) but you don't enforce it.  It seems cleaner
> to keep in_use separate: let in_use callers take a refcount and also set
> in_use.

OK, I like your ideas, make bdrv_delete private is much cleaner. Will
fix in next revision.

I plan to make it like this:

    /* soft ref */
    void bdrv_{ref,unref}(bs)

    /* hard ref */
    bool bdrv_hard_{ref,unref}(bs)

usage:
    bs = bdrv_new()
    <implicit bdrv_ref(bs) called>
    ...
    bdrv_unref(bs)
    <automatically deleted here>

or with hard ref:
    bs = bdrv_new()
    <implicit bdrv_ref() called>

    bdrv_hard_ref(bs)
    ...
    bdrv_hard_unref(bs)

    bdrv_unref(bs)
    <automatically deleted here>

The second bdrv_hard_ref call to a bs returns false, caller check the
return value.

-- 
Fam

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
  2013-07-23 10:32     ` Fam Zheng
@ 2013-07-23 13:34       ` Stefan Hajnoczi
  2013-07-24  0:39         ` Fam Zheng
  0 siblings, 1 reply; 46+ messages in thread
From: Stefan Hajnoczi @ 2013-07-23 13:34 UTC (permalink / raw)
  To: Fam Zheng
  Cc: kwolf, hbrock, qemu-devel, rjones, imain, Stefan Hajnoczi, pbonzini

On Tue, Jul 23, 2013 at 06:32:25PM +0800, Fam Zheng wrote:
> On Tue, 07/23 11:36, Stefan Hajnoczi wrote:
> > On Wed, Jul 17, 2013 at 05:42:06PM +0800, Fam Zheng wrote:
> > > Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
> > > to BlockDriverState, since in_use mechanism cannot provide proper
> > > management of lifecycle when a BDS is referenced in multiple places
> > > (e.g. pointed to by another bs's backing_hd while also used as a block
> > > job device, in the use case of image fleecing).
> > > 
> > > The original in_use case is considered a "hard reference" in this patch,
> > > where the bs is busy and should not be used in other tasks that require
> > > a hard reference. (However the interface doesn't force this, caller
> > > still need to call bdrv_in_use() to check by itself.).
> > > 
> > > A soft reference is implemented but not used yet. It will be used in
> > > following patches to manage the lifecycle together with hard reference.
> > > 
> > > If bdrv_ref() is called on a BDS, it must be released by exactly the
> > > same numbers of bdrv_unref() with the same "soft/hard" type, and never
> > > call bdrv_delete() directly. If the BDS is only used locally (unnamed),
> > > bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().
> > 
> > It is risky to keep bdrv_delete() public.  I suggest replacing
> > bdrv_delete() callers with bdrv_unref() and then making bdrv_delete()
> > static in block.c.
> > 
> > This way it is impossible to make the mistake of calling bdrv_delete()
> > on a BDS which has refcnt > 1.
> > 
> > I don't really understand this patch.  There are now two separate
> > refcounts.  They must both reach 0 for deletion to occur.  I think
> > you plan to treat the "hard" refcount like the in_use counter (there
> > should only be 0 or 1 refs) but you don't enforce it.  It seems cleaner
> > to keep in_use separate: let in_use callers take a refcount and also set
> > in_use.
> 
> OK, I like your ideas, make bdrv_delete private is much cleaner. Will
> fix in next revision.
> 
> I plan to make it like this:
> 
>     /* soft ref */
>     void bdrv_{ref,unref}(bs)
> 
>     /* hard ref */
>     bool bdrv_hard_{ref,unref}(bs)
> 
> usage:
>     bs = bdrv_new()
>     <implicit bdrv_ref(bs) called>
>     ...
>     bdrv_unref(bs)
>     <automatically deleted here>
> 
> or with hard ref:
>     bs = bdrv_new()
>     <implicit bdrv_ref() called>
> 
>     bdrv_hard_ref(bs)
>     ...
>     bdrv_hard_unref(bs)
> 
>     bdrv_unref(bs)
>     <automatically deleted here>
> 
> The second bdrv_hard_ref call to a bs returns false, caller check the
> return value.

Why is hard ref necessary when we already have
bdrv_in_use()/bdrv_set_in_use()?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options
  2013-07-22  6:07     ` Fam Zheng
@ 2013-07-23 19:57       ` Ian Main
  0 siblings, 0 replies; 46+ messages in thread
From: Ian Main @ 2013-07-23 19:57 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, stefanha, Paolo Bonzini

On Mon, Jul 22, 2013 at 02:07:15PM +0800, Fam Zheng wrote:
> On Wed, 07/17 14:36, Paolo Bonzini wrote:
> > Il 17/07/2013 11:42, Fam Zheng ha scritto:
> > > This option allows overriding backing hd of drive. If the target drive
> > > exists, it's referenced as the backing file and refcount incremented.
> > > 
> > > Example:
> > >     qemu-system-x86_64 -drive \
> > >         file.filename=foo.qcow2,if=none,id=foo \
> > >         -drive file=bar.qcow2,backing=foo
> > 
> > I guess this is where we need the soft reference.
> > 
> > This has a _lot_ of potential for misuse, I think Kevin bashed me and
> > Federico very heavily when we tried to do something similar.
> > 
> > block/backup.c is the right place where we can override the backing hd
> > of the drive.  Perhaps we can add a way to open a file with
> > BDRV_O_NO_BACKING from the command line.
> 
> OK. If we get the override in block/backup.c, the only thing we need is
> naming the target so we can add it to nbd server. If refcounting,
> overriding backing with option and blockdev-backup are still good to
> have, I can split them and targeting to future releases.

In my latest patchset I have attempted to override the backing hd of the
target.  It is a little over my head however so any input there would be
appreciated.  It seems to me like applying that patch would get us a
working baseline and then you can add the more fancy stuff on top of
that?  That sound reasonable?

	Ian

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options
  2013-07-17 15:09             ` Kevin Wolf
  2013-07-17 15:23               ` Paolo Bonzini
@ 2013-07-23 20:07               ` Ian Main
  1 sibling, 0 replies; 46+ messages in thread
From: Ian Main @ 2013-07-23 20:07 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Fam Zheng, hbrock, qemu-devel, rjones, stefanha, Paolo Bonzini

On Wed, Jul 17, 2013 at 05:09:05PM +0200, Kevin Wolf wrote:
> Am 17.07.2013 um 16:16 hat Paolo Bonzini geschrieben:
> > Il 17/07/2013 15:48, Kevin Wolf ha scritto:
> > >> I understand this is the right thing to do long term, but pre-opening of
> > >> the target is not really needed for fleecing.
> > > 
> > > So for how much longer should we plan to procrastinate? (I know, not an
> > > entirely fair question, but we have to make the step at some point)
> > 
> > If we bring it up during soft freeze, we will procrastinate it a lot. :)
> >  If we bring it up at the beginning of a release cycle, the wait could
> > be as short as 1 month...
> > 
> > > And we'll want to reference existing BDSes as backing/protocol files in
> > > blockdev-add soon anyway, so if we decide against it here, it's just
> > > moving from Fam's to-do list to mine...
> > 
> > We can reconsider these very patches in 1 month.  It's just the timing,
> > combined with the fact that this is not necessary for fleecing, that I'm
> > uncomfortable with.
> 
> Okay, I see where this is going. Let me reinforce one fundamental policy
> that you may not like. I've rejected patches based on it before and I'll
> likely have to do it again in the future. This is it:
> 
>     I'm not going to compromise on upstream APIs to accommodate
>     downstream schedules.
> 
> If doing things properly means moving fleecing to 1.7, so be it. There's
> no pressure to have it in upstream 1.6. It might mean downstream patches
> for some distro, but that's not a valid point in upstream design
> discussions.

I'll just throw out there that *I'm* feeling pressure for 1.6, but at
the same time I understand your concerns.  Some feel that independent of
distro concerns this is a feature gap for KVM/qemu.

It seems to me a practical approach of taking what is needed to get this
working while also putting out patches for doing it right would work out
well in the end.

Anyway, just my 2c.  I am just helping out and I know you guys have to
deal with the long term effects so I respect your decisions. 

	Ian

> Now is the right time to discuss design changes for 1.7, so that as soon
> as block-next starts to come to life (i.e. beginning of August), we can
> start with implementing this at the earliest possible point in the 1.7
> release cycle.
> 
> > > I guess we can give a name to the target, and we can make drive-backup
> > > automatically connect the target with the original as its backing file
> > > (still needs the refcounting, by the way).
> > 
> > No, it doesn't need the refcounting (see my reply to the cover letter).
> >  In his next submission of drive-backup sync modes, Ian is already going
> > to handle the automatic connection of the target with the original.
> 
> Okay, I'll have a look, but I can't imagine how it work without
> refcounting. As soon as it has a name, you can attach the target to a
> guest, nbd server, start block jobs and do all kinds of fun with it so
> that taking it away when the source goes away becomes problematic. But
> I'll have a look at the patches and hope that the commit messages
> explain the details.
> 
> > > But is giving a name to the
> > > target not enough to allow "interesting" things to be done? I don't
> > > remember the details from the mirroring discussion, but it seems it were
> > > enough that you didn't want to do it.
> > 
> > Yes, but this time we have to bite the bullet on that one at least,
> > because we have no other choice (we want to do at least one
> > "interesting" thing, namely connect to it with the NBD server).
> 
> Yes, like I said, we might not feel comfortable with enabling these
> cases, but not enabling them isn't an option either. So now is the time
> to do the real thing.
> 
> Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
  2013-07-23 13:34       ` Stefan Hajnoczi
@ 2013-07-24  0:39         ` Fam Zheng
  2013-07-24  7:35           ` Stefan Hajnoczi
  0 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-24  0:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, hbrock, qemu-devel, rjones, imain, Stefan Hajnoczi, pbonzini

On Tue, 07/23 15:34, Stefan Hajnoczi wrote:
> On Tue, Jul 23, 2013 at 06:32:25PM +0800, Fam Zheng wrote:
> > On Tue, 07/23 11:36, Stefan Hajnoczi wrote:
> > > On Wed, Jul 17, 2013 at 05:42:06PM +0800, Fam Zheng wrote:
> > > > Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
> > > > to BlockDriverState, since in_use mechanism cannot provide proper
> > > > management of lifecycle when a BDS is referenced in multiple places
> > > > (e.g. pointed to by another bs's backing_hd while also used as a block
> > > > job device, in the use case of image fleecing).
> > > > 
> > > > The original in_use case is considered a "hard reference" in this patch,
> > > > where the bs is busy and should not be used in other tasks that require
> > > > a hard reference. (However the interface doesn't force this, caller
> > > > still need to call bdrv_in_use() to check by itself.).
> > > > 
> > > > A soft reference is implemented but not used yet. It will be used in
> > > > following patches to manage the lifecycle together with hard reference.
> > > > 
> > > > If bdrv_ref() is called on a BDS, it must be released by exactly the
> > > > same numbers of bdrv_unref() with the same "soft/hard" type, and never
> > > > call bdrv_delete() directly. If the BDS is only used locally (unnamed),
> > > > bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().
> > > 
> > > It is risky to keep bdrv_delete() public.  I suggest replacing
> > > bdrv_delete() callers with bdrv_unref() and then making bdrv_delete()
> > > static in block.c.
> > > 
> > > This way it is impossible to make the mistake of calling bdrv_delete()
> > > on a BDS which has refcnt > 1.
> > > 
> > > I don't really understand this patch.  There are now two separate
> > > refcounts.  They must both reach 0 for deletion to occur.  I think
> > > you plan to treat the "hard" refcount like the in_use counter (there
> > > should only be 0 or 1 refs) but you don't enforce it.  It seems cleaner
> > > to keep in_use separate: let in_use callers take a refcount and also set
> > > in_use.
> > 
> > OK, I like your ideas, make bdrv_delete private is much cleaner. Will
> > fix in next revision.
> > 
> > I plan to make it like this:
> > 
> >     /* soft ref */
> >     void bdrv_{ref,unref}(bs)
> > 
> >     /* hard ref */
> >     bool bdrv_hard_{ref,unref}(bs)
> > 
> > usage:
> >     bs = bdrv_new()
> >     <implicit bdrv_ref(bs) called>
> >     ...
> >     bdrv_unref(bs)
> >     <automatically deleted here>
> > 
> > or with hard ref:
> >     bs = bdrv_new()
> >     <implicit bdrv_ref() called>
> > 
> >     bdrv_hard_ref(bs)
> >     ...
> >     bdrv_hard_unref(bs)
> > 
> >     bdrv_unref(bs)
> >     <automatically deleted here>
> > 
> > The second bdrv_hard_ref call to a bs returns false, caller check the
> > return value.
> 
> Why is hard ref necessary when we already have
> bdrv_in_use()/bdrv_set_in_use()?

Keeping the names of bdrv_in_use() and bdrv_set_in_use() is noting
wrong, if no more than one "hard ref" is enforced. However I think we
should manage lifecycle with both bdrv_ref and bdrv_set_in_use, so name
them both ref sounds consistent: make it clearer to caller both hard ref
(in_use) and soft ref preserve the bs from being released.

-- 
Fam

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
  2013-07-24  0:39         ` Fam Zheng
@ 2013-07-24  7:35           ` Stefan Hajnoczi
  2013-07-24  7:44             ` Fam Zheng
  0 siblings, 1 reply; 46+ messages in thread
From: Stefan Hajnoczi @ 2013-07-24  7:35 UTC (permalink / raw)
  To: Fam Zheng
  Cc: kwolf, hbrock, qemu-devel, rjones, imain, Stefan Hajnoczi, pbonzini

On Wed, Jul 24, 2013 at 08:39:53AM +0800, Fam Zheng wrote:
> On Tue, 07/23 15:34, Stefan Hajnoczi wrote:
> > On Tue, Jul 23, 2013 at 06:32:25PM +0800, Fam Zheng wrote:
> > > On Tue, 07/23 11:36, Stefan Hajnoczi wrote:
> > > > On Wed, Jul 17, 2013 at 05:42:06PM +0800, Fam Zheng wrote:
> > > > > Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
> > > > > to BlockDriverState, since in_use mechanism cannot provide proper
> > > > > management of lifecycle when a BDS is referenced in multiple places
> > > > > (e.g. pointed to by another bs's backing_hd while also used as a block
> > > > > job device, in the use case of image fleecing).
> > > > > 
> > > > > The original in_use case is considered a "hard reference" in this patch,
> > > > > where the bs is busy and should not be used in other tasks that require
> > > > > a hard reference. (However the interface doesn't force this, caller
> > > > > still need to call bdrv_in_use() to check by itself.).
> > > > > 
> > > > > A soft reference is implemented but not used yet. It will be used in
> > > > > following patches to manage the lifecycle together with hard reference.
> > > > > 
> > > > > If bdrv_ref() is called on a BDS, it must be released by exactly the
> > > > > same numbers of bdrv_unref() with the same "soft/hard" type, and never
> > > > > call bdrv_delete() directly. If the BDS is only used locally (unnamed),
> > > > > bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().
> > > > 
> > > > It is risky to keep bdrv_delete() public.  I suggest replacing
> > > > bdrv_delete() callers with bdrv_unref() and then making bdrv_delete()
> > > > static in block.c.
> > > > 
> > > > This way it is impossible to make the mistake of calling bdrv_delete()
> > > > on a BDS which has refcnt > 1.
> > > > 
> > > > I don't really understand this patch.  There are now two separate
> > > > refcounts.  They must both reach 0 for deletion to occur.  I think
> > > > you plan to treat the "hard" refcount like the in_use counter (there
> > > > should only be 0 or 1 refs) but you don't enforce it.  It seems cleaner
> > > > to keep in_use separate: let in_use callers take a refcount and also set
> > > > in_use.
> > > 
> > > OK, I like your ideas, make bdrv_delete private is much cleaner. Will
> > > fix in next revision.
> > > 
> > > I plan to make it like this:
> > > 
> > >     /* soft ref */
> > >     void bdrv_{ref,unref}(bs)
> > > 
> > >     /* hard ref */
> > >     bool bdrv_hard_{ref,unref}(bs)
> > > 
> > > usage:
> > >     bs = bdrv_new()
> > >     <implicit bdrv_ref(bs) called>
> > >     ...
> > >     bdrv_unref(bs)
> > >     <automatically deleted here>
> > > 
> > > or with hard ref:
> > >     bs = bdrv_new()
> > >     <implicit bdrv_ref() called>
> > > 
> > >     bdrv_hard_ref(bs)
> > >     ...
> > >     bdrv_hard_unref(bs)
> > > 
> > >     bdrv_unref(bs)
> > >     <automatically deleted here>
> > > 
> > > The second bdrv_hard_ref call to a bs returns false, caller check the
> > > return value.
> > 
> > Why is hard ref necessary when we already have
> > bdrv_in_use()/bdrv_set_in_use()?
> 
> Keeping the names of bdrv_in_use() and bdrv_set_in_use() is noting
> wrong, if no more than one "hard ref" is enforced. However I think we
> should manage lifecycle with both bdrv_ref and bdrv_set_in_use, so name
> them both ref sounds consistent: make it clearer to caller both hard ref
> (in_use) and soft ref preserve the bs from being released.

I actually find "hard"/"soft" ref naming confusing and prefer keeping
bdrv_in_use().  Refcount is for object lifetime whereas in_use is for
"busy" status.

Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
  2013-07-24  7:35           ` Stefan Hajnoczi
@ 2013-07-24  7:44             ` Fam Zheng
  2013-07-25  7:52               ` Stefan Hajnoczi
  0 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-24  7:44 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, hbrock, qemu-devel, rjones, imain, Stefan Hajnoczi, pbonzini

On Wed, 07/24 09:35, Stefan Hajnoczi wrote:
> On Wed, Jul 24, 2013 at 08:39:53AM +0800, Fam Zheng wrote:
> > On Tue, 07/23 15:34, Stefan Hajnoczi wrote:
> > > On Tue, Jul 23, 2013 at 06:32:25PM +0800, Fam Zheng wrote:
> > > > On Tue, 07/23 11:36, Stefan Hajnoczi wrote:
> > > > > On Wed, Jul 17, 2013 at 05:42:06PM +0800, Fam Zheng wrote:
> > > > > > Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
> > > > > > to BlockDriverState, since in_use mechanism cannot provide proper
> > > > > > management of lifecycle when a BDS is referenced in multiple places
> > > > > > (e.g. pointed to by another bs's backing_hd while also used as a block
> > > > > > job device, in the use case of image fleecing).
> > > > > > 
> > > > > > The original in_use case is considered a "hard reference" in this patch,
> > > > > > where the bs is busy and should not be used in other tasks that require
> > > > > > a hard reference. (However the interface doesn't force this, caller
> > > > > > still need to call bdrv_in_use() to check by itself.).
> > > > > > 
> > > > > > A soft reference is implemented but not used yet. It will be used in
> > > > > > following patches to manage the lifecycle together with hard reference.
> > > > > > 
> > > > > > If bdrv_ref() is called on a BDS, it must be released by exactly the
> > > > > > same numbers of bdrv_unref() with the same "soft/hard" type, and never
> > > > > > call bdrv_delete() directly. If the BDS is only used locally (unnamed),
> > > > > > bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().
> > > > > 
> > > > > It is risky to keep bdrv_delete() public.  I suggest replacing
> > > > > bdrv_delete() callers with bdrv_unref() and then making bdrv_delete()
> > > > > static in block.c.
> > > > > 
> > > > > This way it is impossible to make the mistake of calling bdrv_delete()
> > > > > on a BDS which has refcnt > 1.
> > > > > 
> > > > > I don't really understand this patch.  There are now two separate
> > > > > refcounts.  They must both reach 0 for deletion to occur.  I think
> > > > > you plan to treat the "hard" refcount like the in_use counter (there
> > > > > should only be 0 or 1 refs) but you don't enforce it.  It seems cleaner
> > > > > to keep in_use separate: let in_use callers take a refcount and also set
> > > > > in_use.
> > > > 
> > > > OK, I like your ideas, make bdrv_delete private is much cleaner. Will
> > > > fix in next revision.
> > > > 
> > > > I plan to make it like this:
> > > > 
> > > >     /* soft ref */
> > > >     void bdrv_{ref,unref}(bs)
> > > > 
> > > >     /* hard ref */
> > > >     bool bdrv_hard_{ref,unref}(bs)
> > > > 
> > > > usage:
> > > >     bs = bdrv_new()
> > > >     <implicit bdrv_ref(bs) called>
> > > >     ...
> > > >     bdrv_unref(bs)
> > > >     <automatically deleted here>
> > > > 
> > > > or with hard ref:
> > > >     bs = bdrv_new()
> > > >     <implicit bdrv_ref() called>
> > > > 
> > > >     bdrv_hard_ref(bs)
> > > >     ...
> > > >     bdrv_hard_unref(bs)
> > > > 
> > > >     bdrv_unref(bs)
> > > >     <automatically deleted here>
> > > > 
> > > > The second bdrv_hard_ref call to a bs returns false, caller check the
> > > > return value.
> > > 
> > > Why is hard ref necessary when we already have
> > > bdrv_in_use()/bdrv_set_in_use()?
> > 
> > Keeping the names of bdrv_in_use() and bdrv_set_in_use() is noting
> > wrong, if no more than one "hard ref" is enforced. However I think we
> > should manage lifecycle with both bdrv_ref and bdrv_set_in_use, so name
> > them both ref sounds consistent: make it clearer to caller both hard ref
> > (in_use) and soft ref preserve the bs from being released.
> 
> I actually find "hard"/"soft" ref naming confusing and prefer keeping
> bdrv_in_use().  Refcount is for object lifetime whereas in_use is for
> "busy" status.
> 
OK, do you suggest keeping in_use as is and call bdrv_delete(bs) in
bdrv_unref() regardless of bs->in_use?

-- 
Fam

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/11] block: simplify bdrv_drop_intermediate
  2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 08/11] block: simplify bdrv_drop_intermediate Fam Zheng
@ 2013-07-24 23:16   ` Jeff Cody
  2013-07-25  1:34     ` Fam Zheng
  0 siblings, 1 reply; 46+ messages in thread
From: Jeff Cody @ 2013-07-24 23:16 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, stefanha, pbonzini

On Wed, Jul 17, 2013 at 05:42:13PM +0800, Fam Zheng wrote:
> bdrv_drop_intermediate used a local list to iterate through backing
> chain and delete each BDS. It is simplified while adopting to refcount
> mechanism.
> 

Hi Fam,

The reason for the local list is to keep the BDS deletion
transactional, so it can be rolled back in case of error (see below)

> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c | 71 ++++++++++-------------------------------------------------------
>  1 file changed, 11 insertions(+), 60 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 57a3876..499de22 100644
> --- a/block.c
> +++ b/block.c
> @@ -2027,12 +2027,6 @@ BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>      return overlay;
>  }
>  
> -typedef struct BlkIntermediateStates {
> -    BlockDriverState *bs;
> -    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
> -} BlkIntermediateStates;
> -
> -
>  /*
>   * Drops images above 'base' up to and including 'top', and sets the image
>   * above 'top' to have base as its backing file.
> @@ -2062,15 +2056,9 @@ typedef struct BlkIntermediateStates {
>  int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
>                             BlockDriverState *base)
>  {
> -    BlockDriverState *intermediate;
> -    BlockDriverState *base_bs = NULL;
>      BlockDriverState *new_top_bs = NULL;
> -    BlkIntermediateStates *intermediate_state, *next;
>      int ret = -EIO;
>  
> -    QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
> -    QSIMPLEQ_INIT(&states_to_delete);
> -
>      if (!top->drv || !base->drv) {
>          goto exit;
>      }
> @@ -2082,58 +2070,21 @@ int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
>          goto exit;
>      }
>  
> -    /* special case of new_top_bs->backing_hd already pointing to base - nothing
> -     * to do, no intermediate images */
> -    if (new_top_bs->backing_hd == base) {
> -        ret = 0;
> -        goto exit;
> -    }
> -
> -    intermediate = top;
> -
> -    /* now we will go down through the list, and add each BDS we find
> -     * into our deletion queue, until we hit the 'base'
> -     */
> -    while (intermediate) {
> -        intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
> -        intermediate_state->bs = intermediate;
> -        QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
> -
> -        if (intermediate->backing_hd == base) {
> -            base_bs = intermediate->backing_hd;
> -            break;
> +    while (new_top_bs->backing_hd && new_top_bs->backing_hd != base) {
> +        BlockDriverState *backing = new_top_bs->backing_hd;
> +        if (backing == NULL) {
> +            goto exit;

If you simplify it until just a while loop that unrefs/deletes the BDS
inside the loop as you navigate the chain, then any error exit leaves
you in a bad state, with a potentially invalid chain.  This is one
such error potential.

>          }
> -        intermediate = intermediate->backing_hd;
> -    }
> -    if (base_bs == NULL) {
> -        /* something went wrong, we did not end at the base. safely
> -         * unravel everything, and exit with error */
> -        goto exit;
> +        new_top_bs->backing_hd = backing->backing_hd;
> +        /* break backing_hd chain before releasing bs, so we don't free all the
> +         * way up the backing chain */
> +        backing->backing_hd = NULL;
> +        bdrv_unref(backing, false);

These two statements, which unlink this BDS from the chain, can't be
undone now, in case of error.

>      }
>  
> -    /* success - we can delete the intermediate states, and link top->base */
> -    ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
> -                                   base_bs->drv ? base_bs->drv->format_name : "");
> -    if (ret) {
> -        goto exit;
> -    }
> -    if (new_top_bs->backing_hd) {
> -        bdrv_unref(new_top_bs->backing_hd, false);
> -    }
> -    new_top_bs->backing_hd = base_bs;
> -    bdrv_ref(base_bs, false);
> -
> -    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
> -        /* so that bdrv_close() does not recursively close the chain */
> -        intermediate_state->bs->backing_hd = NULL;
> -        bdrv_delete(intermediate_state->bs);
> -    }

The foreach loop over the list was placed such that there were no more
error paths; we were guaranteed at this point to have been able delete
and unchain each intermediate BDS.

> -    ret = 0;
> -
> +    ret = bdrv_change_backing_file(new_top_bs, base->filename,
> +                                   base->drv ? base->drv->format_name : "");

This is effectively another error path that would cause problems, if
ret < 0.

>  exit:
> -    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
> -        g_free(intermediate_state);
> -    }
>      return ret;
>  }
>  
> -- 
> 1.8.3.2
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/11] block: simplify bdrv_drop_intermediate
  2013-07-24 23:16   ` Jeff Cody
@ 2013-07-25  1:34     ` Fam Zheng
  0 siblings, 0 replies; 46+ messages in thread
From: Fam Zheng @ 2013-07-25  1:34 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, stefanha, pbonzini

On Wed, 07/24 19:16, Jeff Cody wrote:
> On Wed, Jul 17, 2013 at 05:42:13PM +0800, Fam Zheng wrote:
> > bdrv_drop_intermediate used a local list to iterate through backing
> > chain and delete each BDS. It is simplified while adopting to refcount
> > mechanism.
> > 
> 
> Hi Fam,
> 
> The reason for the local list is to keep the BDS deletion
> transactional, so it can be rolled back in case of error (see below)
> 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  block.c | 71 ++++++++++-------------------------------------------------------
> >  1 file changed, 11 insertions(+), 60 deletions(-)
> > 
> > diff --git a/block.c b/block.c
> > index 57a3876..499de22 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -2027,12 +2027,6 @@ BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
> >      return overlay;
> >  }
> >  
> > -typedef struct BlkIntermediateStates {
> > -    BlockDriverState *bs;
> > -    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
> > -} BlkIntermediateStates;
> > -
> > -
> >  /*
> >   * Drops images above 'base' up to and including 'top', and sets the image
> >   * above 'top' to have base as its backing file.
> > @@ -2062,15 +2056,9 @@ typedef struct BlkIntermediateStates {
> >  int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
> >                             BlockDriverState *base)
> >  {
> > -    BlockDriverState *intermediate;
> > -    BlockDriverState *base_bs = NULL;
> >      BlockDriverState *new_top_bs = NULL;
> > -    BlkIntermediateStates *intermediate_state, *next;
> >      int ret = -EIO;
> >  
> > -    QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
> > -    QSIMPLEQ_INIT(&states_to_delete);
> > -
> >      if (!top->drv || !base->drv) {
> >          goto exit;
> >      }
> > @@ -2082,58 +2070,21 @@ int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
> >          goto exit;
> >      }
> >  
> > -    /* special case of new_top_bs->backing_hd already pointing to base - nothing
> > -     * to do, no intermediate images */
> > -    if (new_top_bs->backing_hd == base) {
> > -        ret = 0;
> > -        goto exit;
> > -    }
> > -
> > -    intermediate = top;
> > -
> > -    /* now we will go down through the list, and add each BDS we find
> > -     * into our deletion queue, until we hit the 'base'
> > -     */
> > -    while (intermediate) {
> > -        intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
> > -        intermediate_state->bs = intermediate;
> > -        QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
> > -
> > -        if (intermediate->backing_hd == base) {
> > -            base_bs = intermediate->backing_hd;
> > -            break;
> > +    while (new_top_bs->backing_hd && new_top_bs->backing_hd != base) {
> > +        BlockDriverState *backing = new_top_bs->backing_hd;
> > +        if (backing == NULL) {
> > +            goto exit;
> 
> If you simplify it until just a while loop that unrefs/deletes the BDS
> inside the loop as you navigate the chain, then any error exit leaves
> you in a bad state, with a potentially invalid chain.  This is one
> such error potential.
> 

Yes, I'll fix this.

-- 
Fam

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/11] block: hold hard reference for backup/mirror target
  2013-07-23  9:52   ` Stefan Hajnoczi
@ 2013-07-25  6:08     ` Fam Zheng
  2013-07-25  7:59       ` Stefan Hajnoczi
  0 siblings, 1 reply; 46+ messages in thread
From: Fam Zheng @ 2013-07-25  6:08 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, hbrock, qemu-devel, rjones, imain, pbonzini

On Tue, 07/23 11:52, Stefan Hajnoczi wrote:
> On Wed, Jul 17, 2013 at 05:42:12PM +0800, Fam Zheng wrote:
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  block/backup.c | 3 ++-
> >  block/mirror.c | 4 ++--
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> Should we update the blockjob.c in_use code instead of adding
> refcounting to specific block jobs?  This ought to be handled
> generically for all block jobs.

Target is not common in block jobs (e.g. doesn't apply to block-commit),
so it seems only specific block job knows about this.

-- 
Fam

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
  2013-07-24  7:44             ` Fam Zheng
@ 2013-07-25  7:52               ` Stefan Hajnoczi
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2013-07-25  7:52 UTC (permalink / raw)
  To: Fam Zheng
  Cc: kwolf, hbrock, qemu-devel, rjones, imain, Stefan Hajnoczi, pbonzini

On Wed, Jul 24, 2013 at 03:44:38PM +0800, Fam Zheng wrote:
> On Wed, 07/24 09:35, Stefan Hajnoczi wrote:
> > On Wed, Jul 24, 2013 at 08:39:53AM +0800, Fam Zheng wrote:
> > > On Tue, 07/23 15:34, Stefan Hajnoczi wrote:
> > > > On Tue, Jul 23, 2013 at 06:32:25PM +0800, Fam Zheng wrote:
> > > > > On Tue, 07/23 11:36, Stefan Hajnoczi wrote:
> > > > > > On Wed, Jul 17, 2013 at 05:42:06PM +0800, Fam Zheng wrote:
> > > > > > > Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
> > > > > > > to BlockDriverState, since in_use mechanism cannot provide proper
> > > > > > > management of lifecycle when a BDS is referenced in multiple places
> > > > > > > (e.g. pointed to by another bs's backing_hd while also used as a block
> > > > > > > job device, in the use case of image fleecing).
> > > > > > > 
> > > > > > > The original in_use case is considered a "hard reference" in this patch,
> > > > > > > where the bs is busy and should not be used in other tasks that require
> > > > > > > a hard reference. (However the interface doesn't force this, caller
> > > > > > > still need to call bdrv_in_use() to check by itself.).
> > > > > > > 
> > > > > > > A soft reference is implemented but not used yet. It will be used in
> > > > > > > following patches to manage the lifecycle together with hard reference.
> > > > > > > 
> > > > > > > If bdrv_ref() is called on a BDS, it must be released by exactly the
> > > > > > > same numbers of bdrv_unref() with the same "soft/hard" type, and never
> > > > > > > call bdrv_delete() directly. If the BDS is only used locally (unnamed),
> > > > > > > bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().
> > > > > > 
> > > > > > It is risky to keep bdrv_delete() public.  I suggest replacing
> > > > > > bdrv_delete() callers with bdrv_unref() and then making bdrv_delete()
> > > > > > static in block.c.
> > > > > > 
> > > > > > This way it is impossible to make the mistake of calling bdrv_delete()
> > > > > > on a BDS which has refcnt > 1.
> > > > > > 
> > > > > > I don't really understand this patch.  There are now two separate
> > > > > > refcounts.  They must both reach 0 for deletion to occur.  I think
> > > > > > you plan to treat the "hard" refcount like the in_use counter (there
> > > > > > should only be 0 or 1 refs) but you don't enforce it.  It seems cleaner
> > > > > > to keep in_use separate: let in_use callers take a refcount and also set
> > > > > > in_use.
> > > > > 
> > > > > OK, I like your ideas, make bdrv_delete private is much cleaner. Will
> > > > > fix in next revision.
> > > > > 
> > > > > I plan to make it like this:
> > > > > 
> > > > >     /* soft ref */
> > > > >     void bdrv_{ref,unref}(bs)
> > > > > 
> > > > >     /* hard ref */
> > > > >     bool bdrv_hard_{ref,unref}(bs)
> > > > > 
> > > > > usage:
> > > > >     bs = bdrv_new()
> > > > >     <implicit bdrv_ref(bs) called>
> > > > >     ...
> > > > >     bdrv_unref(bs)
> > > > >     <automatically deleted here>
> > > > > 
> > > > > or with hard ref:
> > > > >     bs = bdrv_new()
> > > > >     <implicit bdrv_ref() called>
> > > > > 
> > > > >     bdrv_hard_ref(bs)
> > > > >     ...
> > > > >     bdrv_hard_unref(bs)
> > > > > 
> > > > >     bdrv_unref(bs)
> > > > >     <automatically deleted here>
> > > > > 
> > > > > The second bdrv_hard_ref call to a bs returns false, caller check the
> > > > > return value.
> > > > 
> > > > Why is hard ref necessary when we already have
> > > > bdrv_in_use()/bdrv_set_in_use()?
> > > 
> > > Keeping the names of bdrv_in_use() and bdrv_set_in_use() is noting
> > > wrong, if no more than one "hard ref" is enforced. However I think we
> > > should manage lifecycle with both bdrv_ref and bdrv_set_in_use, so name
> > > them both ref sounds consistent: make it clearer to caller both hard ref
> > > (in_use) and soft ref preserve the bs from being released.
> > 
> > I actually find "hard"/"soft" ref naming confusing and prefer keeping
> > bdrv_in_use().  Refcount is for object lifetime whereas in_use is for
> > "busy" status.
> > 
> OK, do you suggest keeping in_use as is and call bdrv_delete(bs) in
> bdrv_unref() regardless of bs->in_use?

Yes, because in_use is orthogonal to object lifetime.

Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/11] block: hold hard reference for backup/mirror target
  2013-07-25  6:08     ` Fam Zheng
@ 2013-07-25  7:59       ` Stefan Hajnoczi
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2013-07-25  7:59 UTC (permalink / raw)
  To: Fam Zheng
  Cc: kwolf, hbrock, qemu-devel, rjones, imain, Stefan Hajnoczi, pbonzini

On Thu, Jul 25, 2013 at 02:08:42PM +0800, Fam Zheng wrote:
> On Tue, 07/23 11:52, Stefan Hajnoczi wrote:
> > On Wed, Jul 17, 2013 at 05:42:12PM +0800, Fam Zheng wrote:
> > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > ---
> > >  block/backup.c | 3 ++-
> > >  block/mirror.c | 4 ++--
> > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > Should we update the blockjob.c in_use code instead of adding
> > refcounting to specific block jobs?  This ought to be handled
> > generically for all block jobs.
> 
> Target is not common in block jobs (e.g. doesn't apply to block-commit),
> so it seems only specific block job knows about this.

Of course you are right.

Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2013-07-25  7:59 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-17  9:42 [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Fam Zheng
2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard Fam Zheng
2013-07-17 12:26   ` Paolo Bonzini
2013-07-18  4:53     ` Fam Zheng
2013-07-23  9:36   ` Stefan Hajnoczi
2013-07-23 10:32     ` Fam Zheng
2013-07-23 13:34       ` Stefan Hajnoczi
2013-07-24  0:39         ` Fam Zheng
2013-07-24  7:35           ` Stefan Hajnoczi
2013-07-24  7:44             ` Fam Zheng
2013-07-25  7:52               ` Stefan Hajnoczi
2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 02/11] block: use refcnt for bs->backing_hd and bs->file Fam Zheng
2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 03/11] block: use refcnt for drive_init/drive_uninit Fam Zheng
2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 04/11] block: use refcnt for device attach/detach Fam Zheng
2013-07-23  9:44   ` Stefan Hajnoczi
2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 05/11] migration: omit drive ref as we have bdrv_ref now Fam Zheng
2013-07-23  9:49   ` Stefan Hajnoczi
2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 06/11] xen_disk: simplify blk_disconnect with refcnt Fam Zheng
2013-07-23  9:50   ` Stefan Hajnoczi
2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 07/11] block: hold hard reference for backup/mirror target Fam Zheng
2013-07-23  9:52   ` Stefan Hajnoczi
2013-07-25  6:08     ` Fam Zheng
2013-07-25  7:59       ` Stefan Hajnoczi
2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 08/11] block: simplify bdrv_drop_intermediate Fam Zheng
2013-07-24 23:16   ` Jeff Cody
2013-07-25  1:34     ` Fam Zheng
2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 09/11] block: add assertion to check refcount before deleting Fam Zheng
2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 10/11] block: add option 'backing' to -drive options Fam Zheng
2013-07-17 12:36   ` Paolo Bonzini
2013-07-17 12:58     ` Kevin Wolf
2013-07-17 13:13       ` Paolo Bonzini
2013-07-17 13:48         ` Kevin Wolf
2013-07-17 14:16           ` Paolo Bonzini
2013-07-17 15:09             ` Kevin Wolf
2013-07-17 15:23               ` Paolo Bonzini
2013-07-23 20:07               ` Ian Main
2013-07-22  6:07     ` Fam Zheng
2013-07-23 19:57       ` Ian Main
2013-07-17  9:42 ` [Qemu-devel] [PATCH v2 11/11] qmp: add command 'blockdev-backup' Fam Zheng
2013-07-17 12:44   ` Eric Blake
2013-07-18  4:41     ` Fam Zheng
2013-07-19 10:16       ` Wenchao Xia
2013-07-23 10:10         ` Stefan Hajnoczi
2013-07-19 10:41 ` [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD Wenchao Xia
2013-07-23  1:52   ` Wenchao Xia
2013-07-23  6:35     ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.