All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] CoW image commit+shrink(= make_empty) support
@ 2012-06-07  6:19 Taisuke Yamada
  2012-06-07 14:14 ` Jeff Cody
  2012-06-08 10:39 ` Kevin Wolf
  0 siblings, 2 replies; 31+ messages in thread
From: Taisuke Yamada @ 2012-06-07  6:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody

[-- Attachment #1: Type: text/plain, Size: 1352 bytes --]

I attended Paolo Bonzini's qemu session ("Live Disk Operations: Juggling
Data and Trying to go Unnoticed") in LinuxCon Japan, and he adviced me
to post the bits I have regarding my question on qemu's  support on shrinking
CoW image.

Here's my problem description.

I recently designed a experimental system which holds VM master images
on a HDD and CoW snapshots on a SSD. VMs run on CoW snapshots only.
This split-image configration is done to keep VM I/Os on a SSD

As SSD capacity is rather limited, I need to do a writeback commit from SSD to
HDD time to time, and that is done during weekend/midnight. The problem is
although a commit is made, that alone won't shrink CoW image - all unused blocks
are still kept in a snapshot, and uses up space.

Patch attached is a workaround I added to cope with the problem,
but the basic problem I faced was that both QCOW2/QED format still does not
support "bdrv_make_empty" API.

Implementing the API (say, by hole punching) seemed like a lot of effort, so
I ended up creating a new CoW image, and then replace current CoW
snapshot with a new (empty) one. But I find the code ugly.

In his talk, Paolo suggested possibility of using new "live op" API for this
task, but I'm not aware of the actual API. Is there any documentation or
source code I can look at to re-implement above feature?

Best Regards,

[-- Attachment #2: qemu-block-refresh.patch --]
[-- Type: application/octet-stream, Size: 6575 bytes --]

From 85a0001b90efe59e52f89b62cda56771d79d0242 Mon Sep 17 00:00:00 2001
From: Taisuke Yamada <tai@rakugaki.org>
Date: Thu, 7 Jun 2012 13:35:31 +0900
Subject: [PATCH] - Imported block-refresh (commit+shrink) patch from qemu-kvm
 custom branch.

Signed-off-by: Taisuke Yamada <tai@rakugaki.org>

diff --git a/block/vmdk.c b/block/vmdk.c
index 18e9b4c..3241be6 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1388,6 +1388,8 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options)
             total_size = options->value.n;
         } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
             backing_file = options->value.s;
+        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
+            ; // ignore - VMDK is the only supported format, after all
         } else if (!strcmp(options->name, BLOCK_OPT_COMPAT6)) {
             flags |= options->value.n ? BLOCK_FLAG_COMPAT6 : 0;
         } else if (!strcmp(options->name, BLOCK_OPT_SUBFMT)) {
@@ -1574,6 +1576,11 @@ static QEMUOptionParameter vmdk_create_options[] = {
         .help = "File name of a base image"
     },
     {
+        .name = BLOCK_OPT_BACKING_FMT,
+        .type = OPT_STRING,
+        .help = "File format of a base image"
+    },
+    {
         .name = BLOCK_OPT_COMPAT6,
         .type = OPT_FLAG,
         .help = "VMDK version 6 image"
diff --git a/blockdev.c b/blockdev.c
index 67895b2..e5df77a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -832,6 +832,90 @@ exit:
 }
 
 
+void qmp_block_refresh(const char *device, Error **errp)
+{
+    BlockDriverState *bs;
+    BlockDriver *drv;
+    int ret = 0;
+    int flags;
+    char curfile[1024], newfile[1024], oldfile[1024];
+    pid_t pid = getpid();
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+
+    // Check if this device is really in diff+backing file configuration
+    // Otherwise, there is no point in "refresh"ing.
+    if (!bs->backing_hd) {
+        return; // success-as-nop
+    }
+
+    // Keep a copy of parameters needed after bdrv_close(bs).
+    drv   = bs->drv;
+    flags = bs->open_flags;
+    snprintf(curfile, sizeof(curfile), "%s",        bs->filename);
+    snprintf(newfile, sizeof(newfile), "%s.%d.new", bs->filename, pid);
+    ret = snprintf(oldfile, sizeof(oldfile), "%s.%d.old", bs->filename, pid);
+
+    // Pathname too long
+    if (ret < 0 || ret >= sizeof(oldfile)) {
+        error_set(errp, QERR_BUFFER_OVERRUN);
+        return;
+    }
+
+    // Quick sanity check to prevent unexpected overwrite.
+    if (access(newfile, F_OK) == 0 || access(oldfile, F_OK) == 0) {
+        error_set(errp, QERR_UNSUPPORTED);
+        return;
+    }
+
+    // Commit all data back to backing file
+    ret = bdrv_commit(bs);
+    if (ret != 0) {
+        error_set(errp, QERR_IO_ERROR);
+        return;
+    }
+
+    // Create new block image, with same backing file and flags
+    ret = bdrv_img_create(newfile,
+                          bs->drv->format_name,
+                          bs->backing_file,
+                          bs->backing_hd->drv->format_name,
+                          NULL, -1, bs->open_flags);
+    if (ret != 0) {
+        error_set(errp, QERR_OPEN_FILE_FAILED, newfile);
+        return;
+    }
+
+    // Switch block image by close-rename-open
+    bdrv_close(bs);
+    if (rename(curfile, oldfile) != 0) goto error_rename_cur;
+    if (rename(newfile, curfile) != 0) goto error_rename_new;
+    if (bdrv_open(bs, curfile, flags, drv) != 0) goto error_reopen;
+
+    // Successfully completed
+    unlink(oldfile);
+    return;
+
+    // Upon failure, try to recover back to original state
+error_reopen:
+    rename(curfile, newfile);
+error_rename_new:
+    rename(oldfile, curfile);
+error_rename_cur:
+    ret = bdrv_open(bs, curfile, flags, drv);
+
+    if (ret != 0) {
+        error_set(errp, QERR_OPEN_FILE_FAILED, curfile);
+    }
+    else {
+        error_set(errp, QERR_UNDEFINED_ERROR);
+    }
+}
+
 static void eject_device(BlockDriverState *bs, int force, Error **errp)
 {
     if (bdrv_in_use(bs)) {
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 18cb415..d096dc2 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -901,6 +901,20 @@ Snapshot device, using snapshot file as target if provided
 ETEXI
 
     {
+        .name       = "block_refresh",
+        .args_type  = "device:B",
+        .params     = "device",
+        .help       = "refresh (commit+regenerate) cow image of device.",
+        .mhandler.cmd = hmp_block_refresh,
+    },
+
+STEXI
+@item block_refresh
+@findex block_refresh
+Refresh cow image of device.
+ETEXI
+
+    {
         .name       = "drive_add",
         .args_type  = "pci_addr:s,opts:s",
         .params     = "[[<domain>:]<bus>:]<slot>\n"
diff --git a/hmp.c b/hmp.c
index bb0952e..c3792fd 100644
--- a/hmp.c
+++ b/hmp.c
@@ -713,6 +713,15 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &errp);
 }
 
+void hmp_block_refresh(Monitor *mon, const QDict *qdict)
+{
+    const char *device = qdict_get_str(qdict, "device");
+    Error *errp = NULL;
+
+    qmp_block_refresh(device, &errp);
+    hmp_handle_error(mon, &errp);
+}
+
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
 {
     qmp_migrate_cancel(NULL);
diff --git a/hmp.h b/hmp.h
index 443b812..f5f61b6 100644
--- a/hmp.h
+++ b/hmp.h
@@ -55,6 +55,7 @@ void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
 void hmp_change(Monitor *mon, const QDict *qdict);
+void hmp_block_refresh(Monitor *mon, const QDict *qdict);
 void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict);
 void hmp_block_stream(Monitor *mon, const QDict *qdict);
 void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index 2ca7195..05c6b34 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1244,6 +1244,20 @@
             '*mode': 'NewImageMode'} }
 
 ##
+# @block-refresh
+#
+# Re-generates CoW image of a block device. For certain block formats,
+# this shrinks snapshot image back to minimal size.
+#
+# @device:  the name of the device to refresh the image.
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+##
+{ 'command': 'block-refresh',
+  'data': { 'device': 'str' } }
+
+##
 # @human-monitor-command:
 #
 # Execute a command on the human monitor and return the output.

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-07  6:19 [Qemu-devel] CoW image commit+shrink(= make_empty) support Taisuke Yamada
@ 2012-06-07 14:14 ` Jeff Cody
  2012-06-08 12:42   ` Stefan Hajnoczi
  2012-06-08 10:39 ` Kevin Wolf
  1 sibling, 1 reply; 31+ messages in thread
From: Jeff Cody @ 2012-06-07 14:14 UTC (permalink / raw)
  To: Taisuke Yamada; +Cc: qemu-devel

On 06/07/2012 02:19 AM, Taisuke Yamada wrote:
> I attended Paolo Bonzini's qemu session ("Live Disk Operations: Juggling
> Data and Trying to go Unnoticed") in LinuxCon Japan, and he adviced me
> to post the bits I have regarding my question on qemu's  support on shrinking
> CoW image.
> 
> Here's my problem description.
> 
> I recently designed a experimental system which holds VM master images
> on a HDD and CoW snapshots on a SSD. VMs run on CoW snapshots only.
> This split-image configration is done to keep VM I/Os on a SSD
> 
> As SSD capacity is rather limited, I need to do a writeback commit from SSD to
> HDD time to time, and that is done during weekend/midnight. The problem is
> although a commit is made, that alone won't shrink CoW image - all unused blocks
> are still kept in a snapshot, and uses up space.
> 
> Patch attached is a workaround I added to cope with the problem,
> but the basic problem I faced was that both QCOW2/QED format still does not
> support "bdrv_make_empty" API.
> 
> Implementing the API (say, by hole punching) seemed like a lot of effort, so
> I ended up creating a new CoW image, and then replace current CoW
> snapshot with a new (empty) one. But I find the code ugly.
> 
> In his talk, Paolo suggested possibility of using new "live op" API for this
> task, but I'm not aware of the actual API. Is there any documentation or
> source code I can look at to re-implement above feature?
> 
> Best Regards,

Hello Taisuke-san,

I am working on a document now for a live commit proposal, with the API
being similar to the block-stream command, but for a live commit.  Here
is what I am thinking about proposing for the command:

{ 'command': 'block-commit', 'data': { 'device': 'str', '*base': 'str',
                                       '*top': 'str', '*speed': 'int' } }

I think something similar to the above would be good for a 'live
commit', and it would be somewhat analogous to block streaming, but in
the other direction.

One issue I see with the patch attached, is the reliance on bdrv_close()
and a subsequent bdrv_open() - once you perform a bdrv_close(), you no
longer have the ability to safely recover from error, because it is
possible for the recovery bdrv_open() to fail for some reason.

The live block commit command I am working on operates like the block
streaming code, and like transactional commands in that the use of
bdrv_close() / bdrv_open() to change an image is avoided, so that error
recovery can be safely done by just abandoning the operation.  A key
point that needs to be done 'transactionally', is to open the base or
intermediate target image with file access mode r/w, as the backing
files are open as r/o by default.

I am going to be putting all my documentation into the qemu wiki today /
tomorrow, and I will follow up with a link to that if you like.

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-07  6:19 [Qemu-devel] CoW image commit+shrink(= make_empty) support Taisuke Yamada
  2012-06-07 14:14 ` Jeff Cody
@ 2012-06-08 10:39 ` Kevin Wolf
  2012-06-09 11:21   ` Taisuke Yamada
  1 sibling, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2012-06-08 10:39 UTC (permalink / raw)
  To: Taisuke Yamada; +Cc: jcody, qemu-devel

Am 07.06.2012 08:19, schrieb Taisuke Yamada:
> I attended Paolo Bonzini's qemu session ("Live Disk Operations: Juggling
> Data and Trying to go Unnoticed") in LinuxCon Japan, and he adviced me
> to post the bits I have regarding my question on qemu's  support on shrinking
> CoW image.
> 
> Here's my problem description.
> 
> I recently designed a experimental system which holds VM master images
> on a HDD and CoW snapshots on a SSD. VMs run on CoW snapshots only.
> This split-image configration is done to keep VM I/Os on a SSD

This is an interesting use case that I wasn't aware of yet. So you're
not really interested in a snapshot here, but what you're trying to do
is using the SSD as some sort of a cache, right?

> As SSD capacity is rather limited, I need to do a writeback commit from SSD to
> HDD time to time, and that is done during weekend/midnight. The problem is
> although a commit is made, that alone won't shrink CoW image - all unused blocks
> are still kept in a snapshot, and uses up space.
> 
> Patch attached is a workaround I added to cope with the problem,
> but the basic problem I faced was that both QCOW2/QED format still does not
> support "bdrv_make_empty" API.
> 
> Implementing the API (say, by hole punching) seemed like a lot of effort, so
> I ended up creating a new CoW image, and then replace current CoW
> snapshot with a new (empty) one. But I find the code ugly.

It's kind of a hack indeed, but if it works...? :-)

I agree that the real solution would be hole punching. We do already
support this for raw images on XFS and we want to extend it (I think
there are even patches floating around for it). Once you have this,
implementing bdrv_make_empty() for qcow2 shouldn't be too hard, it might
actually just take calling qcow2_co_discard() and adding another discard
call in qcow2_free_cluster() that passes the request to the image file.

> In his talk, Paolo suggested possibility of using new "live op" API for this
> task, but I'm not aware of the actual API. Is there any documentation or
> source code I can look at to re-implement above feature?

The problem that a live block operation could solve sounds unrelated to
me: While you perform your 'commit' monitor command, the VM doesn't run.
Some kind of live commit would surely be helpful there (that's what Jeff
Cody is working on, iirc).

Maybe you could actually use a live commit mode where the commit stays
active all the time in the background so that you write to your SSD and
signal completion to the guest while the background job starts copying
the request to your backing file on the slow disk. Or actually, it
sounds quite similar to the "block mirror" approaches that were
discussed recently, where guest requests are duplicated to the current
(SSD) image and a secondary (HDD) image.

One other thing to consider that complicates everything is that
committing to the backing file when bdrv_make_empty is implemented
obviously also means that reads go to the slow disk now. So I guess you
really want to commit only part of the image on the SSD...

Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-07 14:14 ` Jeff Cody
@ 2012-06-08 12:42   ` Stefan Hajnoczi
  2012-06-08 13:19     ` Jeff Cody
  0 siblings, 1 reply; 31+ messages in thread
From: Stefan Hajnoczi @ 2012-06-08 12:42 UTC (permalink / raw)
  To: jcody; +Cc: Zhi Hui Li, Taisuke Yamada, qemu-devel

On Thu, Jun 7, 2012 at 3:14 PM, Jeff Cody <jcody@redhat.com> wrote:
> On 06/07/2012 02:19 AM, Taisuke Yamada wrote:
>> I attended Paolo Bonzini's qemu session ("Live Disk Operations: Juggling
>> Data and Trying to go Unnoticed") in LinuxCon Japan, and he adviced me
>> to post the bits I have regarding my question on qemu's  support on shrinking
>> CoW image.
>>
>> Here's my problem description.
>>
>> I recently designed a experimental system which holds VM master images
>> on a HDD and CoW snapshots on a SSD. VMs run on CoW snapshots only.
>> This split-image configration is done to keep VM I/Os on a SSD
>>
>> As SSD capacity is rather limited, I need to do a writeback commit from SSD to
>> HDD time to time, and that is done during weekend/midnight. The problem is
>> although a commit is made, that alone won't shrink CoW image - all unused blocks
>> are still kept in a snapshot, and uses up space.
>>
>> Patch attached is a workaround I added to cope with the problem,
>> but the basic problem I faced was that both QCOW2/QED format still does not
>> support "bdrv_make_empty" API.
>>
>> Implementing the API (say, by hole punching) seemed like a lot of effort, so
>> I ended up creating a new CoW image, and then replace current CoW
>> snapshot with a new (empty) one. But I find the code ugly.
>>
>> In his talk, Paolo suggested possibility of using new "live op" API for this
>> task, but I'm not aware of the actual API. Is there any documentation or
>> source code I can look at to re-implement above feature?
>>
>> Best Regards,
>
> Hello Taisuke-san,
>
> I am working on a document now for a live commit proposal, with the API
> being similar to the block-stream command, but for a live commit.  Here
> is what I am thinking about proposing for the command:
>
> { 'command': 'block-commit', 'data': { 'device': 'str', '*base': 'str',
>                                       '*top': 'str', '*speed': 'int' } }
>
> I think something similar to the above would be good for a 'live
> commit', and it would be somewhat analogous to block streaming, but in
> the other direction.
>
> One issue I see with the patch attached, is the reliance on bdrv_close()
> and a subsequent bdrv_open() - once you perform a bdrv_close(), you no
> longer have the ability to safely recover from error, because it is
> possible for the recovery bdrv_open() to fail for some reason.
>
> The live block commit command I am working on operates like the block
> streaming code, and like transactional commands in that the use of
> bdrv_close() / bdrv_open() to change an image is avoided, so that error
> recovery can be safely done by just abandoning the operation.  A key
> point that needs to be done 'transactionally', is to open the base or
> intermediate target image with file access mode r/w, as the backing
> files are open as r/o by default.
>
> I am going to be putting all my documentation into the qemu wiki today /
> tomorrow, and I will follow up with a link to that if you like.

Thanks for sharing.  This is also something Zhi Hui and I have been
thinking about, my notes are below.  The key difference to Taisuke's
requirement is that I imagined we would simply not support merging the
top image down while the VM is running.  You could only merge an image
down which is not top-most.

<quote>
For incremental backup we typically have a backing file chain like this:

vm001.img <-- snap1.qcow2 <-- snap2.qcow2

The guest is writing to snap2.qcow2.  vm001.img and snap1.qcow2 are
read-only and the guest cannot write to them.

We want to commit snap1.qcow2 down into vm001.img while the guest is running:

vm001.img <-- snap2.qcow2

This means copying allocated blocks from snap1.qcow2 and writing them
into vm001.img.  Once this process is complete it is safe to delete
snap1.qcow2 since all data is now in vm001.img.

As a result we have made the backing file chain shorter.  This is
improtant because otherwise incremental backup would grow the backing
file chain forever - each time it takes a new snapshot the chain
becomes longer and I/O accesses can become slower!

The task is to add a new block job type called "commit".  It is like
the qemu-img commit command except it works while the guest is
running.

The new QMP command should look like this:

{ 'command': 'block-commit', 'data': { 'device': 'str', 'image':
'str', 'base': 'str', '*speed': 'int' }

This command can take a backing file chain:

base <- a <- b <- image <- c

It copies allocated blocks from a <- b <- image into base:

base <- c

After the operation completes a, b, and image can be deleted.

Note that block-commit cannot work on the top-most image since the
guest is still writing to that image and we might never be able to
copy all the data into the base image (the guest could write new data
as quickly as we copy it to the base).  The command should check for
this and reject the top-most image.

This command is similar to block-stream but it copies data "down" to
the backing file instead of "up" from the backing file.  It's
necessary to add this command because in most cases block-commit is
much more efficient than block-stream (the CoW file usually has much
less data than the backing file so less data needs to be copied).
</unquote>

Let's figure out how to specify block-commit so we're all happy, that
way we can avoid duplicating work.  Any comments on my notes above?

Stefan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 12:42   ` Stefan Hajnoczi
@ 2012-06-08 13:19     ` Jeff Cody
  2012-06-08 13:53       ` Stefan Hajnoczi
  2012-06-10 16:06       ` Paolo Bonzini
  0 siblings, 2 replies; 31+ messages in thread
From: Jeff Cody @ 2012-06-08 13:19 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Zhi Hui Li, Taisuke Yamada, qemu-devel

On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
> On Thu, Jun 7, 2012 at 3:14 PM, Jeff Cody <jcody@redhat.com> wrote:
>> On 06/07/2012 02:19 AM, Taisuke Yamada wrote:
>>> I attended Paolo Bonzini's qemu session ("Live Disk Operations: Juggling
>>> Data and Trying to go Unnoticed") in LinuxCon Japan, and he adviced me
>>> to post the bits I have regarding my question on qemu's  support on shrinking
>>> CoW image.
>>>
>>> Here's my problem description.
>>>
>>> I recently designed a experimental system which holds VM master images
>>> on a HDD and CoW snapshots on a SSD. VMs run on CoW snapshots only.
>>> This split-image configration is done to keep VM I/Os on a SSD
>>>
>>> As SSD capacity is rather limited, I need to do a writeback commit from SSD to
>>> HDD time to time, and that is done during weekend/midnight. The problem is
>>> although a commit is made, that alone won't shrink CoW image - all unused blocks
>>> are still kept in a snapshot, and uses up space.
>>>
>>> Patch attached is a workaround I added to cope with the problem,
>>> but the basic problem I faced was that both QCOW2/QED format still does not
>>> support "bdrv_make_empty" API.
>>>
>>> Implementing the API (say, by hole punching) seemed like a lot of effort, so
>>> I ended up creating a new CoW image, and then replace current CoW
>>> snapshot with a new (empty) one. But I find the code ugly.
>>>
>>> In his talk, Paolo suggested possibility of using new "live op" API for this
>>> task, but I'm not aware of the actual API. Is there any documentation or
>>> source code I can look at to re-implement above feature?
>>>
>>> Best Regards,
>>
>> Hello Taisuke-san,
>>
>> I am working on a document now for a live commit proposal, with the API
>> being similar to the block-stream command, but for a live commit.  Here
>> is what I am thinking about proposing for the command:
>>
>> { 'command': 'block-commit', 'data': { 'device': 'str', '*base': 'str',
>>                                       '*top': 'str', '*speed': 'int' } }
>>
>> I think something similar to the above would be good for a 'live
>> commit', and it would be somewhat analogous to block streaming, but in
>> the other direction.
>>
>> One issue I see with the patch attached, is the reliance on bdrv_close()

>> longer have the ability to safely recover from error, because it is
>> possible for the recovery bdrv_open() to fail for some reason.
>>
>> The live block commit command I am working on operates like the block
>> streaming code, and like transactional commands in that the use of
>> bdrv_close() / bdrv_open() to change an image is avoided, so that error
>> recovery can be safely done by just abandoning the operation.  A key
>> point that needs to be done 'transactionally', is to open the base or
>> intermediate target image with file access mode r/w, as the backing
>> files are open as r/o by default.
>>
>> I am going to be putting all my documentation into the qemu wiki today /
>> tomorrow, and I will follow up with a link to that if you like.
> 
> Thanks for sharing.  This is also something Zhi Hui and I have been
> thinking about, my notes are below.  The key difference to Taisuke's
> requirement is that I imagined we would simply not support merging the
> top image down while the VM is running.  You could only merge an image
> down which is not top-most.
> 
> <quote>
> For incremental backup we typically have a backing file chain like this:
> 
> vm001.img <-- snap1.qcow2 <-- snap2.qcow2
> 
> The guest is writing to snap2.qcow2.  vm001.img and snap1.qcow2 are
> read-only and the guest cannot write to them.
> 
> We want to commit snap1.qcow2 down into vm001.img while the guest is running:
> 
> vm001.img <-- snap2.qcow2
> 
> This means copying allocated blocks from snap1.qcow2 and writing them
> into vm001.img.  Once this process is complete it is safe to delete
> snap1.qcow2 since all data is now in vm001.img.

Yes, this is the same as what we are wanting to accomplish.  The trick
here is open vm001.img r/w, in a safe manner (by safe, I mean able to
abort in case of error while keeping the guest running live).

My thoughts on this has revolved around something similar to what was
done in bdrv_append(), where a duplicate BDS is created, a new file-open
performed with the appropriate access mode flags, and if successful
swapped out for the originally opened BDS for vm001.img.  If there is an
error, the new BDS is abandoned without modifying the BDS list.

> 
> As a result we have made the backing file chain shorter.  This is
> improtant because otherwise incremental backup would grow the backing
> file chain forever - each time it takes a new snapshot the chain
> becomes longer and I/O accesses can become slower!
> 
> The task is to add a new block job type called "commit".  It is like
> the qemu-img commit command except it works while the guest is
> running.
> 
> The new QMP command should look like this:
> 
> { 'command': 'block-commit', 'data': { 'device': 'str', 'image':
> 'str', 'base': 'str', '*speed': 'int' }

This is very similar to what I was thinking as well - I think the only
difference in the command is that I what you called 'image' I called
'top', and the argument order was after base.

Here is what I had for the command:

{ 'command': 'block-commit', 'data': { 'device': 'str', '*base': 'str',
                                       '*top': 'str', '*speed': 'int' } }

I don't think I have a strong preference for either of our proposed
commands - they are essentially the same.

> 
> This command can take a backing file chain:
> 
> base <- a <- b <- image <- c
> 
> It copies allocated blocks from a <- b <- image into base:
> 
> base <- c
> 
> After the operation completes a, b, and image can be deleted.

Yes - and of course, and other child / leaf images of base are now
invalid, but that is just a consequence of a commit operation.

> 
> Note that block-commit cannot work on the top-most image since the
> guest is still writing to that image and we might never be able to
> copy all the data into the base image (the guest could write new data
> as quickly as we copy it to the base).  The command should check for
> this and reject the top-most image.

By this you mean that you would like to disallow committing the
top-level image to the base?  Perhaps there is a way to attempt to
converge, and adaptively give more time to the co-routine if we are able
to detect divergence.  This may require violating the 'speed' parameter,
however, and make the commit less 'live'.

> 
> This command is similar to block-stream but it copies data "down" to
> the backing file instead of "up" from the backing file.  It's
> necessary to add this command because in most cases block-commit is
> much more efficient than block-stream (the CoW file usually has much
> less data than the backing file so less data needs to be copied).
> </unquote>

Definitely, that has been my mental model has well - block-stream in
reverse.

> 
> Let's figure out how to specify block-commit so we're all happy, that
> way we can avoid duplicating work.  Any comments on my notes above?
> 

I think we are almost completely on the same page - devil is in the
details, of course (for instance, on how to convert the destination base
from r/o to r/w).


> Stefan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 13:19     ` Jeff Cody
@ 2012-06-08 13:53       ` Stefan Hajnoczi
  2012-06-08 14:32         ` Jeff Cody
  2012-06-10 16:06       ` Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Stefan Hajnoczi @ 2012-06-08 13:53 UTC (permalink / raw)
  To: jcody; +Cc: Zhi Hui Li, Taisuke Yamada, qemu-devel

On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>> On Thu, Jun 7, 2012 at 3:14 PM, Jeff Cody <jcody@redhat.com> wrote:
>>> On 06/07/2012 02:19 AM, Taisuke Yamada wrote:
>> We want to commit snap1.qcow2 down into vm001.img while the guest is running:
>>
>> vm001.img <-- snap2.qcow2
>>
>> This means copying allocated blocks from snap1.qcow2 and writing them
>> into vm001.img.  Once this process is complete it is safe to delete
>> snap1.qcow2 since all data is now in vm001.img.
>
> Yes, this is the same as what we are wanting to accomplish.  The trick
> here is open vm001.img r/w, in a safe manner (by safe, I mean able to
> abort in case of error while keeping the guest running live).
>
> My thoughts on this has revolved around something similar to what was
> done in bdrv_append(), where a duplicate BDS is created, a new file-open
> performed with the appropriate access mode flags, and if successful
> swapped out for the originally opened BDS for vm001.img.  If there is an
> error, the new BDS is abandoned without modifying the BDS list.

Yes, there needs to be an atomic way to try opening the image read/write.

>>
>> As a result we have made the backing file chain shorter.  This is
>> improtant because otherwise incremental backup would grow the backing
>> file chain forever - each time it takes a new snapshot the chain
>> becomes longer and I/O accesses can become slower!
>>
>> The task is to add a new block job type called "commit".  It is like
>> the qemu-img commit command except it works while the guest is
>> running.
>>
>> The new QMP command should look like this:
>>
>> { 'command': 'block-commit', 'data': { 'device': 'str', 'image':
>> 'str', 'base': 'str', '*speed': 'int' }
>
> This is very similar to what I was thinking as well - I think the only
> difference in the command is that I what you called 'image' I called
> 'top', and the argument order was after base.
>
> Here is what I had for the command:
>
> { 'command': 'block-commit', 'data': { 'device': 'str', '*base': 'str',
>                                       '*top': 'str', '*speed': 'int' } }
>
> I don't think I have a strong preference for either of our proposed
> commands - they are essentially the same.

Yes, they are basically the same.  I don't mind which one we use.

>>
>> Note that block-commit cannot work on the top-most image since the
>> guest is still writing to that image and we might never be able to
>> copy all the data into the base image (the guest could write new data
>> as quickly as we copy it to the base).  The command should check for
>> this and reject the top-most image.
>
> By this you mean that you would like to disallow committing the
> top-level image to the base?  Perhaps there is a way to attempt to
> converge, and adaptively give more time to the co-routine if we are able
> to detect divergence.  This may require violating the 'speed' parameter,
> however, and make the commit less 'live'.

Yes, I think we should disallow merging down the topmost image.  The
convergence problem is the same issue that live migration has.  It's a
hard problem and not something that is essential for block-commit.  It
would add complexity into the block-commit implementation - the only
benefit would be that you can merge down the last COW snapshot.  We
already have an implementation that does this: the "commit" command
which stops the guest :).

If we take Taisuke's scenario, just create a snapshot on the SSD
immediately before merging down:

  /big-slow-cheap-disk/master.img <- /small-fast-expensive-ssd/cow.qcow2

(qemu) snapshot_blkdev ...

  /big-slow-cheap-disk/master.img <-
/small-fast-expensive-ssd/cow.qcow2 <-
/small-fast-expensive-ssd/newcow.qcow2

(qemu) block-commit cow.qcow2

  /big-slow-cheap-disk/master.img <- /small-fast-expensive-ssd/newcow.qcow2

>>
>> Let's figure out how to specify block-commit so we're all happy, that
>> way we can avoid duplicating work.  Any comments on my notes above?
>>
>
> I think we are almost completely on the same page - devil is in the
> details, of course (for instance, on how to convert the destination base
> from r/o to r/w).

Great.  The atomic r/o -> r/w transition and the commit coroutine can
be implemented on in parallel.  Are you happy to implement the atomic
r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
assume that part already works and focus on implementing the commit
coroutine in the meantime.  I'm just suggesting a way to split up the
work, please let me know if you think this is good.

Stefan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 13:53       ` Stefan Hajnoczi
@ 2012-06-08 14:32         ` Jeff Cody
  2012-06-08 16:11           ` Kevin Wolf
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Cody @ 2012-06-08 14:32 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Zhi Hui Li, Taisuke Yamada, qemu-devel

On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>> On Thu, Jun 7, 2012 at 3:14 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>> On 06/07/2012 02:19 AM, Taisuke Yamada wrote:
>>> We want to commit snap1.qcow2 down into vm001.img while the guest is running:
>>>
>>> vm001.img <-- snap2.qcow2
>>>
>>> This means copying allocated blocks from snap1.qcow2 and writing them
>>> into vm001.img.  Once this process is complete it is safe to delete
>>> snap1.qcow2 since all data is now in vm001.img.
>>
>> Yes, this is the same as what we are wanting to accomplish.  The trick
>> here is open vm001.img r/w, in a safe manner (by safe, I mean able to
>> abort in case of error while keeping the guest running live).
>>
>> My thoughts on this has revolved around something similar to what was
>> done in bdrv_append(), where a duplicate BDS is created, a new file-open
>> performed with the appropriate access mode flags, and if successful
>> swapped out for the originally opened BDS for vm001.img.  If there is an
>> error, the new BDS is abandoned without modifying the BDS list.
> 
> Yes, there needs to be an atomic way to try opening the image read/write.
> 
>>>
>>> As a result we have made the backing file chain shorter.  This is
>>> improtant because otherwise incremental backup would grow the backing
>>> file chain forever - each time it takes a new snapshot the chain
>>> becomes longer and I/O accesses can become slower!
>>>
>>> The task is to add a new block job type called "commit".  It is like
>>> the qemu-img commit command except it works while the guest is
>>> running.
>>>
>>> The new QMP command should look like this:
>>>
>>> { 'command': 'block-commit', 'data': { 'device': 'str', 'image':
>>> 'str', 'base': 'str', '*speed': 'int' }
>>
>> This is very similar to what I was thinking as well - I think the only
>> difference in the command is that I what you called 'image' I called
>> 'top', and the argument order was after base.
>>
>> Here is what I had for the command:
>>
>> { 'command': 'block-commit', 'data': { 'device': 'str', '*base': 'str',
>>                                       '*top': 'str', '*speed': 'int' } }
>>
>> I don't think I have a strong preference for either of our proposed
>> commands - they are essentially the same.
> 
> Yes, they are basically the same.  I don't mind which one we use.
> 
>>>
>>> Note that block-commit cannot work on the top-most image since the
>>> guest is still writing to that image and we might never be able to
>>> copy all the data into the base image (the guest could write new data
>>> as quickly as we copy it to the base).  The command should check for
>>> this and reject the top-most image.
>>
>> By this you mean that you would like to disallow committing the
>> top-level image to the base?  Perhaps there is a way to attempt to
>> converge, and adaptively give more time to the co-routine if we are able
>> to detect divergence.  This may require violating the 'speed' parameter,
>> however, and make the commit less 'live'.
> 
> Yes, I think we should disallow merging down the topmost image.  The
> convergence problem is the same issue that live migration has.  It's a
> hard problem and not something that is essential for block-commit.  It
> would add complexity into the block-commit implementation - the only
> benefit would be that you can merge down the last COW snapshot.  We
> already have an implementation that does this: the "commit" command
> which stops the guest :).

OK.  And, it is better to get this implemented now, and if desired, I
suppose we can always revisit the convergence at a later date (or not at
all, if there is no pressing case for it).

> 
> If we take Taisuke's scenario, just create a snapshot on the SSD
> immediately before merging down:
> 
>   /big-slow-cheap-disk/master.img <- /small-fast-expensive-ssd/cow.qcow2
> 
> (qemu) snapshot_blkdev ...
> 
>   /big-slow-cheap-disk/master.img <-
> /small-fast-expensive-ssd/cow.qcow2 <-
> /small-fast-expensive-ssd/newcow.qcow2
> 
> (qemu) block-commit cow.qcow2
> 
>   /big-slow-cheap-disk/master.img <- /small-fast-expensive-ssd/newcow.qcow2
> 
>>>
>>> Let's figure out how to specify block-commit so we're all happy, that
>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>
>>
>> I think we are almost completely on the same page - devil is in the
>> details, of course (for instance, on how to convert the destination base
>> from r/o to r/w).
> 
> Great.  The atomic r/o -> r/w transition and the commit coroutine can
> be implemented on in parallel.  Are you happy to implement the atomic
> r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
> assume that part already works and focus on implementing the commit
> coroutine in the meantime.  I'm just suggesting a way to split up the
> work, please let me know if you think this is good.

I am happy to do it that way.  I'll shift my focus to the atomic image
reopen in r/w mode.  I'll go ahead and post my diagrams and other info
for block-commit on the wiki, because I don't think it conflicts with we
discussed above (although I will modify my diagrams to not show commit
from the top-level image).  Of course, feel free to change it as
necessary.

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 14:32         ` Jeff Cody
@ 2012-06-08 16:11           ` Kevin Wolf
  2012-06-08 17:46             ` Jeff Cody
  0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2012-06-08 16:11 UTC (permalink / raw)
  To: jcody; +Cc: Stefan Hajnoczi, Zhi Hui Li, Taisuke Yamada, qemu-devel

Am 08.06.2012 16:32, schrieb Jeff Cody:
> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>> Note that block-commit cannot work on the top-most image since the
>>>> guest is still writing to that image and we might never be able to
>>>> copy all the data into the base image (the guest could write new data
>>>> as quickly as we copy it to the base).  The command should check for
>>>> this and reject the top-most image.
>>>
>>> By this you mean that you would like to disallow committing the
>>> top-level image to the base?  Perhaps there is a way to attempt to
>>> converge, and adaptively give more time to the co-routine if we are able
>>> to detect divergence.  This may require violating the 'speed' parameter,
>>> however, and make the commit less 'live'.
>>
>> Yes, I think we should disallow merging down the topmost image.  The
>> convergence problem is the same issue that live migration has.  It's a
>> hard problem and not something that is essential for block-commit.  It
>> would add complexity into the block-commit implementation - the only
>> benefit would be that you can merge down the last COW snapshot.  We
>> already have an implementation that does this: the "commit" command
>> which stops the guest :).
> 
> OK.  And, it is better to get this implemented now, and if desired, I
> suppose we can always revisit the convergence at a later date (or not at
> all, if there is no pressing case for it).

Not allowing live commit for the topmost image may be a good way to
break the problem down into more manageable pieces, but eventually the
feature isn't complete until you can do it.

Basically what you need there is the equivalent of an active mirror. I
wouldn't be surprised if actually code could be reused for both.

>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>
>>>
>>> I think we are almost completely on the same page - devil is in the
>>> details, of course (for instance, on how to convert the destination base
>>> from r/o to r/w).
>>
>> Great.  The atomic r/o -> r/w transition and the commit coroutine can
>> be implemented on in parallel.  Are you happy to implement the atomic
>> r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>> assume that part already works and focus on implementing the commit
>> coroutine in the meantime.  I'm just suggesting a way to split up the
>> work, please let me know if you think this is good.
> 
> I am happy to do it that way.  I'll shift my focus to the atomic image
> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
> for block-commit on the wiki, because I don't think it conflicts with we
> discussed above (although I will modify my diagrams to not show commit
> from the top-level image).  Of course, feel free to change it as
> necessary.

I may have mentioned it before, but just in case: I think Supriya's
bdrv_reopen() patches are a good starting point. I don't know why they
were never completed, but I think we all agreed on the general design,
so it should be possible to pick that up.

Though if you have already started with your own work on it, Jeff, I
expect that it won't be much different because it's basically the same
transactional approach that you know and that we already used for group
snapshots.

Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 16:11           ` Kevin Wolf
@ 2012-06-08 17:46             ` Jeff Cody
  2012-06-08 17:57               ` Kevin Wolf
  2012-06-11 12:09               ` Stefan Hajnoczi
  0 siblings, 2 replies; 31+ messages in thread
From: Jeff Cody @ 2012-06-08 17:46 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, Zhi Hui Li, Taisuke Yamada, qemu-devel

On 06/08/2012 12:11 PM, Kevin Wolf wrote:
> Am 08.06.2012 16:32, schrieb Jeff Cody:
>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>> Note that block-commit cannot work on the top-most image since the
>>>>> guest is still writing to that image and we might never be able to
>>>>> copy all the data into the base image (the guest could write new data
>>>>> as quickly as we copy it to the base).  The command should check for
>>>>> this and reject the top-most image.
>>>>
>>>> By this you mean that you would like to disallow committing the
>>>> top-level image to the base?  Perhaps there is a way to attempt to
>>>> converge, and adaptively give more time to the co-routine if we are able
>>>> to detect divergence.  This may require violating the 'speed' parameter,
>>>> however, and make the commit less 'live'.
>>>
>>> Yes, I think we should disallow merging down the topmost image.  The
>>> convergence problem is the same issue that live migration has.  It's a
>>> hard problem and not something that is essential for block-commit.  It
>>> would add complexity into the block-commit implementation - the only
>>> benefit would be that you can merge down the last COW snapshot.  We
>>> already have an implementation that does this: the "commit" command
>>> which stops the guest :).
>>
>> OK.  And, it is better to get this implemented now, and if desired, I
>> suppose we can always revisit the convergence at a later date (or not at
>> all, if there is no pressing case for it).
> 
> Not allowing live commit for the topmost image may be a good way to
> break the problem down into more manageable pieces, but eventually the
> feature isn't complete until you can do it.
> 
> Basically what you need there is the equivalent of an active mirror. I
> wouldn't be surprised if actually code could be reused for both.

I agree, doing it like mirroring for new writes on the top layer makes
sense, as long as you are willing to violate the (optional) speed
parameter.  I wouldn't think violating the speed parameter in that case
would be an issue, as long as it is a documented affect of performing a
live commit on the active (top) layer.

> 
>>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>>
>>>>
>>>> I think we are almost completely on the same page - devil is in the
>>>> details, of course (for instance, on how to convert the destination base
>>>> from r/o to r/w).
>>>
>>> Great.  The atomic r/o -> r/w transition and the commit coroutine can
>>> be implemented on in parallel.  Are you happy to implement the atomic
>>> r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>>> assume that part already works and focus on implementing the commit
>>> coroutine in the meantime.  I'm just suggesting a way to split up the
>>> work, please let me know if you think this is good.
>>
>> I am happy to do it that way.  I'll shift my focus to the atomic image
>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>> for block-commit on the wiki, because I don't think it conflicts with we
>> discussed above (although I will modify my diagrams to not show commit
>> from the top-level image).  Of course, feel free to change it as
>> necessary.
> 
> I may have mentioned it before, but just in case: I think Supriya's
> bdrv_reopen() patches are a good starting point. I don't know why they
> were never completed, but I think we all agreed on the general design,
> so it should be possible to pick that up.
> 
> Though if you have already started with your own work on it, Jeff, I
> expect that it won't be much different because it's basically the same
> transactional approach that you know and that we already used for group
> snapshots.
>

I will definitely use parts of Supriya's as it makes sense - what I
started work on is similar to bdrv_append() and the current transaction
approach, so there will be plenty in common to reuse, even with some
differences.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 17:46             ` Jeff Cody
@ 2012-06-08 17:57               ` Kevin Wolf
  2012-06-08 18:33                 ` Jeff Cody
  2012-06-10 16:10                 ` Paolo Bonzini
  2012-06-11 12:09               ` Stefan Hajnoczi
  1 sibling, 2 replies; 31+ messages in thread
From: Kevin Wolf @ 2012-06-08 17:57 UTC (permalink / raw)
  To: jcody; +Cc: Stefan Hajnoczi, Zhi Hui Li, Taisuke Yamada, qemu-devel

Am 08.06.2012 19:46, schrieb Jeff Cody:
> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>> Note that block-commit cannot work on the top-most image since the
>>>>>> guest is still writing to that image and we might never be able to
>>>>>> copy all the data into the base image (the guest could write new data
>>>>>> as quickly as we copy it to the base).  The command should check for
>>>>>> this and reject the top-most image.
>>>>>
>>>>> By this you mean that you would like to disallow committing the
>>>>> top-level image to the base?  Perhaps there is a way to attempt to
>>>>> converge, and adaptively give more time to the co-routine if we are able
>>>>> to detect divergence.  This may require violating the 'speed' parameter,
>>>>> however, and make the commit less 'live'.
>>>>
>>>> Yes, I think we should disallow merging down the topmost image.  The
>>>> convergence problem is the same issue that live migration has.  It's a
>>>> hard problem and not something that is essential for block-commit.  It
>>>> would add complexity into the block-commit implementation - the only
>>>> benefit would be that you can merge down the last COW snapshot.  We
>>>> already have an implementation that does this: the "commit" command
>>>> which stops the guest :).
>>>
>>> OK.  And, it is better to get this implemented now, and if desired, I
>>> suppose we can always revisit the convergence at a later date (or not at
>>> all, if there is no pressing case for it).
>>
>> Not allowing live commit for the topmost image may be a good way to
>> break the problem down into more manageable pieces, but eventually the
>> feature isn't complete until you can do it.
>>
>> Basically what you need there is the equivalent of an active mirror. I
>> wouldn't be surprised if actually code could be reused for both.
> 
> I agree, doing it like mirroring for new writes on the top layer makes
> sense, as long as you are willing to violate the (optional) speed
> parameter.  I wouldn't think violating the speed parameter in that case
> would be an issue, as long as it is a documented affect of performing a
> live commit on the active (top) layer.

The question is what the speed parameter really means for a live commit
block job (or for an active mirror). I think it would make sense to
limit only the speed of I/O that doesn't come from the guest but is from
the background copying.

If you did apply it to guest I/O as well, then I think the logical
consequence would be that guest I/O is throttled as well because
requests only complete when they have reached both source and target.

(I know I'm happily mixing terminology of all kinds of live block jobs
here, hope you understand anyway what I mean ;-))

>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>> for block-commit on the wiki, because I don't think it conflicts with we
>>> discussed above (although I will modify my diagrams to not show commit
>>> from the top-level image).  Of course, feel free to change it as
>>> necessary.
>>
>> I may have mentioned it before, but just in case: I think Supriya's
>> bdrv_reopen() patches are a good starting point. I don't know why they
>> were never completed, but I think we all agreed on the general design,
>> so it should be possible to pick that up.
>>
>> Though if you have already started with your own work on it, Jeff, I
>> expect that it won't be much different because it's basically the same
>> transactional approach that you know and that we already used for group
>> snapshots.
> 
> I will definitely use parts of Supriya's as it makes sense - what I
> started work on is similar to bdrv_append() and the current transaction
> approach, so there will be plenty in common to reuse, even with some
> differences.

I'm not sure how far a bdrv_append-like approach will work in this case.
The problem is that you must not open any image r/w twice. The policy is
that you can open an image either once r/w or multiple times read-only.

Raw images are probably forgiving if you happen to open them twice r/w,
but more complex image formats like qcow2 or qed are likely to get
corrupted if you do it.

Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 17:57               ` Kevin Wolf
@ 2012-06-08 18:33                 ` Jeff Cody
  2012-06-08 21:08                   ` Kevin Wolf
  2012-06-10 16:10                 ` Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Jeff Cody @ 2012-06-08 18:33 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, Zhi Hui Li, Taisuke Yamada, qemu-devel

On 06/08/2012 01:57 PM, Kevin Wolf wrote:
> Am 08.06.2012 19:46, schrieb Jeff Cody:
>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>> Note that block-commit cannot work on the top-most image since the
>>>>>>> guest is still writing to that image and we might never be able to
>>>>>>> copy all the data into the base image (the guest could write new data
>>>>>>> as quickly as we copy it to the base).  The command should check for
>>>>>>> this and reject the top-most image.
>>>>>>
>>>>>> By this you mean that you would like to disallow committing the
>>>>>> top-level image to the base?  Perhaps there is a way to attempt to
>>>>>> converge, and adaptively give more time to the co-routine if we are able
>>>>>> to detect divergence.  This may require violating the 'speed' parameter,
>>>>>> however, and make the commit less 'live'.
>>>>>
>>>>> Yes, I think we should disallow merging down the topmost image.  The
>>>>> convergence problem is the same issue that live migration has.  It's a
>>>>> hard problem and not something that is essential for block-commit.  It
>>>>> would add complexity into the block-commit implementation - the only
>>>>> benefit would be that you can merge down the last COW snapshot.  We
>>>>> already have an implementation that does this: the "commit" command
>>>>> which stops the guest :).
>>>>
>>>> OK.  And, it is better to get this implemented now, and if desired, I
>>>> suppose we can always revisit the convergence at a later date (or not at
>>>> all, if there is no pressing case for it).
>>>
>>> Not allowing live commit for the topmost image may be a good way to
>>> break the problem down into more manageable pieces, but eventually the
>>> feature isn't complete until you can do it.
>>>
>>> Basically what you need there is the equivalent of an active mirror. I
>>> wouldn't be surprised if actually code could be reused for both.
>>
>> I agree, doing it like mirroring for new writes on the top layer makes
>> sense, as long as you are willing to violate the (optional) speed
>> parameter.  I wouldn't think violating the speed parameter in that case
>> would be an issue, as long as it is a documented affect of performing a
>> live commit on the active (top) layer.
> 
> The question is what the speed parameter really means for a live commit
> block job (or for an active mirror). I think it would make sense to
> limit only the speed of I/O that doesn't come from the guest but is from
> the background copying.
> 
> If you did apply it to guest I/O as well, then I think the logical
> consequence would be that guest I/O is throttled as well because
> requests only complete when they have reached both source and target.
> 
> (I know I'm happily mixing terminology of all kinds of live block jobs
> here, hope you understand anyway what I mean ;-))

I understand what you mean :) My thought is that the block commit
'speed' parameter would essentially only apply to 'old data', that is
being copied in the commit coroutine, and it would be ignored (violated)
for 'new data' in the mirroring-like case.

I assumed we wouldn't want to throttle the guest I/O, but that may be a
bad assumption on my part.

> 
>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>> discussed above (although I will modify my diagrams to not show commit
>>>> from the top-level image).  Of course, feel free to change it as
>>>> necessary.
>>>
>>> I may have mentioned it before, but just in case: I think Supriya's
>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>> were never completed, but I think we all agreed on the general design,
>>> so it should be possible to pick that up.
>>>
>>> Though if you have already started with your own work on it, Jeff, I
>>> expect that it won't be much different because it's basically the same
>>> transactional approach that you know and that we already used for group
>>> snapshots.
>>
>> I will definitely use parts of Supriya's as it makes sense - what I
>> started work on is similar to bdrv_append() and the current transaction
>> approach, so there will be plenty in common to reuse, even with some
>> differences.
> 
> I'm not sure how far a bdrv_append-like approach will work in this case.
> The problem is that you must not open any image r/w twice. The policy is
> that you can open an image either once r/w or multiple times read-only.

I think there are 3 possibilities that would be OK: open once r/w, one
or more times r/o, or once r/w + one or more times r/o.  Do you agree
with this?  Is the latter case against block-layer policy?

In this special case for block commit, we need to do the reopen() to go
from a r/o image to a r/w image.  If an image is opened r/o, the drv
layer can do another open with a new descriptor, this time r/w, without
closing the r/o instance.  If the new r/w open was successful, then the
r/o instance can be closed, and some bdrv_append()-like operations done.
Otherwise, like the current transactional code, we can just 'walk away'
by only closing the new image (if it got that far), and the existing BDS
with the r/o image is safely unchanged (and never closed, so no
unrecoverable error paths).

> Raw images are probably forgiving if you happen to open them twice r/w,
> but more complex image formats like qcow2 or qed are likely to get
> corrupted if you do it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 18:33                 ` Jeff Cody
@ 2012-06-08 21:08                   ` Kevin Wolf
  2012-06-09 16:52                     ` Jeff Cody
  0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2012-06-08 21:08 UTC (permalink / raw)
  To: jcody; +Cc: Stefan Hajnoczi, Zhi Hui Li, Taisuke Yamada, qemu-devel

Am 08.06.2012 20:33, schrieb Jeff Cody:
> On 06/08/2012 01:57 PM, Kevin Wolf wrote:
>> Am 08.06.2012 19:46, schrieb Jeff Cody:
>>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>>> Note that block-commit cannot work on the top-most image since the
>>>>>>>> guest is still writing to that image and we might never be able to
>>>>>>>> copy all the data into the base image (the guest could write new data
>>>>>>>> as quickly as we copy it to the base).  The command should check for
>>>>>>>> this and reject the top-most image.
>>>>>>>
>>>>>>> By this you mean that you would like to disallow committing the
>>>>>>> top-level image to the base?  Perhaps there is a way to attempt to
>>>>>>> converge, and adaptively give more time to the co-routine if we are able
>>>>>>> to detect divergence.  This may require violating the 'speed' parameter,
>>>>>>> however, and make the commit less 'live'.
>>>>>>
>>>>>> Yes, I think we should disallow merging down the topmost image.  The
>>>>>> convergence problem is the same issue that live migration has.  It's a
>>>>>> hard problem and not something that is essential for block-commit.  It
>>>>>> would add complexity into the block-commit implementation - the only
>>>>>> benefit would be that you can merge down the last COW snapshot.  We
>>>>>> already have an implementation that does this: the "commit" command
>>>>>> which stops the guest :).
>>>>>
>>>>> OK.  And, it is better to get this implemented now, and if desired, I
>>>>> suppose we can always revisit the convergence at a later date (or not at
>>>>> all, if there is no pressing case for it).
>>>>
>>>> Not allowing live commit for the topmost image may be a good way to
>>>> break the problem down into more manageable pieces, but eventually the
>>>> feature isn't complete until you can do it.
>>>>
>>>> Basically what you need there is the equivalent of an active mirror. I
>>>> wouldn't be surprised if actually code could be reused for both.
>>>
>>> I agree, doing it like mirroring for new writes on the top layer makes
>>> sense, as long as you are willing to violate the (optional) speed
>>> parameter.  I wouldn't think violating the speed parameter in that case
>>> would be an issue, as long as it is a documented affect of performing a
>>> live commit on the active (top) layer.
>>
>> The question is what the speed parameter really means for a live commit
>> block job (or for an active mirror). I think it would make sense to
>> limit only the speed of I/O that doesn't come from the guest but is from
>> the background copying.
>>
>> If you did apply it to guest I/O as well, then I think the logical
>> consequence would be that guest I/O is throttled as well because
>> requests only complete when they have reached both source and target.
>>
>> (I know I'm happily mixing terminology of all kinds of live block jobs
>> here, hope you understand anyway what I mean ;-))
> 
> I understand what you mean :) My thought is that the block commit
> 'speed' parameter would essentially only apply to 'old data', that is
> being copied in the commit coroutine, and it would be ignored (violated)
> for 'new data' in the mirroring-like case.
> 
> I assumed we wouldn't want to throttle the guest I/O, but that may be a
> bad assumption on my part.

I don't disagree, I just wanted to point out that throttling the guest
would be another option. Applying the limit only to old data probably
makes most sense indeed.

>>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>>> discussed above (although I will modify my diagrams to not show commit
>>>>> from the top-level image).  Of course, feel free to change it as
>>>>> necessary.
>>>>
>>>> I may have mentioned it before, but just in case: I think Supriya's
>>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>>> were never completed, but I think we all agreed on the general design,
>>>> so it should be possible to pick that up.
>>>>
>>>> Though if you have already started with your own work on it, Jeff, I
>>>> expect that it won't be much different because it's basically the same
>>>> transactional approach that you know and that we already used for group
>>>> snapshots.
>>>
>>> I will definitely use parts of Supriya's as it makes sense - what I
>>> started work on is similar to bdrv_append() and the current transaction
>>> approach, so there will be plenty in common to reuse, even with some
>>> differences.
>>
>> I'm not sure how far a bdrv_append-like approach will work in this case.
>> The problem is that you must not open any image r/w twice. The policy is
>> that you can open an image either once r/w or multiple times read-only.
> 
> I think there are 3 possibilities that would be OK: open once r/w, one
> or more times r/o, or once r/w + one or more times r/o.  Do you agree
> with this?  Is the latter case against block-layer policy?

Yes, it is. The problem is that image formats have data cached in
memory, and as soon as you have a writer, the other users of the same
image file - no matter whether r/w or r/o - get out of sync. They might
mix old (cached) matadata with other new metadata, and I think you can
imagine what mess would result.

Strictly speaking even raw caches things like the image size, so if the
r/w instance happens to resize the image, the additional r/o readers
would have a problem even with raw.

> In this special case for block commit, we need to do the reopen() to go
> from a r/o image to a r/w image.  If an image is opened r/o, the drv
> layer can do another open with a new descriptor, this time r/w, without
> closing the r/o instance.  If the new r/w open was successful, then the
> r/o instance can be closed, and some bdrv_append()-like operations done.
> Otherwise, like the current transactional code, we can just 'walk away'
> by only closing the new image (if it got that far), and the existing BDS
> with the r/o image is safely unchanged (and never closed, so no
> unrecoverable error paths).

Yes. It's just that you need to be very careful. You would ask the qcow2
driver to reopen the image, and it would take care to invalidate any
cached data and then pass the request on to the next layer. When you
have drilled down through the stack to raw-posix, it would actually open
the second file descriptor and wait for a commit/abort that goes through
the whole stack again.

Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 10:39 ` Kevin Wolf
@ 2012-06-09 11:21   ` Taisuke Yamada
  0 siblings, 0 replies; 31+ messages in thread
From: Taisuke Yamada @ 2012-06-09 11:21 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: jcody, qemu-devel

>> This split-image configration is done to keep VM I/Os on a SSD
>
> This is an interesting use case that I wasn't aware of yet. So you're
> not really interested in a snapshot here, but what you're trying to do
> is using the SSD as some sort of a cache, right?

Right. By splitting, not only I can have "SSD+HDD" configuration,
but I can also go with "local SSD cache + remote storage" configuration.
My aim is to reduce an I/O load from slow or expensive components like
HDD or network.

> The problem that a live block operation could solve sounds unrelated to
> me: While you perform your 'commit' monitor command, the VM doesn't run.
> Some kind of live commit would surely be helpful there (that's what Jeff
> Cody is working on, iirc).

It seems Jeff's work is very close to what I'm looking for.
I'm not yet sure with detail of live-commit, but transaction API does help.
I'll definitely check out wiki doc once Jeff updates it.

> One other thing to consider that complicates everything is that
> committing to the backing file when bdrv_make_empty is implemented
> obviously also means that reads go to the slow disk now. So I guess you
> really want to commit only part of the image on the SSD...

Yes. Althrough my primary focus is on write IOs, combined with qemu's
IO tracing capability, partial committing would be optimal as I can then
always keep frequently read sectors in cache. But that would be complex indeed.

Best Regards,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 21:08                   ` Kevin Wolf
@ 2012-06-09 16:52                     ` Jeff Cody
  2012-06-11  7:57                       ` Kevin Wolf
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Cody @ 2012-06-09 16:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, Zhi Hui Li, Taisuke Yamada, qemu-devel

On 06/08/2012 05:08 PM, Kevin Wolf wrote:
> Am 08.06.2012 20:33, schrieb Jeff Cody:
>> On 06/08/2012 01:57 PM, Kevin Wolf wrote:
>>> Am 08.06.2012 19:46, schrieb Jeff Cody:
>>>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>>>> Note that block-commit cannot work on the top-most image since the
>>>>>>>>> guest is still writing to that image and we might never be able to
>>>>>>>>> copy all the data into the base image (the guest could write new data
>>>>>>>>> as quickly as we copy it to the base).  The command should check for
>>>>>>>>> this and reject the top-most image.
>>>>>>>>
>>>>>>>> By this you mean that you would like to disallow committing the
>>>>>>>> top-level image to the base?  Perhaps there is a way to attempt to
>>>>>>>> converge, and adaptively give more time to the co-routine if we are able
>>>>>>>> to detect divergence.  This may require violating the 'speed' parameter,
>>>>>>>> however, and make the commit less 'live'.
>>>>>>>
>>>>>>> Yes, I think we should disallow merging down the topmost image.  The
>>>>>>> convergence problem is the same issue that live migration has.  It's a
>>>>>>> hard problem and not something that is essential for block-commit.  It
>>>>>>> would add complexity into the block-commit implementation - the only
>>>>>>> benefit would be that you can merge down the last COW snapshot.  We
>>>>>>> already have an implementation that does this: the "commit" command
>>>>>>> which stops the guest :).
>>>>>>
>>>>>> OK.  And, it is better to get this implemented now, and if desired, I
>>>>>> suppose we can always revisit the convergence at a later date (or not at
>>>>>> all, if there is no pressing case for it).
>>>>>
>>>>> Not allowing live commit for the topmost image may be a good way to
>>>>> break the problem down into more manageable pieces, but eventually the
>>>>> feature isn't complete until you can do it.
>>>>>
>>>>> Basically what you need there is the equivalent of an active mirror. I
>>>>> wouldn't be surprised if actually code could be reused for both.
>>>>
>>>> I agree, doing it like mirroring for new writes on the top layer makes
>>>> sense, as long as you are willing to violate the (optional) speed
>>>> parameter.  I wouldn't think violating the speed parameter in that case
>>>> would be an issue, as long as it is a documented affect of performing a
>>>> live commit on the active (top) layer.
>>>
>>> The question is what the speed parameter really means for a live commit
>>> block job (or for an active mirror). I think it would make sense to
>>> limit only the speed of I/O that doesn't come from the guest but is from
>>> the background copying.
>>>
>>> If you did apply it to guest I/O as well, then I think the logical
>>> consequence would be that guest I/O is throttled as well because
>>> requests only complete when they have reached both source and target.
>>>
>>> (I know I'm happily mixing terminology of all kinds of live block jobs
>>> here, hope you understand anyway what I mean ;-))
>>
>> I understand what you mean :) My thought is that the block commit
>> 'speed' parameter would essentially only apply to 'old data', that is
>> being copied in the commit coroutine, and it would be ignored (violated)
>> for 'new data' in the mirroring-like case.
>>
>> I assumed we wouldn't want to throttle the guest I/O, but that may be a
>> bad assumption on my part.
> 
> I don't disagree, I just wanted to point out that throttling the guest
> would be another option. Applying the limit only to old data probably
> makes most sense indeed.
> 

Given that, perhaps it does make sense to get it implemented in two
phases - first the straightforward block commit for images below the
active layer, and then a follow-up patch set to implement committing in
the top layer alongside an active mirroring approach for convergence.

>>>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>>>> discussed above (although I will modify my diagrams to not show commit
>>>>>> from the top-level image).  Of course, feel free to change it as
>>>>>> necessary.
>>>>>
>>>>> I may have mentioned it before, but just in case: I think Supriya's
>>>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>>>> were never completed, but I think we all agreed on the general design,
>>>>> so it should be possible to pick that up.
>>>>>
>>>>> Though if you have already started with your own work on it, Jeff, I
>>>>> expect that it won't be much different because it's basically the same
>>>>> transactional approach that you know and that we already used for group
>>>>> snapshots.
>>>>
>>>> I will definitely use parts of Supriya's as it makes sense - what I
>>>> started work on is similar to bdrv_append() and the current transaction
>>>> approach, so there will be plenty in common to reuse, even with some
>>>> differences.
>>>
>>> I'm not sure how far a bdrv_append-like approach will work in this case.
>>> The problem is that you must not open any image r/w twice. The policy is
>>> that you can open an image either once r/w or multiple times read-only.
>>
>> I think there are 3 possibilities that would be OK: open once r/w, one
>> or more times r/o, or once r/w + one or more times r/o.  Do you agree
>> with this?  Is the latter case against block-layer policy?
> 
> Yes, it is. The problem is that image formats have data cached in
> memory, and as soon as you have a writer, the other users of the same
> image file - no matter whether r/w or r/o - get out of sync. They might
> mix old (cached) matadata with other new metadata, and I think you can
> imagine what mess would result.
> 
> Strictly speaking even raw caches things like the image size, so if the
> r/w instance happens to resize the image, the additional r/o readers
> would have a problem even with raw.
> 

OK, thanks. I should have qualified that with saying, for this
application, that it is not intended to have the images open r/w and r/o
for ongoing operations; it is just for the purpose of overlapping the
closing of the r/o instance.

>> In this special case for block commit, we need to do the reopen() to go
>> from a r/o image to a r/w image.  If an image is opened r/o, the drv
>> layer can do another open with a new descriptor, this time r/w, without
>> closing the r/o instance.  If the new r/w open was successful, then the
>> r/o instance can be closed, and some bdrv_append()-like operations done.
>> Otherwise, like the current transactional code, we can just 'walk away'
>> by only closing the new image (if it got that far), and the existing BDS
>> with the r/o image is safely unchanged (and never closed, so no
>> unrecoverable error paths).
> 
> Yes. It's just that you need to be very careful. You would ask the qcow2
> driver to reopen the image, and it would take care to invalidate any
> cached data and then pass the request on to the next layer. When you
> have drilled down through the stack to raw-posix, it would actually open
> the second file descriptor and wait for a commit/abort that goes through
> the whole stack again.
>

That makes sense, and I agree that it is a bit tricky.  The other
question is what protocol / drivers will support this sort of operation,
as well. And I certainly think that using Supriya's patches will prove
useful.

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 13:19     ` Jeff Cody
  2012-06-08 13:53       ` Stefan Hajnoczi
@ 2012-06-10 16:06       ` Paolo Bonzini
  1 sibling, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2012-06-10 16:06 UTC (permalink / raw)
  To: jcody; +Cc: Stefan Hajnoczi, Zhi Hui Li, Taisuke Yamada, qemu-devel

Il 08/06/2012 15:19, Jeff Cody ha scritto:
>> > 
>> > This means copying allocated blocks from snap1.qcow2 and writing them
>> > into vm001.img.  Once this process is complete it is safe to delete
>> > snap1.qcow2 since all data is now in vm001.img.
> Yes, this is the same as what we are wanting to accomplish.  The trick
> here is open vm001.img r/w, in a safe manner (by safe, I mean able to
> abort in case of error while keeping the guest running live).
> 
> My thoughts on this has revolved around something similar to what was
> done in bdrv_append(), where a duplicate BDS is created, a new file-open
> performed with the appropriate access mode flags, and if successful
> swapped out for the originally opened BDS for vm001.img.  If there is an
> error, the new BDS is abandoned without modifying the BDS list.
> 

I haven't yet read the rest of the thread, but I'll drop here that
indeed I have a bdrv_swap in my patches for block mirroring.

A live commit of the topmost image is really just a mirroring operation,
the only thing that changes is the mechanics of opening the target and
perhaps also switching to it.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 17:57               ` Kevin Wolf
  2012-06-08 18:33                 ` Jeff Cody
@ 2012-06-10 16:10                 ` Paolo Bonzini
  2012-06-11  7:59                   ` Kevin Wolf
  1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2012-06-10 16:10 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, jcody, Zhi Hui Li, Taisuke Yamada, qemu-devel

Il 08/06/2012 19:57, Kevin Wolf ha scritto:
>> > 
>> > I agree, doing it like mirroring for new writes on the top layer makes
>> > sense, as long as you are willing to violate the (optional) speed
>> > parameter.  I wouldn't think violating the speed parameter in that case
>> > would be an issue, as long as it is a documented affect of performing a
>> > live commit on the active (top) layer.
> The question is what the speed parameter really means for a live commit
> block job (or for an active mirror). I think it would make sense to
> limit only the speed of I/O that doesn't come from the guest but is from
> the background copying.

Not necessarily, you can throttle the copying speed.  If the I/O from
the guest has bursts, you will pick up the work later when the rate
calms down.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-09 16:52                     ` Jeff Cody
@ 2012-06-11  7:57                       ` Kevin Wolf
  0 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2012-06-11  7:57 UTC (permalink / raw)
  To: jcody; +Cc: Stefan Hajnoczi, Zhi Hui Li, Taisuke Yamada, qemu-devel

Am 09.06.2012 18:52, schrieb Jeff Cody:
> On 06/08/2012 05:08 PM, Kevin Wolf wrote:
>> Am 08.06.2012 20:33, schrieb Jeff Cody:
>>> On 06/08/2012 01:57 PM, Kevin Wolf wrote:
>>>> The question is what the speed parameter really means for a live commit
>>>> block job (or for an active mirror). I think it would make sense to
>>>> limit only the speed of I/O that doesn't come from the guest but is from
>>>> the background copying.
>>>>
>>>> If you did apply it to guest I/O as well, then I think the logical
>>>> consequence would be that guest I/O is throttled as well because
>>>> requests only complete when they have reached both source and target.
>>>>
>>>> (I know I'm happily mixing terminology of all kinds of live block jobs
>>>> here, hope you understand anyway what I mean ;-))
>>>
>>> I understand what you mean :) My thought is that the block commit
>>> 'speed' parameter would essentially only apply to 'old data', that is
>>> being copied in the commit coroutine, and it would be ignored (violated)
>>> for 'new data' in the mirroring-like case.
>>>
>>> I assumed we wouldn't want to throttle the guest I/O, but that may be a
>>> bad assumption on my part.
>>
>> I don't disagree, I just wanted to point out that throttling the guest
>> would be another option. Applying the limit only to old data probably
>> makes most sense indeed.
>>
> 
> Given that, perhaps it does make sense to get it implemented in two
> phases - first the straightforward block commit for images below the
> active layer, and then a follow-up patch set to implement committing in
> the top layer alongside an active mirroring approach for convergence.

Yes, perhaps that does make sense. I'm not sure how much of it we'll be
able to keep when we implement the active mirror approach, and possibly
we'll just replace the initial code wholesale then, but doing everything
at once sounds too big to be manageable.

>>>>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>>>>> discussed above (although I will modify my diagrams to not show commit
>>>>>>> from the top-level image).  Of course, feel free to change it as
>>>>>>> necessary.
>>>>>>
>>>>>> I may have mentioned it before, but just in case: I think Supriya's
>>>>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>>>>> were never completed, but I think we all agreed on the general design,
>>>>>> so it should be possible to pick that up.
>>>>>>
>>>>>> Though if you have already started with your own work on it, Jeff, I
>>>>>> expect that it won't be much different because it's basically the same
>>>>>> transactional approach that you know and that we already used for group
>>>>>> snapshots.
>>>>>
>>>>> I will definitely use parts of Supriya's as it makes sense - what I
>>>>> started work on is similar to bdrv_append() and the current transaction
>>>>> approach, so there will be plenty in common to reuse, even with some
>>>>> differences.
>>>>
>>>> I'm not sure how far a bdrv_append-like approach will work in this case.
>>>> The problem is that you must not open any image r/w twice. The policy is
>>>> that you can open an image either once r/w or multiple times read-only.
>>>
>>> I think there are 3 possibilities that would be OK: open once r/w, one
>>> or more times r/o, or once r/w + one or more times r/o.  Do you agree
>>> with this?  Is the latter case against block-layer policy?
>>
>> Yes, it is. The problem is that image formats have data cached in
>> memory, and as soon as you have a writer, the other users of the same
>> image file - no matter whether r/w or r/o - get out of sync. They might
>> mix old (cached) matadata with other new metadata, and I think you can
>> imagine what mess would result.
>>
>> Strictly speaking even raw caches things like the image size, so if the
>> r/w instance happens to resize the image, the additional r/o readers
>> would have a problem even with raw.
>>
> 
> OK, thanks. I should have qualified that with saying, for this
> application, that it is not intended to have the images open r/w and r/o
> for ongoing operations; it is just for the purpose of overlapping the
> closing of the r/o instance.

It's quite possible that it works under these circumstances. Note
however that after finishing the merge you will also want to turn it
back into an r/o image. The very least you need there is a bdrv_flush(),
and I don't think it's enough in all cases (basically you get a problem
whenever the image format uses something like QED mode where the image
becomes consistent only during a clean shutdown).

Things are becoming really tricky quite quickly, and this is why I
prefer sticking to the strict "one r/w or many r/o" rule, even though
special cases may happen to work in violation of this rule.

>>> In this special case for block commit, we need to do the reopen() to go
>>> from a r/o image to a r/w image.  If an image is opened r/o, the drv
>>> layer can do another open with a new descriptor, this time r/w, without
>>> closing the r/o instance.  If the new r/w open was successful, then the
>>> r/o instance can be closed, and some bdrv_append()-like operations done.
>>> Otherwise, like the current transactional code, we can just 'walk away'
>>> by only closing the new image (if it got that far), and the existing BDS
>>> with the r/o image is safely unchanged (and never closed, so no
>>> unrecoverable error paths).
>>
>> Yes. It's just that you need to be very careful. You would ask the qcow2
>> driver to reopen the image, and it would take care to invalidate any
>> cached data and then pass the request on to the next layer. When you
>> have drilled down through the stack to raw-posix, it would actually open
>> the second file descriptor and wait for a commit/abort that goes through
>> the whole stack again.
> 
> That makes sense, and I agree that it is a bit tricky.  The other
> question is what protocol / drivers will support this sort of operation,
> as well. And I certainly think that using Supriya's patches will prove
> useful.

It might be a bit tricky, but a whole lot less ugly than bdrv_append. ;-)

For drivers that don't implement the operation, I think we can use a
generic .close/.open sequence. Of course, this breaks the transactional
semantics, but it is very unlikely that an error occurs during opening
the image format layer. If we get any error, it's almost for sure in the
lowest layer.

Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-10 16:10                 ` Paolo Bonzini
@ 2012-06-11  7:59                   ` Kevin Wolf
  2012-06-11  8:01                     ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2012-06-11  7:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stefan Hajnoczi, jcody, Zhi Hui Li, Taisuke Yamada, qemu-devel

Am 10.06.2012 18:10, schrieb Paolo Bonzini:
> Il 08/06/2012 19:57, Kevin Wolf ha scritto:
>>>>
>>>> I agree, doing it like mirroring for new writes on the top layer makes
>>>> sense, as long as you are willing to violate the (optional) speed
>>>> parameter.  I wouldn't think violating the speed parameter in that case
>>>> would be an issue, as long as it is a documented affect of performing a
>>>> live commit on the active (top) layer.
>> The question is what the speed parameter really means for a live commit
>> block job (or for an active mirror). I think it would make sense to
>> limit only the speed of I/O that doesn't come from the guest but is from
>> the background copying.
> 
> Not necessarily, you can throttle the copying speed.  If the I/O from
> the guest has bursts, you will pick up the work later when the rate
> calms down.

Then it's not an active mirror any more, but a passive one.

Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-11  7:59                   ` Kevin Wolf
@ 2012-06-11  8:01                     ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2012-06-11  8:01 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, jcody, Zhi Hui Li, Taisuke Yamada, qemu-devel

Il 11/06/2012 09:59, Kevin Wolf ha scritto:
>>>>> >>>> I agree, doing it like mirroring for new writes on the top layer makes
>>>>> >>>> sense, as long as you are willing to violate the (optional) speed
>>>>> >>>> parameter.  I wouldn't think violating the speed parameter in that case
>>>>> >>>> would be an issue, as long as it is a documented affect of performing a
>>>>> >>>> live commit on the active (top) layer.
>>> >> The question is what the speed parameter really means for a live commit
>>> >> block job (or for an active mirror). I think it would make sense to
>>> >> limit only the speed of I/O that doesn't come from the guest but is from
>>> >> the background copying.
>> > 
>> > Not necessarily, you can throttle the copying speed.  If the I/O from
>> > the guest has bursts, you will pick up the work later when the rate
>> > calms down.
> Then it's not an active mirror any more, but a passive one.

Yes, I guess what I was saying is that the passive mirror code can be
used also for live commit of the top image.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-08 17:46             ` Jeff Cody
  2012-06-08 17:57               ` Kevin Wolf
@ 2012-06-11 12:09               ` Stefan Hajnoczi
  2012-06-11 12:50                 ` Kevin Wolf
  1 sibling, 1 reply; 31+ messages in thread
From: Stefan Hajnoczi @ 2012-06-11 12:09 UTC (permalink / raw)
  To: jcody; +Cc: Kevin Wolf, Supriya Kannery, Zhi Hui Li, Taisuke Yamada, qemu-devel

On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody <jcody@redhat.com> wrote:
> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>>>
>>>>>
>>>>> I think we are almost completely on the same page - devil is in the
>>>>> details, of course (for instance, on how to convert the destination base
>>>>> from r/o to r/w).
>>>>
>>>> Great.  The atomic r/o -> r/w transition and the commit coroutine can
>>>> be implemented on in parallel.  Are you happy to implement the atomic
>>>> r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>>>> assume that part already works and focus on implementing the commit
>>>> coroutine in the meantime.  I'm just suggesting a way to split up the
>>>> work, please let me know if you think this is good.
>>>
>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>> for block-commit on the wiki, because I don't think it conflicts with we
>>> discussed above (although I will modify my diagrams to not show commit
>>> from the top-level image).  Of course, feel free to change it as
>>> necessary.
>>
>> I may have mentioned it before, but just in case: I think Supriya's
>> bdrv_reopen() patches are a good starting point. I don't know why they
>> were never completed, but I think we all agreed on the general design,
>> so it should be possible to pick that up.
>>
>> Though if you have already started with your own work on it, Jeff, I
>> expect that it won't be much different because it's basically the same
>> transactional approach that you know and that we already used for group
>> snapshots.
>>
>
> I will definitely use parts of Supriya's as it makes sense - what I
> started work on is similar to bdrv_append() and the current transaction
> approach, so there will be plenty in common to reuse, even with some
> differences.

I have CCed Supriya who has been working on the reopen patch series.
She is close to posting a new version.

Stefan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-11 12:09               ` Stefan Hajnoczi
@ 2012-06-11 12:50                 ` Kevin Wolf
  2012-06-11 14:24                   ` Stefan Hajnoczi
  0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2012-06-11 12:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Supriya Kannery, jcody, Zhi Hui Li, Taisuke Yamada, qemu-devel

Am 11.06.2012 14:09, schrieb Stefan Hajnoczi:
> On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody <jcody@redhat.com> wrote:
>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>>>>
>>>>>>
>>>>>> I think we are almost completely on the same page - devil is in the
>>>>>> details, of course (for instance, on how to convert the destination base
>>>>>> from r/o to r/w).
>>>>>
>>>>> Great.  The atomic r/o -> r/w transition and the commit coroutine can
>>>>> be implemented on in parallel.  Are you happy to implement the atomic
>>>>> r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>>>>> assume that part already works and focus on implementing the commit
>>>>> coroutine in the meantime.  I'm just suggesting a way to split up the
>>>>> work, please let me know if you think this is good.
>>>>
>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>> discussed above (although I will modify my diagrams to not show commit
>>>> from the top-level image).  Of course, feel free to change it as
>>>> necessary.
>>>
>>> I may have mentioned it before, but just in case: I think Supriya's
>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>> were never completed, but I think we all agreed on the general design,
>>> so it should be possible to pick that up.
>>>
>>> Though if you have already started with your own work on it, Jeff, I
>>> expect that it won't be much different because it's basically the same
>>> transactional approach that you know and that we already used for group
>>> snapshots.
>>>
>>
>> I will definitely use parts of Supriya's as it makes sense - what I
>> started work on is similar to bdrv_append() and the current transaction
>> approach, so there will be plenty in common to reuse, even with some
>> differences.
> 
> I have CCed Supriya who has been working on the reopen patch series.
> She is close to posting a new version.

It's just a bit disappointing that it takes several months between each
two versions of the patch series. We'd like to have this in qemu 1.2,
not only in qemu 1.14.

I can understand if someone can't find the time, but then allow at least
someone else to pick it up.

Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-11 12:50                 ` Kevin Wolf
@ 2012-06-11 14:24                   ` Stefan Hajnoczi
  2012-06-11 15:37                     ` Jeff Cody
  0 siblings, 1 reply; 31+ messages in thread
From: Stefan Hajnoczi @ 2012-06-11 14:24 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Supriya Kannery, jcody, Zhi Hui Li, Taisuke Yamada, qemu-devel

On Mon, Jun 11, 2012 at 1:50 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 11.06.2012 14:09, schrieb Stefan Hajnoczi:
>> On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody <jcody@redhat.com> wrote:
>>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>>>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>>>>>
>>>>>>>
>>>>>>> I think we are almost completely on the same page - devil is in the
>>>>>>> details, of course (for instance, on how to convert the destination base
>>>>>>> from r/o to r/w).
>>>>>>
>>>>>> Great.  The atomic r/o -> r/w transition and the commit coroutine can
>>>>>> be implemented on in parallel.  Are you happy to implement the atomic
>>>>>> r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>>>>>> assume that part already works and focus on implementing the commit
>>>>>> coroutine in the meantime.  I'm just suggesting a way to split up the
>>>>>> work, please let me know if you think this is good.
>>>>>
>>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>>> discussed above (although I will modify my diagrams to not show commit
>>>>> from the top-level image).  Of course, feel free to change it as
>>>>> necessary.
>>>>
>>>> I may have mentioned it before, but just in case: I think Supriya's
>>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>>> were never completed, but I think we all agreed on the general design,
>>>> so it should be possible to pick that up.
>>>>
>>>> Though if you have already started with your own work on it, Jeff, I
>>>> expect that it won't be much different because it's basically the same
>>>> transactional approach that you know and that we already used for group
>>>> snapshots.
>>>>
>>>
>>> I will definitely use parts of Supriya's as it makes sense - what I
>>> started work on is similar to bdrv_append() and the current transaction
>>> approach, so there will be plenty in common to reuse, even with some
>>> differences.
>>
>> I have CCed Supriya who has been working on the reopen patch series.
>> She is close to posting a new version.
>
> It's just a bit disappointing that it takes several months between each
> two versions of the patch series. We'd like to have this in qemu 1.2,
> not only in qemu 1.14.
>
> I can understand if someone can't find the time, but then allow at least
> someone else to pick it up.

Hey, don't shoot the messenger :).  I just wanted add Supriya to CC so
she can join the discussion and see how much overlap there is with
what she's doing.  We all contribute to QEMU from different angles and
with different priorities.  If there is a time critical dependency on
the reopen code then discuss it here and the result will be that
someone officially drives the feature on.

Stefan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-11 14:24                   ` Stefan Hajnoczi
@ 2012-06-11 15:37                     ` Jeff Cody
  2012-06-11 19:12                       ` Paolo Bonzini
                                         ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Jeff Cody @ 2012-06-11 15:37 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Supriya Kannery, Zhi Hui Li, Taisuke Yamada, qemu-devel

On 06/11/2012 10:24 AM, Stefan Hajnoczi wrote:
> On Mon, Jun 11, 2012 at 1:50 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 11.06.2012 14:09, schrieb Stefan Hajnoczi:
>>> On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>>>>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think we are almost completely on the same page - devil is in the
>>>>>>>> details, of course (for instance, on how to convert the destination base
>>>>>>>> from r/o to r/w).
>>>>>>>
>>>>>>> Great.  The atomic r/o -> r/w transition and the commit coroutine can
>>>>>>> be implemented on in parallel.  Are you happy to implement the atomic
>>>>>>> r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>>>>>>> assume that part already works and focus on implementing the commit
>>>>>>> coroutine in the meantime.  I'm just suggesting a way to split up the
>>>>>>> work, please let me know if you think this is good.
>>>>>>
>>>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>>>> discussed above (although I will modify my diagrams to not show commit
>>>>>> from the top-level image).  Of course, feel free to change it as
>>>>>> necessary.
>>>>>
>>>>> I may have mentioned it before, but just in case: I think Supriya's
>>>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>>>> were never completed, but I think we all agreed on the general design,
>>>>> so it should be possible to pick that up.
>>>>>
>>>>> Though if you have already started with your own work on it, Jeff, I
>>>>> expect that it won't be much different because it's basically the same
>>>>> transactional approach that you know and that we already used for group
>>>>> snapshots.
>>>>>
>>>>
>>>> I will definitely use parts of Supriya's as it makes sense - what I
>>>> started work on is similar to bdrv_append() and the current transaction
>>>> approach, so there will be plenty in common to reuse, even with some
>>>> differences.
>>>
>>> I have CCed Supriya who has been working on the reopen patch series.
>>> She is close to posting a new version.
>>
>> It's just a bit disappointing that it takes several months between each
>> two versions of the patch series. We'd like to have this in qemu 1.2,
>> not only in qemu 1.14.
>>
>> I can understand if someone can't find the time, but then allow at least
>> someone else to pick it up.
> 
> Hey, don't shoot the messenger :).  I just wanted add Supriya to CC so
> she can join the discussion and see how much overlap there is with
> what she's doing.  We all contribute to QEMU from different angles and
> with different priorities.  If there is a time critical dependency on
> the reopen code then discuss it here and the result will be that
> someone officially drives the feature on.
> 

I am more than happy to take the previous reopen() patches, and drive
those forward, and also do whatever else is needed for live block
commit.

It sounds like Zhi Hui is working on live block commit patches, and
Supriya is working on the bdrv_reopen() portion - I don't want to
duplicate any effort, but if there is anything I can do to help with
either of those areas, just let me know.

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-11 15:37                     ` Jeff Cody
@ 2012-06-11 19:12                       ` Paolo Bonzini
  2012-06-12  7:27                         ` Zhi Hui Li
  2012-06-12 10:56                       ` Stefan Hajnoczi
  2012-06-14 14:23                       ` Zhi Hui Li
  2 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2012-06-11 19:12 UTC (permalink / raw)
  To: jcody
  Cc: Kevin Wolf, Supriya Kannery, Zhi Hui Li, Taisuke Yamada,
	Stefan Hajnoczi, qemu-devel

Il 11/06/2012 17:37, Jeff Cody ha scritto:
> It sounds like Zhi Hui is working on live block commit patches, and
> Supriya is working on the bdrv_reopen() portion - I don't want to
> duplicate any effort, but if there is anything I can do to help with
> either of those areas, just let me know.

Is Zhi Hui going to move forward with the floppy patches?

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-11 19:12                       ` Paolo Bonzini
@ 2012-06-12  7:27                         ` Zhi Hui Li
  0 siblings, 0 replies; 31+ messages in thread
From: Zhi Hui Li @ 2012-06-12  7:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Supriya Kannery, zhihuili, Taisuke Yamada,
	Stefan Hajnoczi, jcody, qemu-devel

On 2012年06月12日 03:12, Paolo Bonzini wrote:
> Il 11/06/2012 17:37, Jeff Cody ha scritto:
>> It sounds like Zhi Hui is working on live block commit patches, and
>> Supriya is working on the bdrv_reopen() portion - I don't want to
>> duplicate any effort, but if there is anything I can do to help with
>> either of those areas, just let me know.
>
> Is Zhi Hui going to move forward with the floppy patches?
>
> Paolo
>
>
>

Yes,  I am continue to move forward with the floppy patches.

But I am also very interesting in block commit, so I want to do it.

Thank you very much!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-11 15:37                     ` Jeff Cody
  2012-06-11 19:12                       ` Paolo Bonzini
@ 2012-06-12 10:56                       ` Stefan Hajnoczi
  2012-06-13 10:56                         ` Supriya Kannery
  2012-06-14 14:23                       ` Zhi Hui Li
  2 siblings, 1 reply; 31+ messages in thread
From: Stefan Hajnoczi @ 2012-06-12 10:56 UTC (permalink / raw)
  To: Supriya Kannery; +Cc: Kevin Wolf, jcody, Zhi Hui Li, Taisuke Yamada, qemu-devel

On Mon, Jun 11, 2012 at 4:37 PM, Jeff Cody <jcody@redhat.com> wrote:
> On 06/11/2012 10:24 AM, Stefan Hajnoczi wrote:
>> On Mon, Jun 11, 2012 at 1:50 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> Am 11.06.2012 14:09, schrieb Stefan Hajnoczi:
>>>> On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>>>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>>>>>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think we are almost completely on the same page - devil is in the
>>>>>>>>> details, of course (for instance, on how to convert the destination base
>>>>>>>>> from r/o to r/w).
>>>>>>>>
>>>>>>>> Great.  The atomic r/o -> r/w transition and the commit coroutine can
>>>>>>>> be implemented on in parallel.  Are you happy to implement the atomic
>>>>>>>> r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>>>>>>>> assume that part already works and focus on implementing the commit
>>>>>>>> coroutine in the meantime.  I'm just suggesting a way to split up the
>>>>>>>> work, please let me know if you think this is good.
>>>>>>>
>>>>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>>>>> discussed above (although I will modify my diagrams to not show commit
>>>>>>> from the top-level image).  Of course, feel free to change it as
>>>>>>> necessary.
>>>>>>
>>>>>> I may have mentioned it before, but just in case: I think Supriya's
>>>>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>>>>> were never completed, but I think we all agreed on the general design,
>>>>>> so it should be possible to pick that up.
>>>>>>
>>>>>> Though if you have already started with your own work on it, Jeff, I
>>>>>> expect that it won't be much different because it's basically the same
>>>>>> transactional approach that you know and that we already used for group
>>>>>> snapshots.
>>>>>>
>>>>>
>>>>> I will definitely use parts of Supriya's as it makes sense - what I
>>>>> started work on is similar to bdrv_append() and the current transaction
>>>>> approach, so there will be plenty in common to reuse, even with some
>>>>> differences.
>>>>
>>>> I have CCed Supriya who has been working on the reopen patch series.
>>>> She is close to posting a new version.
>>>
>>> It's just a bit disappointing that it takes several months between each
>>> two versions of the patch series. We'd like to have this in qemu 1.2,
>>> not only in qemu 1.14.
>>>
>>> I can understand if someone can't find the time, but then allow at least
>>> someone else to pick it up.
>>
>> Hey, don't shoot the messenger :).  I just wanted add Supriya to CC so
>> she can join the discussion and see how much overlap there is with
>> what she's doing.  We all contribute to QEMU from different angles and
>> with different priorities.  If there is a time critical dependency on
>> the reopen code then discuss it here and the result will be that
>> someone officially drives the feature on.
>>
>
> I am more than happy to take the previous reopen() patches, and drive
> those forward, and also do whatever else is needed for live block
> commit.

Supriya,
Can you share with us whether you have enough time to complete the
reopen() patches you've been working on?  This functionality is a
dependency for the new block-commit command.  Jeff is willing to take
on the reopen() work if you do not have time.  Please let us know.

Stefan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-12 10:56                       ` Stefan Hajnoczi
@ 2012-06-13 10:56                         ` Supriya Kannery
  0 siblings, 0 replies; 31+ messages in thread
From: Supriya Kannery @ 2012-06-13 10:56 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, jcody, Zhi Hui Li, Taisuke Yamada, qemu-devel

On 06/12/2012 04:26 PM, Stefan Hajnoczi wrote:
> On Mon, Jun 11, 2012 at 4:37 PM, Jeff Cody<jcody@redhat.com>  wrote:
>> On 06/11/2012 10:24 AM, Stefan Hajnoczi wrote:
>>> On Mon, Jun 11, 2012 at 1:50 PM, Kevin Wolf<kwolf@redhat.com>  wrote:
>>>> Am 11.06.2012 14:09, schrieb Stefan Hajnoczi:
>>>>> On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody<jcody@redhat.com>  wrote:
>>>>>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>>>>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>>>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody<jcody@redhat.com>  wrote:
>>>>>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>>>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>>>>>>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think we are almost completely on the same page - devil is in the
>>>>>>>>>> details, of course (for instance, on how to convert the destination base
>>>>>>>>>> from r/o to r/w).
>>>>>>>>>
>>>>>>>>> Great.  The atomic r/o ->  r/w transition and the commit coroutine can
>>>>>>>>> be implemented on in parallel.  Are you happy to implement the atomic
>>>>>>>>> r/o ->  r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>>>>>>>>> assume that part already works and focus on implementing the commit
>>>>>>>>> coroutine in the meantime.  I'm just suggesting a way to split up the
>>>>>>>>> work, please let me know if you think this is good.
>>>>>>>>
>>>>>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>>>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>>>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>>>>>> discussed above (although I will modify my diagrams to not show commit
>>>>>>>> from the top-level image).  Of course, feel free to change it as
>>>>>>>> necessary.
>>>>>>>
>>>>>>> I may have mentioned it before, but just in case: I think Supriya's
>>>>>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>>>>>> were never completed, but I think we all agreed on the general design,
>>>>>>> so it should be possible to pick that up.
>>>>>>>
>>>>>>> Though if you have already started with your own work on it, Jeff, I
>>>>>>> expect that it won't be much different because it's basically the same
>>>>>>> transactional approach that you know and that we already used for group
>>>>>>> snapshots.
>>>>>>>
>>>>>>
>>>>>> I will definitely use parts of Supriya's as it makes sense - what I
>>>>>> started work on is similar to bdrv_append() and the current transaction
>>>>>> approach, so there will be plenty in common to reuse, even with some
>>>>>> differences.
>>>>>
>>>>> I have CCed Supriya who has been working on the reopen patch series.
>>>>> She is close to posting a new version.
>>>>
>>>> It's just a bit disappointing that it takes several months between each
>>>> two versions of the patch series. We'd like to have this in qemu 1.2,
>>>> not only in qemu 1.14.
>>>>
>>>> I can understand if someone can't find the time, but then allow at least
>>>> someone else to pick it up.
>>>
>>> Hey, don't shoot the messenger :).  I just wanted add Supriya to CC so
>>> she can join the discussion and see how much overlap there is with
>>> what she's doing.  We all contribute to QEMU from different angles and
>>> with different priorities.  If there is a time critical dependency on
>>> the reopen code then discuss it here and the result will be that
>>> someone officially drives the feature on.
>>>
>>
>> I am more than happy to take the previous reopen() patches, and drive
>> those forward, and also do whatever else is needed for live block
>> commit.
>
> Supriya,
> Can you share with us whether you have enough time to complete the
> reopen() patches you've been working on?  This functionality is a
> dependency for the new block-commit command.  Jeff is willing to take
> on the reopen() work if you do not have time.  Please let us know.
>
> Stefan
>
All,

   Sorry! for not following up these discussions and hence the delay in
responding. After few month's gap, I had resumed working on 
bdrv_reopen() patchset but was not aware of this dependency and urgency. 
Will be posting whatever I have by tomorrow so that Jeff can use that 
code as needed for block-commit.

- Supriya

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-11 15:37                     ` Jeff Cody
  2012-06-11 19:12                       ` Paolo Bonzini
  2012-06-12 10:56                       ` Stefan Hajnoczi
@ 2012-06-14 14:23                       ` Zhi Hui Li
  2012-06-14 14:29                         ` Jeff Cody
  2 siblings, 1 reply; 31+ messages in thread
From: Zhi Hui Li @ 2012-06-14 14:23 UTC (permalink / raw)
  To: jcody
  Cc: Kevin Wolf, Supriya Kannery, zhihuili, Taisuke Yamada,
	Stefan Hajnoczi, qemu-devel

On 2012年06月11日 23:37, Jeff Cody wrote:
> On 06/11/2012 10:24 AM, Stefan Hajnoczi wrote:
>> On Mon, Jun 11, 2012 at 1:50 PM, Kevin Wolf<kwolf@redhat.com>  wrote:
>>> Am 11.06.2012 14:09, schrieb Stefan Hajnoczi:
>>>> On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody<jcody@redhat.com>  wrote:
>>>>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>>>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody<jcody@redhat.com>  wrote:
>>>>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>>>>>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think we are almost completely on the same page - devil is in the
>>>>>>>>> details, of course (for instance, on how to convert the destination base
>>>>>>>>> from r/o to r/w).
>>>>>>>>
>>>>>>>> Great.  The atomic r/o ->  r/w transition and the commit coroutine can
>>>>>>>> be implemented on in parallel.  Are you happy to implement the atomic
>>>>>>>> r/o ->  r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>>>>>>>> assume that part already works and focus on implementing the commit
>>>>>>>> coroutine in the meantime.  I'm just suggesting a way to split up the
>>>>>>>> work, please let me know if you think this is good.
>>>>>>>
>>>>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>>>>> discussed above (although I will modify my diagrams to not show commit
>>>>>>> from the top-level image).  Of course, feel free to change it as
>>>>>>> necessary.
>>>>>>
>>>>>> I may have mentioned it before, but just in case: I think Supriya's
>>>>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>>>>> were never completed, but I think we all agreed on the general design,
>>>>>> so it should be possible to pick that up.
>>>>>>
>>>>>> Though if you have already started with your own work on it, Jeff, I
>>>>>> expect that it won't be much different because it's basically the same
>>>>>> transactional approach that you know and that we already used for group
>>>>>> snapshots.
>>>>>>
>>>>>
>>>>> I will definitely use parts of Supriya's as it makes sense - what I
>>>>> started work on is similar to bdrv_append() and the current transaction
>>>>> approach, so there will be plenty in common to reuse, even with some
>>>>> differences.
>>>>
>>>> I have CCed Supriya who has been working on the reopen patch series.
>>>> She is close to posting a new version.
>>>
>>> It's just a bit disappointing that it takes several months between each
>>> two versions of the patch series. We'd like to have this in qemu 1.2,
>>> not only in qemu 1.14.
>>>
>>> I can understand if someone can't find the time, but then allow at least
>>> someone else to pick it up.
>>
>> Hey, don't shoot the messenger :).  I just wanted add Supriya to CC so
>> she can join the discussion and see how much overlap there is with
>> what she's doing.  We all contribute to QEMU from different angles and
>> with different priorities.  If there is a time critical dependency on
>> the reopen code then discuss it here and the result will be that
>> someone officially drives the feature on.
>>
>
> I am more than happy to take the previous reopen() patches, and drive
> those forward, and also do whatever else is needed for live block
> commit.
>
> It sounds like Zhi Hui is working on live block commit patches, and
> Supriya is working on the bdrv_reopen() portion - I don't want to
> duplicate any effort, but if there is anything I can do to help with
> either of those areas, just let me know.
>
> Thanks,
> Jeff
>
>
>
Jeff please go ahead with block-commit, I
am finishing up my fdc async conversion patch series first.  I will
reply when I'm ready to work on block-commit.

Thank you very much!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-14 14:23                       ` Zhi Hui Li
@ 2012-06-14 14:29                         ` Jeff Cody
  2012-06-14 18:28                           ` Supriya Kannery
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Cody @ 2012-06-14 14:29 UTC (permalink / raw)
  To: Zhi Hui Li
  Cc: Kevin Wolf, Supriya Kannery, zhihuili, Taisuke Yamada,
	Stefan Hajnoczi, qemu-devel

On 06/14/2012 10:23 AM, Zhi Hui Li wrote:
> On 2012年06月11日 23:37, Jeff Cody wrote:
>> On 06/11/2012 10:24 AM, Stefan Hajnoczi wrote:
>>> On Mon, Jun 11, 2012 at 1:50 PM, Kevin Wolf<kwolf@redhat.com>  wrote:
>>>> Am 11.06.2012 14:09, schrieb Stefan Hajnoczi:
>>>>> On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody<jcody@redhat.com>  wrote:
>>>>>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>>>>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>>>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody<jcody@redhat.com>  wrote:
>>>>>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>>>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>>>>>>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think we are almost completely on the same page - devil is in the
>>>>>>>>>> details, of course (for instance, on how to convert the destination base
>>>>>>>>>> from r/o to r/w).
>>>>>>>>>
>>>>>>>>> Great.  The atomic r/o ->  r/w transition and the commit coroutine can
>>>>>>>>> be implemented on in parallel.  Are you happy to implement the atomic
>>>>>>>>> r/o ->  r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>>>>>>>>> assume that part already works and focus on implementing the commit
>>>>>>>>> coroutine in the meantime.  I'm just suggesting a way to split up the
>>>>>>>>> work, please let me know if you think this is good.
>>>>>>>>
>>>>>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>>>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>>>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>>>>>> discussed above (although I will modify my diagrams to not show commit
>>>>>>>> from the top-level image).  Of course, feel free to change it as
>>>>>>>> necessary.
>>>>>>>
>>>>>>> I may have mentioned it before, but just in case: I think Supriya's
>>>>>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>>>>>> were never completed, but I think we all agreed on the general design,
>>>>>>> so it should be possible to pick that up.
>>>>>>>
>>>>>>> Though if you have already started with your own work on it, Jeff, I
>>>>>>> expect that it won't be much different because it's basically the same
>>>>>>> transactional approach that you know and that we already used for group
>>>>>>> snapshots.
>>>>>>>
>>>>>>
>>>>>> I will definitely use parts of Supriya's as it makes sense - what I
>>>>>> started work on is similar to bdrv_append() and the current transaction
>>>>>> approach, so there will be plenty in common to reuse, even with some
>>>>>> differences.
>>>>>
>>>>> I have CCed Supriya who has been working on the reopen patch series.
>>>>> She is close to posting a new version.
>>>>
>>>> It's just a bit disappointing that it takes several months between each
>>>> two versions of the patch series. We'd like to have this in qemu 1.2,
>>>> not only in qemu 1.14.
>>>>
>>>> I can understand if someone can't find the time, but then allow at least
>>>> someone else to pick it up.
>>>
>>> Hey, don't shoot the messenger :).  I just wanted add Supriya to CC so
>>> she can join the discussion and see how much overlap there is with
>>> what she's doing.  We all contribute to QEMU from different angles and
>>> with different priorities.  If there is a time critical dependency on
>>> the reopen code then discuss it here and the result will be that
>>> someone officially drives the feature on.
>>>
>>
>> I am more than happy to take the previous reopen() patches, and drive
>> those forward, and also do whatever else is needed for live block
>> commit.
>>
>> It sounds like Zhi Hui is working on live block commit patches, and
>> Supriya is working on the bdrv_reopen() portion - I don't want to
>> duplicate any effort, but if there is anything I can do to help with
>> either of those areas, just let me know.
>>
>> Thanks,
>> Jeff
>>
>>
>>
> Jeff please go ahead with block-commit, I
> am finishing up my fdc async conversion patch series first.  I will
> reply when I'm ready to work on block-commit.
> 
> Thank you very much!
> 

Hi Zhi,

I will do that.  Thanks!

Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-14 14:29                         ` Jeff Cody
@ 2012-06-14 18:28                           ` Supriya Kannery
  2012-06-15 21:01                             ` Supriya Kannery
  0 siblings, 1 reply; 31+ messages in thread
From: Supriya Kannery @ 2012-06-14 18:28 UTC (permalink / raw)
  To: jcody
  Cc: Kevin Wolf, zhihuili, Taisuke Yamada, Stefan Hajnoczi,
	qemu-devel, Zhi Hui Li

On 06/14/2012 07:59 PM, Jeff Cody wrote:
> On 06/14/2012 10:23 AM, Zhi Hui Li wrote:
>> On 2012年06月11日 23:37, Jeff Cody wrote:
>>> On 06/11/2012 10:24 AM, Stefan Hajnoczi wrote:
>>>> On Mon, Jun 11, 2012 at 1:50 PM, Kevin Wolf<kwolf@redhat.com>   wrote:
>>>>> Am 11.06.2012 14:09, schrieb Stefan Hajnoczi:
>>>>>> On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody<jcody@redhat.com>   wrote:
>>>>>>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>>>>>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>>>>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>>>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody<jcody@redhat.com>   wrote:
>>>>>>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>>>>>>> Let's figure out how to specify block-commit so we're all happy, that
>>>>>>>>>>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think we are almost completely on the same page - devil is in the
>>>>>>>>>>> details, of course (for instance, on how to convert the destination base
>>>>>>>>>>> from r/o to r/w).
>>>>>>>>>>
>>>>>>>>>> Great.  The atomic r/o ->   r/w transition and the commit coroutine can
>>>>>>>>>> be implemented on in parallel.  Are you happy to implement the atomic
>>>>>>>>>> r/o ->   r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
>>>>>>>>>> assume that part already works and focus on implementing the commit
>>>>>>>>>> coroutine in the meantime.  I'm just suggesting a way to split up the
>>>>>>>>>> work, please let me know if you think this is good.
>>>>>>>>>
>>>>>>>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>>>>>>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>>>>>>>> for block-commit on the wiki, because I don't think it conflicts with we
>>>>>>>>> discussed above (although I will modify my diagrams to not show commit
>>>>>>>>> from the top-level image).  Of course, feel free to change it as
>>>>>>>>> necessary.
>>>>>>>>
>>>>>>>> I may have mentioned it before, but just in case: I think Supriya's
>>>>>>>> bdrv_reopen() patches are a good starting point. I don't know why they
>>>>>>>> were never completed, but I think we all agreed on the general design,
>>>>>>>> so it should be possible to pick that up.
>>>>>>>>
>>>>>>>> Though if you have already started with your own work on it, Jeff, I
>>>>>>>> expect that it won't be much different because it's basically the same
>>>>>>>> transactional approach that you know and that we already used for group
>>>>>>>> snapshots.
>>>>>>>>
>>>>>>>
>>>>>>> I will definitely use parts of Supriya's as it makes sense - what I
>>>>>>> started work on is similar to bdrv_append() and the current transaction
>>>>>>> approach, so there will be plenty in common to reuse, even with some
>>>>>>> differences.
>>>>>>
>>>>>> I have CCed Supriya who has been working on the reopen patch series.
>>>>>> She is close to posting a new version.
>>>>>
>>>>> It's just a bit disappointing that it takes several months between each
>>>>> two versions of the patch series. We'd like to have this in qemu 1.2,
>>>>> not only in qemu 1.14.
>>>>>
>>>>> I can understand if someone can't find the time, but then allow at least
>>>>> someone else to pick it up.
>>>>
>>>> Hey, don't shoot the messenger :).  I just wanted add Supriya to CC so
>>>> she can join the discussion and see how much overlap there is with
>>>> what she's doing.  We all contribute to QEMU from different angles and
>>>> with different priorities.  If there is a time critical dependency on
>>>> the reopen code then discuss it here and the result will be that
>>>> someone officially drives the feature on.
>>>>
>>>
>>> I am more than happy to take the previous reopen() patches, and drive
>>> those forward, and also do whatever else is needed for live block
>>> commit.
>>>
>>> It sounds like Zhi Hui is working on live block commit patches, and
>>> Supriya is working on the bdrv_reopen() portion - I don't want to
>>> duplicate any effort, but if there is anything I can do to help with
>>> either of those areas, just let me know.
>>>
>>> Thanks,
>>> Jeff
>>>
>>>
>>>
>> Jeff please go ahead with block-commit, I
>> am finishing up my fdc async conversion patch series first.  I will
>> reply when I'm ready to work on block-commit.
>>
>> Thank you very much!
>>
>
> Hi Zhi,
>
> I will do that.  Thanks!
>
> Jeff
>
Jeff,
    Need another day's time for posting bdrv_reopen() patchset. Getting 
few hmp related build errors in hostcache toggling code when built on 
latest git. Will correct those and post the patchset soon.
- Supriya

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
  2012-06-14 18:28                           ` Supriya Kannery
@ 2012-06-15 21:01                             ` Supriya Kannery
  0 siblings, 0 replies; 31+ messages in thread
From: Supriya Kannery @ 2012-06-15 21:01 UTC (permalink / raw)
  To: supriyak
  Cc: Kevin Wolf, zhihuili, Taisuke Yamada, Stefan Hajnoczi, jcody,
	qemu-devel, Zhi Hui Li

On 06/14/2012 11:58 PM, Supriya Kannery wrote:
> On 06/14/2012 07:59 PM, Jeff Cody wrote:
>> On 06/14/2012 10:23 AM, Zhi Hui Li wrote:
>>> On 2012年06月11日 23:37, Jeff Cody wrote:
>>>> On 06/11/2012 10:24 AM, Stefan Hajnoczi wrote:
>>>>> On Mon, Jun 11, 2012 at 1:50 PM, Kevin Wolf<kwolf@redhat.com> wrote:
>>>>>> Am 11.06.2012 14:09, schrieb Stefan Hajnoczi:
>>>>>>> On Fri, Jun 8, 2012 at 6:46 PM, Jeff Cody<jcody@redhat.com> wrote:
>>>>>>>> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>>>>>>>>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>>>>>>>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>>>>>>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody<jcody@redhat.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>>>>>>>>> Let's figure out how to specify block-commit so we're all
>>>>>>>>>>>>> happy, that
>>>>>>>>>>>>> way we can avoid duplicating work. Any comments on my notes
>>>>>>>>>>>>> above?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I think we are almost completely on the same page - devil is
>>>>>>>>>>>> in the
>>>>>>>>>>>> details, of course (for instance, on how to convert the
>>>>>>>>>>>> destination base
>>>>>>>>>>>> from r/o to r/w).
>>>>>>>>>>>
>>>>>>>>>>> Great. The atomic r/o -> r/w transition and the commit
>>>>>>>>>>> coroutine can
>>>>>>>>>>> be implemented on in parallel. Are you happy to implement the
>>>>>>>>>>> atomic
>>>>>>>>>>> r/o -> r/w transition since you wrote bdrv_append()? Then Zhi
>>>>>>>>>>> Hui can
>>>>>>>>>>> assume that part already works and focus on implementing the
>>>>>>>>>>> commit
>>>>>>>>>>> coroutine in the meantime. I'm just suggesting a way to split
>>>>>>>>>>> up the
>>>>>>>>>>> work, please let me know if you think this is good.
>>>>>>>>>>
>>>>>>>>>> I am happy to do it that way. I'll shift my focus to the
>>>>>>>>>> atomic image
>>>>>>>>>> reopen in r/w mode. I'll go ahead and post my diagrams and
>>>>>>>>>> other info
>>>>>>>>>> for block-commit on the wiki, because I don't think it
>>>>>>>>>> conflicts with we
>>>>>>>>>> discussed above (although I will modify my diagrams to not
>>>>>>>>>> show commit
>>>>>>>>>> from the top-level image). Of course, feel free to change it as
>>>>>>>>>> necessary.
>>>>>>>>>
>>>>>>>>> I may have mentioned it before, but just in case: I think
>>>>>>>>> Supriya's
>>>>>>>>> bdrv_reopen() patches are a good starting point. I don't know
>>>>>>>>> why they
>>>>>>>>> were never completed, but I think we all agreed on the general
>>>>>>>>> design,
>>>>>>>>> so it should be possible to pick that up.
>>>>>>>>>
>>>>>>>>> Though if you have already started with your own work on it,
>>>>>>>>> Jeff, I
>>>>>>>>> expect that it won't be much different because it's basically
>>>>>>>>> the same
>>>>>>>>> transactional approach that you know and that we already used
>>>>>>>>> for group
>>>>>>>>> snapshots.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I will definitely use parts of Supriya's as it makes sense - what I
>>>>>>>> started work on is similar to bdrv_append() and the current
>>>>>>>> transaction
>>>>>>>> approach, so there will be plenty in common to reuse, even with
>>>>>>>> some
>>>>>>>> differences.
>>>>>>>
>>>>>>> I have CCed Supriya who has been working on the reopen patch series.
>>>>>>> She is close to posting a new version.
>>>>>>
>>>>>> It's just a bit disappointing that it takes several months between
>>>>>> each
>>>>>> two versions of the patch series. We'd like to have this in qemu 1.2,
>>>>>> not only in qemu 1.14.
>>>>>>
>>>>>> I can understand if someone can't find the time, but then allow at
>>>>>> least
>>>>>> someone else to pick it up.
>>>>>
>>>>> Hey, don't shoot the messenger :). I just wanted add Supriya to CC so
>>>>> she can join the discussion and see how much overlap there is with
>>>>> what she's doing. We all contribute to QEMU from different angles and
>>>>> with different priorities. If there is a time critical dependency on
>>>>> the reopen code then discuss it here and the result will be that
>>>>> someone officially drives the feature on.
>>>>>
>>>>
>>>> I am more than happy to take the previous reopen() patches, and drive
>>>> those forward, and also do whatever else is needed for live block
>>>> commit.
>>>>
>>>> It sounds like Zhi Hui is working on live block commit patches, and
>>>> Supriya is working on the bdrv_reopen() portion - I don't want to
>>>> duplicate any effort, but if there is anything I can do to help with
>>>> either of those areas, just let me know.
>>>>
>>>> Thanks,
>>>> Jeff
>>>>
>>>>
>>>>
>>> Jeff please go ahead with block-commit, I
>>> am finishing up my fdc async conversion patch series first. I will
>>> reply when I'm ready to work on block-commit.
>>>
>>> Thank you very much!
>>>
>>
>> Hi Zhi,
>>
>> I will do that. Thanks!
>>
>> Jeff
>>
> Jeff,
> Need another day's time for posting bdrv_reopen() patchset. Getting few
> hmp related build errors in hostcache toggling code when built on latest
> git. Will correct those and post the patchset soon.
> - Supriya
>
>
Jeff,
   bdrv_reopen v1 patchset posted. Title: "Dynamic host pagecache change
and image file reopen".
Thanks, Supriya

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2012-06-15 21:01 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-07  6:19 [Qemu-devel] CoW image commit+shrink(= make_empty) support Taisuke Yamada
2012-06-07 14:14 ` Jeff Cody
2012-06-08 12:42   ` Stefan Hajnoczi
2012-06-08 13:19     ` Jeff Cody
2012-06-08 13:53       ` Stefan Hajnoczi
2012-06-08 14:32         ` Jeff Cody
2012-06-08 16:11           ` Kevin Wolf
2012-06-08 17:46             ` Jeff Cody
2012-06-08 17:57               ` Kevin Wolf
2012-06-08 18:33                 ` Jeff Cody
2012-06-08 21:08                   ` Kevin Wolf
2012-06-09 16:52                     ` Jeff Cody
2012-06-11  7:57                       ` Kevin Wolf
2012-06-10 16:10                 ` Paolo Bonzini
2012-06-11  7:59                   ` Kevin Wolf
2012-06-11  8:01                     ` Paolo Bonzini
2012-06-11 12:09               ` Stefan Hajnoczi
2012-06-11 12:50                 ` Kevin Wolf
2012-06-11 14:24                   ` Stefan Hajnoczi
2012-06-11 15:37                     ` Jeff Cody
2012-06-11 19:12                       ` Paolo Bonzini
2012-06-12  7:27                         ` Zhi Hui Li
2012-06-12 10:56                       ` Stefan Hajnoczi
2012-06-13 10:56                         ` Supriya Kannery
2012-06-14 14:23                       ` Zhi Hui Li
2012-06-14 14:29                         ` Jeff Cody
2012-06-14 18:28                           ` Supriya Kannery
2012-06-15 21:01                             ` Supriya Kannery
2012-06-10 16:06       ` Paolo Bonzini
2012-06-08 10:39 ` Kevin Wolf
2012-06-09 11:21   ` Taisuke Yamada

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.