All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 00/10] backup compression
@ 2016-05-14 12:45 Denis V. Lunev
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 01/10] block/io: add bdrv_co_write_compressed Denis V. Lunev
                   ` (8 more replies)
  0 siblings, 9 replies; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-14 12:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: den, Pavel Butsykin, Jeff Cody, Markus Armbruster, Eric Blake,
	John Snow, Stefan Hajnoczi, Kevin Wolf

The idea is simple - backup is "written-once" data. It is written block
by block and it is large enough. It would be nice to save storage
space and compress it.

These patches add the ability to compress data during backup. This
functionality is implemented by means of adding options to the qmp/hmp
commands(drive-backup, blockdev-backup). The implementation is quite
simple, because the responsibility for data compression imposed on the
format driver.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>

Pavel Butsykin (10):
  block/io: add bdrv_co_write_compressed
  qcow2: add qcow2_co_write_compressed
  vmdk: add vmdk_co_write_compressed
  qcow: add qcow_co_write_compressed
  block: remove BlockDriver.bdrv_write_compressed
  drive-backup: added support for data compression
  blockdev-backup: added support for data compression
  qemu-iotests: test backup compression in 055
  block: fix backup in vmdk format image
  qemu-iotests: add vmdk for test backup compression in 055

 block/backup.c                |  13 +++++
 block/io.c                    |  57 ++++++++++++++++++--
 block/qcow.c                  |  78 ++++++++++++++++------------
 block/qcow2.c                 |  89 ++++++++++++++++++-------------
 block/vmdk.c                  |  61 ++++------------------
 blockdev.c                    |  20 ++++++-
 hmp-commands.hx               |   8 +--
 hmp.c                         |   3 +-
 include/block/block.h         |   2 +
 include/block/block_int.h     |   5 +-
 qapi/block-core.json          |   3 +-
 qemu-img.c                    |   2 +-
 qmp-commands.hx               |   7 ++-
 tests/qemu-iotests/055        | 118 ++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/055.out    |   4 +-
 tests/qemu-iotests/iotests.py |  10 ++--
 16 files changed, 336 insertions(+), 144 deletions(-)

--

Changes from v1:
- added unittest for backup compression (8)

Changes from v2:
- implemented a new .bdrv_co_write_compressed interface to replace the
  old .bdrv_write_compressed (1,2,3,4,5)

2.1.4

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 01/10] block/io: add bdrv_co_write_compressed
  2016-05-14 12:45 [Qemu-devel] [PATCH v3 00/10] backup compression Denis V. Lunev
@ 2016-05-14 12:45 ` Denis V. Lunev
  2016-05-16 16:52   ` Eric Blake
  2016-05-19 21:25   ` Stefan Hajnoczi
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed Denis V. Lunev
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-14 12:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: den, Pavel Butsykin, Jeff Cody, Markus Armbruster, Eric Blake,
	John Snow, Stefan Hajnoczi, Kevin Wolf

From: Pavel Butsykin <pbutsykin@virtuozzo.com>

This patch just adds the interface to the bdrv_co_write_compressed, which
is currently not used but will be useful for safe implementation of the
bdrv_co_write_compressed callback in format drivers.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c                | 71 ++++++++++++++++++++++++++++++++++++++++++++---
 include/block/block.h     |  2 ++
 include/block/block_int.h |  3 ++
 qemu-img.c                |  2 +-
 4 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index cd6d71a..88af10c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1828,8 +1828,8 @@ int bdrv_is_allocated_above(BlockDriverState *top,
     return 0;
 }
 
-int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
-                          const uint8_t *buf, int nb_sectors)
+int bdrv_co_write_compressed(BlockDriverState *bs, int64_t sector_num,
+                             int nb_sectors, QEMUIOVector *qiov)
 {
     BlockDriver *drv = bs->drv;
     int ret;
@@ -1837,7 +1837,7 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
     if (!drv) {
         return -ENOMEDIUM;
     }
-    if (!drv->bdrv_write_compressed) {
+    if (!drv->bdrv_co_write_compressed) {
         return -ENOTSUP;
     }
     ret = bdrv_check_request(bs, sector_num, nb_sectors);
@@ -1846,8 +1846,71 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
     }
 
     assert(QLIST_EMPTY(&bs->dirty_bitmaps));
+    assert(qemu_in_coroutine());
+
+    return drv->bdrv_co_write_compressed(bs, sector_num, nb_sectors, qiov);
+}
+
+typedef struct BdrvWriteCompressedCo {
+    BlockDriverState *bs;
+    int64_t sector_num;
+    const uint8_t *buf;
+    int nb_sectors;
+    int ret;
+} BdrvWriteCompressedCo;
+
+static void bdrv_write_compressed_co_entry(void *opaque)
+{
+    BdrvWriteCompressedCo *co = opaque;
+    QEMUIOVector qiov;
+    struct iovec iov = {
+        .iov_base   = (uint8_t *)co->buf,
+        .iov_len    = co->nb_sectors << BDRV_SECTOR_BITS,
+    };
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    co->ret = bdrv_co_write_compressed(co->bs, co->sector_num,
+                                       co->nb_sectors, &qiov);
+}
+
+int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
+                          const uint8_t *buf, int nb_sectors)
+{
+    BlockDriver *drv = bs->drv;
+    BdrvWriteCompressedCo data = {
+        .bs         = bs,
+        .sector_num = sector_num,
+        .buf        = buf,
+        .nb_sectors = nb_sectors,
+        .ret        = -EINPROGRESS,
+    };
+
+    if (!drv) {
+        return -ENOMEDIUM;
+    }
+
+    if (drv->bdrv_write_compressed) {
+        int ret = bdrv_check_request(bs, sector_num, nb_sectors);
+        if (ret < 0) {
+            return ret;
+        }
+        assert(QLIST_EMPTY(&bs->dirty_bitmaps));
+        return drv->bdrv_write_compressed(bs, sector_num, buf, nb_sectors);
+    }
 
-    return drv->bdrv_write_compressed(bs, sector_num, buf, nb_sectors);
+    if (qemu_in_coroutine()) {
+        /* Fast-path if already in coroutine context */
+        bdrv_write_compressed_co_entry(&data);
+    } else {
+        AioContext *aio_context = bdrv_get_aio_context(bs);
+
+        Coroutine *co = qemu_coroutine_create(bdrv_write_compressed_co_entry);
+        qemu_coroutine_enter(co, &data);
+        while (data.ret == -EINPROGRESS) {
+            aio_poll(aio_context, true);
+        }
+    }
+    return data.ret;
 }
 
 int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
diff --git a/include/block/block.h b/include/block/block.h
index b210832..ae67fd8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -423,6 +423,8 @@ const char *bdrv_get_device_or_node_name(const BlockDriverState *bs);
 int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
                           const uint8_t *buf, int nb_sectors);
+int bdrv_co_write_compressed(BlockDriverState *bs, int64_t sector_num,
+                             int nb_sectors, QEMUIOVector *qiov);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
 void bdrv_round_to_clusters(BlockDriverState *bs,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a029c20..3c93ddb 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -208,6 +208,9 @@ struct BlockDriver {
     int (*bdrv_write_compressed)(BlockDriverState *bs, int64_t sector_num,
                                  const uint8_t *buf, int nb_sectors);
 
+    int (*bdrv_co_write_compressed)(BlockDriverState *bs, int64_t sector_num,
+                                    int nb_sectors, QEMUIOVector *qiov);
+
     int (*bdrv_snapshot_create)(BlockDriverState *bs,
                                 QEMUSnapshotInfo *sn_info);
     int (*bdrv_snapshot_goto)(BlockDriverState *bs,
diff --git a/qemu-img.c b/qemu-img.c
index 4792366..0d38eac 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2022,7 +2022,7 @@ static int img_convert(int argc, char **argv)
         const char *preallocation =
             qemu_opt_get(opts, BLOCK_OPT_PREALLOC);
 
-        if (!drv->bdrv_write_compressed) {
+        if (!drv->bdrv_write_compressed && !drv->bdrv_co_write_compressed) {
             error_report("Compression not supported for this file format");
             ret = -1;
             goto out;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed
  2016-05-14 12:45 [Qemu-devel] [PATCH v3 00/10] backup compression Denis V. Lunev
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 01/10] block/io: add bdrv_co_write_compressed Denis V. Lunev
@ 2016-05-14 12:45 ` Denis V. Lunev
  2016-05-27 17:33   ` Stefan Hajnoczi
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 03/10] vmdk: add vmdk_co_write_compressed Denis V. Lunev
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-14 12:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: den, Pavel Butsykin, Jeff Cody, Markus Armbruster, Eric Blake,
	John Snow, Stefan Hajnoczi, Kevin Wolf

From: Pavel Butsykin <pbutsykin@virtuozzo.com>

Added implementation of the qcow2_co_write_compressed function that
will allow us to safely use compressed writes for the qcow2 from running
VMs.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2.c | 89 ++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 52 insertions(+), 37 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 62febfc..d948d44 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2533,13 +2533,16 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
 
 /* XXX: put compressed sectors first, then all the cluster aligned
    tables to avoid losing bytes in alignment */
-static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
-                                  const uint8_t *buf, int nb_sectors)
+static coroutine_fn int
+qcow2_co_write_compressed(BlockDriverState *bs, int64_t sector_num,
+                          int nb_sectors, QEMUIOVector *qiov)
 {
     BDRVQcow2State *s = bs->opaque;
+    QEMUIOVector hd_qiov;
+    struct iovec iov;
     z_stream strm;
     int ret, out_len;
-    uint8_t *out_buf;
+    uint8_t *buf, *out_buf;
     uint64_t cluster_offset;
 
     if (nb_sectors == 0) {
@@ -2549,29 +2552,25 @@ static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
         return bdrv_truncate(bs->file->bs, cluster_offset);
     }
 
+    buf = qemu_blockalign(bs, s->cluster_size);
     if (nb_sectors != s->cluster_sectors) {
-        ret = -EINVAL;
-
-        /* Zero-pad last write if image size is not cluster aligned */
-        if (sector_num + nb_sectors == bs->total_sectors &&
-            nb_sectors < s->cluster_sectors) {
-            uint8_t *pad_buf = qemu_blockalign(bs, s->cluster_size);
-            memset(pad_buf, 0, s->cluster_size);
-            memcpy(pad_buf, buf, nb_sectors * BDRV_SECTOR_SIZE);
-            ret = qcow2_write_compressed(bs, sector_num,
-                                         pad_buf, s->cluster_sectors);
-            qemu_vfree(pad_buf);
+        if (nb_sectors > s->cluster_sectors ||
+            sector_num + nb_sectors != bs->total_sectors)
+        {
+            qemu_vfree(buf);
+            return -EINVAL;
         }
-        return ret;
+        /* Zero-pad last write if image size is not cluster aligned */
+        memset(buf, 0, s->cluster_size);
     }
+    qemu_iovec_to_buf(qiov, 0, buf, qiov->size);
 
     out_buf = g_malloc(s->cluster_size + (s->cluster_size / 1000) + 128);
 
     /* best compression, small window, no zlib header */
     memset(&strm, 0, sizeof(strm));
-    ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION,
-                       Z_DEFLATED, -12,
-                       9, Z_DEFAULT_STRATEGY);
+    ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
+                       -12, 9, Z_DEFAULT_STRATEGY);
     if (ret != 0) {
         ret = -EINVAL;
         goto fail;
@@ -2593,34 +2592,50 @@ static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
     deflateEnd(&strm);
 
     if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
+        iov = (struct iovec) {
+            .iov_base   = buf,
+            .iov_len    = out_len,
+        };
+        qemu_iovec_init_external(&hd_qiov, &iov, 1);
         /* could not compress: write normal cluster */
-        ret = bdrv_write(bs, sector_num, buf, s->cluster_sectors);
+        ret = qcow2_co_writev(bs, sector_num, s->cluster_sectors, &hd_qiov);
         if (ret < 0) {
             goto fail;
         }
-    } else {
-        cluster_offset = qcow2_alloc_compressed_cluster_offset(bs,
-            sector_num << 9, out_len);
-        if (!cluster_offset) {
-            ret = -EIO;
-            goto fail;
-        }
-        cluster_offset &= s->cluster_offset_mask;
+        goto success;
+    }
 
-        ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
-        if (ret < 0) {
-            goto fail;
-        }
+    qemu_co_mutex_lock(&s->lock);
+    cluster_offset = \
+        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, out_len);
+    if (!cluster_offset) {
+        qemu_co_mutex_unlock(&s->lock);
+        ret = -EIO;
+        goto fail;
+    }
+    cluster_offset &= s->cluster_offset_mask;
 
-        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
-        ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, out_len);
-        if (ret < 0) {
-            goto fail;
-        }
+    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
+    qemu_co_mutex_unlock(&s->lock);
+    if (ret < 0) {
+        goto fail;
     }
 
+    iov = (struct iovec) {
+        .iov_base   = out_buf,
+        .iov_len    = out_len,
+    };
+    qemu_iovec_init_external(&hd_qiov, &iov, 1);
+
+    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
+    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);
+    if (ret < 0) {
+        goto fail;
+    }
+success:
     ret = 0;
 fail:
+    qemu_vfree(buf);
     g_free(out_buf);
     return ret;
 }
@@ -3382,7 +3397,7 @@ BlockDriver bdrv_qcow2 = {
     .bdrv_co_write_zeroes   = qcow2_co_write_zeroes,
     .bdrv_co_discard        = qcow2_co_discard,
     .bdrv_truncate          = qcow2_truncate,
-    .bdrv_write_compressed  = qcow2_write_compressed,
+    .bdrv_co_write_compressed  = qcow2_co_write_compressed,
     .bdrv_make_empty        = qcow2_make_empty,
 
     .bdrv_snapshot_create   = qcow2_snapshot_create,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 03/10] vmdk: add vmdk_co_write_compressed
  2016-05-14 12:45 [Qemu-devel] [PATCH v3 00/10] backup compression Denis V. Lunev
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 01/10] block/io: add bdrv_co_write_compressed Denis V. Lunev
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed Denis V. Lunev
@ 2016-05-14 12:45 ` Denis V. Lunev
  2016-05-27 17:38   ` Stefan Hajnoczi
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 04/10] qcow: add qcow_co_write_compressed Denis V. Lunev
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-14 12:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: den, Pavel Butsykin, Jeff Cody, Markus Armbruster, Eric Blake,
	John Snow, Stefan Hajnoczi, Kevin Wolf

From: Pavel Butsykin <pbutsykin@virtuozzo.com>

Added implementation of the vmdk_co_write_compressed function that
will allow us to safely use compressed writes for the vmdk from running
VMs.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block/vmdk.c | 56 ++++++--------------------------------------------------
 1 file changed, 6 insertions(+), 50 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index e6c97c2..9530b30 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1651,56 +1651,12 @@ vmdk_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     return ret;
 }
 
-typedef struct VmdkWriteCompressedCo {
-    BlockDriverState *bs;
-    int64_t sector_num;
-    const uint8_t *buf;
-    int nb_sectors;
-    int ret;
-} VmdkWriteCompressedCo;
-
-static void vmdk_co_write_compressed(void *opaque)
-{
-    VmdkWriteCompressedCo *co = opaque;
-    QEMUIOVector local_qiov;
-    uint64_t offset = co->sector_num * BDRV_SECTOR_SIZE;
-    uint64_t bytes = co->nb_sectors * BDRV_SECTOR_SIZE;
-
-    struct iovec iov = (struct iovec) {
-        .iov_base   = (uint8_t*) co->buf,
-        .iov_len    = bytes,
-    };
-    qemu_iovec_init_external(&local_qiov, &iov, 1);
-
-    co->ret = vmdk_pwritev(co->bs, offset, bytes, &local_qiov, false, false);
-}
-
-static int vmdk_write_compressed(BlockDriverState *bs,
-                                 int64_t sector_num,
-                                 const uint8_t *buf,
-                                 int nb_sectors)
+static int coroutine_fn
+vmdk_co_write_compressed(BlockDriverState *bs, int64_t sector_num,
+                         int nb_sectors, QEMUIOVector *qiov)
 {
-    BDRVVmdkState *s = bs->opaque;
-
-    if (s->num_extents == 1 && s->extents[0].compressed) {
-        Coroutine *co;
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-        VmdkWriteCompressedCo data = {
-            .bs         = bs,
-            .sector_num = sector_num,
-            .buf        = buf,
-            .nb_sectors = nb_sectors,
-            .ret        = -EINPROGRESS,
-        };
-        co = qemu_coroutine_create(vmdk_co_write_compressed);
-        qemu_coroutine_enter(co, &data);
-        while (data.ret == -EINPROGRESS) {
-            aio_poll(aio_context, true);
-        }
-        return data.ret;
-    } else {
-        return -ENOTSUP;
-    }
+    return vmdk_co_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
+                           (uint64_t)nb_sectors << BDRV_SECTOR_BITS, qiov, 0);
 }
 
 static int coroutine_fn vmdk_co_write_zeroes(BlockDriverState *bs,
@@ -2422,7 +2378,7 @@ static BlockDriver bdrv_vmdk = {
     .bdrv_reopen_prepare          = vmdk_reopen_prepare,
     .bdrv_co_preadv               = vmdk_co_preadv,
     .bdrv_co_pwritev              = vmdk_co_pwritev,
-    .bdrv_write_compressed        = vmdk_write_compressed,
+    .bdrv_co_write_compressed     = vmdk_co_write_compressed,
     .bdrv_co_write_zeroes         = vmdk_co_write_zeroes,
     .bdrv_close                   = vmdk_close,
     .bdrv_create                  = vmdk_create,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 04/10] qcow: add qcow_co_write_compressed
  2016-05-14 12:45 [Qemu-devel] [PATCH v3 00/10] backup compression Denis V. Lunev
                   ` (2 preceding siblings ...)
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 03/10] vmdk: add vmdk_co_write_compressed Denis V. Lunev
@ 2016-05-14 12:45 ` Denis V. Lunev
  2016-05-27 17:45   ` Stefan Hajnoczi
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 05/10] block: remove BlockDriver.bdrv_write_compressed Denis V. Lunev
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-14 12:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: den, Pavel Butsykin, Jeff Cody, Markus Armbruster, Eric Blake,
	John Snow, Stefan Hajnoczi, Kevin Wolf

From: Pavel Butsykin <pbutsykin@virtuozzo.com>

Added implementation of the qcow_co_write_compressed function that
will allow us to safely use compressed writes for the qcow from running
VMs.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow.c | 78 +++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 45 insertions(+), 33 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index d6dc1b0..80a4ce8 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -909,38 +909,37 @@ static int qcow_make_empty(BlockDriverState *bs)
 
 /* XXX: put compressed sectors first, then all the cluster aligned
    tables to avoid losing bytes in alignment */
-static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
-                                 const uint8_t *buf, int nb_sectors)
+static coroutine_fn int
+qcow_co_write_compressed(BlockDriverState *bs, int64_t sector_num,
+                         int nb_sectors, QEMUIOVector *qiov)
 {
     BDRVQcowState *s = bs->opaque;
+    QEMUIOVector hd_qiov;
+    struct iovec iov;
     z_stream strm;
     int ret, out_len;
-    uint8_t *out_buf;
+    uint8_t *buf, *out_buf;
     uint64_t cluster_offset;
 
+    buf = qemu_blockalign(bs, s->cluster_size);
     if (nb_sectors != s->cluster_sectors) {
-        ret = -EINVAL;
-
-        /* Zero-pad last write if image size is not cluster aligned */
-        if (sector_num + nb_sectors == bs->total_sectors &&
-            nb_sectors < s->cluster_sectors) {
-            uint8_t *pad_buf = qemu_blockalign(bs, s->cluster_size);
-            memset(pad_buf, 0, s->cluster_size);
-            memcpy(pad_buf, buf, nb_sectors * BDRV_SECTOR_SIZE);
-            ret = qcow_write_compressed(bs, sector_num,
-                                        pad_buf, s->cluster_sectors);
-            qemu_vfree(pad_buf);
+        if (nb_sectors > s->cluster_sectors ||
+            sector_num + nb_sectors != bs->total_sectors)
+        {
+            qemu_vfree(buf);
+            return -EINVAL;
         }
-        return ret;
+        /* Zero-pad last write if image size is not cluster aligned */
+        memset(buf, 0, s->cluster_size);
     }
+    qemu_iovec_to_buf(qiov, 0, buf, qiov->size);
 
     out_buf = g_malloc(s->cluster_size + (s->cluster_size / 1000) + 128);
 
     /* best compression, small window, no zlib header */
     memset(&strm, 0, sizeof(strm));
-    ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION,
-                       Z_DEFLATED, -12,
-                       9, Z_DEFAULT_STRATEGY);
+    ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
+                       -12, 9, Z_DEFAULT_STRATEGY);
     if (ret != 0) {
         ret = -EINVAL;
         goto fail;
@@ -962,28 +961,41 @@ static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
     deflateEnd(&strm);
 
     if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
+        struct iovec iov = {
+            .iov_base   = buf,
+            .iov_len    = out_len,
+        };
+        qemu_iovec_init_external(&hd_qiov, &iov, 1);
         /* could not compress: write normal cluster */
-        ret = bdrv_write(bs, sector_num, buf, s->cluster_sectors);
-        if (ret < 0) {
-            goto fail;
-        }
-    } else {
-        cluster_offset = get_cluster_offset(bs, sector_num << 9, 2,
-                                            out_len, 0, 0);
-        if (cluster_offset == 0) {
-            ret = -EIO;
-            goto fail;
-        }
-
-        cluster_offset &= s->cluster_offset_mask;
-        ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, out_len);
+        ret = qcow_co_writev(bs, sector_num, s->cluster_sectors, &hd_qiov);
         if (ret < 0) {
             goto fail;
         }
+        goto success;
     }
 
+    qemu_co_mutex_lock(&s->lock);
+    cluster_offset = get_cluster_offset(bs, sector_num << 9, 2, out_len, 0, 0);
+    qemu_co_mutex_unlock(&s->lock);
+    if (cluster_offset == 0) {
+        ret = -EIO;
+        goto fail;
+    }
+    cluster_offset &= s->cluster_offset_mask;
+
+    iov = (struct iovec) {
+        .iov_base   = out_buf,
+        .iov_len    = out_len,
+    };
+    qemu_iovec_init_external(&hd_qiov, &iov, 1);
+    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);
+    if (ret < 0) {
+        goto fail;
+    }
+success:
     ret = 0;
 fail:
+    qemu_vfree(buf);
     g_free(out_buf);
     return ret;
 }
@@ -1036,7 +1048,7 @@ static BlockDriver bdrv_qcow = {
 
     .bdrv_set_key           = qcow_set_key,
     .bdrv_make_empty        = qcow_make_empty,
-    .bdrv_write_compressed  = qcow_write_compressed,
+    .bdrv_co_write_compressed = qcow_co_write_compressed,
     .bdrv_get_info          = qcow_get_info,
 
     .create_opts            = &qcow_create_opts,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 05/10] block: remove BlockDriver.bdrv_write_compressed
  2016-05-14 12:45 [Qemu-devel] [PATCH v3 00/10] backup compression Denis V. Lunev
                   ` (3 preceding siblings ...)
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 04/10] qcow: add qcow_co_write_compressed Denis V. Lunev
@ 2016-05-14 12:45 ` Denis V. Lunev
  2016-05-16 16:57   ` Eric Blake
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 06/10] drive-backup: added support for data compression Denis V. Lunev
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-14 12:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: den, Pavel Butsykin, Jeff Cody, Markus Armbruster, Eric Blake,
	John Snow, Stefan Hajnoczi, Kevin Wolf

From: Pavel Butsykin <pbutsykin@virtuozzo.com>

There are no block drivers left that implement the old
.bdrv_write_compressed interface, so it can be removed now.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c                | 14 --------------
 include/block/block_int.h |  3 ---
 qemu-img.c                |  2 +-
 3 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/block/io.c b/block/io.c
index 88af10c..ae87e72 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1876,7 +1876,6 @@ static void bdrv_write_compressed_co_entry(void *opaque)
 int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
                           const uint8_t *buf, int nb_sectors)
 {
-    BlockDriver *drv = bs->drv;
     BdrvWriteCompressedCo data = {
         .bs         = bs,
         .sector_num = sector_num,
@@ -1885,19 +1884,6 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
         .ret        = -EINPROGRESS,
     };
 
-    if (!drv) {
-        return -ENOMEDIUM;
-    }
-
-    if (drv->bdrv_write_compressed) {
-        int ret = bdrv_check_request(bs, sector_num, nb_sectors);
-        if (ret < 0) {
-            return ret;
-        }
-        assert(QLIST_EMPTY(&bs->dirty_bitmaps));
-        return drv->bdrv_write_compressed(bs, sector_num, buf, nb_sectors);
-    }
-
     if (qemu_in_coroutine()) {
         /* Fast-path if already in coroutine context */
         bdrv_write_compressed_co_entry(&data);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 3c93ddb..cae838c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -205,9 +205,6 @@ struct BlockDriver {
     bool has_variable_length;
     int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs);
 
-    int (*bdrv_write_compressed)(BlockDriverState *bs, int64_t sector_num,
-                                 const uint8_t *buf, int nb_sectors);
-
     int (*bdrv_co_write_compressed)(BlockDriverState *bs, int64_t sector_num,
                                     int nb_sectors, QEMUIOVector *qiov);
 
diff --git a/qemu-img.c b/qemu-img.c
index 0d38eac..fd892e8 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2022,7 +2022,7 @@ static int img_convert(int argc, char **argv)
         const char *preallocation =
             qemu_opt_get(opts, BLOCK_OPT_PREALLOC);
 
-        if (!drv->bdrv_write_compressed && !drv->bdrv_co_write_compressed) {
+        if (!drv->bdrv_co_write_compressed) {
             error_report("Compression not supported for this file format");
             ret = -1;
             goto out;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 06/10] drive-backup: added support for data compression
  2016-05-14 12:45 [Qemu-devel] [PATCH v3 00/10] backup compression Denis V. Lunev
                   ` (4 preceding siblings ...)
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 05/10] block: remove BlockDriver.bdrv_write_compressed Denis V. Lunev
@ 2016-05-14 12:45 ` Denis V. Lunev
  2016-05-16 16:59   ` Eric Blake
  2016-05-27 17:56   ` Stefan Hajnoczi
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 07/10] blockdev-backup: " Denis V. Lunev
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-14 12:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: den, Pavel Butsykin, Jeff Cody, Markus Armbruster, Eric Blake,
	John Snow, Stefan Hajnoczi, Kevin Wolf

From: Pavel Butsykin <pbutsykin@virtuozzo.com>

The idea is simple - backup is "written-once" data. It is written block
by block and it is large enough. It would be nice to save storage
space and compress it.

The patch adds a flag to the qmp/hmp drive-backup command which enables
block compression. Compression should be implemented in the format driver
to enable this feature.

There are some limitations of the format driver to allow compressed writes.
We can write data only once. Though for backup this is perfectly fine.
These limitations are maintained by the driver and the error will be
reported if we are doing something wrong.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block/backup.c            | 13 +++++++++++++
 blockdev.c                | 12 ++++++++++--
 hmp-commands.hx           |  8 +++++---
 hmp.c                     |  3 ++-
 include/block/block_int.h |  1 +
 qapi/block-core.json      |  2 +-
 qmp-commands.hx           |  4 +++-
 7 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 491fd14..0bcfb23 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -47,6 +47,7 @@ typedef struct BackupBlockJob {
     uint64_t sectors_read;
     unsigned long *done_bitmap;
     int64_t cluster_size;
+    bool compress;
     QLIST_HEAD(, CowRequest) inflight_reqs;
 } BackupBlockJob;
 
@@ -157,6 +158,10 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
             ret = bdrv_co_write_zeroes(job->target,
                                        start * sectors_per_cluster,
                                        n, BDRV_REQ_MAY_UNMAP);
+        } else if (job->compress) {
+            ret = bdrv_co_write_compressed(job->target,
+                                           start * sectors_per_cluster, n,
+                                           &bounce_qiov);
         } else {
             ret = bdrv_co_writev(job->target,
                                  start * sectors_per_cluster, n,
@@ -497,6 +502,7 @@ static void coroutine_fn backup_run(void *opaque)
 void backup_start(BlockDriverState *bs, BlockDriverState *target,
                   int64_t speed, MirrorSyncMode sync_mode,
                   BdrvDirtyBitmap *sync_bitmap,
+                  bool compress,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockCompletionFunc *cb, void *opaque,
@@ -534,6 +540,12 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
         return;
     }
 
+    if (compress && target->drv->bdrv_co_write_compressed == NULL) {
+        error_setg(errp, "Compression is not supported for this drive %s",
+                   bdrv_get_device_name(target));
+        return;
+    }
+
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
         return;
     }
@@ -580,6 +592,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
     job->sync_mode = sync_mode;
     job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
                        sync_bitmap : NULL;
+    job->compress = compress;
 
     /* If there is no backing file on the target, we cannot rely on COW if our
      * backup cluster size is smaller than the target cluster size. Even for
diff --git a/blockdev.c b/blockdev.c
index 1892b8e..1ef9e31 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1856,6 +1856,7 @@ static void do_drive_backup(const char *device, const char *target,
                             bool has_mode, enum NewImageMode mode,
                             bool has_speed, int64_t speed,
                             bool has_bitmap, const char *bitmap,
+                            bool has_compress, bool compress,
                             bool has_on_source_error,
                             BlockdevOnError on_source_error,
                             bool has_on_target_error,
@@ -1896,6 +1897,7 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
                     backup->has_mode, backup->mode,
                     backup->has_speed, backup->speed,
                     backup->has_bitmap, backup->bitmap,
+                    backup->has_compress, backup->compress,
                     backup->has_on_source_error, backup->on_source_error,
                     backup->has_on_target_error, backup->on_target_error,
                     common->block_job_txn, &local_err);
@@ -3170,6 +3172,7 @@ static void do_drive_backup(const char *device, const char *target,
                             bool has_mode, enum NewImageMode mode,
                             bool has_speed, int64_t speed,
                             bool has_bitmap, const char *bitmap,
+                            bool has_compress, bool compress,
                             bool has_on_source_error,
                             BlockdevOnError on_source_error,
                             bool has_on_target_error,
@@ -3200,6 +3203,9 @@ static void do_drive_backup(const char *device, const char *target,
     if (!has_mode) {
         mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
     }
+    if (!has_compress) {
+        compress = false;
+    }
 
     blk = blk_by_name(device);
     if (!blk) {
@@ -3288,7 +3294,7 @@ static void do_drive_backup(const char *device, const char *target,
         }
     }
 
-    backup_start(bs, target_bs, speed, sync, bmap,
+    backup_start(bs, target_bs, speed, sync, bmap, compress,
                  on_source_error, on_target_error,
                  block_job_cb, bs, txn, &local_err);
     if (local_err != NULL) {
@@ -3307,6 +3313,7 @@ void qmp_drive_backup(const char *device, const char *target,
                       bool has_mode, enum NewImageMode mode,
                       bool has_speed, int64_t speed,
                       bool has_bitmap, const char *bitmap,
+                      bool has_compress, bool compress,
                       bool has_on_source_error, BlockdevOnError on_source_error,
                       bool has_on_target_error, BlockdevOnError on_target_error,
                       Error **errp)
@@ -3314,6 +3321,7 @@ void qmp_drive_backup(const char *device, const char *target,
     return do_drive_backup(device, target, has_format, format, sync,
                            has_mode, mode, has_speed, speed,
                            has_bitmap, bitmap,
+                           has_compress, compress,
                            has_on_source_error, on_source_error,
                            has_on_target_error, on_target_error,
                            NULL, errp);
@@ -3378,7 +3386,7 @@ void do_blockdev_backup(const char *device, const char *target,
 
     bdrv_ref(target_bs);
     bdrv_set_aio_context(target_bs, aio_context);
-    backup_start(bs, target_bs, speed, sync, NULL, on_source_error,
+    backup_start(bs, target_bs, speed, sync, NULL, false, on_source_error,
                  on_target_error, block_job_cb, bs, txn, &local_err);
     if (local_err != NULL) {
         bdrv_unref(target_bs);
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 4f4f60a..4b1d15b 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1181,8 +1181,8 @@ ETEXI
 
     {
         .name       = "drive_backup",
-        .args_type  = "reuse:-n,full:-f,device:B,target:s,format:s?",
-        .params     = "[-n] [-f] device target [format]",
+        .args_type  = "reuse:-n,full:-f,compress:-c,device:B,target:s,format:s?",
+        .params     = "[-n] [-f] [-c] device target [format]",
         .help       = "initiates a point-in-time\n\t\t\t"
                       "copy for a device. The device's contents are\n\t\t\t"
                       "copied to the new image file, excluding data that\n\t\t\t"
@@ -1190,7 +1190,9 @@ ETEXI
                       "The -n flag requests QEMU to reuse the image found\n\t\t\t"
                       "in new-image-file, instead of recreating it from scratch.\n\t\t\t"
                       "The -f flag requests QEMU to copy the whole disk,\n\t\t\t"
-                      "so that the result does not need a backing file.\n\t\t\t",
+                      "so that the result does not need a backing file.\n\t\t\t"
+                      "The -c flag requests QEMU to compress backup data\n\t\t\t"
+                      "(if the target format supports it).\n\t\t\t",
         .mhandler.cmd = hmp_drive_backup,
     },
 STEXI
diff --git a/hmp.c b/hmp.c
index d510236..e34593d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1094,6 +1094,7 @@ void hmp_drive_backup(Monitor *mon, const QDict *qdict)
     const char *format = qdict_get_try_str(qdict, "format");
     bool reuse = qdict_get_try_bool(qdict, "reuse", false);
     bool full = qdict_get_try_bool(qdict, "full", false);
+    bool compress = qdict_get_try_bool(qdict, "compress", false);
     enum NewImageMode mode;
     Error *err = NULL;
 
@@ -1111,7 +1112,7 @@ void hmp_drive_backup(Monitor *mon, const QDict *qdict)
 
     qmp_drive_backup(device, filename, !!format, format,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
-                     true, mode, false, 0, false, NULL,
+                     true, mode, false, 0, false, NULL, compress, compress,
                      false, 0, false, 0, &err);
     hmp_handle_error(mon, &err);
 }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index cae838c..34c5ec7 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -712,6 +712,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
 void backup_start(BlockDriverState *bs, BlockDriverState *target,
                   int64_t speed, MirrorSyncMode sync_mode,
                   BdrvDirtyBitmap *sync_bitmap,
+                  bool compress,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockCompletionFunc *cb, void *opaque,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 98a20d2..85e5950 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -905,7 +905,7 @@
 { 'struct': 'DriveBackup',
   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
             'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
-            '*speed': 'int', '*bitmap': 'str',
+            '*speed': 'int', '*bitmap': 'str', '*compress': 'bool',
             '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError' } }
 
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 94847e5..8915a0b 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1186,7 +1186,8 @@ EQMP
     {
         .name       = "drive-backup",
         .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
-                      "bitmap:s?,on-source-error:s?,on-target-error:s?",
+                      "bitmap:s?,compress:b?,"
+                      "on-source-error:s?,on-target-error:s?",
         .mhandler.cmd_new = qmp_marshal_drive_backup,
     },
 
@@ -1220,6 +1221,7 @@ Arguments:
 - "mode": whether and how QEMU should create a new image
           (NewImageMode, optional, default 'absolute-paths')
 - "speed": the maximum speed, in bytes per second (json-int, optional)
+- "compress": compress data blocks (if the target format supports it).
 - "on-source-error": the action to take on an error on the source, default
                      'report'.  'stop' and 'enospc' can only be used
                      if the block device supports io-status.
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 07/10] blockdev-backup: added support for data compression
  2016-05-14 12:45 [Qemu-devel] [PATCH v3 00/10] backup compression Denis V. Lunev
                   ` (5 preceding siblings ...)
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 06/10] drive-backup: added support for data compression Denis V. Lunev
@ 2016-05-14 12:45 ` Denis V. Lunev
  2016-05-16 17:00   ` Eric Blake
  2016-05-27 17:57   ` Stefan Hajnoczi
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 08/10] qemu-iotests: test backup compression in 055 Denis V. Lunev
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 09/10] block: fix backup in vmdk format image Denis V. Lunev
  8 siblings, 2 replies; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-14 12:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: den, Pavel Butsykin, Jeff Cody, Markus Armbruster, Eric Blake,
	John Snow, Stefan Hajnoczi, Kevin Wolf

From: Pavel Butsykin <pbutsykin@virtuozzo.com>

The idea is simple - backup is "written-once" data. It is written block
by block and it is large enough. It would be nice to save storage
space and compress it.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
---
 blockdev.c           | 10 +++++++++-
 qapi/block-core.json |  1 +
 qmp-commands.hx      |  3 ++-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 1ef9e31..ed7df2e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1940,6 +1940,7 @@ typedef struct BlockdevBackupState {
 static void do_blockdev_backup(const char *device, const char *target,
                                enum MirrorSyncMode sync,
                                bool has_speed, int64_t speed,
+                               bool has_compress, bool compress,
                                bool has_on_source_error,
                                BlockdevOnError on_source_error,
                                bool has_on_target_error,
@@ -1987,6 +1988,7 @@ static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
     do_blockdev_backup(backup->device, backup->target,
                        backup->sync,
                        backup->has_speed, backup->speed,
+                       backup->has_compress, backup->compress,
                        backup->has_on_source_error, backup->on_source_error,
                        backup->has_on_target_error, backup->on_target_error,
                        common->block_job_txn, &local_err);
@@ -3335,6 +3337,7 @@ BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
 void do_blockdev_backup(const char *device, const char *target,
                          enum MirrorSyncMode sync,
                          bool has_speed, int64_t speed,
+                         bool has_compress, bool compress,
                          bool has_on_source_error,
                          BlockdevOnError on_source_error,
                          bool has_on_target_error,
@@ -3356,6 +3359,9 @@ void do_blockdev_backup(const char *device, const char *target,
     if (!has_on_target_error) {
         on_target_error = BLOCKDEV_ON_ERROR_REPORT;
     }
+    if (!has_compress) {
+        compress = false;
+    }
 
     blk = blk_by_name(device);
     if (!blk) {
@@ -3386,7 +3392,7 @@ void do_blockdev_backup(const char *device, const char *target,
 
     bdrv_ref(target_bs);
     bdrv_set_aio_context(target_bs, aio_context);
-    backup_start(bs, target_bs, speed, sync, NULL, false, on_source_error,
+    backup_start(bs, target_bs, speed, sync, NULL, compress, on_source_error,
                  on_target_error, block_job_cb, bs, txn, &local_err);
     if (local_err != NULL) {
         bdrv_unref(target_bs);
@@ -3399,6 +3405,7 @@ out:
 void qmp_blockdev_backup(const char *device, const char *target,
                          enum MirrorSyncMode sync,
                          bool has_speed, int64_t speed,
+                         bool has_compress, bool compress,
                          bool has_on_source_error,
                          BlockdevOnError on_source_error,
                          bool has_on_target_error,
@@ -3406,6 +3413,7 @@ void qmp_blockdev_backup(const char *device, const char *target,
                          Error **errp)
 {
     do_blockdev_backup(device, target, sync, has_speed, speed,
+                       has_compress, compress,
                        has_on_source_error, on_source_error,
                        has_on_target_error, on_target_error,
                        NULL, errp);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 85e5950..ceb23d9 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -941,6 +941,7 @@
   'data': { 'device': 'str', 'target': 'str',
             'sync': 'MirrorSyncMode',
             '*speed': 'int',
+            '*compress': 'bool',
             '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError' } }
 
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 8915a0b..6eb5b43 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1241,7 +1241,7 @@ EQMP
 
     {
         .name       = "blockdev-backup",
-        .args_type  = "sync:s,device:B,target:B,speed:i?,"
+        .args_type  = "sync:s,device:B,target:B,speed:i?,compress:b?,"
                       "on-source-error:s?,on-target-error:s?",
         .mhandler.cmd_new = qmp_marshal_blockdev_backup,
     },
@@ -1263,6 +1263,7 @@ Arguments:
           sectors allocated in the topmost image, or "none" to only replicate
           new I/O (MirrorSyncMode).
 - "speed": the maximum speed, in bytes per second (json-int, optional)
+- "compress": compress data blocks (if the target format supports it).
 - "on-source-error": the action to take on an error on the source, default
                      'report'.  'stop' and 'enospc' can only be used
                      if the block device supports io-status.
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 08/10] qemu-iotests: test backup compression in 055
  2016-05-14 12:45 [Qemu-devel] [PATCH v3 00/10] backup compression Denis V. Lunev
                   ` (6 preceding siblings ...)
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 07/10] blockdev-backup: " Denis V. Lunev
@ 2016-05-14 12:45 ` Denis V. Lunev
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 09/10] block: fix backup in vmdk format image Denis V. Lunev
  8 siblings, 0 replies; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-14 12:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: den, Pavel Butsykin, Jeff Cody, Markus Armbruster, Eric Blake,
	John Snow, Stefan Hajnoczi, Kevin Wolf

From: Pavel Butsykin <pbutsykin@virtuozzo.com>

Added cases to check the backup compression out of qcow2, raw in qcow2
on drive-backup and blockdev-backup.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/055        | 97 +++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/055.out    |  4 +-
 tests/qemu-iotests/iotests.py | 10 ++---
 3 files changed, 104 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
index c8e3578..3ac6894 100755
--- a/tests/qemu-iotests/055
+++ b/tests/qemu-iotests/055
@@ -451,5 +451,102 @@ class TestSingleTransaction(iotests.QMPTestCase):
         self.assert_qmp(result, 'error/class', 'GenericError')
         self.assert_no_active_block_jobs()
 
+
+class TestDriveCompression(iotests.QMPTestCase):
+    image_len = 64 * 1024 * 1024 # MB
+    outfmt = 'qcow2'
+
+    def setUp(self):
+        # Write data to the image so we can compare later
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(TestDriveCompression.image_len))
+        qemu_io('-f', iotests.imgfmt, '-c', 'write -P0x11 0 64k', test_img)
+        qemu_io('-f', iotests.imgfmt, '-c', 'write -P0x00 64k 128k', test_img)
+        qemu_io('-f', iotests.imgfmt, '-c', 'write -P0x22 162k 32k', test_img)
+        qemu_io('-f', iotests.imgfmt, '-c', 'write -P0x33 67043328 64k', test_img)
+
+        qemu_img('create', '-f', TestDriveCompression.outfmt, blockdev_target_img,
+                 str(TestDriveCompression.image_len))
+        self.vm = iotests.VM().add_drive(test_img).add_drive(blockdev_target_img,
+                                                             format=TestDriveCompression.outfmt)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(blockdev_target_img)
+        try:
+            os.remove(target_img)
+        except OSError:
+            pass
+
+    def do_test_compress_complete(self, cmd, **args):
+        self.assert_no_active_block_jobs()
+
+        result = self.vm.qmp(cmd, device='drive0', sync='full', compress=True, **args)
+        self.assert_qmp(result, 'return', {})
+
+        self.wait_until_completed()
+
+        self.vm.shutdown()
+        self.assertTrue(iotests.compare_images(test_img, blockdev_target_img,
+                                               iotests.imgfmt, TestDriveCompression.outfmt),
+                        'target image does not match source after backup')
+
+    def test_complete_compress_drive_bakup(self):
+        self.do_test_compress_complete('drive-backup', target=blockdev_target_img, mode='existing')
+
+    def test_complete_compress_blockdev_backup(self):
+        self.do_test_compress_complete('blockdev-backup', target='drive1')
+
+    def do_test_compress_cancel(self, cmd, **args):
+        self.assert_no_active_block_jobs()
+
+        result = self.vm.qmp(cmd, device='drive0', sync='full', compress=True, **args)
+        self.assert_qmp(result, 'return', {})
+
+        event = self.cancel_and_wait()
+        self.assert_qmp(event, 'data/type', 'backup')
+
+    def test_compress_cancel_drive_bakup(self):
+        self.do_test_compress_cancel('drive-backup', target=blockdev_target_img, mode='existing')
+
+    def test_compress_cancel_blockdev_backup(self):
+        self.do_test_compress_cancel('blockdev-backup', target='drive1')
+
+    def do_test_compress_pause(self, cmd, **args):
+        self.assert_no_active_block_jobs()
+
+        self.vm.pause_drive('drive0')
+        result = self.vm.qmp(cmd, device='drive0', sync='full', compress=True, **args)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-pause', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.vm.resume_drive('drive0')
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        offset = self.dictpath(result, 'return[0]/offset')
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/offset', offset)
+
+        result = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.wait_until_completed()
+
+        self.vm.shutdown()
+        self.assertTrue(iotests.compare_images(test_img, blockdev_target_img,
+                                               iotests.imgfmt, TestDriveCompression.outfmt),
+                        'target image does not match source after backup')
+
+    def test_compress_pause_drive_bakup(self):
+        self.do_test_compress_pause('drive-backup', target=blockdev_target_img, mode='existing')
+
+    def test_compress_pause_blockdev_backup(self):
+        self.do_test_compress_pause('blockdev-backup', target='drive1')
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['raw', 'qcow2'])
diff --git a/tests/qemu-iotests/055.out b/tests/qemu-iotests/055.out
index 42314e9..5ce2f9a 100644
--- a/tests/qemu-iotests/055.out
+++ b/tests/qemu-iotests/055.out
@@ -1,5 +1,5 @@
-........................
+..............................
 ----------------------------------------------------------------------
-Ran 24 tests
+Ran 30 tests
 
 OK
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 1687c33..68830a42 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -89,10 +89,10 @@ def qemu_io(*args):
         sys.stderr.write('qemu-io received signal %i: %s\n' % (-exitcode, ' '.join(args)))
     return subp.communicate()[0]
 
-def compare_images(img1, img2):
+def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
     '''Return True if two image files are identical'''
-    return qemu_img('compare', '-f', imgfmt,
-                    '-F', imgfmt, img1, img2) == 0
+    return qemu_img('compare', '-f', fmt1,
+                    '-F', fmt2, img1, img2) == 0
 
 def create_image(name, size):
     '''Create a fully-allocated raw image with sector markers'''
@@ -175,14 +175,14 @@ class VM(object):
         self._args.append(opts)
         return self
 
-    def add_drive(self, path, opts='', interface='virtio'):
+    def add_drive(self, path, opts='', interface='virtio', format=imgfmt):
         '''Add a virtio-blk drive to the VM'''
         options = ['if=%s' % interface,
                    'id=drive%d' % self._num_drives]
 
         if path is not None:
             options.append('file=%s' % path)
-            options.append('format=%s' % imgfmt)
+            options.append('format=%s' % format)
             options.append('cache=%s' % cachemode)
 
         if opts:
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 09/10] block: fix backup in vmdk format image
  2016-05-14 12:45 [Qemu-devel] [PATCH v3 00/10] backup compression Denis V. Lunev
                   ` (7 preceding siblings ...)
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 08/10] qemu-iotests: test backup compression in 055 Denis V. Lunev
@ 2016-05-14 12:45 ` Denis V. Lunev
  2016-05-27 18:01   ` Stefan Hajnoczi
  8 siblings, 1 reply; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-14 12:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: den, Pavel Butsykin, Jeff Cody, Markus Armbruster, Eric Blake,
	John Snow, Stefan Hajnoczi, Kevin Wolf

From: Pavel Butsykin <pbutsykin@virtuozzo.com>

The vmdk format has the extents and bs->file can be equal to the first
extension. Before start of the backup we do detach the old context on the
target drive at the bdrv_attach_aio_context. For the vmdk drive this means
a double detach of the same block driver state, because the detach occurs
for s->extents[0].file and bs->file.

To fix we  just skip the detach if s->extents[i].file and bs->file are the
same. This approach is already used in the vmdk_free_extents() and the
vmdk_get_allocated_file_size(), so it won't be some innovation :)

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block/vmdk.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 9530b30..0550924 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2305,7 +2305,10 @@ static void vmdk_detach_aio_context(BlockDriverState *bs)
     int i;
 
     for (i = 0; i < s->num_extents; i++) {
-        bdrv_detach_aio_context(s->extents[i].file->bs);
+        BdrvChild *file = s->extents[i].file;
+        if (file != bs->file) {
+            bdrv_detach_aio_context(file->bs);
+        }
     }
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] block/io: add bdrv_co_write_compressed
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 01/10] block/io: add bdrv_co_write_compressed Denis V. Lunev
@ 2016-05-16 16:52   ` Eric Blake
  2016-05-17 15:01     ` Pavel Butsykin
  2016-05-19 21:25   ` Stefan Hajnoczi
  1 sibling, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-16 16:52 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-devel
  Cc: Pavel Butsykin, Jeff Cody, Markus Armbruster, John Snow,
	Stefan Hajnoczi, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 2495 bytes --]

On 05/14/2016 06:45 AM, Denis V. Lunev wrote:
> From: Pavel Butsykin <pbutsykin@virtuozzo.com>
> 
> This patch just adds the interface to the bdrv_co_write_compressed, which
> is currently not used but will be useful for safe implementation of the
> bdrv_co_write_compressed callback in format drivers.
> 
> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Jeff Cody <jcody@redhat.com>
> CC: Markus Armbruster <armbru@redhat.com>
> CC: Eric Blake <eblake@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> ---

> +++ b/block/io.c
> @@ -1828,8 +1828,8 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>      return 0;
>  }
>  
> -int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
> -                          const uint8_t *buf, int nb_sectors)
> +int bdrv_co_write_compressed(BlockDriverState *bs, int64_t sector_num,
> +                             int nb_sectors, QEMUIOVector *qiov)

As long as we're adding a new public interface, I'd really like us to
make it byte-based.  int64_t sector_num might be better represented as a
byte offset, and int nb_sectors seems redundant with qiov->size.

>  {
>      BlockDriver *drv = bs->drv;
>      int ret;
> @@ -1837,7 +1837,7 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
>      if (!drv) {
>          return -ENOMEDIUM;
>      }
> -    if (!drv->bdrv_write_compressed) {
> +    if (!drv->bdrv_co_write_compressed) {
>          return -ENOTSUP;
>      }
>      ret = bdrv_check_request(bs, sector_num, nb_sectors);
> @@ -1846,8 +1846,71 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
>      }
>  
>      assert(QLIST_EMPTY(&bs->dirty_bitmaps));
> +    assert(qemu_in_coroutine());
> +
> +    return drv->bdrv_co_write_compressed(bs, sector_num, nb_sectors, qiov);

Of course, if you make the public interface byte-based, then calling
into the back end will have to scale back to sectors (after first
asserting that we aren't violating the scaling); see how Kevin did it in
commit 166fe9605.

> +}
> +
> +typedef struct BdrvWriteCompressedCo {
> +    BlockDriverState *bs;
> +    int64_t sector_num;

Again, I think a byte offset is smarter than a sector number.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] block: remove BlockDriver.bdrv_write_compressed
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 05/10] block: remove BlockDriver.bdrv_write_compressed Denis V. Lunev
@ 2016-05-16 16:57   ` Eric Blake
  2016-05-17 12:22     ` Pavel Butsykin
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-16 16:57 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-devel
  Cc: Pavel Butsykin, Jeff Cody, Markus Armbruster, John Snow,
	Stefan Hajnoczi, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 1328 bytes --]

On 05/14/2016 06:45 AM, Denis V. Lunev wrote:
> From: Pavel Butsykin <pbutsykin@virtuozzo.com>
> 
> There are no block drivers left that implement the old
> .bdrv_write_compressed interface, so it can be removed now.
> 
> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Jeff Cody <jcody@redhat.com>
> CC: Markus Armbruster <armbru@redhat.com>
> CC: Eric Blake <eblake@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> ---

> +++ b/block/io.c
> @@ -1876,7 +1876,6 @@ static void bdrv_write_compressed_co_entry(void *opaque)
>  int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
>                            const uint8_t *buf, int nb_sectors)
>  {
> -    BlockDriver *drv = bs->drv;
>      BdrvWriteCompressedCo data = {
>          .bs         = bs,
>          .sector_num = sector_num,
> @@ -1885,19 +1884,6 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
>          .ret        = -EINPROGRESS,
>      };
>  
> -    if (!drv) {
> -        return -ENOMEDIUM;
> -    }

Why are you deleting this check?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] drive-backup: added support for data compression
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 06/10] drive-backup: added support for data compression Denis V. Lunev
@ 2016-05-16 16:59   ` Eric Blake
  2016-05-27 17:56   ` Stefan Hajnoczi
  1 sibling, 0 replies; 34+ messages in thread
From: Eric Blake @ 2016-05-16 16:59 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-devel
  Cc: Pavel Butsykin, Jeff Cody, Markus Armbruster, John Snow,
	Stefan Hajnoczi, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 2506 bytes --]

On 05/14/2016 06:45 AM, Denis V. Lunev wrote:
> From: Pavel Butsykin <pbutsykin@virtuozzo.com>
> 
> The idea is simple - backup is "written-once" data. It is written block
> by block and it is large enough. It would be nice to save storage
> space and compress it.
> 
> The patch adds a flag to the qmp/hmp drive-backup command which enables
> block compression. Compression should be implemented in the format driver
> to enable this feature.
> 
> There are some limitations of the format driver to allow compressed writes.
> We can write data only once. Though for backup this is perfectly fine.
> These limitations are maintained by the driver and the error will be
> reported if we are doing something wrong.
> 

> +++ b/qapi/block-core.json
> @@ -905,7 +905,7 @@
>  { 'struct': 'DriveBackup',
>    'data': { 'device': 'str', 'target': 'str', '*format': 'str',
>              'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> -            '*speed': 'int', '*bitmap': 'str',
> +            '*speed': 'int', '*bitmap': 'str', '*compress': 'bool',

Missing documentation of the new option.

>              '*on-source-error': 'BlockdevOnError',
>              '*on-target-error': 'BlockdevOnError' } }
>  
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 94847e5..8915a0b 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -1186,7 +1186,8 @@ EQMP
>      {
>          .name       = "drive-backup",
>          .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
> -                      "bitmap:s?,on-source-error:s?,on-target-error:s?",
> +                      "bitmap:s?,compress:b?,"
> +                      "on-source-error:s?,on-target-error:s?",
>          .mhandler.cmd_new = qmp_marshal_drive_backup,
>      },
>  
> @@ -1220,6 +1221,7 @@ Arguments:
>  - "mode": whether and how QEMU should create a new image
>            (NewImageMode, optional, default 'absolute-paths')
>  - "speed": the maximum speed, in bytes per second (json-int, optional)
> +- "compress": compress data blocks (if the target format supports it).

Missing mention that it is optional, default false.

>  - "on-source-error": the action to take on an error on the source, default
>                       'report'.  'stop' and 'enospc' can only be used
>                       if the block device supports io-status.
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 07/10] blockdev-backup: added support for data compression
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 07/10] blockdev-backup: " Denis V. Lunev
@ 2016-05-16 17:00   ` Eric Blake
  2016-05-27 17:57   ` Stefan Hajnoczi
  1 sibling, 0 replies; 34+ messages in thread
From: Eric Blake @ 2016-05-16 17:00 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-devel
  Cc: Pavel Butsykin, Jeff Cody, Markus Armbruster, John Snow,
	Stefan Hajnoczi, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 1835 bytes --]

On 05/14/2016 06:45 AM, Denis V. Lunev wrote:
> From: Pavel Butsykin <pbutsykin@virtuozzo.com>
> 
> The idea is simple - backup is "written-once" data. It is written block
> by block and it is large enough. It would be nice to save storage
> space and compress it.
> 

> +++ b/qapi/block-core.json
> @@ -941,6 +941,7 @@
>    'data': { 'device': 'str', 'target': 'str',
>              'sync': 'MirrorSyncMode',
>              '*speed': 'int',
> +            '*compress': 'bool',

Missing documentation.

>              '*on-source-error': 'BlockdevOnError',
>              '*on-target-error': 'BlockdevOnError' } }
>  
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 8915a0b..6eb5b43 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -1241,7 +1241,7 @@ EQMP
>  
>      {
>          .name       = "blockdev-backup",
> -        .args_type  = "sync:s,device:B,target:B,speed:i?,"
> +        .args_type  = "sync:s,device:B,target:B,speed:i?,compress:b?,"
>                        "on-source-error:s?,on-target-error:s?",
>          .mhandler.cmd_new = qmp_marshal_blockdev_backup,
>      },
> @@ -1263,6 +1263,7 @@ Arguments:
>            sectors allocated in the topmost image, or "none" to only replicate
>            new I/O (MirrorSyncMode).
>  - "speed": the maximum speed, in bytes per second (json-int, optional)
> +- "compress": compress data blocks (if the target format supports it).

Missing mention that it is optional, default false.

>  - "on-source-error": the action to take on an error on the source, default
>                       'report'.  'stop' and 'enospc' can only be used
>                       if the block device supports io-status.
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] block: remove BlockDriver.bdrv_write_compressed
  2016-05-16 16:57   ` Eric Blake
@ 2016-05-17 12:22     ` Pavel Butsykin
  0 siblings, 0 replies; 34+ messages in thread
From: Pavel Butsykin @ 2016-05-17 12:22 UTC (permalink / raw)
  To: Eric Blake, Denis V. Lunev, qemu-devel
  Cc: Jeff Cody, Markus Armbruster, John Snow, Stefan Hajnoczi, Kevin Wolf

On 16.05.2016 19:57, Eric Blake wrote:
> On 05/14/2016 06:45 AM, Denis V. Lunev wrote:
>> From: Pavel Butsykin <pbutsykin@virtuozzo.com>
>>
>> There are no block drivers left that implement the old
>> .bdrv_write_compressed interface, so it can be removed now.
>>
>> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Jeff Cody <jcody@redhat.com>
>> CC: Markus Armbruster <armbru@redhat.com>
>> CC: Eric Blake <eblake@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> ---
>
>> +++ b/block/io.c
>> @@ -1876,7 +1876,6 @@ static void bdrv_write_compressed_co_entry(void *opaque)
>>   int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
>>                             const uint8_t *buf, int nb_sectors)
>>   {
>> -    BlockDriver *drv = bs->drv;
>>       BdrvWriteCompressedCo data = {
>>           .bs         = bs,
>>           .sector_num = sector_num,
>> @@ -1885,19 +1884,6 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
>>           .ret        = -EINPROGRESS,
>>       };
>>
>> -    if (!drv) {
>> -        return -ENOMEDIUM;
>> -    }
>
> Why are you deleting this check?
>

Because this check is duplicated in bdrv_co_write_compressed(), but
in this place it was necessary for drv->bdrv_write_compressed.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] block/io: add bdrv_co_write_compressed
  2016-05-16 16:52   ` Eric Blake
@ 2016-05-17 15:01     ` Pavel Butsykin
  0 siblings, 0 replies; 34+ messages in thread
From: Pavel Butsykin @ 2016-05-17 15:01 UTC (permalink / raw)
  To: Eric Blake, Denis V. Lunev, qemu-devel, Kevin Wolf
  Cc: Jeff Cody, Markus Armbruster, John Snow, Stefan Hajnoczi

On 16.05.2016 19:52, Eric Blake wrote:
> On 05/14/2016 06:45 AM, Denis V. Lunev wrote:
>> From: Pavel Butsykin <pbutsykin@virtuozzo.com>
>>
>> This patch just adds the interface to the bdrv_co_write_compressed, which
>> is currently not used but will be useful for safe implementation of the
>> bdrv_co_write_compressed callback in format drivers.
>>
>> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Jeff Cody <jcody@redhat.com>
>> CC: Markus Armbruster <armbru@redhat.com>
>> CC: Eric Blake <eblake@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> ---
>
>> +++ b/block/io.c
>> @@ -1828,8 +1828,8 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>>       return 0;
>>   }
>>
>> -int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
>> -                          const uint8_t *buf, int nb_sectors)
>> +int bdrv_co_write_compressed(BlockDriverState *bs, int64_t sector_num,
>> +                             int nb_sectors, QEMUIOVector *qiov)
>
> As long as we're adding a new public interface, I'd really like us to
> make it byte-based.  int64_t sector_num might be better represented as a
> byte offset, and int nb_sectors seems redundant with qiov->size.
>
>>   {
>>       BlockDriver *drv = bs->drv;
>>       int ret;
>> @@ -1837,7 +1837,7 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
>>       if (!drv) {
>>           return -ENOMEDIUM;
>>       }
>> -    if (!drv->bdrv_write_compressed) {
>> +    if (!drv->bdrv_co_write_compressed) {
>>           return -ENOTSUP;
>>       }
>>       ret = bdrv_check_request(bs, sector_num, nb_sectors);
>> @@ -1846,8 +1846,71 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
>>       }
>>
>>       assert(QLIST_EMPTY(&bs->dirty_bitmaps));
>> +    assert(qemu_in_coroutine());
>> +
>> +    return drv->bdrv_co_write_compressed(bs, sector_num, nb_sectors, qiov);
>
> Of course, if you make the public interface byte-based, then calling
> into the back end will have to scale back to sectors (after first
> asserting that we aren't violating the scaling); see how Kevin did it in
> commit 166fe9605.
>
>> +}
>> +
>> +typedef struct BdrvWriteCompressedCo {
>> +    BlockDriverState *bs;
>> +    int64_t sector_num;
>
> Again, I think a byte offset is smarter than a sector number.
>

Kevin used the byte offset for functions bdrv_driver_pread/_pwrite(It
looks like just an additional interface), which is not the same thing.
Here the bdrv_co/bdrv_write_compressed functions are analogues of the
bdrv_co/bdrv_write functions that still use sectors in the arguments.
So I'm not sure that the interface there needs to be some other.


Kevin, what do you think about this?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] block/io: add bdrv_co_write_compressed
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 01/10] block/io: add bdrv_co_write_compressed Denis V. Lunev
  2016-05-16 16:52   ` Eric Blake
@ 2016-05-19 21:25   ` Stefan Hajnoczi
  2016-05-19 21:39     ` Denis V. Lunev
  1 sibling, 1 reply; 34+ messages in thread
From: Stefan Hajnoczi @ 2016-05-19 21:25 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: qemu-devel, Pavel Butsykin, Jeff Cody, Markus Armbruster,
	Eric Blake, John Snow, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 1292 bytes --]

On Sat, May 14, 2016 at 03:45:49PM +0300, Denis V. Lunev wrote:
> diff --git a/block/io.c b/block/io.c
> index cd6d71a..88af10c 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1828,8 +1828,8 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>      return 0;
>  }
>  
> -int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
> -                          const uint8_t *buf, int nb_sectors)
> +int bdrv_co_write_compressed(BlockDriverState *bs, int64_t sector_num,
> +                             int nb_sectors, QEMUIOVector *qiov)

Please use the coroutine_fn attribute to declare that this function must
be called in coroutine context.

> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index a029c20..3c93ddb 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -208,6 +208,9 @@ struct BlockDriver {
>      int (*bdrv_write_compressed)(BlockDriverState *bs, int64_t sector_num,
>                                   const uint8_t *buf, int nb_sectors);
>  
> +    int (*bdrv_co_write_compressed)(BlockDriverState *bs, int64_t sector_num,
> +                                    int nb_sectors, QEMUIOVector *qiov);

Please add the coroutine_fn attribute just like .bdrv_co_readv() and
friends.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] block/io: add bdrv_co_write_compressed
  2016-05-19 21:25   ` Stefan Hajnoczi
@ 2016-05-19 21:39     ` Denis V. Lunev
  0 siblings, 0 replies; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-19 21:39 UTC (permalink / raw)
  To: Stefan Hajnoczi, Denis V. Lunev
  Cc: Kevin Wolf, Pavel Butsykin, Jeff Cody, Markus Armbruster,
	qemu-devel, John Snow

On 05/20/2016 12:25 AM, Stefan Hajnoczi wrote:
> On Sat, May 14, 2016 at 03:45:49PM +0300, Denis V. Lunev wrote:
>> diff --git a/block/io.c b/block/io.c
>> index cd6d71a..88af10c 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1828,8 +1828,8 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>>       return 0;
>>   }
>>   
>> -int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
>> -                          const uint8_t *buf, int nb_sectors)
>> +int bdrv_co_write_compressed(BlockDriverState *bs, int64_t sector_num,
>> +                             int nb_sectors, QEMUIOVector *qiov)
> Please use the coroutine_fn attribute to declare that this function must
> be called in coroutine context.
>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index a029c20..3c93ddb 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -208,6 +208,9 @@ struct BlockDriver {
>>       int (*bdrv_write_compressed)(BlockDriverState *bs, int64_t sector_num,
>>                                    const uint8_t *buf, int nb_sectors);
>>   
>> +    int (*bdrv_co_write_compressed)(BlockDriverState *bs, int64_t sector_num,
>> +                                    int nb_sectors, QEMUIOVector *qiov);
> Please add the coroutine_fn attribute just like .bdrv_co_readv() and
> friends.
yep. This seems right thing to do.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed Denis V. Lunev
@ 2016-05-27 17:33   ` Stefan Hajnoczi
  2016-05-30  9:12     ` Pavel Butsykin
  2016-06-01  9:25     ` Kevin Wolf
  0 siblings, 2 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2016-05-27 17:33 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: qemu-devel, Kevin Wolf, Pavel Butsykin, Jeff Cody,
	Markus Armbruster, Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 2472 bytes --]

On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote:
> +    qemu_co_mutex_lock(&s->lock);
> +    cluster_offset = \
> +        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, out_len);

The backslash isn't necessary for wrapping lines in C.  This kind of
thing is only necessary in languages like Python where the grammar is
whitespace sensistive.

The C compiler is happy with an arbitrary amount of whitespace
(newlines) in the middle of a statement.  The backslash in C is handled
by the preprocessor: it joins the line.  That's useful for macro
definitions where you need to tell the preprocessor that several lines
belong to one macro definition.  But it's not needed for normal C code.

> +    if (!cluster_offset) {
> +        qemu_co_mutex_unlock(&s->lock);
> +        ret = -EIO;
> +        goto fail;
> +    }
> +    cluster_offset &= s->cluster_offset_mask;
>  
> -        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
> -        ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, out_len);
> -        if (ret < 0) {
> -            goto fail;
> -        }
> +    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
> +    qemu_co_mutex_unlock(&s->lock);
> +    if (ret < 0) {
> +        goto fail;
>      }
>  
> +    iov = (struct iovec) {
> +        .iov_base   = out_buf,
> +        .iov_len    = out_len,
> +    };
> +    qemu_iovec_init_external(&hd_qiov, &iov, 1);
> +
> +    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
> +    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);

There is a race condition here:

If the newly allocated cluster is only partially filled by compressed
data then qcow2_alloc_compressed_cluster_offset() remembers that more
bytes are still available in the cluster.  The
qcow2_alloc_compressed_cluster_offset() caller will continue filling the
same cluster.

Imagine two compressed writes running at the same time.  Write A
allocates just a few bytes so write B shares a sector with the first
write:

     Sector 1
  |AAABBBBBBBBB|

The race condition is that bdrv_co_pwritev() uses read-modify-write (a
bounce buffer).  If both requests call bdrv_co_pwritev() around the same
time then the following could happen:

     Sector 1
  |000BBBBBBBBB|

or:

     Sector 1
  |AAA000000000|

It's necessary to hold s->lock around the compressed data write to avoid
this race condition.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] vmdk: add vmdk_co_write_compressed
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 03/10] vmdk: add vmdk_co_write_compressed Denis V. Lunev
@ 2016-05-27 17:38   ` Stefan Hajnoczi
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2016-05-27 17:38 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: qemu-devel, Kevin Wolf, Pavel Butsykin, Jeff Cody,
	Markus Armbruster, Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 815 bytes --]

On Sat, May 14, 2016 at 03:45:51PM +0300, Denis V. Lunev wrote:
> From: Pavel Butsykin <pbutsykin@virtuozzo.com>
> 
> Added implementation of the vmdk_co_write_compressed function that
> will allow us to safely use compressed writes for the vmdk from running
> VMs.
> 
> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Jeff Cody <jcody@redhat.com>
> CC: Markus Armbruster <armbru@redhat.com>
> CC: Eric Blake <eblake@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/vmdk.c | 56 ++++++--------------------------------------------------
>  1 file changed, 6 insertions(+), 50 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] qcow: add qcow_co_write_compressed
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 04/10] qcow: add qcow_co_write_compressed Denis V. Lunev
@ 2016-05-27 17:45   ` Stefan Hajnoczi
  2016-05-30 14:27     ` Pavel Butsykin
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Hajnoczi @ 2016-05-27 17:45 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: qemu-devel, Kevin Wolf, Pavel Butsykin, Jeff Cody,
	Markus Armbruster, Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 922 bytes --]

On Sat, May 14, 2016 at 03:45:52PM +0300, Denis V. Lunev wrote:
> +    qemu_co_mutex_lock(&s->lock);
> +    cluster_offset = get_cluster_offset(bs, sector_num << 9, 2, out_len, 0, 0);
> +    qemu_co_mutex_unlock(&s->lock);
> +    if (cluster_offset == 0) {
> +        ret = -EIO;
> +        goto fail;
> +    }
> +    cluster_offset &= s->cluster_offset_mask;
> +
> +    iov = (struct iovec) {
> +        .iov_base   = out_buf,
> +        .iov_len    = out_len,
> +    };
> +    qemu_iovec_init_external(&hd_qiov, &iov, 1);
> +    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);

Not sure if this has the same race condition as the qcow2 patch.  It
seems that bdrv_getlength() is used to extend the file on a per-sector
basis.  That would mean compressed data is not packed inside sectors and
no read-write-modify race condition exists, but I haven't fully audited
get_cluster_offset().

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] drive-backup: added support for data compression
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 06/10] drive-backup: added support for data compression Denis V. Lunev
  2016-05-16 16:59   ` Eric Blake
@ 2016-05-27 17:56   ` Stefan Hajnoczi
  1 sibling, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2016-05-27 17:56 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: qemu-devel, Kevin Wolf, Pavel Butsykin, Jeff Cody,
	Markus Armbruster, Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 1560 bytes --]

On Sat, May 14, 2016 at 03:45:54PM +0300, Denis V. Lunev wrote:
> From: Pavel Butsykin <pbutsykin@virtuozzo.com>
> 
> The idea is simple - backup is "written-once" data. It is written block
> by block and it is large enough. It would be nice to save storage
> space and compress it.
> 
> The patch adds a flag to the qmp/hmp drive-backup command which enables
> block compression. Compression should be implemented in the format driver
> to enable this feature.
> 
> There are some limitations of the format driver to allow compressed writes.
> We can write data only once. Though for backup this is perfectly fine.
> These limitations are maintained by the driver and the error will be
> reported if we are doing something wrong.
> 
> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Jeff Cody <jcody@redhat.com>
> CC: Markus Armbruster <armbru@redhat.com>
> CC: Eric Blake <eblake@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/backup.c            | 13 +++++++++++++
>  blockdev.c                | 12 ++++++++++--
>  hmp-commands.hx           |  8 +++++---
>  hmp.c                     |  3 ++-
>  include/block/block_int.h |  1 +
>  qapi/block-core.json      |  2 +-
>  qmp-commands.hx           |  4 +++-
>  7 files changed, 35 insertions(+), 8 deletions(-)

Aside from the API doc issues that Eric mentioned:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 07/10] blockdev-backup: added support for data compression
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 07/10] blockdev-backup: " Denis V. Lunev
  2016-05-16 17:00   ` Eric Blake
@ 2016-05-27 17:57   ` Stefan Hajnoczi
  1 sibling, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2016-05-27 17:57 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: qemu-devel, Kevin Wolf, Pavel Butsykin, Jeff Cody,
	Markus Armbruster, Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 915 bytes --]

On Sat, May 14, 2016 at 03:45:55PM +0300, Denis V. Lunev wrote:
> From: Pavel Butsykin <pbutsykin@virtuozzo.com>
> 
> The idea is simple - backup is "written-once" data. It is written block
> by block and it is large enough. It would be nice to save storage
> space and compress it.
> 
> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Jeff Cody <jcody@redhat.com>
> CC: Markus Armbruster <armbru@redhat.com>
> CC: Eric Blake <eblake@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> ---
>  blockdev.c           | 10 +++++++++-
>  qapi/block-core.json |  1 +
>  qmp-commands.hx      |  3 ++-
>  3 files changed, 12 insertions(+), 2 deletions(-)

Aside from the API doc issues that Eric mentioned:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] block: fix backup in vmdk format image
  2016-05-14 12:45 ` [Qemu-devel] [PATCH 09/10] block: fix backup in vmdk format image Denis V. Lunev
@ 2016-05-27 18:01   ` Stefan Hajnoczi
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2016-05-27 18:01 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: qemu-devel, Kevin Wolf, Pavel Butsykin, Jeff Cody,
	Markus Armbruster, Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 1789 bytes --]

On Sat, May 14, 2016 at 03:45:57PM +0300, Denis V. Lunev wrote:
> From: Pavel Butsykin <pbutsykin@virtuozzo.com>
> 
> The vmdk format has the extents and bs->file can be equal to the first
> extension. Before start of the backup we do detach the old context on the
> target drive at the bdrv_attach_aio_context. For the vmdk drive this means
> a double detach of the same block driver state, because the detach occurs
> for s->extents[0].file and bs->file.
> 
> To fix we  just skip the detach if s->extents[i].file and bs->file are the
> same. This approach is already used in the vmdk_free_extents() and the
> vmdk_get_allocated_file_size(), so it won't be some innovation :)
> 
> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Jeff Cody <jcody@redhat.com>
> CC: Markus Armbruster <armbru@redhat.com>
> CC: Eric Blake <eblake@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/vmdk.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 9530b30..0550924 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -2305,7 +2305,10 @@ static void vmdk_detach_aio_context(BlockDriverState *bs)
>      int i;
>  
>      for (i = 0; i < s->num_extents; i++) {
> -        bdrv_detach_aio_context(s->extents[i].file->bs);
> +        BdrvChild *file = s->extents[i].file;
> +        if (file != bs->file) {
> +            bdrv_detach_aio_context(file->bs);
> +        }
>      }
>  }

I think this fix is no longer necessary.  Max eliminated
vmdk_detach_aio_context() here:

[PULL 24/31] block: Propagate AioContext change to all children

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed
  2016-05-27 17:33   ` Stefan Hajnoczi
@ 2016-05-30  9:12     ` Pavel Butsykin
  2016-05-30 12:58       ` Pavel Butsykin
  2016-06-01  9:25     ` Kevin Wolf
  1 sibling, 1 reply; 34+ messages in thread
From: Pavel Butsykin @ 2016-05-30  9:12 UTC (permalink / raw)
  To: Stefan Hajnoczi, Denis V. Lunev
  Cc: qemu-devel, Kevin Wolf, Jeff Cody, Markus Armbruster,
	Stefan Hajnoczi, John Snow

On 27.05.2016 20:33, Stefan Hajnoczi wrote:
> On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote:
>> +    qemu_co_mutex_lock(&s->lock);
>> +    cluster_offset = \
>> +        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, out_len);
>
> The backslash isn't necessary for wrapping lines in C.  This kind of
> thing is only necessary in languages like Python where the grammar is
> whitespace sensistive.
>
> The C compiler is happy with an arbitrary amount of whitespace
> (newlines) in the middle of a statement.  The backslash in C is handled
> by the preprocessor: it joins the line.  That's useful for macro
> definitions where you need to tell the preprocessor that several lines
> belong to one macro definition.  But it's not needed for normal C code.
>
Thanks for the explanation, but the backslash is used more for the
person as a marker a line break. The current coding style misses this
point, but I can remove the backslash, because I don't think it's
something important :)

>> +    if (!cluster_offset) {
>> +        qemu_co_mutex_unlock(&s->lock);
>> +        ret = -EIO;
>> +        goto fail;
>> +    }
>> +    cluster_offset &= s->cluster_offset_mask;
>>
>> -        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
>> -        ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, out_len);
>> -        if (ret < 0) {
>> -            goto fail;
>> -        }
>> +    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    if (ret < 0) {
>> +        goto fail;
>>       }
>>
>> +    iov = (struct iovec) {
>> +        .iov_base   = out_buf,
>> +        .iov_len    = out_len,
>> +    };
>> +    qemu_iovec_init_external(&hd_qiov, &iov, 1);
>> +
>> +    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
>> +    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);
>
> There is a race condition here:
>
> If the newly allocated cluster is only partially filled by compressed
> data then qcow2_alloc_compressed_cluster_offset() remembers that more
> bytes are still available in the cluster.  The
> qcow2_alloc_compressed_cluster_offset() caller will continue filling the
> same cluster.
>
> Imagine two compressed writes running at the same time.  Write A
> allocates just a few bytes so write B shares a sector with the first
> write:
>
>       Sector 1
>    |AAABBBBBBBBB|
>
> The race condition is that bdrv_co_pwritev() uses read-modify-write (a
> bounce buffer).  If both requests call bdrv_co_pwritev() around the same
> time then the following could happen:
>
>       Sector 1
>    |000BBBBBBBBB|
>
> or:
>
>       Sector 1
>    |AAA000000000|
>
> It's necessary to hold s->lock around the compressed data write to avoid
> this race condition.
>
I agree, there is really a race.. Thank you, this is a very good point!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed
  2016-05-30  9:12     ` Pavel Butsykin
@ 2016-05-30 12:58       ` Pavel Butsykin
  2016-05-31 18:42         ` Eric Blake
  0 siblings, 1 reply; 34+ messages in thread
From: Pavel Butsykin @ 2016-05-30 12:58 UTC (permalink / raw)
  To: Stefan Hajnoczi, Denis V. Lunev
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, John Snow

On 30.05.2016 12:12, Pavel Butsykin wrote:
> On 27.05.2016 20:33, Stefan Hajnoczi wrote:
>> On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote:
>>> +    qemu_co_mutex_lock(&s->lock);
>>> +    cluster_offset = \
>>> +        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9,
>>> out_len);
>>
>> The backslash isn't necessary for wrapping lines in C.  This kind of
>> thing is only necessary in languages like Python where the grammar is
>> whitespace sensistive.
>>
>> The C compiler is happy with an arbitrary amount of whitespace
>> (newlines) in the middle of a statement.  The backslash in C is handled
>> by the preprocessor: it joins the line.  That's useful for macro
>> definitions where you need to tell the preprocessor that several lines
>> belong to one macro definition.  But it's not needed for normal C code.
>>
> Thanks for the explanation, but the backslash is used more for the
> person as a marker a line break. The current coding style misses this
> point, but I can remove the backslash, because I don't think it's
> something important :)
>
>>> +    if (!cluster_offset) {
>>> +        qemu_co_mutex_unlock(&s->lock);
>>> +        ret = -EIO;
>>> +        goto fail;
>>> +    }
>>> +    cluster_offset &= s->cluster_offset_mask;
>>>
>>> -        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
>>> -        ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf,
>>> out_len);
>>> -        if (ret < 0) {
>>> -            goto fail;
>>> -        }
>>> +    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset,
>>> out_len);
>>> +    qemu_co_mutex_unlock(&s->lock);
>>> +    if (ret < 0) {
>>> +        goto fail;
>>>       }
>>>
>>> +    iov = (struct iovec) {
>>> +        .iov_base   = out_buf,
>>> +        .iov_len    = out_len,
>>> +    };
>>> +    qemu_iovec_init_external(&hd_qiov, &iov, 1);
>>> +
>>> +    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
>>> +    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len,
>>> &hd_qiov, 0);
>>
>> There is a race condition here:
>>
>> If the newly allocated cluster is only partially filled by compressed
>> data then qcow2_alloc_compressed_cluster_offset() remembers that more
>> bytes are still available in the cluster.  The
>> qcow2_alloc_compressed_cluster_offset() caller will continue filling the
>> same cluster.
>>
>> Imagine two compressed writes running at the same time.  Write A
>> allocates just a few bytes so write B shares a sector with the first
>> write:

Sorry, but it seems this will never happen, because the second write
will not pass this check:

uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                                uint64_t offset,
                                                int compressed_size)
{
     ...
     /* Compression can't overwrite anything. Fail if the cluster was 
already
      * allocated. */
     cluster_offset = be64_to_cpu(l2_table[l2_index]);
     if (cluster_offset & L2E_OFFSET_MASK) {
         qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
         return 0;
     }
    ...

As you can see we can't do the compressed write in the already allocated
cluster.

>>
>>       Sector 1
>>    |AAABBBBBBBBB|
>>
>> The race condition is that bdrv_co_pwritev() uses read-modify-write (a
>> bounce buffer).  If both requests call bdrv_co_pwritev() around the same
>> time then the following could happen:
>>
>>       Sector 1
>>    |000BBBBBBBBB|
>>
>> or:
>>
>>       Sector 1
>>    |AAA000000000|
>>
>> It's necessary to hold s->lock around the compressed data write to avoid
>> this race condition.
>>
> I agree, there is really a race.. Thank you, this is a very good point!
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] qcow: add qcow_co_write_compressed
  2016-05-27 17:45   ` Stefan Hajnoczi
@ 2016-05-30 14:27     ` Pavel Butsykin
  0 siblings, 0 replies; 34+ messages in thread
From: Pavel Butsykin @ 2016-05-30 14:27 UTC (permalink / raw)
  To: Stefan Hajnoczi, Denis V. Lunev
  Cc: qemu-devel, Kevin Wolf, Jeff Cody, Markus Armbruster,
	Stefan Hajnoczi, John Snow

On 27.05.2016 20:45, Stefan Hajnoczi wrote:
> On Sat, May 14, 2016 at 03:45:52PM +0300, Denis V. Lunev wrote:
>> +    qemu_co_mutex_lock(&s->lock);
>> +    cluster_offset = get_cluster_offset(bs, sector_num << 9, 2, out_len, 0, 0);
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    if (cluster_offset == 0) {
>> +        ret = -EIO;
>> +        goto fail;
>> +    }
>> +    cluster_offset &= s->cluster_offset_mask;
>> +
>> +    iov = (struct iovec) {
>> +        .iov_base   = out_buf,
>> +        .iov_len    = out_len,
>> +    };
>> +    qemu_iovec_init_external(&hd_qiov, &iov, 1);
>> +    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);
>
> Not sure if this has the same race condition as the qcow2 patch.  It
> seems that bdrv_getlength() is used to extend the file on a per-sector
> basis.  That would mean compressed data is not packed inside sectors and
> no read-write-modify race condition exists, but I haven't fully audited
> get_cluster_offset().
>

The get_cluster_offset() also doesn't allow to do multiple compressed
writes in a single cluster, because this function for all offsets
within the cluster returns the same cluster_offset. So here we just
can't write at offset in the cluster, only at the beginning of the
cluster.

> Stefan
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed
  2016-05-30 12:58       ` Pavel Butsykin
@ 2016-05-31 18:42         ` Eric Blake
  2016-05-31 21:00           ` Denis V. Lunev
  2016-06-01  9:31           ` Kevin Wolf
  0 siblings, 2 replies; 34+ messages in thread
From: Eric Blake @ 2016-05-31 18:42 UTC (permalink / raw)
  To: Pavel Butsykin, Stefan Hajnoczi, Denis V. Lunev
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 1110 bytes --]

On 05/30/2016 06:58 AM, Pavel Butsykin wrote:

> 
> Sorry, but it seems this will never happen, because the second write
> will not pass this check:
> 
> uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>                                                uint64_t offset,
>                                                int compressed_size)
> {
>     ...
>     /* Compression can't overwrite anything. Fail if the cluster was
> already
>      * allocated. */
>     cluster_offset = be64_to_cpu(l2_table[l2_index]);
>     if (cluster_offset & L2E_OFFSET_MASK) {
>         qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>         return 0;
>     }
>    ...
> 
> As you can see we can't do the compressed write in the already allocated
> cluster.

Umm, doesn't that defeat the point of compression, if every compressed
cluster becomes the head of a new cluster?  The whole goal of
compression is to be able to fit multiple clusters within one.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed
  2016-05-31 18:42         ` Eric Blake
@ 2016-05-31 21:00           ` Denis V. Lunev
  2016-05-31 21:13             ` Eric Blake
  2016-06-01  9:31           ` Kevin Wolf
  1 sibling, 1 reply; 34+ messages in thread
From: Denis V. Lunev @ 2016-05-31 21:00 UTC (permalink / raw)
  To: Eric Blake, Pavel Butsykin, Stefan Hajnoczi
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, John Snow

On 05/31/2016 09:42 PM, Eric Blake wrote:
> On 05/30/2016 06:58 AM, Pavel Butsykin wrote:
>
>> Sorry, but it seems this will never happen, because the second write
>> will not pass this check:
>>
>> uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>>                                                 uint64_t offset,
>>                                                 int compressed_size)
>> {
>>      ...
>>      /* Compression can't overwrite anything. Fail if the cluster was
>> already
>>       * allocated. */
>>      cluster_offset = be64_to_cpu(l2_table[l2_index]);
>>      if (cluster_offset & L2E_OFFSET_MASK) {
>>          qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>>          return 0;
>>      }
>>     ...
>>
>> As you can see we can't do the compressed write in the already allocated
>> cluster.
> Umm, doesn't that defeat the point of compression, if every compressed
> cluster becomes the head of a new cluster?  The whole goal of
> compression is to be able to fit multiple clusters within one.
>
AFAIK the file will be sparse in that unused areas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed
  2016-05-31 21:00           ` Denis V. Lunev
@ 2016-05-31 21:13             ` Eric Blake
  2016-06-01  9:53               ` Pavel Butsykin
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-31 21:13 UTC (permalink / raw)
  To: Denis V. Lunev, Pavel Butsykin, Stefan Hajnoczi
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 1747 bytes --]

On 05/31/2016 03:00 PM, Denis V. Lunev wrote:
> On 05/31/2016 09:42 PM, Eric Blake wrote:
>> On 05/30/2016 06:58 AM, Pavel Butsykin wrote:
>>
>>> Sorry, but it seems this will never happen, because the second write
>>> will not pass this check:
>>>
>>> uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>>>                                                 uint64_t offset,
>>>                                                 int compressed_size)
>>> {
>>>      ...
>>>      /* Compression can't overwrite anything. Fail if the cluster was
>>> already
>>>       * allocated. */
>>>      cluster_offset = be64_to_cpu(l2_table[l2_index]);
>>>      if (cluster_offset & L2E_OFFSET_MASK) {
>>>          qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>>>          return 0;
>>>      }
>>>     ...
>>>
>>> As you can see we can't do the compressed write in the already allocated
>>> cluster.
>> Umm, doesn't that defeat the point of compression, if every compressed
>> cluster becomes the head of a new cluster?  The whole goal of
>> compression is to be able to fit multiple clusters within one.
>>
> AFAIK the file will be sparse in that unused areas

IIRC, on the NTFS file system, the minimum hole size is 64k. If you also
have 64k clusters, then you don't have a sparse file - every tail of
zero sectors will be explicit in the filesystem, if you are using 1:1
clusters for compression.  Other file systems may have finer granularity
for holes, but it's still rather awkward to be relying on sparseness
when a better solution is to pack compressed sectors consecutively.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed
  2016-05-27 17:33   ` Stefan Hajnoczi
  2016-05-30  9:12     ` Pavel Butsykin
@ 2016-06-01  9:25     ` Kevin Wolf
  2016-06-01 20:06       ` Stefan Hajnoczi
  1 sibling, 1 reply; 34+ messages in thread
From: Kevin Wolf @ 2016-06-01  9:25 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Denis V. Lunev, qemu-devel, Pavel Butsykin, Jeff Cody,
	Markus Armbruster, Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 2971 bytes --]

Am 27.05.2016 um 19:33 hat Stefan Hajnoczi geschrieben:
> On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote:
> > +    qemu_co_mutex_lock(&s->lock);
> > +    cluster_offset = \
> > +        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, out_len);
> 
> The backslash isn't necessary for wrapping lines in C.  This kind of
> thing is only necessary in languages like Python where the grammar is
> whitespace sensistive.
> 
> The C compiler is happy with an arbitrary amount of whitespace
> (newlines) in the middle of a statement.  The backslash in C is handled
> by the preprocessor: it joins the line.  That's useful for macro
> definitions where you need to tell the preprocessor that several lines
> belong to one macro definition.  But it's not needed for normal C code.
> 
> > +    if (!cluster_offset) {
> > +        qemu_co_mutex_unlock(&s->lock);
> > +        ret = -EIO;
> > +        goto fail;
> > +    }
> > +    cluster_offset &= s->cluster_offset_mask;
> >  
> > -        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
> > -        ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, out_len);
> > -        if (ret < 0) {
> > -            goto fail;
> > -        }
> > +    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
> > +    qemu_co_mutex_unlock(&s->lock);
> > +    if (ret < 0) {
> > +        goto fail;
> >      }
> >  
> > +    iov = (struct iovec) {
> > +        .iov_base   = out_buf,
> > +        .iov_len    = out_len,
> > +    };
> > +    qemu_iovec_init_external(&hd_qiov, &iov, 1);
> > +
> > +    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
> > +    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);
> 
> There is a race condition here:
> 
> If the newly allocated cluster is only partially filled by compressed
> data then qcow2_alloc_compressed_cluster_offset() remembers that more
> bytes are still available in the cluster.  The
> qcow2_alloc_compressed_cluster_offset() caller will continue filling the
> same cluster.
> 
> Imagine two compressed writes running at the same time.  Write A
> allocates just a few bytes so write B shares a sector with the first
> write:
> 
>      Sector 1
>   |AAABBBBBBBBB|
> 
> The race condition is that bdrv_co_pwritev() uses read-modify-write (a
> bounce buffer).  If both requests call bdrv_co_pwritev() around the same
> time then the following could happen:
> 
>      Sector 1
>   |000BBBBBBBBB|
> 
> or:
> 
>      Sector 1
>   |AAA000000000|
> 
> It's necessary to hold s->lock around the compressed data write to avoid
> this race condition.

I don't think this race condition exists. bdrv_co_pwritev() can indeed
perform RMW if the lower layer requires alignment, but that's not
something callers need to care about (which would be an awful API) but
is fully handled there by marking requests serialising if they involve
RMW.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed
  2016-05-31 18:42         ` Eric Blake
  2016-05-31 21:00           ` Denis V. Lunev
@ 2016-06-01  9:31           ` Kevin Wolf
  1 sibling, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2016-06-01  9:31 UTC (permalink / raw)
  To: Eric Blake
  Cc: Pavel Butsykin, Stefan Hajnoczi, Denis V. Lunev, Jeff Cody,
	qemu-devel, Markus Armbruster, Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 1728 bytes --]

Am 31.05.2016 um 20:42 hat Eric Blake geschrieben:
> On 05/30/2016 06:58 AM, Pavel Butsykin wrote:
> 
> > 
> > Sorry, but it seems this will never happen, because the second write
> > will not pass this check:
> > 
> > uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
> >                                                uint64_t offset,
> >                                                int compressed_size)
> > {
> >     ...
> >     /* Compression can't overwrite anything. Fail if the cluster was
> > already
> >      * allocated. */
> >     cluster_offset = be64_to_cpu(l2_table[l2_index]);
> >     if (cluster_offset & L2E_OFFSET_MASK) {
> >         qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> >         return 0;
> >     }
> >    ...
> > 
> > As you can see we can't do the compressed write in the already allocated
> > cluster.
> 
> Umm, doesn't that defeat the point of compression, if every compressed
> cluster becomes the head of a new cluster?  The whole goal of
> compression is to be able to fit multiple clusters within one.

With compression, be careful what kind of clusters you're looking at.
The clusters in the code above are guest clusters and you can't
overwrite a guest cluster that has already been written (I'm not sure
why, though - didn't matter so far because the only writer is qemu-img
convert, which writes the whole image sequentially and doesn't split
clusters across requests, but if we want to allow compressed writes in a
running VM, we may want to change this).

However, the compressed data of multiple guest clusters may end up in
the same host cluster, and that's indeed the whole point of compression.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed
  2016-05-31 21:13             ` Eric Blake
@ 2016-06-01  9:53               ` Pavel Butsykin
  0 siblings, 0 replies; 34+ messages in thread
From: Pavel Butsykin @ 2016-06-01  9:53 UTC (permalink / raw)
  To: Eric Blake, Denis V. Lunev, Stefan Hajnoczi
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, John Snow

On 01.06.2016 00:13, Eric Blake wrote:
> On 05/31/2016 03:00 PM, Denis V. Lunev wrote:
>> On 05/31/2016 09:42 PM, Eric Blake wrote:
>>> On 05/30/2016 06:58 AM, Pavel Butsykin wrote:
>>>
>>>> Sorry, but it seems this will never happen, because the second write
>>>> will not pass this check:
>>>>
>>>> uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>>>>                                                  uint64_t offset,
>>>>                                                  int compressed_size)
>>>> {
>>>>       ...
>>>>       /* Compression can't overwrite anything. Fail if the cluster was
>>>> already
>>>>        * allocated. */
>>>>       cluster_offset = be64_to_cpu(l2_table[l2_index]);
>>>>       if (cluster_offset & L2E_OFFSET_MASK) {
>>>>           qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>>>>           return 0;
>>>>       }
>>>>      ...
>>>>
>>>> As you can see we can't do the compressed write in the already allocated
>>>> cluster.
>>> Umm, doesn't that defeat the point of compression, if every compressed
>>> cluster becomes the head of a new cluster?  The whole goal of
>>> compression is to be able to fit multiple clusters within one.
>>>
>> AFAIK the file will be sparse in that unused areas
>
> IIRC, on the NTFS file system, the minimum hole size is 64k. If you also
> have 64k clusters, then you don't have a sparse file - every tail of
> zero sectors will be explicit in the filesystem, if you are using 1:1
> clusters for compression.  Other file systems may have finer granularity
> for holes, but it's still rather awkward to be relying on sparseness
> when a better solution is to pack compressed sectors consecutively.
>

Maybe you're right, I agree that the above mentioned issues are taking
place and it would be good to solve them, but it looks like the next
step. The solution that you proposed has a direct relation to the
improvement of the format driver that goes beyond the goals of this
patch-set, where the main goal is to provide opportunities compression
from the format drivers for the backup function.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed
  2016-06-01  9:25     ` Kevin Wolf
@ 2016-06-01 20:06       ` Stefan Hajnoczi
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2016-06-01 20:06 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, Denis V. Lunev, qemu-devel, Pavel Butsykin,
	Jeff Cody, Markus Armbruster, John Snow

[-- Attachment #1: Type: text/plain, Size: 3204 bytes --]

On Wed, Jun 01, 2016 at 11:25:57AM +0200, Kevin Wolf wrote:
> Am 27.05.2016 um 19:33 hat Stefan Hajnoczi geschrieben:
> > On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote:
> > > +    qemu_co_mutex_lock(&s->lock);
> > > +    cluster_offset = \
> > > +        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, out_len);
> > 
> > The backslash isn't necessary for wrapping lines in C.  This kind of
> > thing is only necessary in languages like Python where the grammar is
> > whitespace sensistive.
> > 
> > The C compiler is happy with an arbitrary amount of whitespace
> > (newlines) in the middle of a statement.  The backslash in C is handled
> > by the preprocessor: it joins the line.  That's useful for macro
> > definitions where you need to tell the preprocessor that several lines
> > belong to one macro definition.  But it's not needed for normal C code.
> > 
> > > +    if (!cluster_offset) {
> > > +        qemu_co_mutex_unlock(&s->lock);
> > > +        ret = -EIO;
> > > +        goto fail;
> > > +    }
> > > +    cluster_offset &= s->cluster_offset_mask;
> > >  
> > > -        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
> > > -        ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, out_len);
> > > -        if (ret < 0) {
> > > -            goto fail;
> > > -        }
> > > +    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
> > > +    qemu_co_mutex_unlock(&s->lock);
> > > +    if (ret < 0) {
> > > +        goto fail;
> > >      }
> > >  
> > > +    iov = (struct iovec) {
> > > +        .iov_base   = out_buf,
> > > +        .iov_len    = out_len,
> > > +    };
> > > +    qemu_iovec_init_external(&hd_qiov, &iov, 1);
> > > +
> > > +    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
> > > +    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);
> > 
> > There is a race condition here:
> > 
> > If the newly allocated cluster is only partially filled by compressed
> > data then qcow2_alloc_compressed_cluster_offset() remembers that more
> > bytes are still available in the cluster.  The
> > qcow2_alloc_compressed_cluster_offset() caller will continue filling the
> > same cluster.
> > 
> > Imagine two compressed writes running at the same time.  Write A
> > allocates just a few bytes so write B shares a sector with the first
> > write:
> > 
> >      Sector 1
> >   |AAABBBBBBBBB|
> > 
> > The race condition is that bdrv_co_pwritev() uses read-modify-write (a
> > bounce buffer).  If both requests call bdrv_co_pwritev() around the same
> > time then the following could happen:
> > 
> >      Sector 1
> >   |000BBBBBBBBB|
> > 
> > or:
> > 
> >      Sector 1
> >   |AAA000000000|
> > 
> > It's necessary to hold s->lock around the compressed data write to avoid
> > this race condition.
> 
> I don't think this race condition exists. bdrv_co_pwritev() can indeed
> perform RMW if the lower layer requires alignment, but that's not
> something callers need to care about (which would be an awful API) but
> is fully handled there by marking requests serialising if they involve
> RMW.

Good point.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-06-01 20:06 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-14 12:45 [Qemu-devel] [PATCH v3 00/10] backup compression Denis V. Lunev
2016-05-14 12:45 ` [Qemu-devel] [PATCH 01/10] block/io: add bdrv_co_write_compressed Denis V. Lunev
2016-05-16 16:52   ` Eric Blake
2016-05-17 15:01     ` Pavel Butsykin
2016-05-19 21:25   ` Stefan Hajnoczi
2016-05-19 21:39     ` Denis V. Lunev
2016-05-14 12:45 ` [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed Denis V. Lunev
2016-05-27 17:33   ` Stefan Hajnoczi
2016-05-30  9:12     ` Pavel Butsykin
2016-05-30 12:58       ` Pavel Butsykin
2016-05-31 18:42         ` Eric Blake
2016-05-31 21:00           ` Denis V. Lunev
2016-05-31 21:13             ` Eric Blake
2016-06-01  9:53               ` Pavel Butsykin
2016-06-01  9:31           ` Kevin Wolf
2016-06-01  9:25     ` Kevin Wolf
2016-06-01 20:06       ` Stefan Hajnoczi
2016-05-14 12:45 ` [Qemu-devel] [PATCH 03/10] vmdk: add vmdk_co_write_compressed Denis V. Lunev
2016-05-27 17:38   ` Stefan Hajnoczi
2016-05-14 12:45 ` [Qemu-devel] [PATCH 04/10] qcow: add qcow_co_write_compressed Denis V. Lunev
2016-05-27 17:45   ` Stefan Hajnoczi
2016-05-30 14:27     ` Pavel Butsykin
2016-05-14 12:45 ` [Qemu-devel] [PATCH 05/10] block: remove BlockDriver.bdrv_write_compressed Denis V. Lunev
2016-05-16 16:57   ` Eric Blake
2016-05-17 12:22     ` Pavel Butsykin
2016-05-14 12:45 ` [Qemu-devel] [PATCH 06/10] drive-backup: added support for data compression Denis V. Lunev
2016-05-16 16:59   ` Eric Blake
2016-05-27 17:56   ` Stefan Hajnoczi
2016-05-14 12:45 ` [Qemu-devel] [PATCH 07/10] blockdev-backup: " Denis V. Lunev
2016-05-16 17:00   ` Eric Blake
2016-05-27 17:57   ` Stefan Hajnoczi
2016-05-14 12:45 ` [Qemu-devel] [PATCH 08/10] qemu-iotests: test backup compression in 055 Denis V. Lunev
2016-05-14 12:45 ` [Qemu-devel] [PATCH 09/10] block: fix backup in vmdk format image Denis V. Lunev
2016-05-27 18:01   ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.