All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 00/10] block: Fix resize (extending) of short overlays
@ 2020-04-23 15:01 Kevin Wolf
  2020-04-23 15:01 ` [PATCH v6 01/10] block: Add flags to BlockDriver.bdrv_co_truncate() Kevin Wolf
                   ` (9 more replies)
  0 siblings, 10 replies; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:01 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, berto, qemu-devel, mreitz

v6:
- qcow2: Don't round up end offset [Eric]
- qcow2: Use different error messages for different error paths
- Request BDRV_REQ_ZERO_WRITE only if the backing file actually covers
  (part of) the new area (fix regression from v3) [Max]
- New patch: Forward ZERO_WRITE flag for full preallocation [Max]

v5:
- Fixed file-win32 [Patchew]
- Fixed zeroing in qcow2 for unaligned requests + tests [Vladimir]
- Made raw-format code more consistent [Eric]
- Leave output for existing iotests cases unchanged [Vladimir]

v4:
- Rewrote the series to move the actual zeroing to the block drivers so
  that error paths can work correctly and potentially long-running
  fallback to writing explicit zeroes is avoided.
- Fixed output filtering order in the test case [Max]

v3:
- Don't allow blocking the monitor for a zero write in block_resize
  (even though we can already blockfor other reasons there). This is
  mainly responsible for the increased complexity compared to v2.
  Personally, I think this is not an improvement over v2, but if this is
  what it takes to fix a corruption issue in 4.2... [Max]
- Don't use huge image files in the test case [Vladimir]

v2:
- Switched order of bs->total_sectors update and zero write [Vladimir]
- Fixed coding style [Vladimir]
- Changed the commit message to contain what was in the cover letter
- Test all preallocation modes
- Test allocation status with qemu-io 'map' [Vladimir]

*** BLURB HERE ***

Kevin Wolf (10):
  block: Add flags to BlockDriver.bdrv_co_truncate()
  block: Add flags to bdrv(_co)_truncate()
  block-backend: Add flags to blk_truncate()
  qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
  raw-format: Support BDRV_REQ_ZERO_WRITE for truncate
  file-posix: Support BDRV_REQ_ZERO_WRITE for truncate
  block: truncate: Don't make backing file data visible
  iotests: Filter testfiles out in filter_img_info()
  iotests: Test committing to short backing file
  qcow2: Forward ZERO_WRITE flag for full preallocation

 include/block/block.h          |   5 +-
 include/block/block_int.h      |  10 +-
 include/sysemu/block-backend.h |   2 +-
 block.c                        |   3 +-
 block/block-backend.c          |   4 +-
 block/commit.c                 |   4 +-
 block/crypto.c                 |   7 +-
 block/file-posix.c             |   6 +-
 block/file-win32.c             |   2 +-
 block/gluster.c                |   1 +
 block/io.c                     |  42 +++++-
 block/iscsi.c                  |   2 +-
 block/mirror.c                 |   2 +-
 block/nfs.c                    |   3 +-
 block/parallels.c              |   6 +-
 block/qcow.c                   |   4 +-
 block/qcow2-cluster.c          |   2 +-
 block/qcow2-refcount.c         |   2 +-
 block/qcow2.c                  |  72 +++++++--
 block/qed.c                    |   3 +-
 block/raw-format.c             |   6 +-
 block/rbd.c                    |   1 +
 block/sheepdog.c               |   4 +-
 block/ssh.c                    |   2 +-
 block/vdi.c                    |   2 +-
 block/vhdx-log.c               |   2 +-
 block/vhdx.c                   |   6 +-
 block/vmdk.c                   |   8 +-
 block/vpc.c                    |   2 +-
 blockdev.c                     |   2 +-
 qemu-img.c                     |   2 +-
 qemu-io-cmds.c                 |   2 +-
 tests/test-block-iothread.c    |   9 +-
 tests/qemu-iotests/iotests.py  |   5 +-
 tests/qemu-iotests/274         | 157 ++++++++++++++++++++
 tests/qemu-iotests/274.out     | 260 +++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group       |   1 +
 37 files changed, 589 insertions(+), 64 deletions(-)
 create mode 100755 tests/qemu-iotests/274
 create mode 100644 tests/qemu-iotests/274.out

-- 
2.25.3



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v6 01/10] block: Add flags to BlockDriver.bdrv_co_truncate()
  2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
@ 2020-04-23 15:01 ` Kevin Wolf
  2020-04-23 15:01 ` [PATCH v6 02/10] block: Add flags to bdrv(_co)_truncate() Kevin Wolf
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:01 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, berto, qemu-devel, mreitz

This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate()
driver callbacks, and a supported_truncate_flags field in
BlockDriverState that allows drivers to advertise support for request
flags in the context of truncate.

For now, we always pass 0 and no drivers declare support for any flag.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h   | 10 +++++++++-
 block/crypto.c              |  3 ++-
 block/file-posix.c          |  2 +-
 block/file-win32.c          |  2 +-
 block/gluster.c             |  1 +
 block/io.c                  |  8 +++++++-
 block/iscsi.c               |  2 +-
 block/nfs.c                 |  3 ++-
 block/qcow2.c               |  2 +-
 block/qed.c                 |  1 +
 block/raw-format.c          |  2 +-
 block/rbd.c                 |  1 +
 block/sheepdog.c            |  4 ++--
 block/ssh.c                 |  2 +-
 tests/test-block-iothread.c |  3 ++-
 15 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 4c3587ea19..92335f33c7 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -355,7 +355,7 @@ struct BlockDriver {
      */
     int coroutine_fn (*bdrv_co_truncate)(BlockDriverState *bs, int64_t offset,
                                          bool exact, PreallocMode prealloc,
-                                         Error **errp);
+                                         BdrvRequestFlags flags, Error **errp);
 
     int64_t (*bdrv_getlength)(BlockDriverState *bs);
     bool has_variable_length;
@@ -847,6 +847,14 @@ struct BlockDriverState {
     /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
      * BDRV_REQ_MAY_UNMAP, BDRV_REQ_WRITE_UNCHANGED) */
     unsigned int supported_zero_flags;
+    /*
+     * Flags honoured during truncate (so far: BDRV_REQ_ZERO_WRITE).
+     *
+     * If BDRV_REQ_ZERO_WRITE is given, the truncate operation must make sure
+     * that any added space reads as all zeros. If this can't be guaranteed,
+     * the operation must fail.
+     */
+    unsigned int supported_truncate_flags;
 
     /* the following member gives a name to every node on the bs graph. */
     char node_name[32];
diff --git a/block/crypto.c b/block/crypto.c
index d577f89659..3721a8495c 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -299,7 +299,8 @@ static int block_crypto_co_create_generic(BlockDriverState *bs,
 
 static int coroutine_fn
 block_crypto_co_truncate(BlockDriverState *bs, int64_t offset, bool exact,
-                         PreallocMode prealloc, Error **errp)
+                         PreallocMode prealloc, BdrvRequestFlags flags,
+                         Error **errp)
 {
     BlockCrypto *crypto = bs->opaque;
     uint64_t payload_offset =
diff --git a/block/file-posix.c b/block/file-posix.c
index 7e19bbff5f..53f475ed61 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2080,7 +2080,7 @@ raw_regular_truncate(BlockDriverState *bs, int fd, int64_t offset,
 
 static int coroutine_fn raw_co_truncate(BlockDriverState *bs, int64_t offset,
                                         bool exact, PreallocMode prealloc,
-                                        Error **errp)
+                                        BdrvRequestFlags flags, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
     struct stat st;
diff --git a/block/file-win32.c b/block/file-win32.c
index 15859839a1..a6b0dda5c3 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -469,7 +469,7 @@ static void raw_close(BlockDriverState *bs)
 
 static int coroutine_fn raw_co_truncate(BlockDriverState *bs, int64_t offset,
                                         bool exact, PreallocMode prealloc,
-                                        Error **errp)
+                                        BdrvRequestFlags flags, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
     LONG low, high;
diff --git a/block/gluster.c b/block/gluster.c
index 0aa1f2cda4..d06df900f6 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1228,6 +1228,7 @@ static coroutine_fn int qemu_gluster_co_truncate(BlockDriverState *bs,
                                                  int64_t offset,
                                                  bool exact,
                                                  PreallocMode prealloc,
+                                                 BdrvRequestFlags flags,
                                                  Error **errp)
 {
     BDRVGlusterState *s = bs->opaque;
diff --git a/block/io.c b/block/io.c
index aba67f66b9..04ac5cf023 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3344,6 +3344,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
     BlockDriverState *bs = child->bs;
     BlockDriver *drv = bs->drv;
     BdrvTrackedRequest req;
+    BdrvRequestFlags flags = 0;
     int64_t old_size, new_bytes;
     int ret;
 
@@ -3394,7 +3395,12 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
     }
 
     if (drv->bdrv_co_truncate) {
-        ret = drv->bdrv_co_truncate(bs, offset, exact, prealloc, errp);
+        if (flags & ~bs->supported_truncate_flags) {
+            error_setg(errp, "Block driver does not support requested flags");
+            ret = -ENOTSUP;
+            goto out;
+        }
+        ret = drv->bdrv_co_truncate(bs, offset, exact, prealloc, flags, errp);
     } else if (bs->file && drv->is_filter) {
         ret = bdrv_co_truncate(bs->file, offset, exact, prealloc, errp);
     } else {
diff --git a/block/iscsi.c b/block/iscsi.c
index 0b4b7210df..914a1de9fb 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2124,7 +2124,7 @@ static void iscsi_reopen_commit(BDRVReopenState *reopen_state)
 
 static int coroutine_fn iscsi_co_truncate(BlockDriverState *bs, int64_t offset,
                                           bool exact, PreallocMode prealloc,
-                                          Error **errp)
+                                          BdrvRequestFlags flags, Error **errp)
 {
     IscsiLun *iscsilun = bs->opaque;
     int64_t cur_length;
diff --git a/block/nfs.c b/block/nfs.c
index cc2413d5ab..2393fbfe6b 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -755,7 +755,8 @@ static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
 
 static int coroutine_fn
 nfs_file_co_truncate(BlockDriverState *bs, int64_t offset, bool exact,
-                     PreallocMode prealloc, Error **errp)
+                     PreallocMode prealloc, BdrvRequestFlags flags,
+                     Error **errp)
 {
     NFSClient *client = bs->opaque;
     int ret;
diff --git a/block/qcow2.c b/block/qcow2.c
index b524b0c53f..0b406b22fb 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3964,7 +3964,7 @@ fail:
 
 static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
                                           bool exact, PreallocMode prealloc,
-                                          Error **errp)
+                                          BdrvRequestFlags flags, Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t old_length;
diff --git a/block/qed.c b/block/qed.c
index 1af9b3cb1d..fb6100bd20 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1467,6 +1467,7 @@ static int coroutine_fn bdrv_qed_co_truncate(BlockDriverState *bs,
                                              int64_t offset,
                                              bool exact,
                                              PreallocMode prealloc,
+                                             BdrvRequestFlags flags,
                                              Error **errp)
 {
     BDRVQEDState *s = bs->opaque;
diff --git a/block/raw-format.c b/block/raw-format.c
index 93b25e1b6b..9331368f43 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -371,7 +371,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 
 static int coroutine_fn raw_co_truncate(BlockDriverState *bs, int64_t offset,
                                         bool exact, PreallocMode prealloc,
-                                        Error **errp)
+                                        BdrvRequestFlags flags, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
 
diff --git a/block/rbd.c b/block/rbd.c
index e637639a07..f2d52091c7 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1108,6 +1108,7 @@ static int coroutine_fn qemu_rbd_co_truncate(BlockDriverState *bs,
                                              int64_t offset,
                                              bool exact,
                                              PreallocMode prealloc,
+                                             BdrvRequestFlags flags,
                                              Error **errp)
 {
     int r;
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 59f7ebb171..ef0a6e743e 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2288,7 +2288,7 @@ static int64_t sd_getlength(BlockDriverState *bs)
 
 static int coroutine_fn sd_co_truncate(BlockDriverState *bs, int64_t offset,
                                        bool exact, PreallocMode prealloc,
-                                       Error **errp)
+                                       BdrvRequestFlags flags, Error **errp)
 {
     BDRVSheepdogState *s = bs->opaque;
     int ret, fd;
@@ -2604,7 +2604,7 @@ static coroutine_fn int sd_co_writev(BlockDriverState *bs, int64_t sector_num,
 
     assert(!flags);
     if (offset > s->inode.vdi_size) {
-        ret = sd_co_truncate(bs, offset, false, PREALLOC_MODE_OFF, NULL);
+        ret = sd_co_truncate(bs, offset, false, PREALLOC_MODE_OFF, 0, NULL);
         if (ret < 0) {
             return ret;
         }
diff --git a/block/ssh.c b/block/ssh.c
index 84e92821c0..9eb33df859 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -1298,7 +1298,7 @@ static int64_t ssh_getlength(BlockDriverState *bs)
 
 static int coroutine_fn ssh_co_truncate(BlockDriverState *bs, int64_t offset,
                                         bool exact, PreallocMode prealloc,
-                                        Error **errp)
+                                        BdrvRequestFlags flags, Error **errp)
 {
     BDRVSSHState *s = bs->opaque;
 
diff --git a/tests/test-block-iothread.c b/tests/test-block-iothread.c
index 0c861809f0..2f3b76323d 100644
--- a/tests/test-block-iothread.c
+++ b/tests/test-block-iothread.c
@@ -46,7 +46,8 @@ static int coroutine_fn bdrv_test_co_pdiscard(BlockDriverState *bs,
 
 static int coroutine_fn
 bdrv_test_co_truncate(BlockDriverState *bs, int64_t offset, bool exact,
-                      PreallocMode prealloc, Error **errp)
+                      PreallocMode prealloc, BdrvRequestFlags flags,
+                      Error **errp)
 {
     return 0;
 }
-- 
2.25.3



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v6 02/10] block: Add flags to bdrv(_co)_truncate()
  2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
  2020-04-23 15:01 ` [PATCH v6 01/10] block: Add flags to BlockDriver.bdrv_co_truncate() Kevin Wolf
@ 2020-04-23 15:01 ` Kevin Wolf
  2020-04-23 15:01 ` [PATCH v6 03/10] block-backend: Add flags to blk_truncate() Kevin Wolf
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:01 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, berto, qemu-devel, mreitz

Now that block drivers can support flags for .bdrv_co_truncate, expose
the parameter in the node level interfaces bdrv_co_truncate() and
bdrv_truncate().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h       |  5 +++--
 block/block-backend.c       |  2 +-
 block/crypto.c              |  2 +-
 block/io.c                  | 12 +++++++-----
 block/parallels.c           |  6 +++---
 block/qcow.c                |  4 ++--
 block/qcow2-refcount.c      |  2 +-
 block/qcow2.c               | 15 +++++++++------
 block/raw-format.c          |  2 +-
 block/vhdx-log.c            |  2 +-
 block/vhdx.c                |  2 +-
 block/vmdk.c                |  2 +-
 tests/test-block-iothread.c |  6 +++---
 13 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index b05995fe9c..8b62429aa4 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -339,9 +339,10 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
 void bdrv_refresh_filename(BlockDriverState *bs);
 
 int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
-                                  PreallocMode prealloc, Error **errp);
+                                  PreallocMode prealloc, BdrvRequestFlags flags,
+                                  Error **errp);
 int bdrv_truncate(BdrvChild *child, int64_t offset, bool exact,
-                  PreallocMode prealloc, Error **errp);
+                  PreallocMode prealloc, BdrvRequestFlags flags, Error **errp);
 
 int64_t bdrv_nb_sectors(BlockDriverState *bs);
 int64_t bdrv_getlength(BlockDriverState *bs);
diff --git a/block/block-backend.c b/block/block-backend.c
index 38ae413826..8be20060d3 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2144,7 +2144,7 @@ int blk_truncate(BlockBackend *blk, int64_t offset, bool exact,
         return -ENOMEDIUM;
     }
 
-    return bdrv_truncate(blk->root, offset, exact, prealloc, errp);
+    return bdrv_truncate(blk->root, offset, exact, prealloc, 0, errp);
 }
 
 int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
diff --git a/block/crypto.c b/block/crypto.c
index 3721a8495c..ab33545c92 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -313,7 +313,7 @@ block_crypto_co_truncate(BlockDriverState *bs, int64_t offset, bool exact,
 
     offset += payload_offset;
 
-    return bdrv_co_truncate(bs->file, offset, exact, prealloc, errp);
+    return bdrv_co_truncate(bs->file, offset, exact, prealloc, 0, errp);
 }
 
 static void block_crypto_close(BlockDriverState *bs)
diff --git a/block/io.c b/block/io.c
index 04ac5cf023..795075954e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3339,12 +3339,12 @@ static void bdrv_parent_cb_resize(BlockDriverState *bs)
  * 'offset' bytes in length.
  */
 int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
-                                  PreallocMode prealloc, Error **errp)
+                                  PreallocMode prealloc, BdrvRequestFlags flags,
+                                  Error **errp)
 {
     BlockDriverState *bs = child->bs;
     BlockDriver *drv = bs->drv;
     BdrvTrackedRequest req;
-    BdrvRequestFlags flags = 0;
     int64_t old_size, new_bytes;
     int ret;
 
@@ -3402,7 +3402,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
         }
         ret = drv->bdrv_co_truncate(bs, offset, exact, prealloc, flags, errp);
     } else if (bs->file && drv->is_filter) {
-        ret = bdrv_co_truncate(bs->file, offset, exact, prealloc, errp);
+        ret = bdrv_co_truncate(bs->file, offset, exact, prealloc, flags, errp);
     } else {
         error_setg(errp, "Image format driver does not support resize");
         ret = -ENOTSUP;
@@ -3435,6 +3435,7 @@ typedef struct TruncateCo {
     int64_t offset;
     bool exact;
     PreallocMode prealloc;
+    BdrvRequestFlags flags;
     Error **errp;
     int ret;
 } TruncateCo;
@@ -3443,12 +3444,12 @@ static void coroutine_fn bdrv_truncate_co_entry(void *opaque)
 {
     TruncateCo *tco = opaque;
     tco->ret = bdrv_co_truncate(tco->child, tco->offset, tco->exact,
-                                tco->prealloc, tco->errp);
+                                tco->prealloc, tco->flags, tco->errp);
     aio_wait_kick();
 }
 
 int bdrv_truncate(BdrvChild *child, int64_t offset, bool exact,
-                  PreallocMode prealloc, Error **errp)
+                  PreallocMode prealloc, BdrvRequestFlags flags, Error **errp)
 {
     Coroutine *co;
     TruncateCo tco = {
@@ -3456,6 +3457,7 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, bool exact,
         .offset     = offset,
         .exact      = exact,
         .prealloc   = prealloc,
+        .flags      = flags,
         .errp       = errp,
         .ret        = NOT_DONE,
     };
diff --git a/block/parallels.c b/block/parallels.c
index 6d4ed77f16..2be92cf417 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -203,7 +203,7 @@ static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
         } else {
             ret = bdrv_truncate(bs->file,
                                 (s->data_end + space) << BDRV_SECTOR_BITS,
-                                false, PREALLOC_MODE_OFF, NULL);
+                                false, PREALLOC_MODE_OFF, 0, NULL);
         }
         if (ret < 0) {
             return ret;
@@ -493,7 +493,7 @@ static int coroutine_fn parallels_co_check(BlockDriverState *bs,
              * That means we have to pass exact=true.
              */
             ret = bdrv_truncate(bs->file, res->image_end_offset, true,
-                                PREALLOC_MODE_OFF, &local_err);
+                                PREALLOC_MODE_OFF, 0, &local_err);
             if (ret < 0) {
                 error_report_err(local_err);
                 res->check_errors++;
@@ -889,7 +889,7 @@ static void parallels_close(BlockDriverState *bs)
 
         /* errors are ignored, so we might as well pass exact=true */
         bdrv_truncate(bs->file, s->data_end << BDRV_SECTOR_BITS, true,
-                      PREALLOC_MODE_OFF, NULL);
+                      PREALLOC_MODE_OFF, 0, NULL);
     }
 
     g_free(s->bat_dirty_bmap);
diff --git a/block/qcow.c b/block/qcow.c
index 8973e4e565..6b5f2269f0 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -480,7 +480,7 @@ static int get_cluster_offset(BlockDriverState *bs,
                     return -E2BIG;
                 }
                 ret = bdrv_truncate(bs->file, cluster_offset + s->cluster_size,
-                                    false, PREALLOC_MODE_OFF, NULL);
+                                    false, PREALLOC_MODE_OFF, 0, NULL);
                 if (ret < 0) {
                     return ret;
                 }
@@ -1035,7 +1035,7 @@ static int qcow_make_empty(BlockDriverState *bs)
             l1_length) < 0)
         return -1;
     ret = bdrv_truncate(bs->file, s->l1_table_offset + l1_length, false,
-                        PREALLOC_MODE_OFF, NULL);
+                        PREALLOC_MODE_OFF, 0, NULL);
     if (ret < 0)
         return ret;
 
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7ef1c0e42a..d9650b9b6c 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2018,7 +2018,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
                 }
 
                 ret = bdrv_truncate(bs->file, offset + s->cluster_size, false,
-                                    PREALLOC_MODE_OFF, &local_err);
+                                    PREALLOC_MODE_OFF, 0, &local_err);
                 if (ret < 0) {
                     error_report_err(local_err);
                     goto resize_fail;
diff --git a/block/qcow2.c b/block/qcow2.c
index 0b406b22fb..c5b0711357 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3095,7 +3095,7 @@ static int coroutine_fn preallocate_co(BlockDriverState *bs, uint64_t offset,
             mode = PREALLOC_MODE_OFF;
         }
         ret = bdrv_co_truncate(s->data_file, host_offset + cur_bytes, false,
-                               mode, errp);
+                               mode, 0, errp);
         if (ret < 0) {
             return ret;
         }
@@ -4061,7 +4061,7 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
              * always fulfilled, so there is no need to pass it on.)
              */
             bdrv_co_truncate(bs->file, (last_cluster + 1) * s->cluster_size,
-                             false, PREALLOC_MODE_OFF, &local_err);
+                             false, PREALLOC_MODE_OFF, 0, &local_err);
             if (local_err) {
                 warn_reportf_err(local_err,
                                  "Failed to truncate the tail of the image: ");
@@ -4083,7 +4083,8 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
              * file should be resized to the exact target size, too,
              * so we pass @exact here.
              */
-            ret = bdrv_co_truncate(s->data_file, offset, exact, prealloc, errp);
+            ret = bdrv_co_truncate(s->data_file, offset, exact, prealloc, 0,
+                                   errp);
             if (ret < 0) {
                 goto fail;
             }
@@ -4169,7 +4170,8 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
         new_file_size = allocation_start +
                         nb_new_data_clusters * s->cluster_size;
         /* Image file grows, so @exact does not matter */
-        ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, errp);
+        ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
+                               errp);
         if (ret < 0) {
             error_prepend(errp, "Failed to resize underlying file: ");
             qcow2_free_clusters(bs, allocation_start,
@@ -4348,7 +4350,8 @@ qcow2_co_pwritev_compressed_part(BlockDriverState *bs,
         if (len < 0) {
             return len;
         }
-        return bdrv_co_truncate(bs->file, len, false, PREALLOC_MODE_OFF, NULL);
+        return bdrv_co_truncate(bs->file, len, false, PREALLOC_MODE_OFF, 0,
+                                NULL);
     }
 
     if (offset_into_cluster(s, offset)) {
@@ -4563,7 +4566,7 @@ static int make_completely_empty(BlockDriverState *bs)
     }
 
     ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size, false,
-                        PREALLOC_MODE_OFF, &local_err);
+                        PREALLOC_MODE_OFF, 0, &local_err);
     if (ret < 0) {
         error_report_err(local_err);
         goto fail;
diff --git a/block/raw-format.c b/block/raw-format.c
index 9331368f43..3465c9a865 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -387,7 +387,7 @@ static int coroutine_fn raw_co_truncate(BlockDriverState *bs, int64_t offset,
 
     s->size = offset;
     offset += s->offset;
-    return bdrv_co_truncate(bs->file, offset, exact, prealloc, errp);
+    return bdrv_co_truncate(bs->file, offset, exact, prealloc, 0, errp);
 }
 
 static void raw_eject(BlockDriverState *bs, bool eject_flag)
diff --git a/block/vhdx-log.c b/block/vhdx-log.c
index 13a49c2a33..404fb5f3cb 100644
--- a/block/vhdx-log.c
+++ b/block/vhdx-log.c
@@ -558,7 +558,7 @@ static int vhdx_log_flush(BlockDriverState *bs, BDRVVHDXState *s,
                     goto exit;
                 }
                 ret = bdrv_truncate(bs->file, new_file_size, false,
-                                    PREALLOC_MODE_OFF, NULL);
+                                    PREALLOC_MODE_OFF, 0, NULL);
                 if (ret < 0) {
                     goto exit;
                 }
diff --git a/block/vhdx.c b/block/vhdx.c
index 33e57cd656..5dfbb2029a 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1264,7 +1264,7 @@ static int vhdx_allocate_block(BlockDriverState *bs, BDRVVHDXState *s,
     }
 
     return bdrv_truncate(bs->file, *new_offset + s->block_size, false,
-                         PREALLOC_MODE_OFF, NULL);
+                         PREALLOC_MODE_OFF, 0, NULL);
 }
 
 /*
diff --git a/block/vmdk.c b/block/vmdk.c
index 218d9c9800..5de99fe813 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2077,7 +2077,7 @@ vmdk_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
             }
             length = QEMU_ALIGN_UP(length, BDRV_SECTOR_SIZE);
             ret = bdrv_truncate(s->extents[i].file, length, false,
-                                PREALLOC_MODE_OFF, NULL);
+                                PREALLOC_MODE_OFF, 0, NULL);
             if (ret < 0) {
                 return ret;
             }
diff --git a/tests/test-block-iothread.c b/tests/test-block-iothread.c
index 2f3b76323d..71e9bce3b1 100644
--- a/tests/test-block-iothread.c
+++ b/tests/test-block-iothread.c
@@ -186,18 +186,18 @@ static void test_sync_op_truncate(BdrvChild *c)
     int ret;
 
     /* Normal success path */
-    ret = bdrv_truncate(c, 65536, false, PREALLOC_MODE_OFF, NULL);
+    ret = bdrv_truncate(c, 65536, false, PREALLOC_MODE_OFF, 0, NULL);
     g_assert_cmpint(ret, ==, 0);
 
     /* Early error: Negative offset */
-    ret = bdrv_truncate(c, -2, false, PREALLOC_MODE_OFF, NULL);
+    ret = bdrv_truncate(c, -2, false, PREALLOC_MODE_OFF, 0, NULL);
     g_assert_cmpint(ret, ==, -EINVAL);
 
     /* Error: Read-only image */
     c->bs->read_only = true;
     c->bs->open_flags &= ~BDRV_O_RDWR;
 
-    ret = bdrv_truncate(c, 65536, false, PREALLOC_MODE_OFF, NULL);
+    ret = bdrv_truncate(c, 65536, false, PREALLOC_MODE_OFF, 0, NULL);
     g_assert_cmpint(ret, ==, -EACCES);
 
     c->bs->read_only = false;
-- 
2.25.3



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v6 03/10] block-backend: Add flags to blk_truncate()
  2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
  2020-04-23 15:01 ` [PATCH v6 01/10] block: Add flags to BlockDriver.bdrv_co_truncate() Kevin Wolf
  2020-04-23 15:01 ` [PATCH v6 02/10] block: Add flags to bdrv(_co)_truncate() Kevin Wolf
@ 2020-04-23 15:01 ` Kevin Wolf
  2020-04-23 15:01 ` [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate Kevin Wolf
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:01 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, berto, qemu-devel, mreitz

Now that node level interface bdrv_truncate() supports passing request
flags to the block driver, expose this on the BlockBackend level, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 include/sysemu/block-backend.h | 2 +-
 block.c                        | 3 ++-
 block/block-backend.c          | 4 ++--
 block/commit.c                 | 4 ++--
 block/crypto.c                 | 2 +-
 block/mirror.c                 | 2 +-
 block/qcow2.c                  | 4 ++--
 block/qed.c                    | 2 +-
 block/vdi.c                    | 2 +-
 block/vhdx.c                   | 4 ++--
 block/vmdk.c                   | 6 +++---
 block/vpc.c                    | 2 +-
 blockdev.c                     | 2 +-
 qemu-img.c                     | 2 +-
 qemu-io-cmds.c                 | 2 +-
 15 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 9bbdbd63d7..34de7faa81 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -237,7 +237,7 @@ int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
 int blk_pwrite_compressed(BlockBackend *blk, int64_t offset, const void *buf,
                           int bytes);
 int blk_truncate(BlockBackend *blk, int64_t offset, bool exact,
-                 PreallocMode prealloc, Error **errp);
+                 PreallocMode prealloc, BdrvRequestFlags flags, Error **errp);
 int blk_pdiscard(BlockBackend *blk, int64_t offset, int bytes);
 int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
                      int64_t pos, int size);
diff --git a/block.c b/block.c
index 2e3905c99e..03cc5813a2 100644
--- a/block.c
+++ b/block.c
@@ -548,7 +548,8 @@ static int64_t create_file_fallback_truncate(BlockBackend *blk,
     int64_t size;
     int ret;
 
-    ret = blk_truncate(blk, minimum_size, false, PREALLOC_MODE_OFF, &local_err);
+    ret = blk_truncate(blk, minimum_size, false, PREALLOC_MODE_OFF, 0,
+                       &local_err);
     if (ret < 0 && ret != -ENOTSUP) {
         error_propagate(errp, local_err);
         return ret;
diff --git a/block/block-backend.c b/block/block-backend.c
index 8be20060d3..17ed6d8c5b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2137,14 +2137,14 @@ int blk_pwrite_compressed(BlockBackend *blk, int64_t offset, const void *buf,
 }
 
 int blk_truncate(BlockBackend *blk, int64_t offset, bool exact,
-                 PreallocMode prealloc, Error **errp)
+                 PreallocMode prealloc, BdrvRequestFlags flags, Error **errp)
 {
     if (!blk_is_available(blk)) {
         error_setg(errp, "No medium inserted");
         return -ENOMEDIUM;
     }
 
-    return bdrv_truncate(blk->root, offset, exact, prealloc, 0, errp);
+    return bdrv_truncate(blk->root, offset, exact, prealloc, flags, errp);
 }
 
 int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
diff --git a/block/commit.c b/block/commit.c
index 8e672799af..87f6096d90 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -133,7 +133,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
     }
 
     if (base_len < len) {
-        ret = blk_truncate(s->base, len, false, PREALLOC_MODE_OFF, NULL);
+        ret = blk_truncate(s->base, len, false, PREALLOC_MODE_OFF, 0, NULL);
         if (ret) {
             goto out;
         }
@@ -458,7 +458,7 @@ int bdrv_commit(BlockDriverState *bs)
      * grow the backing file image if possible.  If not possible,
      * we must return an error */
     if (length > backing_length) {
-        ret = blk_truncate(backing, length, false, PREALLOC_MODE_OFF,
+        ret = blk_truncate(backing, length, false, PREALLOC_MODE_OFF, 0,
                            &local_err);
         if (ret < 0) {
             error_report_err(local_err);
diff --git a/block/crypto.c b/block/crypto.c
index ab33545c92..e02f343590 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -115,7 +115,7 @@ static ssize_t block_crypto_init_func(QCryptoBlock *block,
      * which will be used by the crypto header
      */
     return blk_truncate(data->blk, data->size + headerlen, false,
-                        data->prealloc, errp);
+                        data->prealloc, 0, errp);
 }
 
 
diff --git a/block/mirror.c b/block/mirror.c
index c26fd9260d..aca95c9bc9 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -900,7 +900,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 
         if (s->bdev_length > base_length) {
             ret = blk_truncate(s->target, s->bdev_length, false,
-                               PREALLOC_MODE_OFF, NULL);
+                               PREALLOC_MODE_OFF, 0, NULL);
             if (ret < 0) {
                 goto immediate_exit;
             }
diff --git a/block/qcow2.c b/block/qcow2.c
index c5b0711357..9cfbdfc939 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3511,7 +3511,7 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
 
     /* Okay, now that we have a valid image, let's give it the right size */
     ret = blk_truncate(blk, qcow2_opts->size, false, qcow2_opts->preallocation,
-                       errp);
+                       0, errp);
     if (ret < 0) {
         error_prepend(errp, "Could not resize image: ");
         goto out;
@@ -5374,7 +5374,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
          * Amending image options should ensure that the image has
          * exactly the given new values, so pass exact=true here.
          */
-        ret = blk_truncate(blk, new_size, true, PREALLOC_MODE_OFF, errp);
+        ret = blk_truncate(blk, new_size, true, PREALLOC_MODE_OFF, 0, errp);
         blk_unref(blk);
         if (ret < 0) {
             return ret;
diff --git a/block/qed.c b/block/qed.c
index fb6100bd20..b0fdb8f565 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -677,7 +677,7 @@ static int coroutine_fn bdrv_qed_co_create(BlockdevCreateOptions *opts,
      * The QED format associates file length with allocation status,
      * so a new file (which is empty) must have a length of 0.
      */
-    ret = blk_truncate(blk, 0, true, PREALLOC_MODE_OFF, errp);
+    ret = blk_truncate(blk, 0, true, PREALLOC_MODE_OFF, 0, errp);
     if (ret < 0) {
         goto out;
     }
diff --git a/block/vdi.c b/block/vdi.c
index e1a11f2aa0..0c7835ae70 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -875,7 +875,7 @@ static int coroutine_fn vdi_co_do_create(BlockdevCreateOptions *create_options,
 
     if (image_type == VDI_TYPE_STATIC) {
         ret = blk_truncate(blk, offset + blocks * block_size, false,
-                           PREALLOC_MODE_OFF, errp);
+                           PREALLOC_MODE_OFF, 0, errp);
         if (ret < 0) {
             error_prepend(errp, "Failed to statically allocate file");
             goto exit;
diff --git a/block/vhdx.c b/block/vhdx.c
index 5dfbb2029a..21497f7318 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1703,13 +1703,13 @@ static int vhdx_create_bat(BlockBackend *blk, BDRVVHDXState *s,
         /* All zeroes, so we can just extend the file - the end of the BAT
          * is the furthest thing we have written yet */
         ret = blk_truncate(blk, data_file_offset, false, PREALLOC_MODE_OFF,
-                           errp);
+                           0, errp);
         if (ret < 0) {
             goto exit;
         }
     } else if (type == VHDX_TYPE_FIXED) {
         ret = blk_truncate(blk, data_file_offset + image_size, false,
-                           PREALLOC_MODE_OFF, errp);
+                           PREALLOC_MODE_OFF, 0, errp);
         if (ret < 0) {
             goto exit;
         }
diff --git a/block/vmdk.c b/block/vmdk.c
index 5de99fe813..8ec18f35a5 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2118,7 +2118,7 @@ static int vmdk_init_extent(BlockBackend *blk,
     int gd_buf_size;
 
     if (flat) {
-        ret = blk_truncate(blk, filesize, false, PREALLOC_MODE_OFF, errp);
+        ret = blk_truncate(blk, filesize, false, PREALLOC_MODE_OFF, 0, errp);
         goto exit;
     }
     magic = cpu_to_be32(VMDK4_MAGIC);
@@ -2182,7 +2182,7 @@ static int vmdk_init_extent(BlockBackend *blk,
     }
 
     ret = blk_truncate(blk, le64_to_cpu(header.grain_offset) << 9, false,
-                       PREALLOC_MODE_OFF, errp);
+                       PREALLOC_MODE_OFF, 0, errp);
     if (ret < 0) {
         goto exit;
     }
@@ -2523,7 +2523,7 @@ static int coroutine_fn vmdk_co_do_create(int64_t size,
     /* bdrv_pwrite write padding zeros to align to sector, we don't need that
      * for description file */
     if (desc_offset == 0) {
-        ret = blk_truncate(blk, desc_len, false, PREALLOC_MODE_OFF, errp);
+        ret = blk_truncate(blk, desc_len, false, PREALLOC_MODE_OFF, 0, errp);
         if (ret < 0) {
             goto exit;
         }
diff --git a/block/vpc.c b/block/vpc.c
index d8141b52da..2d1eade146 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -898,7 +898,7 @@ static int create_fixed_disk(BlockBackend *blk, uint8_t *buf,
     /* Add footer to total size */
     total_size += HEADER_SIZE;
 
-    ret = blk_truncate(blk, total_size, false, PREALLOC_MODE_OFF, errp);
+    ret = blk_truncate(blk, total_size, false, PREALLOC_MODE_OFF, 0, errp);
     if (ret < 0) {
         return ret;
     }
diff --git a/blockdev.c b/blockdev.c
index 5faddaa705..fbe3a06dbf 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2741,7 +2741,7 @@ void qmp_block_resize(bool has_device, const char *device,
     }
 
     bdrv_drained_begin(bs);
-    ret = blk_truncate(blk, size, false, PREALLOC_MODE_OFF, errp);
+    ret = blk_truncate(blk, size, false, PREALLOC_MODE_OFF, 0, errp);
     bdrv_drained_end(bs);
 
 out:
diff --git a/qemu-img.c b/qemu-img.c
index 821cbf610e..7f52742ef2 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3866,7 +3866,7 @@ static int img_resize(int argc, char **argv)
      * resizing, so pass @exact=true.  It is of no use to report
      * success when the image has not actually been resized.
      */
-    ret = blk_truncate(blk, total_size, true, prealloc, &err);
+    ret = blk_truncate(blk, total_size, true, prealloc, 0, &err);
     if (!ret) {
         qprintf(quiet, "Image resized.\n");
     } else {
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 1b7e700020..851f07e8f8 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1715,7 +1715,7 @@ static int truncate_f(BlockBackend *blk, int argc, char **argv)
      * exact=true.  It is better to err on the "emit more errors" side
      * than to be overly permissive.
      */
-    ret = blk_truncate(blk, offset, true, PREALLOC_MODE_OFF, &local_err);
+    ret = blk_truncate(blk, offset, true, PREALLOC_MODE_OFF, 0, &local_err);
     if (ret < 0) {
         error_report_err(local_err);
         return ret;
-- 
2.25.3



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
  2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
                   ` (2 preceding siblings ...)
  2020-04-23 15:01 ` [PATCH v6 03/10] block-backend: Add flags to blk_truncate() Kevin Wolf
@ 2020-04-23 15:01 ` Kevin Wolf
  2020-04-23 15:18   ` Eric Blake
  2020-04-24  6:16   ` Vladimir Sementsov-Ogievskiy
  2020-04-23 15:01 ` [PATCH v6 05/10] raw-format: " Kevin Wolf
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:01 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, berto, qemu-devel, mreitz

If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cluster.c |  2 +-
 block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 17f1363279..4b5fc8c4a7 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1795,7 +1795,7 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
     /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
-           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
+           end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
 
     /* The zero flag is only supported by version 3 and newer */
     if (s->qcow_version < 3) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 9cfbdfc939..ad621fe404 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 
     bs->supported_zero_flags = header.version >= 3 ?
                                BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
+    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
 
     /* Repair image if dirty */
     if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
@@ -4214,6 +4215,38 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
         g_assert_not_reached();
     }
 
+    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
+        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
+
+        /*
+         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
+         * requires a cluster-aligned start. The end may be unaligned if it is
+         * at the end of the image (which it is here).
+         */
+        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
+            goto fail;
+        }
+
+        /* Write explicit zeros for the unaligned head */
+        if (zero_start > old_length) {
+            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
+            QEMUIOVector qiov;
+            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
+
+            qemu_co_mutex_unlock(&s->lock);
+            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
+            qemu_co_mutex_lock(&s->lock);
+
+            qemu_vfree(buf);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to zero out the new area");
+                goto fail;
+            }
+        }
+    }
+
     if (prealloc != PREALLOC_MODE_OFF) {
         /* Flush metadata before actually changing the image size */
         ret = qcow2_write_caches(bs);
-- 
2.25.3



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v6 05/10] raw-format: Support BDRV_REQ_ZERO_WRITE for truncate
  2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
                   ` (3 preceding siblings ...)
  2020-04-23 15:01 ` [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate Kevin Wolf
@ 2020-04-23 15:01 ` Kevin Wolf
  2020-04-24  6:21   ` Vladimir Sementsov-Ogievskiy
  2020-04-23 15:01 ` [PATCH v6 06/10] file-posix: " Kevin Wolf
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:01 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, berto, qemu-devel, mreitz

The raw format driver can simply forward the flag and let its bs->file
child take care of actually providing the zeros.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/raw-format.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index 3465c9a865..351f2d91c6 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -387,7 +387,7 @@ static int coroutine_fn raw_co_truncate(BlockDriverState *bs, int64_t offset,
 
     s->size = offset;
     offset += s->offset;
-    return bdrv_co_truncate(bs->file, offset, exact, prealloc, 0, errp);
+    return bdrv_co_truncate(bs->file, offset, exact, prealloc, flags, errp);
 }
 
 static void raw_eject(BlockDriverState *bs, bool eject_flag)
@@ -445,6 +445,8 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
     bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
         ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
             bs->file->bs->supported_zero_flags);
+    bs->supported_truncate_flags = bs->file->bs->supported_truncate_flags &
+                                   BDRV_REQ_ZERO_WRITE;
 
     if (bs->probed && !bdrv_is_read_only(bs)) {
         bdrv_refresh_filename(bs->file->bs);
-- 
2.25.3



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v6 06/10] file-posix: Support BDRV_REQ_ZERO_WRITE for truncate
  2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
                   ` (4 preceding siblings ...)
  2020-04-23 15:01 ` [PATCH v6 05/10] raw-format: " Kevin Wolf
@ 2020-04-23 15:01 ` Kevin Wolf
  2020-04-23 15:01 ` [PATCH v6 07/10] block: truncate: Don't make backing file data visible Kevin Wolf
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:01 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, berto, qemu-devel, mreitz

For regular files, we always get BDRV_REQ_ZERO_WRITE behaviour from the
OS, so we can advertise the flag and just ignore it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/file-posix.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 53f475ed61..1dca220a81 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -702,6 +702,10 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 #endif
 
     bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
+    if (S_ISREG(st.st_mode)) {
+        /* When extending regular files, we get zeros from the OS */
+        bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
+    }
     ret = 0;
 fail:
     if (filename && (bdrv_flags & BDRV_O_TEMPORARY)) {
-- 
2.25.3



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v6 07/10] block: truncate: Don't make backing file data visible
  2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
                   ` (5 preceding siblings ...)
  2020-04-23 15:01 ` [PATCH v6 06/10] file-posix: " Kevin Wolf
@ 2020-04-23 15:01 ` Kevin Wolf
  2020-04-23 15:21   ` Eric Blake
                     ` (2 more replies)
  2020-04-23 15:01 ` [PATCH v6 08/10] iotests: Filter testfiles out in filter_img_info() Kevin Wolf
                   ` (2 subsequent siblings)
  9 siblings, 3 replies; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:01 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, berto, qemu-devel, mreitz

When extending the size of an image that has a backing file larger than
its old size, make sure that the backing file data doesn't become
visible in the guest, but the added area is properly zeroed out.

Consider the following scenario where the overlay is shorter than its
backing file:

    base.qcow2:     AAAAAAAA
    overlay.qcow2:  BBBB

When resizing (extending) overlay.qcow2, the new blocks should not stay
unallocated and make the additional As from base.qcow2 visible like
before this patch, but zeros should be read.

A similar case happens with the various variants of a commit job when an
intermediate file is short (- for unallocated):

    base.qcow2:     A-A-AAAA
    mid.qcow2:      BB-B
    top.qcow2:      C--C--C-

After commit top.qcow2 to mid.qcow2, the following happens:

    mid.qcow2:      CB-C00C0 (correct result)
    mid.qcow2:      CB-C--C- (before this fix)

Without the fix, blocks that previously read as zeros on top.qcow2
suddenly turn into A.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/io.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/block/io.c b/block/io.c
index 795075954e..f618db3499 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3394,6 +3394,30 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
         goto out;
     }
 
+    /*
+     * If the image has a backing file that is large enough that it would
+     * provide data for the new area, we cannot leave it unallocated because
+     * then the backing file content would become visible. Instead, zero-fill
+     * the new area.
+     *
+     * Note that if the image has a backing file, but was opened without the
+     * backing file, taking care of keeping things consistent with that backing
+     * file is the user's responsibility.
+     */
+    if (new_bytes && bs->backing) {
+        int64_t backing_len;
+
+        backing_len = bdrv_getlength(backing_bs(bs));
+        if (backing_len < 0) {
+            ret = backing_len;
+            goto out;
+        }
+
+        if (backing_len > old_size) {
+            flags |= BDRV_REQ_ZERO_WRITE;
+        }
+    }
+
     if (drv->bdrv_co_truncate) {
         if (flags & ~bs->supported_truncate_flags) {
             error_setg(errp, "Block driver does not support requested flags");
-- 
2.25.3



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v6 08/10] iotests: Filter testfiles out in filter_img_info()
  2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
                   ` (6 preceding siblings ...)
  2020-04-23 15:01 ` [PATCH v6 07/10] block: truncate: Don't make backing file data visible Kevin Wolf
@ 2020-04-23 15:01 ` Kevin Wolf
  2020-04-24  6:51   ` Vladimir Sementsov-Ogievskiy
  2020-04-23 15:01 ` [PATCH v6 09/10] iotests: Test committing to short backing file Kevin Wolf
  2020-04-23 15:01 ` [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation Kevin Wolf
  9 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:01 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, berto, qemu-devel, mreitz

We want to keep TEST_IMG for the full path of the main test image, but
filter_testfiles() must be called for other test images before replacing
other things like the image format because the test directory path could
contain the format as a substring.

Insert a filter_testfiles() call between both.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/iotests.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 7bc4934cd2..5f8c263d59 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -338,8 +338,9 @@ def filter_img_info(output, filename):
     for line in output.split('\n'):
         if 'disk size' in line or 'actual-size' in line:
             continue
-        line = line.replace(filename, 'TEST_IMG') \
-                   .replace(imgfmt, 'IMGFMT')
+        line = line.replace(filename, 'TEST_IMG')
+        line = filter_testfiles(line)
+        line = line.replace(imgfmt, 'IMGFMT')
         line = re.sub('iters: [0-9]+', 'iters: XXX', line)
         line = re.sub('uuid: [-a-f0-9]+', 'uuid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX', line)
         line = re.sub('cid: [0-9]+', 'cid: XXXXXXXXXX', line)
-- 
2.25.3



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v6 09/10] iotests: Test committing to short backing file
  2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
                   ` (7 preceding siblings ...)
  2020-04-23 15:01 ` [PATCH v6 08/10] iotests: Filter testfiles out in filter_img_info() Kevin Wolf
@ 2020-04-23 15:01 ` Kevin Wolf
  2020-04-24  8:53   ` Vladimir Sementsov-Ogievskiy
  2020-04-23 15:01 ` [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation Kevin Wolf
  9 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:01 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, berto, qemu-devel, mreitz

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/274     | 157 ++++++++++++++++++++++
 tests/qemu-iotests/274.out | 260 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 418 insertions(+)
 create mode 100755 tests/qemu-iotests/274
 create mode 100644 tests/qemu-iotests/274.out

diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
new file mode 100755
index 0000000000..8bf7ff3122
--- /dev/null
+++ b/tests/qemu-iotests/274
@@ -0,0 +1,157 @@
+#!/usr/bin/env python3
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+# Creator/Owner: Kevin Wolf <kwolf@redhat.com>
+#
+# Some tests for short backing files and short overlays
+
+import iotests
+import os
+
+iotests.verify_image_format(supported_fmts=['qcow2'])
+iotests.verify_platform(['linux'])
+
+size_short = 1 * 1024 * 1024
+size_long = 2 * 1024 * 1024
+size_diff = size_long - size_short
+
+def create_chain():
+    iotests.qemu_img_log('create', '-f', iotests.imgfmt, base,
+                         str(size_long))
+    iotests.qemu_img_log('create', '-f', iotests.imgfmt, '-b', base, mid,
+                         str(size_short))
+    iotests.qemu_img_log('create', '-f', iotests.imgfmt, '-b', mid, top,
+                         str(size_long))
+
+    iotests.qemu_io_log('-c', 'write -P 1 0 %d' % size_long, base)
+
+def create_vm():
+    vm = iotests.VM()
+    vm.add_blockdev('file,filename=%s,node-name=base-file' % (base))
+    vm.add_blockdev('%s,file=base-file,node-name=base' % (iotests.imgfmt))
+    vm.add_blockdev('file,filename=%s,node-name=mid-file' % (mid))
+    vm.add_blockdev('%s,file=mid-file,node-name=mid,backing=base' % (iotests.imgfmt))
+    vm.add_drive(top, 'backing=mid,node-name=top')
+    return vm
+
+with iotests.FilePath('base') as base, \
+     iotests.FilePath('mid') as mid, \
+     iotests.FilePath('top') as top:
+
+    iotests.log('== Commit tests ==')
+
+    create_chain()
+
+    iotests.log('=== Check visible data ===')
+
+    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, top)
+    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), top)
+
+    iotests.log('=== Checking allocation status ===')
+
+    iotests.qemu_io_log('-c', 'alloc 0 %d' % size_short,
+                        '-c', 'alloc %d %d' % (size_short, size_diff),
+                        base)
+
+    iotests.qemu_io_log('-c', 'alloc 0 %d' % size_short,
+                        '-c', 'alloc %d %d' % (size_short, size_diff),
+                        mid)
+
+    iotests.qemu_io_log('-c', 'alloc 0 %d' % size_short,
+                        '-c', 'alloc %d %d' % (size_short, size_diff),
+                        top)
+
+    iotests.log('=== Checking map ===')
+
+    iotests.qemu_img_log('map', '--output=json', base)
+    iotests.qemu_img_log('map', '--output=human', base)
+    iotests.qemu_img_log('map', '--output=json', mid)
+    iotests.qemu_img_log('map', '--output=human', mid)
+    iotests.qemu_img_log('map', '--output=json', top)
+    iotests.qemu_img_log('map', '--output=human', top)
+
+    iotests.log('=== Testing qemu-img commit (top -> mid) ===')
+
+    iotests.qemu_img_log('commit', top)
+    iotests.img_info_log(mid)
+    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
+    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
+
+    iotests.log('=== Testing HMP commit (top -> mid) ===')
+
+    create_chain()
+    with create_vm() as vm:
+        vm.launch()
+        vm.qmp_log('human-monitor-command', command_line='commit drive0')
+
+    iotests.img_info_log(mid)
+    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
+    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
+
+    iotests.log('=== Testing QMP active commit (top -> mid) ===')
+
+    create_chain()
+    with create_vm() as vm:
+        vm.launch()
+        vm.qmp_log('block-commit', device='top', base_node='mid',
+                   job_id='job0', auto_dismiss=False)
+        vm.run_job('job0', wait=5)
+
+    iotests.img_info_log(mid)
+    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
+    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
+
+
+    iotests.log('== Resize tests ==')
+
+    # Use different sizes for different allocation modes:
+    #
+    # We want to have at least one test where 32 bit truncation in the size of
+    # the overlapping area becomes visible. This is covered by the
+    # prealloc='off' case (1G to 6G is an overlap of 5G).
+    #
+    # However, we can only do this for modes that don't preallocate data
+    # because otherwise we might run out of space on the test host.
+    #
+    # We also want to test some unaligned combinations.
+    for (prealloc, base_size, top_size_old, top_size_new, off)  in [
+            ('off',       '6G',    '1G',   '8G',   '5G'),
+            ('metadata', '32G',   '30G',  '33G',  '31G'),
+            ('falloc',   '10M',    '5M',  '15M',   '9M'),
+            ('full',     '16M',    '8M',  '12M',  '11M'),
+            ('off',      '384k', '253k', '512k', '253k'),
+            ('off',      '400k', '256k', '512k', '336k'),
+            ('off',      '512k', '256k', '500k', '436k')]:
+
+        iotests.log('=== preallocation=%s ===' % prealloc)
+        iotests.qemu_img_log('create', '-f', iotests.imgfmt, base, base_size)
+        iotests.qemu_img_log('create', '-f', iotests.imgfmt, '-b', base, top,
+                             top_size_old)
+        iotests.qemu_io_log('-c', 'write -P 1 %s 64k' % off, base)
+
+        # After this, 0 to base_size should be allocated/zeroed
+        # base_size to top_size_new should be unallocated with
+        # preallocation=off and allocated with preallocation enabled
+        iotests.qemu_img_log('resize', '-f', iotests.imgfmt,
+                             '--preallocation', prealloc, top, top_size_new)
+        iotests.qemu_io_log('-c', 'read -P 0 %s 64k' % off, top)
+
+        # Metadata preallocation doesn't have a defined result on the file
+        # system level with respect to holes, so skip it here
+        iotests.qemu_io_log('-c', 'map', top)
+        if prealloc != 'metadata':
+            iotests.qemu_img_log('map', '--output=json', top)
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
new file mode 100644
index 0000000000..179bd7ccaf
--- /dev/null
+++ b/tests/qemu-iotests/274.out
@@ -0,0 +1,260 @@
+== Commit tests ==
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=2097152 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 size=1048576 backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=2097152 backing_file=TEST_DIR/PID-mid cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Check visible data ===
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Checking allocation status ===
+1048576/1048576 bytes allocated at offset 0 bytes
+1048576/1048576 bytes allocated at offset 1 MiB
+
+0/1048576 bytes allocated at offset 0 bytes
+0/0 bytes allocated at offset 1 MiB
+
+0/1048576 bytes allocated at offset 0 bytes
+0/1048576 bytes allocated at offset 1 MiB
+
+=== Checking map ===
+[{ "start": 0, "length": 2097152, "depth": 0, "zero": false, "data": true, "offset": 327680}]
+
+Offset          Length          Mapped to       File
+0               0x200000        0x50000         TEST_DIR/PID-base
+
+[{ "start": 0, "length": 1048576, "depth": 1, "zero": false, "data": true, "offset": 327680}]
+
+Offset          Length          Mapped to       File
+0               0x100000        0x50000         TEST_DIR/PID-base
+
+[{ "start": 0, "length": 1048576, "depth": 2, "zero": false, "data": true, "offset": 327680},
+{ "start": 1048576, "length": 1048576, "depth": 0, "zero": true, "data": false}]
+
+Offset          Length          Mapped to       File
+0               0x100000        0x50000         TEST_DIR/PID-base
+
+=== Testing qemu-img commit (top -> mid) ===
+Image committed.
+
+image: TEST_IMG
+file format: IMGFMT
+virtual size: 2 MiB (2097152 bytes)
+cluster_size: 65536
+backing file: TEST_DIR/PID-base
+Format specific information:
+    compat: 1.1
+    lazy refcounts: false
+    refcount bits: 16
+    corrupt: false
+
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing HMP commit (top -> mid) ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=2097152 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 size=1048576 backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=2097152 backing_file=TEST_DIR/PID-mid cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+{"execute": "human-monitor-command", "arguments": {"command-line": "commit drive0"}}
+{"return": ""}
+image: TEST_IMG
+file format: IMGFMT
+virtual size: 2 MiB (2097152 bytes)
+cluster_size: 65536
+backing file: TEST_DIR/PID-base
+Format specific information:
+    compat: 1.1
+    lazy refcounts: false
+    refcount bits: 16
+    corrupt: false
+
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing QMP active commit (top -> mid) ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=2097152 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 size=1048576 backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=2097152 backing_file=TEST_DIR/PID-mid cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": "mid", "device": "top", "job-id": "job0"}}
+{"return": {}}
+{"execute": "job-complete", "arguments": {"id": "job0"}}
+{"return": {}}
+{"data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "job-dismiss", "arguments": {"id": "job0"}}
+{"return": {}}
+image: TEST_IMG
+file format: IMGFMT
+virtual size: 2 MiB (2097152 bytes)
+cluster_size: 65536
+backing file: TEST_DIR/PID-base
+Format specific information:
+    compat: 1.1
+    lazy refcounts: false
+    refcount bits: 16
+    corrupt: false
+
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Resize tests ==
+=== preallocation=off ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=6442450944 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=1073741824 backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+wrote 65536/65536 bytes at offset 5368709120
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Image resized.
+
+read 65536/65536 bytes at offset 5368709120
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+1 GiB (0x40000000) bytes not allocated at offset 0 bytes (0x0)
+7 GiB (0x1c0000000) bytes     allocated at offset 1 GiB (0x40000000)
+
+[{ "start": 0, "length": 1073741824, "depth": 1, "zero": true, "data": false},
+{ "start": 1073741824, "length": 7516192768, "depth": 0, "zero": true, "data": false}]
+
+=== preallocation=metadata ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=34359738368 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=32212254720 backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+wrote 65536/65536 bytes at offset 33285996544
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Image resized.
+
+read 65536/65536 bytes at offset 33285996544
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+30 GiB (0x780000000) bytes not allocated at offset 0 bytes (0x0)
+3 GiB (0xc0000000) bytes     allocated at offset 30 GiB (0x780000000)
+
+=== preallocation=falloc ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=10485760 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=5242880 backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+wrote 65536/65536 bytes at offset 9437184
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Image resized.
+
+read 65536/65536 bytes at offset 9437184
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+5 MiB (0x500000) bytes not allocated at offset 0 bytes (0x0)
+10 MiB (0xa00000) bytes     allocated at offset 5 MiB (0x500000)
+
+[{ "start": 0, "length": 5242880, "depth": 1, "zero": true, "data": false},
+{ "start": 5242880, "length": 10485760, "depth": 0, "zero": true, "data": false, "offset": 327680}]
+
+=== preallocation=full ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=16777216 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=8388608 backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+wrote 65536/65536 bytes at offset 11534336
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Image resized.
+
+read 65536/65536 bytes at offset 11534336
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+8 MiB (0x800000) bytes not allocated at offset 0 bytes (0x0)
+4 MiB (0x400000) bytes     allocated at offset 8 MiB (0x800000)
+
+[{ "start": 0, "length": 8388608, "depth": 1, "zero": true, "data": false},
+{ "start": 8388608, "length": 4194304, "depth": 0, "zero": true, "data": false, "offset": 327680}]
+
+=== preallocation=off ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=393216 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=259072 backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+wrote 65536/65536 bytes at offset 259072
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Image resized.
+
+read 65536/65536 bytes at offset 259072
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+192 KiB (0x30000) bytes not allocated at offset 0 bytes (0x0)
+320 KiB (0x50000) bytes     allocated at offset 192 KiB (0x30000)
+
+[{ "start": 0, "length": 196608, "depth": 1, "zero": true, "data": false},
+{ "start": 196608, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680},
+{ "start": 262144, "length": 262144, "depth": 0, "zero": true, "data": false}]
+
+=== preallocation=off ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=409600 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=262144 backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+wrote 65536/65536 bytes at offset 344064
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Image resized.
+
+read 65536/65536 bytes at offset 344064
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+256 KiB (0x40000) bytes not allocated at offset 0 bytes (0x0)
+256 KiB (0x40000) bytes     allocated at offset 256 KiB (0x40000)
+
+[{ "start": 0, "length": 262144, "depth": 1, "zero": true, "data": false},
+{ "start": 262144, "length": 262144, "depth": 0, "zero": true, "data": false}]
+
+=== preallocation=off ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=524288 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=262144 backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off refcount_bits=16
+
+wrote 65536/65536 bytes at offset 446464
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Image resized.
+
+read 65536/65536 bytes at offset 446464
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+256 KiB (0x40000) bytes not allocated at offset 0 bytes (0x0)
+244 KiB (0x3d000) bytes     allocated at offset 256 KiB (0x40000)
+
+[{ "start": 0, "length": 262144, "depth": 1, "zero": true, "data": false},
+{ "start": 262144, "length": 249856, "depth": 0, "zero": true, "data": false}]
+
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 435dccd5af..1710470e70 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -286,6 +286,7 @@
 270 rw backing quick
 272 rw
 273 backing quick
+274 rw backing
 277 rw quick
 279 rw backing quick
 280 rw migration quick
-- 
2.25.3



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation
  2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
                   ` (8 preceding siblings ...)
  2020-04-23 15:01 ` [PATCH v6 09/10] iotests: Test committing to short backing file Kevin Wolf
@ 2020-04-23 15:01 ` Kevin Wolf
  2020-04-23 15:36   ` Eric Blake
  2020-04-23 18:05   ` Max Reitz
  9 siblings, 2 replies; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:01 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, berto, qemu-devel, mreitz

The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
image is possibly preallocated and then the zero flag is added to all
clusters. This means that a copy-on-write operation may be needed when
writing to these clusters, despite having used preallocation, negating
one of the major benefits of preallocation.

Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
and if the protocol driver can ensure that the new area reads as zeros,
we can skip setting the zero flag in the qcow2 layer.

Unfortunately, the same approach doesn't work for metadata
preallocation, so we'll still set the zero flag there.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2.c              | 22 +++++++++++++++++++---
 tests/qemu-iotests/274.out |  4 ++--
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ad621fe404..b28e588942 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4170,9 +4170,25 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
         /* Allocate the data area */
         new_file_size = allocation_start +
                         nb_new_data_clusters * s->cluster_size;
-        /* Image file grows, so @exact does not matter */
-        ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
-                               errp);
+        /*
+         * Image file grows, so @exact does not matter.
+         *
+         * If we need to zero out the new area, try first whether the protocol
+         * driver can already take care of this.
+         */
+        if (flags & BDRV_REQ_ZERO_WRITE) {
+            ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc,
+                                   BDRV_REQ_ZERO_WRITE, errp);
+            if (ret >= 0) {
+                flags &= ~BDRV_REQ_ZERO_WRITE;
+            }
+        } else {
+            ret = -1;
+        }
+        if (ret < 0) {
+            ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
+                                   errp);
+        }
         if (ret < 0) {
             error_prepend(errp, "Failed to resize underlying file: ");
             qcow2_free_clusters(bs, allocation_start,
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
index 179bd7ccaf..c355b52689 100644
--- a/tests/qemu-iotests/274.out
+++ b/tests/qemu-iotests/274.out
@@ -179,7 +179,7 @@ read 65536/65536 bytes at offset 9437184
 10 MiB (0xa00000) bytes     allocated at offset 5 MiB (0x500000)
 
 [{ "start": 0, "length": 5242880, "depth": 1, "zero": true, "data": false},
-{ "start": 5242880, "length": 10485760, "depth": 0, "zero": true, "data": false, "offset": 327680}]
+{ "start": 5242880, "length": 10485760, "depth": 0, "zero": false, "data": true, "offset": 327680}]
 
 === preallocation=full ===
 Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=16777216 cluster_size=65536 lazy_refcounts=off refcount_bits=16
@@ -198,7 +198,7 @@ read 65536/65536 bytes at offset 11534336
 4 MiB (0x400000) bytes     allocated at offset 8 MiB (0x800000)
 
 [{ "start": 0, "length": 8388608, "depth": 1, "zero": true, "data": false},
-{ "start": 8388608, "length": 4194304, "depth": 0, "zero": true, "data": false, "offset": 327680}]
+{ "start": 8388608, "length": 4194304, "depth": 0, "zero": false, "data": true, "offset": 327680}]
 
 === preallocation=off ===
 Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=393216 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-- 
2.25.3



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
  2020-04-23 15:01 ` [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate Kevin Wolf
@ 2020-04-23 15:18   ` Eric Blake
  2020-04-23 15:48     ` Kevin Wolf
  2020-04-24  6:16   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Blake @ 2020-04-23 15:18 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: vsementsov, berto, qemu-devel, mreitz

On 4/23/20 10:01 AM, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-cluster.c |  2 +-
>   block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
>   2 files changed, 34 insertions(+), 1 deletion(-)
> 

> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> +
> +        /*
> +         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
> +         * requires a cluster-aligned start. The end may be unaligned if it is
> +         * at the end of the image (which it is here).
> +         */
> +        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
> +        if (ret < 0) {
> +            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
> +            goto fail;
> +        }
> +
> +        /* Write explicit zeros for the unaligned head */
> +        if (zero_start > old_length) {
> +            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
> +            QEMUIOVector qiov;
> +            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
> +
> +            qemu_co_mutex_unlock(&s->lock);
> +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);

This works, but would it be any more efficient to use 
qcow2_co_pwrite_zeroes?  If the head of the cluster is already zero, 
then qcow2_co_pwrite_zeroes can turn into qcow2_cluster_zeroize for this 
cluster, while qcow2_co_pwritev_part cannot.

Because what you have works, and because we can use 
qcow2_co_pwrite_zeroes as an optimization in a later patch,

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 07/10] block: truncate: Don't make backing file data visible
  2020-04-23 15:01 ` [PATCH v6 07/10] block: truncate: Don't make backing file data visible Kevin Wolf
@ 2020-04-23 15:21   ` Eric Blake
  2020-04-23 17:59   ` Max Reitz
  2020-04-24  6:45   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 27+ messages in thread
From: Eric Blake @ 2020-04-23 15:21 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: vsementsov, berto, qemu-devel, mreitz

On 4/23/20 10:01 AM, Kevin Wolf wrote:
> When extending the size of an image that has a backing file larger than
> its old size, make sure that the backing file data doesn't become
> visible in the guest, but the added area is properly zeroed out.
> 
> Consider the following scenario where the overlay is shorter than its
> backing file:
> 
>      base.qcow2:     AAAAAAAA
>      overlay.qcow2:  BBBB
> 
> When resizing (extending) overlay.qcow2, the new blocks should not stay
> unallocated and make the additional As from base.qcow2 visible like
> before this patch, but zeros should be read.
> 
> A similar case happens with the various variants of a commit job when an
> intermediate file is short (- for unallocated):
> 
>      base.qcow2:     A-A-AAAA
>      mid.qcow2:      BB-B
>      top.qcow2:      C--C--C-
> 
> After commit top.qcow2 to mid.qcow2, the following happens:
> 
>      mid.qcow2:      CB-C00C0 (correct result)
>      mid.qcow2:      CB-C--C- (before this fix)
> 
> Without the fix, blocks that previously read as zeros on top.qcow2
> suddenly turn into A.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/io.c | 24 ++++++++++++++++++++++++
>   1 file changed, 24 insertions(+)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation
  2020-04-23 15:01 ` [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation Kevin Wolf
@ 2020-04-23 15:36   ` Eric Blake
  2020-04-23 16:04     ` Kevin Wolf
  2020-04-23 18:05   ` Max Reitz
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Blake @ 2020-04-23 15:36 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: vsementsov, berto, qemu-devel, mreitz

On 4/23/20 10:01 AM, Kevin Wolf wrote:
> The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
> image is possibly preallocated and then the zero flag is added to all
> clusters. This means that a copy-on-write operation may be needed when
> writing to these clusters, despite having used preallocation, negating
> one of the major benefits of preallocation.
> 
> Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
> and if the protocol driver can ensure that the new area reads as zeros,
> we can skip setting the zero flag in the qcow2 layer.

Hmm.  When we get block status, it is very easy to tell that something 
reads as zero when the qcow2 file has the zero bit set, but when the 
qcow2 file does not have the zero bit set, we have to then query the 
format layer whether it reads as zeros (which is expensive enough for 
some format layers that we no longer report things as reading as zero). 
I'm worried that optimizing this case could penalize later block status.

We already can tell the difference between a cluster that has the zero 
bit set but is not preallocated, vs. has the zero bit set and is 
preallocated.  Are we really forcing a copy-on-write to a cluster that 
is marked zero but preallocated?  Is the problem that we don't have a 
way to distinguish between 'reads as zeroes, allocated, but we don't 
know state of format layer' and 'reads as zeroes, allocated, and we know 
format layer reads as zeroes'?

> 
> Unfortunately, the same approach doesn't work for metadata
> preallocation, so we'll still set the zero flag there.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/qcow2.c              | 22 +++++++++++++++++++---
>   tests/qemu-iotests/274.out |  4 ++--
>   2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index ad621fe404..b28e588942 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -4170,9 +4170,25 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>           /* Allocate the data area */
>           new_file_size = allocation_start +
>                           nb_new_data_clusters * s->cluster_size;
> -        /* Image file grows, so @exact does not matter */
> -        ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
> -                               errp);
> +        /*
> +         * Image file grows, so @exact does not matter.
> +         *
> +         * If we need to zero out the new area, try first whether the protocol
> +         * driver can already take care of this.
> +         */
> +        if (flags & BDRV_REQ_ZERO_WRITE) {
> +            ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc,
> +                                   BDRV_REQ_ZERO_WRITE, errp);
> +            if (ret >= 0) {
> +                flags &= ~BDRV_REQ_ZERO_WRITE;
> +            }

Hmm - just noticing things: how does this series interplay with the 
existing bdrv_has_zero_init_truncate?  Should this series automatically 
handle BDRV_REQ_ZERO_WRITE on a bdrv_co_truncate(PREALLOC_NONE) request 
for all drivers that report true, even if that driver does not advertise 
support for the BDRV_REQ_ZERO_WRITE flag?

> +        } else {
> +            ret = -1;
> +        }

Here, ret == -1 does not imply whether errp is set - but annoyingly, 
errp CAN be set if bdrv_co_truncate() failed.

> +        if (ret < 0) {
> +            ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
> +                                   errp);

And here, you are passing a possibly-set errp back to 
bdrv_co_truncate().  That is a bug that can abort.  You need to pass 
NULL to the first bdrv_co_truncate() call or else clear errp prior to 
trying a fallback to this second bdrv_co_truncate() call.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
  2020-04-23 15:18   ` Eric Blake
@ 2020-04-23 15:48     ` Kevin Wolf
  0 siblings, 0 replies; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 15:48 UTC (permalink / raw)
  To: Eric Blake; +Cc: vsementsov, berto, qemu-devel, qemu-block, mreitz

Am 23.04.2020 um 17:18 hat Eric Blake geschrieben:
> On 4/23/20 10:01 AM, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > Reviewed-by: Max Reitz <mreitz@redhat.com>
> > ---
> >   block/qcow2-cluster.c |  2 +-
> >   block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
> >   2 files changed, 34 insertions(+), 1 deletion(-)
> > 
> 
> > +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > +
> > +        /*
> > +         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
> > +         * requires a cluster-aligned start. The end may be unaligned if it is
> > +         * at the end of the image (which it is here).
> > +         */
> > +        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
> > +        if (ret < 0) {
> > +            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
> > +            goto fail;
> > +        }
> > +
> > +        /* Write explicit zeros for the unaligned head */
> > +        if (zero_start > old_length) {
> > +            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
> > +            QEMUIOVector qiov;
> > +            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
> > +
> > +            qemu_co_mutex_unlock(&s->lock);
> > +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
> 
> This works, but would it be any more efficient to use
> qcow2_co_pwrite_zeroes?  If the head of the cluster is already zero, then
> qcow2_co_pwrite_zeroes can turn into qcow2_cluster_zeroize for this cluster,
> while qcow2_co_pwritev_part cannot.

The problem is that qcow2_co_pwrite_zeroes() will return -ENOTSUP if the
request is still unaligned after this optimisation. I would have to go
through the generic bdrv_co_pwrite_zeroes() to get the fallback, but I
can't do that because bs->total_sectors isn't updated yet.

Kevin



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation
  2020-04-23 15:36   ` Eric Blake
@ 2020-04-23 16:04     ` Kevin Wolf
  2020-04-23 16:15       ` Eric Blake
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2020-04-23 16:04 UTC (permalink / raw)
  To: Eric Blake; +Cc: vsementsov, berto, qemu-devel, qemu-block, mreitz

Am 23.04.2020 um 17:36 hat Eric Blake geschrieben:
> On 4/23/20 10:01 AM, Kevin Wolf wrote:
> > The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
> > image is possibly preallocated and then the zero flag is added to all
> > clusters. This means that a copy-on-write operation may be needed when
> > writing to these clusters, despite having used preallocation, negating
> > one of the major benefits of preallocation.
> > 
> > Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
> > and if the protocol driver can ensure that the new area reads as zeros,
> > we can skip setting the zero flag in the qcow2 layer.
> 
> Hmm.  When we get block status, it is very easy to tell that something reads
> as zero when the qcow2 file has the zero bit set, but when the qcow2 file
> does not have the zero bit set, we have to then query the format layer
> whether it reads as zeros (which is expensive enough for some format layers
> that we no longer report things as reading as zero). I'm worried that
> optimizing this case could penalize later block status.

That's just how preallocation works. If you don't want that, you need
preallocation=off.

> We already can tell the difference between a cluster that has the zero bit
> set but is not preallocated, vs. has the zero bit set and is preallocated.
> Are we really forcing a copy-on-write to a cluster that is marked zero but
> preallocated?  Is the problem that we don't have a way to distinguish
> between 'reads as zeroes, allocated, but we don't know state of format
> layer' and 'reads as zeroes, allocated, and we know format layer reads as
> zeroes'?

Basically, yes. If we had this, we could have a type of cluster where
writing to it still involves a metadata update (to clear the zero flag),
but never copy-on-write, even for partial writes.

I'm not sure if this would cover a very relevant case, though.

> > 
> > Unfortunately, the same approach doesn't work for metadata
> > preallocation, so we'll still set the zero flag there.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >   block/qcow2.c              | 22 +++++++++++++++++++---
> >   tests/qemu-iotests/274.out |  4 ++--
> >   2 files changed, 21 insertions(+), 5 deletions(-)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index ad621fe404..b28e588942 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -4170,9 +4170,25 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> >           /* Allocate the data area */
> >           new_file_size = allocation_start +
> >                           nb_new_data_clusters * s->cluster_size;
> > -        /* Image file grows, so @exact does not matter */
> > -        ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
> > -                               errp);
> > +        /*
> > +         * Image file grows, so @exact does not matter.
> > +         *
> > +         * If we need to zero out the new area, try first whether the protocol
> > +         * driver can already take care of this.
> > +         */
> > +        if (flags & BDRV_REQ_ZERO_WRITE) {
> > +            ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc,
> > +                                   BDRV_REQ_ZERO_WRITE, errp);
> > +            if (ret >= 0) {
> > +                flags &= ~BDRV_REQ_ZERO_WRITE;
> > +            }
> 
> Hmm - just noticing things: how does this series interplay with the existing
> bdrv_has_zero_init_truncate?  Should this series automatically handle
> BDRV_REQ_ZERO_WRITE on a bdrv_co_truncate(PREALLOC_NONE) request for all
> drivers that report true, even if that driver does not advertise support for
> the BDRV_REQ_ZERO_WRITE flag?

Hmm... It feels risky to me.

> > +        } else {
> > +            ret = -1;
> > +        }
> 
> Here, ret == -1 does not imply whether errp is set - but annoyingly, errp
> CAN be set if bdrv_co_truncate() failed.
> 
> > +        if (ret < 0) {
> > +            ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
> > +                                   errp);
> 
> And here, you are passing a possibly-set errp back to bdrv_co_truncate().
> That is a bug that can abort.  You need to pass NULL to the first
> bdrv_co_truncate() call or else clear errp prior to trying a fallback to
> this second bdrv_co_truncate() call.

Yes, you're right. If nothing else comes up, I'll fix this while
applying.

Kevin



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation
  2020-04-23 16:04     ` Kevin Wolf
@ 2020-04-23 16:15       ` Eric Blake
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Blake @ 2020-04-23 16:15 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: vsementsov, berto, qemu-devel, qemu-block, mreitz

On 4/23/20 11:04 AM, Kevin Wolf wrote:

>> Hmm.  When we get block status, it is very easy to tell that something reads
>> as zero when the qcow2 file has the zero bit set, but when the qcow2 file
>> does not have the zero bit set, we have to then query the format layer
>> whether it reads as zeros (which is expensive enough for some format layers
>> that we no longer report things as reading as zero). I'm worried that
>> optimizing this case could penalize later block status.
> 
> That's just how preallocation works. If you don't want that, you need
> preallocation=off.

Good point. And if I recall, didn't we already have a discussion (or 
even patches) to optimize whether querying the format layer during block 
status was even worth the effort, depending on heuristics of the size of 
the format layer which in turn is based on whether there was 
preallocation?  So not a show-stopper.

> 
>> We already can tell the difference between a cluster that has the zero bit
>> set but is not preallocated, vs. has the zero bit set and is preallocated.
>> Are we really forcing a copy-on-write to a cluster that is marked zero but
>> preallocated?  Is the problem that we don't have a way to distinguish
>> between 'reads as zeroes, allocated, but we don't know state of format
>> layer' and 'reads as zeroes, allocated, and we know format layer reads as
>> zeroes'?
> 
> Basically, yes. If we had this, we could have a type of cluster where
> writing to it still involves a metadata update (to clear the zero flag),
> but never copy-on-write, even for partial writes.
> 
> I'm not sure if this would cover a very relevant case, though.

I also wonder if Berto's subcluster patches might play into this scenario.

>>
>> Hmm - just noticing things: how does this series interplay with the existing
>> bdrv_has_zero_init_truncate?  Should this series automatically handle
>> BDRV_REQ_ZERO_WRITE on a bdrv_co_truncate(PREALLOC_NONE) request for all
>> drivers that report true, even if that driver does not advertise support for
>> the BDRV_REQ_ZERO_WRITE flag?
> 
> Hmm... It feels risky to me.

Or worded differently, is bdrv_has_zero_init_truncate even still 
necessary (when it is documented only to cover the PREALLOC_NONE case), 
or should we get rid of it in favor of using BDRV_REQ_ZERO_WRITE 
everywhere instead?  (Which in turn involves visiting all drivers that 
previously advertised bdrv_has_zero_init_truncate... but I already have 
work in my tree trying to do that as part of preparing to add an 
autoclear bit to qcow2 to make it faster to report when a qcow2 image is 
known all-zero content...)

Looks like I'll be rebasing my work on top of this series.

> 
>>> +        } else {
>>> +            ret = -1;
>>> +        }
>>
>> Here, ret == -1 does not imply whether errp is set - but annoyingly, errp
>> CAN be set if bdrv_co_truncate() failed.
>>
>>> +        if (ret < 0) {
>>> +            ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
>>> +                                   errp);
>>
>> And here, you are passing a possibly-set errp back to bdrv_co_truncate().
>> That is a bug that can abort.  You need to pass NULL to the first
>> bdrv_co_truncate() call or else clear errp prior to trying a fallback to
>> this second bdrv_co_truncate() call.
> 
> Yes, you're right. If nothing else comes up, I'll fix this while
> applying.
> 
> Kevin
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 07/10] block: truncate: Don't make backing file data visible
  2020-04-23 15:01 ` [PATCH v6 07/10] block: truncate: Don't make backing file data visible Kevin Wolf
  2020-04-23 15:21   ` Eric Blake
@ 2020-04-23 17:59   ` Max Reitz
  2020-04-24  6:45   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 27+ messages in thread
From: Max Reitz @ 2020-04-23 17:59 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, vsementsov, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 2720 bytes --]

On 23.04.20 17:01, Kevin Wolf wrote:
> When extending the size of an image that has a backing file larger than
> its old size, make sure that the backing file data doesn't become
> visible in the guest, but the added area is properly zeroed out.
> 
> Consider the following scenario where the overlay is shorter than its
> backing file:
> 
>     base.qcow2:     AAAAAAAA
>     overlay.qcow2:  BBBB
> 
> When resizing (extending) overlay.qcow2, the new blocks should not stay
> unallocated and make the additional As from base.qcow2 visible like
> before this patch, but zeros should be read.
> 
> A similar case happens with the various variants of a commit job when an
> intermediate file is short (- for unallocated):
> 
>     base.qcow2:     A-A-AAAA
>     mid.qcow2:      BB-B
>     top.qcow2:      C--C--C-
> 
> After commit top.qcow2 to mid.qcow2, the following happens:
> 
>     mid.qcow2:      CB-C00C0 (correct result)
>     mid.qcow2:      CB-C--C- (before this fix)
> 
> Without the fix, blocks that previously read as zeros on top.qcow2
> suddenly turn into A.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>

Seems like Berto gave you a rather broad R-b in v4. :)

> ---
>  block/io.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/block/io.c b/block/io.c
> index 795075954e..f618db3499 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3394,6 +3394,30 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
>          goto out;
>      }
>  
> +    /*
> +     * If the image has a backing file that is large enough that it would
> +     * provide data for the new area, we cannot leave it unallocated because
> +     * then the backing file content would become visible. Instead, zero-fill
> +     * the new area.
> +     *
> +     * Note that if the image has a backing file, but was opened without the
> +     * backing file, taking care of keeping things consistent with that backing
> +     * file is the user's responsibility.
> +     */
> +    if (new_bytes && bs->backing) {
> +        int64_t backing_len;
> +
> +        backing_len = bdrv_getlength(backing_bs(bs));
> +        if (backing_len < 0) {
> +            ret = backing_len;

Shouldn’t we set errp?

Max

> +            goto out;
> +        }
> +
> +        if (backing_len > old_size) {
> +            flags |= BDRV_REQ_ZERO_WRITE;
> +        }
> +    }
> +
>      if (drv->bdrv_co_truncate) {
>          if (flags & ~bs->supported_truncate_flags) {
>              error_setg(errp, "Block driver does not support requested flags");
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation
  2020-04-23 15:01 ` [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation Kevin Wolf
  2020-04-23 15:36   ` Eric Blake
@ 2020-04-23 18:05   ` Max Reitz
  1 sibling, 0 replies; 27+ messages in thread
From: Max Reitz @ 2020-04-23 18:05 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, vsementsov, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 1104 bytes --]

On 23.04.20 17:01, Kevin Wolf wrote:
> The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
> image is possibly preallocated and then the zero flag is added to all
> clusters. This means that a copy-on-write operation may be needed when
> writing to these clusters, despite having used preallocation, negating
> one of the major benefits of preallocation.
> 
> Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
> and if the protocol driver can ensure that the new area reads as zeros,
> we can skip setting the zero flag in the qcow2 layer.
> 
> Unfortunately, the same approach doesn't work for metadata
> preallocation, so we'll still set the zero flag there.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2.c              | 22 +++++++++++++++++++---
>  tests/qemu-iotests/274.out |  4 ++--
>  2 files changed, 21 insertions(+), 5 deletions(-)

Oh, nice.  I didn’t think you (or anyone else for that matter) would
actually do this. :)

With the errp thing fixed:

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
  2020-04-23 15:01 ` [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate Kevin Wolf
  2020-04-23 15:18   ` Eric Blake
@ 2020-04-24  6:16   ` Vladimir Sementsov-Ogievskiy
  2020-04-24 12:17     ` Kevin Wolf
  1 sibling, 1 reply; 27+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-24  6:16 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz

23.04.2020 18:01, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-cluster.c |  2 +-
>   block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
>   2 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 17f1363279..4b5fc8c4a7 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1795,7 +1795,7 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
>       /* Caller must pass aligned values, except at image end */
>       assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>       assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
> -           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
> +           end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
>   
>       /* The zero flag is only supported by version 3 and newer */
>       if (s->qcow_version < 3) {
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 9cfbdfc939..ad621fe404 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>   
>       bs->supported_zero_flags = header.version >= 3 ?
>                                  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
> +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>   
>       /* Repair image if dirty */
>       if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
> @@ -4214,6 +4215,38 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>           g_assert_not_reached();
>       }
>   
> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> +
> +        /*
> +         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
> +         * requires a cluster-aligned start. The end may be unaligned if it is
> +         * at the end of the image (which it is here).
> +         */
> +        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
> +        if (ret < 0) {
> +            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
> +            goto fail;
> +        }
> +
> +        /* Write explicit zeros for the unaligned head */
> +        if (zero_start > old_length) {
> +            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);

Why not s/s->cluster_size/zero_start - old_length/? We may save a lot of pages if cluster_size is large.

> +            QEMUIOVector qiov;
> +            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
> +
> +            qemu_co_mutex_unlock(&s->lock);

We are in intermediate state here. Is it safe to unlock? Anything may happen, up to another truncate..

> +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);

Better not do it if this cluster is already ZERO.. But it may be a later patch to improve that.

> +            qemu_co_mutex_lock(&s->lock);
> +
> +            qemu_vfree(buf);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "Failed to zero out the new area");
> +                goto fail;
> +            }
> +        }
> +    }
> +
>       if (prealloc != PREALLOC_MODE_OFF) {
>           /* Flush metadata before actually changing the image size */
>           ret = qcow2_write_caches(bs);
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 05/10] raw-format: Support BDRV_REQ_ZERO_WRITE for truncate
  2020-04-23 15:01 ` [PATCH v6 05/10] raw-format: " Kevin Wolf
@ 2020-04-24  6:21   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 27+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-24  6:21 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz

23.04.2020 18:01, Kevin Wolf wrote:
> The raw format driver can simply forward the flag and let its bs->file
> child take care of actually providing the zeros.
> 
> Signed-off-by: Kevin Wolf<kwolf@redhat.com>
> Reviewed-by: Max Reitz<mreitz@redhat.com>
> Reviewed-by: Eric Blake<eblake@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 07/10] block: truncate: Don't make backing file data visible
  2020-04-23 15:01 ` [PATCH v6 07/10] block: truncate: Don't make backing file data visible Kevin Wolf
  2020-04-23 15:21   ` Eric Blake
  2020-04-23 17:59   ` Max Reitz
@ 2020-04-24  6:45   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 27+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-24  6:45 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz

23.04.2020 18:01, Kevin Wolf wrote:
> When extending the size of an image that has a backing file larger than
> its old size, make sure that the backing file data doesn't become
> visible in the guest, but the added area is properly zeroed out.
> 
> Consider the following scenario where the overlay is shorter than its
> backing file:
> 
>      base.qcow2:     AAAAAAAA
>      overlay.qcow2:  BBBB
> 
> When resizing (extending) overlay.qcow2, the new blocks should not stay
> unallocated and make the additional As from base.qcow2 visible like
> before this patch, but zeros should be read.
> 
> A similar case happens with the various variants of a commit job when an
> intermediate file is short (- for unallocated):
> 
>      base.qcow2:     A-A-AAAA
>      mid.qcow2:      BB-B
>      top.qcow2:      C--C--C-
> 
> After commit top.qcow2 to mid.qcow2, the following happens:
> 
>      mid.qcow2:      CB-C00C0 (correct result)
>      mid.qcow2:      CB-C--C- (before this fix)
> 
> Without the fix, blocks that previously read as zeros on top.qcow2
> suddenly turn into A.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/io.c | 24 ++++++++++++++++++++++++
>   1 file changed, 24 insertions(+)
> 
> diff --git a/block/io.c b/block/io.c
> index 795075954e..f618db3499 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3394,6 +3394,30 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
>           goto out;
>       }
>   
> +    /*
> +     * If the image has a backing file that is large enough that it would
> +     * provide data for the new area, we cannot leave it unallocated because
> +     * then the backing file content would become visible. Instead, zero-fill
> +     * the new area.
> +     *
> +     * Note that if the image has a backing file, but was opened without the
> +     * backing file, taking care of keeping things consistent with that backing
> +     * file is the user's responsibility.
> +     */
> +    if (new_bytes && bs->backing) {
> +        int64_t backing_len;
> +
> +        backing_len = bdrv_getlength(backing_bs(bs));

this fit in one line with backing_len definition

> +        if (backing_len < 0) {
> +            ret = backing_len;

with errp set, as noted by Max:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> +            goto out;
> +        }
> +
> +        if (backing_len > old_size) {
> +            flags |= BDRV_REQ_ZERO_WRITE;
> +        }
> +    }
> +
>       if (drv->bdrv_co_truncate) {
>           if (flags & ~bs->supported_truncate_flags) {
>               error_setg(errp, "Block driver does not support requested flags");
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 08/10] iotests: Filter testfiles out in filter_img_info()
  2020-04-23 15:01 ` [PATCH v6 08/10] iotests: Filter testfiles out in filter_img_info() Kevin Wolf
@ 2020-04-24  6:51   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 27+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-24  6:51 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz

23.04.2020 18:01, Kevin Wolf wrote:
> We want to keep TEST_IMG for the full path of the main test image, but
> filter_testfiles() must be called for other test images before replacing
> other things like the image format because the test directory path could
> contain the format as a substring.
> 
> Insert a filter_testfiles() call between both.
> 
> Signed-off-by: Kevin Wolf<kwolf@redhat.com>
> Reviewed-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 09/10] iotests: Test committing to short backing file
  2020-04-23 15:01 ` [PATCH v6 09/10] iotests: Test committing to short backing file Kevin Wolf
@ 2020-04-24  8:53   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 27+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-24  8:53 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: berto, qemu-devel, mreitz

23.04.2020 18:01, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/274     | 157 ++++++++++++++++++++++
>   tests/qemu-iotests/274.out | 260 +++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/group   |   1 +
>   3 files changed, 418 insertions(+)
>   create mode 100755 tests/qemu-iotests/274
>   create mode 100644 tests/qemu-iotests/274.out
> 
> diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
> new file mode 100755
> index 0000000000..8bf7ff3122
> --- /dev/null
> +++ b/tests/qemu-iotests/274
> @@ -0,0 +1,157 @@
> +#!/usr/bin/env python3
> +#
> +# Copyright (C) 2019 Red Hat, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +#
> +# Creator/Owner: Kevin Wolf <kwolf@redhat.com>
> +#
> +# Some tests for short backing files and short overlays
> +
> +import iotests
> +import os

unused import

> +
> +iotests.verify_image_format(supported_fmts=['qcow2'])
> +iotests.verify_platform(['linux'])
> +
> +size_short = 1 * 1024 * 1024
> +size_long = 2 * 1024 * 1024
> +size_diff = size_long - size_short
> +
> +def create_chain():
> +    iotests.qemu_img_log('create', '-f', iotests.imgfmt, base,
> +                         str(size_long))
> +    iotests.qemu_img_log('create', '-f', iotests.imgfmt, '-b', base, mid,
> +                         str(size_short))
> +    iotests.qemu_img_log('create', '-f', iotests.imgfmt, '-b', mid, top,
> +                         str(size_long))
> +
> +    iotests.qemu_io_log('-c', 'write -P 1 0 %d' % size_long, base)
> +
> +def create_vm():
> +    vm = iotests.VM()
> +    vm.add_blockdev('file,filename=%s,node-name=base-file' % (base))
> +    vm.add_blockdev('%s,file=base-file,node-name=base' % (iotests.imgfmt))
> +    vm.add_blockdev('file,filename=%s,node-name=mid-file' % (mid))
> +    vm.add_blockdev('%s,file=mid-file,node-name=mid,backing=base' % (iotests.imgfmt))

over-79 line

> +    vm.add_drive(top, 'backing=mid,node-name=top')
> +    return vm
> +
> +with iotests.FilePath('base') as base, \
> +     iotests.FilePath('mid') as mid, \
> +     iotests.FilePath('top') as top:
> +
> +    iotests.log('== Commit tests ==')
> +
> +    create_chain()
> +
> +    iotests.log('=== Check visible data ===')
> +
> +    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, top)
> +    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), top)
> +
> +    iotests.log('=== Checking allocation status ===')
> +
> +    iotests.qemu_io_log('-c', 'alloc 0 %d' % size_short,
> +                        '-c', 'alloc %d %d' % (size_short, size_diff),
> +                        base)
> +
> +    iotests.qemu_io_log('-c', 'alloc 0 %d' % size_short,
> +                        '-c', 'alloc %d %d' % (size_short, size_diff),
> +                        mid)
> +
> +    iotests.qemu_io_log('-c', 'alloc 0 %d' % size_short,
> +                        '-c', 'alloc %d %d' % (size_short, size_diff),
> +                        top)
> +
> +    iotests.log('=== Checking map ===')
> +
> +    iotests.qemu_img_log('map', '--output=json', base)
> +    iotests.qemu_img_log('map', '--output=human', base)
> +    iotests.qemu_img_log('map', '--output=json', mid)
> +    iotests.qemu_img_log('map', '--output=human', mid)
> +    iotests.qemu_img_log('map', '--output=json', top)
> +    iotests.qemu_img_log('map', '--output=human', top)
> +
> +    iotests.log('=== Testing qemu-img commit (top -> mid) ===')
> +
> +    iotests.qemu_img_log('commit', top)
> +    iotests.img_info_log(mid)
> +    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
> +    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
> +
> +    iotests.log('=== Testing HMP commit (top -> mid) ===')
> +
> +    create_chain()
> +    with create_vm() as vm:
> +        vm.launch()
> +        vm.qmp_log('human-monitor-command', command_line='commit drive0')
> +
> +    iotests.img_info_log(mid)
> +    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
> +    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
> +
> +    iotests.log('=== Testing QMP active commit (top -> mid) ===')
> +
> +    create_chain()
> +    with create_vm() as vm:
> +        vm.launch()
> +        vm.qmp_log('block-commit', device='top', base_node='mid',
> +                   job_id='job0', auto_dismiss=False)
> +        vm.run_job('job0', wait=5)
> +
> +    iotests.img_info_log(mid)
> +    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
> +    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
> +
> +
> +    iotests.log('== Resize tests ==')
> +
> +    # Use different sizes for different allocation modes:
> +    #
> +    # We want to have at least one test where 32 bit truncation in the size of
> +    # the overlapping area becomes visible. This is covered by the
> +    # prealloc='off' case (1G to 6G is an overlap of 5G).
> +    #
> +    # However, we can only do this for modes that don't preallocate data
> +    # because otherwise we might run out of space on the test host.
> +    #
> +    # We also want to test some unaligned combinations.
> +    for (prealloc, base_size, top_size_old, top_size_new, off)  in [

extra space before "in"

> +            ('off',       '6G',    '1G',   '8G',   '5G'),
> +            ('metadata', '32G',   '30G',  '33G',  '31G'),
> +            ('falloc',   '10M',    '5M',  '15M',   '9M'),
> +            ('full',     '16M',    '8M',  '12M',  '11M'),
> +            ('off',      '384k', '253k', '512k', '253k'),
> +            ('off',      '400k', '256k', '512k', '336k'),
> +            ('off',      '512k', '256k', '500k', '436k')]:
> +
> +        iotests.log('=== preallocation=%s ===' % prealloc)
> +        iotests.qemu_img_log('create', '-f', iotests.imgfmt, base, base_size)
> +        iotests.qemu_img_log('create', '-f', iotests.imgfmt, '-b', base, top,
> +                             top_size_old)
> +        iotests.qemu_io_log('-c', 'write -P 1 %s 64k' % off, base)
> +
> +        # After this, 0 to base_size should be allocated/zeroed
> +        # base_size to top_size_new should be unallocated with

the comment is outdated. base_size to top_size_new is allocated zero too.

> +        # preallocation=off and allocated with preallocation enabled
> +        iotests.qemu_img_log('resize', '-f', iotests.imgfmt,
> +                             '--preallocation', prealloc, top, top_size_new)
> +        iotests.qemu_io_log('-c', 'read -P 0 %s 64k' % off, top)
> +
> +        # Metadata preallocation doesn't have a defined result on the file
> +        # system level with respect to holes, so skip it here

this is outdated too, as we should have zeroes for any case now.

> +        iotests.qemu_io_log('-c', 'map', top)
> +        if prealloc != 'metadata':

so, we may drop the condition and print 'map' always.

---

I've looked through img/io "map" outputs, they look correct.



-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
  2020-04-24  6:16   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-24 12:17     ` Kevin Wolf
  2020-04-24 14:16       ` Eric Blake
  2020-04-24 14:27       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 27+ messages in thread
From: Kevin Wolf @ 2020-04-24 12:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: berto, qemu-devel, qemu-block, mreitz

Am 24.04.2020 um 08:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 23.04.2020 18:01, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > Reviewed-by: Max Reitz <mreitz@redhat.com>
> > ---
> >   block/qcow2-cluster.c |  2 +-
> >   block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
> >   2 files changed, 34 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> > index 17f1363279..4b5fc8c4a7 100644
> > --- a/block/qcow2-cluster.c
> > +++ b/block/qcow2-cluster.c
> > @@ -1795,7 +1795,7 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
> >       /* Caller must pass aligned values, except at image end */
> >       assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
> >       assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
> > -           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
> > +           end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
> >       /* The zero flag is only supported by version 3 and newer */
> >       if (s->qcow_version < 3) {
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 9cfbdfc939..ad621fe404 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
> >       bs->supported_zero_flags = header.version >= 3 ?
> >                                  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
> > +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
> >       /* Repair image if dirty */
> >       if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
> > @@ -4214,6 +4215,38 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> >           g_assert_not_reached();
> >       }
> > +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > +
> > +        /*
> > +         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
> > +         * requires a cluster-aligned start. The end may be unaligned if it is
> > +         * at the end of the image (which it is here).
> > +         */
> > +        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
> > +        if (ret < 0) {
> > +            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
> > +            goto fail;
> > +        }
> > +
> > +        /* Write explicit zeros for the unaligned head */
> > +        if (zero_start > old_length) {
> > +            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
> 
> Why not s/s->cluster_size/zero_start - old_length/? We may save a lot
> of pages if cluster_size is large.

Ok.

> > +            QEMUIOVector qiov;
> > +            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
> > +
> > +            qemu_co_mutex_unlock(&s->lock);
> 
> We are in intermediate state here. Is it safe to unlock? Anything may
> happen, up to another truncate..

I don't think it's worse than unlocking during normal writes, which we
have been doing for a long time. I don't see anything that would corrupt
any internal state.

Of course, doing truncate in parallel with other operations around EOF
has somewhat undefined semantics because the order could be anything.
But if you don't want to get undefined results, you just shouldn't make
such requests. It's similar to reading and writing the same area at the
same time.

> > +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
> 
> Better not do it if this cluster is already ZERO.. But it may be a
> later patch to improve that.

I doubt it's worth writing code to optimise a corner case inside a
corner case.

> > +            qemu_co_mutex_lock(&s->lock);
> > +
> > +            qemu_vfree(buf);
> > +            if (ret < 0) {
> > +                error_setg_errno(errp, -ret, "Failed to zero out the new area");
> > +                goto fail;
> > +            }
> > +        }
> > +    }
> > +
> >       if (prealloc != PREALLOC_MODE_OFF) {
> >           /* Flush metadata before actually changing the image size */
> >           ret = qcow2_write_caches(bs);

Kevin



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
  2020-04-24 12:17     ` Kevin Wolf
@ 2020-04-24 14:16       ` Eric Blake
  2020-04-24 14:27       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Blake @ 2020-04-24 14:16 UTC (permalink / raw)
  To: Kevin Wolf, Vladimir Sementsov-Ogievskiy
  Cc: berto, qemu-devel, qemu-block, mreitz

On 4/24/20 7:17 AM, Kevin Wolf wrote:

> 
>>> +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
>>
>> Better not do it if this cluster is already ZERO.. But it may be a
>> later patch to improve that.
> 
> I doubt it's worth writing code to optimise a corner case inside a
> corner case.

I spotted the same issue, and my suggestion was to use 
qcow2_co_pwrite_zero instead of qcow2_co_pwritev_part, but Kevin pointed 
out that my idea would probably not work the way I thought.  Then again, 
checking if the page is already zero is all the more benefit that 
qcow2_co_pwrite_zero would have given for me to have raised the 
question.  The following example illustrates why it might be worthwhile:

Suppose we have an image with 1M clusters, which is an unaligned 100.5M 
in length but started life with all clusters sparse.  We then resize it 
to another unaligned 200.5M.  The call to qcow2_co_pwritev_part will 
cause the image to be non-sparse for one cluster out of 201, whereas 
adding a check to skip the pwritev because the original unaligned tail 
cluster already read as zero will let us keep all clusters sparse.

At any rate, I argued that any such optimization would be a followup 
patch, and Kevin is right that it is a corner case optimization unlikely 
to affect many real-life images.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
  2020-04-24 12:17     ` Kevin Wolf
  2020-04-24 14:16       ` Eric Blake
@ 2020-04-24 14:27       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 27+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-24 14:27 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: berto, qemu-devel, qemu-block, mreitz

24.04.2020 15:17, Kevin Wolf wrote:
> Am 24.04.2020 um 08:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 23.04.2020 18:01, Kevin Wolf wrote:
>>> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
>>> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
>>> undo any previous preallocation, but just adds the zero flag to all
>>> relevant L2 entries. If an external data file is in use, a write_zeroes
>>> request to the data file is made instead.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    block/qcow2-cluster.c |  2 +-
>>>    block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
>>>    2 files changed, 34 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 17f1363279..4b5fc8c4a7 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -1795,7 +1795,7 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
>>>        /* Caller must pass aligned values, except at image end */
>>>        assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>>>        assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
>>> -           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
>>> +           end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
>>>        /* The zero flag is only supported by version 3 and newer */
>>>        if (s->qcow_version < 3) {
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 9cfbdfc939..ad621fe404 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>>>        bs->supported_zero_flags = header.version >= 3 ?
>>>                                   BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
>>> +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>>>        /* Repair image if dirty */
>>>        if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
>>> @@ -4214,6 +4215,38 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>>>            g_assert_not_reached();
>>>        }
>>> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
>>> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
>>> +
>>> +        /*
>>> +         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
>>> +         * requires a cluster-aligned start. The end may be unaligned if it is
>>> +         * at the end of the image (which it is here).
>>> +         */
>>> +        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
>>> +        if (ret < 0) {
>>> +            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
>>> +            goto fail;
>>> +        }
>>> +
>>> +        /* Write explicit zeros for the unaligned head */
>>> +        if (zero_start > old_length) {
>>> +            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
>>
>> Why not s/s->cluster_size/zero_start - old_length/? We may save a lot
>> of pages if cluster_size is large.
> 
> Ok.
> 
>>> +            QEMUIOVector qiov;
>>> +            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
>>> +
>>> +            qemu_co_mutex_unlock(&s->lock);
>>
>> We are in intermediate state here. Is it safe to unlock? Anything may
>> happen, up to another truncate..
> 
> I don't think it's worse than unlocking during normal writes, which we
> have been doing for a long time. I don't see anything that would corrupt
> any internal state.
> 
> Of course, doing truncate in parallel with other operations around EOF
> has somewhat undefined semantics because the order could be anything.
> But if you don't want to get undefined results, you just shouldn't make
> such requests. It's similar to reading and writing the same area at the
> same time.

If there would be two truncate operations in parallel, we may finish up with s->l1_vm_state_index bs->total_sectors not corresponding to other metadata. Not sure how much is it bad..

> 
>>> +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
>>
>> Better not do it if this cluster is already ZERO.. But it may be a
>> later patch to improve that.
> 
> I doubt it's worth writing code to optimise a corner case inside a
> corner case.
> 
>>> +            qemu_co_mutex_lock(&s->lock);
>>> +
>>> +            qemu_vfree(buf);
>>> +            if (ret < 0) {
>>> +                error_setg_errno(errp, -ret, "Failed to zero out the new area");
>>> +                goto fail;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>>        if (prealloc != PREALLOC_MODE_OFF) {
>>>            /* Flush metadata before actually changing the image size */
>>>            ret = qcow2_write_caches(bs);
> 
> Kevin
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-04-24 14:28 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
2020-04-23 15:01 ` [PATCH v6 01/10] block: Add flags to BlockDriver.bdrv_co_truncate() Kevin Wolf
2020-04-23 15:01 ` [PATCH v6 02/10] block: Add flags to bdrv(_co)_truncate() Kevin Wolf
2020-04-23 15:01 ` [PATCH v6 03/10] block-backend: Add flags to blk_truncate() Kevin Wolf
2020-04-23 15:01 ` [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate Kevin Wolf
2020-04-23 15:18   ` Eric Blake
2020-04-23 15:48     ` Kevin Wolf
2020-04-24  6:16   ` Vladimir Sementsov-Ogievskiy
2020-04-24 12:17     ` Kevin Wolf
2020-04-24 14:16       ` Eric Blake
2020-04-24 14:27       ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:01 ` [PATCH v6 05/10] raw-format: " Kevin Wolf
2020-04-24  6:21   ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:01 ` [PATCH v6 06/10] file-posix: " Kevin Wolf
2020-04-23 15:01 ` [PATCH v6 07/10] block: truncate: Don't make backing file data visible Kevin Wolf
2020-04-23 15:21   ` Eric Blake
2020-04-23 17:59   ` Max Reitz
2020-04-24  6:45   ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:01 ` [PATCH v6 08/10] iotests: Filter testfiles out in filter_img_info() Kevin Wolf
2020-04-24  6:51   ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:01 ` [PATCH v6 09/10] iotests: Test committing to short backing file Kevin Wolf
2020-04-24  8:53   ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:01 ` [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation Kevin Wolf
2020-04-23 15:36   ` Eric Blake
2020-04-23 16:04     ` Kevin Wolf
2020-04-23 16:15       ` Eric Blake
2020-04-23 18:05   ` Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.