All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/7] qcow2: implement lazy refcounts optimization
@ 2012-07-25 12:21 Stefan Hajnoczi
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 1/7] docs: add dirty bit to qcow2 specification Stefan Hajnoczi
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-25 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Khoa Huynh, Anthony Liguori, Stefan Hajnoczi

This series aims to improve qcow2 performance with cache=writethrough and
cache=directsync.  In particular it reduces the impact of metadata updates for
allocating writes.

This optimization should help make qcow2 a good choice even for
cache=writethrough and cache=directsync where QED has traditionally had an
advantage due to less metadata - this allows us to converge image format
development in QEMU around the qcow2v3 format.

Allocating writes are expensive because they involve updating L2 tables and
refcount blocks.  In addition they can also cause L2 table allocation and
refcount block allocation but these remain unaffected by this optimization.

The key insight is that refcounts are not required to access data in the image.
This means that we can postpone refcount updates without violating the data
integrity guarantee that cache=writethrough and cache=directsync give.

The trade-off for postponing refcount updates is that the image may not be
completely consistent in case of power failure or crash.  If the image is dirty
then it must be repaired before performing further modifications, in other
words we need an fsck-like scan on startup.

fio benchmark results:
WRITE lazy_refcounts=off io=552650KB
WRITE lazy_refcounts=on  io=708264KB (+28%)

Disk: Intel X25-M 120GB 2nd Gen
Host file system: ext4
Host kernel: Linux 3.2.0-3-amd64 Debian

Note that these random 8k writes are to an empty qcow2 file.  This is the worst
case scenario for cache=writethrough and cache=directsync because each I/O is
likely to involve metadata I/O, including refcount updates.

$ qemu-system-x86_64 -enable-kvm -m 1024 \
                     -drive if=virtio,cache=none,file=../vm/rhel-6.img \
                     -drive if=virtio,cache=writethrough,file=a.qcow2

Job file:
[global]
ioengine=aio
buffered=0
rw=write
bs=8k
runtime=3m
time_based=1
iodepth=1
filename=/dev/vdb

[job1]

v2:
 * Fixed feature bit qcow2 spec formatting [Kevin]
 * Record feature bit names in qcow2 header for better error messages [Kevin]
 * Use separate enums for different feature bit types [Kevin]
 * Check that dirty bit is still set after read-only image access [Kevin]
 * Check that the dirty flag is never set with lazy_refcounts=off [Kevin]

Stefan Hajnoczi (7):
  docs: add dirty bit to qcow2 specification
  qcow2: introduce dirty bit
  docs: add lazy refcounts bit to qcow2 specification
  qemu-iotests: ignore qemu-img create lazy_refcounts output
  qcow2: implement lazy refcounts
  qemu-io: add "abort" command to simulate program crash
  qemu-iotests: add 039 qcow2 lazy refcounts test

 block/qcow2-cluster.c        |    5 +-
 block/qcow2.c                |  121 +++++++++++++++++++++++++++++++++++++++---
 block/qcow2.h                |   21 ++++++++
 block_int.h                  |   26 ++++-----
 docs/specs/qcow2.txt         |   14 ++++-
 qemu-io.c                    |   12 +++++
 tests/qemu-iotests/031.out   |   20 +++----
 tests/qemu-iotests/036.out   |    4 +-
 tests/qemu-iotests/039       |  117 ++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/039.out   |   42 +++++++++++++++
 tests/qemu-iotests/common.rc |    3 +-
 tests/qemu-iotests/group     |    1 +
 12 files changed, 351 insertions(+), 35 deletions(-)
 create mode 100755 tests/qemu-iotests/039
 create mode 100644 tests/qemu-iotests/039.out

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v2 1/7] docs: add dirty bit to qcow2 specification
  2012-07-25 12:21 [Qemu-devel] [PATCH v2 0/7] qcow2: implement lazy refcounts optimization Stefan Hajnoczi
@ 2012-07-25 12:21 ` Stefan Hajnoczi
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 2/7] qcow2: introduce dirty bit Stefan Hajnoczi
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-25 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Khoa Huynh, Anthony Liguori, Stefan Hajnoczi

The dirty bit will make it possible to perform lazy refcount updates,
where the image file is not kept consistent all the time.  Upon opening
a dirty image file, it is necessary to perform a consistency check and
repair any incorrect refcounts.

Therefore the dirty bit must be an incompatible feature bit.  We don't
want old programs accessing a file with stale refcounts.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 docs/specs/qcow2.txt |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 87bf785..339cdc1 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -75,7 +75,12 @@ in the description of a field.
                     Bitmask of incompatible features. An implementation must
                     fail to open an image if an unknown bit is set.
 
-                    Bits 0-63:  Reserved (set to 0)
+                    Bit 0:      Dirty bit.  If this bit is set then refcounts
+                                may be inconsistent, make sure to scan L1/L2
+                                tables to repair refcounts before accessing the
+                                image.
+
+                    Bits 1-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v2 2/7] qcow2: introduce dirty bit
  2012-07-25 12:21 [Qemu-devel] [PATCH v2 0/7] qcow2: implement lazy refcounts optimization Stefan Hajnoczi
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 1/7] docs: add dirty bit to qcow2 specification Stefan Hajnoczi
@ 2012-07-25 12:21 ` Stefan Hajnoczi
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 3/7] docs: add lazy refcounts bit to qcow2 specification Stefan Hajnoczi
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-25 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Khoa Huynh, Anthony Liguori, Stefan Hajnoczi

This patch adds an incompatible feature bit to mark images that have not
been closed cleanly.  When a dirty image file is opened a consistency
check and repair is performed.

Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block/qcow2.c              |   50 +++++++++++++++++++++++++++++++++++++++++---
 block/qcow2.h              |    8 +++++++
 tests/qemu-iotests/031.out |   20 +++++++++---------
 tests/qemu-iotests/036.out |    4 ++--
 4 files changed, 67 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 870148d..7fe1567 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -214,6 +214,27 @@ static void report_unsupported_feature(BlockDriverState *bs,
     }
 }
 
+/*
+ * Clears the dirty bit and flushes before if necessary.  Only call this
+ * function when there are no pending requests, it does not guard against
+ * concurrent requests dirtying the image.
+ */
+static int qcow2_mark_clean(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
+        int ret = bdrv_flush(bs);
+        if (ret < 0) {
+            return ret;
+        }
+
+        s->incompatible_features &= ~QCOW2_INCOMPAT_DIRTY;
+        return qcow2_update_header(bs);
+    }
+    return 0;
+}
+
 static int qcow2_open(BlockDriverState *bs, int flags)
 {
     BDRVQcowState *s = bs->opaque;
@@ -287,12 +308,13 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     s->compatible_features      = header.compatible_features;
     s->autoclear_features       = header.autoclear_features;
 
-    if (s->incompatible_features != 0) {
+    if (s->incompatible_features & ~QCOW2_INCOMPAT_MASK) {
         void *feature_table = NULL;
         qcow2_read_extensions(bs, header.header_length, ext_end,
                               &feature_table);
         report_unsupported_feature(bs, feature_table,
-                                   s->incompatible_features);
+                                   s->incompatible_features &
+                                   ~QCOW2_INCOMPAT_MASK);
         ret = -ENOTSUP;
         goto fail;
     }
@@ -412,6 +434,22 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     /* Initialise locks */
     qemu_co_mutex_init(&s->lock);
 
+    /* Repair image if dirty */
+    if ((s->incompatible_features & QCOW2_INCOMPAT_DIRTY) &&
+        !bs->read_only) {
+        BdrvCheckResult result = {0};
+
+        ret = qcow2_check_refcounts(bs, &result, BDRV_FIX_ERRORS);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        ret = qcow2_mark_clean(bs);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
 #ifdef DEBUG_ALLOC
     {
         BdrvCheckResult result = {0};
@@ -785,6 +823,8 @@ static void qcow2_close(BlockDriverState *bs)
     qcow2_cache_flush(bs, s->l2_table_cache);
     qcow2_cache_flush(bs, s->refcount_block_cache);
 
+    qcow2_mark_clean(bs);
+
     qcow2_cache_destroy(bs, s->l2_table_cache);
     qcow2_cache_destroy(bs, s->refcount_block_cache);
 
@@ -949,7 +989,11 @@ int qcow2_update_header(BlockDriverState *bs)
 
     /* Feature table */
     Qcow2Feature features[] = {
-        /* no feature defined yet */
+        {
+            .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
+            .bit  = QCOW2_INCOMPAT_DIRTY_BITNR,
+            .name = "dirty bit",
+        },
     };
 
     ret = header_ext_add(buf, QCOW2_EXT_MAGIC_FEATURE_TABLE,
diff --git a/block/qcow2.h b/block/qcow2.h
index 455b6d7..b5fefc0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -110,6 +110,14 @@ enum {
     QCOW2_FEAT_TYPE_AUTOCLEAR       = 2,
 };
 
+/* Incompatible feature bits */
+enum {
+    QCOW2_INCOMPAT_DIRTY_BITNR   = 0,
+    QCOW2_INCOMPAT_DIRTY         = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
+
+    QCOW2_INCOMPAT_MASK          = QCOW2_INCOMPAT_DIRTY,
+};
+
 typedef struct Qcow2Feature {
     uint8_t type;
     uint8_t bit;
diff --git a/tests/qemu-iotests/031.out b/tests/qemu-iotests/031.out
index d3cab30..297b458 100644
--- a/tests/qemu-iotests/031.out
+++ b/tests/qemu-iotests/031.out
@@ -54,8 +54,8 @@ header_length             72
 
 Header extension:
 magic                     0x6803f857
-length                    0
-data                      ''
+length                    48
+data                      <binary>
 
 Header extension:
 magic                     0x12345678
@@ -68,7 +68,7 @@ No errors were found on the image.
 
 magic                     0x514649fb
 version                   2
-backing_file_offset       0x98
+backing_file_offset       0xc8
 backing_file_size         0x17
 cluster_bits              16
 size                      67108864
@@ -92,8 +92,8 @@ data                      'host_device'
 
 Header extension:
 magic                     0x6803f857
-length                    0
-data                      ''
+length                    48
+data                      <binary>
 
 Header extension:
 magic                     0x12345678
@@ -155,8 +155,8 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    0
-data                      ''
+length                    48
+data                      <binary>
 
 Header extension:
 magic                     0x12345678
@@ -169,7 +169,7 @@ No errors were found on the image.
 
 magic                     0x514649fb
 version                   3
-backing_file_offset       0xb8
+backing_file_offset       0xe8
 backing_file_size         0x17
 cluster_bits              16
 size                      67108864
@@ -193,8 +193,8 @@ data                      'host_device'
 
 Header extension:
 magic                     0x6803f857
-length                    0
-data                      ''
+length                    48
+data                      <binary>
 
 Header extension:
 magic                     0x12345678
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
index 6953e37..ca0fda1 100644
--- a/tests/qemu-iotests/036.out
+++ b/tests/qemu-iotests/036.out
@@ -46,7 +46,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    0
-data                      ''
+length                    48
+data                      <binary>
 
 *** done
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v2 3/7] docs: add lazy refcounts bit to qcow2 specification
  2012-07-25 12:21 [Qemu-devel] [PATCH v2 0/7] qcow2: implement lazy refcounts optimization Stefan Hajnoczi
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 1/7] docs: add dirty bit to qcow2 specification Stefan Hajnoczi
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 2/7] qcow2: introduce dirty bit Stefan Hajnoczi
@ 2012-07-25 12:21 ` Stefan Hajnoczi
  2012-07-26 12:57   ` Kevin Wolf
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 4/7] qemu-iotests: ignore qemu-img create lazy_refcounts output Stefan Hajnoczi
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-25 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Khoa Huynh, Anthony Liguori, Stefan Hajnoczi

The lazy refcounts bit indicates that this image can take advantage of
the dirty bit and that refcount updates can be postponed.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 docs/specs/qcow2.txt |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 339cdc1..f2f4faa 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -86,7 +86,12 @@ in the description of a field.
                     Bitmask of compatible features. An implementation can
                     safely ignore any unknown bits that are set.
 
-                    Bits 0-63:  Reserved (set to 0)
+                    Bit 0:      Lazy refcounts bit.  If this bit is set then
+                                lazy refcount updates can be used.  This means
+                                postponing marking the image file dirty and
+                                postponing refcount metadata updates.
+
+                    Bits 1-63:  Reserved (set to 0)
 
          88 -  95:  autoclear_features
                     Bitmask of auto-clear features. An implementation may only
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v2 4/7] qemu-iotests: ignore qemu-img create lazy_refcounts output
  2012-07-25 12:21 [Qemu-devel] [PATCH v2 0/7] qcow2: implement lazy refcounts optimization Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 3/7] docs: add lazy refcounts bit to qcow2 specification Stefan Hajnoczi
@ 2012-07-25 12:21 ` Stefan Hajnoczi
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 5/7] qcow2: implement lazy refcounts Stefan Hajnoczi
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-25 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Khoa Huynh, Anthony Liguori, Stefan Hajnoczi

Hide the default lazy_refcounts=off output from qemu-img like we do with
other image creation options.  This ensures that existing golden outputs
continue to pass despite the new option that has been added.

Note that this patch applies before the one that actually introduces the
lazy_refcounts=on|off option.  This ensures git-bisect(1) continues to
work.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 tests/qemu-iotests/common.rc |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 5e3a524..cc4e39b 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -113,7 +113,8 @@ _make_test_img()
 	sed -e "s# table_size=0##g" | \
 	sed -e "s# compat='[^']*'##g" | \
 	sed -e "s# compat6=off##g" | \
-	sed -e "s# static=off##g"
+	sed -e "s# static=off##g" | \
+	sed -e "s# lazy_refcounts=off##g"
 }
 
 _cleanup_test_img()
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v2 5/7] qcow2: implement lazy refcounts
  2012-07-25 12:21 [Qemu-devel] [PATCH v2 0/7] qcow2: implement lazy refcounts optimization Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 4/7] qemu-iotests: ignore qemu-img create lazy_refcounts output Stefan Hajnoczi
@ 2012-07-25 12:21 ` Stefan Hajnoczi
  2012-07-26 13:15   ` Kevin Wolf
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 6/7] qemu-io: add "abort" command to simulate program crash Stefan Hajnoczi
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test Stefan Hajnoczi
  6 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-25 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Khoa Huynh, Anthony Liguori, Stefan Hajnoczi

Lazy refcounts is a performance optimization for qcow2 that postpones
refcount metadata updates and instead marks the image dirty.  In the
case of crash or power failure the image will be left in a dirty state
and repaired next time it is opened.

Reducing metadata I/O is important for cache=writethrough and
cache=directsync because these modes guarantee that data is on disk
after each write (hence we cannot take advantage of caching updates in
RAM).  Refcount metadata is not needed for guest->file block address
translation and therefore does not need to be on-disk at the time of
write completion - this is the motivation behind the lazy refcount
optimization.

The lazy refcount optimization must be enabled at image creation time:

  qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
  qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough

Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block/qcow2-cluster.c      |    5 +++-
 block/qcow2.c              |   71 +++++++++++++++++++++++++++++++++++++++++---
 block/qcow2.h              |   13 ++++++++
 block_int.h                |   26 ++++++++--------
 tests/qemu-iotests/031.out |   12 ++++----
 tests/qemu-iotests/036.out |    2 +-
 6 files changed, 105 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d7e0e19..e179211 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -662,7 +662,10 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
         qcow2_cache_depends_on_flush(s->l2_table_cache);
     }
 
-    qcow2_cache_set_dependency(bs, s->l2_table_cache, s->refcount_block_cache);
+    if (qcow2_need_accurate_refcounts(s)) {
+        qcow2_cache_set_dependency(bs, s->l2_table_cache,
+                                   s->refcount_block_cache);
+    }
     ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
     if (ret < 0) {
         goto err;
diff --git a/block/qcow2.c b/block/qcow2.c
index 7fe1567..d48527f7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -215,6 +215,39 @@ static void report_unsupported_feature(BlockDriverState *bs,
 }
 
 /*
+ * Sets the dirty bit and flushes afterwards if necessary.
+ *
+ * The incompatible_features bit is only set if the image file header was
+ * updated successfully.  Therefore it is not required to check the return
+ * value of this function.
+ */
+static int qcow2_mark_dirty(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t val;
+    int ret;
+
+    if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
+        return 0; /* already dirty */
+    }
+
+    val = cpu_to_be64(s->incompatible_features | QCOW2_INCOMPAT_DIRTY);
+    ret = bdrv_pwrite(bs->file, offsetof(QCowHeader, incompatible_features),
+                      &val, sizeof(val));
+    if (ret < 0) {
+        return ret;
+    }
+    ret = bdrv_flush(bs->file);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* Only treat image as dirty if the header was updated successfully */
+    s->incompatible_features |= QCOW2_INCOMPAT_DIRTY;
+    return 0;
+}
+
+/*
  * Clears the dirty bit and flushes before if necessary.  Only call this
  * function when there are no pending requests, it does not guard against
  * concurrent requests dirtying the image.
@@ -752,6 +785,11 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
             goto fail;
         }
 
+        if (l2meta.nb_clusters > 0 &&
+            (s->compatible_features & QCOW2_COMPAT_LAZY_REFCOUNTS)) {
+            qcow2_mark_dirty(bs);
+        }
+
         cluster_offset = l2meta.cluster_offset;
         assert((cluster_offset & 511) == 0);
 
@@ -994,6 +1032,11 @@ int qcow2_update_header(BlockDriverState *bs)
             .bit  = QCOW2_INCOMPAT_DIRTY_BITNR,
             .name = "dirty bit",
         },
+        {
+            .type = QCOW2_FEAT_TYPE_COMPATIBLE,
+            .bit  = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
+            .name = "lazy refcounts",
+        },
     };
 
     ret = header_ext_add(buf, QCOW2_EXT_MAGIC_FEATURE_TABLE,
@@ -1176,6 +1219,11 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         header.crypt_method = cpu_to_be32(QCOW_CRYPT_NONE);
     }
 
+    if (flags & BLOCK_FLAG_LAZY_REFCOUNTS) {
+        header.compatible_features |=
+            cpu_to_be64(QCOW2_COMPAT_LAZY_REFCOUNTS);
+    }
+
     ret = bdrv_pwrite(bs, 0, &header, sizeof(header));
     if (ret < 0) {
         goto out;
@@ -1289,6 +1337,8 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
                     options->value.s);
                 return -EINVAL;
             }
+        } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
+            flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
         }
         options++;
     }
@@ -1299,6 +1349,12 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
         return -EINVAL;
     }
 
+    if (version < 3 && (flags & BLOCK_FLAG_LAZY_REFCOUNTS)) {
+        fprintf(stderr, "Lazy refcounts only supported with compatibility "
+                "level 1.1 and above (use compat=1.1 or greater)\n");
+        return -EINVAL;
+    }
+
     return qcow2_create2(filename, sectors, backing_file, backing_fmt, flags,
                          cluster_size, prealloc, options, version);
 }
@@ -1485,10 +1541,12 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
         return ret;
     }
 
-    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
-    if (ret < 0) {
-        qemu_co_mutex_unlock(&s->lock);
-        return ret;
+    if (qcow2_need_accurate_refcounts(s)) {
+        ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+        if (ret < 0) {
+            qemu_co_mutex_unlock(&s->lock);
+            return ret;
+        }
     }
     qemu_co_mutex_unlock(&s->lock);
 
@@ -1603,6 +1661,11 @@ static QEMUOptionParameter qcow2_create_options[] = {
         .type = OPT_STRING,
         .help = "Preallocation mode (allowed values: off, metadata)"
     },
+    {
+        .name = BLOCK_OPT_LAZY_REFCOUNTS,
+        .type = OPT_FLAG,
+        .help = "Postpone refcount updates",
+    },
     { NULL }
 };
 
diff --git a/block/qcow2.h b/block/qcow2.h
index b5fefc0..b4eb654 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -118,6 +118,14 @@ enum {
     QCOW2_INCOMPAT_MASK          = QCOW2_INCOMPAT_DIRTY,
 };
 
+/* Compatible feature bits */
+enum {
+    QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR = 0,
+    QCOW2_COMPAT_LAZY_REFCOUNTS       = 1 << QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
+
+    QCOW2_COMPAT_FEAT_MASK            = QCOW2_COMPAT_LAZY_REFCOUNTS,
+};
+
 typedef struct Qcow2Feature {
     uint8_t type;
     uint8_t bit;
@@ -245,6 +253,11 @@ static inline int qcow2_get_cluster_type(uint64_t l2_entry)
     }
 }
 
+/* Check whether refcounts are eager or lazy */
+static inline bool qcow2_need_accurate_refcounts(BDRVQcowState *s)
+{
+    return !(s->incompatible_features & QCOW2_INCOMPAT_DIRTY);
+}
 
 // FIXME Need qcow2_ prefix to global functions
 
diff --git a/block_int.h b/block_int.h
index d72317f..6c1d9ca 100644
--- a/block_int.h
+++ b/block_int.h
@@ -31,8 +31,9 @@
 #include "qemu-timer.h"
 #include "qapi-types.h"
 
-#define BLOCK_FLAG_ENCRYPT	1
-#define BLOCK_FLAG_COMPAT6	4
+#define BLOCK_FLAG_ENCRYPT          1
+#define BLOCK_FLAG_COMPAT6          4
+#define BLOCK_FLAG_LAZY_REFCOUNTS   8
 
 #define BLOCK_IO_LIMIT_READ     0
 #define BLOCK_IO_LIMIT_WRITE    1
@@ -41,16 +42,17 @@
 #define BLOCK_IO_SLICE_TIME     100000000
 #define NANOSECONDS_PER_SECOND  1000000000.0
 
-#define BLOCK_OPT_SIZE          "size"
-#define BLOCK_OPT_ENCRYPT       "encryption"
-#define BLOCK_OPT_COMPAT6       "compat6"
-#define BLOCK_OPT_BACKING_FILE  "backing_file"
-#define BLOCK_OPT_BACKING_FMT   "backing_fmt"
-#define BLOCK_OPT_CLUSTER_SIZE  "cluster_size"
-#define BLOCK_OPT_TABLE_SIZE    "table_size"
-#define BLOCK_OPT_PREALLOC      "preallocation"
-#define BLOCK_OPT_SUBFMT        "subformat"
-#define BLOCK_OPT_COMPAT_LEVEL  "compat"
+#define BLOCK_OPT_SIZE              "size"
+#define BLOCK_OPT_ENCRYPT           "encryption"
+#define BLOCK_OPT_COMPAT6           "compat6"
+#define BLOCK_OPT_BACKING_FILE      "backing_file"
+#define BLOCK_OPT_BACKING_FMT       "backing_fmt"
+#define BLOCK_OPT_CLUSTER_SIZE      "cluster_size"
+#define BLOCK_OPT_TABLE_SIZE        "table_size"
+#define BLOCK_OPT_PREALLOC          "preallocation"
+#define BLOCK_OPT_SUBFMT            "subformat"
+#define BLOCK_OPT_COMPAT_LEVEL      "compat"
+#define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
 
 typedef struct BdrvTrackedRequest BdrvTrackedRequest;
 
diff --git a/tests/qemu-iotests/031.out b/tests/qemu-iotests/031.out
index 297b458..796c993 100644
--- a/tests/qemu-iotests/031.out
+++ b/tests/qemu-iotests/031.out
@@ -54,7 +54,7 @@ header_length             72
 
 Header extension:
 magic                     0x6803f857
-length                    48
+length                    96
 data                      <binary>
 
 Header extension:
@@ -68,7 +68,7 @@ No errors were found on the image.
 
 magic                     0x514649fb
 version                   2
-backing_file_offset       0xc8
+backing_file_offset       0xf8
 backing_file_size         0x17
 cluster_bits              16
 size                      67108864
@@ -92,7 +92,7 @@ data                      'host_device'
 
 Header extension:
 magic                     0x6803f857
-length                    48
+length                    96
 data                      <binary>
 
 Header extension:
@@ -155,7 +155,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    48
+length                    96
 data                      <binary>
 
 Header extension:
@@ -169,7 +169,7 @@ No errors were found on the image.
 
 magic                     0x514649fb
 version                   3
-backing_file_offset       0xe8
+backing_file_offset       0x118
 backing_file_size         0x17
 cluster_bits              16
 size                      67108864
@@ -193,7 +193,7 @@ data                      'host_device'
 
 Header extension:
 magic                     0x6803f857
-length                    48
+length                    96
 data                      <binary>
 
 Header extension:
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
index ca0fda1..063ca22 100644
--- a/tests/qemu-iotests/036.out
+++ b/tests/qemu-iotests/036.out
@@ -46,7 +46,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    48
+length                    96
 data                      <binary>
 
 *** done
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v2 6/7] qemu-io: add "abort" command to simulate program crash
  2012-07-25 12:21 [Qemu-devel] [PATCH v2 0/7] qcow2: implement lazy refcounts optimization Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 5/7] qcow2: implement lazy refcounts Stefan Hajnoczi
@ 2012-07-25 12:21 ` Stefan Hajnoczi
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test Stefan Hajnoczi
  6 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-25 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Khoa Huynh, Anthony Liguori, Stefan Hajnoczi

Avoiding data loss and corruption is the top requirement for image file
formats.  The qemu-io "abort" command makes it possible to simulate
program crashes and does not give the image format a chance to cleanly
shut down.  This command is useful for data integrity test cases.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 qemu-io.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/qemu-io.c b/qemu-io.c
index 8f3b94b..d0f4fb7 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1652,6 +1652,17 @@ static const cmdinfo_t map_cmd = {
        .oneline        = "prints the allocated areas of a file",
 };
 
+static int abort_f(int argc, char **argv)
+{
+    abort();
+}
+
+static const cmdinfo_t abort_cmd = {
+       .name           = "abort",
+       .cfunc          = abort_f,
+       .flags          = CMD_NOFILE_OK,
+       .oneline        = "simulate a program crash using abort(3)",
+};
 
 static int close_f(int argc, char **argv)
 {
@@ -1905,6 +1916,7 @@ int main(int argc, char **argv)
     add_command(&discard_cmd);
     add_command(&alloc_cmd);
     add_command(&map_cmd);
+    add_command(&abort_cmd);
 
     add_args_command(init_args_command);
     add_check_command(init_check_command);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test
  2012-07-25 12:21 [Qemu-devel] [PATCH v2 0/7] qcow2: implement lazy refcounts optimization Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 6/7] qemu-io: add "abort" command to simulate program crash Stefan Hajnoczi
@ 2012-07-25 12:21 ` Stefan Hajnoczi
  2012-07-25 17:54   ` Eric Blake
  2012-07-26 13:28   ` Kevin Wolf
  6 siblings, 2 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-25 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Khoa Huynh, Anthony Liguori, Stefan Hajnoczi

This tests establishes the basic post-conditions of the qcow2 lazy
refcounts features:

  1. If the image was closed normally, it is marked clean.

  2. If an allocating write was performed and the image was not close
     normally, then it is marked dirty.

     a. Written data can be read back successfully.
     b. The image file can be repaired and will be marked clean again.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 tests/qemu-iotests/039     |  117 ++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/039.out |   42 ++++++++++++++++
 tests/qemu-iotests/group   |    1 +
 3 files changed, 160 insertions(+)
 create mode 100755 tests/qemu-iotests/039
 create mode 100644 tests/qemu-iotests/039.out

diff --git a/tests/qemu-iotests/039 b/tests/qemu-iotests/039
new file mode 100755
index 0000000..4bf792f
--- /dev/null
+++ b/tests/qemu-iotests/039
@@ -0,0 +1,117 @@
+#!/bin/bash
+#
+# Test qcow2 lazy refcounts
+#
+# Copyright (C) 2012 Red Hat, Inc.
+# Copyright IBM, Corp. 2010
+#
+# Based on test 038.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=stefanha@linux.vnet.ibm.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto generic
+_supported_os Linux
+
+size=128M
+
+echo
+echo "== Checking that image is clean on shutdown =="
+
+IMGOPTS="compat=1.1,lazy_refcounts=on"
+_make_test_img $size
+
+$QEMU_IO -c "write -P 0x5a 0 512" $TEST_IMG | _filter_qemu_io
+
+# The dirty bit must not be set
+./qcow2.py $TEST_IMG dump-header | grep incompatible_features
+_check_test_img
+
+echo
+echo "== Creating a dirty image file =="
+
+IMGOPTS="compat=1.1,lazy_refcounts=on"
+_make_test_img $size
+
+old_ulimit=$(ulimit -c)
+ulimit -c 0 # do not produce a core dump on abort(3)
+$QEMU_IO -c "write -P 0x5a 0 512" -c "abort" $TEST_IMG | _filter_qemu_io
+ulimit -c "$old_ulimit"
+
+# The dirty bit must be set
+./qcow2.py $TEST_IMG dump-header | grep incompatible_features
+_check_test_img
+
+echo
+echo "== Read-only access must still work =="
+
+$QEMU_IO -r -c "read -P 0x5a 0 512" $TEST_IMG | _filter_qemu_io
+
+# The dirty bit must be set
+./qcow2.py $TEST_IMG dump-header | grep incompatible_features
+
+echo
+echo "== Repairing the image file must succeed =="
+
+$QEMU_IMG check -r all $TEST_IMG
+
+# The dirty bit must not be set
+./qcow2.py $TEST_IMG dump-header | grep incompatible_features
+
+echo
+echo "== Data should still be accessible after repair =="
+
+$QEMU_IO -c "read -P 0x5a 0 512" $TEST_IMG | _filter_qemu_io
+
+echo
+echo "== Creating an image file with lazy_refcounts=off =="
+
+IMGOPTS="compat=1.1,lazy_refcounts=off"
+_make_test_img $size
+
+old_ulimit=$(ulimit -c)
+ulimit -c 0 # do not produce a core dump on abort(3)
+$QEMU_IO -c "write -P 0x5a 0 512" -c "abort" $TEST_IMG | _filter_qemu_io
+ulimit -c "$old_ulimit"
+
+# The dirty bit must not be set since lazy_refcounts=off
+./qcow2.py $TEST_IMG dump-header | grep incompatible_features
+_check_test_img
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
+
diff --git a/tests/qemu-iotests/039.out b/tests/qemu-iotests/039.out
new file mode 100644
index 0000000..6be47c0
--- /dev/null
+++ b/tests/qemu-iotests/039.out
@@ -0,0 +1,42 @@
+QA output created by 039
+
+== Checking that image is clean on shutdown ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 lazy_refcounts=on 
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+incompatible_features     0x0
+No errors were found on the image.
+
+== Creating a dirty image file ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 lazy_refcounts=on 
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+incompatible_features     0x1
+ERROR OFLAG_COPIED: offset=8000000000050000 refcount=0
+ERROR cluster 5 refcount=0 reference=1
+
+2 errors were found on the image.
+Data may be corrupted, or further writes to the image may corrupt it.
+
+== Read-only access must still work ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+incompatible_features     0x1
+
+== Repairing the image file must succeed ==
+ERROR OFLAG_COPIED: offset=8000000000050000 refcount=0
+Repairing cluster 5 refcount=0 reference=1
+No errors were found on the image.
+incompatible_features     0x0
+
+== Data should still be accessible after repair ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Creating an image file with lazy_refcounts=off ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+incompatible_features     0x0
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 7a2c92b..ebb5ca4 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -45,3 +45,4 @@
 036 rw auto quick
 037 rw auto backing
 038 rw auto backing
+039 rw auto
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test Stefan Hajnoczi
@ 2012-07-25 17:54   ` Eric Blake
  2012-07-25 22:45     ` Stefan Hajnoczi
  2012-07-26 13:28   ` Kevin Wolf
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Blake @ 2012-07-25 17:54 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Khoa Huynh, Anthony Liguori, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1158 bytes --]

On 07/25/2012 06:21 AM, Stefan Hajnoczi wrote:
> This tests establishes the basic post-conditions of the qcow2 lazy
> refcounts features:
> 
>   1. If the image was closed normally, it is marked clean.
> 
>   2. If an allocating write was performed and the image was not close
>      normally, then it is marked dirty.
> 
>      a. Written data can be read back successfully.
>      b. The image file can be repaired and will be marked clean again.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

> +++ b/tests/qemu-iotests/039
> @@ -0,0 +1,117 @@
> +#!/bin/bash

Since you are assuming bash (and even if you were to assume POSIX
/bin/sh)...

> +
> +seq=`basename $0`

I prefer $() over ``.

> +echo "QA output created by $seq"
> +
> +here=`pwd`

POSIX (and therefore bash) guarantees that $PWD is sane, and faster to
access than $(pwd).

> +tmp=/tmp/$$

That's not very secure.  It may be worth using bash's $RANDOM, or using
mkstemp(1).

Beyond that, the series seemed reasonable to me.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test
  2012-07-25 17:54   ` Eric Blake
@ 2012-07-25 22:45     ` Stefan Hajnoczi
  2012-07-25 23:41       ` Eric Blake
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-25 22:45 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, Khoa Huynh, Anthony Liguori, qemu-devel

On Wed, Jul 25, 2012 at 11:54:50AM -0600, Eric Blake wrote:
> On 07/25/2012 06:21 AM, Stefan Hajnoczi wrote:
> > This tests establishes the basic post-conditions of the qcow2 lazy
> > refcounts features:
> > 
> >   1. If the image was closed normally, it is marked clean.
> > 
> >   2. If an allocating write was performed and the image was not close
> >      normally, then it is marked dirty.
> > 
> >      a. Written data can be read back successfully.
> >      b. The image file can be repaired and will be marked clean again.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> 
> > +++ b/tests/qemu-iotests/039
> > @@ -0,0 +1,117 @@
> > +#!/bin/bash
> 
> Since you are assuming bash (and even if you were to assume POSIX
> /bin/sh)...
> 
> > +
> > +seq=`basename $0`
> 
> I prefer $() over ``.
> 
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> 
> POSIX (and therefore bash) guarantees that $PWD is sane, and faster to
> access than $(pwd).
> 
> > +tmp=/tmp/$$
> 
> That's not very secure.  It may be worth using bash's $RANDOM, or using
> mkstemp(1).
> 
> Beyond that, the series seemed reasonable to me.

All qemu-iotests scripts do these things in the same way and I'd like
for them to be consistent.

If we make these changes they should be applied to all qemu-iotests
scripts.  I agree with your points but also think the value in making
the change now is small.

Do you want to send a patch that fixes these issues in qemu-iotests?
The general shell scripting style used there is quite old school and
makes use of backquotes often.

Stefan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test
  2012-07-25 22:45     ` Stefan Hajnoczi
@ 2012-07-25 23:41       ` Eric Blake
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Blake @ 2012-07-25 23:41 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Khoa Huynh, Anthony Liguori, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1762 bytes --]

On 07/25/2012 04:45 PM, Stefan Hajnoczi wrote:

>> Since you are assuming bash (and even if you were to assume POSIX
>> /bin/sh)...
>>
>>> +
>>> +seq=`basename $0`
>>
>> I prefer $() over ``.
>>
>>> +echo "QA output created by $seq"
>>> +
>>> +here=`pwd`
>>
>> POSIX (and therefore bash) guarantees that $PWD is sane, and faster to
>> access than $(pwd).
>>
>>> +tmp=/tmp/$$
>>
>> That's not very secure.  It may be worth using bash's $RANDOM, or using
>> mkstemp(1).
>>
>> Beyond that, the series seemed reasonable to me.
> 
> All qemu-iotests scripts do these things in the same way and I'd like
> for them to be consistent.

Good argument.

> 
> If we make these changes they should be applied to all qemu-iotests
> scripts.  I agree with your points but also think the value in making
> the change now is small.

Indeed - what you have is technically correct, even if not the most
efficient.  Any such cleanups should, as you say, be a separate patch
globally applied to the qemu-iotests, and not this test in isolation.

> 
> Do you want to send a patch that fixes these issues in qemu-iotests?

Up to you; or read another way, it bothered me enough to comment, but
not enough to write the patch myself, so I'm fine living with status quo
if it doesn't bother anyone else either.  Old-school techniques aren't
wrong, per se, just inefficient; and while the insecure temp file name
could be exploited, people running the testsuite tend to be on personal
platforms rather than enterprise systems, and the cost of exploiting a
testsuite is not as severe as the cost of exploiting an installed script.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/7] docs: add lazy refcounts bit to qcow2 specification
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 3/7] docs: add lazy refcounts bit to qcow2 specification Stefan Hajnoczi
@ 2012-07-26 12:57   ` Kevin Wolf
  2012-07-26 15:36     ` Stefan Hajnoczi
  0 siblings, 1 reply; 18+ messages in thread
From: Kevin Wolf @ 2012-07-26 12:57 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Khoa Huynh, Anthony Liguori, qemu-devel

Am 25.07.2012 14:21, schrieb Stefan Hajnoczi:
> The lazy refcounts bit indicates that this image can take advantage of
> the dirty bit and that refcount updates can be postponed.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  docs/specs/qcow2.txt |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 339cdc1..f2f4faa 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -86,7 +86,12 @@ in the description of a field.
>                      Bitmask of compatible features. An implementation can
>                      safely ignore any unknown bits that are set.
>  
> -                    Bits 0-63:  Reserved (set to 0)
> +                    Bit 0:      Lazy refcounts bit.  If this bit is set then
> +                                lazy refcount updates can be used.  This means
> +                                postponing marking the image file dirty and

Not sure what was intended here, but the dirty flag isn't set anywhere
yet, so you can't postpone it either. Maybe s/postponing//?

> +                                postponing refcount metadata updates.
> +
> +                    Bits 1-63:  Reserved (set to 0)
>  
>           88 -  95:  autoclear_features
>                      Bitmask of auto-clear features. An implementation may only

Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/7] qcow2: implement lazy refcounts
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 5/7] qcow2: implement lazy refcounts Stefan Hajnoczi
@ 2012-07-26 13:15   ` Kevin Wolf
  0 siblings, 0 replies; 18+ messages in thread
From: Kevin Wolf @ 2012-07-26 13:15 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Khoa Huynh, Anthony Liguori, qemu-devel

Am 25.07.2012 14:21, schrieb Stefan Hajnoczi:
> Lazy refcounts is a performance optimization for qcow2 that postpones
> refcount metadata updates and instead marks the image dirty.  In the
> case of crash or power failure the image will be left in a dirty state
> and repaired next time it is opened.
> 
> Reducing metadata I/O is important for cache=writethrough and
> cache=directsync because these modes guarantee that data is on disk
> after each write (hence we cannot take advantage of caching updates in
> RAM).  Refcount metadata is not needed for guest->file block address
> translation and therefore does not need to be on-disk at the time of
> write completion - this is the motivation behind the lazy refcount
> optimization.
> 
> The lazy refcount optimization must be enabled at image creation time:
> 
>   qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
>   qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough
> 
> Update qemu-iotests 031 and 036 since the extension header size changes
> when we add feature bit table entries.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  block/qcow2-cluster.c      |    5 +++-
>  block/qcow2.c              |   71 +++++++++++++++++++++++++++++++++++++++++---
>  block/qcow2.h              |   13 ++++++++
>  block_int.h                |   26 ++++++++--------
>  tests/qemu-iotests/031.out |   12 ++++----
>  tests/qemu-iotests/036.out |    2 +-
>  6 files changed, 105 insertions(+), 24 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index d7e0e19..e179211 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -662,7 +662,10 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>          qcow2_cache_depends_on_flush(s->l2_table_cache);
>      }
>  
> -    qcow2_cache_set_dependency(bs, s->l2_table_cache, s->refcount_block_cache);
> +    if (qcow2_need_accurate_refcounts(s)) {
> +        qcow2_cache_set_dependency(bs, s->l2_table_cache,
> +                                   s->refcount_block_cache);
> +    }
>      ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
>      if (ret < 0) {
>          goto err;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 7fe1567..d48527f7 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -215,6 +215,39 @@ static void report_unsupported_feature(BlockDriverState *bs,
>  }
>  
>  /*
> + * Sets the dirty bit and flushes afterwards if necessary.
> + *
> + * The incompatible_features bit is only set if the image file header was
> + * updated successfully.  Therefore it is not required to check the return
> + * value of this function.
> + */
> +static int qcow2_mark_dirty(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    uint64_t val;
> +    int ret;
> +
> +    if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
> +        return 0; /* already dirty */
> +    }
> +
> +    val = cpu_to_be64(s->incompatible_features | QCOW2_INCOMPAT_DIRTY);
> +    ret = bdrv_pwrite(bs->file, offsetof(QCowHeader, incompatible_features),
> +                      &val, sizeof(val));

If you respin, I think would be nice to have either an assert(s->version
== 3) before writing to qcow3 header fields, or to use
qcow2_update_header() in the first place.

Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test
  2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test Stefan Hajnoczi
  2012-07-25 17:54   ` Eric Blake
@ 2012-07-26 13:28   ` Kevin Wolf
  2012-07-27  7:56     ` Stefan Hajnoczi
  1 sibling, 1 reply; 18+ messages in thread
From: Kevin Wolf @ 2012-07-26 13:28 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Khoa Huynh, Anthony Liguori, qemu-devel

Am 25.07.2012 14:21, schrieb Stefan Hajnoczi:
> This tests establishes the basic post-conditions of the qcow2 lazy
> refcounts features:
> 
>   1. If the image was closed normally, it is marked clean.
> 
>   2. If an allocating write was performed and the image was not close
>      normally, then it is marked dirty.
> 
>      a. Written data can be read back successfully.
>      b. The image file can be repaired and will be marked clean again.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

I think an important case that is missing here is opening a dirty image
rw without having run qemu-img check -r first.

> +== Read-only access must still work ==
> +read 512/512 bytes at offset 0
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +incompatible_features     0x1
> +
> +== Repairing the image file must succeed ==
> +ERROR OFLAG_COPIED: offset=8000000000050000 refcount=0
> +Repairing cluster 5 refcount=0 reference=1
> +No errors were found on the image.
> +incompatible_features     0x0

I wonder what happened to the "The following inconsistencies were found
and repaired" message. Most likely not a problem with qemu-iotests,
though, but something unexpected in qemu-img.

Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/7] docs: add lazy refcounts bit to qcow2 specification
  2012-07-26 12:57   ` Kevin Wolf
@ 2012-07-26 15:36     ` Stefan Hajnoczi
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-26 15:36 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Khoa Huynh, Anthony Liguori, Stefan Hajnoczi, qemu-devel

On Thu, Jul 26, 2012 at 1:57 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 25.07.2012 14:21, schrieb Stefan Hajnoczi:
>> The lazy refcounts bit indicates that this image can take advantage of
>> the dirty bit and that refcount updates can be postponed.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> ---
>>  docs/specs/qcow2.txt |    7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>> index 339cdc1..f2f4faa 100644
>> --- a/docs/specs/qcow2.txt
>> +++ b/docs/specs/qcow2.txt
>> @@ -86,7 +86,12 @@ in the description of a field.
>>                      Bitmask of compatible features. An implementation can
>>                      safely ignore any unknown bits that are set.
>>
>> -                    Bits 0-63:  Reserved (set to 0)
>> +                    Bit 0:      Lazy refcounts bit.  If this bit is set then
>> +                                lazy refcount updates can be used.  This means
>> +                                postponing marking the image file dirty and
>
> Not sure what was intended here, but the dirty flag isn't set anywhere
> yet, so you can't postpone it either. Maybe s/postponing//?

Yes, s/postponing//.

Stefan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test
  2012-07-26 13:28   ` Kevin Wolf
@ 2012-07-27  7:56     ` Stefan Hajnoczi
  2012-07-27  8:07       ` Kevin Wolf
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-27  7:56 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Khoa Huynh, Anthony Liguori, Stefan Hajnoczi, qemu-devel

On Thu, Jul 26, 2012 at 2:28 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 25.07.2012 14:21, schrieb Stefan Hajnoczi:
>> This tests establishes the basic post-conditions of the qcow2 lazy
>> refcounts features:
>>
>>   1. If the image was closed normally, it is marked clean.
>>
>>   2. If an allocating write was performed and the image was not close
>>      normally, then it is marked dirty.
>>
>>      a. Written data can be read back successfully.
>>      b. The image file can be repaired and will be marked clean again.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>
> I think an important case that is missing here is opening a dirty image
> rw without having run qemu-img check -r first.

I have added that test case.

>> +== Read-only access must still work ==
>> +read 512/512 bytes at offset 0
>> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> +incompatible_features     0x1
>> +
>> +== Repairing the image file must succeed ==
>> +ERROR OFLAG_COPIED: offset=8000000000050000 refcount=0
>> +Repairing cluster 5 refcount=0 reference=1
>> +No errors were found on the image.
>> +incompatible_features     0x0
>
> I wonder what happened to the "The following inconsistencies were found
> and repaired" message. Most likely not a problem with qemu-iotests,
> though, but something unexpected in qemu-img.

It's because opening a qcow2 image read/write when the dirty flag is
set causes a repair.  This accounts for the "Repairing cluster 5 ..."
message.

Then qemu-img check -r all calls bdrv_check() on an already repaired
image file and we get the "No errors were found on the image".

Stefan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test
  2012-07-27  7:56     ` Stefan Hajnoczi
@ 2012-07-27  8:07       ` Kevin Wolf
  2012-07-30 10:09         ` Stefan Hajnoczi
  0 siblings, 1 reply; 18+ messages in thread
From: Kevin Wolf @ 2012-07-27  8:07 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Khoa Huynh, Anthony Liguori, Stefan Hajnoczi, qemu-devel

Am 27.07.2012 09:56, schrieb Stefan Hajnoczi:
> On Thu, Jul 26, 2012 at 2:28 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 25.07.2012 14:21, schrieb Stefan Hajnoczi:
>>> +== Read-only access must still work ==
>>> +read 512/512 bytes at offset 0
>>> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>> +incompatible_features     0x1
>>> +
>>> +== Repairing the image file must succeed ==
>>> +ERROR OFLAG_COPIED: offset=8000000000050000 refcount=0
>>> +Repairing cluster 5 refcount=0 reference=1
>>> +No errors were found on the image.
>>> +incompatible_features     0x0
>>
>> I wonder what happened to the "The following inconsistencies were found
>> and repaired" message. Most likely not a problem with qemu-iotests,
>> though, but something unexpected in qemu-img.
> 
> It's because opening a qcow2 image read/write when the dirty flag is
> set causes a repair.  This accounts for the "Repairing cluster 5 ..."
> message.
> 
> Then qemu-img check -r all calls bdrv_check() on an already repaired
> image file and we get the "No errors were found on the image".

I see. Not exactly how it was intended... Do we need a BDRV_O_CHECK flag
that prevents the automatic repair or should we just live with the
suboptimal output when lazy refcounting is enabled?

Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test
  2012-07-27  8:07       ` Kevin Wolf
@ 2012-07-30 10:09         ` Stefan Hajnoczi
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2012-07-30 10:09 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Khoa Huynh, Anthony Liguori, Stefan Hajnoczi, qemu-devel

On Fri, Jul 27, 2012 at 9:07 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 27.07.2012 09:56, schrieb Stefan Hajnoczi:
>> On Thu, Jul 26, 2012 at 2:28 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> Am 25.07.2012 14:21, schrieb Stefan Hajnoczi:
>>>> +== Read-only access must still work ==
>>>> +read 512/512 bytes at offset 0
>>>> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> +incompatible_features     0x1
>>>> +
>>>> +== Repairing the image file must succeed ==
>>>> +ERROR OFLAG_COPIED: offset=8000000000050000 refcount=0
>>>> +Repairing cluster 5 refcount=0 reference=1
>>>> +No errors were found on the image.
>>>> +incompatible_features     0x0
>>>
>>> I wonder what happened to the "The following inconsistencies were found
>>> and repaired" message. Most likely not a problem with qemu-iotests,
>>> though, but something unexpected in qemu-img.
>>
>> It's because opening a qcow2 image read/write when the dirty flag is
>> set causes a repair.  This accounts for the "Repairing cluster 5 ..."
>> message.
>>
>> Then qemu-img check -r all calls bdrv_check() on an already repaired
>> image file and we get the "No errors were found on the image".
>
> I see. Not exactly how it was intended... Do we need a BDRV_O_CHECK flag
> that prevents the automatic repair or should we just live with the
> suboptimal output when lazy refcounting is enabled?

You noticed the issue so others might notice it too.  I'll send a
patch including qcow2 and qed changes to fix this using BDRV_O_CHECK.

Stefan

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-07-30 10:09 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-25 12:21 [Qemu-devel] [PATCH v2 0/7] qcow2: implement lazy refcounts optimization Stefan Hajnoczi
2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 1/7] docs: add dirty bit to qcow2 specification Stefan Hajnoczi
2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 2/7] qcow2: introduce dirty bit Stefan Hajnoczi
2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 3/7] docs: add lazy refcounts bit to qcow2 specification Stefan Hajnoczi
2012-07-26 12:57   ` Kevin Wolf
2012-07-26 15:36     ` Stefan Hajnoczi
2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 4/7] qemu-iotests: ignore qemu-img create lazy_refcounts output Stefan Hajnoczi
2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 5/7] qcow2: implement lazy refcounts Stefan Hajnoczi
2012-07-26 13:15   ` Kevin Wolf
2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 6/7] qemu-io: add "abort" command to simulate program crash Stefan Hajnoczi
2012-07-25 12:21 ` [Qemu-devel] [PATCH v2 7/7] qemu-iotests: add 039 qcow2 lazy refcounts test Stefan Hajnoczi
2012-07-25 17:54   ` Eric Blake
2012-07-25 22:45     ` Stefan Hajnoczi
2012-07-25 23:41       ` Eric Blake
2012-07-26 13:28   ` Kevin Wolf
2012-07-27  7:56     ` Stefan Hajnoczi
2012-07-27  8:07       ` Kevin Wolf
2012-07-30 10:09         ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.