All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions
@ 2014-11-24 15:56 Max Reitz
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 01/12] " Max Reitz
                   ` (17 more replies)
  0 siblings, 18 replies; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

As has been requested, this series adds new overlap check functions to
the qcow2 code. My local branch is called "qcow2-improved-overlap-v1",
but I am not so sure whether it is actually an improvement; that is left
for you to decide, dear reviewers.

See patch 1 for an explanation of why this series exists and what it
does. Patch 1 is basically the core of this series, the rest just
employs the functions introduced there.

In a later patch, we may want to change the meaning of the "constant"
overlap checking option to mean the same as "cached", which is
everything except for inactive L2 tables. This series does make
checking for overlaps with inactive L2 tables at runtime just as cheap
as everything else (constant time plus caching), but using these checks
means qemu has to read all the snapshot L1 tables when opening a qcow2
file. This does not take long, of course, but it does result in a bit of
overhead so I did not want to enable it by default.

I think just enabling all overlap checks by default after this series
should be fine and useful, though.


Benchmarks
==========

First, let me repeat the results I wrote as a reply to the cover letter
of v1:

I had a 1 GB qcow2 image in tmpfs (sorry, I "only" have 4 GB of RAM in
this laptop, and a 2 GB image will kill it) which I accessed from a
guest from inside qemu (Arch Linux, to be exact, because its image just
dumps you into a console and that's all I need). I ran

$ for i in $(seq 1 42); do \
    dd if=/dev/zero of=/dev/vda bs=65536 \
    done

because it does not matter what data is written to the image, I just
need to write to a lot of clusters fast to maximize pressure on the
overlap check.

The results were 13.5/10.5/12.41 % of CPU usage (recorded using perf) in
qcow2_check_metadata_overlap() with the current implementation and
0.08/0.09 % with this series applied (0.08 % with overlap-check=cached,
0.09 % with overlap-check=all).

Now as a table, just to have one here (and because it is useful when
skipping through this text):

Current implementation:     12.14 (± 1.519) %  (n = 3)
With this series applied:    0.08           %  (n = 1)
overlap-check=all:           0.09           %  (n = 1)


Now I did some other benchmarks.

The first of those is just running (on the host, obviously):

$ perf record -ag \
  qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=512 \
  /tmp/test.qcow2 2G

I ran ten rounds with the current implementation, ten rounds with these
patches applied and five rounds with cache_size set to 0 in
qcow2_create_empty_metadata_list() (which results in only one bitmap
existing at one time, and therefore a lot of conversions between the
bitmap and the list representation).

The current implementation had a CPU usage (fraction of cycles) in
qcow2_check_metadata_overlap() of 47.65 (±4.24) %. There was one rather
extreme outlier (36.31 %), with that removed it is 48.91 (±1.54) %.

With this series applied, the usage dropped to 0.149 (± 0.028) %.
Additionally, qcow2_metadata_list_enter() took 0.046 (± 0.021) %. Both
together took thus 0.195 (± 0.040) %.

With cache_size set to 0, the usage was 0.130 % (± 0.012 %) in
qcow2_check_metadata_overlap(); 0.056 % (± 0.011 %) in
qcow2_metadata_list_enter(); and 0.186 % (± 0.021 %). It dropped, but
still in range of standard deviation. Thus, heavy conversion between
bitmap and list conversion (in normal use cases due to random accesses)
should not be a problem at all.

As a table:

Current implementation:     48.91 (± 1.537) %  (n =  9)
With this series applied:   0.195 (± 0.040) %  (n = 10)
Only one bitmap cached:     0.186 (± 0.021) %  (n =  5)

And as a really noticeable consequence: Before this series, creating
the image like that took 16.624 s (15.97 s user). With this series, it
takes 4.502 s (3.94 s user).

Because statistics are fun, I collected the number of metadata list
operations for the second and the third tests:

Second test (default of 15 bitmaps cached):

List to bitmap conversions: 1053
Bitmap to list conversions: 1038
Bitmaps deallocated:        1038

Insert operations:         82266
Remove operations:            14
Queries:                  312265

Third test (only one bitmap cached):

List to bitmap conversions: 135943
Bitmap to list conversions:  66546
Bitmaps deallocated:        135942

Insert operations:           82266
Remove operations:              14
Queries:                    312265

As expected, the number of conversions is much higher. Interestingly, in
the first case, every bitmap deallocation also meant a bitmap to list
conversion; that means, every bitmap was modified at least once while it
was cached. In contrast to that, in the second case, only every second
deallocation resulted in a conversion, which means that half of the
cached bitmaps were only read (which is bad considering all was done was
allocating metadata structures).

But for performance, there is no real information here (we only see that
setting cache_size to 0 actually did increase the number of
conversions).


The second benchmark I ran today was a program which opened /dev/vda in
qemu (again on Arch) with O_DIRECT | O_SYNC | O_RDWR. It then wrote an
uninitialized 512 byte buffer to random places (512-byte-aligned) to a
1 GB image freshly created before VM launch with metadata preallocation
and 512 byte clusters. The intention was to break bitmap caching and
having to convert back and forth between list and bitmap representation.
The results are as follows:

Current implementation:     18.73 (± 0.157) %  (n = 10)
With this series applied:   0.068 (± 0.009) %  (n = 10)
Only one bitmap cached:     0.062 (± 0.008) %  (n =  5)


Runtime conclusion
------------------

As can be seen from the benchmarks, runtime CPU usage by the metadata
overlap checks is greatly decreased by this series (to 1/150 in the
first benchmark, to 1/250 in the second, and to 1/275 in the third).


Where is the caveat?
--------------------

The problem with this series is obviously that all the metadata
information is read when reading a qcow2 image. iotest 044 is a good
test for this. I personally haven't seen a problem here, but I can't
outrule any. I never noticed any waits when opening images.

When approaching this from a theoretical approach, it becomes clear that
there shouldn't be any problems here. If using the default of
overlap-check=cached, only information available anyway is used to built
up the metadata list (the same information that is iterated through for
every overlap check in the current implementation). Since building the
list simply means setting a bitmask in a bitmap and then transforming
that bitmap to a list (which has been shown not pose any performance
issues by the second benchmark), there should not be any problem here.

So, the only caveat I do see is increased code complexity. I initially
thought it might be too complex for its purpose; but after having done
the benchmarks, it became apparent to me that there is a problem with
the current implementation and that this series does fix it.

And RAM usage may pose a problem.


RAM usage
=========

So I have looked at my 2 GB image above, and the list uses 40 kB, which
may or may not be too much (sounds completely fine to me for an image
with 512 byte clusters); but it is a least a number I can use for
testing the following theoretical inspection:


So, once again, we assume the worst. Every metadata structure needs its
own entry in the list - actually, the worst would be that every cluster
needs its own entry, but that only makes a difference for L1 tables and
the refcount table, so we can ignore that. In fact, we can completely
ignore those tables; it makes things much easier.

Let's name the cluster size "CS", and name the image size in bytes "IS".

So, for a refcount width of 16 bits per entry, we can describe CS / 2
clusters per refblock; which means we need (IS / CS) / (CS / 2) =
2 * IS / (CS * CS) refblocks. Let's be generous and say that the L2
tables are capable of describing the complete image size (which in
practice they do not because the guest disk size is smaller than the
physical size). This assumption also has the neat side-effect of not
having to care about snapshots. If we have a lot of internal snapshots
with copied L2 tables, IS grows and therefore the next formula knows
that the number of L2 tables grows as well. Nice. So, as every L2 table
can describe CS / 8 (guest) clusters, we need 8 * IS / (CS * CS) L2
tables.

Therefore, ignoring the image header, L1 tables, the reftable and the
snapshot table, we have about the following amount of metadata clusters
per image (with 16 bit refcount entries):

(2 + 8) * IS / (CS * CS) = 10 * IS / (CS * CS)

We said that for every metadata cluster, we would need one entry in the
list. Every list entry takes 32 bits (4 bytes). Thus, we need
approximately up to

40 * IS / (CS * CS)

bytes for the metadata list (please ignore units and append "bytes" at
the resulting number...).

Let's test that for the above image, which has a disk size of 266 MB:

40 * 266M / (512 * 512) = 42 kB

Great! It works.


So, now let's set CS to 64 kB, because that is basically the only
cluster size we really care about. For a 1 TB image, we need 10 kB for
the list. Sounds great to me. For a 1 PB image, we will need 10 MB. Fair
enough. (Note that you don't need 10 MB of RAM to create a 1 PB image;
you only need that once the disk size of the image has reached 1 PB).

And 1 TB with 512 byte clusters? 160 MB. Urgh, that is a lot. But then
again, you can switch off the overlap check with overlap-check=off; and
trying to actually use a 1 TB image with 512 byte clusters is crazy in
itself (have you tried just creating one without preallocation? It takes
forever). So I can live with that.


tl;dr
=====

* CPU usage at runtime decreased by 150 to 275 percent on
  overlap-check-heavy tasks
* No additional performance problems at loading time (in theory has the
  same runtime complexity as a single overlap check right now; in
  practice I could not find any problems)
* Decent RAM usage (40 kB for a 1 TB image with 64 kB clusters; 40 MB
  for a 1 PB image etc. pp.)



As of this version, this series depends on
"[PATCH v6] qcow2: Buffer L1 table in snapshot refcount update".


v2:
- Patch 1:
  - Fixed a mistake regarding the value of
    Qcow2MetadataFragment::relative_start; this value should be relative
    to the window start, not relative to the end of the last fragment
    (destroy_window_bitmap() wrote the latter there in v1)
  - Use uint64_t for the window index everywhere
- Patch 8: Said "qcow2: Buffer L1 table in snapshot refcount update"
  removes l1_allocated in qcow2_update_snapshot_refcount() and basically
  replaces it by active_l1 (where active_l1 = !l1_allocated). In v1, the
  condition it was used in was actually wrong, this is fixed here, too
  (the active L2 cluster type should be removed from the list if we are
  working on the active L1 table, not if we are working on an inactive
  L1 table).


git-backport-diff against v2:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/12:[0010] [FC] 'qcow2: Add new overlap check functions'
002/12:[----] [--] 'qcow2: Pull up overlap check option evaluation'
003/12:[----] [--] 'qcow2: Create metadata list'
004/12:[----] [--] 'qcow2/overlaps: Protect image header'
005/12:[----] [--] 'qcow2/overlaps: Protect refcount table'
006/12:[----] [--] 'qcow2/overlaps: Protect refcount blocks'
007/12:[----] [--] 'qcow2/overlaps: Protect active L1 table'
008/12:[0002] [FC] 'qcow2/overlaps: Protect active L2 tables'
009/12:[----] [--] 'qcow2/overlaps: Protect snapshot table'
010/12:[----] [--] 'qcow2/overlaps: Protect inactive L1 tables'
011/12:[----] [-C] 'qcow2/overlaps: Protect inactive L2 tables'
012/12:[----] [--] 'qcow2: Use new metadata overlap check function'


Max Reitz (12):
  qcow2: Add new overlap check functions
  qcow2: Pull up overlap check option evaluation
  qcow2: Create metadata list
  qcow2/overlaps: Protect image header
  qcow2/overlaps: Protect refcount table
  qcow2/overlaps: Protect refcount blocks
  qcow2/overlaps: Protect active L1 table
  qcow2/overlaps: Protect active L2 tables
  qcow2/overlaps: Protect snapshot table
  qcow2/overlaps: Protect inactive L1 tables
  qcow2/overlaps: Protect inactive L2 tables
  qcow2: Use new metadata overlap check function

 block/Makefile.objs    |   3 +-
 block/qcow2-cluster.c  |  13 ++
 block/qcow2-overlap.c  | 400 +++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2-refcount.c | 202 ++++++++++---------------
 block/qcow2-snapshot.c | 105 ++++++++++++-
 block/qcow2.c          | 130 ++++++++++------
 block/qcow2.h          |  13 ++
 7 files changed, 694 insertions(+), 172 deletions(-)
 create mode 100644 block/qcow2-overlap.c

-- 
1.9.3

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 01/12] qcow2: Add new overlap check functions
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
@ 2014-11-24 15:56 ` Max Reitz
  2014-11-25 11:02   ` Max Reitz
                     ` (2 more replies)
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 02/12] qcow2: Pull up overlap check option evaluation Max Reitz
                   ` (16 subsequent siblings)
  17 siblings, 3 replies; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

The existing qcow2 metadata overlap detection function used existing
structures to determine the location of the image metadata, from plain
fields such as l1_table_offset and l1_size in the BDRVQcowState, over
image structures in memory such as the L1 table for the L2 tables'
positions, or it even read the required data directly from disk for
every requested check, such as the snapshot L1 tables for the inactive
L2 tables' positions.

These new functions instead keep a dedicated structure for keeping track
of the metadata positions in memory. It consists of two parts: First,
there is one structure which is basically a list of all metadata
structures. Each entry has a bitmask of types (because some metadata
structures may actually overlap, such as active and inactive L2 tables),
a number of clusters occupied and the offset from the previous entry in
clusters. This structure requires relatively few memory, but checking a
certain point may take relatively long. Each entry is called a
"fragment".

Therefore, there is another representation which is a bitmap, or rather
a bytemap, of metadata types. The previously described list is split
into multiple windows with each describing a constant number of clusters
(WINDOW_SIZE). If the list is to be queried or changed, the respective
window is selected in constant time and the bitmap is generated from the
fragments belonging to the window. This bitmap can then be queried in
constant time and easily be changed.

Because the bitmap representation requires more memory, it is only used
as a cache. Whenever a window is removed from the cache, the fragment
list will be rebuilt from the bitmap if the latter has been modified.
Therefore, the fragment list is only used as the background
representation to save memory, whereas the bitmap is used whenever
possible.

Regarding the size of the fragment list in memory: As one refcount block
can handle cluster_size / 2 entries and one L2 table can handle
cluster_size / 8 entries, for a qcow2 image with the standard cluster
size of 64 kB, there is a ratio of data to metadata of about 1/6000
(1/32768 refblocks and 1/8192 L2 tables) if one ignores the fact that
not every cluster requires an L2 table entry. The refcount table and the
L1 table is generally negligible. At the worst, each metadata cluster
requires its own entry in the fragment list; each entry takes up four
bytes, therefore, at the worst, the fragment list should take up (for an
image with 64 kB clusters) (4 B) / (64 kB * 6000) of the image size,
which is about 1.e-8 (i.e., 11 kB for a 1 TB image, or 11 MB for a 1 PB
image).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/Makefile.objs   |   3 +-
 block/qcow2-overlap.c | 404 ++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h         |  13 ++
 3 files changed, 419 insertions(+), 1 deletion(-)
 create mode 100644 block/qcow2-overlap.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 04b0e43..8d9c73d 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,6 @@
 block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
-block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
+block-obj-y += qcow2-cache.o qcow2-overlap.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
diff --git a/block/qcow2-overlap.c b/block/qcow2-overlap.c
new file mode 100644
index 0000000..9f3f2aa
--- /dev/null
+++ b/block/qcow2-overlap.c
@@ -0,0 +1,404 @@
+/*
+ * QCOW2 runtime metadata overlap detection
+ *
+ * Copyright (c) 2014 Max Reitz <mreitz@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block/block_int.h"
+#include "qemu-common.h"
+#include "qemu/range.h"
+#include "qcow2.h"
+
+/* Number of clusters which are covered by each metadata window;
+ * note that this may not exceed 2^16 as long as
+ * Qcow2MetadataFragment::relative_start is a uint16_t */
+#define WINDOW_SIZE 4096
+
+/* Describes a fragment of a or a whole metadata range; does not necessarily
+ * describe the whole range because it needs to be split on window boundaries */
+typedef struct Qcow2MetadataFragment {
+    /* Bitmask of QCow2MetadataOverlap values */
+    uint8_t types;
+    uint8_t nb_clusters;
+    /* Number of clusters between the start of the window and this range */
+    uint16_t relative_start;
+} QEMU_PACKED Qcow2MetadataFragment;
+
+typedef struct Qcow2MetadataWindow {
+    Qcow2MetadataFragment *fragments;
+    int nb_fragments, fragments_array_size;
+
+    /* If not NULL, this is an expanded version of the "RLE" version given by
+     * the fragments array; there are WINDOW_SIZE entries */
+    uint8_t *bitmap;
+    bool bitmap_modified;
+
+    /* Time of last access */
+    unsigned age;
+
+    /* Index in Qcow2MetadataList::cached_windows */
+    int cached_windows_index;
+} Qcow2MetadataWindow;
+
+struct Qcow2MetadataList {
+    Qcow2MetadataWindow *windows;
+    uint64_t nb_windows;
+
+    unsigned current_age;
+
+    /* Index into the windows array */
+    int *cached_windows;
+    size_t nb_cached_windows;
+};
+
+/**
+ * Destroys the cached window bitmap. If it has been modified, the fragment list
+ * will be rebuilt accordingly.
+ */
+static void destroy_window_bitmap(Qcow2MetadataList *mdl,
+                                  Qcow2MetadataWindow *window)
+{
+    if (!window->bitmap) {
+        return;
+    }
+
+    if (window->bitmap_modified) {
+        int bitmap_i, fragment_i = 0;
+        QCow2MetadataOverlap current_types = 0;
+        int current_nb_clusters = 0;
+
+        /* Rebuild the fragment list; the case bitmap_i == WINDOW_SIZE is for
+         * entering the last fragment at the bitmap end */
+
+        for (bitmap_i = 0; bitmap_i <= WINDOW_SIZE; bitmap_i++) {
+            /* Qcow2MetadataFragment::nb_clusters is a uint8_t, so
+             * current_nb_clusters may not exceed 255 */
+            if (bitmap_i < WINDOW_SIZE &&
+                current_types == window->bitmap[bitmap_i] &&
+                current_nb_clusters < 255)
+            {
+                current_nb_clusters++;
+            } else {
+                if (current_types && current_nb_clusters) {
+                    if (fragment_i >= window->fragments_array_size) {
+                        window->fragments_array_size =
+                            3 * window->fragments_array_size / 2 + 1;
+
+                        /* new_nb_fragments should be small enough, and there is
+                         * nothing we can do on failure anyway, so do not use
+                         * g_try_renew() here */
+                        window->fragments =
+                            g_renew(Qcow2MetadataFragment, window->fragments,
+                                    window->fragments_array_size);
+                    }
+
+                    window->fragments[fragment_i++] = (Qcow2MetadataFragment){
+                        .types          = current_types,
+                        .nb_clusters    = current_nb_clusters,
+                        .relative_start = bitmap_i - current_nb_clusters,
+                    };
+                }
+
+                current_nb_clusters = 0;
+                if (bitmap_i < WINDOW_SIZE) {
+                    current_types = window->bitmap[bitmap_i];
+                }
+            }
+        }
+
+        window->nb_fragments = fragment_i;
+    }
+
+    g_free(window->bitmap);
+    window->bitmap = NULL;
+}
+
+/**
+ * Creates a bitmap from the fragment list.
+ */
+static void build_window_bitmap(Qcow2MetadataList *mdl,
+                                Qcow2MetadataWindow *window)
+{
+    int cache_i, oldest_cache_i = -1, i;
+    unsigned oldest_cache_age = 0;
+
+    for (cache_i = 0; cache_i < mdl->nb_cached_windows; cache_i++) {
+        unsigned age;
+
+        if (mdl->cached_windows[cache_i] < 0) {
+            break;
+        }
+
+        age = mdl->current_age - mdl->windows[mdl->cached_windows[cache_i]].age;
+        if (age > oldest_cache_age) {
+            oldest_cache_age = age;
+            oldest_cache_i = cache_i;
+        }
+    }
+
+    if (cache_i >= mdl->nb_cached_windows) {
+        destroy_window_bitmap(mdl,
+            &mdl->windows[mdl->cached_windows[oldest_cache_i]]);
+        cache_i = oldest_cache_i;
+    }
+
+    assert(cache_i >= 0);
+    mdl->cached_windows[cache_i] = window - mdl->windows;
+    window->cached_windows_index = cache_i;
+
+    window->age = mdl->current_age++;
+
+    window->bitmap = g_new0(uint8_t, WINDOW_SIZE);
+
+    for (i = 0; i < window->nb_fragments; i++) {
+        Qcow2MetadataFragment *fragment = &window->fragments[i];
+
+        memset(&window->bitmap[fragment->relative_start], fragment->types,
+               fragment->nb_clusters);
+    }
+
+    window->bitmap_modified = false;
+}
+
+/**
+ * Enters a new range into the metadata list.
+ */
+void qcow2_metadata_list_enter(BlockDriverState *bs, uint64_t offset,
+                               int nb_clusters, QCow2MetadataOverlap types)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t start_cluster = offset >> s->cluster_bits;
+    uint64_t end_cluster = start_cluster + nb_clusters;
+    uint64_t current_cluster = start_cluster;
+
+    types &= s->overlap_check;
+    if (!types) {
+        return;
+    }
+
+    if (offset_into_cluster(s, offset)) {
+        /* Do not enter apparently broken metadata ranges */
+        return;
+    }
+
+    while (current_cluster < end_cluster) {
+        int bitmap_i;
+        int bitmap_i_start = current_cluster % WINDOW_SIZE;
+        int bitmap_i_end = MIN(WINDOW_SIZE,
+                               end_cluster - current_cluster + bitmap_i_start);
+        uint64_t window_i = current_cluster / WINDOW_SIZE;
+        Qcow2MetadataWindow *window;
+
+        if (window_i >= s->metadata_list->nb_windows) {
+            /* This should not be happening too often, so it is fine to resize
+             * the array to exactly the required size */
+            Qcow2MetadataWindow *new_windows;
+
+            new_windows = g_try_renew(Qcow2MetadataWindow,
+                                      s->metadata_list->windows,
+                                      window_i + 1);
+            if (!new_windows) {
+                return;
+            }
+
+            memset(new_windows + s->metadata_list->nb_windows, 0,
+                   (window_i + 1 - s->metadata_list->nb_windows) *
+                   sizeof(Qcow2MetadataWindow));
+
+            s->metadata_list->windows = new_windows;
+            s->metadata_list->nb_windows = window_i + 1;
+        }
+
+        window = &s->metadata_list->windows[window_i];
+        if (!window->bitmap) {
+            build_window_bitmap(s->metadata_list, window);
+        }
+
+        for (bitmap_i = bitmap_i_start; bitmap_i < bitmap_i_end; bitmap_i++) {
+            window->bitmap[bitmap_i] |= types;
+        }
+
+        window->age = s->metadata_list->current_age++;
+        window->bitmap_modified = true;
+
+        /* Go to the next window */
+        current_cluster += WINDOW_SIZE - bitmap_i_start;
+    }
+}
+
+/**
+ * Removes a range of the given types from the metadata list.
+ */
+void qcow2_metadata_list_remove(BlockDriverState *bs, uint64_t offset,
+                                int nb_clusters, QCow2MetadataOverlap types)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t start_cluster = offset >> s->cluster_bits;
+    uint64_t end_cluster = start_cluster + nb_clusters;
+    uint64_t current_cluster = start_cluster;
+
+    types &= s->overlap_check;
+    if (!types) {
+        return;
+    }
+
+    if (offset_into_cluster(s, offset)) {
+        /* Try to remove even broken metadata ranges */
+        end_cluster++;
+    }
+
+    while (current_cluster < end_cluster) {
+        int bitmap_i;
+        int bitmap_i_start = current_cluster % WINDOW_SIZE;
+        int bitmap_i_end = MIN(WINDOW_SIZE,
+                               end_cluster - current_cluster + bitmap_i_start);
+        uint64_t window_i = current_cluster / WINDOW_SIZE;
+        Qcow2MetadataWindow *window;
+
+        /* If the list is too small, there is no metadata structure here;
+         * because window_i will only grow, we can abort here */
+        if (window_i >= s->metadata_list->nb_windows) {
+            return;
+        }
+
+        window = &s->metadata_list->windows[window_i];
+        if (!window->bitmap) {
+            build_window_bitmap(s->metadata_list, window);
+        }
+
+        for (bitmap_i = bitmap_i_start; bitmap_i < bitmap_i_end; bitmap_i++) {
+            window->bitmap[bitmap_i] &= ~types;
+        }
+
+        window->age = s->metadata_list->current_age++;
+        window->bitmap_modified = true;
+
+        /* Go to the next window */
+        current_cluster += WINDOW_SIZE - bitmap_i_start;
+    }
+}
+
+static int single_check_metadata_overlap(Qcow2MetadataList *mdl, int ign,
+                                         uint64_t cluster)
+{
+    uint64_t window_i = cluster / WINDOW_SIZE;
+    int bitmap_i = cluster % WINDOW_SIZE;
+    Qcow2MetadataWindow *window;
+
+    if (window_i >= mdl->nb_windows) {
+        return 0;
+    }
+    window = &mdl->windows[window_i];
+
+    if (!window->bitmap) {
+        build_window_bitmap(mdl, window);
+    }
+
+    window->age = mdl->current_age++;
+
+    return window->bitmap[bitmap_i] & ~ign;
+}
+
+/* This will replace qcow2_check_metadata_overlap() */
+static int check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
+                                  int64_t size) __attribute__((used));
+
+static int check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
+                                  int64_t size)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t start_cluster = offset >> s->cluster_bits;
+    uint64_t end_cluster = DIV_ROUND_UP(offset + size, s->cluster_size);
+    uint64_t current_cluster;
+    int ret = 0;
+
+    for (current_cluster = start_cluster; current_cluster < end_cluster;
+         current_cluster++)
+    {
+        ret |= single_check_metadata_overlap(s->metadata_list, ign,
+                                             current_cluster);
+    }
+
+    return ret;
+}
+
+int qcow2_create_empty_metadata_list(BlockDriverState *bs, size_t cache_size,
+                                     Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret;
+    size_t cache_entries, i;
+
+    s->metadata_list = g_new0(Qcow2MetadataList, 1);
+    s->metadata_list->nb_windows = bs->total_sectors / s->cluster_sectors
+                                                     / WINDOW_SIZE;
+    s->metadata_list->windows = g_try_new0(Qcow2MetadataWindow,
+                                           s->metadata_list->nb_windows);
+    if (s->metadata_list->nb_windows && !s->metadata_list->windows) {
+        error_setg(errp, "Could not allocate metadata overlap check windows");
+        ret = -ENOMEM;
+        goto fail;
+    }
+
+    cache_entries = cache_size
+                  / (WINDOW_SIZE + sizeof(*s->metadata_list->cached_windows));
+    if (!cache_entries) {
+        cache_entries = 1;
+    }
+
+    s->metadata_list->nb_cached_windows = cache_entries;
+    s->metadata_list->cached_windows = g_try_new(int, cache_entries);
+    if (!s->metadata_list->cached_windows) {
+        error_setg(errp, "Could not allocate metadata overlap cache pointers");
+        ret = -ENOMEM;
+        goto fail;
+    }
+    for (i = 0; i < s->metadata_list->nb_cached_windows; i++) {
+        s->metadata_list->cached_windows[i] = -1;
+    }
+
+    return 0;
+
+fail:
+    qcow2_metadata_list_destroy(bs);
+    return ret;
+}
+
+void qcow2_metadata_list_destroy(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t i;
+
+    if (s->metadata_list && s->metadata_list->windows) {
+        for (i = 0; i < s->metadata_list->nb_windows; i++) {
+            Qcow2MetadataWindow *window = &s->metadata_list->windows[i];
+
+            destroy_window_bitmap(s->metadata_list, window);
+            g_free(window->fragments);
+        }
+
+        g_free(s->metadata_list->cached_windows);
+        g_free(s->metadata_list->windows);
+    }
+
+    g_free(s->metadata_list);
+    s->metadata_list = NULL;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 6e39a1b..8dd5a0a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -159,6 +159,9 @@ typedef struct QCowSnapshot {
 struct Qcow2Cache;
 typedef struct Qcow2Cache Qcow2Cache;
 
+struct Qcow2MetadataList;
+typedef struct Qcow2MetadataList Qcow2MetadataList;
+
 typedef struct Qcow2UnknownHeaderExtension {
     uint32_t magic;
     uint32_t len;
@@ -261,6 +264,7 @@ typedef struct BDRVQcowState {
 
     bool discard_passthrough[QCOW2_DISCARD_MAX];
 
+    Qcow2MetadataList *metadata_list;
     int overlap_check; /* bitmask of Qcow2MetadataOverlap values */
     bool signaled_corruption;
 
@@ -576,4 +580,13 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
     void **table);
 int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
 
+/* qcow2-overlap.c functions */
+int qcow2_create_empty_metadata_list(BlockDriverState *bs, size_t cache_size,
+                                     Error **errp);
+void qcow2_metadata_list_destroy(BlockDriverState *bs);
+void qcow2_metadata_list_enter(BlockDriverState *bs, uint64_t offset,
+                               int nb_clusters, QCow2MetadataOverlap type);
+void qcow2_metadata_list_remove(BlockDriverState *bs, uint64_t offset,
+                                int nb_clusters, QCow2MetadataOverlap type);
+
 #endif
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 02/12] qcow2: Pull up overlap check option evaluation
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 01/12] " Max Reitz
@ 2014-11-24 15:56 ` Max Reitz
  2015-02-03 23:33   ` Eric Blake
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 03/12] qcow2: Create metadata list Max Reitz
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Pull up the absorption of the qcow2-relevant command line options and
the evaluation of the overlap check options in qcow2_open().

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 96 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d120494..ed88d69 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -696,6 +696,54 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         bs->encrypted = 1;
     }
 
+    opts = qemu_opts_create(&qcow2_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    opt_overlap_check = qemu_opt_get(opts, QCOW2_OPT_OVERLAP);
+    opt_overlap_check_template = qemu_opt_get(opts, QCOW2_OPT_OVERLAP_TEMPLATE);
+    if (opt_overlap_check_template && opt_overlap_check &&
+        strcmp(opt_overlap_check_template, opt_overlap_check))
+    {
+        error_setg(errp, "Conflicting values for qcow2 options '"
+                   QCOW2_OPT_OVERLAP "' ('%s') and '" QCOW2_OPT_OVERLAP_TEMPLATE
+                   "' ('%s')", opt_overlap_check, opt_overlap_check_template);
+        ret = -EINVAL;
+        goto fail;
+    }
+    if (!opt_overlap_check) {
+        opt_overlap_check = opt_overlap_check_template ?: "cached";
+    }
+
+    if (!strcmp(opt_overlap_check, "none")) {
+        overlap_check_template = 0;
+    } else if (!strcmp(opt_overlap_check, "constant")) {
+        overlap_check_template = QCOW2_OL_CONSTANT;
+    } else if (!strcmp(opt_overlap_check, "cached")) {
+        overlap_check_template = QCOW2_OL_CACHED;
+    } else if (!strcmp(opt_overlap_check, "all")) {
+        overlap_check_template = QCOW2_OL_ALL;
+    } else {
+        error_setg(errp, "Unsupported value '%s' for qcow2 option "
+                   "'overlap-check'. Allowed are either of the following: "
+                   "none, constant, cached, all", opt_overlap_check);
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    s->overlap_check = 0;
+    for (i = 0; i < QCOW2_OL_MAX_BITNR; i++) {
+        /* overlap-check defines a template bitmask, but every flag may be
+         * overwritten through the associated boolean option */
+        s->overlap_check |=
+            qemu_opt_get_bool(opts, overlap_bool_option_names[i],
+                              overlap_check_template & (1 << i)) << i;
+    }
+
     s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
     s->l2_size = 1 << s->l2_bits;
     /* 2^(s->refcount_order - 3) is the refcount width in bytes */
@@ -791,14 +839,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* get L2 table/refcount block cache size from command line options */
-    opts = qemu_opts_create(&qcow2_runtime_opts, NULL, 0, &error_abort);
-    qemu_opts_absorb_qdict(opts, options, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        ret = -EINVAL;
-        goto fail;
-    }
-
     read_cache_sizes(opts, &l2_cache_size, &refcount_cache_size, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
@@ -931,46 +971,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     s->discard_passthrough[QCOW2_DISCARD_OTHER] =
         qemu_opt_get_bool(opts, QCOW2_OPT_DISCARD_OTHER, false);
 
-    opt_overlap_check = qemu_opt_get(opts, QCOW2_OPT_OVERLAP);
-    opt_overlap_check_template = qemu_opt_get(opts, QCOW2_OPT_OVERLAP_TEMPLATE);
-    if (opt_overlap_check_template && opt_overlap_check &&
-        strcmp(opt_overlap_check_template, opt_overlap_check))
-    {
-        error_setg(errp, "Conflicting values for qcow2 options '"
-                   QCOW2_OPT_OVERLAP "' ('%s') and '" QCOW2_OPT_OVERLAP_TEMPLATE
-                   "' ('%s')", opt_overlap_check, opt_overlap_check_template);
-        ret = -EINVAL;
-        goto fail;
-    }
-    if (!opt_overlap_check) {
-        opt_overlap_check = opt_overlap_check_template ?: "cached";
-    }
-
-    if (!strcmp(opt_overlap_check, "none")) {
-        overlap_check_template = 0;
-    } else if (!strcmp(opt_overlap_check, "constant")) {
-        overlap_check_template = QCOW2_OL_CONSTANT;
-    } else if (!strcmp(opt_overlap_check, "cached")) {
-        overlap_check_template = QCOW2_OL_CACHED;
-    } else if (!strcmp(opt_overlap_check, "all")) {
-        overlap_check_template = QCOW2_OL_ALL;
-    } else {
-        error_setg(errp, "Unsupported value '%s' for qcow2 option "
-                   "'overlap-check'. Allowed are either of the following: "
-                   "none, constant, cached, all", opt_overlap_check);
-        ret = -EINVAL;
-        goto fail;
-    }
-
-    s->overlap_check = 0;
-    for (i = 0; i < QCOW2_OL_MAX_BITNR; i++) {
-        /* overlap-check defines a template bitmask, but every flag may be
-         * overwritten through the associated boolean option */
-        s->overlap_check |=
-            qemu_opt_get_bool(opts, overlap_bool_option_names[i],
-                              overlap_check_template & (1 << i)) << i;
-    }
-
     qemu_opts_del(opts);
     opts = NULL;
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 03/12] qcow2: Create metadata list
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 01/12] " Max Reitz
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 02/12] qcow2: Pull up overlap check option evaluation Max Reitz
@ 2014-11-24 15:56 ` Max Reitz
  2015-02-03 23:34   ` Eric Blake
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 04/12] qcow2/overlaps: Protect image header Max Reitz
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Create and destroy the metadata list on creation and destruction of a
qcow2 BDS, respectively. Skip creation if no overlap checks should be
performed.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index ed88d69..f80f9ed 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -744,6 +744,13 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
                               overlap_check_template & (1 << i)) << i;
     }
 
+    if (s->overlap_check) {
+        ret = qcow2_create_empty_metadata_list(bs, 65536, errp);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
     s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
     s->l2_size = 1 << s->l2_bits;
     /* 2^(s->refcount_order - 3) is the refcount width in bytes */
@@ -1006,6 +1013,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     }
     g_free(s->cluster_cache);
     qemu_vfree(s->cluster_data);
+    qcow2_metadata_list_destroy(bs);
     return ret;
 }
 
@@ -1444,6 +1452,8 @@ static void qcow2_close(BlockDriverState *bs)
     qemu_vfree(s->cluster_data);
     qcow2_refcount_close(bs);
     qcow2_free_snapshots(bs);
+
+    qcow2_metadata_list_destroy(bs);
 }
 
 static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 04/12] qcow2/overlaps: Protect image header
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (2 preceding siblings ...)
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 03/12] qcow2: Create metadata list Max Reitz
@ 2014-11-24 15:56 ` Max Reitz
  2015-02-03 23:47   ` Eric Blake
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 05/12] qcow2/overlaps: Protect refcount table Max Reitz
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Enter the image header into the metadata list to protect it against
accidental modifications.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index f80f9ed..19ac2df 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -751,6 +751,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         }
     }
 
+    qcow2_metadata_list_enter(bs, 0, 1, QCOW2_OL_MAIN_HEADER);
+
     s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
     s->l2_size = 1 << s->l2_bits;
     /* 2^(s->refcount_order - 3) is the refcount width in bytes */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 05/12] qcow2/overlaps: Protect refcount table
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (3 preceding siblings ...)
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 04/12] qcow2/overlaps: Protect image header Max Reitz
@ 2014-11-24 15:56 ` Max Reitz
  2015-02-03 23:55   ` Eric Blake
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 06/12] qcow2/overlaps: Protect refcount blocks Max Reitz
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Keep track of the refcount table in the metadata list to protect it
against accidental modifications.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 18 ++++++++++++++++++
 block/qcow2.c          |  4 ++++
 2 files changed, 22 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index c0c4a50..f438d20 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -433,6 +433,14 @@ static int alloc_refcount_block(BlockDriverState *bs,
     s->refcount_table_size = table_size;
     s->refcount_table_offset = table_offset;
 
+    qcow2_metadata_list_remove(bs, old_table_offset,
+                               size_to_clusters(s, old_table_size *
+                                                   sizeof(uint64_t)),
+                               QCOW2_OL_REFCOUNT_TABLE);
+
+    qcow2_metadata_list_enter(bs, table_offset, table_clusters,
+                              QCOW2_OL_REFCOUNT_TABLE);
+
     /* Free old table. */
     qcow2_free_clusters(bs, old_table_offset, old_table_size * sizeof(uint64_t),
                         QCOW2_DISCARD_OTHER);
@@ -1946,6 +1954,16 @@ write_refblocks:
         goto fail;
     }
 
+    qcow2_metadata_list_remove(bs, s->refcount_table_offset,
+                               size_to_clusters(s, s->refcount_table_size
+                                                   * sizeof(uint64_t)),
+                               QCOW2_OL_REFCOUNT_TABLE);
+
+    qcow2_metadata_list_enter(bs, reftable_offset,
+                              size_to_clusters(s, reftable_size *
+                                                  sizeof(uint64_t)),
+                              QCOW2_OL_REFCOUNT_TABLE);
+
     for (refblock_index = 0; refblock_index < reftable_size; refblock_index++) {
         be64_to_cpus(&on_disk_reftable[refblock_index]);
     }
diff --git a/block/qcow2.c b/block/qcow2.c
index 19ac2df..1644421 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -779,6 +779,10 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         error_setg(errp, "Invalid reference count table offset");
         goto fail;
     }
+    qcow2_metadata_list_enter(bs, s->refcount_table_offset,
+                              size_to_clusters(s, s->refcount_table_size *
+                                                  sizeof(uint64_t)),
+                              QCOW2_OL_REFCOUNT_TABLE);
 
     /* Snapshot table offset/length */
     if (header.nb_snapshots > QCOW_MAX_SNAPSHOTS) {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 06/12] qcow2/overlaps: Protect refcount blocks
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (4 preceding siblings ...)
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 05/12] qcow2/overlaps: Protect refcount table Max Reitz
@ 2014-11-24 15:56 ` Max Reitz
  2015-02-05  1:24   ` Eric Blake
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 07/12] qcow2/overlaps: Protect active L1 table Max Reitz
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Keep track of the refcount blocks in the metadata list to protect them
against accidental modifications.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index f438d20..b2116a6 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -57,8 +57,16 @@ int qcow2_refcount_init(BlockDriverState *bs)
         if (ret < 0) {
             goto fail;
         }
-        for(i = 0; i < s->refcount_table_size; i++)
+        for (i = 0; i < s->refcount_table_size; i++) {
+            uint64_t refblock_offset;
+
             be64_to_cpus(&s->refcount_table[i]);
+            refblock_offset = s->refcount_table[i] & REFT_OFFSET_MASK;
+            if (refblock_offset) {
+                qcow2_metadata_list_enter(bs, refblock_offset, 1,
+                                          QCOW2_OL_REFCOUNT_BLOCK);
+            }
+        }
     }
     return 0;
  fail:
@@ -302,6 +310,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
         }
 
         s->refcount_table[refcount_table_index] = new_block;
+        qcow2_metadata_list_enter(bs, new_block, 1, QCOW2_OL_REFCOUNT_BLOCK);
 
         /* The new refcount block may be where the caller intended to put its
          * data, so let it restart the search. */
@@ -363,6 +372,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     uint64_t table_offset = meta_offset + blocks_clusters * s->cluster_size;
     uint64_t *new_table = g_try_new0(uint64_t, table_size);
     uint16_t *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
+    uint64_t block_index;
 
     assert(table_size > 0 && blocks_clusters > 0);
     if (new_table == NULL || new_blocks == NULL) {
@@ -441,6 +451,12 @@ static int alloc_refcount_block(BlockDriverState *bs,
     qcow2_metadata_list_enter(bs, table_offset, table_clusters,
                               QCOW2_OL_REFCOUNT_TABLE);
 
+    for (block_index = 0; block_index < blocks_clusters; block_index++) {
+        qcow2_metadata_list_enter(bs, meta_offset +
+                                  (block_index << s->cluster_bits), 1,
+                                  QCOW2_OL_REFCOUNT_BLOCK);
+    }
+
     /* Free old table. */
     qcow2_free_clusters(bs, old_table_offset, old_table_size * sizeof(uint64_t),
                         QCOW2_DISCARD_OTHER);
@@ -1959,14 +1975,34 @@ write_refblocks:
                                                    * sizeof(uint64_t)),
                                QCOW2_OL_REFCOUNT_TABLE);
 
+    for (refblock_index = 0; refblock_index < s->refcount_table_size;
+         refblock_index++)
+    {
+        uint64_t refblock_offset = s->refcount_table[refblock_index] &
+                                   REFT_OFFSET_MASK;
+        if (refblock_offset) {
+            qcow2_metadata_list_remove(bs, refblock_offset, 1,
+                                       QCOW2_OL_REFCOUNT_BLOCK);
+        }
+    }
+
     qcow2_metadata_list_enter(bs, reftable_offset,
                               size_to_clusters(s, reftable_size *
                                                   sizeof(uint64_t)),
                               QCOW2_OL_REFCOUNT_TABLE);
 
     for (refblock_index = 0; refblock_index < reftable_size; refblock_index++) {
+        uint64_t refblock_offset;
+
         be64_to_cpus(&on_disk_reftable[refblock_index]);
+
+        refblock_offset = on_disk_reftable[refblock_index] & REFT_OFFSET_MASK;
+        if (refblock_offset) {
+            qcow2_metadata_list_enter(bs, refblock_offset, 1,
+                                      QCOW2_OL_REFCOUNT_BLOCK);
+        }
     }
+
     s->refcount_table = on_disk_reftable;
     s->refcount_table_offset = reftable_offset;
     s->refcount_table_size = reftable_size;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 07/12] qcow2/overlaps: Protect active L1 table
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (5 preceding siblings ...)
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 06/12] qcow2/overlaps: Protect refcount blocks Max Reitz
@ 2014-11-24 15:56 ` Max Reitz
  2015-02-05  1:25   ` Eric Blake
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 08/12] qcow2/overlaps: Protect active L2 tables Max Reitz
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Keep track of the active L1 table in the metadata list to protect it
against accidental modifications.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cluster.c  | 11 +++++++++++
 block/qcow2-snapshot.c | 10 ++++++++++
 block/qcow2.c          |  4 ++++
 3 files changed, 25 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index df0b2c9..131e5b2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -125,6 +125,17 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
     s->l1_table = new_l1_table;
     old_l1_size = s->l1_size;
     s->l1_size = new_l1_size;
+
+    qcow2_metadata_list_remove(bs, old_l1_table_offset,
+                               size_to_clusters(s, old_l1_size *
+                                                   sizeof(uint64_t)),
+                               QCOW2_OL_ACTIVE_L1);
+
+    qcow2_metadata_list_enter(bs, s->l1_table_offset,
+                              size_to_clusters(s, s->l1_size *
+                                                  sizeof(uint64_t)),
+                              QCOW2_OL_ACTIVE_L1);
+
     qcow2_free_clusters(bs, old_l1_table_offset, old_l1_size * sizeof(uint64_t),
                         QCOW2_DISCARD_OTHER);
     return 0;
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 5b3903c..41ea053 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -720,6 +720,11 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
         return ret;
     }
 
+    qcow2_metadata_list_remove(bs, s->l1_table_offset,
+                               size_to_clusters(s, s->l1_size *
+                                                   sizeof(uint64_t)),
+                               QCOW2_OL_ACTIVE_L1);
+
     /* Switch the L1 table */
     qemu_vfree(s->l1_table);
 
@@ -731,5 +736,10 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
         be64_to_cpus(&s->l1_table[i]);
     }
 
+    qcow2_metadata_list_enter(bs, s->l1_table_offset,
+                              size_to_clusters(s, s->l1_size *
+                                                  sizeof(uint64_t)),
+                              QCOW2_OL_ACTIVE_L1);
+
     return 0;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index 1644421..775cb39 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -831,6 +831,10 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     }
     s->l1_table_offset = header.l1_table_offset;
 
+    qcow2_metadata_list_enter(bs, s->l1_table_offset,
+                              size_to_clusters(s, s->l1_size *
+                                                  sizeof(uint64_t)),
+                              QCOW2_OL_ACTIVE_L1);
 
     if (s->l1_size > 0) {
         s->l1_table = qemu_try_blockalign(bs->file,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 08/12] qcow2/overlaps: Protect active L2 tables
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (6 preceding siblings ...)
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 07/12] qcow2/overlaps: Protect active L1 table Max Reitz
@ 2014-11-24 15:56 ` Max Reitz
  2015-02-05 15:27   ` Eric Blake
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 09/12] qcow2/overlaps: Protect snapshot table Max Reitz
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Keep track of the active L2 tables in the metadata list to protect them
against accidental modifications.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cluster.c  |  2 ++
 block/qcow2-refcount.c |  6 ++++++
 block/qcow2-snapshot.c | 21 +++++++++++++++++++++
 block/qcow2.c          |  8 +++++++-
 4 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 131e5b2..b749941 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -288,6 +288,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
         goto fail;
     }
 
+    qcow2_metadata_list_enter(bs, l2_offset, 1, QCOW2_OL_ACTIVE_L2);
+
     *table = l2_table;
     trace_qcow2_l2_allocate_done(bs, l1_index, 0);
     return 0;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b2116a6..94fe515 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1040,6 +1040,12 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
             if (addend != 0) {
                 refcount = qcow2_update_cluster_refcount(bs, l2_offset >>
                         s->cluster_bits, addend, QCOW2_DISCARD_SNAPSHOT);
+                if (addend < 0) {
+                    if (active_l1) {
+                        qcow2_metadata_list_remove(bs, l2_offset, 1,
+                                                   QCOW2_OL_ACTIVE_L2);
+                    }
+                }
             } else {
                 refcount = qcow2_get_refcount(bs, l2_offset >> s->cluster_bits);
             }
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 41ea053..d83a939 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -561,6 +561,13 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
     g_free(sn_l1_table);
     sn_l1_table = NULL;
 
+    for (i = 0; i < s->l1_size; i++) {
+        uint64_t l2_offset = s->l1_table[i] & L1E_OFFSET_MASK;
+        if (l2_offset) {
+            qcow2_metadata_list_enter(bs, l2_offset, 1, QCOW2_OL_ACTIVE_L2);
+        }
+    }
+
     /*
      * Update QCOW_OFLAG_COPIED in the active L1 table (it may have changed
      * when we decreased the refcount of the old snapshot.
@@ -725,6 +732,13 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
                                                    sizeof(uint64_t)),
                                QCOW2_OL_ACTIVE_L1);
 
+    for (i = 0; i < s->l1_size; i++) {
+        uint64_t l2_offset = s->l1_table[i] & L1E_OFFSET_MASK;
+        if (l2_offset) {
+            qcow2_metadata_list_remove(bs, l2_offset, 1, QCOW2_OL_ACTIVE_L2);
+        }
+    }
+
     /* Switch the L1 table */
     qemu_vfree(s->l1_table);
 
@@ -741,5 +755,12 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
                                                   sizeof(uint64_t)),
                               QCOW2_OL_ACTIVE_L1);
 
+    for (i = 0; i < s->l1_size; i++) {
+        uint64_t l2_offset = s->l1_table[i] & L1E_OFFSET_MASK;
+        if (l2_offset) {
+            qcow2_metadata_list_enter(bs, l2_offset, 1, QCOW2_OL_ACTIVE_L2);
+        }
+    }
+
     return 0;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index 775cb39..d0950fc 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -850,8 +850,14 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
             error_setg_errno(errp, -ret, "Could not read L1 table");
             goto fail;
         }
-        for(i = 0;i < s->l1_size; i++) {
+        for (i = 0; i < s->l1_size; i++) {
+            uint64_t l2_offset;
+
             be64_to_cpus(&s->l1_table[i]);
+            l2_offset = s->l1_table[i] & L1E_OFFSET_MASK;
+            if (l2_offset) {
+                qcow2_metadata_list_enter(bs, l2_offset, 1, QCOW2_OL_ACTIVE_L2);
+            }
         }
     }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 09/12] qcow2/overlaps: Protect snapshot table
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (7 preceding siblings ...)
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 08/12] qcow2/overlaps: Protect active L2 tables Max Reitz
@ 2014-11-24 15:56 ` Max Reitz
  2015-02-05 15:29   ` Eric Blake
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 10/12] qcow2/overlaps: Protect inactive L1 tables Max Reitz
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Keep track of the snapshot table in the metadata list to protect it
against accidental modifications.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-snapshot.c | 10 ++++++++++
 block/qcow2.c          |  6 ++++++
 2 files changed, 16 insertions(+)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index d83a939..c32d889 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -262,8 +262,18 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
     /* free the old snapshot table */
     qcow2_free_clusters(bs, s->snapshots_offset, s->snapshots_size,
                         QCOW2_DISCARD_SNAPSHOT);
+
+    qcow2_metadata_list_remove(bs, s->snapshots_offset,
+                               size_to_clusters(s, s->snapshots_size),
+                               QCOW2_OL_SNAPSHOT_TABLE);
+
     s->snapshots_offset = snapshots_offset;
     s->snapshots_size = snapshots_size;
+
+    qcow2_metadata_list_enter(bs, s->snapshots_offset,
+                              size_to_clusters(s, s->snapshots_size),
+                              QCOW2_OL_SNAPSHOT_TABLE);
+
     return 0;
 
 fail:
diff --git a/block/qcow2.c b/block/qcow2.c
index d0950fc..468090d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -798,6 +798,12 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         error_setg(errp, "Invalid snapshot table offset");
         goto fail;
     }
+    if (header.nb_snapshots) {
+        qcow2_metadata_list_enter(bs, header.snapshots_offset,
+                                  size_to_clusters(s, header.nb_snapshots *
+                                                   sizeof(QCowSnapshotHeader)),
+                                  QCOW2_OL_SNAPSHOT_TABLE);
+    }
 
     /* read the level 1 table */
     if (header.l1_size > QCOW_MAX_L1_SIZE) {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 10/12] qcow2/overlaps: Protect inactive L1 tables
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (8 preceding siblings ...)
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 09/12] qcow2/overlaps: Protect snapshot table Max Reitz
@ 2014-11-24 15:56 ` Max Reitz
  2015-02-05 19:41   ` Eric Blake
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 11/12] qcow2/overlaps: Protect inactive L2 tables Max Reitz
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Keep track of the inactive L1 tables in the metadata list to protect
them against accidental modifications.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-snapshot.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index c32d889..b3122d8 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -121,6 +121,21 @@ int qcow2_read_snapshots(BlockDriverState *bs)
             ret = -EFBIG;
             goto fail;
         }
+
+        if (!(s->overlap_check & QCOW2_OL_INACTIVE_L1)) {
+            continue;
+        }
+
+        if (sn->l1_size > INT_MAX / sizeof(uint64_t)) {
+            /* Do not fail opening the image because a snapshot is broken which
+             * might not be used anyway */
+            continue;
+        }
+
+        qcow2_metadata_list_enter(bs, sn->l1_table_offset,
+                                  size_to_clusters(s, sn->l1_size *
+                                                      sizeof(uint64_t)),
+                                  QCOW2_OL_INACTIVE_L1);
     }
 
     assert(offset - s->snapshots_offset <= INT_MAX);
@@ -416,6 +431,11 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
     g_free(l1_table);
     l1_table = NULL;
 
+    qcow2_metadata_list_enter(bs, sn->l1_table_offset,
+                              size_to_clusters(s, sn->l1_size *
+                                                  sizeof(uint64_t)),
+                              QCOW2_OL_INACTIVE_L1);
+
     /*
      * Increase the refcounts of all clusters and make sure everything is
      * stable on disk before updating the snapshot table to contain a pointer
@@ -636,6 +656,11 @@ int qcow2_snapshot_delete(BlockDriverState *bs,
     g_free(sn.id_str);
     g_free(sn.name);
 
+    qcow2_metadata_list_remove(bs, sn.l1_table_offset,
+                               size_to_clusters(s, sn.l1_size *
+                                                   sizeof(uint64_t)),
+                               QCOW2_OL_INACTIVE_L1);
+
     /*
      * Now decrease the refcounts of clusters referenced by the snapshot and
      * free the L1 table.
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 11/12] qcow2/overlaps: Protect inactive L2 tables
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (9 preceding siblings ...)
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 10/12] qcow2/overlaps: Protect inactive L1 tables Max Reitz
@ 2014-11-24 15:56 ` Max Reitz
  2015-01-21 15:23   ` Stefan Hajnoczi
  2014-11-24 15:57 ` [Qemu-devel] [PATCH v2 12/12] qcow2: Use new metadata overlap check function Max Reitz
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Keep track of the inactive L2 tables in the metadata list to protect
them against accidental modifications.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 20 ++++++++++++++++++++
 block/qcow2-snapshot.c | 41 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 94fe515..c7096b7 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1042,8 +1042,28 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                         s->cluster_bits, addend, QCOW2_DISCARD_SNAPSHOT);
                 if (addend < 0) {
                     if (active_l1) {
+                        /* This is easy */
                         qcow2_metadata_list_remove(bs, l2_offset, 1,
                                                    QCOW2_OL_ACTIVE_L2);
+                    } else {
+                        /* If refcount == 0, this is, too. If refcount > 1, we
+                         * know that there must be some other inactive L2
+                         * reference; and for refcount == 1, if this is an
+                         * active L2 table, this was the last inactive L2
+                         * reference. */
+                        bool remove;
+                        if (refcount == 0) {
+                            remove = true;
+                        } else if (refcount == 1) {
+                            remove = qcow2_check_metadata_overlap(bs,
+                                ~QCOW2_OL_ACTIVE_L2, l2_offset,s->cluster_size);
+                        } else {
+                            remove = false;
+                        }
+                        if (remove) {
+                            qcow2_metadata_list_remove(bs, l2_offset, 1,
+                                                       QCOW2_OL_INACTIVE_L2);
+                        }
                     }
                 }
             } else {
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index b3122d8..800c7d3 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -46,9 +46,10 @@ int qcow2_read_snapshots(BlockDriverState *bs)
     QCowSnapshotHeader h;
     QCowSnapshotExtraData extra;
     QCowSnapshot *sn;
-    int i, id_str_size, name_size;
+    int i, j, id_str_size, name_size;
     int64_t offset;
     uint32_t extra_data_size;
+    uint64_t *l1_table;
     int ret;
 
     if (!s->nb_snapshots) {
@@ -122,7 +123,8 @@ int qcow2_read_snapshots(BlockDriverState *bs)
             goto fail;
         }
 
-        if (!(s->overlap_check & QCOW2_OL_INACTIVE_L1)) {
+        if (!(s->overlap_check & (QCOW2_OL_INACTIVE_L1 | QCOW2_OL_INACTIVE_L2)))
+        {
             continue;
         }
 
@@ -136,6 +138,34 @@ int qcow2_read_snapshots(BlockDriverState *bs)
                                   size_to_clusters(s, sn->l1_size *
                                                       sizeof(uint64_t)),
                                   QCOW2_OL_INACTIVE_L1);
+
+        if (!(s->overlap_check & QCOW2_OL_INACTIVE_L2)) {
+            continue;
+        }
+
+        l1_table = qemu_try_blockalign(bs->file,
+                                       sn->l1_size * sizeof(uint64_t));
+        if (!l1_table) {
+            /* Do not fail opening the image just because a snapshot's L2 tables
+             * cannot be covered by the overlap checks */
+            continue;
+        }
+
+        ret = bdrv_pread(bs->file, sn->l1_table_offset, l1_table,
+                         sn->l1_size * sizeof(uint64_t));
+        if (ret < 0) {
+            qemu_vfree(l1_table);
+            continue;
+        }
+        for (j = 0; j < sn->l1_size; j++) {
+            uint64_t l2_offset = be64_to_cpu(l1_table[j]) & L1E_OFFSET_MASK;
+            if (l2_offset) {
+                qcow2_metadata_list_enter(bs, l2_offset, 1,
+                                          QCOW2_OL_INACTIVE_L2);
+            }
+        }
+
+        qemu_vfree(l1_table);
     }
 
     assert(offset - s->snapshots_offset <= INT_MAX);
@@ -436,6 +466,13 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
                                                   sizeof(uint64_t)),
                               QCOW2_OL_INACTIVE_L1);
 
+    for (i = 0; i < s->l1_size; i++) {
+        uint64_t l2_offset = s->l1_table[i] & L1E_OFFSET_MASK;
+        if (l2_offset) {
+            qcow2_metadata_list_enter(bs, l2_offset, 1, QCOW2_OL_INACTIVE_L2);
+        }
+    }
+
     /*
      * Increase the refcounts of all clusters and make sure everything is
      * stable on disk before updating the snapshot table to contain a pointer
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 12/12] qcow2: Use new metadata overlap check function
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (10 preceding siblings ...)
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 11/12] qcow2/overlaps: Protect inactive L2 tables Max Reitz
@ 2014-11-24 15:57 ` Max Reitz
  2015-01-21 15:28   ` Stefan Hajnoczi
  2014-11-24 16:56 ` [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Eric Blake
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-24 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Make the static new overlap check function global and drop the old
function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-overlap.c  |   8 +---
 block/qcow2-refcount.c | 120 -------------------------------------------------
 2 files changed, 2 insertions(+), 126 deletions(-)

diff --git a/block/qcow2-overlap.c b/block/qcow2-overlap.c
index 9f3f2aa..a06e16b 100644
--- a/block/qcow2-overlap.c
+++ b/block/qcow2-overlap.c
@@ -317,12 +317,8 @@ static int single_check_metadata_overlap(Qcow2MetadataList *mdl, int ign,
     return window->bitmap[bitmap_i] & ~ign;
 }
 
-/* This will replace qcow2_check_metadata_overlap() */
-static int check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
-                                  int64_t size) __attribute__((used));
-
-static int check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
-                                  int64_t size)
+int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
+                                 int64_t size)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t start_cluster = offset >> s->cluster_bits;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index c7096b7..a0f745e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2166,126 +2166,6 @@ fail:
     return ret;
 }
 
-#define overlaps_with(ofs, sz) \
-    ranges_overlap(offset, size, ofs, sz)
-
-/*
- * Checks if the given offset into the image file is actually free to use by
- * looking for overlaps with important metadata sections (L1/L2 tables etc.),
- * i.e. a sanity check without relying on the refcount tables.
- *
- * The ign parameter specifies what checks not to perform (being a bitmask of
- * QCow2MetadataOverlap values), i.e., what sections to ignore.
- *
- * Returns:
- * - 0 if writing to this offset will not affect the mentioned metadata
- * - a positive QCow2MetadataOverlap value indicating one overlapping section
- * - a negative value (-errno) indicating an error while performing a check,
- *   e.g. when bdrv_read failed on QCOW2_OL_INACTIVE_L2
- */
-int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
-                                 int64_t size)
-{
-    BDRVQcowState *s = bs->opaque;
-    int chk = s->overlap_check & ~ign;
-    int i, j;
-
-    if (!size) {
-        return 0;
-    }
-
-    if (chk & QCOW2_OL_MAIN_HEADER) {
-        if (offset < s->cluster_size) {
-            return QCOW2_OL_MAIN_HEADER;
-        }
-    }
-
-    /* align range to test to cluster boundaries */
-    size = align_offset(offset_into_cluster(s, offset) + size, s->cluster_size);
-    offset = start_of_cluster(s, offset);
-
-    if ((chk & QCOW2_OL_ACTIVE_L1) && s->l1_size) {
-        if (overlaps_with(s->l1_table_offset, s->l1_size * sizeof(uint64_t))) {
-            return QCOW2_OL_ACTIVE_L1;
-        }
-    }
-
-    if ((chk & QCOW2_OL_REFCOUNT_TABLE) && s->refcount_table_size) {
-        if (overlaps_with(s->refcount_table_offset,
-            s->refcount_table_size * sizeof(uint64_t))) {
-            return QCOW2_OL_REFCOUNT_TABLE;
-        }
-    }
-
-    if ((chk & QCOW2_OL_SNAPSHOT_TABLE) && s->snapshots_size) {
-        if (overlaps_with(s->snapshots_offset, s->snapshots_size)) {
-            return QCOW2_OL_SNAPSHOT_TABLE;
-        }
-    }
-
-    if ((chk & QCOW2_OL_INACTIVE_L1) && s->snapshots) {
-        for (i = 0; i < s->nb_snapshots; i++) {
-            if (s->snapshots[i].l1_size &&
-                overlaps_with(s->snapshots[i].l1_table_offset,
-                s->snapshots[i].l1_size * sizeof(uint64_t))) {
-                return QCOW2_OL_INACTIVE_L1;
-            }
-        }
-    }
-
-    if ((chk & QCOW2_OL_ACTIVE_L2) && s->l1_table) {
-        for (i = 0; i < s->l1_size; i++) {
-            if ((s->l1_table[i] & L1E_OFFSET_MASK) &&
-                overlaps_with(s->l1_table[i] & L1E_OFFSET_MASK,
-                s->cluster_size)) {
-                return QCOW2_OL_ACTIVE_L2;
-            }
-        }
-    }
-
-    if ((chk & QCOW2_OL_REFCOUNT_BLOCK) && s->refcount_table) {
-        for (i = 0; i < s->refcount_table_size; i++) {
-            if ((s->refcount_table[i] & REFT_OFFSET_MASK) &&
-                overlaps_with(s->refcount_table[i] & REFT_OFFSET_MASK,
-                s->cluster_size)) {
-                return QCOW2_OL_REFCOUNT_BLOCK;
-            }
-        }
-    }
-
-    if ((chk & QCOW2_OL_INACTIVE_L2) && s->snapshots) {
-        for (i = 0; i < s->nb_snapshots; i++) {
-            uint64_t l1_ofs = s->snapshots[i].l1_table_offset;
-            uint32_t l1_sz  = s->snapshots[i].l1_size;
-            uint64_t l1_sz2 = l1_sz * sizeof(uint64_t);
-            uint64_t *l1 = g_try_malloc(l1_sz2);
-            int ret;
-
-            if (l1_sz2 && l1 == NULL) {
-                return -ENOMEM;
-            }
-
-            ret = bdrv_pread(bs->file, l1_ofs, l1, l1_sz2);
-            if (ret < 0) {
-                g_free(l1);
-                return ret;
-            }
-
-            for (j = 0; j < l1_sz; j++) {
-                uint64_t l2_ofs = be64_to_cpu(l1[j]) & L1E_OFFSET_MASK;
-                if (l2_ofs && overlaps_with(l2_ofs, s->cluster_size)) {
-                    g_free(l1);
-                    return QCOW2_OL_INACTIVE_L2;
-                }
-            }
-
-            g_free(l1);
-        }
-    }
-
-    return 0;
-}
-
 static const char *metadata_ol_names[] = {
     [QCOW2_OL_MAIN_HEADER_BITNR]    = "qcow2_header",
     [QCOW2_OL_ACTIVE_L1_BITNR]      = "active L1 table",
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (11 preceding siblings ...)
  2014-11-24 15:57 ` [Qemu-devel] [PATCH v2 12/12] qcow2: Use new metadata overlap check function Max Reitz
@ 2014-11-24 16:56 ` Eric Blake
  2014-11-24 17:35 ` Max Reitz
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2014-11-24 16:56 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2441 bytes --]

On 11/24/2014 08:56 AM, Max Reitz wrote:
> As has been requested, this series adds new overlap check functions to
> the qcow2 code. My local branch is called "qcow2-improved-overlap-v1",
> but I am not so sure whether it is actually an improvement; that is left
> for you to decide, dear reviewers.
> 

> 
> The problem with this series is obviously that all the metadata
> information is read when reading a qcow2 image. iotest 044 is a good
> test for this. I personally haven't seen a problem here, but I can't
> outrule any. I never noticed any waits when opening images.

I have another reason for WANTING to read the qcow2 metadata up front
when opening an image.  Right now, the 'query-blockstats' QMP command
reports a wr_highest_offset field (I don't know if it has any decent use
besides for qcow2 images on raw block devices, but it was part of the
design long before we had image-specific stats).  However, this field is
ALWAYS reported as 0 for backing images in a chain when qemu first
boots, because the current code base does not update the field UNTIL the
first allocating write to a file (whether that is a data write, or
whether it is something metadata-only such as creating and destroying a
temporary internal snapshot).  When doing a block commit operation, the
field becomes useful for predicting how fast the backing qcow2 file is
growing as part of the commit operation - but since the field starts
life at 0 instead of the real size of the file, it leads to some very
awkward startup code.  If we parse all qcow2 metadata up front when
opening a file, then we trivially have the correct wr_highest_offset
instead of 0, even for a read-only image.


> 
> tl;dr
> =====
> 
> * CPU usage at runtime decreased by 150 to 275 percent on
>   overlap-check-heavy tasks
> * No additional performance problems at loading time (in theory has the
>   same runtime complexity as a single overlap check right now; in
>   practice I could not find any problems)
> * Decent RAM usage (40 kB for a 1 TB image with 64 kB clusters; 40 MB
>   for a 1 PB image etc. pp.)
> 

Sounds reasonable to me.  Although I'm not sure if it counts as a bug
fix, so I'm not sure if we should try to rush it into 2.2 (yes,
performance CAN be a bug, but it is late to be adding complex code).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (12 preceding siblings ...)
  2014-11-24 16:56 ` [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Eric Blake
@ 2014-11-24 17:35 ` Max Reitz
  2014-11-25 10:55 ` Max Reitz
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2014-11-24 17:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-24 at 16:56, Max Reitz wrote:
> tl;dr
> =====
>
> * CPU usage at runtime decreased by 150 to 275 percent on
>    overlap-check-heavy tasks

Oops,  that should read "15000 to 27500 percent" or "150 to 275 times".

Max

> * No additional performance problems at loading time (in theory has the
>    same runtime complexity as a single overlap check right now; in
>    practice I could not find any problems)
> * Decent RAM usage (40 kB for a 1 TB image with 64 kB clusters; 40 MB
>    for a 1 PB image etc. pp.)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (13 preceding siblings ...)
  2014-11-24 17:35 ` Max Reitz
@ 2014-11-25 10:55 ` Max Reitz
  2014-11-25 13:25 ` Stefan Hajnoczi
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2014-11-25 10:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-24 at 16:56, Max Reitz wrote:
> RAM usage
> =========
>
> So I have looked at my 2 GB image above, and the list uses 40 kB, which
> may or may not be too much (sounds completely fine to me for an image
> with 512 byte clusters); but it is a least a number I can use for
> testing the following theoretical inspection:

(I'm in the process of some kind of self-review right now)

Wrong, it's 371 kB after patch 1 is fixed.

[snip]

> Let's test that for the above image, which has a disk size of 266 MB:

Except the disk size doesn't matter; the image was created with 
preallocation=metadata, therefore all the metadata for a 2 GB virtual 
image is there. Let's check the file length: 2190559232, two percent 
above 2 GB. Sounds reasonable.

For that file length, we actually have:

40 * 2190559232 / (512 * 512) = 326 kB

Hm, okay, so it doesn't work so well. The good message is: I know why. 
In the calculation given here, I omitted the size of 
Qcow2MetadataWindow; for every WINDOW_SIZE (= WS) clusters, there is one 
such object. Let's include it in the calculation 
(sizeof(Qcow2MetadataWindow) is 40 on my x64 system):

40 * IS / (CS * CS) + 40 * IS / (CS * WS)
= 40 * IS / CS * (1 / CS + 1 / WS)

Okay, something else I forgot? There is the Qcow2MetadataList object 
itself; but we have it only once, so let's omit that. Then there is an 
integer array with an entry per cache entry and the cache itself; 
qcow2_create_empty_metadata_list() limits the cache size so that it that 
integer array and the cached bitmaps will not surpass the given byte 
size (currently 64 kB), so I'll just omit it as well (it's constant and 
can easily be adjusted).

So, with the above term we have:

40 * 2190559232 / 512 * (1 / 512 + 1 / 4096) = 367 kB

Much better.

> 40 * 266M / (512 * 512) = 42 kB
>
> Great! It works.
>
>
> So, now let's set CS to 64 kB, because that is basically the only
> cluster size we really care about. For a 1 TB image, we need 10 kB for
> the list. Sounds great to me. For a 1 PB image, we will need 10 MB. Fair
> enough. (Note that you don't need 10 MB of RAM to create a 1 PB image;
> you only need that once the disk size of the image has reached 1 PB).
>
> And 1 TB with 512 byte clusters? 160 MB. Urgh, that is a lot. But then
> again, you can switch off the overlap check with overlap-check=off; and
> trying to actually use a 1 TB image with 512 byte clusters is crazy in
> itself (have you tried just creating one without preallocation? It takes
> forever). So I can live with that.

And with the fixed term:

1 TB / 64 kB: 170 kB
1 PB / 64 kB: 170 MB

1 TB / 512 B: 180 MB

The actually "problematic" 512 B cluster version actually doesn't get so 
much worse (because 1 / 4096 < 1 / 512; whereas 1 / 4096 > 1 / 65536, 
which is why fixing the term has a much higher impact on greater cluster 
sizes).

But for the default of 64 kB, the size basically explodes. We now can 
either choose to ignore that fact (17x is a lot, but using more than 1 
MB starting from 6 TB still sounds fine to me) or increase WINDOW_SIZE 
(to a maximum of 65536, which would reduce the RAM usage to 20 kB for a 
1 TB image and 20 MB for a 1 PB image), which would probably somewhat 
limit performance in the conversion case, but since I haven't seen any 
issues for WINDOW_SIZE = 4096, I don't think it should make a huge 
difference. But as a side effect, we will want to increase the cache 
size, because with the current default of 64 kB, we will have only one 
cached bitmap; but we probably want to have at least two, maybe four if 
possible. But 256 kB does not sound too bad either.

> tl;dr
> =====
>
> * CPU usage at runtime decreased by 150 to 275 percent on
>    overlap-check-heavy tasks
> * No additional performance problems at loading time (in theory has the
>    same runtime complexity as a single overlap check right now; in
>    practice I could not find any problems)
> * Decent RAM usage (40 kB for a 1 TB image with 64 kB clusters; 40 MB
>    for a 1 PB image etc. pp.)

I'm not sure why I wrote 40 kB and 40 MB here; it was 10 kB and 10 MB.

Anyway, now it's 170 kB for a 1 TB image and 170 MB for a 1 PB image.

Max

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/12] qcow2: Add new overlap check functions
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 01/12] " Max Reitz
@ 2014-11-25 11:02   ` Max Reitz
  2015-01-21 16:53   ` Stefan Hajnoczi
  2015-02-03 23:08   ` Eric Blake
  2 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2014-11-25 11:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-24 at 16:56, Max Reitz wrote:
> The existing qcow2 metadata overlap detection function used existing
> structures to determine the location of the image metadata, from plain
> fields such as l1_table_offset and l1_size in the BDRVQcowState, over
> image structures in memory such as the L1 table for the L2 tables'
> positions, or it even read the required data directly from disk for
> every requested check, such as the snapshot L1 tables for the inactive
> L2 tables' positions.
>
> These new functions instead keep a dedicated structure for keeping track
> of the metadata positions in memory. It consists of two parts: First,
> there is one structure which is basically a list of all metadata
> structures. Each entry has a bitmask of types (because some metadata
> structures may actually overlap, such as active and inactive L2 tables),
> a number of clusters occupied and the offset from the previous entry in
> clusters. This structure requires relatively few memory, but checking a
> certain point may take relatively long. Each entry is called a
> "fragment".
>
> Therefore, there is another representation which is a bitmap, or rather
> a bytemap, of metadata types. The previously described list is split
> into multiple windows with each describing a constant number of clusters
> (WINDOW_SIZE). If the list is to be queried or changed, the respective
> window is selected in constant time and the bitmap is generated from the
> fragments belonging to the window. This bitmap can then be queried in
> constant time and easily be changed.
>
> Because the bitmap representation requires more memory, it is only used
> as a cache. Whenever a window is removed from the cache, the fragment
> list will be rebuilt from the bitmap if the latter has been modified.
> Therefore, the fragment list is only used as the background
> representation to save memory, whereas the bitmap is used whenever
> possible.
>
> Regarding the size of the fragment list in memory: As one refcount block
> can handle cluster_size / 2 entries and one L2 table can handle
> cluster_size / 8 entries, for a qcow2 image with the standard cluster
> size of 64 kB, there is a ratio of data to metadata of about 1/6000
> (1/32768 refblocks and 1/8192 L2 tables) if one ignores the fact that
> not every cluster requires an L2 table entry. The refcount table and the
> L1 table is generally negligible. At the worst, each metadata cluster
> requires its own entry in the fragment list; each entry takes up four
> bytes, therefore, at the worst, the fragment list should take up (for an
> image with 64 kB clusters) (4 B) / (64 kB * 6000) of the image size,
> which is about 1.e-8 (i.e., 11 kB for a 1 TB image, or 11 MB for a 1 PB
> image).
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/Makefile.objs   |   3 +-
>   block/qcow2-overlap.c | 404 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   block/qcow2.h         |  13 ++
>   3 files changed, 419 insertions(+), 1 deletion(-)
>   create mode 100644 block/qcow2-overlap.c
>
> diff --git a/block/qcow2-overlap.c b/block/qcow2-overlap.c
> new file mode 100644
> index 0000000..9f3f2aa
> --- /dev/null
> +++ b/block/qcow2-overlap.c

[snip]

> +
> +/**
> + * Destroys the cached window bitmap. If it has been modified, the fragment list
> + * will be rebuilt accordingly.
> + */
> +static void destroy_window_bitmap(Qcow2MetadataList *mdl,
> +                                  Qcow2MetadataWindow *window)
> +{
> +    if (!window->bitmap) {
> +        return;
> +    }
> +
> +    if (window->bitmap_modified) {
> +        int bitmap_i, fragment_i = 0;
> +        QCow2MetadataOverlap current_types = 0;
> +        int current_nb_clusters = 0;
> +
> +        /* Rebuild the fragment list; the case bitmap_i == WINDOW_SIZE is for
> +         * entering the last fragment at the bitmap end */
> +
> +        for (bitmap_i = 0; bitmap_i <= WINDOW_SIZE; bitmap_i++) {
> +            /* Qcow2MetadataFragment::nb_clusters is a uint8_t, so
> +             * current_nb_clusters may not exceed 255 */
> +            if (bitmap_i < WINDOW_SIZE &&
> +                current_types == window->bitmap[bitmap_i] &&
> +                current_nb_clusters < 255)
> +            {
> +                current_nb_clusters++;
> +            } else {
> +                if (current_types && current_nb_clusters) {
> +                    if (fragment_i >= window->fragments_array_size) {
> +                        window->fragments_array_size =
> +                            3 * window->fragments_array_size / 2 + 1;
> +
> +                        /* new_nb_fragments should be small enough, and there is
> +                         * nothing we can do on failure anyway, so do not use
> +                         * g_try_renew() here */
> +                        window->fragments =
> +                            g_renew(Qcow2MetadataFragment, window->fragments,
> +                                    window->fragments_array_size);
> +                    }
> +
> +                    window->fragments[fragment_i++] = (Qcow2MetadataFragment){
> +                        .types          = current_types,
> +                        .nb_clusters    = current_nb_clusters,
> +                        .relative_start = bitmap_i - current_nb_clusters,
> +                    };
> +                }
> +
> +                current_nb_clusters = 0;

Because I don't want to write a new version every time I find a single 
bug myself, I'll just reply here, so you know which things I know about.

This should be 1, and not 0. At this point, we have found the first 
cluster of this fragment, which is why this should be 1.

Max

> +                if (bitmap_i < WINDOW_SIZE) {
> +                    current_types = window->bitmap[bitmap_i];
> +                }
> +            }
> +        }
> +
> +        window->nb_fragments = fragment_i;
> +    }
> +
> +    g_free(window->bitmap);
> +    window->bitmap = NULL;
> +}

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (14 preceding siblings ...)
  2014-11-25 10:55 ` Max Reitz
@ 2014-11-25 13:25 ` Stefan Hajnoczi
  2014-12-15  9:43 ` Max Reitz
  2015-01-20 22:48 ` Max Reitz
  17 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2014-11-25 13:25 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Peter Lieven, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 632 bytes --]

On Mon, Nov 24, 2014 at 04:56:48PM +0100, Max Reitz wrote:
> * CPU usage at runtime decreased by 150 to 275 percent on
>   overlap-check-heavy tasks
> * No additional performance problems at loading time (in theory has the
>   same runtime complexity as a single overlap check right now; in
>   practice I could not find any problems)
> * Decent RAM usage (40 kB for a 1 TB image with 64 kB clusters; 40 MB
>   for a 1 PB image etc. pp.)

RAM usage seems fine and the CPU reduction is very nice.

I will review the individual patches for QEMU 2.3.  If there is demand
it can be added to the QEMU 2.2 -stable branch at a later date.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (15 preceding siblings ...)
  2014-11-25 13:25 ` Stefan Hajnoczi
@ 2014-12-15  9:43 ` Max Reitz
  2015-01-20 22:48 ` Max Reitz
  17 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2014-12-15  9:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-24 at 16:56, Max Reitz wrote:
> As has been requested, this series adds new overlap check functions to
> the qcow2 code. My local branch is called "qcow2-improved-overlap-v1",
> but I am not so sure whether it is actually an improvement; that is left
> for you to decide, dear reviewers.
>
> See patch 1 for an explanation of why this series exists and what it
> does. Patch 1 is basically the core of this series, the rest just
> employs the functions introduced there.
>
> In a later patch, we may want to change the meaning of the "constant"
> overlap checking option to mean the same as "cached", which is
> everything except for inactive L2 tables. This series does make
> checking for overlaps with inactive L2 tables at runtime just as cheap
> as everything else (constant time plus caching), but using these checks
> means qemu has to read all the snapshot L1 tables when opening a qcow2
> file. This does not take long, of course, but it does result in a bit of
> overhead so I did not want to enable it by default.
>
> I think just enabling all overlap checks by default after this series
> should be fine and useful, though.
>
>
> Benchmarks
> ==========
>
> First, let me repeat the results I wrote as a reply to the cover letter
> of v1:
>
> I had a 1 GB qcow2 image in tmpfs (sorry, I "only" have 4 GB of RAM in
> this laptop, and a 2 GB image will kill it) which I accessed from a
> guest from inside qemu (Arch Linux, to be exact, because its image just
> dumps you into a console and that's all I need). I ran
>
> $ for i in $(seq 1 42); do \
>      dd if=/dev/zero of=/dev/vda bs=65536 \
>      done
>
> because it does not matter what data is written to the image, I just
> need to write to a lot of clusters fast to maximize pressure on the
> overlap check.
>
> The results were 13.5/10.5/12.41 % of CPU usage (recorded using perf) in
> qcow2_check_metadata_overlap() with the current implementation and
> 0.08/0.09 % with this series applied (0.08 % with overlap-check=cached,
> 0.09 % with overlap-check=all).
>
> Now as a table, just to have one here (and because it is useful when
> skipping through this text):
>
> Current implementation:     12.14 (± 1.519) %  (n = 3)
> With this series applied:    0.08           %  (n = 1)
> overlap-check=all:           0.09           %  (n = 1)
>
>
> Now I did some other benchmarks.
>
> The first of those is just running (on the host, obviously):
>
> $ perf record -ag \
>    qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=512 \
>    /tmp/test.qcow2 2G
>
> I ran ten rounds with the current implementation, ten rounds with these
> patches applied and five rounds with cache_size set to 0 in
> qcow2_create_empty_metadata_list() (which results in only one bitmap
> existing at one time, and therefore a lot of conversions between the
> bitmap and the list representation).
>
> The current implementation had a CPU usage (fraction of cycles) in
> qcow2_check_metadata_overlap() of 47.65 (±4.24) %. There was one rather
> extreme outlier (36.31 %), with that removed it is 48.91 (±1.54) %.
>
> With this series applied, the usage dropped to 0.149 (± 0.028) %.
> Additionally, qcow2_metadata_list_enter() took 0.046 (± 0.021) %. Both
> together took thus 0.195 (± 0.040) %.
>
> With cache_size set to 0, the usage was 0.130 % (± 0.012 %) in
> qcow2_check_metadata_overlap(); 0.056 % (± 0.011 %) in
> qcow2_metadata_list_enter(); and 0.186 % (± 0.021 %). It dropped, but
> still in range of standard deviation. Thus, heavy conversion between
> bitmap and list conversion (in normal use cases due to random accesses)
> should not be a problem at all.
>
> As a table:
>
> Current implementation:     48.91 (± 1.537) %  (n =  9)
> With this series applied:   0.195 (± 0.040) %  (n = 10)
> Only one bitmap cached:     0.186 (± 0.021) %  (n =  5)
>
> And as a really noticeable consequence: Before this series, creating
> the image like that took 16.624 s (15.97 s user). With this series, it
> takes 4.502 s (3.94 s user).
>
> Because statistics are fun, I collected the number of metadata list
> operations for the second and the third tests:
>
> Second test (default of 15 bitmaps cached):
>
> List to bitmap conversions: 1053
> Bitmap to list conversions: 1038
> Bitmaps deallocated:        1038
>
> Insert operations:         82266
> Remove operations:            14
> Queries:                  312265
>
> Third test (only one bitmap cached):
>
> List to bitmap conversions: 135943
> Bitmap to list conversions:  66546
> Bitmaps deallocated:        135942
>
> Insert operations:           82266
> Remove operations:              14
> Queries:                    312265
>
> As expected, the number of conversions is much higher. Interestingly, in
> the first case, every bitmap deallocation also meant a bitmap to list
> conversion; that means, every bitmap was modified at least once while it
> was cached. In contrast to that, in the second case, only every second
> deallocation resulted in a conversion, which means that half of the
> cached bitmaps were only read (which is bad considering all was done was
> allocating metadata structures).
>
> But for performance, there is no real information here (we only see that
> setting cache_size to 0 actually did increase the number of
> conversions).
>
>
> The second benchmark I ran today was a program which opened /dev/vda in
> qemu (again on Arch) with O_DIRECT | O_SYNC | O_RDWR. It then wrote an
> uninitialized 512 byte buffer to random places (512-byte-aligned) to a
> 1 GB image freshly created before VM launch with metadata preallocation
> and 512 byte clusters. The intention was to break bitmap caching and
> having to convert back and forth between list and bitmap representation.
> The results are as follows:
>
> Current implementation:     18.73 (± 0.157) %  (n = 10)
> With this series applied:   0.068 (± 0.009) %  (n = 10)
> Only one bitmap cached:     0.062 (± 0.008) %  (n =  5)
>
>
> Runtime conclusion
> ------------------
>
> As can be seen from the benchmarks, runtime CPU usage by the metadata
> overlap checks is greatly decreased by this series (to 1/150 in the
> first benchmark, to 1/250 in the second, and to 1/275 in the third).
>
>
> Where is the caveat?
> --------------------
>
> The problem with this series is obviously that all the metadata
> information is read when reading a qcow2 image. iotest 044 is a good
> test for this. I personally haven't seen a problem here, but I can't
> outrule any. I never noticed any waits when opening images.
>
> When approaching this from a theoretical approach, it becomes clear that
> there shouldn't be any problems here. If using the default of
> overlap-check=cached, only information available anyway is used to built
> up the metadata list (the same information that is iterated through for
> every overlap check in the current implementation). Since building the
> list simply means setting a bitmask in a bitmap and then transforming
> that bitmap to a list (which has been shown not pose any performance
> issues by the second benchmark), there should not be any problem here.
>
> So, the only caveat I do see is increased code complexity. I initially
> thought it might be too complex for its purpose; but after having done
> the benchmarks, it became apparent to me that there is a problem with
> the current implementation and that this series does fix it.
>
> And RAM usage may pose a problem.
>
>
> RAM usage
> =========
>
> So I have looked at my 2 GB image above, and the list uses 40 kB, which
> may or may not be too much (sounds completely fine to me for an image
> with 512 byte clusters); but it is a least a number I can use for
> testing the following theoretical inspection:
>
>
> So, once again, we assume the worst. Every metadata structure needs its
> own entry in the list - actually, the worst would be that every cluster
> needs its own entry, but that only makes a difference for L1 tables and
> the refcount table, so we can ignore that. In fact, we can completely
> ignore those tables; it makes things much easier.
>
> Let's name the cluster size "CS", and name the image size in bytes "IS".
>
> So, for a refcount width of 16 bits per entry, we can describe CS / 2
> clusters per refblock; which means we need (IS / CS) / (CS / 2) =
> 2 * IS / (CS * CS) refblocks. Let's be generous and say that the L2
> tables are capable of describing the complete image size (which in
> practice they do not because the guest disk size is smaller than the
> physical size). This assumption also has the neat side-effect of not
> having to care about snapshots. If we have a lot of internal snapshots
> with copied L2 tables, IS grows and therefore the next formula knows
> that the number of L2 tables grows as well. Nice. So, as every L2 table
> can describe CS / 8 (guest) clusters, we need 8 * IS / (CS * CS) L2
> tables.
>
> Therefore, ignoring the image header, L1 tables, the reftable and the
> snapshot table, we have about the following amount of metadata clusters
> per image (with 16 bit refcount entries):
>
> (2 + 8) * IS / (CS * CS) = 10 * IS / (CS * CS)
>
> We said that for every metadata cluster, we would need one entry in the
> list. Every list entry takes 32 bits (4 bytes). Thus, we need
> approximately up to
>
> 40 * IS / (CS * CS)
>
> bytes for the metadata list (please ignore units and append "bytes" at
> the resulting number...).
>
> Let's test that for the above image, which has a disk size of 266 MB:
>
> 40 * 266M / (512 * 512) = 42 kB
>
> Great! It works.
>
>
> So, now let's set CS to 64 kB, because that is basically the only
> cluster size we really care about. For a 1 TB image, we need 10 kB for
> the list. Sounds great to me. For a 1 PB image, we will need 10 MB. Fair
> enough. (Note that you don't need 10 MB of RAM to create a 1 PB image;
> you only need that once the disk size of the image has reached 1 PB).
>
> And 1 TB with 512 byte clusters? 160 MB. Urgh, that is a lot. But then
> again, you can switch off the overlap check with overlap-check=off; and
> trying to actually use a 1 TB image with 512 byte clusters is crazy in
> itself (have you tried just creating one without preallocation? It takes
> forever). So I can live with that.
>
>
> tl;dr
> =====
>
> * CPU usage at runtime decreased by 150 to 275 percent on
>    overlap-check-heavy tasks
> * No additional performance problems at loading time (in theory has the
>    same runtime complexity as a single overlap check right now; in
>    practice I could not find any problems)
> * Decent RAM usage (40 kB for a 1 TB image with 64 kB clusters; 40 MB
>    for a 1 PB image etc. pp.)
>
>
>
> As of this version, this series depends on
> "[PATCH v6] qcow2: Buffer L1 table in snapshot refcount update".
>
>
> v2:
> - Patch 1:
>    - Fixed a mistake regarding the value of
>      Qcow2MetadataFragment::relative_start; this value should be relative
>      to the window start, not relative to the end of the last fragment
>      (destroy_window_bitmap() wrote the latter there in v1)
>    - Use uint64_t for the window index everywhere
> - Patch 8: Said "qcow2: Buffer L1 table in snapshot refcount update"
>    removes l1_allocated in qcow2_update_snapshot_refcount() and basically
>    replaces it by active_l1 (where active_l1 = !l1_allocated). In v1, the
>    condition it was used in was actually wrong, this is fixed here, too
>    (the active L2 cluster type should be removed from the list if we are
>    working on the active L1 table, not if we are working on an inactive
>    L1 table).
>
>
> git-backport-diff against v2:
>
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
>
> 001/12:[0010] [FC] 'qcow2: Add new overlap check functions'
> 002/12:[----] [--] 'qcow2: Pull up overlap check option evaluation'
> 003/12:[----] [--] 'qcow2: Create metadata list'
> 004/12:[----] [--] 'qcow2/overlaps: Protect image header'
> 005/12:[----] [--] 'qcow2/overlaps: Protect refcount table'
> 006/12:[----] [--] 'qcow2/overlaps: Protect refcount blocks'
> 007/12:[----] [--] 'qcow2/overlaps: Protect active L1 table'
> 008/12:[0002] [FC] 'qcow2/overlaps: Protect active L2 tables'
> 009/12:[----] [--] 'qcow2/overlaps: Protect snapshot table'
> 010/12:[----] [--] 'qcow2/overlaps: Protect inactive L1 tables'
> 011/12:[----] [-C] 'qcow2/overlaps: Protect inactive L2 tables'
> 012/12:[----] [--] 'qcow2: Use new metadata overlap check function'
>
>
> Max Reitz (12):
>    qcow2: Add new overlap check functions
>    qcow2: Pull up overlap check option evaluation
>    qcow2: Create metadata list
>    qcow2/overlaps: Protect image header
>    qcow2/overlaps: Protect refcount table
>    qcow2/overlaps: Protect refcount blocks
>    qcow2/overlaps: Protect active L1 table
>    qcow2/overlaps: Protect active L2 tables
>    qcow2/overlaps: Protect snapshot table
>    qcow2/overlaps: Protect inactive L1 tables
>    qcow2/overlaps: Protect inactive L2 tables
>    qcow2: Use new metadata overlap check function
>
>   block/Makefile.objs    |   3 +-
>   block/qcow2-cluster.c  |  13 ++
>   block/qcow2-overlap.c  | 400 +++++++++++++++++++++++++++++++++++++++++++++++++
>   block/qcow2-refcount.c | 202 ++++++++++---------------
>   block/qcow2-snapshot.c | 105 ++++++++++++-
>   block/qcow2.c          | 130 ++++++++++------
>   block/qcow2.h          |  13 ++
>   7 files changed, 694 insertions(+), 172 deletions(-)
>   create mode 100644 block/qcow2-overlap.c

Ping

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions
  2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
                   ` (16 preceding siblings ...)
  2014-12-15  9:43 ` Max Reitz
@ 2015-01-20 22:48 ` Max Reitz
  17 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2015-01-20 22:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-24 at 10:56, Max Reitz wrote:
> As has been requested, this series adds new overlap check functions to
> the qcow2 code. My local branch is called "qcow2-improved-overlap-v1",
> but I am not so sure whether it is actually an improvement; that is left
> for you to decide, dear reviewers.
>
> See patch 1 for an explanation of why this series exists and what it
> does. Patch 1 is basically the core of this series, the rest just
> employs the functions introduced there.
>
> In a later patch, we may want to change the meaning of the "constant"
> overlap checking option to mean the same as "cached", which is
> everything except for inactive L2 tables. This series does make
> checking for overlaps with inactive L2 tables at runtime just as cheap
> as everything else (constant time plus caching), but using these checks
> means qemu has to read all the snapshot L1 tables when opening a qcow2
> file. This does not take long, of course, but it does result in a bit of
> overhead so I did not want to enable it by default.
>
> I think just enabling all overlap checks by default after this series
> should be fine and useful, though.
>
>
> Benchmarks
> ==========
>
> First, let me repeat the results I wrote as a reply to the cover letter
> of v1:
>
> I had a 1 GB qcow2 image in tmpfs (sorry, I "only" have 4 GB of RAM in
> this laptop, and a 2 GB image will kill it) which I accessed from a
> guest from inside qemu (Arch Linux, to be exact, because its image just
> dumps you into a console and that's all I need). I ran
>
> $ for i in $(seq 1 42); do \
>      dd if=/dev/zero of=/dev/vda bs=65536 \
>      done
>
> because it does not matter what data is written to the image, I just
> need to write to a lot of clusters fast to maximize pressure on the
> overlap check.
>
> The results were 13.5/10.5/12.41 % of CPU usage (recorded using perf) in
> qcow2_check_metadata_overlap() with the current implementation and
> 0.08/0.09 % with this series applied (0.08 % with overlap-check=cached,
> 0.09 % with overlap-check=all).
>
> Now as a table, just to have one here (and because it is useful when
> skipping through this text):
>
> Current implementation:     12.14 (± 1.519) %  (n = 3)
> With this series applied:    0.08           %  (n = 1)
> overlap-check=all:           0.09           %  (n = 1)
>
>
> Now I did some other benchmarks.
>
> The first of those is just running (on the host, obviously):
>
> $ perf record -ag \
>    qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=512 \
>    /tmp/test.qcow2 2G
>
> I ran ten rounds with the current implementation, ten rounds with these
> patches applied and five rounds with cache_size set to 0 in
> qcow2_create_empty_metadata_list() (which results in only one bitmap
> existing at one time, and therefore a lot of conversions between the
> bitmap and the list representation).
>
> The current implementation had a CPU usage (fraction of cycles) in
> qcow2_check_metadata_overlap() of 47.65 (±4.24) %. There was one rather
> extreme outlier (36.31 %), with that removed it is 48.91 (±1.54) %.
>
> With this series applied, the usage dropped to 0.149 (± 0.028) %.
> Additionally, qcow2_metadata_list_enter() took 0.046 (± 0.021) %. Both
> together took thus 0.195 (± 0.040) %.
>
> With cache_size set to 0, the usage was 0.130 % (± 0.012 %) in
> qcow2_check_metadata_overlap(); 0.056 % (± 0.011 %) in
> qcow2_metadata_list_enter(); and 0.186 % (± 0.021 %). It dropped, but
> still in range of standard deviation. Thus, heavy conversion between
> bitmap and list conversion (in normal use cases due to random accesses)
> should not be a problem at all.
>
> As a table:
>
> Current implementation:     48.91 (± 1.537) %  (n =  9)
> With this series applied:   0.195 (± 0.040) %  (n = 10)
> Only one bitmap cached:     0.186 (± 0.021) %  (n =  5)
>
> And as a really noticeable consequence: Before this series, creating
> the image like that took 16.624 s (15.97 s user). With this series, it
> takes 4.502 s (3.94 s user).
>
> Because statistics are fun, I collected the number of metadata list
> operations for the second and the third tests:
>
> Second test (default of 15 bitmaps cached):
>
> List to bitmap conversions: 1053
> Bitmap to list conversions: 1038
> Bitmaps deallocated:        1038
>
> Insert operations:         82266
> Remove operations:            14
> Queries:                  312265
>
> Third test (only one bitmap cached):
>
> List to bitmap conversions: 135943
> Bitmap to list conversions:  66546
> Bitmaps deallocated:        135942
>
> Insert operations:           82266
> Remove operations:              14
> Queries:                    312265
>
> As expected, the number of conversions is much higher. Interestingly, in
> the first case, every bitmap deallocation also meant a bitmap to list
> conversion; that means, every bitmap was modified at least once while it
> was cached. In contrast to that, in the second case, only every second
> deallocation resulted in a conversion, which means that half of the
> cached bitmaps were only read (which is bad considering all was done was
> allocating metadata structures).
>
> But for performance, there is no real information here (we only see that
> setting cache_size to 0 actually did increase the number of
> conversions).
>
>
> The second benchmark I ran today was a program which opened /dev/vda in
> qemu (again on Arch) with O_DIRECT | O_SYNC | O_RDWR. It then wrote an
> uninitialized 512 byte buffer to random places (512-byte-aligned) to a
> 1 GB image freshly created before VM launch with metadata preallocation
> and 512 byte clusters. The intention was to break bitmap caching and
> having to convert back and forth between list and bitmap representation.
> The results are as follows:
>
> Current implementation:     18.73 (± 0.157) %  (n = 10)
> With this series applied:   0.068 (± 0.009) %  (n = 10)
> Only one bitmap cached:     0.062 (± 0.008) %  (n =  5)
>
>
> Runtime conclusion
> ------------------
>
> As can be seen from the benchmarks, runtime CPU usage by the metadata
> overlap checks is greatly decreased by this series (to 1/150 in the
> first benchmark, to 1/250 in the second, and to 1/275 in the third).
>
>
> Where is the caveat?
> --------------------
>
> The problem with this series is obviously that all the metadata
> information is read when reading a qcow2 image. iotest 044 is a good
> test for this. I personally haven't seen a problem here, but I can't
> outrule any. I never noticed any waits when opening images.
>
> When approaching this from a theoretical approach, it becomes clear that
> there shouldn't be any problems here. If using the default of
> overlap-check=cached, only information available anyway is used to built
> up the metadata list (the same information that is iterated through for
> every overlap check in the current implementation). Since building the
> list simply means setting a bitmask in a bitmap and then transforming
> that bitmap to a list (which has been shown not pose any performance
> issues by the second benchmark), there should not be any problem here.
>
> So, the only caveat I do see is increased code complexity. I initially
> thought it might be too complex for its purpose; but after having done
> the benchmarks, it became apparent to me that there is a problem with
> the current implementation and that this series does fix it.
>
> And RAM usage may pose a problem.
>
>
> RAM usage
> =========
>
> So I have looked at my 2 GB image above, and the list uses 40 kB, which
> may or may not be too much (sounds completely fine to me for an image
> with 512 byte clusters); but it is a least a number I can use for
> testing the following theoretical inspection:
>
>
> So, once again, we assume the worst. Every metadata structure needs its
> own entry in the list - actually, the worst would be that every cluster
> needs its own entry, but that only makes a difference for L1 tables and
> the refcount table, so we can ignore that. In fact, we can completely
> ignore those tables; it makes things much easier.
>
> Let's name the cluster size "CS", and name the image size in bytes "IS".
>
> So, for a refcount width of 16 bits per entry, we can describe CS / 2
> clusters per refblock; which means we need (IS / CS) / (CS / 2) =
> 2 * IS / (CS * CS) refblocks. Let's be generous and say that the L2
> tables are capable of describing the complete image size (which in
> practice they do not because the guest disk size is smaller than the
> physical size). This assumption also has the neat side-effect of not
> having to care about snapshots. If we have a lot of internal snapshots
> with copied L2 tables, IS grows and therefore the next formula knows
> that the number of L2 tables grows as well. Nice. So, as every L2 table
> can describe CS / 8 (guest) clusters, we need 8 * IS / (CS * CS) L2
> tables.
>
> Therefore, ignoring the image header, L1 tables, the reftable and the
> snapshot table, we have about the following amount of metadata clusters
> per image (with 16 bit refcount entries):
>
> (2 + 8) * IS / (CS * CS) = 10 * IS / (CS * CS)
>
> We said that for every metadata cluster, we would need one entry in the
> list. Every list entry takes 32 bits (4 bytes). Thus, we need
> approximately up to
>
> 40 * IS / (CS * CS)
>
> bytes for the metadata list (please ignore units and append "bytes" at
> the resulting number...).
>
> Let's test that for the above image, which has a disk size of 266 MB:
>
> 40 * 266M / (512 * 512) = 42 kB
>
> Great! It works.
>
>
> So, now let's set CS to 64 kB, because that is basically the only
> cluster size we really care about. For a 1 TB image, we need 10 kB for
> the list. Sounds great to me. For a 1 PB image, we will need 10 MB. Fair
> enough. (Note that you don't need 10 MB of RAM to create a 1 PB image;
> you only need that once the disk size of the image has reached 1 PB).
>
> And 1 TB with 512 byte clusters? 160 MB. Urgh, that is a lot. But then
> again, you can switch off the overlap check with overlap-check=off; and
> trying to actually use a 1 TB image with 512 byte clusters is crazy in
> itself (have you tried just creating one without preallocation? It takes
> forever). So I can live with that.
>
>
> tl;dr
> =====
>
> * CPU usage at runtime decreased by 150 to 275 percent on
>    overlap-check-heavy tasks
> * No additional performance problems at loading time (in theory has the
>    same runtime complexity as a single overlap check right now; in
>    practice I could not find any problems)
> * Decent RAM usage (40 kB for a 1 TB image with 64 kB clusters; 40 MB
>    for a 1 PB image etc. pp.)
>
>
>
> As of this version, this series depends on
> "[PATCH v6] qcow2: Buffer L1 table in snapshot refcount update".
>
>
> v2:
> - Patch 1:
>    - Fixed a mistake regarding the value of
>      Qcow2MetadataFragment::relative_start; this value should be relative
>      to the window start, not relative to the end of the last fragment
>      (destroy_window_bitmap() wrote the latter there in v1)
>    - Use uint64_t for the window index everywhere
> - Patch 8: Said "qcow2: Buffer L1 table in snapshot refcount update"
>    removes l1_allocated in qcow2_update_snapshot_refcount() and basically
>    replaces it by active_l1 (where active_l1 = !l1_allocated). In v1, the
>    condition it was used in was actually wrong, this is fixed here, too
>    (the active L2 cluster type should be removed from the list if we are
>    working on the active L1 table, not if we are working on an inactive
>    L1 table).
>
>
> git-backport-diff against v2:
>
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
>
> 001/12:[0010] [FC] 'qcow2: Add new overlap check functions'
> 002/12:[----] [--] 'qcow2: Pull up overlap check option evaluation'
> 003/12:[----] [--] 'qcow2: Create metadata list'
> 004/12:[----] [--] 'qcow2/overlaps: Protect image header'
> 005/12:[----] [--] 'qcow2/overlaps: Protect refcount table'
> 006/12:[----] [--] 'qcow2/overlaps: Protect refcount blocks'
> 007/12:[----] [--] 'qcow2/overlaps: Protect active L1 table'
> 008/12:[0002] [FC] 'qcow2/overlaps: Protect active L2 tables'
> 009/12:[----] [--] 'qcow2/overlaps: Protect snapshot table'
> 010/12:[----] [--] 'qcow2/overlaps: Protect inactive L1 tables'
> 011/12:[----] [-C] 'qcow2/overlaps: Protect inactive L2 tables'
> 012/12:[----] [--] 'qcow2: Use new metadata overlap check function'
>
>
> Max Reitz (12):
>    qcow2: Add new overlap check functions
>    qcow2: Pull up overlap check option evaluation
>    qcow2: Create metadata list
>    qcow2/overlaps: Protect image header
>    qcow2/overlaps: Protect refcount table
>    qcow2/overlaps: Protect refcount blocks
>    qcow2/overlaps: Protect active L1 table
>    qcow2/overlaps: Protect active L2 tables
>    qcow2/overlaps: Protect snapshot table
>    qcow2/overlaps: Protect inactive L1 tables
>    qcow2/overlaps: Protect inactive L2 tables
>    qcow2: Use new metadata overlap check function
>
>   block/Makefile.objs    |   3 +-
>   block/qcow2-cluster.c  |  13 ++
>   block/qcow2-overlap.c  | 400 +++++++++++++++++++++++++++++++++++++++++++++++++
>   block/qcow2-refcount.c | 202 ++++++++++---------------
>   block/qcow2-snapshot.c | 105 ++++++++++++-
>   block/qcow2.c          | 130 ++++++++++------
>   block/qcow2.h          |  13 ++
>   7 files changed, 694 insertions(+), 172 deletions(-)
>   create mode 100644 block/qcow2-overlap.c
>

Ping

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/12] qcow2/overlaps: Protect inactive L2 tables
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 11/12] qcow2/overlaps: Protect inactive L2 tables Max Reitz
@ 2015-01-21 15:23   ` Stefan Hajnoczi
  2015-01-21 16:07     ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-01-21 15:23 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Peter Lieven, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 768 bytes --]

On Mon, Nov 24, 2014 at 04:56:59PM +0100, Max Reitz wrote:
> @@ -136,6 +138,34 @@ int qcow2_read_snapshots(BlockDriverState *bs)
>                                    size_to_clusters(s, sn->l1_size *
>                                                        sizeof(uint64_t)),
>                                    QCOW2_OL_INACTIVE_L1);
> +
> +        if (!(s->overlap_check & QCOW2_OL_INACTIVE_L2)) {
> +            continue;
> +        }
> +
> +        l1_table = qemu_try_blockalign(bs->file,
> +                                       sn->l1_size * sizeof(uint64_t));

At this point we haven't validated sn->l1_size <= QCOW_MAX_L1_SIZE.

A bogus l1_size means we do a huge read and add junk into the metadata
list.  I think it would be best to check the value here.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/12] qcow2: Use new metadata overlap check function
  2014-11-24 15:57 ` [Qemu-devel] [PATCH v2 12/12] qcow2: Use new metadata overlap check function Max Reitz
@ 2015-01-21 15:28   ` Stefan Hajnoczi
  0 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-01-21 15:28 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Peter Lieven, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 246 bytes --]

On Mon, Nov 24, 2014 at 04:57:00PM +0100, Max Reitz wrote:
> @@ -2166,126 +2166,6 @@ fail:
>      return ret;
>  }
>  
> -#define overlaps_with(ofs, sz) \
> -    ranges_overlap(offset, size, ofs, sz)

Dropping this macro makes me happy.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/12] qcow2/overlaps: Protect inactive L2 tables
  2015-01-21 15:23   ` Stefan Hajnoczi
@ 2015-01-21 16:07     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2015-01-21 16:07 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Peter Lieven, qemu-devel

On 2015-01-21 at 10:23, Stefan Hajnoczi wrote:
> On Mon, Nov 24, 2014 at 04:56:59PM +0100, Max Reitz wrote:
>> @@ -136,6 +138,34 @@ int qcow2_read_snapshots(BlockDriverState *bs)
>>                                     size_to_clusters(s, sn->l1_size *
>>                                                         sizeof(uint64_t)),
>>                                     QCOW2_OL_INACTIVE_L1);
>> +
>> +        if (!(s->overlap_check & QCOW2_OL_INACTIVE_L2)) {
>> +            continue;
>> +        }
>> +
>> +        l1_table = qemu_try_blockalign(bs->file,
>> +                                       sn->l1_size * sizeof(uint64_t));
> At this point we haven't validated sn->l1_size <= QCOW_MAX_L1_SIZE.
>
> A bogus l1_size means we do a huge read and add junk into the metadata
> list.  I think it would be best to check the value here.

Right, will do.

Thanks for reviewing!

Max

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/12] qcow2: Add new overlap check functions
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 01/12] " Max Reitz
  2014-11-25 11:02   ` Max Reitz
@ 2015-01-21 16:53   ` Stefan Hajnoczi
  2015-01-21 22:12     ` Max Reitz
  2015-02-03 23:08   ` Eric Blake
  2 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2015-01-21 16:53 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Peter Lieven, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1888 bytes --]

On Mon, Nov 24, 2014 at 04:56:49PM +0100, Max Reitz wrote:
> +static void build_window_bitmap(Qcow2MetadataList *mdl,
> +                                Qcow2MetadataWindow *window)
> +{
> +    int cache_i, oldest_cache_i = -1, i;
> +    unsigned oldest_cache_age = 0;
> +
> +    for (cache_i = 0; cache_i < mdl->nb_cached_windows; cache_i++) {
> +        unsigned age;
> +
> +        if (mdl->cached_windows[cache_i] < 0) {
> +            break;
> +        }
> +
> +        age = mdl->current_age - mdl->windows[mdl->cached_windows[cache_i]].age;
> +        if (age > oldest_cache_age) {
> +            oldest_cache_age = age;
> +            oldest_cache_i = cache_i;
> +        }
> +    }
> +
> +    if (cache_i >= mdl->nb_cached_windows) {
> +        destroy_window_bitmap(mdl,
> +            &mdl->windows[mdl->cached_windows[oldest_cache_i]]);
> +        cache_i = oldest_cache_i;
> +    }
> +
> +    assert(cache_i >= 0);
> +    mdl->cached_windows[cache_i] = window - mdl->windows;
> +    window->cached_windows_index = cache_i;

Is this field ever used?

> +/**
> + * Removes a range of the given types from the metadata list.
> + */
> +void qcow2_metadata_list_remove(BlockDriverState *bs, uint64_t offset,
> +                                int nb_clusters, QCow2MetadataOverlap types)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    uint64_t start_cluster = offset >> s->cluster_bits;
> +    uint64_t end_cluster = start_cluster + nb_clusters;
> +    uint64_t current_cluster = start_cluster;
> +
> +    types &= s->overlap_check;
> +    if (!types) {
> +        return;
> +    }
> +
> +    if (offset_into_cluster(s, offset)) {
> +        /* Try to remove even broken metadata ranges */
> +        end_cluster++;

Why does it make sense to go ahead (and add another cluster on to the
end) when offset is not cluster-aligned?

It seems we might clear out the wrong metadata.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/12] qcow2: Add new overlap check functions
  2015-01-21 16:53   ` Stefan Hajnoczi
@ 2015-01-21 22:12     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2015-01-21 22:12 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Peter Lieven, qemu-devel

On 2015-01-21 at 11:53, Stefan Hajnoczi wrote:
> On Mon, Nov 24, 2014 at 04:56:49PM +0100, Max Reitz wrote:
>> +static void build_window_bitmap(Qcow2MetadataList *mdl,
>> +                                Qcow2MetadataWindow *window)
>> +{
>> +    int cache_i, oldest_cache_i = -1, i;
>> +    unsigned oldest_cache_age = 0;
>> +
>> +    for (cache_i = 0; cache_i < mdl->nb_cached_windows; cache_i++) {
>> +        unsigned age;
>> +
>> +        if (mdl->cached_windows[cache_i] < 0) {
>> +            break;
>> +        }
>> +
>> +        age = mdl->current_age - mdl->windows[mdl->cached_windows[cache_i]].age;
>> +        if (age > oldest_cache_age) {
>> +            oldest_cache_age = age;
>> +            oldest_cache_i = cache_i;
>> +        }
>> +    }
>> +
>> +    if (cache_i >= mdl->nb_cached_windows) {
>> +        destroy_window_bitmap(mdl,
>> +            &mdl->windows[mdl->cached_windows[oldest_cache_i]]);
>> +        cache_i = oldest_cache_i;
>> +    }
>> +
>> +    assert(cache_i >= 0);
>> +    mdl->cached_windows[cache_i] = window - mdl->windows;
>> +    window->cached_windows_index = cache_i;
> Is this field ever used?

You're right, it isn't. Great! *g*

Thanks, I'll remove it.

>> +/**
>> + * Removes a range of the given types from the metadata list.
>> + */
>> +void qcow2_metadata_list_remove(BlockDriverState *bs, uint64_t offset,
>> +                                int nb_clusters, QCow2MetadataOverlap types)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    uint64_t start_cluster = offset >> s->cluster_bits;
>> +    uint64_t end_cluster = start_cluster + nb_clusters;
>> +    uint64_t current_cluster = start_cluster;
>> +
>> +    types &= s->overlap_check;
>> +    if (!types) {
>> +        return;
>> +    }
>> +
>> +    if (offset_into_cluster(s, offset)) {
>> +        /* Try to remove even broken metadata ranges */
>> +        end_cluster++;
> Why does it make sense to go ahead (and add another cluster on to the
> end) when offset is not cluster-aligned?
>
> It seems we might clear out the wrong metadata.

Which I deemed not as bad as keeping stale metadata in and then running 
into corruption wrongly detected. Feel free to convince me otherwise, I 
don't have a very strong opinion (I think we should ideally do as much 
for corruption prevention as possible, but I don't want to force users 
to run qemu-img check if they don't actually have to).

Max

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/12] qcow2: Add new overlap check functions
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 01/12] " Max Reitz
  2014-11-25 11:02   ` Max Reitz
  2015-01-21 16:53   ` Stefan Hajnoczi
@ 2015-02-03 23:08   ` Eric Blake
  2015-02-04 16:29     ` Max Reitz
  2 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2015-02-03 23:08 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 14614 bytes --]

On 11/24/2014 08:56 AM, Max Reitz wrote:
> The existing qcow2 metadata overlap detection function used existing
> structures to determine the location of the image metadata, from plain
> fields such as l1_table_offset and l1_size in the BDRVQcowState, over
> image structures in memory such as the L1 table for the L2 tables'
> positions, or it even read the required data directly from disk for
> every requested check, such as the snapshot L1 tables for the inactive
> L2 tables' positions.
> 
> These new functions instead keep a dedicated structure for keeping track
> of the metadata positions in memory. It consists of two parts: First,
> there is one structure which is basically a list of all metadata
> structures. Each entry has a bitmask of types (because some metadata
> structures may actually overlap, such as active and inactive L2 tables),
> a number of clusters occupied and the offset from the previous entry in
> clusters. This structure requires relatively few memory, but checking a

s/few/little/

> certain point may take relatively long. Each entry is called a
> "fragment".
> 
> Therefore, there is another representation which is a bitmap, or rather
> a bytemap, of metadata types. The previously described list is split
> into multiple windows with each describing a constant number of clusters
> (WINDOW_SIZE). If the list is to be queried or changed, the respective
> window is selected in constant time and the bitmap is generated from the
> fragments belonging to the window. This bitmap can then be queried in
> constant time and easily be changed.
> 
> Because the bitmap representation requires more memory, it is only used
> as a cache. Whenever a window is removed from the cache, the fragment
> list will be rebuilt from the bitmap if the latter has been modified.
> Therefore, the fragment list is only used as the background
> representation to save memory, whereas the bitmap is used whenever
> possible.
> 
> Regarding the size of the fragment list in memory: As one refcount block
> can handle cluster_size / 2 entries and one L2 table can handle
> cluster_size / 8 entries, for a qcow2 image with the standard cluster
> size of 64 kB, there is a ratio of data to metadata of about 1/6000
> (1/32768 refblocks and 1/8192 L2 tables) if one ignores the fact that
> not every cluster requires an L2 table entry. The refcount table and the
> L1 table is generally negligible. At the worst, each metadata cluster
> requires its own entry in the fragment list; each entry takes up four
> bytes, therefore, at the worst, the fragment list should take up (for an
> image with 64 kB clusters) (4 B) / (64 kB * 6000) of the image size,
> which is about 1.e-8 (i.e., 11 kB for a 1 TB image, or 11 MB for a 1 PB
> image).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/Makefile.objs   |   3 +-
>  block/qcow2-overlap.c | 404 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.h         |  13 ++
>  3 files changed, 419 insertions(+), 1 deletion(-)
>  create mode 100644 block/qcow2-overlap.c

Are you still hoping to get this in 2.3?


> +++ b/block/qcow2-overlap.c
> @@ -0,0 +1,404 @@
> +/*
> + * QCOW2 runtime metadata overlap detection
> + *
> + * Copyright (c) 2014 Max Reitz <mreitz@redhat.com>

Slow review means it is now 2015.

> +/* Number of clusters which are covered by each metadata window;
> + * note that this may not exceed 2^16 as long as
> + * Qcow2MetadataFragment::relative_start is a uint16_t */
> +#define WINDOW_SIZE 4096

So this says that for every 4096 clusters, we have one bytemap
representation, as well as a chain of up to 4096 fragment descriptors?

> +
> +/* Describes a fragment of a or a whole metadata range; does not necessarily

s/of a or/of or/

> + * describe the whole range because it needs to be split on window boundaries */
> +typedef struct Qcow2MetadataFragment {
> +    /* Bitmask of QCow2MetadataOverlap values */
> +    uint8_t types;
> +    uint8_t nb_clusters;
> +    /* Number of clusters between the start of the window and this range */
> +    uint16_t relative_start;

So even I have a file with 4096 consecutive sectors all tied up in the
same purpose within a given window, I have to represent it as 16
Qcow2MetadataFragments rather than 1 fragment, because the uint8_t size
of nb_clusters limits me to at most 256 clusters per Fragment entry?
And worst case, a window will have a list of 4096 fragments if every
cluster alternates between some other type.

> +} QEMU_PACKED Qcow2MetadataFragment;

Is QEMU_PACKED really essential here? If I'm not mistaken, this struct
is only ever kept in memory and not written out to disk.  On the other
hand, I understand that you are trying to ensure that the compiler
packed this into 32 bits rather than injecting any padding.  Would a
BUG_ON(sizeof(Qcow2MetadataFragment) == 4) be any better at representing
that fact?

> +
> +typedef struct Qcow2MetadataWindow {
> +    Qcow2MetadataFragment *fragments;
> +    int nb_fragments, fragments_array_size;
> +
> +    /* If not NULL, this is an expanded version of the "RLE" version given by
> +     * the fragments array; there are WINDOW_SIZE entries */
> +    uint8_t *bitmap;
> +    bool bitmap_modified;
> +
> +    /* Time of last access */
> +    unsigned age;
> +
> +    /* Index in Qcow2MetadataList::cached_windows */
> +    int cached_windows_index;
> +} Qcow2MetadataWindow;
> +
> +struct Qcow2MetadataList {
> +    Qcow2MetadataWindow *windows;
> +    uint64_t nb_windows;
> +
> +    unsigned current_age;
> +
> +    /* Index into the windows array */
> +    int *cached_windows;
> +    size_t nb_cached_windows;
> +};

Is there a maximum size for nb_cached_windows before you start evicting
cached windows?

> +
> +/**
> + * Destroys the cached window bitmap. If it has been modified, the fragment list
> + * will be rebuilt accordingly.
> + */
> +static void destroy_window_bitmap(Qcow2MetadataList *mdl,
> +                                  Qcow2MetadataWindow *window)
> +{
> +    if (!window->bitmap) {
> +        return;
> +    }
> +
> +    if (window->bitmap_modified) {
> +        int bitmap_i, fragment_i = 0;
> +        QCow2MetadataOverlap current_types = 0;
> +        int current_nb_clusters = 0;
> +
> +        /* Rebuild the fragment list; the case bitmap_i == WINDOW_SIZE is for
> +         * entering the last fragment at the bitmap end */
> +
> +        for (bitmap_i = 0; bitmap_i <= WINDOW_SIZE; bitmap_i++) {
> +            /* Qcow2MetadataFragment::nb_clusters is a uint8_t, so
> +             * current_nb_clusters may not exceed 255 */

Wait. Why 255 and not 256? Can't you use nb_clusters==0 as a modulo for
256 consecutive clusters, as the fragments should never encode a
0-length run? That way, you can represent 4096 consecutive clusters in
16 fragments of 256 each, instead of 17 (16 of 255, and 1 of 16).

> +            if (bitmap_i < WINDOW_SIZE &&
> +                current_types == window->bitmap[bitmap_i] &&
> +                current_nb_clusters < 255)
> +            {
> +                current_nb_clusters++;
> +            } else {
> +                if (current_types && current_nb_clusters) {
> +                    if (fragment_i >= window->fragments_array_size) {
> +                        window->fragments_array_size =
> +                            3 * window->fragments_array_size / 2 + 1;
> +
> +                        /* new_nb_fragments should be small enough, and there is
> +                         * nothing we can do on failure anyway, so do not use
> +                         * g_try_renew() here */
> +                        window->fragments =
> +                            g_renew(Qcow2MetadataFragment, window->fragments,
> +                                    window->fragments_array_size);
> +                    }
> +
> +                    window->fragments[fragment_i++] = (Qcow2MetadataFragment){
> +                        .types          = current_types,
> +                        .nb_clusters    = current_nb_clusters,
> +                        .relative_start = bitmap_i - current_nb_clusters,
> +                    };
> +                }
> +
> +                current_nb_clusters = 0;
> +                if (bitmap_i < WINDOW_SIZE) {
> +                    current_types = window->bitmap[bitmap_i];
> +                }
> +            }
> +        }
> +
> +        window->nb_fragments = fragment_i;

Any need to clear window->bitmap_modified at this point? Or are you
careful to only rely on it when window->bitmap is non-NULL.

> +    }
> +
> +    g_free(window->bitmap);
> +    window->bitmap = NULL;
> +}
> +
> +/**
> + * Creates a bitmap from the fragment list.
> + */
> +static void build_window_bitmap(Qcow2MetadataList *mdl,
> +                                Qcow2MetadataWindow *window)
> +{
> +    int cache_i, oldest_cache_i = -1, i;
> +    unsigned oldest_cache_age = 0;
> +
> +    for (cache_i = 0; cache_i < mdl->nb_cached_windows; cache_i++) {
> +        unsigned age;
> +
> +        if (mdl->cached_windows[cache_i] < 0) {
> +            break;
> +        }
> +
> +        age = mdl->current_age - mdl->windows[mdl->cached_windows[cache_i]].age;
> +        if (age > oldest_cache_age) {
> +            oldest_cache_age = age;
> +            oldest_cache_i = cache_i;
> +        }
> +    }
> +
> +    if (cache_i >= mdl->nb_cached_windows) {
> +        destroy_window_bitmap(mdl,
> +            &mdl->windows[mdl->cached_windows[oldest_cache_i]]);
> +        cache_i = oldest_cache_i;
> +    }
> +
> +    assert(cache_i >= 0);
> +    mdl->cached_windows[cache_i] = window - mdl->windows;
> +    window->cached_windows_index = cache_i;
> +
> +    window->age = mdl->current_age++;
> +
> +    window->bitmap = g_new0(uint8_t, WINDOW_SIZE);
> +
> +    for (i = 0; i < window->nb_fragments; i++) {
> +        Qcow2MetadataFragment *fragment = &window->fragments[i];
> +
> +        memset(&window->bitmap[fragment->relative_start], fragment->types,
> +               fragment->nb_clusters);

Hmm. If you do use my idea of nb_clusters==0 for 256, this needs a
special case. Another option would be storing number of clusters - 1 (so
a value of 0 is 1 cluster, a value of 255 is 256 clusters).

> +    }
> +
> +    window->bitmap_modified = false;
> +}
> +
> +/**
> + * Enters a new range into the metadata list.
> + */
> +void qcow2_metadata_list_enter(BlockDriverState *bs, uint64_t offset,
> +                               int nb_clusters, QCow2MetadataOverlap types)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    uint64_t start_cluster = offset >> s->cluster_bits;
> +    uint64_t end_cluster = start_cluster + nb_clusters;
> +    uint64_t current_cluster = start_cluster;
> +
> +    types &= s->overlap_check;
> +    if (!types) {
> +        return;
> +    }
> +
> +    if (offset_into_cluster(s, offset)) {
> +        /* Do not enter apparently broken metadata ranges */
> +        return;
> +    }
> +
> +    while (current_cluster < end_cluster) {
> +        int bitmap_i;
> +        int bitmap_i_start = current_cluster % WINDOW_SIZE;
> +        int bitmap_i_end = MIN(WINDOW_SIZE,
> +                               end_cluster - current_cluster + bitmap_i_start);
> +        uint64_t window_i = current_cluster / WINDOW_SIZE;
> +        Qcow2MetadataWindow *window;
> +
> +        if (window_i >= s->metadata_list->nb_windows) {
> +            /* This should not be happening too often, so it is fine to resize
> +             * the array to exactly the required size */
> +            Qcow2MetadataWindow *new_windows;
> +
> +            new_windows = g_try_renew(Qcow2MetadataWindow,
> +                                      s->metadata_list->windows,
> +                                      window_i + 1);
> +            if (!new_windows) {
> +                return;
> +            }
> +
> +            memset(new_windows + s->metadata_list->nb_windows, 0,
> +                   (window_i + 1 - s->metadata_list->nb_windows) *
> +                   sizeof(Qcow2MetadataWindow));
> +
> +            s->metadata_list->windows = new_windows;
> +            s->metadata_list->nb_windows = window_i + 1;
> +        }
> +
> +        window = &s->metadata_list->windows[window_i];
> +        if (!window->bitmap) {
> +            build_window_bitmap(s->metadata_list, window);
> +        }
> +
> +        for (bitmap_i = bitmap_i_start; bitmap_i < bitmap_i_end; bitmap_i++) {
> +            window->bitmap[bitmap_i] |= types;

This adds in new types but keeps existing types listed.  Is that okay?

> +        }
> +
> +        window->age = s->metadata_list->current_age++;
> +        window->bitmap_modified = true;
> +
> +        /* Go to the next window */
> +        current_cluster += WINDOW_SIZE - bitmap_i_start;
> +    }
> +}
...

> +++ b/block/qcow2.h
> @@ -159,6 +159,9 @@ typedef struct QCowSnapshot {
>  struct Qcow2Cache;
>  typedef struct Qcow2Cache Qcow2Cache;
>  
> +struct Qcow2MetadataList;
> +typedef struct Qcow2MetadataList Qcow2MetadataList;
> +
>  typedef struct Qcow2UnknownHeaderExtension {
>      uint32_t magic;
>      uint32_t len;
> @@ -261,6 +264,7 @@ typedef struct BDRVQcowState {
>  
>      bool discard_passthrough[QCOW2_DISCARD_MAX];
>  
> +    Qcow2MetadataList *metadata_list;
>      int overlap_check; /* bitmask of Qcow2MetadataOverlap values */
>      bool signaled_corruption;
>  
> @@ -576,4 +580,13 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>      void **table);
>  int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
>  
> +/* qcow2-overlap.c functions */
> +int qcow2_create_empty_metadata_list(BlockDriverState *bs, size_t cache_size,
> +                                     Error **errp);
> +void qcow2_metadata_list_destroy(BlockDriverState *bs);
> +void qcow2_metadata_list_enter(BlockDriverState *bs, uint64_t offset,
> +                               int nb_clusters, QCow2MetadataOverlap type);
> +void qcow2_metadata_list_remove(BlockDriverState *bs, uint64_t offset,
> +                                int nb_clusters, QCow2MetadataOverlap type);
> +
>  #endif
> 

Looks reasonable in general.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 02/12] qcow2: Pull up overlap check option evaluation
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 02/12] qcow2: Pull up overlap check option evaluation Max Reitz
@ 2015-02-03 23:33   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2015-02-03 23:33 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 558 bytes --]

On 11/24/2014 08:56 AM, Max Reitz wrote:
> Pull up the absorption of the qcow2-relevant command line options and
> the evaluation of the overlap check options in qcow2_open().
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 96 +++++++++++++++++++++++++++++------------------------------
>  1 file changed, 48 insertions(+), 48 deletions(-)
> 

Simple code motion.
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/12] qcow2: Create metadata list
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 03/12] qcow2: Create metadata list Max Reitz
@ 2015-02-03 23:34   ` Eric Blake
  2015-02-04 16:31     ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2015-02-03 23:34 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 965 bytes --]

On 11/24/2014 08:56 AM, Max Reitz wrote:
> Create and destroy the metadata list on creation and destruction of a
> qcow2 BDS, respectively. Skip creation if no overlap checks should be
> performed.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index ed88d69..f80f9ed 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -744,6 +744,13 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>                                overlap_check_template & (1 << i)) << i;
>      }
>  
> +    if (s->overlap_check) {
> +        ret = qcow2_create_empty_metadata_list(bs, 65536, errp);

Why 64k? Does this magic number need a name?

Otherwise,
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/12] qcow2/overlaps: Protect image header
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 04/12] qcow2/overlaps: Protect image header Max Reitz
@ 2015-02-03 23:47   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2015-02-03 23:47 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 932 bytes --]

On 11/24/2014 08:56 AM, Max Reitz wrote:
> Enter the image header into the metadata list to protect it against
> accidental modifications.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index f80f9ed..19ac2df 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -751,6 +751,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>          }
>      }
>  
> +    qcow2_metadata_list_enter(bs, 0, 1, QCOW2_OL_MAIN_HEADER);
> +
>      s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
>      s->l2_size = 1 << s->l2_bits;
>      /* 2^(s->refcount_order - 3) is the refcount width in bytes */
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/12] qcow2/overlaps: Protect refcount table
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 05/12] qcow2/overlaps: Protect refcount table Max Reitz
@ 2015-02-03 23:55   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2015-02-03 23:55 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 497 bytes --]

On 11/24/2014 08:56 AM, Max Reitz wrote:
> Keep track of the refcount table in the metadata list to protect it
> against accidental modifications.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 18 ++++++++++++++++++
>  block/qcow2.c          |  4 ++++
>  2 files changed, 22 insertions(+)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/12] qcow2: Add new overlap check functions
  2015-02-03 23:08   ` Eric Blake
@ 2015-02-04 16:29     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2015-02-04 16:29 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2015-02-03 at 18:08, Eric Blake wrote:
> On 11/24/2014 08:56 AM, Max Reitz wrote:
>> The existing qcow2 metadata overlap detection function used existing
>> structures to determine the location of the image metadata, from plain
>> fields such as l1_table_offset and l1_size in the BDRVQcowState, over
>> image structures in memory such as the L1 table for the L2 tables'
>> positions, or it even read the required data directly from disk for
>> every requested check, such as the snapshot L1 tables for the inactive
>> L2 tables' positions.
>>
>> These new functions instead keep a dedicated structure for keeping track
>> of the metadata positions in memory. It consists of two parts: First,
>> there is one structure which is basically a list of all metadata
>> structures. Each entry has a bitmask of types (because some metadata
>> structures may actually overlap, such as active and inactive L2 tables),
>> a number of clusters occupied and the offset from the previous entry in
>> clusters. This structure requires relatively few memory, but checking a
> s/few/little/
>
>> certain point may take relatively long. Each entry is called a
>> "fragment".
>>
>> Therefore, there is another representation which is a bitmap, or rather
>> a bytemap, of metadata types. The previously described list is split
>> into multiple windows with each describing a constant number of clusters
>> (WINDOW_SIZE). If the list is to be queried or changed, the respective
>> window is selected in constant time and the bitmap is generated from the
>> fragments belonging to the window. This bitmap can then be queried in
>> constant time and easily be changed.
>>
>> Because the bitmap representation requires more memory, it is only used
>> as a cache. Whenever a window is removed from the cache, the fragment
>> list will be rebuilt from the bitmap if the latter has been modified.
>> Therefore, the fragment list is only used as the background
>> representation to save memory, whereas the bitmap is used whenever
>> possible.
>>
>> Regarding the size of the fragment list in memory: As one refcount block
>> can handle cluster_size / 2 entries and one L2 table can handle
>> cluster_size / 8 entries, for a qcow2 image with the standard cluster
>> size of 64 kB, there is a ratio of data to metadata of about 1/6000
>> (1/32768 refblocks and 1/8192 L2 tables) if one ignores the fact that
>> not every cluster requires an L2 table entry. The refcount table and the
>> L1 table is generally negligible. At the worst, each metadata cluster
>> requires its own entry in the fragment list; each entry takes up four
>> bytes, therefore, at the worst, the fragment list should take up (for an
>> image with 64 kB clusters) (4 B) / (64 kB * 6000) of the image size,
>> which is about 1.e-8 (i.e., 11 kB for a 1 TB image, or 11 MB for a 1 PB
>> image).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/Makefile.objs   |   3 +-
>>   block/qcow2-overlap.c | 404 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/qcow2.h         |  13 ++
>>   3 files changed, 419 insertions(+), 1 deletion(-)
>>   create mode 100644 block/qcow2-overlap.c
> Are you still hoping to get this in 2.3?

There's no reason to rush it into 2.3.

>> +++ b/block/qcow2-overlap.c
>> @@ -0,0 +1,404 @@
>> +/*
>> + * QCOW2 runtime metadata overlap detection
>> + *
>> + * Copyright (c) 2014 Max Reitz <mreitz@redhat.com>
> Slow review means it is now 2015.

Well, if this series had been fine, 2014 would have been correct. ;-)

>> +/* Number of clusters which are covered by each metadata window;
>> + * note that this may not exceed 2^16 as long as
>> + * Qcow2MetadataFragment::relative_start is a uint16_t */
>> +#define WINDOW_SIZE 4096
> So this says that for every 4096 clusters, we have one bytemap
> representation, as well as a chain of up to 4096 fragment descriptors?

Indeed.

>> +
>> +/* Describes a fragment of a or a whole metadata range; does not necessarily
> s/of a or/of or/
>
>> + * describe the whole range because it needs to be split on window boundaries */
>> +typedef struct Qcow2MetadataFragment {
>> +    /* Bitmask of QCow2MetadataOverlap values */
>> +    uint8_t types;
>> +    uint8_t nb_clusters;
>> +    /* Number of clusters between the start of the window and this range */
>> +    uint16_t relative_start;
> So even I have a file with 4096 consecutive sectors all tied up in the
> same purpose within a given window, I have to represent it as 16
> Qcow2MetadataFragments rather than 1 fragment, because the uint8_t size
> of nb_clusters limits me to at most 256 clusters per Fragment entry?

Yes, but remember that only metadata is covered, data clusters aren't. 
You'll have a much harder time finding a qcow2 image with 256 
consecutive metadata clusters of the same type (it certainly can happen 
with L2 tables and refcount blocks, but it is much less likely).

> And worst case, a window will have a list of 4096 fragments if every
> cluster alternates between some other type.

Indeed, but again, very unlikely (at least it's very unlikely for a lot 
of windows to be like that; in general, we'll need one fragment per 
metadata cluster (at maximum) and it doesn't really matter whether 
they're all in the same window or spread all over the image).

If you're really concerned about DoS, I propose that we could let the 
user specify the maximum size to be used for the overlap structures, and 
if that is exceeded, no overlap checks will be performed (and we'd set 
that size to something that seems reasonable to us by default). There 
already is an option @cache_size for qcow2_create_empty_metadata_list(), 
which we could just use for the whole metadata structures and if it 
becomes impossible to adhere to that limit, just drop the checks (maybe 
coupled with a QMP event).

>> +} QEMU_PACKED Qcow2MetadataFragment;
> Is QEMU_PACKED really essential here? If I'm not mistaken, this struct
> is only ever kept in memory and not written out to disk.

It safes memory (albeit the compiler probably doesn't inject any padding 
anyway in this case).

> On the other
> hand, I understand that you are trying to ensure that the compiler
> packed this into 32 bits rather than injecting any padding.  Would a
> BUG_ON(sizeof(Qcow2MetadataFragment) == 4) be any better at representing
> that fact?

I don't know. First, it wouldn't be better, because we'd need the 
QEMU_PACKED anyway. Second, if the compiler decides that even with 
QEMU_PACKED the structure should occupy five bytes, something definitely 
is wrong, but it doesn't really concern me here. As long as the 
structure as small as possible, that's fine.

So I can most certainly add that assertion, but I don't think it'll be 
too useful (and it won't replace the QEMU_PACKED).

>> +
>> +typedef struct Qcow2MetadataWindow {
>> +    Qcow2MetadataFragment *fragments;
>> +    int nb_fragments, fragments_array_size;
>> +
>> +    /* If not NULL, this is an expanded version of the "RLE" version given by
>> +     * the fragments array; there are WINDOW_SIZE entries */
>> +    uint8_t *bitmap;
>> +    bool bitmap_modified;
>> +
>> +    /* Time of last access */
>> +    unsigned age;
>> +
>> +    /* Index in Qcow2MetadataList::cached_windows */
>> +    int cached_windows_index;
>> +} Qcow2MetadataWindow;
>> +
>> +struct Qcow2MetadataList {
>> +    Qcow2MetadataWindow *windows;
>> +    uint64_t nb_windows;
>> +
>> +    unsigned current_age;
>> +
>> +    /* Index into the windows array */
>> +    int *cached_windows;
>> +    size_t nb_cached_windows;
>> +};
> Is there a maximum size for nb_cached_windows before you start evicting
> cached windows?

Yes, and it isn't fixed. This series does not introduce an option for 
the user to set it, but it would definitely be possible.

>> +
>> +/**
>> + * Destroys the cached window bitmap. If it has been modified, the fragment list
>> + * will be rebuilt accordingly.
>> + */
>> +static void destroy_window_bitmap(Qcow2MetadataList *mdl,
>> +                                  Qcow2MetadataWindow *window)
>> +{
>> +    if (!window->bitmap) {
>> +        return;
>> +    }
>> +
>> +    if (window->bitmap_modified) {
>> +        int bitmap_i, fragment_i = 0;
>> +        QCow2MetadataOverlap current_types = 0;
>> +        int current_nb_clusters = 0;
>> +
>> +        /* Rebuild the fragment list; the case bitmap_i == WINDOW_SIZE is for
>> +         * entering the last fragment at the bitmap end */
>> +
>> +        for (bitmap_i = 0; bitmap_i <= WINDOW_SIZE; bitmap_i++) {
>> +            /* Qcow2MetadataFragment::nb_clusters is a uint8_t, so
>> +             * current_nb_clusters may not exceed 255 */
> Wait. Why 255 and not 256? Can't you use nb_clusters==0 as a modulo for
> 256 consecutive clusters, as the fragments should never encode a
> 0-length run?

I could, and if you insist on it, I'll do. But I don't know if we'll 
gain a lot from it, and it'll make the code more complicated (although 
just how much more complicated remains to be seen). Maybe just storing 
nb_clusters - 1 is simpler.

> That way, you can represent 4096 consecutive clusters in
> 16 fragments of 256 each, instead of 17 (16 of 255, and 1 of 16).
>
>> +            if (bitmap_i < WINDOW_SIZE &&
>> +                current_types == window->bitmap[bitmap_i] &&
>> +                current_nb_clusters < 255)
>> +            {
>> +                current_nb_clusters++;
>> +            } else {
>> +                if (current_types && current_nb_clusters) {
>> +                    if (fragment_i >= window->fragments_array_size) {
>> +                        window->fragments_array_size =
>> +                            3 * window->fragments_array_size / 2 + 1;
>> +
>> +                        /* new_nb_fragments should be small enough, and there is
>> +                         * nothing we can do on failure anyway, so do not use
>> +                         * g_try_renew() here */
>> +                        window->fragments =
>> +                            g_renew(Qcow2MetadataFragment, window->fragments,
>> +                                    window->fragments_array_size);
>> +                    }
>> +
>> +                    window->fragments[fragment_i++] = (Qcow2MetadataFragment){
>> +                        .types          = current_types,
>> +                        .nb_clusters    = current_nb_clusters,
>> +                        .relative_start = bitmap_i - current_nb_clusters,
>> +                    };
>> +                }
>> +
>> +                current_nb_clusters = 0;
>> +                if (bitmap_i < WINDOW_SIZE) {
>> +                    current_types = window->bitmap[bitmap_i];
>> +                }
>> +            }
>> +        }
>> +
>> +        window->nb_fragments = fragment_i;
> Any need to clear window->bitmap_modified at this point? Or are you
> careful to only rely on it when window->bitmap is non-NULL.

window->bitmap_modified is read only in one place, and that is right in 
this function which is only called if window->bitmap != NULL.

window->bitmap_modified is written in several places, but all of those 
are preceded by writes to the bitmap so window->bitmap is sure to be 
non-NULL there (if it was NULL, build_window_bitmap() is called beforehand).

>> +    }
>> +
>> +    g_free(window->bitmap);
>> +    window->bitmap = NULL;
>> +}
>> +
>> +/**
>> + * Creates a bitmap from the fragment list.
>> + */
>> +static void build_window_bitmap(Qcow2MetadataList *mdl,
>> +                                Qcow2MetadataWindow *window)
>> +{
>> +    int cache_i, oldest_cache_i = -1, i;
>> +    unsigned oldest_cache_age = 0;
>> +
>> +    for (cache_i = 0; cache_i < mdl->nb_cached_windows; cache_i++) {
>> +        unsigned age;
>> +
>> +        if (mdl->cached_windows[cache_i] < 0) {
>> +            break;
>> +        }
>> +
>> +        age = mdl->current_age - mdl->windows[mdl->cached_windows[cache_i]].age;
>> +        if (age > oldest_cache_age) {
>> +            oldest_cache_age = age;
>> +            oldest_cache_i = cache_i;
>> +        }
>> +    }
>> +
>> +    if (cache_i >= mdl->nb_cached_windows) {
>> +        destroy_window_bitmap(mdl,
>> +            &mdl->windows[mdl->cached_windows[oldest_cache_i]]);
>> +        cache_i = oldest_cache_i;
>> +    }
>> +
>> +    assert(cache_i >= 0);
>> +    mdl->cached_windows[cache_i] = window - mdl->windows;
>> +    window->cached_windows_index = cache_i;
>> +
>> +    window->age = mdl->current_age++;
>> +
>> +    window->bitmap = g_new0(uint8_t, WINDOW_SIZE);
>> +
>> +    for (i = 0; i < window->nb_fragments; i++) {
>> +        Qcow2MetadataFragment *fragment = &window->fragments[i];
>> +
>> +        memset(&window->bitmap[fragment->relative_start], fragment->types,
>> +               fragment->nb_clusters);
> Hmm. If you do use my idea of nb_clusters==0 for 256, this needs a
> special case. Another option would be storing number of clusters - 1 (so
> a value of 0 is 1 cluster, a value of 255 is 256 clusters).
>
>> +    }
>> +
>> +    window->bitmap_modified = false;
>> +}
>> +
>> +/**
>> + * Enters a new range into the metadata list.
>> + */
>> +void qcow2_metadata_list_enter(BlockDriverState *bs, uint64_t offset,
>> +                               int nb_clusters, QCow2MetadataOverlap types)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    uint64_t start_cluster = offset >> s->cluster_bits;
>> +    uint64_t end_cluster = start_cluster + nb_clusters;
>> +    uint64_t current_cluster = start_cluster;
>> +
>> +    types &= s->overlap_check;
>> +    if (!types) {
>> +        return;
>> +    }
>> +
>> +    if (offset_into_cluster(s, offset)) {
>> +        /* Do not enter apparently broken metadata ranges */
>> +        return;
>> +    }
>> +
>> +    while (current_cluster < end_cluster) {
>> +        int bitmap_i;
>> +        int bitmap_i_start = current_cluster % WINDOW_SIZE;
>> +        int bitmap_i_end = MIN(WINDOW_SIZE,
>> +                               end_cluster - current_cluster + bitmap_i_start);
>> +        uint64_t window_i = current_cluster / WINDOW_SIZE;
>> +        Qcow2MetadataWindow *window;
>> +
>> +        if (window_i >= s->metadata_list->nb_windows) {
>> +            /* This should not be happening too often, so it is fine to resize
>> +             * the array to exactly the required size */
>> +            Qcow2MetadataWindow *new_windows;
>> +
>> +            new_windows = g_try_renew(Qcow2MetadataWindow,
>> +                                      s->metadata_list->windows,
>> +                                      window_i + 1);
>> +            if (!new_windows) {
>> +                return;
>> +            }
>> +
>> +            memset(new_windows + s->metadata_list->nb_windows, 0,
>> +                   (window_i + 1 - s->metadata_list->nb_windows) *
>> +                   sizeof(Qcow2MetadataWindow));
>> +
>> +            s->metadata_list->windows = new_windows;
>> +            s->metadata_list->nb_windows = window_i + 1;
>> +        }
>> +
>> +        window = &s->metadata_list->windows[window_i];
>> +        if (!window->bitmap) {
>> +            build_window_bitmap(s->metadata_list, window);
>> +        }
>> +
>> +        for (bitmap_i = bitmap_i_start; bitmap_i < bitmap_i_end; bitmap_i++) {
>> +            window->bitmap[bitmap_i] |= types;
> This adds in new types but keeps existing types listed.  Is that okay?

Yes. The caller is responsible for making sure all overlaps are intended 
(like an L2 table being both active an inactive at the same time).

>> +        }
>> +
>> +        window->age = s->metadata_list->current_age++;
>> +        window->bitmap_modified = true;
>> +
>> +        /* Go to the next window */
>> +        current_cluster += WINDOW_SIZE - bitmap_i_start;
>> +    }
>> +}
> ...
>
>> +++ b/block/qcow2.h
>> @@ -159,6 +159,9 @@ typedef struct QCowSnapshot {
>>   struct Qcow2Cache;
>>   typedef struct Qcow2Cache Qcow2Cache;
>>   
>> +struct Qcow2MetadataList;
>> +typedef struct Qcow2MetadataList Qcow2MetadataList;
>> +
>>   typedef struct Qcow2UnknownHeaderExtension {
>>       uint32_t magic;
>>       uint32_t len;
>> @@ -261,6 +264,7 @@ typedef struct BDRVQcowState {
>>   
>>       bool discard_passthrough[QCOW2_DISCARD_MAX];
>>   
>> +    Qcow2MetadataList *metadata_list;
>>       int overlap_check; /* bitmask of Qcow2MetadataOverlap values */
>>       bool signaled_corruption;
>>   
>> @@ -576,4 +580,13 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>>       void **table);
>>   int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
>>   
>> +/* qcow2-overlap.c functions */
>> +int qcow2_create_empty_metadata_list(BlockDriverState *bs, size_t cache_size,
>> +                                     Error **errp);
>> +void qcow2_metadata_list_destroy(BlockDriverState *bs);
>> +void qcow2_metadata_list_enter(BlockDriverState *bs, uint64_t offset,
>> +                               int nb_clusters, QCow2MetadataOverlap type);
>> +void qcow2_metadata_list_remove(BlockDriverState *bs, uint64_t offset,
>> +                                int nb_clusters, QCow2MetadataOverlap type);
>> +
>>   #endif
>>
> Looks reasonable in general.

Thanks! :-)

Max

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/12] qcow2: Create metadata list
  2015-02-03 23:34   ` Eric Blake
@ 2015-02-04 16:31     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2015-02-04 16:31 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2015-02-03 at 18:34, Eric Blake wrote:
> On 11/24/2014 08:56 AM, Max Reitz wrote:
>> Create and destroy the metadata list on creation and destruction of a
>> qcow2 BDS, respectively. Skip creation if no overlap checks should be
>> performed.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index ed88d69..f80f9ed 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -744,6 +744,13 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>>                                 overlap_check_template & (1 << i)) << i;
>>       }
>>   
>> +    if (s->overlap_check) {
>> +        ret = qcow2_create_empty_metadata_list(bs, 65536, errp);
> Why 64k? Does this magic number need a name?

Well, maybe I should add a comment "TODO: The user should be able to 
override this default".

> Otherwise,
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!

Max

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/12] qcow2/overlaps: Protect refcount blocks
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 06/12] qcow2/overlaps: Protect refcount blocks Max Reitz
@ 2015-02-05  1:24   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2015-02-05  1:24 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 493 bytes --]

On 11/24/2014 08:56 AM, Max Reitz wrote:
> Keep track of the refcount blocks in the metadata list to protect them
> against accidental modifications.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 38 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 37 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/12] qcow2/overlaps: Protect active L1 table
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 07/12] qcow2/overlaps: Protect active L1 table Max Reitz
@ 2015-02-05  1:25   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2015-02-05  1:25 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 534 bytes --]

On 11/24/2014 08:56 AM, Max Reitz wrote:
> Keep track of the active L1 table in the metadata list to protect it
> against accidental modifications.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-cluster.c  | 11 +++++++++++
>  block/qcow2-snapshot.c | 10 ++++++++++
>  block/qcow2.c          |  4 ++++
>  3 files changed, 25 insertions(+)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/12] qcow2/overlaps: Protect active L2 tables
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 08/12] qcow2/overlaps: Protect active L2 tables Max Reitz
@ 2015-02-05 15:27   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2015-02-05 15:27 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 597 bytes --]

On 11/24/2014 08:56 AM, Max Reitz wrote:
> Keep track of the active L2 tables in the metadata list to protect them
> against accidental modifications.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-cluster.c  |  2 ++
>  block/qcow2-refcount.c |  6 ++++++
>  block/qcow2-snapshot.c | 21 +++++++++++++++++++++
>  block/qcow2.c          |  8 +++++++-
>  4 files changed, 36 insertions(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/12] qcow2/overlaps: Protect snapshot table
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 09/12] qcow2/overlaps: Protect snapshot table Max Reitz
@ 2015-02-05 15:29   ` Eric Blake
  2015-02-05 15:30     ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2015-02-05 15:29 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 1077 bytes --]

On 11/24/2014 08:56 AM, Max Reitz wrote:
> Keep track of the snapshot table in the metadata list to protect it
> against accidental modifications.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-snapshot.c | 10 ++++++++++
>  block/qcow2.c          |  6 ++++++
>  2 files changed, 16 insertions(+)
> 

> +    if (header.nb_snapshots) {
> +        qcow2_metadata_list_enter(bs, header.snapshots_offset,
> +                                  size_to_clusters(s, header.nb_snapshots *
> +                                                   sizeof(QCowSnapshotHeader)),

In other patches in this series, you had been aligning the sizeof() with
the parameter expression it continues, rather than the first parameter.

blah(s, count *
        sizeof(struct))

vs.

blah(s, count *
     sizeof(struct))


I don't care either way, just wanted to point it out, in case you care.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/12] qcow2/overlaps: Protect snapshot table
  2015-02-05 15:29   ` Eric Blake
@ 2015-02-05 15:30     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2015-02-05 15:30 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2015-02-05 at 10:29, Eric Blake wrote:
> On 11/24/2014 08:56 AM, Max Reitz wrote:
>> Keep track of the snapshot table in the metadata list to protect it
>> against accidental modifications.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-snapshot.c | 10 ++++++++++
>>   block/qcow2.c          |  6 ++++++
>>   2 files changed, 16 insertions(+)
>>
>> +    if (header.nb_snapshots) {
>> +        qcow2_metadata_list_enter(bs, header.snapshots_offset,
>> +                                  size_to_clusters(s, header.nb_snapshots *
>> +                                                   sizeof(QCowSnapshotHeader)),
> In other patches in this series, you had been aligning the sizeof() with
> the parameter expression it continues, rather than the first parameter.
>
> blah(s, count *
>          sizeof(struct))
>
> vs.
>
> blah(s, count *
>       sizeof(struct))
>
>
> I don't care either way, just wanted to point it out, in case you care.

I do care. :-)

In this case I didn't align it because that would have broken the 80 
characters limit.

Mayb

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/12] qcow2/overlaps: Protect inactive L1 tables
  2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 10/12] qcow2/overlaps: Protect inactive L1 tables Max Reitz
@ 2015-02-05 19:41   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2015-02-05 19:41 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 472 bytes --]

On 11/24/2014 08:56 AM, Max Reitz wrote:
> Keep track of the inactive L1 tables in the metadata list to protect
> them against accidental modifications.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-snapshot.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2015-02-05 19:41 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-24 15:56 [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Max Reitz
2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 01/12] " Max Reitz
2014-11-25 11:02   ` Max Reitz
2015-01-21 16:53   ` Stefan Hajnoczi
2015-01-21 22:12     ` Max Reitz
2015-02-03 23:08   ` Eric Blake
2015-02-04 16:29     ` Max Reitz
2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 02/12] qcow2: Pull up overlap check option evaluation Max Reitz
2015-02-03 23:33   ` Eric Blake
2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 03/12] qcow2: Create metadata list Max Reitz
2015-02-03 23:34   ` Eric Blake
2015-02-04 16:31     ` Max Reitz
2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 04/12] qcow2/overlaps: Protect image header Max Reitz
2015-02-03 23:47   ` Eric Blake
2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 05/12] qcow2/overlaps: Protect refcount table Max Reitz
2015-02-03 23:55   ` Eric Blake
2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 06/12] qcow2/overlaps: Protect refcount blocks Max Reitz
2015-02-05  1:24   ` Eric Blake
2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 07/12] qcow2/overlaps: Protect active L1 table Max Reitz
2015-02-05  1:25   ` Eric Blake
2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 08/12] qcow2/overlaps: Protect active L2 tables Max Reitz
2015-02-05 15:27   ` Eric Blake
2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 09/12] qcow2/overlaps: Protect snapshot table Max Reitz
2015-02-05 15:29   ` Eric Blake
2015-02-05 15:30     ` Max Reitz
2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 10/12] qcow2/overlaps: Protect inactive L1 tables Max Reitz
2015-02-05 19:41   ` Eric Blake
2014-11-24 15:56 ` [Qemu-devel] [PATCH v2 11/12] qcow2/overlaps: Protect inactive L2 tables Max Reitz
2015-01-21 15:23   ` Stefan Hajnoczi
2015-01-21 16:07     ` Max Reitz
2014-11-24 15:57 ` [Qemu-devel] [PATCH v2 12/12] qcow2: Use new metadata overlap check function Max Reitz
2015-01-21 15:28   ` Stefan Hajnoczi
2014-11-24 16:56 ` [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions Eric Blake
2014-11-24 17:35 ` Max Reitz
2014-11-25 10:55 ` Max Reitz
2014-11-25 13:25 ` Stefan Hajnoczi
2014-12-15  9:43 ` Max Reitz
2015-01-20 22:48 ` Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.