* [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality
@ 2013-01-16 16:24 Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 01/36] qcow2: Add deduplication to the qcow2 specification Benoît Canet
                   ` (35 more replies)
  0 siblings, 36 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This patchset creates the core infrastructure for deduplication and enables it.

One can compile and install https://github.com/wernerd/Skein3Fish and use the
--enable-skein-dedup configure option in order to use the faster Skein hash.

Images must be created with "-o dedup=[skein|sha256]" in order to activate the
deduplication in the image.

Deduplication is now fast enough to be usable.
A nice side effect is that duplicated writes are faster than native QCOW2.

v5:
    Move qemu-io-test dedup patch [Eric]
    Reserve some room at the end of the QCOW header extensions. [Eric]
    Fix the specification. [Eric]
    Now overflow deduplication refcount at 2^16/2 [Stefan]
    Increase L2 table size and deduplication block hash size.

v4: Fix and complete qcow2 spec [Stefan]
    Hash the hash_algo field in the header extension [Stefan]
    Fix qcow2 spec [Eric]
    Remove pointer to hash and simplify hash memory management [Stefan]
    Rename and move qcow2_read_cluster_data to qcow2.c [Stefan]
    Document lock dropping behaviour of the previous function [Stefan]
    cleanup qcow2_dedup_read_missing_cluster_data [Stefan]
    rename *_offset to *_sect [Stefan]
    add a ./configure check for ssl [Stefan]
    Replace openssl by gnutls [Stefan]
    Implement Skein hashes
    Rewrite pretty much every qcow2-dedup.c commit after "Add
       qcow2_dedup_read_missing_and_concatenate" to simplify the code
    Use 64KB deduplication hash block to reduce allocation flushes
    Use 64KB l2 tables to reduce allocation flushes [breaks compatibility]
    Use lazy refcounts to avoid qcow2_cache_set_dependency loops resulting
       in frequent cache flushes
    Do not create and load dedup RAM structures when bdrs->read_only is true

v3: make it barely work
    replace kernel red black trees by gtree.

Benoît Canet (36):
  qcow2: Add deduplication to the qcow2 specification.
  qcow2: Add deduplication structures and fields.
  qcow2: Add qcow2_dedup_read_missing_and_concatenate
  qcow2: Make update_refcount public.
  qcow2: Create a way to link to l2 tables when deduplicating.
  qcow2: Add qcow2_dedup and related functions
  qcow2: Add qcow2_dedup_store_new_hashes.
  qcow2: Implement qcow2_compute_cluster_hash.
  qcow2: Extract qcow2_dedup_grow_table
  qcow2: Add qcow2_dedup_grow_table and use it.
  qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate
    clusters.
  qcow2: make the deduplication forget a cluster hash when a cluster is
    to dedupe
  qcow2: Create qcow2_is_cluster_to_dedup.
  qcow2: Load and save deduplication table header extension.
  qcow2: Extract qcow2_do_table_init.
  qcow2-cache: Allow to choose table size at creation.
  qcow2: Extract qcow2_add_feature and qcow2_remove_feature.
  block: Add qemu-img dedup create option.
  qcow2: Add a deduplication boolean to update_refcount.
  qcow2: Drop hash for a given cluster when dedup makes refcount >
    2^16/2.
  qcow2: Remove hash when cluster is deleted.
  qcow2: Add qcow2_dedup_is_running to probe if dedup is running.
  qcow2: Integrate deduplication in qcow2_co_writev loop.
  qcow2: Serialize write requests when deduplication is activated.
  qcow2: Add verification of dedup table.
  qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
  qcow2: Add check_dedup_l2 in order to check l2 of dedup table.
  qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED.
  qcow2: Integrate SKEIN hash algorithm in deduplication.
  qcow2: Add lazy refcounts to deduplication to prevent
    qcow2_cache_set_dependency loops
  qcow2: Use large L2 table for deduplication.
  qcow: Set large dedup hash block size.
  qemu-iotests: Filter dedup=on/off so existing tests don't break.
  qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
  qcow2: Add qcow2_co_dedup_resume to restart deduplication.
  qcow2: Enable the deduplication feature.

 block/Makefile.objs          |    1 +
 block/qcow2-cache.c          |   12 +-
 block/qcow2-cluster.c        |  182 ++++--
 block/qcow2-dedup.c          | 1298 ++++++++++++++++++++++++++++++++++++++++++
 block/qcow2-refcount.c       |  175 ++++--
 block/qcow2.c                |  368 ++++++++++--
 block/qcow2.h                |  144 ++++-
 configure                    |   55 ++
 docs/specs/qcow2.txt         |  104 +++-
 include/block/block_int.h    |    1 +
 tests/qemu-iotests/common.rc |    3 +-
 11 files changed, 2209 insertions(+), 134 deletions(-)
 create mode 100644 block/qcow2-dedup.c

--
1.7.10.4


* [Qemu-devel] [RFC V5 01/36] qcow2: Add deduplication to the qcow2 specification.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 02/36] qcow2: Add deduplication structures and fields Benoît Canet
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 docs/specs/qcow2.txt |  104 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 2 deletions(-)

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 36a559d..d5f8072 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -80,7 +80,12 @@ in the description of a field.
                                 tables to repair refcounts before accessing the
                                 image.
 
-                    Bits 1-63:  Reserved (set to 0)
+                    Bit 1:      Deduplication bit.  If this bit is set then
+                                deduplication is used on this image.
+                                In that case the L2 table size (64KB)
+                                differs from the cluster size (4KB).
+
+                    Bits 2-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -116,6 +121,7 @@ be stored. Each extension has a structure like the following:
                         0x00000000 - End of the header extension area
                         0xE2792ACA - Backing file format name
                         0x6803f857 - Feature name table
+                        0xCD8E819B - Deduplication
                         other      - Unknown header extension, can be safely
                                      ignored
 
@@ -159,6 +165,100 @@ the header extension data. Each entry look like this:
                     terminated if it has full length)
 
 
+== Deduplication ==
+
+The deduplication extension contains information concerning deduplication.
+
+    Byte   0 - 7:   Offset of the RAM deduplication table (RAM lookup)
+
+          8 - 11:   Size of the RAM deduplication table = number of L1 64-bit
+                    pointers
+
+              12:   Hash algo enum field
+                        0: SHA-256
+                        1: SHA3
+                        2: SKEIN-256
+
+              13:   Dedup strategies bitmap
+                        0: RAM based hash lookup (always set to 1 for now)
+                        1: Disk based hash lookup
+                        2: Deduplication running if set to 1
+
+        14 - 69:    Set to zero and reserved for future use
+
+Disk based lookup structure will be described in a future QCOW2 specification.
+
+== Deduplication table (RAM method) ==
+
+The deduplication table maps a physical offset to a data hash and
+logical offset. It is used to permanently store the information to
+do the deduplication. It is loaded at startup into a RAM based representation
+used to do the lookups.
+
+The deduplication table contains 64-bit offsets to the level 2 deduplication
+table blocks.
+Each entry of these blocks contains a 32-byte SHA256 hash followed by the
+64-bit logical offset of the first encountered cluster having this hash.
+
+== Deduplication table schematic (RAM method) ==
+
+0       l1_dedup_index                                              Size
+              |
+|--------------------------------------------------------------------|
+|             |                                                      |
+|             |        L1 Deduplication table                        |
+|             |                                                      |
+|--------------------------------------------------------------------|
+              |
+              |
+              |
+0             |           l2_dedup_block_entries
+              |
+|---------------------------------|
+|                                 |
+|    L2 deduplication block       |
+|                                 |
+|                 l2_dedup_index  |
+|---------------------------------|
+                         |
+         0               |              40
+                         |
+         |-------------------------------|
+         |                               |
+         |    Deduplication table entry  |
+         |                               |
+         |-------------------------------|
+
+
+== Deduplication table entry description (RAM method) ==
+
+Each L2 deduplication table entry has the following structure:
+
+    Byte  0 - 31:   hash of data cluster
+
+         32 - 39:   Logical offset of first encountered block having
+                    this hash
+
+== Deduplication table arithmetics (RAM method) ==
+
+cluster_size = 4096
+dedup_block_size = 65536 * 5
+l2_size = 65536 * 16 (16 factor is from the smaller cluster_size)
+
+Entries in the deduplication table are ordered by physical cluster index.
+
+The number of entries in an l2 deduplication table block is:
+l2_dedup_block_entries = FLOOR(dedup_block_size / (32 + 8))
+
+The index in the level 1 deduplication table is:
+l1_dedup_index = physical_cluster_index / l2_dedup_block_entries
+
+The index in the level 2 deduplication table is:
+l2_dedup_index = physical_cluster_index % l2_dedup_block_entries
+
+The 16 remaining bytes in each l2 deduplication block are set to zero and
+reserved for future use.
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count
@@ -211,7 +311,7 @@ guest clusters to host clusters. They are called L1 and L2 table.
 
 The L1 table has a variable size (stored in the header) and may use multiple
 clusters, however it must be contiguous in the image file. L2 tables are
-exactly one cluster in size.
+exactly one cluster in size except in the deduplication case.
 
 Given a offset into the virtual disk, the offset into the image file can be
 obtained as follows:
-- 
1.7.10.4
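
The table arithmetic in the specification text above can be checked with a
small stand-alone sketch. This is only an illustration under stated
assumptions: the function and macro names below are hypothetical and do not
appear in the patch, and only the formulas (40-byte entries, division and
modulo by the number of entries per block) come from the spec.

```c
#include <assert.h>
#include <stdint.h>

/* Entry layout taken from the spec: 32-byte hash + 64-bit logical offset. */
#define DEDUP_HASH_LENGTH   32
#define DEDUP_OFFSET_LENGTH 8
#define DEDUP_ENTRY_SIZE    (DEDUP_HASH_LENGTH + DEDUP_OFFSET_LENGTH)

/* l2_dedup_block_entries = FLOOR(dedup_block_size / (32 + 8)) */
uint64_t dedup_block_entries(uint64_t dedup_block_size)
{
    assert(dedup_block_size >= DEDUP_ENTRY_SIZE);
    return dedup_block_size / DEDUP_ENTRY_SIZE;
}

/* Bytes left at the end of a block once all whole entries are laid out. */
uint64_t dedup_block_padding(uint64_t dedup_block_size)
{
    return dedup_block_size % DEDUP_ENTRY_SIZE;
}

/* Index of the L1 slot pointing at the block holding this cluster's entry. */
uint64_t l1_dedup_index(uint64_t physical_cluster_index,
                        uint64_t dedup_block_size)
{
    return physical_cluster_index / dedup_block_entries(dedup_block_size);
}

/* Index of the entry inside that L2 deduplication block. */
uint64_t l2_dedup_index(uint64_t physical_cluster_index,
                        uint64_t dedup_block_size)
{
    return physical_cluster_index % dedup_block_entries(dedup_block_size);
}
```

With dedup_block_size = 65536 * 5 a block holds 8192 entries and no bytes
are left over, whereas a 65536-byte block holds 1638 entries and leaves
16 bytes, which is the value the "16 remaining bytes" sentence corresponds to.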


* [Qemu-devel] [RFC V5 02/36] qcow2: Add deduplication structures and fields.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 01/36] qcow2: Add deduplication to the qcow2 specification Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 03/36] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.h |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 718b52b..b31b64e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -43,6 +43,10 @@
 #define QCOW_OFLAG_COPIED     (1LL << 63)
 /* indicate that the cluster is compressed (they never have the copied flag) */
 #define QCOW_OFLAG_COMPRESSED (1LL << 62)
+/* indicate that the cluster must be processed when deduplication restarts;
+ * also indicate that the on-disk dedup hash must be ignored and discarded
+ */
+#define QCOW_OFLAG_TO_DEDUP (1LL << 61)
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1LL << 0)
 
@@ -58,6 +62,57 @@
 
 #define DEFAULT_CLUSTER_SIZE 65536
 
+#define HASH_LENGTH 32
+
+typedef enum {
+    QCOW_DEDUP_STOPPED,
+    QCOW_DEDUP_STARTING,
+    QCOW_DEDUP_STARTED,
+    QCOW_DEDUP_STOPPING,
+} QCowDedupStatus;
+
+typedef enum {
+    QCOW_HASH_SHA256 = 0,
+    QCOW_HASH_SHA3   = 1,
+    QCOW_HASH_SKEIN  = 2,
+} QCowHashAlgo;
+
+typedef struct {
+    uint8_t data[HASH_LENGTH]; /* 32 bytes hash of a given cluster */
+} QCowHash;
+
+/* Used to keep a single precomputed hash between the calls of the dedup
+ * function
+ */
+typedef struct {
+    QCowHash hash;
+    bool reuse;                  /* The hash is precomputed reuse it */
+} QcowPersistantHash;
+
+/* deduplication node */
+typedef struct {
+    QCowHash hash;
+    uint64_t physical_sect;       /* where the cluster is stored on disk */
+    uint64_t first_logical_sect;  /* logical sector of the first occurrence of
+                                   * this cluster
+                                   */
+} QCowHashNode;
+
+/* Undedupable hashes that must be written later to disk */
+typedef struct QCowHashElement {
+    QCowHash hash;
+    QTAILQ_ENTRY(QCowHashElement) next;
+} QCowHashElement;
+
+typedef struct {
+    QcowPersistantHash phash;  /* contains a hash persisting between calls of
+                                * qcow2_dedup()
+                                */
+    QTAILQ_HEAD(, QCowHashElement) undedupables;
+    int nb_clusters_processed;
+    int nb_undedupable_sectors;
+} QCowDedupState;
+
 typedef struct QCowHeader {
     uint32_t magic;
     uint32_t version;
@@ -114,8 +169,10 @@ enum {
 enum {
     QCOW2_INCOMPAT_DIRTY_BITNR   = 0,
     QCOW2_INCOMPAT_DIRTY         = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
+    QCOW2_INCOMPAT_DEDUP_BITNR   = 1,
+    QCOW2_INCOMPAT_DEDUP         = 1 << QCOW2_INCOMPAT_DEDUP_BITNR,
 
-    QCOW2_INCOMPAT_MASK          = QCOW2_INCOMPAT_DIRTY,
+    QCOW2_INCOMPAT_MASK          = QCOW2_INCOMPAT_DIRTY | QCOW2_INCOMPAT_DEDUP,
 };
 
 /* Compatible feature bits */
@@ -138,6 +195,7 @@ typedef struct BDRVQcowState {
     int cluster_sectors;
     int l2_bits;
     int l2_size;
+    int hash_block_size;
     int l1_size;
     int l1_vm_state_index;
     int csize_shift;
@@ -148,6 +206,7 @@ typedef struct BDRVQcowState {
 
     Qcow2Cache* l2_table_cache;
     Qcow2Cache* refcount_block_cache;
+    Qcow2Cache *dedup_cluster_cache;
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
@@ -160,6 +219,17 @@ typedef struct BDRVQcowState {
     int64_t free_cluster_index;
     int64_t free_byte_offset;
 
+    bool has_dedup;
+    QCowDedupStatus dedup_status;
+    QCowHashAlgo dedup_hash_algo;
+    Coroutine *dedup_resume_co;
+    int dedup_co_delay;
+    uint64_t *dedup_table;
+    uint64_t dedup_table_offset;
+    int32_t dedup_table_size;
+    GTree *dedup_tree_by_hash;
+    GTree *dedup_tree_by_sect;
+
     CoMutex lock;
 
     uint32_t crypt_method; /* current crypt method, 0 if no key yet */
-- 
1.7.10.4
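
As a rough illustration of how the new QCOW_OFLAG_TO_DEDUP bit sits next to
the existing L2 entry flags, here is a stand-alone sketch. The bit positions
are the ones from the patch; the helper names are hypothetical, the constants
are redefined locally with unsigned literals, and the sketch works on
host-order values, ignoring the big-endian on-disk encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as in block/qcow2.h after this patch. */
#define OFLAG_COPIED     (1ULL << 63)
#define OFLAG_COMPRESSED (1ULL << 62)
#define OFLAG_TO_DEDUP   (1ULL << 61)
#define OFLAG_ZERO       (1ULL << 0)

/* True if the cluster must be processed when deduplication restarts. */
bool l2_entry_needs_dedup(uint64_t entry)
{
    return (entry & OFLAG_TO_DEDUP) != 0;
}

/* Mark an entry so that its on-disk hash is ignored until it is rehashed. */
uint64_t l2_entry_mark_to_dedup(uint64_t entry)
{
    uint64_t marked = entry | OFLAG_TO_DEDUP;

    assert(l2_entry_needs_dedup(marked));
    return marked;
}

/* Clear the mark once the cluster has been processed. */
uint64_t l2_entry_clear_to_dedup(uint64_t entry)
{
    return entry & ~OFLAG_TO_DEDUP;
}
```

Marking and then clearing an entry returns the original value, so the other
flag bits and the offset are left untouched.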


* [Qemu-devel] [RFC V5 03/36] qcow2: Add qcow2_dedup_read_missing_and_concatenate
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 01/36] qcow2: Add deduplication to the qcow2 specification Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 02/36] qcow2: Add deduplication structures and fields Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:53   ` Eric Blake
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 04/36] qcow2: Make update_refcount public Benoît Canet
                   ` (32 subsequent siblings)
  35 siblings, 1 reply; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This function is used to read missing data when unaligned writes are
done. It also concatenates the missing data with the given qiov data
in order to prepare a buffer used to look for duplicated
clusters.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/Makefile.objs |    1 +
 block/qcow2-dedup.c |  119 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c       |   36 +++++++++++++++-
 block/qcow2.h       |   12 ++++++
 4 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 block/qcow2-dedup.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index c067f38..21afc85 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,6 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2-dedup.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
new file mode 100644
index 0000000..4e99eb1
--- /dev/null
+++ b/block/qcow2-dedup.c
@@ -0,0 +1,119 @@
+/*
+ * Deduplication for the QCOW2 format
+ *
+ * Copyright (C) Nodalink, SARL. 2012-2013
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block/block_int.h"
+#include "qemu-common.h"
+#include "qcow2.h"
+
+/*
+ * Prepare a buffer containing all the data required to compute cluster
+ * sized deduplication hashes.
+ * If sector_num or nb_sectors are not cluster-aligned, missing data
+ * before/after the qiov will be read.
+ *
+ * @qiov:               the qiov for which missing data must be read
+ * @sector_num:         the first sectors that must be read into the qiov
+ * @nb_sectors:         the number of sectors to read into the qiov
+ * @data:               the place where the data will be concatenated and stored
+ * @nb_data_sectors:    the resulting size of the concatenated data (in sectors)
+ * @ret:                negative on error
+ */
+int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
+                                             QEMUIOVector *qiov,
+                                             uint64_t sector_num,
+                                             int nb_sectors,
+                                             uint8_t **data,
+                                             int *nb_data_sectors)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    uint64_t cluster_beginning_sector;
+    uint64_t first_sector_after_qiov;
+    int cluster_beginning_nr;
+    int cluster_ending_nr;
+    int unaligned_ending_nr;
+    uint64_t max_cluster_ending_nr;
+
+    /* compute how much and where to read at the beginning */
+    cluster_beginning_nr = sector_num & (s->cluster_sectors - 1);
+    cluster_beginning_sector = sector_num - cluster_beginning_nr;
+
+    /* for the ending */
+    first_sector_after_qiov = sector_num + nb_sectors;
+    unaligned_ending_nr = first_sector_after_qiov & (s->cluster_sectors - 1);
+    cluster_ending_nr = unaligned_ending_nr ?
+                        s->cluster_sectors - unaligned_ending_nr : 0;
+
+    /* compute total size in sectors and allocate memory */
+    *nb_data_sectors = cluster_beginning_nr + nb_sectors + cluster_ending_nr;
+    *data = qemu_blockalign(bs, *nb_data_sectors * BDRV_SECTOR_SIZE);
+
+    /* read beginning */
+    if (cluster_beginning_nr) {
+        ret = qcow2_read_cluster_data(bs,
+                                      *data,
+                                      cluster_beginning_sector,
+                                      cluster_beginning_nr);
+    }
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /* append qiov content */
+    qemu_iovec_to_buf(qiov, 0, *data + cluster_beginning_nr * BDRV_SECTOR_SIZE,
+                      qiov->size);
+
+    /* Fix cluster_ending_nr if we are at risk of reading outside the image
+     * (Cluster unaligned image size)
+     */
+    max_cluster_ending_nr = bs->total_sectors - first_sector_after_qiov;
+    cluster_ending_nr = max_cluster_ending_nr < (uint64_t) cluster_ending_nr ?
+                        (int) max_cluster_ending_nr : cluster_ending_nr;
+
+    /* read and add ending */
+    if (cluster_ending_nr) {
+        ret = qcow2_read_cluster_data(bs,
+                                      *data +
+                                      (cluster_beginning_nr +
+                                      nb_sectors) *
+                                      BDRV_SECTOR_SIZE,
+                                      first_sector_after_qiov,
+                                      cluster_ending_nr);
+    }
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    return 0;
+
+fail:
+    qemu_vfree(*data);
+    *data = NULL;
+    return ret;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index d603f98..410d3c1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -69,7 +69,6 @@ static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
         return 0;
 }
 
-
 /* 
  * read qcow2 extension and fill bs
  * start reading from start_offset
@@ -1110,6 +1109,41 @@ fail:
     return ret;
 }
 
+/**
+ * Read some data from the QCOW2 file
+ *
+ * Important: s->lock is dropped. Things can change before the function returns
+ *            to the caller.
+ *
+ * @data:       the buffer where the data must be stored
+ * @sector_num: the sector number to read in the QCOW2 file
+ * @nb_sectors: the number of sectors to read
+ * @ret:        negative on error
+ */
+int qcow2_read_cluster_data(BlockDriverState *bs,
+                            uint8_t *data,
+                            uint64_t sector_num,
+                            int nb_sectors)
+{
+    BDRVQcowState *s = bs->opaque;
+    QEMUIOVector qiov;
+    struct iovec iov;
+    int ret;
+
+    iov.iov_len = nb_sectors * BDRV_SECTOR_SIZE;
+    iov.iov_base = data;
+    qemu_iovec_init_external(&qiov, &iov, 1);
+    qemu_co_mutex_unlock(&s->lock);
+    ret = bdrv_co_readv(bs, sector_num, nb_sectors, &qiov);
+    qemu_co_mutex_lock(&s->lock);
+    if (ret < 0) {
+        error_report("failed to read %d sectors at sector %" PRIu64,
+                     nb_sectors, sector_num);
+    }
+
+    return ret;
+}
+
 static int qcow2_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt)
 {
diff --git a/block/qcow2.h b/block/qcow2.h
index b31b64e..1fceb65 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -376,6 +376,10 @@ int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
 
 int qcow2_mark_dirty(BlockDriverState *bs);
 int qcow2_update_header(BlockDriverState *bs);
+int qcow2_read_cluster_data(BlockDriverState *bs,
+                            uint8_t *data,
+                            uint64_t sector_num,
+                            int nb_sectors);
 
 /* qcow2-refcount.c functions */
 int qcow2_refcount_init(BlockDriverState *bs);
@@ -444,4 +448,12 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
     void **table);
 int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
 
+/* qcow2-dedup.c functions */
+int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
+                                             QEMUIOVector *qiov,
+                                             uint64_t sector,
+                                             int sectors_nr,
+                                             uint8_t **dedup_cluster_data,
+                                             int *dedup_cluster_data_nr);
+
 #endif
-- 
1.7.10.4
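
The sector arithmetic of qcow2_dedup_read_missing_and_concatenate can be
tried in isolation with the following sketch. It mirrors the computations in
the patch but is not the patch's code: the DedupPadding struct and the
dedup_compute_padding() name are hypothetical, no I/O is done, and
cluster_sectors is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t cluster_beginning_sector; /* first sector of the first cluster */
    int cluster_beginning_nr;          /* sectors to read before the request */
    int cluster_ending_nr;             /* sectors to read after the request */
    int nb_data_sectors;               /* size of the concatenated buffer */
} DedupPadding;

DedupPadding dedup_compute_padding(uint64_t sector_num, int nb_sectors,
                                   int cluster_sectors, uint64_t total_sectors)
{
    DedupPadding p;
    uint64_t mask, first_sector_after, max_ending;
    int unaligned_ending_nr;

    assert(cluster_sectors > 0 &&
           (cluster_sectors & (cluster_sectors - 1)) == 0);
    assert(nb_sectors >= 0);
    mask = (uint64_t)cluster_sectors - 1;
    first_sector_after = sector_num + (uint64_t)nb_sectors;
    assert(first_sector_after <= total_sectors);

    /* head: distance back to the start of the containing cluster */
    p.cluster_beginning_nr = (int)(sector_num & mask);
    p.cluster_beginning_sector = sector_num - (uint64_t)p.cluster_beginning_nr;

    /* tail: distance forward to the next cluster boundary */
    unaligned_ending_nr = (int)(first_sector_after & mask);
    p.cluster_ending_nr = unaligned_ending_nr ?
                          cluster_sectors - unaligned_ending_nr : 0;

    /* the buffer is sized for whole clusters, before any clamping */
    p.nb_data_sectors = p.cluster_beginning_nr + nb_sectors +
                        p.cluster_ending_nr;

    /* never read past the end of a cluster-unaligned image */
    max_ending = total_sectors - first_sector_after;
    if (max_ending < (uint64_t)p.cluster_ending_nr) {
        p.cluster_ending_nr = (int)max_ending;
    }
    return p;
}
```

For 4KB clusters (8 sectors), a 5-sector write at sector 10 needs 2 sectors
read before it and 1 sector after it, giving an 8-sector buffer that starts
at sector 8.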


* [Qemu-devel] [RFC V5 04/36] qcow2: Make update_refcount public.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (2 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 03/36] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 05/36] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
                   ` (31 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |    6 +-----
 block/qcow2.h          |    2 ++
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6a95aa6..e014b0e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -27,10 +27,6 @@
 #include "block/qcow2.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
-static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
-                            int64_t offset, int64_t length,
-                            int addend);
-
 
 /*********************************************************/
 /* refcount handling */
@@ -413,7 +409,7 @@ fail_block:
 }
 
 /* XXX: cache several refcount block clusters ? */
-static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
+int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
     int64_t offset, int64_t length, int addend)
 {
     BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.h b/block/qcow2.h
index 1fceb65..803aeda 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -399,6 +399,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 
 int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                           BdrvCheckMode fix);
+int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
+    int64_t offset, int64_t length, int addend);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size);
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 05/36] qcow2: Create a way to link to l2 tables when deduplicating.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (3 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 04/36] qcow2: Make update_refcount public Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 06/36] qcow2: Add qcow2_dedup and related functions Benoît Canet
                   ` (30 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c |    8 ++++++--
 block/qcow2.h         |    9 +++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 56fccf9..63a7241 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -693,7 +693,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
             old_cluster[j++] = l2_table[l2_index + i];
 
         l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
+                    (i << s->cluster_bits)) |
+                    (m->oflag_copied ? QCOW_OFLAG_COPIED : 0));
      }
 
 
@@ -706,7 +707,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      * If this was a COW, we need to decrease the refcount of the old cluster.
      * Also flush bs->file to get the right order for L2 and refcount update.
      */
-    if (j != 0) {
+    if (!m->overwrite && j != 0) {
         for (i = 0; i < j; i++) {
             qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1);
         }
@@ -1006,6 +1007,9 @@ again:
                     .offset     = nb_sectors * BDRV_SECTOR_SIZE,
                     .nb_sectors = avail_sectors - nb_sectors,
                 },
+
+                .oflag_copied   = true,
+                .overwrite      = false,
             };
             qemu_co_queue_init(&(*m)->dependent_requests);
             QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
diff --git a/block/qcow2.h b/block/qcow2.h
index 803aeda..4273e7c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -63,6 +63,10 @@
 #define DEFAULT_CLUSTER_SIZE 65536
 
 #define HASH_LENGTH 32
+/* indicate that the hash structure is empty and has no offset yet */
+#define QCOW_FLAG_EMPTY   (1LL << 62)
+/* indicate that the cluster for this hash has QCOW_OFLAG_COPIED on disk */
+#define QCOW_FLAG_FIRST   (1LL << 63)
 
 typedef enum {
     QCOW_DEDUP_STOPPED,
@@ -304,6 +308,11 @@ typedef struct QCowL2Meta
      */
     CoQueue dependent_requests;
 
+    /* set to true if QCOW_OFLAG_COPIED must be set in the L2 table entry */
+    bool oflag_copied;
+    /* set to true if we are overwriting an L2 table entry */
+    bool overwrite;
+
     /**
      * The COW Region between the start of the first allocated cluster and the
      * area the guest actually writes to.
-- 
1.7.10.4
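
The effect of the two new QCowL2Meta fields on qcow2_alloc_cluster_link_l2
can be summarised by the sketch below. It is a simplified model, not the
patched function: the helper names are hypothetical, the flag is redefined
locally, and the cpu_to_be64() conversion applied by the real code is left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OFLAG_COPIED (1ULL << 63)

/*
 * Build the host-order L2 entry for the i-th cluster of an allocation.
 * QCOW_OFLAG_COPIED is only set when the caller asks for it, which lets a
 * deduplicated cluster be linked without claiming exclusive ownership.
 */
uint64_t build_l2_entry(uint64_t cluster_offset, unsigned i,
                        unsigned cluster_bits, bool oflag_copied)
{
    uint64_t entry;

    assert(cluster_bits < 63);
    entry = cluster_offset + ((uint64_t)i << cluster_bits);
    return entry | (oflag_copied ? OFLAG_COPIED : 0);
}

/*
 * Old clusters are freed only for a regular COW, not when an existing
 * L2 entry is being overwritten on purpose.
 */
bool must_free_old_clusters(bool overwrite, int nb_old_clusters)
{
    return !overwrite && nb_old_clusters != 0;
}
```

The regular allocation path in the patch sets oflag_copied to true and
overwrite to false, which keeps the behaviour it had before the change.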


* [Qemu-devel] [RFC V5 06/36] qcow2: Add qcow2_dedup and related functions
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (4 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 05/36] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 07/36] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |  436 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h       |    5 +
 2 files changed, 441 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 4e99eb1..5901749 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -117,3 +117,439 @@ fail:
     *data = NULL;
     return ret;
 }
+
+/*
+ * Build a QCowHashNode structure
+ *
+ * @hash:               the given hash
+ * @physical_sect:      the cluster offset in the QCOW2 file
+ * @first_logical_sect: the first logical cluster offset written
+ * @ret:                the newly built QCowHashNode
+ */
+static QCowHashNode *qcow2_dedup_build_qcow_hash_node(QCowHash *hash,
+                                                  uint64_t physical_sect,
+                                                  uint64_t first_logical_sect)
+{
+    QCowHashNode *hash_node;
+
+    hash_node = g_new0(QCowHashNode, 1);
+    memcpy(hash_node->hash.data, hash->data, HASH_LENGTH);
+    hash_node->physical_sect = physical_sect;
+    hash_node->first_logical_sect = first_logical_sect;
+
+    return hash_node;
+}
+
+/*
+ * Compute the hash of a given cluster
+ *
+ * @data: a buffer containing the cluster data
+ * @hash: a QCowHash where to store the computed hash
+ * @ret:  0 on success, negative on error
+ */
+static int qcow2_compute_cluster_hash(BlockDriverState *bs,
+                                       QCowHash *hash,
+                                       uint8_t *data)
+{
+    return 0;
+}
+
+/*
+ * Get the QCowHashNode corresponding to a cluster's data
+ *
+ * @phash:           if phash->reuse is set its hash is used as is
+ * @data:            a buffer containing the cluster
+ * @nb_clusters_processed: the number of clusters to skip in the buffer
+ * @err:             set to 0 on success, negative on error
+ * @ret:             QCowHashNode of the duplicated cluster or NULL if not found
+ */
+static QCowHashNode *qcow2_get_hash_node_for_cluster(BlockDriverState *bs,
+                                                     QcowPersistantHash *phash,
+                                                     uint8_t *data,
+                                                     int nb_clusters_processed,
+                                                     int *err)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    *err = 0;
+
+    /* no hash has been provided: compute it and store it for later use */
+    if (!phash->reuse) {
+        ret = qcow2_compute_cluster_hash(bs,
+                                         &phash->hash,
+                                         data +
+                                         nb_clusters_processed *
+                                         s->cluster_size);
+    }
+
+    /* do not reuse the hash anymore if it was precomputed */
+    phash->reuse = false;
+
+    if (ret < 0) {
+        *err = ret;
+        return NULL;
+    }
+
+    return g_tree_lookup(s->dedup_tree_by_hash, &phash->hash);
+}
+
+/*
+ * Build a QCowHashNode from a given QCowHash and insert it into the tree
+ *
+ * @hash: the given QCowHash
+ */
+static void qcow2_build_and_insert_hash_node(BlockDriverState *bs,
+                                             QCowHash *hash)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowHashNode *hash_node;
+
+    /* build the hash node with QCOW_FLAG_EMPTY as offsets so we will remember
+     * to fill these fields later with real values.
+     */
+    hash_node = qcow2_dedup_build_qcow_hash_node(hash,
+                                                 QCOW_FLAG_EMPTY,
+                                                 QCOW_FLAG_EMPTY);
+    g_tree_insert(s->dedup_tree_by_hash, &hash_node->hash, hash_node);
+}
+
+/*
+ * Helper used to build a QCowHashElement
+ *
+ * @hash: the QCowHash to use
+ * @ret:  a newly allocated QCowHashElement containing the given hash
+ */
+static QCowHashElement *qcow2_build_dedup_hash(QCowHash *hash)
+{
+    QCowHashElement *dedup_hash;
+    dedup_hash = g_new0(QCowHashElement, 1);
+    memcpy(dedup_hash->hash.data, hash->data, HASH_LENGTH);
+    return dedup_hash;
+}
+
+/*
+ * Helper used to link a deduplicated cluster in the L2 table
+ *
+ * @logical_sect:  the cluster sector seen by the guest
+ * @physical_sect: the cluster sector in the QCOW2 file
+ * @overwrite:     true if we must overwrite the L2 table entry
+ * @ret:           0 on success, negative on error
+ */
+static int qcow2_dedup_link_l2(BlockDriverState *bs,
+                               uint64_t logical_sect,
+                               uint64_t physical_sect,
+                               bool overwrite)
+{
+    QCowL2Meta m = {
+        .alloc_offset   = physical_sect << 9,
+        .offset         = logical_sect << 9,
+        .nb_clusters    = 1,
+        .nb_available   = 0,
+        .cow_start = {
+            .offset     = 0,
+            .nb_sectors = 0,
+        },
+        .cow_end = {
+            .offset     = 0,
+            .nb_sectors = 0,
+        },
+        .oflag_copied   = false,
+        .overwrite      = overwrite,
+    };
+    return qcow2_alloc_cluster_link_l2(bs, &m);
+}
+
+/* Clear the QCOW_OFLAG_COPIED from the first L2 entry written for a physical
+ * cluster.
+ *
+ * @hash_node: the duplicated hash node
+ * @ret:       0 on success, negative on error
+ */
+static int qcow2_clear_l2_copied_flag_if_needed(BlockDriverState *bs,
+                                                QCowHashNode *hash_node)
+{
+    int ret = 0;
+    uint64_t first_logical_sect = hash_node->first_logical_sect;
+
+    /* QCOW_OFLAG_COPIED already cleared -> do nothing */
+    if (!(first_logical_sect & QCOW_FLAG_FIRST)) {
+        return 0;
+    }
+
+    /* note: QCOW_FLAG_FIRST == QCOW_OFLAG_COPIED */
+    first_logical_sect &= ~QCOW_FLAG_FIRST;
+
+    /* overwrite first L2 entry to clear QCOW_OFLAG_COPIED */
+    ret = qcow2_dedup_link_l2(bs, first_logical_sect,
+                              hash_node->physical_sect,
+                              true);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* remember that we don't need to clear QCOW_OFLAG_COPIED again */
+    hash_node->first_logical_sect &= first_logical_sect;
+
+    return 0;
+}
+
+/* This function deduplicates a cluster
+ *
+ * @logical_sect: The logical sector of the write
+ * @hash_node:    The duplicated cluster hash node
+ * @ret:          0 on success, negative on error
+ */
+static int qcow2_deduplicate_cluster(BlockDriverState *bs,
+                                     uint64_t logical_sect,
+                                     QCowHashNode *hash_node)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    /* create new L2 entry */
+    ret = qcow2_dedup_link_l2(bs, logical_sect,
+                              hash_node->physical_sect,
+                              false);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* Increment the refcount of the cluster */
+    return update_refcount(bs,
+                           (hash_node->physical_sect /
+                            s->cluster_sectors) << s->cluster_bits,
+                            1, 1);
+}
+
+/* This function tries to deduplicate a given cluster.
+ *
+ * @sector_num:           the logical sector number we are trying to deduplicate
+ * @phash:                Used instead of computing the hash if provided
+ * @data:                 the buffer in which to look for a duplicated cluster
+ * @nb_clusters_processed: the number of clusters that must be skipped in data
+ * @ret:                  negative on error, 1 if deduplicated, 0 otherwise
+ */
+static int qcow2_try_dedup_cluster(BlockDriverState *bs,
+                                   QcowPersistantHash *phash,
+                                   uint64_t sector_num,
+                                   uint8_t *data,
+                                   int nb_clusters_processed)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    QCowHashNode *hash_node;
+    uint64_t logical_sect;
+    uint64_t existing_physical_offset;
+    int pnum = s->cluster_sectors;
+
+    /* search the tree for duplicated cluster */
+    hash_node = qcow2_get_hash_node_for_cluster(bs,
+                                                phash,
+                                                data,
+                                                nb_clusters_processed,
+                                                &ret);
+
+    /* we won't reuse the hash on error */
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* if cluster is not duplicated store hash for later usage */
+    if (!hash_node) {
+        qcow2_build_and_insert_hash_node(bs, &phash->hash);
+        return 0;
+    }
+
+    logical_sect = sector_num & ~(s->cluster_sectors - 1);
+    ret = qcow2_get_cluster_offset(bs, logical_sect << 9,
+                                   &pnum, &existing_physical_offset);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* if we are rewriting the same cluster at the same place do nothing */
+    if (existing_physical_offset == hash_node->physical_sect << 9) {
+        return 1;
+    }
+
+    /* take care of not having refcount > 1 and QCOW_OFLAG_COPIED at once */
+    ret = qcow2_clear_l2_copied_flag_if_needed(bs, hash_node);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* do the deduplication */
+    ret = qcow2_deduplicate_cluster(bs, logical_sect,
+                                    hash_node);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    return 1;
+}
+
+
+static void add_hash_to_undedupable_list(BlockDriverState *bs,
+                                                    QCowDedupState *ds)
+{
+    /* memorise hash for later storage in gtree and disk */
+    QCowHashElement *dedup_hash = qcow2_build_dedup_hash(&ds->phash.hash);
+    QTAILQ_INSERT_TAIL(&ds->undedupables, dedup_hash, next);
+}
+
+static int qcow2_dedup_starting_from_begining(BlockDriverState *bs,
+                                              QCowDedupState *ds,
+                                              uint64_t sector_num,
+                                              uint8_t *data,
+                                              int left_to_process)
+{
+    BDRVQcowState *s = bs->opaque;
+    int i;
+    int ret = 0;
+
+    for (i = 0; i < left_to_process; i++) {
+        ret = qcow2_try_dedup_cluster(bs,
+                                      &ds->phash,
+                                      sector_num + i * s->cluster_sectors,
+                                      data,
+                                      ds->nb_clusters_processed + i);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        /* stop if a cluster has not been deduplicated */
+        if (ret != 1) {
+            break;
+        }
+    }
+
+    return i;
+}
+
+static int qcow2_count_next_non_dedupable_clusters(BlockDriverState *bs,
+                                                   QCowDedupState *ds,
+                                                   uint8_t *data,
+                                                   int left_to_process)
+{
+    int i;
+    int ret = 0;
+    QCowHashNode *hash_node;
+
+    for (i = 0; i < left_to_process; i++) {
+        hash_node = qcow2_get_hash_node_for_cluster(bs,
+                                                  &ds->phash,
+                                                  data,
+                                                  ds->nb_clusters_processed + i,
+                                                  &ret);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        /* found a duplicated cluster : stop here */
+        if (hash_node) {
+            break;
+        }
+
+        qcow2_build_and_insert_hash_node(bs, &ds->phash.hash);
+        add_hash_to_undedupable_list(bs, ds);
+    }
+
+    return i;
+}
+
+
+/* Deduplicate all the clusters that can be deduplicated.
+ *
+ * Next, it counts the non-deduplicable sectors that follow while storing
+ * the hashes of these clusters in a linked list for later use.
+ * Then it computes the hash of the first duplicated cluster that comes after
+ * the non-deduplicable ones; this hash is reused by the next call.
+ *
+ * @ds:              a structure containing the state of the deduplication
+ *                   for this write request
+ * @sector_num:      the logical sector
+ * @data:            the buffer containing the data to deduplicate
+ * @data_nr:         the size of the buffer in sectors
+ * @ret:             deduplicated sectors count, 0 if none, negative on error
+ */
+int qcow2_dedup(BlockDriverState *bs,
+                QCowDedupState *ds,
+                uint64_t sector_num,
+                uint8_t *data,
+                int data_nr)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    int deduped_clusters_nr = 0;
+    int left_to_process;
+    int begining_index;
+
+    begining_index = sector_num & (s->cluster_sectors - 1);
+
+    left_to_process = (data_nr / s->cluster_sectors) -
+                      ds->nb_clusters_processed;
+
+    /* start deduplicating all that can be cluster after cluster */
+    ret = qcow2_dedup_starting_from_begining(bs,
+                                             ds,
+                                             sector_num,
+                                             data,
+                                             left_to_process);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    deduped_clusters_nr = ret;
+
+    left_to_process -= ret;
+    ds->nb_clusters_processed += ret;
+
+    /* We deduped everything till the end */
+    if (!left_to_process) {
+        ds->nb_undedupable_sectors = 0;
+        goto exit;
+    }
+
+    /* skip and account the first undedupable cluster found */
+    left_to_process--;
+    ds->nb_clusters_processed++;
+    ds->nb_undedupable_sectors += s->cluster_sectors;
+
+    add_hash_to_undedupable_list(bs, ds);
+
+    /* Count how many non-duplicated sectors can be written and memorize the
+     * hashes to write them after the data has reached the disk.
+     */
+    ret = qcow2_count_next_non_dedupable_clusters(bs,
+                                                  ds,
+                                                  data,
+                                                  left_to_process);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    left_to_process -= ret;
+    ds->nb_clusters_processed += ret;
+    ds->nb_undedupable_sectors += ret * s->cluster_sectors;
+
+    /* remember to reuse the last hash computed at new qcow2_dedup call */
+    if (left_to_process) {
+        ds->phash.reuse = true;
+    }
+
+exit:
+    if (!deduped_clusters_nr) {
+        return 0;
+    }
+
+    return deduped_clusters_nr * s->cluster_sectors - begining_index;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 4273e7c..11c3002 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -466,5 +466,10 @@ int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
                                              int sectors_nr,
                                              uint8_t **dedup_cluster_data,
                                              int *dedup_cluster_data_nr);
+int qcow2_dedup(BlockDriverState *bs,
+                QCowDedupState *ds,
+                uint64_t sector_num,
+                uint8_t *data,
+                int data_nr);
 
 #endif
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 07/36] qcow2: Add qcow2_dedup_store_new_hashes.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (5 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 06/36] qcow2: Add qcow2_dedup and related functions Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 08/36] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
                   ` (28 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
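Note for reviewers: the on-disk slot arithmetic used by
qcow2_dedup_read_write_hash() can be checked in isolation. The sketch
below is a standalone model, not code from this series: locate_slot(),
SlotPos, write_entry() and read_entry() are hypothetical names, and the
big-endian store is done by hand instead of with cpu_to_be64(). The sizes
used in the example afterwards (128 sectors per cluster, 64 KiB hash
blocks) are assumptions chosen only to make the numbers concrete.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_LENGTH 32
#define ENTRY_SIZE  (HASH_LENGTH + 8)  /* hash followed by a 64-bit sector */

/* Hypothetical helper type: where the entry of a physical cluster lives. */
typedef struct SlotPos {
    uint64_t table_index;      /* index in the RAM deduplication table */
    uint32_t offset_in_block;  /* byte offset of the entry in its block */
} SlotPos;

static SlotPos locate_slot(uint64_t physical_sect,
                           uint32_t cluster_sectors,
                           uint32_t hash_block_size)
{
    uint64_t nb_hash_in_block = hash_block_size / ENTRY_SIZE;
    uint64_t cluster_number;
    SlotPos pos;

    assert(cluster_sectors > 0 && nb_hash_in_block > 0);
    cluster_number = physical_sect / cluster_sectors;
    pos.table_index = cluster_number / nb_hash_in_block;
    pos.offset_in_block =
        (uint32_t)((cluster_number % nb_hash_in_block) * ENTRY_SIZE);
    return pos;
}

/* Store the hash, then first_logical_sect as 8 big-endian bytes. */
static void write_entry(uint8_t *block, uint32_t offset,
                        const uint8_t *hash, uint64_t first_logical_sect)
{
    int i;

    memcpy(block + offset, hash, HASH_LENGTH);
    for (i = 0; i < 8; i++) {
        block[offset + HASH_LENGTH + i] =
            (uint8_t)(first_logical_sect >> (56 - 8 * i));
    }
}

/* Load the hash and return the big-endian 64-bit value that follows it. */
static uint64_t read_entry(const uint8_t *block, uint32_t offset,
                           uint8_t *hash)
{
    uint64_t first = 0;
    int i;

    memcpy(hash, block + offset, HASH_LENGTH);
    for (i = 0; i < 8; i++) {
        first = (first << 8) | block[offset + HASH_LENGTH + i];
    }
    return first;
}
```

With the assumed sizes a hash block holds 65536 / 40 = 1638 entries, so
physical cluster 5 lands in table entry 0 at byte offset 200, and cluster
1638 is the first entry of table entry 1. Because QCOW_FLAG_FIRST is bit
63, it ends up in the first byte of the stored 64-bit field.
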
 block/qcow2-dedup.c |  325 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 block/qcow2.h       |    5 +
 2 files changed, 329 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 5901749..a424af8 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -29,6 +29,12 @@
 #include "qemu-common.h"
 #include "qcow2.h"
 
+static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
+                                       QCowHash *hash,
+                                       uint64_t *first_logical_sect,
+                                       uint64_t physical_sect,
+                                       bool write);
+
 /*
  * Prepare a buffer containing all the required data required to compute cluster
  * sized deduplication hashes.
@@ -291,7 +297,11 @@ static int qcow2_clear_l2_copied_flag_if_needed(BlockDriverState *bs,
     /* remember that we don't need to clear QCOW_OFLAG_COPIED again */
     hash_node->first_logical_sect &= first_logical_sect;
 
-    return 0;
+    /* clear the QCOW_FLAG_FIRST flag from disk */
+    return qcow2_dedup_read_write_hash(bs, &hash_node->hash,
+                                       &hash_node->first_logical_sect,
+                                       hash_node->physical_sect,
+                                       true);
 }
 
 /* This function deduplicates a cluster
@@ -553,3 +563,316 @@ exit:
 
     return deduped_clusters_nr * s->cluster_sectors - begining_index;
 }
+
+
+/* Create a deduplication table hash block, write its offset to disk and
+ * reference it in the RAM deduplication table
+ *
+ * sync this to disk and get the dedup cluster cache entry
+ *
+ * @index: index in the RAM deduplication table
+ * @ret:   offset on success, negative on error
+ */
+static int64_t qcow2_create_block(BlockDriverState *bs,
+                                  int32_t index)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t offset;
+    uint64_t data64;
+    int ret = 0;
+
+    /* allocate a new dedup table hash block */
+    offset = qcow2_alloc_clusters(bs, s->hash_block_size);
+
+    if (offset < 0) {
+        return offset;
+    }
+
+    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    if (ret < 0) {
+        goto free_fail;
+    }
+
+    /* write the new block offset in the dedup table L1 */
+    data64 = cpu_to_be64(offset);
+    ret = bdrv_pwrite_sync(bs->file,
+                           s->dedup_table_offset +
+                           index * sizeof(uint64_t),
+                           &data64, sizeof(data64));
+
+    if (ret < 0) {
+        goto free_fail;
+    }
+
+    s->dedup_table[index] = offset;
+
+    return offset;
+
+free_fail:
+    qcow2_free_clusters(bs, offset, s->hash_block_size);
+    return ret;
+}
+
+static int qcow2_create_and_get_block(BlockDriverState *bs,
+                                      uint32_t index,
+                                      uint8_t **block)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    int64_t offset;
+
+    offset = qcow2_create_block(bs, index);
+
+    if (offset < 0) {
+        return offset;
+    }
+
+
+    /* get an empty cluster from the dedup cache */
+    ret = qcow2_cache_get_empty(bs, s->dedup_cluster_cache,
+                                offset,
+                                (void **) block);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* clear it */
+    memset(*block, 0, s->hash_block_size);
+
+    return 0;
+}
+
+static inline bool qcow2_has_dedup_block(BlockDriverState *bs,
+                                         uint32_t index)
+{
+    BDRVQcowState *s = bs->opaque;
+    return s->dedup_table[index] != 0;
+}
+
+static inline void qcow2_write_hash_to_block_and_dirty(BlockDriverState *bs,
+                                                       uint8_t *block,
+                                                       QCowHash *hash,
+                                                       int offset,
+                                                       uint64_t *logical_sect)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t first;
+    first = cpu_to_be64(*logical_sect);
+    memcpy(block + offset, hash->data, HASH_LENGTH);
+    memcpy(block + offset + HASH_LENGTH, &first, 8);
+    qcow2_cache_entry_mark_dirty(s->dedup_cluster_cache, block);
+}
+
+static inline uint64_t qcow2_read_hash_from_block(uint8_t *block,
+                                                  QCowHash *hash,
+                                                  int offset)
+{
+    uint64_t first;
+    memcpy(hash->data, block + offset, HASH_LENGTH);
+    memcpy(&first, block + offset + HASH_LENGTH, 8);
+    return be64_to_cpu(first);
+}
+
+/* Read/write a given hash and cluster_sect from/to the dedup table
+ *
+ * This function doesn't flush the dedup cache to disk
+ *
+ * @hash:                     the hash to read or store
+ * @first_logical_sect:       logical sector of the QCOW_OFLAG_COPIED cluster
+ * @physical_sect:            sector of the cluster in QCOW2 file (in sectors)
+ * @write:                    true to write, false to read
+ * @ret:                      0 on success, negative errno on error
+ */
+static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
+                                       QCowHash *hash,
+                                       uint64_t *first_logical_sect,
+                                       uint64_t physical_sect,
+                                       bool write)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint8_t *block = NULL;
+    int ret = 0;
+    int64_t cluster_number;
+    uint32_t index_in_dedup_table;
+    int offset_in_block;
+    int nb_hash_in_block = s->hash_block_size / (HASH_LENGTH + 8);
+
+    cluster_number = physical_sect / s->cluster_sectors;
+    index_in_dedup_table = cluster_number / nb_hash_in_block;
+
+    if (s->dedup_table_size <= index_in_dedup_table) {
+        return -ENOSPC;
+    }
+
+    /* if we must read and there is nothing to read return a null hash */
+    if (!qcow2_has_dedup_block(bs, index_in_dedup_table) && !write) {
+        memset(hash->data, 0, HASH_LENGTH);
+        *first_logical_sect = 0;
+        return 0;
+    }
+
+    if (qcow2_has_dedup_block(bs, index_in_dedup_table)) {
+        ret = qcow2_cache_get(bs,
+                              s->dedup_cluster_cache,
+                              s->dedup_table[index_in_dedup_table],
+                              (void **) &block);
+    } else {
+        ret = qcow2_create_and_get_block(bs,
+                                         index_in_dedup_table,
+                                         &block);
+    }
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    offset_in_block = (cluster_number % nb_hash_in_block) *
+                      (HASH_LENGTH + 8);
+
+    if (write)  {
+        qcow2_write_hash_to_block_and_dirty(bs,
+                                            block,
+                                            hash,
+                                            offset_in_block,
+                                            first_logical_sect);
+    } else  {
+        *first_logical_sect = qcow2_read_hash_from_block(block,
+                                                         hash,
+                                                         offset_in_block);
+    }
+
+    qcow2_cache_put(bs, s->dedup_cluster_cache, (void **) &block);
+
+    return 0;
+}
+
+static inline bool is_hash_node_empty(QCowHashNode *hash_node)
+{
+    return hash_node->physical_sect & QCOW_FLAG_EMPTY;
+}
+
+static void qcow2_remove_hash_node(BlockDriverState *bs,
+                                   QCowHashNode *hash_node)
+{
+    BDRVQcowState *s = bs->opaque;
+    g_tree_remove(s->dedup_tree_by_sect, &hash_node->physical_sect);
+    g_tree_remove(s->dedup_tree_by_hash, &hash_node->hash);
+}
+
+/* This function removes a hash_node from the trees given a physical sector
+ *
+ * @physical_sect: The physical sector of the cluster corresponding to the hash
+ */
+static void qcow2_remove_hash_node_by_sector(BlockDriverState *bs,
+                                            uint64_t physical_sect)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowHashNode *hash_node;
+
+    hash_node = g_tree_lookup(s->dedup_tree_by_sect, &physical_sect);
+
+    if (!hash_node) {
+        return;
+    }
+
+    qcow2_remove_hash_node(bs, hash_node);
+}
+
+/* This function stores hash information to disk and RAM
+ *
+ * @hash:           the QCowHash to process
+ * @logical_sect:   the logical sector of the cluster seen by the guest
+ * @physical_sect:  the physical sector of the stored cluster
+ * @ret:            0 on success, negative on error
+ */
+static int qcow2_store_hash(BlockDriverState *bs,
+                            QCowHash *hash,
+                            uint64_t logical_sect,
+                            uint64_t physical_sect)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowHashNode *hash_node;
+
+    hash_node = g_tree_lookup(s->dedup_tree_by_hash, hash);
+
+    /* no hash node found for this hash */
+    if (!hash_node) {
+        return 0;
+    }
+
+    /* the hash node information is already complete */
+    if (!is_hash_node_empty(hash_node)) {
+        return 0;
+    }
+
+    /* Remember that this QCowHashNode represents the first occurrence of the
+     * cluster so we will be able to clear QCOW_OFLAG_COPIED from the L2 table
+     * entry when the refcount goes above 1.
+     */
+    logical_sect = logical_sect | QCOW_FLAG_FIRST;
+
+    /* remove stale hash node pointing to this physical sector from the trees */
+    qcow2_remove_hash_node_by_sector(bs, physical_sect);
+
+    /* fill the missing fields of the hash node */
+    hash_node->physical_sect = physical_sect;
+    hash_node->first_logical_sect = logical_sect;
+
+    /* insert the hash node in the second tree: it's already in the first one */
+    g_tree_insert(s->dedup_tree_by_sect, &hash_node->physical_sect, hash_node);
+
+    /* write the hash to disk */
+    return qcow2_dedup_read_write_hash(bs,
+                                       hash,
+                                       &logical_sect,
+                                       physical_sect,
+                                       true);
+}
+
+/* This function stores the hashes of the clusters which are not duplicated
+ *
+ * @ds:            The deduplication state
+ * @count:         the number of dedup hashes to process
+ * @logical_sect:  logical offset of the first cluster (in sectors)
+ * @physical_sect: offset of the first cluster (in sectors)
+ * @ret:           0 on success, negative errno on error
+ */
+int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
+                                 QCowDedupState *ds,
+                                 int count,
+                                 uint64_t logical_sect,
+                                 uint64_t physical_sect)
+{
+    int ret = 0;
+    int i = 0;
+    BDRVQcowState *s = bs->opaque;
+    QCowHashElement *dedup_hash, *next_dedup_hash;
+
+    /* round values on cluster boundaries for easier cluster deletion */
+    logical_sect = logical_sect & ~(s->cluster_sectors - 1);
+    physical_sect = physical_sect & ~(s->cluster_sectors - 1);
+
+    QTAILQ_FOREACH_SAFE(dedup_hash, &ds->undedupables, next, next_dedup_hash) {
+
+        ret = qcow2_store_hash(bs,
+                               &dedup_hash->hash,
+                               logical_sect + i * s->cluster_sectors,
+                               physical_sect + i * s->cluster_sectors);
+
+        QTAILQ_REMOVE(&ds->undedupables, dedup_hash, next);
+        g_free(dedup_hash);
+
+        if (ret < 0) {
+            break;
+        }
+
+        i++;
+
+        if (i == count) {
+            break;
+        }
+    }
+
+    return ret;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 11c3002..ea0c30e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -471,5 +471,10 @@ int qcow2_dedup(BlockDriverState *bs,
                 uint64_t sector_num,
                 uint8_t *data,
                 int data_nr);
+int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
+                                 QCowDedupState *ds,
+                                 int count,
+                                 uint64_t logical_sect,
+                                 uint64_t physical_sect);
 
 #endif
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 08/36] qcow2: Implement qcow2_compute_cluster_hash.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (6 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 07/36] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 09/36] qcow2: Extract qcow2_dedup_grow_table Benoît Canet
                   ` (27 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Add detection of libgnutls used to compute SHA256 hashes

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   13 ++++++++++++-
 configure           |   22 ++++++++++++++++++++++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index a424af8..45b2326 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -25,6 +25,8 @@
  * THE SOFTWARE.
  */
 
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
 #include "block/block_int.h"
 #include "qemu-common.h"
 #include "qcow2.h"
@@ -157,7 +159,16 @@ static int qcow2_compute_cluster_hash(BlockDriverState *bs,
                                        QCowHash *hash,
                                        uint8_t *data)
 {
-    return 0;
+    BDRVQcowState *s = bs->opaque;
+    switch (s->dedup_hash_algo) {
+    case QCOW_HASH_SHA256:
+        return gnutls_hash_fast(GNUTLS_DIG_SHA256, data,
+                                s->cluster_size, hash->data);
+    default:
+        error_report("Invalid deduplication hash algorithm %i",
+                     s->dedup_hash_algo);
+        abort();
+    }
 }
 
 /*
diff --git a/configure b/configure
index 99c1ec3..390326e 100755
--- a/configure
+++ b/configure
@@ -1724,6 +1724,28 @@ EOF
 fi
 
 ##########################################
+# QCOW Deduplication gnutls detection
+cat > $TMPC <<EOF
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
+int main(void) {char data[4096], digest[32];
+gnutls_hash_fast(GNUTLS_DIG_SHA256, data, 4096, digest);
+return 0;
+}
+EOF
+qcow_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
+qcow_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
+if compile_prog "$qcow_tls_cflags" "$qcow_tls_libs" ; then
+  qcow_tls=yes
+  libs_softmmu="$qcow_tls_libs $libs_softmmu"
+  libs_tools="$qcow_tls_libs $libs_tools"
+  QEMU_CFLAGS="$QEMU_CFLAGS $qcow_tls_cflags"
+else
+  echo "gnutls > 2.10.0 required to compile QEMU"
+  exit 1
+fi
+
+##########################################
 # VNC SASL detection
 if test "$vnc" = "yes" -a "$vnc_sasl" != "no" ; then
   cat > $TMPC <<EOF
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 09/36] qcow2: Extract qcow2_dedup_grow_table
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (7 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 08/36] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 10/36] qcow2: Add qcow2_dedup_grow_table and use it Benoît Canet
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
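Note for reviewers: the size policy that qcow2_do_grow_table() keeps from
qcow2_grow_l1_table() can be read on its own. The helper below is a
hypothetical standalone model named grown_table_size(); unlike the real
function it returns the current size instead of 0 when no growth is
needed, and it does no allocation or I/O.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the size computation: returns the table size, in
 * entries, that would be in use after a grow request.
 */
static int grown_table_size(int cur_size, int min_size, bool exact_size)
{
    int new_size;

    assert(cur_size >= 0 && min_size >= 0);

    /* the table is already large enough: nothing to grow */
    if (min_size <= cur_size) {
        return cur_size;
    }

    if (exact_size) {
        return min_size;
    }

    /* bump the size by about 1.5x per step to limit the number of grows */
    new_size = cur_size ? cur_size : 1;
    while (min_size > new_size) {
        new_size = (new_size * 3 + 1) / 2;
    }
    return new_size;
}
```

Starting from an empty table and asking for at least 5 entries walks
through 1, 2, 3 and stops at 5; starting from 4 and asking for 10 walks
through 6, 9 and stops at 14.
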
 block/qcow2-cluster.c |  102 +++++++++++++++++++++++++++++++------------------
 block/qcow2-dedup.c   |    3 +-
 block/qcow2.h         |    6 +++
 3 files changed, 71 insertions(+), 40 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 63a7241..dbcb6d2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -29,44 +29,48 @@
 #include "block/qcow2.h"
 #include "trace.h"
 
-int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
+int qcow2_do_grow_table(BlockDriverState *bs, int min_size, bool exact_size,
+                        uint64_t **table, uint64_t *table_offset,
+                        int *table_size, qcow2_save_table save_table,
+                        const char *table_name)
 {
     BDRVQcowState *s = bs->opaque;
-    int new_l1_size, new_l1_size2, ret, i;
-    uint64_t *new_l1_table;
-    int64_t new_l1_table_offset;
-    uint8_t data[12];
+    int new_size, new_size2, ret, i;
+    uint64_t *new_table;
+    int64_t new_table_offset;
 
-    if (min_size <= s->l1_size)
+    if (min_size <= *table_size) {
         return 0;
+    }
 
     if (exact_size) {
-        new_l1_size = min_size;
+        new_size = min_size;
     } else {
         /* Bump size up to reduce the number of times we have to grow */
-        new_l1_size = s->l1_size;
-        if (new_l1_size == 0) {
-            new_l1_size = 1;
+        new_size = *table_size;
+        if (new_size == 0) {
+            new_size = 1;
         }
-        while (min_size > new_l1_size) {
-            new_l1_size = (new_l1_size * 3 + 1) / 2;
+        while (min_size > new_size) {
+            new_size = (new_size * 3 + 1) / 2;
         }
     }
 
 #ifdef DEBUG_ALLOC2
-    fprintf(stderr, "grow l1_table from %d to %d\n", s->l1_size, new_l1_size);
+    fprintf(stderr, "grow %s_table from %d to %d\n",
+            table_name, *table_size, new_size);
 #endif
 
-    new_l1_size2 = sizeof(uint64_t) * new_l1_size;
-    new_l1_table = g_malloc0(align_offset(new_l1_size2, 512));
-    memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
+    new_size2 = sizeof(uint64_t) * new_size;
+    new_table = g_malloc0(align_offset(new_size2, 512));
+    memcpy(new_table, *table, *table_size * sizeof(uint64_t));
 
     /* write new table (align to cluster) */
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ALLOC_TABLE);
-    new_l1_table_offset = qcow2_alloc_clusters(bs, new_l1_size2);
-    if (new_l1_table_offset < 0) {
-        g_free(new_l1_table);
-        return new_l1_table_offset;
+    new_table_offset = qcow2_alloc_clusters(bs, new_size2);
+    if (new_table_offset < 0) {
+        g_free(new_table);
+        return new_table_offset;
     }
 
     ret = qcow2_cache_flush(bs, s->refcount_block_cache);
@@ -75,34 +79,56 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_WRITE_TABLE);
-    for(i = 0; i < s->l1_size; i++)
-        new_l1_table[i] = cpu_to_be64(new_l1_table[i]);
-    ret = bdrv_pwrite_sync(bs->file, new_l1_table_offset, new_l1_table, new_l1_size2);
+    for (i = 0; i < *table_size; i++) {
+        new_table[i] = cpu_to_be64(new_table[i]);
+    }
+    ret = bdrv_pwrite_sync(bs->file, new_table_offset, new_table, new_size2);
     if (ret < 0)
         goto fail;
-    for(i = 0; i < s->l1_size; i++)
-        new_l1_table[i] = be64_to_cpu(new_l1_table[i]);
+    for (i = 0; i < *table_size; i++) {
+        new_table[i] = be64_to_cpu(new_table[i]);
+    }
+
+    g_free(*table);
+    qcow2_free_clusters(bs, *table_offset, *table_size * sizeof(uint64_t));
+    *table_offset = new_table_offset;
+    *table = new_table;
+    *table_size = new_size;
 
     /* set new table */
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ACTIVATE_TABLE);
-    cpu_to_be32w((uint32_t*)data, new_l1_size);
-    cpu_to_be64wu((uint64_t*)(data + 4), new_l1_table_offset);
-    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size), data,sizeof(data));
-    if (ret < 0) {
-        goto fail;
-    }
-    g_free(s->l1_table);
-    qcow2_free_clusters(bs, s->l1_table_offset, s->l1_size * sizeof(uint64_t));
-    s->l1_table_offset = new_l1_table_offset;
-    s->l1_table = new_l1_table;
-    s->l1_size = new_l1_size;
+    save_table(bs, *table_offset, *table_size);
+
     return 0;
  fail:
-    g_free(new_l1_table);
-    qcow2_free_clusters(bs, new_l1_table_offset, new_l1_size2);
+    g_free(new_table);
+    qcow2_free_clusters(bs, new_table_offset, new_size2);
     return ret;
 }
 
+static int qcow2_l1_save_table(BlockDriverState *bs,
+                               int64_t table_offset, int size)
+{
+    uint8_t data[12];
+    cpu_to_be32w((uint32_t *)data, size);
+    cpu_to_be64wu((uint64_t *)(data + 4), table_offset);
+    return bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size),
+                            data, sizeof(data));
+}
+
+int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
+{
+    BDRVQcowState *s = bs->opaque;
+    return qcow2_do_grow_table(bs,
+                               min_size,
+                               exact_size,
+                               &s->l1_table,
+                               &s->l1_table_offset,
+                               &s->l1_size,
+                               qcow2_l1_save_table,
+                               "l1");
+}
+
 /*
  * l2_load
  *
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 45b2326..de1b366 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -575,7 +575,6 @@ exit:
     return deduped_clusters_nr * s->cluster_sectors - begining_index;
 }
 
-
 /* Create a deduplication table hash block, write it's offset to disk and
  * reference it in the RAM deduplication table
  *
@@ -592,7 +591,7 @@ static uint64_t qcow2_create_block(BlockDriverState *bs,
     uint64_t data64;
     int ret = 0;
 
-    /* allocate a new dedup table hash block */
+    /* allocate a new dedup table cluster */
     offset = qcow2_alloc_clusters(bs, s->hash_block_size);
 
     if (offset < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index ea0c30e..359a50f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -412,6 +412,12 @@ int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
     int64_t offset, int64_t length, int addend);
 
 /* qcow2-cluster.c functions */
+typedef int (*qcow2_save_table)(BlockDriverState *bs,
+                                int64_t table_offset, int size);
+int qcow2_do_grow_table(BlockDriverState *bs, int min_size, bool exact_size,
+                        uint64_t **table, uint64_t *table_offset,
+                        int *table_size, qcow2_save_table save_table,
+                        const char *table_name);
 int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size);
 void qcow2_l2_cache_reset(BlockDriverState *bs);
 int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
-- 
1.7.10.4
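
The refactoring in this patch boils down to a size policy plus a "persist
the new location" callback. A minimal standalone sketch of that shape, not
the QEMU code: next_table_size(), grow_table(), save_table_fn and
record_location() are hypothetical names, and plain calloc() stands in for
the cluster allocation. Unlike the posted hunk, which calls save_table()
without looking at its return value, the sketch installs the new table only
after the callback has succeeded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for qcow2_save_table: persists the new location. */
typedef int (*save_table_fn)(void *opaque, int64_t table_offset, int size);

/* Same growth rule as the patch: grow by about 1.5x until min_size fits. */
static int next_table_size(int cur_size, int min_size, int exact_size)
{
    int new_size;

    assert(cur_size >= 0 && min_size >= 0);
    if (min_size <= cur_size) {
        return cur_size;                /* nothing to do */
    }
    if (exact_size) {
        return min_size;
    }
    new_size = cur_size ? cur_size : 1;
    while (min_size > new_size) {
        new_size = (new_size * 3 + 1) / 2;
    }
    return new_size;
}

/* Example callback: remembers the last saved location in *opaque. */
struct saved_location {
    int64_t offset;
    int size;
};

static int record_location(void *opaque, int64_t table_offset, int size)
{
    struct saved_location *loc = opaque;

    loc->offset = table_offset;
    loc->size = size;
    return 0;
}

/* Grow an in-memory table; commit only once the callback has succeeded. */
static int grow_table(uint64_t **table, int *table_size, int min_size,
                      int exact_size, int64_t new_offset,
                      save_table_fn save, void *opaque)
{
    int new_size = next_table_size(*table_size, min_size, exact_size);
    uint64_t *new_table;
    int ret;

    if (new_size == *table_size) {
        return 0;
    }
    new_table = calloc(new_size, sizeof(uint64_t));
    if (!new_table) {
        return -1;
    }
    if (*table_size > 0) {
        memcpy(new_table, *table, *table_size * sizeof(uint64_t));
    }
    ret = save(opaque, new_offset, new_size);
    if (ret < 0) {
        free(new_table);                /* the old table stays valid */
        return ret;
    }
    free(*table);
    *table = new_table;
    *table_size = new_size;
    return 0;
}
```

Committing after the callback keeps the in-memory state and the on-disk
header in agreement if the header write fails; whether that ordering fits
the real function depends on code outside this hunk.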

* [Qemu-devel] [RFC V5 10/36] qcow2: Add qcow2_dedup_grow_table and use it.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (8 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 09/36] qcow2: Extract qcow2_dedup_grow_table Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 11/36] qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate clusters Benoît Canet
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index de1b366..de6e3a3 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -38,6 +38,44 @@ static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
                                        bool write);
 
 /*
+ * Save the dedup table information into the header extensions
+ *
+ * @table_offset: the dedup table offset in the QCOW2 file
+ * @size:         the size of the dedup table, in entries
+ * @ret:          0 on success, -errno on error
+ */
+static int qcow2_dedup_save_table_info(BlockDriverState *bs,
+                                       int64_t table_offset, int size)
+{
+    BDRVQcowState *s = bs->opaque;
+    s->dedup_table_offset = table_offset;
+    s->dedup_table_size = size;
+    return qcow2_update_header(bs);
+}
+
+/*
+ * Grow the deduplication table
+ *
+ * @min_size:   minimum number of entries the table must hold
+ * @exact_size: if true, grow to exactly min_size entries
+ * @ret:        0 on success, -errno on error
+ */
+static int qcow2_dedup_grow_table(BlockDriverState *bs,
+                                  int min_size,
+                                  bool exact_size)
+{
+    BDRVQcowState *s = bs->opaque;
+    return qcow2_do_grow_table(bs,
+                               min_size,
+                               exact_size,
+                               &s->dedup_table,
+                               &s->dedup_table_offset,
+                               &s->dedup_table_size,
+                               qcow2_dedup_save_table_info,
+                               "dedup");
+}
+
+/*
  * Prepare a buffer containing all the required data required to compute cluster
  * sized deduplication hashes.
  * If sector_num or nb_sectors are not cluster-aligned, missing data
@@ -712,7 +750,11 @@ static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
     index_in_dedup_table = cluster_number / nb_hash_in_block;
 
     if (s->dedup_table_size <= index_in_dedup_table) {
-        return -ENOSPC;
+        ret = qcow2_dedup_grow_table(bs, index_in_dedup_table + 1, false);
+    }
+
+    if (ret < 0) {
+        return ret;
     }
 
     /* if we must read and there is nothing to read return a null hash */
-- 
1.7.10.4
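
The second hunk replaces the -ENOSPC return with growth on demand. The
arithmetic behind it can be sketched as follows; dedup_table_index() and
dedup_required_size() are hypothetical helpers, and the numbers used with
them are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the dedup table entry (hash block) covering a given cluster. */
static uint64_t dedup_table_index(uint64_t cluster_number,
                                  uint64_t nb_hash_in_block)
{
    assert(nb_hash_in_block > 0);
    return cluster_number / nb_hash_in_block;
}

/* Minimum size to request so that `index` becomes valid,
 * or 0 when the table is already large enough. */
static uint64_t dedup_required_size(uint64_t table_size, uint64_t index)
{
    return table_size <= index ? index + 1 : 0;
}
```

With 32 hashes per block, cluster 100 lands in entry 3, so a table of 2
entries must grow to at least 4. Because the patch passes exact_size =
false, the 1.5x rule may then pick a larger size.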

* [Qemu-devel] [RFC V5 11/36] qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate clusters.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (9 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 10/36] qcow2: Add qcow2_dedup_grow_table and use it Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 12/36] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe Benoît Canet
                   ` (24 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-cluster.c |    8 ++++++--
 block/qcow2-dedup.c   |    7 +++++++
 block/qcow2.h         |    3 +++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index dbcb6d2..ef91216 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -709,6 +709,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
     qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
 
     for (i = 0; i < m->nb_clusters; i++) {
+        uint64_t flags = 0;
         /* if two concurrent writes happen to the same unallocated cluster
 	 * each write allocates separate cluster and writes data concurrently.
 	 * The first one to complete updates l2 table with pointer to its
@@ -718,9 +719,11 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
         if(l2_table[l2_index + i] != 0)
             old_cluster[j++] = l2_table[l2_index + i];
 
+        flags = m->oflag_copied ? QCOW_OFLAG_COPIED : 0;
+        flags |= m->to_deduplicate ? QCOW_OFLAG_TO_DEDUP : 0;
+
         l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) |
-                    (m->oflag_copied ? QCOW_OFLAG_COPIED : 0));
+                    (i << s->cluster_bits)) | flags);
      }
 
 
@@ -1036,6 +1039,7 @@ again:
 
                 .oflag_copied   = true,
                 .overwrite      = false,
+                .to_deduplicate = qcow2_must_deduplicate(bs),
             };
             qemu_co_queue_init(&(*m)->dependent_requests);
             QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index de6e3a3..3d512e5 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -37,6 +37,12 @@ static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
                                        uint64_t physical_sect,
                                        bool write);
 
+bool qcow2_must_deduplicate(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    return s->has_dedup && s->dedup_status != QCOW_DEDUP_STARTED;
+}
+
 /*
  * Save the dedup table information into the header extensions
  *
@@ -310,6 +316,7 @@ static int qcow2_dedup_link_l2(BlockDriverState *bs,
         },
         .oflag_copied   = false,
         .overwrite      = overwrite,
+        .to_deduplicate = false,
     };
     return qcow2_alloc_cluster_link_l2(bs, &m);
 }
diff --git a/block/qcow2.h b/block/qcow2.h
index 359a50f..da7e57e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -312,6 +312,8 @@ typedef struct QCowL2Meta
     bool oflag_copied;
     /* set to true if we are overwriting an L2 table entry */
     bool overwrite;
+    /* set to true if the cluster must be tagged with QCOW_OFLAG_TO_DEDUP */
+    bool to_deduplicate;
 
     /**
      * The COW Region between the start of the first allocated cluster and the
@@ -466,6 +468,7 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
 
 /* qcow2-dedup.c functions */
+bool qcow2_must_deduplicate(BlockDriverState *bs);
 int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
                                              QEMUIOVector *qiov,
                                              uint64_t sector,
-- 
1.7.10.4
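
How the L2 entry is put together after this patch, as a standalone sketch.
Assumptions: EX_OFLAG_TO_DEDUP uses bit 61 purely for illustration (the
real QCOW_OFLAG_TO_DEDUP value is defined elsewhere in the series),
EX_L2E_OFFSET_MASK mirrors the usual qcow2 offset mask, and the entry is
kept in host byte order (the patch converts with cpu_to_be64() before
storing). make_l2_entry() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_OFLAG_COPIED    (1ULL << 63)          /* qcow2 COPIED bit */
#define EX_OFLAG_TO_DEDUP  (1ULL << 61)          /* assumed bit position */
#define EX_L2E_OFFSET_MASK 0x00ffffffffffff00ULL /* assumed offset mask */

/* Build the L2 entry for the i-th cluster of an allocation. */
static uint64_t make_l2_entry(uint64_t cluster_offset, int i,
                              int cluster_bits, bool copied, bool to_dedup)
{
    /* Widen i before shifting so large allocations cannot overflow int. */
    uint64_t offset = cluster_offset + ((uint64_t)i << cluster_bits);
    uint64_t flags = 0;

    /* The offset must not collide with any flag bit. */
    assert((offset & ~EX_L2E_OFFSET_MASK) == 0);

    flags |= copied ? EX_OFLAG_COPIED : 0;
    flags |= to_dedup ? EX_OFLAG_TO_DEDUP : 0;
    return offset | flags;
}
```

The two flags are independent: the allocation path in the patch sets
COPIED together with TO_DEDUP when deduplication is enabled but not yet
started, while qcow2_dedup_link_l2() clears both.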

* [Qemu-devel] [RFC V5 12/36] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (10 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 11/36] qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate clusters Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 13/36] qcow2: Create qcow2_is_cluster_to_dedup Benoît Canet
                   ` (23 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c |   11 +++++++++--
 block/qcow2-dedup.c   |    8 +++++++-
 block/qcow2.h         |    2 ++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ef91216..5b1d20d 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -710,6 +710,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
 
     for (i = 0; i < m->nb_clusters; i++) {
         uint64_t flags = 0;
+        uint64_t offset = cluster_offset + (i << s->cluster_bits);
         /* if two concurrent writes happen to the same unallocated cluster
 	 * each write allocates separate cluster and writes data concurrently.
 	 * The first one to complete updates l2 table with pointer to its
@@ -722,8 +723,14 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
         flags = m->oflag_copied ? QCOW_OFLAG_COPIED : 0;
         flags |= m->to_deduplicate ? QCOW_OFLAG_TO_DEDUP : 0;
 
-        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | flags);
+        l2_table[l2_index + i] = cpu_to_be64(offset | flags);
+
+        /* make the deduplication forget the cluster so that the dedup
+         * never points to a cluster that has changed behind its back.
+         */
+        if (m->to_deduplicate) {
+            qcow2_dedup_forget_cluster_by_sector(bs, offset >> 9);
+        }
      }
 
 
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 3d512e5..7049bd8 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -824,7 +824,7 @@ static void qcow2_remove_hash_node(BlockDriverState *bs,
  * @physical_sect: The physical sector of the cluster corresponding to the hash
  */
 static void qcow2_remove_hash_node_by_sector(BlockDriverState *bs,
-                                            uint64_t physical_sect)
+                                             uint64_t physical_sect)
 {
     BDRVQcowState *s = bs->opaque;
     QCowHashNode *hash_node;
@@ -838,6 +838,12 @@ static void qcow2_remove_hash_node_by_sector(BlockDriverState *bs,
     qcow2_remove_hash_node(bs, hash_node);
 }
 
+void qcow2_dedup_forget_cluster_by_sector(BlockDriverState *bs,
+                                          uint64_t physical_sect)
+{
+    qcow2_remove_hash_node_by_sector(bs, physical_sect);
+}
+
 /* This function store a hash information to disk and RAM
  *
  * @hash:           the QCowHash to process
diff --git a/block/qcow2.h b/block/qcow2.h
index da7e57e..bc1ba33 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -469,6 +469,8 @@ int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
 
 /* qcow2-dedup.c functions */
 bool qcow2_must_deduplicate(BlockDriverState *bs);
+void qcow2_dedup_forget_cluster_by_sector(BlockDriverState *bs,
+                                          uint64_t physical_sect);
 int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
                                              QEMUIOVector *qiov,
                                              uint64_t sector,
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 13/36] qcow2: Create qcow2_is_cluster_to_dedup.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (11 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 12/36] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 14/36] qcow2: Load and save deduplication table header extension Benoît Canet
                   ` (22 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h         |    4 ++++
 2 files changed, 56 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5b1d20d..fedcf57 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -514,6 +514,58 @@ out:
     return ret;
 }
 
+/* Check whether a cluster is marked for deduplication, given its index
+ *
+ * @index:         The logical index of the cluster, starting from 0
+ * @physical_sect: Output: the physical sector of the cluster
+ * @err:           Output: 0 on success, negative errno on error
+ * @ret:           True if the cluster is marked for deduplication
+ */
+bool qcow2_is_cluster_to_dedup(BlockDriverState *bs,
+                               uint64_t index,
+                               uint64_t *physical_sect,
+                               int *err)
+{
+    BDRVQcowState *s = bs->opaque;
+    unsigned int l1_index, l2_index;
+    uint64_t offset;
+    uint64_t l2_offset;
+    uint64_t *l2_table = NULL;
+
+    *physical_sect = 0;
+    *err = 0;
+
+    l1_index = index >> s->l2_bits;
+
+    if (l1_index >= s->l1_size) {
+        return false;
+    }
+
+    /* no l1 entry */
+    if (!(s->l1_table[l1_index] & QCOW_OFLAG_COPIED)) {
+        return false;
+    }
+
+    l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
+
+    *err = l2_load(bs, l2_offset, &l2_table);
+    if (*err < 0) {
+        return false;
+    }
+
+    l2_index = index & (s->l2_size - 1);
+
+    offset = be64_to_cpu(l2_table[l2_index]);
+    *physical_sect = (offset & L2E_OFFSET_MASK) >> 9;
+
+    *err = qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
+    if (*err < 0) {
+        return false;
+    }
+
+    return offset & QCOW_OFLAG_TO_DEDUP;
+}
+
 /*
  * get_cluster_table
  *
diff --git a/block/qcow2.h b/block/qcow2.h
index bc1ba33..0232088 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -440,6 +440,10 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
 int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
     int nb_sectors);
 int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors);
+bool qcow2_is_cluster_to_dedup(BlockDriverState *bs,
+                               uint64_t index,
+                               uint64_t *physical_sect,
+                               int *err);
 
 /* qcow2-snapshot.c functions */
 int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info);
-- 
1.7.10.4
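
The lookup in qcow2_is_cluster_to_dedup() is an index split followed by a
mask and a flag test. A standalone sketch of those two steps, without the
L2 cache: split_cluster_index() and l2_entry_to_dedup() are hypothetical
helpers, EX_OFLAG_TO_DEDUP is an assumed bit position (the real flag is
defined elsewhere in the series), and the entry is taken to be already
converted to host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_OFLAG_TO_DEDUP  (1ULL << 61)          /* assumed bit position */
#define EX_L2E_OFFSET_MASK 0x00ffffffffffff00ULL /* assumed offset mask */

/* Split a logical cluster index into its L1 and L2 table indexes. */
static void split_cluster_index(uint64_t index, int l2_bits,
                                uint64_t *l1_index, uint64_t *l2_index)
{
    uint64_t l2_size;

    assert(l2_bits > 0 && l2_bits < 32);
    l2_size = 1ULL << l2_bits;          /* entries per L2 table */
    *l1_index = index >> l2_bits;
    *l2_index = index & (l2_size - 1);
}

/* Decode a host-order L2 entry: 512-byte sector number plus dedup flag. */
static bool l2_entry_to_dedup(uint64_t entry, uint64_t *physical_sect)
{
    *physical_sect = (entry & EX_L2E_OFFSET_MASK) >> 9;
    return (entry & EX_OFLAG_TO_DEDUP) != 0;
}
```

For example, with 8192 entries per L2 table (l2_bits = 13), cluster index
24581 maps to L1 index 3 and L2 index 5.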

* [Qemu-devel] [RFC V5 14/36] qcow2: Load and save deduplication table header extension.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (12 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 13/36] qcow2: Create qcow2_is_cluster_to_dedup Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 17:35   ` Eric Blake
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 15/36] qcow2: Extract qcow2_do_table_init Benoît Canet
                   ` (21 subsequent siblings)
  35 siblings, 1 reply; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 410d3c1..acd3258 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -53,9 +53,18 @@ typedef struct {
     uint32_t len;
 } QCowExtension;
 
+typedef struct {
+    uint64_t offset;
+    int32_t  size;
+    uint8_t  hash_algo;
+    uint8_t  strategies;
+    char     reserved[56];
+} QCowDedupTableExtension;
+
 #define  QCOW2_EXT_MAGIC_END 0
 #define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
 #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
+#define  QCOW2_EXT_MAGIC_DEDUP_TABLE 0xCD8E819B
 
 static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
@@ -83,6 +92,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
     QCowExtension ext;
     uint64_t offset;
     int ret;
+    QCowDedupTableExtension dedup_table_extension;
 
 #ifdef DEBUG_EXT
     printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
@@ -147,6 +157,19 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
             }
             break;
 
+        case QCOW2_EXT_MAGIC_DEDUP_TABLE:
+            ret = bdrv_pread(bs->file, offset,
+                             &dedup_table_extension, ext.len);
+            if (ret < 0) {
+                return ret;
+            }
+            s->dedup_table_offset =
+                be64_to_cpu(dedup_table_extension.offset);
+            s->dedup_table_size =
+                be32_to_cpu(dedup_table_extension.size);
+            s->dedup_hash_algo = dedup_table_extension.hash_algo;
+            break;
+
         default:
             /* unknown magic - save it in case we need to rewrite the header */
             {
@@ -958,6 +981,7 @@ int qcow2_update_header(BlockDriverState *bs)
     uint32_t refcount_table_clusters;
     size_t header_length;
     Qcow2UnknownHeaderExtension *uext;
+    QCowDedupTableExtension dedup_table_extension;
 
     buf = qemu_blockalign(bs, buflen);
 
@@ -1061,6 +1085,25 @@ int qcow2_update_header(BlockDriverState *bs)
     buf += ret;
     buflen -= ret;
 
+    if (s->has_dedup) {
+        memset(&dedup_table_extension, 0, sizeof(dedup_table_extension));
+        dedup_table_extension.offset = cpu_to_be64(s->dedup_table_offset);
+        dedup_table_extension.size = cpu_to_be32(s->dedup_table_size);
+        dedup_table_extension.hash_algo = s->dedup_hash_algo;
+        dedup_table_extension.strategies |= 1; /* RAM based lookup */
+        dedup_table_extension.strategies |= 1 << 2; /* deduplication running */
+        ret = header_ext_add(buf,
+                             QCOW2_EXT_MAGIC_DEDUP_TABLE,
+                             &dedup_table_extension,
+                             sizeof(dedup_table_extension),
+                             buflen);
+        if (ret < 0) {
+            goto fail;
+        }
+        buf += ret;
+        buflen -= ret;
+    }
+
     /* Keep unknown header extensions */
     QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
         ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
-- 
1.7.10.4
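
One way to make the on-disk layout of this extension independent of
compiler padding is to serialise it field by field. The sketch below
assumes the field order of QCowDedupTableExtension with no padding
(8 + 4 + 1 + 1 + 56 = 70 bytes), which is not necessarily what sizeof()
yields for the struct as posted; struct dedup_ext, put_be(), get_be(),
dedup_ext_pack() and dedup_ext_unpack() are hypothetical names. The
unpack side also rejects a length larger than the buffer, whereas the
posted hunk reads ext.len bytes straight into the fixed-size struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEDUP_EXT_LEN 70    /* 8 + 4 + 1 + 1 + 56, assuming no padding */

struct dedup_ext {
    uint64_t offset;
    int32_t  size;
    uint8_t  hash_algo;
    uint8_t  strategies;
};

/* Store the low nbytes of v at p, most significant byte first. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    int i;

    assert(nbytes >= 1 && nbytes <= 8);
    for (i = nbytes - 1; i >= 0; i--) {
        p[i] = v & 0xff;
        v >>= 8;
    }
}

/* Read nbytes at p as a big-endian unsigned integer. */
static uint64_t get_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < nbytes; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

static void dedup_ext_pack(uint8_t buf[DEDUP_EXT_LEN],
                           const struct dedup_ext *e)
{
    memset(buf, 0, DEDUP_EXT_LEN);      /* reserved bytes stay zero */
    put_be(buf, e->offset, 8);
    put_be(buf + 8, (uint32_t)e->size, 4);
    buf[12] = e->hash_algo;
    buf[13] = e->strategies;
}

/* Returns 0 on success, -1 if len cannot be a valid extension length. */
static int dedup_ext_unpack(struct dedup_ext *e,
                            const uint8_t *buf, size_t len)
{
    if (len < 14 || len > DEDUP_EXT_LEN) {
        return -1;
    }
    e->offset = get_be(buf, 8);
    e->size = (int32_t)(uint32_t)get_be(buf + 8, 4);
    e->hash_algo = buf[12];
    e->strategies = buf[13];
    return 0;
}
```

A strategies value of 5 corresponds to the two bits the patch sets
(1 for RAM based lookup, 1 << 2 for deduplication running).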

* [Qemu-devel] [RFC V5 15/36] qcow2: Extract qcow2_do_table_init.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (13 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 14/36] qcow2: Load and save deduplication table header extension Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 16/36] qcow2-cache: Allow to choose table size at creation Benoît Canet
                   ` (20 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |   43 ++++++++++++++++++++++++++++++-------------
 block/qcow2.h          |    5 +++++
 2 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index e014b0e..75c2bde 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -31,27 +31,44 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
 /*********************************************************/
 /* refcount handling */
 
-int qcow2_refcount_init(BlockDriverState *bs)
+int qcow2_do_table_init(BlockDriverState *bs,
+                        uint64_t **table,
+                        int64_t offset,
+                        int size,
+                        bool is_refcount)
 {
-    BDRVQcowState *s = bs->opaque;
-    int ret, refcount_table_size2, i;
-
-    refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
-    s->refcount_table = g_malloc(refcount_table_size2);
-    if (s->refcount_table_size > 0) {
-        BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_LOAD);
-        ret = bdrv_pread(bs->file, s->refcount_table_offset,
-                         s->refcount_table, refcount_table_size2);
-        if (ret != refcount_table_size2)
+    int ret, size2, i;
+
+    size2 = size * sizeof(uint64_t);
+    *table = g_malloc(size2);
+    if (size > 0) {
+        if (is_refcount) {
+            BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_LOAD);
+        }
+        ret = bdrv_pread(bs->file, offset,
+                         *table, size2);
+        if (ret != size2) {
             goto fail;
-        for(i = 0; i < s->refcount_table_size; i++)
-            be64_to_cpus(&s->refcount_table[i]);
+        }
+        for (i = 0; i < size; i++) {
+            be64_to_cpus(&(*table)[i]);
+        }
     }
     return 0;
  fail:
     return -ENOMEM;
 }
 
+int qcow2_refcount_init(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    return qcow2_do_table_init(bs,
+                               &s->refcount_table,
+                               s->refcount_table_offset,
+                               s->refcount_table_size,
+                               true);
+}
+
 void qcow2_refcount_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.h b/block/qcow2.h
index 0232088..8eb2977 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -393,6 +393,11 @@ int qcow2_read_cluster_data(BlockDriverState *bs,
                             int nb_sectors);
 
 /* qcow2-refcount.c functions */
+int qcow2_do_table_init(BlockDriverState *bs,
+                        uint64_t **table,
+                        int64_t offset,
+                        int size,
+                        bool is_refcount);
 int qcow2_refcount_init(BlockDriverState *bs);
 void qcow2_refcount_close(BlockDriverState *bs);
 
-- 
1.7.10.4
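
The generic loader extracted here reads an array of big-endian 64-bit
entries and converts it to host order. A standalone sketch of the same
idea: table_init_from_bytes() is a hypothetical name and a memory buffer
stands in for bdrv_pread(). Two choices are the sketch's own rather than
the patch's: a short read is reported as -EIO (the posted function keeps
the original -ENOMEM), and nothing is allocated or left behind on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Load `size` big-endian 64-bit entries from raw[] into a newly
 * allocated host-order table. On failure *table is left NULL. */
static int table_init_from_bytes(uint64_t **table, const uint8_t *raw,
                                 size_t raw_len, int size)
{
    uint64_t *t;
    int i, j;

    assert(table != NULL && size >= 0);
    *table = NULL;
    if (size == 0) {
        return 0;                       /* empty table: nothing to load */
    }
    if (raw_len < (size_t)size * sizeof(uint64_t)) {
        return -EIO;                    /* short read */
    }
    t = malloc((size_t)size * sizeof(uint64_t));
    if (!t) {
        return -ENOMEM;
    }
    for (i = 0; i < size; i++) {
        uint64_t v = 0;

        /* Assemble each entry byte by byte, most significant first. */
        for (j = 0; j < 8; j++) {
            v = (v << 8) | raw[(size_t)i * 8 + j];
        }
        t[i] = v;
    }
    *table = t;
    return 0;
}
```

Assembling each value byte by byte gives the same result on little-endian
and big-endian hosts, which is what be64_to_cpus() achieves in place in
the patch.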

* [Qemu-devel] [RFC V5 16/36] qcow2-cache: Allow to choose table size at creation.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (14 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 15/36] qcow2: Extract qcow2_do_table_init Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 17/36] qcow2: Extract qcow2_add_feature and qcow2_remove_feature Benoît Canet
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cache.c |   12 +++++++-----
 block/qcow2.c       |    5 +++--
 block/qcow2.h       |    3 ++-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 2f3114e..83f2814 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -40,20 +40,22 @@ struct Qcow2Cache {
     struct Qcow2Cache*      depends;
     int                     size;
     bool                    depends_on_flush;
+    int                     table_size;
 };
 
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
+Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
+                               int table_size)
 {
-    BDRVQcowState *s = bs->opaque;
     Qcow2Cache *c;
     int i;
 
     c = g_malloc0(sizeof(*c));
     c->size = num_tables;
     c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
+    c->table_size = table_size;
 
     for (i = 0; i < c->size; i++) {
-        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
+        c->entries[i].table = qemu_blockalign(bs, c->table_size);
     }
 
     return c;
@@ -121,7 +123,7 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
     }
 
     ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
-        s->cluster_size);
+        c->table_size);
     if (ret < 0) {
         return ret;
     }
@@ -253,7 +255,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
         }
 
-        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
+        ret = bdrv_pread(bs->file, offset, c->entries[i].table, c->table_size);
         if (ret < 0) {
             return ret;
         }
diff --git a/block/qcow2.c b/block/qcow2.c
index acd3258..b8c4e31 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -452,8 +452,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     }
 
     /* alloc L2 table/refcount block cache */
-    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
-    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
+    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
+    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE,
+                                                 s->cluster_size);
 
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
diff --git a/block/qcow2.h b/block/qcow2.h
index 8eb2977..b17977f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -461,7 +461,8 @@ void qcow2_free_snapshots(BlockDriverState *bs);
 int qcow2_read_snapshots(BlockDriverState *bs);
 
 /* qcow2-cache.c functions */
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
+Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
+                               int table_size);
 int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
 
 void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 17/36] qcow2: Extract qcow2_add_feature and qcow2_remove_feature.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (15 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 16/36] qcow2-cache: Allow to choose table size at creation Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 18/36] block: Add qemu-img dedup create option Benoît Canet
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
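Reader's note: the refactoring in the diff below reduces to a set-bit/clear-bit
pair over the header's incompatible_features word. The following is a minimal
standalone sketch of that pattern, not the QEMU code. FakeImage,
fake_header_write(), add_feature() and remove_feature() are hypothetical names
invented for illustration; the stub stands in for the real header write. Unlike
the patch, the sketch's remove path writes the header before touching the cached
copy, which is a simplification chosen here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirror the incompatible feature enum touched by this patch. */
enum {
    INCOMPAT_DIRTY = 1 << 0,
    INCOMPAT_DEDUP = 1 << 1,
};

/* Hypothetical stand-in for the image state; not a QEMU type. */
typedef struct FakeImage {
    uint64_t on_disk_features;  /* what the header currently holds */
    uint64_t cached_features;   /* in-memory copy of the same word */
    bool     fail_writes;       /* lets a caller simulate an I/O error */
} FakeImage;

/* Stub for the header write; returns 0 or a negative error code. */
static int fake_header_write(FakeImage *img, uint64_t val)
{
    if (img->fail_writes) {
        return -5; /* same value as -EIO on Linux */
    }
    img->on_disk_features = val;
    return 0;
}

/* Set a feature bit; the cached copy changes only if the write succeeded. */
int add_feature(FakeImage *img, uint64_t feature)
{
    int ret;

    if (img->cached_features & feature) {
        return 0; /* already added */
    }
    ret = fake_header_write(img, img->cached_features | feature);
    if (ret < 0) {
        return ret;
    }
    img->cached_features |= feature;
    return 0;
}

/* Clear a feature bit, again updating the cached copy only on success. */
int remove_feature(FakeImage *img, uint64_t feature)
{
    int ret;

    if (!(img->cached_features & feature)) {
        return 0; /* nothing to remove */
    }
    ret = fake_header_write(img, img->cached_features & ~feature);
    if (ret < 0) {
        return ret;
    }
    img->cached_features &= ~feature;
    return 0;
}
```

With this shape, the dirty and dedup wrappers become one-line calls that pass
the relevant bit, which is what qcow2_mark_dirty() and qcow2_mark_clean() do in
the diff.
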
 block/qcow2.c |   49 ++++++++++++++++++++++++++++++-------------------
 block/qcow2.h |    4 ++--
 2 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b8c4e31..f046a77 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -238,61 +238,72 @@ static void report_unsupported_feature(BlockDriverState *bs,
 }
 
 /*
- * Sets the dirty bit and flushes afterwards if necessary.
+ * Sets an incompatible feature bit and flushes afterwards if necessary.
  *
  * The incompatible_features bit is only set if the image file header was
  * updated successfully.  Therefore it is not required to check the return
  * value of this function.
  */
-int qcow2_mark_dirty(BlockDriverState *bs)
+static int qcow2_add_feature(BlockDriverState *bs,
+                             QCow2IncompatibleFeature feature)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t val;
-    int ret;
+    int ret = 0;
 
     assert(s->qcow_version >= 3);
 
-    if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
-        return 0; /* already dirty */
+    if (s->incompatible_features & feature) {
+        return 0; /* already added */
     }
 
-    val = cpu_to_be64(s->incompatible_features | QCOW2_INCOMPAT_DIRTY);
+    val = cpu_to_be64(s->incompatible_features | feature);
     ret = bdrv_pwrite(bs->file, offsetof(QCowHeader, incompatible_features),
                       &val, sizeof(val));
     if (ret < 0) {
         return ret;
     }
-    ret = bdrv_flush(bs->file);
-    if (ret < 0) {
-        return ret;
-    }
 
-    /* Only treat image as dirty if the header was updated successfully */
-    s->incompatible_features |= QCOW2_INCOMPAT_DIRTY;
+    /* Only treat image as having the feature if the header was updated
+     * successfully
+     */
+    s->incompatible_features |= feature;
     return 0;
 }
 
+int qcow2_mark_dirty(BlockDriverState *bs)
+{
+    return qcow2_add_feature(bs, QCOW2_INCOMPAT_DIRTY);
+}
+
 /*
- * Clears the dirty bit and flushes before if necessary.  Only call this
- * function when there are no pending requests, it does not guard against
- * concurrent requests dirtying the image.
+ * Clears an incompatible feature bit and flushes before if necessary.
+ * Only call this function when there are no pending requests, it does not
+ * guard against concurrent requests adding a feature to the image.
  */
-static int qcow2_mark_clean(BlockDriverState *bs)
+static int qcow2_remove_feature(BlockDriverState *bs,
+                                QCow2IncompatibleFeature feature)
 {
     BDRVQcowState *s = bs->opaque;
+    int ret = 0;
 
-    if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
-        int ret = bdrv_flush(bs);
+    if (s->incompatible_features & feature) {
+        ret = bdrv_flush(bs);
         if (ret < 0) {
             return ret;
         }
 
-        s->incompatible_features &= ~QCOW2_INCOMPAT_DIRTY;
+        s->incompatible_features &= ~feature;
         return qcow2_update_header(bs);
     }
     return 0;
 }
 
+static int qcow2_mark_clean(BlockDriverState *bs)
+{
+    return qcow2_remove_feature(bs, QCOW2_INCOMPAT_DIRTY);
+}
+
 static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
                        BdrvCheckMode fix)
 {
diff --git a/block/qcow2.h b/block/qcow2.h
index b17977f..59432fd 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -170,14 +170,14 @@ enum {
 };
 
 /* Incompatible feature bits */
-enum {
+typedef enum {
     QCOW2_INCOMPAT_DIRTY_BITNR   = 0,
     QCOW2_INCOMPAT_DIRTY         = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
     QCOW2_INCOMPAT_DEDUP_BITNR   = 1,
     QCOW2_INCOMPAT_DEDUP         = 1 << QCOW2_INCOMPAT_DEDUP_BITNR,
 
     QCOW2_INCOMPAT_MASK          = QCOW2_INCOMPAT_DIRTY | QCOW2_INCOMPAT_DEDUP,
-};
+} QCow2IncompatibleFeature;
 
 /* Compatible feature bits */
 enum {
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 18/36] block: Add qemu-img dedup create option.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (16 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 17/36] qcow2: Extract qcow2_add_feature and qcow2_remove_feature Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 19/36] qcow2: Add a deduplication boolean to update_refcount Benoît Canet
                   ` (17 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
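Reader's note: the creation path in the diff below lays out the first clusters
of a new image as header, refcount table and, when dedup is requested, one
cluster reserved for the dedup table. A minimal sketch of that arithmetic
follows. InitialLayout and initial_layout() are hypothetical names made up for
this note; the numbers come from the patch itself (table at cluster 2, one
64-bit entry per slot, cluster size forced to 4096 when dedup is on).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the layout right after image creation. */
typedef struct InitialLayout {
    uint64_t refcount_table_offset; /* always the second cluster */
    uint64_t dedup_table_offset;    /* 0 when dedup is off */
    uint64_t dedup_table_entries;   /* number of 64-bit entries, 0 when off */
    uint64_t allocated_bytes;       /* header + tables */
} InitialLayout;

InitialLayout initial_layout(uint64_t cluster_size, bool dedup)
{
    InitialLayout l;
    /* one cluster for the refcount table, one more for the dedup table */
    uint64_t table_clusters = dedup ? 2 : 1;

    l.refcount_table_offset = cluster_size;
    l.dedup_table_offset = dedup ? 2 * cluster_size : 0;
    l.dedup_table_entries = dedup ? cluster_size / sizeof(uint64_t) : 0;
    /* the extra cluster accounts for the header */
    l.allocated_bytes = (table_clusters + 1) * cluster_size;
    return l;
}
```

For a dedup image with 4096-byte clusters this gives a dedup table at offset
8192 holding 512 entries, with 12288 bytes allocated in total.
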
 block/qcow2.c             |  113 +++++++++++++++++++++++++++++++++++++++------
 block/qcow2.h             |    2 +
 include/block/block_int.h |    1 +
 3 files changed, 103 insertions(+), 13 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index f046a77..835554d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -276,6 +276,11 @@ int qcow2_mark_dirty(BlockDriverState *bs)
     return qcow2_add_feature(bs, QCOW2_INCOMPAT_DIRTY);
 }
 
+static int qcow2_activate_dedup(BlockDriverState *bs)
+{
+    return qcow2_add_feature(bs, QCOW2_INCOMPAT_DEDUP);
+}
+
 /*
  * Clears an incompatible feature bit and flushes before if necessary.
  * Only call this function when there are no pending requests, it does not
@@ -907,6 +912,11 @@ static void qcow2_close(BlockDriverState *bs)
     BDRVQcowState *s = bs->opaque;
     g_free(s->l1_table);
 
+    if (s->has_dedup) {
+        qcow2_cache_flush(bs, s->dedup_cluster_cache);
+        qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+    }
+
     qcow2_cache_flush(bs, s->l2_table_cache);
     qcow2_cache_flush(bs, s->refcount_block_cache);
 
@@ -1266,7 +1276,8 @@ static int preallocate(BlockDriverState *bs)
 static int qcow2_create2(const char *filename, int64_t total_size,
                          const char *backing_file, const char *backing_format,
                          int flags, size_t cluster_size, int prealloc,
-                         QEMUOptionParameter *options, int version)
+                         QEMUOptionParameter *options, int version,
+                         bool dedup, uint8_t hash_algo)
 {
     /* Calculate cluster_bits */
     int cluster_bits;
@@ -1293,8 +1304,10 @@ static int qcow2_create2(const char *filename, int64_t total_size,
      * size for any qcow2 image.
      */
     BlockDriverState* bs;
+    BDRVQcowState *s;
     QCowHeader header;
-    uint8_t* refcount_table;
+    uint8_t *tables;
+    int size;
     int ret;
 
     ret = bdrv_create_file(filename, options);
@@ -1336,10 +1349,11 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         goto out;
     }
 
-    /* Write an empty refcount table */
-    refcount_table = g_malloc0(cluster_size);
-    ret = bdrv_pwrite(bs, cluster_size, refcount_table, cluster_size);
-    g_free(refcount_table);
+    /* Write an empty refcount table + extra space for dedup table if needed */
+    size = dedup ? 2 : 1;
+    tables = g_malloc0(size * cluster_size);
+    ret = bdrv_pwrite(bs, cluster_size, tables, size * cluster_size);
+    g_free(tables);
 
     if (ret < 0) {
         goto out;
@@ -1350,7 +1364,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
     /*
      * And now open the image and make it consistent first (i.e. increase the
      * refcount of the cluster that is occupied by the header and the refcount
-     * table)
+     * table and the dedup table, if any)
      */
     BlockDriver* drv = bdrv_find_format("qcow2");
     assert(drv != NULL);
@@ -1360,7 +1374,8 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         goto out;
     }
 
-    ret = qcow2_alloc_clusters(bs, 2 * cluster_size);
+    size++; /* Add a cluster for the header */
+    ret = qcow2_alloc_clusters(bs, size * cluster_size);
     if (ret < 0) {
         goto out;
 
@@ -1370,11 +1385,33 @@ static int qcow2_create2(const char *filename, int64_t total_size,
     }
 
     /* Okay, now that we have a valid image, let's give it the right size */
+    s = bs->opaque;
     ret = bdrv_truncate(bs, total_size * BDRV_SECTOR_SIZE);
     if (ret < 0) {
         goto out;
     }
 
+    if (dedup) {
+        s->has_dedup = true;
+        s->dedup_table_offset = cluster_size * 2;
+        s->dedup_table_size = cluster_size / sizeof(uint64_t);
+        s->dedup_hash_algo = hash_algo;
+
+        ret = qcow2_activate_dedup(bs);
+        if (ret < 0) {
+            goto out;
+        }
+
+        ret = qcow2_update_header(bs);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* minimal init */
+        s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
+                                                    s->hash_block_size);
+    }
+
     /* Want a backing file? There you go.*/
     if (backing_file) {
         ret = bdrv_change_backing_file(bs, backing_file, backing_format);
@@ -1400,15 +1437,41 @@ out:
     return ret;
 }
 
+static int qcow2_warn_if_version_3_is_needed(int version,
+                                             bool has_feature,
+                                             const char *feature)
+{
+    if (version < 3 && has_feature) {
+        fprintf(stderr, "%s only supported with compatibility "
+                "level 1.1 and above (use compat=1.1 or greater)\n",
+                feature);
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static int8_t qcow2_get_dedup_hash_algo(char *value)
+{
+    if (!strcmp(value, "sha256")) {
+        return QCOW_HASH_SHA256;
+    }
+
+    error_printf("Unsupported deduplication hash algorithm.\n");
+    return -EINVAL;
+}
+
 static int qcow2_create(const char *filename, QEMUOptionParameter *options)
 {
     const char *backing_file = NULL;
     const char *backing_fmt = NULL;
     uint64_t sectors = 0;
     int flags = 0;
+    int ret;
     size_t cluster_size = DEFAULT_CLUSTER_SIZE;
     int prealloc = 0;
     int version = 2;
+    bool dedup = false;
+    int8_t hash_algo = 0;
 
     /* Read out options */
     while (options && options->name) {
@@ -1446,24 +1509,43 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
             }
         } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
             flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
+        } else if (!strcmp(options->name, BLOCK_OPT_DEDUP) &&
+                   options->value.s) {
+            hash_algo = qcow2_get_dedup_hash_algo(options->value.s);
+            if (hash_algo < 0) {
+                return hash_algo;
+            }
+            dedup = true;
         }
         options++;
     }
 
+    if (dedup) {
+        cluster_size = 4096;
+    }
+
     if (backing_file && prealloc) {
         fprintf(stderr, "Backing file and preallocation cannot be used at "
             "the same time\n");
         return -EINVAL;
     }
 
-    if (version < 3 && (flags & BLOCK_FLAG_LAZY_REFCOUNTS)) {
-        fprintf(stderr, "Lazy refcounts only supported with compatibility "
-                "level 1.1 and above (use compat=1.1 or greater)\n");
-        return -EINVAL;
+    ret = qcow2_warn_if_version_3_is_needed(version,
+                                            flags & BLOCK_FLAG_LAZY_REFCOUNTS,
+                                            "Lazy refcounts");
+    if (ret < 0) {
+        return ret;
+    }
+    ret = qcow2_warn_if_version_3_is_needed(version,
+                                            dedup,
+                                            "Deduplication");
+    if (ret < 0) {
+        return ret;
     }
 
     return qcow2_create2(filename, sectors, backing_file, backing_fmt, flags,
-                         cluster_size, prealloc, options, version);
+                         cluster_size, prealloc, options, version,
+                         dedup, hash_algo);
 }
 
 static int qcow2_make_empty(BlockDriverState *bs)
@@ -1766,6 +1848,11 @@ static QEMUOptionParameter qcow2_create_options[] = {
         .type = OPT_FLAG,
         .help = "Postpone refcount updates",
     },
+    {
+        .name = BLOCK_OPT_DEDUP,
+        .type = OPT_STRING,
+        .help = "Deduplication",
+    },
     { NULL }
 };
 
diff --git a/block/qcow2.h b/block/qcow2.h
index 59432fd..f987328 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -60,6 +60,8 @@
 /* Must be at least 4 to cover all cases of refcount table growth */
 #define REFCOUNT_CACHE_SIZE 4
 
+#define DEDUP_CACHE_SIZE 4
+
 #define DEFAULT_CLUSTER_SIZE 65536
 
 #define HASH_LENGTH 32
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f83ffb8..b7ed3e6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -55,6 +55,7 @@
 #define BLOCK_OPT_SUBFMT            "subformat"
 #define BLOCK_OPT_COMPAT_LEVEL      "compat"
 #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
+#define BLOCK_OPT_DEDUP             "dedup"
 
 typedef struct BdrvTrackedRequest BdrvTrackedRequest;
 
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 19/36] qcow2: Add a deduplication boolean to update_refcount.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (17 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 18/36] block: Add qemu-img dedup create option Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 20/36] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This is needed for the next commit, which handles the deduplication refcount
overflow case.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c    |    2 +-
 block/qcow2-refcount.c |   20 +++++++++++---------
 block/qcow2.h          |    2 +-
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 7049bd8..25ecefa 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -386,7 +386,7 @@ static int qcow2_deduplicate_cluster(BlockDriverState *bs,
     return update_refcount(bs,
                            (hash_node->physical_sect /
                             s->cluster_sectors) << s->cluster_bits,
-                            1, 1);
+                            1, 1, true);
 }
 
 /* This function tries to deduplicate a given cluster.
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 75c2bde..b1ad112 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -245,7 +245,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     } else {
         /* Described somewhere else. This can recurse at most twice before we
          * arrive at a block that describes itself. */
-        ret = update_refcount(bs, new_block, s->cluster_size, 1);
+        ret = update_refcount(bs, new_block, s->cluster_size, 1, false);
         if (ret < 0) {
             goto fail_block;
         }
@@ -427,7 +427,7 @@ fail_block:
 
 /* XXX: cache several refcount block clusters ? */
 int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
-    int64_t offset, int64_t length, int addend)
+    int64_t offset, int64_t length, int addend, bool deduplication)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t start, last, cluster_offset;
@@ -513,7 +513,8 @@ fail:
      */
     if (ret < 0) {
         int dummy;
-        dummy = update_refcount(bs, offset, cluster_offset - offset, -addend);
+        dummy = update_refcount(bs, offset, cluster_offset - offset, -addend,
+                                deduplication);
         (void)dummy;
     }
 
@@ -534,7 +535,8 @@ static int update_cluster_refcount(BlockDriverState *bs,
     BDRVQcowState *s = bs->opaque;
     int ret;
 
-    ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend);
+    ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend,
+                          false);
     if (ret < 0) {
         return ret;
     }
@@ -588,7 +590,7 @@ int64_t qcow2_alloc_clusters(BlockDriverState *bs, int64_t size)
         return offset;
     }
 
-    ret = update_refcount(bs, offset, size, 1);
+    ret = update_refcount(bs, offset, size, 1, false);
     if (ret < 0) {
         return ret;
     }
@@ -620,7 +622,7 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
     old_free_cluster_index = s->free_cluster_index;
     s->free_cluster_index = cluster_index + i;
 
-    ret = update_refcount(bs, offset, i << s->cluster_bits, 1);
+    ret = update_refcount(bs, offset, i << s->cluster_bits, 1, false);
     if (ret < 0) {
         return ret;
     }
@@ -686,7 +688,7 @@ void qcow2_free_clusters(BlockDriverState *bs,
     int ret;
 
     BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_FREE);
-    ret = update_refcount(bs, offset, size, -1);
+    ret = update_refcount(bs, offset, size, -1, false);
     if (ret < 0) {
         fprintf(stderr, "qcow2_free_clusters failed: %s\n", strerror(-ret));
         /* TODO Remember the clusters to free them later and avoid leaking */
@@ -795,7 +797,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                             int ret;
                             ret = update_refcount(bs,
                                 (offset & s->cluster_offset_mask) & ~511,
-                                nb_csectors * 512, addend);
+                                nb_csectors * 512, addend, false);
                             if (ret < 0) {
                                 goto fail;
                             }
@@ -1228,7 +1230,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
 
             if (num_fixed) {
                 ret = update_refcount(bs, i << s->cluster_bits, 1,
-                                      refcount2 - refcount1);
+                                      refcount2 - refcount1, false);
                 if (ret >= 0) {
                     (*num_fixed)++;
                     continue;
diff --git a/block/qcow2.h b/block/qcow2.h
index f987328..5c126be 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -418,7 +418,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                           BdrvCheckMode fix);
 int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
-    int64_t offset, int64_t length, int addend);
+    int64_t offset, int64_t length, int addend, bool deduplication);
 
 /* qcow2-cluster.c functions */
 typedef int (*qcow2_save_table)(BlockDriverState *bs,
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 20/36] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (18 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 19/36] qcow2: Add a deduplication boolean to update_refcount Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 17:46   ` Eric Blake
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 21/36] qcow2: Remove hash when cluster is deleted Benoît Canet
                   ` (15 subsequent siblings)
  35 siblings, 1 reply; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

A new physical cluster with the same hash value will be used for further
occurrences of this hash.
---
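Reader's note: the rule described above can be sketched in isolation. Once a
deduplicating reference pushes a cluster's refcount to half of the 16-bit
maximum, the hash is retired so that later writes with the same content go to a
fresh cluster. HashSlot and dedup_reference() are hypothetical names used only
for this illustration; the threshold 0xFFFF/2 is the one tested in the diff
below.

```c
#include <stdbool.h>
#include <stdint.h>

#define REFCOUNT_MAX      0xFFFF
#define REFCOUNT_HALF_MAX (REFCOUNT_MAX / 2) /* 32767 */

/* Hypothetical in-memory view of one deduplicated cluster. */
typedef struct HashSlot {
    uint16_t refcount;
    bool     usable_for_dedup; /* false once the hash has been retired */
} HashSlot;

/*
 * Try to add one deduplicating reference to the slot.
 * Returns true if the reference was taken, false if the caller has to
 * allocate a new physical cluster because the hash was retired.
 */
bool dedup_reference(HashSlot *slot)
{
    if (!slot->usable_for_dedup) {
        return false;
    }
    slot->refcount++;
    if (slot->refcount >= REFCOUNT_HALF_MAX) {
        /* retire the hash; existing references stay valid */
        slot->usable_for_dedup = false;
    }
    return true;
}
```

Stopping at half the range leaves headroom for refcount increments that do not
come from deduplication, which is a plausible reading of why the threshold is
not the full 2^16 - 1.
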
 block/qcow2-dedup.c    |   32 ++++++++++++++++++++++++++++++++
 block/qcow2-refcount.c |    3 +++
 block/qcow2.h          |    4 ++++
 3 files changed, 39 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 25ecefa..9eba773 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -941,3 +941,35 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
 
     return ret;
 }
+
+/* Force the use of a new physical cluster and QCowHashNode when the refcount
+ * passes 2^16/2.
+ *
+ * @cluster_index: the index of the physical cluster
+ */
+void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
+                                           uint64_t cluster_index)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowHashNode *hash_node;
+    uint64_t physical_sect = cluster_index * s->cluster_sectors;
+
+    hash_node =  g_tree_lookup(s->dedup_tree_by_sect, &physical_sect);
+
+    if (!hash_node) {
+        return;
+    }
+
+    /* mark this hash so we won't load it anymore at startup after writing it */
+    hash_node->first_logical_sect |= QCOW_FLAG_HALF_MAX_REFCOUNT;
+
+    /* write to disk */
+    qcow2_dedup_read_write_hash(bs,
+                                &hash_node->hash,
+                                &hash_node->first_logical_sect,
+                                hash_node->physical_sect,
+                                true);
+
+    /* remove the QCowHashNode from ram so we won't use it anymore for dedup */
+    qcow2_remove_hash_node(bs, hash_node);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b1ad112..ac396c4 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -489,6 +489,9 @@ int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
             ret = -EINVAL;
             goto fail;
         }
+        if (s->has_dedup && deduplication && refcount >= 0xFFFF/2) {
+            qcow2_dedup_refcount_half_max_reached(bs, cluster_index);
+        }
         if (refcount == 0 && cluster_index < s->free_cluster_index) {
             s->free_cluster_index = cluster_index;
         }
diff --git a/block/qcow2.h b/block/qcow2.h
index 5c126be..ba10ed0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -65,6 +65,8 @@
 #define DEFAULT_CLUSTER_SIZE 65536
 
 #define HASH_LENGTH 32
+/* indicate that this cluster's refcount has reached half its maximum value */
+#define QCOW_FLAG_HALF_MAX_REFCOUNT (1LL << 61)
 /* indicate that the hash structure is empty and miss offset */
 #define QCOW_FLAG_EMPTY   (1LL << 62)
 /* indicate that the cluster for this hash has QCOW_OFLAG_COPIED on disk */
@@ -499,5 +501,7 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
                                  int count,
                                  uint64_t logical_sect,
                                  uint64_t physical_sect);
+void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
+                                           uint64_t cluster_index);
 
 #endif
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 21/36] qcow2: Remove hash when cluster is deleted.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (19 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 20/36] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 22/36] qcow2: Add qcow2_dedup_is_running to probe if dedup is running Benoît Canet
                   ` (14 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
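Reader's note: the diff below clears the stored hash once the last reference to
a cluster goes away, so that a freed cluster can no longer be a dedup target. A
minimal sketch of that idea follows. It assumes, for illustration only, a flat
array of hash entries indexed by physical cluster; HashEntry, ClusterTable and
release_cluster() are hypothetical names and do not describe the actual on-disk
format of this series.

```c
#include <stdint.h>
#include <string.h>

#define HASH_LENGTH 32

/* Hypothetical per-cluster hash record. */
typedef struct HashEntry {
    uint8_t  hash[HASH_LENGTH];
    uint64_t first_logical_sect;
} HashEntry;

/* Hypothetical pair of parallel tables indexed by physical cluster. */
typedef struct ClusterTable {
    uint16_t  *refcounts;
    HashEntry *hashes;
} ClusterTable;

/*
 * Drop one reference to a cluster and return the new refcount.
 * When the count reaches zero the hash record is wiped, mirroring the
 * null hash and zero logical sector written by the patch.
 */
uint16_t release_cluster(ClusterTable *t, uint64_t cluster_index)
{
    if (t->refcounts[cluster_index] == 0) {
        return 0; /* already free, nothing to do */
    }
    t->refcounts[cluster_index]--;
    if (t->refcounts[cluster_index] == 0) {
        memset(&t->hashes[cluster_index], 0, sizeof(HashEntry));
    }
    return t->refcounts[cluster_index];
}
```
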
 block/qcow2-dedup.c    |   26 ++++++++++++++++++++++++++
 block/qcow2-refcount.c |    3 +++
 block/qcow2.h          |    2 ++
 3 files changed, 31 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 9eba773..8b51dda 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -942,6 +942,32 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
     return ret;
 }
 
+/* Clean the last reference to a given cluster when its refcount is zero
+ *
+ * @cluster_index: the index of the physical cluster
+ */
+void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
+                                      uint64_t cluster_index)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowHash null_hash;
+    uint64_t logical_sect = 0;
+    uint64_t physical_sect = cluster_index * s->cluster_sectors;
+
+    /* prepare null hash */
+    memset(&null_hash, 0, sizeof(null_hash));
+
+    /* clear from disk */
+    qcow2_dedup_read_write_hash(bs,
+                                &null_hash,
+                                &logical_sect,
+                                physical_sect,
+                                true);
+
+    /* remove from ram if present so we won't dedup with it anymore */
+    qcow2_remove_hash_node_by_sector(bs, physical_sect);
+}
+
 /* Force the use of a new physical cluster and QCowHashNode when the refcount
  * passes 2^16/2.
  *
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index ac396c4..6a6719f 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -492,6 +492,9 @@ int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         if (s->has_dedup && deduplication && refcount >= 0xFFFF/2) {
             qcow2_dedup_refcount_half_max_reached(bs, cluster_index);
         }
+        if (s->has_dedup && refcount == 0) {
+            qcow2_dedup_refcount_zero_reached(bs, cluster_index);
+        }
         if (refcount == 0 && cluster_index < s->free_cluster_index) {
             s->free_cluster_index = cluster_index;
         }
diff --git a/block/qcow2.h b/block/qcow2.h
index ba10ed0..842c321 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -501,6 +501,8 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
                                  int count,
                                  uint64_t logical_sect,
                                  uint64_t physical_sect);
+void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
+                                       uint64_t cluster_index);
 void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
                                            uint64_t cluster_index);
 
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 22/36] qcow2: Add qcow2_dedup_is_running to probe if dedup is running.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (20 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 21/36] qcow2: Remove hash when cluster is deleted Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 23/36] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
                   ` (13 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |    6 ++++++
 block/qcow2.h       |    1 +
 2 files changed, 7 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 8b51dda..cc99e27 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -999,3 +999,9 @@ void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
     /* remove the QCowHashNode from ram so we won't use it anymore for dedup */
     qcow2_remove_hash_node(bs, hash_node);
 }
+
+bool qcow2_dedup_is_running(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    return s->has_dedup && s->dedup_status == QCOW_DEDUP_STARTED;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 842c321..dc9f519 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -505,5 +505,6 @@ void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
                                        uint64_t cluster_index);
 void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
                                            uint64_t cluster_index);
+bool qcow2_dedup_is_running(BlockDriverState *bs);
 
 #endif
-- 
1.7.10.4


* [Qemu-devel] [RFC V5 23/36] qcow2: Integrate deduplication in qcow2_co_writev loop.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (21 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 22/36] qcow2: Add qcow2_dedup_is_running to probe if dedup is running Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 24/36] qcow2: Serialize write requests when deduplication is activated Benoît Canet
                   ` (12 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
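Reader's note: hashes are computed over whole clusters, so the write path in
the diff below first completes a partial request with the missing data on
either side. The sketch below only shows the alignment arithmetic behind that
step. AlignedSpan and align_to_clusters() are hypothetical names for this
illustration and say nothing about how
qcow2_dedup_read_missing_and_concatenate() is implemented.

```c
#include <stdint.h>

/* Hypothetical description of a request widened to cluster boundaries. */
typedef struct AlignedSpan {
    uint64_t first_sector; /* start of the first cluster touched */
    uint64_t nb_sectors;   /* total length, a multiple of the cluster size */
    uint64_t head_sectors; /* sectors to read in front of the request */
    uint64_t tail_sectors; /* sectors to read behind the request */
} AlignedSpan;

/* cluster_sectors must be non-zero; 8 for 4096-byte clusters. */
AlignedSpan align_to_clusters(uint64_t sector_num, uint64_t nb_sectors,
                              uint64_t cluster_sectors)
{
    AlignedSpan a;
    uint64_t end = sector_num + nb_sectors;
    /* round the end of the request up to the next cluster boundary */
    uint64_t aligned_end = (end + cluster_sectors - 1) / cluster_sectors
                           * cluster_sectors;

    /* round the start of the request down to a cluster boundary */
    a.first_sector = sector_num / cluster_sectors * cluster_sectors;
    a.head_sectors = sector_num - a.first_sector;
    a.tail_sectors = aligned_end - end;
    a.nb_sectors = aligned_end - a.first_sector;
    return a;
}
```

A 9-sector write at sector 10 with 8-sector clusters widens to sectors 8..23:
2 sectors are read in front and 5 behind, giving two full clusters to hash.
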
 block/qcow2.c |   87 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 85 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 835554d..6b8f85f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -330,6 +330,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     QCowHeader header;
     uint64_t ext_end;
 
+    s->has_dedup = false;
     ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
     if (ret < 0) {
         goto fail;
@@ -792,13 +793,18 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
     BDRVQcowState *s = bs->opaque;
     int index_in_cluster;
     int n_end;
-    int ret;
+    int ret = 0;
     int cur_nr_sectors; /* number of sectors in current iteration */
     uint64_t cluster_offset;
     QEMUIOVector hd_qiov;
     uint64_t bytes_done = 0;
     uint8_t *cluster_data = NULL;
     QCowL2Meta *l2meta;
+    uint8_t *dedup_cluster_data = NULL;
+    int dedup_cluster_data_nr;
+    int deduped_sectors_nr;
+    QCowDedupState ds;
+    bool atomic_dedup_is_running;
 
     trace_qcow2_writev_start_req(qemu_coroutine_self(), sector_num,
                                  remaining_sectors);
@@ -809,13 +815,70 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
 
     qemu_co_mutex_lock(&s->lock);
 
+    atomic_dedup_is_running = qcow2_dedup_is_running(bs);
+    if (atomic_dedup_is_running) {
+        QTAILQ_INIT(&ds.undedupables);
+        ds.phash.reuse = false;
+        ds.nb_undedupable_sectors = 0;
+        ds.nb_clusters_processed = 0;
+
+        /* if deduplication is on we make sure dedup_cluster_data
+         * contains a multiple of cluster size of data in order
+         * to compute the hashes
+         */
+        ret = qcow2_dedup_read_missing_and_concatenate(bs,
+                                                       qiov,
+                                                       sector_num,
+                                                       remaining_sectors,
+                                                       &dedup_cluster_data,
+                                                       &dedup_cluster_data_nr);
+
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
     while (remaining_sectors != 0) {
 
         l2meta = NULL;
 
         trace_qcow2_writev_start_part(qemu_coroutine_self());
+
+        if (atomic_dedup_is_running && ds.nb_undedupable_sectors == 0) {
+            /* Try to deduplicate as many clusters as possible */
+            deduped_sectors_nr = qcow2_dedup(bs,
+                                             &ds,
+                                             sector_num,
+                                             dedup_cluster_data,
+                                             dedup_cluster_data_nr);
+
+            if (deduped_sectors_nr < 0) {
+                goto fail;
+            }
+
+            remaining_sectors -= deduped_sectors_nr;
+            sector_num += deduped_sectors_nr;
+            bytes_done += deduped_sectors_nr * 512;
+
+            /* no more data to write -> exit */
+            if (remaining_sectors <= 0) {
+                goto fail;
+            }
+
+            /* if we deduped something trace it */
+            if (deduped_sectors_nr) {
+                trace_qcow2_writev_done_part(qemu_coroutine_self(),
+                                             deduped_sectors_nr);
+                trace_qcow2_writev_start_part(qemu_coroutine_self());
+            }
+        }
+
         index_in_cluster = sector_num & (s->cluster_sectors - 1);
-        n_end = index_in_cluster + remaining_sectors;
+        n_end = atomic_dedup_is_running &&
+                ds.nb_undedupable_sectors < remaining_sectors ?
+                index_in_cluster + ds.nb_undedupable_sectors :
+                index_in_cluster + remaining_sectors;
+
         if (s->crypt_method &&
             n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors) {
             n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
@@ -851,6 +914,24 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
                 cur_nr_sectors * 512);
         }
 
+        /* Write the hashes of the non-duplicated clusters to disk */
+        if (atomic_dedup_is_running) {
+            int count = cur_nr_sectors / s->cluster_sectors;
+            int has_ending = ((cluster_offset >> 9) + index_in_cluster +
+                             cur_nr_sectors) & (s->cluster_sectors - 1);
+            count = index_in_cluster ? count + 1 : count;
+            count = has_ending ? count + 1 : count;
+            ret = qcow2_dedup_store_new_hashes(bs,
+                                               &ds,
+                                               count,
+                                               sector_num,
+                                               (cluster_offset >> 9));
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+
+        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
         qemu_co_mutex_unlock(&s->lock);
         BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
         trace_qcow2_writev_data(qemu_coroutine_self(),
@@ -882,6 +963,7 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
             l2meta = NULL;
         }
 
+        ds.nb_undedupable_sectors -= cur_nr_sectors;
         remaining_sectors -= cur_nr_sectors;
         sector_num += cur_nr_sectors;
         bytes_done += cur_nr_sectors * 512;
@@ -902,6 +984,7 @@ fail:
 
     qemu_iovec_destroy(&hd_qiov);
     qemu_vfree(cluster_data);
+    qemu_vfree(dedup_cluster_data);
     trace_qcow2_writev_done_req(qemu_coroutine_self(), ret);
 
     return ret;
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 24/36] qcow2: Serialize write requests when deduplication is activated.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (22 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 23/36] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 25/36] qcow2: Add verification of dedup table Benoît Canet
                   ` (11 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This fixes the race conditions on sub-cluster-sized writes while waiting
for a faster solution.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c |   14 +++++++++++++-
 block/qcow2.h |    1 +
 2 files changed, 14 insertions(+), 1 deletion(-)
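
The locking order introduced by the diff below can be modelled in a few lines. This is a single-threaded toy, assuming a trivial stand-in for CoMutex that only tracks ownership; all `toy_*` and `Toy*` names are hypothetical and none of this is QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for CoMutex: single-threaded, only tracks ownership. */
typedef struct { bool held; } ToyMutex;

static void toy_lock(ToyMutex *m)   { assert(!m->held); m->held = true; }
static void toy_unlock(ToyMutex *m) { assert(m->held);  m->held = false; }

typedef struct {
    ToyMutex lock;        /* models s->lock */
    ToyMutex dedup_lock;  /* models s->dedup_lock */
    bool dedup_running;   /* models qcow2_dedup_is_running(bs) */
    int writes_done;
} ToyState;

/* Mirrors the order used in the patch: sample the flag under lock,
 * drop lock, take dedup_lock, retake lock, release dedup_lock last. */
static int toy_writev(ToyState *s)
{
    toy_lock(&s->lock);
    bool dedup = s->dedup_running;
    toy_unlock(&s->lock);

    if (dedup) {
        /* lock is not held while waiting for dedup_lock */
        assert(!s->lock.held);
        toy_lock(&s->dedup_lock);
    }

    toy_lock(&s->lock);
    s->writes_done++;           /* the write loop would run here */
    toy_unlock(&s->lock);

    if (dedup) {
        toy_unlock(&s->dedup_lock);
    }
    return 0;
}
```

Holding dedup_lock for the whole request makes write requests run one at a time when deduplication is active, while requests on images without deduplication never touch it.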

diff --git a/block/qcow2.c b/block/qcow2.c
index 6b8f85f..4f8cf68 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -523,6 +523,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
 
     /* Initialise locks */
     qemu_co_mutex_init(&s->lock);
+    qemu_co_mutex_init(&s->dedup_lock);
 
     /* Repair image if dirty */
     if (!(flags & BDRV_O_CHECK) && !bs->read_only &&
@@ -814,8 +815,15 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
     s->cluster_cache_offset = -1; /* disable compressed cache */
 
     qemu_co_mutex_lock(&s->lock);
-
     atomic_dedup_is_running = qcow2_dedup_is_running(bs);
+    qemu_co_mutex_unlock(&s->lock);
+
+    if (atomic_dedup_is_running) {
+        qemu_co_mutex_lock(&s->dedup_lock);
+    }
+
+    qemu_co_mutex_lock(&s->lock);
+
     if (atomic_dedup_is_running) {
         QTAILQ_INIT(&ds.undedupables);
         ds.phash.reuse = false;
@@ -982,6 +990,10 @@ fail:
         g_free(l2meta);
     }
 
+    if (atomic_dedup_is_running) {
+        qemu_co_mutex_unlock(&s->dedup_lock);
+    }
+
     qemu_iovec_destroy(&hd_qiov);
     qemu_vfree(cluster_data);
     qemu_vfree(dedup_cluster_data);
diff --git a/block/qcow2.h b/block/qcow2.h
index dc9f519..9f5d0f0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -239,6 +239,7 @@ typedef struct BDRVQcowState {
     GTree *dedup_tree_by_sect;
 
     CoMutex lock;
+    CoMutex dedup_lock;
 
     uint32_t crypt_method; /* current crypt method, 0 if no key yet */
     uint32_t crypt_method_header;
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 25/36] qcow2: Add verification of dedup table.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (23 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 24/36] qcow2: Serialize write requests when deduplication is activated Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 26/36] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup Benoît Canet
                   ` (10 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6a6719f..34a6a04 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1158,6 +1158,14 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         goto fail;
     }
 
+    if (s->has_dedup) {
+        ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
+                                 s->dedup_table_offset, s->dedup_table_size, 0);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
     /* snapshots */
     for(i = 0; i < s->nb_snapshots; i++) {
         sn = s->snapshots + i;
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 26/36] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (24 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 25/36] qcow2: Add verification of dedup table Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 27/36] qcow2: Add check_dedup_l2 in order to check l2 of dedup table Benoît Canet
                   ` (9 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
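
The two conditions in the hunk below can be restated as a single predicate. A minimal sketch; `copied_flag_is_inconsistent` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate restating the two checks in the hunk below. */
static bool copied_flag_is_inconsistent(bool has_dedup, int refcount,
                                        bool copied)
{
    assert(refcount >= 0);
    if (!has_dedup) {
        /* classic rule: COPIED is set exactly when refcount == 1 */
        return (refcount == 1) != copied;
    }
    /* dedup rule: only a shared cluster (refcount > 1) that still
     * carries COPIED is reported */
    return refcount > 1 && copied;
}
```

With deduplication enabled, an entry with refcount 1 and no COPIED flag is therefore no longer counted as a corruption.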

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 34a6a04..f7a283a 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1003,7 +1003,14 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                         PRIx64 ": %s\n", l2_entry, strerror(-refcount));
                     goto fail;
                 }
-                if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
+                if (!s->has_dedup &&
+                    (refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
+                    fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
+                        PRIx64 " refcount=%d\n", l2_entry, refcount);
+                    res->corruptions++;
+                }
+                if (s->has_dedup && refcount > 1 &&
+                    ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
                     fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
                         PRIx64 " refcount=%d\n", l2_entry, refcount);
                     res->corruptions++;
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 27/36] qcow2: Add check_dedup_l2 in order to check l2 of dedup table.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (25 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 26/36] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 28/36] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED Benoît Canet
                   ` (8 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |   65 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 56 insertions(+), 9 deletions(-)
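
The check added below walks the table in steps of five 64-bit words and inspects the fifth word of each step. A standalone sketch of that walk, assuming big-endian on-disk words and a flag in bit 63; `TOY_FLAG_FIRST`, `load_be64` and `count_bad_offsets` are hypothetical names, and the real value of QCOW_FLAG_FIRST is not shown in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_WORDS_PER_ENTRY 5            /* stride used by the loop below */
#define TOY_FLAG_FIRST (1ULL << 63)      /* assumed value, for illustration */

/* Decode a big-endian 64-bit value from raw bytes. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Count entries whose offset word (index 4 of each 5-word entry) points
 * past volume_bytes after masking the flag bit. */
static int count_bad_offsets(const uint8_t *table, size_t nb_words,
                             uint64_t volume_bytes)
{
    assert(table != NULL);
    int bad = 0;
    for (size_t i = 0; i + TOY_WORDS_PER_ENTRY <= nb_words;
         i += TOY_WORDS_PER_ENTRY) {
        uint64_t off = load_be64(table + (i + 4) * 8) & ~TOY_FLAG_FIRST;
        if (off > volume_bytes) {
            bad++;
        }
    }
    return bad;
}
```

The sketch only counts suspicious entries; the function in the patch additionally prints each one and increments res->corruptions.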

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index f7a283a..3077a9f 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1047,6 +1047,43 @@ fail:
     return -EIO;
 }
 
+static int check_dedup_l2(BlockDriverState *bs, BdrvCheckResult *res,
+                          int64_t l2_offset)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t *l2_table;
+    int i, l2_size;
+
+    /* Read L2 table from disk */
+    l2_size = s->cluster_size;
+    l2_table = g_malloc(l2_size);
+
+    if (bdrv_pread(bs->file, l2_offset, l2_table, l2_size) != l2_size) {
+        goto fail;
+    }
+
+    /* Do the actual checks */
+    for (i = 0; i < (s->l2_size - 5); i += 5) {
+        uint64_t first_logical_offset = be64_to_cpu(l2_table[i + 4]) &
+                                        ~QCOW_FLAG_FIRST;
+        if (first_logical_offset > (bs->total_sectors * BDRV_SECTOR_SIZE)) {
+            fprintf(stderr, "ERROR: l2 deduplication first_logical_offset"
+                    "=%" PRIi64 " outside of deduplicated volume in l2 table "
+                    "with offset %" PRIi64 ".\n", first_logical_offset,
+                    l2_offset);
+            res->corruptions++;
+        }
+    }
+
+    g_free(l2_table);
+    return 0;
+
+fail:
+    fprintf(stderr, "ERROR: I/O error in check_dedup_l2\n");
+    g_free(l2_table);
+    return -EIO;
+}
+
 /*
  * Increases the refcount for the L1 table, its L2 tables and all referenced
  * clusters in the given refcount table. While doing so, performs some checks
@@ -1060,7 +1097,8 @@ static int check_refcounts_l1(BlockDriverState *bs,
                               uint16_t *refcount_table,
                               int refcount_table_size,
                               int64_t l1_table_offset, int l1_size,
-                              int check_copied)
+                              int check_copied,
+                              bool dedup)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t *l1_table, l2_offset, l1_size2;
@@ -1116,11 +1154,19 @@ static int check_refcounts_l1(BlockDriverState *bs,
                 res->corruptions++;
             }
 
-            /* Process and check L2 entries */
-            ret = check_refcounts_l2(bs, res, refcount_table,
-                refcount_table_size, l2_offset, check_copied);
-            if (ret < 0) {
-                goto fail;
+            if (dedup) {
+                /* Process and check dedup l2 entries */
+                ret = check_dedup_l2(bs, res, l2_offset);
+                if (ret < 0) {
+                    goto fail;
+                }
+            } else {
+                /* Process and check L2 entries */
+                ret = check_refcounts_l2(bs, res, refcount_table,
+                    refcount_table_size, l2_offset, check_copied);
+                if (ret < 0) {
+                    goto fail;
+                }
             }
         }
     }
@@ -1160,14 +1206,15 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
 
     /* current L1 table */
     ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
-                       s->l1_table_offset, s->l1_size, 1);
+                       s->l1_table_offset, s->l1_size, 1, false);
     if (ret < 0) {
         goto fail;
     }
 
     if (s->has_dedup) {
         ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
-                                 s->dedup_table_offset, s->dedup_table_size, 0);
+                                 s->dedup_table_offset, s->dedup_table_size,
+                                 0, true);
         if (ret < 0) {
             goto fail;
         }
@@ -1177,7 +1224,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     for(i = 0; i < s->nb_snapshots; i++) {
         sn = s->snapshots + i;
         ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
-            sn->l1_table_offset, sn->l1_size, 0);
+            sn->l1_table_offset, sn->l1_size, 0, false);
         if (ret < 0) {
             goto fail;
         }
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 28/36] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (26 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 27/36] qcow2: Add check_dedup_l2 in order to check l2 of dedup table Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 29/36] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

In the case of a race between two writes, an L2 entry can be written
without QCOW_OFLAG_COPIED before the first write fills it.
This patch simply checks whether the L2 entry already holds the correct
offset without QCOW_OFLAG_COPIED and, if so, leaves it untouched.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c |    5 +++++
 1 file changed, 5 insertions(+)
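
The effect of the added check can be shown on a small in-memory table. A minimal sketch, assuming the table is kept in host byte order (so no be64_to_cpu() is needed) and that untouched entries are rewritten with the COPIED bit set; `toy_link_l2` and `TOY_OFLAG_COPIED` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_OFLAG_COPIED (1ULL << 63)   /* bit 63, like QCOW_OFLAG_COPIED */

/* Toy version of the loop in qcow2_alloc_cluster_link_l2().
 * Returns the number of entries rewritten. */
static int toy_link_l2(uint64_t *l2_table, int l2_index, int nb_clusters,
                       uint64_t cluster_offset, int cluster_bits)
{
    assert(nb_clusters >= 0);
    int written = 0;
    for (int i = 0; i < nb_clusters; i++) {
        uint64_t offset = cluster_offset + ((uint64_t)i << cluster_bits);
        if (l2_table[l2_index + i] == offset) {
            /* already points at this cluster without COPIED: leave it */
            continue;
        }
        l2_table[l2_index + i] = offset | TOY_OFLAG_COPIED;
        written++;
    }
    return written;
}
```

An entry that a racing write already set to the plain offset keeps its value, so the flag is not added behind that write's back.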

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index fedcf57..c016e85 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -763,6 +763,11 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
     for (i = 0; i < m->nb_clusters; i++) {
         uint64_t flags = 0;
         uint64_t offset = cluster_offset + (i << s->cluster_bits);
+
+        if (be64_to_cpu(l2_table[l2_index + i]) == offset) {
+            continue;
+        }
+
         /* if two concurrent writes happen to the same unallocated cluster
 	 * each write allocates separate cluster and writes data concurrently.
 	 * The first one to complete updates l2 table with pointer to its
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 29/36] qcow2: Integrate SKEIN hash algorithm in deduplication.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (27 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 28/36] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 30/36] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops Benoît Canet
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   14 ++++++++++++++
 block/qcow2.c       |    5 +++++
 configure           |   33 +++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index cc99e27..50ffa54 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -30,6 +30,9 @@
 #include "block/block_int.h"
 #include "qemu-common.h"
 #include "qcow2.h"
+#ifdef CONFIG_SKEIN_DEDUP
+#include <skeinApi.h>
+#endif
 
 static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
                                        QCowHash *hash,
@@ -208,6 +211,17 @@ static int qcow2_compute_cluster_hash(BlockDriverState *bs,
     case QCOW_HASH_SHA256:
         return gnutls_hash_fast(GNUTLS_DIG_SHA256, data,
                                 s->cluster_size, hash->data);
+#if defined(CONFIG_SKEIN_DEDUP)
+    case QCOW_HASH_SKEIN:
+        {
+        SkeinCtx_t ctx;
+        skeinCtxPrepare(&ctx, Skein256);
+        skeinInit(&ctx, Skein256);
+        skeinUpdate(&ctx, data, s->cluster_size);
+        skeinFinal(&ctx, hash->data);
+        }
+        return 0;
+#endif
     default:
         error_report("Invalid deduplication hash algorithm %i",
                      s->dedup_hash_algo);
diff --git a/block/qcow2.c b/block/qcow2.c
index 4f8cf68..e742e02 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1550,6 +1550,11 @@ static int8_t qcow2_get_dedup_hash_algo(char *value)
     if (!strcmp(value, "sha256")) {
         return QCOW_HASH_SHA256;
     }
+#if defined(CONFIG_SKEIN_DEDUP)
+    if (!strcmp(value, "skein")) {
+        return QCOW_HASH_SKEIN;
+    }
+#endif
 
     error_printf("Unsupported deduplication hash algorithm.\n");
     return -EINVAL;
diff --git a/configure b/configure
index 390326e..97497af 100755
--- a/configure
+++ b/configure
@@ -223,6 +223,7 @@ libiscsi=""
 coroutine=""
 seccomp=""
 glusterfs=""
+skein_dedup="no"
 
 # parse CC options first
 for opt do
@@ -882,6 +883,8 @@ for opt do
   ;;
   --enable-glusterfs) glusterfs="yes"
   ;;
+  --enable-skein-dedup) skein_dedup="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1130,6 +1133,7 @@ echo "  --with-coroutine=BACKEND coroutine backend. Supported options:"
 echo "                           gthread, ucontext, sigaltstack, windows"
 echo "  --enable-glusterfs       enable GlusterFS backend"
 echo "  --disable-glusterfs      disable GlusterFS backend"
+echo "  --enable-skein-dedup     enable computing dedup hashes with SKEIN"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2412,6 +2416,30 @@ EOF
   fi
 fi
 
+##########################################
+# SKEIN dedup hash function probe
+if test "$skein_dedup" != "no" ; then
+  cat > $TMPC <<EOF
+#include <skeinApi.h>
+int main(void) {
+    SkeinCtx_t ctx;
+    skeinCtxPrepare(&ctx, 512);
+    return 0;
+}
+EOF
+  skein_libs="-lskein3fish"
+  if compile_prog "" "$skein_libs" ; then
+    skein_dedup=yes
+    libs_tools="$skein_libs $libs_tools"
+    libs_softmmu="$skein_libs $libs_softmmu"
+  else
+    if test "$skein_dedup" = "yes" ; then
+      feature_not_found "libskein3fish not found"
+    fi
+    skein_dedup=no
+  fi
+fi
+
 #
 # Check for xxxat() functions when we are building linux-user
 # emulator.  This is done because older glibc versions don't
@@ -3296,6 +3324,7 @@ echo "build guest agent $guest_agent"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine_backend"
 echo "GlusterFS support $glusterfs"
+echo "SKEIN support     $skein_dedup"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3637,6 +3666,10 @@ if test "$glusterfs" = "yes" ; then
   echo "CONFIG_GLUSTERFS=y" >> $config_host_mak
 fi
 
+if test "$skein_dedup" = "yes" ; then
+  echo "CONFIG_SKEIN_DEDUP=y" >> $config_host_mak
+fi
+
 # USB host support
 case "$usb" in
 linux)
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 30/36] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (28 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 29/36] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 31/36] qcow2: Use large L2 table for deduplication Benoît Canet
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index e742e02..7ef9170 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1616,6 +1616,7 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
                 return hash_algo;
             }
             dedup = true;
+            flags |= BLOCK_FLAG_LAZY_REFCOUNTS;
         }
         options++;
     }
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 31/36] qcow2: Use large L2 table for deduplication.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (29 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 30/36] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 32/36] qcow: Set large dedup hash block size Benoît Canet
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c  |    2 +-
 block/qcow2-refcount.c |   22 +++++++++++++++-------
 block/qcow2.c          |    8 ++++++--
 3 files changed, 22 insertions(+), 10 deletions(-)
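
Several hunks below replace s->cluster_size with s->l2_size << 3. The arithmetic behind that expression, as a small sketch; `l2_table_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* An L2 table holds 2^l2_bits entries of 8 bytes each. */
static uint64_t l2_table_bytes(int l2_bits)
{
    assert(l2_bits >= 0 && l2_bits < 60);
    uint64_t l2_size = 1ULL << l2_bits;   /* entries, like s->l2_size */
    return l2_size << 3;                  /* bytes, like s->l2_size << 3 */
}
```

With l2_bits = 17 the table is 1 MiB, i.e. the 64 * 16 KB mentioned in the comment of the qcow2_open hunk. With l2_bits = cluster_bits - 3 the result is exactly one cluster, which is why the old code could use s->cluster_size.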

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c016e85..8ad4740 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -236,7 +236,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
             goto fail;
         }
 
-        memcpy(l2_table, old_table, s->cluster_size);
+        memcpy(l2_table, old_table, s->l2_size << 3);
 
         ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
         if (ret < 0) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3077a9f..f305510 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -536,12 +536,15 @@ fail:
  */
 static int update_cluster_refcount(BlockDriverState *bs,
                                    int64_t cluster_index,
-                                   int addend)
+                                   int addend,
+                                   bool is_l2)
 {
     BDRVQcowState *s = bs->opaque;
     int ret;
 
-    ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend,
+    int size = is_l2 ? s->l2_size << 3 : 1;
+
+    ret = update_refcount(bs, cluster_index << s->cluster_bits, size, addend,
                           false);
     if (ret < 0) {
         return ret;
@@ -666,7 +669,7 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         if (free_in_cluster == 0)
             s->free_byte_offset = 0;
         if ((offset & (s->cluster_size - 1)) != 0)
-            update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
+            update_cluster_refcount(bs, offset >> s->cluster_bits, 1, false);
     } else {
         offset = qcow2_alloc_clusters(bs, s->cluster_size);
         if (offset < 0) {
@@ -676,7 +679,7 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         if ((cluster_offset + s->cluster_size) == offset) {
             /* we are lucky: contiguous data */
             offset = s->free_byte_offset;
-            update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
+            update_cluster_refcount(bs, offset >> s->cluster_bits, 1, false);
             s->free_byte_offset += size;
         } else {
             s->free_byte_offset = offset;
@@ -817,7 +820,10 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     } else {
                         uint64_t cluster_index = (offset & L2E_OFFSET_MASK) >> s->cluster_bits;
                         if (addend != 0) {
-                            refcount = update_cluster_refcount(bs, cluster_index, addend);
+                            refcount = update_cluster_refcount(bs,
+                                                               cluster_index,
+                                                               addend,
+                                                               false);
                         } else {
                             refcount = get_refcount(bs, cluster_index);
                         }
@@ -849,7 +855,9 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 
 
             if (addend != 0) {
-                refcount = update_cluster_refcount(bs, l2_offset >> s->cluster_bits, addend);
+                refcount = update_cluster_refcount(bs,
+                                                   l2_offset >> s->cluster_bits,
+                                                   addend, true);
             } else {
                 refcount = get_refcount(bs, l2_offset >> s->cluster_bits);
             }
@@ -1145,7 +1153,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
             /* Mark L2 table as used */
             l2_offset &= L1E_OFFSET_MASK;
             inc_refcounts(bs, res, refcount_table, refcount_table_size,
-                l2_offset, s->cluster_size);
+                l2_offset, s->l2_size << 3);
 
             /* L2 tables are cluster aligned */
             if (l2_offset & (s->cluster_size - 1)) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 7ef9170..f70c24b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -432,7 +432,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     s->cluster_bits = header.cluster_bits;
     s->cluster_size = 1 << s->cluster_bits;
     s->cluster_sectors = 1 << (s->cluster_bits - 9);
-    s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+    if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
+        s->l2_bits = 17; /* 64 * 16 KB L2 to compensate smaller cluster size */
+    } else {
+        s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+    }
     s->l2_size = 1 << s->l2_bits;
     bs->total_sectors = header.size / 512;
     s->csize_shift = (62 - (s->cluster_bits - 8));
@@ -469,7 +473,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     }
 
     /* alloc L2 table/refcount block cache */
-    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
+    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, s->l2_size << 3);
     s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE,
                                                  s->cluster_size);
 
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 32/36] qcow: Set large dedup hash block size.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (30 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 31/36] qcow2: Use large L2 table for deduplication Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 33/36] qemu-iotests: Filter dedup=on/off so existing tests don't break Benoît Canet
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |    4 ++--
 block/qcow2.c          |    2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)
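
A rough sizing sketch for the hash block set below. It assumes DEFAULT_CLUSTER_SIZE is 65536 and that one hash entry occupies five 64-bit words, as the stride in check_dedup_l2 suggests; both are assumptions, and the `toy_*` / `TOY_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DEFAULT_CLUSTER_SIZE 65536u  /* assumed DEFAULT_CLUSTER_SIZE */
#define TOY_HASH_ENTRY_BYTES (5 * 8)     /* five 64-bit words per entry */

/* Same expression as the one assigned to s->hash_block_size. */
static uint32_t toy_hash_block_size(void)
{
    return TOY_DEFAULT_CLUSTER_SIZE * 5;
}

/* Number of whole entries that fit in one hash block. */
static uint32_t toy_entries_per_hash_block(void)
{
    assert(toy_hash_block_size() % TOY_HASH_ENTRY_BYTES == 0);
    return toy_hash_block_size() / TOY_HASH_ENTRY_BYTES;
}
```

Under these assumptions a hash block is 320 KB and holds 8192 entries, the same number of entries as an L2 table with l2_bits = 16 - 3.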

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index f305510..348342a 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1063,7 +1063,7 @@ static int check_dedup_l2(BlockDriverState *bs, BdrvCheckResult *res,
     int i, l2_size;
 
     /* Read L2 table from disk */
-    l2_size = s->cluster_size;
+    l2_size = s->hash_block_size;
     l2_table = g_malloc(l2_size);
 
     if (bdrv_pread(bs->file, l2_offset, l2_table, l2_size) != l2_size) {
@@ -1153,7 +1153,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
             /* Mark L2 table as used */
             l2_offset &= L1E_OFFSET_MASK;
             inc_refcounts(bs, res, refcount_table, refcount_table_size,
-                l2_offset, s->l2_size << 3);
+                l2_offset, dedup ? s->hash_block_size : s->l2_size << 3);
 
             /* L2 tables are cluster aligned */
             if (l2_offset & (s->cluster_size - 1)) {
diff --git a/block/qcow2.c b/block/qcow2.c
index f70c24b..bd7579a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -434,6 +434,8 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     s->cluster_sectors = 1 << (s->cluster_bits - 9);
     if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
         s->l2_bits = 17; /* 64 * 16 KB L2 to compensate smaller cluster size */
+        s->l2_bits = 16 - 3; /* 64 KB L2 */
+        s->hash_block_size = DEFAULT_CLUSTER_SIZE * 5;
     } else {
         s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
     }
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 33/36] qemu-iotests: Filter dedup=on/off so existing tests don't break.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (31 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 32/36] qcow: Set large dedup hash block size Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 34/36] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 tests/qemu-iotests/common.rc |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index aef5f52..72e746d 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -124,7 +124,8 @@ _make_test_img()
             -e "s# compat='[^']*'##g" \
             -e "s# compat6=\\(on\\|off\\)##g" \
             -e "s# static=\\(on\\|off\\)##g" \
-            -e "s# lazy_refcounts=\\(on\\|off\\)##g"
+            -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
+            -e "s# dedup=\\('sha256'\\|'skein'\\|'sha3'\\)##g"
 
     # Start an NBD server on the image file, which is what we'll be talking to
     if [ $IMGPROTO = "nbd" ]; then
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 34/36] qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (32 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 33/36] qemu-iotests: Filter dedup=on/off so existing tests don't break Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 35/36] qcow2: Add qcow2_co_dedup_resume to restart deduplication Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 36/36] qcow2: Enable the deduplication feature Benoît Canet
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h       |    2 ++
 2 files changed, 99 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 50ffa54..35fcc01 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -1019,3 +1019,100 @@ bool qcow2_dedup_is_running(BlockDriverState *bs)
     BDRVQcowState *s = bs->opaque;
     return s->has_dedup && s->dedup_status == QCOW_DEDUP_STARTED;
 }
+
+static gint qcow2_dedup_compare_by_hash(gconstpointer a,
+                                        gconstpointer b,
+                                        gpointer data)
+{
+    QCowHash *hash_a = (QCowHash *) a;
+    QCowHash *hash_b = (QCowHash *) b;
+    return memcmp(hash_a->data, hash_b->data, HASH_LENGTH);
+}
+
+static void qcow2_dedup_destroy_qcow_hash_node(gpointer p)
+{
+    QCowHashNode *hash_node = (QCowHashNode *) p;
+    g_free(hash_node);
+}
+
+static gint qcow2_dedup_compare_by_offset(gconstpointer a,
+                                          gconstpointer b,
+                                          gpointer data)
+{
+    uint64_t offset_a = *((uint64_t *) a);
+    uint64_t offset_b = *((uint64_t *) b);
+
+    if (offset_a > offset_b) {
+        return 1;
+    }
+    if (offset_a < offset_b) {
+        return -1;
+    }
+    return 0;
+}
+
+static int qcow2_dedup_alloc(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret;
+
+    ret = qcow2_do_table_init(bs,
+                              &s->dedup_table,
+                              s->dedup_table_offset,
+                              s->dedup_table_size,
+                              false);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    s->dedup_tree_by_hash = g_tree_new_full(qcow2_dedup_compare_by_hash, NULL,
+                                            NULL,
+                                            qcow2_dedup_destroy_qcow_hash_node);
+    s->dedup_tree_by_sect = g_tree_new_full(qcow2_dedup_compare_by_offset,
+                                              NULL, NULL, NULL);
+
+    s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
+                                                s->hash_block_size);
+
+    return 0;
+}
+
+static void qcow2_dedup_free(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    g_free(s->dedup_table);
+
+    qcow2_cache_flush(bs, s->dedup_cluster_cache);
+    qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+    g_tree_destroy(s->dedup_tree_by_sect);
+    g_tree_destroy(s->dedup_tree_by_hash);
+}
+
+int qcow2_dedup_init(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    s->has_dedup = true;
+
+    ret = qcow2_dedup_alloc(bs);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* if we are read-only we don't load the deduplication table */
+    if (bs->read_only) {
+        return 0;
+    }
+
+    s->dedup_status = QCOW_DEDUP_STARTING;
+
+    return 0;
+}
+
+void qcow2_dedup_close(BlockDriverState *bs)
+{
+    qcow2_dedup_free(bs);
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 9f5d0f0..29267a9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -507,5 +507,7 @@ void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
 void qcow2_dedup_refcount_half_max_reached(BlockDriverState *bs,
                                            uint64_t cluster_index);
 bool qcow2_dedup_is_running(BlockDriverState *bs);
+int qcow2_dedup_init(BlockDriverState *bs);
+void qcow2_dedup_close(BlockDriverState *bs);
 
 #endif
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 35/36] qcow2: Add qcow2_co_dedup_resume to restart deduplication.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (33 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 34/36] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 36/36] qcow2: Enable the deduplication feature Benoît Canet
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2-dedup.c |  180 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 35fcc01..6cd1af4 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -34,6 +34,7 @@
 #include <skeinApi.h>
 #endif
 
+static void qcow2_dedup_reset(BlockDriverState *bs);
 static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
                                        QCowHash *hash,
                                        uint64_t *first_logical_sect,
@@ -1020,6 +1021,175 @@ bool qcow2_dedup_is_running(BlockDriverState *bs)
     return s->has_dedup && s->dedup_status == QCOW_DEDUP_STARTED;
 }
 
+static bool hash_is_null(QCowHash *hash)
+{
+    QCowHash null_hash;
+    memset(&null_hash.data, 0, HASH_LENGTH);
+    return !memcmp(hash->data, null_hash.data, HASH_LENGTH);
+}
+
+static void qcow2_dedup_insert_hash_node(BlockDriverState *bs,
+                                         QCowHashNode *hash_node)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    g_tree_insert(s->dedup_tree_by_hash, &hash_node->hash, hash_node);
+    g_tree_insert(s->dedup_tree_by_sect, &hash_node->physical_sect, hash_node);
+}
+
+/* Load the QCowHashNode corresponding to a given cluster index into RAM
+ *
+ * @index: index of the given physical cluster
+ * @ret:   0 on success, negative on error
+ */
+static int qcow2_load_cluster_hash(BlockDriverState *bs,
+                                   uint64_t index)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    QCowHash hash;
+    uint64_t first_logical_sect;
+    QCowHashNode *hash_node;
+
+    /* get the hash */
+    ret = qcow2_dedup_read_write_hash(bs, &hash,
+                                      &first_logical_sect,
+                                      index * s->cluster_sectors,
+                                      false);
+
+    if (ret < 0) {
+        error_report("Failed to load deduplication hash.");
+        return ret;
+    }
+
+    /* if the hash is null don't load it */
+    if (hash_is_null(&hash)) {
+        return ret;
+    }
+
+    hash_node = qcow2_dedup_build_qcow_hash_node(&hash,
+                                                 index * s->cluster_sectors,
+                                                 first_logical_sect);
+    qcow2_dedup_insert_hash_node(bs, hash_node);
+
+    return 0;
+}
+
+/* Load all the active hashes into RAM
+ *
+ * @ret: 0 on success, negative on error
+ */
+static int qcow2_load_valid_hashes(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t max_clusters, i;
+    int nb_hash_in_hash_block = s->hash_block_size / (HASH_LENGTH + 8);
+    int ret = 0;
+
+    max_clusters = s->dedup_table_size * nb_hash_in_hash_block;
+
+    /* load all the hashes stored on disk into memory */
+    for (i = 0; i < max_clusters; i++) {
+        if (!(i % nb_hash_in_hash_block)) {
+            co_sleep_ns(rt_clock, s->dedup_co_delay);
+        }
+        qemu_co_mutex_lock(&s->lock);
+        ret = qcow2_load_cluster_hash(bs, i);
+        qemu_co_mutex_unlock(&s->lock);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+static int qcow2_drop_to_dedup_stale_hash(BlockDriverState *bs,
+                                          uint64_t index)
+{
+    int ret = 0;
+    bool to_dedup;
+    uint64_t physical_sect;
+
+    to_dedup = qcow2_is_cluster_to_dedup(bs, index, &physical_sect, &ret);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (!to_dedup) {
+        return 0;
+    }
+
+    qcow2_remove_hash_node_by_sector(bs, physical_sect);
+    return 0;
+}
+
+/* For each l2 entry marked as QCOW_OFLAG_TO_DEDUP drop the obsolete hash
+ * from the trees
+ *
+ * @ret: 0 on success, negative on error
+ */
+static int qcow2_drop_to_dedup_hashes(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t i;
+    int ret = 0;
+
+    /* for each l2 entry */
+    for (i = 0; i < s->l2_size * s->l1_size; i++) {
+        if (!(i % s->l2_size)) {
+            co_sleep_ns(rt_clock, s->dedup_co_delay);
+        }
+        qemu_co_mutex_lock(&s->lock);
+        ret = qcow2_drop_to_dedup_stale_hash(bs, i);
+        qemu_co_mutex_unlock(&s->lock);
+
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * This coroutine resumes deduplication
+ *
+ * @opaque: the given BlockDriverState
+ * @ret:    nothing; the outcome is recorded in s->dedup_status
+ */
+static void coroutine_fn qcow2_co_dedup_resume(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    ret = qcow2_load_valid_hashes(bs);
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = qcow2_drop_to_dedup_hashes(bs);
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    qemu_co_mutex_lock(&s->lock);
+    s->dedup_status = QCOW_DEDUP_STARTED;
+    qemu_co_mutex_unlock(&s->lock);
+
+    return;
+
+fail:
+    qemu_co_mutex_lock(&s->lock);
+    s->dedup_status = QCOW_DEDUP_STOPPED;
+    qcow2_dedup_reset(bs);
+    qemu_co_mutex_unlock(&s->lock);
+}
+
 static gint qcow2_dedup_compare_by_hash(gconstpointer a,
                                         gconstpointer b,
                                         gpointer data)
@@ -1089,6 +1259,12 @@ static void qcow2_dedup_free(BlockDriverState *bs)
     g_tree_destroy(s->dedup_tree_by_hash);
 }
 
+static void qcow2_dedup_reset(BlockDriverState *bs)
+{
+    qcow2_dedup_free(bs);
+    qcow2_dedup_alloc(bs);
+}
+
 int qcow2_dedup_init(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
@@ -1109,6 +1285,10 @@ int qcow2_dedup_init(BlockDriverState *bs)
 
     s->dedup_status = QCOW_DEDUP_STARTING;
 
+    /* resume deduplication */
+    s->dedup_resume_co = qemu_coroutine_create(qcow2_co_dedup_resume);
+    qemu_coroutine_enter(s->dedup_resume_co, bs);
+
     return 0;
 }
 
-- 
1.7.10.4

* [Qemu-devel] [RFC V5 36/36] qcow2: Enable the deduplication feature.
  2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
                   ` (34 preceding siblings ...)
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 35/36] qcow2: Add qcow2_co_dedup_resume to restart deduplication Benoît Canet
@ 2013-01-16 16:24 ` Benoît Canet
  35 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-16 16:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

---
 block/qcow2.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index bd7579a..753fce0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -542,6 +542,13 @@ static int qcow2_open(BlockDriverState *bs, int flags)
         }
     }
 
+    if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
+        ret = qcow2_dedup_init(bs);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
 #ifdef DEBUG_ALLOC
     {
         BdrvCheckResult result = {0};
@@ -1011,11 +1018,11 @@ fail:
 static void qcow2_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
+
     g_free(s->l1_table);
 
     if (s->has_dedup) {
-        qcow2_cache_flush(bs, s->dedup_cluster_cache);
-        qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+        qcow2_dedup_close(bs);
     }
 
     qcow2_cache_flush(bs, s->l2_table_cache);
@@ -1509,8 +1516,10 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         }
 
         /* minimal init */
-        s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
-                                                    s->hash_block_size);
+        ret = qcow2_dedup_init(bs);
+        if (ret < 0) {
+            goto out;
+        }
     }
 
     /* Want a backing file? There you go.*/
-- 
1.7.10.4

* Re: [Qemu-devel] [RFC V5 03/36] qcow2: Add qcow2_dedup_read_missing_and_concatenate
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 03/36] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
@ 2013-01-16 16:53   ` Eric Blake
  0 siblings, 0 replies; 41+ messages in thread
From: Eric Blake @ 2013-01-16 16:53 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On 01/16/2013 09:24 AM, Benoît Canet wrote:
> This function is used to read missing data when unaligned writes are
> done. This function also concatenate missing data with the given
> qiov data in order to prepare a buffer used to look for duplicated
> clusters.
> 
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block/Makefile.objs |    1 +
>  block/qcow2-dedup.c |  119 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.c       |   36 +++++++++++++++-
>  block/qcow2.h       |   12 ++++++
>  4 files changed, 167 insertions(+), 1 deletion(-)
>  create mode 100644 block/qcow2-dedup.c

I'm not an expert in this area of code, so I didn't review it closely.
However, I did notice this:

> +++ b/block/qcow2.c
> @@ -69,7 +69,6 @@ static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
>          return 0;
>  }
>  
> -
>  /* 
>   * read qcow2 extension and fill bs
>   * start reading from start_offset

Spurious whitespace change.

> @@ -1110,6 +1109,41 @@ fail:
>      return ret;
>  }
>  
> +/**
> + * Read some data from the QCOW2 file
> + *
> + * Important: s->lock is dropped. Things can change before the function return

s/return/returns/

> + *            to the caller.
> + *
> + * @data:       the buffer where the data must be stored
> + * @sector_num: the sector number to read in the QCOW2 file
> + * @nb_sectors: the number of sectors to read
> + * @ret:        negative on error
> + */
> +int qcow2_read_cluster_data(BlockDriverState *bs,
> +                            uint8_t *data,
> +                            uint64_t sector_num,
> +                            int nb_sectors)

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [RFC V5 14/36] qcow2: Load and save deduplication table header extension.
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 14/36] qcow2: Load and save deduplication table header extension Benoît Canet
@ 2013-01-16 17:35   ` Eric Blake
  0 siblings, 0 replies; 41+ messages in thread
From: Eric Blake @ 2013-01-16 17:35 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On 01/16/2013 09:24 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block/qcow2.c |   43 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
> 

> +    if (s->has_dedup) {
> +        memset(&dedup_table_extension, 0, sizeof(dedup_table_extension));
> +        dedup_table_extension.offset = cpu_to_be64(s->dedup_table_offset);
> +        dedup_table_extension.size = cpu_to_be32(s->dedup_table_size);
> +        dedup_table_extension.hash_algo = s->dedup_hash_algo;
> +        dedup_table_extension.strategies |= 1; /* RAM based lookup */
> +        dedup_table_extension.strategies |= 1 << 2; /* deduplication running */

Magic numbers here; you should probably give these bits a symbolic name
via #define or enum.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [RFC V5 20/36] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2.
  2013-01-16 16:24 ` [Qemu-devel] [RFC V5 20/36] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
@ 2013-01-16 17:46   ` Eric Blake
  2013-01-21 11:51     ` Benoît Canet
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2013-01-16 17:46 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On 01/16/2013 09:24 AM, Benoît Canet wrote:
> A new physical cluster with the same hash value will be used for further
> occurence of this hash.

s/occurence/occurrence/

> ---
>  block/qcow2-dedup.c    |   32 ++++++++++++++++++++++++++++++++
>  block/qcow2-refcount.c |    3 +++
>  block/qcow2.h          |    4 ++++
>  3 files changed, 39 insertions(+)
> 

> +++ b/block/qcow2-refcount.c
> @@ -489,6 +489,9 @@ int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>              ret = -EINVAL;
>              goto fail;
>          }
> +        if (s->has_dedup && deduplication && refcount >= 0xFFFF/2) {
> +            qcow2_dedup_refcount_half_max_reached(bs, cluster_index);

You are hardcoding to a width of 16 bits; however, version 3 makes the
refcount field variable-sized:

         96 -  99:  refcount_order
                    Describes the width of a reference count block entry
                    (width in bits = 1 << refcount_order). For version 2
                    images, the order is always assumed to be 4 (i.e. the
                    width is 16 bits).


Hmm, what happens if refcount_order is 0 to disable reference counting?
 That setting is valid for creating a qcow2 file that can't be used for
internal snapshots.  But it also interferes with dedup; so you probably
want to add some additional requirements in the spec (patch 1/36) that
when dedup is in use, refcount_order must be a minimum value (or require
that it be exactly 4, for a width of 16 bits).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [RFC V5 20/36] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2.
  2013-01-16 17:46   ` Eric Blake
@ 2013-01-21 11:51     ` Benoît Canet
  0 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-01-21 11:51 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, pbonzini, qemu-devel, stefanha

> You are hardcoding to a width of 16 bits; however, version 3 makes the
> refcount field variable-sized:
>
>          96 -  99:  refcount_order
>                     Describes the width of a reference count block entry
>                     (width in bits = 1 << refcount_order). For version 2
>                     images, the order is always assumed to be 4 (i.e. the
>                     width is 16 bits).

Currently the qcow2 code doesn't support anything but refcount_order == 4.

In qcow2.c qcow_open there is:
        be32_to_cpus(&header.refcount_order);
to get the qcow2 order followed by:
    /* Check support for various header values */
    if (header.refcount_order != 4) {
        report_unsupported(bs, "%d bit reference counts",
                           1 << header.refcount_order);
        ret = -ENOTSUP;
        goto fail;
    }

I guess the code doesn't need any special handling for now.

> Hmm, what happens if refcount_order is 0 to disable reference counting?
>  That setting is valid for creating a qcow2 file that can't be used for
> internal snapshots.  But it also interferes with dedup; so you probably
> want to add some additional requirements in the spec (patch 1/36) that
> when dedup is in use, refcount_order must be a minimum value (or require
> that it be exactly 4, for a width of 16 bits).

I'll do that.

Regards

Benoît

Thread overview: 41+ messages
2013-01-16 16:24 [Qemu-devel] [RFC V5 00/36] QCOW2 deduplication core functionality Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 01/36] qcow2: Add deduplication to the qcow2 specification Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 02/36] qcow2: Add deduplication structures and fields Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 03/36] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
2013-01-16 16:53   ` Eric Blake
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 04/36] qcow2: Make update_refcount public Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 05/36] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 06/36] qcow2: Add qcow2_dedup and related functions Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 07/36] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 08/36] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 09/36] qcow2: Extract qcow2_dedup_grow_table Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 10/36] qcow2: Add qcow2_dedup_grow_table and use it Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 11/36] qcow2: Makes qcow2_alloc_cluster_link_l2 mark to deduplicate clusters Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 12/36] qcow2: make the deduplication forget a cluster hash when a cluster is to dedupe Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 13/36] qcow2: Create qcow2_is_cluster_to_dedup Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 14/36] qcow2: Load and save deduplication table header extension Benoît Canet
2013-01-16 17:35   ` Eric Blake
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 15/36] qcow2: Extract qcow2_do_table_init Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 16/36] qcow2-cache: Allow to choose table size at creation Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 17/36] qcow2: Extract qcow2_add_feature and qcow2_remove_feature Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 18/36] block: Add qemu-img dedup create option Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 19/36] qcow2: Add a deduplication boolean to update_refcount Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 20/36] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
2013-01-16 17:46   ` Eric Blake
2013-01-21 11:51     ` Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 21/36] qcow2: Remove hash when cluster is deleted Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 22/36] qcow2: Add qcow2_dedup_is_running to probe if dedup is running Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 23/36] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 24/36] qcow2: Serialize write requests when deduplication is activated Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 25/36] qcow2: Add verification of dedup table Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 26/36] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 27/36] qcow2: Add check_dedup_l2 in order to check l2 of dedup table Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 28/36] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 29/36] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 30/36] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 31/36] qcow2: Use large L2 table for deduplication Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 32/36] qcow: Set large dedup hash block size Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 33/36] qemu-iotests: Filter dedup=on/off so existing tests don't break Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 34/36] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 35/36] qcow2: Add qcow2_co_dedup_resume to restart deduplication Benoît Canet
2013-01-16 16:24 ` [Qemu-devel] [RFC V5 36/36] qcow2: Enable the deduplication feature Benoît Canet