* [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality
@ 2013-06-20 14:26 Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification Benoît Canet
                   ` (23 more replies)
  0 siblings, 24 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Hello,

This is the new patchset of the QCOW2 deduplication feature, using an on-disk
key-value store modelled after SILT.

This is more a milestone-reached post than a for-review post.

Performance is slow because the code issues a lot of reads and writes to the
SSD in a sequential way.
The next revision will focus on parallelising the IOs.

Kevin: the first patch adds the description of the QCOW2 journal structure.
       The code makes it easy to have multiple journals in QCOW2.
       Do you think it could fit other QCOW2 usages?
       Patches describing the other data structures are coming.

Best regards

Benoît

Benoît Canet (24):
  qcow2: Add journal specification.
  qcow2: Add deduplication structures and fields.
  qcow2: Add journal.
  qcow2: Create the log store.
  qcow2: Add the hash store.
  qcow2: Add the deduplication store.
  qcow2: Add qcow2_dedup_read_missing_and_concatenate
  qcow2: Create a way to link to l2 tables when deduplicating.
  qcow2: Make qcow2_update_cluster_refcount public.
  qcow2: Add qcow2_dedup and related functions
  qcow2: Add qcow2_dedup_store_new_hashes.
  qcow2: Do allocate on rewrite on the dedup case.
  qcow2: Implement qcow2_compute_cluster_hash.
  qcow2: Load and save deduplication table header extension.
  qcow2: Extract qcow2_set_incompat_feature and
    qcow2_clear_incompat_feature.
  block: Add qcow2_dedup format and image creation code.
  qcow2: Drop hash for a given cluster when dedup makes refcount >
    2^16/2.
  qcow2: Remove hash when cluster is deleted.
  qcow2: Integrate deduplication in qcow2_co_writev loop.
  qcow2: Serialize write requests when deduplication is activated.
  qcow2: Integrate SKEIN hash algorithm in deduplication.
  qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
  qcow2: Enable the deduplication feature.
  qcow2: Enable deduplication tests

 block/Makefile.objs          |    1 +
 block/qcow2-cluster.c        |   13 +-
 block/qcow2-dedup.c          |  778 +++++++++++++++++++++++++
 block/qcow2-hash-store.c     |  802 ++++++++++++++++++++++++++
 block/qcow2-journal.c        |  587 +++++++++++++++++++
 block/qcow2-log-store.c      | 1281 ++++++++++++++++++++++++++++++++++++++++++
 block/qcow2-refcount.c       |   20 +-
 block/qcow2-store.c          |  771 +++++++++++++++++++++++++
 block/qcow2.c                |  431 +++++++++++++-
 block/qcow2.h                |  357 +++++++++++-
 configure                    |  121 +++-
 docs/specs/qcow2.txt         |   42 ++
 include/block/block_int.h    |    1 +
 tests/qemu-iotests/common    |    6 +
 tests/qemu-iotests/common.rc |    3 +-
 15 files changed, 5144 insertions(+), 70 deletions(-)
 create mode 100644 block/qcow2-dedup.c
 create mode 100644 block/qcow2-hash-store.c
 create mode 100644 block/qcow2-journal.c
 create mode 100644 block/qcow2-log-store.c
 create mode 100644 block/qcow2-store.c

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-07-02 14:42   ` Stefan Hajnoczi
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 02/24] qcow2: Add deduplication structures and fields Benoît Canet
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

---
 docs/specs/qcow2.txt |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 36a559d..a4ffc85 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -350,3 +350,45 @@ Snapshot table entry:
         variable:   Unique ID string for the snapshot (not null terminated)
 
         variable:   Name of the snapshot (not null terminated)
+
+== Journal ==
+
+QCOW2 can use one or more instances of a metadata journal.
+
+A journal is a sequential log of journal entries appended to a previously
+allocated and reset area.
+A journal is designed like a linked list, with each entry pointing to the next,
+so it is easy to iterate over entries.
+
+A journal uses the following constants to denote the type of each entry:
+
+TYPE_NONE = 0xFF      default value of any byte in a reset journal
+TYPE_END  = 1         the entry ends a journal cluster and points to the next
+                      cluster
+TYPE_HASH = 2         the entry contains a deduplication hash
+
+QCOW2 journal entry:
+
+    Byte 0         :    Size of the entry: size = 2 + n with size <= 254
+
+         1         :    Type of the entry
+
+         2 - size  :    The optional n bytes structure carried by entry
+
+A journal is divided into clusters and no journal entry can be split across two
+clusters. This avoids having to read more than one cluster to get a single entry.
+
+For this purpose an entry with the end type is added at the end of a journal
+cluster before starting to write in the next cluster.
+The size of such an entry is set so the entry points to the next cluster.
+
+As any journal cluster must be ended with an end entry, the size of regular
+journal entries is limited to 254 bytes in order to always leave room for an end
+entry, whose minimal size is two bytes.
+
+The only case where size > 254 is a none entry, where size = 255.
+
+The replay of a journal stops when the first none entry is reached.
+
+The journal cluster size is 4096 bytes.
+
-- 
1.7.10.4


* [Qemu-devel] [RFC V8 02/24] qcow2: Add deduplication structures and fields.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 03/24] qcow2: Add journal Benoît Canet
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.h |  203 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 201 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 9421843..953edfe 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -57,7 +57,182 @@
 #define REFCOUNT_CACHE_SIZE 4
 
 #define DEFAULT_CLUSTER_SIZE 65536
-
+#define DEFAULT_DEDUP_CLUSTER_SIZE 4096
+
+#define HASH_LENGTH 32
+
+/* indicates that this cluster hash has been deleted from the key value store */
+#define QCOW_DEDUP_DELETED (1LL << 61)
+/* indicates that the hash structure is empty and lacks an offset */
+#define QCOW_DEDUP_FLAG_EMPTY   (1LL << 62)
+
+#define SSD_ERASE_BLOCK_SIZE (512 * 1024) /* match SSD erase block size */
+#define JOURNAL_CLUSTER_SIZE 4096       /* used to read entries */
+#define HASH_STORE_CLUSTER_SIZE 4096
+
+#define QCOW_LOG_END_SIZE 2            /* size of an end block journal entry */
+#define QCOW_LOG_STORE_ENTRY_USED (1LL << 60) /* mark used entry in table */
+#define QCOW_LOG_STORE_BUCKET_SIZE 4   /* size of a cuckoo hash bucket */
+#define QCOW_LOG_STORE_MAX_KICKS 128   /* max numbers of cuckoo hash kicks */
+#define QCOW_LOG_STORE_JOURNAL_RATIO 2 /* the ratio to compute the extra
+                                        * room the journal will take based
+                                        * on the log store size
+                                        */
+#define QCOW2_NB_INCARNATION_GOAL  128 /* targeted number of incarnations */
+
+#define QCOW_DEDUP_DIRTY 1 /* dirty flag in the qcow2 header extension */
+
+typedef enum {
+    QCOW_LOG_NONE = 0xFF,     /* erased SSD clusters read back as none */
+    QCOW_LOG_END = 1,         /* end a block and point to the next */
+    QCOW_LOG_HASH = 2,        /* used to journalize a QCowHashInfo */
+} QCowLogEntryType;
+
+typedef enum {
+    QCOW_HASH_SHA256 = 0,
+    QCOW_HASH_SHA3   = 1,
+    QCOW_HASH_SKEIN  = 2,
+} QCowHashAlgo;
+
+typedef struct {
+    uint8_t data[HASH_LENGTH]; /* 32 bytes hash of a given cluster */
+} __attribute__((packed)) QCowHash;
+
+/* deduplication info */
+typedef struct {
+    QCowHash hash;
+    uint64_t physical_sect;       /* where the cluster is stored on disk */
+    uint64_t first_logical_sect;  /* logical sector of the first occurrence of
+                                   * this cluster
+                                   */
+} __attribute__((packed)) QCowHashInfo;
+
+/* Used to keep a single precomputed hash between the calls of the dedup
+ * function
+ */
+typedef struct {
+    QCowHashInfo hash_info;
+    bool reuse;     /* The main deduplication function can set this field to
+                     * true before exiting to avoid computing the same hash
+                     * twice. It's a speed optimization.
+                     */
+} QcowPersistentHash;
+
+/* Undedupable hashes that must be written later to disk */
+typedef struct QCowHashElement {
+    QCowHashInfo hash_info;
+    QTAILQ_ENTRY(QCowHashElement) next;
+} QCowHashElement;
+
+typedef struct {
+    QcowPersistentHash phash;  /* contains a hash persisting between calls of
+                                * qcow2_dedup()
+                                */
+    QTAILQ_HEAD(, QCowHashElement) undedupables;
+    uint64_t nb_clusters_processed;
+    uint64_t nb_undedupable_sectors;
+} QCowDedupState;
+
+/* The code must take care that the size field of a QCowJournalEntry
+ * will be no more than 254 bytes.
+ * It's required to leave the 2 bytes of room for QCOW_LOG_END entries
+ * in every case
+ */
+typedef union {
+    QCowHashInfo hash_info;
+    uint8_t      padding[254]; /* note the extra two bytes of padding to avoid
+                                * read overflow.
+                                */
+} QCowJournalEntryUnion;
+
+typedef struct {
+    uint8_t size;            /* maximum size of a journal entry is 254 bytes */
+    uint8_t type;            /* contains a QCowLogEntryType for future usage */
+    QCowJournalEntryUnion u;
+} __attribute__((packed)) QCowJournalEntry;
+
+typedef struct {
+    uint64_t sector;                  /* the journal physical on disk sector */
+    uint64_t size;                    /* the size of the journal in bytes */
+    uint64_t index;                   /* index of next buf cluster to write */
+    uint8_t  *write_buf;              /* used to buffer written data */
+    uint64_t offset_in_buf;           /* the offset in the write buffer */
+    bool     flushed;                 /* true if the buffer reached disk */
+    uint8_t  *read_cache;             /* used to cache read data */
+    int64_t read_index;               /* index of the cached read cluster */
+    bool started;                     /* has the journal been resumed */
+} QCowJournal;
+
+typedef struct {
+    QCowJournal journal;          /* the journal this log store will use */
+    uint32_t order;               /* the number of bits used for the sub hashes
+                                   * as sub hashes will be used as an index for
+                                   * locating each bucket nb_bucket = 2^order
+                                   */
+    uint16_t nb_kicks;            /* the number of cuckoo hash kicks done */
+    bool *kick_map;               /* take care of not doing kick path loops */
+    QCowHashInfo *hash_table;     /* nb_buckets * QCOW_LOG_STORE_BUCKET_SIZE */
+    QCowHashInfo *hash_table_copy; /* copy of the hash table for packing */
+
+    /* members required to freeze a log store into a hash store consumable
+     * hash table (incarnation)
+     */
+    uint8_t *write_buf;           /* the buffer used to write */
+    uint64_t write_buf_offset;    /* the on disk offset of the buffer */
+    uint32_t in_buf_offset;       /* the current offset in the write buffer */
+} QCowLogStore;
+
+/* a QCowIncarnation is a frozen QCowLogStore
+ * Freezes are read only and their in-RAM filters are queried from the youngest
+ * to the oldest in order to know if their hash table contains a given
+ * QCowHashInfo.
+ * If so a read is issued to retrieve the QCowHashInfo.
+ */
+typedef struct QCowIncarnation {
+    uint64_t filter_offset;      /* the on disk offset of the ram filter */
+    uint64_t hash_table_offset;  /* the on disk offset of the hash table
+                                  * (should be just after the end of the filter)
+                                  */
+    uint64_t size;               /* the on disk size of the incarnation */
+    uint8_t  *filter;            /* an in ram filter */
+    QTAILQ_ENTRY(QCowIncarnation) next;
+} QCowIncarnation;
+
+typedef struct QCowLimbo {
+    uint64_t offset;             /* the on disk offset of the to reincarnate
+                                  * disk space
+                                  */
+    QTAILQ_ENTRY(QCowLimbo) next;
+} QCowLimbo;
+
+typedef struct {
+    uint32_t order;               /* the number of bits used for the sub hashes
+                                   * as sub hashes will be used as an index for
+                                   * locating each bucket nb_bucket = 2^order
+                                   */
+    uint32_t nb_incarnations;     /* the current number of incarnations */
+    uint32_t nb_in_limbo;         /* the number of dead incarnations */
+    QTAILQ_HEAD(incarnations_head, QCowIncarnation) incarnations;
+                                               /* a list of frozen hash table
+                                                * ordered from the youngest
+                                                * to the oldest
+                                                */
+    QTAILQ_HEAD(in_limbo_head, QCowLimbo) in_limbo;
+                                       /* a list of dead incarnations whose disk
+                                        * space can be recycled
+                                        */
+} QCowHashStore;
+
+typedef struct {
+    uint32_t      order;
+    QCowLogStore  log_store;         /* the current log store */
+    QCowLogStore  frozen_log_store;  /* the log store to incarnate */
+    bool          freezing;          /* are we incarnating a log store */
+    QCowHashStore hash_store;
+    CoMutex       insert_lock;       /* used to prevent multiple freeze attempts
+                                      * at the same time
+                                      */
+} QCowStore;
 
 #define QCOW2_OPT_LAZY_REFCOUNTS "lazy_refcounts"
 
@@ -117,8 +292,10 @@ enum {
 enum {
     QCOW2_INCOMPAT_DIRTY_BITNR   = 0,
     QCOW2_INCOMPAT_DIRTY         = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
+    QCOW2_INCOMPAT_DEDUP_BITNR   = 1,
+    QCOW2_INCOMPAT_DEDUP         = 1 << QCOW2_INCOMPAT_DEDUP_BITNR,
 
-    QCOW2_INCOMPAT_MASK          = QCOW2_INCOMPAT_DIRTY,
+    QCOW2_INCOMPAT_MASK          = QCOW2_INCOMPAT_DIRTY | QCOW2_INCOMPAT_DEDUP,
 };
 
 /* Compatible feature bits */
@@ -163,6 +340,28 @@ typedef struct BDRVQcowState {
     int64_t free_cluster_index;
     int64_t free_byte_offset;
 
+    bool has_dedup;                     /* indicate if this image has dedup */
+
+    /* the following fields are saved in QCOW2 dedup header extension */
+    bool dedup_dirty;                   /* mapped to the header dirty flag */
+    uint64_t dedup_conf_offset;         /* disk offset of the dedup config */
+    size_t dedup_conf_size;             /* disk size of the dedup config */
+    QCowHashAlgo dedup_hash_algo;       /* the cryptographic hash algo used */
+    uint32_t dedup_max_incarnations;    /* the maximum number of incarnations in
+                                         * the hash store -> the oldest
+                                         * incarnation will be dropped. It's
+                                         * harmless for the dedup and allows it
+                                         * to scale. The whole thing acts as a
+                                         * FIFO or LRU. If the value is 0 there
+                                         * is no limit. */
+
+    QCowStore key_value_store;          /* the key value store used to store
+                                         * dedup hashes
+                                         */
+    int freeze_errno;                   /* catch errors when incarnating */
+    Coroutine *freeze_co;               /* coroutine used when freezing */
+    Coroutine *load_filter_co;          /* used to load incarnations filters */
+
     CoMutex lock;
 
     uint32_t crypt_method; /* current crypt method, 0 if no key yet */
-- 
1.7.10.4


* [Qemu-devel] [RFC V8 03/24] qcow2: Add journal.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 02/24] qcow2: Add deduplication structures and fields Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 04/24] qcow2: Create the log store Benoît Canet
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

This commit adds the code required to manage one or more journals in a qcow2 file.
The primary user of this journal will be qcow2-log-store.c.
The journal is asynchronous and requires its users to issue flushes in order
to make sure that entries have reached stable storage.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/Makefile.objs   |    1 +
 block/qcow2-journal.c |  587 +++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h         |   29 +++
 3 files changed, 617 insertions(+)
 create mode 100644 block/qcow2-journal.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 5f0358a..ee894b5 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,6 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2-journal.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += vhdx.o
diff --git a/block/qcow2-journal.c b/block/qcow2-journal.c
new file mode 100644
index 0000000..693de37
--- /dev/null
+++ b/block/qcow2-journal.c
@@ -0,0 +1,587 @@
+/*
+ * QCOW2 journal
+ *
+ * Copyright (C) Nodalink, SARL. 2013
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/qcow2.h"
+
+/* This function resets the state of a journal
+ *
+ * @journal: the journal to initialize
+ * @sector:  the on disk sector where the journal starts
+ * @size:    the size of the journal
+ */
+static void qcow2_journal_reset(QCowJournal *journal,
+                                uint64_t sector,
+                                uint64_t size)
+{
+    journal->sector = sector;
+    journal->size = size;
+    journal->index = 0;
+    /* clear write buffer */
+    memset(journal->write_buf, QCOW_LOG_NONE, JOURNAL_CLUSTER_SIZE * 2);
+    journal->offset_in_buf = 0;
+    journal->flushed = true;
+    /* clear read cache */
+    memset(journal->read_cache, QCOW_LOG_NONE, JOURNAL_CLUSTER_SIZE * 2);
+    /* so we will load the cache at first hit */
+    journal->read_index = -1;
+    /* journal needs to be resumed */
+    journal->started = false;
+}
+
+/* This function sets the journal to the correct state after having replayed it
+ *
+ * @journal: the journal to resume
+ * @offset:  the offset inside the journal
+ * @ret:     0 on success, -errno on error
+ */
+int qcow2_journal_resume(BlockDriverState *bs,
+                         QCowJournal *journal,
+                         uint64_t offset)
+{
+    uint64_t disk_offset;
+
+    /* flush read cache */
+    journal->read_index = -1;
+
+    /* mark the journal as resumed */
+    journal->started = true;
+
+    /* reset state */
+    journal->index = offset / JOURNAL_CLUSTER_SIZE;
+    journal->offset_in_buf = offset % JOURNAL_CLUSTER_SIZE;
+
+    /* compute on disk offset of the current cluster */
+    disk_offset = journal->sector * BDRV_SECTOR_SIZE +
+                  journal->index * JOURNAL_CLUSTER_SIZE;
+
+    /* read the current cluster in buffer */
+    return bdrv_pread(bs->file,
+                      disk_offset,
+                      journal->write_buf,
+                      JOURNAL_CLUSTER_SIZE);
+
+}
+
+uint64_t qcow2_journal_round_to_erase_blocks(uint64_t size)
+{
+    return ((size / SSD_ERASE_BLOCK_SIZE) + 1) * SSD_ERASE_BLOCK_SIZE;
+}
+
+/* This function resets a journal's content to none
+ *
+ * To be called at the end of the incarnate/freeze coroutine
+ *
+ * @journal: the QCowJournal to reset
+ * @in_coroutine: true if called from a coroutine
+ * @ret:     0 on success, -errno on error
+ */
+static int qcow2_journal_set_to_none(BlockDriverState *bs,
+                                     QCowJournal *journal,
+                                     bool in_coroutine)
+{
+    int ret = 0;
+    uint8_t *buf;
+    uint64_t i, offset;
+
+    /* prepare buffer to erase journal */
+    buf = qemu_blockalign(bs, SSD_ERASE_BLOCK_SIZE);
+    memset(buf, QCOW_LOG_NONE, SSD_ERASE_BLOCK_SIZE);
+
+    /* do the erasing */
+    for (i = 0; i < (journal->size / SSD_ERASE_BLOCK_SIZE); i++) {
+         offset = journal->sector * BDRV_SECTOR_SIZE +
+                  i * SSD_ERASE_BLOCK_SIZE;
+         /* function will be called from the incarnate coroutine ->
+          * don't write on disk when vm is paused
+          */
+         if (in_coroutine) {
+             co_sleep_ns(vm_clock, 1);
+         }
+         ret = bdrv_pwrite(bs->file,
+                           offset,
+                           buf,
+                           SSD_ERASE_BLOCK_SIZE);
+
+         if (ret < 0) {
+             qemu_vfree(buf);
+             return ret;
+         }
+    }
+
+    qemu_vfree(buf);
+    return 0;
+}
+
+/* This function is used to recycle a QCowJournal
+ *
+ * It resets its state and buffers while keeping the allocated on-disk
+ * space.
+ * Then it erases the on-disk journal content.
+ *
+ * @journal: the journal to recycle
+ * @ret:     0 on success, -errno on error
+ */
+int qcow2_journal_recycle(BlockDriverState *bs, QCowJournal *journal)
+{
+    qcow2_journal_reset(journal,
+                        journal->sector,
+                        journal->size);
+    /* mark the journal as started */
+    journal->started = true;
+    /* writes 0xFF so QCOW_LOG_NONE terminates the journal */
+    return qcow2_journal_set_to_none(bs, journal, true);
+}
+
+/* This function is used to allocate a journal's buffers
+ *
+ * Set sector and size to zero so the journal will be treated as unallocated
+ *
+ * @journal: the journal to initialize
+ */
+void qcow2_journal_init(BlockDriverState *bs,
+                        QCowJournal *journal)
+{
+    journal->sector = 0;
+    journal->size = 0;
+    /* allocate an extra cluster because sometimes we will read an end entry
+     * which spills into the second cluster
+     */
+    journal->write_buf = qemu_blockalign(bs, JOURNAL_CLUSTER_SIZE * 2);
+    /* allocate extra read_cache cluster to avoid read casting overflow */
+    journal->read_cache = qemu_blockalign(bs, JOURNAL_CLUSTER_SIZE * 2);
+}
+
+/* This function cleans up a journal
+ *
+ * @journal: the journal to cleanup
+ */
+void qcow2_journal_cleanup(QCowJournal *journal)
+{
+    qemu_vfree(journal->write_buf);
+    qemu_vfree(journal->read_cache);
+}
+
+/* This function tells whether a journal's on disk space has been allocated
+ *
+ * @journal: the journal to test
+ * @ret:     true if allocated else false
+ */
+bool qcow2_journal_is_allocated(QCowJournal *journal)
+{
+    return journal->sector;
+}
+
+/* This function is used to allocate the on disk space of a journal
+ *
+ * @size: the on disk size of the journal
+ * @ret:  0 on success, -errno on error
+ */
+int qcow2_journal_disk_allocate(BlockDriverState *bs,
+                                QCowJournal *journal,
+                                uint64_t size)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t offset;
+    int ret = 0;
+
+    /* allocate journal disk space */
+    offset = qcow2_alloc_clusters(bs, size);
+
+    if (offset < 0) {
+        return offset;
+    }
+
+    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+
+    if (ret < 0) {
+        goto deallocate_exit;
+    }
+
+    qcow2_journal_reset(journal, offset / BDRV_SECTOR_SIZE, size);
+
+    ret = qcow2_journal_set_to_none(bs, journal, false);
+
+    if (ret < 0) {
+        goto deallocate_exit;
+    }
+
+    /* success */
+    return 0;
+
+deallocate_exit:
+    qcow2_free_clusters(bs, offset, size);
+    return ret;
+}
+
+
+/* This function deallocates a journal's disk space
+ *
+ * @journal: the journal to deallocate
+ */
+void qcow2_journal_disk_deallocate(BlockDriverState *bs,
+                                   QCowJournal *journal)
+{
+    qcow2_free_clusters(bs,
+                        journal->sector * BDRV_SECTOR_SIZE,
+                        journal->size);
+}
+
+/* Add a QCOW_LOG_END entry pointing to the next cluster of the journal
+ *
+ * Note: the caller code must be sure the end entry will be written in the last
+ *       256 bytes of the current cluster
+ *
+ * @journal: the journal to operate on
+ */
+static void qcow2_journal_add_end_entry(QCowJournal *journal)
+{
+    QCowJournalEntry entry;
+
+    memset(&entry, QCOW_LOG_NONE, sizeof(entry));
+    entry.size = JOURNAL_CLUSTER_SIZE - journal->offset_in_buf;
+    entry.type = QCOW_LOG_END;
+    memcpy(journal->write_buf + journal->offset_in_buf, &entry, entry.size);
+}
+
+/* Reset the journal write buffer and increment counters
+ *
+ * @journal: the journal to operate on
+ */
+static void qcow2_journal_reset_buffer_and_inc(QCowJournal *journal)
+{
+    memset(journal->write_buf, QCOW_LOG_NONE, JOURNAL_CLUSTER_SIZE);
+    journal->offset_in_buf = 0;
+    journal->index++;
+}
+
+/* This function writes a journal buffer to disk
+ *
+ * note: caller code must be sure that there are at most 256 bytes of free
+ *       space in the buffer when deciding to end writing in this cluster
+ * note: this function will never return -1 (full) when end_cluster == false
+ *       This is used to flush the current cluster.
+ *
+ * @journal:     the journal to flush the buffer to disk
+ * @end_cluster:   true if we must prepare to write in a new cluster
+ * @ret:         0 on success, -errno on error, -1 if journal is full
+ */
+static int qcow2_journal_write_buffer(BlockDriverState *bs,
+                                      QCowJournal *journal,
+                                      bool end_cluster)
+{
+    int ret = 0;
+    uint64_t offset, total_bytes;
+
+    /* We will add a QCOW_LOG_END entry at the end of the buffer
+     * before writing it to disk.
+     * This entry will have the size required for a walk in the journal to jump
+     * to the next cluster.
+     */
+    if (end_cluster) {
+        qcow2_journal_add_end_entry(journal);
+    }
+
+    /* check if the journal is full and we are creating a new cluster */
+    total_bytes = (journal->index + 1) * JOURNAL_CLUSTER_SIZE;
+    if (end_cluster && total_bytes >= journal->size) {
+        return -1;
+    }
+
+    /* We write the journal buffer to disk */
+    offset = journal->sector * BDRV_SECTOR_SIZE +
+             journal->index * JOURNAL_CLUSTER_SIZE;
+    ret = bdrv_pwrite(bs->file,
+                      offset,
+                      journal->write_buf,
+                      JOURNAL_CLUSTER_SIZE);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* prepare the journal state to buffer a new series of entries */
+    if (end_cluster) {
+        qcow2_journal_reset_buffer_and_inc(journal);
+    }
+
+    journal->flushed = true;
+
+    return 0;
+}
+
+/* This function is used to append an entry to a journal
+ *
+ * Note: Appending an entry to the journal is an asynchronous process.
+ *       The journal must be flushed at some point to ensure that data have
+ *       reached stable storage.
+ *
+ * @journal:    the journal to append the entry to
+ * @entry:      the journal entry to write
+ * @ret:        offset of the entry in the journal, -errno on error, -1 if full
+ */
+int64_t qcow2_journal_append(BlockDriverState *bs, QCowJournal *journal,
+                             QCowJournalEntry *entry)
+{
+    int ret = 0;
+    uint64_t entry_offset;
+
+    assert(entry->size <= 254);
+
+    /* check if there is room left for this entry and an end entry
+     * regular entry sizes are at most 254 bytes
+     * a QCOW_LOG_END entry takes 2 bytes and can skip up to 256 bytes
+     */
+    if ((journal->offset_in_buf + entry->size + QCOW_LOG_END_SIZE) >=
+        JOURNAL_CLUSTER_SIZE) {
+        /* if not we must flush the buffer and create a new one */
+        ret = qcow2_journal_write_buffer(bs, journal, true);
+    }
+
+    /* error or journal full */
+    if (ret < 0) {
+        return ret;
+    }
+
+    entry_offset = journal->index * JOURNAL_CLUSTER_SIZE +
+                   journal->offset_in_buf;
+
+    /* write entry to the journal buffer */
+    memcpy(journal->write_buf + journal->offset_in_buf, entry, entry->size);
+    journal->offset_in_buf += entry->size;
+    journal->flushed = false;
+
+    return entry_offset;
+}
+
+/* This function is used to read a journal entry from disk given its offset
+ *
+ * @journal: the journal to read the entry from
+ * @offset:  the offset of the entry in the journal
+ * @entry:   the journal entry to read the data into
+ * @ret:     size of the entry on success, -errno on error
+ */
+static int qcow2_journal_read_from_disk(BlockDriverState *bs,
+                                        QCowJournal *journal,
+                                        uint64_t offset,
+                                        QCowJournalEntry *entry)
+{
+
+    int ret = 0;
+    uint64_t index;
+
+
+    index = offset / JOURNAL_CLUSTER_SIZE;
+
+    /* If this cluster is not cached, read it.
+     * The cache will help if we iterate; otherwise, for dedup accesses, it
+     * will be totally random.
+     */
+    if (index != journal->read_index) {
+        uint64_t read_offset = journal->sector * BDRV_SECTOR_SIZE +
+                               index * JOURNAL_CLUSTER_SIZE;
+        ret = bdrv_pread(bs->file,
+                         read_offset,
+                         journal->read_cache,
+                         JOURNAL_CLUSTER_SIZE);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        journal->read_index = index;
+    }
+
+    /* The maximum size of an entry is 254 bytes and the journal
+     * read_cache has an extra cluster to avoid read overflow
+     */
+    memcpy(entry,
+           journal->read_cache + offset % JOURNAL_CLUSTER_SIZE,
+           sizeof(QCowJournalEntry));
+
+    return entry->size;
+}
+
+/* This function tells whether a journal entry is in the write buffer */
+static bool qcow2_journal_is_offset_in_buf(QCowJournal *journal,
+                                           uint64_t offset)
+{
+    return journal->index == offset / JOURNAL_CLUSTER_SIZE;
+}
+
+/* This function reads a journal entry from the write buffer */
+static int qcow2_journal_read_from_buf(QCowJournal *journal,
+                                       uint64_t offset,
+                                       QCowJournalEntry *entry)
+{
+    memcpy(entry,
+           journal->write_buf + offset % JOURNAL_CLUSTER_SIZE,
+           sizeof(QCowJournalEntry));
+    return entry->size;
+}
+
+/* This function is used to read a journal entry given its offset
+ *
+ * As this function returns the size of the entry just read, it can be used
+ * as an iterator to walk through the journal.
+ *
+ * @journal: the journal to read the entry from
+ * @offset:  the offset of the entry in the journal
+ * @entry:   the journal entry to read the data into
+ * @ret:     size of the entry on success, -errno on error
+ */
+int qcow2_journal_read(BlockDriverState *bs, QCowJournal *journal,
+                       uint64_t offset, QCowJournalEntry *entry)
+{
+    if (offset >= journal->size) {
+        return -EFBIG;
+    }
+
+    if (journal->started &&
+        qcow2_journal_is_offset_in_buf(journal, offset)) {
+        return qcow2_journal_read_from_buf(journal, offset, entry);
+    }
+
+    return qcow2_journal_read_from_disk(bs, journal, offset, entry);
+}
+
+/* This function is used to flush the current journal page to disk
+ *
+ * Note: it does not issue a bdrv_flush; it's up to the caller to do so.
+ * Note: qcow2_journal_write_buffer will never return -1 (full) when it
+ *       receives end_cluster == false
+ *
+ * @journal: the journal to flush
+ * @ret:     0 on success, -errno on error
+ */
+int qcow2_journal_flush(BlockDriverState *bs, QCowJournal *journal)
+{
+    if (journal->flushed) {
+        return 0;
+    }
+
+    /* write the current cluster while not ending it */
+    return qcow2_journal_write_buffer(bs, journal, false);
+}
+
+/* This function flushes and stops a journal
+ *
+ * @journal: the journal to stop
+ * @ret:     0 on success, -errno on error
+ */
+int qcow2_journal_stop(BlockDriverState *bs, QCowJournal *journal)
+{
+    int ret = 0;
+
+    ret = qcow2_journal_flush(bs, journal);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    journal->started = false;
+
+    return 0;
+}
+
+/* This function converts a hash info to a journal entry
+ *
+ * @entry:     the already allocated destination journal entry
+ * @hash_info: the QCowHashInfo to take data from
+ */
+void qcow2_journal_entry_from_hash_info(QCowJournalEntry *entry,
+                                        QCowHashInfo *hash_info)
+{
+    entry->size = 2 + sizeof(QCowHashInfo);
+    entry->type = QCOW_LOG_HASH;
+    memcpy(&entry->u.hash_info, hash_info, sizeof(QCowHashInfo));
+    entry->u.hash_info.physical_sect = cpu_to_be64(hash_info->physical_sect);
+    entry->u.hash_info.first_logical_sect =
+        cpu_to_be64(hash_info->first_logical_sect);
+}
+
+/* This function converts a journal entry to a hash info
+ *
+ * @hash_info: the already allocated destination QCowHashInfo
+ * @entry:     the QCowJournalEntry to take data from
+ * @ret:       0 on success, -EINVAL on invalid entry
+ */
+int qcow2_hash_info_from_journal_entry(QCowHashInfo *hash_info,
+                                       QCowJournalEntry *entry)
+{
+    if (entry->type != QCOW_LOG_HASH) {
+        return -EINVAL;
+    }
+
+    if (entry->size != (2 + sizeof(QCowHashInfo))) {
+        return -EINVAL;
+    }
+
+    memcpy(hash_info, &entry->u.hash_info, sizeof(QCowHashInfo));
+    hash_info->physical_sect = be64_to_cpu(hash_info->physical_sect);
+    hash_info->first_logical_sect = be64_to_cpu(hash_info->first_logical_sect);
+
+    return 0;
+}
+
+/* This function computes the size of a journal when dumped */
+size_t qcow2_journal_dump_size(void)
+{
+    return sizeof(uint64_t) * 2;
+}
+
+/* This function dumps the required journal information into a buffer
+ *
+ * @buf:     the buffer to do the dump into
+ * @journal: the journal to dump
+ * @ret:     the size of the dump
+ */
+size_t qcow2_journal_dump(uint8_t *buf, QCowJournal *journal)
+{
+    uint64_t *buf64 = (uint64_t *) buf;
+
+    buf64[0] = cpu_to_be64(journal->sector);
+    buf64[1] = cpu_to_be64(journal->size);
+
+    return qcow2_journal_dump_size();
+}
+
+/* This function parses a journal dump
+ *
+ *
+ * @journal: the journal to parse the dump into
+ * @buf:     the buffer to read the journal info from
+ */
+void qcow2_journal_parse(QCowJournal *journal, uint8_t *buf)
+{
+    uint64_t *buf64 = (uint64_t *) buf;
+
+    qcow2_journal_reset(journal,
+                        be64_to_cpu(buf64[0]),  /* sector */
+                        be64_to_cpu(buf64[1])); /* size */
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 953edfe..adde631 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -606,4 +606,33 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
     void **table);
 int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
 
+/* qcow2-journal.c functions */
+int qcow2_journal_resume(BlockDriverState *bs,
+                         QCowJournal *journal,
+                         uint64_t offset);
+uint64_t qcow2_journal_round_to_erase_blocks(uint64_t size);
+int qcow2_journal_recycle(BlockDriverState *bs, QCowJournal *journal);
+void qcow2_journal_init(BlockDriverState *bs,
+                        QCowJournal *journal);
+void qcow2_journal_cleanup(QCowJournal *journal);
+bool qcow2_journal_is_allocated(QCowJournal *journal);
+int qcow2_journal_disk_allocate(BlockDriverState *bs,
+                                QCowJournal *journal,
+                                uint64_t size);
+void qcow2_journal_disk_deallocate(BlockDriverState *bs,
+                                   QCowJournal *journal);
+int64_t qcow2_journal_append(BlockDriverState *bs, QCowJournal *journal,
+                             QCowJournalEntry *entry);
+int qcow2_journal_read(BlockDriverState *bs, QCowJournal *journal,
+                       uint64_t offset, QCowJournalEntry *entry);
+int qcow2_journal_flush(BlockDriverState *bs, QCowJournal *journal);
+int qcow2_journal_stop(BlockDriverState *bs, QCowJournal *journal);
+void qcow2_journal_entry_from_hash_info(QCowJournalEntry *entry,
+                                        QCowHashInfo *hash_info);
+int qcow2_hash_info_from_journal_entry(QCowHashInfo *hash_info,
+                                       QCowJournalEntry *entry);
+size_t qcow2_journal_dump_size(void);
+size_t qcow2_journal_dump(uint8_t *buf, QCowJournal *journal);
+void qcow2_journal_parse(QCowJournal *journal, uint8_t *buf);
+
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 04/24] qcow2: Create the log store.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (2 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 03/24] qcow2: Add journal Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 05/24] qcow2: Add the hash store Benoît Canet
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

The log store is an in-RAM hash table used to index QCowJournalEntries.
Combining the log store with a journal gives the benefits of the journal
(fewer random writes and less SSD wear) with the advantages of a hash table
(being good at random access).

Once the journal is full and not packable, or if the log store hash table is
full, the journal is frozen and converted into an on-disk hash table used by
the hash store.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/Makefile.objs     |    2 +-
 block/qcow2-log-store.c | 1039 +++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h           |   26 ++
 3 files changed, 1066 insertions(+), 1 deletion(-)
 create mode 100644 block/qcow2-log-store.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index ee894b5..8c0b646 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,6 +1,6 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
-block-obj-y += qcow2-journal.o
+block-obj-y += qcow2-journal.o qcow2-log-store.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += vhdx.o
diff --git a/block/qcow2-log-store.c b/block/qcow2-log-store.c
new file mode 100644
index 0000000..24ae310
--- /dev/null
+++ b/block/qcow2-log-store.c
@@ -0,0 +1,1039 @@
+/*
+ * QCOW2 log store
+ *
+ * Copyright (C) Nodalink, SARL. 2013
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/qcow2.h"
+
+static uint64_t qcow2_dedup_get_h(QCowHashInfo *hash_info, int index,
+                                  uint32_t order)
+{
+    uint64_t *array = (uint64_t *) &hash_info->hash.data;
+    uint64_t result = array[index];
+    return result & (((uint64_t)1 << order) - 1);
+}
+
+uint64_t qcow2_dedup_h1(QCowHashInfo *hash_info, uint32_t order)
+{
+    return qcow2_dedup_get_h(hash_info, 0, order);
+}
+
+uint64_t qcow2_dedup_h2(QCowHashInfo *hash_info, uint32_t order)
+{
+    return qcow2_dedup_get_h(hash_info, 1, order);
+}
+
+static bool qcow2_dedup_h_equals(QCowHashInfo *hash_info, uint32_t order)
+{
+    return qcow2_dedup_h1(hash_info, order) ==
+           qcow2_dedup_h2(hash_info, order);
+}
+
+static void qcow2_log_store_reset_kick_map(QCowLogStore *store)
+{
+    uint64_t i;
+    for (i = 0; i < qcow2_log_store_nb_entries(store->order); i++) {
+        store->kick_map[i] = false;
+    }
+}
+
+static bool qcow2_log_store_is_in_kick_path(QCowLogStore *store,
+                                            uint64_t in_table_idx)
+{
+    return store->kick_map[in_table_idx];
+}
+
+static void qcow2_log_store_set_in_kick_path(QCowLogStore *store,
+                                             uint64_t in_table_idx)
+{
+    store->kick_map[in_table_idx] = true;
+}
+
+static void qcow2_log_store_unset_from_kick_path(QCowLogStore *store,
+                                                 uint64_t in_table_idx)
+{
+    store->kick_map[in_table_idx] = false;
+}
+
+static void qcow2_log_store_set_perm_element(int *perm, int value, int rnd)
+{
+    for (;perm[rnd % QCOW_LOG_STORE_BUCKET_SIZE] != -1; rnd++);
+    perm[rnd % QCOW_LOG_STORE_BUCKET_SIZE] = value;
+}
+
+/* This function creates a random permutation of the set
+ * [0, QCOW_LOG_STORE_BUCKET_SIZE - 1]
+ */
+static void qcow2_log_store_make_perm(int *perm)
+{
+    int i, rnd;
+
+    /* reset */
+    for (i = 0; i < QCOW_LOG_STORE_BUCKET_SIZE; i++) {
+        perm[i] = -1;
+    }
+
+    /* create permutation */
+    for (i = 0; i < QCOW_LOG_STORE_BUCKET_SIZE; i++) {
+        rnd = rand();
+        qcow2_log_store_set_perm_element(perm, i, rnd);
+    }
+}
+
+/* This function is used to make sure we don't create path cycles while kicking
+ *
+ * The function uses a random permutation to compute the kick idx.
+ * This way two calls to the function should not give the same result.
+ * Randomization increases hash table usage by 13%, giving a 93% fill rate.
+ *
+ * @store:      the store to work with
+ * @bucket_idx: the index of the bucket where we are trying to get an entry to
+ *              push
+ * @ret:        [0, QCOW_LOG_STORE_BUCKET_SIZE - 1] on success, -2 otherwise
+ */
+static int qcow2_log_store_get_kick_idx(QCowLogStore *store,
+                                        uint64_t bucket_idx)
+{
+    QCowHashInfo *entry;
+    uint64_t in_table_idx;
+    int perm[QCOW_LOG_STORE_BUCKET_SIZE];
+    int i;
+
+    /* create a pseudo random permutation */
+    qcow2_log_store_make_perm(&perm[0]);
+
+    /* use the permutation to get entry to kick */
+    for (i = 0; i < QCOW_LOG_STORE_BUCKET_SIZE; i++) {
+        in_table_idx = bucket_idx * QCOW_LOG_STORE_BUCKET_SIZE + perm[i];
+        entry = &store->hash_table[in_table_idx];
+
+        /* we take care of not selecting an entry already moved or an entry
+         * which has no alternate location
+         */
+        if (!qcow2_log_store_is_in_kick_path(store, in_table_idx) &&
+            !qcow2_dedup_h_equals(entry, store->order)) {
+            qcow2_log_store_set_in_kick_path(store, in_table_idx);
+            return perm[i];
+        }
+    }
+
+    return -2;
+}
+
+
+uint64_t qcow2_log_store_nb_entries(uint32_t order)
+{
+    /* entries are grouped in QCOW_LOG_STORE_BUCKET_SIZE sized buckets */
+    return ((uint64_t)1 << order) * QCOW_LOG_STORE_BUCKET_SIZE;
+}
+
+static uint64_t qcow2_log_store_journal_size(uint32_t order)
+{
+    /* size of a QCowLog hash entry */
+    uint32_t qcow_log_hash_size = sizeof(QCowHashInfo) +
+                                  QCOW_LOG_END_SIZE;
+    /* leave some extra room for insert/delete cycle and other QCOW2 journal
+     * insertion
+     */
+    return qcow2_journal_round_to_erase_blocks(qcow2_log_store_nb_entries(order)
+                                               * qcow_log_hash_size
+                                               * QCOW_LOG_STORE_JOURNAL_RATIO);
+}
+
+/* This function resets a log store's state
+ *
+ * @store: the log store to reset
+ * @order: the bit order
+ */
+static void qcow2_log_store_reset(QCowLogStore *store, uint32_t order)
+{
+    store->order = order;
+    store->nb_kicks = 0;
+    qcow2_log_store_reset_kick_map(store);
+    memset(store->hash_table, 0,
+           qcow2_log_store_nb_entries(order) * sizeof(QCowHashInfo));
+    memset(store->write_buf, 0, HASH_STORE_CLUSTER_SIZE);
+    store->write_buf_offset = 0;
+    store->in_buf_offset = 0;
+}
+
+/* This function creates a new log store
+ *
+ * The function allocates journal disk space
+ *
+ * @store: the log store to work on
+ * @order: the bit order
+ * @ret:   0 on success, -errno on error
+ */
+int qcow2_log_store_create(BlockDriverState *bs,
+                           QCowLogStore *store,
+                           uint32_t order)
+{
+    qcow2_log_store_reset(store, order);
+    return qcow2_journal_disk_allocate(bs,
+               &store->journal,
+               qcow2_log_store_journal_size(store->order));
+}
+
+
+int qcow2_log_store_recycle(BlockDriverState *bs,
+                            QCowLogStore *store)
+{
+    qcow2_log_store_reset(store, store->order);
+    return qcow2_journal_recycle(bs, &store->journal);
+}
+
+/* This function initializes a log store
+ *
+ * @store: the log store to initialize
+ */
+void qcow2_log_store_init(BlockDriverState *bs,
+                          QCowLogStore *store,
+                          uint32_t order)
+{
+    store->order = order;
+    qcow2_journal_init(bs, &store->journal);
+    store->hash_table = g_new0(QCowHashInfo,
+                               qcow2_log_store_nb_entries(order));
+    store->kick_map = g_new0(bool, qcow2_log_store_nb_entries(order));
+    store->write_buf = qemu_blockalign(bs, HASH_STORE_CLUSTER_SIZE);
+}
+
+void qcow2_log_store_cleanup(QCowLogStore *store)
+{
+    qemu_vfree(store->write_buf);
+    g_free(store->hash_table);
+    g_free(store->kick_map);
+    qcow2_journal_cleanup(&store->journal);
+}
+
+static bool qcow2_log_store_is_used_by_table_idx(QCowHashInfo *table,
+                                                 uint64_t in_table_idx)
+{
+    QCowHashInfo *entry;
+    entry = &table[in_table_idx];
+    return entry->first_logical_sect & QCOW_LOG_STORE_ENTRY_USED;
+}
+
+static void qcow2_log_store_copy_to_hash_table(QCowHashInfo *table,
+                                               uint64_t in_table_idx,
+                                               QCowHashInfo *hash_info)
+{
+    QCowHashInfo *entry;
+
+    entry = &table[in_table_idx];
+    memcpy(entry, hash_info, sizeof(QCowHashInfo));
+    entry->first_logical_sect |= QCOW_LOG_STORE_ENTRY_USED;
+}
+
+static void qcow2_log_store_copy_from_hash_table(QCowHashInfo *hash_info,
+                                                 QCowHashInfo *table,
+                                                 uint64_t in_table_idx)
+{
+    QCowHashInfo *entry;
+
+    entry = &table[in_table_idx];
+    memcpy(hash_info, entry, sizeof(QCowHashInfo));
+    hash_info->first_logical_sect &= ~QCOW_LOG_STORE_ENTRY_USED;
+}
+
+static uint64_t qcow2_log_store_tag_by_bucket_idx(QCowLogStore *store,
+                                                  QCowHashInfo *hash_info,
+                                                  uint64_t bucket_idx)
+{
+    uint64_t h[2];
+
+    h[0] = qcow2_dedup_h1(hash_info, store->order);
+    h[1] = qcow2_dedup_h2(hash_info, store->order);
+
+    if (bucket_idx == h[0]) {
+        return h[1];
+    }
+
+    if (bucket_idx == h[1]) {
+        return h[0];
+    }
+
+    /* should not pass here */
+    assert(false);
+    return 0;
+}
+
+/* This function calculates the order (number of bits) required
+ *
+ * @min_nb_entries_required: minimal number of entries the log store must handle
+ * @ret:                     number of bits (order) of the sub-hash functions
+ */
+int qcow2_log_store_compute_order(uint64_t min_nb_entries_required)
+{
+    uint32_t result = 1;
+    uint64_t min = min_nb_entries_required / QCOW_LOG_STORE_BUCKET_SIZE;
+    while (min) {
+        min >>= 1;
+        result++;
+    }
+    return result;
+}
+
+/* This function gets a hash info from the table given its table index
+ *
+ * It gets the result only if the given hash info has the same hash as the one
+ * stored in the table
+ *
+ * @store:        the store to work on
+ * @hash_info:    the QCowHashInfo we are working with (will be overwritten)
+ * @in_table_idx: the index of the entry in the cuckoo hash table
+ * @ret:          true if the hash matches, else false
+ */
+static int qcow2_log_store_try_get_hash_info(QCowLogStore *store,
+                                             QCowHashInfo *hash_info,
+                                             uint64_t in_table_idx)
+{
+    bool match;
+    QCowHashInfo *entry;
+
+    entry = &store->hash_table[in_table_idx];
+
+    match = !memcmp(&entry->hash.data,
+                    &hash_info->hash.data,
+                    HASH_LENGTH);
+
+    if (match) {
+        memcpy(hash_info, entry, sizeof(QCowHashInfo));
+        hash_info->first_logical_sect &= ~QCOW_LOG_STORE_ENTRY_USED;
+    }
+
+    return match;
+}
+
+/* This function tries to get a QCowHashInfo from the store's hash_table
+ *
+ * @store:       the store to work on
+ * @hash_info:   the QCowHashInfo we are indexing
+ * @bucket_idx:  the bucket index
+ * @index_only:  if true, only return the in-bucket index and do not overwrite
+ *                 the QCowHashInfo parameter.
+ *                 This is used to check if we can overwrite a given entry in
+ *                 the bucket.
+ * @ret:         integer in range [0, (QCOW_LOG_STORE_BUCKET_SIZE - 1)] or
+ *                 -1 if not found
+ */
+static int qcow2_log_store_try_get(QCowLogStore *store,
+                                   QCowHashInfo *hash_info,
+                                   uint64_t bucket_idx,
+                                   bool index_only)
+{
+    int i;
+    bool got = false;
+    uint64_t in_table_idx;
+    QCowHashInfo dummy;
+
+    /* if index_only is true we want to work on a dummy QCowHashInfo copy */
+    memcpy(&dummy, hash_info, sizeof(QCowHashInfo));
+
+    /* for each bucket entry */
+    for (i = 0; i < QCOW_LOG_STORE_BUCKET_SIZE; i++) {
+        in_table_idx = bucket_idx * QCOW_LOG_STORE_BUCKET_SIZE + i;
+
+        got = qcow2_log_store_try_get_hash_info(store,
+                                                index_only ? &dummy : hash_info,
+                                                in_table_idx);
+
+        /* the hash info is indexed by this entry */
+        if (got) {
+            return i;
+        }
+    }
+
+    /* no entry to overwrite -> not found */
+    return -1;
+}
+
+/* This function looks for a given hash info in a log store
+ *
+ * @store:     the QCowLogStore to look into
+ * @hash_info: the QCowHashInfo to complete: at least the hash must be filled
+ * @ret:       true on successful lookup, false if not found
+ */
+bool qcow2_log_store_get(QCowLogStore *store, QCowHashInfo *hash_info)
+{
+    uint64_t h[2], bucket_idx;
+    int ret = 0;
+    int i;
+
+    h[0] = qcow2_dedup_h1(hash_info, store->order);
+    h[1] = qcow2_dedup_h2(hash_info, store->order);
+
+    /* try to get the hash info at the two alternate location */
+    for (i = 0; i < 2; i++) {
+        bucket_idx = h[i];
+
+        ret = qcow2_log_store_try_get(store, hash_info, bucket_idx, false);
+
+        /* on success (in range [0, (QCOW_LOG_STORE_BUCKET_SIZE - 1)]) */
+        if (ret >= 0) {
+            return true;
+        }
+    }
+
+    /* not found */
+    return false;
+}
+
+/* This function tries to append an entry to a new journal given its table index
+ *
+ * The nop operation (no append to do) is considered as a success
+ *
+ * @table:        the hash table to work on
+ * @dest:         the new journal to insert the entry into
+ * @in_table_idx: the index in the hash table of the entry to append
+ * @ret:          0 on success, -1 if the new journal is full,
+ *                -EFBIG on bogus entry offset, -errno on error
+ */
+static int qcow2_log_store_migrate_journal_entry_by_index(BlockDriverState *bs,
+                                                       QCowJournal *dest,
+                                                       QCowHashInfo *table,
+                                                       uint64_t in_table_idx)
+{
+    QCowJournalEntry journal_entry;
+    QCowHashInfo hash_info;
+    int64_t offset;
+
+    /* return success on nop (the entry is not used) */
+    if (!qcow2_log_store_is_used_by_table_idx(table, in_table_idx)) {
+        return 0;
+    }
+
+    qcow2_log_store_copy_from_hash_table(&hash_info,
+                                         table,
+                                         in_table_idx);
+
+    qcow2_journal_entry_from_hash_info(&journal_entry, &hash_info);
+
+    /* get the new offset by writing the journal entry in the new journal */
+    offset = qcow2_journal_append(bs, dest, &journal_entry);
+
+    /* error or full */
+    if (offset < 0) {
+        return offset;
+    }
+
+    return 0;
+}
+
+static void qcow2_log_store_copy_hash_table(QCowLogStore *store)
+{
+    store->hash_table_copy = g_new0(QCowHashInfo,
+                                    qcow2_log_store_nb_entries(store->order));
+    memcpy(store->hash_table_copy, store->hash_table,
+           qcow2_log_store_nb_entries(store->order) * sizeof(QCowHashInfo));
+}
+
+/* This function does the real work of packing the store journal into a new one
+ *
+ * The function takes care of flushing the new journal
+ *
+ * @store:       the store to work on
+ * @new_journal: the destination journal
+ * @ret:          0 on success, -1 if the new journal is full,
+ *                -EFBIG on bogus entry offset, -errno on error
+ */
+static int qcow2_log_store_do_pack_journal(BlockDriverState *bs,
+                                           QCowLogStore *store,
+                                           QCowJournal *dest)
+{
+    uint64_t i;
+    int ret = 0;
+
+    /* we will insert each used QCowLogStore entry into the new journal */
+    for (i = 0; i < qcow2_log_store_nb_entries(store->order); i++) {
+        ret = qcow2_log_store_migrate_journal_entry_by_index(bs,
+                                                         dest,
+                                                         store->hash_table_copy,
+                                                         i);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return qcow2_journal_flush(bs, dest);
+}
+
+static void qcow2_log_store_commit_new_journal(BlockDriverState *bs,
+                                               QCowLogStore *store,
+                                               QCowJournal *new_journal)
+{
+    /* cleanup old data */
+    qcow2_journal_disk_deallocate(bs, &store->journal);
+    g_free(store->hash_table);
+    /* commit new journal and hash table */
+    memcpy(&store->journal, new_journal, sizeof(*new_journal));
+    store->hash_table = store->hash_table_copy;
+    store->hash_table_copy = NULL;
+}
+
+/* This function packs a log store journal into a new one
+ *
+ * This is used to manage pathological cyclic insertion/deletion cases or the
+ * case where a log store journal is full due to too many entries inserted by
+ * other QCOW2 users.
+ * As this event should be rare we don't bother recycling the journal
+ * disk space. As a consequence the QCOW2 file may grow.
+ *
+ * @store:     the log store from which the journal must be packed
+ * @need_save: will be set to true if new journal config must be saved later
+ *              by the upper layers
+ * @ret:       0 on success, -1 if the new journal is full,
+ *               -EFBIG on bogus entry offset, -errno on error
+ */
+static int qcow2_log_store_pack_journal(BlockDriverState *bs,
+                                        QCowLogStore *store,
+                                        bool *need_save)
+{
+    QCowJournal new_journal;
+    int ret = 0;
+    /* we won't need to save journal config on failure */
+    *need_save = false;
+
+    /* TODO: insert journal replay here if other part of QCOW make use of it */
+
+    /* we flush the journal first */
+    ret = qcow2_journal_flush(bs, &store->journal);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* we allocate a new journal, we will copy used entry into */
+    qcow2_journal_init(bs, &new_journal);
+    ret = qcow2_journal_disk_allocate(bs,
+              &new_journal,
+              qcow2_log_store_journal_size(store->order));
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* copy the hash table to be able to abort packing at any time */
+    qcow2_log_store_copy_hash_table(store);
+
+    ret = qcow2_log_store_do_pack_journal(bs,
+                                          store,
+                                          &new_journal);
+
+    if (ret < 0) {
+        goto free_dealloc_error_exit;
+    }
+
+    qcow2_log_store_commit_new_journal(bs, store, &new_journal);
+
+    /* tell the upper layer to save the new journal config to disk */
+    *need_save = true;
+
+    return 0;
+
+free_dealloc_error_exit:
+    g_free(store->hash_table_copy);
+    qcow2_journal_disk_deallocate(bs, &new_journal);
+    qcow2_journal_cleanup(&new_journal);
+    return ret;
+}
+
+/* This function appends a QCowHashInfo to the journal
+ *
+ * The function will try to pack the journal and redo the insert if the journal
+ * is full
+ *
+ * @store:      the QCowLogStore to work with
+ * @hash_info:  the QCowHashInfo to insert into the store
+ * @need_save:  will be set to true if new journal info must be saved later
+ * @ret:        the offset of the QCowJournalEntry on success, -1 if a journal
+ *              packing failed or the journal is full and unpackable,
+ *              -EFBIG on bogus entry offset while packing,
+ *              -errno on error
+ */
+static int64_t qcow2_log_store_append_to_journal(BlockDriverState *bs,
+                                                 QCowLogStore *store,
+                                                 QCowHashInfo *hash_info,
+                                                 bool *need_save)
+{
+    QCowJournalEntry journal_entry;
+    int ret = 0;
+    int64_t offset;
+    *need_save = false;
+
+    qcow2_journal_entry_from_hash_info(&journal_entry, hash_info);
+
+    offset = qcow2_journal_append(bs, &store->journal, &journal_entry);
+
+    /* the journal is full : maybe a pathological cyclic insertion/deletion
+     * pattern made it full.
+     * We will create a new one and repack the current journal into it.
+     */
+    if (offset == -1) {
+        ret = qcow2_log_store_pack_journal(bs, store, need_save);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        offset = qcow2_journal_append(bs, &store->journal, &journal_entry);
+    }
+
+    return offset;
+}
+
+/* This function finds the first free entry index
+ *
+ * @store:      the store to work on
+ * @bucket_idx: the index of the bucket to scan
+ * @ret:        integer in range [0, (QCOW_LOG_STORE_BUCKET_SIZE - 1)]
+ *              or -1 if no room found
+ */
+static int qcow2_log_store_first_free_entry_idx(QCowLogStore *store,
+                                                uint64_t bucket_idx)
+{
+    uint64_t in_table_idx;
+    bool used;
+    int i;
+
+    /* for each entry of the bucket */
+    for (i = 0; i < QCOW_LOG_STORE_BUCKET_SIZE; i++) {
+        in_table_idx = bucket_idx * QCOW_LOG_STORE_BUCKET_SIZE + i;
+
+        /* return index of the first free entry */
+        used = qcow2_log_store_is_used_by_table_idx(store->hash_table,
+                                                    in_table_idx);
+
+        if (!used) {
+            return i;
+        }
+    }
+
+    /* no free entry found -> place not found */
+    return -1;
+}
+
+/* This function recursively kicks an entry to its alternate location
+ *
+ * @store:            the QCowLogStore to work on
+ * @entry:            the entry to kick
+ * @entry_bucket_idx: the index of the entry to kick (in bucket)
+ * @ret:              0 on success, -1 if the hash table is full, -2 on kick
+ *                        path loop
+ */
+static int qcow2_log_store_kick_entry(QCowLogStore *store,
+                                      QCowHashInfo *entry,
+                                      uint64_t entry_bucket_idx)
+{
+    QCowHashInfo *dest_entry;
+    int ret = 0, in_bucket_idx;
+    uint64_t in_table_idx, tag;
+
+    /* too many kicks: treat the hash table as full */
+    if (store->nb_kicks == (QCOW_LOG_STORE_MAX_KICKS - 1)) {
+        return -1;
+    }
+
+    store->nb_kicks++;
+
+    tag = qcow2_log_store_tag_by_bucket_idx(store, entry, entry_bucket_idx);
+
+    /* check whether the kicked entry can be inserted at the alternate location */
+    in_bucket_idx = qcow2_log_store_first_free_entry_idx(store, tag);
+
+    /* no room at the alternate location: we must kick another entry */
+    if (in_bucket_idx == -1) {
+        in_bucket_idx = qcow2_log_store_get_kick_idx(store, tag);
+
+        /* the kick path forms a loop */
+        if (in_bucket_idx == -2) {
+            return -2;
+        }
+
+        in_table_idx = tag * QCOW_LOG_STORE_BUCKET_SIZE + in_bucket_idx;
+        dest_entry = &store->hash_table[in_table_idx];
+
+        /* kick the random entry */
+        ret = qcow2_log_store_kick_entry(store,
+                                         dest_entry,
+                                         tag);
+
+        /* if the table is full or the kick path loops, propagate to the caller */
+        if (ret < 0) {
+            qcow2_log_store_unset_from_kick_path(store, in_table_idx);
+            return ret;
+        }
+    }
+
+    /* free room found, compute the index in the hash table */
+    in_table_idx = tag * QCOW_LOG_STORE_BUCKET_SIZE + in_bucket_idx;
+    /* the old bucket index becomes the alternate location (tag) */
+    qcow2_log_store_copy_to_hash_table(store->hash_table,
+                                       in_table_idx,
+                                       entry);
+    memset(entry, 0, sizeof(QCowHashInfo));
+    qcow2_log_store_unset_from_kick_path(store, in_table_idx);
+    return 0;
+}
+
+static int64_t qcow2_log_store_find_overwrite_idx(QCowLogStore *store,
+                                                  QCowHashInfo *hash_info)
+{
+    uint64_t h[2];
+    int i, idx;
+
+    h[0] = qcow2_dedup_h1(hash_info, store->order);
+    h[1] = qcow2_dedup_h2(hash_info, store->order);
+
+    for (i = 0; i < 2; i++) {
+        idx = qcow2_log_store_try_get(store,
+                                      hash_info,
+                                      h[i],      /* index */
+                                      true);
+
+        /* no suitable place found -> continue */
+        if (idx == -1) {
+            continue;
+        }
+
+        return h[i] * QCOW_LOG_STORE_BUCKET_SIZE + idx;
+    }
+
+    return -1;
+}
+
+static int64_t qcow2_log_store_find_any_free_idx(QCowLogStore *store,
+                                                 QCowHashInfo *hash_info)
+{
+    uint64_t h[2];
+    int i, idx;
+
+    h[0] = qcow2_dedup_h1(hash_info, store->order);
+    h[1] = qcow2_dedup_h2(hash_info, store->order);
+
+    /* try to find a free entry in either alternate location */
+    for (i = 0; i < 2; i++) {
+        idx = qcow2_log_store_first_free_entry_idx(store, h[i]);
+
+        /* no suitable place found -> continue */
+        if (idx == -1) {
+            continue;
+        }
+
+        return h[i] * QCOW_LOG_STORE_BUCKET_SIZE + idx;
+    }
+
+    /* no suitable place found */
+    return -1;
+}
+
+/* this function scans the two alternate buckets of a given QCowHashInfo
+ * and returns an index into the hash table if a suitable place is found
+ *
+ * @store:     the QCowLogStore to work on
+ * @hash_info: the QCowHashInfo we are trying to index
+ * @ret:       the index in the hash table or -1 if no room is found
+ */
+static int64_t qcow2_log_store_scan_canditates_buckets(QCowLogStore *store,
+                                                       QCowHashInfo *hash_info)
+{
+    int ret = 0;
+
+    /* we try to overwrite in the two alternate locations first because
+     * a kick may have swapped entries and we don't want duplicates.
+     */
+    ret = qcow2_log_store_find_overwrite_idx(store, hash_info);
+
+    /* place found, return it */
+    if (ret != -1) {
+        return ret;
+    }
+
+    /* else try to find any free entry in the two alternate locations */
+    return qcow2_log_store_find_any_free_idx(store, hash_info);
+}
+
+/* This function will return the index of the hash table entry to copy into
+ *
+ * It's a two-step process: first the two alternate buckets are checked.
+ *     If they are not suitable, the first entry of the first bucket is kicked
+ *     to its alternate location. This process recurses up to
+ *     QCOW_LOG_STORE_MAX_KICKS times.
+ *
+ * @store:     the QCowLogStore to work on
+ * @hash_info: the QCowHashInfo we are trying to index
+ * @ret:       the index (in the hash table) or -1 if the hash table is full
+ */
+static int64_t qcow2_log_store_get_in_table_idx(QCowLogStore *store,
+                                                QCowHashInfo *hash_info)
+{
+    QCowHashInfo *dest_entry;
+    int ret = 0, kick_retry;
+    uint64_t h[2];
+    int64_t in_table_idx;
+    int in_bucket_idx;
+
+    /* first check if either of the two candidate buckets has some room for
+     * an insertion. If so, return the index
+     */
+    in_table_idx = qcow2_log_store_scan_canditates_buckets(store, hash_info);
+
+    /* room available */
+    if (in_table_idx >= 0) {
+        return in_table_idx;
+    }
+
+    h[0] = qcow2_dedup_h1(hash_info, store->order);
+    h[1] = qcow2_dedup_h2(hash_info, store->order);
+
+    /* we retry the kick up to 10 times in order to try different kick paths
+     * and avoid kick path loops
+     */
+    kick_retry = 10;
+    do {
+        store->nb_kicks = 0;
+        /* clear previous kick path */
+        qcow2_log_store_unset_from_kick_path(store, in_table_idx);
+
+        /* take the first not yet pushed entry of the h[0] bucket */
+        in_bucket_idx = qcow2_log_store_get_kick_idx(store, h[0]);
+
+        /* kick path cycle found */
+        if (in_bucket_idx == -2) {
+            in_table_idx = 0;
+            continue;
+        }
+
+        in_table_idx = h[0] * QCOW_LOG_STORE_BUCKET_SIZE + in_bucket_idx;
+
+        dest_entry = &store->hash_table[in_table_idx];
+
+        ret = qcow2_log_store_kick_entry(store, dest_entry, h[0]);
+    } while (ret == -2 && kick_retry--);
+
+    /* hash table is full, or the kick path was cycling multiple times */
+    if (ret < 0) {
+        return -1;
+    }
+
+    qcow2_log_store_unset_from_kick_path(store, in_table_idx);
+    return in_table_idx;
+}
+
+/* This function inserts a given QCowHashInfo in the log store
+ *
+ * This function will overwrite an already inserted entry
+ *
+ * @store:     the QCowLogStore to insert into
+ * @hash_info: the QCowHashInfo to insert: at least the hash field must be set
+ * @need_save: will be set to true if new journal info must be saved later
+ * @ret:       0 on success, 1 if the log store is full, -errno on error
+ */
+int qcow2_log_store_insert(BlockDriverState *bs, QCowLogStore *store,
+                           QCowHashInfo *hash_info, bool *need_save)
+{
+    int64_t in_table_idx, offset;
+
+    *need_save = false;
+
+    /* find the place to insert the entry in the cuckoo hash table */
+    in_table_idx = qcow2_log_store_get_in_table_idx(store, hash_info);
+
+    /* hash table is full so log store is full -> need to freeze log store */
+    if (in_table_idx == -1) {
+        return 1;
+    }
+
+    offset = qcow2_log_store_append_to_journal(bs, store, hash_info, need_save);
+
+    /* the journal is full and packing failed so we should treat the log
+     * store as full
+     */
+    if (offset == -1) {
+        return 1;
+    }
+
+    /* journal appending failed */
+    if (offset < 0) {
+        return offset;
+    }
+
+    qcow2_log_store_copy_to_hash_table(store->hash_table,
+                                       in_table_idx,
+                                       hash_info);
+
+    return 0;
+}
+
+/* This function delegates a flush to the QCowJournal of the log store
+ *
+ * @store: the QCowLogStore to work with
+ * @ret:   0 on success, -errno on error
+ */
+int qcow2_log_store_flush(BlockDriverState *bs, QCowLogStore *store)
+{
+    return qcow2_journal_flush(bs, &store->journal);
+}
+
+/* This function flushes and stops the log store journal
+ *
+ * @store: the QCowLogStore to work with
+ * @ret:   0 on success, -errno on error
+ */
+int qcow2_log_store_stop(BlockDriverState *bs, QCowLogStore *store)
+{
+    return qcow2_journal_stop(bs, &store->journal);
+}
+
+/* This function indexes a QCowJournalEntry into the cuckoo hash table
+ *
+ * @store:  the QCowLogStore to work on
+ * @entry:  the journal entry to validate, convert and insert
+ * @offset: the offset of the entry in the journal
+ * @ret:    0 on success, -EINVAL on bogus journal entry
+ */
+static int qcow2_log_store_index_journal_entry(BlockDriverState *bs,
+                                               QCowLogStore *store,
+                                               QCowJournalEntry *entry,
+                                               uint64_t offset)
+{
+    QCowHashInfo hash_info;
+    int64_t in_table_idx;
+    int ret = 0;
+
+    ret = qcow2_hash_info_from_journal_entry(&hash_info, entry);
+
+    /* bogus entry size or invalid type */
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* find the place to insert the entry in the cuckoo hash table */
+    in_table_idx = qcow2_log_store_get_in_table_idx(store, &hash_info);
+
+    /* the cuckoo hash table should not be full when rebuilding */
+    assert(in_table_idx >= 0);
+
+    /* we store the hash info in the in-RAM hash table */
+    qcow2_log_store_copy_to_hash_table(store->hash_table,
+                                       in_table_idx,
+                                       &hash_info);
+
+    return 0;
+}
+
+/* This function rebuilds the log store from its journal
+ *
+ * @store: the QCowLogStore to rebuild
+ * @ret:   0 on success, -errno on error, -EINVAL on bogus journal entry
+ */
+int qcow2_log_store_rebuild_from_journal(BlockDriverState *bs,
+                                         QCowLogStore *store)
+{
+    QCowJournalEntry entry;
+    uint64_t offset = 0;
+    int ret = 0, size = 0;
+
+    /* we walk through the journal (qcow2_journal_read is an iterator) */
+    size = qcow2_journal_read(bs, &store->journal, offset, &entry);
+    /* we stop on an obviously bogus-sized entry or at the end of the journal */
+    while (size >= QCOW_LOG_END_SIZE && entry.type != QCOW_LOG_NONE) {
+        /* it's a hash: index it */
+        if (entry.type == QCOW_LOG_HASH) {
+            ret = qcow2_log_store_index_journal_entry(bs,
+                                                      store,
+                                                      &entry,
+                                                      offset);
+
+        }
+        /* bogus entry -> return error */
+        if (ret < 0) {
+            return ret;
+        }
+        /* read next journal entry */
+        offset += size;
+        size = qcow2_journal_read(bs, &store->journal, offset, &entry);
+    }
+
+    /* bogus entry size */
+    if (size >= 0 &&
+        size < QCOW_LOG_END_SIZE &&
+        entry.type != QCOW_LOG_NONE) {
+        return -EINVAL;
+    }
+
+    /* on error */
+    if (size < 0) {
+        return size;
+    }
+
+    /* now inform the journal that it must append to existing data */
+    return qcow2_journal_resume(bs, &store->journal, offset);
+}
+
+/* the on disk size of a log store dump */
+size_t qcow2_log_store_dump_size(void)
+{
+    /* sizeof order + sizeof journal */
+    return sizeof(uint32_t) + qcow2_journal_dump_size();
+}
+
+/* This function dumps a log store's info into a buffer
+ *
+ * @buf:   the buffer to dump into
+ * @store: the log store to dump
+ * @ret:   the size of the dump
+ */
+size_t qcow2_log_store_dump(uint8_t *buf, QCowLogStore *store)
+{
+    uint32_t *buf32 = (uint32_t *) buf;
+
+    buf32[0] = cpu_to_be32(store->order);
+
+    return sizeof(store->order) +
+           qcow2_journal_dump(buf + sizeof(store->order),
+                              &store->journal);
+}
+
+/* this function parses a log store from disk
+ *
+ * @store: the store to parse into
+ * @buf: the buffer to parse from
+ * @ret: the size of the parsed data
+ */
+size_t qcow2_log_store_parse(QCowLogStore *store, uint8_t *buf)
+{
+    uint32_t *buf32 = (uint32_t *) buf;
+    uint32_t order;
+
+    order = be32_to_cpu(buf32[0]);
+    qcow2_log_store_reset(store, order);
+    qcow2_journal_parse(&store->journal, buf + sizeof(order));
+    return qcow2_log_store_dump_size();
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index adde631..d347471 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -635,4 +635,30 @@ size_t qcow2_journal_dump_size(void);
 size_t qcow2_journal_dump(uint8_t *buf, QCowJournal *journal);
 void qcow2_journal_parse(QCowJournal *journal, uint8_t *buf);
 
+/* qcow2-log-store.c functions */
+
+uint64_t qcow2_dedup_h1(QCowHashInfo *hash_info, uint32_t order);
+uint64_t qcow2_dedup_h2(QCowHashInfo *hash_info, uint32_t order);
+uint64_t qcow2_log_store_nb_entries(uint32_t order);
+int qcow2_log_store_create(BlockDriverState *bs,
+                           QCowLogStore *store,
+                           uint32_t order);
+int qcow2_log_store_recycle(BlockDriverState *bs,
+                            QCowLogStore *store);
+void qcow2_log_store_init(BlockDriverState *bs,
+                          QCowLogStore *store,
+                          uint32_t order);
+void qcow2_log_store_cleanup(QCowLogStore *store);
+int qcow2_log_store_compute_order(uint64_t min_nb_entries_required);
+bool qcow2_log_store_get(QCowLogStore *store, QCowHashInfo *hash_info);
+int qcow2_log_store_insert(BlockDriverState *bs, QCowLogStore *store,
+                           QCowHashInfo *hash_info, bool *need_save);
+int qcow2_log_store_flush(BlockDriverState *bs, QCowLogStore *store);
+int qcow2_log_store_stop(BlockDriverState *bs, QCowLogStore *store);
+int qcow2_log_store_rebuild_from_journal(BlockDriverState *bs,
+                                         QCowLogStore *store);
+size_t qcow2_log_store_dump_size(void);
+size_t qcow2_log_store_dump(uint8_t *buf, QCowLogStore *store);
+size_t qcow2_log_store_parse(QCowLogStore *store, uint8_t *buf);
+
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 05/24] qcow2: Add the hash store.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (3 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 04/24] qcow2: Create the log store Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 06/24] qcow2: Add the deduplication store Benoît Canet
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

The hash store allows lookups in the set of previously frozen log stores.
Each frozen log store is called an incarnation. When the number of incarnations
reaches a user-defined limit, the oldest incarnations are dropped so that the
dedup metadata does not grow without bound.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/Makefile.objs      |    2 +-
 block/qcow2-hash-store.c |  802 ++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2-log-store.c  |  242 ++++++++++++++
 block/qcow2.h            |   38 +++
 4 files changed, 1083 insertions(+), 1 deletion(-)
 create mode 100644 block/qcow2-hash-store.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 8c0b646..cafd4f8 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,6 +1,6 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
-block-obj-y += qcow2-journal.o qcow2-log-store.o
+block-obj-y += qcow2-journal.o qcow2-log-store.o qcow2-hash-store.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += vhdx.o
diff --git a/block/qcow2-hash-store.c b/block/qcow2-hash-store.c
new file mode 100644
index 0000000..5284740
--- /dev/null
+++ b/block/qcow2-hash-store.c
@@ -0,0 +1,802 @@
+/*
+ * QCOW2 hash store
+ *
+ * Copyright (C) Nodalink, SARL. 2013
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/qcow2.h"
+
+/* This function computes the number of bytes required to store a tag in hash
+ * store filters
+ *
+ * This function will only return values in the set [2, 4, 8] matching
+ * convenient compiler type sizes.
+ *
+ * @order: order of the hash store
+ * @ret:   the number of bytes required to store a tag
+ */
+int qcow2_hash_store_get_tag_size(uint32_t order)
+{
+    if (order < 16) {
+        return 2;
+    }
+
+    if (order < 32) {
+        return 4;
+    }
+
+    return 8;
+}
+
+static uint64_t qcow2_hash_store_round_to_cluster(uint64_t size)
+{
+    return ((size / HASH_STORE_CLUSTER_SIZE) + 1) * HASH_STORE_CLUSTER_SIZE;
+}
+
+/* This function computes the size of an in-RAM filter
+ *
+ * @order: the order of the store
+ * @ret:   the size of the in ram filter in bytes
+ */
+uint64_t qcow2_hash_store_filter_size(uint32_t order)
+{
+    uint64_t size = qcow2_log_store_nb_entries(order) *
+                    qcow2_hash_store_get_tag_size(order);
+    return qcow2_hash_store_round_to_cluster(size);
+}
+
+/* this function makes sure that no bucket is split across two clusters */
+static uint64_t qcow2_hash_store_round_to_bucket(uint64_t count)
+{
+    return (count / QCOW_LOG_STORE_BUCKET_SIZE) * QCOW_LOG_STORE_BUCKET_SIZE;
+}
+
+uint64_t qcow2_hash_store_nb_hash_info_per_cluster(void)
+{
+    uint64_t raw_nb = HASH_STORE_CLUSTER_SIZE / sizeof(QCowHashInfo);
+    return qcow2_hash_store_round_to_bucket(raw_nb);
+}
+
+/* This function computes the size of a frozen hash table that is dumped to disk
+ *
+ * @order: the order of the store
+ * @ret:   the size in bytes
+ */
+static uint64_t qcow2_hash_store_table_size(uint32_t order)
+{
+    uint32_t nb_per_cluster = qcow2_hash_store_nb_hash_info_per_cluster();
+    uint64_t count = (qcow2_log_store_nb_entries(order) / nb_per_cluster) + 1;
+    return count * HASH_STORE_CLUSTER_SIZE;
+}
+
+/* This function computes the on-disk size of a log store incarnation
+ *
+ * @order: the order of the store
+ * @ret:   the size in bytes
+ */
+uint64_t qcow2_hash_store_incarnation_size(uint32_t order)
+{
+    return qcow2_hash_store_filter_size(order) +
+           qcow2_hash_store_table_size(order);
+}
+
+/* This function resets a hash store
+ *
+ * @store:           the hash store to initialize
+ * @order:           the bit order of the hash store
+ * @nb_incarnations: the number of incarnations in the hash store
+ * @nb_in_limbo:     the number of in-limbo incarnation disk spaces to recycle
+ */
+void qcow2_hash_store_reset(QCowHashStore *store,
+                            uint32_t order,
+                            uint32_t nb_incarnations,
+                            uint32_t nb_in_limbo)
+{
+    store->order = order;
+    store->nb_incarnations = nb_incarnations;
+    store->nb_in_limbo = nb_in_limbo;
+}
+
+/* This function initializes a hash store
+ *
+ * @store:           the hash store to initialize
+ */
+void qcow2_hash_store_init(QCowHashStore *store, uint32_t order)
+{
+    store->order = order;
+    QTAILQ_INIT(&store->incarnations);
+    QTAILQ_INIT(&store->in_limbo);
+}
+
+/* This function cleans up a hash store: it frees incarnations and limbo entries
+ *
+ * @store: the store to cleanup
+ */
+void qcow2_hash_store_cleanup(QCowHashStore *store)
+{
+    QCowIncarnation *incarnation, *next_incarnation;
+    QCowLimbo *limbo, *next_limbo;
+
+    /* destroy each incarnation structure */
+    QTAILQ_FOREACH_SAFE(incarnation,
+                        &store->incarnations,
+                        next, next_incarnation) {
+        QTAILQ_REMOVE(&store->incarnations, incarnation, next);
+        qcow2_hash_store_incarnation_free(incarnation);
+    }
+
+    /* destroy each limbo offset */
+    QTAILQ_FOREACH_SAFE(limbo, &store->in_limbo, next, next_limbo) {
+        QTAILQ_REMOVE(&store->in_limbo, limbo, next);
+        g_free(limbo);
+    }
+}
+
+/* this function reads a tag from an in-RAM filter
+ *
+ * For an explanation of its purpose see qcow2_log_store_write_tag_to_filter
+ *
+ * @order:  the order of the store
+ * @index:  the index in the in ram filter
+ * @filter: the in ram filter
+ * @ret:    the tag stored at index in the in-RAM filter
+ */
+static uint64_t qcow2_hash_store_read_tag_from_filter(uint32_t order,
+                                                      uint64_t index,
+                                                      uint8_t *filter)
+{
+    uint16_t *filter_16 = (uint16_t *) filter;
+    uint32_t *filter_32 = (uint32_t *) filter;
+    uint64_t *filter_64 = (uint64_t *) filter;
+
+    switch (qcow2_hash_store_get_tag_size(order)) {
+    case 2:
+        return be16_to_cpu(filter_16[index]);
+    case 4:
+        return be32_to_cpu(filter_32[index]);
+    case 8:
+        return be64_to_cpu(filter_64[index]);
+    default:
+        /* should never pass here given qcow2_hash_store_get_tag_size results */
+        abort();
+    }
+    return 0;
+}
+
+static bool qcow2_hash_store_max_incarnations_reached(BlockDriverState *bs,
+                                                      QCowHashStore *store)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    /* max never reached if limit is zero */
+    if (!s->dedup_max_incarnations) {
+        return false;
+    }
+
+    return store->nb_incarnations >= s->dedup_max_incarnations;
+}
+
+static QCowLimbo *qcow2_hash_store_limbo_new(uint64_t offset)
+{
+    QCowLimbo *result = g_new0(QCowLimbo, 1);
+    result->offset = offset;
+    return result;
+}
+
+static QCowIncarnation *qcow2_hash_store_incarnation_new(uint8_t *filter,
+                                                         uint64_t offset,
+                                                         uint32_t order)
+{
+    QCowIncarnation *incarnation;
+
+    incarnation = g_new0(QCowIncarnation, 1);
+    incarnation->filter = filter;
+    incarnation->filter_offset = offset;
+    incarnation->hash_table_offset = offset +
+                                     qcow2_hash_store_filter_size(order);
+    incarnation->size = qcow2_hash_store_incarnation_size(order);
+
+    return incarnation;
+}
+
+void qcow2_hash_store_incarnation_free(QCowIncarnation *incarnation)
+{
+    qemu_vfree(incarnation->filter);
+    g_free(incarnation);
+}
+
+/* This function allocates the disk space needed for an incarnation
+ *
+ * @order: the bit order of the incarnation size
+ * @ret:   offset of the allocated disk space on success, -errno on error
+ */
+static int64_t qcow2_hash_store_allocate_incarnation_space(BlockDriverState *bs,
+                                                           uint32_t order)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t size = qcow2_hash_store_incarnation_size(order);
+    int64_t offset;
+    int ret = 0;
+
+    /* allocate from disk the combined size of the filter and the hash table */
+    offset = qcow2_alloc_clusters(bs, size);
+
+    if (offset < 0) {
+        return offset;
+    }
+
+    /* flush refcounts so we don't lose allocated space on crash */
+    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+
+    if (ret < 0) {
+        qcow2_free_clusters(bs, offset, size);
+        return ret;
+    }
+
+    return offset;
+}
+
+/* This function gets the on-disk offset needed for storing an incarnation
+ *
+ * It will try to recycle in-limbo incarnation space if there is any
+ * or will allocate space from disk
+ *
+ * @store: the QCowHashStore to work on
+ * @ret:   disk offset on success, -errno on error
+ */
+int64_t qcow2_hash_store_get_incarnation_offset(BlockDriverState *bs,
+                                                QCowHashStore *store)
+{
+    QCowLimbo *limbo;
+    int64_t offset;
+
+    /* no previously allocated space in limbo -> allocate it */
+    if (QTAILQ_EMPTY(&store->in_limbo)) {
+        return qcow2_hash_store_allocate_incarnation_space(bs, store->order);
+    }
+
+    /* take oldest in limbo allocated space and remove it from list */
+    limbo = QTAILQ_LAST(&store->in_limbo, in_limbo_head);
+    QTAILQ_REMOVE(&store->in_limbo, limbo, next);
+    store->nb_in_limbo--;
+
+    offset = limbo->offset;
+    g_free(limbo);
+
+    /* TODO: save new configuration to disk */
+
+    return offset;
+}
+
+/* This function will forget the oldest incarnation and put its disk space
+ * offset in the limbo list
+ *
+ * note: this function does not save the new configuration to disk as its
+ * caller will do it
+ *
+ * store: the hash store to work on
+ */
+static void qcow2_hash_store_forget_oldest_incarnation(QCowHashStore *store)
+{
+    QCowIncarnation *incarnation;
+    QCowLimbo *limbo;
+
+    /* get oldest elements of the incarnation list */
+    incarnation = QTAILQ_LAST(&store->incarnations, incarnations_head);
+
+    /* remove it from incarnation list */
+    QTAILQ_REMOVE(&store->incarnations, incarnation, next);
+    store->nb_incarnations--;
+
+    /* turn the incarnation's disk space into space to be reincarnated */
+    limbo = qcow2_hash_store_limbo_new(incarnation->filter_offset);
+
+    /* insert it in the limbo list */
+    QTAILQ_INSERT_HEAD(&store->in_limbo, limbo, next);
+    store->nb_in_limbo++;
+
+    /* free the incarnation from ram */
+    qcow2_hash_store_incarnation_free(incarnation);
+}
+
+/* This function forgets all the incarnations and recycles their space as limbo
+ *
+ * note: the caller must save the new configuration
+ *
+ * @store: the QCowHashStore to work on
+ */
+void qcow2_hash_store_forget_all_incarnations(QCowHashStore *store)
+{
+    while (store->nb_incarnations) {
+        qcow2_hash_store_forget_oldest_incarnation(store);
+    }
+
+    /* TODO: need to save configuration to disk */
+}
+
+/* This function will add a newly created incarnation to a hash store
+ *
+ * @store:  the hash store to work on
+ * @filter: the in ram filter of the incarnation
+ * @offset: the on disk offset of the incarnation
+ */
+void qcow2_hash_store_add_incarnation(BlockDriverState *bs,
+                                      QCowHashStore *store,
+                                      uint8_t *filter,
+                                      uint64_t offset)
+{
+    QCowIncarnation *incarnation;
+
+    /* behave as a FIFO if the maximum number of incarnations is reached */
+    while (qcow2_hash_store_max_incarnations_reached(bs, store)) {
+        qcow2_hash_store_forget_oldest_incarnation(store);
+    }
+
+    /* create new incarnation */
+    incarnation = qcow2_hash_store_incarnation_new(filter,
+                                                   offset,
+                                                   store->order);
+
+    /* insert at the beginning of the list */
+    QTAILQ_INSERT_HEAD(&store->incarnations, incarnation, next);
+    store->nb_incarnations++;
+
+    /* TODO: save new configuration to disk */
+}
+
+/* This function copies the QCowHashInfo from a buffer if the hash is the same
+ *
+ * @hash_info:  the QCowHashInfo we are trying to read the infos for
+ * @buf:        a buffer containing QCowHashInfos
+ * @in_buf_idx: the offset in the buffer in sizeof(QCowHashInfo) chunks
+ * @ret:        true if found else false
+ */
+static bool qcow2_hash_store_get_hash_info(QCowHashInfo *hash_info,
+                                           uint8_t *buf,
+                                           uint64_t in_buf_idx)
+{
+    QCowHashInfo *info;
+    bool found = false;
+
+    info = (QCowHashInfo *) buf;
+
+    /* compare QCowHash to check if the hash info is found */
+    found = !memcmp(&hash_info->hash,
+                    &info[in_buf_idx].hash,
+                    sizeof(QCowHash));
+
+    /* if the hash is found, copy the associated information */
+    if (found) {
+        hash_info->physical_sect =
+            be64_to_cpu(info[in_buf_idx].physical_sect);
+        hash_info->first_logical_sect =
+            be64_to_cpu(info[in_buf_idx].first_logical_sect);
+    }
+
+    return found;
+}
+
+/* compute the disk offset of the cluster to read */
+static uint64_t qcow2_hash_store_buf_offset(uint64_t hash_table_offset,
+                                            uint64_t in_table_idx)
+{
+    uint32_t nb_in_cluster = qcow2_hash_store_nb_hash_info_per_cluster();
+    uint64_t cluster_index = in_table_idx / nb_in_cluster;
+    return hash_table_offset + cluster_index * HASH_STORE_CLUSTER_SIZE;
+}
+
+/* allocate a buffer and read the cluster from disk if *buf == NULL
+ *
+ * @offset:  the on-disk offset of the cluster to read
+ * @buf:     the pointer to return the allocated area into
+ * @ret:     0 on nop, >0 on success and -errno on error
+ */
+static int qcow2_hash_store_cache_cluster(BlockDriverState *bs,
+                                          uint64_t offset,
+                                          uint8_t **buf)
+{
+    /* buffer already allocated and filled with disk data */
+    if (*buf) {
+        return 0;
+    }
+
+    /* allocate buffer */
+    *buf = qemu_blockalign(bs, HASH_STORE_CLUSTER_SIZE);
+
+    /* read cluster from disk */
+    return bdrv_pread(bs->file,
+                      offset,
+                      *buf,
+                      HASH_STORE_CLUSTER_SIZE);
+}
+
+/* this function probes the filter for a given tag and tries to read the
+ * QCowHashInfo from disk if the tag matches.
+ *
+ * @store:        the QCowHashStore we work with
+ * @incarnation:  the incarnation to work with
+ * @in_table_idx: the index in the filter and the hash table
+ * @tag:          the tag to try to match
+ * @hash_info:    the QCowHashInfo we are trying to complete
+ * @buf:          a pointer to pass a read buffer between the calls
+ * @ret:          1 if found, 0 if not, -errno on error
+ */
+static int qcow2_hash_store_probe_read_entry(BlockDriverState *bs,
+                                             QCowHashStore *store,
+                                             QCowIncarnation *incarnation,
+                                             uint64_t in_table_idx,
+                                             uint64_t tag,
+                                             QCowHashInfo *hash_info,
+                                             uint8_t **buf)
+{
+    uint64_t filter_tag;
+    uint64_t in_buf_idx;
+    uint64_t buf_offset;
+    int ret = 0;
+
+    filter_tag = qcow2_hash_store_read_tag_from_filter(store->order,
+                                                       in_table_idx,
+                                                       incarnation->filter);
+
+    /* no matching tag found */
+    if (tag != filter_tag) {
+        return 0;
+    }
+
+    /* the code will allocate the buffer and read the cluster content from disk
+     * the first time it passes here (*buf == NULL)
+     */
+    buf_offset = qcow2_hash_store_buf_offset(incarnation->hash_table_offset,
+                                             in_table_idx);
+    ret = qcow2_hash_store_cache_cluster(bs,
+                                         buf_offset,
+                                         buf);
+
+    /* on read error */
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* check if this entry contains the QCowHashInfo we are looking for
+     * and copy its associated information if so
+     */
+    in_buf_idx = in_table_idx % qcow2_hash_store_nb_hash_info_per_cluster();
+    return qcow2_hash_store_get_hash_info(hash_info,
+                                          *buf,
+                                          in_buf_idx);
+}
+
+/* This function probes the entries of a bucket for a given tag and tries
+ * to read the missing QCowHashInfo from disk if an entry is a good candidate
+ *
+ * @store:       the store to work on
+ * @incarnation: the incarnation we are probing
+ * @bucket_idx:  the index of the bucket
+ * @tag:         the tag used for the probe
+ * @hash_info:   the QCowHashInfo we are trying to complete
+ * @ret:         1 on success, 0 if not found, -errno on error
+ */
+static int qcow2_hash_store_probe_read(BlockDriverState *bs,
+                                       QCowHashStore *store,
+                                       QCowIncarnation *incarnation,
+                                       uint64_t bucket_idx,
+                                       uint64_t tag,
+                                       QCowHashInfo *hash_info)
+{
+    uint64_t in_table_idx;
+    uint8_t *buf = NULL; /* NULL by default for qemu_vfree */
+    int ret = 0;
+    int i;
+
+    /* scan each bucket entry
+     * note that the code handles the case where multiple entries have the
+     * same tag
+     */
+    for (i = 0; i < QCOW_LOG_STORE_BUCKET_SIZE; i++) {
+
+        in_table_idx = bucket_idx * QCOW_LOG_STORE_BUCKET_SIZE + i;
+
+        ret = qcow2_hash_store_probe_read_entry(bs,
+                                                store,
+                                                incarnation,
+                                                in_table_idx,
+                                                tag,
+                                                hash_info,
+                                                &buf);
+
+        if (ret) {
+             break;
+        }
+    }
+
+    /* qemu_vfree is safe on a buffer that was never allocated (still NULL) */
+    qemu_vfree(buf);
+    return ret;
+}
+
+/* This function checks if a hash matches the in-RAM filter and reads it
+ * from disk
+ *
+ * @store:       the store to work on
+ * @incarnation: the incarnation containing the in ram filter
+ * @hash_info:   the QCowHashInfo to complete
+ * @ret:         1 on success, 0 if not found, -errno on error
+ */
+static int qcow2_hash_store_try_get(BlockDriverState *bs,
+                                    QCowHashStore *store,
+                                    QCowIncarnation *incarnation,
+                                    QCowHashInfo *hash_info)
+{
+    uint64_t h[2];
+    int i;
+    int ret = 0;
+
+    /* incarnation filter probably not loaded from disk yet -> not found */
+    if (!incarnation->filter) {
+        return 0;
+    }
+
+    h[0] = qcow2_dedup_h1(hash_info, store->order);
+    h[1] = qcow2_dedup_h2(hash_info, store->order);
+
+    for (i = 0; i < 2; i++) {
+        /* probe the in-RAM filter with this index/tag pair */
+        ret = qcow2_hash_store_probe_read(bs,
+                                          store,
+                                          incarnation,
+                                          h[i],        /* index */
+                                          h[1-i],      /* tag */
+                                          hash_info);
+
+        /* found or error */
+        if (ret) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+/* This function looks for a given hash info in the hash store
+ *
+ * @store:     the QCowHashStore to lookup into
+ * @hash_info: the QCowHashInfo to complete: at least the hash must be filled.
+ * @ret:       1 on successful lookup, 0 if not found, -errno on error
+ */
+int qcow2_hash_store_get(BlockDriverState *bs, QCowHashStore *store,
+                         QCowHashInfo *hash_info)
+{
+    QCowIncarnation *incarnation;
+    int ret = 0;
+
+    /* we walk the list of incarnations from the youngest to the oldest */
+    QTAILQ_FOREACH(incarnation, &store->incarnations, next) {
+        /* check if this incarnation contains the hash, try to get it if so */
+        ret = qcow2_hash_store_try_get(bs,
+                                       store,
+                                       incarnation,
+                                       hash_info);
+
+        /* QCowHashInfo found or an error happened */
+        if (ret) {
+            return ret;
+        }
+    }
+
+    /* return not found */
+    return 0;
+}
+
+/* This function loads a single incarnation's in-RAM filter from disk
+ *
+ * @incarnation: the incarnation to load the filter into
+ * @order:       the bit order
+ * @ret:         0 on success, -errno on error
+ */
+static int qcow2_hash_store_load_filter(BlockDriverState *bs,
+                                        QCowIncarnation *incarnation,
+                                        uint32_t order)
+{
+    int ret = 0;
+    uint64_t size = qcow2_hash_store_filter_size(order);
+    uint8_t *buf = qemu_blockalign(bs, size);
+
+    ret = bdrv_pread(bs->file, incarnation->filter_offset, buf, size);
+
+    if (ret < 0) {
+        qemu_vfree(buf);
+        return ret;
+    }
+
+    /* activate filter */
+    incarnation->filter = buf;
+    /* other fields were filled when allocating the incarnation */
+
+    return 0;
+}
+
+/* This structure is used only between the two next functions */
+typedef struct {
+    BlockDriverState *bs;
+    QCowHashStore *store;
+} QCowLoadFilterArgs;
+
+/* This coroutine loads all the incarnations' filters at startup
+ *
+ * @opaque: a QCowLoadFilterArgs containing a bs and the store
+ */
+static void coroutine_fn qcow2_hash_store_co_load_filters(void *opaque)
+{
+    QCowLoadFilterArgs *args = (QCowLoadFilterArgs *) opaque;
+    QCowIncarnation *incarnation;
+    int ret = 0;
+
+    /* yield so qcow2 will continue to start */
+    qemu_coroutine_yield();
+
+    QTAILQ_FOREACH(incarnation, &args->store->incarnations, next) {
+        ret = qcow2_hash_store_load_filter(args->bs,
+                                           incarnation,
+                                           args->store->order);
+
+        if (ret < 0) {
+            free(args);
+            return;
+        }
+    }
+
+    free(args);
+}
+
+/* This function starts the coroutine responsible for loading the RAM filters
+ *
+ * @store: the QCowHashStore to work on
+ */
+void qcow2_hash_store_load_filters(BlockDriverState *bs,
+                                   QCowHashStore *store)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    /* will pass this as argument to the coroutine */
+    QCowLoadFilterArgs *args = g_new0(QCowLoadFilterArgs, 1);
+    args->bs = bs;
+    args->store = store;
+
+    s->load_filter_co =
+        qemu_coroutine_create(qcow2_hash_store_co_load_filters);
+    qemu_coroutine_enter(s->load_filter_co, args);
+}
+
+/* compute buffer space required to dump the hash store configuration
+ *
+ * @store: the hash store to work on
+ * @ret:   the space the hash store dump will take
+ */
+size_t qcow2_hash_store_dump_size(QCowHashStore *store)
+{
+    return sizeof(store->order) +
+           sizeof(store->nb_incarnations) +
+           sizeof(store->nb_in_limbo) +
+           store->nb_incarnations * sizeof(uint64_t) +
+           store->nb_in_limbo * sizeof(uint64_t);
+}
+
+/* dump a hash store into a given buffer and return the used space size
+ *
+ * @buf:   the buffer to dump into
+ * @store: the store to dump
+ * @ret:   the used space size
+ */
+size_t qcow2_hash_store_dump(uint8_t *buf, QCowHashStore *store)
+{
+    QCowIncarnation *incarnation;
+    QCowLimbo *limbo;
+    uint32_t *buf32 = (uint32_t *) buf;
+    uint64_t *buf64;
+
+    buf32[0] = cpu_to_be32(store->order);
+    buf32[1] = cpu_to_be32(store->nb_incarnations);
+    buf32[2] = cpu_to_be32(store->nb_in_limbo);
+
+    /* we will write next data here */
+    buf64 = (uint64_t *) (buf + 3 * sizeof(uint32_t));
+
+    /* dump each incarnation offset */
+    QTAILQ_FOREACH(incarnation, &store->incarnations, next) {
+        *buf64 = cpu_to_be64(incarnation->filter_offset);
+        buf64++;
+    }
+
+    /* dump each limbo offset */
+    QTAILQ_FOREACH(limbo, &store->in_limbo, next) {
+        *buf64 = cpu_to_be64(limbo->offset);
+        buf64++;
+    }
+
+    /* return size */
+    return qcow2_hash_store_dump_size(store);
+}
+
+/* parse the given buffer into a QCowHashStore
+ *
+ * @store:   the hash store to parse into
+ * @buf:     the buffer to parse from
+ * @buf_end: the end of the buffer to parse from
+ * @ret:     the parsed size or -1 on error
+ */
+size_t qcow2_hash_store_parse(QCowHashStore *store,
+                              uint8_t *buf,
+                              uint8_t *buf_end)
+{
+    QCowIncarnation *incarnation;
+    QCowLimbo *limbo;
+    uint32_t *buf32 = (uint32_t *) buf;
+    uint64_t *buf64, *buf64_end;
+    uint32_t i;
+
+    buf64_end = (uint64_t *) buf_end;
+
+    /* reset hash store */
+    qcow2_hash_store_reset(store,
+                           be32_to_cpu(buf32[0]),  /* order */
+                           be32_to_cpu(buf32[1]),  /* nb_incarnations */
+                           be32_to_cpu(buf32[2])); /* nb_in_limbo */
+
+    /* we will read from here */
+    buf64 = (uint64_t *) (buf + 3 * sizeof(uint32_t));
+
+    /* instantiate each incarnation */
+    for (i = 0;
+         i < store->nb_incarnations && buf64 <= (buf64_end - 1);
+         i++) {
+        /* the filter will be loaded later in a coroutine and NULL filter
+         * won't harm
+         */
+        incarnation = qcow2_hash_store_incarnation_new(NULL,/* filter */
+                          be64_to_cpu(*buf64),            /* offset */
+                          store->order);
+        QTAILQ_INSERT_HEAD(&store->incarnations, incarnation, next);
+        buf64++;
+    }
+
+    /* buffer too small */
+    if (i < store->nb_incarnations) {
+        return -1;
+    }
+
+    /* instantiate each limbo disk space */
+    for (i = 0;
+         i < store->nb_in_limbo && buf64 <= (buf64_end - 1);
+         i++) {
+        limbo = qcow2_hash_store_limbo_new(be64_to_cpu(*buf64));
+        QTAILQ_INSERT_HEAD(&store->in_limbo, limbo, next);
+        buf64++;
+    }
+
+    if (i < store->nb_in_limbo) {
+        return -1;
+    }
+
+    return buf_end - buf;
+}
diff --git a/block/qcow2-log-store.c b/block/qcow2-log-store.c
index 24ae310..b0ebc28 100644
--- a/block/qcow2-log-store.c
+++ b/block/qcow2-log-store.c
@@ -283,6 +283,14 @@ static uint64_t qcow2_log_store_tag_by_bucket_idx(QCowLogStore *store,
     return 0;
 }
 
+static uint64_t qcow2_log_store_tag_by_table_idx(QCowLogStore *store,
+                                                 QCowHashInfo *hash_info,
+                                                 uint64_t in_table_idx)
+{
+    return qcow2_log_store_tag_by_bucket_idx(store, hash_info, in_table_idx /
+                                             QCOW_LOG_STORE_BUCKET_SIZE);
+}
+
 /* This function calculate the order (number of bit) required
  *
  * @min_nb_entries_required: minimal number of entry the log store must handle
@@ -997,6 +1005,240 @@ int qcow2_log_store_rebuild_from_journal(BlockDriverState *bs,
     return qcow2_journal_resume(bs, &store->journal, offset);
 }
 
+/* This function writes the significant portion of a tag into an in-RAM
+ * filter
+ *
+ * This function is here to cope with the fact that the filter element size
+ * will vary to keep the false lookup probability low.
+ * The bulk of the code manipulates uint64_t variables and utility functions
+ * like this one are used to write into and read from the in-RAM filter.
+ *
+ * The function takes care of endianness issues as the filter will be stored
+ * on disk.
+ *
+ * @store:  the QCowLogStore to work with
+ * @index:  the index of the element to write
+ * @tag:    the tag that will act as a filter element
+ * @filter: the in ram filter to write to
+ */
+static void qcow2_log_store_write_tag_to_filter(QCowLogStore *store,
+                                                uint64_t index,
+                                                uint64_t tag,
+                                                uint8_t *filter)
+{
+    uint16_t *filter_16 = (uint16_t *) filter;
+    uint32_t *filter_32 = (uint32_t *) filter;
+    uint64_t *filter_64 = (uint64_t *) filter;
+
+    switch (qcow2_hash_store_get_tag_size(store->order)) {
+    case 2:
+        filter_16[index] = cpu_to_be16((uint16_t) tag);
+        break;
+    case 4:
+        filter_32[index] = cpu_to_be32((uint32_t) tag);
+        break;
+    case 8:
+        filter_64[index] = cpu_to_be64((uint64_t) tag);
+        break;
+    default:
+        /* should never pass here given qcow2_hash_store_get_tag_size results */
+        abort();
+    }
+}
+
+/* This function creates a RAM filter from a QCowLogStore and writes it to
+ * disk
+ *
+ * @store:  the QCowLogStore to create the filter from
+ * @offset: the on disk offset to write the filter to
+ * @filter: the filter pointer to allocate, write and return the data into
+ * @ret:    0 on success, -errno on error
+ */
+int qcow2_log_store_freeze_filter(BlockDriverState *bs,
+                                  QCowLogStore *store,
+                                  uint64_t offset,
+                                  uint8_t **filter)
+{
+    QCowHashInfo *entry;
+    uint64_t i, size, tag;
+    int ret = 0;
+
+    /* allocate the filter with an alignment suitable for disk writes */
+    size = qcow2_hash_store_filter_size(store->order);
+    *filter = qemu_blockalign(bs, size);
+    /* zero allocated memory */
+    memset(*filter, 0, size);
+
+    /* build the filter */
+    for (i = 0; i < qcow2_log_store_nb_entries(store->order); i++) {
+        /* only store used entries */
+        if (!qcow2_log_store_is_used_by_table_idx(store->hash_table, i)) {
+            continue;
+        }
+
+        entry = &store->hash_table[i];
+        tag = qcow2_log_store_tag_by_table_idx(store, entry, i);
+        qcow2_log_store_write_tag_to_filter(store,
+                                            i,
+                                            tag,
+                                            *filter);
+    }
+
+    /* we should not write to disk while the VM is suspended (migration) */
+    co_sleep_ns(vm_clock, 1);
+
+    /* dump the filter to disk */
+    ret = bdrv_pwrite(bs->file, offset, *filter, size);
+
+    if (ret < 0) {
+        goto error_exit;
+    }
+
+    return 0;
+
+error_exit:
+    qemu_vfree(*filter);
+    *filter = NULL;
+    return ret;
+}
+
+/* Write the buffer used to freeze the log store to disk and reset buffer state
+ *
+ * @store: the QCowHashStore to work on
+ * @ret:   0 on success, -errno on error
+ */
+static int qcow2_log_store_write_freeze_buffer(BlockDriverState *bs,
+                                               QCowLogStore *store)
+{
+    int ret = 0;
+
+    /* we should not write to disk while the VM is suspended (migration) */
+    co_sleep_ns(vm_clock, 1);
+
+    /* write to disk */
+    ret = bdrv_pwrite(bs->file,
+                      store->write_buf_offset,
+                      store->write_buf,
+                      HASH_STORE_CLUSTER_SIZE);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* prepare for next batch of QCowHashInfo */
+    memset(store->write_buf, 0, HASH_STORE_CLUSTER_SIZE);
+    store->write_buf_offset += HASH_STORE_CLUSTER_SIZE;
+    store->in_buf_offset = 0;
+
+    /* success */
+    return 0;
+}
+
+static bool qcow_log_store_freeze_buffer_full(QCowLogStore *store)
+{
+    /* we make sure that no bucket is split between two clusters */
+    uint64_t count = store->in_buf_offset / sizeof(QCowHashInfo);
+    return count == qcow2_hash_store_nb_hash_info_per_cluster();
+}
+
+static bool qcow2_log_store_is_entry_at_right_place(QCowLogStore *store,
+                                                    QCowHashInfo *entry,
+                                                    uint64_t in_table_idx)
+{
+    uint64_t h[2];
+    uint64_t bucket_idx = in_table_idx / QCOW_LOG_STORE_BUCKET_SIZE;
+
+    h[0] = qcow2_dedup_h1(entry, store->order);
+    h[1] = qcow2_dedup_h2(entry, store->order);
+
+    /* check that the bucket index matches one of the two sub-hashes */
+    return (bucket_idx == h[0]) || (bucket_idx == h[1]);
+}
+
+/* Append a given log store entry to a frozen hash table suitable for hash store
+ *
+ * @store: the QCowLogStore to work on
+ * @index: the index of the entry in the QCowLogStore hash table
+ * @ret:   0 on success, -errno on error
+ */
+static int qcow2_log_store_add_entry_to_frozen_hash_table(BlockDriverState *bs,
+                                                          QCowLogStore *store,
+                                                          uint64_t in_table_idx)
+{
+    QCowHashInfo hash_entry;
+    int ret = 0;
+    bool used = qcow2_log_store_is_used_by_table_idx(store->hash_table,
+                                                     in_table_idx);
+
+    /* get hash info from table if entry is used */
+    if (used) {
+        qcow2_log_store_copy_from_hash_table(&hash_entry,
+                                             store->hash_table,
+                                             in_table_idx);
+        hash_entry.physical_sect = cpu_to_be64(hash_entry.physical_sect);
+        hash_entry.first_logical_sect =
+            cpu_to_be64(hash_entry.first_logical_sect);
+    }
+
+    /* write buffer to disk if needed */
+    if (qcow_log_store_freeze_buffer_full(store)) {
+        ret = qcow2_log_store_write_freeze_buffer(bs, store);
+    }
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* validate position of the entry in the hash table */
+    if (used) {
+         assert(qcow2_log_store_is_entry_at_right_place(store,
+                                                        &hash_entry,
+                                                        in_table_idx));
+    }
+
+    /* the entry is used: copy its QCowHashInfo into the write buffer */
+    if (used) {
+        memcpy(store->write_buf + store->in_buf_offset,
+               &hash_entry,
+               sizeof(QCowHashInfo));
+    } else {
+        memset(store->write_buf + store->in_buf_offset, 0,
+               sizeof(QCowHashInfo));
+    }
+
+    store->in_buf_offset += sizeof(QCowHashInfo);
+    return 0;
+}
+
+/* This function freezes the log store into an incarnation hash table
+ *
+ * @store:  the store to work on
+ * @offset: the on disk start offset of the hash table
+ * @ret:    0 on success, -errno on error
+ */
+int qcow2_log_store_freeze_hash_table(BlockDriverState *bs,
+                                      QCowLogStore *store,
+                                      uint64_t offset)
+{
+    uint64_t i;
+    int ret = 0;
+
+    /* initialize data required to do the freeze */
+    memset(store->write_buf, 0, HASH_STORE_CLUSTER_SIZE);
+    store->write_buf_offset = offset;
+    store->in_buf_offset = 0;
+
+    /* build the hash table */
+    for (i = 0; i < qcow2_log_store_nb_entries(store->order); i++) {
+        ret = qcow2_log_store_add_entry_to_frozen_hash_table(bs, store, i);
+
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    /* write the last buffer to disk */
+    return qcow2_log_store_write_freeze_buffer(bs, store);
+}
+
 /* the on disk size of a log store dump */
 size_t qcow2_log_store_dump_size(void)
 {
diff --git a/block/qcow2.h b/block/qcow2.h
index d347471..d0037bf 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -79,6 +79,7 @@
                                         * on the log store size
                                         */
 #define QCOW2_NB_INCARNATION_GOAL  128 /* targeted number of incarnation */
+#define QCOW2_STORE_CO_SLEEP_NS    (100 * 1000 * 1000) /* 0.1 s in ns */
 
 #define QCOW_DEDUP_DIRTY 1 /* dirty flag in the qcow2 header extension */
 
@@ -657,8 +658,45 @@ int qcow2_log_store_flush(BlockDriverState *bs, QCowLogStore *store);
 int qcow2_log_store_stop(BlockDriverState *bs, QCowLogStore *store);
 int qcow2_log_store_rebuild_from_journal(BlockDriverState *bs,
                                          QCowLogStore *store);
+int qcow2_log_store_freeze_filter(BlockDriverState *bs,
+                                  QCowLogStore *store,
+                                  uint64_t offset,
+                                  uint8_t **filter);
+int qcow2_log_store_freeze_hash_table(BlockDriverState *bs,
+                                      QCowLogStore *store,
+                                      uint64_t offset);
 size_t qcow2_log_store_dump_size(void);
 size_t qcow2_log_store_dump(uint8_t *buf, QCowLogStore *store);
 size_t qcow2_log_store_parse(QCowLogStore *store, uint8_t *buf);
 
+/* qcow2-hash-store.c functions */
+
+int qcow2_hash_store_get_tag_size(uint32_t order);
+uint64_t qcow2_hash_store_nb_hash_info_per_cluster(void);
+uint64_t qcow2_hash_store_incarnation_size(uint32_t order);
+void qcow2_hash_store_reset(QCowHashStore *store,
+                            uint32_t order,
+                            uint32_t nb_incarnations,
+                            uint32_t nb_in_limbo);
+void qcow2_hash_store_init(QCowHashStore *store, uint32_t order);
+void qcow2_hash_store_cleanup(QCowHashStore *store);
+uint64_t qcow2_hash_store_filter_size(uint32_t order);
+void qcow2_hash_store_incarnation_free(QCowIncarnation *incarnation);
+int64_t qcow2_hash_store_get_incarnation_offset(BlockDriverState *bs,
+                                                QCowHashStore *store);
+void qcow2_hash_store_forget_all_incarnations(QCowHashStore *store);
+void qcow2_hash_store_add_incarnation(BlockDriverState *bs,
+                                      QCowHashStore *hash_store,
+                                      uint8_t *filter,
+                                      uint64_t offset);
+int qcow2_hash_store_get(BlockDriverState *bs, QCowHashStore *store,
+                         QCowHashInfo *hash_info);
+void qcow2_hash_store_load_filters(BlockDriverState *bs,
+                                   QCowHashStore *store);
+size_t qcow2_hash_store_dump_size(QCowHashStore *store);
+size_t qcow2_hash_store_dump(uint8_t *buf, QCowHashStore *store);
+size_t qcow2_hash_store_parse(QCowHashStore *store,
+                                uint8_t *buf,
+                                uint8_t *buf_end);
+
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 06/24] qcow2: Add the deduplication store.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (4 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 05/24] qcow2: Add the hash store Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 07/24] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

The key value store is an SSD-optimised store used to hold the deduplication's
QCowHashInfos.
It binds together the QCowJournal, the QCowLogStore and the QCowHashStore and
must be called by the deduplication code.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/Makefile.objs |    2 +-
 block/qcow2-store.c |  771 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h       |   20 +-
 3 files changed, 791 insertions(+), 2 deletions(-)
 create mode 100644 block/qcow2-store.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index cafd4f8..2d1a269 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,6 +1,6 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
-block-obj-y += qcow2-journal.o qcow2-log-store.o qcow2-hash-store.o
+block-obj-y += qcow2-journal.o qcow2-log-store.o qcow2-hash-store.o qcow2-store.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += vhdx.o
diff --git a/block/qcow2-store.c b/block/qcow2-store.c
new file mode 100644
index 0000000..8e3fad5
--- /dev/null
+++ b/block/qcow2-store.c
@@ -0,0 +1,771 @@
+/*
+ * QCOW2 key value store for SSD storage
+ *
+ * Copyright (C) Nodalink, SARL. 2013
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/qcow2.h"
+
+/* The qcow2_store combines the qcow2_log_store with the qcow2_hash_store.
+ *
+ * insertions:
+ * Entries are inserted in the qcow2_log_store which delegates writing to disk
+ * to the qcow2_journal.
+ * When full the qcow2_log_store is frozen into an incarnation and given to
+ * the qcow2_hash_store. A new qcow2_log_store is then created
+ *
+ * deletions:
+ * Deletions are insertions with an additional flag set and a mandatory
+ * write flush of the journal: losing an insertion is not a problem whereas
+ * losing a deletion is a problem.
+ *
+ * queries:
+ * The log store is first queried and if the entry is not found in it the hash
+ * store is queried.
+ * When queried, the hash store checks for the presence of the
+ * QCowHashInfo in every incarnation starting from the youngest to the oldest.
+ *
+ * fifo:
+ * A user-configurable setting allows the qcow2_store to act as a FIFO:
+ * the oldest QCowHashInfos stored in it will be dropped one incarnation at a
+ * time.
+ * Losing the oldest QCowHashInfos won't corrupt the dedup metadata; the only
+ * risk is a lower dedup ratio.
+ *
+ * algorithm:
+ * This code is an implementation of the first two stages of the SILT paper.
+ * The third stage was dropped to lower the complexity of the code.
+ * The idea of behaving as a FIFO and the notion of incarnations come from
+ * BufferHash and are handy to help reduce RAM consumption.
+ *
+ * SILT: www.cs.cmu.edu/~dga/papers/silt-sosp2011.pdf
+ * BufferHash: research.microsoft.com/en-us/people/sumann/bufferhash-nsdi10.pdf
+ */
+
+/* This function initializes a key value store
+ *
+ * @store: the store to initialize
+ * @order: the bit order
+ */
+static void qcow2_store_init(BlockDriverState *bs,
+                             QCowStore *store,
+                             uint32_t order)
+{
+    store->order = order;
+    qemu_co_mutex_init(&store->insert_lock);
+    qcow2_log_store_init(bs, &store->log_store, order);
+    qcow2_log_store_init(bs, &store->frozen_log_store, order);
+    qcow2_hash_store_init(&store->hash_store, order);
+}
+
+/* This function cleans up a key value store
+ *
+ * @store: the store to cleanup
+ */
+void qcow2_store_cleanup(QCowStore *store)
+{
+    qcow2_log_store_cleanup(&store->log_store);
+    qcow2_log_store_cleanup(&store->frozen_log_store);
+    qcow2_hash_store_cleanup(&store->hash_store);
+}
+
+/* This function looks for a given hash info in the store
+ *
+ * @store:     the QCowStore to look up into
+ * @hash_info: the QCowHashInfo to complete: at least the hash must be filled.
+ * @ret:       1 on successful lookup, 0 if not found, -errno on error
+ */
+int qcow2_store_get(BlockDriverState *bs, QCowStore *store,
+                    QCowHashInfo *hash_info)
+{
+    bool found;
+    int ret = 0;
+
+    /* we do the same query from the youngest to the oldest store */
+
+    found = qcow2_log_store_get(&store->log_store, hash_info);
+
+    if (found) {
+        goto exit;
+    }
+
+    if (store->freezing) {
+         found = qcow2_log_store_get(&store->frozen_log_store, hash_info);
+    }
+
+    /* found or error */
+    if (found) {
+        goto exit;
+    }
+
+    ret = qcow2_hash_store_get(bs, &store->hash_store, hash_info);
+
+exit:
+    /* if nothing found or error */
+    if (!found && ret <= 0) {
+        return ret;
+    }
+
+    /* if hash info found but deleted from store return not found */
+    if (hash_info->first_logical_sect & QCOW_DEDUP_DELETED) {
+        return 0;
+    }
+
+    return 1;
+}
+
+/* this structure is used between the two following functions */
+typedef struct {
+    BlockDriverState *bs;
+    QCowStore *store;
+} QCowIncarnateArgs;
+
+/* This coroutine converts a log store into an incarnation
+ *
+ * While doing this conversion we take care of two things:
+ *
+ * -not allocating QCOW2 disk space while the guest is frozen
+ * -not resetting the freezing flag to false while the guest is frozen
+ *
+ * All operations in between these two steps can be done freely as they are
+ * only writes on an already allocated disk space area.
+ *
+ * As long as the freezing flag is set to true the target of the migration
+ * would restart the freeze cleanly.
+ *
+ * @opaque: the function arguments (bs, log_store and hash_store)
+ */
+static void coroutine_fn qcow2_store_co_incarnate(void *opaque)
+{
+    QCowIncarnateArgs *args = (QCowIncarnateArgs *) opaque;
+    int64_t ret = 0;
+    int64_t offset;
+    uint64_t filter_size;
+    uint64_t incarnation_size;
+    uint8_t *filter;
+
+    BlockDriverState *bs               = args->bs;
+    BDRVQcowState    *s                = bs->opaque;
+    QCowStore        *store            = args->store;
+    QCowLogStore     *frozen_log_store = &store->frozen_log_store;
+    QCowHashStore    *hash_store       = &store->hash_store;
+
+    /* avoid disk allocations while the guest is suspended or migrating */
+    co_sleep_ns(vm_clock, 1);
+
+    /* allocate a new incarnation disk space or get it from limbo */
+    offset = qcow2_hash_store_get_incarnation_offset(bs,
+                                                     hash_store);
+
+    if (offset < 0) {
+        /* save the errno so the rest of QCOW2 will know */
+        s->freeze_errno = offset;
+        return;
+    }
+
+    /* save new config */
+    ret = qcow2_store_save(bs, store);
+
+    if (ret < 0) {
+        goto deallocate_exit;
+    }
+
+    /* build the in ram filter and dump it to disk */
+    ret = qcow2_log_store_freeze_filter(bs,
+                                        frozen_log_store,
+                                        offset,
+                                        &filter);
+
+    if (ret < 0) {
+        goto deallocate_exit;
+    }
+
+    /* reorder the hashes contained in the journal and write them to disk */
+    filter_size = qcow2_hash_store_filter_size(hash_store->order);
+    ret = qcow2_log_store_freeze_hash_table(bs,
+                                            frozen_log_store,
+                                            offset + filter_size);
+
+    if (ret < 0) {
+        goto free_exit;
+    }
+
+    /* delegate the new frozen hash table to the hash store */
+    qcow2_hash_store_add_incarnation(bs,
+                                     hash_store,
+                                     filter,
+                                     offset);
+
+    /* reset the log store and clear the journal */
+    ret = qcow2_log_store_recycle(bs, &store->frozen_log_store);
+
+    if (ret < 0) {
+        goto exit;
+    }
+
+    /* we finished incarnating the log -> remember it */
+    store->freezing = false;
+    /* avoid modifying the freezing flag while the guest is suspended or
+     * migrating
+     */
+    co_sleep_ns(vm_clock, 1);
+    /* save new config */
+    ret = qcow2_store_save(bs, store);
+    if (ret < 0) {
+        goto free_exit;
+    }
+
+    free(args);
+    return;
+
+free_exit:
+    qemu_vfree(filter);
+deallocate_exit:
+    incarnation_size = qcow2_hash_store_incarnation_size(hash_store->order);
+    /* postpone deallocation if guest is suspended */
+    co_sleep_ns(vm_clock, 1);
+    /* FIXME: check whether freeing this interacts well with limbo */
+    qcow2_free_clusters(bs, offset, incarnation_size);
+exit:
+    /* save the errno so the rest of QCOW2 will know */
+    s->freeze_errno = ret;
+    free(args);
+}
+
+/* This function starts a coroutine converting a log store into an incarnation
+ *
+ * As the function returns void, the coroutine stores its error status in
+ * s->freeze_errno
+ *
+ * @store:  the store to work on
+ */
+static void qcow2_store_incarnate(BlockDriverState *bs,
+                                  QCowStore *store)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    QCowIncarnateArgs *args = g_new0(QCowIncarnateArgs, 1);
+    args->bs = bs;
+    args->store = store;
+
+    s->freeze_co = qemu_coroutine_create(qcow2_store_co_incarnate);
+    qemu_coroutine_enter(s->freeze_co, args);
+}
+
+/* This function freezes the current log store and creates a new one
+ *
+ * This swaps the two log stores
+ *
+ * @store: the QCowStore we work with
+ */
+static int qcow2_store_swap_log_store(BlockDriverState *bs, QCowStore *store)
+{
+    QCowStore swap;
+    int ret = 0;
+
+    /* flush and stop the journal */
+    ret = qcow2_log_store_stop(bs, &store->log_store);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* backup frozen log store */
+    memcpy(&swap,
+           &store->frozen_log_store,
+           sizeof(QCowLogStore));
+
+    /* freeze current log store */
+    memcpy(&store->frozen_log_store,
+           &store->log_store,
+           sizeof(QCowLogStore));
+
+    /* finish the swap */
+    memcpy(&store->log_store,
+           &swap,
+           sizeof(QCowLogStore));
+
+    /* no parallel freeze can occur */
+    assert(!store->freezing);
+    store->freezing = true;
+
+    return 0;
+}
+
+/* this function starts the freeze of the current log store into an incarnation
+ *
+ * @store:     the QCowStore to work on
+ * @hash_info: the hash_info to insert into the new log store
+ * @ret:       0 on success, -errno on error
+ */
+static int qcow2_store_start_freeze(BlockDriverState *bs,
+                                    QCowStore *store,
+                                    QCowHashInfo *hash_info)
+{
+    int ret = 0;
+    bool need_save = false;
+
+    /* if we are already freezing a log store, wait for the operation to end
+     * before starting a new one
+     */
+    while (store->freezing) {
+        co_sleep_ns(rt_clock, QCOW2_STORE_CO_SLEEP_NS);
+    }
+
+    /* log store is full -> prepare the log store freeze and create new one */
+    ret = qcow2_store_swap_log_store(bs, store);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* save new configuration */
+    ret = qcow2_store_save(bs, store);
+
+    /* on error */
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* delegate the insert to the new log store */
+    ret = qcow2_log_store_insert(bs, &store->log_store, hash_info, &need_save);
+
+    /* on error */
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* ret == 1 case is not tested because the log store is new and can't
+     * possibly be full right now
+     */
+
+    /* start to incarnate the previous log store into a hash store
+     * It is the coroutine's responsibility to get rid of the frozen log
+     * store and save the new configuration to disk.
+     */
+    store->freezing = true;
+    qcow2_store_incarnate(bs, store);
+
+    return 0;
+}
+
+/* This function inserts a QCowHashInfo into the store
+ *
+ * @store:     The QCowStore to work with
+ * @hash_info: the QCowHashInfo to insert into the store
+ * @ret:       0 on success, -errno on error
+ */
+int qcow2_store_insert(BlockDriverState *bs, QCowStore *store,
+                       QCowHashInfo *hash_info)
+{
+    int ret = 0;
+    bool need_save = false;
+
+    /* we take the insert_lock here to prevent concurrent inserts from
+     * triggering multiple concurrent log store freezes if the log store is full
+     */
+    qemu_co_mutex_lock(&store->insert_lock);
+
+    /* delegate the insert to the log store */
+    ret = qcow2_log_store_insert(bs, &store->log_store, hash_info, &need_save);
+
+    /* on error */
+    if (ret < 0) {
+        goto unlock_exit;
+    }
+
+    /* need_save is true only in case of success -> no need to check ret */
+    if (need_save) {
+        ret = qcow2_store_save(bs, store);
+    }
+
+    /* on success */
+    if (!ret) {
+        goto unlock_exit;
+    }
+
+    /* ret == 1 -> log store is full -> freeze it into an incarnation */
+    ret = qcow2_store_start_freeze(bs, store, hash_info);
+
+unlock_exit:
+    qemu_co_mutex_unlock(&store->insert_lock);
+    return ret;
+}
+
+/* This delegates a flush to the log store
+ *
+ * @store:     The QCowStore to work with
+ * @ret:       0 on success, -errno on error
+ */
+int qcow2_store_flush(BlockDriverState *bs, QCowStore *store)
+{
+    return qcow2_log_store_flush(bs, &store->log_store);
+}
+
+/* This function deletes a QCowHashInfo from the store
+ *
+ * @store:     The QCowStore to work with
+ * @hash_info: the QCowHashInfo to delete from the store
+ * @ret:       0 on success, -errno on error
+ */
+int qcow2_store_delete(BlockDriverState *bs, QCowStore *store,
+                       QCowHashInfo *hash_info)
+{
+    hash_info->first_logical_sect |= QCOW_DEDUP_DELETED;
+    return qcow2_store_insert(bs, store, hash_info);
+}
+
+/* this function computes the size of a dump of a store
+ *
+ * @store: the store to work on
+ * @ret:   the size of a dump
+ */
+static size_t qcow2_store_dump_size(QCowStore *store)
+{
+    return sizeof(store->order) + 1 +
+           qcow2_log_store_dump_size() * 2 + /* current + frozen log store */
+           qcow2_hash_store_dump_size(&store->hash_store);
+}
+
+/* this function dumps a store into a buffer
+ *
+ * @buf:   the buffer to dump into
+ * @store: the store to dump
+ * @ret:   the size used for the dump
+ */
+static size_t qcow2_store_dump(uint8_t *buf, QCowStore *store)
+{
+    uint32_t *buf32 = (uint32_t *) buf;
+    size_t offset;
+
+    /* dump order */
+    buf32[0] = cpu_to_be32(store->order);
+    offset = sizeof(store->order);
+
+    /* dump freeze status */
+    buf[offset] = store->freezing;
+    offset++;
+
+    /* dump current log store */
+    offset += qcow2_log_store_dump(buf + offset,
+                                   &store->log_store);
+
+    /* dump frozen log store */
+    offset += qcow2_log_store_dump(buf + offset,
+                                   &store->frozen_log_store);
+
+    /* dump hash store */
+    offset += qcow2_hash_store_dump(buf + offset,
+                                    &store->hash_store);
+
+    return offset;
+}
+
+/* this function parses a buffer into a given store
+ *
+ * note: as the parse method is used only at startup it also initializes the
+ *       store
+ *
+ * @store: the store to parse into
+ * @buf:   the buffer to parse from
+ * @size:  the size of the buffer
+ * @ret:   0 on success, -1 when the buffer is too small to parse all data
+ */
+static int qcow2_store_parse(BlockDriverState *bs,
+                             QCowStore *store,
+                             uint8_t *buf,
+                             uint32_t size)
+{
+    size_t offset;
+    uint32_t *buf32 = (uint32_t *) buf;
+    uint32_t order;
+    int ret;
+
+    /* parse order and init store */
+    order = be32_to_cpu(buf32[0]);
+    qcow2_store_init(bs, store, order);
+    offset = sizeof(store->order);
+
+    /* parse freeze status */
+    store->freezing = buf[offset];
+    offset++;
+
+    /* parse current log store */
+    offset += qcow2_log_store_parse(&store->log_store, buf + offset);
+
+    /* parse frozen log store */
+    offset += qcow2_log_store_parse(&store->frozen_log_store, buf + offset);
+
+    /* parse hash store */
+    ret = qcow2_hash_store_parse(&store->hash_store,
+                                 buf + offset,
+                                 buf + size);
+
+    /* buffer was too small -> cleanup and return error */
+    if (ret == -1) {
+        /* free various buffers */
+        qcow2_store_cleanup(store);
+        return -1;
+    }
+
+    return 0;
+}
+
+static int32_t qcow2_store_compute_order(BlockDriverState *bs,
+                                         uint64_t total_size)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t nb_clusters = total_size / s->cluster_size;
+    uint64_t nb_hash_per_incarnation = nb_clusters / QCOW2_NB_INCARNATION_GOAL;
+    return qcow2_log_store_compute_order(nb_hash_per_incarnation);
+}
+
+static int64_t qcow2_store_alloc_disk_conf(BlockDriverState *bs,
+                                           QCowStore *store,
+                                           size_t *psize)
+{
+    /* compute configuration space dump size (x2 to leave room for growth) */
+    *psize = qcow2_store_dump_size(store) * 2;
+
+    /* allocate the store configuration space */
+    return qcow2_alloc_clusters(bs, *psize);
+}
+
+/* this function saves a QCowStore to disk and updates the header if needed
+ *
+ * The function handles the need to grow the on-disk dump area
+ * and rewrites the QCOW2 header extension if needed.
+ *
+ * @store:  the store to save
+ * @ret:    0 on success, -errno on error
+ */
+int64_t qcow2_store_save(BlockDriverState *bs, QCowStore *store)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t offset, old_offset = 0;
+    size_t size, old_size = 0;
+    bool alloc = false;
+    uint8_t *buf;
+    int ret = 0;
+
+    /* we use these temporary variables to commit atomically */
+    offset = s->dedup_conf_offset;
+    size = s->dedup_conf_size;
+
+    /* check if we need to grow on disk dump area */
+    if (size < qcow2_store_dump_size(store)) {
+        alloc = true;
+        old_offset = offset;
+        old_size = size;
+        offset = qcow2_store_alloc_disk_conf(bs, store, &size);
+
+        if (offset < 0) {
+            return offset;
+        }
+    }
+
+    /* allocate buffer */
+    buf = qemu_blockalign(bs, size);
+    memset(buf, 0, size);
+
+    /* serialize store config in buffer */
+    ret = qcow2_store_dump(buf, store);
+
+    if (ret < 0) {
+        goto free_dealloc_exit;
+    }
+
+    /* write the conf to disk */
+    ret = bdrv_pwrite(bs->file, offset, buf, size);
+
+    /* propagate write errors (newly allocated space is freed at the label) */
+    if (ret < 0) {
+        goto free_dealloc_exit;
+    }
+
+    /* we grew the config disk space -> save the new location in the header */
+    if (alloc) {
+        s->dedup_conf_offset = offset;
+        s->dedup_conf_size = size;
+        qcow2_update_header(bs);
+        /* do not free old disk space on creation */
+        if (old_size) {
+            qcow2_free_clusters(bs, old_offset, old_size);
+        }
+    }
+    ret = 0;
+
+free_dealloc_exit:
+    /* on alloc and error free newly allocated disk space */
+    if (alloc && ret < 0) {
+        qcow2_free_clusters(bs, offset, size);
+    }
+    qemu_vfree(buf);
+    return ret;
+}
+
+/* This function allocates a journal, initializes the store and saves the config
+ *
+ * It is used at QCOW2 image creation time.
+ * This code doesn't bother cleaning up on failure since any error would make
+ * image creation fail. It just frees memory.
+ *
+ * note: s->cluster_size must be filled with the correct value for
+ *       qcow2_store_compute_order
+ *
+ * @store:      the store to create
+ * @total_size: the total size of the image
+ * @ret:        0 on success, -errno on error
+ */
+int64_t qcow2_store_create(BlockDriverState *bs,
+                           QCowStore *store,
+                           uint64_t total_size)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    /* compute the bit order required to store the journal and hash tables */
+    uint32_t order = qcow2_store_compute_order(bs, total_size);
+
+    /* initialize the store state */
+    qcow2_store_init(bs, store, order);
+
+    /* create the journal */
+    ret = qcow2_log_store_create(bs, &store->log_store, order);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* create frozen log store we will swap at runtime */
+    ret = qcow2_log_store_create(bs, &store->frozen_log_store, order);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* set size to zero so the save function will start allocating
+     * and will not try to free anything
+     */
+    s->dedup_conf_size = 0;
+
+    /* save new config */
+    return qcow2_store_save(bs, store);
+}
+
+
+/* this function initializes a store and loads its configuration from disk
+ *
+ * note: the journal and the incarnation filters are not loaded from disk;
+ *       this work is done later.
+ *
+ * @store:  the store to load the configuration into
+ * @ret:    0 on success, -errno on error, -1 if the on disk dump was invalid
+ */
+int qcow2_store_load(BlockDriverState *bs, QCowStore *store)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    /* allocate read buffer */
+    uint8_t *buf = qemu_blockalign(bs, s->dedup_conf_size);
+
+    /* read conf from disk */
+    ret = bdrv_pread(bs->file,
+                     s->dedup_conf_offset,
+                     buf,
+                     s->dedup_conf_size);
+
+    if (ret < 0) {
+        goto free_exit;
+    }
+
+    /* parse the buffer into a store */
+    ret = qcow2_store_parse(bs, store, buf, s->dedup_conf_size);
+free_exit:
+    qemu_vfree(buf);
+
+    return ret;
+}
+
+/* this function loads the in-RAM store data at startup
+ *
+ * It loads the current journal synchronously so after this the deduplication
+ * will be operational.
+ *
+ * @store: the store to work on
+ * @ret:   0 on success, -errno on error
+ */
+int qcow2_store_start(BlockDriverState *bs, QCowStore *store)
+{
+    int ret = 0;
+
+    /* reload current journal */
+    ret = qcow2_log_store_rebuild_from_journal(bs, &store->log_store);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* resume freeze in progress */
+    if (store->freezing) {
+        ret = qcow2_log_store_rebuild_from_journal(bs,
+                                                   &store->frozen_log_store);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        /* do the incarnation in a coroutine */
+        qcow2_store_incarnate(bs, store);
+    }
+
+    /* load the incarnations' in-RAM filters in a coroutine */
+    qcow2_hash_store_load_filters(bs, &store->hash_store);
+
+    return 0;
+}
+
+int qcow2_store_forget(BlockDriverState *bs, QCowStore *store)
+{
+    int ret = 0;
+    store->freezing = false;
+
+    /* reset hash store */
+    qcow2_hash_store_forget_all_incarnations(&store->hash_store);
+
+    /* reset log store */
+    ret = qcow2_log_store_create(bs, &store->log_store, store->order);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* reset frozen log store */
+    ret = qcow2_log_store_create(bs, &store->frozen_log_store, store->order);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* save config to disk */
+    return qcow2_store_save(bs, store);
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index d0037bf..c0dbfe3 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -226,9 +226,9 @@ typedef struct {
 
 typedef struct {
     uint32_t      order;
+    bool          freezing;          /* is a log store freeze in progress */
     QCowLogStore  log_store;         /* the current log store */
     QCowLogStore  frozen_log_store;  /* the log store to incarnate */
-    bool          freezing;          /* are we incarnating a log store */
     QCowHashStore hash_store;
     CoMutex       insert_lock;       /* used to prevent multiple freeze attempts
                                       * at the same time
@@ -699,4 +699,22 @@ size_t qcow2_hash_store_parse(QCowHashStore *store,
                                 uint8_t *buf,
                                 uint8_t *buf_end);
 
+/* qcow2-store.c functions (Should be called by dedup.c) */
+
+void qcow2_store_cleanup(QCowStore *store);
+int qcow2_store_get(BlockDriverState *bs, QCowStore *store,
+                    QCowHashInfo *hash_info);
+int qcow2_store_insert(BlockDriverState *bs, QCowStore *store,
+                       QCowHashInfo *hash_info);
+int qcow2_store_flush(BlockDriverState *bs, QCowStore *store);
+int qcow2_store_delete(BlockDriverState *bs, QCowStore *store,
+                       QCowHashInfo *hash_info);
+int64_t qcow2_store_save(BlockDriverState *bs, QCowStore *store);
+int64_t qcow2_store_create(BlockDriverState *bs,
+                           QCowStore *store,
+                           uint64_t total_size);
+int qcow2_store_load(BlockDriverState *bs, QCowStore *store);
+int qcow2_store_start(BlockDriverState *bs, QCowStore *store);
+int qcow2_store_forget(BlockDriverState *bs, QCowStore *store);
+
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 07/24] qcow2: Add qcow2_dedup_read_missing_and_concatenate
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (5 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 06/24] qcow2: Add the deduplication store Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 08/24] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

This function is used to read missing data when unaligned writes are
done. It also concatenates the missing data with the given qiov data
in order to prepare a buffer used to look for duplicated clusters.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/Makefile.objs |    2 +-
 block/qcow2-dedup.c |  121 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c       |   35 +++++++++++++++
 block/qcow2.h       |   13 ++++++
 4 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100644 block/qcow2-dedup.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 2d1a269..088407a 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,6 +1,6 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
-block-obj-y += qcow2-journal.o qcow2-log-store.o qcow2-hash-store.o qcow2-store.o
+block-obj-y += qcow2-journal.o qcow2-log-store.o qcow2-hash-store.o qcow2-store.o qcow2-dedup.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += vhdx.o
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
new file mode 100644
index 0000000..bc6e2c2
--- /dev/null
+++ b/block/qcow2-dedup.c
@@ -0,0 +1,121 @@
+/*
+ * Deduplication for the QCOW2 format
+ *
+ * Copyright (C) Nodalink, SARL. 2012-2013
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block/block_int.h"
+#include "qemu-common.h"
+#include "qcow2.h"
+
+/*
+ * Prepare a buffer containing everything required to compute cluster
+ * sized deduplication hashes.
+ * If sector_num or nb_sectors are not cluster-aligned, missing data
+ * before/after the qiov will be read.
+ *
+ * @qiov:               the qiov for which missing data must be read
+ * @sector_num:         the first sector that must be read into the qiov
+ * @nb_sectors:         the number of sectors to read into the qiov
+ * @data:               the place where the data will be concatenated and stored;
+ *                           the caller is responsible for freeing it with
+ *                           qemu_vfree() on success.
+ * @nb_data_sectors:    the resulting size of the concatenated data (in sectors)
+ * @ret:                negative on error
+ */
+int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
+                                             QEMUIOVector *qiov,
+                                             uint64_t sector_num,
+                                             int nb_sectors,
+                                             uint8_t **data,
+                                             int *nb_data_sectors)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    uint64_t cluster_beginning_sector;
+    uint64_t first_sector_after_qiov;
+    int cluster_beginning_nr;
+    int cluster_ending_nr;
+    int unaligned_ending_nr;
+    uint64_t max_cluster_ending_nr;
+
+    /* compute how much and where to read at the beginning */
+    cluster_beginning_nr = sector_num & (s->cluster_sectors - 1);
+    cluster_beginning_sector = sector_num - cluster_beginning_nr;
+
+    /* for the ending */
+    first_sector_after_qiov = sector_num + nb_sectors;
+    unaligned_ending_nr = first_sector_after_qiov & (s->cluster_sectors - 1);
+    cluster_ending_nr = unaligned_ending_nr ?
+                        s->cluster_sectors - unaligned_ending_nr : 0;
+
+    /* compute total size in sectors and allocate memory */
+    *nb_data_sectors = cluster_beginning_nr + nb_sectors + cluster_ending_nr;
+    *data = qemu_blockalign(bs, *nb_data_sectors * BDRV_SECTOR_SIZE);
+
+    /* read beginning */
+    if (cluster_beginning_nr) {
+        ret = qcow2_read_cluster_data(bs,
+                                      *data,
+                                      cluster_beginning_sector,
+                                      cluster_beginning_nr);
+    }
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /* append qiov content */
+    qemu_iovec_to_buf(qiov, 0, *data + cluster_beginning_nr * BDRV_SECTOR_SIZE,
+                      qiov->size);
+
+    /* Fix cluster_ending_nr if we are at risk of reading outside the image
+     * (Cluster unaligned image size)
+     */
+    max_cluster_ending_nr = bs->total_sectors - first_sector_after_qiov;
+    cluster_ending_nr = max_cluster_ending_nr < (uint64_t) cluster_ending_nr ?
+                        (int) max_cluster_ending_nr : cluster_ending_nr;
+
+    /* read and add ending */
+    if (cluster_ending_nr) {
+        ret = qcow2_read_cluster_data(bs,
+                                      *data +
+                                      (cluster_beginning_nr +
+                                      nb_sectors) *
+                                      BDRV_SECTOR_SIZE,
+                                      first_sector_after_qiov,
+                                      cluster_ending_nr);
+    }
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    return 0;
+
+fail:
+    qemu_vfree(*data);
+    *data = NULL;
+    return ret;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index 2e346d8..34b2a87 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1157,6 +1157,41 @@ fail:
     return ret;
 }
 
+/**
+ * Read some data from the QCOW2 file
+ *
+ * Important: s->lock is dropped. Things can change before the function returns
+ *            to the caller.
+ *
+ * @data:       the buffer where the data must be stored
+ * @sector_num: the sector number to read in the QCOW2 file
+ * @nb_sectors: the number of sectors to read
+ * @ret:        negative on error
+ */
+coroutine_fn int qcow2_read_cluster_data(BlockDriverState *bs,
+                            uint8_t *data,
+                            uint64_t sector_num,
+                            int nb_sectors)
+{
+    BDRVQcowState *s = bs->opaque;
+    QEMUIOVector qiov;
+    struct iovec iov;
+    int ret;
+
+    iov.iov_len = nb_sectors * BDRV_SECTOR_SIZE;
+    iov.iov_base = data;
+    qemu_iovec_init_external(&qiov, &iov, 1);
+    qemu_co_mutex_unlock(&s->lock);
+    ret = qcow2_co_readv(bs, sector_num, nb_sectors, &qiov);
+    qemu_co_mutex_lock(&s->lock);
+    if (ret < 0) {
+        error_report("failed to read %d sectors at offset %" PRIu64,
+                     nb_sectors, sector_num);
+    }
+
+    return ret;
+}
+
 static int qcow2_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt)
 {
diff --git a/block/qcow2.h b/block/qcow2.h
index c0dbfe3..9a9abd3 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -539,6 +539,10 @@ int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
 
 int qcow2_mark_dirty(BlockDriverState *bs);
 int qcow2_update_header(BlockDriverState *bs);
+int qcow2_read_cluster_data(BlockDriverState *bs,
+                            uint8_t *data,
+                            uint64_t sector_num,
+                            int nb_sectors);
 
 /* qcow2-refcount.c functions */
 int qcow2_refcount_init(BlockDriverState *bs);
@@ -717,4 +721,13 @@ int qcow2_store_load(BlockDriverState *bs, QCowStore *store);
 int qcow2_store_start(BlockDriverState *bs, QCowStore *store);
 int qcow2_store_forget(BlockDriverState *bs, QCowStore *store);
 
+/* qcow2-dedup.c functions */
+int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
+                                             QEMUIOVector *qiov,
+                                             uint64_t sector_num,
+                                             int nb_sectors,
+                                             uint8_t **data,
+                                             int *nb_data_sectors);
+
 #endif
-- 
1.7.10.4


* [Qemu-devel] [RFC V8 08/24] qcow2: Create a way to link to l2 tables when deduplicating.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (6 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 07/24] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 09/24] qcow2: Make qcow2_update_cluster_refcount public Benoît Canet
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c |    6 ++++--
 block/qcow2.h         |    6 ++++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c71470a..d6db0b9 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -696,7 +696,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
             old_cluster[j++] = l2_table[l2_index + i];
 
         l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
+                    (i << s->cluster_bits)) | m->l2_entry_flags);
      }
 
 
@@ -709,7 +709,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      * If this was a COW, we need to decrease the refcount of the old cluster.
      * Also flush bs->file to get the right order for L2 and refcount update.
      */
-    if (j != 0) {
+    if (!m->overwrite && j != 0) {
         for (i = 0; i < j; i++) {
             qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1);
         }
@@ -1098,6 +1098,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
             .offset     = nb_sectors * BDRV_SECTOR_SIZE,
             .nb_sectors = avail_sectors - nb_sectors,
         },
+        .l2_entry_flags = QCOW_OFLAG_COPIED,
+        .overwrite      = false,
     };
     qemu_co_queue_init(&(*m)->dependent_requests);
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
diff --git a/block/qcow2.h b/block/qcow2.h
index 9a9abd3..7754065 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -438,6 +438,12 @@ typedef struct QCowL2Meta
      */
     CoQueue dependent_requests;
 
+    /* contains the flags to apply to the l2 entry */
+    uint64_t l2_entry_flags;
+
+    /* set to true if we are overwriting an L2 table entry */
+    bool overwrite;
+
     /**
      * The COW Region between the start of the first allocated cluster and the
      * area the guest actually writes to.
-- 
1.7.10.4


* [Qemu-devel] [RFC V8 09/24] qcow2: Make qcow2_update_cluster_refcount public.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (7 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 08/24] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 10/24] qcow2: Add qcow2_dedup and related functions Benoît Canet
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-refcount.c |   17 ++++++++++-------
 block/qcow2.h          |    3 +++
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b32738f..3bd8f37 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -520,9 +520,9 @@ fail:
  * If the return value is non-negative, it is the new refcount of the cluster.
  * If it is negative, it is -errno and indicates an error.
  */
-static int update_cluster_refcount(BlockDriverState *bs,
-                                   int64_t cluster_index,
-                                   int addend)
+int qcow2_update_cluster_refcount(BlockDriverState *bs,
+                                  int64_t cluster_index,
+                                  int addend)
 {
     BDRVQcowState *s = bs->opaque;
     int ret;
@@ -649,7 +649,7 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         if (free_in_cluster == 0)
             s->free_byte_offset = 0;
         if ((offset & (s->cluster_size - 1)) != 0)
-            update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
+            qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
     } else {
         offset = qcow2_alloc_clusters(bs, s->cluster_size);
         if (offset < 0) {
@@ -659,7 +659,7 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         if ((cluster_offset + s->cluster_size) == offset) {
             /* we are lucky: contiguous data */
             offset = s->free_byte_offset;
-            update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
+            qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
             s->free_byte_offset += size;
         } else {
             s->free_byte_offset = offset;
@@ -795,7 +795,9 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     } else {
                         uint64_t cluster_index = (offset & L2E_OFFSET_MASK) >> s->cluster_bits;
                         if (addend != 0) {
-                            refcount = update_cluster_refcount(bs, cluster_index, addend);
+                            refcount = qcow2_update_cluster_refcount(bs,
+                                           cluster_index,
+                                           addend);
                         } else {
                             refcount = get_refcount(bs, cluster_index);
                         }
@@ -827,7 +829,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 
 
             if (addend != 0) {
-                refcount = update_cluster_refcount(bs, l2_offset >> s->cluster_bits, addend);
+                refcount = qcow2_update_cluster_refcount(bs,
+                               l2_offset >> s->cluster_bits, addend);
             } else {
                 refcount = get_refcount(bs, l2_offset >> s->cluster_bits);
             }
diff --git a/block/qcow2.h b/block/qcow2.h
index 7754065..3238083 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -568,6 +568,9 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 
 int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                           BdrvCheckMode fix);
+int qcow2_update_cluster_refcount(BlockDriverState *bs,
+                                  int64_t cluster_index,
+                                  int addend);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 10/24] qcow2: Add qcow2_dedup and related functions
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (8 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 09/24] qcow2: Make qcow2_update_cluster_refcount public Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 11/24] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |  415 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h       |    5 +
 2 files changed, 420 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index bc6e2c2..0daf77e 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -119,3 +119,418 @@ fail:
     *data = NULL;
     return ret;
 }
+
+/*
+ * Set a QCowHashInfo into the new state
+ *
+ * @hash_info: the hash_info to set to new
+ */
+static void qcow2_set_hash_info_new(QCowHashInfo *hash_info)
+{
+    hash_info->physical_sect = QCOW_DEDUP_FLAG_EMPTY;
+    hash_info->first_logical_sect = QCOW_DEDUP_FLAG_EMPTY;
+}
+
+/*
+ * Compute the hash of a given cluster
+ *
+ * @data: a buffer containing the cluster data
+ * @hash: a QCowHash in which to store the computed hash
+ * @ret:  0 on success, negative on error
+ */
+static int qcow2_compute_cluster_hash(BlockDriverState *bs,
+                                       QCowHash *hash,
+                                       uint8_t *data)
+{
+    return 0;
+}
+
+/*
+ * Get a QCowHashInfo corresponding to a cluster data
+ *
+ * @phash:           if the hash it carries can be reused, no hash is computed
+ * @data:            a buffer containing the cluster
+ * @ret:             1 if the hash is found, 0 if not found, -errno on error
+ */
+static int qcow2_get_persistent_hash_for_cluster(BlockDriverState *bs,
+                                                 QcowPersistentHash *phash,
+                                                 uint8_t *data)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    /* no hash has been provided: compute it and store it for later usage */
+    if (!phash->reuse) {
+        ret = qcow2_compute_cluster_hash(bs,
+                                         &phash->hash_info.hash,
+                                         data);
+    }
+
+    /* do not reuse the hash anymore if it was precomputed */
+    phash->reuse = false;
+
+    /* error */
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* ask the store if it knows this QCowHashInfo */
+    return qcow2_store_get(bs, &s->key_value_store, &phash->hash_info);
+}
+
+/*
+ * Insert a QCowHashInfo into the store as new
+ *
+ * @hash_info: the QCowHashInfo to set to new and insert
+ * @ret:       0 on success, -errno on error
+ */
+static int qcow2_add_new_hash_info_to_store(BlockDriverState *bs,
+                                            QCowHashInfo *hash_info)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    /* set the QCowHashInfo to the new state so we will remember to fill these
+     * fields later with real values
+     */
+    qcow2_set_hash_info_new(hash_info);
+    return qcow2_store_insert(bs, &s->key_value_store, hash_info);
+}
+
+/*
+ * Helper used to build a QCowHashElement
+ *
+ * @hash: the QCowHash to use
+ * @ret:  a newly allocated QCowHashElement containing the given hash
+ */
+static QCowHashElement *qcow2_dedup_hash_new(QCowHash *hash)
+{
+    QCowHashElement *dedup_hash;
+    dedup_hash = g_new0(QCowHashElement, 1);
+    memcpy(dedup_hash->hash_info.hash.data, hash->data, HASH_LENGTH);
+    return dedup_hash;
+}
+
+/*
+ * Helper used to link a deduplicated cluster in the l2
+ *
+ * @logical_sect:  the cluster sector seen by the guest
+ * @physical_sect: the cluster sector in the QCOW2 file
+ * @overwrite:     true if we must overwrite the L2 table entry
+ * @ret:           0 on success, negative on error
+ */
+static int qcow2_dedup_link_l2(BlockDriverState *bs,
+                               uint64_t logical_sect,
+                               uint64_t physical_sect,
+                               bool overwrite)
+{
+    QCowL2Meta m = {
+        .alloc_offset   = physical_sect << 9,
+        .offset         = logical_sect << 9,
+        .nb_clusters    = 1,
+        .nb_available   = 0,
+        .cow_start = {
+            .offset     = 0,
+            .nb_sectors = 0,
+        },
+        .cow_end = {
+            .offset     = 0,
+            .nb_sectors = 0,
+        },
+        .l2_entry_flags = 0,
+        .overwrite      = overwrite,
+    };
+    return qcow2_alloc_cluster_link_l2(bs, &m);
+}
+
+/* Clear the QCOW_OFLAG_COPIED from the first L2 entry written for a physical
+ * cluster.
+ *
+ * @hash_info: the duplicated hash info
+ * @ret:       0 on success, negative on error
+ */
+static int qcow2_clear_l2_copied_flag_if_needed(BlockDriverState *bs,
+                                                QCowHashInfo *hash_info)
+{
+    int ret = 0;
+    uint64_t first_logical_sect = hash_info->first_logical_sect;
+
+    /* QCOW_OFLAG_COPIED already cleared -> do nothing */
+    if (!(first_logical_sect & QCOW_OFLAG_COPIED)) {
+        return 0;
+    }
+
+    first_logical_sect &= ~QCOW_OFLAG_COPIED;
+
+    /* overwrite first L2 entry to clear QCOW_FLAG_COPIED */
+    ret = qcow2_dedup_link_l2(bs, first_logical_sect,
+                              hash_info->physical_sect,
+                              true);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* remember that we don't need to clear QCOW_OFLAG_COPIED again */
+    hash_info->first_logical_sect = first_logical_sect;
+
+    return 0;
+}
+
+/* This function deduplicates a cluster
+ *
+ * @logical_sect: The logical sector of the write
+ * @hash_info:    The duplicated cluster hash info
+ * @ret:          0 on success, negative on error
+ */
+static int qcow2_deduplicate_cluster(BlockDriverState *bs,
+                                     uint64_t logical_sect,
+                                     QCowHashInfo *hash_info)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t cluster_index = hash_info->physical_sect / s->cluster_sectors;
+    int ret = 0;
+
+    /* Increment the refcount of the cluster */
+    ret = qcow2_update_cluster_refcount(bs,
+                                        cluster_index,
+                                        1);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* create new L2 entry */
+    return qcow2_dedup_link_l2(bs, logical_sect,
+                               hash_info->physical_sect,
+                               false);
+}
+
+/* This function tries to deduplicate a given cluster.
+ *
+ * @sector_num:           the logical sector number we are trying to deduplicate
+ * @phash:                Used instead of computing the hash if provided
+ * @data:                 the buffer in which to look for a duplicated cluster
+ * @ret:                  negative on error, 1 on deduplication, 0 otherwise
+ */
+static int qcow2_try_dedup_cluster(BlockDriverState *bs,
+                                   QcowPersistentHash *phash,
+                                   uint64_t sector_num,
+                                   uint8_t *data)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    uint64_t logical_sect;
+    uint64_t existing_physical_offset;
+    int pnum = s->cluster_sectors;
+
+    /* search the tree for duplicated cluster */
+    ret = qcow2_get_persistent_hash_for_cluster(bs,
+                                                phash,
+                                                data);
+
+    /* we won't reuse the hash on error */
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* if cluster is not duplicated store hash for later usage */
+    if (!ret) {
+        return qcow2_add_new_hash_info_to_store(bs, &phash->hash_info);
+    }
+
+    logical_sect = sector_num & ~(s->cluster_sectors - 1);
+    ret = qcow2_get_cluster_offset(bs, logical_sect << 9,
+                                   &pnum, &existing_physical_offset);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* if we are rewriting the same cluster at the same place do nothing */
+    if (existing_physical_offset == phash->hash_info.physical_sect << 9) {
+        return 1;
+    }
+
+    /* take care of not having refcount > 1 and QCOW_OFLAG_COPIED at once */
+    ret = qcow2_clear_l2_copied_flag_if_needed(bs, &phash->hash_info);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* do the deduplication */
+    ret = qcow2_deduplicate_cluster(bs, logical_sect, &phash->hash_info);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    return 1;
+}
+
+
+static void add_hash_to_undedupable_list(BlockDriverState *bs,
+                                         QCowDedupState *ds)
+{
+    /* memorize the hash for later storage in the key value store and on disk */
+    QCowHashElement *dedup_hash =
+        qcow2_dedup_hash_new(&ds->phash.hash_info.hash);
+    QTAILQ_INSERT_TAIL(&ds->undedupables, dedup_hash, next);
+}
+
+static int qcow2_dedup_starting_from_begining(BlockDriverState *bs,
+                                              QCowDedupState *ds,
+                                              uint64_t sector_num,
+                                              uint8_t *data,
+                                              int left_to_process)
+{
+    BDRVQcowState *s = bs->opaque;
+    int i;
+    int ret = 0;
+
+    for (i = 0; i < left_to_process; i++) {
+        ret = qcow2_try_dedup_cluster(bs,
+                                      &ds->phash,
+                                      sector_num + i * s->cluster_sectors,
+                                      data + i * s->cluster_size);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        /* stop if a cluster has not been deduplicated */
+        if (ret != 1) {
+            break;
+        }
+    }
+
+    return i;
+}
+
+static int qcow2_count_next_non_dedupable_clusters(BlockDriverState *bs,
+                                                   QCowDedupState *ds,
+                                                   uint8_t *data,
+                                                   int left_to_process)
+{
+    BDRVQcowState *s = bs->opaque;
+    int i;
+    int ret = 0;
+
+    for (i = 0; i < left_to_process; i++) {
+        ret = qcow2_get_persistent_hash_for_cluster(bs,
+                                                    &ds->phash,
+                                                    data + i * s->cluster_size);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        /* found a duplicated cluster : stop here */
+        if (ret) {
+            break;
+        }
+
+        ret = qcow2_add_new_hash_info_to_store(bs, &ds->phash.hash_info);
+
+        if (ret < 0) {
+            return ret;
+        }
+        add_hash_to_undedupable_list(bs, ds);
+    }
+
+    return i;
+}
+
+
+/* Deduplicate all the clusters that can be deduplicated.
+ *
+ * Next it computes the number of non-deduplicable sectors to come, storing
+ * the hashes of these sectors in a linked list for later usage.
+ * Then it computes the hash of the first duplicated cluster that comes after
+ * the non-deduplicable clusters; this hash will be reused by the next call to
+ * the function.
+ *
+ * @ds:              a structure containing the state of the deduplication
+ *                   for this write request
+ * @sector_num:      The logical sector
+ * @data:            the buffer containing the data to deduplicate
+ * @data_nr:         the size of the buffer in sectors
+ *
+ * @ret:             the number of deduplicated sectors, 0 if none,
+ *                   negative on error
+ */
+int qcow2_dedup(BlockDriverState *bs,
+                QCowDedupState *ds,
+                uint64_t sector_num,
+                uint8_t *data,
+                int data_nr)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+    int deduped_clusters_nr = 0;
+    int left_to_process;
+    int start_index;
+
+    start_index = sector_num & (s->cluster_sectors - 1);
+
+    left_to_process = (data_nr / s->cluster_sectors) -
+                      ds->nb_clusters_processed;
+
+    data += ds->nb_clusters_processed * s->cluster_size;
+
+    /* start deduplicating everything that can be, cluster after cluster */
+    ret = qcow2_dedup_starting_from_begining(bs,
+                                             ds,
+                                             sector_num,
+                                             data,
+                                             left_to_process);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    deduped_clusters_nr = ret;
+
+    left_to_process -= ret;
+    ds->nb_clusters_processed += ret;
+    data += ret * s->cluster_size;
+
+    /* We deduped everything till the end */
+    if (!left_to_process) {
+        ds->nb_undedupable_sectors = 0;
+        goto exit;
+    }
+
+    /* skip and account the first undedupable cluster found */
+    left_to_process--;
+    ds->nb_clusters_processed++;
+    data += s->cluster_size;
+    ds->nb_undedupable_sectors += s->cluster_sectors;
+
+    add_hash_to_undedupable_list(bs, ds);
+
+    /* Count how many non-duplicated sectors can be written and memorize their
+     * hashes to write them after the data has reached disk.
+     */
+    ret = qcow2_count_next_non_dedupable_clusters(bs,
+                                                  ds,
+                                                  data,
+                                                  left_to_process);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    left_to_process -= ret;
+    ds->nb_clusters_processed += ret;
+    ds->nb_undedupable_sectors += ret * s->cluster_sectors;
+
+    /* remember to reuse the last hash computed at new qcow2_dedup call */
+    if (left_to_process) {
+        ds->phash.reuse = true;
+    }
+
+exit:
+    if (!deduped_clusters_nr) {
+        return 0;
+    }
+
+    return deduped_clusters_nr * s->cluster_sectors - start_index;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 3238083..3346842 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -737,6 +737,11 @@ int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
                                              int sectors_nr,
                                              uint8_t **dedup_cluster_data,
                                              int *dedup_cluster_data_nr);
+int qcow2_dedup(BlockDriverState *bs,
+                QCowDedupState *ds,
+                uint64_t sector_num,
+                uint8_t *data,
+                int data_nr);
 
 
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 11/24] qcow2: Add qcow2_dedup_store_new_hashes.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (9 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 10/24] qcow2: Add qcow2_dedup and related functions Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 12/24] qcow2: Do allocate on rewrite on the dedup case Benoît Canet
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   97 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 block/qcow2.h       |    6 +++-
 2 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 0daf77e..ffbf866 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -251,6 +251,7 @@ static int qcow2_dedup_link_l2(BlockDriverState *bs,
 static int qcow2_clear_l2_copied_flag_if_needed(BlockDriverState *bs,
                                                 QCowHashInfo *hash_info)
 {
+    BDRVQcowState *s = bs->opaque;
     int ret = 0;
     uint64_t first_logical_sect = hash_info->first_logical_sect;
 
@@ -273,7 +274,8 @@ static int qcow2_clear_l2_copied_flag_if_needed(BlockDriverState *bs,
     /* remember that we don't need to clear QCOW_OFLAG_COPIED again */
     hash_info->first_logical_sect = first_logical_sect;
 
-    return 0;
+    /* clear the QCOW_OFLAG_COPIED flag from disk */
+    return qcow2_store_insert(bs, &s->key_value_store, hash_info);
 }
 
 /* This function deduplicates a cluster
@@ -534,3 +536,96 @@ exit:
 
     return deduped_clusters_nr * s->cluster_sectors - start_index;
 }
+
+static inline bool is_hash_info_empty(QCowHashInfo *hash_info)
+{
+    return hash_info->physical_sect & QCOW_DEDUP_FLAG_EMPTY;
+}
+
+/* This function stores hash information to disk and RAM
+ *
+ * @hash_info:      the QCowHashInfo to process
+ * @logical_sect:   the logical sector of the cluster seen by the guest
+ * @physical_sect:  the physical sector of the stored cluster
+ * @ret:            0 on success, negative on error
+ */
+static int qcow2_store_hash(BlockDriverState *bs,
+                            QCowHashInfo *hash_info,
+                            uint64_t logical_sect,
+                            uint64_t physical_sect)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    ret = qcow2_store_get(bs, &s->key_value_store, hash_info);
+
+    /* no hash info found in store or error */
+    if (ret <= 0) {
+        return ret;
+    }
+
+    /* the hash info fields are already filled in */
+    if (!is_hash_info_empty(hash_info)) {
+        return 0;
+    }
+
+    /* Remember that this QCowHashInfo represents the first occurrence of the
+     * cluster so we will be able to clear QCOW_OFLAG_COPIED from the L2 table
+     * entry when the refcount goes > 1.
+     */
+    logical_sect = logical_sect | QCOW_OFLAG_COPIED;
+
+    /* fill the missing fields of the hash info */
+    hash_info->physical_sect = physical_sect;
+    hash_info->first_logical_sect = logical_sect;
+
+    /* write the hash into the key value store */
+    return qcow2_store_insert(bs, &s->key_value_store, hash_info);
+}
+
+/* This function stores the hashes of the clusters which are not duplicated
+ *
+ * @ds:            The deduplication state
+ * @count:         the number of dedup hash to process
+ * @logical_sect:  logical offset of the first cluster (in sectors)
+ * @physical_sect: offset of the first cluster (in sectors)
+ * @ret:           0 on success, negative errno on error
+ */
+int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
+                                 QCowDedupState *ds,
+                                 int count,
+                                 uint64_t logical_sect,
+                                 uint64_t physical_sect)
+{
+    int ret = 0;
+    int i = 0;
+    BDRVQcowState *s = bs->opaque;
+    QCowHashElement *dedup_hash, *next_dedup_hash;
+
+    /* round values down to cluster boundaries for easier cluster deletion */
+    logical_sect = logical_sect & ~(s->cluster_sectors - 1);
+    physical_sect = physical_sect & ~(s->cluster_sectors - 1);
+
+    QTAILQ_FOREACH_SAFE(dedup_hash, &ds->undedupables, next, next_dedup_hash) {
+
+        ret = qcow2_store_hash(bs,
+                               &dedup_hash->hash_info,
+                               logical_sect + i * s->cluster_sectors,
+                               physical_sect + i * s->cluster_sectors);
+
+        QTAILQ_REMOVE(&ds->undedupables, dedup_hash, next);
+        g_free(dedup_hash);
+
+        if (ret < 0) {
+            break;
+        }
+
+        i++;
+
+        if (i == count) {
+            break;
+        }
+    }
+
+    return ret;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 3346842..ff26e81 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -742,6 +742,10 @@ int qcow2_dedup(BlockDriverState *bs,
                 uint64_t sector_num,
                 uint8_t *data,
                 int data_nr);
-
+int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
+                                 QCowDedupState *ds,
+                                 int count,
+                                 uint64_t logical_sect,
+                                 uint64_t physical_sect);
 
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 12/24] qcow2: Do allocate on rewrite on the dedup case.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (10 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 11/24] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 13/24] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

This patch makes rewrites allocate a new cluster when deduplication is on.
This gets rid of the need to remove the old hash from the lookup structure
when a cluster gets rewritten.
The old data is left in place and will be collected/deleted when its cluster
refcount reaches 0.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d6db0b9..41c4bc2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -878,7 +878,8 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
     cluster_offset = be64_to_cpu(l2_table[l2_index]);
 
     /* Check how many clusters are already allocated and don't need COW */
-    if (qcow2_get_cluster_type(cluster_offset) == QCOW2_CLUSTER_NORMAL
+    if (!s->has_dedup &&
+        qcow2_get_cluster_type(cluster_offset) == QCOW2_CLUSTER_NORMAL
         && (cluster_offset & QCOW_OFLAG_COPIED))
     {
         /* If a specific host_offset is required, check it */
@@ -1028,7 +1029,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     /* For the moment, overwrite compressed clusters one by one */
     if (entry & QCOW_OFLAG_COMPRESSED) {
         nb_clusters = 1;
-    } else {
+    } else if (!s->has_dedup) {
         nb_clusters = count_cow_clusters(s, nb_clusters, l2_table, l2_index);
     }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 13/24] qcow2: Implement qcow2_compute_cluster_hash.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (11 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 12/24] qcow2: Do allocate on rewrite on the dedup case Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 14/24] qcow2: Load and save deduplication table header extension Benoît Canet
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Also factorize detection of libgnutls with vnc tls.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   17 +++++++++-
 configure           |   86 +++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 79 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index ffbf866..2d2c15c 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -28,6 +28,10 @@
 #include "block/block_int.h"
 #include "qemu-common.h"
 #include "qcow2.h"
+#ifdef CONFIG_SHA256_DEDUP
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
+#endif
 
 /*
  * Prepare a buffer containing everything required to compute cluster
@@ -142,7 +146,18 @@ static int qcow2_compute_cluster_hash(BlockDriverState *bs,
                                        QCowHash *hash,
                                        uint8_t *data)
 {
-    return 0;
+    BDRVQcowState *s = bs->opaque;
+    switch (s->dedup_hash_algo) {
+#ifdef CONFIG_SHA256_DEDUP
+    case QCOW_HASH_SHA256:
+        return gnutls_hash_fast(GNUTLS_DIG_SHA256, data,
+                                s->cluster_size, hash->data);
+#endif
+    default:
+        error_report("Invalid deduplication hash algorithm %i",
+                     s->dedup_hash_algo);
+        abort();
+    }
 }
 
 /*
diff --git a/configure b/configure
index e818e8b..34b26b2 100755
--- a/configure
+++ b/configure
@@ -240,6 +240,7 @@ gtk=""
 gtkabi="2.0"
 tpm="no"
 libssh2=""
+sha256_dedup="yes"
 
 # parse CC options first
 for opt do
@@ -930,6 +931,10 @@ for opt do
   ;;
   --enable-libssh2) libssh2="yes"
   ;;
+  --disable-sha256-dedup) sha256_dedup="no"
+  ;;
+  --enable-sha256-dedup) sha256_dedup="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1190,6 +1195,8 @@ echo "  --with-coroutine=BACKEND coroutine backend. Supported options:"
 echo "                           gthread, ucontext, sigaltstack, windows"
 echo "  --enable-glusterfs       enable GlusterFS backend"
 echo "  --disable-glusterfs      disable GlusterFS backend"
+echo "  --disable-sha256-dedup   disable sha256 dedup"
+echo "  --enable-sha256-dedup    enable sha256 dedup"
 echo "  --enable-gcov            enable test coverage analysis with gcov"
 echo "  --gcov=GCOV              use specified gcov [$gcov_tool]"
 echo "  --enable-tpm             enable TPM support"
@@ -1812,32 +1819,60 @@ EOF
 fi
 
 ##########################################
-# VNC TLS/WS detection
-if test "$vnc" = "yes" -a \( "$vnc_tls" != "no" -o "$vnc_ws" != "no" \) ; then
-  cat > $TMPC <<EOF
+# gnutls detection (factorize the VNC TLS and SHA256 deduplication test)
+cat > $TMPC <<EOF
 #include <gnutls/gnutls.h>
-int main(void) { gnutls_session_t s; gnutls_init(&s, GNUTLS_SERVER); return 0; }
+#include <gnutls/crypto.h>
+int main(void) {char data[4096], digest[32];
+gnutls_hash_fast(GNUTLS_DIG_SHA256, data, 4096, digest);
+return 0;
+}
 EOF
-  vnc_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
-  vnc_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
-  if compile_prog "$vnc_tls_cflags" "$vnc_tls_libs" ; then
-    if test "$vnc_tls" != "no" ; then
-      vnc_tls=yes
-    fi
-    if test "$vnc_ws" != "no" ; then
-      vnc_ws=yes
-    fi
-    libs_softmmu="$vnc_tls_libs $libs_softmmu"
-    QEMU_CFLAGS="$QEMU_CFLAGS $vnc_tls_cflags"
+gnu_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
+gnu_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
+if compile_prog "$gnu_tls_cflags" "$gnu_tls_libs" ; then
+  gnu_tls=yes
+else
+  gnu_tls=no
+fi
+
+##########################################
+# VNC TLS/WS
+if test "$vnc" = "yes" -a "$gnu_tls" != "no"; then
+  if test "$vnc_tls" != "no" ; then
+    libs_softmmu="$gnu_tls_libs $libs_softmmu"
+    libs_tools="$gnu_tls_libs $libs_softmmu"
+    QEMU_CFLAGS="$QEMU_CFLAGS $gnu_tls_cflags"
+    vnc_tls=yes
+  fi
+  if test "$vnc_ws" != "no" ; then
+    libs_softmmu="$gnu_tls_libs $libs_softmmu"
+    libs_tools="$gnu_tls_libs $libs_softmmu"
+    QEMU_CFLAGS="$QEMU_CFLAGS $gnu_tls_cflags"
+    vnc_ws=yes
+  fi
+else
+  if test "$vnc_tls" = "yes" ; then
+    feature_not_found "vnc-tls"
+  fi
+  if test "$vnc_ws" = "yes" ; then
+    feature_not_found "vnc-ws"
+  fi
+  vnc_tls=no
+  vnc_ws=no
+fi
+
+##########################################
+# SHA256 deduplication
+if test "$sha256_dedup" = "yes"; then
+  if test "$gnu_tls" = "yes"; then
+    libs_softmmu="$gnu_tls_libs $libs_softmmu"
+    libs_tools="$gnu_tls_libs $libs_softmmu"
+    QEMU_CFLAGS="$QEMU_CFLAGS $gnu_tls_cflags"
+    sha256_dedup=yes
   else
-    if test "$vnc_tls" = "yes" ; then
-      feature_not_found "vnc-tls"
-    fi
-    if test "$vnc_ws" = "yes" ; then
-      feature_not_found "vnc-ws"
-    fi
-    vnc_tls=no
-    vnc_ws=no
+    echo "gnutls > 2.10.0 required to compile QEMU with sha256 deduplication"
+    exit 1
   fi
 fi
 
@@ -3562,6 +3597,7 @@ echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine"
 echo "GlusterFS support $glusterfs"
 echo "virtio-blk-data-plane $virtio_blk_data_plane"
+echo "sha256-dedup      $sha256_dedup"
 echo "gcov              $gcov_tool"
 echo "gcov enabled      $gcov"
 echo "TPM support       $tpm"
@@ -3951,6 +3987,10 @@ if test "$virtio_blk_data_plane" = "yes" ; then
   echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' >> $config_host_mak
 fi
 
+if test "$sha256_dedup" = "yes" ; then
+  echo "CONFIG_SHA256_DEDUP=y" >> $config_host_mak
+fi
+
 # USB host support
 case "$usb" in
 linux)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 14/24] qcow2: Load and save deduplication table header extension.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (12 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 13/24] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 15/24] qcow2: Extract qcow2_set_incompat_feature and qcow2_clear_incompat_feature Benoît Canet
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 34b2a87..3cd1051 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -54,9 +54,19 @@ typedef struct {
     uint32_t len;
 } QCowExtension;
 
+typedef struct {
+    uint64_t offset;
+    uint32_t size;
+    uint32_t max;
+    uint32_t flags;
+    uint8_t  hash_algo;
+    char     reserved[56];
+} QCowDedupConfExtension;
+
 #define  QCOW2_EXT_MAGIC_END 0
 #define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
 #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
+#define  QCOW2_EXT_MAGIC_DEDUP_TABLE 0xCD8E819B
 
 static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
@@ -85,6 +95,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
     QCowExtension ext;
     uint64_t offset;
     int ret;
+    QCowDedupConfExtension dedup_conf_ext;
 
 #ifdef DEBUG_EXT
     printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
@@ -149,6 +160,28 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
             }
             break;
 
+        case QCOW2_EXT_MAGIC_DEDUP_TABLE:
+            if (ext.len > sizeof(dedup_conf_ext)) {
+                fprintf(stderr, "ERROR: dedup_conf_ext: len=%u too large"
+                        " (>%zu)\n",
+                        ext.len, sizeof(dedup_conf_ext));
+                return 2;
+            }
+            ret = bdrv_pread(bs->file, offset,
+                             &dedup_conf_ext, ext.len);
+            if (ret < 0) {
+                return ret;
+            }
+            s->dedup_conf_offset =
+                be64_to_cpu(dedup_conf_ext.offset);
+            s->dedup_conf_size =
+                be32_to_cpu(dedup_conf_ext.size);
+            s->dedup_max_incarnations =
+                be32_to_cpu(dedup_conf_ext.max);
+            s->dedup_hash_algo = dedup_conf_ext.hash_algo;
+            s->dedup_dirty = be32_to_cpu(dedup_conf_ext.flags) &
+                             QCOW_DEDUP_DIRTY;
+            break;
+
         default:
             /* unknown magic - save it in case we need to rewrite the header */
             {
@@ -1006,6 +1039,7 @@ int qcow2_update_header(BlockDriverState *bs)
     uint32_t refcount_table_clusters;
     size_t header_length;
     Qcow2UnknownHeaderExtension *uext;
+    QCowDedupConfExtension dedup_conf_ext;
 
     buf = qemu_blockalign(bs, buflen);
 
@@ -1109,6 +1143,25 @@ int qcow2_update_header(BlockDriverState *bs)
     buf += ret;
     buflen -= ret;
 
+    if (s->has_dedup) {
+        memset(&dedup_conf_ext, 0, sizeof(dedup_conf_ext));
+        dedup_conf_ext.offset    = cpu_to_be64(s->dedup_conf_offset);
+        dedup_conf_ext.size      = cpu_to_be32(s->dedup_conf_size);
+        dedup_conf_ext.max       = cpu_to_be32(s->dedup_max_incarnations);
+        dedup_conf_ext.hash_algo = s->dedup_hash_algo;
+        dedup_conf_ext.flags     = cpu_to_be32(s->dedup_dirty ?
+                                               QCOW_DEDUP_DIRTY : 0);
+        ret = header_ext_add(buf,
+                             QCOW2_EXT_MAGIC_DEDUP_TABLE,
+                             &dedup_conf_ext,
+                             sizeof(dedup_conf_ext),
+                             buflen);
+        if (ret < 0) {
+            goto fail;
+        }
+        buf += ret;
+        buflen -= ret;
+    }
+
     /* Keep unknown header extensions */
     QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
         ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
-- 
1.7.10.4

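All multi-byte fields of the extension travel through cpu_to_be64()/cpu_to_be32() before they reach the disk and back through be64_to_cpu()/be32_to_cpu() on load. Below is a minimal, self-contained sketch of that round trip, with portable helpers standing in for QEMU's byte-swap macros; the 77-byte packed layout is an assumption mirroring the QCowDedupConfExtension fields (the real code writes the C struct directly, padding included):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-ins for cpu_to_be64()/be64_to_cpu(): store and load a
 * value in big-endian byte order regardless of host endianness. */
static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

static void store_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (24 - 8 * i));
    }
}

static uint32_t load_be32(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Assumed packed layout mirroring QCowDedupConfExtension: 8-byte offset,
 * then size, max and flags (4 bytes each), a 1-byte hash algorithm and
 * 56 reserved bytes -> 77 bytes in total. */
#define DEDUP_EXT_SIZE 77

static void dedup_ext_pack(uint8_t *buf, uint64_t offset, uint32_t size,
                           uint32_t max, uint32_t flags, uint8_t hash_algo)
{
    memset(buf, 0, DEDUP_EXT_SIZE);   /* reserved bytes stay zero */
    store_be64(buf, offset);
    store_be32(buf + 8, size);
    store_be32(buf + 12, max);
    store_be32(buf + 16, flags);
    buf[20] = hash_algo;              /* single byte, no swapping needed */
}
```

Reading the fields back with the load helpers is the mirror image of what qcow2_read_extensions() does with be64_to_cpu()/be32_to_cpu() above.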
* [Qemu-devel] [RFC V8 15/24] qcow2: Extract qcow2_set_incompat_feature and qcow2_clear_incompat_feature.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (13 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 14/24] qcow2: Load and save deduplication table header extension Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 16/24] block: Add qcow2_dedup format and image creation code Benoît Canet
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Also change callers.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-cluster.c |    2 +-
 block/qcow2.c         |   43 ++++++++++++++++++++++---------------------
 block/qcow2.h         |    7 ++++---
 3 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 41c4bc2..7cd6e75 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -672,7 +672,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
 
     /* Update L2 table. */
     if (s->use_lazy_refcounts) {
-        qcow2_mark_dirty(bs);
+        qcow2_set_incompat_feature(bs, QCOW2_INCOMPAT_DIRTY);
     }
     if (qcow2_need_accurate_refcounts(s)) {
         qcow2_cache_set_dependency(bs, s->l2_table_cache,
diff --git a/block/qcow2.c b/block/qcow2.c
index 3cd1051..58e6236 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -250,56 +250,57 @@ static void report_unsupported_feature(BlockDriverState *bs,
 }
 
 /*
- * Sets the dirty bit and flushes afterwards if necessary.
+ * Sets an incompatible feature bit and flushes afterwards if necessary.
  *
  * The incompatible_features bit is only set if the image file header was
  * updated successfully.  Therefore it is not required to check the return
  * value of this function.
  */
-int qcow2_mark_dirty(BlockDriverState *bs)
+int qcow2_set_incompat_feature(BlockDriverState *bs,
+                               QCow2IncompatibleFeature feature)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t val;
-    int ret;
+    int ret = 0;
 
     assert(s->qcow_version >= 3);
 
-    if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
-        return 0; /* already dirty */
+    if (s->incompatible_features & feature) {
+        return 0; /* already added */
     }
 
-    val = cpu_to_be64(s->incompatible_features | QCOW2_INCOMPAT_DIRTY);
+    val = cpu_to_be64(s->incompatible_features | feature);
     ret = bdrv_pwrite(bs->file, offsetof(QCowHeader, incompatible_features),
                       &val, sizeof(val));
     if (ret < 0) {
         return ret;
     }
-    ret = bdrv_flush(bs->file);
-    if (ret < 0) {
-        return ret;
-    }
 
-    /* Only treat image as dirty if the header was updated successfully */
-    s->incompatible_features |= QCOW2_INCOMPAT_DIRTY;
+    /* Only treat image as having the feature if the header was updated
+     * successfully
+     */
+    s->incompatible_features |= feature;
     return 0;
 }
 
 /*
- * Clears the dirty bit and flushes before if necessary.  Only call this
- * function when there are no pending requests, it does not guard against
- * concurrent requests dirtying the image.
+ * Clears an incompatible feature bit and flushes before if necessary.
+ * Only call this function when there are no pending requests, it does not
+ * guard against concurrent requests adding a feature to the image.
  */
-static int qcow2_mark_clean(BlockDriverState *bs)
+static int qcow2_clear_incompat_feature(BlockDriverState *bs,
+                                        QCow2IncompatibleFeature feature)
 {
     BDRVQcowState *s = bs->opaque;
+    int ret = 0;
 
-    if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
-        int ret = bdrv_flush(bs);
+    if (s->incompatible_features & feature) {
+        ret = bdrv_flush(bs);
         if (ret < 0) {
             return ret;
         }
 
-        s->incompatible_features &= ~QCOW2_INCOMPAT_DIRTY;
+        s->incompatible_features &= ~feature;
         return qcow2_update_header(bs);
     }
     return 0;
@@ -314,7 +315,7 @@ static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
     }
 
     if (fix && result->check_errors == 0 && result->corruptions == 0) {
-        return qcow2_mark_clean(bs);
+        return qcow2_clear_incompat_feature(bs, QCOW2_INCOMPAT_DIRTY);
     }
     return ret;
 }
@@ -949,7 +950,7 @@ static void qcow2_close(BlockDriverState *bs)
     qcow2_cache_flush(bs, s->l2_table_cache);
     qcow2_cache_flush(bs, s->refcount_block_cache);
 
-    qcow2_mark_clean(bs);
+    qcow2_clear_incompat_feature(bs, QCOW2_INCOMPAT_DIRTY);
 
     qcow2_cache_destroy(bs, s->l2_table_cache);
     qcow2_cache_destroy(bs, s->refcount_block_cache);
diff --git a/block/qcow2.h b/block/qcow2.h
index ff26e81..720131d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -290,14 +290,14 @@ enum {
 };
 
 /* Incompatible feature bits */
-enum {
+typedef enum {
     QCOW2_INCOMPAT_DIRTY_BITNR   = 0,
     QCOW2_INCOMPAT_DIRTY         = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
     QCOW2_INCOMPAT_DEDUP_BITNR   = 1,
     QCOW2_INCOMPAT_DEDUP         = 1 << QCOW2_INCOMPAT_DEDUP_BITNR,
 
     QCOW2_INCOMPAT_MASK          = QCOW2_INCOMPAT_DIRTY | QCOW2_INCOMPAT_DEDUP,
-};
+} QCow2IncompatibleFeature;
 
 /* Compatible feature bits */
 enum {
@@ -543,7 +543,8 @@ static inline uint64_t l2meta_cow_end(QCowL2Meta *m)
 int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
                   int64_t sector_num, int nb_sectors);
 
-int qcow2_mark_dirty(BlockDriverState *bs);
+int qcow2_set_incompat_feature(BlockDriverState *bs,
+                               QCow2IncompatibleFeature feature);
 int qcow2_update_header(BlockDriverState *bs);
 int qcow2_read_cluster_data(BlockDriverState *bs,
                             uint8_t *data,
-- 
1.7.10.4

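The ordering guarantee documented above — the in-memory bit is only set once the header write succeeded — is what lets callers ignore the return value. A toy model of that contract, with a fake image standing in for BlockDriverState (FakeImage and the simulated I/O failure are inventions for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bits as defined in the patch. */
enum {
    QCOW2_INCOMPAT_DIRTY = 1 << 0,
    QCOW2_INCOMPAT_DEDUP = 1 << 1,
};

/* in_memory models s->incompatible_features; header models the copy in
 * the image header; fail_header_write simulates an I/O error. */
typedef struct {
    uint64_t in_memory;
    uint64_t header;
    bool fail_header_write;
} FakeImage;

/* Mirrors qcow2_set_incompat_feature(): setting is idempotent, and the
 * in-memory bit is only flipped after the header "write" succeeded. */
static int set_incompat_feature(FakeImage *img, uint64_t feature)
{
    if (img->in_memory & feature) {
        return 0; /* already set */
    }
    if (img->fail_header_write) {
        return -5; /* -EIO: header untouched, in-memory bit not set */
    }
    img->header = img->in_memory | feature;
    img->in_memory |= feature;
    return 0;
}

/* Mirrors qcow2_clear_incompat_feature(): only touches the header when
 * the bit was actually set. */
static int clear_incompat_feature(FakeImage *img, uint64_t feature)
{
    if (img->in_memory & feature) {
        img->in_memory &= ~feature;
        img->header = img->in_memory;
    }
    return 0;
}
```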
* [Qemu-devel] [RFC V8 16/24] block: Add qcow2_dedup format and image creation code.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (14 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 15/24] qcow2: Extract qcow2_set_incompat_feature and qcow2_clear_incompat_feature Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 17/24] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Also update the qemu-iotests common.rc filter.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c                |  176 ++++++++++++++++++++++++++++++++++++++++--
 include/block/block_int.h    |    1 +
 tests/qemu-iotests/common.rc |    3 +-
 3 files changed, 173 insertions(+), 7 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 58e6236..e1265a2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1313,7 +1313,8 @@ static int preallocate(BlockDriverState *bs)
 static int qcow2_create2(const char *filename, int64_t total_size,
                          const char *backing_file, const char *backing_format,
                          int flags, size_t cluster_size, int prealloc,
-                         QEMUOptionParameter *options, int version)
+                         QEMUOptionParameter *options, int version,
+                         bool dedup, uint8_t hash_algo)
 {
     /* Calculate cluster_bits */
     int cluster_bits;
@@ -1417,11 +1418,39 @@ static int qcow2_create2(const char *filename, int64_t total_size,
     }
 
     /* Okay, now that we have a valid image, let's give it the right size */
+    BDRVQcowState *s = bs->opaque;
     ret = bdrv_truncate(bs, total_size * BDRV_SECTOR_SIZE);
     if (ret < 0) {
         goto out;
     }
 
+    if (dedup) {
+        s->has_dedup = true;
+        /* fill cluster size for qcow2_store_create */
+        s->cluster_size = cluster_size;
+        /* unlimited incarnation number */
+        s->dedup_max_incarnations = 0;
+        s->dedup_hash_algo = hash_algo;
+
+        ret = qcow2_set_incompat_feature(bs, QCOW2_INCOMPAT_DEDUP);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* This will create the journal, save the configuration to disk
+         * and then save the header extension.
+         */
+        ret = qcow2_store_create(bs,
+                                 &s->key_value_store,
+                                 total_size * BDRV_SECTOR_SIZE);
+
+        if (ret < 0) {
+            goto out;
+        }
+
+        qcow2_store_cleanup(bs, &s->key_value_store);
+    }
+
     /* Want a backing file? There you go.*/
     if (backing_file) {
         ret = bdrv_change_backing_file(bs, backing_file, backing_format);
@@ -1447,15 +1476,41 @@ out:
     return ret;
 }
 
+static int qcow2_warn_if_version_3_is_needed(int version,
+                                             bool has_feature,
+                                             const char *feature)
+{
+    if (version < 3 && has_feature) {
+        fprintf(stderr, "%s only supported with compatibility "
+                "level 1.1 and above (use compat=1.1 or greater)\n",
+                feature);
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static int8_t qcow2_get_dedup_hash_algo(char *value)
+{
+    if (!value || !strcmp(value, "sha256")) {
+        return QCOW_HASH_SHA256;
+    }
+
+    error_printf("Unsupported deduplication hash algorithm.\n");
+    return -EINVAL;
+}
+
 static int qcow2_create(const char *filename, QEMUOptionParameter *options)
 {
     const char *backing_file = NULL;
     const char *backing_fmt = NULL;
     uint64_t sectors = 0;
     int flags = 0;
+    int ret;
     size_t cluster_size = DEFAULT_CLUSTER_SIZE;
     int prealloc = 0;
     int version = 2;
+    bool dedup = false;
+    int8_t hash_algo = 0;
 
     /* Read out options */
     while (options && options->name) {
@@ -1493,6 +1548,13 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
             }
         } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
             flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
+        } else if (!strcmp(options->name, BLOCK_OPT_DEDUP)) {
+            hash_algo = qcow2_get_dedup_hash_algo(options->value.s);
+            if (hash_algo < 0) {
+                return hash_algo;
+            }
+            dedup = true;
+            version = 3;
         }
         options++;
     }
@@ -1503,14 +1565,22 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
         return -EINVAL;
     }
 
-    if (version < 3 && (flags & BLOCK_FLAG_LAZY_REFCOUNTS)) {
-        fprintf(stderr, "Lazy refcounts only supported with compatibility "
-                "level 1.1 and above (use compat=1.1 or greater)\n");
-        return -EINVAL;
+    ret = qcow2_warn_if_version_3_is_needed(version,
+                                            flags & BLOCK_FLAG_LAZY_REFCOUNTS,
+                                            "Lazy refcounts");
+    if (ret < 0) {
+        return ret;
+    }
+    ret = qcow2_warn_if_version_3_is_needed(version,
+                                            dedup,
+                                            "Deduplication");
+    if (ret < 0) {
+        return ret;
     }
 
     return qcow2_create2(filename, sectors, backing_file, backing_fmt, flags,
-                         cluster_size, prealloc, options, version);
+                         cluster_size, prealloc, options, version,
+                         dedup, hash_algo);
 }
 
 static int qcow2_make_empty(BlockDriverState *bs)
@@ -1829,6 +1899,51 @@ static QEMUOptionParameter qcow2_create_options[] = {
     { NULL }
 };
 
+static QEMUOptionParameter qcow2_dedup_create_options[] = {
+    {
+        .name = BLOCK_OPT_SIZE,
+        .type = OPT_SIZE,
+        .help = "Virtual disk size"
+    },
+    {
+        .name = BLOCK_OPT_BACKING_FILE,
+        .type = OPT_STRING,
+        .help = "File name of a base image"
+    },
+    {
+        .name = BLOCK_OPT_BACKING_FMT,
+        .type = OPT_STRING,
+        .help = "Image format of the base image"
+    },
+    {
+        .name = BLOCK_OPT_ENCRYPT,
+        .type = OPT_FLAG,
+        .help = "Encrypt the image"
+    },
+    {
+        .name = BLOCK_OPT_CLUSTER_SIZE,
+        .type = OPT_SIZE,
+        .help = "qcow2 cluster size",
+        .value = { .n = DEFAULT_DEDUP_CLUSTER_SIZE },
+    },
+    {
+        .name = BLOCK_OPT_PREALLOC,
+        .type = OPT_STRING,
+        .help = "Preallocation mode (allowed values: off, metadata)"
+    },
+    {
+        .name = BLOCK_OPT_LAZY_REFCOUNTS,
+        .type = OPT_FLAG,
+        .help = "Postpone refcount updates",
+    },
+    {
+        .name = BLOCK_OPT_DEDUP,
+        .type = OPT_STRING,
+        .help = "Deduplication",
+    },
+    { NULL }
+};
+
 static BlockDriver bdrv_qcow2 = {
     .format_name        = "qcow2",
     .instance_size      = sizeof(BDRVQcowState),
@@ -1868,9 +1983,58 @@ static BlockDriver bdrv_qcow2 = {
     .bdrv_check = qcow2_check,
 };
 
+/* As all the defined .create_options are passed to qcow2_create() even if
+ * the user does not specify them, it's not possible to have a default 4KB
+ * cluster size for deduplication.
+ * For example, it's impossible to tell the difference between the default
+ * 64KB cluster size create option of qcow2 and a user-specified 64KB
+ * cluster size.
+ * So we declare the qcow2_dedup format in order to be able to define
+ * deduplication-specific create options.
+ * It will also help with qemu-iotests integration.
+ */
+static BlockDriver bdrv_qcow2_dedup = {
+    .format_name        = "qcow2_dedup",
+    .instance_size      = sizeof(BDRVQcowState),
+    .bdrv_probe         = qcow2_probe,
+    .bdrv_open          = qcow2_open,
+    .bdrv_close         = qcow2_close,
+    .bdrv_reopen_prepare  = qcow2_reopen_prepare,
+    .bdrv_create        = qcow2_create,
+    .bdrv_co_is_allocated = qcow2_co_is_allocated,
+    .bdrv_set_key       = qcow2_set_key,
+    .bdrv_make_empty    = qcow2_make_empty,
+
+    .bdrv_co_readv          = qcow2_co_readv,
+    .bdrv_co_writev         = qcow2_co_writev,
+    .bdrv_co_flush_to_os    = qcow2_co_flush_to_os,
+
+    .bdrv_co_write_zeroes   = qcow2_co_write_zeroes,
+    .bdrv_co_discard        = qcow2_co_discard,
+    .bdrv_truncate          = qcow2_truncate,
+    .bdrv_write_compressed  = qcow2_write_compressed,
+
+    .bdrv_snapshot_create   = qcow2_snapshot_create,
+    .bdrv_snapshot_goto     = qcow2_snapshot_goto,
+    .bdrv_snapshot_delete   = qcow2_snapshot_delete,
+    .bdrv_snapshot_list     = qcow2_snapshot_list,
+    .bdrv_snapshot_load_tmp     = qcow2_snapshot_load_tmp,
+    .bdrv_get_info      = qcow2_get_info,
+
+    .bdrv_save_vmstate    = qcow2_save_vmstate,
+    .bdrv_load_vmstate    = qcow2_load_vmstate,
+
+    .bdrv_change_backing_file   = qcow2_change_backing_file,
+
+    .bdrv_invalidate_cache      = qcow2_invalidate_cache,
+
+    .create_options = qcow2_dedup_create_options,
+    .bdrv_check = qcow2_check,
+};
+
 static void bdrv_qcow2_init(void)
 {
     bdrv_register(&bdrv_qcow2);
+    bdrv_register(&bdrv_qcow2_dedup);
 }
 
 block_init(bdrv_qcow2_init);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6078dd3..ef4d5d5 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -57,6 +57,7 @@
 #define BLOCK_OPT_COMPAT_LEVEL      "compat"
 #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
 #define BLOCK_OPT_ADAPTER_TYPE      "adapter_type"
+#define BLOCK_OPT_DEDUP             "dedup"
 
 typedef struct BdrvTrackedRequest BdrvTrackedRequest;
 
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 31eb62b..71b67f7 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -130,7 +130,8 @@ _make_test_img()
             -e "s# zeroed_grain=\\(on\\|off\\)##g" \
             -e "s# subformat='[^']*'##g" \
             -e "s# adapter_type='[^']*'##g" \
-            -e "s# lazy_refcounts=\\(on\\|off\\)##g"
+            -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
+            -e "s# dedup=\\('sha256'\\|'skein'\\|'sha3'\\)##g"
 
     # Start an NBD server on the image file, which is what we'll be talking to
     if [ $IMGPROTO = "nbd" ]; then
-- 
1.7.10.4

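The dedup= create option is parsed by qcow2_get_dedup_hash_algo(): a missing value defaults to SHA-256 and anything unrecognized is rejected. A standalone sketch of that parsing — the QCOW_HASH_SHA256 value and the numeric -EINVAL constant are assumptions here:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed to match the QCOW_HASH_SHA256 constant used in the series. */
enum { QCOW_HASH_SHA256 = 0 };

#define NEG_EINVAL (-22) /* -EINVAL on Linux */

/* Mirrors qcow2_get_dedup_hash_algo() above: NULL (option given without
 * a value) and "sha256" both select SHA-256; anything else is an error. */
static int get_dedup_hash_algo(const char *value)
{
    if (!value || !strcmp(value, "sha256")) {
        return QCOW_HASH_SHA256;
    }
    return NEG_EINVAL; /* unsupported deduplication hash algorithm */
}
```

The caller only needs to test for a negative result, exactly as qcow2_create() does with `if (hash_algo < 0)`.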
* [Qemu-devel] [RFC V8 17/24] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (15 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 16/24] block: Add qcow2_dedup format and image creation code Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 18/24] qcow2: Remove hash when cluster is deleted Benoît Canet
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

A new physical cluster with the same hash value will be used for further
occurrences of this hash.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 2d2c15c..da4ad5c 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -305,12 +305,24 @@ static int qcow2_deduplicate_cluster(BlockDriverState *bs,
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t cluster_index = hash_info->physical_sect / s->cluster_sectors;
-    int ret = 0;
+    int refcount, ret = 0;
 
     /* Increment the refcount of the cluster */
-    ret = qcow2_update_cluster_refcount(bs,
-                                        cluster_index,
-                                        1);
+    refcount = qcow2_update_cluster_refcount(bs,
+                                             cluster_index,
+                                             1);
+
+    if (refcount < 0) {
+        return refcount;
+    }
+
+    /* If we have reached half the maximum refcount, delete the QCowHashInfo
+     * from the store so that the next time this cluster is seen it will be
+     * handled as new.
+     */
+    if (refcount >= 0xFFFF/2) {
+        ret = qcow2_store_delete(bs, &s->key_value_store, hash_info);
+    }
 
     if (ret < 0) {
         return ret;
-- 
1.7.10.4

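Since qcow2 refcounts are 16-bit, this patch stops deduplicating against a physical cluster once its refcount reaches half of 0xFFFF, leaving ample headroom before the counter could overflow. The threshold test in isolation (a sketch of the `refcount >= 0xFFFF/2` check in the hunk above):

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the hash should be dropped from the key-value store:
 * the cluster's refcount, after the increment done by deduplication, has
 * reached half of the 16-bit maximum.  Subsequent writes of the same data
 * then allocate a fresh physical cluster with its own hash entry. */
static bool should_drop_hash(int refcount_after_increment)
{
    return refcount_after_increment >= 0xFFFF / 2;
}
```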
* [Qemu-devel] [RFC V8 18/24] qcow2: Remove hash when cluster is deleted.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (16 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 17/24] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 19/24] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c    |   45 +++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2-refcount.c |    3 +++
 block/qcow2.h          |    2 ++
 3 files changed, 50 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index da4ad5c..599cb2e 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -656,3 +656,48 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
 
     return ret;
 }
+
+/* Remove the hash of a given cluster once its refcount has reached zero
+ *
+ * @cluster_index: the index of the physical cluster
+ */
+void qcow2_dedup_destroy_hash(BlockDriverState *bs,
+                              uint64_t cluster_index)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t offset = cluster_index * s->cluster_size;
+    QCowHashInfo hash_info;
+    uint8_t *buf;
+    int ret = 0;
+
+    /* allocate buffer */
+    buf = qemu_blockalign(bs, s->cluster_size);
+
+    /* read cluster from disk */
+    ret = bdrv_pread(bs->file, offset, buf, s->cluster_size);
+
+    /* error */
+    if (ret < 0) {
+        goto free_exit;
+    }
+
+    /* clear hash info */
+    memset(&hash_info, 0, sizeof(QCowHashInfo));
+
+    /* compute hash for the cluster */
+    ret = qcow2_compute_cluster_hash(bs,
+                                     &hash_info.hash,
+                                     buf);
+
+    /* error */
+    if (ret < 0) {
+        goto free_exit;
+    }
+
+    /* delete hash from key value store. It will not be deduplicated anymore */
+    qcow2_store_delete(bs, &s->key_value_store, &hash_info);
+
+free_exit:
+    qemu_vfree(buf);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3bd8f37..2734cd9 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -482,6 +482,9 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
             ret = -EINVAL;
             goto fail;
         }
+        if (s->has_dedup && refcount == 0) {
+            qcow2_dedup_destroy_hash(bs, cluster_index);
+        }
         if (refcount == 0 && cluster_index < s->free_cluster_index) {
             s->free_cluster_index = cluster_index;
         }
diff --git a/block/qcow2.h b/block/qcow2.h
index 720131d..6f85e03 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -748,5 +748,7 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
                                  int count,
                                  uint64_t logical_sect,
                                  uint64_t physical_sect);
+void qcow2_dedup_destroy_hash(BlockDriverState *bs,
+                              uint64_t cluster_index);
 
 #endif
-- 
1.7.10.4

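The hook this patch adds to update_refcount() ties hash lifetime to cluster lifetime: when a refcount drops to zero, the hash must leave the store, otherwise a later write could be deduplicated against a freed cluster. A toy model of that flow, where in-memory arrays stand in for the refcount table and the key-value store:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NB_CLUSTERS 8

static int refcounts[NB_CLUSTERS];       /* stand-in refcount table */
static bool hash_present[NB_CLUSTERS];   /* stand-in key-value store */

/* Models qcow2_dedup_destroy_hash(): forget the cluster's hash so it
 * will no longer be a deduplication target. */
static void dedup_destroy_hash(int cluster_index)
{
    hash_present[cluster_index] = false;
}

/* Models the hunk above inside update_refcount(): apply the addend,
 * reject negative results, and drop the hash when dedup is enabled and
 * the refcount just reached zero. */
static int update_refcount(int cluster_index, int addend, bool has_dedup)
{
    int refcount = refcounts[cluster_index] + addend;
    if (refcount < 0) {
        return -22; /* -EINVAL, as in the original range check */
    }
    refcounts[cluster_index] = refcount;
    if (has_dedup && refcount == 0) {
        dedup_destroy_hash(cluster_index);
    }
    return refcount;
}
```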
* [Qemu-devel] [RFC V8 19/24] qcow2: Integrate deduplication in qcow2_co_writev loop.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (17 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 18/24] qcow2: Remove hash when cluster is deleted Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 20/24] qcow2: Serialize write requests when deduplication is activated Benoît Canet
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c |   88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 86 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index e1265a2..8eb63f1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -342,6 +342,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags)
     Error *local_err = NULL;
     uint64_t ext_end;
 
+    s->has_dedup = false;
     ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
     if (ret < 0) {
         goto fail;
@@ -821,13 +822,17 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
     BDRVQcowState *s = bs->opaque;
     int index_in_cluster;
     int n_end;
-    int ret;
+    int ret = 0;
     int cur_nr_sectors; /* number of sectors in current iteration */
     uint64_t cluster_offset;
     QEMUIOVector hd_qiov;
     uint64_t bytes_done = 0;
     uint8_t *cluster_data = NULL;
     QCowL2Meta *l2meta = NULL;
+    uint8_t *dedup_cluster_data = NULL;
+    int dedup_cluster_data_nr;
+    int deduped_sectors_nr;
+    QCowDedupState ds;
 
     trace_qcow2_writev_start_req(qemu_coroutine_self(), sector_num,
                                  remaining_sectors);
@@ -838,13 +843,69 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
 
     qemu_co_mutex_lock(&s->lock);
 
+    if (s->has_dedup) {
+        QTAILQ_INIT(&ds.undedupables);
+        ds.phash.reuse = false;
+        ds.nb_undedupable_sectors = 0;
+        ds.nb_clusters_processed = 0;
+
+        /* If deduplication is on, make sure dedup_cluster_data contains
+         * a multiple of the cluster size worth of data so the hashes
+         * can be computed.
+         */
+        ret = qcow2_dedup_read_missing_and_concatenate(bs,
+                                                       qiov,
+                                                       sector_num,
+                                                       remaining_sectors,
+                                                       &dedup_cluster_data,
+                                                       &dedup_cluster_data_nr);
+
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
     while (remaining_sectors != 0) {
 
         l2meta = NULL;
 
         trace_qcow2_writev_start_part(qemu_coroutine_self());
+
+        if (s->has_dedup && ds.nb_undedupable_sectors == 0) {
+            /* Try to deduplicate as many clusters as possible */
+            deduped_sectors_nr = qcow2_dedup(bs,
+                                             &ds,
+                                             sector_num,
+                                             dedup_cluster_data,
+                                             dedup_cluster_data_nr);
+
+            if (deduped_sectors_nr < 0) {
+                goto fail;
+            }
+
+            remaining_sectors -= deduped_sectors_nr;
+            sector_num += deduped_sectors_nr;
+            bytes_done += deduped_sectors_nr * 512;
+
+            /* no more data to write -> exit */
+            if (remaining_sectors <= 0) {
+                break;
+            }
+
+            /* if we deduplicated something, trace it */
+            if (deduped_sectors_nr) {
+                trace_qcow2_writev_done_part(qemu_coroutine_self(),
+                                             deduped_sectors_nr);
+                trace_qcow2_writev_start_part(qemu_coroutine_self());
+            }
+        }
+
         index_in_cluster = sector_num & (s->cluster_sectors - 1);
-        n_end = index_in_cluster + remaining_sectors;
+        n_end = s->has_dedup &&
+                ds.nb_undedupable_sectors < remaining_sectors ?
+                index_in_cluster + ds.nb_undedupable_sectors :
+                index_in_cluster + remaining_sectors;
+
         if (s->crypt_method &&
             n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors) {
             n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
@@ -912,6 +973,28 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
             l2meta = next;
         }
 
+        /* Write the hashes of the non-deduplicated clusters to disk */
+        if (s->has_dedup) {
+            int count = cur_nr_sectors / s->cluster_sectors;
+            int has_ending = ((cluster_offset >> 9) + index_in_cluster +
+                             cur_nr_sectors) & (s->cluster_sectors - 1);
+            if (index_in_cluster) {
+                count++;
+            }
+            if (has_ending) {
+                count++;
+            }
+            ret = qcow2_dedup_store_new_hashes(bs,
+                                               &ds,
+                                               count,
+                                               sector_num,
+                                               (cluster_offset >> 9));
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+
+        ds.nb_undedupable_sectors -= cur_nr_sectors;
         remaining_sectors -= cur_nr_sectors;
         sector_num += cur_nr_sectors;
         bytes_done += cur_nr_sectors * 512;
@@ -937,6 +1020,7 @@ fail:
 
     qemu_iovec_destroy(&hd_qiov);
     qemu_vfree(cluster_data);
+    qemu_vfree(dedup_cluster_data);
     trace_qcow2_writev_done_req(qemu_coroutine_self(), ret);
 
     return ret;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 20/24] qcow2: Serialize write requests when deduplication is activated.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (18 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 19/24] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 21/24] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

This fixes the sub-cluster-sized write race conditions while waiting
for a faster solution.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2.c |   14 ++++++++++++++
 block/qcow2.h |    1 +
 2 files changed, 15 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 8eb63f1..11c115f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -534,6 +534,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags)
 
     /* Initialise locks */
     qemu_co_mutex_init(&s->lock);
+    qemu_co_mutex_init(&s->dedup_lock);
 
     /* Repair image if dirty */
     if (!(flags & BDRV_O_CHECK) && !bs->read_only &&
@@ -841,6 +842,15 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
 
     s->cluster_cache_offset = -1; /* disable compressed cache */
 
+    if (s->has_dedup) {
+        /* This mutex is used to serialize the write requests in the dedup case.
+         * The goal is to prevent the dedup code from processing concurrent
+         * requests to the same clusters and corrupting data.
+         * With qcow2_dedup_read_missing_and_concatenate that would not work.
+         */
+        qemu_co_mutex_lock(&s->dedup_lock);
+    }
+
     qemu_co_mutex_lock(&s->lock);
 
     if (s->has_dedup) {
@@ -1018,6 +1028,10 @@ fail:
         l2meta = next;
     }
 
+    if (s->has_dedup) {
+        qemu_co_mutex_unlock(&s->dedup_lock);
+    }
+
     qemu_iovec_destroy(&hd_qiov);
     qemu_vfree(cluster_data);
     qemu_vfree(dedup_cluster_data);
diff --git a/block/qcow2.h b/block/qcow2.h
index 6f85e03..3c6e685 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -364,6 +364,7 @@ typedef struct BDRVQcowState {
     Coroutine *load_filter_co;          /* used to load incarnations filters */
 
     CoMutex lock;
+    CoMutex dedup_lock;
 
     uint32_t crypt_method; /* current crypt method, 0 if no key yet */
     uint32_t crypt_method_header;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 21/24] qcow2: Integrate SKEIN hash algorithm in deduplication.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (19 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 20/24] qcow2: Serialize write requests when deduplication is activated Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 22/24] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   15 +++++++++++++++
 block/qcow2.c       |    5 +++++
 configure           |   35 +++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 599cb2e..606459f 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -33,6 +33,10 @@
 #include <gnutls/crypto.h>
 #endif
 
+#ifdef CONFIG_SKEIN_DEDUP
+#include <skeinApi.h>
+#endif
+
 /*
  * Prepare a buffer containing everything required to compute cluster
  * sized deduplication hashes.
@@ -153,6 +157,17 @@ static int qcow2_compute_cluster_hash(BlockDriverState *bs,
         return gnutls_hash_fast(GNUTLS_DIG_SHA256, data,
                                 s->cluster_size, hash->data);
 #endif
+#if defined(CONFIG_SKEIN_DEDUP)
+    case QCOW_HASH_SKEIN:
+        {
+        SkeinCtx_t ctx;
+        skeinCtxPrepare(&ctx, Skein256);
+        skeinInit(&ctx, Skein256);
+        skeinUpdate(&ctx, data, s->cluster_size);
+        skeinFinal(&ctx, hash->data);
+        }
+        return 0;
+#endif
     default:
         error_report("Invalid deduplication hash algorithm %i",
                      s->dedup_hash_algo);
diff --git a/block/qcow2.c b/block/qcow2.c
index 11c115f..ea2f0f2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1592,6 +1592,11 @@ static int8_t qcow2_get_dedup_hash_algo(char *value)
     if (!value || !strcmp(value, "sha256")) {
         return QCOW_HASH_SHA256;
     }
+#if defined(CONFIG_SKEIN_DEDUP)
+    if (!strcmp(value, "skein")) {
+        return QCOW_HASH_SKEIN;
+    }
+#endif
 
     error_printf("Unsupported deduplication hash algorithm.\n");
     return -EINVAL;
diff --git a/configure b/configure
index 34b26b2..902c3c1 100755
--- a/configure
+++ b/configure
@@ -241,6 +241,7 @@ gtkabi="2.0"
 tpm="no"
 libssh2=""
 sha256_dedup="yes"
+skein_dedup="no"
 
 # parse CC options first
 for opt do
@@ -935,6 +936,10 @@ for opt do
   ;;
   --enable-sha256-dedup) sha256_dedup="yes"
   ;;
+  --disable-skein-dedup) skein_dedup="no"
+  ;;
+  --enable-skein-dedup) skein_dedup="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1202,6 +1207,7 @@ echo "  --gcov=GCOV              use specified gcov [$gcov_tool]"
 echo "  --enable-tpm             enable TPM support"
 echo "  --disable-libssh2        disable ssh block device support"
 echo "  --enable-libssh2         enable ssh block device support"
+echo "  --enable-skein-dedup     enable computing dedup hashes with SKEIN"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2613,6 +2619,30 @@ EOF
   fi
 fi
 
+##########################################
+# SKEIN dedup hash function probe
+if test "$skein_dedup" != "no" ; then
+  cat > $TMPC <<EOF
+#include <skeinApi.h>
+int main(void) {
+    SkeinCtx_t ctx;
+    skeinCtxPrepare(&ctx, 512);
+    return 0;
+}
+EOF
+  skein_libs="-lskein3fish"
+  if compile_prog "" "$skein_libs" ; then
+    skein_dedup=yes
+    libs_tools="$skein_libs $libs_tools"
+    libs_softmmu="$skein_libs $libs_softmmu"
+  else
+    if test "$skein_dedup" = "yes" ; then
+      feature_not_found "libskein3fish not found"
+    fi
+    skein_dedup=no
+  fi
+fi
+
 #
 # Check for xxxat() functions when we are building linux-user
 # emulator.  This is done because older glibc versions don't
@@ -3603,6 +3633,7 @@ echo "gcov enabled      $gcov"
 echo "TPM support       $tpm"
 echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
+echo "SKEIN support     $skein_dedup"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3991,6 +4022,10 @@ if test "$sha256_dedup" = "yes" ; then
   echo "CONFIG_SHA256_DEDUP=y" >> $config_host_mak
 fi
 
+if test "$skein_dedup" = "yes" ; then
+  echo "CONFIG_SKEIN_DEDUP=y" >> $config_host_mak
+fi
+
 # USB host support
 case "$usb" in
 linux)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 22/24] qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (20 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 21/24] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 23/24] qcow2: Enable the deduplication feature Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 24/24] qcow2: Enable deduplication tests Benoît Canet
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qcow2-dedup.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c       |    2 +-
 block/qcow2.h       |    2 ++
 3 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 606459f..5eeea38 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -716,3 +716,63 @@ void qcow2_dedup_destroy_hash(BlockDriverState *bs,
 free_exit:
    qemu_vfree(buf);
 }
+
+int qcow2_dedup_init(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    s->has_dedup = true;
+
+    /* if we are read-only we don't init anything */
+    if (bs->read_only) {
+        return 0;
+    }
+
+    /* no need to allocate the various store's buffers since qcow2_store_load
+     * will do it
+     */
+
+    /* load and parse the configuration from disk */
+    ret = qcow2_store_load(bs, &s->key_value_store);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* if QEMU crashed, forget everything that was stored in the store */
+    if (s->dedup_dirty) {
+        qcow2_store_forget(bs, &s->key_value_store);
+    }
+
+    /* set the dirty bit */
+    s->dedup_dirty = true;
+    qcow2_update_header(bs);
+
+    /* load the journal and the incarnations in a coroutine */
+    return qcow2_store_start(bs, &s->key_value_store);
+}
+
+void qcow2_dedup_close(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0;
+
+    /* if we are read-only we don't need to cleanup */
+    if (bs->read_only) {
+        return;
+    }
+
+    ret = qcow2_store_flush(bs, &s->key_value_store);
+
+    qcow2_store_cleanup(&s->key_value_store);
+
+    /* flush failed -> leave the store dirty so it will be discarded at restart */
+    if (ret < 0) {
+        return;
+    }
+
+    /* clear the dirty bit */
+    s->dedup_dirty = false;
+    qcow2_update_header(bs);
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index ea2f0f2..f7b94dd 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1546,7 +1546,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
             goto out;
         }
 
-        qcow2_store_cleanup(bs, &s->key_value_store);
+        qcow2_store_cleanup(&s->key_value_store);
     }
 
     /* Want a backing file? There you go.*/
diff --git a/block/qcow2.h b/block/qcow2.h
index 3c6e685..b293be7 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -751,5 +751,7 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
                                  uint64_t physical_sect);
 void qcow2_dedup_destroy_hash(BlockDriverState *bs,
                               uint64_t cluster_index);
+int qcow2_dedup_init(BlockDriverState *bs);
+void qcow2_dedup_close(BlockDriverState *bs);
 
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 23/24] qcow2: Enable the deduplication feature.
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (21 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 22/24] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 24/24] qcow2: Enable deduplication tests Benoît Canet
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

---
 block/qcow2.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index f7b94dd..bb7bf74 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -569,6 +569,13 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags)
         goto fail;
     }
 
+    if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
+        ret = qcow2_dedup_init(bs);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
 #ifdef DEBUG_ALLOC
     {
         BdrvCheckResult result = {0};
@@ -1043,8 +1050,13 @@ fail:
 static void qcow2_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
+
     g_free(s->l1_table);
 
+    if (s->has_dedup) {
+        qcow2_dedup_close(bs);
+    }
+
     qcow2_cache_flush(bs, s->l2_table_cache);
     qcow2_cache_flush(bs, s->refcount_block_cache);
 
@@ -1547,6 +1559,11 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         }
 
         qcow2_store_cleanup(&s->key_value_store);
+
+        ret = qcow2_dedup_init(bs);
+        if (ret < 0) {
+            goto out;
+        }
     }
 
     /* Want a backing file? There you go.*/
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [RFC V8 24/24] qcow2: Enable deduplication tests
  2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
                   ` (22 preceding siblings ...)
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 23/24] qcow2: Enable the deduplication feature Benoît Canet
@ 2013-06-20 14:26 ` Benoît Canet
  23 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-06-20 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

---
 tests/qemu-iotests/common |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index 6826ea7..2483742 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -130,6 +130,7 @@ check options
     -cow                test cow
     -qcow               test qcow
     -qcow2              test qcow2
+    -qcow2_dedup        test qcow2 deduplication
     -qed                test qed
     -vdi                test vdi
     -vpc                test vpc
@@ -175,6 +176,11 @@ testlist options
 	    xpand=false
 	    ;;
 
+	-qcow2_dedup)
+	    IMGFMT=qcow2_dedup
+	    xpand=false
+	    ;;
+
 	-qed)
 	    IMGFMT=qed
 	    xpand=false
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-06-20 14:26 ` [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification Benoît Canet
@ 2013-07-02 14:42   ` Stefan Hajnoczi
  2013-07-02 14:54     ` Kevin Wolf
  2013-07-02 21:23     ` Benoît Canet
  0 siblings, 2 replies; 41+ messages in thread
From: Stefan Hajnoczi @ 2013-07-02 14:42 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel

On Thu, Jun 20, 2013 at 04:26:09PM +0200, Benoît Canet wrote:
> ---
>  docs/specs/qcow2.txt |   42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 36a559d..a4ffc85 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -350,3 +350,45 @@ Snapshot table entry:
>          variable:   Unique ID string for the snapshot (not null terminated)
>  
>          variable:   Name of the snapshot (not null terminated)
> +
> +== Journal ==
> +
> +QCOW2 can use one or more instance of a metadata journal.

s/instance/instances/

Is there a reason to use multiple journals rather than a single journal
for all entry types?  The single journal area avoids seeks.

> +
> +A journal is a sequential log of journal entries appended on a previously
> +allocated and reseted area.

I think you mean "previously reset area" instead of "reseted".  Another
option is "initialized area".

> +A journal is designed like a linked list with each entry pointing to the next
> +so it's easy to iterate over entries.
> +
> +A journal uses the following constants to denote the type of each entry
> +
> +TYPE_NONE = 0xFF      default value of any bytes in a reseted journal
> +TYPE_END  = 1         the entry ends a journal cluster and point to the next
> +                      cluster
> +TYPE_HASH = 2         the entry contains a deduplication hash
> +
> +QCOW2 journal entry:
> +
> +    Byte 0         :    Size of the entry: size = 2 + n with size <= 254

This is not clear.  I'm wondering if the +2 is included in the byte
value or not.  I'm also wondering what a byte value of zero means and
what a byte value of 255 means.

Please include an example to illustrate how this field works.

> +
> +         1         :    Type of the entry
> +
> +         2 - size  :    The optional n bytes structure carried by entry
> +
> +A journal is divided into clusters and no journal entry can be spilled on two
> +clusters. This avoid having to read more than one cluster to get a single entry.
> +
> +For this purpose an entry with the end type is added at the end of a journal
> +cluster before starting to write in the next cluster.
> +The size of such an entry is set so the entry points to the next cluster.
> +
> +As any journal cluster must be ended with an end entry the size of regular
> +journal entries is limited to 254 bytes in order to always left room for an end
> +entry which mimimal size is two bytes.
> +
> +The only cases where size > 254 are none entries where size = 255.
> +
> +The replay of a journal stop when the first end none entry is reached.

s/stop/stops/

> +The journal cluster size is 4096 bytes.

Questions about this layout:

1. Journal entries have no integrity mechanism, which is especially
   important if they span physical sectors where cheap disks may perform
   a partial write.  This would leave a corrupt journal.  If the last
   bytes are a checksum then you can get some confidence that the entry
   was fully written and is valid.

   Did I miss something?

2. Byte-granularity means that read-modify-write is necessary to append
   entries to the journal.  Therefore a failure could destroy previously
   committed entries.

   Any ideas how existing journals handle this?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-02 14:42   ` Stefan Hajnoczi
@ 2013-07-02 14:54     ` Kevin Wolf
  2013-07-02 21:26       ` Benoît Canet
  2013-07-03  7:51       ` Stefan Hajnoczi
  2013-07-02 21:23     ` Benoît Canet
  1 sibling, 2 replies; 41+ messages in thread
From: Kevin Wolf @ 2013-07-02 14:54 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Benoît Canet, qemu-devel

Am 02.07.2013 um 16:42 hat Stefan Hajnoczi geschrieben:
> On Thu, Jun 20, 2013 at 04:26:09PM +0200, Benoît Canet wrote:
> > ---
> >  docs/specs/qcow2.txt |   42 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 42 insertions(+)
> > 
> > diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> > index 36a559d..a4ffc85 100644
> > --- a/docs/specs/qcow2.txt
> > +++ b/docs/specs/qcow2.txt
> > @@ -350,3 +350,45 @@ Snapshot table entry:
> >          variable:   Unique ID string for the snapshot (not null terminated)
> >  
> >          variable:   Name of the snapshot (not null terminated)
> > +
> > +== Journal ==
> > +
> > +QCOW2 can use one or more instance of a metadata journal.
> 
> s/instance/instances/
> 
> Is there a reason to use multiple journals rather than a single journal
> for all entry types?  The single journal area avoids seeks.
> 
> > +
> > +A journal is a sequential log of journal entries appended on a previously
> > +allocated and reseted area.
> 
> I think you say "previously reset area" instead of "reseted".  Another
> option is "initialized area".
> 
> > +A journal is designed like a linked list with each entry pointing to the next
> > +so it's easy to iterate over entries.
> > +
> > +A journal uses the following constants to denote the type of each entry
> > +
> > +TYPE_NONE = 0xFF      default value of any bytes in a reseted journal
> > +TYPE_END  = 1         the entry ends a journal cluster and point to the next
> > +                      cluster
> > +TYPE_HASH = 2         the entry contains a deduplication hash
> > +
> > +QCOW2 journal entry:
> > +
> > +    Byte 0         :    Size of the entry: size = 2 + n with size <= 254
> 
> This is not clear.  I'm wondering if the +2 is included in the byte
> value or not.  I'm also wondering what a byte value of zero means and
> what a byte value of 255 means.
> 
> Please include an example to illustrate how this field works.
> 
> > +
> > +         1         :    Type of the entry
> > +
> > +         2 - size  :    The optional n bytes structure carried by entry
> > +
> > +A journal is divided into clusters and no journal entry can be spilled on two
> > +clusters. This avoid having to read more than one cluster to get a single entry.
> > +
> > +For this purpose an entry with the end type is added at the end of a journal
> > +cluster before starting to write in the next cluster.
> > +The size of such an entry is set so the entry points to the next cluster.
> > +
> > +As any journal cluster must be ended with an end entry the size of regular
> > +journal entries is limited to 254 bytes in order to always left room for an end
> > +entry which mimimal size is two bytes.
> > +
> > +The only cases where size > 254 are none entries where size = 255.
> > +
> > +The replay of a journal stop when the first end none entry is reached.
> 
> s/stop/stops/
> 
> > +The journal cluster size is 4096 bytes.
> 
> Questions about this layout:
> 
> 1. Journal entries have no integrity mechanism, which is especially
>    important if they span physical sectors where cheap disks may perform
>    a partial write.  This would leave a corrupt journal.  If the last
>    bytes are a checksum then you can get some confidence that the entry
>    was fully written and is valid.
> 
>    Did I miss something?

Adding a checksum sounds like a good idea.

> 2. Byte-granularity means that read-modify-write is necessary to append
>    entries to the journal.  Therefore a failure could destroy previously
>    committed entries.
> 
>    Any ideas how existing journals handle this?

You commit only whole blocks. So in this case we can consider a block
only committed as soon as a TYPE_END entry has been written (and after
that we won't touch it any more until the journalled changes have been
flushed to disk).
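
The whole-block commit rule above can be sketched as a replay-side check
(a hypothetical helper, not code from the patch series): a cluster only
counts as committed if a TYPE_END entry is reached before running into the
0xFF fill bytes left by a reset or by a torn write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JOURNAL_CLUSTER_SIZE 4096
#define TYPE_END  1
#define TYPE_NONE 0xFF

/* Sketch: walk one journal cluster and report whether it was fully
 * committed, i.e. a TYPE_END entry was written before we hit the
 * TYPE_NONE fill bytes (0xFF) of an unwritten tail. */
bool journal_cluster_committed(const uint8_t *cluster)
{
    size_t off = 0;
    while (off + 2 <= JOURNAL_CLUSTER_SIZE) {
        uint8_t size = cluster[off];
        uint8_t type = cluster[off + 1];
        if (size == TYPE_NONE) {
            return false;   /* none entry: block was never sealed */
        }
        if (type == TYPE_END) {
            return true;    /* TYPE_END seen: block is committed */
        }
        if (size < 2) {
            return false;   /* corrupt entry, stop walking */
        }
        off += size;        /* size includes the 2 byte header */
    }
    return false;
}
```

A partially written block is then simply ignored on replay, which matches
considering only whole, sealed blocks as committed.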

There's one "interesting" case: cache=writethrough. I'm not entirely
sure yet what to do with it, but it's slow anyway, so using one block
per entry and therefore flushing the journal very often might actually
be not totally unreasonable.

Another thing I'm not sure about is whether a fixed 4k block is good or
if we should leave it configurable. I don't think making it an option
would hurt (not necessarily modifiable with qemu-img, but as a field
in the file format).

Kevin

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-02 14:42   ` Stefan Hajnoczi
  2013-07-02 14:54     ` Kevin Wolf
@ 2013-07-02 21:23     ` Benoît Canet
  2013-07-03  8:01       ` Stefan Hajnoczi
                         ` (2 more replies)
  1 sibling, 3 replies; 41+ messages in thread
From: Benoît Canet @ 2013-07-02 21:23 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel

> > +QCOW2 can use one or more instance of a metadata journal.
> 
> s/instance/instances/
> 
> Is there a reason to use multiple journals rather than a single journal
> for all entry types?  The single journal area avoids seeks.

Here are the main reasons for this:

For deduplication, some patterns like cycles of insertions/deletions could
leave the hash table almost empty while filling the journal.

If the journal is full and the hash table is empty, a packing operation is
started.

Basically a new journal is created and only the entries present in the hash
table are reinserted.

This is why I want to keep the deduplication journal apart from the regular
qcow2 journal: to avoid interference between a pack operation and regular
qcow2 journal entries.

The other thing is that freezing the log store would need a replay of regular
qcow2 entries, as it triggers a reset of the journal.

Also, since deduplication will not work on spinning disks, I discarded the
seek time factor.

Maybe committing the dedup journal in erase-block-sized chunks would be a
good idea to reduce random writes to the SSD.

The additional reason for having multiple journals is that the SILT paper
proposes a mode where a prefix of the hash is used to dispatch insertions
into multiple stores, and that is easier to do with multiple journals.
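
As a rough illustration of that SILT-style prefix dispatch (all names here
are hypothetical, not from the patch series), a few bits of the hash could
select the target journal:

```c
#include <assert.h>
#include <stdint.h>

#define N_JOURNALS 16   /* assumed journal count; must be a power of two */

/* Sketch: use the low bits of the first hash byte as the journal index,
 * so each journal only ever sees a fixed slice of the hash space. */
unsigned journal_index(const uint8_t *hash)
{
    return hash[0] & (N_JOURNALS - 1);
}
```

With such a scheme a pack operation on one journal never touches entries
belonging to another prefix slice.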

> 
> > +
> > +A journal is a sequential log of journal entries appended on a previously
> > +allocated and reseted area.
> 
> I think you say "previously reset area" instead of "reseted".  Another
> option is "initialized area".
> 
> > +A journal is designed like a linked list with each entry pointing to the next
> > +so it's easy to iterate over entries.
> > +
> > +A journal uses the following constants to denote the type of each entry
> > +
> > +TYPE_NONE = 0xFF      default value of any bytes in a reseted journal
> > +TYPE_END  = 1         the entry ends a journal cluster and point to the next
> > +                      cluster
> > +TYPE_HASH = 2         the entry contains a deduplication hash
> > +
> > +QCOW2 journal entry:
> > +
> > +    Byte 0         :    Size of the entry: size = 2 + n with size <= 254
> 
> This is not clear.  I'm wondering if the +2 is included in the byte
> value or not.  I'm also wondering what a byte value of zero means and
> what a byte value of 255 means.

I am counting the journal entry header in the size, so yes, the +2 is in the
byte value.
A byte value of zero, 1 or 255 is an error.

Maybe this design is bogus and I should only count the payload size in the
size field. It would make for fewer tricky cases.
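
To make the header-inclusive size convention concrete, here is a minimal
sketch (not code from the series; the helper name is hypothetical) where a
32-byte hash payload yields a size byte of 34:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TYPE_END  1
#define TYPE_HASH 2
#define TYPE_NONE 0xFF   /* also the fill byte of a freshly reset journal */

/* Append one entry at byte offset 'off' of a journal cluster buffer.
 * The size byte counts the 2 byte (size, type) header plus the payload,
 * so size = n + 2 and regular entries never exceed 254.
 * Returns the offset of the next entry. */
size_t journal_append(uint8_t *cluster, size_t off, uint8_t type,
                      const void *payload, size_t n)
{
    assert(n + 2 <= 254);
    cluster[off] = (uint8_t)(n + 2);
    cluster[off + 1] = type;
    memcpy(&cluster[off + 2], payload, n);
    return off + n + 2;
}
```

So a TYPE_HASH entry with a 32 byte hash occupies bytes 0..33 and the next
entry starts at offset 34.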

> 
> Please include an example to illustrate how this field works.
> 
> > +
> > +         1         :    Type of the entry
> > +
> > +         2 - size  :    The optional n bytes structure carried by entry
> > +
> > +A journal is divided into clusters and no journal entry can be spilled on two
> > +clusters. This avoid having to read more than one cluster to get a single entry.
> > +
> > +For this purpose an entry with the end type is added at the end of a journal
> > +cluster before starting to write in the next cluster.
> > +The size of such an entry is set so the entry points to the next cluster.
> > +
> > +As any journal cluster must be ended with an end entry the size of regular
> > +journal entries is limited to 254 bytes in order to always left room for an end
> > +entry which mimimal size is two bytes.
> > +
> > +The only cases where size > 254 are none entries where size = 255.
> > +
> > +The replay of a journal stop when the first end none entry is reached.
> 
> s/stop/stops/
> 
> > +The journal cluster size is 4096 bytes.
> 
> Questions about this layout:
> 
> 1. Journal entries have no integrity mechanism, which is especially
>    important if they span physical sectors where cheap disks may perform
>    a partial write.  This would leave a corrupt journal.  If the last
>    bytes are a checksum then you can get some confidence that the entry
>    was fully written and is valid.

I will add a checksum mechanism.

Do you have any preferences regarding the checksum function?
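
For illustration only, one common choice would be a CRC-32 over each entry;
a bitwise sketch of the IEEE polynomial follows (the thread leaves the
actual function open, so this is just one candidate, not a decision):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320), shown
 * only as one possible per-entry checksum. The last bytes of an entry
 * would hold this value so replay can reject torn writes. */
uint32_t crc32_ieee(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++) {
            crc = (crc >> 1) ^ (0xEDB88320 & -(crc & 1));
        }
    }
    return ~crc;
}
```

A table-driven variant would be faster in practice; the bitwise form is
just the shortest way to pin down the polynomial.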

> 
>    Did I miss something?
> 
> 2. Byte-granularity means that read-modify-write is necessary to append
>    entries to the journal.  Therefore a failure could destroy previously
>    committed entries.

It's designed to be committed in 4KB blocks.

> 
>    Any ideas how existing journals handle this?
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-02 14:54     ` Kevin Wolf
@ 2013-07-02 21:26       ` Benoît Canet
  2013-07-03  8:08         ` Kevin Wolf
  2013-07-03  7:51       ` Stefan Hajnoczi
  1 sibling, 1 reply; 41+ messages in thread
From: Benoît Canet @ 2013-07-02 21:26 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi

> > 2. Byte-granularity means that read-modify-write is necessary to append
> >    entries to the journal.  Therefore a failure could destroy previously
> >    committed entries.
> > 
> >    Any ideas how existing journals handle this?
> 
> You commit only whole blocks. So in this case we can consider a block
> only committed as soon as a TYPE_END entry has been written (and after
> that we won't touch it any more until the journalled changes have been
> flushed to disk).
> 
> There's one "interesting" case: cache=writethrough. I'm not entirely
> sure yet what to do with it, but it's slow anyway, so using one block
> per entry and therefore flushing the journal very often might actually
> be not totally unreasonable.

This would surely end up killing performance, because it would mean one I/O
per piece of metadata written to disk.

> 
> Another thing I'm not sure about is whether a fixed 4k block is good or
> if we should leave it configurable. I don't think making it an option
> would hurt (not necessarily modifyable with qemu-img, but as a field
> in the file format).

I agree.
I am also thinking about making the number of blocks flushed at once configurable.

Benoît

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-02 14:54     ` Kevin Wolf
  2013-07-02 21:26       ` Benoît Canet
@ 2013-07-03  7:51       ` Stefan Hajnoczi
  1 sibling, 0 replies; 41+ messages in thread
From: Stefan Hajnoczi @ 2013-07-03  7:51 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Benoît Canet, qemu-devel

On Tue, Jul 02, 2013 at 04:54:46PM +0200, Kevin Wolf wrote:
> Am 02.07.2013 um 16:42 hat Stefan Hajnoczi geschrieben:
> > On Thu, Jun 20, 2013 at 04:26:09PM +0200, Benoît Canet wrote:
> > > ---
> > >  docs/specs/qcow2.txt |   42 ++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 42 insertions(+)
> > > 
> > > diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> > > index 36a559d..a4ffc85 100644
> > > --- a/docs/specs/qcow2.txt
> > > +++ b/docs/specs/qcow2.txt
> > > @@ -350,3 +350,45 @@ Snapshot table entry:
> > >          variable:   Unique ID string for the snapshot (not null terminated)
> > >  
> > >          variable:   Name of the snapshot (not null terminated)
> > > +
> > > +== Journal ==
> > > +
> > > +QCOW2 can use one or more instance of a metadata journal.
> > 
> > s/instance/instances/
> > 
> > Is there a reason to use multiple journals rather than a single journal
> > for all entry types?  The single journal area avoids seeks.
> > 
> > > +
> > > +A journal is a sequential log of journal entries appended on a previously
> > > +allocated and reseted area.
> > 
> > I think you say "previously reset area" instead of "reseted".  Another
> > option is "initialized area".
> > 
> > > +A journal is designed like a linked list with each entry pointing to the next
> > > +so it's easy to iterate over entries.
> > > +
> > > +A journal uses the following constants to denote the type of each entry
> > > +
> > > +TYPE_NONE = 0xFF      default value of any bytes in a reseted journal
> > > +TYPE_END  = 1         the entry ends a journal cluster and point to the next
> > > +                      cluster
> > > +TYPE_HASH = 2         the entry contains a deduplication hash
> > > +
> > > +QCOW2 journal entry:
> > > +
> > > +    Byte 0         :    Size of the entry: size = 2 + n with size <= 254
> > 
> > This is not clear.  I'm wondering if the +2 is included in the byte
> > value or not.  I'm also wondering what a byte value of zero means and
> > what a byte value of 255 means.
> > 
> > Please include an example to illustrate how this field works.
> > 
> > > +
> > > +         1         :    Type of the entry
> > > +
> > > +         2 - size  :    The optional n bytes structure carried by entry
> > > +
> > > +A journal is divided into clusters and no journal entry can be spilled on two
> > > +clusters. This avoid having to read more than one cluster to get a single entry.
> > > +
> > > +For this purpose an entry with the end type is added at the end of a journal
> > > +cluster before starting to write in the next cluster.
> > > +The size of such an entry is set so the entry points to the next cluster.
> > > +
> > > +As any journal cluster must be ended with an end entry the size of regular
> > > +journal entries is limited to 254 bytes in order to always left room for an end
> > > +entry which mimimal size is two bytes.
> > > +
> > > +The only cases where size > 254 are none entries where size = 255.
> > > +
> > > +The replay of a journal stop when the first end none entry is reached.
> > 
> > s/stop/stops/
> > 
> > > +The journal cluster size is 4096 bytes.
> > 
> > Questions about this layout:
> > 
> > 1. Journal entries have no integrity mechanism, which is especially
> >    important if they span physical sectors where cheap disks may perform
> >    a partial write.  This would leave a corrupt journal.  If the last
> >    bytes are a checksum then you can get some confidence that the entry
> >    was fully written and is valid.
> > 
> >    Did I miss something?
> 
> Adding a checksum sounds like a good idea.
> 
> > 2. Byte-granularity means that read-modify-write is necessary to append
> >    entries to the journal.  Therefore a failure could destroy previously
> >    committed entries.
> > 
> >    Any ideas how existing journals handle this?
> 
> You commit only whole blocks. So in this case we can consider a block
> only committed as soon as a TYPE_END entry has been written (and after
> that we won't touch it any more until the journalled changes have been
> flushed to disk).
> 
> There's one "interesting" case: cache=writethrough. I'm not entirely
> sure yet what to do with it, but it's slow anyway, so using one block
> per entry and therefore flushing the journal very often might actually
> be not totally unreasonable.
> 
> Another thing I'm not sure about is whether a fixed 4k block is good or
> if we should leave it configurable. I don't think making it an option
> would hurt (not necessarily modifiable with qemu-img, but as a field
> in the file format).

Making block size configurable seems like a good idea so we can adapt to
disk performance and data integrity characteristics.

Stefan
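
The entry layout quoted above, together with Kevin's whole-block commit rule,
can be modelled compactly. Below is an illustrative sketch — not QEMU code;
the helper names are invented, and the byte layout follows the draft spec as
quoted (size byte counting itself and the type byte, 4096-byte clusters,
TYPE_END padding, replay stopping at the first 0xFF size byte):

```python
# Illustrative model of the draft journal layout -- not QEMU code.

CLUSTER_SIZE = 4096
TYPE_NONE = 0xFF      # value of every byte in a freshly reset journal area
TYPE_END = 1          # pads out a cluster and "points" to the next one
TYPE_HASH = 2         # carries a deduplication hash
MAX_ENTRY_SIZE = 254  # so a 2-byte END entry always fits before the boundary

def append_entry(journal, entry_type, payload=b""):
    """Append one entry; an entry never spills across a cluster boundary."""
    size = 2 + len(payload)          # size byte counts itself and the type byte
    assert size <= MAX_ENTRY_SIZE
    room = CLUSTER_SIZE - len(journal) % CLUSTER_SIZE
    if size + 2 > room:              # would leave no room for an END entry
        # END entry whose size spans the rest of the cluster; the unused
        # tail keeps its reset value of 0xFF
        journal += bytes([room, TYPE_END]) + b"\xff" * (room - 2)
    journal += bytes([size, entry_type]) + payload

def iter_entries(journal):
    """Replay: walk entries, stopping at the first 'none' (0xFF) size byte."""
    off = 0
    while off < len(journal):
        size = journal[off]
        if size == TYPE_NONE:        # reset area reached, replay stops
            return
        etype = journal[off + 1]
        if etype != TYPE_END:
            yield etype, bytes(journal[off + 2:off + size])
        off += size
```

With 202-byte entries, twenty fit in a cluster (4040 bytes); the sketch then
emits an END entry of size 56 so the twenty-first entry starts cleanly at the
next cluster boundary.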

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-02 21:23     ` Benoît Canet
@ 2013-07-03  8:01       ` Stefan Hajnoczi
  2013-07-03 12:35         ` Benoît Canet
  2013-07-03  8:04       ` Kevin Wolf
  2013-07-03  8:12       ` Stefan Hajnoczi
  2 siblings, 1 reply; 41+ messages in thread
From: Stefan Hajnoczi @ 2013-07-03  8:01 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel

On Tue, Jul 02, 2013 at 11:23:56PM +0200, Benoît Canet wrote:
> > > +QCOW2 can use one or more instance of a metadata journal.
> > 
> > s/instance/instances/
> > 
> > Is there a reason to use multiple journals rather than a single journal
> > for all entry types?  The single journal area avoids seeks.
> 
> Here are the main reasons for this:
> 
> For the deduplication, some patterns like cycles of insertion/deletion could
> leave the hash table almost empty while filling the journal.
> 
> If the journal is full and the hash table is empty, a packing operation is
> started.
> 
> Basically a new journal is created and only the entries present in the hash
> table are reinserted.
> 
> This is why I want to keep the deduplication journal apart from the regular
> qcow2 journal: to avoid interference between a pack operation and regular
> qcow2 journal entries.
> 
> The other thing is that freezing the log store would need a replay of regular
> qcow2 entries, as it triggers a reset of the journal.
> 
> Also, since deduplication will not work on spinning disks, I discarded the
> seek time factor.
> 
> Maybe committing the dedup journal in erase-block-sized chunks would be a
> good idea to reduce random writes to the SSD.
> 
> The additional reason for having multiple journals is that the SILT paper
> proposes a mode where a prefix of the hash is used to dispatch insertions to
> multiple stores, which is easier to do with multiple journals.

It sounds like the journal is more than just a data integrity mechanism.
It's an integral part of your dedup algorithm and you plan to carefully
manage it while rebuilding some of the other dedup data structures.

Does this mean the journal forms the first-stage data structure for
deduplication?  Dedup records will accumulate in the journal until it
becomes time to convert them in bulk into a more compact representation?

When I read this specification I was thinking of a journal purely for
logging operations.  You could use a commit record to mark previous
records applied.  Upon startup, qcow2 would inspect uncommitted records
and deal with them.

We just need to figure out how to define a good interface so that the
journal can be used in a general way but also for dedup's specific
needs.

Stefan
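
The pack operation described in the quoted reply — rebuild the journal from
only the hashes the table still references — can be sketched like this. It is
a toy model operating on replayed records rather than the on-disk byte format,
and `hash_table` and all names are invented stand-ins:

```python
# Toy model of the "pack" operation: when the journal fills up while the
# hash table has emptied, a fresh journal is built containing only the
# hashes the table still references.  Not the actual on-disk format.

def pack_journal(journal_records, hash_table):
    """journal_records: ordered hash insertions as replayed from disk.
    hash_table: the live hash -> cluster mapping."""
    packed = []
    seen = set()
    for h in journal_records:
        # Drop hashes that were deleted since insertion, and duplicates.
        if h in hash_table and h not in seen:
            packed.append(h)
            seen.add(h)
    return packed
```

For example, a journal of insertions `["a", "b", "c", "a"]` with only `"a"`
and `"c"` still live packs down to `["a", "c"]`.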


* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-02 21:23     ` Benoît Canet
  2013-07-03  8:01       ` Stefan Hajnoczi
@ 2013-07-03  8:04       ` Kevin Wolf
  2013-07-03 12:30         ` Benoît Canet
  2013-07-03  8:12       ` Stefan Hajnoczi
  2 siblings, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2013-07-03  8:04 UTC (permalink / raw)
  To: Benoît Canet; +Cc: qemu-devel, Stefan Hajnoczi

Am 02.07.2013 um 23:23 hat Benoît Canet geschrieben:
> Also, since deduplication will not work on spinning disks, I discarded the
> seek time factor.

Care to explain that in more detail? Why shouldn't it work on spinning
disks?

Kevin


* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-02 21:26       ` Benoît Canet
@ 2013-07-03  8:08         ` Kevin Wolf
  0 siblings, 0 replies; 41+ messages in thread
From: Kevin Wolf @ 2013-07-03  8:08 UTC (permalink / raw)
  To: Benoît Canet; +Cc: qemu-devel, Stefan Hajnoczi

Am 02.07.2013 um 23:26 hat Benoît Canet geschrieben:
> > > 2. Byte-granularity means that read-modify-write is necessary to append
> > >    entries to the journal.  Therefore a failure could destroy previously
> > >    committed entries.
> > > 
> > >    Any ideas how existing journals handle this?
> > 
> > You commit only whole blocks. So in this case we can consider a block
> > only committed as soon as a TYPE_END entry has been written (and after
> > that we won't touch it any more until the journalled changes have been
> > flushed to disk).
> > 
> > There's one "interesting" case: cache=writethrough. I'm not entirely
> > sure yet what to do with it, but it's slow anyway, so using one block
> > per entry and therefore flushing the journal very often might actually
> > be not totally unreasonable.
> 
> This would surely kill performance entirely, because it would mean one I/O
> per piece of metadata written to disk.

cache=writethrough already pretty much kills performance because it's
not only an I/O per metadata write, but also a flush.

The question is, do we have any option to avoid it?

> > Another thing I'm not sure about is whether a fixed 4k block is good or
> > if we should leave it configurable. I don't think making it an option
> > would hurt (not necessarily modifiable with qemu-img, but as a field
> > in the file format).
> 
> I agree.
> I also thought about making the number of blocks to be flushed at once
> configurable.

This is more of a runtime option. We can store a default in the image,
though.

Kevin


* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-02 21:23     ` Benoît Canet
  2013-07-03  8:01       ` Stefan Hajnoczi
  2013-07-03  8:04       ` Kevin Wolf
@ 2013-07-03  8:12       ` Stefan Hajnoczi
  2013-07-03 12:53         ` Benoît Canet
  2 siblings, 1 reply; 41+ messages in thread
From: Stefan Hajnoczi @ 2013-07-03  8:12 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel

On Tue, Jul 02, 2013 at 11:23:56PM +0200, Benoît Canet wrote:
> >    Any ideas how existing journals handle this?

By the way, I don't know much about journalling techniques.  I'm asking
these questions either because you can answer them straight away or
because they might warrant a look at existing journal implementations
like:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/jbd2
http://www.sqlite.org/cgi/src/dir?name=src
http://blitiri.com.ar/p/libjio/

Stefan


* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-03  8:04       ` Kevin Wolf
@ 2013-07-03 12:30         ` Benoît Canet
  0 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-07-03 12:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Benoît Canet, qemu-devel, Stefan Hajnoczi

> Care to explain that in more detail? Why shouldn't it work on spinning
> disks?

Hashes are random, so they introduce random read accesses.

With a QCOW2 cluster size of 4KB, the deduplication code does one random read
per 4KB block when writing duplicated data.

A server-grade hard disk is rated for about 250 IOPS. That translates to
1MB/s of deduplicated data. Not very usable.

By contrast, a Samsung 840 Pro SSD is rated for 80K IOPS of random reads.
That should translate to 320MB/s of potentially deduplicated data.

Having the dedup metadata on an SSD and the actual data on a spinning disk
would solve the problem, but it would need a separate block backend.

Benoît
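
The arithmetic behind those figures can be checked directly. This is just the
quoted argument spelled out — one random hash-store read per 4 KB cluster at
the IOPS ratings given in the mail:

```python
CLUSTER = 4 * 1024  # qcow2 cluster size assumed by the argument above

def dedup_throughput(random_read_iops):
    """Bytes/s of duplicated-data writes, assuming one random hash-store
    read per 4 KB cluster that might be deduplicated."""
    return random_read_iops * CLUSTER

hdd_bps = dedup_throughput(250)     # server-grade spinning disk
ssd_bps = dedup_throughput(80_000)  # Samsung 840 Pro class random reads
# hdd_bps is about 1 MB/s; ssd_bps about 328 MB/s (the mail rounds to 320)
```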


* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-03  8:01       ` Stefan Hajnoczi
@ 2013-07-03 12:35         ` Benoît Canet
  0 siblings, 0 replies; 41+ messages in thread
From: Benoît Canet @ 2013-07-03 12:35 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Benoît Canet, kwolf, qemu-devel

> Does this mean the journal forms the first-stage data structure for
> deduplication?  Dedup records will accumulate in the journal until it
> becomes time to convert them in bulk into a more compact representation?

The journal is mainly used to persist the most recently inserted dedup
metadata across QEMU stops and restarts. I replay it at startup to rebuild
the hash table.
So yes, it's the first stage, even if it's never used for regular queries.

> 
> When I read this specification I was thinking of a journal purely for
> logging operations.  You could use a commit record to mark previous
> records applied.  Upon startup, qcow2 would inspect uncommitted records
> and deal with them.

Maybe that could help regular QCOW2 usage. I don't know.


* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-03  8:12       ` Stefan Hajnoczi
@ 2013-07-03 12:53         ` Benoît Canet
  2013-07-04  7:13           ` Stefan Hajnoczi
  0 siblings, 1 reply; 41+ messages in thread
From: Benoît Canet @ 2013-07-03 12:53 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Benoît Canet, kwolf, qemu-devel

> By the way, I don't know much about journalling techniques.  So I'm
> asking you these questions so that either you can answer them straight
> away or because they might warrant a look at existing journal
> implementations like:

I tried to do something simple and fast for the deduplication use case.

That explains why there is no concept of transactions, and why the journal's
blocks are flushed asynchronously in order to sustain a high insertion rate.

I agree with your previous comment: it's more a log than a journal.

> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/jbd2
> http://www.sqlite.org/cgi/src/dir?name=src
> http://blitiri.com.ar/p/libjio/

I will try to find a paper on journal design.

Benoît


* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-03 12:53         ` Benoît Canet
@ 2013-07-04  7:13           ` Stefan Hajnoczi
  2013-07-04 10:01             ` Benoît Canet
  0 siblings, 1 reply; 41+ messages in thread
From: Stefan Hajnoczi @ 2013-07-04  7:13 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel

On Wed, Jul 03, 2013 at 02:53:27PM +0200, Benoît Canet wrote:
> > By the way, I don't know much about journalling techniques.  So I'm
> > asking you these questions so that either you can answer them straight
> > away or because they might warrant a look at existing journal
> > implementations like:
> 
> I tried to do something simple and fast for the deduplication use case.
> 
> That explains why there is no concept of transactions, and why the journal's
> blocks are flushed asynchronously in order to sustain a high insertion rate.
> 
> I agree with your previous comment: it's more a log than a journal.

Simple is good.  Even for deduplication alone, I think data integrity is
critical - otherwise we risk stale dedup metadata pointing to clusters
that are unallocated or do not contain the right data.  So the journal
will probably need to follow techniques for commits/checksums.

Stefan
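
The per-entry checksum technique Stefan suggests could look like the sketch
below: a CRC32 appended to each record, verified on replay so a torn write is
detected rather than replayed. The record framing here is invented for
illustration and is not part of the draft spec:

```python
import zlib

def seal_entry(payload):
    """Append a CRC32 so a torn or partial write is detectable on replay."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def open_entry(record):
    """Verify and strip the CRC32; raise if the entry is corrupt."""
    payload, crc = record[:-4], record[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        raise ValueError("torn or corrupt journal entry")
    return payload
```

On replay, the first entry that fails verification would mark the end of the
committed portion of the journal.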


* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-04  7:13           ` Stefan Hajnoczi
@ 2013-07-04 10:01             ` Benoît Canet
  2013-07-16 22:45               ` Benoît Canet
  0 siblings, 1 reply; 41+ messages in thread
From: Benoît Canet @ 2013-07-04 10:01 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Benoît Canet, kwolf, qemu-devel

> Simple is good.  Even for deduplication alone, I think data integrity is
> critical - otherwise we risk stale dedup metadata pointing to clusters
> that are unallocated or do not contain the right data.  So the journal
> will probably need to follow techniques for commits/checksums.

I agree that checksums are missing for the dedup.
Maybe we could even use some kind of error correcting code instead of a
checksum.

Concerning data integrity, the events that the deduplication code cannot lose
are hash deletions, because they mark a previously inserted hash as obsolete.

The problem with a commit/flush mechanism on hash deletion is that it would
slow down the store's insertion speed and also create extra SSD wear.

To solve this I considered the fact that the dedup metadata as a whole is
disposable.

So I implemented a "dedup dirty" bit.

When QEMU stops, the journal is flushed and the dirty bit is cleared.
When QEMU starts and the dirty bit is set, a crash is detected and _all_ the
deduplication metadata is dropped.
QCOW2 data integrity won't suffer; only the dedup ratio will be lower.

As you said once on IRC, crashes don't happen often.

Benoît
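
The disposable-metadata scheme above is easy to model: set the dirty bit
while running, clear it only after a clean flush, and drop everything if it
is still set at startup. A toy sketch — the class and names are invented and
the in-memory dict stands in for the on-disk dedup metadata:

```python
# Sketch of the "dedup dirty" bit scheme.  Not the actual qcow2 header
# layout; persistence is simulated by object state.

class DedupState:
    def __init__(self):
        self.dirty = False
        self.metadata = {}   # stand-in for the dedup hash store

    def startup(self):
        if self.dirty:            # dirty bit still set: crash detected
            self.metadata.clear() # dedup ratio drops; data integrity intact
        self.dirty = True         # mark metadata in-use for this session

    def clean_shutdown(self):
        self.flush_journal()
        self.dirty = False        # cleared only after a successful flush

    def flush_journal(self):
        pass  # stand-in for writing out the journal blocks
```

A clean stop/start cycle preserves the metadata; a startup without a prior
clean shutdown drops it all.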


* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-04 10:01             ` Benoît Canet
@ 2013-07-16 22:45               ` Benoît Canet
  2013-07-17  8:20                 ` Kevin Wolf
  0 siblings, 1 reply; 41+ messages in thread
From: Benoît Canet @ 2013-07-16 22:45 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel, Stefan Hajnoczi

> > Simple is good.  Even for deduplication alone, I think data integrity is
> > critical - otherwise we risk stale dedup metadata pointing to clusters
> > that are unallocated or do not contain the right data.  So the journal
> > will probably need to follow techniques for commits/checksums.
>

I'll add checksums to the journal and clean up the journal entry size mess
soon.

For the transactional/commit aspects of the journal, I think we need Kevin's
point of view on the subject.

Best regards

Benoît


* Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
  2013-07-16 22:45               ` Benoît Canet
@ 2013-07-17  8:20                 ` Kevin Wolf
  0 siblings, 0 replies; 41+ messages in thread
From: Kevin Wolf @ 2013-07-17  8:20 UTC (permalink / raw)
  To: Benoît Canet; +Cc: qemu-devel, Stefan Hajnoczi

Am 17.07.2013 um 00:45 hat Benoît Canet geschrieben:
> > > Simple is good.  Even for deduplication alone, I think data integrity is
> > > critical - otherwise we risk stale dedup metadata pointing to clusters
> > > that are unallocated or do not contain the right data.  So the journal
> > > will probably need to follow techniques for commits/checksums.
> >
> 
> I'll add checksums to the journal and clean the journal entry size mess soon.
> 
> For the transactional/commits aspect of the journal I think that we need Kevin's
> point of view on the subject.

Sorry, I was going to prepare a patch that does journalling for the
existing metadata, but once again other things stole my time. I'm still
planning to do that, though, and then we can compare whether your
requirements are fulfilled with it as well.

Kevin


Thread overview: 41+ messages
2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification Benoît Canet
2013-07-02 14:42   ` Stefan Hajnoczi
2013-07-02 14:54     ` Kevin Wolf
2013-07-02 21:26       ` Benoît Canet
2013-07-03  8:08         ` Kevin Wolf
2013-07-03  7:51       ` Stefan Hajnoczi
2013-07-02 21:23     ` Benoît Canet
2013-07-03  8:01       ` Stefan Hajnoczi
2013-07-03 12:35         ` Benoît Canet
2013-07-03  8:04       ` Kevin Wolf
2013-07-03 12:30         ` Benoît Canet
2013-07-03  8:12       ` Stefan Hajnoczi
2013-07-03 12:53         ` Benoît Canet
2013-07-04  7:13           ` Stefan Hajnoczi
2013-07-04 10:01             ` Benoît Canet
2013-07-16 22:45               ` Benoît Canet
2013-07-17  8:20                 ` Kevin Wolf
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 02/24] qcow2: Add deduplication structures and fields Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 03/24] qcow2: Add journal Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 04/24] qcow2: Create the log store Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 05/24] qcow2: Add the hash store Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 06/24] qcow2: Add the deduplication store Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 07/24] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 08/24] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 09/24] qcow2: Make qcow2_update_cluster_refcount public Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 10/24] qcow2: Add qcow2_dedup and related functions Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 11/24] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 12/24] qcow2: Do allocate on rewrite on the dedup case Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 13/24] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 14/24] qcow2: Load and save deduplication table header extension Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 15/24] qcow2: Extract qcow2_set_incompat_feature and qcow2_clear_incompat_feature Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 16/24] block: Add qcow2_dedup format and image creation code Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 17/24] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 18/24] qcow2: Remove hash when cluster is deleted Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 19/24] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 20/24] qcow2: Serialize write requests when deduplication is activated Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 21/24] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 22/24] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 23/24] qcow2: Enable the deduplication feature Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 24/24] qcow2: Enable deduplication tests Benoît Canet
