[Qemu-devel] [PATCH 0/6] qcow2 L2/refcount cache improvements

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH 0/6] qcow2 L2/refcount cache improvements
@ 2015-04-30 10:11 Alberto Garcia
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables Alberto Garcia
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Alberto Garcia @ 2015-04-30 10:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Alberto Garcia, qemu-block, Stefan Hajnoczi

Here are some improvements to the qcow2 L2/refcount cache code.

The first one is that all cache tables are now allocated using a
single memory block, as we discussed last week.

Apart from a more efficient use of memory, this allows some additional
optimizations so I took the chance to make other changes.

- qcow2_cache_put() and qcow2_cache_entry_mark_dirty() are now O(1)
- The eviction algorithm is now LRU. The previous one only works well
  with very small cache sizes.
- qcow2_cache_find_entry_to_replace() is no longer necessary.
- Lookups are faster now.

In my tests with a preallocated 128MB L2 cache in an empty drive the
new code is ~13% faster than the previous one (~43% if compiled
without optimizations). This is a best-case scenario, if the cache is
smaller or the drive is full of data the improvements are not so
visible, but I believe the code is simpler now so I hope you find the
changes worthwhile.

Regards,

Berto

Alberto Garcia (6):
  qcow2: use one single memory block for the L2/refcount cache tables
  qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()
  qcow2: use an LRU algorithm to replace entries from the L2 cache
  qcow2: remove qcow2_cache_find_entry_to_replace()
  qcow2: use a hash to look for entries in the L2 cache
  qcow2: style fixes in qcow2-cache.c

 block/qcow2-cache.c | 149 +++++++++++++++++++++-------------------------------
 1 file changed, 61 insertions(+), 88 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables
  2015-04-30 10:11 [Qemu-devel] [PATCH 0/6] qcow2 L2/refcount cache improvements Alberto Garcia
@ 2015-04-30 10:11 ` Alberto Garcia
  2015-04-30 15:08   ` Eric Blake
  2015-05-01 14:23   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 2/6] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty() Alberto Garcia
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 24+ messages in thread
From: Alberto Garcia @ 2015-04-30 10:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Alberto Garcia, qemu-block, Stefan Hajnoczi

The qcow2 L2/refcount cache contains one separate table for each cache
entry. Doing one allocation per table adds unnecessary overhead and it
also requires us to store the address of each table separately.

Since the size of the cache is constant during its lifetime, it's
better to have an array that contains all the tables using one single
allocation.

In my tests measuring freshly created caches with sizes 128MB (L2) and
32MB (refcount) this uses around 10MB of RAM less.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 48 +++++++++++++++++++++---------------------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index b115549..586880b 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -28,7 +28,6 @@
 #include "trace.h"
 
 typedef struct Qcow2CachedTable {
-    void*   table;
     int64_t offset;
     bool    dirty;
     int     cache_hits;
@@ -40,39 +39,34 @@ struct Qcow2Cache {
     struct Qcow2Cache*      depends;
     int                     size;
     bool                    depends_on_flush;
+    void                   *table_array;
+    int                     table_size;
 };
 
+static inline void *table_addr(Qcow2Cache *c, int table)
+{
+    return c->table_array + table * c->table_size;
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
     Qcow2Cache *c;
-    int i;
 
     c = g_new0(Qcow2Cache, 1);
     c->size = num_tables;
+    c->table_size = s->cluster_size;
     c->entries = g_try_new0(Qcow2CachedTable, num_tables);
-    if (!c->entries) {
-        goto fail;
-    }
+    c->table_array = qemu_try_blockalign(bs->file, num_tables * c->table_size);
 
-    for (i = 0; i < c->size; i++) {
-        c->entries[i].table = qemu_try_blockalign(bs->file, s->cluster_size);
-        if (c->entries[i].table == NULL) {
-            goto fail;
-        }
+    if (!c->entries || !c->table_array) {
+        qemu_vfree(c->table_array);
+        g_free(c->entries);
+        g_free(c);
+        c = NULL;
     }
 
     return c;
-
-fail:
-    if (c->entries) {
-        for (i = 0; i < c->size; i++) {
-            qemu_vfree(c->entries[i].table);
-        }
-    }
-    g_free(c->entries);
-    g_free(c);
-    return NULL;
 }
 
 int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
@@ -81,9 +75,9 @@ int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
 
     for (i = 0; i < c->size; i++) {
         assert(c->entries[i].ref == 0);
-        qemu_vfree(c->entries[i].table);
     }
 
+    qemu_vfree(c->table_array);
     g_free(c->entries);
     g_free(c);
 
@@ -151,8 +145,8 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
         BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
     }
 
-    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
-        s->cluster_size);
+    ret = bdrv_pwrite(bs->file, c->entries[i].offset, table_addr(c, i),
+                      s->cluster_size);
     if (ret < 0) {
         return ret;
     }
@@ -304,7 +298,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
         }
 
-        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
+        ret = bdrv_pread(bs->file, offset, table_addr(c, i), s->cluster_size);
         if (ret < 0) {
             return ret;
         }
@@ -319,7 +313,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
 found:
     c->entries[i].cache_hits++;
     c->entries[i].ref++;
-    *table = c->entries[i].table;
+    *table = table_addr(c, i);
 
     trace_qcow2_cache_get_done(qemu_coroutine_self(),
                                c == s->l2_table_cache, i);
@@ -344,7 +338,7 @@ int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
     int i;
 
     for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == *table) {
+        if (table_addr(c, i) == *table) {
             goto found;
         }
     }
@@ -363,7 +357,7 @@ void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
     int i;
 
     for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == table) {
+        if (table_addr(c, i) == table) {
             goto found;
         }
     }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH 2/6] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()
  2015-04-30 10:11 [Qemu-devel] [PATCH 0/6] qcow2 L2/refcount cache improvements Alberto Garcia
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables Alberto Garcia
@ 2015-04-30 10:11 ` Alberto Garcia
  2015-05-01 14:31   ` Stefan Hajnoczi
  2015-05-05 15:21   ` Eric Blake
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 3/6] qcow2: use an LRU algorithm to replace entries from the L2 cache Alberto Garcia
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 24+ messages in thread
From: Alberto Garcia @ 2015-04-30 10:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Alberto Garcia, qemu-block, Stefan Hajnoczi

Since all tables are now stored together, it is possible to obtain
the position of a particular table directly from its address, so the
operation becomes O(1).

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 586880b..d3274f4 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -335,16 +335,12 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 
 int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
 {
-    int i;
+    int i = (*table - c->table_array) / c->table_size;
 
-    for (i = 0; i < c->size; i++) {
-        if (table_addr(c, i) == *table) {
-            goto found;
-        }
+    if (c->entries[i].offset == 0) {
+        return -ENOENT;
     }
-    return -ENOENT;
 
-found:
     c->entries[i].ref--;
     *table = NULL;
 
@@ -354,15 +350,7 @@ found:
 
 void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
 {
-    int i;
-
-    for (i = 0; i < c->size; i++) {
-        if (table_addr(c, i) == table) {
-            goto found;
-        }
-    }
-    abort();
-
-found:
+    int i = (table - c->table_array) / c->table_size;
+    assert(c->entries[i].offset != 0);
     c->entries[i].dirty = true;
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH 3/6] qcow2: use an LRU algorithm to replace entries from the L2 cache
  2015-04-30 10:11 [Qemu-devel] [PATCH 0/6] qcow2 L2/refcount cache improvements Alberto Garcia
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables Alberto Garcia
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 2/6] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty() Alberto Garcia
@ 2015-04-30 10:11 ` Alberto Garcia
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 4/6] qcow2: remove qcow2_cache_find_entry_to_replace() Alberto Garcia
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Alberto Garcia @ 2015-04-30 10:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Alberto Garcia, qemu-block, Stefan Hajnoczi

The current algorithm to evict entries from the cache gives always
preference to those in the lowest positions. As the size of the cache
increases, the chances of the later elements of being removed decrease
exponentially.

In a scenario with random I/O and lots of cache misses, entries in
positions 8 and higher are rarely (if ever) evicted. This can be seen
even with the default cache size, but with larger caches the problem
becomes more obvious.

Using an LRU algorithm makes the chances of being removed from the
cache independent from the position.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index d3274f4..477a209 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -28,10 +28,10 @@
 #include "trace.h"
 
 typedef struct Qcow2CachedTable {
-    int64_t offset;
-    bool    dirty;
-    int     cache_hits;
-    int     ref;
+    int64_t  offset;
+    bool     dirty;
+    uint64_t lru_counter;
+    int      ref;
 } Qcow2CachedTable;
 
 struct Qcow2Cache {
@@ -41,6 +41,7 @@ struct Qcow2Cache {
     bool                    depends_on_flush;
     void                   *table_array;
     int                     table_size;
+    uint64_t                lru_counter;
 };
 
 static inline void *table_addr(Qcow2Cache *c, int table)
@@ -222,16 +223,18 @@ int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c)
     for (i = 0; i < c->size; i++) {
         assert(c->entries[i].ref == 0);
         c->entries[i].offset = 0;
-        c->entries[i].cache_hits = 0;
+        c->entries[i].lru_counter = 0;
     }
 
+    c->lru_counter = 0;
+
     return 0;
 }
 
 static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
 {
     int i;
-    int min_count = INT_MAX;
+    uint64_t min_lru_counter = UINT64_MAX;
     int min_index = -1;
 
 
@@ -240,15 +243,9 @@ static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
             continue;
         }
 
-        if (c->entries[i].cache_hits < min_count) {
+        if (c->entries[i].lru_counter < min_lru_counter) {
             min_index = i;
-            min_count = c->entries[i].cache_hits;
-        }
-
-        /* Give newer hits priority */
-        /* TODO Check how to optimize the replacement strategy */
-        if (c->entries[i].cache_hits > 1) {
-            c->entries[i].cache_hits /= 2;
+            min_lru_counter = c->entries[i].lru_counter;
         }
     }
 
@@ -306,12 +303,10 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
 
     /* Give the table some hits for the start so that it won't be replaced
      * immediately. The number 32 is completely arbitrary. */
-    c->entries[i].cache_hits = 32;
     c->entries[i].offset = offset;
 
     /* And return the right table */
 found:
-    c->entries[i].cache_hits++;
     c->entries[i].ref++;
     *table = table_addr(c, i);
 
@@ -344,6 +339,10 @@ int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
     c->entries[i].ref--;
     *table = NULL;
 
+    if (c->entries[i].ref == 0) {
+        c->entries[i].lru_counter = ++c->lru_counter;
+    }
+
     assert(c->entries[i].ref >= 0);
     return 0;
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH 4/6] qcow2: remove qcow2_cache_find_entry_to_replace()
  2015-04-30 10:11 [Qemu-devel] [PATCH 0/6] qcow2 L2/refcount cache improvements Alberto Garcia
                   ` (2 preceding siblings ...)
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 3/6] qcow2: use an LRU algorithm to replace entries from the L2 cache Alberto Garcia
@ 2015-04-30 10:11 ` Alberto Garcia
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 5/6] qcow2: use a hash to look for entries in the L2 cache Alberto Garcia
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Alberto Garcia @ 2015-04-30 10:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Alberto Garcia, qemu-block, Stefan Hajnoczi

A cache miss means that the whole array was traversed and the entry
we were looking for was not found, so there's no need to traverse it
again in order to select an entry to replace.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 45 ++++++++++++++++-----------------------------
 1 file changed, 16 insertions(+), 29 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 477a209..e1bba20 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -231,51 +231,38 @@ int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c)
     return 0;
 }
 
-static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
-{
-    int i;
-    uint64_t min_lru_counter = UINT64_MAX;
-    int min_index = -1;
-
-
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].ref) {
-            continue;
-        }
-
-        if (c->entries[i].lru_counter < min_lru_counter) {
-            min_index = i;
-            min_lru_counter = c->entries[i].lru_counter;
-        }
-    }
-
-    if (min_index == -1) {
-        /* This can't happen in current synchronous code, but leave the check
-         * here as a reminder for whoever starts using AIO with the cache */
-        abort();
-    }
-    return min_index;
-}
-
 static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
     uint64_t offset, void **table, bool read_from_disk)
 {
     BDRVQcowState *s = bs->opaque;
     int i;
     int ret;
+    uint64_t min_lru_counter = UINT64_MAX;
+    int min_lru_index = -1;
 
     trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
                           offset, read_from_disk);
 
     /* Check if the table is already cached */
     for (i = 0; i < c->size; i++) {
-        if (c->entries[i].offset == offset) {
+        const Qcow2CachedTable *t = &c->entries[i];
+        if (t->offset == offset) {
             goto found;
         }
+        if (t->ref == 0 && t->lru_counter < min_lru_counter) {
+            min_lru_counter = t->lru_counter;
+            min_lru_index = i;
+        }
+    }
+
+    if (min_lru_index == -1) {
+        /* This can't happen in current synchronous code, but leave the check
+         * here as a reminder for whoever starts using AIO with the cache */
+        abort();
     }
 
-    /* If not, write a table back and replace it */
-    i = qcow2_cache_find_entry_to_replace(c);
+    /* Cache miss: write a table back and replace it */
+    i = min_lru_index;
     trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
                                         c == s->l2_table_cache, i);
     if (i < 0) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH 5/6] qcow2: use a hash to look for entries in the L2 cache
  2015-04-30 10:11 [Qemu-devel] [PATCH 0/6] qcow2 L2/refcount cache improvements Alberto Garcia
                   ` (3 preceding siblings ...)
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 4/6] qcow2: remove qcow2_cache_find_entry_to_replace() Alberto Garcia
@ 2015-04-30 10:11 ` Alberto Garcia
  2015-05-06 16:42   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 6/6] qcow2: style fixes in qcow2-cache.c Alberto Garcia
  2015-05-06 16:43 ` [Qemu-devel] [Qemu-block] [PATCH 0/6] qcow2 L2/refcount cache improvements Stefan Hajnoczi
  6 siblings, 1 reply; 24+ messages in thread
From: Alberto Garcia @ 2015-04-30 10:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Alberto Garcia, qemu-block, Stefan Hajnoczi

The current cache algorithm traverses the array starting always from
the beginning, so the average number of comparisons needed to perform
a lookup is proportional to the size of the array.

By using a hash of the offset as the starting point, lookups are
faster and independent from the array size.

The hash is computed using the cluster number of the table, multiplied
by 4 to make it perform better when there are collisions.

In my tests, using a cache with 2048 entries, this reduces the average
number of comparisons per lookup from 430 to 2.5.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index e1bba20..c0e0278 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -237,6 +237,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
     BDRVQcowState *s = bs->opaque;
     int i;
     int ret;
+    int lookup_index;
     uint64_t min_lru_counter = UINT64_MAX;
     int min_lru_index = -1;
 
@@ -244,7 +245,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
                           offset, read_from_disk);
 
     /* Check if the table is already cached */
-    for (i = 0; i < c->size; i++) {
+    i = lookup_index = (offset / c->table_size * 4) % c->size;
+    do {
         const Qcow2CachedTable *t = &c->entries[i];
         if (t->offset == offset) {
             goto found;
@@ -253,7 +255,10 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             min_lru_counter = t->lru_counter;
             min_lru_index = i;
         }
-    }
+        if (++i == c->size) {
+            i = 0;
+        }
+    } while (i != lookup_index);
 
     if (min_lru_index == -1) {
         /* This can't happen in current synchronous code, but leave the check
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH 6/6] qcow2: style fixes in qcow2-cache.c
  2015-04-30 10:11 [Qemu-devel] [PATCH 0/6] qcow2 L2/refcount cache improvements Alberto Garcia
                   ` (4 preceding siblings ...)
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 5/6] qcow2: use a hash to look for entries in the L2 cache Alberto Garcia
@ 2015-04-30 10:11 ` Alberto Garcia
  2015-05-06 16:42   ` Stefan Hajnoczi
  2015-05-06 16:43 ` [Qemu-devel] [Qemu-block] [PATCH 0/6] qcow2 L2/refcount cache improvements Stefan Hajnoczi
  6 siblings, 1 reply; 24+ messages in thread
From: Alberto Garcia @ 2015-04-30 10:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Alberto Garcia, qemu-block, Stefan Hajnoczi

Fix pointer declaration to make it consistent with the rest of the
code.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index c0e0278..dd591ef 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -35,8 +35,8 @@ typedef struct Qcow2CachedTable {
 } Qcow2CachedTable;
 
 struct Qcow2Cache {
-    Qcow2CachedTable*       entries;
-    struct Qcow2Cache*      depends;
+    Qcow2CachedTable       *entries;
+    struct Qcow2Cache      *depends;
     int                     size;
     bool                    depends_on_flush;
     void                   *table_array;
@@ -70,7 +70,7 @@ Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
     return c;
 }
 
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
+int qcow2_cache_destroy(BlockDriverState *bs, Qcow2Cache *c)
 {
     int i;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables Alberto Garcia
@ 2015-04-30 15:08   ` Eric Blake
  2015-05-05 13:00     ` Alberto Garcia
  2015-05-01 14:23   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 1 reply; 24+ messages in thread
From: Eric Blake @ 2015-04-30 15:08 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2070 bytes --]

On 04/30/2015 04:11 AM, Alberto Garcia wrote:
> The qcow2 L2/refcount cache contains one separate table for each cache
> entry. Doing one allocation per table adds unnecessary overhead and it
> also requires us to store the address of each table separately.
> 
> Since the size of the cache is constant during its lifetime, it's
> better to have an array that contains all the tables using one single
> allocation.
> 
> In my tests measuring freshly created caches with sizes 128MB (L2) and
> 32MB (refcount) this uses around 10MB of RAM less.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cache.c | 48 +++++++++++++++++++++---------------------------
>  1 file changed, 21 insertions(+), 27 deletions(-)
> 

>  
>  typedef struct Qcow2CachedTable {
> -    void*   table;
>      int64_t offset;
>      bool    dirty;
>      int     cache_hits;
> @@ -40,39 +39,34 @@ struct Qcow2Cache {
>      struct Qcow2Cache*      depends;
>      int                     size;
>      bool                    depends_on_flush;
> +    void                   *table_array;
> +    int                     table_size;

Should this be size_t? [1]

>  };
>  
> +static inline void *table_addr(Qcow2Cache *c, int table)
> +{
> +    return c->table_array + table * c->table_size;

Addition on void* is not portable.

> +}
> +
>  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>  {
>      BDRVQcowState *s = bs->opaque;
>      Qcow2Cache *c;
> -    int i;
>  
>      c = g_new0(Qcow2Cache, 1);
>      c->size = num_tables;
> +    c->table_size = s->cluster_size;

[1] Oh, maybe not, since BDRVQcowState declares cluster_size as int.

>      c->entries = g_try_new0(Qcow2CachedTable, num_tables);
> -    if (!c->entries) {
> -        goto fail;
> -    }
> +    c->table_array = qemu_try_blockalign(bs->file, num_tables * c->table_size);

Are we sure this won't overflow?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables Alberto Garcia
  2015-04-30 15:08   ` Eric Blake
@ 2015-05-01 14:23   ` Stefan Hajnoczi
  2015-05-04 10:58     ` Kevin Wolf
  1 sibling, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2015-05-01 14:23 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 514 bytes --]

On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
>  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>  {
>      BDRVQcowState *s = bs->opaque;
>      Qcow2Cache *c;
> -    int i;
>  
>      c = g_new0(Qcow2Cache, 1);
>      c->size = num_tables;
> +    c->table_size = s->cluster_size;

This assumes c->table_size meets bs' memory alignment requirements.  The
following would be safer:

c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 2/6] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty() Alberto Garcia
@ 2015-05-01 14:31   ` Stefan Hajnoczi
  2015-05-05 13:06     ` Alberto Garcia
  2015-05-05 15:21   ` Eric Blake
  1 sibling, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2015-05-01 14:31 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1222 bytes --]

On Thu, Apr 30, 2015 at 01:11:41PM +0300, Alberto Garcia wrote:
>  int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
>  {
> -    int i;
> +    int i = (*table - c->table_array) / c->table_size;
>  
> -    for (i = 0; i < c->size; i++) {
> -        if (table_addr(c, i) == *table) {
> -            goto found;
> -        }
> +    if (c->entries[i].offset == 0) {
> +        return -ENOENT;
>      }
> -    return -ENOENT;
>  
> -found:
>      c->entries[i].ref--;
>      *table = NULL;
>  

When is this function called with a bogus table pointer?

My hunch is there are no callers like that.  I might be wrong.

If there are no callers like that, then you can extract this and include
an assertion instead:

static int qcow2_cache_get_table_idx(Qcow2Cache *c, void *table)
{
    assert(table >= c->table_array &&);
           table < c->table_array + c->size * c->table_size);
    return (table - c->table_array) / c->table_size;
}

Functions that go from void *table to int i should use this helper.

And the qcow2_cache_put() users can drop their error handling code.  A
lot of them ignore the return value anyway, so it simplifies things even
more.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables
  2015-05-01 14:23   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2015-05-04 10:58     ` Kevin Wolf
  2015-05-05 10:28       ` Stefan Hajnoczi
  0 siblings, 1 reply; 24+ messages in thread
From: Kevin Wolf @ 2015-05-04 10:58 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-block, Alberto Garcia, qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 894 bytes --]

Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      Qcow2Cache *c;
> > -    int i;
> >  
> >      c = g_new0(Qcow2Cache, 1);
> >      c->size = num_tables;
> > +    c->table_size = s->cluster_size;
> 
> This assumes c->table_size meets bs' memory alignment requirements.  The
> following would be safer:
> 
> c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));

You can't just access more than one cluster. You might be caching data
and later write it back when it's stale.

If you can't do I/O in chunks as small as the cluster size (which is
rather unlikely), relying on bdrv_pwrite(), like Berto's patch does, is
the correct solution.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables
  2015-05-04 10:58     ` Kevin Wolf
@ 2015-05-05 10:28       ` Stefan Hajnoczi
  2015-05-05 11:20         ` Kevin Wolf
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2015-05-05 10:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, Alberto Garcia, qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 935 bytes --]

On Mon, May 04, 2015 at 12:58:13PM +0200, Kevin Wolf wrote:
> Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> > On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> > >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> > >  {
> > >      BDRVQcowState *s = bs->opaque;
> > >      Qcow2Cache *c;
> > > -    int i;
> > >  
> > >      c = g_new0(Qcow2Cache, 1);
> > >      c->size = num_tables;
> > > +    c->table_size = s->cluster_size;
> > 
> > This assumes c->table_size meets bs' memory alignment requirements.  The
> > following would be safer:
> > 
> > c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));
> 
> You can't just access more than one cluster. You might be caching data
> and later write it back when it's stale.

I don't mean I/O alignment, just memory alignment (i.e. the start
address of the data buffer in memory).

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables
  2015-05-05 10:28       ` Stefan Hajnoczi
@ 2015-05-05 11:20         ` Kevin Wolf
  2015-05-05 15:09           ` Alberto Garcia
  2015-05-06 14:57           ` Stefan Hajnoczi
  0 siblings, 2 replies; 24+ messages in thread
From: Kevin Wolf @ 2015-05-05 11:20 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-block, Alberto Garcia, qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 1383 bytes --]

Am 05.05.2015 um 12:28 hat Stefan Hajnoczi geschrieben:
> On Mon, May 04, 2015 at 12:58:13PM +0200, Kevin Wolf wrote:
> > Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> > > On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> > > >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> > > >  {
> > > >      BDRVQcowState *s = bs->opaque;
> > > >      Qcow2Cache *c;
> > > > -    int i;
> > > >  
> > > >      c = g_new0(Qcow2Cache, 1);
> > > >      c->size = num_tables;
> > > > +    c->table_size = s->cluster_size;
> > > 
> > > This assumes c->table_size meets bs' memory alignment requirements.  The
> > > following would be safer:
> > > 
> > > c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));
> > 
> > You can't just access more than one cluster. You might be caching data
> > and later write it back when it's stale.
> 
> I don't mean I/O alignment, just memory alignment (i.e. the start
> address of the data buffer in memory).

The start address is already taken care of by qemu_blockalign(). With
rounding c->table_size, you'd align the length, and that would be wrong.

Though looking at the code again I see now that c->table_size isn't
consistently used. The I/O requests still use s->cluster_size. We should
either use it everywhere or not introduce it at all.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables
  2015-04-30 15:08   ` Eric Blake
@ 2015-05-05 13:00     ` Alberto Garcia
  0 siblings, 0 replies; 24+ messages in thread
From: Alberto Garcia @ 2015-05-05 13:00 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On Thu 30 Apr 2015 05:08:05 PM CEST, Eric Blake <eblake@redhat.com> wrote:

>>  typedef struct Qcow2CachedTable {
>> -    void*   table;
>>      int64_t offset;
>>      bool    dirty;
>>      int     cache_hits;
>> @@ -40,39 +39,34 @@ struct Qcow2Cache {
>>      struct Qcow2Cache*      depends;
>>      int                     size;
>>      bool                    depends_on_flush;
>> +    void                   *table_array;
>> +    int                     table_size;
>
> Should this be size_t? [1]

The maximum supported table size is 2MB (MAX_CLUSTER_BITS == 21).

>>      c->entries = g_try_new0(Qcow2CachedTable, num_tables);
>> -    if (!c->entries) {
>> -        goto fail;
>> -    }
>> +    c->table_array = qemu_try_blockalign(bs->file, num_tables * c->table_size);
>
> Are we sure this won't overflow?

That's a good catch. I was making some numbers and I doubt that scenario
would happen in practice, but I think it's possible so I'll fix it.

Berto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()
  2015-05-01 14:31   ` Stefan Hajnoczi
@ 2015-05-05 13:06     ` Alberto Garcia
  2015-05-06 15:00       ` Stefan Hajnoczi
  0 siblings, 1 reply; 24+ messages in thread
From: Alberto Garcia @ 2015-05-05 13:06 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, qemu-block

On Fri 01 May 2015 04:31:52 PM CEST, Stefan Hajnoczi wrote:

>>  int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
>>  {
>> -    int i;
>> +    int i = (*table - c->table_array) / c->table_size;
>>  
>> -    for (i = 0; i < c->size; i++) {
>> -        if (table_addr(c, i) == *table) {
>> -            goto found;
>> -        }
>> +    if (c->entries[i].offset == 0) {
>> +        return -ENOENT;
>>      }
>> -    return -ENOENT;
>>  
>> -found:
>>      c->entries[i].ref--;
>>      *table = NULL;
>>  
>
> When is this function called with a bogus table pointer?

I also could not image any such scenario, but I decided to be
conservative and keep the error handling code. I'll double check all
places where it's used and remove the relevant code.

Berto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables
  2015-05-05 11:20         ` Kevin Wolf
@ 2015-05-05 15:09           ` Alberto Garcia
  2015-05-06 14:57           ` Stefan Hajnoczi
  1 sibling, 0 replies; 24+ messages in thread
From: Alberto Garcia @ 2015-05-05 15:09 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi; +Cc: qemu-block, qemu-devel, Stefan Hajnoczi

On Tue 05 May 2015 01:20:19 PM CEST, Kevin Wolf wrote:

> Though looking at the code again I see now that c->table_size isn't
> consistently used. The I/O requests still use s->cluster_size. We
> should either use it everywhere or not introduce it at all.

c->table_size is necessary in order to calculate the offset of a
particular table, because s->cluster_size is not always available
(e.g. qcow2_cache_entry_mark_dirty()). We could make it a requirement of
the API, though.

Alternatively we could have c->bs instead of c->table_size. That would
spare us from passing the BlockDriverState pointer to all qcow2_cache_
functions.

The assumption would be that the BlockDriverState pointer is the same
for the whole lifetime of the cache, but that's always guaranteed to be
like that, right?

Berto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 2/6] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty() Alberto Garcia
  2015-05-01 14:31   ` Stefan Hajnoczi
@ 2015-05-05 15:21   ` Eric Blake
  1 sibling, 0 replies; 24+ messages in thread
From: Eric Blake @ 2015-05-05 15:21 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1380 bytes --]

On 04/30/2015 04:11 AM, Alberto Garcia wrote:
> Since all tables are now stored together, it is possible to obtain
> the position of a particular table directly from its address, so the
> operation becomes O(1).
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cache.c | 22 +++++-----------------
>  1 file changed, 5 insertions(+), 17 deletions(-)
> 
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index 586880b..d3274f4 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -335,16 +335,12 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>  
>  int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
>  {
> -    int i;
> +    int i = (*table - c->table_array) / c->table_size;

Arithmetic on void* is not portable.

>  void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
>  {
> -    int i;
> -
> -    for (i = 0; i < c->size; i++) {
> -        if (table_addr(c, i) == table) {
> -            goto found;
> -        }
> -    }
> -    abort();
> -
> -found:
> +    int i = (table - c->table_array) / c->table_size;

and again.

> +    assert(c->entries[i].offset != 0);
>      c->entries[i].dirty = true;
>  }
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables
  2015-05-05 11:20         ` Kevin Wolf
  2015-05-05 15:09           ` Alberto Garcia
@ 2015-05-06 14:57           ` Stefan Hajnoczi
  1 sibling, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2015-05-06 14:57 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, Alberto Garcia, qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 1604 bytes --]

On Tue, May 05, 2015 at 01:20:19PM +0200, Kevin Wolf wrote:
> Am 05.05.2015 um 12:28 hat Stefan Hajnoczi geschrieben:
> > On Mon, May 04, 2015 at 12:58:13PM +0200, Kevin Wolf wrote:
> > > Am 01.05.2015 um 16:23 hat Stefan Hajnoczi geschrieben:
> > > > On Thu, Apr 30, 2015 at 01:11:40PM +0300, Alberto Garcia wrote:
> > > > >  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> > > > >  {
> > > > >      BDRVQcowState *s = bs->opaque;
> > > > >      Qcow2Cache *c;
> > > > > -    int i;
> > > > >  
> > > > >      c = g_new0(Qcow2Cache, 1);
> > > > >      c->size = num_tables;
> > > > > +    c->table_size = s->cluster_size;
> > > > 
> > > > This assumes c->table_size meets bs' memory alignment requirements.  The
> > > > following would be safer:
> > > > 
> > > > c->table_size = QEMU_ALIGN_UP(c->table_size, bdrv_opt_mem_align(bs->file));
> > > 
> > > You can't just access more than one cluster. You might be caching data
> > > and later write it back when it's stale.
> > 
> > I don't mean I/O alignment, just memory alignment (i.e. the start
> > address of the data buffer in memory).
> 
> The start address is already taken care of by qemu_blockalign(). With
> rounding c->table_size, you'd align the length, and that would be wrong.

It wasn't clear to me that c->table + n * c->table_size for all n is
aligned, but I agree with you now:

bdrv_qiov_is_aligned() checks both address and size against memory
alignment.  This means that if the I/O is memory aligned for table_size,
then all multiples of table_size are also aligned.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()
  2015-05-05 13:06     ` Alberto Garcia
@ 2015-05-06 15:00       ` Stefan Hajnoczi
  2015-05-06 15:32         ` Alberto Garcia
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2015-05-06 15:00 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1236 bytes --]

On Tue, May 05, 2015 at 03:06:52PM +0200, Alberto Garcia wrote:
> On Fri 01 May 2015 04:31:52 PM CEST, Stefan Hajnoczi wrote:
> 
> >>  int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
> >>  {
> >> -    int i;
> >> +    int i = (*table - c->table_array) / c->table_size;
> >>  
> >> -    for (i = 0; i < c->size; i++) {
> >> -        if (table_addr(c, i) == *table) {
> >> -            goto found;
> >> -        }
> >> +    if (c->entries[i].offset == 0) {
> >> +        return -ENOENT;
> >>      }
> >> -    return -ENOENT;
> >>  
> >> -found:
> >>      c->entries[i].ref--;
> >>      *table = NULL;
> >>  
> >
> > When is this function called with a bogus table pointer?
> 
> I also could not image any such scenario, but I decided to be
> conservative and keep the error handling code. I'll double check all
> places where it's used and remove the relevant code.

The reason why I'd think this is worth looking into is:

The new qcow2_cache_put() code indexes outside the array bounds if table
is a bogus value.  The old code would return -ENOENT.  So I am a little
nervous about the change, although I suspect the function is never
called with invalid inputs at all.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()
  2015-05-06 15:00       ` Stefan Hajnoczi
@ 2015-05-06 15:32         ` Alberto Garcia
  0 siblings, 0 replies; 24+ messages in thread
From: Alberto Garcia @ 2015-05-06 15:32 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, qemu-block

On Wed 06 May 2015 05:00:50 PM CEST, Stefan Hajnoczi wrote:

>> >>  int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
>> >>  {
>> >> -    int i;
>> >> +    int i = (*table - c->table_array) / c->table_size;
>> >>  
>> >> -    for (i = 0; i < c->size; i++) {
>> >> -        if (table_addr(c, i) == *table) {
>> >> -            goto found;
>> >> -        }
>> >> +    if (c->entries[i].offset == 0) {
>> >> +        return -ENOENT;
>> >>      }
>> >> -    return -ENOENT;
>> >>  
>> >> -found:
>> >>      c->entries[i].ref--;
>> >>      *table = NULL;
>> >>  
>> >
>> > When is this function called with a bogus table pointer?
>> 
>> I also could not image any such scenario, but I decided to be
>> conservative and keep the error handling code. I'll double check all
>> places where it's used and remove the relevant code.
>
> The reason why I'd think this is worth looking into is:
>
> The new qcow2_cache_put() code indexes outside the array bounds if
>table is a bogus value.  The old code would return -ENOENT.  So I am a
>little nervous about the change, although I suspect the function is
>never called with invalid inputs at all.

I checked again all callers of qcow2_cache_put() and I didn't notice
anything unusual.

In some cases it even goes after qcow2_cache_entry_mark_dirty(), and
that one directly aborts if the table pointer is bogus.

Berto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 5/6] qcow2: use a hash to look for entries in the L2 cache
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 5/6] qcow2: use a hash to look for entries in the L2 cache Alberto Garcia
@ 2015-05-06 16:42   ` Stefan Hajnoczi
  2015-05-07  8:23     ` Alberto Garcia
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2015-05-06 16:42 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1313 bytes --]

On Thu, Apr 30, 2015 at 01:11:44PM +0300, Alberto Garcia wrote:
> By using a hash of the offset as the starting point, lookups are
> faster and independent from the array size.
> 
> The hash is computed using the cluster number of the table, multiplied
> by 4 to make it perform better when there are collisions.
...
> +    i = lookup_index = (offset / c->table_size * 4) % c->size;

The hash function is very weak.  It's better than always starting with
i=0 but mixes input bits very poorly.  Have you considered an integer
hash function so that more input bits affect the output?
http://burtleburtle.net/bob/hash/integer.html

Regarding collisions, the multiply-by-4 effectively reduces the hash
space by a factor of 4.  This *increases* the chance of collisions in
the hope that the extra slots will prevent chains from forming.

Using an integer hash function ought to be more resistant to input value
patterns and eliminate the need for multiply-by-4, because chains only
form when the table load is high.

Finally, the hash optimizes for the sparsely occupied table scenario.
If the table is loaded (dense) then there will be chains in any case.  A
full search will be necessary.

Feel free to leave the hash function as-is, I don't think it's a huge
issue either way.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH 6/6] qcow2: style fixes in qcow2-cache.c
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 6/6] qcow2: style fixes in qcow2-cache.c Alberto Garcia
@ 2015-05-06 16:42   ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2015-05-06 16:42 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 354 bytes --]

On Thu, Apr 30, 2015 at 01:11:45PM +0300, Alberto Garcia wrote:
> Fix pointer declaration to make it consistent with the rest of the
> code.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cache.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

:)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 0/6] qcow2 L2/refcount cache improvements
  2015-04-30 10:11 [Qemu-devel] [PATCH 0/6] qcow2 L2/refcount cache improvements Alberto Garcia
                   ` (5 preceding siblings ...)
  2015-04-30 10:11 ` [Qemu-devel] [PATCH 6/6] qcow2: style fixes in qcow2-cache.c Alberto Garcia
@ 2015-05-06 16:43 ` Stefan Hajnoczi
  6 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2015-05-06 16:43 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

On Thu, Apr 30, 2015 at 01:11:39PM +0300, Alberto Garcia wrote:
> Here are some improvements to the qcow2 L2/refcount cache code.
> 
> The first one is that all cache tables are now allocated using a
> single memory block, as we discussed last week.
> 
> Apart from a more efficient use of memory, this allows some additional
> optimizations so I took the chance to make other changes.
> 
> - qcow2_cache_put() and qcow2_cache_entry_mark_dirty() are now O(1)
> - The eviction algorithm is now LRU. The previous one only works well
>   with very small cache sizes.
> - qcow2_cache_find_entry_to_replace() is no longer necessary.
> - Lookups are faster now.
> 
> In my tests with a preallocated 128MB L2 cache in an empty drive the
> new code is ~13% faster than the previous one (~43% if compiled
> without optimizations). This is a best-case scenario, if the cache is
> smaller or the drive is full of data the improvements are not so
> visible, but I believe the code is simpler now so I hope you find the
> changes worthwhile.

Overall I'm happy with this approach.  Looking forward to the next
revision.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 5/6] qcow2: use a hash to look for entries in the L2 cache
  2015-05-06 16:42   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2015-05-07  8:23     ` Alberto Garcia
  0 siblings, 0 replies; 24+ messages in thread
From: Alberto Garcia @ 2015-05-07  8:23 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block

On Wed 06 May 2015 06:42:30 PM CEST, Stefan Hajnoczi wrote:

>> By using a hash of the offset as the starting point, lookups are
>> faster and independent from the array size.
>> 
>> The hash is computed using the cluster number of the table, multiplied
>> by 4 to make it perform better when there are collisions.
> ...
>> +    i = lookup_index = (offset / c->table_size * 4) % c->size;

> The hash function is very weak.  It's better than always starting with
> i=0 but mixes input bits very poorly.  Have you considered an integer
> hash function so that more input bits affect the output?
> http://burtleburtle.net/bob/hash/integer.html

I didn't try very hard to optimize the hash function because it doesn't
have a big impact, I just wanted something simple and a bit better than
i=0.

I did my tests with a disk image with preallocated metadata and a cache
size big enough for the whole disk. In that scenario I managed an
average of 2.5 comparisons per lookup with this function.

> Regarding collisions, the multiply-by-4 effectively reduces the hash
> space by a factor of 4.  This *increases* the chance of collisions in
> the hope that the extra slots will prevent chains from forming.

It does, but it keeps the average much lower (in my tests at least), it
went down from >100 to 2.5.

> Feel free to leave the hash function as-is, I don't think it's a huge
> issue either way.

I think I'll just leave it as-is for the time being, but thanks a lot
for your comments.

Berto

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-05-07  8:23 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-30 10:11 [Qemu-devel] [PATCH 0/6] qcow2 L2/refcount cache improvements Alberto Garcia
2015-04-30 10:11 ` [Qemu-devel] [PATCH 1/6] qcow2: use one single memory block for the L2/refcount cache tables Alberto Garcia
2015-04-30 15:08   ` Eric Blake
2015-05-05 13:00     ` Alberto Garcia
2015-05-01 14:23   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-05-04 10:58     ` Kevin Wolf
2015-05-05 10:28       ` Stefan Hajnoczi
2015-05-05 11:20         ` Kevin Wolf
2015-05-05 15:09           ` Alberto Garcia
2015-05-06 14:57           ` Stefan Hajnoczi
2015-04-30 10:11 ` [Qemu-devel] [PATCH 2/6] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty() Alberto Garcia
2015-05-01 14:31   ` Stefan Hajnoczi
2015-05-05 13:06     ` Alberto Garcia
2015-05-06 15:00       ` Stefan Hajnoczi
2015-05-06 15:32         ` Alberto Garcia
2015-05-05 15:21   ` Eric Blake
2015-04-30 10:11 ` [Qemu-devel] [PATCH 3/6] qcow2: use an LRU algorithm to replace entries from the L2 cache Alberto Garcia
2015-04-30 10:11 ` [Qemu-devel] [PATCH 4/6] qcow2: remove qcow2_cache_find_entry_to_replace() Alberto Garcia
2015-04-30 10:11 ` [Qemu-devel] [PATCH 5/6] qcow2: use a hash to look for entries in the L2 cache Alberto Garcia
2015-05-06 16:42   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-05-07  8:23     ` Alberto Garcia
2015-04-30 10:11 ` [Qemu-devel] [PATCH 6/6] qcow2: style fixes in qcow2-cache.c Alberto Garcia
2015-05-06 16:42   ` Stefan Hajnoczi
2015-05-06 16:43 ` [Qemu-devel] [Qemu-block] [PATCH 0/6] qcow2 L2/refcount cache improvements Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.