All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] Clean unused entries in the qcow2 L2/refcount cache
@ 2015-05-18 16:48 Alberto Garcia
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Alberto Garcia @ 2015-05-18 16:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

These patches use MADV_DONTNEED to clean unused cache entries. Under
Linux it frees the memory used by those pages.

Berto

Alberto Garcia (3):
  qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  qcow2: add option to clean unused cache entries after some time
  qcow2: reorder fields in Qcow2CachedTable to reduce padding

 block/qcow2-cache.c  | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 block/qcow2.c        | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h        |  4 ++++
 qapi/block-core.json |  6 +++++-
 4 files changed, 121 insertions(+), 2 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-18 16:48 [Qemu-devel] [PATCH 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
@ 2015-05-18 16:48 ` Alberto Garcia
  2015-05-26 15:39   ` Max Reitz
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding Alberto Garcia
  2 siblings, 1 reply; 20+ messages in thread
From: Alberto Garcia @ 2015-05-18 16:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

After having emptied the cache, the data in the cache tables is no
longer useful, so we can tell the kernel that we are done with it. In
Linux this frees the resources associated with it.

The effect of this can be seen in the block-commit operation: it moves
data from the top to the base image (and fills both caches), then it
empties the top image. At this point the data in that cache is no
longer needed so it's just wasting memory.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed92a09..ed14a92 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -22,8 +22,10 @@
  * THE SOFTWARE.
  */
 
+#include <sys/mman.h>
 #include "block/block_int.h"
 #include "qemu-common.h"
+#include "qemu/osdep.h"
 #include "qcow2.h"
 #include "trace.h"
 
@@ -60,6 +62,22 @@ static inline int qcow2_cache_get_table_idx(BlockDriverState *bs,
     return idx;
 }
 
+static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
+                                      int i, int num_tables)
+{
+#if QEMU_MADV_DONTNEED != QEMU_MADV_INVALID
+    BDRVQcowState *s = bs->opaque;
+    void *t = qcow2_cache_get_table_addr(bs, c, i);
+    long align = sysconf(_SC_PAGESIZE);
+    size_t mem_size = (size_t) s->cluster_size * num_tables;
+    size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
+    size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
+    if (length > 0) {
+        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
+    }
+#endif
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
@@ -237,6 +255,8 @@ int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c)
         c->entries[i].lru_counter = 0;
     }
 
+    qcow2_cache_table_release(bs, c, 0, c->size);
+
     c->lru_counter = 0;
 
     return 0;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-18 16:48 [Qemu-devel] [PATCH 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
@ 2015-05-18 16:48 ` Alberto Garcia
  2015-05-26 16:07   ` Max Reitz
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding Alberto Garcia
  2 siblings, 1 reply; 20+ messages in thread
From: Alberto Garcia @ 2015-05-18 16:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

This adds a new 'cache-clean-interval' option that cleans all qcow2
cache entries that haven't been used in a certain interval, given in
seconds.

This allows setting a large L2 cache size so it can handle scenarios
with lots of I/O and at the same time use little memory during periods
of inactivity.

This feature currently relies on MADV_DONTNEED to free that memory, so
it is not useful in systems that don't follow that behavior.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++++++
 block/qcow2.c        | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h        |  4 ++++
 qapi/block-core.json |  6 +++++-
 4 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed14a92..a215f5b 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -43,6 +43,7 @@ struct Qcow2Cache {
     bool                    depends_on_flush;
     void                   *table_array;
     uint64_t                lru_counter;
+    uint64_t                cache_clean_lru_counter;
 };
 
 static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
@@ -78,6 +79,40 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
 #endif
 }
 
+static inline bool can_clean_entry(Qcow2Cache *c, int i)
+{
+    Qcow2CachedTable *t = &c->entries[i];
+    return t->ref == 0 && !t->dirty && t->offset != 0 &&
+        t->lru_counter <= c->cache_clean_lru_counter;
+}
+
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c)
+{
+    int i = 0;
+    while (i < c->size) {
+        int to_clean = 0;
+
+        /* Skip the entries that we don't need to clean */
+        while (i < c->size && !can_clean_entry(c, i)) {
+            i++;
+        }
+
+        /* And count how many we can clean in a row */
+        while (i < c->size && can_clean_entry(c, i)) {
+            c->entries[i].offset = 0;
+            c->entries[i].lru_counter = 0;
+            i++;
+            to_clean++;
+        }
+
+        if (to_clean > 0) {
+            qcow2_cache_table_release(bs, c, i - to_clean, to_clean);
+        }
+    }
+
+    c->cache_clean_lru_counter = c->lru_counter;
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.c b/block/qcow2.c
index b9a72e3..f6ed39f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -468,6 +468,11 @@ static QemuOptsList qcow2_runtime_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "Maximum refcount block cache size",
         },
+        {
+            .name = QCOW2_OPT_CACHE_CLEAN_INTERVAL,
+            .type = QEMU_OPT_NUMBER,
+            .help = "Clean unused cache entries after this time (in seconds)",
+        },
         { /* end of list */ }
     },
 };
@@ -483,6 +488,49 @@ static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
     [QCOW2_OL_INACTIVE_L2_BITNR]    = QCOW2_OPT_OVERLAP_INACTIVE_L2,
 };
 
+static void cache_clean_timer_cb(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVQcowState *s = bs->opaque;
+    qcow2_cache_clean_unused(bs, s->l2_table_cache);
+    qcow2_cache_clean_unused(bs, s->refcount_block_cache);
+    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              s->cache_clean_interval * 1000);
+}
+
+static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_interval > 0) {
+        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
+                                             SCALE_MS, cache_clean_timer_cb,
+                                             bs);
+        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                  s->cache_clean_interval * 1000);
+    }
+}
+
+static void cache_clean_timer_del(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_timer) {
+        timer_del(s->cache_clean_timer);
+        timer_free(s->cache_clean_timer);
+        s->cache_clean_timer = NULL;
+    }
+}
+
+static void qcow2_detach_aio_context(BlockDriverState *bs)
+{
+    cache_clean_timer_del(bs);
+}
+
+static void qcow2_attach_aio_context(BlockDriverState *bs,
+                                     AioContext *new_context)
+{
+    cache_clean_timer_init(bs, new_context);
+}
+
 static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
                              uint64_t *refcount_cache_size, Error **errp)
 {
@@ -838,6 +886,9 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         ret = -ENOMEM;
         goto fail;
     }
+    s->cache_clean_interval =
+        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
+    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
 
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
@@ -1004,6 +1055,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_vfree(s->l1_table);
     /* else pre-write overlap checks in cache_destroy may crash */
     s->l1_table = NULL;
+    cache_clean_timer_del(bs);
     if (s->l2_table_cache) {
         qcow2_cache_destroy(bs, s->l2_table_cache);
     }
@@ -1453,6 +1505,7 @@ static void qcow2_close(BlockDriverState *bs)
         }
     }
 
+    cache_clean_timer_del(bs);
     qcow2_cache_destroy(bs, s->l2_table_cache);
     qcow2_cache_destroy(bs, s->refcount_block_cache);
 
@@ -2964,6 +3017,9 @@ BlockDriver bdrv_qcow2 = {
     .create_opts         = &qcow2_create_opts,
     .bdrv_check          = qcow2_check,
     .bdrv_amend_options  = qcow2_amend_options,
+
+    .bdrv_detach_aio_context  = qcow2_detach_aio_context,
+    .bdrv_attach_aio_context  = qcow2_attach_aio_context,
 };
 
 static void bdrv_qcow2_init(void)
diff --git a/block/qcow2.h b/block/qcow2.h
index 0076512..2c45eb2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -93,6 +93,7 @@
 #define QCOW2_OPT_CACHE_SIZE "cache-size"
 #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
 #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
+#define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
 
 typedef struct QCowHeader {
     uint32_t magic;
@@ -236,6 +237,8 @@ typedef struct BDRVQcowState {
 
     Qcow2Cache* l2_table_cache;
     Qcow2Cache* refcount_block_cache;
+    QEMUTimer *cache_clean_timer;
+    unsigned cache_clean_interval;
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
@@ -581,6 +584,7 @@ int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
     Qcow2Cache *dependency);
 void qcow2_cache_depends_on_flush(Qcow2Cache *c);
 
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c);
 int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c);
 
 int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 863ffea..f42338d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1538,6 +1538,9 @@
 # @refcount-cache-size:   #optional the maximum size of the refcount block cache
 #                         in bytes (since 2.2)
 #
+# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
+#                         caches. The interval is in seconds (since 2.4)
+#
 # Since: 1.7
 ##
 { 'struct': 'BlockdevOptionsQcow2',
@@ -1549,7 +1552,8 @@
             '*overlap-check': 'Qcow2OverlapChecks',
             '*cache-size': 'int',
             '*l2-cache-size': 'int',
-            '*refcount-cache-size': 'int' } }
+            '*refcount-cache-size': 'int',
+            '*cache-clean-interval': 'int' } }
 
 
 ##
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-18 16:48 [Qemu-devel] [PATCH 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
@ 2015-05-18 16:48 ` Alberto Garcia
  2015-05-26 16:10   ` Max Reitz
  2015-05-26 16:13   ` Eric Blake
  2 siblings, 2 replies; 20+ messages in thread
From: Alberto Garcia @ 2015-05-18 16:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

Changing the current ordering saves 8 bytes per cache entry in x86_64.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index a215f5b..43590ff 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -31,9 +31,9 @@
 
 typedef struct Qcow2CachedTable {
     int64_t  offset;
-    bool     dirty;
     uint64_t lru_counter;
     int      ref;
+    bool     dirty;
 } Qcow2CachedTable;
 
 struct Qcow2Cache {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
@ 2015-05-26 15:39   ` Max Reitz
  2015-05-26 15:51     ` Alberto Garcia
  0 siblings, 1 reply; 20+ messages in thread
From: Max Reitz @ 2015-05-26 15:39 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On 18.05.2015 18:48, Alberto Garcia wrote:
> After having emptied the cache, the data in the cache tables is no
> longer useful, so we can tell the kernel that we are done with it. In
> Linux this frees the resources associated with it.
>
> The effect of this can be seen in the block-commit operation: it moves
> data from the top to the base image (and fills both caches), then it
> empties the top image. At this point the data in that cache is no
> longer needed so it's just wasting memory.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cache.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)

Looks good, but by applying the same logic, you could do the same call 
in qcow2_cache_create(). So what about it? :-)

Also note that bdrv_commit() is used only by the HMP commit operation, 
not by QMP commit.

Max

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-26 15:39   ` Max Reitz
@ 2015-05-26 15:51     ` Alberto Garcia
  2015-05-26 15:51       ` Max Reitz
  0 siblings, 1 reply; 20+ messages in thread
From: Alberto Garcia @ 2015-05-26 15:51 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On Tue 26 May 2015 05:39:12 PM CEST, Max Reitz <mreitz@redhat.com> wrote:

>> After having emptied the cache, the data in the cache tables is no
>> longer useful, so we can tell the kernel that we are done with it. In
>> Linux this frees the resources associated with it.

> Looks good, but by applying the same logic, you could do the same call
> in qcow2_cache_create(). So what about it? :-)

But after qcow2_cache_create() the memory is still unused so this
doesn't have any affect.

> Also note that bdrv_commit() is used only by the HMP commit operation,
> not by QMP commit.

Thanks, I should probably clarify that in the commit message.

Berto

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-26 15:51     ` Alberto Garcia
@ 2015-05-26 15:51       ` Max Reitz
  0 siblings, 0 replies; 20+ messages in thread
From: Max Reitz @ 2015-05-26 15:51 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On 26.05.2015 17:51, Alberto Garcia wrote:
> On Tue 26 May 2015 05:39:12 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
>
>>> After having emptied the cache, the data in the cache tables is no
>>> longer useful, so we can tell the kernel that we are done with it. In
>>> Linux this frees the resources associated with it.
>> Looks good, but by applying the same logic, you could do the same call
>> in qcow2_cache_create(). So what about it? :-)
> But after qcow2_cache_create() the memory is still unused so this
> doesn't have any affect.

That was too easy. Right.

Reviewed-by: Max Reitz <mreitz@redhat.com>

>> Also note that bdrv_commit() is used only by the HMP commit operation,
>> not by QMP commit.
> Thanks, I should probably clarify that in the commit message.
>
> Berto

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
@ 2015-05-26 16:07   ` Max Reitz
  0 siblings, 0 replies; 20+ messages in thread
From: Max Reitz @ 2015-05-26 16:07 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On 18.05.2015 18:48, Alberto Garcia wrote:
> This adds a new 'cache-clean-interval' option that cleans all qcow2
> cache entries that haven't been used in a certain interval, given in
> seconds.
>
> This allows setting a large L2 cache size so it can handle scenarios
> with lots of I/O and at the same time use little memory during periods
> of inactivity.
>
> This feature currently relies on MADV_DONTNEED to free that memory, so
> it is not useful in systems that don't follow that behavior.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++++++
>   block/qcow2.c        | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   block/qcow2.h        |  4 ++++
>   qapi/block-core.json |  6 +++++-
>   4 files changed, 100 insertions(+), 1 deletion(-)
>
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index ed14a92..a215f5b 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -43,6 +43,7 @@ struct Qcow2Cache {
>       bool                    depends_on_flush;
>       void                   *table_array;
>       uint64_t                lru_counter;
> +    uint64_t                cache_clean_lru_counter;
>   };
>   
>   static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
> @@ -78,6 +79,40 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
>   #endif
>   }
>   
> +static inline bool can_clean_entry(Qcow2Cache *c, int i)
> +{
> +    Qcow2CachedTable *t = &c->entries[i];
> +    return t->ref == 0 && !t->dirty && t->offset != 0 &&
> +        t->lru_counter <= c->cache_clean_lru_counter;
> +}
> +
> +void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c)
> +{
> +    int i = 0;
> +    while (i < c->size) {
> +        int to_clean = 0;
> +
> +        /* Skip the entries that we don't need to clean */
> +        while (i < c->size && !can_clean_entry(c, i)) {
> +            i++;
> +        }
> +
> +        /* And count how many we can clean in a row */
> +        while (i < c->size && can_clean_entry(c, i)) {
> +            c->entries[i].offset = 0;
> +            c->entries[i].lru_counter = 0;
> +            i++;
> +            to_clean++;
> +        }
> +
> +        if (to_clean > 0) {
> +            qcow2_cache_table_release(bs, c, i - to_clean, to_clean);
> +        }
> +    }
> +
> +    c->cache_clean_lru_counter = c->lru_counter;
> +}
> +
>   Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>   {
>       BDRVQcowState *s = bs->opaque;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index b9a72e3..f6ed39f 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -468,6 +468,11 @@ static QemuOptsList qcow2_runtime_opts = {
>               .type = QEMU_OPT_SIZE,
>               .help = "Maximum refcount block cache size",
>           },
> +        {
> +            .name = QCOW2_OPT_CACHE_CLEAN_INTERVAL,
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "Clean unused cache entries after this time (in seconds)",
> +        },
>           { /* end of list */ }
>       },
>   };
> @@ -483,6 +488,49 @@ static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
>       [QCOW2_OL_INACTIVE_L2_BITNR]    = QCOW2_OPT_OVERLAP_INACTIVE_L2,
>   };
>   
> +static void cache_clean_timer_cb(void *opaque)
> +{
> +    BlockDriverState *bs = opaque;
> +    BDRVQcowState *s = bs->opaque;
> +    qcow2_cache_clean_unused(bs, s->l2_table_cache);
> +    qcow2_cache_clean_unused(bs, s->refcount_block_cache);
> +    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +              s->cache_clean_interval * 1000);
> +}
> +
> +static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    if (s->cache_clean_interval > 0) {
> +        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
> +                                             SCALE_MS, cache_clean_timer_cb,
> +                                             bs);
> +        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                  s->cache_clean_interval * 1000);

Can overflow.

> +    }
> +}
> +
> +static void cache_clean_timer_del(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    if (s->cache_clean_timer) {
> +        timer_del(s->cache_clean_timer);
> +        timer_free(s->cache_clean_timer);
> +        s->cache_clean_timer = NULL;
> +    }
> +}
> +
> +static void qcow2_detach_aio_context(BlockDriverState *bs)
> +{
> +    cache_clean_timer_del(bs);
> +}
> +
> +static void qcow2_attach_aio_context(BlockDriverState *bs,
> +                                     AioContext *new_context)
> +{
> +    cache_clean_timer_init(bs, new_context);
> +}
> +
>   static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
>                                uint64_t *refcount_cache_size, Error **errp)
>   {
> @@ -838,6 +886,9 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>           ret = -ENOMEM;
>           goto fail;
>       }
> +    s->cache_clean_interval =
> +        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);

Can overflow.

Apart from these two things (which are by no means really bad, but it'd 
better not to introduce them in the first place...), the patch looks good.

Max

> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
>   
>       s->cluster_cache = g_malloc(s->cluster_size);
>       /* one more sector for decompressed data alignment */
> @@ -1004,6 +1055,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>       qemu_vfree(s->l1_table);
>       /* else pre-write overlap checks in cache_destroy may crash */
>       s->l1_table = NULL;
> +    cache_clean_timer_del(bs);
>       if (s->l2_table_cache) {
>           qcow2_cache_destroy(bs, s->l2_table_cache);
>       }
> @@ -1453,6 +1505,7 @@ static void qcow2_close(BlockDriverState *bs)
>           }
>       }
>   
> +    cache_clean_timer_del(bs);
>       qcow2_cache_destroy(bs, s->l2_table_cache);
>       qcow2_cache_destroy(bs, s->refcount_block_cache);
>   
> @@ -2964,6 +3017,9 @@ BlockDriver bdrv_qcow2 = {
>       .create_opts         = &qcow2_create_opts,
>       .bdrv_check          = qcow2_check,
>       .bdrv_amend_options  = qcow2_amend_options,
> +
> +    .bdrv_detach_aio_context  = qcow2_detach_aio_context,
> +    .bdrv_attach_aio_context  = qcow2_attach_aio_context,
>   };
>   
>   static void bdrv_qcow2_init(void)
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 0076512..2c45eb2 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -93,6 +93,7 @@
>   #define QCOW2_OPT_CACHE_SIZE "cache-size"
>   #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
>   #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
> +#define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
>   
>   typedef struct QCowHeader {
>       uint32_t magic;
> @@ -236,6 +237,8 @@ typedef struct BDRVQcowState {
>   
>       Qcow2Cache* l2_table_cache;
>       Qcow2Cache* refcount_block_cache;
> +    QEMUTimer *cache_clean_timer;
> +    unsigned cache_clean_interval;
>   
>       uint8_t *cluster_cache;
>       uint8_t *cluster_data;
> @@ -581,6 +584,7 @@ int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
>       Qcow2Cache *dependency);
>   void qcow2_cache_depends_on_flush(Qcow2Cache *c);
>   
> +void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c);
>   int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c);
>   
>   int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 863ffea..f42338d 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1538,6 +1538,9 @@
>   # @refcount-cache-size:   #optional the maximum size of the refcount block cache
>   #                         in bytes (since 2.2)
>   #
> +# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
> +#                         caches. The interval is in seconds (since 2.4)
> +#
>   # Since: 1.7
>   ##
>   { 'struct': 'BlockdevOptionsQcow2',
> @@ -1549,7 +1552,8 @@
>               '*overlap-check': 'Qcow2OverlapChecks',
>               '*cache-size': 'int',
>               '*l2-cache-size': 'int',
> -            '*refcount-cache-size': 'int' } }
> +            '*refcount-cache-size': 'int',
> +            '*cache-clean-interval': 'int' } }
>   
>   
>   ##

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding Alberto Garcia
@ 2015-05-26 16:10   ` Max Reitz
  2015-05-26 16:12     ` Eric Blake
  2015-05-26 16:20     ` Alberto Garcia
  2015-05-26 16:13   ` Eric Blake
  1 sibling, 2 replies; 20+ messages in thread
From: Max Reitz @ 2015-05-26 16:10 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On 18.05.2015 18:48, Alberto Garcia wrote:
> Changing the current ordering saves 8 bytes per cache entry in x86_64.

Hm, not seven?

> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cache.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index a215f5b..43590ff 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -31,9 +31,9 @@
>   
>   typedef struct Qcow2CachedTable {
>       int64_t  offset;
> -    bool     dirty;
>       uint64_t lru_counter;
>       int      ref;
> +    bool     dirty;
>   } Qcow2CachedTable;
>   
>   struct Qcow2Cache {

With "7" above, or an explanation why it actually is 8:

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-26 16:10   ` Max Reitz
@ 2015-05-26 16:12     ` Eric Blake
  2015-05-26 16:14       ` Max Reitz
  2015-05-26 16:20     ` Alberto Garcia
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Blake @ 2015-05-26 16:12 UTC (permalink / raw)
  To: Max Reitz, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1200 bytes --]

On 05/26/2015 10:10 AM, Max Reitz wrote:
> On 18.05.2015 18:48, Alberto Garcia wrote:
>> Changing the current ordering saves 8 bytes per cache entry in x86_64.
> 
> Hm, not seven?
> 
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   block/qcow2-cache.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
>> index a215f5b..43590ff 100644
>> --- a/block/qcow2-cache.c
>> +++ b/block/qcow2-cache.c
>> @@ -31,9 +31,9 @@
>>     typedef struct Qcow2CachedTable {
>>       int64_t  offset;
>> -    bool     dirty;
>>       uint64_t lru_counter;
>>       int      ref;
>> +    bool     dirty;
>>   } Qcow2CachedTable;
>>     struct Qcow2Cache {
> 
> With "7" above, or an explanation why it actually is 8:

Old layout:

0-7   offset
8     dirty
9-15  padding
16-23 lru_counter
24-27 dirty
28-31 padding

New layout:

0-7   offset
8-15  lru_counter
16-19 ref
20    dirty
21-23 padding

It indeed saves 8 bytes.

> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding Alberto Garcia
  2015-05-26 16:10   ` Max Reitz
@ 2015-05-26 16:13   ` Eric Blake
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Blake @ 2015-05-26 16:13 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 423 bytes --]

On 05/18/2015 10:48 AM, Alberto Garcia wrote:
> Changing the current ordering saves 8 bytes per cache entry in x86_64.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cache.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-26 16:12     ` Eric Blake
@ 2015-05-26 16:14       ` Max Reitz
  0 siblings, 0 replies; 20+ messages in thread
From: Max Reitz @ 2015-05-26 16:14 UTC (permalink / raw)
  To: Eric Blake, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On 26.05.2015 18:12, Eric Blake wrote:
> On 05/26/2015 10:10 AM, Max Reitz wrote:
>> On 18.05.2015 18:48, Alberto Garcia wrote:
>>> Changing the current ordering saves 8 bytes per cache entry in x86_64.
>> Hm, not seven?
>>
>>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>>> ---
>>>    block/qcow2-cache.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
>>> index a215f5b..43590ff 100644
>>> --- a/block/qcow2-cache.c
>>> +++ b/block/qcow2-cache.c
>>> @@ -31,9 +31,9 @@
>>>      typedef struct Qcow2CachedTable {
>>>        int64_t  offset;
>>> -    bool     dirty;
>>>        uint64_t lru_counter;
>>>        int      ref;
>>> +    bool     dirty;
>>>    } Qcow2CachedTable;
>>>      struct Qcow2Cache {
>> With "7" above, or an explanation why it actually is 8:
> Old layout:
>
> 0-7   offset
> 8     dirty
> 9-15  padding
> 16-23 lru_counter
> 24-27 dirty
> 28-31 padding
>
> New layout:
>
> 0-7   offset
> 8-15  lru_counter
> 16-19 ref
> 20    dirty
> 21-23 padding
>
> It indeed saves 8 bytes.

Oh, I forgot about the padding at the end. Thanks!

Max

>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-26 16:10   ` Max Reitz
  2015-05-26 16:12     ` Eric Blake
@ 2015-05-26 16:20     ` Alberto Garcia
  1 sibling, 0 replies; 20+ messages in thread
From: Alberto Garcia @ 2015-05-26 16:20 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On Tue 26 May 2015 06:10:11 PM CEST, Max Reitz wrote:
> On 18.05.2015 18:48, Alberto Garcia wrote:
>> Changing the current ordering saves 8 bytes per cache entry in x86_64.
>
> Hm, not seven?

No, the size is 32 before the patch and 24 afterwards.

What you save is the 7 bytes of padding after 'dirty' and one of the
bytes after 'ref'.

>>   typedef struct Qcow2CachedTable {
>>       int64_t  offset;
>> -    bool     dirty;
>>       uint64_t lru_counter;
>>       int      ref;
>> +    bool     dirty;
>>   } Qcow2CachedTable;

Berto

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-06-02 10:56       ` Kevin Wolf
@ 2015-06-02 11:08         ` Alberto Garcia
  0 siblings, 0 replies; 20+ messages in thread
From: Alberto Garcia @ 2015-06-02 11:08 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, qemu-devel, Stefan Hajnoczi, Max Reitz

On Tue 02 Jun 2015 12:56:10 PM CEST, Kevin Wolf wrote:

>> > It seems that getpagesize() is usually used in qemu.
>> 
>> The getpagesize() manual page actually recommends using sysconf() for
>> portability reasons. But other than that I don't have a problem with
>> getpagesize() if it's the preferred choice in QEMU.
>
> There is a Windows implementation for getpagesize() in qemu, so I
> guess it's actually more portable in our context.

Ok, I'll change that.

> Now, as you'll #ifdef this code out for Windows, that probably doesn't
> matter, so it's just about consistency.

There's actually no need to #ifdef that code out, I only do it because I
know in advance that it's going to be a no-op, but qemu_madvise() itself
doesn't break or anything if the operation is not supported.

Berto

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-06-02 10:50     ` Alberto Garcia
@ 2015-06-02 10:56       ` Kevin Wolf
  2015-06-02 11:08         ` Alberto Garcia
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2015-06-02 10:56 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: qemu-block, qemu-devel, Stefan Hajnoczi, Max Reitz

Am 02.06.2015 um 12:50 hat Alberto Garcia geschrieben:
> On Tue 02 Jun 2015 12:05:59 PM CEST, Kevin Wolf <kwolf@redhat.com> wrote:
> 
> >> +#include <sys/mman.h>
> >>  #include "block/block_int.h"
> >>  #include "qemu-common.h"
> >> +#include "qemu/osdep.h"
> >>  #include "qcow2.h"
> >>  #include "trace.h"
> >
> > This breaks the mingw build:
> >
> > /mnt/qemu/block/qcow2-cache.c:25:22: fatal error: sys/mman.h: No such file or directory
> >  #include <sys/mman.h>
> 
> Ok, I can use #if defined(CONFIG_MADVISE) || defined(CONFIG_POSIX_MADVISE)
> as it's done in util/osdep.c for the same header.
> 
> >> +#if QEMU_MADV_DONTNEED != QEMU_MADV_INVALID
> >> +    BDRVQcowState *s = bs->opaque;
> >> +    void *t = qcow2_cache_get_table_addr(bs, c, i);
> >> +    long align = sysconf(_SC_PAGESIZE);
> >
> > It seems that getpagesize() is usually used in qemu.
> 
> The getpagesize() manual page actually recommends using sysconf() for
> portability reasons. But other than that I don't have a problem with
> getpagesize() if it's the preferred choice in QEMU.

There is a Windows implementation for getpagesize() in qemu, so I guess
it's actually more portable in our context.

Now, as you'll #ifdef this code out for Windows, that probably doesn't
matter, so it's just about consistency.

> >> +    size_t mem_size = (size_t) s->cluster_size * num_tables;
> >> +    size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
> >> +    size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
> >
> > Instead of all the aligning here, shouldn't we just make sure that the
> > tables are created with the right alignment?
> 
> The tables should have the right alignment and their size should be a
> multiple of the page size. The latter condition is not necessarily true
> (a cluster can be smaller than one page) so I'd keep that code in any
> case.

We could adjust the size to MAX(size, pagesize), but perhaps that wastes
a bit too much memory indeed.

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-06-02 10:05   ` Kevin Wolf
@ 2015-06-02 10:50     ` Alberto Garcia
  2015-06-02 10:56       ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Alberto Garcia @ 2015-06-02 10:50 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, qemu-devel, Stefan Hajnoczi, Max Reitz

On Tue 02 Jun 2015 12:05:59 PM CEST, Kevin Wolf <kwolf@redhat.com> wrote:

>> +#include <sys/mman.h>
>>  #include "block/block_int.h"
>>  #include "qemu-common.h"
>> +#include "qemu/osdep.h"
>>  #include "qcow2.h"
>>  #include "trace.h"
>
> This breaks the mingw build:
>
> /mnt/qemu/block/qcow2-cache.c:25:22: fatal error: sys/mman.h: No such file or directory
>  #include <sys/mman.h>

Ok, I can use #if defined(CONFIG_MADVISE) || defined(CONFIG_POSIX_MADVISE)
as it's done in util/osdep.c for the same header.

>> +#if QEMU_MADV_DONTNEED != QEMU_MADV_INVALID
>> +    BDRVQcowState *s = bs->opaque;
>> +    void *t = qcow2_cache_get_table_addr(bs, c, i);
>> +    long align = sysconf(_SC_PAGESIZE);
>
> It seems that getpagesize() is usually used in qemu.

The getpagesize() manual page actually recommends using sysconf() for
portability reasons. But other than that I don't have a problem with
getpagesize() if it's the preferred choice in QEMU.

>> +    size_t mem_size = (size_t) s->cluster_size * num_tables;
>> +    size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
>> +    size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
>
> Instead of all the aligning here, shouldn't we just make sure that the
> tables are created with the right alignment?

The tables should have the right alignment and their size should be a
multiple of the page size. The latter condition is not necessarily true
(a cluster can be smaller than one page) so I'd keep that code in any
case.

Berto

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-29  9:24 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
@ 2015-06-02 10:05   ` Kevin Wolf
  2015-06-02 10:50     ` Alberto Garcia
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2015-06-02 10:05 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: qemu-block, qemu-devel, Stefan Hajnoczi, Max Reitz

Am 29.05.2015 um 11:24 hat Alberto Garcia geschrieben:
> After having emptied the cache, the data in the cache tables is no
> longer useful, so we can tell the kernel that we are done with it. In
> Linux this frees the resources associated with it.
> 
> The effect of this can be seen in the HMP commit operation: it moves
> data from the top to the base image (and fills both caches), then it
> empties the top image. At this point the data in that cache is no
> longer needed so it's just wasting memory.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-cache.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index ed92a09..ed14a92 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -22,8 +22,10 @@
>   * THE SOFTWARE.
>   */
>  
> +#include <sys/mman.h>
>  #include "block/block_int.h"
>  #include "qemu-common.h"
> +#include "qemu/osdep.h"
>  #include "qcow2.h"
>  #include "trace.h"

This breaks the mingw build:

/mnt/qemu/block/qcow2-cache.c:25:22: fatal error: sys/mman.h: No such file or directory
 #include <sys/mman.h>

> @@ -60,6 +62,22 @@ static inline int qcow2_cache_get_table_idx(BlockDriverState *bs,
>      return idx;
>  }
>  
> +static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
> +                                      int i, int num_tables)
> +{
> +#if QEMU_MADV_DONTNEED != QEMU_MADV_INVALID
> +    BDRVQcowState *s = bs->opaque;
> +    void *t = qcow2_cache_get_table_addr(bs, c, i);
> +    long align = sysconf(_SC_PAGESIZE);

It seems that getpagesize() is usually used in qemu.

> +    size_t mem_size = (size_t) s->cluster_size * num_tables;
> +    size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
> +    size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);

Instead of all the aligning here, shouldn't we just make sure that the
tables are created with the right alignment?

> +    if (length > 0) {
> +        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
> +    }
> +#endif
> +}

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-29  9:24 [Qemu-devel] [PATCH v4 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
@ 2015-05-29  9:24 ` Alberto Garcia
  2015-06-02 10:05   ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Alberto Garcia @ 2015-05-29  9:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

After having emptied the cache, the data in the cache tables is no
longer useful, so we can tell the kernel that we are done with it. In
Linux this frees the resources associated with it.

The effect of this can be seen in the HMP commit operation: it moves
data from the top to the base image (and fills both caches), then it
empties the top image. At this point the data in that cache is no
longer needed so it's just wasting memory.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cache.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed92a09..ed14a92 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -22,8 +22,10 @@
  * THE SOFTWARE.
  */
 
+#include <sys/mman.h>
 #include "block/block_int.h"
 #include "qemu-common.h"
+#include "qemu/osdep.h"
 #include "qcow2.h"
 #include "trace.h"
 
@@ -60,6 +62,22 @@ static inline int qcow2_cache_get_table_idx(BlockDriverState *bs,
     return idx;
 }
 
+static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
+                                      int i, int num_tables)
+{
+#if QEMU_MADV_DONTNEED != QEMU_MADV_INVALID
+    BDRVQcowState *s = bs->opaque;
+    void *t = qcow2_cache_get_table_addr(bs, c, i);
+    long align = sysconf(_SC_PAGESIZE);
+    size_t mem_size = (size_t) s->cluster_size * num_tables;
+    size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
+    size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
+    if (length > 0) {
+        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
+    }
+#endif
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
@@ -237,6 +255,8 @@ int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c)
         c->entries[i].lru_counter = 0;
     }
 
+    qcow2_cache_table_release(bs, c, 0, c->size);
+
     c->lru_counter = 0;
 
     return 0;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-27  9:46 [Qemu-devel] [PATCH v3 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
@ 2015-05-27  9:46 ` Alberto Garcia
  0 siblings, 0 replies; 20+ messages in thread
From: Alberto Garcia @ 2015-05-27  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

After having emptied the cache, the data in the cache tables is no
longer useful, so we can tell the kernel that we are done with it. In
Linux this frees the resources associated with it.

The effect of this can be seen in the HMP commit operation: it moves
data from the top to the base image (and fills both caches), then it
empties the top image. At this point the data in that cache is no
longer needed so it's just wasting memory.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cache.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed92a09..ed14a92 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -22,8 +22,10 @@
  * THE SOFTWARE.
  */
 
+#include <sys/mman.h>
 #include "block/block_int.h"
 #include "qemu-common.h"
+#include "qemu/osdep.h"
 #include "qcow2.h"
 #include "trace.h"
 
@@ -60,6 +62,22 @@ static inline int qcow2_cache_get_table_idx(BlockDriverState *bs,
     return idx;
 }
 
+static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
+                                      int i, int num_tables)
+{
+#if QEMU_MADV_DONTNEED != QEMU_MADV_INVALID
+    BDRVQcowState *s = bs->opaque;
+    void *t = qcow2_cache_get_table_addr(bs, c, i);
+    long align = sysconf(_SC_PAGESIZE);
+    size_t mem_size = (size_t) s->cluster_size * num_tables;
+    size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
+    size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
+    if (length > 0) {
+        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
+    }
+#endif
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
@@ -237,6 +255,8 @@ int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c)
         c->entries[i].lru_counter = 0;
     }
 
+    qcow2_cache_table_release(bs, c, 0, c->size);
+
     c->lru_counter = 0;
 
     return 0;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-26 17:14 [Qemu-devel] [PATCH v2 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
@ 2015-05-26 17:14 ` Alberto Garcia
  0 siblings, 0 replies; 20+ messages in thread
From: Alberto Garcia @ 2015-05-26 17:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

After having emptied the cache, the data in the cache tables is no
longer useful, so we can tell the kernel that we are done with it. In
Linux this frees the resources associated with it.

The effect of this can be seen in the HMP commit operation: it moves
data from the top to the base image (and fills both caches), then it
empties the top image. At this point the data in that cache is no
longer needed so it's just wasting memory.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cache.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed92a09..ed14a92 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -22,8 +22,10 @@
  * THE SOFTWARE.
  */
 
+#include <sys/mman.h>
 #include "block/block_int.h"
 #include "qemu-common.h"
+#include "qemu/osdep.h"
 #include "qcow2.h"
 #include "trace.h"
 
@@ -60,6 +62,22 @@ static inline int qcow2_cache_get_table_idx(BlockDriverState *bs,
     return idx;
 }
 
+static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
+                                      int i, int num_tables)
+{
+#if QEMU_MADV_DONTNEED != QEMU_MADV_INVALID
+    BDRVQcowState *s = bs->opaque;
+    void *t = qcow2_cache_get_table_addr(bs, c, i);
+    long align = sysconf(_SC_PAGESIZE);
+    size_t mem_size = (size_t) s->cluster_size * num_tables;
+    size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
+    size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
+    if (length > 0) {
+        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
+    }
+#endif
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
@@ -237,6 +255,8 @@ int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c)
         c->entries[i].lru_counter = 0;
     }
 
+    qcow2_cache_table_release(bs, c, 0, c->size);
+
     c->lru_counter = 0;
 
     return 0;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-06-02 11:09 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-18 16:48 [Qemu-devel] [PATCH 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
2015-05-18 16:48 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
2015-05-26 15:39   ` Max Reitz
2015-05-26 15:51     ` Alberto Garcia
2015-05-26 15:51       ` Max Reitz
2015-05-18 16:48 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
2015-05-26 16:07   ` Max Reitz
2015-05-18 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding Alberto Garcia
2015-05-26 16:10   ` Max Reitz
2015-05-26 16:12     ` Eric Blake
2015-05-26 16:14       ` Max Reitz
2015-05-26 16:20     ` Alberto Garcia
2015-05-26 16:13   ` Eric Blake
2015-05-26 17:14 [Qemu-devel] [PATCH v2 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
2015-05-26 17:14 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
2015-05-27  9:46 [Qemu-devel] [PATCH v3 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
2015-05-27  9:46 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
2015-05-29  9:24 [Qemu-devel] [PATCH v4 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
2015-05-29  9:24 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
2015-06-02 10:05   ` Kevin Wolf
2015-06-02 10:50     ` Alberto Garcia
2015-06-02 10:56       ` Kevin Wolf
2015-06-02 11:08         ` Alberto Garcia

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.