All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] Clean unused entries in the qcow2 L2/refcount cache
@ 2015-05-18 16:48 Alberto Garcia
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Alberto Garcia @ 2015-05-18 16:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

These patches use MADV_DONTNEED to clean unused cache entries. Under
Linux it frees the memory used by those pages.

Berto

Alberto Garcia (3):
  qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  qcow2: add option to clean unused cache entries after some time
  qcow2: reorder fields in Qcow2CachedTable to reduce padding

 block/qcow2-cache.c  | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 block/qcow2.c        | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h        |  4 ++++
 qapi/block-core.json |  6 +++++-
 4 files changed, 121 insertions(+), 2 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-18 16:48 [Qemu-devel] [PATCH 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
@ 2015-05-18 16:48 ` Alberto Garcia
  2015-05-26 15:39   ` Max Reitz
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding Alberto Garcia
  2 siblings, 1 reply; 37+ messages in thread
From: Alberto Garcia @ 2015-05-18 16:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

After having emptied the cache, the data in the cache tables is no
longer useful, so we can tell the kernel that we are done with it. In
Linux this frees the resources associated with it.

The effect of this can be seen in the block-commit operation: it moves
data from the top to the base image (and fills both caches), then it
empties the top image. At this point the data in that cache is no
longer needed so it's just wasting memory.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed92a09..ed14a92 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -22,8 +22,10 @@
  * THE SOFTWARE.
  */
 
+#include <sys/mman.h>
 #include "block/block_int.h"
 #include "qemu-common.h"
+#include "qemu/osdep.h"
 #include "qcow2.h"
 #include "trace.h"
 
@@ -60,6 +62,22 @@ static inline int qcow2_cache_get_table_idx(BlockDriverState *bs,
     return idx;
 }
 
+static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
+                                      int i, int num_tables)
+{
+#if QEMU_MADV_DONTNEED != QEMU_MADV_INVALID
+    BDRVQcowState *s = bs->opaque;
+    void *t = qcow2_cache_get_table_addr(bs, c, i);
+    long align = sysconf(_SC_PAGESIZE);
+    size_t mem_size = (size_t) s->cluster_size * num_tables;
+    size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
+    size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
+    if (length > 0) {
+        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
+    }
+#endif
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
@@ -237,6 +255,8 @@ int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c)
         c->entries[i].lru_counter = 0;
     }
 
+    qcow2_cache_table_release(bs, c, 0, c->size);
+
     c->lru_counter = 0;
 
     return 0;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-18 16:48 [Qemu-devel] [PATCH 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
@ 2015-05-18 16:48 ` Alberto Garcia
  2015-05-26 16:07   ` Max Reitz
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding Alberto Garcia
  2 siblings, 1 reply; 37+ messages in thread
From: Alberto Garcia @ 2015-05-18 16:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

This adds a new 'cache-clean-interval' option that cleans all qcow2
cache entries that haven't been used in a certain interval, given in
seconds.

This allows setting a large L2 cache size so it can handle scenarios
with lots of I/O and at the same time use little memory during periods
of inactivity.

This feature currently relies on MADV_DONTNEED to free that memory, so
it is not useful in systems that don't follow that behavior.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++++++
 block/qcow2.c        | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h        |  4 ++++
 qapi/block-core.json |  6 +++++-
 4 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed14a92..a215f5b 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -43,6 +43,7 @@ struct Qcow2Cache {
     bool                    depends_on_flush;
     void                   *table_array;
     uint64_t                lru_counter;
+    uint64_t                cache_clean_lru_counter;
 };
 
 static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
@@ -78,6 +79,40 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
 #endif
 }
 
+static inline bool can_clean_entry(Qcow2Cache *c, int i)
+{
+    Qcow2CachedTable *t = &c->entries[i];
+    return t->ref == 0 && !t->dirty && t->offset != 0 &&
+        t->lru_counter <= c->cache_clean_lru_counter;
+}
+
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c)
+{
+    int i = 0;
+    while (i < c->size) {
+        int to_clean = 0;
+
+        /* Skip the entries that we don't need to clean */
+        while (i < c->size && !can_clean_entry(c, i)) {
+            i++;
+        }
+
+        /* And count how many we can clean in a row */
+        while (i < c->size && can_clean_entry(c, i)) {
+            c->entries[i].offset = 0;
+            c->entries[i].lru_counter = 0;
+            i++;
+            to_clean++;
+        }
+
+        if (to_clean > 0) {
+            qcow2_cache_table_release(bs, c, i - to_clean, to_clean);
+        }
+    }
+
+    c->cache_clean_lru_counter = c->lru_counter;
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.c b/block/qcow2.c
index b9a72e3..f6ed39f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -468,6 +468,11 @@ static QemuOptsList qcow2_runtime_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "Maximum refcount block cache size",
         },
+        {
+            .name = QCOW2_OPT_CACHE_CLEAN_INTERVAL,
+            .type = QEMU_OPT_NUMBER,
+            .help = "Clean unused cache entries after this time (in seconds)",
+        },
         { /* end of list */ }
     },
 };
@@ -483,6 +488,49 @@ static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
     [QCOW2_OL_INACTIVE_L2_BITNR]    = QCOW2_OPT_OVERLAP_INACTIVE_L2,
 };
 
+static void cache_clean_timer_cb(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVQcowState *s = bs->opaque;
+    qcow2_cache_clean_unused(bs, s->l2_table_cache);
+    qcow2_cache_clean_unused(bs, s->refcount_block_cache);
+    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              s->cache_clean_interval * 1000);
+}
+
+static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_interval > 0) {
+        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
+                                             SCALE_MS, cache_clean_timer_cb,
+                                             bs);
+        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                  s->cache_clean_interval * 1000);
+    }
+}
+
+static void cache_clean_timer_del(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_timer) {
+        timer_del(s->cache_clean_timer);
+        timer_free(s->cache_clean_timer);
+        s->cache_clean_timer = NULL;
+    }
+}
+
+static void qcow2_detach_aio_context(BlockDriverState *bs)
+{
+    cache_clean_timer_del(bs);
+}
+
+static void qcow2_attach_aio_context(BlockDriverState *bs,
+                                     AioContext *new_context)
+{
+    cache_clean_timer_init(bs, new_context);
+}
+
 static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
                              uint64_t *refcount_cache_size, Error **errp)
 {
@@ -838,6 +886,9 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         ret = -ENOMEM;
         goto fail;
     }
+    s->cache_clean_interval =
+        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
+    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
 
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
@@ -1004,6 +1055,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_vfree(s->l1_table);
     /* else pre-write overlap checks in cache_destroy may crash */
     s->l1_table = NULL;
+    cache_clean_timer_del(bs);
     if (s->l2_table_cache) {
         qcow2_cache_destroy(bs, s->l2_table_cache);
     }
@@ -1453,6 +1505,7 @@ static void qcow2_close(BlockDriverState *bs)
         }
     }
 
+    cache_clean_timer_del(bs);
     qcow2_cache_destroy(bs, s->l2_table_cache);
     qcow2_cache_destroy(bs, s->refcount_block_cache);
 
@@ -2964,6 +3017,9 @@ BlockDriver bdrv_qcow2 = {
     .create_opts         = &qcow2_create_opts,
     .bdrv_check          = qcow2_check,
     .bdrv_amend_options  = qcow2_amend_options,
+
+    .bdrv_detach_aio_context  = qcow2_detach_aio_context,
+    .bdrv_attach_aio_context  = qcow2_attach_aio_context,
 };
 
 static void bdrv_qcow2_init(void)
diff --git a/block/qcow2.h b/block/qcow2.h
index 0076512..2c45eb2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -93,6 +93,7 @@
 #define QCOW2_OPT_CACHE_SIZE "cache-size"
 #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
 #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
+#define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
 
 typedef struct QCowHeader {
     uint32_t magic;
@@ -236,6 +237,8 @@ typedef struct BDRVQcowState {
 
     Qcow2Cache* l2_table_cache;
     Qcow2Cache* refcount_block_cache;
+    QEMUTimer *cache_clean_timer;
+    unsigned cache_clean_interval;
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
@@ -581,6 +584,7 @@ int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
     Qcow2Cache *dependency);
 void qcow2_cache_depends_on_flush(Qcow2Cache *c);
 
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c);
 int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c);
 
 int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 863ffea..f42338d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1538,6 +1538,9 @@
 # @refcount-cache-size:   #optional the maximum size of the refcount block cache
 #                         in bytes (since 2.2)
 #
+# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
+#                         caches. The interval is in seconds (since 2.4)
+#
 # Since: 1.7
 ##
 { 'struct': 'BlockdevOptionsQcow2',
@@ -1549,7 +1552,8 @@
             '*overlap-check': 'Qcow2OverlapChecks',
             '*cache-size': 'int',
             '*l2-cache-size': 'int',
-            '*refcount-cache-size': 'int' } }
+            '*refcount-cache-size': 'int',
+            '*cache-clean-interval': 'int' } }
 
 
 ##
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-18 16:48 [Qemu-devel] [PATCH 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
@ 2015-05-18 16:48 ` Alberto Garcia
  2015-05-26 16:10   ` Max Reitz
  2015-05-26 16:13   ` Eric Blake
  2 siblings, 2 replies; 37+ messages in thread
From: Alberto Garcia @ 2015-05-18 16:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

Changing the current ordering saves 8 bytes per cache entry in x86_64.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index a215f5b..43590ff 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -31,9 +31,9 @@
 
 typedef struct Qcow2CachedTable {
     int64_t  offset;
-    bool     dirty;
     uint64_t lru_counter;
     int      ref;
+    bool     dirty;
 } Qcow2CachedTable;
 
 struct Qcow2Cache {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
@ 2015-05-26 15:39   ` Max Reitz
  2015-05-26 15:51     ` Alberto Garcia
  0 siblings, 1 reply; 37+ messages in thread
From: Max Reitz @ 2015-05-26 15:39 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On 18.05.2015 18:48, Alberto Garcia wrote:
> After having emptied the cache, the data in the cache tables is no
> longer useful, so we can tell the kernel that we are done with it. In
> Linux this frees the resources associated with it.
>
> The effect of this can be seen in the block-commit operation: it moves
> data from the top to the base image (and fills both caches), then it
> empties the top image. At this point the data in that cache is no
> longer needed so it's just wasting memory.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cache.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)

Looks good, but by applying the same logic, you could do the same call 
in qcow2_cache_create(). So what about it? :-)

Also note that bdrv_commit() is used only by the HMP commit operation, 
not by QMP commit.

Max

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-26 15:39   ` Max Reitz
@ 2015-05-26 15:51     ` Alberto Garcia
  2015-05-26 15:51       ` Max Reitz
  0 siblings, 1 reply; 37+ messages in thread
From: Alberto Garcia @ 2015-05-26 15:51 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On Tue 26 May 2015 05:39:12 PM CEST, Max Reitz <mreitz@redhat.com> wrote:

>> After having emptied the cache, the data in the cache tables is no
>> longer useful, so we can tell the kernel that we are done with it. In
>> Linux this frees the resources associated with it.

> Looks good, but by applying the same logic, you could do the same call
> in qcow2_cache_create(). So what about it? :-)

But after qcow2_cache_create() the memory is still unused so this
doesn't have any affect.

> Also note that bdrv_commit() is used only by the HMP commit operation,
> not by QMP commit.

Thanks, I should probably clarify that in the commit message.

Berto

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty()
  2015-05-26 15:51     ` Alberto Garcia
@ 2015-05-26 15:51       ` Max Reitz
  0 siblings, 0 replies; 37+ messages in thread
From: Max Reitz @ 2015-05-26 15:51 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On 26.05.2015 17:51, Alberto Garcia wrote:
> On Tue 26 May 2015 05:39:12 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
>
>>> After having emptied the cache, the data in the cache tables is no
>>> longer useful, so we can tell the kernel that we are done with it. In
>>> Linux this frees the resources associated with it.
>> Looks good, but by applying the same logic, you could do the same call
>> in qcow2_cache_create(). So what about it? :-)
> But after qcow2_cache_create() the memory is still unused so this
> doesn't have any affect.

That was too easy. Right.

Reviewed-by: Max Reitz <mreitz@redhat.com>

>> Also note that bdrv_commit() is used only by the HMP commit operation,
>> not by QMP commit.
> Thanks, I should probably clarify that in the commit message.
>
> Berto

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
@ 2015-05-26 16:07   ` Max Reitz
  0 siblings, 0 replies; 37+ messages in thread
From: Max Reitz @ 2015-05-26 16:07 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On 18.05.2015 18:48, Alberto Garcia wrote:
> This adds a new 'cache-clean-interval' option that cleans all qcow2
> cache entries that haven't been used in a certain interval, given in
> seconds.
>
> This allows setting a large L2 cache size so it can handle scenarios
> with lots of I/O and at the same time use little memory during periods
> of inactivity.
>
> This feature currently relies on MADV_DONTNEED to free that memory, so
> it is not useful in systems that don't follow that behavior.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++++++
>   block/qcow2.c        | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   block/qcow2.h        |  4 ++++
>   qapi/block-core.json |  6 +++++-
>   4 files changed, 100 insertions(+), 1 deletion(-)
>
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index ed14a92..a215f5b 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -43,6 +43,7 @@ struct Qcow2Cache {
>       bool                    depends_on_flush;
>       void                   *table_array;
>       uint64_t                lru_counter;
> +    uint64_t                cache_clean_lru_counter;
>   };
>   
>   static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
> @@ -78,6 +79,40 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
>   #endif
>   }
>   
> +static inline bool can_clean_entry(Qcow2Cache *c, int i)
> +{
> +    Qcow2CachedTable *t = &c->entries[i];
> +    return t->ref == 0 && !t->dirty && t->offset != 0 &&
> +        t->lru_counter <= c->cache_clean_lru_counter;
> +}
> +
> +void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c)
> +{
> +    int i = 0;
> +    while (i < c->size) {
> +        int to_clean = 0;
> +
> +        /* Skip the entries that we don't need to clean */
> +        while (i < c->size && !can_clean_entry(c, i)) {
> +            i++;
> +        }
> +
> +        /* And count how many we can clean in a row */
> +        while (i < c->size && can_clean_entry(c, i)) {
> +            c->entries[i].offset = 0;
> +            c->entries[i].lru_counter = 0;
> +            i++;
> +            to_clean++;
> +        }
> +
> +        if (to_clean > 0) {
> +            qcow2_cache_table_release(bs, c, i - to_clean, to_clean);
> +        }
> +    }
> +
> +    c->cache_clean_lru_counter = c->lru_counter;
> +}
> +
>   Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>   {
>       BDRVQcowState *s = bs->opaque;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index b9a72e3..f6ed39f 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -468,6 +468,11 @@ static QemuOptsList qcow2_runtime_opts = {
>               .type = QEMU_OPT_SIZE,
>               .help = "Maximum refcount block cache size",
>           },
> +        {
> +            .name = QCOW2_OPT_CACHE_CLEAN_INTERVAL,
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "Clean unused cache entries after this time (in seconds)",
> +        },
>           { /* end of list */ }
>       },
>   };
> @@ -483,6 +488,49 @@ static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
>       [QCOW2_OL_INACTIVE_L2_BITNR]    = QCOW2_OPT_OVERLAP_INACTIVE_L2,
>   };
>   
> +static void cache_clean_timer_cb(void *opaque)
> +{
> +    BlockDriverState *bs = opaque;
> +    BDRVQcowState *s = bs->opaque;
> +    qcow2_cache_clean_unused(bs, s->l2_table_cache);
> +    qcow2_cache_clean_unused(bs, s->refcount_block_cache);
> +    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +              s->cache_clean_interval * 1000);
> +}
> +
> +static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    if (s->cache_clean_interval > 0) {
> +        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
> +                                             SCALE_MS, cache_clean_timer_cb,
> +                                             bs);
> +        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                  s->cache_clean_interval * 1000);

Can overflow.

> +    }
> +}
> +
> +static void cache_clean_timer_del(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    if (s->cache_clean_timer) {
> +        timer_del(s->cache_clean_timer);
> +        timer_free(s->cache_clean_timer);
> +        s->cache_clean_timer = NULL;
> +    }
> +}
> +
> +static void qcow2_detach_aio_context(BlockDriverState *bs)
> +{
> +    cache_clean_timer_del(bs);
> +}
> +
> +static void qcow2_attach_aio_context(BlockDriverState *bs,
> +                                     AioContext *new_context)
> +{
> +    cache_clean_timer_init(bs, new_context);
> +}
> +
>   static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
>                                uint64_t *refcount_cache_size, Error **errp)
>   {
> @@ -838,6 +886,9 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>           ret = -ENOMEM;
>           goto fail;
>       }
> +    s->cache_clean_interval =
> +        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);

Can overflow.

Apart from these two things (which are by no means really bad, but it'd 
better not to introduce them in the first place...), the patch looks good.

Max

> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
>   
>       s->cluster_cache = g_malloc(s->cluster_size);
>       /* one more sector for decompressed data alignment */
> @@ -1004,6 +1055,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>       qemu_vfree(s->l1_table);
>       /* else pre-write overlap checks in cache_destroy may crash */
>       s->l1_table = NULL;
> +    cache_clean_timer_del(bs);
>       if (s->l2_table_cache) {
>           qcow2_cache_destroy(bs, s->l2_table_cache);
>       }
> @@ -1453,6 +1505,7 @@ static void qcow2_close(BlockDriverState *bs)
>           }
>       }
>   
> +    cache_clean_timer_del(bs);
>       qcow2_cache_destroy(bs, s->l2_table_cache);
>       qcow2_cache_destroy(bs, s->refcount_block_cache);
>   
> @@ -2964,6 +3017,9 @@ BlockDriver bdrv_qcow2 = {
>       .create_opts         = &qcow2_create_opts,
>       .bdrv_check          = qcow2_check,
>       .bdrv_amend_options  = qcow2_amend_options,
> +
> +    .bdrv_detach_aio_context  = qcow2_detach_aio_context,
> +    .bdrv_attach_aio_context  = qcow2_attach_aio_context,
>   };
>   
>   static void bdrv_qcow2_init(void)
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 0076512..2c45eb2 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -93,6 +93,7 @@
>   #define QCOW2_OPT_CACHE_SIZE "cache-size"
>   #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
>   #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
> +#define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
>   
>   typedef struct QCowHeader {
>       uint32_t magic;
> @@ -236,6 +237,8 @@ typedef struct BDRVQcowState {
>   
>       Qcow2Cache* l2_table_cache;
>       Qcow2Cache* refcount_block_cache;
> +    QEMUTimer *cache_clean_timer;
> +    unsigned cache_clean_interval;
>   
>       uint8_t *cluster_cache;
>       uint8_t *cluster_data;
> @@ -581,6 +584,7 @@ int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
>       Qcow2Cache *dependency);
>   void qcow2_cache_depends_on_flush(Qcow2Cache *c);
>   
> +void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c);
>   int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c);
>   
>   int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 863ffea..f42338d 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1538,6 +1538,9 @@
>   # @refcount-cache-size:   #optional the maximum size of the refcount block cache
>   #                         in bytes (since 2.2)
>   #
> +# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
> +#                         caches. The interval is in seconds (since 2.4)
> +#
>   # Since: 1.7
>   ##
>   { 'struct': 'BlockdevOptionsQcow2',
> @@ -1549,7 +1552,8 @@
>               '*overlap-check': 'Qcow2OverlapChecks',
>               '*cache-size': 'int',
>               '*l2-cache-size': 'int',
> -            '*refcount-cache-size': 'int' } }
> +            '*refcount-cache-size': 'int',
> +            '*cache-clean-interval': 'int' } }
>   
>   
>   ##

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding Alberto Garcia
@ 2015-05-26 16:10   ` Max Reitz
  2015-05-26 16:12     ` Eric Blake
  2015-05-26 16:20     ` Alberto Garcia
  2015-05-26 16:13   ` Eric Blake
  1 sibling, 2 replies; 37+ messages in thread
From: Max Reitz @ 2015-05-26 16:10 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On 18.05.2015 18:48, Alberto Garcia wrote:
> Changing the current ordering saves 8 bytes per cache entry in x86_64.

Hm, not seven?

> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cache.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index a215f5b..43590ff 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -31,9 +31,9 @@
>   
>   typedef struct Qcow2CachedTable {
>       int64_t  offset;
> -    bool     dirty;
>       uint64_t lru_counter;
>       int      ref;
> +    bool     dirty;
>   } Qcow2CachedTable;
>   
>   struct Qcow2Cache {

With "7" above, or an explanation why it actually is 8:

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-26 16:10   ` Max Reitz
@ 2015-05-26 16:12     ` Eric Blake
  2015-05-26 16:14       ` Max Reitz
  2015-05-26 16:20     ` Alberto Garcia
  1 sibling, 1 reply; 37+ messages in thread
From: Eric Blake @ 2015-05-26 16:12 UTC (permalink / raw)
  To: Max Reitz, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1200 bytes --]

On 05/26/2015 10:10 AM, Max Reitz wrote:
> On 18.05.2015 18:48, Alberto Garcia wrote:
>> Changing the current ordering saves 8 bytes per cache entry in x86_64.
> 
> Hm, not seven?
> 
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   block/qcow2-cache.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
>> index a215f5b..43590ff 100644
>> --- a/block/qcow2-cache.c
>> +++ b/block/qcow2-cache.c
>> @@ -31,9 +31,9 @@
>>     typedef struct Qcow2CachedTable {
>>       int64_t  offset;
>> -    bool     dirty;
>>       uint64_t lru_counter;
>>       int      ref;
>> +    bool     dirty;
>>   } Qcow2CachedTable;
>>     struct Qcow2Cache {
> 
> With "7" above, or an explanation why it actually is 8:

Old layout:

0-7   offset
8     dirty
9-15  padding
16-23 lru_counter
24-27 dirty
28-31 padding

New layout:

0-7   offset
8-15  lru_counter
16-19 ref
20    dirty
21-23 padding

It indeed saves 8 bytes.

> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-18 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding Alberto Garcia
  2015-05-26 16:10   ` Max Reitz
@ 2015-05-26 16:13   ` Eric Blake
  1 sibling, 0 replies; 37+ messages in thread
From: Eric Blake @ 2015-05-26 16:13 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 423 bytes --]

On 05/18/2015 10:48 AM, Alberto Garcia wrote:
> Changing the current ordering saves 8 bytes per cache entry in x86_64.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cache.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-26 16:12     ` Eric Blake
@ 2015-05-26 16:14       ` Max Reitz
  0 siblings, 0 replies; 37+ messages in thread
From: Max Reitz @ 2015-05-26 16:14 UTC (permalink / raw)
  To: Eric Blake, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On 26.05.2015 18:12, Eric Blake wrote:
> On 05/26/2015 10:10 AM, Max Reitz wrote:
>> On 18.05.2015 18:48, Alberto Garcia wrote:
>>> Changing the current ordering saves 8 bytes per cache entry in x86_64.
>> Hm, not seven?
>>
>>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>>> ---
>>>    block/qcow2-cache.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
>>> index a215f5b..43590ff 100644
>>> --- a/block/qcow2-cache.c
>>> +++ b/block/qcow2-cache.c
>>> @@ -31,9 +31,9 @@
>>>      typedef struct Qcow2CachedTable {
>>>        int64_t  offset;
>>> -    bool     dirty;
>>>        uint64_t lru_counter;
>>>        int      ref;
>>> +    bool     dirty;
>>>    } Qcow2CachedTable;
>>>      struct Qcow2Cache {
>> With "7" above, or an explanation why it actually is 8:
> Old layout:
>
> 0-7   offset
> 8     dirty
> 9-15  padding
> 16-23 lru_counter
> 24-27 dirty
> 28-31 padding
>
> New layout:
>
> 0-7   offset
> 8-15  lru_counter
> 16-19 ref
> 20    dirty
> 21-23 padding
>
> It indeed saves 8 bytes.

Oh, I forgot about the padding at the end. Thanks!

Max

>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>
>>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding
  2015-05-26 16:10   ` Max Reitz
  2015-05-26 16:12     ` Eric Blake
@ 2015-05-26 16:20     ` Alberto Garcia
  1 sibling, 0 replies; 37+ messages in thread
From: Alberto Garcia @ 2015-05-26 16:20 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-block

On Tue 26 May 2015 06:10:11 PM CEST, Max Reitz wrote:
> On 18.05.2015 18:48, Alberto Garcia wrote:
>> Changing the current ordering saves 8 bytes per cache entry in x86_64.
>
> Hm, not seven?

No, the size is 32 before the patch and 24 afterwards.

What you save is the 7 bytes of padding after 'dirty' and one of the
bytes after 'ref'.

>>   typedef struct Qcow2CachedTable {
>>       int64_t  offset;
>> -    bool     dirty;
>>       uint64_t lru_counter;
>>       int      ref;
>> +    bool     dirty;
>>   } Qcow2CachedTable;

Berto

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-29  9:24 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
@ 2015-06-02 11:04   ` Kevin Wolf
  0 siblings, 0 replies; 37+ messages in thread
From: Kevin Wolf @ 2015-06-02 11:04 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: qemu-block, qemu-devel, Stefan Hajnoczi, Max Reitz

Am 29.05.2015 um 11:24 hat Alberto Garcia geschrieben:
> This adds a new 'cache-clean-interval' option that cleans all qcow2
> cache entries that haven't been used in a certain interval, given in
> seconds.
> 
> This allows setting a large L2 cache size so it can handle scenarios
> with lots of I/O and at the same time use little memory during periods
> of inactivity.
> 
> This feature currently relies on MADV_DONTNEED to free that memory, so
> it is not useful in systems that don't follow that behavior.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 863ffea..f42338d 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1538,6 +1538,9 @@
>  # @refcount-cache-size:   #optional the maximum size of the refcount block cache
>  #                         in bytes (since 2.2)
>  #
> +# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
> +#                         caches. The interval is in seconds (since 2.4)

Should add that 0 means disabling the cleaning, and 0 is the default.

Kevin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-29  9:24 [Qemu-devel] [PATCH v4 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
@ 2015-05-29  9:24 ` Alberto Garcia
  2015-06-02 11:04   ` Kevin Wolf
  0 siblings, 1 reply; 37+ messages in thread
From: Alberto Garcia @ 2015-05-29  9:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

This adds a new 'cache-clean-interval' option that cleans all qcow2
cache entries that haven't been used in a certain interval, given in
seconds.

This allows setting a large L2 cache size so it can handle scenarios
with lots of I/O and at the same time use little memory during periods
of inactivity.

This feature currently relies on MADV_DONTNEED to free that memory, so
it is not useful in systems that don't follow that behavior.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
 block/qcow2.c        | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h        |  4 ++++
 qapi/block-core.json |  6 ++++-
 4 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed14a92..a215f5b 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -43,6 +43,7 @@ struct Qcow2Cache {
     bool                    depends_on_flush;
     void                   *table_array;
     uint64_t                lru_counter;
+    uint64_t                cache_clean_lru_counter;
 };
 
 static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
@@ -78,6 +79,40 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
 #endif
 }
 
+static inline bool can_clean_entry(Qcow2Cache *c, int i)
+{
+    Qcow2CachedTable *t = &c->entries[i];
+    return t->ref == 0 && !t->dirty && t->offset != 0 &&
+        t->lru_counter <= c->cache_clean_lru_counter;
+}
+
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c)
+{
+    int i = 0;
+    while (i < c->size) {
+        int to_clean = 0;
+
+        /* Skip the entries that we don't need to clean */
+        while (i < c->size && !can_clean_entry(c, i)) {
+            i++;
+        }
+
+        /* And count how many we can clean in a row */
+        while (i < c->size && can_clean_entry(c, i)) {
+            c->entries[i].offset = 0;
+            c->entries[i].lru_counter = 0;
+            i++;
+            to_clean++;
+        }
+
+        if (to_clean > 0) {
+            qcow2_cache_table_release(bs, c, i - to_clean, to_clean);
+        }
+    }
+
+    c->cache_clean_lru_counter = c->lru_counter;
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.c b/block/qcow2.c
index f7b4cc6..8c565e9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -468,6 +468,11 @@ static QemuOptsList qcow2_runtime_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "Maximum refcount block cache size",
         },
+        {
+            .name = QCOW2_OPT_CACHE_CLEAN_INTERVAL,
+            .type = QEMU_OPT_NUMBER,
+            .help = "Clean unused cache entries after this time (in seconds)",
+        },
         { /* end of list */ }
     },
 };
@@ -483,6 +488,49 @@ static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
     [QCOW2_OL_INACTIVE_L2_BITNR]    = QCOW2_OPT_OVERLAP_INACTIVE_L2,
 };
 
+static void cache_clean_timer_cb(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVQcowState *s = bs->opaque;
+    qcow2_cache_clean_unused(bs, s->l2_table_cache);
+    qcow2_cache_clean_unused(bs, s->refcount_block_cache);
+    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              (int64_t) s->cache_clean_interval * 1000);
+}
+
+static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_interval > 0) {
+        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
+                                             SCALE_MS, cache_clean_timer_cb,
+                                             bs);
+        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                  (int64_t) s->cache_clean_interval * 1000);
+    }
+}
+
+static void cache_clean_timer_del(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_timer) {
+        timer_del(s->cache_clean_timer);
+        timer_free(s->cache_clean_timer);
+        s->cache_clean_timer = NULL;
+    }
+}
+
+static void qcow2_detach_aio_context(BlockDriverState *bs)
+{
+    cache_clean_timer_del(bs);
+}
+
+static void qcow2_attach_aio_context(BlockDriverState *bs,
+                                     AioContext *new_context)
+{
+    cache_clean_timer_init(bs, new_context);
+}
+
 static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
                              uint64_t *refcount_cache_size, Error **errp)
 {
@@ -552,6 +600,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     const char *opt_overlap_check, *opt_overlap_check_template;
     int overlap_check_template = 0;
     uint64_t l2_cache_size, refcount_cache_size;
+    uint64_t cache_clean_interval;
 
     ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
     if (ret < 0) {
@@ -839,6 +888,16 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    cache_clean_interval =
+        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
+    if (cache_clean_interval > UINT_MAX) {
+        error_setg(errp, "Cache clean interval too big");
+        ret = -EINVAL;
+        goto fail;
+    }
+    s->cache_clean_interval = cache_clean_interval;
+    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
+
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
     s->cluster_data = qemu_try_blockalign(bs->file, QCOW_MAX_CRYPT_CLUSTERS
@@ -1004,6 +1063,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_vfree(s->l1_table);
     /* else pre-write overlap checks in cache_destroy may crash */
     s->l1_table = NULL;
+    cache_clean_timer_del(bs);
     if (s->l2_table_cache) {
         qcow2_cache_destroy(bs, s->l2_table_cache);
     }
@@ -1458,6 +1518,7 @@ static void qcow2_close(BlockDriverState *bs)
         }
     }
 
+    cache_clean_timer_del(bs);
     qcow2_cache_destroy(bs, s->l2_table_cache);
     qcow2_cache_destroy(bs, s->refcount_block_cache);
 
@@ -2970,6 +3031,9 @@ BlockDriver bdrv_qcow2 = {
     .create_opts         = &qcow2_create_opts,
     .bdrv_check          = qcow2_check,
     .bdrv_amend_options  = qcow2_amend_options,
+
+    .bdrv_detach_aio_context  = qcow2_detach_aio_context,
+    .bdrv_attach_aio_context  = qcow2_attach_aio_context,
 };
 
 static void bdrv_qcow2_init(void)
diff --git a/block/qcow2.h b/block/qcow2.h
index 0076512..2c45eb2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -93,6 +93,7 @@
 #define QCOW2_OPT_CACHE_SIZE "cache-size"
 #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
 #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
+#define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
 
 typedef struct QCowHeader {
     uint32_t magic;
@@ -236,6 +237,8 @@ typedef struct BDRVQcowState {
 
     Qcow2Cache* l2_table_cache;
     Qcow2Cache* refcount_block_cache;
+    QEMUTimer *cache_clean_timer;
+    unsigned cache_clean_interval;
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
@@ -581,6 +584,7 @@ int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
     Qcow2Cache *dependency);
 void qcow2_cache_depends_on_flush(Qcow2Cache *c);
 
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c);
 int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c);
 
 int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 863ffea..f42338d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1538,6 +1538,9 @@
 # @refcount-cache-size:   #optional the maximum size of the refcount block cache
 #                         in bytes (since 2.2)
 #
+# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
+#                         caches. The interval is in seconds (since 2.4)
+#
 # Since: 1.7
 ##
 { 'struct': 'BlockdevOptionsQcow2',
@@ -1549,7 +1552,8 @@
             '*overlap-check': 'Qcow2OverlapChecks',
             '*cache-size': 'int',
             '*l2-cache-size': 'int',
-            '*refcount-cache-size': 'int' } }
+            '*refcount-cache-size': 'int',
+            '*cache-clean-interval': 'int' } }
 
 
 ##
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 19:41             ` Eric Blake
@ 2015-05-29  8:32               ` Kevin Wolf
  0 siblings, 0 replies; 37+ messages in thread
From: Kevin Wolf @ 2015-05-29  8:32 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-block, Alberto Garcia, qemu-devel, Stefan Hajnoczi, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1798 bytes --]

Am 28.05.2015 um 21:41 hat Eric Blake geschrieben:
> On 05/28/2015 09:23 AM, Max Reitz wrote:
> 
> >>>> Can we mark the parameter optional, and only provide it when it is
> >>>> non-zero?  That way, qemu-img (which cannot set an interval) will not
> >>>> report it, and the only time it will appear is if it was set as part
> >>>> of opening the block device under qemu.
> >>> That sounds good.
> >> But what do we do with the other parameters (refcount-cache-size,
> >> l2-cache-size)? We cannot have the same solution there because they
> >> don't belong to the image file either, and they're never going go be
> >> zero.
> > 
> > Pssht, don't mention it, or Eric will notice. :-)
> > 
> > Well, one solution would be to remember whether they have been set
> > explicitly or not. But that gets ugly really quickly. Maybe Kevin's
> > series helps there, but I don't know.
> > 
> > Of course, the simplest solution is to worry about cache-clean-interval
> > for now and worry about the cache sizes later… But that basically means
> > "We'll never worry about them unless someone complains", so…
> 
> Hmm, now that we have three pieces of data, I'm starting to lean more
> towards ImageInfoSpecific being the wrong place for this after all; it
> would still be nice to advertise all three, but where?  Is
> BlockdevCacheInfo more appropriate (as a sub-struct of BlockDeviceInfo)?

BlockDeviceInfo contains runtime information, so that looks like the
right place indeed. As these options aren't present for all devices, I
don't think we should extend BlockdevCacheInfo, but rather introduce a
BlockDeviceInfoSpecific that works like ImageInfoSpecific, just for
runtime information.

I agree with Berto that that's out of scope for this series, though.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 15:23           ` Max Reitz
  2015-05-28 15:30             ` Alberto Garcia
@ 2015-05-28 19:41             ` Eric Blake
  2015-05-29  8:32               ` Kevin Wolf
  1 sibling, 1 reply; 37+ messages in thread
From: Eric Blake @ 2015-05-28 19:41 UTC (permalink / raw)
  To: Max Reitz, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 1417 bytes --]

On 05/28/2015 09:23 AM, Max Reitz wrote:

>>>> Can we mark the parameter optional, and only provide it when it is
>>>> non-zero?  That way, qemu-img (which cannot set an interval) will not
>>>> report it, and the only time it will appear is if it was set as part
>>>> of opening the block device under qemu.
>>> That sounds good.
>> But what do we do with the other parameters (refcount-cache-size,
>> l2-cache-size)? We cannot have the same solution there because they
>> don't belong to the image file either, and they're never going go be
>> zero.
> 
> Pssht, don't mention it, or Eric will notice. :-)
> 
> Well, one solution would be to remember whether they have been set
> explicitly or not. But that gets ugly really quickly. Maybe Kevin's
> series helps there, but I don't know.
> 
> Of course, the simplest solution is to worry about cache-clean-interval
> for now and worry about the cache sizes later… But that basically means
> "We'll never worry about them unless someone complains", so…

Hmm, now that we have three pieces of data, I'm starting to lean more
towards ImageInfoSpecific being the wrong place for this after all; it
would still be nice to advertise all three, but where?  Is
BlockdevCacheInfo more appropriate (as a sub-struct of BlockDeviceInfo)?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 16:44       ` Kevin Wolf
@ 2015-05-28 17:03         ` Alberto Garcia
  0 siblings, 0 replies; 37+ messages in thread
From: Alberto Garcia @ 2015-05-28 17:03 UTC (permalink / raw)
  To: Kevin Wolf, Eric Blake; +Cc: qemu-block, qemu-devel, Stefan Hajnoczi, Max Reitz

On Thu 28 May 2015 06:44:55 PM CEST, Kevin Wolf wrote:
>> Yeah, I'm not sure such duplication helps.  I'd still like it
>> reported somewhere, though, as it is nice to query that a requested
>> setting is actually working.
>
> This isn't duplicated information. You can have ImageInfoSpecificQCow2
> show lazy_refcounts=off because that is what the image file contains,
> but the real value in effect could be lazy_refcounts=on.

That's right, ImageInfoSpecificQCow2 returns the value from the header
(s->compatible_features), not the runtime value (s->use_lazy_refcounts).

I think I'll resend the patch without the ImageInfo change. Querying the
runtime values is a task on its own.

Berto

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 15:13     ` Eric Blake
  2015-05-28 15:14       ` Max Reitz
@ 2015-05-28 16:44       ` Kevin Wolf
  2015-05-28 17:03         ` Alberto Garcia
  1 sibling, 1 reply; 37+ messages in thread
From: Kevin Wolf @ 2015-05-28 16:44 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-block, Alberto Garcia, qemu-devel, Stefan Hajnoczi, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 3400 bytes --]

Am 28.05.2015 um 17:13 hat Eric Blake geschrieben:
> On 05/28/2015 08:56 AM, Max Reitz wrote:
> > On 27.05.2015 11:46, Alberto Garcia wrote:
> >> This adds a new 'cache-clean-interval' option that cleans all qcow2
> >> cache entries that haven't been used in a certain interval, given in
> >> seconds.
> >>
> >> This allows setting a large L2 cache size so it can handle scenarios
> >> with lots of I/O and at the same time use little memory during periods
> >> of inactivity.
> >>
> >> This feature currently relies on MADV_DONTNEED to free that memory, so
> >> it is not useful in systems that don't follow that behavior.
> >>
> 
> >> +++ b/qapi/block-core.json
> >> @@ -41,6 +41,10 @@
> >>   # @corrupt: #optional true if the image has been marked corrupt;
> >> only valid for
> >>   #           compat >= 1.1 (since 2.2)
> >>   #
> >> +# @cache-clean-interval: interval in seconds after which unused L2 and
> >> +#                        refcount cache entries are removed. If 0 then
> >> +#                        this feature is not enabled (since 2.4)
> >> +#
> >>   # @refcount-bits: width of a refcount entry in bits (since 2.3)
> >>   #
> >>   # Since: 1.7
> >> @@ -50,7 +54,8 @@
> >>         'compat': 'str',
> >>         '*lazy-refcounts': 'bool',
> >>         '*corrupt': 'bool',
> >> -      'refcount-bits': 'int'
> >> +      'refcount-bits': 'int',
> >> +      'cache-clean-interval': 'int'
> >>     } }
> > 
> > I'm not too happy about making this part of ImageInfoSpecificQCow2. Two
> > reasons for this: First, it's eventually part of ImageInfo, which is
> > defined as "Information about a QEMU image file", but this option cannot
> > be set in the image file itself but is only a run-time option.
> > 
> > Second, my original goal for ImageInfoSpecific was to provide more
> > information through qemu-img info, and this value will look pretty
> > strange there.
> > 
> > I don't know how to resolve this quandary, though. Since qemu cannot
> > change this interval by itself, I think not providing an interface for
> > reading it is fine. On the other hand, if Eric finds such an interface
> > absolutely mandatory, I can't think of a better place to return it than
> > here.
> 
> Can we mark the parameter optional, and only provide it when it is
> non-zero?  That way, qemu-img (which cannot set an interval) will not
> report it, and the only time it will appear is if it was set as part of
> opening the block device under qemu.

No, that wouldn't be right. It's not information tied to the image file.

> > The only solution which would satisfy both requirements would be another
> > structure which contains "online" flags, and thus is not evaluated by
> > qemu-img info, but only by QMP commands. But that's ugly.
> > 
> 
> Yeah, I'm not sure such duplication helps.  I'd still like it reported
> somewhere, though, as it is nice to query that a requested setting is
> actually working.

This isn't duplicated information. You can have ImageInfoSpecificQCow2
show lazy_refcounts=off because that is what the image file contains,
but the real value in effect could be lazy_refcounts=on.

Options stored in the image and runtime options are two different pieces
of information, even if the former specify the defaults for the latter.
So I think it should be possible to query both of them.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 15:23           ` Max Reitz
@ 2015-05-28 15:30             ` Alberto Garcia
  2015-05-28 19:41             ` Eric Blake
  1 sibling, 0 replies; 37+ messages in thread
From: Alberto Garcia @ 2015-05-28 15:30 UTC (permalink / raw)
  To: Max Reitz, Eric Blake, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

On Thu 28 May 2015 05:23:40 PM CEST, Max Reitz wrote:

>>>> Can we mark the parameter optional, and only provide it when it is
>>>> non-zero?  That way, qemu-img (which cannot set an interval) will
>>>> not report it, and the only time it will appear is if it was set as
>>>> part of opening the block device under qemu.

>>> That sounds good.

>> But what do we do with the other parameters (refcount-cache-size,
>> l2-cache-size)? We cannot have the same solution there because they
>> don't belong to the image file either, and they're never going go be
>> zero.

> Of course, the simplest solution is to worry about cache-clean-interval 
> for now and worry about the cache sizes later… But that basically means 
> "We'll never worry about them unless someone complains", so…

My worry is that the solution works in this case because the default
value happens to be 0 and means "disabled", so it makes sense, but it
doesn't work in general with other parameters, so we would be adding
something that we know it's ad-hoc.

Berto

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 15:19         ` Alberto Garcia
@ 2015-05-28 15:23           ` Max Reitz
  2015-05-28 15:30             ` Alberto Garcia
  2015-05-28 19:41             ` Eric Blake
  0 siblings, 2 replies; 37+ messages in thread
From: Max Reitz @ 2015-05-28 15:23 UTC (permalink / raw)
  To: Alberto Garcia, Eric Blake, qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

On 28.05.2015 17:19, Alberto Garcia wrote:
> On Thu 28 May 2015 05:14:12 PM CEST, Max Reitz wrote:
>>>>>           'compat': 'str',
>>>>>           '*lazy-refcounts': 'bool',
>>>>>           '*corrupt': 'bool',
>>>>> -      'refcount-bits': 'int'
>>>>> +      'refcount-bits': 'int',
>>>>> +      'cache-clean-interval': 'int'
>>>>>       } }
>>>> I'm not too happy about making this part of ImageInfoSpecificQCow2.
>>>> Two reasons for this: First, it's eventually part of ImageInfo,
>>>> which is defined as "Information about a QEMU image file", but this
>>>> option cannot be set in the image file itself but is only a run-time
>>>> option.
>>>>
>>> Can we mark the parameter optional, and only provide it when it is
>>> non-zero?  That way, qemu-img (which cannot set an interval) will not
>>> report it, and the only time it will appear is if it was set as part
>>> of opening the block device under qemu.
>> That sounds good.
> But what do we do with the other parameters (refcount-cache-size,
> l2-cache-size)? We cannot have the same solution there because they
> don't belong to the image file either, and they're never going go be
> zero.

Pssht, don't mention it, or Eric will notice. :-)

Well, one solution would be to remember whether they have been set 
explicitly or not. But that gets ugly really quickly. Maybe Kevin's 
series helps there, but I don't know.

Of course, the simplest solution is to worry about cache-clean-interval 
for now and worry about the cache sizes later… But that basically means 
"We'll never worry about them unless someone complains", so…

Max

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 15:14       ` Max Reitz
@ 2015-05-28 15:19         ` Alberto Garcia
  2015-05-28 15:23           ` Max Reitz
  0 siblings, 1 reply; 37+ messages in thread
From: Alberto Garcia @ 2015-05-28 15:19 UTC (permalink / raw)
  To: Max Reitz, Eric Blake, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

On Thu 28 May 2015 05:14:12 PM CEST, Max Reitz wrote:
>>>>          'compat': 'str',
>>>>          '*lazy-refcounts': 'bool',
>>>>          '*corrupt': 'bool',
>>>> -      'refcount-bits': 'int'
>>>> +      'refcount-bits': 'int',
>>>> +      'cache-clean-interval': 'int'
>>>>      } }
>>> I'm not too happy about making this part of ImageInfoSpecificQCow2.
>>> Two reasons for this: First, it's eventually part of ImageInfo,
>>> which is defined as "Information about a QEMU image file", but this
>>> option cannot be set in the image file itself but is only a run-time
>>> option.
>>>
>> Can we mark the parameter optional, and only provide it when it is
>> non-zero?  That way, qemu-img (which cannot set an interval) will not
>> report it, and the only time it will appear is if it was set as part
>> of opening the block device under qemu.
>
> That sounds good.

But what do we do with the other parameters (refcount-cache-size,
l2-cache-size)? We cannot have the same solution there because they
don't belong to the image file either, and they're never going go be
zero.

Berto

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 15:13     ` Eric Blake
@ 2015-05-28 15:14       ` Max Reitz
  2015-05-28 15:19         ` Alberto Garcia
  2015-05-28 16:44       ` Kevin Wolf
  1 sibling, 1 reply; 37+ messages in thread
From: Max Reitz @ 2015-05-28 15:14 UTC (permalink / raw)
  To: Eric Blake, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

On 28.05.2015 17:13, Eric Blake wrote:
> On 05/28/2015 08:56 AM, Max Reitz wrote:
>> On 27.05.2015 11:46, Alberto Garcia wrote:
>>> This adds a new 'cache-clean-interval' option that cleans all qcow2
>>> cache entries that haven't been used in a certain interval, given in
>>> seconds.
>>>
>>> This allows setting a large L2 cache size so it can handle scenarios
>>> with lots of I/O and at the same time use little memory during periods
>>> of inactivity.
>>>
>>> This feature currently relies on MADV_DONTNEED to free that memory, so
>>> it is not useful in systems that don't follow that behavior.
>>>
>>> +++ b/qapi/block-core.json
>>> @@ -41,6 +41,10 @@
>>>    # @corrupt: #optional true if the image has been marked corrupt;
>>> only valid for
>>>    #           compat >= 1.1 (since 2.2)
>>>    #
>>> +# @cache-clean-interval: interval in seconds after which unused L2 and
>>> +#                        refcount cache entries are removed. If 0 then
>>> +#                        this feature is not enabled (since 2.4)
>>> +#
>>>    # @refcount-bits: width of a refcount entry in bits (since 2.3)
>>>    #
>>>    # Since: 1.7
>>> @@ -50,7 +54,8 @@
>>>          'compat': 'str',
>>>          '*lazy-refcounts': 'bool',
>>>          '*corrupt': 'bool',
>>> -      'refcount-bits': 'int'
>>> +      'refcount-bits': 'int',
>>> +      'cache-clean-interval': 'int'
>>>      } }
>> I'm not too happy about making this part of ImageInfoSpecificQCow2. Two
>> reasons for this: First, it's eventually part of ImageInfo, which is
>> defined as "Information about a QEMU image file", but this option cannot
>> be set in the image file itself but is only a run-time option.
>>
>> Second, my original goal for ImageInfoSpecific was to provide more
>> information through qemu-img info, and this value will look pretty
>> strange there.
>>
>> I don't know how to resolve this quandary, though. Since qemu cannot
>> change this interval by itself, I think not providing an interface for
>> reading it is fine. On the other hand, if Eric finds such an interface
>> absolutely mandatory, I can't think of a better place to return it than
>> here.
> Can we mark the parameter optional, and only provide it when it is
> non-zero?  That way, qemu-img (which cannot set an interval) will not
> report it, and the only time it will appear is if it was set as part of
> opening the block device under qemu.

That sounds good.

Max

>> The only solution which would satisfy both requirements would be another
>> structure which contains "online" flags, and thus is not evaluated by
>> qemu-img info, but only by QMP commands. But that's ugly.
>>
> Yeah, I'm not sure such duplication helps.  I'd still like it reported
> somewhere, though, as it is nice to query that a requested setting is
> actually working.
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 14:56   ` Max Reitz
  2015-05-28 15:02     ` Alberto Garcia
@ 2015-05-28 15:13     ` Eric Blake
  2015-05-28 15:14       ` Max Reitz
  2015-05-28 16:44       ` Kevin Wolf
  1 sibling, 2 replies; 37+ messages in thread
From: Eric Blake @ 2015-05-28 15:13 UTC (permalink / raw)
  To: Max Reitz, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2843 bytes --]

On 05/28/2015 08:56 AM, Max Reitz wrote:
> On 27.05.2015 11:46, Alberto Garcia wrote:
>> This adds a new 'cache-clean-interval' option that cleans all qcow2
>> cache entries that haven't been used in a certain interval, given in
>> seconds.
>>
>> This allows setting a large L2 cache size so it can handle scenarios
>> with lots of I/O and at the same time use little memory during periods
>> of inactivity.
>>
>> This feature currently relies on MADV_DONTNEED to free that memory, so
>> it is not useful in systems that don't follow that behavior.
>>

>> +++ b/qapi/block-core.json
>> @@ -41,6 +41,10 @@
>>   # @corrupt: #optional true if the image has been marked corrupt;
>> only valid for
>>   #           compat >= 1.1 (since 2.2)
>>   #
>> +# @cache-clean-interval: interval in seconds after which unused L2 and
>> +#                        refcount cache entries are removed. If 0 then
>> +#                        this feature is not enabled (since 2.4)
>> +#
>>   # @refcount-bits: width of a refcount entry in bits (since 2.3)
>>   #
>>   # Since: 1.7
>> @@ -50,7 +54,8 @@
>>         'compat': 'str',
>>         '*lazy-refcounts': 'bool',
>>         '*corrupt': 'bool',
>> -      'refcount-bits': 'int'
>> +      'refcount-bits': 'int',
>> +      'cache-clean-interval': 'int'
>>     } }
> 
> I'm not too happy about making this part of ImageInfoSpecificQCow2. Two
> reasons for this: First, it's eventually part of ImageInfo, which is
> defined as "Information about a QEMU image file", but this option cannot
> be set in the image file itself but is only a run-time option.
> 
> Second, my original goal for ImageInfoSpecific was to provide more
> information through qemu-img info, and this value will look pretty
> strange there.
> 
> I don't know how to resolve this quandary, though. Since qemu cannot
> change this interval by itself, I think not providing an interface for
> reading it is fine. On the other hand, if Eric finds such an interface
> absolutely mandatory, I can't think of a better place to return it than
> here.

Can we mark the parameter optional, and only provide it when it is
non-zero?  That way, qemu-img (which cannot set an interval) will not
report it, and the only time it will appear is if it was set as part of
opening the block device under qemu.

> 
> The only solution which would satisfy both requirements would be another
> structure which contains "online" flags, and thus is not evaluated by
> qemu-img info, but only by QMP commands. But that's ugly.
> 

Yeah, I'm not sure such duplication helps.  I'd still like it reported
somewhere, though, as it is nice to query that a requested setting is
actually working.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 15:04       ` Max Reitz
@ 2015-05-28 15:06         ` Max Reitz
  0 siblings, 0 replies; 37+ messages in thread
From: Max Reitz @ 2015-05-28 15:06 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

On 28.05.2015 17:04, Max Reitz wrote:
> On 28.05.2015 17:02, Alberto Garcia wrote:
>> On Thu 28 May 2015 04:56:34 PM CEST, Max Reitz wrote:
>>>>          'compat': 'str',
>>>>          '*lazy-refcounts': 'bool',
>>>>          '*corrupt': 'bool',
>>>> -      'refcount-bits': 'int'
>>>> +      'refcount-bits': 'int',
>>>> +      'cache-clean-interval': 'int'
>>>>      } }
>>> I'm not too happy about making this part of ImageInfoSpecificQCow2.
>>> Two reasons for this: First, it's eventually part of ImageInfo, which
>>> is defined as "Information about a QEMU image file", but this option
>>> cannot be set in the image file itself but is only a run-time option.
>> That's a valid point. Now that I think of it, do we actually have a way
>> to retrieve the sizes of the L2 and refcount caches? I think cache-size,
>> l2-cache-size, and refcount-cache-size are already write-only values.
>
> No, in fact we don't. Well, except for if you manage to retrieve the 
> JSON filename for the qcow2 BDS, it should be part of that. :-P

And I just noticed, passing that option will result in qemu returning 
the JSON filename when queried:

$ qemu-img info "json:{'l2-cache-size':65536,'file.filename':'test.qcow2'}"
image: json:{"driver": "qcow2", "l2-cache-size": 65536, "file": 
{"driver": "file", "filename": "test.qcow2"}}
[…]

Max

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 15:02     ` Alberto Garcia
@ 2015-05-28 15:04       ` Max Reitz
  2015-05-28 15:06         ` Max Reitz
  0 siblings, 1 reply; 37+ messages in thread
From: Max Reitz @ 2015-05-28 15:04 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

On 28.05.2015 17:02, Alberto Garcia wrote:
> On Thu 28 May 2015 04:56:34 PM CEST, Max Reitz wrote:
>>>          'compat': 'str',
>>>          '*lazy-refcounts': 'bool',
>>>          '*corrupt': 'bool',
>>> -      'refcount-bits': 'int'
>>> +      'refcount-bits': 'int',
>>> +      'cache-clean-interval': 'int'
>>>      } }
>> I'm not too happy about making this part of ImageInfoSpecificQCow2.
>> Two reasons for this: First, it's eventually part of ImageInfo, which
>> is defined as "Information about a QEMU image file", but this option
>> cannot be set in the image file itself but is only a run-time option.
> That's a valid point. Now that I think of it, do we actually have a way
> to retrieve the sizes of the L2 and refcount caches? I think cache-size,
> l2-cache-size, and refcount-cache-size are already write-only values.

No, in fact we don't. Well, except for if you manage to retrieve the 
JSON filename for the qcow2 BDS, it should be part of that. :-P

Max

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-28 14:56   ` Max Reitz
@ 2015-05-28 15:02     ` Alberto Garcia
  2015-05-28 15:04       ` Max Reitz
  2015-05-28 15:13     ` Eric Blake
  1 sibling, 1 reply; 37+ messages in thread
From: Alberto Garcia @ 2015-05-28 15:02 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

On Thu 28 May 2015 04:56:34 PM CEST, Max Reitz wrote:
>>         'compat': 'str',
>>         '*lazy-refcounts': 'bool',
>>         '*corrupt': 'bool',
>> -      'refcount-bits': 'int'
>> +      'refcount-bits': 'int',
>> +      'cache-clean-interval': 'int'
>>     } }
>
> I'm not too happy about making this part of ImageInfoSpecificQCow2.
> Two reasons for this: First, it's eventually part of ImageInfo, which
> is defined as "Information about a QEMU image file", but this option
> cannot be set in the image file itself but is only a run-time option.

That's a valid point. Now that I think of it, do we actually have a way
to retrieve the sizes of the L2 and refcount caches? I think cache-size,
l2-cache-size, and refcount-cache-size are already write-only values.

Berto

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-27  9:46 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
  2015-05-27 12:27   ` Eric Blake
@ 2015-05-28 14:56   ` Max Reitz
  2015-05-28 15:02     ` Alberto Garcia
  2015-05-28 15:13     ` Eric Blake
  1 sibling, 2 replies; 37+ messages in thread
From: Max Reitz @ 2015-05-28 14:56 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

On 27.05.2015 11:46, Alberto Garcia wrote:
> This adds a new 'cache-clean-interval' option that cleans all qcow2
> cache entries that haven't been used in a certain interval, given in
> seconds.
>
> This allows setting a large L2 cache size so it can handle scenarios
> with lots of I/O and at the same time use little memory during periods
> of inactivity.
>
> This feature currently relies on MADV_DONTNEED to free that memory, so
> it is not useful in systems that don't follow that behavior.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Cc: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
>   block/qcow2.c        | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   block/qcow2.h        |  4 ++++
>   qapi/block-core.json | 13 +++++++++--
>   4 files changed, 116 insertions(+), 2 deletions(-)
>
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index ed14a92..a215f5b 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -43,6 +43,7 @@ struct Qcow2Cache {
>       bool                    depends_on_flush;
>       void                   *table_array;
>       uint64_t                lru_counter;
> +    uint64_t                cache_clean_lru_counter;
>   };
>   
>   static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
> @@ -78,6 +79,40 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
>   #endif
>   }
>   
> +static inline bool can_clean_entry(Qcow2Cache *c, int i)
> +{
> +    Qcow2CachedTable *t = &c->entries[i];
> +    return t->ref == 0 && !t->dirty && t->offset != 0 &&
> +        t->lru_counter <= c->cache_clean_lru_counter;
> +}
> +
> +void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c)
> +{
> +    int i = 0;
> +    while (i < c->size) {
> +        int to_clean = 0;
> +
> +        /* Skip the entries that we don't need to clean */
> +        while (i < c->size && !can_clean_entry(c, i)) {
> +            i++;
> +        }
> +
> +        /* And count how many we can clean in a row */
> +        while (i < c->size && can_clean_entry(c, i)) {
> +            c->entries[i].offset = 0;
> +            c->entries[i].lru_counter = 0;
> +            i++;
> +            to_clean++;
> +        }
> +
> +        if (to_clean > 0) {
> +            qcow2_cache_table_release(bs, c, i - to_clean, to_clean);
> +        }
> +    }
> +
> +    c->cache_clean_lru_counter = c->lru_counter;
> +}
> +
>   Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>   {
>       BDRVQcowState *s = bs->opaque;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index f7b4cc6..abafcdf 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -468,6 +468,11 @@ static QemuOptsList qcow2_runtime_opts = {
>               .type = QEMU_OPT_SIZE,
>               .help = "Maximum refcount block cache size",
>           },
> +        {
> +            .name = QCOW2_OPT_CACHE_CLEAN_INTERVAL,
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "Clean unused cache entries after this time (in seconds)",
> +        },
>           { /* end of list */ }
>       },
>   };
> @@ -483,6 +488,49 @@ static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
>       [QCOW2_OL_INACTIVE_L2_BITNR]    = QCOW2_OPT_OVERLAP_INACTIVE_L2,
>   };
>   
> +static void cache_clean_timer_cb(void *opaque)
> +{
> +    BlockDriverState *bs = opaque;
> +    BDRVQcowState *s = bs->opaque;
> +    qcow2_cache_clean_unused(bs, s->l2_table_cache);
> +    qcow2_cache_clean_unused(bs, s->refcount_block_cache);
> +    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +              (int64_t) s->cache_clean_interval * 1000);
> +}
> +
> +static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    if (s->cache_clean_interval > 0) {
> +        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
> +                                             SCALE_MS, cache_clean_timer_cb,
> +                                             bs);
> +        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                  (int64_t) s->cache_clean_interval * 1000);
> +    }
> +}
> +
> +static void cache_clean_timer_del(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    if (s->cache_clean_timer) {
> +        timer_del(s->cache_clean_timer);
> +        timer_free(s->cache_clean_timer);
> +        s->cache_clean_timer = NULL;
> +    }
> +}
> +
> +static void qcow2_detach_aio_context(BlockDriverState *bs)
> +{
> +    cache_clean_timer_del(bs);
> +}
> +
> +static void qcow2_attach_aio_context(BlockDriverState *bs,
> +                                     AioContext *new_context)
> +{
> +    cache_clean_timer_init(bs, new_context);
> +}
> +
>   static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
>                                uint64_t *refcount_cache_size, Error **errp)
>   {
> @@ -552,6 +600,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>       const char *opt_overlap_check, *opt_overlap_check_template;
>       int overlap_check_template = 0;
>       uint64_t l2_cache_size, refcount_cache_size;
> +    uint64_t cache_clean_interval;
>   
>       ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
>       if (ret < 0) {
> @@ -839,6 +888,16 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>           goto fail;
>       }
>   
> +    cache_clean_interval =
> +        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
> +    if (cache_clean_interval > UINT_MAX) {
> +        error_setg(errp, "Cache clean interval too big");
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +    s->cache_clean_interval = cache_clean_interval;
> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
> +
>       s->cluster_cache = g_malloc(s->cluster_size);
>       /* one more sector for decompressed data alignment */
>       s->cluster_data = qemu_try_blockalign(bs->file, QCOW_MAX_CRYPT_CLUSTERS
> @@ -1004,6 +1063,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>       qemu_vfree(s->l1_table);
>       /* else pre-write overlap checks in cache_destroy may crash */
>       s->l1_table = NULL;
> +    cache_clean_timer_del(bs);
>       if (s->l2_table_cache) {
>           qcow2_cache_destroy(bs, s->l2_table_cache);
>       }
> @@ -1458,6 +1518,7 @@ static void qcow2_close(BlockDriverState *bs)
>           }
>       }
>   
> +    cache_clean_timer_del(bs);
>       qcow2_cache_destroy(bs, s->l2_table_cache);
>       qcow2_cache_destroy(bs, s->refcount_block_cache);
>   
> @@ -2552,6 +2613,8 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
>           };
>       }
>   
> +    spec_info->qcow2->cache_clean_interval = s->cache_clean_interval;
> +
>       return spec_info;
>   }
>   
> @@ -2970,6 +3033,9 @@ BlockDriver bdrv_qcow2 = {
>       .create_opts         = &qcow2_create_opts,
>       .bdrv_check          = qcow2_check,
>       .bdrv_amend_options  = qcow2_amend_options,
> +
> +    .bdrv_detach_aio_context  = qcow2_detach_aio_context,
> +    .bdrv_attach_aio_context  = qcow2_attach_aio_context,
>   };
>   
>   static void bdrv_qcow2_init(void)
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 0076512..2c45eb2 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -93,6 +93,7 @@
>   #define QCOW2_OPT_CACHE_SIZE "cache-size"
>   #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
>   #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
> +#define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
>   
>   typedef struct QCowHeader {
>       uint32_t magic;
> @@ -236,6 +237,8 @@ typedef struct BDRVQcowState {
>   
>       Qcow2Cache* l2_table_cache;
>       Qcow2Cache* refcount_block_cache;
> +    QEMUTimer *cache_clean_timer;
> +    unsigned cache_clean_interval;
>   
>       uint8_t *cluster_cache;
>       uint8_t *cluster_data;
> @@ -581,6 +584,7 @@ int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
>       Qcow2Cache *dependency);
>   void qcow2_cache_depends_on_flush(Qcow2Cache *c);
>   
> +void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c);
>   int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c);
>   
>   int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 863ffea..3a9b590 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -41,6 +41,10 @@
>   # @corrupt: #optional true if the image has been marked corrupt; only valid for
>   #           compat >= 1.1 (since 2.2)
>   #
> +# @cache-clean-interval: interval in seconds after which unused L2 and
> +#                        refcount cache entries are removed. If 0 then
> +#                        this feature is not enabled (since 2.4)
> +#
>   # @refcount-bits: width of a refcount entry in bits (since 2.3)
>   #
>   # Since: 1.7
> @@ -50,7 +54,8 @@
>         'compat': 'str',
>         '*lazy-refcounts': 'bool',
>         '*corrupt': 'bool',
> -      'refcount-bits': 'int'
> +      'refcount-bits': 'int',
> +      'cache-clean-interval': 'int'
>     } }

I'm not too happy about making this part of ImageInfoSpecificQCow2. Two 
reasons for this: First, it's eventually part of ImageInfo, which is 
defined as "Information about a QEMU image file", but this option cannot 
be set in the image file itself but is only a run-time option.

Second, my original goal for ImageInfoSpecific was to provide more 
information through qemu-img info, and this value will look pretty 
strange there.

I don't know how to resolve this quandary, though. Since qemu cannot 
change this interval by itself, I think not providing an interface for 
reading it is fine. On the other hand, if Eric finds such an interface 
absolutely mandatory, I can't think of a better place to return it than 
here.

The only solution which would satisfy both requirements would be another 
structure which contains "online" flags, and thus is not evaluated by 
qemu-img info, but only by QMP commands. But that's ugly.

Max

>   ##
> @@ -1538,6 +1543,9 @@
>   # @refcount-cache-size:   #optional the maximum size of the refcount block cache
>   #                         in bytes (since 2.2)
>   #
> +# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
> +#                         caches. The interval is in seconds (since 2.4)
> +#
>   # Since: 1.7
>   ##
>   { 'struct': 'BlockdevOptionsQcow2',
> @@ -1549,7 +1557,8 @@
>               '*overlap-check': 'Qcow2OverlapChecks',
>               '*cache-size': 'int',
>               '*l2-cache-size': 'int',
> -            '*refcount-cache-size': 'int' } }
> +            '*refcount-cache-size': 'int',
> +            '*cache-clean-interval': 'int' } }
>   
>   
>   ##

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-27 12:27   ` Eric Blake
  2015-05-27 14:34     ` Alberto Garcia
@ 2015-05-28 14:47     ` Max Reitz
  1 sibling, 0 replies; 37+ messages in thread
From: Max Reitz @ 2015-05-28 14:47 UTC (permalink / raw)
  To: Eric Blake, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

On 27.05.2015 14:27, Eric Blake wrote:
> On 05/27/2015 03:46 AM, Alberto Garcia wrote:
>> This adds a new 'cache-clean-interval' option that cleans all qcow2
>> cache entries that haven't been used in a certain interval, given in
>> seconds.
>>
>> This allows setting a large L2 cache size so it can handle scenarios
>> with lots of I/O and at the same time use little memory during periods
>> of inactivity.
>>
>> This feature currently relies on MADV_DONTNEED to free that memory, so
>> it is not useful in systems that don't follow that behavior.
>>
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> Cc: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
>>   block/qcow2.c        | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/qcow2.h        |  4 ++++
>>   qapi/block-core.json | 13 +++++++++--
>>   4 files changed, 116 insertions(+), 2 deletions(-)
>>
>> +static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    if (s->cache_clean_interval > 0) {
>> +        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
>> +                                             SCALE_MS, cache_clean_timer_cb,
>> +                                             bs);
>> +        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> +                  (int64_t) s->cache_clean_interval * 1000);
>> +    }
>> +}
>> +
> This function sets up a timer for non-zero interval, but does nothing if
> interval is zero. [1]
>
>> @@ -839,6 +888,16 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>>           goto fail;
>>       }
>>   
>> +    cache_clean_interval =
>> +        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
>> +    if (cache_clean_interval > UINT_MAX) {
>> +        error_setg(errp, "Cache clean interval too big");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
> If you type the qapi code as 'uint32' rather than 'int', you could skip
> the error checking here because the parser would have already clamped
> things.  But I can live with this as-is.

Well, for blockdev-add, yes, but I don't think that applies when the 
option has been passed on the command line.

Max

>> +    s->cache_clean_interval = cache_clean_interval;
>> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
> [1] But here, you are unconditionally calling init, whether the new
> value is 0 or nonzero.  Can a block reopen ever cause an existing BDS to
> change its interval, in which case I could create a BDS originally with
> a timer, then reopen it without a timer, and init() would have to remove
> the existing timer?  If I'm reading this patch correctly, right now the
> interval is a write-once deal (no way to change it after the fact), so
> your code is okay; but should a separate patch be added to allow
> adjusting the interval, via a reopen operation?
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-27 12:27   ` Eric Blake
@ 2015-05-27 14:34     ` Alberto Garcia
  2015-05-28 14:47     ` Max Reitz
  1 sibling, 0 replies; 37+ messages in thread
From: Alberto Garcia @ 2015-05-27 14:34 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi, Max Reitz

On Wed 27 May 2015 02:27:35 PM CEST, Eric Blake <eblake@redhat.com> wrote:

>> +static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
>
> This function sets up a timer for non-zero interval, but does nothing
> if interval is zero. [1]

Right, zero means there's no interval. I could clarify that in the
documentation (I did it in the ImageInfoSpecificQCow2 part, though).

>> +    if (cache_clean_interval > UINT_MAX) {
>> +        error_setg(errp, "Cache clean interval too big");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>
> If you type the qapi code as 'uint32' rather than 'int', you could
> skip the error checking here because the parser would have already
> clamped things.  But I can live with this as-is.

Hmmm... the UINT_MAX limit is just a way to prevent an overflow because
of an abnormal value, not because the limit itself make sense (it's ~136
years, way beyond any reasonable value for that setting).

I'm not sure if it's a good idea to expose that in the API, it gives the
idea that there's a reason why it is a 32-bit integer, but there's none.

I would actually be ok with having a smaller limit (I don't know, 1
year?), but there's also no easy way to choose one I guess, so I decided
to let the user choose as long as it doesn't break anything.

>> +    s->cache_clean_interval = cache_clean_interval;
>> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
>
> [1] But here, you are unconditionally calling init, whether the new
> value is 0 or nonzero.

Instead of having the same check in all places where we call
cache_clean_timer_init() (there's two at the moment) I think it's enough
with checking the value in the function where it is actually used (which
is anyway a good idea).

> Can a block reopen ever cause an existing BDS to change its interval,
> in which case I could create a BDS originally with a timer, then
> reopen it without a timer, and init() would have to remove the
> existing timer?  If I'm reading this patch correctly, right now the
> interval is a write-once deal (no way to change it after the fact), so
> your code is okay; but should a separate patch be added to allow
> adjusting the interval, via a reopen operation?

I have no objections against changing the interval, I just didn't
consider that case.

If we want to support it it should be as simple as

 timer_del();   s->interval = new_interval;   timer_init();

Berto

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-27  9:46 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
@ 2015-05-27 12:27   ` Eric Blake
  2015-05-27 14:34     ` Alberto Garcia
  2015-05-28 14:47     ` Max Reitz
  2015-05-28 14:56   ` Max Reitz
  1 sibling, 2 replies; 37+ messages in thread
From: Eric Blake @ 2015-05-27 12:27 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2971 bytes --]

On 05/27/2015 03:46 AM, Alberto Garcia wrote:
> This adds a new 'cache-clean-interval' option that cleans all qcow2
> cache entries that haven't been used in a certain interval, given in
> seconds.
> 
> This allows setting a large L2 cache size so it can handle scenarios
> with lots of I/O and at the same time use little memory during periods
> of inactivity.
> 
> This feature currently relies on MADV_DONTNEED to free that memory, so
> it is not useful in systems that don't follow that behavior.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Cc: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
>  block/qcow2.c        | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.h        |  4 ++++
>  qapi/block-core.json | 13 +++++++++--
>  4 files changed, 116 insertions(+), 2 deletions(-)
> 

> +static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    if (s->cache_clean_interval > 0) {
> +        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
> +                                             SCALE_MS, cache_clean_timer_cb,
> +                                             bs);
> +        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                  (int64_t) s->cache_clean_interval * 1000);
> +    }
> +}
> +

This function sets up a timer for non-zero interval, but does nothing if
interval is zero. [1]

> @@ -839,6 +888,16 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>          goto fail;
>      }
>  
> +    cache_clean_interval =
> +        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
> +    if (cache_clean_interval > UINT_MAX) {
> +        error_setg(errp, "Cache clean interval too big");
> +        ret = -EINVAL;
> +        goto fail;
> +    }

If you type the qapi code as 'uint32' rather than 'int', you could skip
the error checking here because the parser would have already clamped
things.  But I can live with this as-is.

> +    s->cache_clean_interval = cache_clean_interval;
> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));

[1] But here, you are unconditionally calling init, whether the new
value is 0 or nonzero.  Can a block reopen ever cause an existing BDS to
change its interval, in which case I could create a BDS originally with
a timer, then reopen it without a timer, and init() would have to remove
the existing timer?  If I'm reading this patch correctly, right now the
interval is a write-once deal (no way to change it after the fact), so
your code is okay; but should a separate patch be added to allow
adjusting the interval, via a reopen operation?

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-27  9:46 [Qemu-devel] [PATCH v3 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
@ 2015-05-27  9:46 ` Alberto Garcia
  2015-05-27 12:27   ` Eric Blake
  2015-05-28 14:56   ` Max Reitz
  0 siblings, 2 replies; 37+ messages in thread
From: Alberto Garcia @ 2015-05-27  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

This adds a new 'cache-clean-interval' option that cleans all qcow2
cache entries that haven't been used in a certain interval, given in
seconds.

This allows setting a large L2 cache size so it can handle scenarios
with lots of I/O and at the same time use little memory during periods
of inactivity.

This feature currently relies on MADV_DONTNEED to free that memory, so
it is not useful in systems that don't follow that behavior.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Cc: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
 block/qcow2.c        | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h        |  4 ++++
 qapi/block-core.json | 13 +++++++++--
 4 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed14a92..a215f5b 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -43,6 +43,7 @@ struct Qcow2Cache {
     bool                    depends_on_flush;
     void                   *table_array;
     uint64_t                lru_counter;
+    uint64_t                cache_clean_lru_counter;
 };
 
 static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
@@ -78,6 +79,40 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
 #endif
 }
 
+static inline bool can_clean_entry(Qcow2Cache *c, int i)
+{
+    Qcow2CachedTable *t = &c->entries[i];
+    return t->ref == 0 && !t->dirty && t->offset != 0 &&
+        t->lru_counter <= c->cache_clean_lru_counter;
+}
+
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c)
+{
+    int i = 0;
+    while (i < c->size) {
+        int to_clean = 0;
+
+        /* Skip the entries that we don't need to clean */
+        while (i < c->size && !can_clean_entry(c, i)) {
+            i++;
+        }
+
+        /* And count how many we can clean in a row */
+        while (i < c->size && can_clean_entry(c, i)) {
+            c->entries[i].offset = 0;
+            c->entries[i].lru_counter = 0;
+            i++;
+            to_clean++;
+        }
+
+        if (to_clean > 0) {
+            qcow2_cache_table_release(bs, c, i - to_clean, to_clean);
+        }
+    }
+
+    c->cache_clean_lru_counter = c->lru_counter;
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.c b/block/qcow2.c
index f7b4cc6..abafcdf 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -468,6 +468,11 @@ static QemuOptsList qcow2_runtime_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "Maximum refcount block cache size",
         },
+        {
+            .name = QCOW2_OPT_CACHE_CLEAN_INTERVAL,
+            .type = QEMU_OPT_NUMBER,
+            .help = "Clean unused cache entries after this time (in seconds)",
+        },
         { /* end of list */ }
     },
 };
@@ -483,6 +488,49 @@ static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
     [QCOW2_OL_INACTIVE_L2_BITNR]    = QCOW2_OPT_OVERLAP_INACTIVE_L2,
 };
 
+static void cache_clean_timer_cb(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVQcowState *s = bs->opaque;
+    qcow2_cache_clean_unused(bs, s->l2_table_cache);
+    qcow2_cache_clean_unused(bs, s->refcount_block_cache);
+    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              (int64_t) s->cache_clean_interval * 1000);
+}
+
+static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_interval > 0) {
+        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
+                                             SCALE_MS, cache_clean_timer_cb,
+                                             bs);
+        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                  (int64_t) s->cache_clean_interval * 1000);
+    }
+}
+
+static void cache_clean_timer_del(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_timer) {
+        timer_del(s->cache_clean_timer);
+        timer_free(s->cache_clean_timer);
+        s->cache_clean_timer = NULL;
+    }
+}
+
+static void qcow2_detach_aio_context(BlockDriverState *bs)
+{
+    cache_clean_timer_del(bs);
+}
+
+static void qcow2_attach_aio_context(BlockDriverState *bs,
+                                     AioContext *new_context)
+{
+    cache_clean_timer_init(bs, new_context);
+}
+
 static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
                              uint64_t *refcount_cache_size, Error **errp)
 {
@@ -552,6 +600,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     const char *opt_overlap_check, *opt_overlap_check_template;
     int overlap_check_template = 0;
     uint64_t l2_cache_size, refcount_cache_size;
+    uint64_t cache_clean_interval;
 
     ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
     if (ret < 0) {
@@ -839,6 +888,16 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    cache_clean_interval =
+        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
+    if (cache_clean_interval > UINT_MAX) {
+        error_setg(errp, "Cache clean interval too big");
+        ret = -EINVAL;
+        goto fail;
+    }
+    s->cache_clean_interval = cache_clean_interval;
+    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
+
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
     s->cluster_data = qemu_try_blockalign(bs->file, QCOW_MAX_CRYPT_CLUSTERS
@@ -1004,6 +1063,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_vfree(s->l1_table);
     /* else pre-write overlap checks in cache_destroy may crash */
     s->l1_table = NULL;
+    cache_clean_timer_del(bs);
     if (s->l2_table_cache) {
         qcow2_cache_destroy(bs, s->l2_table_cache);
     }
@@ -1458,6 +1518,7 @@ static void qcow2_close(BlockDriverState *bs)
         }
     }
 
+    cache_clean_timer_del(bs);
     qcow2_cache_destroy(bs, s->l2_table_cache);
     qcow2_cache_destroy(bs, s->refcount_block_cache);
 
@@ -2552,6 +2613,8 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
         };
     }
 
+    spec_info->qcow2->cache_clean_interval = s->cache_clean_interval;
+
     return spec_info;
 }
 
@@ -2970,6 +3033,9 @@ BlockDriver bdrv_qcow2 = {
     .create_opts         = &qcow2_create_opts,
     .bdrv_check          = qcow2_check,
     .bdrv_amend_options  = qcow2_amend_options,
+
+    .bdrv_detach_aio_context  = qcow2_detach_aio_context,
+    .bdrv_attach_aio_context  = qcow2_attach_aio_context,
 };
 
 static void bdrv_qcow2_init(void)
diff --git a/block/qcow2.h b/block/qcow2.h
index 0076512..2c45eb2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -93,6 +93,7 @@
 #define QCOW2_OPT_CACHE_SIZE "cache-size"
 #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
 #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
+#define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
 
 typedef struct QCowHeader {
     uint32_t magic;
@@ -236,6 +237,8 @@ typedef struct BDRVQcowState {
 
     Qcow2Cache* l2_table_cache;
     Qcow2Cache* refcount_block_cache;
+    QEMUTimer *cache_clean_timer;
+    unsigned cache_clean_interval;
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
@@ -581,6 +584,7 @@ int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
     Qcow2Cache *dependency);
 void qcow2_cache_depends_on_flush(Qcow2Cache *c);
 
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c);
 int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c);
 
 int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 863ffea..3a9b590 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -41,6 +41,10 @@
 # @corrupt: #optional true if the image has been marked corrupt; only valid for
 #           compat >= 1.1 (since 2.2)
 #
+# @cache-clean-interval: interval in seconds after which unused L2 and
+#                        refcount cache entries are removed. If 0 then
+#                        this feature is not enabled (since 2.4)
+#
 # @refcount-bits: width of a refcount entry in bits (since 2.3)
 #
 # Since: 1.7
@@ -50,7 +54,8 @@
       'compat': 'str',
       '*lazy-refcounts': 'bool',
       '*corrupt': 'bool',
-      'refcount-bits': 'int'
+      'refcount-bits': 'int',
+      'cache-clean-interval': 'int'
   } }
 
 ##
@@ -1538,6 +1543,9 @@
 # @refcount-cache-size:   #optional the maximum size of the refcount block cache
 #                         in bytes (since 2.2)
 #
+# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
+#                         caches. The interval is in seconds (since 2.4)
+#
 # Since: 1.7
 ##
 { 'struct': 'BlockdevOptionsQcow2',
@@ -1549,7 +1557,8 @@
             '*overlap-check': 'Qcow2OverlapChecks',
             '*cache-size': 'int',
             '*l2-cache-size': 'int',
-            '*refcount-cache-size': 'int' } }
+            '*refcount-cache-size': 'int',
+            '*cache-clean-interval': 'int' } }
 
 
 ##
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-26 19:10     ` Alberto Garcia
@ 2015-05-26 19:15       ` Eric Blake
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Blake @ 2015-05-26 19:15 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 979 bytes --]

On 05/26/2015 01:10 PM, Alberto Garcia wrote:
> On Tue 26 May 2015 08:52:41 PM CEST, Eric Blake <eblake@redhat.com> wrote:
> 
>>> +# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
>>> +#                         caches. The interval is in seconds (since 2.4)
>>> +#
>>>  # Since: 1.7
>>
>> Is there any QMP command that can query the current interval?  I'm not
>> a big fan of write-only interfaces.
> 
> That's a good suggestion, thanks.
> 
> Do we have any API where we could add this? I would rather not have a
> separate command just for this option, if possible.

I agree that a new command is not necessary, and that the best is to
modify an existing command.  It looks like ImageInfoSpecificQCow2 is the
corresponding struct to edit, at which point ImageInfo (part of
query-block) will automatically expose it.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-26 18:52   ` Eric Blake
@ 2015-05-26 19:10     ` Alberto Garcia
  2015-05-26 19:15       ` Eric Blake
  0 siblings, 1 reply; 37+ messages in thread
From: Alberto Garcia @ 2015-05-26 19:10 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi, Max Reitz

On Tue 26 May 2015 08:52:41 PM CEST, Eric Blake <eblake@redhat.com> wrote:

>> +# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
>> +#                         caches. The interval is in seconds (since 2.4)
>> +#
>>  # Since: 1.7
>
> Is there any QMP command that can query the current interval?  I'm not
> a big fan of write-only interfaces.

That's a good suggestion, thanks.

Do we have any API where we could add this? I would rather not have a
separate command just for this option, if possible.

Berto

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-26 17:14 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
  2015-05-26 17:26   ` Max Reitz
@ 2015-05-26 18:52   ` Eric Blake
  2015-05-26 19:10     ` Alberto Garcia
  1 sibling, 1 reply; 37+ messages in thread
From: Eric Blake @ 2015-05-26 18:52 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1451 bytes --]

On 05/26/2015 11:14 AM, Alberto Garcia wrote:
> This adds a new 'cache-clean-interval' option that cleans all qcow2
> cache entries that haven't been used in a certain interval, given in
> seconds.
> 
> This allows setting a large L2 cache size so it can handle scenarios
> with lots of I/O and at the same time use little memory during periods
> of inactivity.
> 
> This feature currently relies on MADV_DONTNEED to free that memory, so
> it is not useful in systems that don't follow that behavior.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
>  block/qcow2.c        | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.h        |  4 ++++
>  qapi/block-core.json |  6 ++++-
>  4 files changed, 108 insertions(+), 1 deletion(-)
> 

> +++ b/qapi/block-core.json
> @@ -1538,6 +1538,9 @@
>  # @refcount-cache-size:   #optional the maximum size of the refcount block cache
>  #                         in bytes (since 2.2)
>  #
> +# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
> +#                         caches. The interval is in seconds (since 2.4)
> +#
>  # Since: 1.7

Is there any QMP command that can query the current interval?  I'm not a
big fan of write-only interfaces.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-26 17:14 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
@ 2015-05-26 17:26   ` Max Reitz
  2015-05-26 18:52   ` Eric Blake
  1 sibling, 0 replies; 37+ messages in thread
From: Max Reitz @ 2015-05-26 17:26 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, qemu-block, Stefan Hajnoczi

On 26.05.2015 19:14, Alberto Garcia wrote:
> This adds a new 'cache-clean-interval' option that cleans all qcow2
> cache entries that haven't been used in a certain interval, given in
> seconds.
>
> This allows setting a large L2 cache size so it can handle scenarios
> with lots of I/O and at the same time use little memory during periods
> of inactivity.
>
> This feature currently relies on MADV_DONTNEED to free that memory, so
> it is not useful in systems that don't follow that behavior.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
>   block/qcow2.c        | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   block/qcow2.h        |  4 ++++
>   qapi/block-core.json |  6 ++++-
>   4 files changed, 108 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time
  2015-05-26 17:14 [Qemu-devel] [PATCH v2 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
@ 2015-05-26 17:14 ` Alberto Garcia
  2015-05-26 17:26   ` Max Reitz
  2015-05-26 18:52   ` Eric Blake
  0 siblings, 2 replies; 37+ messages in thread
From: Alberto Garcia @ 2015-05-26 17:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

This adds a new 'cache-clean-interval' option that cleans all qcow2
cache entries that haven't been used in a certain interval, given in
seconds.

This allows setting a large L2 cache size so it can handle scenarios
with lots of I/O and at the same time use little memory during periods
of inactivity.

This feature currently relies on MADV_DONTNEED to free that memory, so
it is not useful in systems that don't follow that behavior.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c  | 35 ++++++++++++++++++++++++++++
 block/qcow2.c        | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h        |  4 ++++
 qapi/block-core.json |  6 ++++-
 4 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed14a92..a215f5b 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -43,6 +43,7 @@ struct Qcow2Cache {
     bool                    depends_on_flush;
     void                   *table_array;
     uint64_t                lru_counter;
+    uint64_t                cache_clean_lru_counter;
 };
 
 static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
@@ -78,6 +79,40 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
 #endif
 }
 
+static inline bool can_clean_entry(Qcow2Cache *c, int i)
+{
+    Qcow2CachedTable *t = &c->entries[i];
+    return t->ref == 0 && !t->dirty && t->offset != 0 &&
+        t->lru_counter <= c->cache_clean_lru_counter;
+}
+
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c)
+{
+    int i = 0;
+    while (i < c->size) {
+        int to_clean = 0;
+
+        /* Skip the entries that we don't need to clean */
+        while (i < c->size && !can_clean_entry(c, i)) {
+            i++;
+        }
+
+        /* And count how many we can clean in a row */
+        while (i < c->size && can_clean_entry(c, i)) {
+            c->entries[i].offset = 0;
+            c->entries[i].lru_counter = 0;
+            i++;
+            to_clean++;
+        }
+
+        if (to_clean > 0) {
+            qcow2_cache_table_release(bs, c, i - to_clean, to_clean);
+        }
+    }
+
+    c->cache_clean_lru_counter = c->lru_counter;
+}
+
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
 {
     BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.c b/block/qcow2.c
index f7b4cc6..8c565e9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -468,6 +468,11 @@ static QemuOptsList qcow2_runtime_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "Maximum refcount block cache size",
         },
+        {
+            .name = QCOW2_OPT_CACHE_CLEAN_INTERVAL,
+            .type = QEMU_OPT_NUMBER,
+            .help = "Clean unused cache entries after this time (in seconds)",
+        },
         { /* end of list */ }
     },
 };
@@ -483,6 +488,49 @@ static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
     [QCOW2_OL_INACTIVE_L2_BITNR]    = QCOW2_OPT_OVERLAP_INACTIVE_L2,
 };
 
+static void cache_clean_timer_cb(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVQcowState *s = bs->opaque;
+    qcow2_cache_clean_unused(bs, s->l2_table_cache);
+    qcow2_cache_clean_unused(bs, s->refcount_block_cache);
+    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+              (int64_t) s->cache_clean_interval * 1000);
+}
+
+static void cache_clean_timer_init(BlockDriverState *bs, AioContext *context)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_interval > 0) {
+        s->cache_clean_timer = aio_timer_new(context, QEMU_CLOCK_VIRTUAL,
+                                             SCALE_MS, cache_clean_timer_cb,
+                                             bs);
+        timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                  (int64_t) s->cache_clean_interval * 1000);
+    }
+}
+
+static void cache_clean_timer_del(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    if (s->cache_clean_timer) {
+        timer_del(s->cache_clean_timer);
+        timer_free(s->cache_clean_timer);
+        s->cache_clean_timer = NULL;
+    }
+}
+
+static void qcow2_detach_aio_context(BlockDriverState *bs)
+{
+    cache_clean_timer_del(bs);
+}
+
+static void qcow2_attach_aio_context(BlockDriverState *bs,
+                                     AioContext *new_context)
+{
+    cache_clean_timer_init(bs, new_context);
+}
+
 static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
                              uint64_t *refcount_cache_size, Error **errp)
 {
@@ -552,6 +600,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     const char *opt_overlap_check, *opt_overlap_check_template;
     int overlap_check_template = 0;
     uint64_t l2_cache_size, refcount_cache_size;
+    uint64_t cache_clean_interval;
 
     ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
     if (ret < 0) {
@@ -839,6 +888,16 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    cache_clean_interval =
+        qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, 0);
+    if (cache_clean_interval > UINT_MAX) {
+        error_setg(errp, "Cache clean interval too big");
+        ret = -EINVAL;
+        goto fail;
+    }
+    s->cache_clean_interval = cache_clean_interval;
+    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
+
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
     s->cluster_data = qemu_try_blockalign(bs->file, QCOW_MAX_CRYPT_CLUSTERS
@@ -1004,6 +1063,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_vfree(s->l1_table);
     /* else pre-write overlap checks in cache_destroy may crash */
     s->l1_table = NULL;
+    cache_clean_timer_del(bs);
     if (s->l2_table_cache) {
         qcow2_cache_destroy(bs, s->l2_table_cache);
     }
@@ -1458,6 +1518,7 @@ static void qcow2_close(BlockDriverState *bs)
         }
     }
 
+    cache_clean_timer_del(bs);
     qcow2_cache_destroy(bs, s->l2_table_cache);
     qcow2_cache_destroy(bs, s->refcount_block_cache);
 
@@ -2970,6 +3031,9 @@ BlockDriver bdrv_qcow2 = {
     .create_opts         = &qcow2_create_opts,
     .bdrv_check          = qcow2_check,
     .bdrv_amend_options  = qcow2_amend_options,
+
+    .bdrv_detach_aio_context  = qcow2_detach_aio_context,
+    .bdrv_attach_aio_context  = qcow2_attach_aio_context,
 };
 
 static void bdrv_qcow2_init(void)
diff --git a/block/qcow2.h b/block/qcow2.h
index 0076512..2c45eb2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -93,6 +93,7 @@
 #define QCOW2_OPT_CACHE_SIZE "cache-size"
 #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
 #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
+#define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
 
 typedef struct QCowHeader {
     uint32_t magic;
@@ -236,6 +237,8 @@ typedef struct BDRVQcowState {
 
     Qcow2Cache* l2_table_cache;
     Qcow2Cache* refcount_block_cache;
+    QEMUTimer *cache_clean_timer;
+    unsigned cache_clean_interval;
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
@@ -581,6 +584,7 @@ int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
     Qcow2Cache *dependency);
 void qcow2_cache_depends_on_flush(Qcow2Cache *c);
 
+void qcow2_cache_clean_unused(BlockDriverState *bs, Qcow2Cache *c);
 int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c);
 
 int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 863ffea..f42338d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1538,6 +1538,9 @@
 # @refcount-cache-size:   #optional the maximum size of the refcount block cache
 #                         in bytes (since 2.2)
 #
+# @cache-clean-interval:  #optional clean unused entries in the L2 and refcount
+#                         caches. The interval is in seconds (since 2.4)
+#
 # Since: 1.7
 ##
 { 'struct': 'BlockdevOptionsQcow2',
@@ -1549,7 +1552,8 @@
             '*overlap-check': 'Qcow2OverlapChecks',
             '*cache-size': 'int',
             '*l2-cache-size': 'int',
-            '*refcount-cache-size': 'int' } }
+            '*refcount-cache-size': 'int',
+            '*cache-clean-interval': 'int' } }
 
 
 ##
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2015-06-02 11:04 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-18 16:48 [Qemu-devel] [PATCH 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
2015-05-18 16:48 ` [Qemu-devel] [PATCH 1/3] qcow2: mark the memory as no longer needed after qcow2_cache_empty() Alberto Garcia
2015-05-26 15:39   ` Max Reitz
2015-05-26 15:51     ` Alberto Garcia
2015-05-26 15:51       ` Max Reitz
2015-05-18 16:48 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
2015-05-26 16:07   ` Max Reitz
2015-05-18 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: reorder fields in Qcow2CachedTable to reduce padding Alberto Garcia
2015-05-26 16:10   ` Max Reitz
2015-05-26 16:12     ` Eric Blake
2015-05-26 16:14       ` Max Reitz
2015-05-26 16:20     ` Alberto Garcia
2015-05-26 16:13   ` Eric Blake
2015-05-26 17:14 [Qemu-devel] [PATCH v2 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
2015-05-26 17:14 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
2015-05-26 17:26   ` Max Reitz
2015-05-26 18:52   ` Eric Blake
2015-05-26 19:10     ` Alberto Garcia
2015-05-26 19:15       ` Eric Blake
2015-05-27  9:46 [Qemu-devel] [PATCH v3 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
2015-05-27  9:46 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
2015-05-27 12:27   ` Eric Blake
2015-05-27 14:34     ` Alberto Garcia
2015-05-28 14:47     ` Max Reitz
2015-05-28 14:56   ` Max Reitz
2015-05-28 15:02     ` Alberto Garcia
2015-05-28 15:04       ` Max Reitz
2015-05-28 15:06         ` Max Reitz
2015-05-28 15:13     ` Eric Blake
2015-05-28 15:14       ` Max Reitz
2015-05-28 15:19         ` Alberto Garcia
2015-05-28 15:23           ` Max Reitz
2015-05-28 15:30             ` Alberto Garcia
2015-05-28 19:41             ` Eric Blake
2015-05-29  8:32               ` Kevin Wolf
2015-05-28 16:44       ` Kevin Wolf
2015-05-28 17:03         ` Alberto Garcia
2015-05-29  9:24 [Qemu-devel] [PATCH v4 0/3] Clean unused entries in the qcow2 L2/refcount cache Alberto Garcia
2015-05-29  9:24 ` [Qemu-devel] [PATCH 2/3] qcow2: add option to clean unused cache entries after some time Alberto Garcia
2015-06-02 11:04   ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.