* [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
@ 2018-07-29 21:27 Leonid Bloch
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 1/6 for-3.0] Update .gitignore Leonid Bloch
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: Leonid Bloch @ 2018-07-29 21:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Leonid Bloch

This series makes the qcow2 L2 cache cover the entire image by default.
The importance of this change is in noticeable performance improvement,
especially with heavy random I/O. The memory overhead is very small:
only 1 MB of cache for every 8 GB of image size. On systems with very
limited RAM the maximal cache size can be limited by the existing
cache-size and l2-cache-size options.

The L2 cache is also resized accordingly, by default, if the image is
resized.
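
As a rough sanity check of the "1 MB per 8 GB" figure (assuming the
default 64 KB cluster size and 8-byte L2 table entries):

    one 8-byte L2 entry maps one 64 KB cluster
    => 1 byte of L2 cache covers 65536 / 8 = 8192 bytes of virtual disk
    => 1 MB of L2 cache covers 8192 MB = 8 GB of virtual disk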

The reasons for making this behavior default, unlike in the patches I
have sent previously, are the following:
1) Inelegant complications with making an option to accept either a
   size or a string, while outputting a proper error message.
2) More bulky logic to sort out what to do if the image is being resized
   but the (defined) overall cache size is too small to contain the
   new l2-cache-size.
(Making this behavior default resolves all of these technical issues
neatly)
3) The performance gain (as measured by fio in random read/write tests)
   can be as high as 50%, or even more, so this would be a reasonable
   default behavior.
4) The memory overhead is really small for the gain, and in cases when
   memory economy is critical, the maximal cache values can always be
   set by the appropriate options.

Leonid Bloch (6):
  Update .gitignore
  qcow2: A grammar fix in conflicting cache sizing error message
  qcow2: Options' documentation fixes
  qcow2: Update total_sectors when resizing the image
  qcow2: Make the default L2 cache sufficient to cover the entire image
  qcow2: Resize the cache upon image resizing

 .gitignore                 |  1 +
 block/qcow2.c              | 20 +++++++++++++-------
 block/qcow2.h              |  4 ----
 docs/qcow2-cache.txt       | 30 +++++++++++++++++-------------
 qapi/block-core.json       |  6 +++---
 qemu-options.hx            | 15 +++++++++------
 tests/qemu-iotests/103.out |  4 ++--
 tests/qemu-iotests/137.out |  2 +-
 8 files changed, 46 insertions(+), 36 deletions(-)

-- 
2.17.1


* [Qemu-devel] [PATCH 1/6 for-3.0] Update .gitignore
  2018-07-29 21:27 [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Leonid Bloch
@ 2018-07-29 21:27 ` Leonid Bloch
  2018-07-30 15:40   ` Eric Blake
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 2/6 for-3.0] qcow2: A grammar fix in conflicting cache sizing error message Leonid Bloch
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Leonid Bloch @ 2018-07-29 21:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Leonid Bloch

Adds /roms/vgabios to .gitignore - this directory appears after the
build, and was not ignored by Git so far.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index 5668d02782..2b3b30ae9f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -212,3 +212,4 @@ trace-dtrace-root.dtrace
 trace-ust-all.h
 trace-ust-all.c
 /target/arm/decode-sve.inc.c
+/roms/vgabios
-- 
2.17.1


* [Qemu-devel] [PATCH 2/6 for-3.0] qcow2: A grammar fix in conflicting cache sizing error message
  2018-07-29 21:27 [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Leonid Bloch
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 1/6 for-3.0] Update .gitignore Leonid Bloch
@ 2018-07-29 21:27 ` Leonid Bloch
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 3/6 for-3.0] qcow2: Options' documentation fixes Leonid Bloch
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Leonid Bloch @ 2018-07-29 21:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Leonid Bloch

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
---
 block/qcow2.c              | 2 +-
 tests/qemu-iotests/103.out | 4 ++--
 tests/qemu-iotests/137.out | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6162ed8be2..ec9e6238a0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -797,7 +797,7 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
         if (l2_cache_size_set && refcount_cache_size_set) {
             error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE
                        " and " QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not be set "
-                       "the same time");
+                       "at the same time");
             return;
         } else if (*l2_cache_size > combined_cache_size) {
             error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed "
diff --git a/tests/qemu-iotests/103.out b/tests/qemu-iotests/103.out
index bd45d3875a..bd9eec3250 100644
--- a/tests/qemu-iotests/103.out
+++ b/tests/qemu-iotests/103.out
@@ -5,10 +5,10 @@ wrote 65536/65536 bytes at offset 0
 
 === Testing invalid option combinations ===
 
-can't open device TEST_DIR/t.IMGFMT: cache-size, l2-cache-size and refcount-cache-size may not be set the same time
+can't open device TEST_DIR/t.IMGFMT: cache-size, l2-cache-size and refcount-cache-size may not be set at the same time
 can't open device TEST_DIR/t.IMGFMT: l2-cache-size may not exceed cache-size
 can't open device TEST_DIR/t.IMGFMT: refcount-cache-size may not exceed cache-size
-can't open device TEST_DIR/t.IMGFMT: cache-size, l2-cache-size and refcount-cache-size may not be set the same time
+can't open device TEST_DIR/t.IMGFMT: cache-size, l2-cache-size and refcount-cache-size may not be set at the same time
 can't open device TEST_DIR/t.IMGFMT: L2 cache entry size must be a power of two between 512 and the cluster size (65536)
 can't open device TEST_DIR/t.IMGFMT: L2 cache entry size must be a power of two between 512 and the cluster size (65536)
 can't open device TEST_DIR/t.IMGFMT: L2 cache entry size must be a power of two between 512 and the cluster size (65536)
diff --git a/tests/qemu-iotests/137.out b/tests/qemu-iotests/137.out
index 96724a6c33..6a2ffc71fd 100644
--- a/tests/qemu-iotests/137.out
+++ b/tests/qemu-iotests/137.out
@@ -16,7 +16,7 @@ read 33554432/33554432 bytes at offset 0
 === Try setting some invalid values ===
 
 Parameter 'lazy-refcounts' expects 'on' or 'off'
-cache-size, l2-cache-size and refcount-cache-size may not be set the same time
+cache-size, l2-cache-size and refcount-cache-size may not be set at the same time
 l2-cache-size may not exceed cache-size
 refcount-cache-size may not exceed cache-size
 L2 cache size too big
-- 
2.17.1


* [Qemu-devel] [PATCH 3/6 for-3.0] qcow2: Options' documentation fixes
  2018-07-29 21:27 [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Leonid Bloch
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 1/6 for-3.0] Update .gitignore Leonid Bloch
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 2/6 for-3.0] qcow2: A grammar fix in conflicting cache sizing error message Leonid Bloch
@ 2018-07-29 21:27 ` Leonid Bloch
  2018-08-03 11:27   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 4/6] qcow2: Update total_sectors when resizing the image Leonid Bloch
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Leonid Bloch @ 2018-07-29 21:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Leonid Bloch

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
---
 docs/qcow2-cache.txt |  9 ++++++---
 qapi/block-core.json |  6 +++---
 qemu-options.hx      | 15 +++++++++------
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index 8a09a5cc5f..fd9a6911cc 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -112,9 +112,9 @@ command-line, or the 'blockdev-add' QMP command.
 
 There are three options available, and all of them take bytes:
 
-"l2-cache-size":         maximum size of the L2 table cache
-"refcount-cache-size":   maximum size of the refcount block cache
-"cache-size":            maximum size of both caches combined
+"l2-cache-size":         maximal size of the L2 table cache
+"refcount-cache-size":   maximal size of the refcount block cache
+"cache-size":            maximal size of both caches combined
 
 There are a few things that need to be taken into account:
 
@@ -130,6 +130,9 @@ There are a few things that need to be taken into account:
    memory as possible to the L2 cache before increasing the refcount
    cache size.
 
+ - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size"
+   can be set simultaneously.
+
 Unlike L2 tables, refcount blocks are not used during normal I/O but
 only during allocations and internal snapshots. In most cases they are
 accessed sequentially (even during random guest I/O) so increasing the
diff --git a/qapi/block-core.json b/qapi/block-core.json
index d40d5ecc3b..22f85687df 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2808,10 +2808,10 @@
 # @overlap-check:         which overlap checks to perform for writes
 #                         to the image, defaults to 'cached' (since 2.2)
 #
-# @cache-size:            the maximum total size of the L2 table and
+# @cache-size:            the maximal total size of the L2 table and
 #                         refcount block caches in bytes (since 2.2)
 #
-# @l2-cache-size:         the maximum size of the L2 table cache in
+# @l2-cache-size:         the maximal size of the L2 table cache in
 #                         bytes (since 2.2)
 #
 # @l2-cache-entry-size:   the size of each entry in the L2 cache in
@@ -2819,7 +2819,7 @@
 #                         and the cluster size. The default value is
 #                         the cluster size (since 2.12)
 #
-# @refcount-cache-size:   the maximum size of the refcount block cache
+# @refcount-cache-size:   the maximal size of the refcount block cache
 #                         in bytes (since 2.2)
 #
 # @cache-clean-interval:  clean unused entries in the L2 and refcount
diff --git a/qemu-options.hx b/qemu-options.hx
index b1bf0f485f..18f3f87da5 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -751,16 +751,19 @@ Whether to enable the lazy refcounts feature (on/off; default is taken from the
 image file)
 
 @item cache-size
-The maximum total size of the L2 table and refcount block caches in bytes
-(default: 1048576 bytes or 8 clusters, whichever is larger)
+The maximal total size of the L2 table and refcount block caches in bytes
+(default: the sum of l2-cache-size and refcount-cache-size)
 
 @item l2-cache-size
-The maximum size of the L2 table cache in bytes
-(default: 4/5 of the total cache size)
+The maximal size of the L2 table cache in bytes
+(default: if cache-size is not defined - 1048576 bytes or 8 clusters, whichever
+is larger; otherwise, as large as possible or needed within the cache-size,
+while permitting the requested or the minimal refcount cache size)
 
 @item refcount-cache-size
-The maximum size of the refcount block cache in bytes
-(default: 1/5 of the total cache size)
+The maximal size of the refcount block cache in bytes
+(default: 4 times the cluster size; or if cache-size is specified, the part of
+it which is not used for the L2 cache)
 
 @item cache-clean-interval
 Clean unused entries in the L2 and refcount caches. The interval is in seconds.
-- 
2.17.1


* [Qemu-devel] [PATCH 4/6] qcow2: Update total_sectors when resizing the image
  2018-07-29 21:27 [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Leonid Bloch
                   ` (2 preceding siblings ...)
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 3/6 for-3.0] qcow2: Options' documentation fixes Leonid Bloch
@ 2018-07-29 21:27 ` Leonid Bloch
  2018-07-30  9:43   ` Kevin Wolf
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 5/6] qcow2: Make the default L2 cache sufficient to cover the entire image Leonid Bloch
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Leonid Bloch @ 2018-07-29 21:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Leonid Bloch

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
---
 block/qcow2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index ec9e6238a0..223d351e40 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3646,6 +3646,8 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
         }
     }
 
+    bs->total_sectors = offset / 512;
+
     /* write updated header.size */
     offset = cpu_to_be64(offset);
     ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size),
-- 
2.17.1


* [Qemu-devel] [PATCH 5/6] qcow2: Make the default L2 cache sufficient to cover the entire image
  2018-07-29 21:27 [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Leonid Bloch
                   ` (3 preceding siblings ...)
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 4/6] qcow2: Update total_sectors when resizing the image Leonid Bloch
@ 2018-07-29 21:27 ` Leonid Bloch
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 6/6] qcow2: Resize the cache upon image resizing Leonid Bloch
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Leonid Bloch @ 2018-07-29 21:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Leonid Bloch

Sufficient L2 cache can noticeably improve the performance when using
large images with frequent I/O. The memory overhead is not significant,
as the cache size is only 1 MB for each 8 GB of virtual image size (with
the default cluster size of 64 KB). On systems with limited memory, one
can limit the cache size by the l2-cache-size and cache-size options.
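
For example, on such a system the combined cache could be capped with
something like (an illustrative command line, not part of this patch):

    -drive file=big.qcow2,format=qcow2,cache-size=4M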

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
---
 block/qcow2.c        | 10 ++++------
 block/qcow2.h        |  4 ----
 docs/qcow2-cache.txt | 21 +++++++++++----------
 qemu-options.hx      |  6 +++---
 4 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 223d351e40..74f2cb10a4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -793,6 +793,9 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
     *l2_cache_entry_size = qemu_opt_get_size(
         opts, QCOW2_OPT_L2_CACHE_ENTRY_SIZE, s->cluster_size);
 
+    uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
+    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
+
     if (combined_cache_size_set) {
         if (l2_cache_size_set && refcount_cache_size_set) {
             error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE
@@ -814,9 +817,6 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
         } else if (refcount_cache_size_set) {
             *l2_cache_size = combined_cache_size - *refcount_cache_size;
         } else {
-            uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
-            uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
-
             /* Assign as much memory as possible to the L2 cache, and
              * use the remainder for the refcount cache */
             if (combined_cache_size >= max_l2_cache + min_refcount_cache) {
@@ -830,9 +830,7 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
         }
     } else {
         if (!l2_cache_size_set) {
-            *l2_cache_size = MAX(DEFAULT_L2_CACHE_BYTE_SIZE,
-                                 (uint64_t)DEFAULT_L2_CACHE_CLUSTERS
-                                 * s->cluster_size);
+            *l2_cache_size = max_l2_cache;
         }
         if (!refcount_cache_size_set) {
             *refcount_cache_size = min_refcount_cache;
diff --git a/block/qcow2.h b/block/qcow2.h
index 81b844e936..1b9005e13c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -73,10 +73,6 @@
 /* Must be at least 4 to cover all cases of refcount table growth */
 #define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
 
-/* Whichever is more */
-#define DEFAULT_L2_CACHE_CLUSTERS 8 /* clusters */
-#define DEFAULT_L2_CACHE_BYTE_SIZE 1048576 /* bytes */
-
 #define DEFAULT_CLUSTER_SIZE 65536
 
 
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index fd9a6911cc..bcc03c8857 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -97,12 +97,14 @@ need:
    l2_cache_size = disk_size_GB * 131072
    refcount_cache_size = disk_size_GB * 32768
 
-QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount
-cache of 256KB (262144 bytes), so using the formulas we've just seen
-we have
+QEMU will use a default L2 cache sufficient to cover the entire virtual
+size of an image, which with the default cluster size will result in 1MB
+of cache for every 8GB of virtual image size:
 
-   1048576 / 131072 = 8 GB of virtual disk covered by that cache
-    262144 /  32768 = 8 GB
+   65536 / 8 = 8192 = 8 GB / 1 MB
+
+A default refcount cache is 4 times the cluster size, which defaults to
+256KB (262144 bytes).
 
 
 How to configure the cache sizes
@@ -121,8 +123,8 @@ There are a few things that need to be taken into account:
  - Both caches must have a size that is a multiple of the cluster size
    (or the cache entry size: see "Using smaller cache sizes" below).
 
- - The default L2 cache size is 8 clusters or 1MB (whichever is more),
-   and the minimum is 2 clusters (or 2 cache entries, see below).
+ - The default L2 cache size is enough to cover the entire virtual size of
+   an image, and the minimum is 2 clusters (or 2 cache entries, see below).
 
  - The default (and minimum) refcount cache size is 4 clusters.
 
@@ -180,9 +182,8 @@ Some things to take into account:
    always uses the cluster size as the entry size.
 
  - If the L2 cache is big enough to hold all of the image's L2 tables
-   (as explained in the "Choosing the right cache sizes" section
-   earlier in this document) then none of this is necessary and you
-   can omit the "l2-cache-entry-size" parameter altogether.
+   (the default behavior) then none of this is necessary and you can
+   omit the "l2-cache-entry-size" parameter altogether.
 
 
 Reducing the memory usage
diff --git a/qemu-options.hx b/qemu-options.hx
index 18f3f87da5..c9e7764c13 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -756,9 +756,9 @@ The maximal total size of the L2 table and refcount block caches in bytes
 
 @item l2-cache-size
 The maximal size of the L2 table cache in bytes
-(default: if cache-size is not defined - 1048576 bytes or 8 clusters, whichever
-is larger; otherwise, as large as possible or needed within the cache-size,
-while permitting the requested or the minimal refcount cache size)
+(default: if cache-size is not defined - enough to cover the entire image;
+otherwise, as large as possible or needed within the cache-size, while
+permitting the requested or the minimal refcount cache size)
 
 @item refcount-cache-size
 The maximal size of the refcount block cache in bytes
-- 
2.17.1


* [Qemu-devel] [PATCH 6/6] qcow2: Resize the cache upon image resizing
  2018-07-29 21:27 [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Leonid Bloch
                   ` (4 preceding siblings ...)
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 5/6] qcow2: Make the default L2 cache sufficient to cover the entire image Leonid Bloch
@ 2018-07-29 21:27 ` Leonid Bloch
  2018-08-03 12:42   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  2018-07-30 10:55 ` [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Kevin Wolf
  2018-08-03  7:38 ` Kevin Wolf
  7 siblings, 1 reply; 29+ messages in thread
From: Leonid Bloch @ 2018-07-29 21:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Leonid Bloch

The caches are now recalculated upon image resizing. This is done
because the new default behavior of assigning a sufficient L2 cache to
cover the entire image implies that the cache will still be sufficient
after image resizing. To put a limit on the cache, the options
cache-size and l2-cache-size can be used.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
---
 block/qcow2.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 74f2cb10a4..06fac1bb8c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3656,6 +3656,12 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
     }
 
     s->l1_vm_state_index = new_l1_size;
+    /* Update cache sizes */
+    QDict *options = qdict_clone_shallow(bs->options);
+    ret = qcow2_update_options(bs, options, s->flags, errp);
+    if (ret < 0) {
+        goto fail;
+    }
     ret = 0;
 fail:
     qemu_co_mutex_unlock(&s->lock);
-- 
2.17.1


* Re: [Qemu-devel] [PATCH 4/6] qcow2: Update total_sectors when resizing the image
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 4/6] qcow2: Update total_sectors when resizing the image Leonid Bloch
@ 2018-07-30  9:43   ` Kevin Wolf
  2018-07-30 11:41     ` Leonid Bloch
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2018-07-30  9:43 UTC (permalink / raw)
  To: Leonid Bloch; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake

Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
> Signed-off-by: Leonid Bloch <lbloch@janustech.com>
> ---
>  block/qcow2.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index ec9e6238a0..223d351e40 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -3646,6 +3646,8 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>          }
>      }
>  
> +    bs->total_sectors = offset / 512;
> +
>      /* write updated header.size */
>      offset = cpu_to_be64(offset);
>      ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size),

This shouldn't be necessary, bdrv_co_truncate() already updates
bs->total_sectors after calling the block driver.

If this is needed by one of the following patches, we need a comment
that explains why this seemingly superfluous assignment is actually
necessary.

Also, 512 should be BDRV_SECTOR_SIZE.

Kevin


* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-07-29 21:27 [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Leonid Bloch
                   ` (5 preceding siblings ...)
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 6/6] qcow2: Resize the cache upon image resizing Leonid Bloch
@ 2018-07-30 10:55 ` Kevin Wolf
  2018-07-30 12:13   ` Leonid Bloch
  2018-08-03 13:37   ` Alberto Garcia
  2018-08-03  7:38 ` Kevin Wolf
  7 siblings, 2 replies; 29+ messages in thread
From: Kevin Wolf @ 2018-07-30 10:55 UTC (permalink / raw)
  To: Leonid Bloch; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, berto

Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
> This series makes the qcow2 L2 cache cover the entire image by default.
> The importance of this change is in noticeable performance improvement,
> especially with heavy random I/O. The memory overhead is very small:
> only 1 MB of cache for every 8 GB of image size. On systems with very
> limited RAM the maximal cache size can be limited by the existing
> cache-size and l2-cache-size options.
> 
> The L2 cache is also resized accordingly, by default, if the image is
> resized.

I agree with changing the defaults, I would have proposed a change
myself soon. We have been offering cache size options for a long time,
and management tools are still ignoring them. So we need to do something
in QEMU.

Now, what the exact defaults should be, is something to use a bit more
thought on. You say "the memory overhead is very small", without going
into details. Let's check.

This is the formula from your patch (unit is bytes):

    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);

So we get:

    64k clusters, 8G image:   1M (maximum covered by old default)
    64k clusters, 1T image: 128M
    1M clusters, 1T image:    8M
    512b clusters, 1T image: 16G

1T is by far not the maximum size for a qcow2 image, and 16G memory
usage is definitely not "very small" overhead. Even the 128M for the
default cluster size is already a lot. So the simplistic approach of
this series won't work as a default.
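
(For reference, a quick standalone sketch that reproduces the numbers
above; this is not QEMU code, just the formula applied to each case:)

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        /* Each 8-byte L2 entry maps one cluster, so covering the whole
         * image needs virtual_disk_size / (cluster_size / 8) bytes. */
        const uint64_t KiB = 1024, GiB = KiB * KiB * KiB, TiB = KiB * GiB;
        const uint64_t cluster[] = { 64 * KiB, 64 * KiB, 1024 * KiB, 512 };
        const uint64_t disk[]    = {  8 * GiB,  1 * TiB,    1 * TiB, 1 * TiB };

        for (int i = 0; i < 4; i++) {
            uint64_t max_l2_cache = disk[i] / (cluster[i] / 8);
            printf("cluster %7" PRIu64 " B, image %13" PRIu64 " B -> "
                   "%11" PRIu64 " B of L2 cache\n",
                   cluster[i], disk[i], max_l2_cache);
        }
        return 0;
    }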

Let's refine it a bit:

* Basing it on max_l2_cache sounds fine generally
* Cap it at 32 MB (?) by default
* Enable cache-clean-interval=30 (?) by default to compensate a bit for
  the higher maximum memory usage
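
Both knobs already exist as explicit options, so today this would look
roughly like the following on the command line (sizes are just the
values suggested above):

    -drive file=disk.qcow2,format=qcow2,l2-cache-size=32M,cache-clean-interval=30

The proposal is essentially to turn these into the built-in defaults.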

Another thing I just noticed while looking at the code is that
cache-clean-interval only considers blocks that aren't dirty, but
doesn't take measures to get dirty blocks written out, so we depend on
the guest flushing the cache before we can free the memory. Should
we attempt to write unused dirty entries back? Berto?

> The reasons for making this behavior default, unlike in the patches I
> have sent previously, are the following:
> 1) Inelegant complications with making an option to accept either a
>    size or a string, while outputting a proper error message.
> 2) More bulky logic to sort out what to do if the image is being resized
>    but the (defined) overall cache size is too small to contain the
>    new l2-cache-size.
> (Making this behavior default resolves all of these technical issues
> neatly)

The default must always correspond to some explicit setting. We can only
change defaults relatively freely because we can assume that if a user
cared about the setting, they would have specified it explicitly. If
it's not possible to specify a setting explicitly, we can't make this
assumption any more, so we'd be stuck with the default forever.

Now, considering what I wrote above, an alternate wouldn't be the best
way to represent this any more. We do have a cache size again (the 32 MB
cap) even while trying to cover the whole image.

Maybe the right interface with this in mind would be a boolean option
that specifies whether the given cache sizes are exact values (old
semantics) or maximum values, which are limited to what the actual
images size requires. If you do want the "real" full mode, you'd set
l2-cache-size=INT64_MAX and exact-cache-sizes=false (this could use a
better name). The new default would be l2-cache-size=32M,
exact-cache-sizes=false. The old default is cache-size=1M,
exact-cache-sizes=true.

Kevin

> 3) The performance gain (as measured by fio in random read/write tests)
>    can be as high as 50%, or even more, so this would be a reasonable
>    default behavior.
> 4) The memory overhead is really small for the gain, and in cases when
>    memory economy is critical, the maximal cache values can always be
>    set by the appropriate options.


* Re: [Qemu-devel] [PATCH 4/6] qcow2: Update total_sectors when resizing the image
  2018-07-30  9:43   ` Kevin Wolf
@ 2018-07-30 11:41     ` Leonid Bloch
  2018-07-30 12:28       ` Kevin Wolf
  0 siblings, 1 reply; 29+ messages in thread
From: Leonid Bloch @ 2018-07-30 11:41 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake

On 07/30/2018 12:43 PM, Kevin Wolf wrote:
> Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
>> Signed-off-by: Leonid Bloch <lbloch@janustech.com>
>> ---
>>   block/qcow2.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index ec9e6238a0..223d351e40 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -3646,6 +3646,8 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>>           }
>>       }
>>   
>> +    bs->total_sectors = offset / 512;
>> +
>>       /* write updated header.size */
>>       offset = cpu_to_be64(offset);
>>       ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size),
> 
> This shouldn't be necessary, bdrv_co_truncate() already updates
> bs->total_sectors after calling the block driver.

It looks like without it the cache resize works only on the second resize.

> If this is needed by one of the following patches, we need a comment
> that explains why this seemingly superfluous assignment is actually
> necessary.
> 
> Also, 512 should be BDRV_SECTOR_SIZE.

I was surprised that it's not, but it's 512 also in two other places, 
including in qcow2_co_truncate itself. So I decided to keep that. 
Probably would be better if I'd repair it in the other places instead. :)

Thanks!
Leonid.

> Kevin
> 


* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-07-30 10:55 ` [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Kevin Wolf
@ 2018-07-30 12:13   ` Leonid Bloch
  2018-07-30 12:44     ` Kevin Wolf
  2018-08-03 13:37   ` Alberto Garcia
  1 sibling, 1 reply; 29+ messages in thread
From: Leonid Bloch @ 2018-07-30 12:13 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, berto

On 07/30/2018 01:55 PM, Kevin Wolf wrote:
> Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
>> This series makes the qcow2 L2 cache cover the entire image by default.
>> The importance of this change is in noticeable performance improvement,
>> especially with heavy random I/O. The memory overhead is very small:
>> only 1 MB of cache for every 8 GB of image size. On systems with very
>> limited RAM the maximal cache size can be limited by the existing
>> cache-size and l2-cache-size options.
>>
>> The L2 cache is also resized accordingly, by default, if the image is
>> resized.
> 
> I agree with changing the defaults, I would have proposed a change
> myself soon. We have been offering cache size options for a long time,
> and management tools are still ignoring them. So we need to do something
> in QEMU.
> 
> Now, what the exact defaults should be, is something to use a bit more
> thought on. You say "the memory overhead is very small", without going
> into details. Let's check.
> 
> This is the formula from your patch (unit is bytes):
> 
>      uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> 
> So we get:
> 
>      64k clusters, 8G image:   1M (maximum covered by old default)
>      64k clusters, 1T image: 128M
>      1M clusters, 1T image:    8M
>      512b clusters, 1T image: 16G
> 
> 1T is by far not the maximum size for a qcow2 image, and 16G memory
> usage is definitely not "very small" overhead. Even the 128M for the
> default cluster size is already a lot. So the simplistic approach of
> this series won't work as a default.
> 
> Let's refine it a bit:
> 
> * Basing it on max_l2_cache sounds fine generally
> * Cap it at 32 MB (?) by default

Yes! A great idea! Definitely necessary.

> * Enable cache-clean-interval=30 (?) by default to compensate a bit for
>    the higher maximum memory usage

Do you think that we need this if we cap the cache at 32 MB? And that's 
only the cap. 256 GB+ images are not that often used. And compared to 
the overall QEMU overhead of over 500 MB, an extra 32 MB in the worst case is
not that much, considering the performance gain.

> Another thing I just noticed while looking at the code is that
> cache-clean-interval only considers blocks that aren't dirty, but
> doesn't take measures to get dirty blocks written out, so we depend on
> the guest flushing the cache before we can free the memory. Should
> we attempt to write unused dirty entries back? Berto?
> 
>> The reasons for making this behavior default, unlike in the patches I
>> have sent previously, are the following:
> > 1) Inelegant complications with making an option to accept either a
>>     size or a string, while outputting a proper error message.
>> 2) More bulky logic to sort out what to do if the image is being resized
>>     but the (defined) overall cache size is too small to contain the
>>     new l2-cache-size.
>> (Making this behavior default resolves all of these technical issues
>> neatly)
> 
> The default must always correspond to some explicit setting. We can only
> change defaults relatively freely because we can assume that if a user
> cared about the setting, they would have specified it explicitly. If
> it's not possible to specify a setting explicitly, we can't make this
> assumption any more, so we'd be stuck with the default forever.
> 
> Now, considering what I wrote above, an alternate wouldn't be the best
> way to represent this any more. We do have a cache size again (the 32 MB
> cap) even while trying to cover the whole image.
> 
> Maybe the right interface with this in mind would be a boolean option
> that specifies whether the given cache sizes are exact values (old
> semantics) or maximum values, which are limited to what the actual
> images size requires. If you do want the "real" full mode, you'd set
> l2-cache-size=INT64_MAX and exact-cache-sizes=false (this could use a
> better name). The new default would be l2-cache-size=32M,
> exact-cache-sizes=false. The old default is cache-size=1M,
> exact-cache-sizes=true.

The existing cache-size and l2-cache-size options are already documented 
as maximal values. It would make sense to actually make them as such: 
the caches will be as big as necessary to cover the entire image (no 
need for them to be more than that) but they will be capped by the 
current options, while the new default of l2-cache-size will be 32M. Why 
would one need exact-cache-sizes, if they would be MIN(needed, max)?
Does it sound reasonable?
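
In code this would boil down to something like (just a sketch, the
variable name is made up, not the actual read_cache_sizes() logic):

    /* cap the configured (or default) maximum at what the image needs */
    *l2_cache_size = MIN(requested_l2_cache_size, max_l2_cache);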

Leonid.

> 
> Kevin
> 
>> 3) The performance gain (as measured by fio in random read/write tests)
>>     can be as high as 50%, or even more, so this would be a reasonable
>>     default behavior.
>> 4) The memory overhead is really small for the gain, and in cases when
>>     memory economy is critical, the maximal cache values can always be
>>     set by the appropriate options.


* Re: [Qemu-devel] [PATCH 4/6] qcow2: Update total_sectors when resizing the image
  2018-07-30 11:41     ` Leonid Bloch
@ 2018-07-30 12:28       ` Kevin Wolf
  2018-08-03 11:56         ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2018-07-30 12:28 UTC (permalink / raw)
  To: Leonid Bloch; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake

Am 30.07.2018 um 13:41 hat Leonid Bloch geschrieben:
> On 07/30/2018 12:43 PM, Kevin Wolf wrote:
> > Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
> > > Signed-off-by: Leonid Bloch <lbloch@janustech.com>
> > > ---
> > >   block/qcow2.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/block/qcow2.c b/block/qcow2.c
> > > index ec9e6238a0..223d351e40 100644
> > > --- a/block/qcow2.c
> > > +++ b/block/qcow2.c
> > > @@ -3646,6 +3646,8 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> > >           }
> > >       }
> > > +    bs->total_sectors = offset / 512;
> > > +
> > >       /* write updated header.size */
> > >       offset = cpu_to_be64(offset);
> > >       ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size),
> > 
> > This shouldn't be necessary, bdrv_co_truncate() already updates
> > bs->total_sectors after calling the block driver.
> 
> It looks like without it the cache resize works only on the second resize.

Yes, and after reading the rest of the series, it makes sense to me
because qcow2_update_option() -> read_cache_sizes() uses
bs->total_sectors, so if we call that in qcow2_co_truncate(), we already
need to update the value early.

The comment should explain this connection because otherwise it's not
obvious when you read the code.
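
Something along these lines, maybe (just a sketch of the suggested
comment, also using BDRV_SECTOR_SIZE instead of the literal 512):

    /* bdrv_co_truncate() only updates bs->total_sectors after we return,
     * but qcow2_update_options() -> read_cache_sizes() below needs the
     * new size to pick the L2 cache size, so update it here already. */
    bs->total_sectors = offset / BDRV_SECTOR_SIZE;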

> > If this is needed by one of the following patches, we need a comment
> > that explains why this seemingly superfluous assignment is actually
> > necessary.
> > 
> > Also, 512 should be BDRV_SECTOR_SIZE.
> 
> I was surprised that it's not, but it's 512 also in two other places,
> including in qcow2_co_truncate itself. So I decided to keep that. Probably
> would be better if I'd repair it in the other places instead. :)

Yeah, or at least not introduce new places with a literal 512.

Kevin


* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-07-30 12:13   ` Leonid Bloch
@ 2018-07-30 12:44     ` Kevin Wolf
  0 siblings, 0 replies; 29+ messages in thread
From: Kevin Wolf @ 2018-07-30 12:44 UTC (permalink / raw)
  To: Leonid Bloch; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, berto

Am 30.07.2018 um 14:13 hat Leonid Bloch geschrieben:
> On 07/30/2018 01:55 PM, Kevin Wolf wrote:
> > Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
> > > This series makes the qcow2 L2 cache cover the entire image by default.
> > > The importance of this change is in noticeable performance improvement,
> > > especially with heavy random I/O. The memory overhead is very small:
> > > only 1 MB of cache for every 8 GB of image size. On systems with very
> > > limited RAM the maximal cache size can be limited by the existing
> > > cache-size and l2-cache-size options.
> > > 
> > > The L2 cache is also resized accordingly, by default, if the image is
> > > resized.
> > 
> > I agree with changing the defaults, I would have proposed a change
> > myself soon. We have been offering cache size options for a long time,
> > and management tools are still ignoring them. So we need to do something
> > in QEMU.
> > 
> > Now, what the exact defaults should be, is something to use a bit more
> > thought on. You say "the memory overhead is very small", without going
> > into details. Let's check.
> > 
> > This is the formula from your patch (unit is bytes):
> > 
> >      uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> > 
> > So we get:
> > 
> >      64k clusters, 8G image:   1M (maximum covered by old default)
> >      64k clusters, 1T image: 128M
> >      1M clusters, 1T image:    8M
> >      512b clusters, 1T image: 16G
> > 
> > 1T is by far not the maximum size for a qcow2 image, and 16G memory
> > usage is definitely not "very small" overhead. Even the 128M for the
> > default cluster size is already a lot. So the simplistic approach of
> > this series won't work as a default.
> > 
> > Let's refine it a bit:
> > 
> > * Basing it on max_l2_cache sounds fine generally
> > * Cap it at 32 MB (?) by default
> 
> Yes! A great idea! Definitely necessary.
> 
> > * Enable cache-clean-interval=30 (?) by default to compensate a bit for
> >    the higher maximum memory usage
> 
> Do you think that we need this if we cap the cache at 32 MB? And that's only
> the cap. 256 GB+ images are not that often used. And compared to the overall
> QEMU overhead of over 500 MB, an extra 32 MB in the worst case is not that much,
> considering the performance gain.

Don't forget that if you take a few snapshots so you get a backing file
chain, every layer in the chain has its own cache. So capping at 32 MB
with 9 backing files still means that we use 320 MB.

And honestly, is there a real reason not to use cache-clean-interval by
default? Even if it doesn't help much in some cases, it should hardly
ever hurt. If the image was completely idle for 30 seconds (or whatever
number we choose), having to reload some 64k from the disk on the next
access shouldn't make a big difference.

> > Another thing I just noticed while looking at the code is that
> > cache-clean-interval only considers blocks that aren't dirty, but
> > doesn't take measures to get dirty blocks written out, so we depend on
> > the guest flushing the cache before we can free the memory. Should
> > we attempt to write unused dirty entries back? Berto?
> > 
> > > The reasons for making this behavior default, unlike in the patches I
> > > have sent previously, are the following:
> > > 1) Inelegant complications with making an option to accept either a
> > >     size or a string, while outputting a proper error message.
> > > 2) More bulky logic to sort out what to do if the image is being resized
> > >     but the (defined) overall cache size is too small to contain the
> > >     new l2-cache-size.
> > > (Making this behavior default resolves all of these technical issues
> > > neatly)
> > 
> > The default must always correspond to some explicit setting. We can only
> > change defaults relatively freely because we can assume that if a user
> > cared about the setting, they would have specified it explicitly. If
> > it's not possible to specify a setting explicitly, we can't make this
> > assumption any more, so we'd be stuck with the default forever.
> > 
> > Now, considering what I wrote above, an alternate wouldn't be the best
> > way to represent this any more. We do have a cache size again (the 32 MB
> > cap) even while trying to cover the whole image.
> > 
> > Maybe the right interface with this in mind would be a boolean option
> > that specifies whether the given cache sizes are exact values (old
> > semantics) or maximum values, which are limited to what the actual
> > images size requires. If you do want the "real" full mode, you'd set
> > l2-cache-size=INT64_MAX and exact-cache-sizes=false (this could use a
> > better name). The new default would be l2-cache-size=32M,
> > exact-cache-sizes=false. The old default is cache-size=1M,
> > exact-cache-sizes=true.
> 
> The existing cache-size and l2-cache-size options are already documented as
> maximal values. It would make sense to actually make them as such: the
> caches will be as big as necessary to cover the entire image (no need for
> them to be more than that) but they will be capped by the current options,
> while the new default of l2-cache-size will be 32M. Why would one need
> exact-cache-sizes, if they would be MIN(needed, max)?
> Does it sound reasonable?

I think you have a point there.

There is one case where we need to cover more than the virtual disk
size, and that is the VM state for internal snapshots. These are
sequential accesses, though, so it doesn't matter much how many
L2 tables we have cached. And having to reload some tables after taking
or loading a snapshot sounds okay, too.

So essentially everything boils down to applying max_l2_cache even if an
L2 size was explicitly specified (which makes cache-size=INT64_MAX a
reasonable option), changing the default from cache-size=1M to
cache-size=32M (covers 256 GB with the default 64k clusters), and
enabling cache-clean-interval by default. Right?
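
(Sanity check of the 256 GB figure: 32 MiB of L2 cache holds
32 MiB / 8 B = 4M entries, and with 64 KiB clusters those map
4M * 64 KiB = 256 GiB.)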

Kevin


* Re: [Qemu-devel] [PATCH 1/6 for-3.0] Update .gitignore
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 1/6 for-3.0] Update .gitignore Leonid Bloch
@ 2018-07-30 15:40   ` Eric Blake
  2018-07-30 15:43     ` Eric Blake
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Blake @ 2018-07-30 15:40 UTC (permalink / raw)
  To: Leonid Bloch, qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz

On 07/29/2018 04:27 PM, Leonid Bloch wrote:
> Adds /roms/vgabios to .gitignore - this directory appears after the
> build, and was not ignored by Git so far.
> 

Actually, that's the directory for one of the submodules. It seems
fishy to have a directory listed in both .gitignore and .gitmodules at 
the same time, so I'm inclined to NACK this patch, and instead figure 
out what in the build is dirtying the directory when the submodule is 
not checked out.

> Signed-off-by: Leonid Bloch <lbloch@janustech.com>
> ---
>   .gitignore | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/.gitignore b/.gitignore
> index 5668d02782..2b3b30ae9f 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -212,3 +212,4 @@ trace-dtrace-root.dtrace
>   trace-ust-all.h
>   trace-ust-all.c
>   /target/arm/decode-sve.inc.c
> +/roms/vgabios
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 1/6 for-3.0] Update .gitignore
  2018-07-30 15:40   ` Eric Blake
@ 2018-07-30 15:43     ` Eric Blake
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Blake @ 2018-07-30 15:43 UTC (permalink / raw)
  To: Leonid Bloch, qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz

On 07/30/2018 10:40 AM, Eric Blake wrote:
> On 07/29/2018 04:27 PM, Leonid Bloch wrote:
>> Adds /roms/vgabios to .gitignore - this directory appears after the
>> build, and was not ignored by Git so far.
>>
> 
> Actually, that's the directory for one of the submodules. It seems
> fishy to have a directory listed in both .gitignore and .gitmodules at 
> the same time, so I'm inclined to NACK this patch, and instead figure 
> out what in the build is dirtying the directory when the submodule is 
> not checked out.

Or rather, it was a submodule until commit 91b8eba9. So maybe what's 
happening is that you previously had it checked out, and now that we no 
longer have the submodule, it shows up unless .gitconfig ignores it. For 
that, I'd recommend editing .git/info/exclude to ignore it locally (if 
you plan on frequently checking out points both before and after the 
deletion) rather than making it part of upstream (which tends to focus 
more on ignoring files built by the current contents of master, not 
files that are leftovers from earlier commits).
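
For example (assuming the leftover directory is the one named above):

    # ignore the leftover directory only in this local clone
    echo '/roms/vgabios' >> .git/info/exclude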

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-07-29 21:27 [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Leonid Bloch
                   ` (6 preceding siblings ...)
  2018-07-30 10:55 ` [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Kevin Wolf
@ 2018-08-03  7:38 ` Kevin Wolf
  2018-08-04 11:48   ` Leonid Bloch
  7 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2018-08-03  7:38 UTC (permalink / raw)
  To: Leonid Bloch; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake

Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
> This series makes the qcow2 L2 cache cover the entire image by default.
> The importance of this change is in noticeable performance improvement,
> especially with heavy random I/O. The memory overhead is very small:
> only 1 MB of cache for every 8 GB of image size. On systems with very
> limited RAM the maximal cache size can be limited by the existing
> cache-size and l2-cache-size options.
> 
> The L2 cache is also resized accordingly, by default, if the image is
> resized.

Are you going to send a new version that makes the changes we agreed on?

Kevin


* Re: [Qemu-devel] [Qemu-block] [PATCH 3/6 for-3.0] qcow2: Options' documentation fixes
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 3/6 for-3.0] qcow2: Options' documentation fixes Leonid Bloch
@ 2018-08-03 11:27   ` Alberto Garcia
  2018-08-04 11:52     ` Leonid Bloch
  0 siblings, 1 reply; 29+ messages in thread
From: Alberto Garcia @ 2018-08-03 11:27 UTC (permalink / raw)
  To: Leonid Bloch, qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz

On Sun 29 Jul 2018 11:27:41 PM CEST, Leonid Bloch wrote:
> Signed-off-by: Leonid Bloch <lbloch@janustech.com>
> ---
>  docs/qcow2-cache.txt |  9 ++++++---
>  qapi/block-core.json |  6 +++---
>  qemu-options.hx      | 15 +++++++++------
>  3 files changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
> index 8a09a5cc5f..fd9a6911cc 100644
> --- a/docs/qcow2-cache.txt
> +++ b/docs/qcow2-cache.txt
> @@ -112,9 +112,9 @@ command-line, or the 'blockdev-add' QMP command.
>  
>  There are three options available, and all of them take bytes:
>  
> -"l2-cache-size":         maximum size of the L2 table cache
> -"refcount-cache-size":   maximum size of the refcount block cache
> -"cache-size":            maximum size of both caches combined
> +"l2-cache-size":         maximal size of the L2 table cache
> +"refcount-cache-size":   maximal size of the refcount block cache
> +"cache-size":            maximal size of both caches combined

Why is 'maximum' not clear in this context?

Otherwise the patch looks good to me.

Berto


* Re: [Qemu-devel] [Qemu-block] [PATCH 4/6] qcow2: Update total_sectors when resizing the image
  2018-07-30 12:28       ` Kevin Wolf
@ 2018-08-03 11:56         ` Alberto Garcia
  0 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2018-08-03 11:56 UTC (permalink / raw)
  To: Kevin Wolf, Leonid Bloch; +Cc: qemu-devel, qemu-block, Max Reitz

On Mon 30 Jul 2018 02:28:28 PM CEST, Kevin Wolf wrote:
>> > Also, 512 should be BDRV_SECTOR_SIZE.
>> 
>> I was surprised that it's not, but it's 512 also in two other places,
>> including in qcow2_co_truncate itself. So I decided to keep
>> that. Probably would be better if I'd repair it in the other places
>> instead. :)
>
> Yeah, or at least not introduce new places with a literal 512.

I have a few patches already in my queue to replace these literals with
BDRV_SECTOR_SIZE, so no need to spend time with the existing cases, I
can take care of that.

For new cases I agree with Kevin that you should always use
BDRV_SECTOR_SIZE.

Berto


* Re: [Qemu-devel] [Qemu-block] [PATCH 6/6] qcow2: Resize the cache upon image resizing
  2018-07-29 21:27 ` [Qemu-devel] [PATCH 6/6] qcow2: Resize the cache upon image resizing Leonid Bloch
@ 2018-08-03 12:42   ` Alberto Garcia
  2018-08-04 11:53     ` Leonid Bloch
  0 siblings, 1 reply; 29+ messages in thread
From: Alberto Garcia @ 2018-08-03 12:42 UTC (permalink / raw)
  To: Leonid Bloch, qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz

On Sun 29 Jul 2018 11:27:44 PM CEST, Leonid Bloch wrote:
> The caches are now recalculated upon image resizing. This is done
> because the new default behavior of assigning a sufficient L2 cache to
> cover the entire image implies that the cache will still be sufficient
> after image resizing. To put a limit on the cache, the options
> cache-size and l2-cache-size can be used.
>
> Signed-off-by: Leonid Bloch <lbloch@janustech.com>
> ---
>  block/qcow2.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 74f2cb10a4..06fac1bb8c 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -3656,6 +3656,12 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>      }
>  
>      s->l1_vm_state_index = new_l1_size;
> +    /* Update cache sizes */
> +    QDict *options = qdict_clone_shallow(bs->options);
> +    ret = qcow2_update_options(bs, options, s->flags, errp);
> +    if (ret < 0) {
> +        goto fail;
> +    }

Perhaps this could be merged with the "Update total_sectors when
resizing" patch. When you put both changes together it is quite clear
why you need to update bs->total_sectors, and doing it is only necessary
if you want to do this afterwards.

Berto


* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-07-30 10:55 ` [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Kevin Wolf
  2018-07-30 12:13   ` Leonid Bloch
@ 2018-08-03 13:37   ` Alberto Garcia
  2018-08-03 14:55     ` Kevin Wolf
  1 sibling, 1 reply; 29+ messages in thread
From: Alberto Garcia @ 2018-08-03 13:37 UTC (permalink / raw)
  To: Kevin Wolf, Leonid Bloch; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake

On Mon 30 Jul 2018 12:55:22 PM CEST, Kevin Wolf wrote:
> I agree with changing the defaults, I would have proposed a change
> myself soon. We have been offering cache size options for a long time,
> and management tools are still ignoring them. So we need to do
> something in QEMU.

Indeed, there's been a bit of discussion in the mailing list and here:

   https://bugzilla.redhat.com/show_bug.cgi?id=1377735

There are some proposals but I haven't seen one that looks evidently
like the best choice yet.

> Now, what the exact defaults should be, is something to use a bit more
> thought on. You say "the memory overhead is very small", without going
> into details. Let's check.

Thanks for showing these numbers, Kevin. The memory overhead is
definitely not very small, and as a matter of fact a significant part of
the work that I've been doing in the past couple of years has the goal
of reducing that memory overhead, which is very significant if you have
large images and long backing chains.

> * Cap it at 32 MB (?) by default
> * Enable cache-clean-interval=30 (?) by default to compensate a bit for
>   the higher maximum memory usage

I think this seems like a reasonable approach. One question that I was
asked already was "Why is cache-clean-interval not enabled by default?".

I don't think the performance impact is a problem (unless the interval
is too low of course), the only thing that I could think of is that it
could make the memory usage more unpredictable.

> Another thing I just noticed while looking at the code is that
> cache-clean-interval only considers blocks that aren't dirty, but
> doesn't take measures to get dirty blocks written out, so we depend on
> the guest flushing the cache before we can free the memory. Should
> we attempt to write unused dirty entries back? Berto?

I never thought about it, but sounds like worth exploring.

> Maybe the right interface with this in mind would be a boolean option
> that specifies whether the given cache sizes are exact values (old
> semantics) or maximum values, which are limited to what the actual
> images size requires. If you do want the "real" full mode, you'd set
> l2-cache-size=INT64_MAX and exact-cache-sizes=false (this could use a
> better name). The new default would be l2-cache-size=32M,
> exact-cache-sizes=false. The old default is cache-size=1M,
> exact-cache-sizes=true.

That sounds quite complicated ... plus with the current ("exact values")
semantics l2-cache-size already represents a "maximum" since cache
entries are only filled on demand. That is, you can set up a 128MB L2
cache but most of that memory won't be used unless you fill it up first.

So, if you have an 8GB qcow2 image, having a 1MB L2 cache (enough for
the whole image) or a 100MB one shouldn't be very different, because in
the latter case only the first MB is going to be used in practice.

The only actual difference is the overhead of having larger data
structures (Qcow2Cache.entries and Qcow2Cache.table_array).

And thinking about this, perhaps this could be the simplest approach of
them all: let the user pass any value to l2-cache-size but then in
read_cache_sizes() do something like

    if (l2_cache_size_set && *l2_cache_size > max_l2_cache) {
        *l2_cache_size = max_l2_cache;
    }

Then if the image is resized you can recalculate this.

This way the user can make l2-cache-size the hard maximum that they ever
are willing to use for an image's L2 cache, and QEMU guarantees that it
won't allocate that much memory if it can cover the whole image with
less.
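
As a concrete example (assuming the default 64 KB clusters): for a 16 GB
image, even l2-cache-size=128M would then result in only
16 GB / 8192 = 2 MB of L2 cache actually being allocated.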

How does that sound?

Berto


* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-08-03 13:37   ` Alberto Garcia
@ 2018-08-03 14:55     ` Kevin Wolf
  2018-08-06  7:47       ` Alberto Garcia
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2018-08-03 14:55 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Leonid Bloch, qemu-devel, qemu-block, Max Reitz, Eric Blake

Am 03.08.2018 um 15:37 hat Alberto Garcia geschrieben:
> On Mon 30 Jul 2018 12:55:22 PM CEST, Kevin Wolf wrote:
> > I agree with changing the defaults; I would have proposed a change
> > myself soon. We have been offering cache size options for a long time,
> > and management tools are still ignoring them. So we need to do
> > something in QEMU.
> 
> Indeed, there's been a bit of discussion in the mailing list and here:
> 
>    https://bugzilla.redhat.com/show_bug.cgi?id=1377735
> 
> There are some proposals but I haven't seen one that looks evidently
> like the best choice yet.
> 
> > Now, what the exact defaults should be is something to put a bit more
> > thought into. You say "the memory overhead is very small", without going
> > into details. Let's check.
> 
> Thanks for showing these numbers, Kevin. The memory overhead is
> definitely not very small, and as a matter of fact a significant part of
> the work that I've been doing in the past couple of years has the goal
> of reducing that memory overhead, which is very significant if you have
> large images and long backing chains.

The actual overhead is probably quite small in the common cases; we just
need to be careful that corner cases don't completely break.

By the way, weren't you working on subclusters a while ago? How did that
go? Because I think those would enable us to use larger cluster sizes
and therefore reduce the metadata sizes as well.

> > * Cap it at 32 MB (?) by default
> > * Enable cache-clean-interval=30 (?) by default to compensate a bit for
> >   the higher maximum memory usage
> 
> I think this seems like a reasonable approach. One question that I was
> asked already was "Why is cache-clean-interval not enabled by default?".
> 
> I don't think the performance impact is a problem (unless the interval
> is too low, of course); the only thing I can think of is that it could
> make the memory usage more unpredictable.

Yes, but I think being more unpredictable is better than being
constantly wrong. Always using more cache just because it would be more
predictable sounds silly, and using too little cache results in bad
performance. So something adaptive seems right to me.

In fact, we already have significant additional memory use under high I/O
load because we allocate many coroutines with a 1 MB stack each. Caches
that also grow a bit under load should be bearable, I hope.

In the end, we only have a choice between bad performance for random I/O
across the whole disk or throwing more memory at the problem. Let's
choose something that performs well by default, and if someone wants to
save memory, they can still set the options.

> > Another thing I just noticed while looking at the code is that
> > cache-clean-interval only considers blocks that aren't dirty, but
> > doesn't take measures to get dirty blocks written out, so we depend on
> > the guest flushing the cache before we can free the memory. Should
> > we attempt to write unused dirty entries back? Berto?
> 
> I never thought about it, but it sounds worth exploring.

Are you planning to do that or should I add it to my own todo list?

> > Maybe the right interface with this in mind would be a boolean option
> > that specifies whether the given cache sizes are exact values (old
> > semantics) or maximum values, which are limited to what the actual
> > image size requires. If you do want the "real" full mode, you'd set
> > l2-cache-size=INT64_MAX and exact-cache-sizes=false (this could use a
> > better name). The new default would be l2-cache-size=32M,
> > exact-cache-sizes=false. The old default is cache-size=1M,
> > exact-cache-sizes=true.
> 
> That sounds quite complicated ... plus with the current ("exact values")
> semantics l2-cache-size already represents a "maximum" since cache
> entries are only filled on demand. That is, you can set up a 128MB L2
> cache but most of that memory won't be used unless you fill it up first.
> 
> So, if you have an 8GB qcow2 image, having a 1MB L2 cache (enough for
> the whole image) or a 100MB one shouldn't be very different, because in
> the latter case only the first MB is going to be used in practice.
> 
> The only actual difference is the overhead of having larger data
> structures (Qcow2Cache.entries and Qcow2Cache.table_array).
> 
> And thinking about this, perhaps this could be the simplest approach of
> them all: let the user pass any value to l2-cache-size but then in
> read_cache_sizes() do something like
> 
>     if (l2_cache_size_set && *l2_cache_size > max_l2_cache) {
>         *l2_cache_size = max_l2_cache;
>     }
> 
> Then if the image is resized you can recalculate this.
> 
> This way the user can make l2-cache-size the hard maximum that they ever
> are willing to use for an image's L2 cache, and QEMU guarantees that it
> won't allocate that much memory if it can cover the whole image with
> less.
> 
> How does that sound?

Yes. When you read the rest of the thread, you'll see that Leonid
suggested the same and I agree. And if someone really wants to cover the
full image no matter how much memory it requires, they can just pass
INT64_MAX and get the right thing.

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-08-03  7:38 ` Kevin Wolf
@ 2018-08-04 11:48   ` Leonid Bloch
  0 siblings, 0 replies; 29+ messages in thread
From: Leonid Bloch @ 2018-08-04 11:48 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake

On 08/03/2018 10:38 AM, Kevin Wolf wrote:
> Am 29.07.2018 um 23:27 hat Leonid Bloch geschrieben:
>> This series makes the qcow2 L2 cache cover the entire image by default.
>> The importance of this change is in noticeable performance improvement,
>> especially with heavy random I/O. The memory overhead is very small:
>> only 1 MB of cache for every 8 GB of image size. On systems with very
>> limited RAM the maximal cache size can be limited by the existing
>> cache-size and l2-cache-size options.
>>
>> The L2 cache is also resized accordingly, by default, if the image is
>> resized.
> 
> Are you going to send a new version that makes the changes we agreed on?

Yes, sure. Was somewhat busy. Will send today or tomorrow.

Thanks for the reminder.
Leonid.

> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 3/6 for-3.0] qcow2: Options' documentation fixes
  2018-08-03 11:27   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
@ 2018-08-04 11:52     ` Leonid Bloch
  0 siblings, 0 replies; 29+ messages in thread
From: Leonid Bloch @ 2018-08-04 11:52 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz

On 08/03/2018 02:27 PM, Alberto Garcia wrote:
> On Sun 29 Jul 2018 11:27:41 PM CEST, Leonid Bloch wrote:
>> Signed-off-by: Leonid Bloch <lbloch@janustech.com>
>> ---
>>   docs/qcow2-cache.txt |  9 ++++++---
>>   qapi/block-core.json |  6 +++---
>>   qemu-options.hx      | 15 +++++++++------
>>   3 files changed, 18 insertions(+), 12 deletions(-)
>>
>> diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
>> index 8a09a5cc5f..fd9a6911cc 100644
>> --- a/docs/qcow2-cache.txt
>> +++ b/docs/qcow2-cache.txt
>> @@ -112,9 +112,9 @@ command-line, or the 'blockdev-add' QMP command.
>>   
>>   There are three options available, and all of them take bytes:
>>   
>> -"l2-cache-size":         maximum size of the L2 table cache
>> -"refcount-cache-size":   maximum size of the refcount block cache
>> -"cache-size":            maximum size of both caches combined
>> +"l2-cache-size":         maximal size of the L2 table cache
>> +"refcount-cache-size":   maximal size of the refcount block cache
>> +"cache-size":            maximal size of both caches combined
> 
> Why is 'maximum' not clear in this context?

It seemed to me that it was grammatically incorrect, because maximum is a
noun and maximal is an adjective. But after a quick check I now see that
maximum can be used as an adjective as well, so indeed this change is not
needed. I will update it in the next version.

Thanks,
Leonid.

> 
> Otherwise the patch looks good to me.
> 
> Berto
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 6/6] qcow2: Resize the cache upon image resizing
  2018-08-03 12:42   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
@ 2018-08-04 11:53     ` Leonid Bloch
  0 siblings, 0 replies; 29+ messages in thread
From: Leonid Bloch @ 2018-08-04 11:53 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz

On 08/03/2018 03:42 PM, Alberto Garcia wrote:
> On Sun 29 Jul 2018 11:27:44 PM CEST, Leonid Bloch wrote:
>> The caches are now recalculated upon image resizing. This is done
>> because the new default behavior of assigning a sufficient L2 cache to
>> cover the entire image implies that the cache will still be sufficient
>> after image resizing. To put a limit on the cache, the options
>> cache-size and l2-cache-size can be used.
>>
>> Signed-off-by: Leonid Bloch <lbloch@janustech.com>
>> ---
>>   block/qcow2.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 74f2cb10a4..06fac1bb8c 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -3656,6 +3656,12 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>>       }
>>   
>>       s->l1_vm_state_index = new_l1_size;
>> +    /* Update cache sizes */
>> +    QDict *options = qdict_clone_shallow(bs->options);
>> +    ret = qcow2_update_options(bs, options, s->flags, errp);
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
> 
> Perhaps this could be merged with the "Update total_sectors when
> resizing" patch. When you put both changes together it is quite clear
> why you need to update bs->total_sectors, and doing it is only necessary
> if you want to do this afterwards.

Indeed. I agree.

Leonid.

> 
> Berto
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-08-03 14:55     ` Kevin Wolf
@ 2018-08-06  7:47       ` Alberto Garcia
  2018-08-06 10:45         ` Kevin Wolf
  0 siblings, 1 reply; 29+ messages in thread
From: Alberto Garcia @ 2018-08-06  7:47 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Leonid Bloch, qemu-devel, qemu-block, Max Reitz, Eric Blake

On Fri 03 Aug 2018 04:55:42 PM CEST, Kevin Wolf wrote:
> By the way, weren't you working on subclusters a while ago? How did
> that go? Because I think those would enable us to use larger cluster
> sizes and therefore reduce the metadata sizes as well.

I had a working prototype, but the changes to both the code and the
on-disk format were not trivial. I would need to re-evaluate its
performance impact after all the changes that we have had since then,
and then see if it's worth trying again.

I suppose that the main benefit of having subclusters is that
allocations are much faster. Everything else remains more or less the
same, and in particular you can already use larger clusters if you want
to reduce the metadata sizes. Plus, with the l2-cache-entry-size option
you can already solve some of the problems of having large clusters.

>> > Another thing I just noticed while looking at the code is that
>> > cache-clean-interval only considers blocks that aren't dirty, but
>> > doesn't take measures to get dirty blocks written out, so we depend
>> > on the guest flushing the cache before we can free the
>> > memory. Should we attempt to write unused dirty entries back?
>> > Berto?
>> 
>> I never thought about it, but it sounds worth exploring.
>
> Are you planning to do that or should I add it to my own todo list?

I can do that.

>> And thinking about this, perhaps this could be the simplest approach of
>> them all: let the user pass any value to l2-cache-size but then in
>> read_cache_sizes() do something like
>> 
>>     if (l2_cache_size_set && *l2_cache_size > max_l2_cache) {
>>         *l2_cache_size = max_l2_cache;
>>     }
>> 
>> Then if the image is resized you can recalculate this.
>> 
>> This way the user can make l2-cache-size the hard maximum that they ever
>> are willing to use for an image's L2 cache, and QEMU guarantees that it
>> won't allocate that much memory if it can cover the whole image with
>> less.
>> 
>> How does that sound?
>
> Yes. When you read the rest of the thread, you'll see that Leonid
> suggested the same and I agree. And if someone really wants to cover the
> full image no matter how much memory it requires, they can just pass
> INT64_MAX and get the right thing.

I wonder why it took so long to come up with this; it sounds rather
simple when you think about it...

Berto

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-08-06  7:47       ` Alberto Garcia
@ 2018-08-06 10:45         ` Kevin Wolf
  2018-08-06 11:07           ` Alberto Garcia
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2018-08-06 10:45 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Leonid Bloch, qemu-devel, qemu-block, Max Reitz, Eric Blake

Am 06.08.2018 um 09:47 hat Alberto Garcia geschrieben:
> On Fri 03 Aug 2018 04:55:42 PM CEST, Kevin Wolf wrote:
> > By the way, weren't you working on subclusters a while ago? How did
> > that go? Because I think those would enable us to use larger cluster
> > sizes and therefore reduce the metadata sizes as well.
> 
> I had a working prototype, but the changes to both the code and the
> on-disk format were not trivial. I would need to re-evaluate its
> performance impact after all the changes that we have had since then,
> and then see if it's worth trying again.
> 
> I suppose that the main benefit of having subclusters is that
> allocations are much faster. Everything else remains more or less the
> same, and in particular you can already use larger clusters if you want
> to reduce the metadata sizes. Plus, with the l2-cache-entry-size option
> you can already solve some of the problems of having large clusters.

Yes, indeed, subclusters are about making COW less painful (or getting
rid of it altogether). Doing COW for a full 2 MB when the guest updates 4k
is just a bit over the top, and I think it slows down initial writes
seriously. I haven't benchmarked things in a while, though.
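
As a back-of-the-envelope illustration of that amplification, here is a
small self-contained sketch (not the actual qcow2 allocation code):

    #include <inttypes.h>
    #include <stdio.h>

    /* Bytes copied from the backing file when a write of 'len' bytes at
     * 'offset' hits an unallocated cluster (assuming the write fits in a
     * single cluster): everything before and after the written range. */
    static uint64_t cow_bytes(uint64_t offset, uint64_t len,
                              uint64_t cluster_size)
    {
        uint64_t cluster_start = offset & ~(cluster_size - 1);
        uint64_t head = offset - cluster_start;
        uint64_t tail = cluster_start + cluster_size - (offset + len);
        return head + tail;
    }

    int main(void)
    {
        /* A 4 KB guest write with 2 MB clusters copies roughly 2 MB. */
        printf("%" PRIu64 " bytes of COW\n", cow_bytes(4096, 4096, 2 << 20));
        return 0;
    }

That is roughly a 500x amplification for the first write to each 2 MB
cluster.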

While reasonable cache settings and potentially also avoiding
fragmentation are probably more important overall, I don't think we can
completely ignore initial writes. They are part of the cost of
snapshots, they are what people see first, and they are what benchmarks
generally show.

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-08-06 10:45         ` Kevin Wolf
@ 2018-08-06 11:07           ` Alberto Garcia
  2018-08-06 11:30             ` Kevin Wolf
  0 siblings, 1 reply; 29+ messages in thread
From: Alberto Garcia @ 2018-08-06 11:07 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Leonid Bloch, qemu-devel, qemu-block, Max Reitz, Eric Blake

On Mon 06 Aug 2018 12:45:20 PM CEST, Kevin Wolf wrote:
> Am 06.08.2018 um 09:47 hat Alberto Garcia geschrieben:
>> On Fri 03 Aug 2018 04:55:42 PM CEST, Kevin Wolf wrote:
>> > By the way, weren't you working on subclusters a while ago? How did
>> > that go? Because I think those would enable us to use larger
>> > cluster sizes and therefore reduce the metadata sizes as well.
>> 
>> I had a working prototype, but the changes to both the code and the
>> on-disk format were not trivial. I would need to re-evaluate its
>> performance impact after all the changes that we have had since then,
>> and then see if it's worth trying again.
>> 
>> I suppose that the main benefit of having subclusters is that
>> allocations are much faster. Everything else remains more or less the
>> same, and in particular you can already use larger clusters if you
>> want to reduce the metadata sizes. Plus, with the l2-cache-entry-size
>> option you can already solve some of the problems of having large
>> clusters.
>
> Yes, indeed, subclusters are about making COW less painful (or getting
> rid of it altogether). Doing COW for a full 2 MB when the guest updates
> 4k is just a bit over the top, and I think it slows down initial writes
> seriously. I haven't benchmarked things in a while, though.

Me neither, I think the most recent ones that I have are from last year:

https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01033.html

Since then we changed the cow algorithm to do only 2 operations instead
of 5, so that can affect the best case scenario.

> While reasonable cache settings and potentially also avoiding
> fragmentation are probably more important overall, I don't think we
> > can completely ignore initial writes. They are part of the cost of
> > snapshots, they are what people see first, and they are what
> > benchmarks generally show.

If I have some time I could try to test the patches again on top of QEMU
3.0 and see what happens.

Berto

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-08-06 11:07           ` Alberto Garcia
@ 2018-08-06 11:30             ` Kevin Wolf
  2018-08-06 11:41               ` Alberto Garcia
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2018-08-06 11:30 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Leonid Bloch, qemu-devel, qemu-block, Max Reitz, Eric Blake

Am 06.08.2018 um 13:07 hat Alberto Garcia geschrieben:
> On Mon 06 Aug 2018 12:45:20 PM CEST, Kevin Wolf wrote:
> > Am 06.08.2018 um 09:47 hat Alberto Garcia geschrieben:
> >> On Fri 03 Aug 2018 04:55:42 PM CEST, Kevin Wolf wrote:
> >> > By the way, weren't you working on subclusters a while ago? How did
> >> > that go? Because I think those would enable us to use larger
> >> > cluster sizes and therefore reduce the metadata sizes as well.
> >> 
> >> I had a working prototype, but the changes to both the code and the
> >> on-disk format were not trivial. I would need to re-evaluate its
> >> performance impact after all the changes that we have had since then,
> >> and then see if it's worth trying again.
> >> 
> >> I suppose that the main benefit of having subclusters is that
> >> allocations are much faster. Everything else remains more or less the
> >> same, and in particular you can already use larger clusters if you
> >> want to reduce the metadata sizes. Plus, with the l2-cache-entry-size
> >> option you can already solve some of the problems of having large
> >> clusters.
> >
> > Yes, indeed, subclusters are about making COW less painful (or getting
> > rid of it altogether). Doing COW for full 2 MB when the guest updates
> > 4k is just a bit over the top and I think it slows down initial writes
> > seriously. I haven't benchmarked things in a while, though.
> 
> Me neither, I think the most recent ones that I have are from last year:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01033.html
> 
> Since then we changed the cow algorithm to do only 2 operations instead
> of 5, so that can affect the best case scenario.

I can't find the numbers from the COW request merging any more, but I
remember that when I tested the approach, the result was something like
this: Merging the write requests pretty much got rid of any measurable
COW overhead without a backing file, but as soon as you have to read
from the backing file, the saved work isn't very significant any more
and the cost of the read dominates.

So when you test subclusters, be sure to not only test an empty
standalone qcow2 image, but also a newly taken snapshot that has to
COW from the backing file a lot.

> > While reasonable cache settings and potentially also avoiding
> > fragmentation are probably more important overall, I don't think we
> > can completely ignore initial writes. They are part of the cost of
> > snapshots, they are what people see first, and they are what
> > benchmarks generally show.
> 
> If I have some time I could try to test the patches again on top of
> QEMU 3.0 and see what happens.

That sounds good.

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default
  2018-08-06 11:30             ` Kevin Wolf
@ 2018-08-06 11:41               ` Alberto Garcia
  0 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2018-08-06 11:41 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Leonid Bloch, qemu-devel, qemu-block, Max Reitz, Eric Blake

On Mon 06 Aug 2018 01:30:33 PM CEST, Kevin Wolf wrote:
> Am 06.08.2018 um 13:07 hat Alberto Garcia geschrieben:
>> On Mon 06 Aug 2018 12:45:20 PM CEST, Kevin Wolf wrote:
>> > Am 06.08.2018 um 09:47 hat Alberto Garcia geschrieben:
>> >> On Fri 03 Aug 2018 04:55:42 PM CEST, Kevin Wolf wrote:
>> >> > By the way, weren't you working on subclusters a while ago? How did
>> >> > that go? Because I think those would enable us to use larger
>> >> > cluster sizes and therefore reduce the metadata sizes as well.
>> >> 
>> >> I had a working prototype, but the changes to both the code and the
>> >> on-disk format were not trivial. I would need to re-evaluate its
>> >> performance impact after all the changes that we have had since then,
>> >> and then see if it's worth trying again.
>> >> 
>> >> I suppose that the main benefit of having subclusters is that
>> >> allocations are much faster. Everything else remains more or less the
>> >> same, and in particular you can already use larger clusters if you
>> >> want to reduce the metadata sizes. Plus, with the l2-cache-entry-size
>> >> option you can already solve some of the problems of having large
>> >> clusters.
>> >
>> > Yes, indeed, subclusters are about making COW less painful (or getting
> > rid of it altogether). Doing COW for a full 2 MB when the guest updates
> > 4k is just a bit over the top, and I think it slows down initial writes
>> > seriously. I haven't benchmarked things in a while, though.
>> 
>> Me neither, I think the most recent ones that I have are from last year:
>> 
>> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01033.html
>> 
>> Since then we changed the cow algorithm to do only 2 operations instead
>> of 5, so that can affect the best case scenario.
>
> I can't find the numbers from the COW request merging any more, but I
> remember that when I tested the approach, the result was something
> like this: Merging the write requests pretty much got rid of any
> measurable COW overhead without a backing file, but as soon as you
> have to read from the backing file, the saved work isn't very
> significant any more and the cost of the read dominates.
>
> So when you test subclusters, be sure to not only test an empty
> standalone qcow2 image, but also a newly taken snapshot that has to
> COW from the backing file a lot.

Yes, that's the use case that I had in mind too.

Berto

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread

Thread overview: 29+ messages
2018-07-29 21:27 [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Leonid Bloch
2018-07-29 21:27 ` [Qemu-devel] [PATCH 1/6 for-3.0] Update .gitignore Leonid Bloch
2018-07-30 15:40   ` Eric Blake
2018-07-30 15:43     ` Eric Blake
2018-07-29 21:27 ` [Qemu-devel] [PATCH 2/6 for-3.0] qcow2: A grammar fix in conflicting cache sizing error message Leonid Bloch
2018-07-29 21:27 ` [Qemu-devel] [PATCH 3/6 for-3.0] qcow2: Options' documentation fixes Leonid Bloch
2018-08-03 11:27   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2018-08-04 11:52     ` Leonid Bloch
2018-07-29 21:27 ` [Qemu-devel] [PATCH 4/6] qcow2: Update total_sectors when resizing the image Leonid Bloch
2018-07-30  9:43   ` Kevin Wolf
2018-07-30 11:41     ` Leonid Bloch
2018-07-30 12:28       ` Kevin Wolf
2018-08-03 11:56         ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2018-07-29 21:27 ` [Qemu-devel] [PATCH 5/6] qcow2: Make the default L2 cache sufficient to cover the entire image Leonid Bloch
2018-07-29 21:27 ` [Qemu-devel] [PATCH 6/6] qcow2: Resize the cache upon image resizing Leonid Bloch
2018-08-03 12:42   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2018-08-04 11:53     ` Leonid Bloch
2018-07-30 10:55 ` [Qemu-devel] [PATCH 0/6] qcow2: Make the L2 cache cover the whole image by default Kevin Wolf
2018-07-30 12:13   ` Leonid Bloch
2018-07-30 12:44     ` Kevin Wolf
2018-08-03 13:37   ` Alberto Garcia
2018-08-03 14:55     ` Kevin Wolf
2018-08-06  7:47       ` Alberto Garcia
2018-08-06 10:45         ` Kevin Wolf
2018-08-06 11:07           ` Alberto Garcia
2018-08-06 11:30             ` Kevin Wolf
2018-08-06 11:41               ` Alberto Garcia
2018-08-03  7:38 ` Kevin Wolf
2018-08-04 11:48   ` Leonid Bloch
