qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size
@ 2019-08-16 12:17 Alberto Garcia
  2019-08-16 12:41 ` Alberto Garcia
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Alberto Garcia @ 2019-08-16 12:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Leonid Bloch, Alberto Garcia, qemu-block, Max Reitz

The size of the qcow2 L2 cache defaults to 32 MB, which can be easily
larger than the maximum amount of L2 metadata that the image can have.
For example: with 64 KB clusters the user would need a qcow2 image
with a virtual size of 256 GB in order to have 32 MB of L2 metadata.

Because of that, since commit b749562d9822d14ef69c9eaa5f85903010b86c30
we forbid the L2 cache to become larger than the maximum amount of L2
metadata for the image, calculated using this formula:

    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);

The problem with this formula is that the result should be rounded up
to the cluster size because an L2 table on disk always takes one full
cluster.

For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly
160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if
the last 32 KB of those are not going to be used.

However QEMU rounds the numbers down and only creates 2 cache tables
(128 KB), which is not enough for the image.

A quick test doing 4KB random writes on a 1280 MB image gives me
around 500 IOPS, while with the correct cache size I get 16K IOPS.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 039bdc2f7e..865839682c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -826,7 +826,11 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
     bool l2_cache_entry_size_set;
     int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
     uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
-    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
+    uint64_t max_l2_entries = DIV_ROUND_UP(virtual_disk_size, s->cluster_size);
+    /* An L2 table is always one cluster in size so the max cache size
+     * should be a multiple of the cluster size. */
+    uint64_t max_l2_cache = ROUND_UP(max_l2_entries * sizeof(uint64_t),
+                                     s->cluster_size);
 
     combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
     l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size
  2019-08-16 12:17 [Qemu-devel] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size Alberto Garcia
@ 2019-08-16 12:41 ` Alberto Garcia
  2019-08-16 12:59 ` Kevin Wolf
  2019-08-18 10:17 ` Leonid Bloch
  2 siblings, 0 replies; 7+ messages in thread
From: Alberto Garcia @ 2019-08-16 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Leonid Bloch, qemu-stable, qemu-block, Max Reitz

Cc qemu-stable

This bug means that under certain conditions it's impossible to
create a cache large enough for the image, resulting in reduced I/O
performance.

On Fri, Aug 16, 2019 at 03:17:42PM +0300, Alberto Garcia wrote:
> The size of the qcow2 L2 cache defaults to 32 MB, which can be easily
> larger than the maximum amount of L2 metadata that the image can have.
> For example: with 64 KB clusters the user would need a qcow2 image
> with a virtual size of 256 GB in order to have 32 MB of L2 metadata.
> 
> Because of that, since commit b749562d9822d14ef69c9eaa5f85903010b86c30
> we forbid the L2 cache to become larger than the maximum amount of L2
> metadata for the image, calculated using this formula:
> 
>     uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> 
> The problem with this formula is that the result should be rounded up
> to the cluster size because an L2 table on disk always takes one full
> cluster.
> 
> For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly
> 160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if
> the last 32 KB of those are not going to be used.
> 
> However QEMU rounds the numbers down and only creates 2 cache tables
> (128 KB), which is not enough for the image.
> 
> A quick test doing 4KB random writes on a 1280 MB image gives me
> around 500 IOPS, while with the correct cache size I get 16K IOPS.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 039bdc2f7e..865839682c 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -826,7 +826,11 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
>      bool l2_cache_entry_size_set;
>      int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
>      uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
> -    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> +    uint64_t max_l2_entries = DIV_ROUND_UP(virtual_disk_size, s->cluster_size);
> +    /* An L2 table is always one cluster in size so the max cache size
> +     * should be a multiple of the cluster size. */
> +    uint64_t max_l2_cache = ROUND_UP(max_l2_entries * sizeof(uint64_t),
> +                                     s->cluster_size);
>  
>      combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
>      l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
> -- 
> 2.20.1



* Re: [Qemu-devel] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size
  2019-08-16 12:17 [Qemu-devel] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size Alberto Garcia
  2019-08-16 12:41 ` Alberto Garcia
@ 2019-08-16 12:59 ` Kevin Wolf
  2019-08-16 13:30   ` Alberto Garcia
  2019-08-18 10:17 ` Leonid Bloch
  2 siblings, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2019-08-16 12:59 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Leonid Bloch, Max Reitz, qemu-devel, qemu-block, qemu-stable

Am 16.08.2019 um 14:17 hat Alberto Garcia geschrieben:
> The size of the qcow2 L2 cache defaults to 32 MB, which can be easily
> larger than the maximum amount of L2 metadata that the image can have.
> For example: with 64 KB clusters the user would need a qcow2 image
> with a virtual size of 256 GB in order to have 32 MB of L2 metadata.
> 
> Because of that, since commit b749562d9822d14ef69c9eaa5f85903010b86c30
> we forbid the L2 cache to become larger than the maximum amount of L2
> metadata for the image, calculated using this formula:
> 
>     uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> 
> The problem with this formula is that the result should be rounded up
> to the cluster size because an L2 table on disk always takes one full
> cluster.
> 
> For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly
> 160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if
> the last 32 KB of those are not going to be used.
> 
> However QEMU rounds the numbers down and only creates 2 cache tables
> (128 KB), which is not enough for the image.
> 
> A quick test doing 4KB random writes on a 1280 MB image gives me
> around 500 IOPS, while with the correct cache size I get 16K IOPS.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>

Hm, this is bad. :-(

The requirement so that this bug doesn't affect the user seems to be
that the image size is a multiple of 64k * 8k = 512 MB. Which means that
users are probably often lucky enough in practice.

I'll Cc: qemu-stable anyway.

Thanks, applied to the block branch.

Kevin



* Re: [Qemu-devel] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size
  2019-08-16 12:59 ` Kevin Wolf
@ 2019-08-16 13:30   ` Alberto Garcia
  2019-08-16 14:08     ` Kevin Wolf
  0 siblings, 1 reply; 7+ messages in thread
From: Alberto Garcia @ 2019-08-16 13:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Leonid Bloch, Max Reitz, qemu-devel, qemu-block, qemu-stable

On Fri 16 Aug 2019 02:59:21 PM CEST, Kevin Wolf wrote:
> The requirement so that this bug doesn't affect the user seems to be
> that the image size is a multiple of 64k * 8k = 512 MB. Which means
> that users are probably often lucky enough in practice.

Or rather: cluster_size^2 / 8, which, if my numbers are right, gives:

|--------------+-------------|
| Cluster size | Multiple of |
|--------------+-------------|
|         4 KB |        2 MB |
|         8 KB |        8 MB |
|        16 KB |       32 MB |
|        32 KB |      128 MB |
|        64 KB |      512 MB |
|       128 KB |        2 GB |
|       256 KB |        8 GB |
|       512 KB |       32 GB |
|      1024 KB |      128 GB |
|      2048 KB |      512 GB |
|--------------+-------------|

It gets trickier with larger clusters, but if you have a larger cluster
size you probably have a very large image anyway, so yes I also think
that users are probably lucky enough in practice.

(also, the number of cache tables is always >= 2, so if the image size
is less than twice those numbers then it's also safe)

And yes, the odd value in the 512KB row that we discussed last month
is due to this same bug:

https://lists.gnu.org/archive/html/qemu-block/2019-07/msg00496.html

Berto



* Re: [Qemu-devel] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size
  2019-08-16 13:30   ` Alberto Garcia
@ 2019-08-16 14:08     ` Kevin Wolf
  2019-08-16 14:25       ` Alberto Garcia
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2019-08-16 14:08 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Leonid Bloch, Max Reitz, qemu-devel, qemu-block, qemu-stable

Am 16.08.2019 um 15:30 hat Alberto Garcia geschrieben:
> On Fri 16 Aug 2019 02:59:21 PM CEST, Kevin Wolf wrote:
> > The requirement so that this bug doesn't affect the user seems to be
> > that the image size is a multiple of 64k * 8k = 512 MB. Which means
> > that users are probably often lucky enough in practice.
> 
> Or rather: cluster_size^2 / 8, which, if my numbers are right, gives:
> 
> |--------------+-------------|
> | Cluster size | Multiple of |
> |--------------+-------------|
> |         4 KB |        2 MB |
> |         8 KB |        8 MB |
> |        16 KB |       32 MB |
> |        32 KB |      128 MB |
> |        64 KB |      512 MB |
> |       128 KB |        2 GB |
> |       256 KB |        8 GB |
> |       512 KB |       32 GB |
> |      1024 KB |      128 GB |
> |      2048 KB |      512 GB |
> |--------------+-------------|
> 
> It gets trickier with larger clusters, but if you have a larger cluster
> size you probably have a very large image anyway, so yes I also think
> that users are probably lucky enough in practice.

Yes, I assumed 64k clusters.

The other somewhat popular cluster size is probably 2 MB, where I think
image sizes that are not a multiple of 512 GB are rather likely...

> (also, the number of cache tables is always >= 2, so if the image size
> is less than twice those numbers then it's also safe)

Right. I already corrected my statement to include > 1024 MB in the Red
Hat Bugzilla (but still didn't consider other cluster sizes).

> And yes, the odd value in the 512KB row that we discussed last month
> is due to this same bug:
> 
> https://lists.gnu.org/archive/html/qemu-block/2019-07/msg00496.html

Hm... And suddenly it makes sense. :-)

So I assume all of 512k/1024k/2048k actually perform better? Or is the
effect negligible for 1024k/2048k?

Kevin



* Re: [Qemu-devel] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size
  2019-08-16 14:08     ` Kevin Wolf
@ 2019-08-16 14:25       ` Alberto Garcia
  0 siblings, 0 replies; 7+ messages in thread
From: Alberto Garcia @ 2019-08-16 14:25 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Leonid Bloch, Max Reitz, qemu-devel, qemu-block, qemu-stable

On Fri 16 Aug 2019 04:08:19 PM CEST, Kevin Wolf wrote:
>> And yes, the odd value in the 512KB row that we discussed last month
>> is due to this same bug:
>> 
>> https://lists.gnu.org/archive/html/qemu-block/2019-07/msg00496.html
>
> Hm... And suddenly it makes sense. :-)
>
> So I assume all of 512k/1024k/2048k actually perform better? Or is the
> effect negligible for 1024k/2048k?

The 512K case is the only one that performs better; my test image was
too small (40 GB) for the other cases.

Berto



* Re: [Qemu-devel] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size
  2019-08-16 12:17 [Qemu-devel] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size Alberto Garcia
  2019-08-16 12:41 ` Alberto Garcia
  2019-08-16 12:59 ` Kevin Wolf
@ 2019-08-18 10:17 ` Leonid Bloch
  2 siblings, 0 replies; 7+ messages in thread
From: Leonid Bloch @ 2019-08-18 10:17 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz

Thanks, Berto!

On 8/16/19 3:17 PM, Alberto Garcia wrote:
> The size of the qcow2 L2 cache defaults to 32 MB, which can be easily
> larger than the maximum amount of L2 metadata that the image can have.
> For example: with 64 KB clusters the user would need a qcow2 image
> with a virtual size of 256 GB in order to have 32 MB of L2 metadata.
> 
> Because of that, since commit b749562d9822d14ef69c9eaa5f85903010b86c30
> we forbid the L2 cache to become larger than the maximum amount of L2
> metadata for the image, calculated using this formula:
> 
>      uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> 
> The problem with this formula is that the result should be rounded up
> to the cluster size because an L2 table on disk always takes one full
> cluster.
> 
> For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly
> 160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if
> the last 32 KB of those are not going to be used.
> 
> However QEMU rounds the numbers down and only creates 2 cache tables
> (128 KB), which is not enough for the image.
> 
> A quick test doing 4KB random writes on a 1280 MB image gives me
> around 500 IOPS, while with the correct cache size I get 16K IOPS.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 039bdc2f7e..865839682c 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -826,7 +826,11 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
>       bool l2_cache_entry_size_set;
>       int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
>       uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
> -    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> +    uint64_t max_l2_entries = DIV_ROUND_UP(virtual_disk_size, s->cluster_size);
> +    /* An L2 table is always one cluster in size so the max cache size
> +     * should be a multiple of the cluster size. */
> +    uint64_t max_l2_cache = ROUND_UP(max_l2_entries * sizeof(uint64_t),
> +                                     s->cluster_size);
>   
>       combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
>       l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
> 

