All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand
@ 2017-08-18 13:31 Stefan Hajnoczi
  2017-08-18 15:18 ` Eric Blake
  2017-08-19  1:01 ` Alexey Kardashevskiy
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2017-08-18 13:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Alexey Kardashevskiy, Kevin Wolf, Stefan Hajnoczi

Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
* cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
s->cluster_data when the first read operation is performance on a
compressed cluster.

The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
code paths that can allocate these buffers, so remove the free functions
in the error code path.

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
Alexey: Does this improve your memory profiling results?

 block/qcow2-cluster.c | 17 +++++++++++++++++
 block/qcow2.c         | 12 ------------
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f06c08f64c..c47600a44e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
         nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
         sector_offset = coffset & 511;
         csize = nb_csectors * 512 - sector_offset;
+
+        /* Allocate buffers on first decompress operation, most images are
+         * uncompressed and the memory overhead can be avoided.  The buffers
+         * are freed in .bdrv_close().
+         */
+        if (!s->cluster_data) {
+            /* one more sector for decompressed data alignment */
+            s->cluster_data = qemu_try_blockalign(bs->file->bs,
+                    QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
+            if (!s->cluster_data) {
+                return -EIO;
+            }
+        }
+        if (!s->cluster_cache) {
+            s->cluster_cache = g_malloc(s->cluster_size);
+        }
+
         BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
         ret = bdrv_read(bs->file, coffset >> 9, s->cluster_data,
                         nb_csectors);
diff --git a/block/qcow2.c b/block/qcow2.c
index 40ba26c111..0ac201910a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1360,16 +1360,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
-    s->cluster_cache = g_malloc(s->cluster_size);
-    /* one more sector for decompressed data alignment */
-    s->cluster_data = qemu_try_blockalign(bs->file->bs, QCOW_MAX_CRYPT_CLUSTERS
-                                                    * s->cluster_size + 512);
-    if (s->cluster_data == NULL) {
-        error_setg(errp, "Could not allocate temporary cluster buffer");
-        ret = -ENOMEM;
-        goto fail;
-    }
-
     s->cluster_cache_offset = -1;
     s->flags = flags;
 
@@ -1507,8 +1497,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
     if (s->refcount_block_cache) {
         qcow2_cache_destroy(bs, s->refcount_block_cache);
     }
-    g_free(s->cluster_cache);
-    qemu_vfree(s->cluster_data);
     qcrypto_block_free(s->crypto);
     qapi_free_QCryptoBlockOpenOptions(s->crypto_opts);
     return ret;
-- 
2.13.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand
  2017-08-18 13:31 [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand Stefan Hajnoczi
@ 2017-08-18 15:18 ` Eric Blake
  2017-08-19  2:46   ` Alexey Kardashevskiy
  2017-08-21 13:50   ` Stefan Hajnoczi
  2017-08-19  1:01 ` Alexey Kardashevskiy
  1 sibling, 2 replies; 10+ messages in thread
From: Eric Blake @ 2017-08-18 15:18 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Alexey Kardashevskiy, Kevin Wolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1898 bytes --]

On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote:
> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
> s->cluster_data when the first read operation is performance on a
> compressed cluster.
> 
> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
> code paths that can allocate these buffers, so remove the free functions
> in the error code path.
> 
> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> Alexey: Does this improve your memory profiling results?

Is this a regression from earlier versions?  Likely, it is NOT -rc4
material, and thus can wait for 2.11; but it should be fine for -stable
as part of 2.10.1 down the road.

> +++ b/block/qcow2-cluster.c
> @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
>          nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
>          sector_offset = coffset & 511;
>          csize = nb_csectors * 512 - sector_offset;
> +
> +        /* Allocate buffers on first decompress operation, most images are
> +         * uncompressed and the memory overhead can be avoided.  The buffers
> +         * are freed in .bdrv_close().
> +         */
> +        if (!s->cluster_data) {
> +            /* one more sector for decompressed data alignment */
> +            s->cluster_data = qemu_try_blockalign(bs->file->bs,
> +                    QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
> +            if (!s->cluster_data) {
> +                return -EIO;

Is -ENOMEM any better than -EIO here?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand
  2017-08-18 13:31 [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand Stefan Hajnoczi
  2017-08-18 15:18 ` Eric Blake
@ 2017-08-19  1:01 ` Alexey Kardashevskiy
  1 sibling, 0 replies; 10+ messages in thread
From: Alexey Kardashevskiy @ 2017-08-19  1:01 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: qemu-block, Kevin Wolf

On 18/08/17 23:31, Stefan Hajnoczi wrote:
> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
> s->cluster_data when the first read operation is performance on a
> compressed cluster.
> 
> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
> code paths that can allocate these buffers, so remove the free functions
> in the error code path.
> 
> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> Alexey: Does this improve your memory profiling results?

Yes, it does:

was:
12.81% (1,023,193,088B)
now:
05.36% (393,893,888B)


Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>

>  block/qcow2-cluster.c | 17 +++++++++++++++++
>  block/qcow2.c         | 12 ------------
>  2 files changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index f06c08f64c..c47600a44e 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
>          nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
>          sector_offset = coffset & 511;
>          csize = nb_csectors * 512 - sector_offset;
> +
> +        /* Allocate buffers on first decompress operation, most images are
> +         * uncompressed and the memory overhead can be avoided.  The buffers
> +         * are freed in .bdrv_close().
> +         */
> +        if (!s->cluster_data) {
> +            /* one more sector for decompressed data alignment */
> +            s->cluster_data = qemu_try_blockalign(bs->file->bs,
> +                    QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
> +            if (!s->cluster_data) {
> +                return -EIO;
> +            }
> +        }
> +        if (!s->cluster_cache) {
> +            s->cluster_cache = g_malloc(s->cluster_size);
> +        }
> +
>          BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
>          ret = bdrv_read(bs->file, coffset >> 9, s->cluster_data,
>                          nb_csectors);
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 40ba26c111..0ac201910a 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1360,16 +1360,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
>          goto fail;
>      }
>  
> -    s->cluster_cache = g_malloc(s->cluster_size);
> -    /* one more sector for decompressed data alignment */
> -    s->cluster_data = qemu_try_blockalign(bs->file->bs, QCOW_MAX_CRYPT_CLUSTERS
> -                                                    * s->cluster_size + 512);
> -    if (s->cluster_data == NULL) {
> -        error_setg(errp, "Could not allocate temporary cluster buffer");
> -        ret = -ENOMEM;
> -        goto fail;
> -    }
> -
>      s->cluster_cache_offset = -1;
>      s->flags = flags;
>  
> @@ -1507,8 +1497,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
>      if (s->refcount_block_cache) {
>          qcow2_cache_destroy(bs, s->refcount_block_cache);
>      }
> -    g_free(s->cluster_cache);
> -    qemu_vfree(s->cluster_data);
>      qcrypto_block_free(s->crypto);
>      qapi_free_QCryptoBlockOpenOptions(s->crypto_opts);
>      return ret;
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand
  2017-08-18 15:18 ` Eric Blake
@ 2017-08-19  2:46   ` Alexey Kardashevskiy
  2017-08-19  2:53     ` Alexey Kardashevskiy
  2017-08-22  4:56     ` Alexey Kardashevskiy
  2017-08-21 13:50   ` Stefan Hajnoczi
  1 sibling, 2 replies; 10+ messages in thread
From: Alexey Kardashevskiy @ 2017-08-19  2:46 UTC (permalink / raw)
  To: Eric Blake, Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 4762 bytes --]

On 19/08/17 01:18, Eric Blake wrote:
> On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote:
>> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
>> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
>> s->cluster_data when the first read operation is performance on a
>> compressed cluster.
>>
>> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
>> code paths that can allocate these buffers, so remove the free functions
>> in the error code path.
>>
>> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Cc: Kevin Wolf <kwolf@redhat.com>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>> Alexey: Does this improve your memory profiling results?
> 
> Is this a regression from earlier versions? 

Hm, I have not thought about this.

So. I did bisect and this started happening from
9a4c0e220d8a4f82b5665d0ee95ef94d8e1509d5
"hw/virtio-pci: fix virtio behaviour"

Before that, the very same command line would take less than 1GB of
resident memory. That thing basically enforces virtio-1.0 for QEMU <=2.6
which means that upstream with "-machine pseries-2.6" works fine (less than
1GB), "-machine pseries-2.7" does not (close to 7GB, sometime even 9GB).

Then I tried bisecting again, with
"scsi=off,disable-modern=off,disable-legacy=on" on my 150 virtio-block
devices, started from
e266d421490e0 "virtio-pci: add flags to enable/disable legacy/modern" (it
added the disable-modern switch) which uses 2GB of memory.

I ended up with ada434cd0b44 "virtio-pci: implement cfg capability".

Then I removed proxy->modern_as on v2.10.0-rc3 (see below) and got 1.5GB of
used memory (yay!)

I do not really know how to reinterpret all of this, do you?


Note: 1GB..9GB numbers from below are the peak values from valgrind's
massif. This is pretty much resident memory used by QEMU process. In my
testing I did not enable KVM and I did not start the guest (i.e. used -S).
150 virtio-block devices, 2GB RAM for the guest.

Also, while bisecting, I only paid attention if it is 1..2GB or 6..9GB -
all tests did fit these 2 ranges, for any given sha1 the amount of memory
would be stable but among "good" commits it could change between 1GB and 2GB.



diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 5d14bd6..7ad447a 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1783,6 +1783,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev,
Error **errp)
                        /* PCI BAR regions must be powers of 2 */
                        pow2ceil(proxy->notify.offset + proxy->notify.size));

+#if 0
     memory_region_init_alias(&proxy->modern_cfg,
                              OBJECT(proxy),
                              "virtio-pci-cfg",
@@ -1791,7 +1792,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev,
Error **errp)
                              memory_region_size(&proxy->modern_bar));

     address_space_init(&proxy->modern_as, &proxy->modern_cfg,
"virtio-pci-cfg-as");
-
+#endif
     if (proxy->disable_legacy == ON_OFF_AUTO_AUTO) {
         proxy->disable_legacy = pcie_port ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
     }
@@ -1860,10 +1861,10 @@ static void virtio_pci_realize(PCIDevice *pci_dev,
Error **errp)

 static void virtio_pci_exit(PCIDevice *pci_dev)
 {
-    VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
+    //VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);

     msix_uninit_exclusive_bar(pci_dev);
-    address_space_destroy(&proxy->modern_as);
+    //address_space_destroy(&proxy->modern_as);
 }

 static void virtio_pci_reset(DeviceState *qdev)





> Likely, it is NOT -rc4
> material, and thus can wait for 2.11; but it should be fine for -stable
> as part of 2.10.1 down the road.
> 
>> +++ b/block/qcow2-cluster.c
>> @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
>>          nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
>>          sector_offset = coffset & 511;
>>          csize = nb_csectors * 512 - sector_offset;
>> +
>> +        /* Allocate buffers on first decompress operation, most images are
>> +         * uncompressed and the memory overhead can be avoided.  The buffers
>> +         * are freed in .bdrv_close().
>> +         */
>> +        if (!s->cluster_data) {
>> +            /* one more sector for decompressed data alignment */
>> +            s->cluster_data = qemu_try_blockalign(bs->file->bs,
>> +                    QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
>> +            if (!s->cluster_data) {
>> +                return -EIO;
> 
> Is -ENOMEM any better than -EIO here?
> 


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 839 bytes --]

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand
  2017-08-19  2:46   ` Alexey Kardashevskiy
@ 2017-08-19  2:53     ` Alexey Kardashevskiy
  2017-08-19  8:50       ` Alexey Kardashevskiy
  2017-08-22  4:56     ` Alexey Kardashevskiy
  1 sibling, 1 reply; 10+ messages in thread
From: Alexey Kardashevskiy @ 2017-08-19  2:53 UTC (permalink / raw)
  To: Eric Blake, Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2029 bytes --]

On 19/08/17 12:46, Alexey Kardashevskiy wrote:
> On 19/08/17 01:18, Eric Blake wrote:
>> On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote:
>>> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
>>> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
>>> s->cluster_data when the first read operation is performance on a
>>> compressed cluster.
>>>
>>> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
>>> code paths that can allocate these buffers, so remove the free functions
>>> in the error code path.
>>>
>>> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> Cc: Kevin Wolf <kwolf@redhat.com>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>> Alexey: Does this improve your memory profiling results?
>>
>> Is this a regression from earlier versions? 
> 
> Hm, I have not thought about this.
> 
> So. I did bisect and this started happening from
> 9a4c0e220d8a4f82b5665d0ee95ef94d8e1509d5
> "hw/virtio-pci: fix virtio behaviour"
> 
> Before that, the very same command line would take less than 1GB of
> resident memory. That thing basically enforces virtio-1.0 for QEMU <=2.6
> which means that upstream with "-machine pseries-2.6" works fine (less than
> 1GB), "-machine pseries-2.7" does not (close to 7GB, sometime even 9GB).
> 
> Then I tried bisecting again, with
> "scsi=off,disable-modern=off,disable-legacy=on" on my 150 virtio-block
> devices, started from
> e266d421490e0 "virtio-pci: add flags to enable/disable legacy/modern" (it
> added the disable-modern switch) which uses 2GB of memory.
> 
> I ended up with ada434cd0b44 "virtio-pci: implement cfg capability".
> 
> Then I removed proxy->modern_as on v2.10.0-rc3 (see below) and got 1.5GB of
> used memory (yay!)
> 
> I do not really know how to reinterpret all of this, do you?
> 
> 
> Note: 1GB..9GB numbers from below are the peak values from valgrind's

s/from below/from above/ , sorry, bad cut-n-paste :)


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 839 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand
  2017-08-19  2:53     ` Alexey Kardashevskiy
@ 2017-08-19  8:50       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 10+ messages in thread
From: Alexey Kardashevskiy @ 2017-08-19  8:50 UTC (permalink / raw)
  To: Eric Blake, Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2249 bytes --]

On 19/08/17 12:53, Alexey Kardashevskiy wrote:
> On 19/08/17 12:46, Alexey Kardashevskiy wrote:
>> On 19/08/17 01:18, Eric Blake wrote:
>>> On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote:
>>>> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
>>>> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
>>>> s->cluster_data when the first read operation is performance on a
>>>> compressed cluster.
>>>>
>>>> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
>>>> code paths that can allocate these buffers, so remove the free functions
>>>> in the error code path.
>>>>
>>>> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> Cc: Kevin Wolf <kwolf@redhat.com>
>>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>> ---
>>>> Alexey: Does this improve your memory profiling results?
>>>
>>> Is this a regression from earlier versions? 
>>
>> Hm, I have not thought about this.
>>
>> So. I did bisect and this started happening from
>> 9a4c0e220d8a4f82b5665d0ee95ef94d8e1509d5
>> "hw/virtio-pci: fix virtio behaviour"
>>
>> Before that, the very same command line would take less than 1GB of
>> resident memory. That thing basically enforces virtio-1.0 for QEMU <=2.6
>> which means that upstream with "-machine pseries-2.6" works fine (less than
>> 1GB), "-machine pseries-2.7" does not (close to 7GB, sometime even 9GB).
>>
>> Then I tried bisecting again, with
>> "scsi=off,disable-modern=off,disable-legacy=on" on my 150 virtio-block
>> devices, started from
>> e266d421490e0 "virtio-pci: add flags to enable/disable legacy/modern" (it
>> added the disable-modern switch) which uses 2GB of memory.
>>
>> I ended up with ada434cd0b44 "virtio-pci: implement cfg capability".
>>
>> Then I removed proxy->modern_as on v2.10.0-rc3 (see below) and got 1.5GB of
>> used memory (yay!)
>>
>> I do not really know how to reinterpret all of this, do you?
>>
>>
>> Note: 1GB..9GB numbers from below are the peak values from valgrind's
> 
> s/from below/from above/ , sorry, bad cut-n-paste :)


Just realized - I should have replied with this to "Memory use with >100
virtio devices" actually, sorry for the confusion.


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 839 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand
  2017-08-18 15:18 ` Eric Blake
  2017-08-19  2:46   ` Alexey Kardashevskiy
@ 2017-08-21 13:50   ` Stefan Hajnoczi
  1 sibling, 0 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2017-08-21 13:50 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, Alexey Kardashevskiy, Kevin Wolf, qemu-block

On Fri, Aug 18, 2017 at 10:18:37AM -0500, Eric Blake wrote:
> On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote:
> > Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
> > * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
> > s->cluster_data when the first read operation is performance on a
> > compressed cluster.
> > 
> > The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
> > code paths that can allocate these buffers, so remove the free functions
> > in the error code path.
> > 
> > Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > Cc: Kevin Wolf <kwolf@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > Alexey: Does this improve your memory profiling results?
> 
> Is this a regression from earlier versions?  Likely, it is NOT -rc4
> material, and thus can wait for 2.11; but it should be fine for -stable
> as part of 2.10.1 down the road.

It's not a regression.  s->cluster_data was always allocated upfront in
.bdrv_open().

> > +++ b/block/qcow2-cluster.c
> > @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
> >          nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
> >          sector_offset = coffset & 511;
> >          csize = nb_csectors * 512 - sector_offset;
> > +
> > +        /* Allocate buffers on first decompress operation, most images are
> > +         * uncompressed and the memory overhead can be avoided.  The buffers
> > +         * are freed in .bdrv_close().
> > +         */
> > +        if (!s->cluster_data) {
> > +            /* one more sector for decompressed data alignment */
> > +            s->cluster_data = qemu_try_blockalign(bs->file->bs,
> > +                    QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
> > +            if (!s->cluster_data) {
> > +                return -EIO;
> 
> Is -ENOMEM any better than -EIO here?

There is another instance of -ENOMEM in qcow2.c so it seems reasonable
to use the more specific ENOMEM error code.

I'll resend the patch.

Stefan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand
  2017-08-19  2:46   ` Alexey Kardashevskiy
  2017-08-19  2:53     ` Alexey Kardashevskiy
@ 2017-08-22  4:56     ` Alexey Kardashevskiy
  2017-08-30 17:20       ` Stefan Hajnoczi
  1 sibling, 1 reply; 10+ messages in thread
From: Alexey Kardashevskiy @ 2017-08-22  4:56 UTC (permalink / raw)
  To: Eric Blake, Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 5125 bytes --]

On 19/08/17 12:46, Alexey Kardashevskiy wrote:
> On 19/08/17 01:18, Eric Blake wrote:
>> On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote:
>>> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
>>> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
>>> s->cluster_data when the first read operation is performance on a
>>> compressed cluster.
>>>
>>> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
>>> code paths that can allocate these buffers, so remove the free functions
>>> in the error code path.
>>>
>>> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> Cc: Kevin Wolf <kwolf@redhat.com>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>> Alexey: Does this improve your memory profiling results?
>>
>> Is this a regression from earlier versions? 
> 
> Hm, I have not thought about this.
> 
> So. I did bisect and this started happening from
> 9a4c0e220d8a4f82b5665d0ee95ef94d8e1509d5
> "hw/virtio-pci: fix virtio behaviour"
> 
> Before that, the very same command line would take less than 1GB of
> resident memory. That thing basically enforces virtio-1.0 for QEMU <=2.6
> which means that upstream with "-machine pseries-2.6" works fine (less than
> 1GB), "-machine pseries-2.7" does not (close to 7GB, sometime even 9GB).
> 
> Then I tried bisecting again, with
> "scsi=off,disable-modern=off,disable-legacy=on" on my 150 virtio-block
> devices, started from
> e266d421490e0 "virtio-pci: add flags to enable/disable legacy/modern" (it
> added the disable-modern switch) which uses 2GB of memory.
> 
> I ended up with ada434cd0b44 "virtio-pci: implement cfg capability".
> 
> Then I removed proxy->modern_as on v2.10.0-rc3 (see below) and got 1.5GB of
> used memory (yay!)
> 
> I do not really know how to reinterpret all of this, do you?


Anyone, ping? Should I move the conversation to the original thread? Any
hacks to try with libc?



> 
> 
> Note: 1GB..9GB numbers from below are the peak values from valgrind's
> massif. This is pretty much resident memory used by QEMU process. In my
> testing I did not enable KVM and I did not start the guest (i.e. used -S).
> 150 virtio-block devices, 2GB RAM for the guest.
> 
> Also, while bisecting, I only paid attention if it is 1..2GB or 6..9GB -
> all tests did fit these 2 ranges, for any given sha1 the amount of memory
> would be stable but among "good" commits it could change between 1GB and 2GB.
> 
> 
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 5d14bd6..7ad447a 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1783,6 +1783,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev,
> Error **errp)
>                         /* PCI BAR regions must be powers of 2 */
>                         pow2ceil(proxy->notify.offset + proxy->notify.size));
> 
> +#if 0
>      memory_region_init_alias(&proxy->modern_cfg,
>                               OBJECT(proxy),
>                               "virtio-pci-cfg",
> @@ -1791,7 +1792,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev,
> Error **errp)
>                               memory_region_size(&proxy->modern_bar));
> 
>      address_space_init(&proxy->modern_as, &proxy->modern_cfg,
> "virtio-pci-cfg-as");
> -
> +#endif
>      if (proxy->disable_legacy == ON_OFF_AUTO_AUTO) {
>          proxy->disable_legacy = pcie_port ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
>      }
> @@ -1860,10 +1861,10 @@ static void virtio_pci_realize(PCIDevice *pci_dev,
> Error **errp)
> 
>  static void virtio_pci_exit(PCIDevice *pci_dev)
>  {
> -    VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
> +    //VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
> 
>      msix_uninit_exclusive_bar(pci_dev);
> -    address_space_destroy(&proxy->modern_as);
> +    //address_space_destroy(&proxy->modern_as);
>  }
> 
>  static void virtio_pci_reset(DeviceState *qdev)
> 
> 
> 
> 
> 
>> Likely, it is NOT -rc4
>> material, and thus can wait for 2.11; but it should be fine for -stable
>> as part of 2.10.1 down the road.
>>
>>> +++ b/block/qcow2-cluster.c
>>> @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
>>>          nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
>>>          sector_offset = coffset & 511;
>>>          csize = nb_csectors * 512 - sector_offset;
>>> +
>>> +        /* Allocate buffers on first decompress operation, most images are
>>> +         * uncompressed and the memory overhead can be avoided.  The buffers
>>> +         * are freed in .bdrv_close().
>>> +         */
>>> +        if (!s->cluster_data) {
>>> +            /* one more sector for decompressed data alignment */
>>> +            s->cluster_data = qemu_try_blockalign(bs->file->bs,
>>> +                    QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
>>> +            if (!s->cluster_data) {
>>> +                return -EIO;
>>
>> Is -ENOMEM any better than -EIO here?
>>
> 
> 


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 839 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand
  2017-08-22  4:56     ` Alexey Kardashevskiy
@ 2017-08-30 17:20       ` Stefan Hajnoczi
  2017-08-31  0:03         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Hajnoczi @ 2017-08-30 17:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Eric Blake, Stefan Hajnoczi, qemu-devel, Kevin Wolf, qemu-block

On Tue, Aug 22, 2017 at 02:56:00PM +1000, Alexey Kardashevskiy wrote:
> On 19/08/17 12:46, Alexey Kardashevskiy wrote:
> > On 19/08/17 01:18, Eric Blake wrote:
> >> On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote:
> >>> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
> >>> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
> >>> s->cluster_data when the first read operation is performance on a
> >>> compressed cluster.
> >>>
> >>> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
> >>> code paths that can allocate these buffers, so remove the free functions
> >>> in the error code path.
> >>>
> >>> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>> Cc: Kevin Wolf <kwolf@redhat.com>
> >>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >>> ---
> >>> Alexey: Does this improve your memory profiling results?
> >>
> >> Is this a regression from earlier versions? 
> > 
> > Hm, I have not thought about this.
> > 
> > So. I did bisect and this started happening from
> > 9a4c0e220d8a4f82b5665d0ee95ef94d8e1509d5
> > "hw/virtio-pci: fix virtio behaviour"
> > 
> > Before that, the very same command line would take less than 1GB of
> > resident memory. That thing basically enforces virtio-1.0 for QEMU <=2.6
> > which means that upstream with "-machine pseries-2.6" works fine (less than
> > 1GB), "-machine pseries-2.7" does not (close to 7GB, sometime even 9GB).
> > 
> > Then I tried bisecting again, with
> > "scsi=off,disable-modern=off,disable-legacy=on" on my 150 virtio-block
> > devices, started from
> > e266d421490e0 "virtio-pci: add flags to enable/disable legacy/modern" (it
> > added the disable-modern switch) which uses 2GB of memory.
> > 
> > I ended up with ada434cd0b44 "virtio-pci: implement cfg capability".
> > 
> > Then I removed proxy->modern_as on v2.10.0-rc3 (see below) and got 1.5GB of
> > used memory (yay!)
> > 
> > I do not really know how to reinterpret all of this, do you?
> 
> 
> Anyone, ping? Should I move the conversation to the original thread? Any
> hacks to try with libc?

I suggest a new top-level thread with Michael Tsirkin CCed.

Stefan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand
  2017-08-30 17:20       ` Stefan Hajnoczi
@ 2017-08-31  0:03         ` Alexey Kardashevskiy
  0 siblings, 0 replies; 10+ messages in thread
From: Alexey Kardashevskiy @ 2017-08-31  0:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Eric Blake, Stefan Hajnoczi, qemu-devel, Kevin Wolf, qemu-block

On 31/08/17 03:20, Stefan Hajnoczi wrote:
> On Tue, Aug 22, 2017 at 02:56:00PM +1000, Alexey Kardashevskiy wrote:
>> On 19/08/17 12:46, Alexey Kardashevskiy wrote:
>>> On 19/08/17 01:18, Eric Blake wrote:
>>>> On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote:
>>>>> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
>>>>> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
>>>>> s->cluster_data when the first read operation is performance on a
>>>>> compressed cluster.
>>>>>
>>>>> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
>>>>> code paths that can allocate these buffers, so remove the free functions
>>>>> in the error code path.
>>>>>
>>>>> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>> Cc: Kevin Wolf <kwolf@redhat.com>
>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>> ---
>>>>> Alexey: Does this improve your memory profiling results?
>>>>
>>>> Is this a regression from earlier versions? 
>>>
>>> Hm, I have not thought about this.
>>>
>>> So. I did bisect and this started happening from
>>> 9a4c0e220d8a4f82b5665d0ee95ef94d8e1509d5
>>> "hw/virtio-pci: fix virtio behaviour"
>>>
>>> Before that, the very same command line would take less than 1GB of
>>> resident memory. That thing basically enforces virtio-1.0 for QEMU <=2.6
>>> which means that upstream with "-machine pseries-2.6" works fine (less than
>>> 1GB), "-machine pseries-2.7" does not (close to 7GB, sometime even 9GB).
>>>
>>> Then I tried bisecting again, with
>>> "scsi=off,disable-modern=off,disable-legacy=on" on my 150 virtio-block
>>> devices, started from
>>> e266d421490e0 "virtio-pci: add flags to enable/disable legacy/modern" (it
>>> added the disable-modern switch) which uses 2GB of memory.
>>>
>>> I ended up with ada434cd0b44 "virtio-pci: implement cfg capability".
>>>
>>> Then I removed proxy->modern_as on v2.10.0-rc3 (see below) and got 1.5GB of
>>> used memory (yay!)
>>>
>>> I do not really know how to reinterpret all of this, do you?
>>
>>
>> Anyone, ping? Should I move the conversation to the original thread? Any
>> hacks to try with libc?
> 
> I suggest a new top-level thread with Michael Tsirkin CCed.


I am continuing in the original "Memory use with >100 virtio devices"
thread and the problem is more generic than virtio, it is just easier to
reproduce it with virtio, that's all.



-- 
Alexey

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-08-31  0:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-18 13:31 [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand Stefan Hajnoczi
2017-08-18 15:18 ` Eric Blake
2017-08-19  2:46   ` Alexey Kardashevskiy
2017-08-19  2:53     ` Alexey Kardashevskiy
2017-08-19  8:50       ` Alexey Kardashevskiy
2017-08-22  4:56     ` Alexey Kardashevskiy
2017-08-30 17:20       ` Stefan Hajnoczi
2017-08-31  0:03         ` Alexey Kardashevskiy
2017-08-21 13:50   ` Stefan Hajnoczi
2017-08-19  1:01 ` Alexey Kardashevskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.