* [Qemu-devel] Qemu and heavily increased RSS usage
@ 2016-06-21  8:21 Peter Lieven
  2016-06-21 13:18 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Lieven @ 2016-06-21  8:21 UTC (permalink / raw)
  To: qemu-devel

Hi,

while upgrading from Qemu 2.2.0 to Qemu 2.5.1.1 I noticed that the RSS memory usage has increased heavily.
We use hugepages, so the RSS memory does not include VM memory. In Qemu 2.2.0 it used to be ~30MB per vServer
and increased to up to 300 - 400MB for Qemu 2.5.1.1 (same with master). The memory increases over time, but seems
not to grow indefinitely. I tried to bisect, but had no result so far that made sense. I also tried valgrind / massif, but
valgrind does not see the allocation (at least at exit) and massif fails to run due to - so it claims - heap corruption.

Any help or ideas how to debug further would be appreciated.

Cmdline is:
./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -M pc-i440fx-2.1 -nodefaults -netdev type=tap,id=guest3,script=no,downscript=no,ifname=tap0,vnet_hdr -device virtio-net-pci,netdev=guest3,mac=52:54:00:ff:08:5e -iscsi 
initiator-name=iqn.2005-03.org.xx:0025b5d0011f -drive format=raw,discard=on,file=iscsi://172.21.200.56/iqn.2001-05.com.equallogic:0-8a0906-98f384e0a-7d2004ee0a85767a-00lieven-test/0,if=none,cache=writeback,aio=native,id=disk0 -object iothread,id=iothread0 
-device virtio-blk-pci,drive=disk0,iothread=iothread0 -global virtio-blk-pci.scsi=off -serial null -parallel null -m 4096 -smp 4,sockets=1,cores=4,threads=1 -monitor tcp:0:4004,server,nowait,nodelay -qmp tcp:0:3004,server,nowait,nodelay -name lieven-test 
-boot order=c,once=dc,menu=off -k de -mem-path /hugepages -mem-prealloc -cpu Westmere,enforce -rtc base=utc -usb -usbdevice tablet -no-hpet -vga vmware

Thanks,
Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-21  8:21 [Qemu-devel] Qemu and heavily increased RSS usage Peter Lieven
@ 2016-06-21 13:18 ` Dr. David Alan Gilbert
  2016-06-21 15:12   ` Peter Lieven
                     ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2016-06-21 13:18 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel

* Peter Lieven (pl@kamp.de) wrote:
> Hi,
> 
> while upgrading from Qemu 2.2.0 to Qemu 2.5.1.1 I noticed that the RSS memory usage has increased heavily.
> We use hugepages, so the RSS memory does not include VM memory. In Qemu 2.2.0 it used to be ~30MB per vServer
> and increased to up to 300 - 400MB for Qemu 2.5.1.1 (same with master). The memory increases over time, but seems
> not to grow indefinitely. I tried to bisect, but had no result so far that made sense. I also tried valgrind / massif, but
> valgrind does not see the allocation (at least at exit) and massif fails to run due to - so it claims - heap corruption.
> 
> Any help or ideas how to debug further would be appreciated.

I think I'd try stripping devices off; can you get a similar difference
to happen with a guest with no USB, no hugepages, no VGA and a simple
locally stored IDE disk?

If you're having trouble bisecting is it possible it's a change
in one of the libraries it's linked against?

There was someone asking the other day on #qemu who had a setup that
was apparently using much more RAM than expected and we didn't
manage to track it down but I can't remember the version being used.

Dave

> 
> Cmdline is:
> ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -M pc-i440fx-2.1 -nodefaults
> -netdev type=tap,id=guest3,script=no,downscript=no,ifname=tap0,vnet_hdr
> -device virtio-net-pci,netdev=guest3,mac=52:54:00:ff:08:5e -iscsi
> initiator-name=iqn.2005-03.org.xx:0025b5d0011f -drive format=raw,discard=on,file=iscsi://172.21.200.56/iqn.2001-05.com.equallogic:0-8a0906-98f384e0a-7d2004ee0a85767a-00lieven-test/0,if=none,cache=writeback,aio=native,id=disk0
> -object iothread,id=iothread0 -device
> virtio-blk-pci,drive=disk0,iothread=iothread0 -global
> virtio-blk-pci.scsi=off -serial null -parallel null -m 4096 -smp
> 4,sockets=1,cores=4,threads=1 -monitor tcp:0:4004,server,nowait,nodelay -qmp
> tcp:0:3004,server,nowait,nodelay -name lieven-test -boot
> order=c,once=dc,menu=off -k de -mem-path /hugepages -mem-prealloc -cpu
> Westmere,enforce -rtc base=utc -usb -usbdevice tablet -no-hpet -vga vmware
> 
> Thanks,
> Peter
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-21 13:18 ` Dr. David Alan Gilbert
@ 2016-06-21 15:12   ` Peter Lieven
  2016-06-22 10:56     ` Stefan Hajnoczi
  2016-06-23  9:57   ` Peter Lieven
  2016-06-23 14:58   ` Peter Lieven
  2 siblings, 1 reply; 28+ messages in thread
From: Peter Lieven @ 2016-06-21 15:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, Paolo Bonzini

Am 21.06.2016 um 15:18 schrieb Dr. David Alan Gilbert:
> * Peter Lieven (pl@kamp.de) wrote:
>> Hi,
>>
>> while upgrading from Qemu 2.2.0 to Qemu 2.5.1.1 I noticed that the RSS memory usage has increased heavily.
>> We use hugepages, so the RSS memory does not include VM memory. In Qemu 2.2.0 it used to be ~30MB per vServer
>> and increased to up to 300 - 400MB for Qemu 2.5.1.1 (same with master). The memory increases over time, but seems
>> not to grow indefinitely. I tried to bisect, but had no result so far that made sense. I also tried valgrind / massif, but
>> valgrind does not see the allocation (at least at exit) and massif fails to run due to - so it claims - heap corruption.
>>
>> Any help or ideas how to debug further would be appreciated.
> I think I'd try stripping devices off; can you get a similar difference
> to happen with a guest with no USB, no hugepages, no VGA and a simple
> locally stored IDE disk?

Will do. VGA I already ruled out. Hugepages I can try, but it's easier
to monitor the RSS size if the vServer memory is outside the RSS.

>
> If you're having trouble bisecting is it possible it's a change
> in one of the libraries it's linked against?

Same libraries. If I compile qemu-2.2.0 and qemu/master on exactly the
same machine I see the difference.

>
> There was someone asking the other day on #qemu who had a setup that
> was apparently using much more RAM than expected and we didn't
> manage to track it down but I can't remember the version being used.

What I currently suspect are the following changes:
  - We changed the coroutine pool to a per-thread model. I disabled the pool. This seems
    to cut the max used RSS to about 150MB, which is still a lot more than qemu-2.2.0.
  - I suspect that something (e.g. the object based device tree) is creating a lot of small allocations
    which create a massive overhead. I managed to get valgrind/massif running with an attached debugger
    and took snapshots of the running VM. What I see is that the kernel RSS size is much, much bigger than
    what massif sees as allocated memory - e.g. massif sees 5MB of heap usage while the RSS size is 50MB, or similar
    (a way to cross-check this from inside the process is sketched after this list).
  - Changing the memory allocator to tcmalloc or jemalloc seems to relax the issue, although it's not gone.
  - VGA memory seems to have been moved from VM memory into the heap. But that's a fixed 16MB allocation.
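
A rough, glibc-specific way to cross-check the gap mentioned above from inside the process is to compare what the allocator holds from the kernel with what the application actually has live; a minimal sketch (the helper name and where it would be hooked in are made up):

    #include <malloc.h>
    #include <stdio.h>

    static void dump_allocator_view(void)
    {
        struct mallinfo mi = mallinfo();

        /* arena = bytes obtained via brk, hblkhd = bytes obtained via mmap */
        printf("held from kernel:  %d (heap %d + mmap %d)\n",
               mi.arena + mi.hblkhd, mi.arena, mi.hblkhd);
        printf("live allocations:  %d\n", mi.uordblks);
        printf("free but retained: %d\n", mi.fordblks);
    }

A big gap between "held from kernel" and "live allocations" would point at memory retained (or fragmented) inside the allocator rather than at a leak that massif would report.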

I will try to cut down devices as you proposed.

Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-21 15:12   ` Peter Lieven
@ 2016-06-22 10:56     ` Stefan Hajnoczi
  2016-06-22 19:55       ` Peter Lieven
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2016-06-22 10:56 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Dr. David Alan Gilbert, Paolo Bonzini, qemu-devel


On Tue, Jun 21, 2016 at 05:12:57PM +0200, Peter Lieven wrote:
>  - We changed the coroutine pool to a per thread model. I disabled the pool. This seems
>    to cut the max used RSS to about 150MB which is still a lot more than qemu-2.2.0

The per-thread coroutine pools only grow when a thread creates/destroys
coroutines.

The QEMU main loop, iothread, and maybe vcpus should use coroutines.
The numerous thread-pool worker threads should not use coroutines IIRC.

Creating coroutines is expensive and the pools improve performance a
lot.  Maybe you can make observations about how to manage pool size more
efficiently for your VM?

Stefan



* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-22 10:56     ` Stefan Hajnoczi
@ 2016-06-22 19:55       ` Peter Lieven
  2016-06-22 20:56         ` Peter Maydell
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Lieven @ 2016-06-22 19:55 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Dr. David Alan Gilbert, Paolo Bonzini, qemu-devel

Am 22.06.2016 um 12:56 schrieb Stefan Hajnoczi:
> On Tue, Jun 21, 2016 at 05:12:57PM +0200, Peter Lieven wrote:
>>  - We changed the coroutine pool to a per thread model. I disabled the pool. This seems
>>    to cut the max used RSS to about 150MB which is still a lot more than qemu-2.2.0
> The per-thread coroutine pools only grow when a thread creates/destroys
> coroutines.
>
> The QEMU main loop, iothread, and maybe vcpus should use coroutines.
> The numerous thread-pool worker threads should not use coroutines IIRC.
>
> Creating coroutines is expensive and the pools improve performance a
> lot.  Maybe you can make observations about how to manage pool size more
> efficiently for your VM?

I did not want to blame the coroutine pool. It's a good thing. I just wanted to
mention that we changed from a global pool (with a global mutex) to a thread
based pool. This might influence memory consumption. But the increased RSS
usage I observe is also there with the coroutine pool disabled.

What makes the coroutine pool memory intensive is the stack size of 1MB per
coroutine. Is it really necessary to have such a big stack?

Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-22 19:55       ` Peter Lieven
@ 2016-06-22 20:56         ` Peter Maydell
  2016-06-24  9:37           ` Stefan Hajnoczi
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Maydell @ 2016-06-22 20:56 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Stefan Hajnoczi, Paolo Bonzini, Dr. David Alan Gilbert, qemu-devel

On 22 June 2016 at 20:55, Peter Lieven <pl@kamp.de> wrote:
> What makes the coroutine pool memory intensive is the stack size of 1MB per
> coroutine. Is it really necessary to have such a big stack?

That reminds me that I was wondering if we should allocate
our coroutine stacks with MAP_GROWSDOWN (though if we're
not actually using 1MB of stack then it's only going to
be eating virtual memory, not necessarily real memory.)
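
A minimal sketch of what an mmap-based coroutine stack could look like (illustrative only, not the current QEMU implementation; MAP_GROWSDOWN is Linux-specific and optional):

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stddef.h>

    #define COROUTINE_STACK_SIZE (1 << 20)   /* the 1MB discussed above */

    static void *coroutine_stack_alloc(void)
    {
        /* Anonymous mapping: pages are demand-faulted, so an untouched
         * stack costs virtual address space but little or no RSS. */
        void *stack = mmap(NULL, COROUTINE_STACK_SIZE,
                           PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN,
                           -1, 0);
        return stack == MAP_FAILED ? NULL : stack;
    }

    static void coroutine_stack_free(void *stack)
    {
        /* munmap() gives the pages back to the kernel immediately, unlike
         * free(), which may keep them in the heap. */
        munmap(stack, COROUTINE_STACK_SIZE);
    }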

thanks
-- PMM


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-21 13:18 ` Dr. David Alan Gilbert
  2016-06-21 15:12   ` Peter Lieven
@ 2016-06-23  9:57   ` Peter Lieven
  2016-06-24 22:57     ` Michael S. Tsirkin
  2016-06-23 14:58   ` Peter Lieven
  2 siblings, 1 reply; 28+ messages in thread
From: Peter Lieven @ 2016-06-23  9:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel

Am 21.06.2016 um 15:18 schrieb Dr. David Alan Gilbert:
> * Peter Lieven (pl@kamp.de) wrote:
>> Hi,
>>
>> while upgrading from Qemu 2.2.0 to Qemu 2.5.1.1 I noticed that the RSS memory usage has increased heavily.
>> We use hugepages, so the RSS memory does not include VM memory. In Qemu 2.2.0 it used to be ~30MB per vServer
>> and increased to up to 300 - 400MB for Qemu 2.5.1.1 (same with master). The memory increases over time, but seems
>> not to grow indefinitely. I tried to bisect, but had no result so far that made sense. I also tried valgrind / massif, but
>> valgrind does not see the allocation (at least at exit) and massif fails to run due to - so it claims - heap corruption.
>>
>> Any help or ideas how to debug further would be appreciated.
> I think I'd try stripping devices off; can you get a similar difference
> to happen with a guest with no USB, no hugepages, no VGA and a simple
> locally stored IDE disk?

From what I have debugged so far, it seems to be related to virtio-net. With that knowledge I will try to bisect again.

Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-21 13:18 ` Dr. David Alan Gilbert
  2016-06-21 15:12   ` Peter Lieven
  2016-06-23  9:57   ` Peter Lieven
@ 2016-06-23 14:58   ` Peter Lieven
  2016-06-23 15:00     ` Dr. David Alan Gilbert
  2016-06-23 15:21     ` Paolo Bonzini
  2 siblings, 2 replies; 28+ messages in thread
From: Peter Lieven @ 2016-06-23 14:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Paolo Bonzini, Fam Zheng, Peter Maydell

Am 21.06.2016 um 15:18 schrieb Dr. David Alan Gilbert:
> * Peter Lieven (pl@kamp.de) wrote:
>> Hi,
>>
>> while upgrading from Qemu 2.2.0 to Qemu 2.5.1.1 I noticed that the RSS memory usage has increased heavily.
>> We use hugepages, so the RSS memory does not include VM memory. In Qemu 2.2.0 it used to be ~30MB per vServer
>> and increased to up to 300 - 400MB for Qemu 2.5.1.1 (same with master). The memory increases over time, but seems
>> not to grow indefinitely. I tried to bisect, but had no result so far that made sense. I also tried valgrind / massif, but
>> valgrind does not see the allocation (at least at exit) and massif fails to run due to - so it claims - heap corruption.
>>
>> Any help or ideas how to debug further would be appreciated.
> I think I'd try stripping devices off; can you get a similar difference
> to happen with a guest with no USB, no hugepages, no VGA and a simple
> locally stored IDE disk?
>
> If you're having trouble bisecting is it possible it's a change
> in one of the libraries it's linked against?
>
> There was someone asking the other day on #qemu who had a setup that
> was apparently using much more RAM than expected and we didn't
> manage to track it down but I can't remember the version being used.

I am currently trying to track the increased usage from release to release. The first increase of RSS usage, from ~25MB to ~35MB directly
after machine setup, is introduced by this patch:

commit ba3f4f64b0e941b9e03568b826746941bef071f9
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Jan 21 12:09:14 2015 +0100

     exec: RCUify AddressSpaceDispatch

     Note that even after this patch, most callers of address_space_*
     functions must still be under the big QEMU lock, otherwise the memory
     region returned by address_space_translate can disappear as soon as
     address_space_translate returns.  This will be fixed in the next part
     of this series.

     Reviewed-by: Fam Zheng <famz@redhat.com>
     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

@Paolo, @Fam, any idea?

Thanks,
Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-23 14:58   ` Peter Lieven
@ 2016-06-23 15:00     ` Dr. David Alan Gilbert
  2016-06-23 15:02       ` Peter Lieven
  2016-06-23 15:21     ` Paolo Bonzini
  1 sibling, 1 reply; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2016-06-23 15:00 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, Paolo Bonzini, Fam Zheng, Peter Maydell

* Peter Lieven (pl@kamp.de) wrote:
> Am 21.06.2016 um 15:18 schrieb Dr. David Alan Gilbert:
> > * Peter Lieven (pl@kamp.de) wrote:
> > > Hi,
> > > 
> > > while upgrading from Qemu 2.2.0 to Qemu 2.5.1.1 I noticed that the RSS memory usage has increased heavily.
> > > We use hugepages, so the RSS memory does not include VM memory. In Qemu 2.2.0 it used to be ~30MB per vServer
> > > and increased to up to 300 - 400MB for Qemu 2.5.1.1 (same with master). The memory increases over time, but seems
> > > not to grow indefinitely. I tried to bisect, but had no result so far that made sense. I also tried valgrind / massif, but
> > > valgrind does not see the allocation (at least at exit) and massif fails to run due to - so it claims - heap corruption.
> > > 
> > > Any help or ideas how to debug further would be appreciated.
> > I think I'd try stripping devices off; can you get a similar difference
> > to happen with a guest with no USB, no hugepages, no VGA and a simple
> > locally stored IDE disk?
> > 
> > If you're having trouble bisecting is it possible it's a change
> > in one of the libraries it's linked against?
> > 
> > There was someone asking the other day on #qemu who had a setup that
> > was apparently using much more RAM than expected and we didn't
> > manage to track it down but I can't remember the version being used.
> 
> I am currently trying to track the increased usage from release to release. The first increase of RSS usage, from ~25MB to ~35MB directly
> after machine setup, is introduced by this patch:

OK, while 10MB is bad, I'm more interested in where your other 270MB have gone - hopefully
it's not 27 separate 10MB chunks!

Dave

> 
> commit ba3f4f64b0e941b9e03568b826746941bef071f9
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Wed Jan 21 12:09:14 2015 +0100
> 
>     exec: RCUify AddressSpaceDispatch
> 
>     Note that even after this patch, most callers of address_space_*
>     functions must still be under the big QEMU lock, otherwise the memory
>     region returned by address_space_translate can disappear as soon as
>     address_space_translate returns.  This will be fixed in the next part
>     of this series.
> 
>     Reviewed-by: Fam Zheng <famz@redhat.com>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> @Paolo, @Fam, any idea?
> 
> Thanks,
> Peter
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-23 15:00     ` Dr. David Alan Gilbert
@ 2016-06-23 15:02       ` Peter Lieven
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Lieven @ 2016-06-23 15:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Paolo Bonzini, Fam Zheng, Peter Maydell

Am 23.06.2016 um 17:00 schrieb Dr. David Alan Gilbert:
> * Peter Lieven (pl@kamp.de) wrote:
>> Am 21.06.2016 um 15:18 schrieb Dr. David Alan Gilbert:
>>> * Peter Lieven (pl@kamp.de) wrote:
>>>> Hi,
>>>>
>>>> while upgrading from Qemu 2.2.0 to Qemu 2.5.1.1 I noticed that the RSS memory usage has increased heavily.
>>>> We use hugepages, so the RSS memory does not include VM memory. In Qemu 2.2.0 it used to be ~30MB per vServer
>>>> and increased to up to 300 - 400MB for Qemu 2.5.1.1 (same with master). The memory increases over time, but seems
>>>> not to grow indefinitely. I tried to bisect, but had no result so far that made sense. I also tried valgrind / massif, but
>>>> valgrind does not see the allocation (at least at exit) and massif fails to run due to - so it claims - heap corruption.
>>>>
>>>> Any help or ideas how to debug further would be appreciated.
>>> I think I'd try stripping devices off; can you get a similar difference
>>> to happen with a guest with no USB, no hugepages, no VGA and a simple
>>> locally stored IDE disk?
>>>
>>> If you're having trouble bisecting is it possible it's a change
>>> in one of the libraries it's linked against?
>>>
>>> There was someone asking the other day on #qemu who had a setup that
>>> was apparently using much more RAM than expected and we didn't
>>> manage to track it down but I can't remember the version being used.
>> I am currently trying to track the increased usage from release to release. The first increase of RSS usage, from ~25MB to ~35MB directly
>> after machine setup, is introduced by this patch:
> OK, while 10MB is bad, I'm more interested in where your other 270MB have gone - hopefully
> it's not 27 separate 10MB chunks!

I'm trying to figure out the exact commits, but it's very hard to bisect so far. This is the
first commit that really introduces an easy-to-reproduce RSS increase. Maybe this is the root
cause. I don't know.

Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-23 14:58   ` Peter Lieven
  2016-06-23 15:00     ` Dr. David Alan Gilbert
@ 2016-06-23 15:21     ` Paolo Bonzini
  2016-06-23 15:31       ` Peter Lieven
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2016-06-23 15:21 UTC (permalink / raw)
  To: Peter Lieven, Dr. David Alan Gilbert; +Cc: qemu-devel, Fam Zheng, Peter Maydell



On 23/06/2016 16:58, Peter Lieven wrote:
> commit ba3f4f64b0e941b9e03568b826746941bef071f9
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Wed Jan 21 12:09:14 2015 +0100
> 
>     exec: RCUify AddressSpaceDispatch
> 
>     Note that even after this patch, most callers of address_space_*
>     functions must still be under the big QEMU lock, otherwise the memory
>     region returned by address_space_translate can disappear as soon as
>     address_space_translate returns.  This will be fixed in the next part
>     of this series.
> 
>     Reviewed-by: Fam Zheng <famz@redhat.com>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> @Paolo, @Fam, any idea?

When you use RCU, freeing stuff is delayed a bit.

Paolo
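
A conceptual sketch of why this shows up as higher RSS (not QEMU's actual RCU code, purely illustrative): an object removed from a shared structure is queued instead of freed on the spot, and is only released once no reader can still be using it, so it keeps counting toward RSS in the meantime:

    #include <stdlib.h>

    typedef struct Deferred {
        struct Deferred *next;
        void *object;
    } Deferred;

    static Deferred *pending;          /* objects waiting for a grace period */

    static void defer_free(void *object)
    {
        Deferred *d = malloc(sizeof(*d));
        d->object = object;
        d->next = pending;
        pending = d;
    }

    /* Called only once all pre-existing readers are known to be done. */
    static void end_of_grace_period(void)
    {
        while (pending) {
            Deferred *d = pending;
            pending = d->next;
            free(d->object);
            free(d);
        }
    }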


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-23 15:21     ` Paolo Bonzini
@ 2016-06-23 15:31       ` Peter Lieven
  2016-06-23 15:47         ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Lieven @ 2016-06-23 15:31 UTC (permalink / raw)
  To: Paolo Bonzini, Dr. David Alan Gilbert
  Cc: qemu-devel, Fam Zheng, Peter Maydell

Am 23.06.2016 um 17:21 schrieb Paolo Bonzini:
>
> On 23/06/2016 16:58, Peter Lieven wrote:
>> commit ba3f4f64b0e941b9e03568b826746941bef071f9
>> Author: Paolo Bonzini <pbonzini@redhat.com>
>> Date:   Wed Jan 21 12:09:14 2015 +0100
>>
>>      exec: RCUify AddressSpaceDispatch
>>
>>      Note that even after this patch, most callers of address_space_*
>>      functions must still be under the big QEMU lock, otherwise the memory
>>      region returned by address_space_translate can disappear as soon as
>>      address_space_translate returns.  This will be fixed in the next part
>>      of this series.
>>
>>      Reviewed-by: Fam Zheng <famz@redhat.com>
>>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> @Paolo, @Fam, any idea?
> When you use RCU, freeing stuff is delayed a bit.

define a bit?

I face the issue that it seems (some) stuff is actually never freed...

Consider the following simple vServer:

./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -M pc-i440fx-2.1 -nodefaults -serial null -parallel null -m 4096 -smp 4,sockets=1,cores=4,threads=1 -monitor tcp:0:4004,server,nowait,nodelay -mem-path /hugepages -mem-prealloc -cpu Westmere,enforce -rtc 
base=utc -no-hpet -vga vmware -pidfile /tmp/qemu.pid

head at 9d82b5a

VmHWM:       22660 kB
VmRSS:       22656 kB

head at 79e2b9a

VmHWM:       32948 kB
VmRSS:       32948 kB

even after several minutes.

Thanks,
Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-23 15:31       ` Peter Lieven
@ 2016-06-23 15:47         ` Paolo Bonzini
  2016-06-23 16:19           ` Peter Lieven
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2016-06-23 15:47 UTC (permalink / raw)
  To: Peter Lieven, Dr. David Alan Gilbert; +Cc: qemu-devel, Fam Zheng, Peter Maydell



On 23/06/2016 17:31, Peter Lieven wrote:
> Am 23.06.2016 um 17:21 schrieb Paolo Bonzini:
>>
>> On 23/06/2016 16:58, Peter Lieven wrote:
>>> commit ba3f4f64b0e941b9e03568b826746941bef071f9
>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>> Date:   Wed Jan 21 12:09:14 2015 +0100
>>>
>>>      exec: RCUify AddressSpaceDispatch
>>>
>>>      Note that even after this patch, most callers of address_space_*
>>>      functions must still be under the big QEMU lock, otherwise the
>>> memory
>>>      region returned by address_space_translate can disappear as soon as
>>>      address_space_translate returns.  This will be fixed in the next
>>> part
>>>      of this series.
>>>
>>>      Reviewed-by: Fam Zheng <famz@redhat.com>
>>>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>
>>> @Paolo, @Fam, any idea?
>> When you use RCU, freeing stuff is delayed a bit.
> 
> define a bit?
> 
> I face the issue that it seems (some) stuff is actually never freed...

Can you confirm that with e.g. valgrind?  It could be that malloc has
asked the kernel for more RSS and never released that, but QEMU did free
the memory.

Paolo

> Consider the following simple vServer:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -M pc-i440fx-2.1
> -nodefaults -serial null -parallel null -m 4096 -smp
> 4,sockets=1,cores=4,threads=1 -monitor tcp:0:4004,server,nowait,nodelay
> -mem-path /hugepages -mem-prealloc -cpu Westmere,enforce -rtc base=utc
> -no-hpet -vga vmware -pidfile /tmp/qemu.pid
> 
> head at 9d82b5a
> 
> VmHWM:       22660 kB
> VmRSS:       22656 kB
> 
> head at 79e2b9a
> 
> VmHWM:       32948 kB
> VmRSS:       32948 kB
> 
> even after several minutes.
> 
> Thanks,
> Peter
> 


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-23 15:47         ` Paolo Bonzini
@ 2016-06-23 16:19           ` Peter Lieven
  2016-06-23 16:53             ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Lieven @ 2016-06-23 16:19 UTC (permalink / raw)
  To: Paolo Bonzini, Dr. David Alan Gilbert
  Cc: qemu-devel, Fam Zheng, Peter Maydell

Am 23.06.2016 um 17:47 schrieb Paolo Bonzini:
>
> On 23/06/2016 17:31, Peter Lieven wrote:
>> Am 23.06.2016 um 17:21 schrieb Paolo Bonzini:
>>> On 23/06/2016 16:58, Peter Lieven wrote:
>>>> commit ba3f4f64b0e941b9e03568b826746941bef071f9
>>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>>> Date:   Wed Jan 21 12:09:14 2015 +0100
>>>>
>>>>       exec: RCUify AddressSpaceDispatch
>>>>
>>>>       Note that even after this patch, most callers of address_space_*
>>>>       functions must still be under the big QEMU lock, otherwise the
>>>> memory
>>>>       region returned by address_space_translate can disappear as soon as
>>>>       address_space_translate returns.  This will be fixed in the next
>>>> part
>>>>       of this series.
>>>>
>>>>       Reviewed-by: Fam Zheng <famz@redhat.com>
>>>>       Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>
>>>> @Paolo, @Fam, any idea?
>>> When you use RCU, freeing stuff is delayed a bit.
>> define a bit?
>>
>> I face the issue that it seems (some) stuff is actually never freed...
> Can you confirm that with e.g. valgrind?  It could be that malloc has
> asked the kernel for more RSS and never released that, but QEMU did free
> the memory.

Valgrind does not see the increased RSS.

HEAD at 9d82b5a
(gdb) monitor leak_check summary any
==10988== LEAK SUMMARY:
==10988==    definitely lost: 392 bytes in 15 blocks
==10988==    indirectly lost: 3,824 bytes in 38 blocks
==10988==      possibly lost: 640 bytes in 2 blocks
==10988==    still reachable: 3,510,751 bytes in 8,898 blocks
==10988==         suppressed: 0 bytes in 0 blocks

HEAD at 79e2b9a
(gdb) monitor leak_check summary any
==8108== LEAK SUMMARY:
==8108==    definitely lost: 392 bytes in 15 blocks
==8108==    indirectly lost: 3,824 bytes in 38 blocks
==8108==      possibly lost: 640 bytes in 2 blocks
==8108==    still reachable: 3,510,975 bytes in 8,898 blocks
==8108==         suppressed: 0 bytes in 0 blocks

Mhh, so your idea could be right. But what to do now? The introduction of RCU obviously increases the short-term RSS usage. But that is never
corrected, it seems.

I see this behaviour with kernel 3.19 and kernel 4.4

Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-23 16:19           ` Peter Lieven
@ 2016-06-23 16:53             ` Paolo Bonzini
  2016-06-23 21:28               ` Peter Lieven
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2016-06-23 16:53 UTC (permalink / raw)
  To: Peter Lieven, Dr. David Alan Gilbert; +Cc: qemu-devel, Fam Zheng, Peter Maydell



On 23/06/2016 18:19, Peter Lieven wrote:
> Mhh, so your idea could be right. But what to do now? The introduction
> of RCU obviously increases the short-term RSS usage. But that is never
> corrected, it seems.
> 
> I see this behaviour with kernel 3.19 and kernel 4.4

If it's 10M nothing.  If there is a 100M regression that is also caused
by RCU, we have to give up on it for that data structure, or mmap/munmap
the affected data structures.

Paolo


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-23 16:53             ` Paolo Bonzini
@ 2016-06-23 21:28               ` Peter Lieven
  2016-06-24  4:10                 ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Lieven @ 2016-06-23 21:28 UTC (permalink / raw)
  To: Paolo Bonzini, Dr. David Alan Gilbert
  Cc: qemu-devel, Fam Zheng, Peter Maydell

Am 23.06.2016 um 18:53 schrieb Paolo Bonzini:
>
> On 23/06/2016 18:19, Peter Lieven wrote:
>> Mhh, so your idea could be right. But what to do now? The introduction
>> of RCU obviously increases the short-term RSS usage. But that is never
>> corrected, it seems.
>>
>> I see this behaviour with kernel 3.19 and kernel 4.4
> If it's 10M nothing.  If there is a 100M regression that is also caused
> by RCU, we have to give up on it for that data structure, or mmap/munmap
> the affected data structures.

If it was only 10MB I would agree. But if I run the VM described earlier
in this thread it goes from ~35MB with Qemu-2.2.0 to ~130-150MB with
current master. This is with the coroutine pool disabled. With the coroutine pool
it can grow to something like 300-350MB.

Is there an easy way to determine if RCU is the problem? I have the same
symptoms, valgrind doesn't see the allocated memory. Is it possible
to make rcu_call invoke the function directly - maybe with a lock around it
that serializes the calls? Even if it's expensive it might show whether we are searching
in the right place.

Thanks,
Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-23 21:28               ` Peter Lieven
@ 2016-06-24  4:10                 ` Paolo Bonzini
  2016-06-24  8:11                   ` Peter Lieven
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2016-06-24  4:10 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Dr. David Alan Gilbert, qemu-devel, Fam Zheng, Peter Maydell


> > If it's 10M nothing.  If there is a 100M regression that is also caused
> > by RCU, we have to give up on it for that data structure, or mmap/munmap
> > the affected data structures.
> 
> If it was only 10MB I would agree. But if I run the VM described earlier
> in this thread it goes from ~35MB with Qemu-2.2.0 to ~130-150MB with
> current master. This is with the coroutine pool disabled. With the coroutine pool
> it can grow to something like 300-350MB.
> 
> Is there an easy way to determine if RCU is the problem? I have the same
> symptoms, valgrind doesn't see the allocated memory. Is it possible
> to make rcu_call invoke the function directly - maybe with a lock around it
> that serializes the calls? Even if it's expensive it might show whether we are searching
> in the right place.

Yes, you can do that.  Just make it call the function without locks, for
a quick PoC it will be okay.
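
For a quick PoC of that kind one could, for example, short-circuit the deferral so the reclaim callback runs at once instead of after a grace period (hypothetical sketch; the exact function name and signature in QEMU may differ):

    struct rcu_head;

    /* Deliberately unsafe: readers may still hold a pointer to the object,
     * which is consistent with the segfaults reported in the next message. */
    void rcu_call_poc(struct rcu_head *node,
                      void (*func)(struct rcu_head *node))
    {
        func(node);    /* no deferral, no grace period */
    }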

Paolo


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-24  4:10                 ` Paolo Bonzini
@ 2016-06-24  8:11                   ` Peter Lieven
  2016-06-24  8:20                     ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Lieven @ 2016-06-24  8:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Dr. David Alan Gilbert, qemu-devel, Fam Zheng, Peter Maydell

Am 24.06.2016 um 06:10 schrieb Paolo Bonzini:
>>> If it's 10M nothing.  If there is a 100M regression that is also caused
>>> by RCU, we have to give up on it for that data structure, or mmap/munmap
>>> the affected data structures.
>> If it was only 10MB I would agree. But if I run the VM described earlier
>> in this thread it goes from ~35MB with Qemu-2.2.0 to ~130-150MB with
>> current master. This is with the coroutine pool disabled. With the coroutine pool
>> it can grow to something like 300-350MB.
>>
>> Is there an easy way to determine if RCU is the problem? I have the same
>> symptoms, valgrind doesn't see the allocated memory. Is it possible
>> to make rcu_call invoke the function directly - maybe with a lock around it
>> that serializes the calls? Even if it's expensive it might show whether we are searching
>> in the right place.
> Yes, you can do that.  Just make it call the function without locks, for
> a quick PoC it will be okay.

Unfortunately, it leads to immediate segfaults because a lot of things seem
to go horribly wrong ;-)

Do you have any other idea than reverting all the rcu patches for this section?

I'm also wondering why the RSS is not returned to the kernel. One thing could
be fragmentation....

Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-24  8:11                   ` Peter Lieven
@ 2016-06-24  8:20                     ` Paolo Bonzini
  2016-06-24  8:45                       ` Peter Lieven
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2016-06-24  8:20 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Dr. David Alan Gilbert, qemu-devel, Fam Zheng, Peter Maydell



On 24/06/2016 10:11, Peter Lieven wrote:
> Am 24.06.2016 um 06:10 schrieb Paolo Bonzini:
>>>> If it's 10M nothing.  If there is a 100M regression that is also caused
>>>> by RCU, we have to give up on it for that data structure, or mmap/munmap
>>>> the affected data structures.
>>> If it was only 10MB I would agree. But if I run the VM described earlier
>>> in this thread it goes from ~35MB with Qemu-2.2.0 to ~130-150MB with
>>> current master. This is with the coroutine pool disabled. With the coroutine pool
>>> it can grow to something like 300-350MB.
>>>
>>> Is there an easy way to determine if RCU is the problem? I have the same
>>> symptoms, valgrind doesn't see the allocated memory. Is it possible
>>> to make rcu_call invoke the function directly - maybe with a lock around it
>>> that serializes the calls? Even if it's expensive it might show whether we are searching
>>> in the right place.
>> Yes, you can do that.  Just make it call the function without locks, for
>> a quick PoC it will be okay.
> 
> Unfortunately, it leads to immediate segfaults because a lot of things seem
> to go horribly wrong ;-)
> 
> Do you have any other idea than reverting all the rcu patches for this section?

Try freeing under the big QEMU lock:

	bool unlock = false;

	/* take the big QEMU lock if we do not already hold it */
	if (!qemu_mutex_iothread_locked()) {
	    unlock = true;
	    qemu_mutex_lock_iothread();
	}
		...
	if (unlock) {
	    qemu_mutex_unlock_iothread();
	}

afbe70535ff1a8a7a32910cc15ebecc0ba92e7da should be easy to backport.

Thanks,

Paolo

> I'm also wondering why the RSS is not returned to the kernel. One thing could
> be fragmentation....
> 
> Peter
> 


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-24  8:20                     ` Paolo Bonzini
@ 2016-06-24  8:45                       ` Peter Lieven
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Lieven @ 2016-06-24  8:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Dr. David Alan Gilbert, qemu-devel, Fam Zheng, Peter Maydell

Am 24.06.2016 um 10:20 schrieb Paolo Bonzini:
>
> On 24/06/2016 10:11, Peter Lieven wrote:
>> Am 24.06.2016 um 06:10 schrieb Paolo Bonzini:
>>>>> If it's 10M nothing.  If there is a 100M regression that is also caused
>>>>> by RCU, we have to give up on it for that data structure, or mmap/munmap
>>>>> the affected data structures.
>>>> If it was only 10MB I would agree. But if I run the VM described earlier
>>>> in this thread it goes from ~35MB with Qemu-2.2.0 to ~130-150MB with
>>>> current master. This is with the coroutine pool disabled. With the coroutine pool
>>>> it can grow to something like 300-350MB.
>>>>
>>>> Is there an easy way to determine if RCU is the problem? I have the same
>>>> symptoms, valgrind doesn't see the allocated memory. Is it possible
>>>> to make rcu_call invoke the function directly - maybe with a lock around it
>>>> that serializes the calls? Even if it's expensive it might show whether we are searching
>>>> in the right place.
>>> Yes, you can do that.  Just make it call the function without locks, for
>>> a quick PoC it will be okay.
>> Unfortunately, it leads to immediate segfaults because a lot of things seem
>> to go horribly wrong ;-)
>>
>> Do you have any other idea than reverting all the rcu patches for this section?
> Try freeing under the big QEMU lock:
>
> 	bool unlock = false;
> 
> 	/* take the big QEMU lock if we do not already hold it */
> 	if (!qemu_mutex_iothread_locked()) {
> 	    unlock = true;
> 	    qemu_mutex_lock_iothread();
> 	}
> 		...
> 	if (unlock) {
> 	    qemu_mutex_unlock_iothread();
> 	}
>
> afbe70535ff1a8a7a32910cc15ebecc0ba92e7da should be easy to backport.

Will check this out. Meanwhile I read a little about returning RSS to the kernel, as I was wondering
why RSS and HWM are almost at the same high level. It seems that ptmalloc (the glibc default allocator)
is very reluctant to return memory to the kernel. There is indeed no guarantee that freed memory is
returned. Only mmap'ed memory that is unmapped is guaranteed to be returned.

So I tried the following without reverting anything:

MALLOC_MMAP_THRESHOLD_=4096 ./x86_64-softmmu/qemu-system-x86_64  ...

No idea on performance impact yet, but it solves the issue.

With default threshold my test VM rises up to 154MB RSS usage:

VmHWM:      154284 kB
VmRSS:      154284 kB

With the option it looks like this:

VmHWM:       50588 kB
VmRSS:       41920 kB

with jemalloc I can observe that the HWM is still high, but RSS is below its value. But still in the order of about 100MB.
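
For completeness, the same threshold can also be set from inside the process via glibc's mallopt(); a minimal sketch (glibc-specific, purely illustrative, not part of any patch proposed here):

    #include <malloc.h>

    static void tune_allocator(void)
    {
        /* Serve allocations above 4KiB with mmap() so that freeing them
         * returns the pages to the kernel immediately (same effect as the
         * MALLOC_MMAP_THRESHOLD_ environment variable used above). */
        mallopt(M_MMAP_THRESHOLD, 4096);

        /* Additionally release any free memory at the top of the heap now. */
        malloc_trim(0);
    }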

Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-22 20:56         ` Peter Maydell
@ 2016-06-24  9:37           ` Stefan Hajnoczi
  2016-06-24  9:53             ` Peter Lieven
  2016-06-24  9:58             ` Peter Maydell
  0 siblings, 2 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2016-06-24  9:37 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Peter Lieven, Paolo Bonzini, Dr. David Alan Gilbert, qemu-devel


On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
> On 22 June 2016 at 20:55, Peter Lieven <pl@kamp.de> wrote:
> > What makes the coroutine pool memory intensive is the stack size of 1MB per
> > coroutine. Is it really necessary to have such a big stack?
> 
> That reminds me that I was wondering if we should allocate
> our coroutine stacks with MAP_GROWSDOWN (though if we're
> not actually using 1MB of stack then it's only going to
> be eating virtual memory, not necessarily real memory.)

Yes, MAP_GROWSDOWN will not reduce RSS.

It's possible that we can reduce RSS usage of the coroutine pool but it
will require someone to profile the pool usage patterns.

Stefan



* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-24  9:37           ` Stefan Hajnoczi
@ 2016-06-24  9:53             ` Peter Lieven
  2016-06-24  9:57               ` Dr. David Alan Gilbert
  2016-06-24  9:58             ` Peter Maydell
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Lieven @ 2016-06-24  9:53 UTC (permalink / raw)
  To: Stefan Hajnoczi, Peter Maydell
  Cc: Paolo Bonzini, Dr. David Alan Gilbert, qemu-devel

Am 24.06.2016 um 11:37 schrieb Stefan Hajnoczi:
> On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
>> On 22 June 2016 at 20:55, Peter Lieven <pl@kamp.de> wrote:
>>> What makes the coroutine pool memory intensive is the stack size of 1MB per
>>> coroutine. Is it really necessary to have such a big stack?
>> That reminds me that I was wondering if we should allocate
>> our coroutine stacks with MAP_GROWSDOWN (though if we're
>> not actually using 1MB of stack then it's only going to
>> be eating virtual memory, not necessarily real memory.)
> Yes, MAP_GROWSDOWN will not reduce RSS.

Yes, I can confirm - just tested...

>
> It's possible that we can reduce RSS usage of the coroutine pool but it
> will require someone to profile the pool usage patterns.

It would be interesting to see what stack size we really need. Is it possible
to automatically detect this value (at compile time?)
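
One way to measure this at run time rather than detect it at compile time (purely illustrative; the names are made up): fill the stack with a known pattern when the coroutine is created and look for the high-water mark when it is destroyed:

    #include <stdint.h>
    #include <stddef.h>

    #define STACK_FILL 0xdeadbeefu

    static void stack_poison(uint32_t *stack, size_t words)
    {
        for (size_t i = 0; i < words; i++) {
            stack[i] = STACK_FILL;
        }
    }

    /* The stack grows down, so untouched words remain at the low end;
     * everything above the first overwritten word has been used. */
    static size_t stack_bytes_used(const uint32_t *stack, size_t words)
    {
        size_t i = 0;
        while (i < words && stack[i] == STACK_FILL) {
            i++;
        }
        return (words - i) * sizeof(uint32_t);
    }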

I can also confirm that the coroutine pool is the second major RSS user besides
heap fragmentation.

Lowering the mmap threshold of malloc to about 32k also gives good results.
In this case there are very few active mappings in the running vServer, but the
RSS is still at about 50MB (without coroutine pool). Maybe it would be good
to identify which parts of Qemu malloc, let's say, >16kB and convert them to mmap
if it is feasible.

Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-24  9:53             ` Peter Lieven
@ 2016-06-24  9:57               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2016-06-24  9:57 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Stefan Hajnoczi, Peter Maydell, Paolo Bonzini, qemu-devel

* Peter Lieven (pl@kamp.de) wrote:
> Am 24.06.2016 um 11:37 schrieb Stefan Hajnoczi:
> > On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
> >> On 22 June 2016 at 20:55, Peter Lieven <pl@kamp.de> wrote:
> >>> What makes the coroutine pool memory intensive is the stack size of 1MB per
> >>> coroutine. Is it really necessary to have such a big stack?
> >> That reminds me that I was wondering if we should allocate
> >> our coroutine stacks with MAP_GROWSDOWN (though if we're
> >> not actually using 1MB of stack then it's only going to
> >> be eating virtual memory, not necessarily real memory.)
> > Yes, MAP_GROWSDOWN will not reduce RSS.
> 
> Yes, I can confirm just tested...
> 
> >
> > It's possible that we can reduce RSS usage of the coroutine pool but it
> > will require someone to profile the pool usage patterns.
> 
> It would be interesting to see what stack size we really need. Is it possible
> to automatically detect this value (at compile time?)
> 
> I can also confirm that the coroutine pool is the second major RSS user besides
> heap fragmentation.

But is it their stack? You said you tried marking GROWSDOWN, so can you check
 /proc/../smaps and see how much of the Rss is the growsdown space?

Dave

> Lowering the mmap threshold of malloc to about 32k also gives good results.
> In this case there are very few active mappings in the running vServer, but the
> RSS is still at about 50MB (without coroutine pool). Maybe it would be good
> to identify which parts of Qemu malloc, let's say, >16kB and convert them to mmap
> if it is feasible.
> 
> Peter
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-24  9:37           ` Stefan Hajnoczi
  2016-06-24  9:53             ` Peter Lieven
@ 2016-06-24  9:58             ` Peter Maydell
  2016-06-24 10:45               ` Peter Lieven
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Maydell @ 2016-06-24  9:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Lieven, Paolo Bonzini, Dr. David Alan Gilbert, qemu-devel

On 24 June 2016 at 10:37, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
>> On 22 June 2016 at 20:55, Peter Lieven <pl@kamp.de> wrote:
>> > What makes the coroutine pool memory intensive is the stack size of 1MB per
>> > coroutine. Is it really necessary to have such a big stack?
>>
>> That reminds me that I was wondering if we should allocate
>> our coroutine stacks with MAP_GROWSDOWN (though if we're
>> not actually using 1MB of stack then it's only going to
>> be eating virtual memory, not necessarily real memory.)
>
> Yes, MAP_GROWSDOWN will not reduce RSS.

Right, but then the 1MB of stack as currently allocated isn't
going to be affecting RSS either I would have thought (except
transiently, since we zero it on allocation which will
bring it into the RSS until it falls back out again
because we don't touch it after that).

thanks
-- PMM


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-24  9:58             ` Peter Maydell
@ 2016-06-24 10:45               ` Peter Lieven
  2016-06-27 12:39                 ` Stefan Hajnoczi
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Lieven @ 2016-06-24 10:45 UTC (permalink / raw)
  To: Peter Maydell, Stefan Hajnoczi
  Cc: Paolo Bonzini, Dr. David Alan Gilbert, qemu-devel

Am 24.06.2016 um 11:58 schrieb Peter Maydell:
> On 24 June 2016 at 10:37, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
>>> On 22 June 2016 at 20:55, Peter Lieven <pl@kamp.de> wrote:
>>>> What makes the coroutine pool memory intensive is the stack size of 1MB per
>>>> coroutine. Is it really necessary to have such a big stack?
>>> That reminds me that I was wondering if we should allocate
>>> our coroutine stacks with MAP_GROWSDOWN (though if we're
>>> not actually using 1MB of stack then it's only going to
>>> be eating virtual memory, not necessarily real memory.)
>> Yes, MAP_GROWSDOWN will not reduce RSS.
> Right, but then the 1MB of stack as currently allocated isn't
> going to be affecting RSS either I would have thought (except
> transiently, since we zero it on allocation which will
> bring it into the RSS until it falls back out again
> because we don't touch it after that).

What I observe regarding the coroutine pool is really strange. Under I/O load
while booting the vServer the RSS size is low, as expected. If the vServer runs
for some time, the RSS size suddenly explodes as if all the stack memory suddenly got
mapped. This symptom definitely goes away if I disable the pool.

Regarding the coroutine pool I had the following thoughts:
 - mmap the stack so it is actually really freed if the coroutine is deleted (with MAP_GROWSDOWN or not?)
 - drop the release_pool. It actually only has an effect for non-virtio devices where the coroutine is
   not created and deleted in the same thread. But for virtio the release pool has the drawback that there
   is always a ping-pong between the release_pool and the alloc_pool.
 - implement some kind of garbage collector that detects that a thread's alloc_pool is actually too big (e.g. it
   stays above a watermark for some time) and then reduce its size (sketched below).
 - detect that a coroutine was created in a vcpu thread (e.g. IDE) and released in the iothread. In this case
   don't add it to the pool so the alloc_pool of the I/O thread does not grow to max without being used.
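
A hypothetical sketch of the watermark idea above (names and numbers made up, not QEMU code): keep a per-thread free list and release the excess once it has stayed above a watermark for a number of checks, so idle 1MB stacks do not sit in the pool forever:

    #include <stddef.h>

    typedef struct PooledCoroutine {
        struct PooledCoroutine *next;
        /* ... stack, context, ... */
    } PooledCoroutine;

    typedef struct {
        PooledCoroutine *free_list;
        size_t size;
        unsigned rounds_above_watermark;
    } CoroutinePool;

    enum { POOL_WATERMARK = 64, POOL_GC_ROUNDS = 10 };

    /* assumed helper that unmaps/frees the coroutine and its stack */
    extern void coroutine_really_delete(PooledCoroutine *co);

    static void coroutine_pool_maybe_trim(CoroutinePool *pool)
    {
        if (pool->size <= POOL_WATERMARK) {
            pool->rounds_above_watermark = 0;
            return;
        }
        if (++pool->rounds_above_watermark < POOL_GC_ROUNDS) {
            return;
        }
        /* Oversized for a while: free everything above the watermark. */
        while (pool->size > POOL_WATERMARK) {
            PooledCoroutine *co = pool->free_list;
            pool->free_list = co->next;
            pool->size--;
            coroutine_really_delete(co);
        }
        pool->rounds_above_watermark = 0;
    }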

Peter


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-23  9:57   ` Peter Lieven
@ 2016-06-24 22:57     ` Michael S. Tsirkin
  0 siblings, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2016-06-24 22:57 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Dr. David Alan Gilbert, qemu-devel

On Thu, Jun 23, 2016 at 11:57:45AM +0200, Peter Lieven wrote:
> Am 21.06.2016 um 15:18 schrieb Dr. David Alan Gilbert:
> > * Peter Lieven (pl@kamp.de) wrote:
> > > Hi,
> > > 
> > > while upgrading from Qemu 2.2.0 to Qemu 2.5.1.1 I noticed that the RSS memory usage has increased heavily.
> > > We use hugepages, so the RSS memory does not include VM memory. In Qemu 2.2.0 it used to be ~30MB per vServer
> > > and increased to up to 300 - 400MB for Qemu 2.5.1.1 (same with master). The memory increases over time, but seems
> > > not to grow indefinitely. I tried to bisect, but had no result so far that made sense. I also tried valgrind / massif, but
> > > valgrind does not see the allocation (at least at exit) and massif fails to run due to - so it claims - heap corruption.
> > > 
> > > Any help or ideas how to debug further would be appreciated.
> > I think I'd try stripping devices off; can you get a similar difference
> > to happen with a guest with no USB, no hugepages, no VGA and a simple
> > locally stored IDE disk?
> 
> From what I have debugged so far, it seems to be related to virtio-net. With that knowledge I will try to bisect again.
> 
> Peter
> 

Interesting. You can try attaching vhost to tap so virtio-net does not
process packets in qemu, for comparison.

-- 
MST


* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-24 10:45               ` Peter Lieven
@ 2016-06-27 12:39                 ` Stefan Hajnoczi
  2016-06-27 13:33                   ` Peter Lieven
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2016-06-27 12:39 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Peter Maydell, Paolo Bonzini, Dr. David Alan Gilbert, qemu-devel


On Fri, Jun 24, 2016 at 12:45:49PM +0200, Peter Lieven wrote:
> Am 24.06.2016 um 11:58 schrieb Peter Maydell:
> > On 24 June 2016 at 10:37, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >> On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
> >>> On 22 June 2016 at 20:55, Peter Lieven <pl@kamp.de> wrote:
> >>>> What makes the coroutine pool memory intensive is the stack size of 1MB per
> >>>> coroutine. Is it really necessary to have such a big stack?
> >>> That reminds me that I was wondering if we should allocate
> >>> our coroutine stacks with MAP_GROWSDOWN (though if we're
> >>> not actually using 1MB of stack then it's only going to
> >>> be eating virtual memory, not necessarily real memory.)
> >> Yes, MAP_GROWSDOWN will not reduce RSS.
> > Right, but then the 1MB of stack as currently allocated isn't
> > going to be affecting RSS either I would have thought (except
> > transiently, since we zero it on allocation which will
> > bring it into the RSS until it falls back out again
> > because we don't touch it after that).
> 
> What I observe regarding the coroutine pool is really strange. Under I/O load
> while booting the vServer the RSS size is low, as expected. If the vServer runs
> for some time, the RSS size suddenly explodes as if all the stack memory suddenly got
> mapped. This symptom definitely goes away if I disable the pool.
> 
> Regarding the coroutine pool I had the following thoughts:
>  - mmap the stack so its actually really freed if the coroutine is deleted (with MAP_GROWSDOWN or not?)

This might be an easy fix if malloc is holding memory and not reusing
it.

Stefan



* Re: [Qemu-devel] Qemu and heavily increased RSS usage
  2016-06-27 12:39                 ` Stefan Hajnoczi
@ 2016-06-27 13:33                   ` Peter Lieven
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Lieven @ 2016-06-27 13:33 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Paolo Bonzini, Dr. David Alan Gilbert, qemu-devel



> Am 27.06.2016 um 14:39 schrieb Stefan Hajnoczi <stefanha@gmail.com>:
> 
>> On Fri, Jun 24, 2016 at 12:45:49PM +0200, Peter Lieven wrote:
>>> Am 24.06.2016 um 11:58 schrieb Peter Maydell:
>>>> On 24 June 2016 at 10:37, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>>> On Wed, Jun 22, 2016 at 09:56:06PM +0100, Peter Maydell wrote:
>>>>>> On 22 June 2016 at 20:55, Peter Lieven <pl@kamp.de> wrote:
>>>>>> What makes the coroutine pool memory intensive is the stack size of 1MB per
>>>>>> coroutine. Is it really necessary to have such a big stack?
>>>>> That reminds me that I was wondering if we should allocate
>>>>> our coroutine stacks with MAP_GROWSDOWN (though if we're
>>>>> not actually using 1MB of stack then it's only going to
>>>>> be eating virtual memory, not necessarily real memory.)
>>>> Yes, MAP_GROWSDOWN will not reduce RSS.
>>> Right, but then the 1MB of stack as currently allocated isn't
>>> going to be affecting RSS either I would have thought (except
>>> transiently, since we zero it on allocation which will
>>> bring it into the RSS until it falls back out again
>>> because we don't touch it after that).
>> 
>> What I observe regarding the coroutine pool is really strange. Under I/O load
>> while booting the vServer the RSS size is low, as expected. If the vServer runs
>> for some time, the RSS size suddenly explodes as if all the stack memory suddenly got
>> mapped. This symptom definitely goes away if I disable the pool.
>> 
>> Regarding the coroutine pool I had the following thoughts:
>> - mmap the stack so its actually really freed if the coroutine is deleted (with MAP_GROWSDOWN or not?)
> 
> This might be an easy fix if malloc is holding memory and not reusing
> it.

It is reusing it, but it is heavily fragmented, it seems. I am preparing a series to improve the RSS usage; hopefully I will have something ready by tomorrow.

 Peter

> 
> Stefan

