* [Qemu-devel] memory allocation of migration changed? @ 2014-02-11 14:54 Stefan Priebe - Profihost AG 2014-02-11 15:44 ` Stefan Hajnoczi 0 siblings, 1 reply; 12+ messages in thread
From: Stefan Priebe - Profihost AG @ 2014-02-11 14:54 UTC (permalink / raw)
To: qemu-devel

Hello,

in the past (QEMU 1.5) a migration failed right at the beginning if there was not enough memory available on the target host.

Now with QEMU 1.7 I have seen migrations succeed, but afterwards the kernel OOM killer kills QEMU processes. So the migration seems to take place without enough memory on the target machine?

Could this be?

Greets,
Stefan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] memory allocation of migration changed? 2014-02-11 14:54 [Qemu-devel] memory allocation of migration changed? Stefan Priebe - Profihost AG @ 2014-02-11 15:44 ` Stefan Hajnoczi 2014-02-11 16:22 ` Peter Lieven 2014-02-11 18:30 ` Stefan Priebe 0 siblings, 2 replies; 12+ messages in thread From: Stefan Hajnoczi @ 2014-02-11 15:44 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: Orit Wasserman, Peter Lieven, qemu-devel, Dave Gilbert, Juan Quintela On Tue, Feb 11, 2014 at 3:54 PM, Stefan Priebe - Profihost AG <s.priebe@profihost.ag> wrote: > in the past (Qemu 1.5) a migration failed if there was not enogh memory > on the target host available directly at the beginning. > > Now with Qemu 1.7 i've seen succeeded migrations but the kernel OOM > memory killer killing qemu processes. So the migration seems to takes > place without having anough memory on the target machine? How much memory is the guest configured with? How much memory does the host have? I wonder if there are zero pages that can be migrated almost "for free" and the destination host doesn't touch. When they are touched for the first time after migration handover, they need to be allocated on the destination host. This can lead to OOM if you overcommitted memory. Can you reproduce the OOM reliably? It should be possible to debug it and figure out whether it's just bad luck or a true regression. Stefan ^ permalink raw reply [flat|nested] 12+ messages in thread
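The zero-page effect suggested in the message above can be illustrated with a toy model. This is a minimal sketch, not QEMU's migration code: the names `migrate`, `touch` and `resident_bytes` are hypothetical, and the 4096-byte page size is an assumption. It only shows why memory use on the destination can grow after handover if all-zero pages are skipped during the transfer.

```python
PAGE_SIZE = 4096               # assumed page size for this toy model
ZERO_PAGE = bytes(PAGE_SIZE)   # a page consisting only of zero bytes


def migrate(source_pages):
    """Copy pages to a toy destination, skipping all-zero pages."""
    dest = {}
    for index, page in enumerate(source_pages):
        if page != ZERO_PAGE:
            # Only non-zero pages are backed by memory on the destination.
            dest[index] = bytearray(page)
    return dest


def touch(dest, index):
    """Write to a page; a skipped page is allocated on first touch."""
    page = dest.setdefault(index, bytearray(PAGE_SIZE))
    page[0] = 1


def resident_bytes(dest):
    """Memory actually backing the guest on the toy destination."""
    return len(dest) * PAGE_SIZE


# Hypothetical guest: 6 zero pages and 2 pages with data.
guest = [ZERO_PAGE] * 6 + [b"\x01" * PAGE_SIZE] * 2
dest = migrate(guest)
print(resident_bytes(dest))    # 8192: only the two data pages are allocated

for i in range(len(guest)):
    touch(dest, i)
print(resident_bytes(dest))    # 32768: every page is now allocated
```

In this model the destination looks small right after handover and reaches the full guest size only once the guest touches its memory, which is the point at which an overcommitted host would run short.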
* Re: [Qemu-devel] memory allocation of migration changed? 2014-02-11 15:44 ` Stefan Hajnoczi @ 2014-02-11 16:22 ` Peter Lieven 2014-02-11 18:32 ` Stefan Priebe 2014-02-11 18:30 ` Stefan Priebe 1 sibling, 1 reply; 12+ messages in thread From: Peter Lieven @ 2014-02-11 16:22 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Orit Wasserman, Juan Quintela, qemu-devel, Dave Gilbert, Stefan Priebe - Profihost AG > Am 11.02.2014 um 16:44 schrieb Stefan Hajnoczi <stefanha@gmail.com>: > > On Tue, Feb 11, 2014 at 3:54 PM, Stefan Priebe - Profihost AG > <s.priebe@profihost.ag> wrote: >> in the past (Qemu 1.5) a migration failed if there was not enogh memory >> on the target host available directly at the beginning. >> >> Now with Qemu 1.7 i've seen succeeded migrations but the kernel OOM >> memory killer killing qemu processes. So the migration seems to takes >> place without having anough memory on the target machine? > > How much memory is the guest configured with? How much memory does > the host have? > > I wonder if there are zero pages that can be migrated almost "for > free" and the destination host doesn't touch. When they are touched > for the first time after migration handover, they need to be allocated > on the destination host. This can lead to OOM if you overcommitted > memory. > > Can you reproduce the OOM reliably? It should be possible to debug it > and figure out whether it's just bad luck or a true regression. > > Stefan Kernel Version would also be interesting as well as thp and ksm settings. Peter ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] memory allocation of migration changed? 2014-02-11 16:22 ` Peter Lieven @ 2014-02-11 18:32 ` Stefan Priebe 2014-02-14 14:59 ` Stefan Hajnoczi 0 siblings, 1 reply; 12+ messages in thread From: Stefan Priebe @ 2014-02-11 18:32 UTC (permalink / raw) To: Peter Lieven, Stefan Hajnoczi Cc: Orit Wasserman, qemu-devel, Dave Gilbert, Juan Quintela Am 11.02.2014 17:22, schrieb Peter Lieven: > > >> Am 11.02.2014 um 16:44 schrieb Stefan Hajnoczi <stefanha@gmail.com>: >> >> On Tue, Feb 11, 2014 at 3:54 PM, Stefan Priebe - Profihost AG >> <s.priebe@profihost.ag> wrote: >>> in the past (Qemu 1.5) a migration failed if there was not enogh memory >>> on the target host available directly at the beginning. >>> >>> Now with Qemu 1.7 i've seen succeeded migrations but the kernel OOM >>> memory killer killing qemu processes. So the migration seems to takes >>> place without having anough memory on the target machine? >> >> How much memory is the guest configured with? How much memory does >> the host have? >> >> I wonder if there are zero pages that can be migrated almost "for >> free" and the destination host doesn't touch. When they are touched >> for the first time after migration handover, they need to be allocated >> on the destination host. This can lead to OOM if you overcommitted >> memory. >> >> Can you reproduce the OOM reliably? It should be possible to debug it >> and figure out whether it's just bad luck or a true regression. >> >> Stefan > > Kernel Version would also be interesting as well as thp and ksm settings. Kernel Host: 3.10.26 What's thp / ksm? how to get those settings? Greets, Stefan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] memory allocation of migration changed? 2014-02-11 18:32 ` Stefan Priebe @ 2014-02-14 14:59 ` Stefan Hajnoczi 2014-02-14 18:15 ` Stefan Priebe 0 siblings, 1 reply; 12+ messages in thread From: Stefan Hajnoczi @ 2014-02-14 14:59 UTC (permalink / raw) To: Stefan Priebe Cc: Orit Wasserman, Peter Lieven, qemu-devel, Dave Gilbert, Juan Quintela On Tue, Feb 11, 2014 at 07:32:46PM +0100, Stefan Priebe wrote: > Am 11.02.2014 17:22, schrieb Peter Lieven: > > > > > >>Am 11.02.2014 um 16:44 schrieb Stefan Hajnoczi <stefanha@gmail.com>: > >> > >>On Tue, Feb 11, 2014 at 3:54 PM, Stefan Priebe - Profihost AG > >><s.priebe@profihost.ag> wrote: > >>>in the past (Qemu 1.5) a migration failed if there was not enogh memory > >>>on the target host available directly at the beginning. > >>> > >>>Now with Qemu 1.7 i've seen succeeded migrations but the kernel OOM > >>>memory killer killing qemu processes. So the migration seems to takes > >>>place without having anough memory on the target machine? > >> > >>How much memory is the guest configured with? How much memory does > >>the host have? > >> > >>I wonder if there are zero pages that can be migrated almost "for > >>free" and the destination host doesn't touch. When they are touched > >>for the first time after migration handover, they need to be allocated > >>on the destination host. This can lead to OOM if you overcommitted > >>memory. > >> > >>Can you reproduce the OOM reliably? It should be possible to debug it > >>and figure out whether it's just bad luck or a true regression. > >> > >>Stefan > > > >Kernel Version would also be interesting as well as thp and ksm settings. > > Kernel Host: 3.10.26 > > What's thp / ksm? how to get those settings? Transparent Huge Pages # cat /sys/kernel/mm/transparent_hugepage/enabled Kernel Samepage Merging # cat /sys/kernel/mm/ksm/run Stefan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] memory allocation of migration changed? 2014-02-14 14:59 ` Stefan Hajnoczi @ 2014-02-14 18:15 ` Stefan Priebe 0 siblings, 0 replies; 12+ messages in thread From: Stefan Priebe @ 2014-02-14 18:15 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Orit Wasserman, Peter Lieven, qemu-devel, Dave Gilbert, Juan Quintela Am 14.02.2014 15:59, schrieb Stefan Hajnoczi: > On Tue, Feb 11, 2014 at 07:32:46PM +0100, Stefan Priebe wrote: >> Am 11.02.2014 17:22, schrieb Peter Lieven: >>> >>> >>>> Am 11.02.2014 um 16:44 schrieb Stefan Hajnoczi <stefanha@gmail.com>: >>>> >>>> On Tue, Feb 11, 2014 at 3:54 PM, Stefan Priebe - Profihost AG >>>> <s.priebe@profihost.ag> wrote: >>>>> in the past (Qemu 1.5) a migration failed if there was not enogh memory >>>>> on the target host available directly at the beginning. >>>>> >>>>> Now with Qemu 1.7 i've seen succeeded migrations but the kernel OOM >>>>> memory killer killing qemu processes. So the migration seems to takes >>>>> place without having anough memory on the target machine? >>>> >>>> How much memory is the guest configured with? How much memory does >>>> the host have? >>>> >>>> I wonder if there are zero pages that can be migrated almost "for >>>> free" and the destination host doesn't touch. When they are touched >>>> for the first time after migration handover, they need to be allocated >>>> on the destination host. This can lead to OOM if you overcommitted >>>> memory. >>>> >>>> Can you reproduce the OOM reliably? It should be possible to debug it >>>> and figure out whether it's just bad luck or a true regression. >>>> >>>> Stefan >>> >>> Kernel Version would also be interesting as well as thp and ksm settings. >> >> Kernel Host: 3.10.26 >> >> What's thp / ksm? how to get those settings? 
>
> Transparent Huge Pages
>
> # cat /sys/kernel/mm/transparent_hugepage/enabled
>
> Kernel Samepage Merging
>
> # cat /sys/kernel/mm/ksm/run

# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

# cat /sys/kernel/mm/ksm/run
1

>
> Stefan
>
^ permalink raw reply [flat|nested] 12+ messages in thread
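The two sysfs files shown above can also be read from a script, for example when collecting settings from several hosts. The following is a minimal sketch: the sysfs paths are the ones quoted in the thread, while the helper names `parse_thp_mode`, `describe_ksm` and `read_setting` are hypothetical. The active THP mode is the bracketed word, and for KSM the kernel documents 0 as stopped, 1 as running and 2 as stopped with all merged pages unmerged.

```python
from pathlib import Path

# Paths as quoted in the thread.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")
KSM_RUN = Path("/sys/kernel/mm/ksm/run")

KSM_STATES = {
    "0": "stopped (merged pages are kept)",
    "1": "running",
    "2": "stopped (merged pages are unmerged)",
}


def parse_thp_mode(text):
    """Return the active mode, e.g. 'always' for '[always] madvise never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def describe_ksm(text):
    """Translate the content of the ksm/run file into a description."""
    return KSM_STATES.get(text.strip(), "unknown")


def read_setting(path):
    """Read a sysfs file, returning None if it is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    thp = read_setting(THP_ENABLED)
    ksm = read_setting(KSM_RUN)
    print("THP:", parse_thp_mode(thp) if thp is not None else "not available")
    print("KSM:", describe_ksm(ksm) if ksm is not None else "not available")
```

For the output posted above, this would report THP mode "always" and KSM "running".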
* Re: [Qemu-devel] memory allocation of migration changed? 2014-02-11 15:44 ` Stefan Hajnoczi 2014-02-11 16:22 ` Peter Lieven @ 2014-02-11 18:30 ` Stefan Priebe 2014-02-14 15:03 ` Stefan Hajnoczi 1 sibling, 1 reply; 12+ messages in thread
From: Stefan Priebe @ 2014-02-11 18:30 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Orit Wasserman, Peter Lieven, qemu-devel, Dave Gilbert, Juan Quintela

On 11.02.2014 16:44, Stefan Hajnoczi wrote:
> On Tue, Feb 11, 2014 at 3:54 PM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>> in the past (Qemu 1.5) a migration failed if there was not enogh memory
>> on the target host available directly at the beginning.
>>
>> Now with Qemu 1.7 i've seen succeeded migrations but the kernel OOM
>> memory killer killing qemu processes. So the migration seems to takes
>> place without having anough memory on the target machine?
>
> How much memory is the guest configured with? How much memory does
> the host have?

Guest: 48GB
Host: 192GB

> I wonder if there are zero pages that can be migrated almost "for
> free" and the destination host doesn't touch. When they are touched
> for the first time after migration handover, they need to be allocated
> on the destination host. This can lead to OOM if you overcommitted
> memory.

In the past the migration failed immediately with exit code 255.

> Can you reproduce the OOM reliably? It should be possible to debug it
> and figure out whether it's just bad luck or a true regression.

So there is no known patch changing this behaviour?

What about these?
fc1c4a5d32e15a4c40c47945da85ef9c1e0c1b54
211ea74022f51164a7729030b28eec90b6c99a08
f1c72795af573b24a7da5eb52375c9aba8a37972

Stefan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] memory allocation of migration changed? 2014-02-11 18:30 ` Stefan Priebe @ 2014-02-14 15:03 ` Stefan Hajnoczi 2014-02-14 18:16 ` Stefan Priebe 0 siblings, 1 reply; 12+ messages in thread From: Stefan Hajnoczi @ 2014-02-14 15:03 UTC (permalink / raw) To: Stefan Priebe Cc: Orit Wasserman, Peter Lieven, qemu-devel, Dave Gilbert, Juan Quintela On Tue, Feb 11, 2014 at 07:30:54PM +0100, Stefan Priebe wrote: > Am 11.02.2014 16:44, schrieb Stefan Hajnoczi: > >On Tue, Feb 11, 2014 at 3:54 PM, Stefan Priebe - Profihost AG > ><s.priebe@profihost.ag> wrote: > >>in the past (Qemu 1.5) a migration failed if there was not enogh memory > >>on the target host available directly at the beginning. > >> > >>Now with Qemu 1.7 i've seen succeeded migrations but the kernel OOM > >>memory killer killing qemu processes. So the migration seems to takes > >>place without having anough memory on the target machine? > > > >How much memory is the guest configured with? How much memory does > >the host have? > > Guest: 48GB > Host: 192GB > > >I wonder if there are zero pages that can be migrated almost "for > >free" and the destination host doesn't touch. When they are touched > >for the first time after migration handover, they need to be allocated > >on the destination host. This can lead to OOM if you overcommitted > >memory. > > In the past the migration failed immediatly with exit code 255. > > >Can you reproduce the OOM reliably? It should be possible to debug it > >and figure out whether it's just bad luck or a true regression. > > So there is no known patch changing this behaviour? > > What is about those? > fc1c4a5d32e15a4c40c47945da85ef9c1e0c1b54 > 211ea74022f51164a7729030b28eec90b6c99a08 > f1c72795af573b24a7da5eb52375c9aba8a37972 Yes, that's what I was referring to when I mentioned zero pages. The problem might just be that the destination host didn't have enough free memory. 
Migration succeeded due to memory overcommit on the host, but the host quickly ran out of memory after handover. The quick answer is to reconsider overcommitting memory and to check memory availability before live migrating.

Stefan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] memory allocation of migration changed? 2014-02-14 15:03 ` Stefan Hajnoczi @ 2014-02-14 18:16 ` Stefan Priebe 2014-02-24 15:00 ` Stefan Hajnoczi 0 siblings, 1 reply; 12+ messages in thread From: Stefan Priebe @ 2014-02-14 18:16 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Orit Wasserman, Peter Lieven, qemu-devel, Dave Gilbert, Juan Quintela Am 14.02.2014 16:03, schrieb Stefan Hajnoczi: > On Tue, Feb 11, 2014 at 07:30:54PM +0100, Stefan Priebe wrote: >> Am 11.02.2014 16:44, schrieb Stefan Hajnoczi: >>> On Tue, Feb 11, 2014 at 3:54 PM, Stefan Priebe - Profihost AG >>> <s.priebe@profihost.ag> wrote: >>>> in the past (Qemu 1.5) a migration failed if there was not enogh memory >>>> on the target host available directly at the beginning. >>>> >>>> Now with Qemu 1.7 i've seen succeeded migrations but the kernel OOM >>>> memory killer killing qemu processes. So the migration seems to takes >>>> place without having anough memory on the target machine? >>> >>> How much memory is the guest configured with? How much memory does >>> the host have? >> >> Guest: 48GB >> Host: 192GB >> >>> I wonder if there are zero pages that can be migrated almost "for >>> free" and the destination host doesn't touch. When they are touched >>> for the first time after migration handover, they need to be allocated >>> on the destination host. This can lead to OOM if you overcommitted >>> memory. >> >> In the past the migration failed immediatly with exit code 255. >> >>> Can you reproduce the OOM reliably? It should be possible to debug it >>> and figure out whether it's just bad luck or a true regression. >> >> So there is no known patch changing this behaviour? >> >> What is about those? >> fc1c4a5d32e15a4c40c47945da85ef9c1e0c1b54 >> 211ea74022f51164a7729030b28eec90b6c99a08 >> f1c72795af573b24a7da5eb52375c9aba8a37972 > > Yes, that's what I was referring to when I mentioned zero pages. > > The problem might just be that the destination host didn't have enough > free memory. 
> Migration succeeded due to memory overcommit on the host,
> but quickly ran out of memory after handover. The quick answer there is
> to reconsider your overcommitting memory and also checking memory
> availability before live migrating.

Yes, that makes sense. In the past it only worked because the qemu process failed.

What is the right way to check for enough free memory and for the memory usage of a specific VM?

Stefan
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] memory allocation of migration changed? 2014-02-14 18:16 ` Stefan Priebe @ 2014-02-24 15:00 ` Stefan Hajnoczi 2014-02-24 16:13 ` Eric Blake 0 siblings, 1 reply; 12+ messages in thread From: Stefan Hajnoczi @ 2014-02-24 15:00 UTC (permalink / raw) To: Stefan Priebe Cc: Orit Wasserman, Peter Lieven, qemu-devel, Dave Gilbert, Juan Quintela On Fri, Feb 14, 2014 at 07:16:17PM +0100, Stefan Priebe wrote: > > Am 14.02.2014 16:03, schrieb Stefan Hajnoczi: > >On Tue, Feb 11, 2014 at 07:30:54PM +0100, Stefan Priebe wrote: > >>Am 11.02.2014 16:44, schrieb Stefan Hajnoczi: > >>>On Tue, Feb 11, 2014 at 3:54 PM, Stefan Priebe - Profihost AG > >>><s.priebe@profihost.ag> wrote: > >>>>in the past (Qemu 1.5) a migration failed if there was not enogh memory > >>>>on the target host available directly at the beginning. > >>>> > >>>>Now with Qemu 1.7 i've seen succeeded migrations but the kernel OOM > >>>>memory killer killing qemu processes. So the migration seems to takes > >>>>place without having anough memory on the target machine? > >>> > >>>How much memory is the guest configured with? How much memory does > >>>the host have? > >> > >>Guest: 48GB > >>Host: 192GB > >> > >>>I wonder if there are zero pages that can be migrated almost "for > >>>free" and the destination host doesn't touch. When they are touched > >>>for the first time after migration handover, they need to be allocated > >>>on the destination host. This can lead to OOM if you overcommitted > >>>memory. > >> > >>In the past the migration failed immediatly with exit code 255. > >> > >>>Can you reproduce the OOM reliably? It should be possible to debug it > >>>and figure out whether it's just bad luck or a true regression. > >> > >>So there is no known patch changing this behaviour? > >> > >>What is about those? 
> >>fc1c4a5d32e15a4c40c47945da85ef9c1e0c1b54
> >>211ea74022f51164a7729030b28eec90b6c99a08
> >>f1c72795af573b24a7da5eb52375c9aba8a37972
> >
> >Yes, that's what I was referring to when I mentioned zero pages.
> >
> >The problem might just be that the destination host didn't have enough
> >free memory. Migration succeeded due to memory overcommit on the host,
> >but quickly ran out of memory after handover. The quick answer there is
> >to reconsider your overcommitting memory and also checking memory
> >availability before live migrating.
>
> Yes that makes sense in the past it was just working due to a
> failing qemu process.
>
> What is the right way to check for enough free memory and memory
> usage of a specific vm?

I would approach it in terms of guest RAM allocation plus QEMU overhead:

  host_ram >= num_guests * guest_ram_size + num_guests * qemu_overhead

The qemu_overhead is the question mark. It depends on the QEMU features the guests have enabled and number of devices.

QEMU also does not have a strict policy on predictable memory consumption, which makes it hard to give a formula for it.

You could get an idea by running memtest86 or something that touches all memory inside the guest and then looking at the RSS memory statistic for the QEMU process:

  qemu_overhead = rss - guest_ram_size

Even that is just an estimation since RSS only tells you how much host physical memory is *currently* allocated. There could be some codepath that hasn't been executed yet that will require much more memory.

Whatever you do, don't look at the virtual size (VSZ) because that is very misleading (including mmaps of pages that may never be touched).

Stefan
^ permalink raw reply [flat|nested] 12+ messages in thread
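The two formulas in the message above can be written down as a small calculation. This is a minimal sketch under stated assumptions: the function names are hypothetical, sizes are treated as GiB-based byte counts, and the 1 GiB overhead used in the example is an invented placeholder, since the thread explicitly says the real overhead is hard to predict. `vm_rss_bytes` expects the text of /proc/<pid>/status for the QEMU process, where the VmRSS field is reported in kB.

```python
GIB = 1024 ** 3


def host_can_fit(host_ram, num_guests, guest_ram_size, qemu_overhead):
    """host_ram >= num_guests * guest_ram_size + num_guests * qemu_overhead"""
    required = num_guests * guest_ram_size + num_guests * qemu_overhead
    return host_ram >= required


def vm_rss_bytes(status_text):
    """Extract VmRSS (in bytes) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1]) * 1024  # field is given in kB
    raise ValueError("no VmRSS line found")


def estimate_overhead(rss, guest_ram_size):
    """qemu_overhead = rss - guest_ram_size (only an estimate, see above)."""
    return rss - guest_ram_size


# Sizes from the thread: 192 GB host, 48 GB guests.
# The 1 GiB per-guest overhead is a made-up placeholder, not a measurement.
print(host_can_fit(192 * GIB, 3, 48 * GIB, 1 * GIB))
print(host_can_fit(192 * GIB, 4, 48 * GIB, 1 * GIB))
```

With these numbers three guests fit and four do not: four 48 GB guests already account for all 192 GB, so any non-zero overhead pushes the host into overcommit. The overhead estimate is only meaningful after the guest has touched all of its RAM, as the message points out.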
* Re: [Qemu-devel] memory allocation of migration changed? 2014-02-24 15:00 ` Stefan Hajnoczi @ 2014-02-24 16:13 ` Eric Blake 2014-03-12 19:15 ` Stefan Priebe 0 siblings, 1 reply; 12+ messages in thread
From: Eric Blake @ 2014-02-24 16:13 UTC (permalink / raw)
To: Stefan Hajnoczi, Stefan Priebe
Cc: Orit Wasserman, Juan Quintela, Peter Lieven, qemu-devel, Dave Gilbert

On 02/24/2014 08:00 AM, Stefan Hajnoczi wrote:

>> What is the right way to check for enough free memory and memory
>> usage of a specific vm?
>
> I would approach it in terms of guest RAM allocation plus QEMU overhead:
>
> host_ram >= num_guests * guest_ram_size + num_guests * qemu_overhead
>
> The qemu_overhead is the question mark. It depends on the QEMU features
> the guests have enabled and number of devices.
>
> QEMU also does not have a strict policy on predictable memory
> consumption, which makes it hard to give a formula for it.

In fact, at one point libvirt tried to put an automatic cap on the memory usable by qemu by multiplying RAM size and accounting for a margin of overhead, but no matter what heuristics we tried, we still got complaints from users that their guests were killed when they ran out of memory, and so we ended up reverting the automatic limits from libvirt. (You can still enforce a limit as an end user, although the libvirt documentation no longer recommends attempting that, for as long as the qemu allocation remains unpredictable.)

--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] memory allocation of migration changed? 2014-02-24 16:13 ` Eric Blake @ 2014-03-12 19:15 ` Stefan Priebe 0 siblings, 0 replies; 12+ messages in thread
From: Stefan Priebe @ 2014-03-12 19:15 UTC (permalink / raw)
To: Eric Blake, Stefan Hajnoczi
Cc: Orit Wasserman, Juan Quintela, Peter Lieven, qemu-devel, Dave Gilbert

On 24.02.2014 17:13, Eric Blake wrote:
> On 02/24/2014 08:00 AM, Stefan Hajnoczi wrote:
>
>>> What is the right way to check for enough free memory and memory
>>> usage of a specific vm?
>>
>> I would approach it in terms of guest RAM allocation plus QEMU overhead:
>>
>> host_ram >= num_guests * guest_ram_size + num_guests * qemu_overhead
>>
>> The qemu_overhead is the question mark. It depends on the QEMU features
>> the guests have enabled and number of devices.
>>
>> QEMU also does not have a strict policy on predictable memory
>> consumption, which makes it hard to give a formula for it.
>
> In fact, at one point libvirt tried to put an automatic cap on the
> memory usable by qemu by multiplying RAM size and accounting for a
> margin of overhead, but no matter what heuristics we tried, we still got
> complaints from users that their guests were killed when they ran out of
> memory, and so we ended up reverting the automatic limits from libvirt.
> (You can still enforce a limit as an end user, although the libvirt
> documentation no longer recommends attempting that, for as long as the
> qemu allocation remains unpredictable.)
>

Might something like:

vm.overcommit_ratio=100
vm.overcommit_memory=2

help? The migration might then not happen, but that's better than killing a random process?

Stefan
^ permalink raw reply [flat|nested] 12+ messages in thread
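To see what the two sysctl settings proposed above would mean in numbers: with vm.overcommit_memory=2 the kernel refuses allocations once the committed address space would exceed a commit limit, which the kernel documentation describes as swap plus overcommit_ratio percent of RAM. The following is a minimal sketch of that arithmetic, assuming the standard /proc/meminfo field names (MemTotal, SwapTotal, Committed_AS); the function names are hypothetical, and the calculation is simplified in that it ignores huge page reservations and the kernel's reserve settings. The sample values are invented.

```python
def parse_meminfo(text):
    """Map field name -> integer value (kB for most fields) from /proc/meminfo text."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """Simplified commit limit: RAM * ratio / 100 + swap."""
    return mem_total_kb * overcommit_ratio // 100 + swap_total_kb


def allocation_would_fit(meminfo, request_kb, overcommit_ratio=100):
    """True if committing request_kb more stays within the commit limit."""
    limit = commit_limit_kb(
        meminfo["MemTotal"], meminfo.get("SwapTotal", 0), overcommit_ratio
    )
    return meminfo["Committed_AS"] + request_kb <= limit


# Invented sample in the format of /proc/meminfo, not real host data.
SAMPLE = """MemTotal:        1000 kB
SwapTotal:        200 kB
Committed_AS:     900 kB
"""

info = parse_meminfo(SAMPLE)
print(commit_limit_kb(info["MemTotal"], info["SwapTotal"], 100))
print(allocation_would_fit(info, 300))
print(allocation_would_fit(info, 301))
```

On a real host the same check could be run against the contents of /proc/meminfo on the destination before starting a migration. Whether strict overcommit accounting is a good fit for a virtualization host is a separate question that the thread leaves open.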