* [Qemu-devel] When it's okay to treat OOM as fatal?
From: Markus Armbruster @ 2018-10-16 13:01 UTC
To: qemu-devel
(3 replies; 13+ messages in thread)

We sometimes use g_new() & friends, which abort() on OOM, and sometimes
g_try_new() & friends, which can fail, and therefore require error
handling.

HACKING points out the difference, but is mum on when to use what:

    3. Low level memory management

    Use of the malloc/free/realloc/calloc/valloc/memalign/posix_memalign
    APIs is not allowed in the QEMU codebase. Instead of these routines,
    use the GLib memory allocation routines g_malloc/g_malloc0/g_new/
    g_new0/g_realloc/g_free or QEMU's qemu_memalign/qemu_blockalign/qemu_vfree
    APIs.

    Please note that g_malloc will exit on allocation failure, so there
    is no need to test for failure (as you would have to with malloc).
    Calling g_malloc with a zero size is valid and will return NULL.

    Prefer g_new(T, n) instead of g_malloc(sizeof(T) * n) for the following
    reasons:

      a. It catches multiplication overflowing size_t;
      b. It returns T * instead of void *, letting compiler catch more type
         errors.

    Declarations like T *v = g_malloc(sizeof(*v)) are acceptable, though.

    Memory allocated by qemu_memalign or qemu_blockalign must be freed with
    qemu_vfree, since breaking this will cause problems on Win32.

Now, in my personal opinion, handling OOM gracefully is worth the
(commonly considerable) trouble when you're coding for an Apple II or
similar.  Anything that pages commonly becomes unusable long before
allocations fail.  Anything that overcommits will send you a (commonly
lethal) signal instead.  Anything that tries handling OOM gracefully,
and manages to dodge both these bullets somehow, will commonly get it
wrong and crash.

But others are entitled to their opinions as much as I am.  I just want
to know what our rules are, preferably in the form of a patch to
HACKING.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] When it's okay to treat OOM as fatal?
From: Daniel P. Berrangé @ 2018-10-16 13:20 UTC
To: Markus Armbruster; +Cc: qemu-devel

On Tue, Oct 16, 2018 at 03:01:29PM +0200, Markus Armbruster wrote:
> We sometimes use g_new() & friends, which abort() on OOM, and sometimes
> g_try_new() & friends, which can fail, and therefore require error
> handling.
>
> HACKING points out the difference, but is mum on when to use what:
>
> [...]
>
> Now, in my personal opinion, handling OOM gracefully is worth the
> (commonly considerable) trouble when you're coding for an Apple II or
> similar.  Anything that pages commonly becomes unusable long before
> allocations fail.
> Anything that overcommits will send you a (commonly
> lethal) signal instead.  Anything that tries handling OOM gracefully,
> and manages to dodge both these bullets somehow, will commonly get it
> wrong and crash.

FWIW, with the cgroups memory controller (with or without containers)
you can be in an environment where there's a memory cap.  This can
conceivably cause QEMU to see ENOMEM, while the host OS in general
is operating normally with no swap usage / paging.

That said, no one has ever been able to come up with an algorithm that
reliably predicts the "normal" QEMU peak memory usage.  So any time the
cgroups memory cap has been used, it has typically resulted in QEMU
unreasonably aborting in normal operation.  This makes it impractical
to try to confine QEMU's memory usage with cgroups, IMHO.

> But others are entitled to their opinions as much as I am.  I just want
> to know what our rules are, preferably in the form of a patch to
> HACKING.

I vaguely recall it being said that we should use g_try_new in code
paths that can be triggered from monitor commands that would cause
allocation of "significant" amounts of RAM, for some arbitrary
definition of what "significant" means.

E.g. hotplug a QXL PCI video card with 256 MB of video RAM: you might
use g_try_new() for allocating this 256 MB chunk and return gracefully
on failure, rather than the hotplug op causing QEMU to abort.

The problem with OOM handling is proving that the cleanup paths you
take actually do something sensible / correct, rather than result
in cascading failures due to further OOMs.  You're going to need test
cases that exercise the relevant codepaths, and a way to inject OOM
at each individual malloc, or across a sequence of mallocs.  This is
extraordinarily expensive to test, as it becomes a combinatorial
problem.

We've done such exhaustive malloc failure testing in libvirt before,
but it takes such a long time, and it is hard to characterize "correct"
output of the test suite.
This meant we caught obvious mistakes that led to SEGVs for the test,
but needed hand inspection to identify cases where we incorrectly
carried on executing with critical data missing due to the OOM.  It
has been a while since I last tried to do OOM testing of libvirt, so I
don't have high confidence in us doing something sensible.

The only thing in our favour is that we've designed our malloc API
replacements so that the pointer to allocated memory is returned to
the caller separately from the success/failure status.  Combined with
attribute((return_check)), this lets us get compile-time validation
that we are actually checking for malloc failures.  GLib's g_try_new
APIs don't allow such compile-time checking, as they still overload
the pointer with the success/failure status.

Regards,
Daniel

-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
* Re: [Qemu-devel] When it's okay to treat OOM as fatal?
From: Markus Armbruster @ 2018-10-18 13:06 UTC
To: Daniel P. Berrangé; +Cc: qemu-devel

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Tue, Oct 16, 2018 at 03:01:29PM +0200, Markus Armbruster wrote:
>> We sometimes use g_new() & friends, which abort() on OOM, and sometimes
>> g_try_new() & friends, which can fail, and therefore require error
>> handling.
>>
>> HACKING points out the difference, but is mum on when to use what:
>>
>> [...]
>>
>> Now, in my personal opinion, handling OOM gracefully is worth the
>> (commonly considerable) trouble when you're coding for an Apple II or
>> similar.  Anything that pages commonly becomes unusable long before
>> allocations fail.  Anything that overcommits will send you a (commonly
>> lethal) signal instead.
>> Anything that tries handling OOM gracefully,
>> and manages to dodge both these bullets somehow, will commonly get it
>> wrong and crash.
>
> FWIW, with the cgroups memory controller (with or without containers)
> you can be in an environment where there's a memory cap.  This can
> conceivably cause QEMU to see ENOMEM, while the host OS in general
> is operating normally with no swap usage / paging.
>
> That said, no one has ever been able to come up with an algorithm that
> reliably predicts the "normal" QEMU peak memory usage.  So any time the
> cgroups memory cap has been used, it has typically resulted in QEMU
> unreasonably aborting in normal operation.  This makes it impractical
> to try to confine QEMU's memory usage with cgroups, IMHO.
>
>> But others are entitled to their opinions as much as I am.  I just want
>> to know what our rules are, preferably in the form of a patch to
>> HACKING.
>
> I vaguely recall it being said that we should use g_try_new in code
> paths that can be triggered from monitor commands that would cause
> allocation of "significant" amounts of RAM, for some arbitrary
> definition of what "significant" means.
>
> eg hotplug a QXL PCI video card with 256 MB of video RAM, you might
> use g_try_new() for allocating this 256 MB chunk and return gracefully
> on failure, rather than the hotplug op causing QEMU to abort.

Funny you picked this example.  It happens to be one of the devices
that made me ask.

Device "qxl" creates a memory region "qxl.vgavram" with a size taken
from uint32_t property "ram_size", silently rounded up to the next
power of two.  It uses &error_fatal for error handling.

Let's play with it.

    $ upstream-qemu -monitor stdio -display none -device qxl,ram_size=2147483648
    QEMU 3.0.50 monitor - type 'help' for more information
    (qemu) info qtree
    bus: main-system-bus
    [...]
      dev: i440FX-pcihost, id ""
        pci-hole64-size = 2147483648 (2 GiB)
        short_root_bus = 0 (0x0)
        x-pci-hole64-fix = true
        bus: pci.0
          type PCI
          dev: qxl, id ""
    --->    ram_size = 2147483648 (0x80000000)
            vram_size = 67108864 (0x4000000)
    [...]

Happily allocates 2 GiB of RAM.  I could do this with a monitor command
(qxl is hot-pluggable), but I'm too lazy for that.

Adding another 26 of them for a total of 54 GiB also succeeds.  That's
more than this box has RAM and swap space combined.

Fun: scratch -display none, and Gtk starts spitting messages at seven
qxl devices, and SEGVs at eight.

Cherry on top:

    $ upstream-qemu -device qxl,ram_size=2147483649
    upstream-qemu: /home/armbru/work/qemu/exec.c:1891: find_ram_offset: Assertion `size != 0' failed.
    Aborted (core dumped)

My points are:

1. Even if we 'should use g_try_new in code paths that can be triggered
   from monitor commands that would cause allocation of "significant"
   amounts of RAM', we actually don't, at least not anywhere near
   consistently.

2. And even when we don't, that's not the actual problem, simply
   because allocation stubbornly refuses to fail.  Instead we die of
   other causes.

> The problem with OOM handling is proving that the cleanup paths you
> take actually do something sensible / correct, rather than result
> in cascading failures due to further OOMs.  You're going to need test
> cases that exercise the relevant codepaths, and a way to inject OOM
> at each individual malloc, or across a sequence of mallocs.  This is
> extraordinarily expensive to test as it becomes a combinatorial
> problem.

Exactly.

> We've done such exhaustive malloc failure testing in libvirt before
> but it takes such a long time and it is hard to characterize "correct"
> output of the test suite.  This meant we caught obvious mistakes that
> lead to SEGVs for the test, but needed hand inspection to identify
> cases where we incorrectly carried on executing with critical data
> missing due to the OOM.
> It has been a while since I last tried to do
> OOM testing of libvirt, so I don't have high confidence in us doing
> something sensible.

If "extraordinarily expensive" work results in low confidence, decaying
quickly to even lower confidence unless you expensively maintain it,
then it's a bad investment.

> The only thing in our favour is that we've designed
> our malloc API replacements so that the pointer to allocated memory is
> returned to the caller separately from the success/failure status.
> Combined with attribute((return_check)) this lets us get compile-time
> validation that we are actually checking for malloc failures.  GLib's
> g_try_new APIs don't allow such compile-time checking, as they still
> overload the pointer with the success/failure status.

Forcing error handling into existence is the easy part.  Making sure it
actually works is much, much harder.
* Re: [Qemu-devel] When it's okay to treat OOM as fatal?
From: Paolo Bonzini @ 2018-10-18 14:28 UTC
To: Markus Armbruster, Daniel P. Berrangé; +Cc: qemu-devel

On 18/10/2018 15:06, Markus Armbruster wrote:
> Device "qxl" creates a memory region "qxl.vgavram" with a size taken
> from uint32_t property "ram_size", silently rounded up to the next power
> of two.  It uses &error_fatal for error handling.

That's good to some extent---it means that the core code _is_ ready for
handling ENOMEM in this part of QEMU, it's just the device that doesn't
use it.

Paolo
* Re: [Qemu-devel] When it's okay to treat OOM as fatal?
From: Dr. David Alan Gilbert @ 2018-10-16 13:33 UTC
To: Markus Armbruster; +Cc: qemu-devel

* Markus Armbruster (armbru@redhat.com) wrote:
> We sometimes use g_new() & friends, which abort() on OOM, and sometimes
> g_try_new() & friends, which can fail, and therefore require error
> handling.
>
> HACKING points out the difference, but is mum on when to use what:
>
> [...]
>
> Now, in my personal opinion, handling OOM gracefully is worth the
> (commonly considerable) trouble when you're coding for an Apple II or
> similar.  Anything that pages commonly becomes unusable long before
> allocations fail.
That's not always my experience; I've seen cases where you suddenly
allocate a load more memory and hit OOM fairly quickly on that hot
process.  Most of the time on the desktop you're right.

> Anything that overcommits will send you a (commonly
> lethal) signal instead.  Anything that tries handling OOM gracefully,
> and manages to dodge both these bullets somehow, will commonly get it
> wrong and crash.

If your qemu has mapped its main memory from hugetlbfs or similar
pools, then we're looking at the other memory allocations; and that's
a bit of an interesting difference, where those other allocations
should be a lot smaller.

> But others are entitled to their opinions as much as I am.  I just want
> to know what our rules are, preferably in the form of a patch to
> HACKING.

My rule is to try not to break a happily running VM by some new
activity; I don't worry about it during startup.

So for example, I don't like it when starting a migration allocates
some more memory and kills the VM - the user had a happy stable VM
up to that point.  Migration gets the blame at this point.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] When it's okay to treat OOM as fatal?
From: Markus Armbruster @ 2018-10-18 14:46 UTC
To: Dr. David Alan Gilbert; +Cc: qemu-devel

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Markus Armbruster (armbru@redhat.com) wrote:
>> We sometimes use g_new() & friends, which abort() on OOM, and sometimes
>> g_try_new() & friends, which can fail, and therefore require error
>> handling.
>>
>> HACKING points out the difference, but is mum on when to use what:
>>
>> [...]
>>
>> Now, in my personal opinion, handling OOM gracefully is worth the
>> (commonly considerable) trouble when you're coding for an Apple II or
>> similar.  Anything that pages commonly becomes unusable long before
>> allocations fail.
>
> That's not always my experience; I've seen cases where you suddenly
> allocate a load more memory and hit OOM fairly quickly on that hot
> process.  Most of the time on the desktop you're right.
>
>> Anything that overcommits will send you a (commonly
>> lethal) signal instead.  Anything that tries handling OOM gracefully,
>> and manages to dodge both these bullets somehow, will commonly get it
>> wrong and crash.
>
> If your qemu has mapped its main memory from hugetlbfs or similar pools
> then we're looking at the other memory allocations; and that's a bit of
> an interesting difference where those other allocations should be a lot
> smaller.
>
>> But others are entitled to their opinions as much as I am.  I just want
>> to know what our rules are, preferably in the form of a patch to
>> HACKING.
>
> My rule is to try not to break a happily running VM by some new
> activity; I don't worry about it during startup.
>
> So for example, I don't like it when starting a migration allocates
> some more memory and kills the VM - the user had a happy stable VM
> up to that point.  Migration gets the blame at this point.

I don't doubt reliable OOM handling would be nice.  I do doubt it's
practical for an application like QEMU.
* Re: [Qemu-devel] When it's okay to treat OOM as fatal?
From: Dr. David Alan Gilbert @ 2018-10-18 14:54 UTC
To: Markus Armbruster; +Cc: qemu-devel

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>
> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> We sometimes use g_new() & friends, which abort() on OOM, and sometimes
> >> g_try_new() & friends, which can fail, and therefore require error
> >> handling.
> >>
> >> HACKING points out the difference, but is mum on when to use what:
> >>
> >> [...]
> >>
> >> Now, in my personal opinion, handling OOM gracefully is worth the
> >> (commonly considerable) trouble when you're coding for an Apple II or
> >> similar.
> >> Anything that pages commonly becomes unusable long before
> >> allocations fail.
> >
> > That's not always my experience; I've seen cases where you suddenly
> > allocate a load more memory and hit OOM fairly quickly on that hot
> > process.  Most of the time on the desktop you're right.
> >
> >> Anything that overcommits will send you a (commonly
> >> lethal) signal instead.  Anything that tries handling OOM gracefully,
> >> and manages to dodge both these bullets somehow, will commonly get it
> >> wrong and crash.
> >
> > If your qemu has mapped its main memory from hugetlbfs or similar pools
> > then we're looking at the other memory allocations; and that's a bit of
> > an interesting difference where those other allocations should be a lot
> > smaller.
> >
> >> But others are entitled to their opinions as much as I am.  I just want
> >> to know what our rules are, preferably in the form of a patch to
> >> HACKING.
> >
> > My rule is to try not to break a happily running VM by some new
> > activity; I don't worry about it during startup.
> >
> > So for example, I don't like it when starting a migration allocates
> > some more memory and kills the VM - the user had a happy stable VM
> > up to that point.  Migration gets the blame at this point.
>
> I don't doubt reliable OOM handling would be nice.  I do doubt it's
> practical for an application like QEMU.

Well, our use of glib certainly makes it much much harder.
I just try and make sure anywhere that I'm allocating a non-trivial
amount of memory (especially anything guest or user controlled) uses
the _try_ variants.  That should keep a lot of the larger allocations.

However, it scares me that we've got things that can return big chunks
of JSON for example, and I don't think they're being careful about it.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] When it's okay to treat OOM as fatal?
From: Markus Armbruster @ 2018-10-18 17:26 UTC
To: Dr. David Alan Gilbert; +Cc: qemu-devel

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Markus Armbruster (armbru@redhat.com) wrote:
>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>>
>> > * Markus Armbruster (armbru@redhat.com) wrote:
>> >> We sometimes use g_new() & friends, which abort() on OOM, and sometimes
>> >> g_try_new() & friends, which can fail, and therefore require error
>> >> handling.
>> >>
>> >> HACKING points out the difference, but is mum on when to use what:
>> >>
>> >> [...]
>> >>
>> >> Now, in my personal opinion, handling OOM gracefully is worth the
>> >> (commonly considerable) trouble when you're coding for an Apple II or
>> >> similar.  Anything that pages commonly becomes unusable long before
>> >> allocations fail.
>> >
>> > That's not always my experience; I've seen cases where you suddenly
>> > allocate a load more memory and hit OOM fairly quickly on that hot
>> > process.  Most of the time on the desktop you're right.
>> >
>> >> Anything that overcommits will send you a (commonly
>> >> lethal) signal instead.  Anything that tries handling OOM gracefully,
>> >> and manages to dodge both these bullets somehow, will commonly get it
>> >> wrong and crash.
>> >
>> > If your qemu has mapped its main memory from hugetlbfs or similar pools
>> > then we're looking at the other memory allocations; and that's a bit of
>> > an interesting difference where those other allocations should be a lot
>> > smaller.
>> >
>> >> But others are entitled to their opinions as much as I am.  I just want
>> >> to know what our rules are, preferably in the form of a patch to
>> >> HACKING.
>> >
>> > My rule is to try not to break a happily running VM by some new
>> > activity; I don't worry about it during startup.
>> >
>> > So for example, I don't like it when starting a migration allocates
>> > some more memory and kills the VM - the user had a happy stable VM
>> > up to that point.  Migration gets the blame at this point.
>>
>> I don't doubt reliable OOM handling would be nice.  I do doubt it's
>> practical for an application like QEMU.
>
> Well, our use of glib certainly makes it much much harder.
> I just try and make sure anywhere that I'm allocating a non-trivial
> amount of memory (especially anything guest or user controlled) uses
> the _try_ variants.  That should keep a lot of the larger allocations.
Matters only when your g_try_new()s actually fail (which they won't, at
least not reliably), and your error paths actually work (which they
won't unless you test them, no offense).

> However, it scares me that we've got things that can return big chunks
> of JSON for example, and I don't think they're being careful about it.

We've got countless allocations small and large (large as in gigabytes)
that kill QEMU on OOM.  Some of the small allocations add up to
megabytes (QObjects for JSON work, for example).

Yet the *practical* problem isn't lack of graceful handling when these
allocations fail.  Because they pretty much don't.

The practical problem I see is general confusion on what to do about
OOM.  There's no written guidance.  Vague rules of thumb on when to
handle OOM are floating around.  Code gets copied.  Unsurprisingly,
OOM handling is a haphazard affair.

In this state, whatever OOM handling we have is too unreliable to be
worth much, since it can only help when (1) allocations actually fail
(they generally don't), and (2) the allocation that fails is actually
handled (they generally aren't), and (3) the handling actually works
(we don't test OOM, so it generally doesn't).

For the sake of the argument, let's assume there's a practical way to
run QEMU so that memory allocations actually fail.  We then still need
to find a way to increase the probability that failed allocations are
actually handled, and that the error handling actually works, both to
a useful level.  This will require rules on OOM handling, a strategy
to make them stick, a strategy to test OOM, and resources to implement
all that.

Will the benefits be worth the effort?  Arguing about that in the
near-total vacuum we have now is unlikely to be productive.  To ground
the debate at least somewhat, I'd like those of us in favour of OOM
handling to propose a first draft of OOM handling rules.  If we can't
do even that, I'll be tempted to shoot down OOM handling in patches to
code I maintain.
* Re: [Qemu-devel] When it's okay to treat OOM as fatal?
From: Dr. David Alan Gilbert @ 2018-10-18 18:01 UTC
To: Markus Armbruster; +Cc: qemu-devel

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>
> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> >>
> >> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> >> We sometimes use g_new() & friends, which abort() on OOM, and sometimes
> >> >> g_try_new() & friends, which can fail, and therefore require error
> >> >> handling.
> >> >>
> >> >> HACKING points out the difference, but is mum on when to use what:
> >> >>
> >> >> [...]
> >> >> > >> >> Now, in my personal opinion, handling OOM gracefully is worth the > >> >> (commonly considerable) trouble when you're coding for an Apple II or > >> >> similar. Anything that pages commonly becomes unusable long before > >> >> allocations fail. > >> > > >> > That's not always my experience; I've seen cases where you suddenly > >> > allocate a load more memory and hit OOM fairly quickly on that hot > >> > process. Most of the time on the desktop you're right. > >> > > >> >> Anything that overcommits will send you a (commonly > >> >> lethal) signal instead. Anything that tries handling OOM gracefully, > >> >> and manages to dodge both these bullets somehow, will commonly get it > >> >> wrong and crash. > >> > > >> > If your qemu has maped it's main memory from hugetlbfs or similar pools > >> > then we're looking at the other memory allocations; and that's a bit of > >> > an interesting difference where those other allocations should be a lot > >> > smaller. > >> > > >> >> But others are entitled to their opinions as much as I am. I just want > >> >> to know what our rules are, preferably in the form of a patch to > >> >> HACKING. > >> > > >> > My rule is to try not to break a happily running VM by some new > >> > activity; I don't worry about it during startup. > >> > > >> > So for example, I don't like it when starting a migration, allocates > >> > some more memory and kills the VM - the user had a happy stable VM > >> > upto that point. Migration gets the blame at this point. > >> > >> I don't doubt reliable OOM handling would be nice. I do doubt it's > >> practical for an application like QEMU. > > > > Well, our use of glib certainly makes it much much harder. > > I just try and make sure anywhere that I'm allocating a non-trivial > > amount of memory (especially anything guest or user controlled) uses > > the _try_ variants. That should keep a lot of the larger allocations. 
> > Matters only when your g_try_new()s actually fail (which they won't, at > least not reliably), and your error paths actually work (which they > won't unless you test them, no offense). > > > However, it scares me that we've got things that can return big chunks > > of JSON for example, and I don't think they're being careful about it. > > We got countless allocations small and large (large as in Gigabytes) > that kill QEMU on OOM. Some of the small allocations add up to > Megabytes (QObjects for JSON work, for example). > > Yet the *practical* problem isn't lack of graceful handling when these > allocations fail. Because they pretty much don't. > > The practical problem I see is general confusion on what to do about > OOM. There's no written guidance. Vague rules of thumb on when to > handle OOM are floating around. Code gets copied. Unsurprisingly, OOM > handling is a haphazard affair. > In this state, whatever OOM handling we have is too unreliable to be > worth much, since it can only help when (1) allocations actually fail > (they generally don't), and (2) the allocation that fails is actually > handled (they generally aren't), and (3) the handling actually works (we > don't test OOM, so it generally doesn't). > > For the sake of the argument, let's assume there's a practical way to > run QEMU so that memory allocations actually fail. We then still need > to find a way to increase the probability for failed allocations to be > actually handled, and the probability for the error handling to actually > work, both to a useful level. This will require rules on OOM handling, > a strategy to make them stick, a strategy to test OOM, and resources to > implement all that. There's probably no way to guarantee we've got all paths, however we can test in restricted memory environments. For example we could set up a test environment that runs a series of hotplug or migration tests (say avocado or something) in cgroups or nested VMs with random reduced amounts of RAM. 
These will blow up spectacularly and we can slowly attack some of the more common paths. If we can find common cases then perhaps we can identify things to use static checkers for. We can also try setting up tests in environments closer to the way OpenStack and oVirt configure their hosts; they seem to jump through hoops to get a feeling for how much spare memory to allocate, but of course since we don't define how much we use they can't really do that. Using mlock would probably make the allocations more likely to fail rather than fault later? > Will the benefits be worth the effort? Arguing about that in the > near-total vacuum we have now is unlikely to be productive. To ground > the debate at least somewhat, I'd like those of us in favour of OOM > handling to propose a first draft of OOM handling rules. Well, I'm up to give it a go; but before I do, can you define a bit more what you want: firstly, what do you define as 'OOM handling', and secondly, what level of rules do you want? > If we can't do even that, I'll be tempted to shoot down OOM handling in > patches to code I maintain. Please, please don't do that; getting it right in the monitor path and QMP is important for those cases where we generate big chunks of JSON (it would be better if we didn't generate big chunks of JSON, but that's a partially separate problem). Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] When it's okay to treat OOM as fatal? 2018-10-18 18:01 ` Dr. David Alan Gilbert @ 2018-10-19 5:43 ` Markus Armbruster 2018-10-19 10:07 ` Dr. David Alan Gilbert 2018-10-22 13:40 ` Dr. David Alan Gilbert 0 siblings, 2 replies; 13+ messages in thread From: Markus Armbruster @ 2018-10-19 5:43 UTC (permalink / raw) To: Dr. David Alan Gilbert; +Cc: Markus Armbruster, qemu-devel "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: > * Markus Armbruster (armbru@redhat.com) wrote: >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: >> >> > * Markus Armbruster (armbru@redhat.com) wrote: >> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: >> >> >> >> > * Markus Armbruster (armbru@redhat.com) wrote: >> >> >> We sometimes use g_new() & friends, which abort() on OOM, and sometimes >> >> >> g_try_new() & friends, which can fail, and therefore require error >> >> >> handling. >> >> >> >> >> >> HACKING points out the difference, but is mum on when to use what: >> >> >> >> >> >> 3. Low level memory management >> >> >> >> >> >> Use of the malloc/free/realloc/calloc/valloc/memalign/posix_memalign >> >> >> APIs is not allowed in the QEMU codebase. Instead of these routines, >> >> >> use the GLib memory allocation routines g_malloc/g_malloc0/g_new/ >> >> >> g_new0/g_realloc/g_free or QEMU's qemu_memalign/qemu_blockalign/qemu_vfree >> >> >> APIs. >> >> >> >> >> >> Please note that g_malloc will exit on allocation failure, so there >> >> >> is no need to test for failure (as you would have to with malloc). >> >> >> Calling g_malloc with a zero size is valid and will return NULL. >> >> >> >> >> >> Prefer g_new(T, n) instead of g_malloc(sizeof(T) * n) for the following >> >> >> reasons: >> >> >> >> >> >> a. It catches multiplication overflowing size_t; >> >> >> b. It returns T * instead of void *, letting compiler catch more type >> >> >> errors. >> >> >> >> >> >> Declarations like T *v = g_malloc(sizeof(*v)) are acceptable, though. 
>> >> >> >> >> >> Memory allocated by qemu_memalign or qemu_blockalign must be freed with >> >> >> qemu_vfree, since breaking this will cause problems on Win32. >> >> >> >> >> >> Now, in my personal opinion, handling OOM gracefully is worth the >> >> >> (commonly considerable) trouble when you're coding for an Apple II or >> >> >> similar. Anything that pages commonly becomes unusable long before >> >> >> allocations fail. >> >> > >> >> > That's not always my experience; I've seen cases where you suddenly >> >> > allocate a load more memory and hit OOM fairly quickly on that hot >> >> > process. Most of the time on the desktop you're right. >> >> > >> >> >> Anything that overcommits will send you a (commonly >> >> >> lethal) signal instead. Anything that tries handling OOM gracefully, >> >> >> and manages to dodge both these bullets somehow, will commonly get it >> >> >> wrong and crash. >> >> > >> >> > If your qemu has maped it's main memory from hugetlbfs or similar pools >> >> > then we're looking at the other memory allocations; and that's a bit of >> >> > an interesting difference where those other allocations should be a lot >> >> > smaller. >> >> > >> >> >> But others are entitled to their opinions as much as I am. I just want >> >> >> to know what our rules are, preferably in the form of a patch to >> >> >> HACKING. >> >> > >> >> > My rule is to try not to break a happily running VM by some new >> >> > activity; I don't worry about it during startup. >> >> > >> >> > So for example, I don't like it when starting a migration, allocates >> >> > some more memory and kills the VM - the user had a happy stable VM >> >> > upto that point. Migration gets the blame at this point. >> >> >> >> I don't doubt reliable OOM handling would be nice. I do doubt it's >> >> practical for an application like QEMU. >> > >> > Well, our use of glib certainly makes it much much harder. 
>> > I just try and make sure anywhere that I'm allocating a non-trivial >> > amount of memory (especially anything guest or user controlled) uses >> > the _try_ variants. That should keep a lot of the larger allocations. >> >> Matters only when your g_try_new()s actually fail (which they won't, at >> least not reliably), and your error paths actually work (which they >> won't unless you test them, no offense). >> >> > However, it scares me that we've got things that can return big chunks >> > of JSON for example, and I don't think they're being careful about it. >> >> We got countless allocations small and large (large as in Gigabytes) >> that kill QEMU on OOM. Some of the small allocations add up to >> Megabytes (QObjects for JSON work, for example). >> >> Yet the *practical* problem isn't lack of graceful handling when these >> allocations fail. Because they pretty much don't. >> >> The practical problem I see is general confusion on what to do about >> OOM. There's no written guidance. Vague rules of thumb on when to >> handle OOM are floating around. Code gets copied. Unsurprisingly, OOM >> handling is a haphazard affair. > >> In this state, whatever OOM handling we have is too unreliable to be >> worth much, since it can only help when (1) allocations actually fail >> (they generally don't), and (2) the allocation that fails is actually >> handled (they generally aren't), and (3) the handling actually works (we >> don't test OOM, so it generally doesn't). >> >> For the sake of the argument, let's assume there's a practical way to >> run QEMU so that memory allocations actually fail. We then still need >> to find a way to increase the probability for failed allocations to be >> actually handled, and the probability for the error handling to actually >> work, both to a useful level. This will require rules on OOM handling, >> a strategy to make them stick, a strategy to test OOM, and resources to >> implement all that. 
> > There's probably no way to guarantee we've got all paths, however we > can test in restricted memory environments. > For example we could set up a test environment that runs a series of > hotplug or migration tests (say avocado or something) in cgroups > or nested VMs with random reduced amounts of RAM. These will blow up > spectacularly and we can slowly attack some of the more common paths. There's also fault injection. It's more targeted. Bonus: it lets you fail only the allocations you deem likely to fail, i.e. keep the unchecked ones working ;-P > If we can find common cases then perhaps we can identify things to use > static checkers for. > > We can also try setting up tests in environments closer to the way > OpenStack and oVirt configure they're hosts; they seem to jump through > hoops to get a feeling of how much spare memory to allocate, but of > course since we don't define how much we use they can't really do that. > > Using mlock would probably make the allocations more likely to > fail rather than fault later? "More likely" as in "at all likely". Without a solution here, all the other work is on unreachable code. My box lets me allocate Gigabytes of memory I don't have. In case you find my example involving -device qxl is too opaque, I append a test program. It successfully allocates one Terabyte in 1024 one-Gigabyte chunks for me. It behaves exactly the same with g_malloc() instead of malloc(). The "normal" way to disable memory overcommit is /proc/sys/vm/overcommit_memory, but it's system-wide, and requires root. That's a big hammer. A more precise tool could be more useful. To actually matter, we need a tool that libvirt can apply to production VMs. >> Will the benefits be worth the effort? Arguing about that in the >> near-total vacuum we have now is unlikely to be productive. To ground >> the debate at least somewhat, I'd like those of us in favour of OOM >> handling to propose a first draft of OOM handling rules.
> > Well, I'm up to give it a go; but before I do, can you define a bit more > what you want. Firstly what do you define as 'OOM handling' and secondly > what type of level of rules do you want. 0. OOM is failure to allocate a chunk of memory. 1. When is it okay to terminate the process on OOM? Write down rules that let people decide whether a given allocation needs to be handled gracefully. If your rules involve small vs. large allocations, then make sure to define "small". If your rules involve "in response to untrusted input", spell that out. If your rules involve "in response to trusted input (think QMP)", spell that out. Also: exit() or abort()? If I remember correctly, GLib aborts. 2. How to handle OOM gracefully [skip for first draft] The usual: revert the function's side effects, return failure to caller, repeat for caller until reaching the caller that consumes the error. 3. Coding conventions [definitely skip for first draft] This is part of the "strategy to make the rules stick". >> If we can't do even that, I'll be tempted to shoot down OOM handling in >> patches to code I maintain. > > Please please don't do that; getting it right in the monitor path and > QMP is import for those cases where we generate big chunks of JSON > (it would be better if we didn't generate big chunks of JSON, but that's > a partially separate problem). As long as allocations don't fail, this is all mental masturbation (pardon my French).

#include <stdlib.h>
#include <stdio.h>

int
main(void)
{
    size_t GiB = 1024 * 1024 * 1024;
    int i;
    void *p;

    for (i = 0; i < 1024; i++) {
        printf("%d\n", i);
        p = malloc(GiB);
        if (!p) {
            printf("OOM\n");
            break;
        }
    }
    return 0;
}
* Re: [Qemu-devel] When it's okay to treat OOM as fatal? 2018-10-19 5:43 ` Markus Armbruster @ 2018-10-19 10:07 ` Dr. David Alan Gilbert 2018-10-22 13:40 ` Dr. David Alan Gilbert 1 sibling, 0 replies; 13+ messages in thread From: Dr. David Alan Gilbert @ 2018-10-19 10:07 UTC (permalink / raw) To: Markus Armbruster; +Cc: qemu-devel * Markus Armbruster (armbru@redhat.com) wrote: > "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: > > > * Markus Armbruster (armbru@redhat.com) wrote: > >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: > >> > >> > * Markus Armbruster (armbru@redhat.com) wrote: > >> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: > >> >> > >> >> > * Markus Armbruster (armbru@redhat.com) wrote: > >> >> >> We sometimes use g_new() & friends, which abort() on OOM, and sometimes > >> >> >> g_try_new() & friends, which can fail, and therefore require error > >> >> >> handling. > >> >> >> > >> >> >> HACKING points out the difference, but is mum on when to use what: > >> >> >> > >> >> >> 3. Low level memory management > >> >> >> > >> >> >> Use of the malloc/free/realloc/calloc/valloc/memalign/posix_memalign > >> >> >> APIs is not allowed in the QEMU codebase. Instead of these routines, > >> >> >> use the GLib memory allocation routines g_malloc/g_malloc0/g_new/ > >> >> >> g_new0/g_realloc/g_free or QEMU's qemu_memalign/qemu_blockalign/qemu_vfree > >> >> >> APIs. > >> >> >> > >> >> >> Please note that g_malloc will exit on allocation failure, so there > >> >> >> is no need to test for failure (as you would have to with malloc). > >> >> >> Calling g_malloc with a zero size is valid and will return NULL. > >> >> >> > >> >> >> Prefer g_new(T, n) instead of g_malloc(sizeof(T) * n) for the following > >> >> >> reasons: > >> >> >> > >> >> >> a. It catches multiplication overflowing size_t; > >> >> >> b. It returns T * instead of void *, letting compiler catch more type > >> >> >> errors. 
> >> >> >> > >> >> >> Declarations like T *v = g_malloc(sizeof(*v)) are acceptable, though. > >> >> >> > >> >> >> Memory allocated by qemu_memalign or qemu_blockalign must be freed with > >> >> >> qemu_vfree, since breaking this will cause problems on Win32. > >> >> >> > >> >> >> Now, in my personal opinion, handling OOM gracefully is worth the > >> >> >> (commonly considerable) trouble when you're coding for an Apple II or > >> >> >> similar. Anything that pages commonly becomes unusable long before > >> >> >> allocations fail. > >> >> > > >> >> > That's not always my experience; I've seen cases where you suddenly > >> >> > allocate a load more memory and hit OOM fairly quickly on that hot > >> >> > process. Most of the time on the desktop you're right. > >> >> > > >> >> >> Anything that overcommits will send you a (commonly > >> >> >> lethal) signal instead. Anything that tries handling OOM gracefully, > >> >> >> and manages to dodge both these bullets somehow, will commonly get it > >> >> >> wrong and crash. > >> >> > > >> >> > If your qemu has maped it's main memory from hugetlbfs or similar pools > >> >> > then we're looking at the other memory allocations; and that's a bit of > >> >> > an interesting difference where those other allocations should be a lot > >> >> > smaller. > >> >> > > >> >> >> But others are entitled to their opinions as much as I am. I just want > >> >> >> to know what our rules are, preferably in the form of a patch to > >> >> >> HACKING. > >> >> > > >> >> > My rule is to try not to break a happily running VM by some new > >> >> > activity; I don't worry about it during startup. > >> >> > > >> >> > So for example, I don't like it when starting a migration, allocates > >> >> > some more memory and kills the VM - the user had a happy stable VM > >> >> > upto that point. Migration gets the blame at this point. > >> >> > >> >> I don't doubt reliable OOM handling would be nice. I do doubt it's > >> >> practical for an application like QEMU. 
> >> > > >> > Well, our use of glib certainly makes it much much harder. > >> > I just try and make sure anywhere that I'm allocating a non-trivial > >> > amount of memory (especially anything guest or user controlled) uses > >> > the _try_ variants. That should keep a lot of the larger allocations. > >> > >> Matters only when your g_try_new()s actually fail (which they won't, at > >> least not reliably), and your error paths actually work (which they > >> won't unless you test them, no offense). > >> > >> > However, it scares me that we've got things that can return big chunks > >> > of JSON for example, and I don't think they're being careful about it. > >> > >> We got countless allocations small and large (large as in Gigabytes) > >> that kill QEMU on OOM. Some of the small allocations add up to > >> Megabytes (QObjects for JSON work, for example). > >> > >> Yet the *practical* problem isn't lack of graceful handling when these > >> allocations fail. Because they pretty much don't. > >> > >> The practical problem I see is general confusion on what to do about > >> OOM. There's no written guidance. Vague rules of thumb on when to > >> handle OOM are floating around. Code gets copied. Unsurprisingly, OOM > >> handling is a haphazard affair. > > > >> In this state, whatever OOM handling we have is too unreliable to be > >> worth much, since it can only help when (1) allocations actually fail > >> (they generally don't), and (2) the allocation that fails is actually > >> handled (they generally aren't), and (3) the handling actually works (we > >> don't test OOM, so it generally doesn't). > >> > >> For the sake of the argument, let's assume there's a practical way to > >> run QEMU so that memory allocations actually fail. We then still need > >> to find a way to increase the probability for failed allocations to be > >> actually handled, and the probability for the error handling to actually > >> work, both to a useful level. 
This will require rules on OOM handling, > >> a strategy to make them stick, a strategy to test OOM, and resources to > >> implement all that. > > > > There's probably no way to guarantee we've got all paths, however we > > can test in restricted memory environments. > > For example we could set up a test environment that runs a series of > > hotplug or migration tests (say avocado or something) in cgroups > > or nested VMs with random reduced amounts of RAM. These will blow up > > spectacularly and we can slowly attack some of the more common paths. > > There's also fault injection. It's more targeted. Bonus: it lets you > make only the allocations fail you deem likely to fail, i.e. keep the > unchecked ones working ;-P Yes, although I'm worrying more about the paths we forgot about, so I like the idea of random testing to find those. > > If we can find common cases then perhaps we can identify things to use > > static checkers for. > > > > We can also try setting up tests in environments closer to the way > > OpenStack and oVirt configure they're hosts; they seem to jump through > > hoops to get a feeling of how much spare memory to allocate, but of > > course since we don't define how much we use they can't really do that. > > > > Using mlock would probably make the allocations more likely to > > fail rather than fault later? > > "More likely" as in "at all likely". Without a solution here, all the > other work is on unreachable code. My box lets me allocate Gigabytes of > memory I don't have. In case you find my example involving -device qxl > is too opaque, I append a test program. It successfully allocates one > Terabyte in 1024 Gigabyte chunks for me. It behaves exactly the same > with g_malloc() instead of malloc(). Yes, it's tricky. 
The only way I got this to work was ulimit -v 4000000, and then your test prints OOM and the qxl test prints: -device qxl,ram_size=2147483648: cannot set up guest memory 'qxl.vgavram': Cannot allocate memory > The "normal" way to disable memory overcommit is > /proc/sys/vm/overcommit_memory, but it's system-wide, and requires root. > That's a big hammer. A more precise tool could be more useful. To > actually matter, we need a tool that libvirt can apply to production > VMs. I had a play with some other things that I thought should work but couldn't persuade them to, and I wonder why. I thought ulimit -l together with qemu with -realtime mlock=on would work, but it didn't seem to - that one really worries me. I thought cgroup limits would work, but again they didn't seem to. libvirt can set up both the mlock ulimit and some cgroup limits. While overcommit_memory is a big hammer, it's not necessarily a problem having a root-only hammer, because that still solves the problem for dedicated hypervisor machines that are common in both OpenStack and oVirt. I guess we should also take a step back and worry about whether this is a Linux-ism. > >> Will the benefits be worth the effort? Arguing about that in the > >> near-total vacuum we have now is unlikely to be productive. To ground > >> the debate at least somewhat, I'd like those of us in favour of OOM > >> handling to propose a first draft of OOM handling rules. > > > > Well, I'm up to give it a go; but before I do, can you define a bit more > > what you want. Firstly what do you define as 'OOM handling' and secondly > > what type of level of rules do you want. > > 0. OOM is failure to allocate a chunk of memory. > > 1. When is it okay to terminate the process on OOM? > > Write down rules that let people decide whether a given allocation > needs to be handled gracefully. > > If your rules involve small vs. large allocations, then make sure to > define "small".
> > If your rules involve "in response to untrusted input", spell that > out. > > If your rules involve "in response to trusted input (think QMP)", > spell that out. > > Also: exit() or abort()? If I remember correctly, GLib aborts. > > 2. How to handle OOM gracefully [skip for first draft] > > The usual: revert the functions side effects, return failure to > caller, repeat for caller until reaching the caller that consumes the > error. > > 3. Coding conventions [definitely skip for first draft] > > This is part of the "strategy to make the rules stick". OK, that's reasonable (although I might try and avoid the use of OOM as a name, because people think of the kernel OOM killer, and that confuses it with the thing we'd try and avoid). > >> If we can't do even that, I'll be tempted to shoot down OOM handling in > >> patches to code I maintain. > > > > Please please don't do that; getting it right in the monitor path and > > QMP is import for those cases where we generate big chunks of JSON > > (it would be better if we didn't generate big chunks of JSON, but that's > > a partially separate problem). > > As long as allocations don't fail, this is all mental masturbation > (pardon my french). No need to blame the French. Dave

> #include <stdlib.h>
> #include <stdio.h>
>
> int
> main(void)
> {
>     size_t GiB = 1024 * 1024 * 1024;
>     int i;
>     void *p;
>
>     for (i = 0; i < 1024; i++) {
>         printf("%d\n", i);
>         p = malloc(GiB);
>         if (!p) {
>             printf("OOM\n");
>             break;
>         }
>     }
>     return 0;
> }

-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] When it's okay to treat OOM as fatal? 2018-10-19 5:43 ` Markus Armbruster 2018-10-19 10:07 ` Dr. David Alan Gilbert @ 2018-10-22 13:40 ` Dr. David Alan Gilbert 1 sibling, 0 replies; 13+ messages in thread From: Dr. David Alan Gilbert @ 2018-10-22 13:40 UTC (permalink / raw) To: Markus Armbruster; +Cc: qemu-devel * Markus Armbruster (armbru@redhat.com) wrote: > "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: > > > * Markus Armbruster (armbru@redhat.com) wrote: > >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: > >> > >> > * Markus Armbruster (armbru@redhat.com) wrote: > >> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: > >> >> > >> >> > * Markus Armbruster (armbru@redhat.com) wrote: > >> >> >> We sometimes use g_new() & friends, which abort() on OOM, and sometimes > >> >> >> g_try_new() & friends, which can fail, and therefore require error > >> >> >> handling. > >> >> >> > >> >> >> HACKING points out the difference, but is mum on when to use what: > >> >> >> > >> >> >> 3. Low level memory management > >> >> >> > >> >> >> Use of the malloc/free/realloc/calloc/valloc/memalign/posix_memalign > >> >> >> APIs is not allowed in the QEMU codebase. Instead of these routines, > >> >> >> use the GLib memory allocation routines g_malloc/g_malloc0/g_new/ > >> >> >> g_new0/g_realloc/g_free or QEMU's qemu_memalign/qemu_blockalign/qemu_vfree > >> >> >> APIs. > >> >> >> > >> >> >> Please note that g_malloc will exit on allocation failure, so there > >> >> >> is no need to test for failure (as you would have to with malloc). > >> >> >> Calling g_malloc with a zero size is valid and will return NULL. > >> >> >> > >> >> >> Prefer g_new(T, n) instead of g_malloc(sizeof(T) * n) for the following > >> >> >> reasons: > >> >> >> > >> >> >> a. It catches multiplication overflowing size_t; > >> >> >> b. It returns T * instead of void *, letting compiler catch more type > >> >> >> errors. 
> >> >> >> > >> >> >> Declarations like T *v = g_malloc(sizeof(*v)) are acceptable, though. > >> >> >> > >> >> >> Memory allocated by qemu_memalign or qemu_blockalign must be freed with > >> >> >> qemu_vfree, since breaking this will cause problems on Win32. > >> >> >> > >> >> >> Now, in my personal opinion, handling OOM gracefully is worth the > >> >> >> (commonly considerable) trouble when you're coding for an Apple II or > >> >> >> similar. Anything that pages commonly becomes unusable long before > >> >> >> allocations fail. > >> >> > > >> >> > That's not always my experience; I've seen cases where you suddenly > >> >> > allocate a load more memory and hit OOM fairly quickly on that hot > >> >> > process. Most of the time on the desktop you're right. > >> >> > > >> >> >> Anything that overcommits will send you a (commonly > >> >> >> lethal) signal instead. Anything that tries handling OOM gracefully, > >> >> >> and manages to dodge both these bullets somehow, will commonly get it > >> >> >> wrong and crash. > >> >> > > >> >> > If your qemu has maped it's main memory from hugetlbfs or similar pools > >> >> > then we're looking at the other memory allocations; and that's a bit of > >> >> > an interesting difference where those other allocations should be a lot > >> >> > smaller. > >> >> > > >> >> >> But others are entitled to their opinions as much as I am. I just want > >> >> >> to know what our rules are, preferably in the form of a patch to > >> >> >> HACKING. > >> >> > > >> >> > My rule is to try not to break a happily running VM by some new > >> >> > activity; I don't worry about it during startup. > >> >> > > >> >> > So for example, I don't like it when starting a migration, allocates > >> >> > some more memory and kills the VM - the user had a happy stable VM > >> >> > upto that point. Migration gets the blame at this point. > >> >> > >> >> I don't doubt reliable OOM handling would be nice. I do doubt it's > >> >> practical for an application like QEMU. 
> >> > > >> > Well, our use of glib certainly makes it much much harder. > >> > I just try and make sure anywhere that I'm allocating a non-trivial > >> > amount of memory (especially anything guest or user controlled) uses > >> > the _try_ variants. That should keep a lot of the larger allocations. > >> > >> Matters only when your g_try_new()s actually fail (which they won't, at > >> least not reliably), and your error paths actually work (which they > >> won't unless you test them, no offense). > >> > >> > However, it scares me that we've got things that can return big chunks > >> > of JSON for example, and I don't think they're being careful about it. > >> > >> We got countless allocations small and large (large as in Gigabytes) > >> that kill QEMU on OOM. Some of the small allocations add up to > >> Megabytes (QObjects for JSON work, for example). > >> > >> Yet the *practical* problem isn't lack of graceful handling when these > >> allocations fail. Because they pretty much don't. > >> > >> The practical problem I see is general confusion on what to do about > >> OOM. There's no written guidance. Vague rules of thumb on when to > >> handle OOM are floating around. Code gets copied. Unsurprisingly, OOM > >> handling is a haphazard affair. > > > >> In this state, whatever OOM handling we have is too unreliable to be > >> worth much, since it can only help when (1) allocations actually fail > >> (they generally don't), and (2) the allocation that fails is actually > >> handled (they generally aren't), and (3) the handling actually works (we > >> don't test OOM, so it generally doesn't). > >> > >> For the sake of the argument, let's assume there's a practical way to > >> run QEMU so that memory allocations actually fail. We then still need > >> to find a way to increase the probability for failed allocations to be > >> actually handled, and the probability for the error handling to actually > >> work, both to a useful level. 
> >> This will require rules on OOM handling,
> >> a strategy to make them stick, a strategy to test OOM, and resources to
> >> implement all that.
> >
> > There's probably no way to guarantee we've got all paths, however we
> > can test in restricted memory environments.
> > For example we could set up a test environment that runs a series of
> > hotplug or migration tests (say avocado or something) in cgroups
> > or nested VMs with randomly reduced amounts of RAM. These will blow up
> > spectacularly and we can slowly attack some of the more common paths.
>
> There's also fault injection. It's more targeted. Bonus: it lets you
> make only the allocations fail that you deem likely to fail, i.e. keep
> the unchecked ones working ;-P
>
> > If we can find common cases then perhaps we can identify things to use
> > static checkers for.
> >
> > We can also try setting up tests in environments closer to the way
> > OpenStack and oVirt configure their hosts; they seem to jump through
> > hoops to get a feeling for how much spare memory to allocate, but of
> > course since we don't define how much we use, they can't really do that.
> >
> > Using mlock would probably make the allocations more likely to
> > fail rather than fault later?
>
> "More likely" as in "at all likely". Without a solution here, all the
> other work is on unreachable code. My box lets me allocate gigabytes of
> memory I don't have. In case you find my example involving -device qxl
> too opaque, I append a test program. It successfully allocates one
> terabyte in 1024 gigabyte chunks for me. It behaves exactly the same
> with g_malloc() instead of malloc().
>
> The "normal" way to disable memory overcommit is
> /proc/sys/vm/overcommit_memory, but it's system-wide, and requires root.
> That's a big hammer. A more precise tool could be more useful. To
> actually matter, we need a tool that libvirt can apply to production
> VMs.
>
> >> Will the benefits be worth the effort?
> >> Arguing about that in the
> >> near-total vacuum we have now is unlikely to be productive. To ground
> >> the debate at least somewhat, I'd like those of us in favour of OOM
> >> handling to propose a first draft of OOM handling rules.
> >
> > Well, I'm up to give it a go; but before I do, can you define a bit more
> > what you want. Firstly, what do you define as 'OOM handling', and
> > secondly, what level of rules do you want.
>
> 0. OOM is failure to allocate a chunk of memory.
>
> 1. When is it okay to terminate the process on OOM?
>
>    Write down rules that let people decide whether a given allocation
>    needs to be handled gracefully.
>
>    If your rules involve small vs. large allocations, then make sure to
>    define "small".
>
>    If your rules involve "in response to untrusted input", spell that
>    out.
>
>    If your rules involve "in response to trusted input (think QMP)",
>    spell that out.
>
>    Also: exit() or abort()? If I remember correctly, GLib aborts.
>
> 2. How to handle OOM gracefully [skip for first draft]
>
>    The usual: revert the function's side effects, return failure to the
>    caller, repeat for each caller until reaching the caller that
>    consumes the error.
>
> 3. Coding conventions [definitely skip for first draft]
>
>    This is part of the "strategy to make the rules stick".

OK, how about this as a strawman:

https://wiki.qemu.org/Features/AllocationFailures

Dave

> >> If we can't do even that, I'll be tempted to shoot down OOM handling in
> >> patches to code I maintain.
> >
> > Please please don't do that; getting it right in the monitor path and
> > QMP is important for those cases where we generate big chunks of JSON
> > (it would be better if we didn't generate big chunks of JSON, but that's
> > a partially separate problem).
>
> As long as allocations don't fail, this is all mental masturbation
> (pardon my french).
>
> #include <stdlib.h>
> #include <stdio.h>
>
> int
> main(void)
> {
>     size_t GiB = 1024 * 1024 * 1024;
>     int i;
>     void *p;
>
>     for (i = 0; i < 1024; i++) {
>         printf("%d\n", i);
>         p = malloc(GiB);
>         if (!p) {
>             printf("OOM\n");
>             break;
>         }
>     }
>     return 0;
> }

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] When it's okay to treat OOM as fatal?
  2018-10-16 13:01 [Qemu-devel] When it's okay to treat OOM as fatal? Markus Armbruster
  2018-10-16 13:20 ` Daniel P. Berrangé
  2018-10-16 13:33 ` Dr. David Alan Gilbert
@ 2018-10-17 10:05 ` Stefan Hajnoczi
  2 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2018-10-17 10:05 UTC (permalink / raw)
To: Markus Armbruster; +Cc: qemu-devel

On Tue, Oct 16, 2018 at 03:01:29PM +0200, Markus Armbruster wrote:
> Anything that pages commonly becomes unusable long before
> allocations fail. Anything that overcommits will send you a (commonly
> lethal) signal instead. Anything that tries handling OOM gracefully,
> and manages to dodge both these bullets somehow, will commonly get it
> wrong and crash.

In the block layer, blk_try_blockalign() (previously qemu_try_blockalign())
is used because significant amounts of memory can be allocated by the
untrusted guest or untrusted disk image files.

I think the error handling is reasonable in those cases:

1. QEMU startup or disk hotplug fails with a nice error message

OR

2. An I/O request is failed (ultimately just EIO error reporting, but
   it's better than killing the QEMU process!)

I'm pretty sure ENOMEM errors are possible even when memory overcommit
is enabled.

My thinking has been to use g_new() for small QEMU-internal structures
and g_try_new() for large amounts of memory allocated in response to
untrusted inputs. (Untrusted inputs must never be used for unbounded
allocation sizes, but those bounded sizes can still be large.)

Stefan
end of thread, other threads:[~2018-10-22 13:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-16 13:01 [Qemu-devel] When it's okay to treat OOM as fatal? Markus Armbruster
2018-10-16 13:20 ` Daniel P. Berrangé
2018-10-18 13:06   ` Markus Armbruster
2018-10-18 14:28     ` Paolo Bonzini
2018-10-16 13:33 ` Dr. David Alan Gilbert
2018-10-18 14:46   ` Markus Armbruster
2018-10-18 14:54     ` Dr. David Alan Gilbert
2018-10-18 17:26       ` Markus Armbruster
2018-10-18 18:01         ` Dr. David Alan Gilbert
2018-10-19  5:43           ` Markus Armbruster
2018-10-19 10:07             ` Dr. David Alan Gilbert
2018-10-22 13:40               ` Dr. David Alan Gilbert
2018-10-17 10:05 ` Stefan Hajnoczi