Message-ID: <557AC8F5.6040105@de.ibm.com>
Date: Fri, 12 Jun 2015 13:56:37 +0200
From: Christian Borntraeger
Subject: Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
To: "Michael S. Tsirkin", "Denis V. Lunev"
Cc: James.Bottomley@HansenPartnership.com, qemu-devel@nongnu.org, Raushaniya Maksudova, Anthony Liguori

On 10.06.2015 at 15:13, Michael S. Tsirkin wrote:
> On Wed, Jun 10, 2015 at 03:02:21PM +0300, Denis V. Lunev wrote:
>> On 09/06/15 13:37, Christian Borntraeger wrote:
>>> On 09.06.2015 at 12:19, Denis V. Lunev wrote:
>>>> Excessive virtio_balloon inflation can cause invocation of the OOM killer
>>>> when Linux is under severe memory pressure. Various mechanisms are
>>>> responsible for correct virtio_balloon memory management. Nevertheless,
>>>> it is often the case that these control tools do not have enough time to
>>>> react to a fast-changing memory load. As a result, the OS runs out of
>>>> memory and invokes the OOM killer. Balancing memory with the virtio
>>>> balloon should not cause the termination of processes while there are
>>>> pages in the balloon. Currently there is no way for the virtio balloon
>>>> driver to free memory at the last moment before some process gets
>>>> killed by the OOM killer.
>>>>
>>>> This does not create a security breach, as the balloon itself runs
>>>> inside the Guest OS and works in cooperation with the host. Thus some
>>>> improvements from the Guest side should be considered normal.
>>>>
>>>> To solve the problem, introduce a virtio_balloon callback which is
>>>> expected to be called from the OOM notifier call chain in the
>>>> out_of_memory() function. If the virtio balloon can release some
>>>> memory, it will make the system return and retry the allocation that
>>>> forced the OOM killer to run.
>>>>
>>>> This behavior should be enabled if and only if the appropriate feature
>>>> bit is set on the device. It is off by default.
>>> The balloon frees pages in this way:
>>>
>>> static void balloon_page(void *addr, int deflate)
>>> {
>>> #if defined(__linux__)
>>>     if (!kvm_enabled() || kvm_has_sync_mmu())
>>>         qemu_madvise(addr, TARGET_PAGE_SIZE,
>>>                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
>>> #endif
>>> }
>>>
>>> The guest can re-touch that page and get either an empty zero page or
>>> the old page back without compromising the host's integrity. This should
>>> work for all cases I am aware of (without sync_mmu it's a no-op anyway),
>>> so why not enable it by default? Is there anything I missed?
>>>
>>> Christian
>>
>> I'd like to do that :) Actually, the original version of the kernel patch
>> enabled this unconditionally. But Michael asked to make it configurable
>> and off by default.
>>
>> Den
>
> That's not the question here. The question is why it is limited by
> kvm_has_sync_mmu.

Well, we have two interesting options here:

VIRTIO_BALLOON_F_MUST_TELL_HOST
and
VIRTIO_BALLOON_F_DEFLATE_ON_OOM

For any sane host with on-demand paging, simply re-accessing the page
should just work. So the common case could be:

VIRTIO_BALLOON_F_MUST_TELL_HOST == off
VIRTIO_BALLOON_F_DEFLATE_ON_OOM == on

Only in the rare case of hypervisors without paging, or with other
memory-related restrictions, do we have to enable MUST_TELL_HOST.

Now: QEMU knows exactly which case we have, so why not let QEMU tell the
guest what the capabilities are (e.g. sync_mmu ---> no need to tell the
host)?

I can at least imagine that some admin wants to make the OOM case
configurable, but a sane default seems to be not to kill random guest
processes.

Christian