All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v6 0/1] balloon: add a feature bit to let Guest OS deflate
@ 2015-06-09 10:19 Denis V. Lunev
  2015-06-09 10:19 ` [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom Denis V. Lunev
  0 siblings, 1 reply; 14+ messages in thread
From: Denis V. Lunev @ 2015-06-09 10:19 UTC (permalink / raw)
  Cc: Michael S. Tsirkin, qemu-devel, Raushaniya Maksudova,
	James.Bottomley, Anthony Liguori, Denis V. Lunev

Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless it
is often the case that these control tools does not have enough time to
react on fast changing memory load. As a result OS runs out of memory and
invokes OOM-killer. The balancing of memory by use of the virtio balloon
should not cause the termination of processes while there are pages in the
balloon. Now there is no way for virtio balloon driver to free memory at
the last moment before some process get killed by OOM-killer.

This does not provide a security breach as balloon itself is running
inside Guest OS and is working in the cooperation with the host. Thus
some improvements from Guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make the
system to return and retry the allocation that forced the out of memory
killer to run.

This behavior should be enabled if and only if appropriate feature bit
is set on the device. It is off by default.

This functionality was recently merged into vanilla Linux (actually in
linux-next at the moment)

  commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
  Author: Raushaniya Maksudova <rmaksudova@parallels.com>
  Date:   Mon Nov 10 09:36:29 2014 +1030

This patch adds respective control bits into QEMU. It introduces
deflate-on-oom option for baloon device which do the trick.

Changes from v5:
- ported to QEMU current

Changes from v4:
- spelling corrected according to suggestions from Eric Blake

Changes from v3:
- ported to git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream_rebased

Changes from v2:
- fixed mistake with bit number in virtio_balloon_get_features

Changes from v1:
- From: in patch 1 according to the original ownership
- feature processing in patch 2 as suggested by Michael. It could be done
  without additional field, but this will require to move the property
  level up, i.e. to PCI & CCW level.

Signed-off-by: Raushaniya Maksudova <rmaksudova@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Anthony Liguori <aliguori@amazon.com>
CC: Michael S. Tsirkin <mst@redhat.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-09 10:19 [Qemu-devel] [PATCH v6 0/1] balloon: add a feature bit to let Guest OS deflate Denis V. Lunev
@ 2015-06-09 10:19 ` Denis V. Lunev
  2015-06-09 10:37   ` Christian Borntraeger
  2015-06-09 10:37   ` Denis V. Lunev
  0 siblings, 2 replies; 14+ messages in thread
From: Denis V. Lunev @ 2015-06-09 10:19 UTC (permalink / raw)
  Cc: Michael S. Tsirkin, qemu-devel, Raushaniya Maksudova,
	James.Bottomley, Anthony Liguori, Denis V. Lunev

Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless it
is often the case that these control tools does not have enough time to
react on fast changing memory load. As a result OS runs out of memory and
invokes OOM-killer. The balancing of memory by use of the virtio balloon
should not cause the termination of processes while there are pages in the
balloon. Now there is no way for virtio balloon driver to free memory at
the last moment before some process get killed by OOM-killer.

This does not provide a security breach as balloon itself is running
inside Guest OS and is working in the cooperation with the host. Thus
some improvements from Guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make the
system return and retry the allocation that forced the out of memory
killer to run.

This behavior should be enabled if and only if appropriate feature bit
is set on the device. It is off by default.

This functionality was recently merged into vanilla Linux.

  commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
  Author: Raushaniya Maksudova <rmaksudova@parallels.com>
  Date:   Mon Nov 10 09:36:29 2014 +1030

This patch adds respective control bits into QEMU. It introduces
deflate-on-oom option for balloon device which does the trick.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Raushaniya Maksudova <rmaksudova@virtuozzo.com>
CC: Anthony Liguori <aliguori@amazon.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-balloon.c         | 6 ++++--
 include/hw/virtio/virtio-balloon.h | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index f915c7b..d3f36f8 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -312,8 +312,8 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 
 static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f)
 {
-    f |= (1 << VIRTIO_BALLOON_F_STATS_VQ);
-    return f;
+    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+    return f | (1u << VIRTIO_BALLOON_F_STATS_VQ) | dev->host_features;
 }
 
 static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
@@ -423,6 +423,8 @@ static void virtio_balloon_instance_init(Object *obj)
 }
 
 static Property virtio_balloon_properties[] = {
+    DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 4ab8f54..7f49b1f 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -36,6 +36,7 @@ typedef struct VirtIOBalloon {
     QEMUTimer *stats_timer;
     int64_t stats_last_update;
     int64_t stats_poll_interval;
+    uint32_t host_features;
 } VirtIOBalloon;
 
 #endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-09 10:19 ` [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom Denis V. Lunev
@ 2015-06-09 10:37   ` Christian Borntraeger
  2015-06-10 12:02     ` Denis V. Lunev
  2015-06-09 10:37   ` Denis V. Lunev
  1 sibling, 1 reply; 14+ messages in thread
From: Christian Borntraeger @ 2015-06-09 10:37 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: James.Bottomley, Anthony Liguori, qemu-devel,
	Raushaniya Maksudova, Michael S. Tsirkin

Am 09.06.2015 um 12:19 schrieb Denis V. Lunev:
> Excessive virtio_balloon inflation can cause invocation of OOM-killer,
> when Linux is under severe memory pressure. Various mechanisms are
> responsible for correct virtio_balloon memory management. Nevertheless it
> is often the case that these control tools does not have enough time to
> react on fast changing memory load. As a result OS runs out of memory and
> invokes OOM-killer. The balancing of memory by use of the virtio balloon
> should not cause the termination of processes while there are pages in the
> balloon. Now there is no way for virtio balloon driver to free memory at
> the last moment before some process get killed by OOM-killer.
> 
> This does not provide a security breach as balloon itself is running
> inside Guest OS and is working in the cooperation with the host. Thus
> some improvements from Guest side should be considered as normal.
> 
> To solve the problem, introduce a virtio_balloon callback which is
> expected to be called from the oom notifier call chain in out_of_memory()
> function. If virtio balloon could release some memory, it will make the
> system return and retry the allocation that forced the out of memory
> killer to run.
> 
> This behavior should be enabled if and only if appropriate feature bit
> is set on the device. It is off by default.

The balloon frees pages in this way

static void balloon_page(void *addr, int deflate)
{
#if defined(__linux__)
    if (!kvm_enabled() || kvm_has_sync_mmu())
        qemu_madvise(addr, TARGET_PAGE_SIZE,
                deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
#endif
}

The guest can re-touch that page and get a empty zero or the old page back without
tampering the host integrity. This should work for all cases I am aware of (without sync_mmu its a nop anyway) so why not enable that by default? Anything that I missed?

Christian
> 
> This functionality was recently merged into vanilla Linux.
> 
>   commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
>   Author: Raushaniya Maksudova <rmaksudova@parallels.com>
>   Date:   Mon Nov 10 09:36:29 2014 +1030
> 
> This patch adds respective control bits into QEMU. It introduces
> deflate-on-oom option for balloon device which does the trick.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Raushaniya Maksudova <rmaksudova@virtuozzo.com>
> CC: Anthony Liguori <aliguori@amazon.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/virtio/virtio-balloon.c         | 6 ++++--
>  include/hw/virtio/virtio-balloon.h | 1 +
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index f915c7b..d3f36f8 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -312,8 +312,8 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
> 
>  static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f)
>  {
> -    f |= (1 << VIRTIO_BALLOON_F_STATS_VQ);
> -    return f;
> +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> +    return f | (1u << VIRTIO_BALLOON_F_STATS_VQ) | dev->host_features;
>  }
> 
>  static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
> @@ -423,6 +423,8 @@ static void virtio_balloon_instance_init(Object *obj)
>  }
> 
>  static Property virtio_balloon_properties[] = {
> +    DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> 
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 4ab8f54..7f49b1f 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -36,6 +36,7 @@ typedef struct VirtIOBalloon {
>      QEMUTimer *stats_timer;
>      int64_t stats_last_update;
>      int64_t stats_poll_interval;
> +    uint32_t host_features;
>  } VirtIOBalloon;
> 
>  #endif
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-09 10:19 ` [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom Denis V. Lunev
  2015-06-09 10:37   ` Christian Borntraeger
@ 2015-06-09 10:37   ` Denis V. Lunev
  1 sibling, 0 replies; 14+ messages in thread
From: Denis V. Lunev @ 2015-06-09 10:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: James.Bottomley, qemu-devel, Raushaniya Maksudova, Anthony Liguori

On 09/06/15 13:19, Denis V. Lunev wrote:
> Excessive virtio_balloon inflation can cause invocation of OOM-killer,
> when Linux is under severe memory pressure. Various mechanisms are
> responsible for correct virtio_balloon memory management. Nevertheless it
> is often the case that these control tools does not have enough time to
> react on fast changing memory load. As a result OS runs out of memory and
> invokes OOM-killer. The balancing of memory by use of the virtio balloon
> should not cause the termination of processes while there are pages in the
> balloon. Now there is no way for virtio balloon driver to free memory at
> the last moment before some process get killed by OOM-killer.
>
> This does not provide a security breach as balloon itself is running
> inside Guest OS and is working in the cooperation with the host. Thus
> some improvements from Guest side should be considered as normal.
>
> To solve the problem, introduce a virtio_balloon callback which is
> expected to be called from the oom notifier call chain in out_of_memory()
> function. If virtio balloon could release some memory, it will make the
> system return and retry the allocation that forced the out of memory
> killer to run.
>
> This behavior should be enabled if and only if appropriate feature bit
> is set on the device. It is off by default.
>
> This functionality was recently merged into vanilla Linux.
>
>    commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
>    Author: Raushaniya Maksudova <rmaksudova@parallels.com>
>    Date:   Mon Nov 10 09:36:29 2014 +1030
>
> This patch adds respective control bits into QEMU. It introduces
> deflate-on-oom option for balloon device which does the trick.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Raushaniya Maksudova <rmaksudova@virtuozzo.com>
> CC: Anthony Liguori <aliguori@amazon.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> ---
>   hw/virtio/virtio-balloon.c         | 6 ++++--
>   include/hw/virtio/virtio-balloon.h | 1 +
>   2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index f915c7b..d3f36f8 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -312,8 +312,8 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>   
>   static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f)
>   {
> -    f |= (1 << VIRTIO_BALLOON_F_STATS_VQ);
> -    return f;
> +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> +    return f | (1u << VIRTIO_BALLOON_F_STATS_VQ) | dev->host_features;
>   }
>   
>   static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
> @@ -423,6 +423,8 @@ static void virtio_balloon_instance_init(Object *obj)
>   }
>   
>   static Property virtio_balloon_properties[] = {
> +    DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
>       DEFINE_PROP_END_OF_LIST(),
>   };
>   
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 4ab8f54..7f49b1f 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -36,6 +36,7 @@ typedef struct VirtIOBalloon {
>       QEMUTimer *stats_timer;
>       int64_t stats_last_update;
>       int64_t stats_poll_interval;
> +    uint32_t host_features;
>   } VirtIOBalloon;
>   
>   #endif
Michael,

I can rebase over uncommitted [PATCH 23/33] virtio-balloon: switch to 
virtio_add_feature
which changes bits assignments for balloon.

Should I?

Regards,
     Den

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-09 10:37   ` Christian Borntraeger
@ 2015-06-10 12:02     ` Denis V. Lunev
  2015-06-10 13:13       ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Denis V. Lunev @ 2015-06-10 12:02 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: James.Bottomley, Anthony Liguori, qemu-devel,
	Raushaniya Maksudova, Michael S. Tsirkin

On 09/06/15 13:37, Christian Borntraeger wrote:
> Am 09.06.2015 um 12:19 schrieb Denis V. Lunev:
>> Excessive virtio_balloon inflation can cause invocation of OOM-killer,
>> when Linux is under severe memory pressure. Various mechanisms are
>> responsible for correct virtio_balloon memory management. Nevertheless it
>> is often the case that these control tools does not have enough time to
>> react on fast changing memory load. As a result OS runs out of memory and
>> invokes OOM-killer. The balancing of memory by use of the virtio balloon
>> should not cause the termination of processes while there are pages in the
>> balloon. Now there is no way for virtio balloon driver to free memory at
>> the last moment before some process get killed by OOM-killer.
>>
>> This does not provide a security breach as balloon itself is running
>> inside Guest OS and is working in the cooperation with the host. Thus
>> some improvements from Guest side should be considered as normal.
>>
>> To solve the problem, introduce a virtio_balloon callback which is
>> expected to be called from the oom notifier call chain in out_of_memory()
>> function. If virtio balloon could release some memory, it will make the
>> system return and retry the allocation that forced the out of memory
>> killer to run.
>>
>> This behavior should be enabled if and only if appropriate feature bit
>> is set on the device. It is off by default.
> The balloon frees pages in this way
>
> static void balloon_page(void *addr, int deflate)
> {
> #if defined(__linux__)
>      if (!kvm_enabled() || kvm_has_sync_mmu())
>          qemu_madvise(addr, TARGET_PAGE_SIZE,
>                  deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> #endif
> }
>
> The guest can re-touch that page and get a empty zero or the old page back without
> tampering the host integrity. This should work for all cases I am aware of (without sync_mmu its a nop anyway) so why not enable that by default? Anything that I missed?
>
> Christian

I'd like to do that :) Actually original version of kernel patch
has enabled this unconditionally. But Michael asked to make
it configurable and off by default.

Den

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-10 12:02     ` Denis V. Lunev
@ 2015-06-10 13:13       ` Michael S. Tsirkin
  2015-06-10 13:27         ` Denis V. Lunev
  2015-06-12 11:56         ` Christian Borntraeger
  0 siblings, 2 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-06-10 13:13 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: James.Bottomley, Christian Borntraeger, qemu-devel,
	Raushaniya Maksudova, Anthony Liguori

On Wed, Jun 10, 2015 at 03:02:21PM +0300, Denis V. Lunev wrote:
> On 09/06/15 13:37, Christian Borntraeger wrote:
> >Am 09.06.2015 um 12:19 schrieb Denis V. Lunev:
> >>Excessive virtio_balloon inflation can cause invocation of OOM-killer,
> >>when Linux is under severe memory pressure. Various mechanisms are
> >>responsible for correct virtio_balloon memory management. Nevertheless it
> >>is often the case that these control tools does not have enough time to
> >>react on fast changing memory load. As a result OS runs out of memory and
> >>invokes OOM-killer. The balancing of memory by use of the virtio balloon
> >>should not cause the termination of processes while there are pages in the
> >>balloon. Now there is no way for virtio balloon driver to free memory at
> >>the last moment before some process get killed by OOM-killer.
> >>
> >>This does not provide a security breach as balloon itself is running
> >>inside Guest OS and is working in the cooperation with the host. Thus
> >>some improvements from Guest side should be considered as normal.
> >>
> >>To solve the problem, introduce a virtio_balloon callback which is
> >>expected to be called from the oom notifier call chain in out_of_memory()
> >>function. If virtio balloon could release some memory, it will make the
> >>system return and retry the allocation that forced the out of memory
> >>killer to run.
> >>
> >>This behavior should be enabled if and only if appropriate feature bit
> >>is set on the device. It is off by default.
> >The balloon frees pages in this way
> >
> >static void balloon_page(void *addr, int deflate)
> >{
> >#if defined(__linux__)
> >     if (!kvm_enabled() || kvm_has_sync_mmu())
> >         qemu_madvise(addr, TARGET_PAGE_SIZE,
> >                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> >#endif
> >}
> >
> >The guest can re-touch that page and get a empty zero or the old page back without
> >tampering the host integrity. This should work for all cases I am aware of (without sync_mmu its a nop anyway) so why not enable that by default? Anything that I missed?
> >
> >Christian
> 
> I'd like to do that :) Actually original version of kernel patch
> has enabled this unconditionally. But Michael asked to make
> it configurable and off by default.
> 
> Den

That's not the question here.  The question is why is it limited by kvm_has_sync_mmu.

-- 
MST

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-10 13:13       ` Michael S. Tsirkin
@ 2015-06-10 13:27         ` Denis V. Lunev
  2015-06-12 11:56         ` Christian Borntraeger
  1 sibling, 0 replies; 14+ messages in thread
From: Denis V. Lunev @ 2015-06-10 13:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: James.Bottomley, Christian Borntraeger, qemu-devel,
	Raushaniya Maksudova, Anthony Liguori

On 10/06/15 16:13, Michael S. Tsirkin wrote:
> On Wed, Jun 10, 2015 at 03:02:21PM +0300, Denis V. Lunev wrote:
>> On 09/06/15 13:37, Christian Borntraeger wrote:
>>> Am 09.06.2015 um 12:19 schrieb Denis V. Lunev:
>>>> Excessive virtio_balloon inflation can cause invocation of OOM-killer,
>>>> when Linux is under severe memory pressure. Various mechanisms are
>>>> responsible for correct virtio_balloon memory management. Nevertheless it
>>>> is often the case that these control tools does not have enough time to
>>>> react on fast changing memory load. As a result OS runs out of memory and
>>>> invokes OOM-killer. The balancing of memory by use of the virtio balloon
>>>> should not cause the termination of processes while there are pages in the
>>>> balloon. Now there is no way for virtio balloon driver to free memory at
>>>> the last moment before some process get killed by OOM-killer.
>>>>
>>>> This does not provide a security breach as balloon itself is running
>>>> inside Guest OS and is working in the cooperation with the host. Thus
>>>> some improvements from Guest side should be considered as normal.
>>>>
>>>> To solve the problem, introduce a virtio_balloon callback which is
>>>> expected to be called from the oom notifier call chain in out_of_memory()
>>>> function. If virtio balloon could release some memory, it will make the
>>>> system return and retry the allocation that forced the out of memory
>>>> killer to run.
>>>>
>>>> This behavior should be enabled if and only if appropriate feature bit
>>>> is set on the device. It is off by default.
>>> The balloon frees pages in this way
>>>
>>> static void balloon_page(void *addr, int deflate)
>>> {
>>> #if defined(__linux__)
>>>      if (!kvm_enabled() || kvm_has_sync_mmu())
>>>          qemu_madvise(addr, TARGET_PAGE_SIZE,
>>>                  deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
>>> #endif
>>> }
>>>
>>> The guest can re-touch that page and get a empty zero or the old page back without
>>> tampering the host integrity. This should work for all cases I am aware of (without sync_mmu its a nop anyway) so why not enable that by default? Anything that I missed?
>>>
>>> Christian
>> I'd like to do that :) Actually original version of kernel patch
>> has enabled this unconditionally. But Michael asked to make
>> it configurable and off by default.
>>
>> Den
> That's not the question here.  The question is why is it limited by kvm_has_sync_mmu.
>
original comment about this is quite simple

"    Until 2.6.27, KVM forced memory pinning so we must disable 
ballooning unless the
     kernel actually supports it when using KVM.  It's always safe when 
using TCG."

Thus this check is a rudiment of a very-very old kernels.
Actually I do not know whether current QEMU will start
on such kernels :)

Den

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-10 13:13       ` Michael S. Tsirkin
  2015-06-10 13:27         ` Denis V. Lunev
@ 2015-06-12 11:56         ` Christian Borntraeger
  2015-06-13 20:10           ` Michael S. Tsirkin
  1 sibling, 1 reply; 14+ messages in thread
From: Christian Borntraeger @ 2015-06-12 11:56 UTC (permalink / raw)
  To: Michael S. Tsirkin, Denis V. Lunev
  Cc: James.Bottomley, qemu-devel, Raushaniya Maksudova, Anthony Liguori

Am 10.06.2015 um 15:13 schrieb Michael S. Tsirkin:
> On Wed, Jun 10, 2015 at 03:02:21PM +0300, Denis V. Lunev wrote:
>> On 09/06/15 13:37, Christian Borntraeger wrote:
>>> Am 09.06.2015 um 12:19 schrieb Denis V. Lunev:
>>>> Excessive virtio_balloon inflation can cause invocation of OOM-killer,
>>>> when Linux is under severe memory pressure. Various mechanisms are
>>>> responsible for correct virtio_balloon memory management. Nevertheless it
>>>> is often the case that these control tools does not have enough time to
>>>> react on fast changing memory load. As a result OS runs out of memory and
>>>> invokes OOM-killer. The balancing of memory by use of the virtio balloon
>>>> should not cause the termination of processes while there are pages in the
>>>> balloon. Now there is no way for virtio balloon driver to free memory at
>>>> the last moment before some process get killed by OOM-killer.
>>>>
>>>> This does not provide a security breach as balloon itself is running
>>>> inside Guest OS and is working in the cooperation with the host. Thus
>>>> some improvements from Guest side should be considered as normal.
>>>>
>>>> To solve the problem, introduce a virtio_balloon callback which is
>>>> expected to be called from the oom notifier call chain in out_of_memory()
>>>> function. If virtio balloon could release some memory, it will make the
>>>> system return and retry the allocation that forced the out of memory
>>>> killer to run.
>>>>
>>>> This behavior should be enabled if and only if appropriate feature bit
>>>> is set on the device. It is off by default.
>>> The balloon frees pages in this way
>>>
>>> static void balloon_page(void *addr, int deflate)
>>> {
>>> #if defined(__linux__)
>>>     if (!kvm_enabled() || kvm_has_sync_mmu())
>>>         qemu_madvise(addr, TARGET_PAGE_SIZE,
>>>                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
>>> #endif
>>> }
>>>
>>> The guest can re-touch that page and get a empty zero or the old page back without
>>> tampering the host integrity. This should work for all cases I am aware of (without sync_mmu its a nop anyway) so why not enable that by default? Anything that I missed?
>>>
>>> Christian
>>
>> I'd like to do that :) Actually original version of kernel patch
>> has enabled this unconditionally. But Michael asked to make
>> it configurable and off by default.
>>
>> Den
> 
> That's not the question here.  The question is why is it limited by kvm_has_sync_mmu.

Well we have two interesting options here:

VIRTIO_BALLOON_F_MUST_TELL_HOST and VIRTIO_BALLOON_F_DEFLATE_ON_OOM

For any sane host with ondemand paging just re-accessing the page
should simply work. So the common case could be
VIRTIO_BALLOON_F_MUST_TELL_HOST == off
VIRTIO_BALLOON_F_DEFLATE_ON_OOM == on

Only for the rare case of hypervisors without paging or other memory
related restrictions we have to enable MUST_TELL_HOST.
Now: QEMU knows exactly which case we have, so why not let QEMU tell
the guest what the capabilities are. (e.g. sync_mmu ---> no need to 
tell the host).

I can at least imaging that some admin wants to make the the oom case
configurable, but a sane default seems to be to not kill random
guest processes.

Christian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-12 11:56         ` Christian Borntraeger
@ 2015-06-13 20:10           ` Michael S. Tsirkin
  2015-06-15  7:01             ` Christian Borntraeger
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-06-13 20:10 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: James.Bottomley, Denis V. Lunev, qemu-devel,
	Raushaniya Maksudova, Anthony Liguori

On Fri, Jun 12, 2015 at 01:56:37PM +0200, Christian Borntraeger wrote:
> Am 10.06.2015 um 15:13 schrieb Michael S. Tsirkin:
> > On Wed, Jun 10, 2015 at 03:02:21PM +0300, Denis V. Lunev wrote:
> >> On 09/06/15 13:37, Christian Borntraeger wrote:
> >>> Am 09.06.2015 um 12:19 schrieb Denis V. Lunev:
> >>>> Excessive virtio_balloon inflation can cause invocation of OOM-killer,
> >>>> when Linux is under severe memory pressure. Various mechanisms are
> >>>> responsible for correct virtio_balloon memory management. Nevertheless it
> >>>> is often the case that these control tools does not have enough time to
> >>>> react on fast changing memory load. As a result OS runs out of memory and
> >>>> invokes OOM-killer. The balancing of memory by use of the virtio balloon
> >>>> should not cause the termination of processes while there are pages in the
> >>>> balloon. Now there is no way for virtio balloon driver to free memory at
> >>>> the last moment before some process get killed by OOM-killer.
> >>>>
> >>>> This does not provide a security breach as balloon itself is running
> >>>> inside Guest OS and is working in the cooperation with the host. Thus
> >>>> some improvements from Guest side should be considered as normal.
> >>>>
> >>>> To solve the problem, introduce a virtio_balloon callback which is
> >>>> expected to be called from the oom notifier call chain in out_of_memory()
> >>>> function. If virtio balloon could release some memory, it will make the
> >>>> system return and retry the allocation that forced the out of memory
> >>>> killer to run.
> >>>>
> >>>> This behavior should be enabled if and only if appropriate feature bit
> >>>> is set on the device. It is off by default.
> >>> The balloon frees pages in this way
> >>>
> >>> static void balloon_page(void *addr, int deflate)
> >>> {
> >>> #if defined(__linux__)
> >>>     if (!kvm_enabled() || kvm_has_sync_mmu())
> >>>         qemu_madvise(addr, TARGET_PAGE_SIZE,
> >>>                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> >>> #endif
> >>> }
> >>>
> >>> The guest can re-touch that page and get a empty zero or the old page back without
> >>> tampering the host integrity. This should work for all cases I am aware of (without sync_mmu its a nop anyway) so why not enable that by default? Anything that I missed?
> >>>
> >>> Christian
> >>
> >> I'd like to do that :) Actually original version of kernel patch
> >> has enabled this unconditionally. But Michael asked to make
> >> it configurable and off by default.
> >>
> >> Den
> > 
> > That's not the question here.  The question is why is it limited by kvm_has_sync_mmu.
> 
> Well we have two interesting options here:
> 
> VIRTIO_BALLOON_F_MUST_TELL_HOST and VIRTIO_BALLOON_F_DEFLATE_ON_OOM
> 
> For any sane host with ondemand paging just re-accessing the page
> should simply work. So the common case could be
> VIRTIO_BALLOON_F_MUST_TELL_HOST == off

Disabling this breaks useful optimizations such as
ability not to migrate memory in the balloon.

> VIRTIO_BALLOON_F_DEFLATE_ON_OOM == on

AFAIK management tools depend on balloon not deflating
below host-specified threshold to avoid OOM on the host.
So I don't think we can make this a default,
management needs to enable this explicitly.

> Only for the rare case of hypervisors without paging or other memory
> related restrictions we have to enable MUST_TELL_HOST.
> Now: QEMU knows exactly which case we have, so why not let QEMU tell
> the guest what the capabilities are. (e.g. sync_mmu ---> no need to 
> tell the host).
> 
> I can at least imaging that some admin wants to make the the oom case
> configurable, but a sane default seems to be to not kill random
> guest processes.
> 
> Christian


-- 
MST

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-13 20:10           ` Michael S. Tsirkin
@ 2015-06-15  7:01             ` Christian Borntraeger
  2015-06-15  9:06               ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Borntraeger @ 2015-06-15  7:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: James.Bottomley, Denis V. Lunev, qemu-devel,
	Raushaniya Maksudova, Anthony Liguori

Am 13.06.2015 um 22:10 schrieb Michael S. Tsirkin:
> On Fri, Jun 12, 2015 at 01:56:37PM +0200, Christian Borntraeger wrote:
>> Am 10.06.2015 um 15:13 schrieb Michael S. Tsirkin:
>>> On Wed, Jun 10, 2015 at 03:02:21PM +0300, Denis V. Lunev wrote:
>>>> On 09/06/15 13:37, Christian Borntraeger wrote:
>>>>> Am 09.06.2015 um 12:19 schrieb Denis V. Lunev:
>>>>>> Excessive virtio_balloon inflation can cause invocation of OOM-killer,
>>>>>> when Linux is under severe memory pressure. Various mechanisms are
>>>>>> responsible for correct virtio_balloon memory management. Nevertheless it
>>>>>> is often the case that these control tools does not have enough time to
>>>>>> react on fast changing memory load. As a result OS runs out of memory and
>>>>>> invokes OOM-killer. The balancing of memory by use of the virtio balloon
>>>>>> should not cause the termination of processes while there are pages in the
>>>>>> balloon. Now there is no way for virtio balloon driver to free memory at
>>>>>> the last moment before some process get killed by OOM-killer.
>>>>>>
>>>>>> This does not provide a security breach as balloon itself is running
>>>>>> inside Guest OS and is working in the cooperation with the host. Thus
>>>>>> some improvements from Guest side should be considered as normal.
>>>>>>
>>>>>> To solve the problem, introduce a virtio_balloon callback which is
>>>>>> expected to be called from the oom notifier call chain in out_of_memory()
>>>>>> function. If virtio balloon could release some memory, it will make the
>>>>>> system return and retry the allocation that forced the out of memory
>>>>>> killer to run.
>>>>>>
>>>>>> This behavior should be enabled if and only if appropriate feature bit
>>>>>> is set on the device. It is off by default.
>>>>> The balloon frees pages in this way
>>>>>
>>>>> static void balloon_page(void *addr, int deflate)
>>>>> {
>>>>> #if defined(__linux__)
>>>>>     if (!kvm_enabled() || kvm_has_sync_mmu())
>>>>>         qemu_madvise(addr, TARGET_PAGE_SIZE,
>>>>>                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
>>>>> #endif
>>>>> }
>>>>>
>>>>> The guest can re-touch that page and get a empty zero or the old page back without
>>>>> tampering the host integrity. This should work for all cases I am aware of (without sync_mmu its a nop anyway) so why not enable that by default? Anything that I missed?
>>>>>
>>>>> Christian
>>>>
>>>> I'd like to do that :) Actually original version of kernel patch
>>>> has enabled this unconditionally. But Michael asked to make
>>>> it configurable and off by default.
>>>>
>>>> Den
>>>
>>> That's not the question here.  The question is why is it limited by kvm_has_sync_mmu.
>>
>> Well we have two interesting options here:
>>
>> VIRTIO_BALLOON_F_MUST_TELL_HOST and VIRTIO_BALLOON_F_DEFLATE_ON_OOM
>>
>> For any sane host with ondemand paging just re-accessing the page
>> should simply work. So the common case could be
>> VIRTIO_BALLOON_F_MUST_TELL_HOST == off
> 
> Disabling this breaks useful optimizations such as
> ability not to migrate memory in the balloon.

memory in the balloon is usually backed by the empty zero page after
the madvise (WONT_NEED will finally result in zap_pte_range for the
common case). In a ideal world migration should be able to optimize
zero pages.

 
>> VIRTIO_BALLOON_F_DEFLATE_ON_OOM == on
> 
> AFAIK management tools depend on balloon not deflating
> below host-specified threshold to avoid OOM on the host.
> So I don't think we can make this a default,
> management needs to enable this explicitly.

If the ballooning is required to keep the host memory managedment
from OOM - iow abusing ballooning as memory hotplug between guests
then yes better let the guest oom - that makes sense.

Now: I think that doing so (not having enough swap in the host if
all guests deflate) and relying on balloon semantics is fundamentally
broken. Let me explain this: The problem is that we rely on guest
cooperation for the host integrity. As I explained  using madvise 
WONT_NEED will replace the current PTEs with invalid/emtpy PTEs. As
soon as the guest kernel re-touches the page (e.g. a malicious 
kernel module - not the balloon driver) it will be backed by the VMAs
default method - so usually with a shared R/O copy of the empty
zero page. Write accesses will result in a copy-on-write and allocate
new memory in the host. 
There is nothing we can do in the balloon protocol to protect the host
against malicious guests allocating all the maximum memory.

If you need host integrity against guest memory usage, something like
cgroups_memory or so is probably the only reliable way.

> 
>> Only for the rare case of hypervisors without paging or other memory
>> related restrictions we have to enable MUST_TELL_HOST.
>> Now: QEMU knows exactly which case we have, so why not let QEMU tell
>> the guest what the capabilities are. (e.g. sync_mmu ---> no need to 
>> tell the host).
>>
>> I can at least imaging that some admin wants to make the the oom case
>> configurable, but a sane default seems to be to not kill random
>> guest processes.
>>
>> Christian
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-15  7:01             ` Christian Borntraeger
@ 2015-06-15  9:06               ` Michael S. Tsirkin
  2015-06-15  9:59                 ` Christian Borntraeger
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-06-15  9:06 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: James.Bottomley, Denis V. Lunev, qemu-devel,
	Raushaniya Maksudova, Anthony Liguori

On Mon, Jun 15, 2015 at 09:01:53AM +0200, Christian Borntraeger wrote:
> Am 13.06.2015 um 22:10 schrieb Michael S. Tsirkin:
> > On Fri, Jun 12, 2015 at 01:56:37PM +0200, Christian Borntraeger wrote:
> >> Am 10.06.2015 um 15:13 schrieb Michael S. Tsirkin:
> >>> On Wed, Jun 10, 2015 at 03:02:21PM +0300, Denis V. Lunev wrote:
> >>>> On 09/06/15 13:37, Christian Borntraeger wrote:
> >>>>> Am 09.06.2015 um 12:19 schrieb Denis V. Lunev:
> >>>>>> Excessive virtio_balloon inflation can cause invocation of OOM-killer,
> >>>>>> when Linux is under severe memory pressure. Various mechanisms are
> >>>>>> responsible for correct virtio_balloon memory management. Nevertheless it
> >>>>>> is often the case that these control tools does not have enough time to
> >>>>>> react on fast changing memory load. As a result OS runs out of memory and
> >>>>>> invokes OOM-killer. The balancing of memory by use of the virtio balloon
> >>>>>> should not cause the termination of processes while there are pages in the
> >>>>>> balloon. Now there is no way for virtio balloon driver to free memory at
> >>>>>> the last moment before some process get killed by OOM-killer.
> >>>>>>
> >>>>>> This does not provide a security breach as balloon itself is running
> >>>>>> inside Guest OS and is working in the cooperation with the host. Thus
> >>>>>> some improvements from Guest side should be considered as normal.
> >>>>>>
> >>>>>> To solve the problem, introduce a virtio_balloon callback which is
> >>>>>> expected to be called from the oom notifier call chain in out_of_memory()
> >>>>>> function. If virtio balloon could release some memory, it will make the
> >>>>>> system return and retry the allocation that forced the out of memory
> >>>>>> killer to run.
> >>>>>>
> >>>>>> This behavior should be enabled if and only if appropriate feature bit
> >>>>>> is set on the device. It is off by default.
> >>>>> The balloon frees pages in this way
> >>>>>
> >>>>> static void balloon_page(void *addr, int deflate)
> >>>>> {
> >>>>> #if defined(__linux__)
> >>>>>     if (!kvm_enabled() || kvm_has_sync_mmu())
> >>>>>         qemu_madvise(addr, TARGET_PAGE_SIZE,
> >>>>>                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> >>>>> #endif
> >>>>> }
> >>>>>
> >>>>> The guest can re-touch that page and get a empty zero or the old page back without
> >>>>> tampering the host integrity. This should work for all cases I am aware of (without sync_mmu its a nop anyway) so why not enable that by default? Anything that I missed?
> >>>>>
> >>>>> Christian
> >>>>
> >>>> I'd like to do that :) Actually original version of kernel patch
> >>>> has enabled this unconditionally. But Michael asked to make
> >>>> it configurable and off by default.
> >>>>
> >>>> Den
> >>>
> >>> That's not the question here.  The question is why is it limited by kvm_has_sync_mmu.
> >>
> >> Well we have two interesting options here:
> >>
> >> VIRTIO_BALLOON_F_MUST_TELL_HOST and VIRTIO_BALLOON_F_DEFLATE_ON_OOM
> >>
> >> For any sane host with ondemand paging just re-accessing the page
> >> should simply work. So the common case could be
> >> VIRTIO_BALLOON_F_MUST_TELL_HOST == off
> > 
> > Disabling this breaks useful optimizations such as
> > ability not to migrate memory in the balloon.
> 
> memory in the balloon is usually backed by the empty zero page after
> the madvise (WONT_NEED will finally result in zap_pte_range for the
> common case). In a ideal world migration should be able to optimize
> zero pages.

This still involves reading them in as opposed to just skipping them.

>  
> >> VIRTIO_BALLOON_F_DEFLATE_ON_OOM == on
> > 
> > AFAIK management tools depend on balloon not deflating
> > below host-specified threshold to avoid OOM on the host.
> > So I don't think we can make this a default,
> > management needs to enable this explicitly.
> 
> If the ballooning is required to keep the host memory managedment
> from OOM - iow abusing ballooning as memory hotplug between guests
> then yes better let the guest oom - that makes sense.
> 
> Now: I think that doing so (not having enough swap in the host if
> all guests deflate) and relying on balloon semantics is fundamentally
> broken. Let me explain this: The problem is that we rely on guest
> cooperation for the host integrity. As I explained  using madvise 
> WONT_NEED will replace the current PTEs with invalid/emtpy PTEs. As
> soon as the guest kernel re-touches the page (e.g. a malicious 
> kernel module - not the balloon driver) it will be backed by the VMAs
> default method - so usually with a shared R/O copy of the empty
> zero page. Write accesses will result in a copy-on-write and allocate
> new memory in the host. 
> There is nothing we can do in the balloon protocol to protect the host
> against malicious guests allocating all the maximum memory.

If we want to try and harden host, we can unmap it so guest will crash
if it touches pages without deflate.

> If you need host integrity against guest memory usage, something like
> cgroups_memory or so is probably the only reliable way.

In the original design, protection against a malicious guest is not the
point of the balloon, it's a technology that let you overcommit
cooperative guests.

> > 
> >> Only for the rare case of hypervisors without paging or other memory
> >> related restrictions we have to enable MUST_TELL_HOST.
> >> Now: QEMU knows exactly which case we have, so why not let QEMU tell
> >> the guest what the capabilities are. (e.g. sync_mmu ---> no need to 
> >> tell the host).
> >>
> >> I can at least imaging that some admin wants to make the the oom case
> >> configurable, but a sane default seems to be to not kill random
> >> guest processes.
> >>
> >> Christian
> > 
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-15  9:06               ` Michael S. Tsirkin
@ 2015-06-15  9:59                 ` Christian Borntraeger
  2015-06-15 10:10                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Borntraeger @ 2015-06-15  9:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: James.Bottomley, Denis V. Lunev, qemu-devel,
	Raushaniya Maksudova, Anthony Liguori

Am 15.06.2015 um 11:06 schrieb Michael S. Tsirkin:

>>> AFAIK management tools depend on balloon not deflating
>>> below host-specified threshold to avoid OOM on the host.
>>> So I don't think we can make this a default,
>>> management needs to enable this explicitly.
>>
>> If the ballooning is required to keep the host memory managedment
>> from OOM - iow abusing ballooning as memory hotplug between guests
>> then yes better let the guest oom - that makes sense.
>>
>> Now: I think that doing so (not having enough swap in the host if
>> all guests deflate) and relying on balloon semantics is fundamentally
>> broken. Let me explain this: The problem is that we rely on guest
>> cooperation for the host integrity. As I explained  using madvise 
>> WONT_NEED will replace the current PTEs with invalid/emtpy PTEs. As
>> soon as the guest kernel re-touches the page (e.g. a malicious 
>> kernel module - not the balloon driver) it will be backed by the VMAs
>> default method - so usually with a shared R/O copy of the empty
>> zero page. Write accesses will result in a copy-on-write and allocate
>> new memory in the host. 
>> There is nothing we can do in the balloon protocol to protect the host
>> against malicious guests allocating all the maximum memory.
> 
> If we want to try and harden host, we can unmap it so guest will crash
> if it touches pages without deflate.
> 
>> If you need host integrity against guest memory usage, something like
>> cgroups_memory or so is probably the only reliable way.
> 
> In the original design, protection against a malicious guest is not the
> point of the balloon, it's a technology that let you overcommit
> cooperative guests.

Sure. But then its perfectly fine to let the guest reclaim by default,
because your statement 
"AFAIK management tools depend on balloon not deflating
 below host-specified threshold to avoid OOM on the host.
 So I don't think we can make this a default,
 management needs to enable this explicitly."

is not true ;-)

Christian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-15  9:59                 ` Christian Borntraeger
@ 2015-06-15 10:10                   ` Michael S. Tsirkin
  2015-06-15 11:10                     ` Christian Borntraeger
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-06-15 10:10 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: James.Bottomley, Denis V. Lunev, qemu-devel,
	Raushaniya Maksudova, Anthony Liguori

On Mon, Jun 15, 2015 at 11:59:05AM +0200, Christian Borntraeger wrote:
> Am 15.06.2015 um 11:06 schrieb Michael S. Tsirkin:
> 
> >>> AFAIK management tools depend on balloon not deflating
> >>> below host-specified threshold to avoid OOM on the host.
> >>> So I don't think we can make this a default,
> >>> management needs to enable this explicitly.
> >>
> >> If the ballooning is required to keep the host memory managedment
> >> from OOM - iow abusing ballooning as memory hotplug between guests
> >> then yes better let the guest oom - that makes sense.
> >>
> >> Now: I think that doing so (not having enough swap in the host if
> >> all guests deflate) and relying on balloon semantics is fundamentally
> >> broken. Let me explain this: The problem is that we rely on guest
> >> cooperation for the host integrity. As I explained  using madvise 
> >> WONT_NEED will replace the current PTEs with invalid/emtpy PTEs. As
> >> soon as the guest kernel re-touches the page (e.g. a malicious 
> >> kernel module - not the balloon driver) it will be backed by the VMAs
> >> default method - so usually with a shared R/O copy of the empty
> >> zero page. Write accesses will result in a copy-on-write and allocate
> >> new memory in the host. 
> >> There is nothing we can do in the balloon protocol to protect the host
> >> against malicious guests allocating all the maximum memory.
> > 
> > If we want to try and harden host, we can unmap it so guest will crash
> > if it touches pages without deflate.
> > 
> >> If you need host integrity against guest memory usage, something like
> >> cgroups_memory or so is probably the only reliable way.
> > 
> > In the original design, protection against a malicious guest is not the
> > point of the balloon, it's a technology that let you overcommit
> > cooperative guests.
> 
> Sure. But then its perfectly fine to let the guest reclaim by default,
> because your statement 
> "AFAIK management tools depend on balloon not deflating
>  below host-specified threshold to avoid OOM on the host.
>  So I don't think we can make this a default,
>  management needs to enable this explicitly."
> 
> is not true ;-)
> 
> Christian

I don't see the connection.
Deflate on OOM means "it's OK to deflate if you like, this won't
cause any harm". If you set this flag, you can't overcommit host too
agressively even with cooperative guests.

-- 
MST

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
  2015-06-15 10:10                   ` Michael S. Tsirkin
@ 2015-06-15 11:10                     ` Christian Borntraeger
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Borntraeger @ 2015-06-15 11:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: James.Bottomley, Denis V. Lunev, qemu-devel,
	Raushaniya Maksudova, Anthony Liguori

Am 15.06.2015 um 12:10 schrieb Michael S. Tsirkin:
> On Mon, Jun 15, 2015 at 11:59:05AM +0200, Christian Borntraeger wrote:
>> Am 15.06.2015 um 11:06 schrieb Michael S. Tsirkin:
>>
>>>>> AFAIK management tools depend on balloon not deflating
>>>>> below host-specified threshold to avoid OOM on the host.
>>>>> So I don't think we can make this a default,
>>>>> management needs to enable this explicitly.
>>>>
>>>> If the ballooning is required to keep the host memory managedment
>>>> from OOM - iow abusing ballooning as memory hotplug between guests
>>>> then yes better let the guest oom - that makes sense.
>>>>
>>>> Now: I think that doing so (not having enough swap in the host if
>>>> all guests deflate) and relying on balloon semantics is fundamentally
>>>> broken. Let me explain this: The problem is that we rely on guest
>>>> cooperation for the host integrity. As I explained  using madvise 
>>>> WONT_NEED will replace the current PTEs with invalid/emtpy PTEs. As
>>>> soon as the guest kernel re-touches the page (e.g. a malicious 
>>>> kernel module - not the balloon driver) it will be backed by the VMAs
>>>> default method - so usually with a shared R/O copy of the empty
>>>> zero page. Write accesses will result in a copy-on-write and allocate
>>>> new memory in the host. 
>>>> There is nothing we can do in the balloon protocol to protect the host
>>>> against malicious guests allocating all the maximum memory.
>>>
>>> If we want to try and harden host, we can unmap it so guest will crash
>>> if it touches pages without deflate.
>>>
>>>> If you need host integrity against guest memory usage, something like
>>>> cgroups_memory or so is probably the only reliable way.
>>>
>>> In the original design, protection against a malicious guest is not the
>>> point of the balloon, it's a technology that let you overcommit
>>> cooperative guests.
>>
>> Sure. But then its perfectly fine to let the guest reclaim by default,
>> because your statement 
>> "AFAIK management tools depend on balloon not deflating
>>  below host-specified threshold to avoid OOM on the host.
>>  So I don't think we can make this a default,
>>  management needs to enable this explicitly."
>>
>> is not true ;-)
>>
>> Christian
> 
> I don't see the connection.
> Deflate on OOM means "it's OK to deflate if you like, this won't
> cause any harm". If you set this flag, you can't overcommit host too
> agressively even with cooperative guests.

The connection is that there is no fundamental issue being solved that
requires the setting to be off or on. (both is ok with pros and cons)
Keeping reclaim on oom off has of course the lowest impact as it is
just todays behaviour - so we play safe.

I personally can live fine with both defaults and in the end its up to
you to decide as virtio maintainer. (my personal opinion to have deflate on
OOM=yes, MUST_TELL_HOST=off still stands, though)

Just keep in mind that you add an interface that we will drag along forever
and that we might require all future user to add a statement to libvirt
to do the right thing. So we better make up our mind which default value
has the bigger downsides. If that decision making process comes to the
conclusion to do it like this patch - fine with me. 

Christian

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-06-15 11:21 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-09 10:19 [Qemu-devel] [PATCH v6 0/1] balloon: add a feature bit to let Guest OS deflate Denis V. Lunev
2015-06-09 10:19 ` [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom Denis V. Lunev
2015-06-09 10:37   ` Christian Borntraeger
2015-06-10 12:02     ` Denis V. Lunev
2015-06-10 13:13       ` Michael S. Tsirkin
2015-06-10 13:27         ` Denis V. Lunev
2015-06-12 11:56         ` Christian Borntraeger
2015-06-13 20:10           ` Michael S. Tsirkin
2015-06-15  7:01             ` Christian Borntraeger
2015-06-15  9:06               ` Michael S. Tsirkin
2015-06-15  9:59                 ` Christian Borntraeger
2015-06-15 10:10                   ` Michael S. Tsirkin
2015-06-15 11:10                     ` Christian Borntraeger
2015-06-09 10:37   ` Denis V. Lunev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.