* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
       [not found] <1507632457-4611-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
@ 2017-10-10 11:47 ` Michal Hocko
  2017-10-10 11:47 ` Michal Hocko
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Michal Hocko @ 2017-10-10 11:47 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mst, wei.w.wang, virtualization, linux-mm

On Tue 10-10-17 19:47:37, Tetsuo Handa wrote:
> In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> serialize against fill_balloon(). But in fill_balloon(),
> alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> called with the vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, even though
> __GFP_NORETRY is specified, this allocation attempt might indirectly
> depend on somebody else's __GFP_DIRECT_RECLAIM memory allocation. Such an
> indirect __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon()
> via virtballoon_oom_notify() via the blocking_notifier_call_chain()
> callback from out_of_memory() once it reaches __alloc_pages_may_oom() and
> holds the oom_lock mutex. Since the vb->balloon_lock mutex is already held
> by fill_balloon(), this will cause an OOM lockup. Thus, do not wait for
> the vb->balloon_lock mutex if leak_balloon() is called from
> out_of_memory().
> 
>   Thread1                                       Thread2
>     fill_balloon()
>       takes a balloon_lock
>       balloon_page_enqueue()
>         alloc_page(GFP_HIGHUSER_MOVABLE)
>           direct reclaim (__GFP_FS context)       takes a fs lock
>             waits for that fs lock                  alloc_page(GFP_NOFS)
>                                                       __alloc_pages_may_oom()
>                                                         takes the oom_lock
>                                                         out_of_memory()
>                                                           blocking_notifier_call_chain()
>                                                             leak_balloon()
>                                                               tries to take that balloon_lock and deadlocks
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

FWIW this looks good to me from the deadlock POV. I cannot judge the
virtio internals, and I would appreciate it if this driver could move
away from the OOM notifier API to a more generic reclaim mechanism
(e.g. shrinkers).

Reviewed-by: Michal Hocko <mhocko@suse.com>
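
For illustration, a shrinker-based approach along these lines might look
roughly as follows. This is a hypothetical sketch only, not code from
this thread: the "shrinker" member of struct virtio_balloon and the reuse
of leak_balloon() as the scan callback are assumptions.

static unsigned long virtio_balloon_shrink_count(struct shrinker *s,
						 struct shrink_control *sc)
{
	struct virtio_balloon *vb = container_of(s, struct virtio_balloon,
						 shrinker);

	/* Every page currently in the balloon is potentially freeable. */
	return vb->num_pages;
}

static unsigned long virtio_balloon_shrink_scan(struct shrinker *s,
						struct shrink_control *sc)
{
	struct virtio_balloon *vb = container_of(s, struct virtio_balloon,
						 shrinker);

	/*
	 * Deflate up to the requested number of pages and report how many
	 * were freed.  A trylock inside leak_balloon() would still be
	 * needed, since a shrinker can run from direct reclaim too.
	 */
	return leak_balloon(vb, sc->nr_to_scan);
}

static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
{
	vb->shrinker.count_objects = virtio_balloon_shrink_count;
	vb->shrinker.scan_objects  = virtio_balloon_shrink_scan;
	vb->shrinker.seeks = DEFAULT_SEEKS;

	return register_shrinker(&vb->shrinker);
}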
> ---
>  drivers/virtio/virtio_balloon.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index f0b3a0b..03e6078 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -192,7 +192,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
>  	}
>  }
>  
> -static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
> +static unsigned leak_balloon(struct virtio_balloon *vb, size_t num, bool wait)
>  {
>  	unsigned num_freed_pages;
>  	struct page *page;
> @@ -202,7 +202,13 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  	/* We can only do one array worth at a time. */
>  	num = min(num, ARRAY_SIZE(vb->pfns));
>  
> -	mutex_lock(&vb->balloon_lock);
> +	if (wait)
> +		mutex_lock(&vb->balloon_lock);
> +	else if (!mutex_trylock(&vb->balloon_lock)) {
> +		pr_info("virtio_balloon: Unable to release %lu pages due to lock contention.\n",
> +			(unsigned long) min(num, (size_t)vb->num_pages));
> +		return 0;
> +	}
>  	/* We can't release more pages than taken */
>  	num = min(num, (size_t)vb->num_pages);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
> @@ -367,7 +373,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
>  		return NOTIFY_OK;
>  
>  	freed = parm;
> -	num_freed_pages = leak_balloon(vb, oom_pages);
> +	num_freed_pages = leak_balloon(vb, oom_pages, false);
>  	update_balloon_size(vb);
>  	*freed += num_freed_pages;
>  
> @@ -395,7 +401,7 @@ static void update_balloon_size_func(struct work_struct *work)
>  	if (diff > 0)
>  		diff -= fill_balloon(vb, diff);
>  	else if (diff < 0)
> -		diff += leak_balloon(vb, -diff);
> +		diff += leak_balloon(vb, -diff, true);
>  	update_balloon_size(vb);
>  
>  	if (diff)
> @@ -597,7 +603,7 @@ static void remove_common(struct virtio_balloon *vb)
>  {
>  	/* There might be pages left in the balloon: free them. */
>  	while (vb->num_pages)
> -		leak_balloon(vb, vb->num_pages);
> +		leak_balloon(vb, vb->num_pages, true);
>  	update_balloon_size(vb);
>  
>  	/* Now we reset the device so we can clean up the queues. */
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
       [not found] <1507632457-4611-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
  2017-10-10 11:47 ` [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify() Michal Hocko
  2017-10-10 11:47 ` Michal Hocko
@ 2017-10-12  2:36 ` Wei Wang
  2017-10-13 11:28   ` Tetsuo Handa
  2017-10-12  2:36 ` Wei Wang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Wei Wang @ 2017-10-12  2:36 UTC (permalink / raw)
  To: Tetsuo Handa, mst, mhocko; +Cc: virtualization, linux-mm

On 10/10/2017 06:47 PM, Tetsuo Handa wrote:
> In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> serialize against fill_balloon(). But in fill_balloon(),
> alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> called with the vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, even though
> __GFP_NORETRY is specified, this allocation attempt might indirectly
> depend on somebody else's __GFP_DIRECT_RECLAIM memory allocation. Such an
> indirect __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon()
> via virtballoon_oom_notify() via the blocking_notifier_call_chain()
> callback from out_of_memory() once it reaches __alloc_pages_may_oom() and
> holds the oom_lock mutex. Since the vb->balloon_lock mutex is already held
> by fill_balloon(), this will cause an OOM lockup. Thus, do not wait for
> the vb->balloon_lock mutex if leak_balloon() is called from
> out_of_memory().
>
>    Thread1                                       Thread2
>      fill_balloon()
>        takes a balloon_lock
>        balloon_page_enqueue()
>          alloc_page(GFP_HIGHUSER_MOVABLE)
>            direct reclaim (__GFP_FS context)       takes a fs lock
>              waits for that fs lock                  alloc_page(GFP_NOFS)
>                                                        __alloc_pages_may_oom()
>                                                          takes the oom_lock
>                                                          out_of_memory()
>                                                            blocking_notifier_call_chain()
>                                                              leak_balloon()
>                                                                tries to take that balloon_lock and deadlocks
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>


I think we could also use 'bool wait' to make the OOM path take the
traditional leak_balloon() code path, which doesn't need memory
allocation from xb_preload(). That is, inside leak_balloon(), we could
have

use_sg = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG) && wait;
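
For context, VIRTIO_BALLOON_F_SG and xb_preload() come from a separately
proposed series, not from this patch. A hypothetical sketch of how such a
guard might sit inside leak_balloon(), with tell_host_sgs() likewise
assumed from that series:

	bool use_sg = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG) &&
		      wait;

	if (use_sg) {
		/* sg path: needs memory allocations via xb_preload(), so
		 * only take it when sleeping for allocations is fine. */
		tell_host_sgs(vb, vb->deflate_vq);	/* assumed API */
	} else {
		/* Traditional pfn-array path: no memory allocation. */
		tell_host(vb, vb->deflate_vq);
	}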


Reviewed-by: Wei Wang <wei.w.wang@intel.com>


> ---
>   drivers/virtio/virtio_balloon.c | 16 +++++++++++-----
>   1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index f0b3a0b..03e6078 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -192,7 +192,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
>   	}
>   }
>   
> -static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
> +static unsigned leak_balloon(struct virtio_balloon *vb, size_t num, bool wait)
>   {
>   	unsigned num_freed_pages;
>   	struct page *page;
> @@ -202,7 +202,13 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>   	/* We can only do one array worth at a time. */
>   	num = min(num, ARRAY_SIZE(vb->pfns));
>   
> -	mutex_lock(&vb->balloon_lock);
> +	if (wait)
> +		mutex_lock(&vb->balloon_lock);
> +	else if (!mutex_trylock(&vb->balloon_lock)) {
> +		pr_info("virtio_balloon: Unable to release %lu pages due to lock contention.\n",
> +			(unsigned long) min(num, (size_t)vb->num_pages));
> +		return 0;
> +	}
>   	/* We can't release more pages than taken */
>   	num = min(num, (size_t)vb->num_pages);
>   	for (vb->num_pfns = 0; vb->num_pfns < num;
> @@ -367,7 +373,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
>   		return NOTIFY_OK;
>   
>   	freed = parm;
> -	num_freed_pages = leak_balloon(vb, oom_pages);
> +	num_freed_pages = leak_balloon(vb, oom_pages, false);
>   	update_balloon_size(vb);
>   	*freed += num_freed_pages;
>   
> @@ -395,7 +401,7 @@ static void update_balloon_size_func(struct work_struct *work)
>   	if (diff > 0)
>   		diff -= fill_balloon(vb, diff);
>   	else if (diff < 0)
> -		diff += leak_balloon(vb, -diff);
> +		diff += leak_balloon(vb, -diff, true);
>   	update_balloon_size(vb);
>   
>   	if (diff)
> @@ -597,7 +603,7 @@ static void remove_common(struct virtio_balloon *vb)
>   {
>   	/* There might be pages left in the balloon: free them. */
>   	while (vb->num_pages)
> -		leak_balloon(vb, vb->num_pages);
> +		leak_balloon(vb, vb->num_pages, true);
>   	update_balloon_size(vb);
>   
>   	/* Now we reset the device so we can clean up the queues. */

* [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
  2017-10-12  2:36 ` Wei Wang
@ 2017-10-13 11:28   ` Tetsuo Handa
  2017-10-13 13:19     ` Michael S. Tsirkin
  2017-10-13 13:19     ` Michael S. Tsirkin
  0 siblings, 2 replies; 27+ messages in thread
From: Tetsuo Handa @ 2017-10-13 11:28 UTC (permalink / raw)
  To: mst, jasowang; +Cc: virtualization, linux-mm

Michael, will you pick up this patch?
----------
From 210dba24134e54cd470e79712c5cb8bb255566c0 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 10 Oct 2017 19:28:20 +0900
Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()

In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
serialize against fill_balloon(). But in fill_balloon(),
alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
called with the vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, even though
__GFP_NORETRY is specified, this allocation attempt might indirectly
depend on somebody else's __GFP_DIRECT_RECLAIM memory allocation. Such an
indirect __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon()
via virtballoon_oom_notify() via the blocking_notifier_call_chain()
callback from out_of_memory() once it reaches __alloc_pages_may_oom() and
holds the oom_lock mutex. Since the vb->balloon_lock mutex is already held
by fill_balloon(), this will cause an OOM lockup. Thus, do not wait for
the vb->balloon_lock mutex if leak_balloon() is called from
out_of_memory().

  Thread1                                       Thread2
    fill_balloon()
      takes a balloon_lock
      balloon_page_enqueue()
        alloc_page(GFP_HIGHUSER_MOVABLE)
          direct reclaim (__GFP_FS context)       takes a fs lock
            waits for that fs lock                  alloc_page(GFP_NOFS)
                                                      __alloc_pages_may_oom()
                                                        takes the oom_lock
                                                        out_of_memory()
                                                          blocking_notifier_call_chain()
                                                            leak_balloon()
                                                              tries to take that balloon_lock and deadlocks

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Wei Wang <wei.w.wang@intel.com>
---
 drivers/virtio/virtio_balloon.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f0b3a0b..03e6078 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -192,7 +192,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
 	}
 }
 
-static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned leak_balloon(struct virtio_balloon *vb, size_t num, bool wait)
 {
 	unsigned num_freed_pages;
 	struct page *page;
@@ -202,7 +202,13 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
 
-	mutex_lock(&vb->balloon_lock);
+	if (wait)
+		mutex_lock(&vb->balloon_lock);
+	else if (!mutex_trylock(&vb->balloon_lock)) {
+		pr_info("virtio_balloon: Unable to release %lu pages due to lock contention.\n",
+			(unsigned long) min(num, (size_t)vb->num_pages));
+		return 0;
+	}
 	/* We can't release more pages than taken */
 	num = min(num, (size_t)vb->num_pages);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
@@ -367,7 +373,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
 		return NOTIFY_OK;
 
 	freed = parm;
-	num_freed_pages = leak_balloon(vb, oom_pages);
+	num_freed_pages = leak_balloon(vb, oom_pages, false);
 	update_balloon_size(vb);
 	*freed += num_freed_pages;
 
@@ -395,7 +401,7 @@ static void update_balloon_size_func(struct work_struct *work)
 	if (diff > 0)
 		diff -= fill_balloon(vb, diff);
 	else if (diff < 0)
-		diff += leak_balloon(vb, -diff);
+		diff += leak_balloon(vb, -diff, true);
 	update_balloon_size(vb);
 
 	if (diff)
@@ -597,7 +603,7 @@ static void remove_common(struct virtio_balloon *vb)
 {
 	/* There might be pages left in the balloon: free them. */
 	while (vb->num_pages)
-		leak_balloon(vb, vb->num_pages);
+		leak_balloon(vb, vb->num_pages, true);
 	update_balloon_size(vb);
 
 	/* Now we reset the device so we can clean up the queues. */
-- 
1.8.3.1

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
  2017-10-13 11:28   ` Tetsuo Handa
@ 2017-10-13 13:19     ` Michael S. Tsirkin
  2017-10-13 13:19     ` Michael S. Tsirkin
  1 sibling, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2017-10-13 13:19 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: jasowang, virtualization, linux-mm

On Fri, Oct 13, 2017 at 08:28:37PM +0900, Tetsuo Handa wrote:
> Michael, will you pick up this patch?
> ----------
> From 210dba24134e54cd470e79712c5cb8bb255566c0 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Tue, 10 Oct 2017 19:28:20 +0900
> Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> 
> > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > serialize against fill_balloon(). But in fill_balloon(),
> > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > called with the vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, even though
> > __GFP_NORETRY is specified, this allocation attempt might indirectly
> > depend on somebody else's __GFP_DIRECT_RECLAIM memory allocation. Such an
> > indirect __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon()
> > via virtballoon_oom_notify() via the blocking_notifier_call_chain()
> > callback from out_of_memory() once it reaches __alloc_pages_may_oom() and
> > holds the oom_lock mutex. Since the vb->balloon_lock mutex is already held
> > by fill_balloon(), this will cause an OOM lockup. Thus, do not wait for
> > the vb->balloon_lock mutex if leak_balloon() is called from
> > out_of_memory().
> 
>   Thread1                                       Thread2
>     fill_balloon()
>       takes a balloon_lock
>       balloon_page_enqueue()
>         alloc_page(GFP_HIGHUSER_MOVABLE)
>           direct reclaim (__GFP_FS context)       takes a fs lock
>             waits for that fs lock                  alloc_page(GFP_NOFS)
>                                                       __alloc_pages_may_oom()
>                                                         takes the oom_lock
>                                                         out_of_memory()
>                                                           blocking_notifier_call_chain()
>                                                             leak_balloon()
>                                                               tries to take that balloon_lock and deadlocks
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Reviewed-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: Wei Wang <wei.w.wang@intel.com>

I won't, since it does not deflate on OOM as we have promised the host
we would. I will post a patch to fix the issue shortly.

> ---
>  drivers/virtio/virtio_balloon.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index f0b3a0b..03e6078 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -192,7 +192,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
>  	}
>  }
>  
> -static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
> +static unsigned leak_balloon(struct virtio_balloon *vb, size_t num, bool wait)
>  {
>  	unsigned num_freed_pages;
>  	struct page *page;
> @@ -202,7 +202,13 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  	/* We can only do one array worth at a time. */
>  	num = min(num, ARRAY_SIZE(vb->pfns));
>  
> -	mutex_lock(&vb->balloon_lock);
> +	if (wait)
> +		mutex_lock(&vb->balloon_lock);
> +	else if (!mutex_trylock(&vb->balloon_lock)) {
> +		pr_info("virtio_balloon: Unable to release %lu pages due to lock contention.\n",
> +			(unsigned long) min(num, (size_t)vb->num_pages));
> +		return 0;
> +	}
>  	/* We can't release more pages than taken */
>  	num = min(num, (size_t)vb->num_pages);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
> @@ -367,7 +373,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
>  		return NOTIFY_OK;
>  
>  	freed = parm;
> -	num_freed_pages = leak_balloon(vb, oom_pages);
> +	num_freed_pages = leak_balloon(vb, oom_pages, false);
>  	update_balloon_size(vb);
>  	*freed += num_freed_pages;
>  
> @@ -395,7 +401,7 @@ static void update_balloon_size_func(struct work_struct *work)
>  	if (diff > 0)
>  		diff -= fill_balloon(vb, diff);
>  	else if (diff < 0)
> -		diff += leak_balloon(vb, -diff);
> +		diff += leak_balloon(vb, -diff, true);
>  	update_balloon_size(vb);
>  
>  	if (diff)
> @@ -597,7 +603,7 @@ static void remove_common(struct virtio_balloon *vb)
>  {
>  	/* There might be pages left in the balloon: free them. */
>  	while (vb->num_pages)
> -		leak_balloon(vb, vb->num_pages);
> +		leak_balloon(vb, vb->num_pages, true);
>  	update_balloon_size(vb);
>  
>  	/* Now we reset the device so we can clean up the queues. */
> -- 
> 1.8.3.1

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
       [not found] <1507632457-4611-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
                   ` (4 preceding siblings ...)
  2017-10-13 13:23 ` Michael S. Tsirkin
@ 2017-10-13 13:23 ` Michael S. Tsirkin
  2017-10-13 16:41   ` Tetsuo Handa
  2017-10-13 16:41   ` Tetsuo Handa
  5 siblings, 2 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2017-10-13 13:23 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mhocko, wei.w.wang, virtualization, linux-mm

On Tue, Oct 10, 2017 at 07:47:37PM +0900, Tetsuo Handa wrote:
> In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> serialize against fill_balloon(). But in fill_balloon(),
> alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> called with the vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, even though
> __GFP_NORETRY is specified, this allocation attempt might indirectly
> depend on somebody else's __GFP_DIRECT_RECLAIM memory allocation. Such an
> indirect __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon()
> via virtballoon_oom_notify() via the blocking_notifier_call_chain()
> callback from out_of_memory() once it reaches __alloc_pages_may_oom() and
> holds the oom_lock mutex. Since the vb->balloon_lock mutex is already held
> by fill_balloon(), this will cause an OOM lockup. Thus, do not wait for
> the vb->balloon_lock mutex if leak_balloon() is called from
> out_of_memory().
> 
>   Thread1                                       Thread2
>     fill_balloon()
>       takes a balloon_lock
>       balloon_page_enqueue()
>         alloc_page(GFP_HIGHUSER_MOVABLE)
>           direct reclaim (__GFP_FS context)       takes a fs lock
>             waits for that fs lock                  alloc_page(GFP_NOFS)
>                                                       __alloc_pages_may_oom()
>                                                         takes the oom_lock
>                                                         out_of_memory()
>                                                           blocking_notifier_call_chain()
>                                                             leak_balloon()
>                                                               tries to take that balloon_lock and deadlocks
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

This doesn't deflate on OOM if the lock is contended, and we acked
DEFLATE_ON_OOM, so the host actually expects us to.

The proper fix isn't that hard - just avoid allocations under the lock.

Patch posted, please take a look.
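
For illustration, "avoid allocations under lock" might take roughly the
following shape in fill_balloon(). This is a hypothetical sketch, not the
posted patch (which is not quoted in this thread); balloon_page_alloc()
and the two-argument balloon_page_enqueue() are assumed helpers:

static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
{
	unsigned int num_pfns;
	struct page *page, *next;
	LIST_HEAD(pages);

	/* We can only do one array worth at a time. */
	num = min(num, ARRAY_SIZE(vb->pfns));

	/*
	 * Allocate with no locks held so that a concurrent leak_balloon()
	 * from the OOM notifier can take balloon_lock and make progress.
	 */
	for (num_pfns = 0; num_pfns < num;
	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
		page = balloon_page_alloc();	/* assumed helper */
		if (!page)
			break;
		list_add(&page->lru, &pages);
	}

	mutex_lock(&vb->balloon_lock);
	vb->num_pfns = 0;
	list_for_each_entry_safe(page, next, &pages, lru) {
		list_del(&page->lru);
		balloon_page_enqueue(&vb->vb_dev_info, page);
		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
		vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
	}
	if (vb->num_pfns)
		tell_host(vb, vb->inflate_vq);
	num_pfns = vb->num_pfns;
	mutex_unlock(&vb->balloon_lock);

	return num_pfns;
}

Since the inflate path no longer sleeps in the allocator while holding
balloon_lock, the OOM notifier's mutex_lock() in leak_balloon() can stay
as it is.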


> ---
>  drivers/virtio/virtio_balloon.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index f0b3a0b..03e6078 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -192,7 +192,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
>  	}
>  }
>  
> -static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
> +static unsigned leak_balloon(struct virtio_balloon *vb, size_t num, bool wait)
>  {
>  	unsigned num_freed_pages;
>  	struct page *page;
> @@ -202,7 +202,13 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  	/* We can only do one array worth at a time. */
>  	num = min(num, ARRAY_SIZE(vb->pfns));
>  
> -	mutex_lock(&vb->balloon_lock);
> +	if (wait)
> +		mutex_lock(&vb->balloon_lock);
> +	else if (!mutex_trylock(&vb->balloon_lock)) {
> +		pr_info("virtio_balloon: Unable to release %lu pages due to lock contention.\n",
> +			(unsigned long) min(num, (size_t)vb->num_pages));
> +		return 0;
> +	}
>  	/* We can't release more pages than taken */
>  	num = min(num, (size_t)vb->num_pages);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
> @@ -367,7 +373,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
>  		return NOTIFY_OK;
>  
>  	freed = parm;
> -	num_freed_pages = leak_balloon(vb, oom_pages);
> +	num_freed_pages = leak_balloon(vb, oom_pages, false);
>  	update_balloon_size(vb);
>  	*freed += num_freed_pages;
>  
> @@ -395,7 +401,7 @@ static void update_balloon_size_func(struct work_struct *work)
>  	if (diff > 0)
>  		diff -= fill_balloon(vb, diff);
>  	else if (diff < 0)
> -		diff += leak_balloon(vb, -diff);
> +		diff += leak_balloon(vb, -diff, true);
>  	update_balloon_size(vb);
>  
>  	if (diff)
> @@ -597,7 +603,7 @@ static void remove_common(struct virtio_balloon *vb)
>  {
>  	/* There might be pages left in the balloon: free them. */
>  	while (vb->num_pages)
> -		leak_balloon(vb, vb->num_pages);
> +		leak_balloon(vb, vb->num_pages, true);
>  	update_balloon_size(vb);
>  
>  	/* Now we reset the device so we can clean up the queues. */
> -- 
> 1.8.3.1

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
  2017-10-13 13:23 ` Michael S. Tsirkin
@ 2017-10-13 16:41   ` Tetsuo Handa
  2017-10-15  0:22     ` Michael S. Tsirkin
  2017-10-15  0:22     ` Michael S. Tsirkin
  2017-10-13 16:41   ` Tetsuo Handa
  1 sibling, 2 replies; 27+ messages in thread
From: Tetsuo Handa @ 2017-10-13 16:41 UTC (permalink / raw)
  To: mst; +Cc: mhocko, wei.w.wang, virtualization, linux-mm

Michael S. Tsirkin wrote:
> On Tue, Oct 10, 2017 at 07:47:37PM +0900, Tetsuo Handa wrote:
> > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > serialize against fill_balloon(). But in fill_balloon(),
> > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > called with the vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, even though
> > > __GFP_NORETRY is specified, this allocation attempt might indirectly
> > > depend on somebody else's __GFP_DIRECT_RECLAIM memory allocation. Such an
> > > indirect __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon()
> > > via virtballoon_oom_notify() via the blocking_notifier_call_chain()
> > > callback from out_of_memory() once it reaches __alloc_pages_may_oom() and
> > > holds the oom_lock mutex. Since the vb->balloon_lock mutex is already held
> > > by fill_balloon(), this will cause an OOM lockup. Thus, do not wait for
> > > the vb->balloon_lock mutex if leak_balloon() is called from
> > > out_of_memory().
> > 
> >   Thread1                                       Thread2
> >     fill_balloon()
> >       takes a balloon_lock
> >       balloon_page_enqueue()
> >         alloc_page(GFP_HIGHUSER_MOVABLE)
> >           direct reclaim (__GFP_FS context)       takes a fs lock
> >             waits for that fs lock                  alloc_page(GFP_NOFS)
> >                                                       __alloc_pages_may_oom()
> >                                                         takes the oom_lock
> >                                                         out_of_memory()
> >                                                           blocking_notifier_call_chain()
> >                                                             leak_balloon()
> >                                                               tries to take that balloon_lock and deadlocks
> > 
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> 
> This doesn't deflate on OOM if the lock is contended, and we acked
> DEFLATE_ON_OOM, so the host actually expects us to.

I still don't understand what is wrong with not deflating on OOM.
According to https://lists.oasis-open.org/archives/virtio-dev/201504/msg00084.html ,

  If VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been negotiated, the
  driver MAY use pages from the balloon when \field{num_pages}
  is less than or equal to the actual number of pages in the
  balloon if this is required for system stability
  (e.g. if memory is required by applications running within
  the guest).

it says "MAY" rather than "MUST". I think it is legal to be a no-op.
Maybe I don't understand the difference between "deflate the balloon" and
"use pages from the balloon" ?

According to https://lists.linuxfoundation.org/pipermail/virtualization/2014-October/027807.html ,
it seems to me that the expected behavior after deflating while inflating
was not defined when VIRTIO_BALLOON_F_DEFLATE_ON_OOM was proposed.

When the host increases "struct virtio_balloon_config"->num_pages and
kicks the guest, the guest's update_balloon_size_func() starts calling
fill_balloon() until "struct virtio_balloon"->num_pages reaches
"struct virtio_balloon_config"->num_pages, doesn't it?

  struct virtio_balloon_config {
  	/* Number of pages host wants Guest to give up. */
  	__u32 num_pages;
  	/* Number of pages we've actually got in balloon. */
  	__u32 actual;
  };

If leak_balloon() is called via out_of_memory(), leak_balloon()
will decrease "struct virtio_balloon"->num_pages.
But is "struct virtio_balloon_config"->num_pages updated when
leak_balloon() is called via out_of_memory()?
If yes, update_balloon_size_func() would stop calling fill_balloon()
once leak_balloon() has been called via out_of_memory().
If no, update_balloon_size_func() would keep calling fill_balloon()
even after leak_balloon() has been called via out_of_memory() (reached
from fill_balloon() via update_balloon_size_func()). That is, whenever
fill_balloon() tries to increase "struct virtio_balloon"->num_pages,
leak_balloon(), which decreases "struct virtio_balloon"->num_pages, is
called due to the indirect __GFP_DIRECT_RECLAIM dependency via
out_of_memory(). As a result, after the host increases
"struct virtio_balloon_config"->num_pages and kicks the guest,
fill_balloon() will keep trying to increase
"struct virtio_balloon"->num_pages while leak_balloon() keeps
decreasing it. We deflate the balloon in order to inflate the balloon.
That is an OOM lockup, isn't it? How is such a situation better than
invoking the OOM killer in order to inflate the balloon?
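
For reference, the control loop described above is driven by code of
roughly this shape in drivers/virtio/virtio_balloon.c of this era
(paraphrased, so details may differ from any given tree). Note that the
host writes "struct virtio_balloon_config"->num_pages, while the guest
only writes back the "actual" field via update_balloon_size():

  static inline s64 towards_target(struct virtio_balloon *vb)
  {
  	s64 target;
  	u32 num_pages;

  	/* Host-requested balloon size, read from config space. */
  	virtio_cread(vb->vdev, struct virtio_balloon_config, num_pages,
  		     &num_pages);

  	/* Legacy balloon config space is LE, unlike all other devices. */
  	if (!virtio_has_feature(vb->vdev, VIRTIO_F_VERSION_1))
  		num_pages = le32_to_cpu((__force __le32)num_pages);

  	target = num_pages;
  	return target - vb->num_pages;
  }

So if leak_balloon() shrinks vb->num_pages while the host-side target
stays put, towards_target() remains positive and
update_balloon_size_func() requeues itself to inflate again - exactly
the loop described above.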

> 
> The proper fix isn't that hard - just avoid allocations under the lock.
> 
> Patch posted, please take a look.

Your patch allocates pages in order to inflate the balloon, but it also
allows leak_balloon() to deflate the balloon. How does deflating the
balloon (i.e. calling leak_balloon()) make sense while we are allocating
pages to inflate it (i.e. calling fill_balloon())?

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
  2017-10-13 16:41   ` Tetsuo Handa
@ 2017-10-15  0:22     ` Michael S. Tsirkin
  2017-10-15  5:38       ` Tetsuo Handa
  1 sibling, 2 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2017-10-15  0:22 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mhocko, wei.w.wang, virtualization, linux-mm

On Sat, Oct 14, 2017 at 01:41:14AM +0900, Tetsuo Handa wrote:
> Michael S. Tsirkin wrote:
> > On Tue, Oct 10, 2017 at 07:47:37PM +0900, Tetsuo Handa wrote:
> > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > serialize against fill_balloon(). But in fill_balloon(),
> > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > > is specified, this allocation attempt might indirectly depend on somebody
> > > else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect
> > > __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via
> > > virtballoon_oom_notify() via blocking_notifier_call_chain() callback via
> > > out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock
> > > mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it
> > > will cause OOM lockup. Thus, do not wait for vb->balloon_lock mutex if
> > > leak_balloon() is called from out_of_memory().
> > > 
> > >   Thread1                                       Thread2
> > >     fill_balloon()
> > >       takes a balloon_lock
> > >       balloon_page_enqueue()
> > >         alloc_page(GFP_HIGHUSER_MOVABLE)
> > >           direct reclaim (__GFP_FS context)       takes a fs lock
> > >             waits for that fs lock                  alloc_page(GFP_NOFS)
> > >                                                       __alloc_pages_may_oom()
> > >                                                         takes the oom_lock
> > >                                                         out_of_memory()
> > >                                                           blocking_notifier_call_chain()
> > >                                                             leak_balloon()
> > >                                                               tries to take that balloon_lock and deadlocks
> > > 
> > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > 
> > This doesn't deflate on oom if lock is contended, and we acked
> > DEFLATE_ON_OOM so host actually expects us to.
> 
> I still don't understand what is wrong with not deflating on OOM.

Well, the point of this flag is that when it's acked,
the host knows that it's safe to inflate the balloon
to a large portion of guest memory and this won't
cause an OOM situation.

Without this flag, if you inflate after OOM, then it's
fine as allocations fail, but if you inflate and then
enter OOM, the balloon won't give up memory.

The flag is unfortunately hard for management tools to use,
which has led to it still not being supported by hosts.


> According to https://lists.oasis-open.org/archives/virtio-dev/201504/msg00084.html ,
> 
>   If VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been negotiated, the
>   driver MAY use pages from the balloon when \field{num_pages}
>   is less than or equal to the actual number of pages in the
>   balloon if this is required for system stability
>   (e.g. if memory is required by applications running within
>   the guest).
> 
> it says "MAY" rather than "MUST". I think it is legal to be a no-op.
> Maybe I don't understand the difference between "deflate the balloon" and
> "use pages from the balloon" ?

Maybe we should strengthen this to SHOULD.


> According to https://lists.linuxfoundation.org/pipermail/virtualization/2014-October/027807.html ,
> it seems to me that the expected behavior after deflating while inflating
> was not defined when VIRTIO_BALLOON_F_DEFLATE_ON_OOM was proposed.

I think the assumption is that it will fill back up eventually
when the guest does have some free memory.

> When the host increases "struct virtio_balloon_config"->num_pages and
> kicks the guest, the guest's update_balloon_size_func() keeps calling
> fill_balloon() until "struct virtio_balloon"->num_pages reaches
> "struct virtio_balloon_config"->num_pages, doesn't it?
> 
>   struct virtio_balloon_config {
>   	/* Number of pages host wants Guest to give up. */
>   	__u32 num_pages;
>   	/* Number of pages we've actually got in balloon. */
>   	__u32 actual;
>   };
> 
> If leak_balloon() is called via out_of_memory(), leak_balloon()
> will decrease "struct virtio_balloon"->num_pages.
> But is "struct virtio_balloon_config"->num_pages updated when
> leak_balloon() is called via out_of_memory()?
> If yes, update_balloon_size_func() would stop calling fill_balloon()
> when leak_balloon() was called via out_of_memory().
> If no, update_balloon_size_func() would continue calling fill_balloon()
> when leak_balloon() was called via out_of_memory() via fill_balloon()
> via update_balloon_size_func(). That is, while fill_balloon() tries to
> increase "struct virtio_balloon"->num_pages, leak_balloon(), which
> decreases "struct virtio_balloon"->num_pages, is called due to an indirect
> __GFP_DIRECT_RECLAIM dependency via out_of_memory().
> As a result, as long as the host has increased
> "struct virtio_balloon_config"->num_pages and kicked the guest,
> fill_balloon() will keep trying to increase
> "struct virtio_balloon"->num_pages while leak_balloon() keeps
> decreasing it. We deflate the balloon in order to inflate the balloon.
> That is an OOM lockup, isn't it? How is such a situation better than
> invoking the OOM killer in order to inflate the balloon?

I don't think it's a lockup, see below.

> > 
> > The proper fix isn't that hard - just avoid allocations under lock.
> > 
> > Patch posted, pls take a look.
> 
> Your patch allocates pages in order to inflate the balloon, but
> it will also allow leak_balloon() to deflate the balloon.
> How does deflating the balloon (i.e. calling leak_balloon()) make sense
> while we are allocating pages to inflate the balloon (i.e. calling
> fill_balloon())?

The idea is that fill_balloon is allocating memory with __GFP_NORETRY
so it will avoid disruptive actions like the OOM killer.
Under pressure it will normally fail and retry in half a second or so.

Calling leak_balloon in that situation could benefit the system as a whole.

I might be misunderstanding the meaning of the relevant GFP flags,
pls correct me if I'm wrong.
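
To make that concrete, the allocation fill_balloon() depends on looks
roughly like this (a simplified sketch of the balloon_page_enqueue() path
inside fill_balloon()'s loop, not the verbatim code):

	struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
				       __GFP_NOMEMALLOC | __GFP_NORETRY);
	if (!page) {
		dev_info_ratelimited(&vb->vdev->dev,
				     "Out of puff! Can't get %u pages\n",
				     VIRTIO_BALLOON_PAGES_PER_PAGE);
		/* Sleep for at least 1/5 of a second before retry. */
		msleep(200);
		break;	/* give up for now; the work item runs again later */
	}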

-- 
MST

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
  2017-10-15  0:22     ` Michael S. Tsirkin
@ 2017-10-15  5:38       ` Tetsuo Handa
  2017-10-16 10:58         ` Tetsuo Handa
  1 sibling, 2 replies; 27+ messages in thread
From: Tetsuo Handa @ 2017-10-15  5:38 UTC (permalink / raw)
  To: mst; +Cc: mhocko, wei.w.wang, virtualization, linux-mm

Michael S. Tsirkin wrote:
> > > 
> > > The proper fix isn't that hard - just avoid allocations under lock.
> > > 
> > > Patch posted, pls take a look.
> > 
> > Your patch allocates pages in order to inflate the balloon, but
> > it will also allow leak_balloon() to deflate the balloon.
> > How does deflating the balloon (i.e. calling leak_balloon()) make sense
> > while we are allocating pages to inflate the balloon (i.e. calling
> > fill_balloon())?
> 
> The idea is that fill_balloon is allocating memory with __GFP_NORETRY
> so it will avoid disruptive actions like the OOM killer.
> Under pressure it will normally fail and retry in half a second or so.
> 
> Calling leak_balloon in that situation could benefit the system as a whole.
> 
> I might be misunderstanding the meaning of the relevant GFP flags,
> pls correct me if I'm wrong.

Would you answer the below question with "yes" or "no"?

  If leak_balloon() is called via out_of_memory(), leak_balloon()
  will decrease "struct virtio_balloon"->num_pages.
  But is "struct virtio_balloon_config"->num_pages updated when
  leak_balloon() is called via out_of_memory()?

The explanation below assumes that your answer is "no".

I consider fill_balloon()'s use of __GFP_NORETRY to be a bug.
Consider an extreme situation where guest1 is started with 8192MB of
memory and then guest1's memory is reduced to 128MB by

  virsh qemu-monitor-command --domain guest1 --hmp 'balloon 128'

when VIRTIO_BALLOON_F_DEFLATE_ON_OOM was not negotiated.
Of course, 128MB would be too small to operate guest1 properly.
Since update_balloon_size_func() continues calling fill_balloon()
until guest1's memory is reduced to 128MB, you will see a flood of
"puff" messages (and guest1 is practically unusable because all CPU
resources will be wasted on unsuccessful memory reclaim attempts)
unless the OOM killer is invoked.

What this patch is trying to handle is a situation where
VIRTIO_BALLOON_F_DEFLATE_ON_OOM was negotiated. Once
update_balloon_size_func() has started calling fill_balloon(),
update_balloon_size_func() will continue calling fill_balloon()
until guest1's memory is reduced to 128MB, won't it?

Since fill_balloon() uses __GFP_IO | __GFP_FS, fill_balloon() can
indirectly trigger out_of_memory() even though __GFP_NORETRY is specified.
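
(For reference, the relevant definitions in include/linux/gfp.h expand
roughly as follows in 4.14:

  #define __GFP_RECLAIM (___GFP_DIRECT_RECLAIM | ___GFP_KSWAPD_RECLAIM)
  #define GFP_USER      (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
  #define GFP_HIGHUSER         (GFP_USER | __GFP_HIGHMEM)
  #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)

so GFP_HIGHUSER_MOVABLE always carries ___GFP_DIRECT_RECLAIM.)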

While update_balloon_size_func() is running in order to call fill_balloon(),
calling leak_balloon() will increase the number of pages which
fill_balloon() is supposed to fill. Leaking some pages from leak_balloon()
via the blocking_notifier_call_chain() callback could avoid invoking the
OOM killer for that specific moment, but it bounces back to us later because
the number of pages to allocate later (note that update_balloon_size_func() is
still running in order to call fill_balloon()) is increased by leak_balloon().

Thus, I don't think that avoiding the OOM killer by calling leak_balloon()
makes sense while update_balloon_size_func() is running in order to call
fill_balloon(). And this patch tries to detect that case by replacing
mutex_lock() with mutex_trylock().

> Well, the point of this flag is that when it's acked,
> the host knows that it's safe to inflate the balloon
> to a large portion of guest memory and this won't
> cause an OOM situation.

Assuming that your answer to the question is "no", I don't think it is
safe to inflate the balloon to a large portion of guest memory: once
update_balloon_size_func() has started calling fill_balloon(), it
cannot stop calling fill_balloon() even when the
blocking_notifier_call_chain() callback called leak_balloon(), because
"struct virtio_balloon_config"->num_pages will not be updated when
the blocking_notifier_call_chain() callback called leak_balloon().

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
  2017-10-15  5:38       ` Tetsuo Handa
@ 2017-10-16 10:58         ` Tetsuo Handa
  2017-10-16 17:01           ` Michael S. Tsirkin
  1 sibling, 2 replies; 27+ messages in thread
From: Tetsuo Handa @ 2017-10-16 10:58 UTC (permalink / raw)
  To: mst; +Cc: mhocko, wei.w.wang, virtualization, linux-mm

Tetsuo Handa wrote:
> Michael S. Tsirkin wrote:
> > > > 
> > > > The proper fix isn't that hard - just avoid allocations under lock.
> > > > 
> > > > Patch posted, pls take a look.
> > > 
> > > Your patch allocates pages in order to inflate the balloon, but
> > > it will also allow leak_balloon() to deflate the balloon.
> > > How does deflating the balloon (i.e. calling leak_balloon()) make sense
> > > while we are allocating pages to inflate the balloon (i.e. calling
> > > fill_balloon())?
> > 
> > The idea is that fill_balloon is allocating memory with __GFP_NORETRY
> > so it will avoid disruptive actions like the OOM killer.
> > Under pressure it will normally fail and retry in half a second or so.
> > 
> > Calling leak_balloon in that situation could benefit the system as a whole.
> > 
> > I might be misunderstanding the meaning of the relevant GFP flags,
> > pls correct me if I'm wrong.
> 
> Would you answer the below question with "yes" or "no"?
> 
>   If leak_balloon() is called via out_of_memory(), leak_balloon()
>   will decrease "struct virtio_balloon"->num_pages.
>   But is "struct virtio_balloon_config"->num_pages updated when
>   leak_balloon() is called via out_of_memory()?
> 
> The explanation below assumes that your answer is "no".
> 
> I consider fill_balloon()'s use of __GFP_NORETRY to be a bug.

Below are my test results using 4.14-rc5.

  http://I-love.SAKURA.ne.jp/tmp/20171016-default.log.xz
  http://I-love.SAKURA.ne.jp/tmp/20171016-deflate.log.xz

20171016-default.log.xz is without VIRTIO_BALLOON_F_DEFLATE_ON_OOM and
20171016-deflate.log.xz is with VIRTIO_BALLOON_F_DEFLATE_ON_OOM. (I
inverted the virtio_has_feature(VIRTIO_BALLOON_F_DEFLATE_ON_OOM) test
because the QEMU I'm using does not support the deflate-on-oom option.)
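
(The inversion was presumably a one-line tweak along these lines in
virtballoon_oom_notify(), shown here only to clarify the test setup:

  -	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
  +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
   		return NOTIFY_OK;

i.e. the OOM notifier deflates even though the host never offered the
feature.)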

> Consider an extreme situation where guest1 is started with 8192MB of
> memory and then guest1's memory is reduced to 128MB by
> 
>   virsh qemu-monitor-command --domain guest1 --hmp 'balloon 128'
> 
> when VIRTIO_BALLOON_F_DEFLATE_ON_OOM was not negotiated.
> Of course, 128MB would be too small to operate guest1 properly.
> Since update_balloon_size_func() continues calling fill_balloon()
> until guest1's memory is reduced to 128MB, you will see a flood of
> "puff" messages (and guest1 is practically unusable because all CPU
> resources will be wasted on unsuccessful memory reclaim attempts)
> unless the OOM killer is invoked.

20171016-default.log.xz continued printing "puff" messages until a kernel
panic caused by somebody else killing all OOM-killable processes via
GFP_KERNEL allocation requests. Although fill_balloon() was printing
a lot of noise due to __GFP_NORETRY, the system made forward progress
in the form of a kernel panic triggered by running out of OOM-killable
processes.

----------
[   88.270838] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   88.503496] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   88.730058] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   88.937971] Out of memory: Kill process 669 (dhclient) score 54 or sacrifice child
[   89.007322] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   89.234874] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   89.439735] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   89.496389] Out of memory: Kill process 853 (tuned) score 47 or sacrifice child
[   89.663808] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   89.883812] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   90.104417] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   90.251293] Out of memory: Kill process 568 (polkitd) score 36 or sacrifice child
[   90.326131] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   90.821722] Out of memory: Kill process 601 (NetworkManager) score 23 or sacrifice child
[   91.293848] Out of memory: Kill process 585 (rsyslogd) score 13 or sacrifice child
[   91.799413] Out of memory: Kill process 415 (systemd-journal) score 13 or sacrifice child
[   91.861974] Out of memory: Kill process 987 (qmgr) score 6 or sacrifice child
[   91.925297] Out of memory: Kill process 985 (master) score 6 or sacrifice child
[   92.254082] Out of memory: Kill process 985 (master) score 6 or sacrifice child
[   92.464029] Out of memory: Kill process 881 (login) score 4 or sacrifice child
[   92.467859] Out of memory: Kill process 881 (login) score 4 or sacrifice child
[   93.490928] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   93.713220] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   93.932145] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   94.147652] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   94.363826] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   94.606404] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   94.833539] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   95.053230] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   95.267805] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   95.483789] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   98.290124] Out of memory: Kill process 595 (crond) score 3 or sacrifice child
[   98.643588] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   98.864487] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   99.084685] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   99.299766] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   99.515745] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   99.758334] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   99.985175] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  100.204474] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  100.419757] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  100.635681] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  101.263840] Out of memory: Kill process 587 (systemd-logind) score 2 or sacrifice child
[  101.319432] Out of memory: Kill process 586 (irqbalance) score 1 or sacrifice child
[  101.386546] Out of memory: Kill process 569 (dbus-daemon) score 0 or sacrifice child
[  103.805754] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  104.033073] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  104.253104] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  104.467810] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  104.683792] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  104.926409] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  105.153490] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  105.372610] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  105.587751] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  105.803719] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  108.958882] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  109.185254] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  109.404393] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  109.621584] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  109.697419] Kernel panic - not syncing: Out of memory and no killable processes...
[  109.723710] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
----------

> 
> What this patch is trying to handle is a situation where
> VIRTIO_BALLOON_F_DEFLATE_ON_OOM was negotiated. Once
> update_balloon_size_func() has started calling fill_balloon(),
> update_balloon_size_func() will continue calling fill_balloon()
> until guest1's memory is reduced to 128MB, won't it?
> 
> Since fill_balloon() uses __GFP_IO | __GFP_FS, fill_balloon() can
> indirectly trigger out_of_memory() even though __GFP_NORETRY is specified.
> 
> While update_balloon_size_func() is running in order to call fill_balloon(),
> calling leak_balloon() will increase the number of pages which
> fill_balloon() is supposed to fill. Leaking some pages from leak_balloon()
> via the blocking_notifier_call_chain() callback could avoid invoking the
> OOM killer for that specific moment, but it bounces back to us later because
> the number of pages to allocate later (note that update_balloon_size_func() is
> still running in order to call fill_balloon()) is increased by leak_balloon().

20171016-deflate.log.xz continued printing "puff" messages without any OOM
killer messages, because fill_balloon() always inflates faster than
leak_balloon() deflates.

Since the OOM killer cannot be invoked unless leak_balloon()
deflates faster than fill_balloon() inflates, the guest remained unusable
(e.g. unable to log in via ssh) and did nothing but print "puff" messages.
This result was worse than 20171016-default.log.xz, because the system was
not able to make any forward progress (i.e. a complete OOM lockup).

----------
[   19.866938] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   20.089350] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   20.430965] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   20.652641] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   20.894680] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   21.122063] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   21.340872] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   21.556128] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   21.772473] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   22.014960] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   24.972961] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   25.201565] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   25.420875] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   25.636081] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   25.851670] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   26.095842] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   26.322284] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   26.851691] virtio_balloon virtio3: Out of puff! Can't get 1 pages
(...snipped...)
[  211.748567] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  211.963910] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  215.157737] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  215.385239] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  215.604426] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  215.819807] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  216.036491] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  216.278718] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  216.505410] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  216.724334] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  216.940318] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  217.155560] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  220.312187] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[  220.342980] sysrq: SysRq : Resetting
----------

> 
> Thus, I don't think that avoiding the OOM killer by calling leak_balloon()
> makes sense while update_balloon_size_func() is running in order to call
> fill_balloon(). And this patch tries to detect that case by replacing
> mutex_lock() with mutex_trylock().
> 
> > Well, the point of this flag is that when it's acked,
> > the host knows that it's safe to inflate the balloon
> > to a large portion of guest memory and this won't
> > cause an OOM situation.
> 
> Assuming that your answer to the question is "no", I don't think it is
> safe to inflate the balloon to a large portion of guest memory: once
> update_balloon_size_func() has started calling fill_balloon(), it
> cannot stop calling fill_balloon() even when the
> blocking_notifier_call_chain() callback called leak_balloon(), because
> "struct virtio_balloon_config"->num_pages will not be updated when
> the blocking_notifier_call_chain() callback called leak_balloon().

As I demonstrated above, VIRTIO_BALLOON_F_DEFLATE_ON_OOM can lead to a complete
OOM lockup because the out_of_memory() => fill_balloon() => out_of_memory() =>
fill_balloon() sequence can effectively disable the OOM killer, even though the
host assumed that it's safe to inflate the balloon to a large portion of guest
memory and this won't cause an OOM situation.

> > I think the assumption is that it will fill back up eventually
> > when the guest does have some free memory.

So, my question again, because that assumption is broken:
what is the expected behavior after deflating while inflating?
How should "struct virtio_balloon_config"->num_pages be interpreted?

  struct virtio_balloon_config {
  	/* Number of pages host wants Guest to give up. */
  	__u32 num_pages;
  	/* Number of pages we've actually got in balloon. */
  	__u32 actual;
  };

If leak_balloon() from out_of_memory() should be stronger than
fill_balloon() from update_balloon_size_func(), we need to make
sure that update_balloon_size_func() stops calling fill_balloon()
when leak_balloon() was called from out_of_memory().

If fill_balloon() from update_balloon_size_func() should be
stronger than leak_balloon() from out_of_memory(), we need to make
sure that leak_balloon() from out_of_memory() is ignored when
fill_balloon() from update_balloon_size_func() is running.

Apart from vb->balloon_lock deadlock avoidance, we need to define
the expected behavior.
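
As a purely hypothetical sketch of the first option (the oom_deflated
flag below is invented for illustration, it does not exist in the driver),
it could look something like:

static int virtballoon_oom_notify(struct notifier_block *self,
				  unsigned long dummy, void *parm)
{
	struct virtio_balloon *vb = container_of(self, struct virtio_balloon,
						 nb);
	unsigned long *freed = parm;

	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
		return NOTIFY_OK;
	/* Invented flag; would be cleared when the host posts a new target. */
	vb->oom_deflated = true;
	*freed += leak_balloon(vb, oom_pages);
	update_balloon_size(vb);
	return NOTIFY_OK;
}

with update_balloon_size_func() then skipping fill_balloon() while
vb->oom_deflated is set.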

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
  2017-10-16 10:58         ` Tetsuo Handa
@ 2017-10-16 17:01           ` Michael S. Tsirkin
  2017-10-18 10:59             ` Tetsuo Handa
  1 sibling, 2 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2017-10-16 17:01 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: mhocko, wei.w.wang, virtualization, linux-mm,
	Raushaniya Maksudova, Denis V. Lunev

On Mon, Oct 16, 2017 at 07:58:29PM +0900, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Michael S. Tsirkin wrote:
> > > > > 
> > > > > The proper fix isn't that hard - just avoid allocations under lock.
> > > > > 
> > > > > Patch posted, pls take a look.
> > > > 
> > > > Your patch allocates pages in order to inflate the balloon, but
> > > > it will also allow leak_balloon() to deflate the balloon.
> > > > How does deflating the balloon (i.e. calling leak_balloon()) make sense
> > > > while we are allocating pages to inflate the balloon (i.e. calling
> > > > fill_balloon())?
> > > 
> > > The idea is that fill_balloon is allocating memory with __GFP_NORETRY
> > > so it will avoid disruptive actions like the OOM killer.
> > > Under pressure it will normally fail and retry in half a second or so.
> > > 
> > > Calling leak_balloon in that situation could benefit the system as a whole.
> > > 
> > > I might be misunderstanding the meaning of the relevant GFP flags,
> > > pls correct me if I'm wrong.
> > 
> > Would you answer the below question with "yes" or "no"?
> > 
> >   If leak_balloon() is called via out_of_memory(), leak_balloon()
> >   will decrease "struct virtio_balloon"->num_pages.
> >   But is "struct virtio_balloon_config"->num_pages updated when
> >   leak_balloon() is called via out_of_memory()?
> > 
> > The explanation below assumes that your answer is "no".
> > 
> > I consider fill_balloon()'s use of __GFP_NORETRY to be a bug.
> 
> Below are my test results using 4.14-rc5.
> 
>   http://I-love.SAKURA.ne.jp/tmp/20171016-default.log.xz
>   http://I-love.SAKURA.ne.jp/tmp/20171016-deflate.log.xz
> 
> 20171016-default.log.xz is without VIRTIO_BALLOON_F_DEFLATE_ON_OOM and
> 20171016-deflate.log.xz is with VIRTIO_BALLOON_F_DEFLATE_ON_OOM. (I
> inverted the virtio_has_feature(VIRTIO_BALLOON_F_DEFLATE_ON_OOM) test
> because the QEMU I'm using does not support the deflate-on-oom option.)
> 
> > Consider an extreme situation where guest1 is started with 8192MB of
> > memory and then guest1's memory is reduced to 128MB by
> > 
> >   virsh qemu-monitor-command --domain guest1 --hmp 'balloon 128'
> > 
> > when VIRTIO_BALLOON_F_DEFLATE_ON_OOM was not negotiated.
> > Of course, 128MB would be too small to operate guest1 properly.
> > Since update_balloon_size_func() continues calling fill_balloon()
> > until guest1's memory is reduced to 128MB, you will see a flood of
> > "puff" messages (and guest1 is practically unusable because all CPU
> > resources will be wasted on unsuccessful memory reclaim attempts)
> > unless the OOM killer is invoked.
> 
> 20171016-default.log.xz continued printing "puff" messages until a kernel
> panic caused by somebody else killing all OOM-killable processes via
> GFP_KERNEL allocation requests. Although fill_balloon() was printing
> a lot of noise due to __GFP_NORETRY, the system made forward progress
> in the form of a kernel panic triggered by running out of OOM-killable
> processes.
> 
> ----------
> [   88.270838] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   88.503496] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   88.730058] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   88.937971] Out of memory: Kill process 669 (dhclient) score 54 or sacrifice child
> [   89.007322] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   89.234874] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   89.439735] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   89.496389] Out of memory: Kill process 853 (tuned) score 47 or sacrifice child
> [   89.663808] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   89.883812] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   90.104417] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   90.251293] Out of memory: Kill process 568 (polkitd) score 36 or sacrifice child
> [   90.326131] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   90.821722] Out of memory: Kill process 601 (NetworkManager) score 23 or sacrifice child
> [   91.293848] Out of memory: Kill process 585 (rsyslogd) score 13 or sacrifice child
> [   91.799413] Out of memory: Kill process 415 (systemd-journal) score 13 or sacrifice child
> [   91.861974] Out of memory: Kill process 987 (qmgr) score 6 or sacrifice child
> [   91.925297] Out of memory: Kill process 985 (master) score 6 or sacrifice child
> [   92.254082] Out of memory: Kill process 985 (master) score 6 or sacrifice child
> [   92.464029] Out of memory: Kill process 881 (login) score 4 or sacrifice child
> [   92.467859] Out of memory: Kill process 881 (login) score 4 or sacrifice child
> [   93.490928] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   93.713220] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   93.932145] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   94.147652] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   94.363826] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   94.606404] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   94.833539] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   95.053230] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   95.267805] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   95.483789] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   98.290124] Out of memory: Kill process 595 (crond) score 3 or sacrifice child
> [   98.643588] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   98.864487] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   99.084685] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   99.299766] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   99.515745] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   99.758334] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   99.985175] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  100.204474] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  100.419757] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  100.635681] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  101.263840] Out of memory: Kill process 587 (systemd-logind) score 2 or sacrifice child
> [  101.319432] Out of memory: Kill process 586 (irqbalance) score 1 or sacrifice child
> [  101.386546] Out of memory: Kill process 569 (dbus-daemon) score 0 or sacrifice child
> [  103.805754] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  104.033073] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  104.253104] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  104.467810] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  104.683792] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  104.926409] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  105.153490] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  105.372610] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  105.587751] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  105.803719] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  108.958882] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  109.185254] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  109.404393] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  109.621584] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  109.697419] Kernel panic - not syncing: Out of memory and no killable processes...
> [  109.723710] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
> ----------
> 
> > 
> > What this patch is trying to handle is a situation when
> > VIRTIO_BALLOON_F_DEFLATE_ON_OOM was negotiated. Once
> > update_balloon_size_func() started calling fill_balloon(),
> > update_balloon_size_func() will continue calling fill_balloon()
> > until guest1's memory is reduced to 128MB, won't it?
> > 
> > Since fill_balloon() uses __GFP_IO | __GFP_FS, fill_balloon() can
> > indirectly trigger out_of_memory() even though __GFP_NORETRY is specified.
> > 
> > When update_balloon_size_func() is running in order to call fill_balloon(),
> > calling leak_balloon() will increase the number of pages which
> > fill_balloon() is supposed to fill. Leaking some pages from leak_balloon()
> > via the blocking_notifier_call_chain() callback could avoid invocation of
> > the OOM killer for that specific moment, but it bounces back to us later
> > because the number of pages to allocate later (note that
> > update_balloon_size_func() is still running in order to call
> > fill_balloon()) is increased by leak_balloon().
> 
> 20171016-deflate.log.xz continued printing "puff" messages without any OOM
> killer messages, for fill_balloon() always inflated faster than
> leak_balloon() deflated.
> 
> Since the OOM killer cannot be invoked unless leak_balloon() deflates
> faster than fill_balloon() inflates, the guest remained unusable
> (e.g. unable to log in via ssh), doing nothing but printing "puff"
> messages. This result was worse than 20171016-default.log.xz, for the
> system was not able to make any forward progress (i.e. a complete OOM
> lockup).
> 
> ----------
> [   19.866938] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   20.089350] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   20.430965] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   20.652641] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   20.894680] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   21.122063] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   21.340872] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   21.556128] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   21.772473] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   22.014960] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   24.972961] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   25.201565] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   25.420875] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   25.636081] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   25.851670] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   26.095842] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   26.322284] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   26.851691] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> (...snipped...)
> [  211.748567] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  211.963910] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  215.157737] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  215.385239] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  215.604426] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  215.819807] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  216.036491] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  216.278718] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  216.505410] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  216.724334] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  216.940318] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  217.155560] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  220.312187] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [  220.342980] sysrq: SysRq : Resetting
> ----------
> 
> > 
> > Thus, I don't think that avoid invoking the OOM killer by calling leak_balloon()
> > makes sense when update_balloon_size_func() is running for calling fill_balloon().
> > And this patch tries to detect it by replacing mutex_lock() with mutex_trylock().
> > 
> > > Well the point of this flag is that when it's acked,
> > > host knows that it's safe to inflate the balloon
> > > to a large portion of guest memory and this won't
> > > cause an OOM situation.
> > 
> > Assuming that your answer to the question is "no", I don't think it is
> > safe to inflate the balloon to a large portion of guest memory, for once
> > update_balloon_size_func() started calling fill_balloon(),
> > update_balloon_size_func() can not stop calling fill_balloon() even when
> > blocking_notifier_call_chain() callback called leak_balloon() because
> > "struct virtio_balloon_config"->num_pages will not be updated when
> > blocking_notifier_call_chain() callback called leak_balloon().
> 
> As I demonstrated above, VIRTIO_BALLOON_F_DEFLATE_ON_OOM can lead to a
> complete OOM lockup because the out_of_memory() => fill_balloon() =>
> out_of_memory() => fill_balloon() sequence can effectively disable the OOM
> killer when the host assumed that it's safe to inflate the balloon to a
> large portion of guest memory and that this won't cause an OOM situation.

I agree. But I wonder how the contributors used it in their systems.


> > > I think the assumption is that it will fill back up eventually
> > > when the guest does have some free memory.
> 
> So, my question again, because that assumption is broken.
> What is the expected behavior after deflating while inflating?
> How should "struct virtio_balloon_config"->num_pages be interpreted?
> 
>   struct virtio_balloon_config {
>   	/* Number of pages host wants Guest to give up. */
>   	__u32 num_pages;
>   	/* Number of pages we've actually got in balloon. */
>   	__u32 actual;
>   };
> 
> If leak_balloon() from out_of_memory() should be stronger than
> fill_balloon() from update_balloon_size_func(), we need to make
> sure that update_balloon_size_func() stops calling fill_balloon()
> when leak_balloon() was called from out_of_memory().

I think that's the case. Question is, when can we inflate again?
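
(For reference on how num_pages is interpreted: the driver treats it as an
absolute target. A simplified sketch, paraphrased from the driver's
towards_target(); not a verbatim copy:)

----------------------------------------
/* config->num_pages is the absolute balloon size the host wants; the
 * sign of the difference from the pages the guest actually holds
 * (vb->num_pages) decides whether to inflate or deflate next.
 */
static s64 towards_target(struct virtio_balloon *vb)
{
	u32 num_pages;

	virtio_cread(vb->vdev, struct virtio_balloon_config, num_pages,
		     &num_pages);
	return (s64)num_pages - vb->num_pages;
}
----------------------------------------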

> If fill_balloon() from update_balloon_size_func() should be
> stronger than leak_balloon() from out_of_memory(), we need to make
> sure that leak_balloon() from out_of_memory() is ignored when
> fill_balloon() from update_balloon_size_func() is running.
> 
> Apart from vb->balloon_lock deadlock avoidance, we need to define
> the expected behavior.


-- 
MST


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
  2017-10-16 17:01           ` Michael S. Tsirkin
@ 2017-10-18 10:59             ` Tetsuo Handa
  2017-10-18 17:16               ` Michael S. Tsirkin
  2017-10-18 17:16               ` Michael S. Tsirkin
  2017-10-18 10:59             ` Tetsuo Handa
  1 sibling, 2 replies; 27+ messages in thread
From: Tetsuo Handa @ 2017-10-18 10:59 UTC (permalink / raw)
  To: mst; +Cc: mhocko, wei.w.wang, virtualization, linux-mm, rmaksudova, den

Tetsuo Handa wrote:
> 20171016-deflate.log.xz continued printing "puff" messages without any OOM
> killer messages, for fill_balloon() always inflated faster than
> leak_balloon() deflated.
> 
> Since the OOM killer cannot be invoked unless leak_balloon() deflates
> faster than fill_balloon() inflates, the guest remained unusable
> (e.g. unable to log in via ssh), doing nothing but printing "puff"
> messages. This result was worse than 20171016-default.log.xz, for the
> system was not able to make any forward progress (i.e. a complete OOM
> lockup).

I tested further and found that it is not a complete OOM lockup.

It turned out that the reason for being unable to log in via ssh was that
fork() was failing because __vm_enough_memory() was failing while
/proc/sys/vm/overcommit_memory was set to 0. Although the virtio_balloon
driver was ready to release pages if asked via virtballoon_oom_notify() from
out_of_memory(), __vm_enough_memory() was not able to take such pages into
account. As a result, operations which needed fork() were failing without
calling out_of_memory().
( http://lkml.kernel.org/r/201710181954.FHH51594.MtFOFLOQFSOHVJ@I-love.SAKURA.ne.jp )
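
For reference, the overcommit_memory == 0 heuristic looks roughly like the
sketch below (condensed from the OVERCOMMIT_GUESS path of __vm_enough_memory()
in mm/util.c of this era; not a verbatim copy). None of the counters it sums
covers pages held by the balloon, and this path never calls the OOM notifier
chain, so fork() fails before out_of_memory() could ask the balloon to deflate:

----------------------------------------
	/* OVERCOMMIT_GUESS: estimate reclaimable memory and compare it
	 * against the requested pages; balloon pages appear nowhere here.
	 */
	free = global_zone_page_state(NR_FREE_PAGES);
	free += global_node_page_state(NR_FILE_PAGES);	/* page cache */
	free -= global_node_page_state(NR_SHMEM);	/* shmem is not freeable */
	free += get_nr_swap_pages();			/* free swap space */
	free += global_node_page_state(NR_SLAB_RECLAIMABLE);
	if (free <= totalreserve_pages)
		goto error;				/* deny: -ENOMEM */
	free -= totalreserve_pages;
	if (free > pages)
		return 0;				/* allow */
	goto error;					/* deny: -ENOMEM */
----------------------------------------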

Do you see anything wrong with the patch I used for emulating the
VIRTIO_BALLOON_F_DEFLATE_ON_OOM path (shown below)?

----------------------------------------
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f0b3a0b..a679ac2 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -164,7 +164,7 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 		}
 		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
 		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
-		if (!virtio_has_feature(vb->vdev,
+		if (virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 			adjust_managed_page_count(page, -1);
 	}
@@ -184,7 +184,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
 	struct page *page, *next;
 
 	list_for_each_entry_safe(page, next, pages, lru) {
-		if (!virtio_has_feature(vb->vdev,
+		if (virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 			adjust_managed_page_count(page, 1);
 		list_del(&page->lru);
@@ -363,7 +363,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
 	unsigned num_freed_pages;
 
 	vb = container_of(self, struct virtio_balloon, nb);
-	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 		return NOTIFY_OK;
 
 	freed = parm;
----------------------------------------

> As I demonstrated above, VIRTIO_BALLOON_F_DEFLATE_ON_OOM can lead to a
> complete OOM lockup because the out_of_memory() => fill_balloon() =>
> out_of_memory() => fill_balloon() sequence can effectively disable the OOM
> killer when the host assumed that it's safe to inflate the balloon to a
> large portion of guest memory and that this won't cause an OOM situation.

The other problem is that, although it is not a complete OOM lockup, the
system is far too slow to be usable if we hit the out_of_memory() =>
fill_balloon() => out_of_memory() => fill_balloon() sequence.
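
To make the feedback loop concrete, below is a simplified sketch of the
driver's size-update work (paraphrased from update_balloon_size_func(); not a
verbatim copy):

----------------------------------------
/* towards_target() reads the host-written config->num_pages and
 * subtracts the pages we currently hold.  leak_balloon() from the OOM
 * notifier lowers vb->num_pages but not config->num_pages, so the diff
 * grows again and this work item keeps calling fill_balloon().
 */
static void update_balloon_size_func(struct work_struct *work)
{
	struct virtio_balloon *vb = container_of(work, struct virtio_balloon,
						 update_balloon_size_work);
	s64 diff = towards_target(vb);

	if (diff > 0)
		diff -= fill_balloon(vb, diff);		/* inflate */
	else if (diff < 0)
		diff += leak_balloon(vb, -diff);	/* deflate */
	update_balloon_size(vb);	/* report vb->num_pages as "actual" */

	if (diff)			/* target not reached: requeue */
		queue_work(system_freezable_wq, work);
}
----------------------------------------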

> If leak_balloon() from out_of_memory() should be stronger than
> fill_balloon() from update_balloon_size_func(), we need to make
> sure that update_balloon_size_func() stops calling fill_balloon()
> when leak_balloon() was called from out_of_memory().

I tried the patch below to reduce the possibility of hitting the
out_of_memory() => fill_balloon() => out_of_memory() => fill_balloon()
sequence.

----------------------------------------
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a679ac2..9037fee 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -57,7 +57,7 @@ struct virtio_balloon {
 
 	/* The balloon servicing is delegated to a freezable workqueue. */
 	struct work_struct update_balloon_stats_work;
-	struct work_struct update_balloon_size_work;
+	struct delayed_work update_balloon_size_work;
 
 	/* Prevent updating balloon when it is being canceled. */
 	spinlock_t stop_update_lock;
@@ -88,6 +88,7 @@ struct virtio_balloon {
 
 	/* To register callback in oom notifier call chain */
 	struct notifier_block nb;
+	struct timer_list deflate_on_oom_timer;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -141,7 +142,8 @@ static void set_page_pfns(struct virtio_balloon *vb,
 					  page_to_balloon_pfn(page) + i);
 }
 
-static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned fill_balloon(struct virtio_balloon *vb, size_t num,
+			     unsigned long *delay)
 {
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
 	unsigned num_allocated_pages;
@@ -152,14 +154,21 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		struct page *page = balloon_page_enqueue(vb_dev_info);
+		struct page *page;
+
+		if (timer_pending(&vb->deflate_on_oom_timer)) {
+			/* Wait for the hold-off timer to expire. */
+			*delay = HZ;
+			break;
+		}
+		page = balloon_page_enqueue(vb_dev_info);
 
 		if (!page) {
 			dev_info_ratelimited(&vb->vdev->dev,
 					     "Out of puff! Can't get %u pages\n",
 					     VIRTIO_BALLOON_PAGES_PER_PAGE);
 			/* Sleep for at least 1/5 of a second before retry. */
-			msleep(200);
+			*delay = HZ / 5;
 			break;
 		}
 		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
@@ -310,7 +319,8 @@ static void virtballoon_changed(struct virtio_device *vdev)
 
 	spin_lock_irqsave(&vb->stop_update_lock, flags);
 	if (!vb->stop_update)
-		queue_work(system_freezable_wq, &vb->update_balloon_size_work);
+		queue_delayed_work(system_freezable_wq,
+				   &vb->update_balloon_size_work, 0);
 	spin_unlock_irqrestore(&vb->stop_update_lock, flags);
 }
 
@@ -366,9 +376,13 @@ static int virtballoon_oom_notify(struct notifier_block *self,
 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 		return NOTIFY_OK;
 
+	/* Hold off fill_balloon() for 60 seconds. */
+	mod_timer(&vb->deflate_on_oom_timer, jiffies + 60 * HZ);
 	freed = parm;
 	num_freed_pages = leak_balloon(vb, oom_pages);
 	update_balloon_size(vb);
+	dev_info_ratelimited(&vb->vdev->dev, "Released %u pages. Remains %u pages.\n",
+			     num_freed_pages, vb->num_pages);
 	*freed += num_freed_pages;
 
 	return NOTIFY_OK;
@@ -387,19 +401,21 @@ static void update_balloon_size_func(struct work_struct *work)
 {
 	struct virtio_balloon *vb;
 	s64 diff;
+	unsigned long delay = 0;
 
-	vb = container_of(work, struct virtio_balloon,
+	vb = container_of(to_delayed_work(work), struct virtio_balloon,
 			  update_balloon_size_work);
 	diff = towards_target(vb);
 
 	if (diff > 0)
-		diff -= fill_balloon(vb, diff);
+		diff -= fill_balloon(vb, diff, &delay);
 	else if (diff < 0)
 		diff += leak_balloon(vb, -diff);
 	update_balloon_size(vb);
 
 	if (diff)
-		queue_work(system_freezable_wq, work);
+		queue_delayed_work(system_freezable_wq, to_delayed_work(work),
+				   delay);
 }
 
 static int init_vqs(struct virtio_balloon *vb)
@@ -521,6 +537,10 @@ static struct dentry *balloon_mount(struct file_system_type *fs_type,
 
 #endif /* CONFIG_BALLOON_COMPACTION */
 
+static void timer_expired(unsigned long unused)
+{
+}
+
 static int virtballoon_probe(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb;
@@ -539,7 +559,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	}
 
 	INIT_WORK(&vb->update_balloon_stats_work, update_balloon_stats_func);
-	INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func);
+	INIT_DELAYED_WORK(&vb->update_balloon_size_work,
+			  update_balloon_size_func);
 	spin_lock_init(&vb->stop_update_lock);
 	vb->stop_update = false;
 	vb->num_pages = 0;
@@ -553,6 +574,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	if (err)
 		goto out_free_vb;
 
+	setup_timer(&vb->deflate_on_oom_timer, timer_expired, 0);
 	vb->nb.notifier_call = virtballoon_oom_notify;
 	vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
 	err = register_oom_notifier(&vb->nb);
@@ -564,6 +586,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	if (IS_ERR(balloon_mnt)) {
 		err = PTR_ERR(balloon_mnt);
 		unregister_oom_notifier(&vb->nb);
+		del_timer_sync(&vb->deflate_on_oom_timer);
 		goto out_del_vqs;
 	}
 
@@ -573,6 +596,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
 		err = PTR_ERR(vb->vb_dev_info.inode);
 		kern_unmount(balloon_mnt);
 		unregister_oom_notifier(&vb->nb);
+		del_timer_sync(&vb->deflate_on_oom_timer);
 		vb->vb_dev_info.inode = NULL;
 		goto out_del_vqs;
 	}
@@ -611,11 +635,12 @@ static void virtballoon_remove(struct virtio_device *vdev)
 	struct virtio_balloon *vb = vdev->priv;
 
 	unregister_oom_notifier(&vb->nb);
+	del_timer_sync(&vb->deflate_on_oom_timer);
 
 	spin_lock_irq(&vb->stop_update_lock);
 	vb->stop_update = true;
 	spin_unlock_irq(&vb->stop_update_lock);
-	cancel_work_sync(&vb->update_balloon_size_work);
+	cancel_delayed_work_sync(&vb->update_balloon_size_work);
 	cancel_work_sync(&vb->update_balloon_stats_work);
 
 	remove_common(vb);
----------------------------------------

While responsiveness was better than with the current code, inflating again
spoiled the effort. Retrying inflation until allocation fails is already too
painful.

Complete log is at http://I-love.SAKURA.ne.jp/tmp/20171018-deflate.log.xz .
----------------------------------------
[   19.529096] kworker/0:2: page allocation failure: order:0, mode:0x14310ca(GFP_HIGHUSER_MOVABLE|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null)
[   19.530721] kworker/0:2 cpuset=/ mems_allowed=0
[   19.531581] CPU: 0 PID: 111 Comm: kworker/0:2 Not tainted 4.14.0-rc5+ #302
[   19.532397] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   19.533285] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
[   19.534143] Call Trace:
[   19.535015]  dump_stack+0x63/0x87
[   19.535844]  warn_alloc+0x114/0x1c0
[   19.536667]  __alloc_pages_slowpath+0x9a6/0xba7
[   19.537491]  ? sched_clock_cpu+0x11/0xb0
[   19.538311]  __alloc_pages_nodemask+0x26a/0x290
[   19.539188]  alloc_pages_current+0x6a/0xb0
[   19.540004]  balloon_page_enqueue+0x25/0xf0
[   19.540818]  update_balloon_size_func+0xe1/0x260 [virtio_balloon]
[   19.541626]  process_one_work+0x149/0x360
[   19.542417]  worker_thread+0x4d/0x3c0
[   19.543186]  kthread+0x109/0x140
[   19.543930]  ? rescuer_thread+0x380/0x380
[   19.544716]  ? kthread_park+0x60/0x60
[   19.545426]  ret_from_fork+0x25/0x30
[   19.546141] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   19.547903] virtio_balloon virtio3: Released 256 pages. Remains 1984834 pages.
[   19.659660] virtio_balloon virtio3: Released 256 pages. Remains 1984578 pages.
[   21.891392] virtio_balloon virtio3: Released 256 pages. Remains 1984322 pages.
[   21.894719] virtio_balloon virtio3: Released 256 pages. Remains 1984066 pages.
[   22.490131] virtio_balloon virtio3: Released 256 pages. Remains 1983810 pages.
[   31.939666] virtio_balloon virtio3: Released 256 pages. Remains 1983554 pages.
[   95.524753] kworker/0:2: page allocation failure: order:0, mode:0x14310ca(GFP_HIGHUSER_MOVABLE|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null)
[   95.525641] kworker/0:2 cpuset=/ mems_allowed=0
[   95.526110] CPU: 0 PID: 111 Comm: kworker/0:2 Not tainted 4.14.0-rc5+ #302
[   95.526552] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   95.527018] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
[   95.527492] Call Trace:
[   95.527969]  dump_stack+0x63/0x87
[   95.528469]  warn_alloc+0x114/0x1c0
[   95.528922]  __alloc_pages_slowpath+0x9a6/0xba7
[   95.529388]  ? qxl_image_free_objects+0x56/0x60 [qxl]
[   95.529849]  ? qxl_draw_opaque_fb+0x102/0x3a0 [qxl]
[   95.530315]  __alloc_pages_nodemask+0x26a/0x290
[   95.530777]  alloc_pages_current+0x6a/0xb0
[   95.531243]  balloon_page_enqueue+0x25/0xf0
[   95.531703]  update_balloon_size_func+0xe1/0x260 [virtio_balloon]
[   95.532180]  process_one_work+0x149/0x360
[   95.532645]  worker_thread+0x4d/0x3c0
[   95.533143]  kthread+0x109/0x140
[   95.533622]  ? rescuer_thread+0x380/0x380
[   95.534100]  ? kthread_park+0x60/0x60
[   95.534568]  ret_from_fork+0x25/0x30
[   95.535093] warn_alloc_show_mem: 1 callbacks suppressed
[   95.535093] Mem-Info:
[   95.536072] active_anon:11171 inactive_anon:2084 isolated_anon:0
[   95.536072]  active_file:8 inactive_file:70 isolated_file:0
[   95.536072]  unevictable:0 dirty:0 writeback:0 unstable:0
[   95.536072]  slab_reclaimable:3554 slab_unreclaimable:6848
[   95.536072]  mapped:588 shmem:2144 pagetables:749 bounce:0
[   95.536072]  free:25859 free_pcp:72 free_cma:0
[   95.538922] Node 0 active_anon:44684kB inactive_anon:8336kB active_file:32kB inactive_file:280kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2352kB dirty:0kB writeback:0kB shmem:8576kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 10240kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   95.540516] Node 0 DMA free:15900kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   95.542325] lowmem_reserve[]: 0 2954 7925 7925 7925
[   95.543020] Node 0 DMA32 free:44748kB min:25144kB low:31428kB high:37712kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129308kB managed:3063740kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   95.544949] lowmem_reserve[]: 0 0 4970 4970 4970
[   95.545624] Node 0 Normal free:42788kB min:42304kB low:52880kB high:63456kB active_anon:44684kB inactive_anon:8336kB active_file:32kB inactive_file:108kB unevictable:0kB writepending:0kB present:5242880kB managed:5093540kB mlocked:0kB kernel_stack:1984kB pagetables:2996kB bounce:0kB free_pcp:372kB local_pcp:220kB free_cma:0kB
[   95.547739] lowmem_reserve[]: 0 0 0 0 0
[   95.548464] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
[   95.549988] Node 0 DMA32: 3*4kB (UM) 4*8kB (UM) 2*16kB (U) 2*32kB (U) 1*64kB (U) 0*128kB 2*256kB (UM) 2*512kB (UM) 2*1024kB (UM) 2*2048kB (UM) 9*4096kB (M) = 44748kB
[   95.551551] Node 0 Normal: 925*4kB (UME) 455*8kB (UME) 349*16kB (UME) 137*32kB (UME) 37*64kB (UME) 23*128kB (UME) 8*256kB (UME) 5*512kB (UM) 3*1024kB (UM) 6*2048kB (U) 0*4096kB = 42588kB
[   95.553147] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   95.554030] 2224 total pagecache pages
[   95.554874] 0 pages in swap cache
[   95.555667] Swap cache stats: add 0, delete 0, find 0/0
[   95.556469] Free swap  = 0kB
[   95.557280] Total swap = 0kB
[   95.558079] 2097045 pages RAM
[   95.558856] 0 pages HighMem/MovableOnly
[   95.559652] 53748 pages reserved
[   95.560444] 0 pages cma reserved
[   95.561262] 0 pages hwpoisoned
[   95.562086] virtio_balloon virtio3: Out of puff! Can't get 1 pages
[   95.565779] virtio_balloon virtio3: Released 256 pages. Remains 1984947 pages.
[   96.265255] virtio_balloon virtio3: Released 256 pages. Remains 1984691 pages.
[  105.498910] virtio_balloon virtio3: Released 256 pages. Remains 1984435 pages.
[  105.500518] virtio_balloon virtio3: Released 256 pages. Remains 1984179 pages.
[  105.520034] virtio_balloon virtio3: Released 256 pages. Remains 1983923 pages.
----------------------------------------

Michael S. Tsirkin wrote:
> I think that's the case. Question is, when can we inflate again?

I think it is when the host explicitly asks again, for the
VIRTIO_BALLOON_F_DEFLATE_ON_OOM path does not schedule a later inflation.
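
(For context, the path by which the host "asks again" is the config-changed
interrupt; a simplified sketch, matching the existing virtballoon_changed()
visible in the patch context above:)

----------------------------------------
/* When the host rewrites config->num_pages, the guest receives a
 * config-changed interrupt and requeues the size work; the
 * virtballoon_oom_notify() deflate path does not queue this work itself.
 */
static void virtballoon_changed(struct virtio_device *vdev)
{
	struct virtio_balloon *vb = vdev->priv;
	unsigned long flags;

	spin_lock_irqsave(&vb->stop_update_lock, flags);
	if (!vb->stop_update)
		queue_work(system_freezable_wq, &vb->update_balloon_size_work);
	spin_unlock_irqrestore(&vb->stop_update_lock, flags);
}
----------------------------------------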


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
  2017-10-18 10:59             ` Tetsuo Handa
@ 2017-10-18 17:16               ` Michael S. Tsirkin
  2017-10-19 11:52                 ` Tetsuo Handa
  2017-10-19 11:52                 ` Tetsuo Handa
  2017-10-18 17:16               ` Michael S. Tsirkin
  1 sibling, 2 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2017-10-18 17:16 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: mhocko, wei.w.wang, virtualization, linux-mm, rmaksudova, den

On Wed, Oct 18, 2017 at 07:59:23PM +0900, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > 20171016-deflate.log.xz continued printing "puff" messages without any OOM
> > killer messages, for fill_balloon() always inflated faster than
> > leak_balloon() deflated.
> > 
> > Since the OOM killer cannot be invoked unless leak_balloon() deflates
> > faster than fill_balloon() inflates, the guest remained unusable
> > (e.g. unable to log in via ssh), doing nothing but printing "puff"
> > messages. This result was worse than 20171016-default.log.xz, for the
> > system was not able to make any forward progress (i.e. a complete OOM
> > lockup).
> 
> I tested further and found that it is not a complete OOM lockup.
> 
> It turned out that the reason for being unable to log in via ssh was that
> fork() was failing, because __vm_enough_memory() was failing, because
> /proc/sys/vm/overcommit_memory was set to 0. Although the virtio_balloon
> driver was ready to release pages if asked via virtballoon_oom_notify() from
> out_of_memory(), __vm_enough_memory() was not able to take such pages into
> account. As a result, operations which need fork() were failing without
> ever calling out_of_memory().
> ( http://lkml.kernel.org/r/201710181954.FHH51594.MtFOFLOQFSOHVJ@I-love.SAKURA.ne.jp )
> 
> Do you see anything wrong with the patch I used for emulating the
> VIRTIO_BALLOON_F_DEFLATE_ON_OOM path (shown below)?
> 
> ----------------------------------------
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index f0b3a0b..a679ac2 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -164,7 +164,7 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  		}
>  		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>  		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> -		if (!virtio_has_feature(vb->vdev,
> +		if (virtio_has_feature(vb->vdev,
>  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>  			adjust_managed_page_count(page, -1);
>  	}
> @@ -184,7 +184,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
>  	struct page *page, *next;
>  
>  	list_for_each_entry_safe(page, next, pages, lru) {
> -		if (!virtio_has_feature(vb->vdev,
> +		if (virtio_has_feature(vb->vdev,
>  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>  			adjust_managed_page_count(page, 1);
>  		list_del(&page->lru);
> @@ -363,7 +363,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
>  	unsigned num_freed_pages;
>  
>  	vb = container_of(self, struct virtio_balloon, nb);
> -	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>  		return NOTIFY_OK;
>  
>  	freed = parm;
> ----------------------------------------

Looks right but it's probably easier to configure qemu to set that
feature bit. Basically you just add deflate-on-oom=on to the
balloon device.
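
For example, a minimal sketch of such an invocation (assuming a qemu new
enough to know the property, roughly qemu 2.4 or later):

	qemu-system-x86_64 ... \
		-device virtio-balloon-pci,deflate-on-oom=on

The guest driver already advertises VIRTIO_BALLOON_F_DEFLATE_ON_OOM, so the
feature gets negotiated as soon as the host offers it.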


> > As I demonstrated above, VIRTIO_BALLOON_F_DEFLATE_ON_OOM can lead to complete
> > OOM lockup because out_of_memory() => fill_balloon() => out_of_memory() =>
> > fill_balloon() sequence can effectively disable the OOM killer when the host
> > assumed that it's safe to inflate the balloon to a large portion of guest
> > memory and this won't cause an OOM situation.
> 
> The other problem is that, although it is not a complete OOM lockup, the
> system is too slow to be worth waiting for once we hit the out_of_memory() =>
> fill_balloon() => out_of_memory() => fill_balloon() sequence.
> 
> > If leak_balloon() from out_of_memory() should be stronger than
> > fill_balloon() from update_balloon_size_func(), we need to make
> > sure that update_balloon_size_func() stops calling fill_balloon()
> > when leak_balloon() was called from out_of_memory().
> 
> I tried the patch below to reduce the possibility of hitting the
> out_of_memory() => fill_balloon() => out_of_memory() => fill_balloon()
> sequence.
> 
> ----------------------------------------
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index a679ac2..9037fee 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -57,7 +57,7 @@ struct virtio_balloon {
>  
>  	/* The balloon servicing is delegated to a freezable workqueue. */
>  	struct work_struct update_balloon_stats_work;
> -	struct work_struct update_balloon_size_work;
> +	struct delayed_work update_balloon_size_work;
>  
>  	/* Prevent updating balloon when it is being canceled. */
>  	spinlock_t stop_update_lock;
> @@ -88,6 +88,7 @@ struct virtio_balloon {
>  
>  	/* To register callback in oom notifier call chain */
>  	struct notifier_block nb;
> +	struct timer_list deflate_on_oom_timer;
>  };
>  
>  static struct virtio_device_id id_table[] = {
> @@ -141,7 +142,8 @@ static void set_page_pfns(struct virtio_balloon *vb,
>  					  page_to_balloon_pfn(page) + i);
>  }
>  
> -static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> +static unsigned fill_balloon(struct virtio_balloon *vb, size_t num,
> +			     unsigned long *delay)
>  {
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>  	unsigned num_allocated_pages;
> @@ -152,14 +154,21 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  	mutex_lock(&vb->balloon_lock);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
>  	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> -		struct page *page = balloon_page_enqueue(vb_dev_info);
> +		struct page *page;
> +
> +		if (timer_pending(&vb->deflate_on_oom_timer)) {
> +			/* Wait for hold-off timer expiry. */
> +			*delay = HZ;
> +			break;
> +		}
> +		page = balloon_page_enqueue(vb_dev_info);
>  
>  		if (!page) {
>  			dev_info_ratelimited(&vb->vdev->dev,
>  					     "Out of puff! Can't get %u pages\n",
>  					     VIRTIO_BALLOON_PAGES_PER_PAGE);
>  			/* Sleep for at least 1/5 of a second before retry. */
> -			msleep(200);
> +			*delay = HZ / 5;
>  			break;
>  		}
>  		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> @@ -310,7 +319,8 @@ static void virtballoon_changed(struct virtio_device *vdev)
>  
>  	spin_lock_irqsave(&vb->stop_update_lock, flags);
>  	if (!vb->stop_update)
> -		queue_work(system_freezable_wq, &vb->update_balloon_size_work);
> +		queue_delayed_work(system_freezable_wq,
> +				   &vb->update_balloon_size_work, 0);
>  	spin_unlock_irqrestore(&vb->stop_update_lock, flags);
>  }
>  
> @@ -366,9 +376,13 @@ static int virtballoon_oom_notify(struct notifier_block *self,
>  	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>  		return NOTIFY_OK;
>  
> +	/* Hold off fill_balloon() for 60 seconds. */
> +	mod_timer(&vb->deflate_on_oom_timer, jiffies + 60 * HZ);
>  	freed = parm;
>  	num_freed_pages = leak_balloon(vb, oom_pages);
>  	update_balloon_size(vb);
> +	dev_info_ratelimited(&vb->vdev->dev, "Released %u pages. Remains %u pages.\n",
> +			     num_freed_pages, vb->num_pages);
>  	*freed += num_freed_pages;
>  
>  	return NOTIFY_OK;
> @@ -387,19 +401,21 @@ static void update_balloon_size_func(struct work_struct *work)
>  {
>  	struct virtio_balloon *vb;
>  	s64 diff;
> +	unsigned long delay = 0;
>  
> -	vb = container_of(work, struct virtio_balloon,
> +	vb = container_of(to_delayed_work(work), struct virtio_balloon,
>  			  update_balloon_size_work);
>  	diff = towards_target(vb);
>  
>  	if (diff > 0)
> -		diff -= fill_balloon(vb, diff);
> +		diff -= fill_balloon(vb, diff, &delay);
>  	else if (diff < 0)
>  		diff += leak_balloon(vb, -diff);
>  	update_balloon_size(vb);
>  
>  	if (diff)
> -		queue_work(system_freezable_wq, work);
> +		queue_delayed_work(system_freezable_wq, to_delayed_work(work),
> +				   delay);
>  }
>  
>  static int init_vqs(struct virtio_balloon *vb)
> @@ -521,6 +537,10 @@ static struct dentry *balloon_mount(struct file_system_type *fs_type,
>  
>  #endif /* CONFIG_BALLOON_COMPACTION */
>  
> +static void timer_expired(unsigned long unused)
> +{
> +}
> +
>  static int virtballoon_probe(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb;
> @@ -539,7 +559,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	}
>  
>  	INIT_WORK(&vb->update_balloon_stats_work, update_balloon_stats_func);
> -	INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func);
> +	INIT_DELAYED_WORK(&vb->update_balloon_size_work,
> +			  update_balloon_size_func);
>  	spin_lock_init(&vb->stop_update_lock);
>  	vb->stop_update = false;
>  	vb->num_pages = 0;
> @@ -553,6 +574,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	if (err)
>  		goto out_free_vb;
>  
> +	setup_timer(&vb->deflate_on_oom_timer, timer_expired, 0);
>  	vb->nb.notifier_call = virtballoon_oom_notify;
>  	vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
>  	err = register_oom_notifier(&vb->nb);
> @@ -564,6 +586,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	if (IS_ERR(balloon_mnt)) {
>  		err = PTR_ERR(balloon_mnt);
>  		unregister_oom_notifier(&vb->nb);
> +		del_timer_sync(&vb->deflate_on_oom_timer);
>  		goto out_del_vqs;
>  	}
>  
> @@ -573,6 +596,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  		err = PTR_ERR(vb->vb_dev_info.inode);
>  		kern_unmount(balloon_mnt);
>  		unregister_oom_notifier(&vb->nb);
> +		del_timer_sync(&vb->deflate_on_oom_timer);
>  		vb->vb_dev_info.inode = NULL;
>  		goto out_del_vqs;
>  	}
> @@ -611,11 +635,12 @@ static void virtballoon_remove(struct virtio_device *vdev)
>  	struct virtio_balloon *vb = vdev->priv;
>  
>  	unregister_oom_notifier(&vb->nb);
> +	del_timer_sync(&vb->deflate_on_oom_timer);
>  
>  	spin_lock_irq(&vb->stop_update_lock);
>  	vb->stop_update = true;
>  	spin_unlock_irq(&vb->stop_update_lock);
> -	cancel_work_sync(&vb->update_balloon_size_work);
> +	cancel_delayed_work_sync(&vb->update_balloon_size_work);
>  	cancel_work_sync(&vb->update_balloon_stats_work);
>  
>  	remove_common(vb);
> ----------------------------------------

OK. Or if you use my patch, you can just set a flag and go
	if (vb->oom)
		msleep(1000);
at the beginning of fill_balloon().
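
Spelled out, that suggestion is roughly the following sketch (not an actual
patch from this thread; it assumes a new "bool oom" field in struct
virtio_balloon and elides everything unchanged):

	static int virtballoon_oom_notify(struct notifier_block *self,
					  unsigned long dummy, void *parm)
	{
		struct virtio_balloon *vb = container_of(self,
						struct virtio_balloon, nb);
		unsigned long *freed = parm;

		vb->oom = true;		/* tell fill_balloon() to back off */
		*freed += leak_balloon(vb, oom_pages);
		update_balloon_size(vb);
		return NOTIFY_OK;
	}

	static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
	{
		if (vb->oom) {		/* an OOM deflate ran recently */
			msleep(1000);
			vb->oom = false;
		}
		/* ... the existing inflate loop follows unchanged ... */
	}

When it becomes safe to clear the flag, i.e. to inflate again, is of course
exactly the question discussed below.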



> While the response was better than it is now, inflating again spoiled the
> effort. Retrying to inflate until allocation fails is already too painful.
> 
> Complete log is at http://I-love.SAKURA.ne.jp/tmp/20171018-deflate.log.xz .
> ----------------------------------------
> [   19.529096] kworker/0:2: page allocation failure: order:0, mode:0x14310ca(GFP_HIGHUSER_MOVABLE|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null)
> [   19.530721] kworker/0:2 cpuset=/ mems_allowed=0
> [   19.531581] CPU: 0 PID: 111 Comm: kworker/0:2 Not tainted 4.14.0-rc5+ #302
> [   19.532397] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [   19.533285] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
> [   19.534143] Call Trace:
> [   19.535015]  dump_stack+0x63/0x87
> [   19.535844]  warn_alloc+0x114/0x1c0
> [   19.536667]  __alloc_pages_slowpath+0x9a6/0xba7
> [   19.537491]  ? sched_clock_cpu+0x11/0xb0
> [   19.538311]  __alloc_pages_nodemask+0x26a/0x290
> [   19.539188]  alloc_pages_current+0x6a/0xb0
> [   19.540004]  balloon_page_enqueue+0x25/0xf0
> [   19.540818]  update_balloon_size_func+0xe1/0x260 [virtio_balloon]
> [   19.541626]  process_one_work+0x149/0x360
> [   19.542417]  worker_thread+0x4d/0x3c0
> [   19.543186]  kthread+0x109/0x140
> [   19.543930]  ? rescuer_thread+0x380/0x380
> [   19.544716]  ? kthread_park+0x60/0x60
> [   19.545426]  ret_from_fork+0x25/0x30
> [   19.546141] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   19.547903] virtio_balloon virtio3: Released 256 pages. Remains 1984834 pages.
> [   19.659660] virtio_balloon virtio3: Released 256 pages. Remains 1984578 pages.
> [   21.891392] virtio_balloon virtio3: Released 256 pages. Remains 1984322 pages.
> [   21.894719] virtio_balloon virtio3: Released 256 pages. Remains 1984066 pages.
> [   22.490131] virtio_balloon virtio3: Released 256 pages. Remains 1983810 pages.
> [   31.939666] virtio_balloon virtio3: Released 256 pages. Remains 1983554 pages.
> [   95.524753] kworker/0:2: page allocation failure: order:0, mode:0x14310ca(GFP_HIGHUSER_MOVABLE|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null)
> [   95.525641] kworker/0:2 cpuset=/ mems_allowed=0
> [   95.526110] CPU: 0 PID: 111 Comm: kworker/0:2 Not tainted 4.14.0-rc5+ #302
> [   95.526552] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [   95.527018] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
> [   95.527492] Call Trace:
> [   95.527969]  dump_stack+0x63/0x87
> [   95.528469]  warn_alloc+0x114/0x1c0
> [   95.528922]  __alloc_pages_slowpath+0x9a6/0xba7
> [   95.529388]  ? qxl_image_free_objects+0x56/0x60 [qxl]
> [   95.529849]  ? qxl_draw_opaque_fb+0x102/0x3a0 [qxl]
> [   95.530315]  __alloc_pages_nodemask+0x26a/0x290
> [   95.530777]  alloc_pages_current+0x6a/0xb0
> [   95.531243]  balloon_page_enqueue+0x25/0xf0
> [   95.531703]  update_balloon_size_func+0xe1/0x260 [virtio_balloon]
> [   95.532180]  process_one_work+0x149/0x360
> [   95.532645]  worker_thread+0x4d/0x3c0
> [   95.533143]  kthread+0x109/0x140
> [   95.533622]  ? rescuer_thread+0x380/0x380
> [   95.534100]  ? kthread_park+0x60/0x60
> [   95.534568]  ret_from_fork+0x25/0x30
> [   95.535093] warn_alloc_show_mem: 1 callbacks suppressed
> [   95.535093] Mem-Info:
> [   95.536072] active_anon:11171 inactive_anon:2084 isolated_anon:0
> [   95.536072]  active_file:8 inactive_file:70 isolated_file:0
> [   95.536072]  unevictable:0 dirty:0 writeback:0 unstable:0
> [   95.536072]  slab_reclaimable:3554 slab_unreclaimable:6848
> [   95.536072]  mapped:588 shmem:2144 pagetables:749 bounce:0
> [   95.536072]  free:25859 free_pcp:72 free_cma:0
> [   95.538922] Node 0 active_anon:44684kB inactive_anon:8336kB active_file:32kB inactive_file:280kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2352kB dirty:0kB writeback:0kB shmem:8576kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 10240kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> [   95.540516] Node 0 DMA free:15900kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [   95.542325] lowmem_reserve[]: 0 2954 7925 7925 7925
> [   95.543020] Node 0 DMA32 free:44748kB min:25144kB low:31428kB high:37712kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129308kB managed:3063740kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [   95.544949] lowmem_reserve[]: 0 0 4970 4970 4970
> [   95.545624] Node 0 Normal free:42788kB min:42304kB low:52880kB high:63456kB active_anon:44684kB inactive_anon:8336kB active_file:32kB inactive_file:108kB unevictable:0kB writepending:0kB present:5242880kB managed:5093540kB mlocked:0kB kernel_stack:1984kB pagetables:2996kB bounce:0kB free_pcp:372kB local_pcp:220kB free_cma:0kB
> [   95.547739] lowmem_reserve[]: 0 0 0 0 0
> [   95.548464] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
> [   95.549988] Node 0 DMA32: 3*4kB (UM) 4*8kB (UM) 2*16kB (U) 2*32kB (U) 1*64kB (U) 0*128kB 2*256kB (UM) 2*512kB (UM) 2*1024kB (UM) 2*2048kB (UM) 9*4096kB (M) = 44748kB
> [   95.551551] Node 0 Normal: 925*4kB (UME) 455*8kB (UME) 349*16kB (UME) 137*32kB (UME) 37*64kB (UME) 23*128kB (UME) 8*256kB (UME) 5*512kB (UM) 3*1024kB (UM) 6*2048kB (U) 0*4096kB = 42588kB
> [   95.553147] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [   95.554030] 2224 total pagecache pages
> [   95.554874] 0 pages in swap cache
> [   95.555667] Swap cache stats: add 0, delete 0, find 0/0
> [   95.556469] Free swap  = 0kB
> [   95.557280] Total swap = 0kB
> [   95.558079] 2097045 pages RAM
> [   95.558856] 0 pages HighMem/MovableOnly
> [   95.559652] 53748 pages reserved
> [   95.560444] 0 pages cma reserved
> [   95.561262] 0 pages hwpoisoned
> [   95.562086] virtio_balloon virtio3: Out of puff! Can't get 1 pages
> [   95.565779] virtio_balloon virtio3: Released 256 pages. Remains 1984947 pages.
> [   96.265255] virtio_balloon virtio3: Released 256 pages. Remains 1984691 pages.
> [  105.498910] virtio_balloon virtio3: Released 256 pages. Remains 1984435 pages.
> [  105.500518] virtio_balloon virtio3: Released 256 pages. Remains 1984179 pages.
> [  105.520034] virtio_balloon virtio3: Released 256 pages. Remains 1983923 pages.
> ----------------------------------------
> 
> Michael S. Tsirkin wrote:
> > I think that's the case. Question is, when can we inflate again?
> 
> I think it is when the host explicitly asks again, because the
> VIRTIO_BALLOON_F_DEFLATE_ON_OOM path does not schedule any later inflation.

Problem is host has no idea when it's safe.
If we expect host to ask again after X seconds we
might just as well do it in the guest.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
  2017-10-18 17:16               ` Michael S. Tsirkin
  2017-10-19 11:52                 ` Tetsuo Handa
@ 2017-10-19 11:52                 ` Tetsuo Handa
  2017-10-19 13:00                   ` Michael S. Tsirkin
  2017-10-19 13:00                   ` Michael S. Tsirkin
  1 sibling, 2 replies; 27+ messages in thread
From: Tetsuo Handa @ 2017-10-19 11:52 UTC (permalink / raw)
  To: mst; +Cc: mhocko, wei.w.wang, virtualization, linux-mm, rmaksudova, den

Michael S. Tsirkin wrote:
> On Wed, Oct 18, 2017 at 07:59:23PM +0900, Tetsuo Handa wrote:
> > Do you see anything wrong with the patch I used for emulating the
> > VIRTIO_BALLOON_F_DEFLATE_ON_OOM path (shown below)?
> > 
> > ----------------------------------------
> > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > index f0b3a0b..a679ac2 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -164,7 +164,7 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> >  		}
> >  		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> >  		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> > -		if (!virtio_has_feature(vb->vdev,
> > +		if (virtio_has_feature(vb->vdev,
> >  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> >  			adjust_managed_page_count(page, -1);
> >  	}
> > @@ -184,7 +184,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
> >  	struct page *page, *next;
> >  
> >  	list_for_each_entry_safe(page, next, pages, lru) {
> > -		if (!virtio_has_feature(vb->vdev,
> > +		if (virtio_has_feature(vb->vdev,
> >  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> >  			adjust_managed_page_count(page, 1);
> >  		list_del(&page->lru);
> > @@ -363,7 +363,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
> >  	unsigned num_freed_pages;
> >  
> >  	vb = container_of(self, struct virtio_balloon, nb);
> > -	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> > +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> >  		return NOTIFY_OK;
> >  
> >  	freed = parm;
> > ----------------------------------------
> 
> Looks right but it's probably easier to configure qemu to set that
> feature bit. Basically you just add deflate-on-oom=on to the
> balloon device.

I'm using CentOS 7 where qemu does not recognize the deflate-on-oom option. ;-)

> OK. Or if you use my patch, you can just set a flag and go
> 	if (vb->oom)
> 		msleep(1000);
> at the beginning of fill_balloon().

I don't think it is good practice to sleep for a long time from the point of
view of system_freezable_wq, since system_freezable_wq is expected to flush
quickly according to include/linux/workqueue.h. I think that using
delayed_work is better.
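
The difference, as a sketch (using the names from the delayed_work patch
above):

	/* this blocks a system_freezable_wq worker for a full second: */
	msleep(1000);

	/* whereas this returns from the work function immediately and lets
	 * the workqueue invoke it again one second later: */
	queue_delayed_work(system_freezable_wq,
			   &vb->update_balloon_size_work, HZ);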

> > While the response was better than it is now, inflating again spoiled the
> > effort. Retrying to inflate until allocation fails is already too painful.
> > 
> > Michael S. Tsirkin wrote:
> > > I think that's the case. Question is, when can we inflate again?
> > 
> > I think it is when the host explicitly asks again, because the
> > VIRTIO_BALLOON_F_DEFLATE_ON_OOM path does not schedule any later inflation.
> 
> Problem is host has no idea when it's safe.
> If we expect host to ask again after X seconds we
> might just as well do it in the guest.

To me, fill_balloon() with VIRTIO_BALLOON_F_DEFLATE_ON_OOM sounds like
doing

  echo 3 > /proc/sys/vm/drop_caches

where nobody knows whether it will impact the system or not.
Thus, I don't think it is a problem. It will be up to the administrator
who enters that command.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
  2017-10-19 11:52                 ` Tetsuo Handa
  2017-10-19 13:00                   ` Michael S. Tsirkin
@ 2017-10-19 13:00                   ` Michael S. Tsirkin
  1 sibling, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2017-10-19 13:00 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: mhocko, wei.w.wang, virtualization, linux-mm, rmaksudova, den

On Thu, Oct 19, 2017 at 08:52:20PM +0900, Tetsuo Handa wrote:
> Michael S. Tsirkin wrote:
> > On Wed, Oct 18, 2017 at 07:59:23PM +0900, Tetsuo Handa wrote:
> > > Do you see anything wrong with the patch I used for emulating the
> > > VIRTIO_BALLOON_F_DEFLATE_ON_OOM path (shown below)?
> > > 
> > > ----------------------------------------
> > > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > > index f0b3a0b..a679ac2 100644
> > > --- a/drivers/virtio/virtio_balloon.c
> > > +++ b/drivers/virtio/virtio_balloon.c
> > > @@ -164,7 +164,7 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> > >  		}
> > >  		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> > >  		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> > > -		if (!virtio_has_feature(vb->vdev,
> > > +		if (virtio_has_feature(vb->vdev,
> > >  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> > >  			adjust_managed_page_count(page, -1);
> > >  	}
> > > @@ -184,7 +184,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
> > >  	struct page *page, *next;
> > >  
> > >  	list_for_each_entry_safe(page, next, pages, lru) {
> > > -		if (!virtio_has_feature(vb->vdev,
> > > +		if (virtio_has_feature(vb->vdev,
> > >  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> > >  			adjust_managed_page_count(page, 1);
> > >  		list_del(&page->lru);
> > > @@ -363,7 +363,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
> > >  	unsigned num_freed_pages;
> > >  
> > >  	vb = container_of(self, struct virtio_balloon, nb);
> > > -	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> > > +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> > >  		return NOTIFY_OK;
> > >  
> > >  	freed = parm;
> > > ----------------------------------------
> > 
> > Looks right but it's probably easier to configure qemu to set that
> > feature bit. Basically you just add deflate-on-oom=on to the
> > balloon device.
> 
> I'm using CentOS 7 where qemu does not recognize the deflate-on-oom option. ;-)
> 
> > OK. Or if you use my patch, you can just set a flag and go
> > 	if (vb->oom)
> > 		msleep(1000);
> > at the beginning of fill_balloon().
> 
> I don't think it is good practice to sleep for a long time from the point of
> view of system_freezable_wq, since system_freezable_wq is expected to flush
> quickly according to include/linux/workqueue.h. I think that using
> delayed_work is better.

Well it's already using msleep, I'm fine with reworking it all to use
delayed_work. That's unrelated to the OOM issues though.

> 
> > > While the response was better than it is now, inflating again spoiled the
> > > effort. Retrying to inflate until allocation fails is already too painful.
> > > 
> > > Michael S. Tsirkin wrote:
> > > > I think that's the case. Question is, when can we inflate again?
> > > 
> > > I think it is when the host explicitly asks again, because the
> > > VIRTIO_BALLOON_F_DEFLATE_ON_OOM path does not schedule any later inflation.
> > 
> > Problem is host has no idea when it's safe.
> > If we expect host to ask again after X seconds we
> > might just as well do it in the guest.
> 
> To me, fill_balloon() with VIRTIO_BALLOON_F_DEFLATE_ON_OOM sounds like
> doing
> 
>   echo 3 > /proc/sys/vm/drop_caches
> 
> where nobody knows whether it will impact the system or not.
> Thus, I don't think it is a problem. It will be up to the administrator
> who enters that command.

Right now existing hypervisors do not send that interrupt.
If you suggest a new feature where hypervisors send an interrupt,
that might work but will need a new feature bit. Please send an email
to virtio-dev@lists.oasis-open.org (subscriber only, sorry about that)
so the bit can be reserved.

-- 
MST

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-10-19 13:00 UTC | newest]

Thread overview: 27+ messages
     [not found] <1507632457-4611-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
2017-10-10 11:47 ` [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify() Michal Hocko
2017-10-10 11:47 ` Michal Hocko
2017-10-12  2:36 ` Wei Wang
2017-10-13 11:28   ` Tetsuo Handa
2017-10-13 13:19     ` Michael S. Tsirkin
2017-10-13 13:19     ` Michael S. Tsirkin
2017-10-12  2:36 ` Wei Wang
2017-10-13 13:23 ` Michael S. Tsirkin
2017-10-13 13:23 ` Michael S. Tsirkin
2017-10-13 16:41   ` Tetsuo Handa
2017-10-15  0:22     ` Michael S. Tsirkin
2017-10-15  5:38       ` Tetsuo Handa
2017-10-16 10:58         ` Tetsuo Handa
2017-10-16 17:01           ` Michael S. Tsirkin
2017-10-18 10:59             ` Tetsuo Handa
2017-10-18 17:16               ` Michael S. Tsirkin
2017-10-19 11:52                 ` Tetsuo Handa
2017-10-19 11:52                 ` Tetsuo Handa
2017-10-19 13:00                   ` Michael S. Tsirkin
2017-10-19 13:00                   ` Michael S. Tsirkin
2017-10-18 17:16               ` Michael S. Tsirkin
2017-10-18 10:59             ` Tetsuo Handa
2017-10-16 17:01           ` Michael S. Tsirkin
2017-10-16 10:58         ` Tetsuo Handa
2017-10-15  5:38       ` Tetsuo Handa
2017-10-15  0:22     ` Michael S. Tsirkin
2017-10-13 16:41   ` Tetsuo Handa
