linux-mm.kvack.org archive mirror
* [PATCH v1 0/3] virtio-balloon: Fixes + switch back to OOM handler
@ 2020-02-05 16:33 David Hildenbrand
  2020-02-05 16:34 ` [PATCH v1 1/3] virtio-balloon: Fix memory leak when unloading while hinting is in progress David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: David Hildenbrand @ 2020-02-05 16:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, David Hildenbrand, Alexander Duyck,
	David Rientjes, Jason Wang, Liang Li, Michael S. Tsirkin,
	Michal Hocko, Nadav Amit, Tyler Sanderson, Wei Wang

Two fixes for issues I stumbled over while working on patch #3.

Switch back to the good ol' OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
as the switch to the shrinker introduced some undesired side effects. Keep
the shrinker in place to handle VIRTIO_BALLOON_F_FREE_PAGE_HINT.
See the lengthy discussion at [1].

I tested with QEMU and "deflate-on-oom=on"; it works as expected. I did
not test the shrinker path for VIRTIO_BALLOON_F_FREE_PAGE_HINT, as it is
hard to trigger (only when migrating a VM, and even then it might not
trigger).

[1] https://www.spinics.net/lists/linux-virtualization/msg40863.html

David Hildenbrand (3):
  virtio-balloon: Fix memory leak when unloading while hinting is in
    progress
  virtio_balloon: Fix memory leaks on errors in virtballoon_probe()
  virtio-balloon: Switch back to OOM handler for
    VIRTIO_BALLOON_F_DEFLATE_ON_OOM

 drivers/virtio/virtio_balloon.c | 124 +++++++++++++++-----------------
 1 file changed, 57 insertions(+), 67 deletions(-)

-- 
2.24.1




* [PATCH v1 1/3] virtio-balloon: Fix memory leak when unloading while hinting is in progress
  2020-02-05 16:33 [PATCH v1 0/3] virtio-balloon: Fixes + switch back to OOM handler David Hildenbrand
@ 2020-02-05 16:34 ` David Hildenbrand
  2020-02-06  8:36   ` Michael S. Tsirkin
  2020-02-05 16:34 ` [PATCH v1 2/3] virtio_balloon: Fix memory leaks on errors in virtballoon_probe() David Hildenbrand
  2020-02-05 16:34 ` [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM David Hildenbrand
  2 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand @ 2020-02-05 16:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, David Hildenbrand, Michael S. Tsirkin,
	Jason Wang, Wei Wang, Liang Li

When unloading the driver while hinting is in progress, we will not
release the free page blocks back to MM, resulting in a memory leak.
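
For context, return_free_pages_to_mm() used below is an existing helper in
this driver: roughly, it walks the list of free-page blocks the driver still
holds and hands them back to the page allocator. A simplified sketch (names
as in the driver, locking and details abridged - not the exact upstream
code):

	static unsigned long return_free_pages_to_mm(struct virtio_balloon *vb,
						     unsigned long num_to_return)
	{
		unsigned long num_returned = 0;
		struct page *page;

		spin_lock_irq(&vb->free_page_list_lock);
		while (num_returned < num_to_return) {
			page = balloon_page_pop(&vb->free_page_list);
			if (!page)
				break;
			/* Give the whole hinted block back to the buddy. */
			free_pages((unsigned long)page_address(page),
				   VIRTIO_BALLOON_HINT_BLOCK_ORDER);
			vb->num_free_page_blocks--;
			num_returned++;
		}
		spin_unlock_irq(&vb->free_page_list_lock);

		return num_returned;
	}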

Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Liang Li <liang.z.li@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8e400ece9273..abef2306c899 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -968,6 +968,10 @@ static void remove_common(struct virtio_balloon *vb)
 		leak_balloon(vb, vb->num_pages);
 	update_balloon_size(vb);
 
+	/* There might be free pages that are being reported: release them. */
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+		return_free_pages_to_mm(vb, ULONG_MAX);
+
 	/* Now we reset the device so we can clean up the queues. */
 	vb->vdev->config->reset(vb->vdev);
 
-- 
2.24.1




* [PATCH v1 2/3] virtio_balloon: Fix memory leaks on errors in virtballoon_probe()
  2020-02-05 16:33 [PATCH v1 0/3] virtio-balloon: Fixes + switch back to OOM handler David Hildenbrand
  2020-02-05 16:34 ` [PATCH v1 1/3] virtio-balloon: Fix memory leak when unloading while hinting is in progress David Hildenbrand
@ 2020-02-05 16:34 ` David Hildenbrand
  2020-02-06  8:36   ` Michael S. Tsirkin
  2020-02-05 16:34 ` [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM David Hildenbrand
  2 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand @ 2020-02-05 16:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, David Hildenbrand, Michael S. Tsirkin,
	Jason Wang, Wei Wang, Liang Li

On errors in virtballoon_probe(), we forget to put the inode and to unmount
the kernel-internal mount used for balloon compaction.
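
The fix just extends the ordered goto unwinding in virtballoon_probe(). As a
generic illustration of the pattern (a self-contained userspace sketch with
made-up names - not the driver code):

	#include <stdio.h>

	static int mount_fs(void)    { return 0; }	/* think kern_mount()       */
	static int alloc_inode(void) { return 0; }	/* think alloc_anon_inode() */
	static int alloc_wq(void)    { return -1; }	/* think alloc_workqueue(), failing here */
	static void free_inode(void) { puts("iput()"); }
	static void unmount_fs(void) { puts("kern_unmount()"); }

	static int example_probe(void)
	{
		int err;

		err = mount_fs();
		if (err)
			goto out;
		err = alloc_inode();
		if (err)
			goto out_unmount;
		err = alloc_wq();
		if (err)
			goto out_iput;	/* undo in reverse order of setup */
		return 0;

	out_iput:
		free_inode();
	out_unmount:
		unmount_fs();
	out:
		return err;
	}

	int main(void)
	{
		printf("probe returned %d\n", example_probe());
		return 0;
	}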

Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Liang Li <liang.z.li@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index abef2306c899..7e5d84caeb94 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -901,8 +901,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	vb->vb_dev_info.inode = alloc_anon_inode(balloon_mnt->mnt_sb);
 	if (IS_ERR(vb->vb_dev_info.inode)) {
 		err = PTR_ERR(vb->vb_dev_info.inode);
-		kern_unmount(balloon_mnt);
-		goto out_del_vqs;
+		goto out_kern_unmount;
 	}
 	vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
 #endif
@@ -913,13 +912,13 @@ static int virtballoon_probe(struct virtio_device *vdev)
 		 */
 		if (virtqueue_get_vring_size(vb->free_page_vq) < 2) {
 			err = -ENOSPC;
-			goto out_del_vqs;
+			goto out_iput;
 		}
 		vb->balloon_wq = alloc_workqueue("balloon-wq",
 					WQ_FREEZABLE | WQ_CPU_INTENSIVE, 0);
 		if (!vb->balloon_wq) {
 			err = -ENOMEM;
-			goto out_del_vqs;
+			goto out_iput;
 		}
 		INIT_WORK(&vb->report_free_page_work, report_free_page_func);
 		vb->cmd_id_received_cache = VIRTIO_BALLOON_CMD_ID_STOP;
@@ -953,6 +952,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
 out_del_balloon_wq:
 	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
 		destroy_workqueue(vb->balloon_wq);
+out_iput:
+#ifdef CONFIG_BALLOON_COMPACTION
+	iput(vb->vb_dev_info.inode);
+out_kern_unmount:
+	kern_unmount(balloon_mnt);
+#endif
 out_del_vqs:
 	vdev->config->del_vqs(vdev);
 out_free_vb:
-- 
2.24.1




* [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-05 16:33 [PATCH v1 0/3] virtio-balloon: Fixes + switch back to OOM handler David Hildenbrand
  2020-02-05 16:34 ` [PATCH v1 1/3] virtio-balloon: Fix memory leak when unloading while hinting is in progress David Hildenbrand
  2020-02-05 16:34 ` [PATCH v1 2/3] virtio_balloon: Fix memory leaks on errors in virtballoon_probe() David Hildenbrand
@ 2020-02-05 16:34 ` David Hildenbrand
  2020-02-05 22:37   ` Tyler Sanderson
                     ` (6 more replies)
  2 siblings, 7 replies; 32+ messages in thread
From: David Hildenbrand @ 2020-02-05 16:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, David Hildenbrand, Tyler Sanderson,
	Michael S . Tsirkin, Wei Wang, Alexander Duyck, David Rientjes,
	Nadav Amit, Michal Hocko

Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
changed the behavior when deflation happens automatically. Instead of
deflating when called by the OOM handler, the shrinker is used.

However, the balloon is not simply some slab cache that should be
shrunk when under memory pressure. The shrinker does not have a concept of
priorities, so this behavior cannot be configured.
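
To make the contrast concrete: the shrinker interface only carries cost
hints, while an OOM notifier has an explicit priority. Abridged sketch of
the two structures (fields picked from include/linux/shrinker.h and
include/linux/notifier.h; ordering and omitted members may differ):

	struct shrinker {
		unsigned long (*count_objects)(struct shrinker *,
					       struct shrink_control *sc);
		unsigned long (*scan_objects)(struct shrinker *,
					      struct shrink_control *sc);
		int seeks;	/* relative cost hint - no ordering/priority */
		long batch;
		/* ... */
	};

	struct notifier_block {
		notifier_fn_t notifier_call;
		struct notifier_block __rcu *next;
		int priority;	/* explicit ordering among notifiers */
	};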

There was a report that this results in undesired side effects when
inflating the balloon to shrink the page cache. [1]
	"When inflating the balloon against page cache (i.e. no free memory
	 remains) vmscan.c will both shrink page cache, but also invoke the
	 shrinkers -- including the balloon's shrinker. So the balloon
	 driver allocates memory which requires reclaim, vmscan gets this
	 memory by shrinking the balloon, and then the driver adds the
	 memory back to the balloon. Basically a busy no-op."

The name "deflate on OOM" makes it pretty clear when deflation should
happen - after other approaches to reclaim memory failed, not while
reclaiming. This allows to minimize the footprint of a guest - memory
will only be taken out of the balloon when really needed.

In particular, a drop_slab() will result in the whole balloon getting
deflated - which is undesired. While handling it via the OOM handler might
not be perfect, it keeps the existing behavior. If we want a different
behavior, we need a new feature bit and have to document it properly
(although there should be a clear use case and the intended effects should
be well described).
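
To illustrate the drop_slab() point: slab shrinkers are called in a loop
until they report that nothing more can be freed, so a shrinker that exposes
the whole balloon will hand all of it back. A rough model of that loop
(sketch only - not the mm/vmscan.c code, SHRINK_STOP handling omitted):

	static unsigned long drop_slab_like_loop(struct shrinker *shrinker)
	{
		struct shrink_control sc = { .gfp_mask = GFP_KERNEL };
		unsigned long total = 0, freed;

		do {
			/* The balloon reports its full size as freeable... */
			sc.nr_to_scan = shrinker->count_objects(shrinker, &sc);
			if (!sc.nr_to_scan)
				break;
			/* ...so every pass deflates another chunk of it. */
			freed = shrinker->scan_objects(shrinker, &sc);
			total += freed;
		} while (freed);

		return total;
	}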

Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
this has no such side effects. Always register the shrinker with
VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
pages that are still to be processed by the host. The hypervisor takes
care of identifying and resolving possible races between processing a
hinting request and the guest reusing a page.

In contrast to the code before commit 71994620bb25 ("virtio_balloon: replace
oom notifier with shrinker"), don't add a module parameter to configure the
number of pages to deflate on OOM. It can be re-added if really needed.
Also, note that leak_balloon() returns the number of 4k pages - convert it
properly in virtio_balloon_oom_notify().
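
For clarity, that conversion only matters for base page sizes larger than
4k; a small worked example of the math (hypothetical helper, based on the
VIRTIO_BALLOON_PAGES_PER_PAGE definition in the driver):

	/*
	 * Balloon pages are always 4k (VIRTIO_BALLOON_PFN_SHIFT == 12), while
	 * the OOM notifier's "freed" counter is in guest pages.
	 */
	static unsigned long freed_guest_pages(unsigned long freed_4k_pages,
					       unsigned long guest_page_size)
	{
		unsigned long pages_per_page = guest_page_size >> 12;

		/* e.g. 64k guest pages: 256 4k pages == 16 guest pages */
		return freed_4k_pages / pages_per_page;
	}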

Note1: using the OOM handler is frowned upon, but it really is what we
       need for this feature.

Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
       could actually skip sending deflation requests to our hypervisor,
       making the OOM path *very* simple: basically freeing pages and
       updating the balloon - in case the communication with the host ever
       becomes a problem on this call path.

[1] https://www.spinics.net/lists/linux-virtualization/msg40863.html

Reported-by: Tyler Sanderson <tysand@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 107 +++++++++++++-------------------
 1 file changed, 44 insertions(+), 63 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7e5d84caeb94..e7b18f556c5e 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -14,6 +14,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/balloon_compaction.h>
+#include <linux/oom.h>
 #include <linux/wait.h>
 #include <linux/mm.h>
 #include <linux/mount.h>
@@ -27,7 +28,9 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
-#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+/* Maximum number of (4k) pages to deflate on OOM notifications. */
+#define VIRTIO_BALLOON_OOM_NR_PAGES 256
+#define VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY 80
 
 #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
 					     __GFP_NOMEMALLOC)
@@ -112,8 +115,11 @@ struct virtio_balloon {
 	/* Memory statistics */
 	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
 
-	/* To register a shrinker to shrink memory upon memory pressure */
+	/* Shrinker to return free pages - VIRTIO_BALLOON_F_FREE_PAGE_HINT */
 	struct shrinker shrinker;
+
+	/* OOM notifier to deflate on OOM - VIRTIO_BALLOON_F_DEFLATE_ON_OOM */
+	struct notifier_block oom_nb;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -786,50 +792,13 @@ static unsigned long shrink_free_pages(struct virtio_balloon *vb,
 	return blocks_freed * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
 }
 
-static unsigned long leak_balloon_pages(struct virtio_balloon *vb,
-                                          unsigned long pages_to_free)
-{
-	return leak_balloon(vb, pages_to_free * VIRTIO_BALLOON_PAGES_PER_PAGE) /
-		VIRTIO_BALLOON_PAGES_PER_PAGE;
-}
-
-static unsigned long shrink_balloon_pages(struct virtio_balloon *vb,
-					  unsigned long pages_to_free)
-{
-	unsigned long pages_freed = 0;
-
-	/*
-	 * One invocation of leak_balloon can deflate at most
-	 * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
-	 * multiple times to deflate pages till reaching pages_to_free.
-	 */
-	while (vb->num_pages && pages_freed < pages_to_free)
-		pages_freed += leak_balloon_pages(vb,
-						  pages_to_free - pages_freed);
-
-	update_balloon_size(vb);
-
-	return pages_freed;
-}
-
 static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
 						  struct shrink_control *sc)
 {
-	unsigned long pages_to_free, pages_freed = 0;
 	struct virtio_balloon *vb = container_of(shrinker,
 					struct virtio_balloon, shrinker);
 
-	pages_to_free = sc->nr_to_scan;
-
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
-		pages_freed = shrink_free_pages(vb, pages_to_free);
-
-	if (pages_freed >= pages_to_free)
-		return pages_freed;
-
-	pages_freed += shrink_balloon_pages(vb, pages_to_free - pages_freed);
-
-	return pages_freed;
+	return shrink_free_pages(vb, sc->nr_to_scan);
 }
 
 static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
@@ -837,26 +806,22 @@ static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
 {
 	struct virtio_balloon *vb = container_of(shrinker,
 					struct virtio_balloon, shrinker);
-	unsigned long count;
-
-	count = vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
-	count += vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
 
-	return count;
+	return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
 }
 
-static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
+static int virtio_balloon_oom_notify(struct notifier_block *nb,
+				     unsigned long dummy, void *parm)
 {
-	unregister_shrinker(&vb->shrinker);
-}
+	struct virtio_balloon *vb = container_of(nb,
+						 struct virtio_balloon, oom_nb);
+	unsigned long *freed = parm;
 
-static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
-{
-	vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
-	vb->shrinker.count_objects = virtio_balloon_shrinker_count;
-	vb->shrinker.seeks = DEFAULT_SEEKS;
+	*freed += leak_balloon(vb, VIRTIO_BALLOON_OOM_NR_PAGES) /
+		  VIRTIO_BALLOON_PAGES_PER_PAGE;
+	update_balloon_size(vb);
 
-	return register_shrinker(&vb->shrinker);
+	return NOTIFY_OK;
 }
 
 static int virtballoon_probe(struct virtio_device *vdev)
@@ -933,22 +898,35 @@ static int virtballoon_probe(struct virtio_device *vdev)
 			virtio_cwrite(vb->vdev, struct virtio_balloon_config,
 				      poison_val, &poison_val);
 		}
-	}
-	/*
-	 * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if a
-	 * shrinker needs to be registered to relieve memory pressure.
-	 */
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
-		err = virtio_balloon_register_shrinker(vb);
+
+		/*
+		 * We're allowed to reuse any free pages, even if they are
+		 * still to be processed by the host.
+		 */
+		vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
+		vb->shrinker.count_objects = virtio_balloon_shrinker_count;
+		vb->shrinker.seeks = DEFAULT_SEEKS;
+		err = register_shrinker(&vb->shrinker);
 		if (err)
 			goto out_del_balloon_wq;
 	}
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
+		vb->oom_nb.notifier_call = virtio_balloon_oom_notify;
+		vb->oom_nb.priority = VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY;
+		err = register_oom_notifier(&vb->oom_nb);
+		if (err < 0)
+			goto out_unregister_shrinker;
+	}
+
 	virtio_device_ready(vdev);
 
 	if (towards_target(vb))
 		virtballoon_changed(vdev);
 	return 0;
 
+out_unregister_shrinker:
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+		unregister_shrinker(&vb->shrinker);
 out_del_balloon_wq:
 	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
 		destroy_workqueue(vb->balloon_wq);
@@ -987,8 +965,11 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb = vdev->priv;
 
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
-		virtio_balloon_unregister_shrinker(vb);
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+		unregister_oom_notifier(&vb->oom_nb);
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+		unregister_shrinker(&vb->shrinker);
+
 	spin_lock_irq(&vb->stop_update_lock);
 	vb->stop_update = true;
 	spin_unlock_irq(&vb->stop_update_lock);
-- 
2.24.1




* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-05 16:34 ` [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM David Hildenbrand
@ 2020-02-05 22:37   ` Tyler Sanderson
  2020-02-05 22:52     ` David Hildenbrand
  2020-02-06  7:40   ` Michael S. Tsirkin
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Tyler Sanderson @ 2020-02-05 22:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Michael S . Tsirkin,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko


On Wed, Feb 5, 2020 at 8:34 AM David Hildenbrand <david@redhat.com> wrote:

> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
> changed the behavior when deflation happens automatically. Instead of
> deflating when called by the OOM handler, the shrinker is used.
>
> However, the balloon is not simply some slab cache that should be
> shrunk when under memory pressure. The shrinker does not have a concept of
> priorities, so this behavior cannot be configured.
>
> There was a report that this results in undesired side effects when
> inflating the balloon to shrink the page cache. [1]
>         "When inflating the balloon against page cache (i.e. no free memory
>          remains) vmscan.c will both shrink page cache, but also invoke the
>          shrinkers -- including the balloon's shrinker. So the balloon
>          driver allocates memory which requires reclaim, vmscan gets this
>          memory by shrinking the balloon, and then the driver adds the
>          memory back to the balloon. Basically a busy no-op."
>
> The name "deflate on OOM" makes it pretty clear when deflation should
> happen - after other approaches to reclaim memory failed, not while
> reclaiming. This allows to minimize the footprint of a guest - memory
> will only be taken out of the balloon when really needed.
>
> Especially, a drop_slab() will result in the whole balloon getting
> deflated - undesired. While handling it via the OOM handler might not be
> perfect, it keeps existing behavior. If we want a different behavior, then
> we need a new feature bit and document it properly (although, there should
> be a clear use case and the intended effects should be well described).
>
> Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
> this has no such side effects. Always register the shrinker with
> VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
> pages that are still to be processed by the guest. The hypervisor takes
> care of identifying and resolving possible races between processing a
> hinting request and the guest reusing a page.
>
> In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
> notifier with shrinker"), don't add a moodule parameter to configure the
> number of pages to deflate on OOM. Can be re-added if really needed.
> Also, pay attention that leak_balloon() returns the number of 4k pages -
> convert it properly in virtio_balloon_oom_notify().
>
> Note1: using the OOM handler is frowned upon, but it really is what we
>        need for this feature.
>
> Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
>        could actually skip sending deflation requests to our hypervisor,
>        making the OOM path *very* simple. Besically freeing pages and
>        updating the balloon. If the communication with the host ever
>        becomes a problem on this call path.
>
> [1] https://www.spinics.net/lists/linux-virtualization/msg40863.html
>
> Reported-by: Tyler Sanderson <tysand@google.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Wei Wang <wei.w.wang@intel.com>
> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Nadav Amit <namit@vmware.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_balloon.c | 107 +++++++++++++-------------------
>  1 file changed, 44 insertions(+), 63 deletions(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c
> b/drivers/virtio/virtio_balloon.c
> index 7e5d84caeb94..e7b18f556c5e 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -14,6 +14,7 @@
>  #include <linux/slab.h>
>  #include <linux/module.h>
>  #include <linux/balloon_compaction.h>
> +#include <linux/oom.h>
>  #include <linux/wait.h>
>  #include <linux/mm.h>
>  #include <linux/mount.h>
> @@ -27,7 +28,9 @@
>   */
>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >>
> VIRTIO_BALLOON_PFN_SHIFT)
>  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
> -#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> +/* Maximum number of (4k) pages to deflate on OOM notifications. */
> +#define VIRTIO_BALLOON_OOM_NR_PAGES 256
> +#define VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY 80
>
>  #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN
> | \
>                                              __GFP_NOMEMALLOC)
> @@ -112,8 +115,11 @@ struct virtio_balloon {
>         /* Memory statistics */
>         struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
>
> -       /* To register a shrinker to shrink memory upon memory pressure */
> +       /* Shrinker to return free pages - VIRTIO_BALLOON_F_FREE_PAGE_HINT
> */
>         struct shrinker shrinker;
> +
> +       /* OOM notifier to deflate on OOM -
> VIRTIO_BALLOON_F_DEFLATE_ON_OOM */
> +       struct notifier_block oom_nb;
>  };
>
>  static struct virtio_device_id id_table[] = {
> @@ -786,50 +792,13 @@ static unsigned long shrink_free_pages(struct
> virtio_balloon *vb,
>         return blocks_freed * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>  }
>
> -static unsigned long leak_balloon_pages(struct virtio_balloon *vb,
> -                                          unsigned long pages_to_free)
> -{
> -       return leak_balloon(vb, pages_to_free *
> VIRTIO_BALLOON_PAGES_PER_PAGE) /
> -               VIRTIO_BALLOON_PAGES_PER_PAGE;
> -}
> -
> -static unsigned long shrink_balloon_pages(struct virtio_balloon *vb,
> -                                         unsigned long pages_to_free)
> -{
> -       unsigned long pages_freed = 0;
> -
> -       /*
> -        * One invocation of leak_balloon can deflate at most
> -        * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
> -        * multiple times to deflate pages till reaching pages_to_free.
> -        */
> -       while (vb->num_pages && pages_freed < pages_to_free)
> -               pages_freed += leak_balloon_pages(vb,
> -                                                 pages_to_free -
> pages_freed);
> -
> -       update_balloon_size(vb);
> -
> -       return pages_freed;
> -}
> -
>  static unsigned long virtio_balloon_shrinker_scan(struct shrinker
> *shrinker,
>                                                   struct shrink_control
> *sc)
>  {
> -       unsigned long pages_to_free, pages_freed = 0;
>         struct virtio_balloon *vb = container_of(shrinker,
>                                         struct virtio_balloon, shrinker);
>
> -       pages_to_free = sc->nr_to_scan;
> -
> -       if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> -               pages_freed = shrink_free_pages(vb, pages_to_free);
> -
> -       if (pages_freed >= pages_to_free)
> -               return pages_freed;
> -
> -       pages_freed += shrink_balloon_pages(vb, pages_to_free -
> pages_freed);
> -
> -       return pages_freed;
> +       return shrink_free_pages(vb, sc->nr_to_scan);
>  }
>
>  static unsigned long virtio_balloon_shrinker_count(struct shrinker
> *shrinker,
> @@ -837,26 +806,22 @@ static unsigned long
> virtio_balloon_shrinker_count(struct shrinker *shrinker,
>  {
>         struct virtio_balloon *vb = container_of(shrinker,
>                                         struct virtio_balloon, shrinker);
> -       unsigned long count;
> -
> -       count = vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
> -       count += vb->num_free_page_blocks *
> VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>
> -       return count;
> +       return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>  }
>
> -static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
> +static int virtio_balloon_oom_notify(struct notifier_block *nb,
> +                                    unsigned long dummy, void *parm)
>  {
> -       unregister_shrinker(&vb->shrinker);
> -}
> +       struct virtio_balloon *vb = container_of(nb,
> +                                                struct virtio_balloon,
> oom_nb);
> +       unsigned long *freed = parm;
>
> -static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
> -{
> -       vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
> -       vb->shrinker.count_objects = virtio_balloon_shrinker_count;
> -       vb->shrinker.seeks = DEFAULT_SEEKS;
> +       *freed += leak_balloon(vb, VIRTIO_BALLOON_OOM_NR_PAGES) /
> +                 VIRTIO_BALLOON_PAGES_PER_PAGE;
> +       update_balloon_size(vb);
>
> -       return register_shrinker(&vb->shrinker);
> +       return NOTIFY_OK;
>  }
>
>  static int virtballoon_probe(struct virtio_device *vdev)
> @@ -933,22 +898,35 @@ static int virtballoon_probe(struct virtio_device
> *vdev)
>                         virtio_cwrite(vb->vdev, struct
> virtio_balloon_config,
>                                       poison_val, &poison_val);
>                 }
> -       }
> -       /*
> -        * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if
> a
> -        * shrinker needs to be registered to relieve memory pressure.
> -        */
> -       if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> {
> -               err = virtio_balloon_register_shrinker(vb);
> +
> +               /*
> +                * We're allowed to reuse any free pages, even if they are
> +                * still to be processed by the host.
>
It is important to clarify that pages that are on the inflate queue but not
ACKed by the host (the queue entry has not been returned) are _not_ okay to
reuse. If the host is going to do something destructive to the page (like
discarding its backing memory), then that needs to happen before the entry
is returned.

> +                */
> +               vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
> +               vb->shrinker.count_objects = virtio_balloon_shrinker_count;
> +               vb->shrinker.seeks = DEFAULT_SEEKS;
> +               err = register_shrinker(&vb->shrinker);
>                 if (err)
>                         goto out_del_balloon_wq;
>         }
> +       if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
> +               vb->oom_nb.notifier_call = virtio_balloon_oom_notify;
> +               vb->oom_nb.priority = VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY;
> +               err = register_oom_notifier(&vb->oom_nb);
> +               if (err < 0)
> +                       goto out_unregister_shrinker;
> +       }
> +
>         virtio_device_ready(vdev);
>
>         if (towards_target(vb))
>                 virtballoon_changed(vdev);
>         return 0;
>
> +out_unregister_shrinker:
> +       if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> +               unregister_shrinker(&vb->shrinker);
>  out_del_balloon_wq:
>         if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
>                 destroy_workqueue(vb->balloon_wq);
> @@ -987,8 +965,11 @@ static void virtballoon_remove(struct virtio_device
> *vdev)
>  {
>         struct virtio_balloon *vb = vdev->priv;
>
> -       if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> -               virtio_balloon_unregister_shrinker(vb);
> +       if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> +               unregister_oom_notifier(&vb->oom_nb);
> +       if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> +               unregister_shrinker(&vb->shrinker);
> +
>         spin_lock_irq(&vb->stop_update_lock);
>         vb->stop_update = true;
>         spin_unlock_irq(&vb->stop_update_lock);
> --
> 2.24.1
>
>



* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-05 22:37   ` Tyler Sanderson
@ 2020-02-05 22:52     ` David Hildenbrand
  2020-02-05 23:06       ` Tyler Sanderson
  0 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand @ 2020-02-05 22:52 UTC (permalink / raw)
  To: Tyler Sanderson
  Cc: David Hildenbrand, linux-kernel, linux-mm, virtualization,
	Michael S . Tsirkin, Wei Wang, Alexander Duyck, David Rientjes,
	Nadav Amit, Michal Hocko




> On 05.02.2020 at 23:37, Tyler Sanderson <tysand@google.com> wrote:
> 
> 
> 
> 
>> On Wed, Feb 5, 2020 at 8:34 AM David Hildenbrand <david@redhat.com> wrote:
>> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
>> changed the behavior when deflation happens automatically. Instead of
>> deflating when called by the OOM handler, the shrinker is used.
>> 
>> However, the balloon is not simply some slab cache that should be
>> shrunk when under memory pressure. The shrinker does not have a concept of
>> priorities, so this behavior cannot be configured.
>> 
>> There was a report that this results in undesired side effects when
>> inflating the balloon to shrink the page cache. [1]
>>         "When inflating the balloon against page cache (i.e. no free memory
>>          remains) vmscan.c will both shrink page cache, but also invoke the
>>          shrinkers -- including the balloon's shrinker. So the balloon
>>          driver allocates memory which requires reclaim, vmscan gets this
>>          memory by shrinking the balloon, and then the driver adds the
>>          memory back to the balloon. Basically a busy no-op."
>> 
>> The name "deflate on OOM" makes it pretty clear when deflation should
>> happen - after other approaches to reclaim memory failed, not while
>> reclaiming. This allows to minimize the footprint of a guest - memory
>> will only be taken out of the balloon when really needed.
>> 
>> Especially, a drop_slab() will result in the whole balloon getting
>> deflated - undesired. While handling it via the OOM handler might not be
>> perfect, it keeps existing behavior. If we want a different behavior, then
>> we need a new feature bit and document it properly (although, there should
>> be a clear use case and the intended effects should be well described).
>> 
>> Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
>> this has no such side effects. Always register the shrinker with
>> VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
>> pages that are still to be processed by the guest. The hypervisor takes
>> care of identifying and resolving possible races between processing a
>> hinting request and the guest reusing a page.
>> 
>> In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
>> notifier with shrinker"), don't add a moodule parameter to configure the
>> number of pages to deflate on OOM. Can be re-added if really needed.
>> Also, pay attention that leak_balloon() returns the number of 4k pages -
>> convert it properly in virtio_balloon_oom_notify().
>> 
>> Note1: using the OOM handler is frowned upon, but it really is what we
>>        need for this feature.
>> 
>> Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
>>        could actually skip sending deflation requests to our hypervisor,
>>        making the OOM path *very* simple. Besically freeing pages and
>>        updating the balloon. If the communication with the host ever
>>        becomes a problem on this call path.
>> 
>> [1] https://www.spinics.net/lists/linux-virtualization/msg40863.html
>> 
>> Reported-by: Tyler Sanderson <tysand@google.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Cc: Wei Wang <wei.w.wang@intel.com>
>> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Nadav Amit <namit@vmware.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  drivers/virtio/virtio_balloon.c | 107 +++++++++++++-------------------
>>  1 file changed, 44 insertions(+), 63 deletions(-)
>> 
>> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
>> index 7e5d84caeb94..e7b18f556c5e 100644
>> --- a/drivers/virtio/virtio_balloon.c
>> +++ b/drivers/virtio/virtio_balloon.c
>> @@ -14,6 +14,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/module.h>
>>  #include <linux/balloon_compaction.h>
>> +#include <linux/oom.h>
>>  #include <linux/wait.h>
>>  #include <linux/mm.h>
>>  #include <linux/mount.h>
>> @@ -27,7 +28,9 @@
>>   */
>>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
>>  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
>> -#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>> +/* Maximum number of (4k) pages to deflate on OOM notifications. */
>> +#define VIRTIO_BALLOON_OOM_NR_PAGES 256
>> +#define VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY 80
>> 
>>  #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
>>                                              __GFP_NOMEMALLOC)
>> @@ -112,8 +115,11 @@ struct virtio_balloon {
>>         /* Memory statistics */
>>         struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
>> 
>> -       /* To register a shrinker to shrink memory upon memory pressure */
>> +       /* Shrinker to return free pages - VIRTIO_BALLOON_F_FREE_PAGE_HINT */
>>         struct shrinker shrinker;
>> +
>> +       /* OOM notifier to deflate on OOM - VIRTIO_BALLOON_F_DEFLATE_ON_OOM */
>> +       struct notifier_block oom_nb;
>>  };
>> 
>>  static struct virtio_device_id id_table[] = {
>> @@ -786,50 +792,13 @@ static unsigned long shrink_free_pages(struct virtio_balloon *vb,
>>         return blocks_freed * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>>  }
>> 
>> -static unsigned long leak_balloon_pages(struct virtio_balloon *vb,
>> -                                          unsigned long pages_to_free)
>> -{
>> -       return leak_balloon(vb, pages_to_free * VIRTIO_BALLOON_PAGES_PER_PAGE) /
>> -               VIRTIO_BALLOON_PAGES_PER_PAGE;
>> -}
>> -
>> -static unsigned long shrink_balloon_pages(struct virtio_balloon *vb,
>> -                                         unsigned long pages_to_free)
>> -{
>> -       unsigned long pages_freed = 0;
>> -
>> -       /*
>> -        * One invocation of leak_balloon can deflate at most
>> -        * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
>> -        * multiple times to deflate pages till reaching pages_to_free.
>> -        */
>> -       while (vb->num_pages && pages_freed < pages_to_free)
>> -               pages_freed += leak_balloon_pages(vb,
>> -                                                 pages_to_free - pages_freed);
>> -
>> -       update_balloon_size(vb);
>> -
>> -       return pages_freed;
>> -}
>> -
>>  static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
>>                                                   struct shrink_control *sc)
>>  {
>> -       unsigned long pages_to_free, pages_freed = 0;
>>         struct virtio_balloon *vb = container_of(shrinker,
>>                                         struct virtio_balloon, shrinker);
>> 
>> -       pages_to_free = sc->nr_to_scan;
>> -
>> -       if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
>> -               pages_freed = shrink_free_pages(vb, pages_to_free);
>> -
>> -       if (pages_freed >= pages_to_free)
>> -               return pages_freed;
>> -
>> -       pages_freed += shrink_balloon_pages(vb, pages_to_free - pages_freed);
>> -
>> -       return pages_freed;
>> +       return shrink_free_pages(vb, sc->nr_to_scan);
>>  }
>> 
>>  static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
>> @@ -837,26 +806,22 @@ static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
>>  {
>>         struct virtio_balloon *vb = container_of(shrinker,
>>                                         struct virtio_balloon, shrinker);
>> -       unsigned long count;
>> -
>> -       count = vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
>> -       count += vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>> 
>> -       return count;
>> +       return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>>  }
>> 
>> -static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
>> +static int virtio_balloon_oom_notify(struct notifier_block *nb,
>> +                                    unsigned long dummy, void *parm)
>>  {
>> -       unregister_shrinker(&vb->shrinker);
>> -}
>> +       struct virtio_balloon *vb = container_of(nb,
>> +                                                struct virtio_balloon, oom_nb);
>> +       unsigned long *freed = parm;
>> 
>> -static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
>> -{
>> -       vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
>> -       vb->shrinker.count_objects = virtio_balloon_shrinker_count;
>> -       vb->shrinker.seeks = DEFAULT_SEEKS;
>> +       *freed += leak_balloon(vb, VIRTIO_BALLOON_OOM_NR_PAGES) /
>> +                 VIRTIO_BALLOON_PAGES_PER_PAGE;
>> +       update_balloon_size(vb);
>> 
>> -       return register_shrinker(&vb->shrinker);
>> +       return NOTIFY_OK;
>>  }
>> 
>>  static int virtballoon_probe(struct virtio_device *vdev)
>> @@ -933,22 +898,35 @@ static int virtballoon_probe(struct virtio_device *vdev)
>>                         virtio_cwrite(vb->vdev, struct virtio_balloon_config,
>>                                       poison_val, &poison_val);
>>                 }
>> -       }
>> -       /*
>> -        * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if a
>> -        * shrinker needs to be registered to relieve memory pressure.
>> -        */
>> -       if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
>> -               err = virtio_balloon_register_shrinker(vb);
>> +
>> +               /*
>> +                * We're allowed to reuse any free pages, even if they are
>> +                * still to be processed by the host.
> It is important to clarify that pages that are on the inflate queue but not ACKed by the host (the queue entry has not been returned) are _not_ okay to reuse.
> If the host is going to do something destructive to the page (like deback it) then that needs to happen before the entry is returned.

While you are correct, this comment is in the "free page hinting" section/if-branch (not obvious when looking only at the diff), so it does not apply to the inflate/deflate queues - only to free pages that are getting hinted. Or am I misreading your suggestion / missing something?

Thanks!





* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-05 22:52     ` David Hildenbrand
@ 2020-02-05 23:06       ` Tyler Sanderson
  0 siblings, 0 replies; 32+ messages in thread
From: Tyler Sanderson @ 2020-02-05 23:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Michael S . Tsirkin,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko


On Wed, Feb 5, 2020 at 2:52 PM David Hildenbrand <david@redhat.com> wrote:

>
>
> On 05.02.2020 at 23:37, Tyler Sanderson <tysand@google.com> wrote:
>
> 
>
>
> On Wed, Feb 5, 2020 at 8:34 AM David Hildenbrand <david@redhat.com> wrote:
>
>> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
>> changed the behavior when deflation happens automatically. Instead of
>> deflating when called by the OOM handler, the shrinker is used.
>>
>> However, the balloon is not simply some slab cache that should be
>> shrunk when under memory pressure. The shrinker does not have a concept of
>> priorities, so this behavior cannot be configured.
>>
>> There was a report that this results in undesired side effects when
>> inflating the balloon to shrink the page cache. [1]
>>         "When inflating the balloon against page cache (i.e. no free
>> memory
>>          remains) vmscan.c will both shrink page cache, but also invoke
>> the
>>          shrinkers -- including the balloon's shrinker. So the balloon
>>          driver allocates memory which requires reclaim, vmscan gets this
>>          memory by shrinking the balloon, and then the driver adds the
>>          memory back to the balloon. Basically a busy no-op."
>>
>> The name "deflate on OOM" makes it pretty clear when deflation should
>> happen - after other approaches to reclaim memory failed, not while
>> reclaiming. This allows to minimize the footprint of a guest - memory
>> will only be taken out of the balloon when really needed.
>>
>> Especially, a drop_slab() will result in the whole balloon getting
>> deflated - undesired. While handling it via the OOM handler might not be
>> perfect, it keeps existing behavior. If we want a different behavior, then
>> we need a new feature bit and document it properly (although, there should
>> be a clear use case and the intended effects should be well described).
>>
>> Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
>> this has no such side effects. Always register the shrinker with
>> VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
>> pages that are still to be processed by the guest. The hypervisor takes
>> care of identifying and resolving possible races between processing a
>> hinting request and the guest reusing a page.
>>
>> In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
>> notifier with shrinker"), don't add a moodule parameter to configure the
>> number of pages to deflate on OOM. Can be re-added if really needed.
>> Also, pay attention that leak_balloon() returns the number of 4k pages -
>> convert it properly in virtio_balloon_oom_notify().
>>
>> Note1: using the OOM handler is frowned upon, but it really is what we
>>        need for this feature.
>>
>> Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
>>        could actually skip sending deflation requests to our hypervisor,
>>        making the OOM path *very* simple. Besically freeing pages and
>>        updating the balloon. If the communication with the host ever
>>        becomes a problem on this call path.
>>
>> [1] https://www.spinics.net/lists/linux-virtualization/msg40863.html
>>
>> Reported-by: Tyler Sanderson <tysand@google.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Cc: Wei Wang <wei.w.wang@intel.com>
>> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Nadav Amit <namit@vmware.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  drivers/virtio/virtio_balloon.c | 107 +++++++++++++-------------------
>>  1 file changed, 44 insertions(+), 63 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_balloon.c
>> b/drivers/virtio/virtio_balloon.c
>> index 7e5d84caeb94..e7b18f556c5e 100644
>> --- a/drivers/virtio/virtio_balloon.c
>> +++ b/drivers/virtio/virtio_balloon.c
>> @@ -14,6 +14,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/module.h>
>>  #include <linux/balloon_compaction.h>
>> +#include <linux/oom.h>
>>  #include <linux/wait.h>
>>  #include <linux/mm.h>
>>  #include <linux/mount.h>
>> @@ -27,7 +28,9 @@
>>   */
>>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >>
>> VIRTIO_BALLOON_PFN_SHIFT)
>>  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
>> -#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>> +/* Maximum number of (4k) pages to deflate on OOM notifications. */
>> +#define VIRTIO_BALLOON_OOM_NR_PAGES 256
>> +#define VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY 80
>>
>>  #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY |
>> __GFP_NOWARN | \
>>                                              __GFP_NOMEMALLOC)
>> @@ -112,8 +115,11 @@ struct virtio_balloon {
>>         /* Memory statistics */
>>         struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
>>
>> -       /* To register a shrinker to shrink memory upon memory pressure */
>> +       /* Shrinker to return free pages -
>> VIRTIO_BALLOON_F_FREE_PAGE_HINT */
>>         struct shrinker shrinker;
>> +
>> +       /* OOM notifier to deflate on OOM -
>> VIRTIO_BALLOON_F_DEFLATE_ON_OOM */
>> +       struct notifier_block oom_nb;
>>  };
>>
>>  static struct virtio_device_id id_table[] = {
>> @@ -786,50 +792,13 @@ static unsigned long shrink_free_pages(struct
>> virtio_balloon *vb,
>>         return blocks_freed * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>>  }
>>
>> -static unsigned long leak_balloon_pages(struct virtio_balloon *vb,
>> -                                          unsigned long pages_to_free)
>> -{
>> -       return leak_balloon(vb, pages_to_free *
>> VIRTIO_BALLOON_PAGES_PER_PAGE) /
>> -               VIRTIO_BALLOON_PAGES_PER_PAGE;
>> -}
>> -
>> -static unsigned long shrink_balloon_pages(struct virtio_balloon *vb,
>> -                                         unsigned long pages_to_free)
>> -{
>> -       unsigned long pages_freed = 0;
>> -
>> -       /*
>> -        * One invocation of leak_balloon can deflate at most
>> -        * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
>> -        * multiple times to deflate pages till reaching pages_to_free.
>> -        */
>> -       while (vb->num_pages && pages_freed < pages_to_free)
>> -               pages_freed += leak_balloon_pages(vb,
>> -                                                 pages_to_free -
>> pages_freed);
>> -
>> -       update_balloon_size(vb);
>> -
>> -       return pages_freed;
>> -}
>> -
>>  static unsigned long virtio_balloon_shrinker_scan(struct shrinker
>> *shrinker,
>>                                                   struct shrink_control
>> *sc)
>>  {
>> -       unsigned long pages_to_free, pages_freed = 0;
>>         struct virtio_balloon *vb = container_of(shrinker,
>>                                         struct virtio_balloon, shrinker);
>>
>> -       pages_to_free = sc->nr_to_scan;
>> -
>> -       if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
>> -               pages_freed = shrink_free_pages(vb, pages_to_free);
>> -
>> -       if (pages_freed >= pages_to_free)
>> -               return pages_freed;
>> -
>> -       pages_freed += shrink_balloon_pages(vb, pages_to_free -
>> pages_freed);
>> -
>> -       return pages_freed;
>> +       return shrink_free_pages(vb, sc->nr_to_scan);
>>  }
>>
>>  static unsigned long virtio_balloon_shrinker_count(struct shrinker
>> *shrinker,
>> @@ -837,26 +806,22 @@ static unsigned long
>> virtio_balloon_shrinker_count(struct shrinker *shrinker,
>>  {
>>         struct virtio_balloon *vb = container_of(shrinker,
>>                                         struct virtio_balloon, shrinker);
>> -       unsigned long count;
>> -
>> -       count = vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
>> -       count += vb->num_free_page_blocks *
>> VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>>
>> -       return count;
>> +       return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>>  }
>>
>> -static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
>> +static int virtio_balloon_oom_notify(struct notifier_block *nb,
>> +                                    unsigned long dummy, void *parm)
>>  {
>> -       unregister_shrinker(&vb->shrinker);
>> -}
>> +       struct virtio_balloon *vb = container_of(nb,
>> +                                                struct virtio_balloon,
>> oom_nb);
>> +       unsigned long *freed = parm;
>>
>> -static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
>> -{
>> -       vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
>> -       vb->shrinker.count_objects = virtio_balloon_shrinker_count;
>> -       vb->shrinker.seeks = DEFAULT_SEEKS;
>> +       *freed += leak_balloon(vb, VIRTIO_BALLOON_OOM_NR_PAGES) /
>> +                 VIRTIO_BALLOON_PAGES_PER_PAGE;
>> +       update_balloon_size(vb);
>>
>> -       return register_shrinker(&vb->shrinker);
>> +       return NOTIFY_OK;
>>  }
>>
>>  static int virtballoon_probe(struct virtio_device *vdev)
>> @@ -933,22 +898,35 @@ static int virtballoon_probe(struct virtio_device
>> *vdev)
>>                         virtio_cwrite(vb->vdev, struct
>> virtio_balloon_config,
>>                                       poison_val, &poison_val);
>>                 }
>> -       }
>> -       /*
>> -        * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide
>> if a
>> -        * shrinker needs to be registered to relieve memory pressure.
>> -        */
>> -       if (virtio_has_feature(vb->vdev,
>> VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
>> -               err = virtio_balloon_register_shrinker(vb);
>> +
>> +               /*
>> +                * We're allowed to reuse any free pages, even if they are
>> +                * still to be processed by the host.
>>
> It is important to clarify that pages that are on the inflate queue but
> not ACKed by the host (the queue entry has not been returned) are _not_
> okay to reuse.
> If the host is going to do something destructive to the page (like deback
> it) then that needs to happen before the entry is returned.
>
>
> While you are correct, this comment is in the „free page hinting“
> section/if (not obvious by looking at the diff only), so it does not apply
> to inflate/deflate queues - but only free pages that are getting hinted. Or
> am I misreading your suggestion/missing something?
>
Ah you are right. Thanks!

>
> Thanks!
>
>
>



* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-05 16:34 ` [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM David Hildenbrand
  2020-02-05 22:37   ` Tyler Sanderson
@ 2020-02-06  7:40   ` Michael S. Tsirkin
  2020-02-06  8:42     ` David Hildenbrand
  2020-02-06  8:57   ` Wang, Wei W
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  7:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Tyler Sanderson,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On Wed, Feb 05, 2020 at 05:34:02PM +0100, David Hildenbrand wrote:
> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
> changed the behavior when deflation happens automatically. Instead of
> deflating when called by the OOM handler, the shrinker is used.
> 
> However, the balloon is not simply some slab cache that should be
> shrunk when under memory pressure. The shrinker does not have a concept of
> priorities, so this behavior cannot be configured.
> 
> There was a report that this results in undesired side effects when
> inflating the balloon to shrink the page cache. [1]
> 	"When inflating the balloon against page cache (i.e. no free memory
> 	 remains) vmscan.c will both shrink page cache, but also invoke the
> 	 shrinkers -- including the balloon's shrinker. So the balloon
> 	 driver allocates memory which requires reclaim, vmscan gets this
> 	 memory by shrinking the balloon, and then the driver adds the
> 	 memory back to the balloon. Basically a busy no-op."
> 
> The name "deflate on OOM" makes it pretty clear when deflation should
> happen - after other approaches to reclaim memory failed, not while
> reclaiming. This allows to minimize the footprint of a guest - memory
> will only be taken out of the balloon when really needed.
> 
> Especially, a drop_slab() will result in the whole balloon getting
> deflated - undesired. While handling it via the OOM handler might not be
> perfect, it keeps existing behavior. If we want a different behavior, then
> we need a new feature bit and document it properly (although, there should
> be a clear use case and the intended effects should be well described).
> 
> Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
> this has no such side effects. Always register the shrinker with
> VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
> pages that are still to be processed by the guest. The hypervisor takes
> care of identifying and resolving possible races between processing a
> hinting request and the guest reusing a page.
> 
> In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
> notifier with shrinker"), don't add a moodule parameter to configure the
> number of pages to deflate on OOM. Can be re-added if really needed.

I agree. And to make this case even stronger:

The oom_pages module parameter was known to be broken: whatever its
value, we return at most VIRTIO_BALLOON_ARRAY_PFNS_MAX.  So module
parameter values > 256 never worked, and it seems highly unlikely that
freeing 1Mbyte on OOM is too aggressive.
There was a patch
 virtio-balloon: deflate up to oom_pages on OOM
by Wei Wang to try to fix it:
https://lore.kernel.org/r/1508500466-21165-3-git-send-email-wei.w.wang@intel.com
but this was dropped.

> Also, pay attention that leak_balloon() returns the number of 4k pages -
> convert it properly in virtio_balloon_oom_notify().

Oh. So it was returning a wrong value originally (before 71994620bb25).
However, what really matters for notifiers is whether the value is 0 -
i.e., whether we made progress. So it's cosmetic.

> Note1: using the OOM handler is frowned upon, but it really is what we
>        need for this feature.

Quite. However, I went back researching why we dropped the OOM notifier,
and found this:

https://lore.kernel.org/r/1508500466-21165-2-git-send-email-wei.w.wang@intel.com

To quote from there:

The balloon_lock was used to synchronize the access demand to elements
of struct virtio_balloon and its queue operations (please see commit
e22504296d). This prevents the concurrent run of the leak_balloon and
fill_balloon functions, thereby resulting in a deadlock issue on OOM:

fill_balloon: take balloon_lock and wait for OOM to get some memory;
oom_notify: release some inflated memory via leak_balloon();
leak_balloon: wait for balloon_lock to be released by fill_balloon.
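
As a self-contained illustration of that deadlock (userspace sketch using
pthreads; the names mirror the driver, but none of this is the actual
kernel code):

	#include <pthread.h>
	#include <semaphore.h>
	#include <unistd.h>

	static pthread_mutex_t balloon_lock = PTHREAD_MUTEX_INITIALIZER;
	static sem_t oom_made_progress;	/* starts at 0: "no free memory" */

	/* fill_balloon(): holds balloon_lock while "allocating" memory, i.e.
	 * while waiting for the OOM notifier to free something up. */
	static void *fill_balloon(void *arg)
	{
		(void)arg;
		pthread_mutex_lock(&balloon_lock);
		sem_wait(&oom_made_progress);	/* never posted -> stuck */
		pthread_mutex_unlock(&balloon_lock);
		return NULL;
	}

	/* oom_notify() -> leak_balloon(): needs balloon_lock to deflate, but
	 * it is held by fill_balloon(), which in turn waits for us. */
	static void *oom_notify(void *arg)
	{
		(void)arg;
		pthread_mutex_lock(&balloon_lock);	/* blocks forever */
		sem_post(&oom_made_progress);
		pthread_mutex_unlock(&balloon_lock);
		return NULL;
	}

	int main(void)
	{
		pthread_t fill, oom;

		sem_init(&oom_made_progress, 0, 0);
		pthread_create(&fill, NULL, fill_balloon, NULL);
		sleep(1);	/* let fill_balloon take balloon_lock first */
		pthread_create(&oom, NULL, oom_notify, NULL);
		pthread_join(fill, NULL);	/* hangs: the deadlock described above */
		return 0;
	}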





> Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
>        could actually skip sending deflation requests to our hypervisor,
>        making the OOM path *very* simple. Besically freeing pages and
>        updating the balloon.

Well not exactly. !VIRTIO_BALLOON_F_MUST_TELL_HOST does not actually
mean "never tell host". It means "host will not discard pages in the
balloon, you can defer host notification until after use".

This was the original implementation:

+       if (vb->tell_host_first) {
+               tell_host(vb, vb->deflate_vq);
+               release_pages_by_pfn(vb->pfns, vb->num_pfns);
+       } else {
+               release_pages_by_pfn(vb->pfns, vb->num_pfns);
+               tell_host(vb, vb->deflate_vq);
+       }
+}

I don't know whether completely skipping host notifications
when !VIRTIO_BALLOON_F_MUST_TELL_HOST will break any hosts.

>	 If the communication with the host ever
>        becomes a problem on this call path.
> 
> [1] https://www.spinics.net/lists/linux-virtualization/msg40863.html
> 
> Reported-by: Tyler Sanderson <tysand@google.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Wei Wang <wei.w.wang@intel.com>
> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Nadav Amit <namit@vmware.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/virtio/virtio_balloon.c | 107 +++++++++++++-------------------
>  1 file changed, 44 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 7e5d84caeb94..e7b18f556c5e 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -14,6 +14,7 @@
>  #include <linux/slab.h>
>  #include <linux/module.h>
>  #include <linux/balloon_compaction.h>
> +#include <linux/oom.h>
>  #include <linux/wait.h>
>  #include <linux/mm.h>
>  #include <linux/mount.h>
> @@ -27,7 +28,9 @@
>   */
>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
>  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
> -#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> +/* Maximum number of (4k) pages to deflate on OOM notifications. */
> +#define VIRTIO_BALLOON_OOM_NR_PAGES 256
> +#define VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY 80
>  
>  #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
>  					     __GFP_NOMEMALLOC)
> @@ -112,8 +115,11 @@ struct virtio_balloon {
>  	/* Memory statistics */
>  	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
>  
> -	/* To register a shrinker to shrink memory upon memory pressure */
> +	/* Shrinker to return free pages - VIRTIO_BALLOON_F_FREE_PAGE_HINT */
>  	struct shrinker shrinker;
> +
> +	/* OOM notifier to deflate on OOM - VIRTIO_BALLOON_F_DEFLATE_ON_OOM */
> +	struct notifier_block oom_nb;
>  };
>  
>  static struct virtio_device_id id_table[] = {
> @@ -786,50 +792,13 @@ static unsigned long shrink_free_pages(struct virtio_balloon *vb,
>  	return blocks_freed * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>  }
>  
> -static unsigned long leak_balloon_pages(struct virtio_balloon *vb,
> -                                          unsigned long pages_to_free)
> -{
> -	return leak_balloon(vb, pages_to_free * VIRTIO_BALLOON_PAGES_PER_PAGE) /
> -		VIRTIO_BALLOON_PAGES_PER_PAGE;
> -}
> -
> -static unsigned long shrink_balloon_pages(struct virtio_balloon *vb,
> -					  unsigned long pages_to_free)
> -{
> -	unsigned long pages_freed = 0;
> -
> -	/*
> -	 * One invocation of leak_balloon can deflate at most
> -	 * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
> -	 * multiple times to deflate pages till reaching pages_to_free.
> -	 */
> -	while (vb->num_pages && pages_freed < pages_to_free)
> -		pages_freed += leak_balloon_pages(vb,
> -						  pages_to_free - pages_freed);
> -
> -	update_balloon_size(vb);
> -
> -	return pages_freed;
> -}
> -
>  static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
>  						  struct shrink_control *sc)
>  {
> -	unsigned long pages_to_free, pages_freed = 0;
>  	struct virtio_balloon *vb = container_of(shrinker,
>  					struct virtio_balloon, shrinker);
>  
> -	pages_to_free = sc->nr_to_scan;
> -
> -	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> -		pages_freed = shrink_free_pages(vb, pages_to_free);
> -
> -	if (pages_freed >= pages_to_free)
> -		return pages_freed;
> -
> -	pages_freed += shrink_balloon_pages(vb, pages_to_free - pages_freed);
> -
> -	return pages_freed;
> +	return shrink_free_pages(vb, sc->nr_to_scan);
>  }
>  
>  static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
> @@ -837,26 +806,22 @@ static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
>  {
>  	struct virtio_balloon *vb = container_of(shrinker,
>  					struct virtio_balloon, shrinker);
> -	unsigned long count;
> -
> -	count = vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
> -	count += vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>  
> -	return count;
> +	return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>  }
>  
> -static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
> +static int virtio_balloon_oom_notify(struct notifier_block *nb,
> +				     unsigned long dummy, void *parm)
>  {
> -	unregister_shrinker(&vb->shrinker);
> -}
> +	struct virtio_balloon *vb = container_of(nb,
> +						 struct virtio_balloon, oom_nb);
> +	unsigned long *freed = parm;
>  
> -static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
> -{
> -	vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
> -	vb->shrinker.count_objects = virtio_balloon_shrinker_count;
> -	vb->shrinker.seeks = DEFAULT_SEEKS;
> +	*freed += leak_balloon(vb, VIRTIO_BALLOON_OOM_NR_PAGES) /
> +		  VIRTIO_BALLOON_PAGES_PER_PAGE;
> +	update_balloon_size(vb);
>  
> -	return register_shrinker(&vb->shrinker);
> +	return NOTIFY_OK;
>  }
>  
>  static int virtballoon_probe(struct virtio_device *vdev)
> @@ -933,22 +898,35 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  			virtio_cwrite(vb->vdev, struct virtio_balloon_config,
>  				      poison_val, &poison_val);
>  		}
> -	}
> -	/*
> -	 * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if a
> -	 * shrinker needs to be registered to relieve memory pressure.
> -	 */
> -	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
> -		err = virtio_balloon_register_shrinker(vb);
> +
> +		/*
> +		 * We're allowed to reuse any free pages, even if they are
> +		 * still to be processed by the host.
> +		 */
> +		vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
> +		vb->shrinker.count_objects = virtio_balloon_shrinker_count;
> +		vb->shrinker.seeks = DEFAULT_SEEKS;
> +		err = register_shrinker(&vb->shrinker);
>  		if (err)
>  			goto out_del_balloon_wq;
>  	}
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
> +		vb->oom_nb.notifier_call = virtio_balloon_oom_notify;
> +		vb->oom_nb.priority = VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY;
> +		err = register_oom_notifier(&vb->oom_nb);
> +		if (err < 0)
> +			goto out_unregister_shrinker;
> +	}
> +
>  	virtio_device_ready(vdev);
>  
>  	if (towards_target(vb))
>  		virtballoon_changed(vdev);
>  	return 0;
>  
> +out_unregister_shrinker:
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> +		unregister_shrinker(&vb->shrinker);
>  out_del_balloon_wq:
>  	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
>  		destroy_workqueue(vb->balloon_wq);
> @@ -987,8 +965,11 @@ static void virtballoon_remove(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb = vdev->priv;
>  
> -	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> -		virtio_balloon_unregister_shrinker(vb);
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> +		unregister_oom_notifier(&vb->oom_nb);
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> +		unregister_shrinker(&vb->shrinker);
> +
>  	spin_lock_irq(&vb->stop_update_lock);
>  	vb->stop_update = true;
>  	spin_unlock_irq(&vb->stop_update_lock);
> -- 
> 2.24.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 1/3] virtio-balloon: Fix memory leak when unloading while hinting is in progress
  2020-02-05 16:34 ` [PATCH v1 1/3] virtio-balloon: Fix memory leak when unloading while hinting is in progress David Hildenbrand
@ 2020-02-06  8:36   ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  8:36 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Jason Wang, Wei Wang, Liang Li

On Wed, Feb 05, 2020 at 05:34:00PM +0100, David Hildenbrand wrote:
> When unloading the driver while hinting is in progress, we will not
> release the free page blocks back to MM, resulting in a memory leak.
> 
> Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Wei Wang <wei.w.wang@intel.com>
> Cc: Liang Li <liang.z.li@intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Applied, thanks!

> ---
>  drivers/virtio/virtio_balloon.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 8e400ece9273..abef2306c899 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -968,6 +968,10 @@ static void remove_common(struct virtio_balloon *vb)
>  		leak_balloon(vb, vb->num_pages);
>  	update_balloon_size(vb);
>  
> +	/* There might be free pages that are being reported: release them. */
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> +		return_free_pages_to_mm(vb, ULONG_MAX);
> +
>  	/* Now we reset the device so we can clean up the queues. */
>  	vb->vdev->config->reset(vb->vdev);
>  
> -- 
> 2.24.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 2/3] virtio_balloon: Fix memory leaks on errors in virtballoon_probe()
  2020-02-05 16:34 ` [PATCH v1 2/3] virtio_balloon: Fix memory leaks on errors in virtballoon_probe() David Hildenbrand
@ 2020-02-06  8:36   ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  8:36 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Jason Wang, Wei Wang, Liang Li

On Wed, Feb 05, 2020 at 05:34:01PM +0100, David Hildenbrand wrote:
> We forget to put the inode and unmount the kernfs used for compaction.
> 
> Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Wei Wang <wei.w.wang@intel.com>
> Cc: Liang Li <liang.z.li@intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Applied, thanks!

> ---
>  drivers/virtio/virtio_balloon.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index abef2306c899..7e5d84caeb94 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -901,8 +901,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	vb->vb_dev_info.inode = alloc_anon_inode(balloon_mnt->mnt_sb);
>  	if (IS_ERR(vb->vb_dev_info.inode)) {
>  		err = PTR_ERR(vb->vb_dev_info.inode);
> -		kern_unmount(balloon_mnt);
> -		goto out_del_vqs;
> +		goto out_kern_unmount;
>  	}
>  	vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
>  #endif
> @@ -913,13 +912,13 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  		 */
>  		if (virtqueue_get_vring_size(vb->free_page_vq) < 2) {
>  			err = -ENOSPC;
> -			goto out_del_vqs;
> +			goto out_iput;
>  		}
>  		vb->balloon_wq = alloc_workqueue("balloon-wq",
>  					WQ_FREEZABLE | WQ_CPU_INTENSIVE, 0);
>  		if (!vb->balloon_wq) {
>  			err = -ENOMEM;
> -			goto out_del_vqs;
> +			goto out_iput;
>  		}
>  		INIT_WORK(&vb->report_free_page_work, report_free_page_func);
>  		vb->cmd_id_received_cache = VIRTIO_BALLOON_CMD_ID_STOP;
> @@ -953,6 +952,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  out_del_balloon_wq:
>  	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
>  		destroy_workqueue(vb->balloon_wq);
> +out_iput:
> +#ifdef CONFIG_BALLOON_COMPACTION
> +	iput(vb->vb_dev_info.inode);
> +out_kern_unmount:
> +	kern_unmount(balloon_mnt);
> +#endif
>  out_del_vqs:
>  	vdev->config->del_vqs(vdev);
>  out_free_vb:
> -- 
> 2.24.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-06  7:40   ` Michael S. Tsirkin
@ 2020-02-06  8:42     ` David Hildenbrand
  2020-02-06  8:57       ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand @ 2020-02-06  8:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, linux-mm, virtualization, Tyler Sanderson,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On 06.02.20 08:40, Michael S. Tsirkin wrote:
> On Wed, Feb 05, 2020 at 05:34:02PM +0100, David Hildenbrand wrote:
>> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
>> changed the behavior when deflation happens automatically. Instead of
>> deflating when called by the OOM handler, the shrinker is used.
>>
>> However, the balloon is not simply some slab cache that should be
>> shrunk when under memory pressure. The shrinker does not have a concept of
>> priorities, so this behavior cannot be configured.
>>
>> There was a report that this results in undesired side effects when
>> inflating the balloon to shrink the page cache. [1]
>> 	"When inflating the balloon against page cache (i.e. no free memory
>> 	 remains) vmscan.c will both shrink page cache, but also invoke the
>> 	 shrinkers -- including the balloon's shrinker. So the balloon
>> 	 driver allocates memory which requires reclaim, vmscan gets this
>> 	 memory by shrinking the balloon, and then the driver adds the
>> 	 memory back to the balloon. Basically a busy no-op."
>>
>> The name "deflate on OOM" makes it pretty clear when deflation should
>> happen - after other approaches to reclaim memory failed, not while
>> reclaiming. This allows to minimize the footprint of a guest - memory
>> will only be taken out of the balloon when really needed.
>>
>> Especially, a drop_slab() will result in the whole balloon getting
>> deflated - undesired. While handling it via the OOM handler might not be
>> perfect, it keeps existing behavior. If we want a different behavior, then
>> we need a new feature bit and document it properly (although, there should
>> be a clear use case and the intended effects should be well described).
>>
>> Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
>> this has no such side effects. Always register the shrinker with
>> VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
>> pages that are still to be processed by the guest. The hypervisor takes
>> care of identifying and resolving possible races between processing a
>> hinting request and the guest reusing a page.
>>
>> In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
>> notifier with shrinker"), don't add a module parameter to configure the
>> number of pages to deflate on OOM. Can be re-added if really needed.
> 
> I agree. And to make this case even stronger:
> 
> The oom_pages module parameter was known to be broken: whatever its
> value, we return at most VIRTIO_BALLOON_ARRAY_PFNS_MAX.  So module
> parameter values > 256 never worked, and it seems highly unlikely that
> freeing 1Mbyte on OOM is too aggressive.
> There was a patch
>  virtio-balloon: deflate up to oom_pages on OOM
> by Wei Wang to try to fix it:
> https://lore.kernel.org/r/1508500466-21165-3-git-send-email-wei.w.wang@intel.com
> but this was dropped.

Makes sense. 1MB is usually good enough.

> 
>> Also, pay attention that leak_balloon() returns the number of 4k pages -
>> convert it properly in virtio_balloon_oom_notify().
> 
> Oh. So it was returning a wrong value originally (before 71994620bb25).
> However what really matters for notifiers is whether the value is 0 -
> whether we made progress. So it's cosmetic.

Yes, that's also my understanding.

> 
>> Note1: using the OOM handler is frowned upon, but it really is what we
>>        need for this feature.
> 
> Quite. However, I went back researching why we dropped the OOM notifier,
> and found this:
> 
> https://lore.kernel.org/r/1508500466-21165-2-git-send-email-wei.w.wang@intel.com
> 
> To quote from there:
> 
> The balloon_lock was used to synchronize the access demand to elements
> of struct virtio_balloon and its queue operations (please see commit
> e22504296d). This prevents the concurrent run of the leak_balloon and
> fill_balloon functions, thereby resulting in a deadlock issue on OOM:
> 
> fill_balloon: take balloon_lock and wait for OOM to get some memory;
> oom_notify: release some inflated memory via leak_balloon();
> leak_balloon: wait for balloon_lock to be released by fill_balloon.

fill_balloon does the allocation *before* taking the lock. tell_host()
should not allocate memory AFAIR. So how could this ever happen?

Anyhow, we could simply work around this by doing a trylock in
fill_balloon() and retrying in the caller. That should be easy. But I
want to understand first, how something like that would even be possible.
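
Just to illustrate the idea (a rough sketch only, nothing I am proposing at
this point - try_fill_step() and the "caller re-queues the work item"
behavior are made up for illustration):

	/* Back off instead of sleeping on balloon_lock, so the inflate
	 * path can never hold up an OOM notifier that wants the same
	 * lock. */
	static bool try_fill_step(struct virtio_balloon *vb)
	{
		if (!mutex_trylock(&vb->balloon_lock))
			return false;	/* re-queue update_balloon_size_func() */
		/* ... inflate one batch, tell_host(vb, vb->inflate_vq) ... */
		mutex_unlock(&vb->balloon_lock);
		return true;
	}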

>> Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
>>        could actually skip sending deflation requests to our hypervisor,
>>        making the OOM path *very* simple. Basically freeing pages and
>>        updating the balloon.
> 
> Well not exactly. !VIRTIO_BALLOON_F_MUST_TELL_HOST does not actually
> mean "never tell host". It means "host will not discard pages in the
> balloon, you can defer host notification until after use".
> 
> This was the original implementation:
> 
> +       if (vb->tell_host_first) {
> +               tell_host(vb, vb->deflate_vq);
> +               release_pages_by_pfn(vb->pfns, vb->num_pfns);
> +       } else {
> +               release_pages_by_pfn(vb->pfns, vb->num_pfns);
> +               tell_host(vb, vb->deflate_vq);
> +       }
> +}
> 
> I don't know whether completely skipping host notifications
> when !VIRTIO_BALLOON_F_MUST_TELL_HOST will break any hosts.

We discussed this already somewhere else, but here is again what I found.

commit bf50e69f63d21091e525185c3ae761412be0ba72
Author: Dave Hansen <dave@linux.vnet.ibm.com>
Date:   Thu Apr 7 10:43:25 2011 -0700

    virtio balloon: kill tell-host-first logic

    The virtio balloon driver has a VIRTIO_BALLOON_F_MUST_TELL_HOST
    feature bit.  Whenever the bit is set, the guest kernel must
    always tell the host before we free pages back to the allocator.
    Without this feature, we might free a page (and have another
    user touch it) while the hypervisor is unprepared for it.

    But, if the bit is _not_ set, we are under no obligation to
    reverse the order; we're under no obligation to do _anything_.
    As of now, qemu-kvm defines the bit, but doesn't set it.

!MUST_TELL_HOST really means "no need to deflate, just reuse a page". We
should finally document this somewhere.
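
To make Note2 from the patch description more concrete, this is roughly what
a simplified OOM path could look like if we were to skip the deflate_vq
message without MUST_TELL_HOST (an illustrative sketch only - the function
name is made up and accounting/error details are omitted):

	static unsigned long deflate_without_telling_host(struct virtio_balloon *vb,
							  unsigned long nr_4k_pages)
	{
		unsigned long freed = 0;
		struct page *page;

		/* Hand pages straight back to the page allocator, without
		 * sending anything on the deflate_vq. */
		while (freed < nr_4k_pages &&
		       (page = balloon_page_dequeue(&vb->vb_dev_info))) {
			put_page(page);
			vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
			freed += VIRTIO_BALLOON_PAGES_PER_PAGE;
		}
		update_balloon_size(vb);
		return freed;
	}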

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-06  8:42     ` David Hildenbrand
@ 2020-02-06  8:57       ` Michael S. Tsirkin
  2020-02-06  9:05         ` David Hildenbrand
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  8:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Tyler Sanderson,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On Thu, Feb 06, 2020 at 09:42:34AM +0100, David Hildenbrand wrote:
> On 06.02.20 08:40, Michael S. Tsirkin wrote:
> > On Wed, Feb 05, 2020 at 05:34:02PM +0100, David Hildenbrand wrote:
> >> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
> >> changed the behavior when deflation happens automatically. Instead of
> >> deflating when called by the OOM handler, the shrinker is used.
> >>
> >> However, the balloon is not simply some slab cache that should be
> >> shrunk when under memory pressure. The shrinker does not have a concept of
> >> priorities, so this behavior cannot be configured.
> >>
> >> There was a report that this results in undesired side effects when
> >> inflating the balloon to shrink the page cache. [1]
> >> 	"When inflating the balloon against page cache (i.e. no free memory
> >> 	 remains) vmscan.c will both shrink page cache, but also invoke the
> >> 	 shrinkers -- including the balloon's shrinker. So the balloon
> >> 	 driver allocates memory which requires reclaim, vmscan gets this
> >> 	 memory by shrinking the balloon, and then the driver adds the
> >> 	 memory back to the balloon. Basically a busy no-op."
> >>
> >> The name "deflate on OOM" makes it pretty clear when deflation should
> >> happen - after other approaches to reclaim memory failed, not while
> >> reclaiming. This allows to minimize the footprint of a guest - memory
> >> will only be taken out of the balloon when really needed.
> >>
> >> Especially, a drop_slab() will result in the whole balloon getting
> >> deflated - undesired. While handling it via the OOM handler might not be
> >> perfect, it keeps existing behavior. If we want a different behavior, then
> >> we need a new feature bit and document it properly (although, there should
> >> be a clear use case and the intended effects should be well described).
> >>
> >> Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
> >> this has no such side effects. Always register the shrinker with
> >> VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
> >> pages that are still to be processed by the guest. The hypervisor takes
> >> care of identifying and resolving possible races between processing a
> >> hinting request and the guest reusing a page.
> >>
> >> In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
> >> notifier with shrinker"), don't add a module parameter to configure the
> >> number of pages to deflate on OOM. Can be re-added if really needed.
> > 
> > I agree. And to make this case even stronger:
> > 
> > The oom_pages module parameter was known to be broken: whatever its
> > value, we return at most VIRTIO_BALLOON_ARRAY_PFNS_MAX.  So module
> > parameter values > 256 never worked, and it seems highly unlikely that
> > freeing 1Mbyte on OOM is too aggressive.
> > There was a patch
> >  virtio-balloon: deflate up to oom_pages on OOM
> > by Wei Wang to try to fix it:
> > https://lore.kernel.org/r/1508500466-21165-3-git-send-email-wei.w.wang@intel.com
> > but this was dropped.
> 
> Makes sense. 1MB is usually good enough.
> 
> > 
> >> Also, pay attention that leak_balloon() returns the number of 4k pages -
> >> convert it properly in virtio_balloon_oom_notify().
> > 
> > Oh. So it was returning a wrong value originally (before 71994620bb25).
> > However what really matters for notifiers is whether the value is 0 -
> > whether we made progress. So it's cosmetic.
> 
> Yes, that's also my understanding.
> 
> > 
> >> Note1: using the OOM handler is frowned upon, but it really is what we
> >>        need for this feature.
> > 
> > Quite. However, I went back researching why we dropped the OOM notifier,
> > and found this:
> > 
> > https://lore.kernel.org/r/1508500466-21165-2-git-send-email-wei.w.wang@intel.com
> > 
> > To quote from there:
> > 
> > The balloon_lock was used to synchronize the access demand to elements
> > of struct virtio_balloon and its queue operations (please see commit
> > e22504296d). This prevents the concurrent run of the leak_balloon and
> > fill_balloon functions, thereby resulting in a deadlock issue on OOM:
> > 
> > fill_balloon: take balloon_lock and wait for OOM to get some memory;
> > oom_notify: release some inflated memory via leak_balloon();
> > leak_balloon: wait for balloon_lock to be released by fill_balloon.
> 
> fill_balloon does the allocation *before* taking the lock. tell_host()
> should not allocate memory AFAIR. So how could this ever happen?
> 
> Anyhow, we could simply work around this by doing a trylock in
> fill_balloon() and retrying in the caller. That should be easy. But I
> want to understand first, how something like that would even be possible.

Hmm it looks like you are right.  Sorry!


> >> Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
> >>        could actually skip sending deflation requests to our hypervisor,
> >>        making the OOM path *very* simple. Basically freeing pages and
> >>        updating the balloon.
> > 
> > Well not exactly. !VIRTIO_BALLOON_F_MUST_TELL_HOST does not actually
> > mean "never tell host". It means "host will not discard pages in the
> > balloon, you can defer host notification until after use".
> > 
> > This was the original implementation:
> > 
> > +       if (vb->tell_host_first) {
> > +               tell_host(vb, vb->deflate_vq);
> > +               release_pages_by_pfn(vb->pfns, vb->num_pfns);
> > +       } else {
> > +               release_pages_by_pfn(vb->pfns, vb->num_pfns);
> > +               tell_host(vb, vb->deflate_vq);
> > +       }
> > +}
> > 
> > I don't know whether completely skipping host notifications
> > when !VIRTIO_BALLOON_F_MUST_TELL_HOST will break any hosts.
> 
> We discussed this already somewhere else, but here is again what I found.
> 
> commit bf50e69f63d21091e525185c3ae761412be0ba72
> Author: Dave Hansen <dave@linux.vnet.ibm.com>
> Date:   Thu Apr 7 10:43:25 2011 -0700
> 
>     virtio balloon: kill tell-host-first logic
> 
>     The virtio balloon driver has a VIRTIO_BALLOON_F_MUST_TELL_HOST
>     feature bit.  Whenever the bit is set, the guest kernel must
>     always tell the host before we free pages back to the allocator.
>     Without this feature, we might free a page (and have another
>     user touch it) while the hypervisor is unprepared for it.
> 
>     But, if the bit is _not_ set, we are under no obligation to
>     reverse the order; we're under no obligation to do _anything_.
>     As of now, qemu-kvm defines the bit, but doesn't set it.

Well this is not what the spec says in the end.
To continue that commit message:

    This patch makes the "tell host first" logic the only case.  This
    should make everybody happy, and reduce the amount of untested or
    untestable code in the kernel.

You can try proposing the change to the virtio TC and see what others
think.


> !MUST_TELL_HOST really means "no need to deflate, just reuse a page". We
> should finally document this somewhere.

I'm not sure it's not too late to change what that flag means.  If not
sending deflate messages at all is a useful optimization, it seems
safer to add a feature flag for that.

> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-05 16:34 ` [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM David Hildenbrand
  2020-02-05 22:37   ` Tyler Sanderson
  2020-02-06  7:40   ` Michael S. Tsirkin
@ 2020-02-06  8:57   ` Wang, Wei W
  2020-02-06  9:11   ` Michael S. Tsirkin
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: Wang, Wei W @ 2020-02-06  8:57 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, virtualization, Tyler Sanderson, Michael S . Tsirkin,
	Alexander Duyck, David Rientjes, Nadav Amit, Michal Hocko

On Thursday, February 6, 2020 12:34 AM, David Hildenbrand wrote:
> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
> changed the behavior when deflation happens automatically. Instead of
> deflating when called by the OOM handler, the shrinker is used.
> 
> However, the balloon is not simply some slab cache that should be shrunk
> when under memory pressure. The shrinker does not have a concept of
> priorities, so this behavior cannot be configured.
> 
> There was a report that this results in undesired side effects when inflating
> the balloon to shrink the page cache. [1]
> 	"When inflating the balloon against page cache (i.e. no free memory
> 	 remains) vmscan.c will both shrink page cache, but also invoke the
> 	 shrinkers -- including the balloon's shrinker. So the balloon
> 	 driver allocates memory which requires reclaim, vmscan gets this
> 	 memory by shrinking the balloon, and then the driver adds the
> 	 memory back to the balloon. Basically a busy no-op."

Not sure if we need to go back to OOM, which has many drawbacks as we discussed.
I just posted another approach, which is simple.

Best,
Wei


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-06  8:57       ` Michael S. Tsirkin
@ 2020-02-06  9:05         ` David Hildenbrand
  2020-02-06  9:09           ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand @ 2020-02-06  9:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, linux-mm, virtualization, Tyler Sanderson,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

>> commit bf50e69f63d21091e525185c3ae761412be0ba72
>> Author: Dave Hansen <dave@linux.vnet.ibm.com>
>> Date:   Thu Apr 7 10:43:25 2011 -0700
>>
>>     virtio balloon: kill tell-host-first logic
>>
>>     The virtio balloon driver has a VIRTIO_BALLOON_F_MUST_TELL_HOST
>>     feature bit.  Whenever the bit is set, the guest kernel must
>>     always tell the host before we free pages back to the allocator.
>>     Without this feature, we might free a page (and have another
>>     user touch it) while the hypervisor is unprepared for it.
>>
>>     But, if the bit is _not_ set, we are under no obligation to
>>     reverse the order; we're under no obligation to do _anything_.
>>     As of now, qemu-kvm defines the bit, but doesn't set it.
> 
> Well this is not what the spec says in the end.

I didn't check the spec, maybe I should do that :)

> To continue that commit message:
> 
>     This patch makes the "tell host first" logic the only case.  This
>     should make everybody happy, and reduce the amount of untested or
>     untestable code in the kernel.

Yeah, but this comment explains that the current deflate is only in
place, because it makes the code simpler (to support both cases). Of
course, doing the deflate might result in performance improvements.
(e.g., MADV_WILLNEED)
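
Purely for illustration (not claiming this is how QEMU implements it): a
deflate notification lets the host prepare the range up front instead of
faulting it back in on first guest access, e.g. something along the lines of

	/* host-side sketch; host_addr/len stand for the backing memory of
	 * the deflated guest range */
	if (madvise(host_addr, len, MADV_WILLNEED) < 0)
		perror("madvise(MADV_WILLNEED)");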

> 
> You can try proposing the change to the virtio TC and see what others
> think.

We can just drop the comment from this patch for now. The tell_host should
not be an issue AFAICS.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-06  9:05         ` David Hildenbrand
@ 2020-02-06  9:09           ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  9:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Tyler Sanderson,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On Thu, Feb 06, 2020 at 10:05:43AM +0100, David Hildenbrand wrote:
> >> commit bf50e69f63d21091e525185c3ae761412be0ba72
> >> Author: Dave Hansen <dave@linux.vnet.ibm.com>
> >> Date:   Thu Apr 7 10:43:25 2011 -0700
> >>
> >>     virtio balloon: kill tell-host-first logic
> >>
> >>     The virtio balloon driver has a VIRTIO_BALLOON_F_MUST_TELL_HOST
> >>     feature bit.  Whenever the bit is set, the guest kernel must
> >>     always tell the host before we free pages back to the allocator.
> >>     Without this feature, we might free a page (and have another
> >>     user touch it) while the hypervisor is unprepared for it.
> >>
> >>     But, if the bit is _not_ set, we are under no obligation to
> >>     reverse the order; we're under no obligation to do _anything_.
> >>     As of now, qemu-kvm defines the bit, but doesn't set it.
> > 
> > Well this is not what the spec says in the end.
> 
> I didn't check the spec, maybe I should do that :)
> 
> > To continue that commit message:
> > 
> >     This patch makes the "tell host first" logic the only case.  This
> >     should make everybody happy, and reduce the amount of untested or
> >     untestable code in the kernel.
> 
> Yeah, but this comment explains that the current deflate is only in
> place, because it makes the code simpler (to support both cases). Of
> course, doing the deflate might result in performance improvements.
> (e.g., MADV_WILLNEED)
> 
> > 
> > You can try proposing the change to the virtio TC and see what others
> > think.
> 
> We can just drop the comment from this patch for now. The tell_host should
> not be an issue AFAICS.

I guess it's a good idea.


> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-05 16:34 ` [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM David Hildenbrand
                     ` (2 preceding siblings ...)
  2020-02-06  8:57   ` Wang, Wei W
@ 2020-02-06  9:11   ` Michael S. Tsirkin
  2020-02-06  9:12   ` Michael S. Tsirkin
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  9:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Tyler Sanderson,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On Wed, Feb 05, 2020 at 05:34:02PM +0100, David Hildenbrand wrote:
> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
> changed the behavior when deflation happens automatically. Instead of
> deflating when called by the OOM handler, the shrinker is used.
> 
> However, the balloon is not simply some slab cache that should be
> shrunk when under memory pressure. The shrinker does not have a concept of
> priorities, so this behavior cannot be configured.
> 
> There was a report that this results in undesired side effects when
> inflating the balloon to shrink the page cache. [1]
> 	"When inflating the balloon against page cache (i.e. no free memory
> 	 remains) vmscan.c will both shrink page cache, but also invoke the
> 	 shrinkers -- including the balloon's shrinker. So the balloon
> 	 driver allocates memory which requires reclaim, vmscan gets this
> 	 memory by shrinking the balloon, and then the driver adds the
> 	 memory back to the balloon. Basically a busy no-op."
> 
> The name "deflate on OOM" makes it pretty clear when deflation should
> happen - after other approaches to reclaim memory failed, not while
> reclaiming. This allows to minimize the footprint of a guest - memory
> will only be taken out of the balloon when really needed.
> 
> Especially, a drop_slab() will result in the whole balloon getting
> deflated - undesired. While handling it via the OOM handler might not be
> perfect, it keeps existing behavior. If we want a different behavior, then
> we need a new feature bit and document it properly (although, there should
> be a clear use case and the intended effects should be well described).
> 
> Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
> this has no such side effects. Always register the shrinker with
> VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
> pages that are still to be processed by the guest. The hypervisor takes
> care of identifying and resolving possible races between processing a
> hinting request and the guest reusing a page.
> 
> In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
> notifier with shrinker"), don't add a module parameter to configure the
> number of pages to deflate on OOM. Can be re-added if really needed.
> Also, pay attention that leak_balloon() returns the number of 4k pages -
> convert it properly in virtio_balloon_oom_notify().
> 
> Note1: using the OOM handler is frowned upon, but it really is what we
>        need for this feature.
> 
> Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
>        could actually skip sending deflation requests to our hypervisor,
>        making the OOM path *very* simple. Basically freeing pages and
>        updating the balloon. If the communication with the host ever
>        becomes a problem on this call path.
> 
> [1] https://www.spinics.net/lists/linux-virtualization/msg40863.html
> 
> Reported-by: Tyler Sanderson <tysand@google.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Wei Wang <wei.w.wang@intel.com>
> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Nadav Amit <namit@vmware.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

So the revert looks ok, from that POV and with commit log changes

Acked-by: Michael S. Tsirkin <mst@redhat.com>

However, let's see what others think, and whether Wei can come
up with a fixup for the shrinker.


> ---
>  drivers/virtio/virtio_balloon.c | 107 +++++++++++++-------------------
>  1 file changed, 44 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 7e5d84caeb94..e7b18f556c5e 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -14,6 +14,7 @@
>  #include <linux/slab.h>
>  #include <linux/module.h>
>  #include <linux/balloon_compaction.h>
> +#include <linux/oom.h>
>  #include <linux/wait.h>
>  #include <linux/mm.h>
>  #include <linux/mount.h>
> @@ -27,7 +28,9 @@
>   */
>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
>  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
> -#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> +/* Maximum number of (4k) pages to deflate on OOM notifications. */
> +#define VIRTIO_BALLOON_OOM_NR_PAGES 256
> +#define VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY 80
>  
>  #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
>  					     __GFP_NOMEMALLOC)
> @@ -112,8 +115,11 @@ struct virtio_balloon {
>  	/* Memory statistics */
>  	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
>  
> -	/* To register a shrinker to shrink memory upon memory pressure */
> +	/* Shrinker to return free pages - VIRTIO_BALLOON_F_FREE_PAGE_HINT */
>  	struct shrinker shrinker;
> +
> +	/* OOM notifier to deflate on OOM - VIRTIO_BALLOON_F_DEFLATE_ON_OOM */
> +	struct notifier_block oom_nb;
>  };
>  
>  static struct virtio_device_id id_table[] = {
> @@ -786,50 +792,13 @@ static unsigned long shrink_free_pages(struct virtio_balloon *vb,
>  	return blocks_freed * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>  }
>  
> -static unsigned long leak_balloon_pages(struct virtio_balloon *vb,
> -                                          unsigned long pages_to_free)
> -{
> -	return leak_balloon(vb, pages_to_free * VIRTIO_BALLOON_PAGES_PER_PAGE) /
> -		VIRTIO_BALLOON_PAGES_PER_PAGE;
> -}
> -
> -static unsigned long shrink_balloon_pages(struct virtio_balloon *vb,
> -					  unsigned long pages_to_free)
> -{
> -	unsigned long pages_freed = 0;
> -
> -	/*
> -	 * One invocation of leak_balloon can deflate at most
> -	 * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
> -	 * multiple times to deflate pages till reaching pages_to_free.
> -	 */
> -	while (vb->num_pages && pages_freed < pages_to_free)
> -		pages_freed += leak_balloon_pages(vb,
> -						  pages_to_free - pages_freed);
> -
> -	update_balloon_size(vb);
> -
> -	return pages_freed;
> -}
> -
>  static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
>  						  struct shrink_control *sc)
>  {
> -	unsigned long pages_to_free, pages_freed = 0;
>  	struct virtio_balloon *vb = container_of(shrinker,
>  					struct virtio_balloon, shrinker);
>  
> -	pages_to_free = sc->nr_to_scan;
> -
> -	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> -		pages_freed = shrink_free_pages(vb, pages_to_free);
> -
> -	if (pages_freed >= pages_to_free)
> -		return pages_freed;
> -
> -	pages_freed += shrink_balloon_pages(vb, pages_to_free - pages_freed);
> -
> -	return pages_freed;
> +	return shrink_free_pages(vb, sc->nr_to_scan);
>  }
>  
>  static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
> @@ -837,26 +806,22 @@ static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
>  {
>  	struct virtio_balloon *vb = container_of(shrinker,
>  					struct virtio_balloon, shrinker);
> -	unsigned long count;
> -
> -	count = vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
> -	count += vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>  
> -	return count;
> +	return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>  }
>  
> -static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
> +static int virtio_balloon_oom_notify(struct notifier_block *nb,
> +				     unsigned long dummy, void *parm)
>  {
> -	unregister_shrinker(&vb->shrinker);
> -}
> +	struct virtio_balloon *vb = container_of(nb,
> +						 struct virtio_balloon, oom_nb);
> +	unsigned long *freed = parm;
>  
> -static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
> -{
> -	vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
> -	vb->shrinker.count_objects = virtio_balloon_shrinker_count;
> -	vb->shrinker.seeks = DEFAULT_SEEKS;
> +	*freed += leak_balloon(vb, VIRTIO_BALLOON_OOM_NR_PAGES) /
> +		  VIRTIO_BALLOON_PAGES_PER_PAGE;
> +	update_balloon_size(vb);
>  
> -	return register_shrinker(&vb->shrinker);
> +	return NOTIFY_OK;
>  }
>  
>  static int virtballoon_probe(struct virtio_device *vdev)
> @@ -933,22 +898,35 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  			virtio_cwrite(vb->vdev, struct virtio_balloon_config,
>  				      poison_val, &poison_val);
>  		}
> -	}
> -	/*
> -	 * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if a
> -	 * shrinker needs to be registered to relieve memory pressure.
> -	 */
> -	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
> -		err = virtio_balloon_register_shrinker(vb);
> +
> +		/*
> +		 * We're allowed to reuse any free pages, even if they are
> +		 * still to be processed by the host.
> +		 */
> +		vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
> +		vb->shrinker.count_objects = virtio_balloon_shrinker_count;
> +		vb->shrinker.seeks = DEFAULT_SEEKS;
> +		err = register_shrinker(&vb->shrinker);
>  		if (err)
>  			goto out_del_balloon_wq;
>  	}
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
> +		vb->oom_nb.notifier_call = virtio_balloon_oom_notify;
> +		vb->oom_nb.priority = VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY;
> +		err = register_oom_notifier(&vb->oom_nb);
> +		if (err < 0)
> +			goto out_unregister_shrinker;
> +	}
> +
>  	virtio_device_ready(vdev);
>  
>  	if (towards_target(vb))
>  		virtballoon_changed(vdev);
>  	return 0;
>  
> +out_unregister_shrinker:
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> +		unregister_shrinker(&vb->shrinker);
>  out_del_balloon_wq:
>  	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
>  		destroy_workqueue(vb->balloon_wq);
> @@ -987,8 +965,11 @@ static void virtballoon_remove(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb = vdev->priv;
>  
> -	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> -		virtio_balloon_unregister_shrinker(vb);
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> +		unregister_oom_notifier(&vb->oom_nb);
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> +		unregister_shrinker(&vb->shrinker);
> +
>  	spin_lock_irq(&vb->stop_update_lock);
>  	vb->stop_update = true;
>  	spin_unlock_irq(&vb->stop_update_lock);
> -- 
> 2.24.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-05 16:34 ` [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM David Hildenbrand
                     ` (3 preceding siblings ...)
  2020-02-06  9:11   ` Michael S. Tsirkin
@ 2020-02-06  9:12   ` Michael S. Tsirkin
  2020-02-06  9:21     ` David Hildenbrand
  2020-02-14  9:51   ` David Hildenbrand
  2020-02-14 14:06   ` Michal Hocko
  6 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  9:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Tyler Sanderson,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On Wed, Feb 05, 2020 at 05:34:02PM +0100, David Hildenbrand wrote:
> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
> changed the behavior when deflation happens automatically. Instead of
> deflating when called by the OOM handler, the shrinker is used.
> 
> However, the balloon is not simply some slab cache that should be
> shrunk when under memory pressure. The shrinker does not have a concept of
> priorities, so this behavior cannot be configured.
> 
> There was a report that this results in undesired side effects when
> inflating the balloon to shrink the page cache. [1]
> 	"When inflating the balloon against page cache (i.e. no free memory
> 	 remains) vmscan.c will both shrink page cache, but also invoke the
> 	 shrinkers -- including the balloon's shrinker. So the balloon
> 	 driver allocates memory which requires reclaim, vmscan gets this
> 	 memory by shrinking the balloon, and then the driver adds the
> 	 memory back to the balloon. Basically a busy no-op."
> 
> The name "deflate on OOM" makes it pretty clear when deflation should
> happen - after other approaches to reclaim memory failed, not while
> reclaiming. This allows to minimize the footprint of a guest - memory
> will only be taken out of the balloon when really needed.
> 
> Especially, a drop_slab() will result in the whole balloon getting
> deflated - undesired. While handling it via the OOM handler might not be
> perfect, it keeps existing behavior. If we want a different behavior, then
> we need a new feature bit and document it properly (although, there should
> be a clear use case and the intended effects should be well described).
> 
> Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
> this has no such side effects. Always register the shrinker with
> VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
> pages that are still to be processed by the guest. The hypervisor takes
> care of identifying and resolving possible races between processing a
> hinting request and the guest reusing a page.
> 
> In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
> notifier with shrinker"), don't add a module parameter to configure the
> number of pages to deflate on OOM. Can be re-added if really needed.
> Also, pay attention that leak_balloon() returns the number of 4k pages -
> convert it properly in virtio_balloon_oom_notify().
> 
> Note1: using the OOM handler is frowned upon, but it really is what we
>        need for this feature.
> 
> Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
>        could actually skip sending deflation requests to our hypervisor,
>        making the OOM path *very* simple. Basically freeing pages and
>        updating the balloon. If the communication with the host ever
>        becomes a problem on this call path.
> 
> [1] https://www.spinics.net/lists/linux-virtualization/msg40863.html
> 
> Reported-by: Tyler Sanderson <tysand@google.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Wei Wang <wei.w.wang@intel.com>
> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Nadav Amit <namit@vmware.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>


I guess we should add a Fixes tag pointing at the commit this is reverting;
that way it gets backported and hypervisors will be able to rely on the OOM
behaviour.

> ---
>  drivers/virtio/virtio_balloon.c | 107 +++++++++++++-------------------
>  1 file changed, 44 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 7e5d84caeb94..e7b18f556c5e 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -14,6 +14,7 @@
>  #include <linux/slab.h>
>  #include <linux/module.h>
>  #include <linux/balloon_compaction.h>
> +#include <linux/oom.h>
>  #include <linux/wait.h>
>  #include <linux/mm.h>
>  #include <linux/mount.h>
> @@ -27,7 +28,9 @@
>   */
>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
>  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
> -#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> +/* Maximum number of (4k) pages to deflate on OOM notifications. */
> +#define VIRTIO_BALLOON_OOM_NR_PAGES 256
> +#define VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY 80
>  
>  #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
>  					     __GFP_NOMEMALLOC)
> @@ -112,8 +115,11 @@ struct virtio_balloon {
>  	/* Memory statistics */
>  	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
>  
> -	/* To register a shrinker to shrink memory upon memory pressure */
> +	/* Shrinker to return free pages - VIRTIO_BALLOON_F_FREE_PAGE_HINT */
>  	struct shrinker shrinker;
> +
> +	/* OOM notifier to deflate on OOM - VIRTIO_BALLOON_F_DEFLATE_ON_OOM */
> +	struct notifier_block oom_nb;
>  };
>  
>  static struct virtio_device_id id_table[] = {
> @@ -786,50 +792,13 @@ static unsigned long shrink_free_pages(struct virtio_balloon *vb,
>  	return blocks_freed * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>  }
>  
> -static unsigned long leak_balloon_pages(struct virtio_balloon *vb,
> -                                          unsigned long pages_to_free)
> -{
> -	return leak_balloon(vb, pages_to_free * VIRTIO_BALLOON_PAGES_PER_PAGE) /
> -		VIRTIO_BALLOON_PAGES_PER_PAGE;
> -}
> -
> -static unsigned long shrink_balloon_pages(struct virtio_balloon *vb,
> -					  unsigned long pages_to_free)
> -{
> -	unsigned long pages_freed = 0;
> -
> -	/*
> -	 * One invocation of leak_balloon can deflate at most
> -	 * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
> -	 * multiple times to deflate pages till reaching pages_to_free.
> -	 */
> -	while (vb->num_pages && pages_freed < pages_to_free)
> -		pages_freed += leak_balloon_pages(vb,
> -						  pages_to_free - pages_freed);
> -
> -	update_balloon_size(vb);
> -
> -	return pages_freed;
> -}
> -
>  static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
>  						  struct shrink_control *sc)
>  {
> -	unsigned long pages_to_free, pages_freed = 0;
>  	struct virtio_balloon *vb = container_of(shrinker,
>  					struct virtio_balloon, shrinker);
>  
> -	pages_to_free = sc->nr_to_scan;
> -
> -	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> -		pages_freed = shrink_free_pages(vb, pages_to_free);
> -
> -	if (pages_freed >= pages_to_free)
> -		return pages_freed;
> -
> -	pages_freed += shrink_balloon_pages(vb, pages_to_free - pages_freed);
> -
> -	return pages_freed;
> +	return shrink_free_pages(vb, sc->nr_to_scan);
>  }
>  
>  static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
> @@ -837,26 +806,22 @@ static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
>  {
>  	struct virtio_balloon *vb = container_of(shrinker,
>  					struct virtio_balloon, shrinker);
> -	unsigned long count;
> -
> -	count = vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
> -	count += vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>  
> -	return count;
> +	return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
>  }
>  
> -static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
> +static int virtio_balloon_oom_notify(struct notifier_block *nb,
> +				     unsigned long dummy, void *parm)
>  {
> -	unregister_shrinker(&vb->shrinker);
> -}
> +	struct virtio_balloon *vb = container_of(nb,
> +						 struct virtio_balloon, oom_nb);
> +	unsigned long *freed = parm;
>  
> -static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
> -{
> -	vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
> -	vb->shrinker.count_objects = virtio_balloon_shrinker_count;
> -	vb->shrinker.seeks = DEFAULT_SEEKS;
> +	*freed += leak_balloon(vb, VIRTIO_BALLOON_OOM_NR_PAGES) /
> +		  VIRTIO_BALLOON_PAGES_PER_PAGE;
> +	update_balloon_size(vb);
>  
> -	return register_shrinker(&vb->shrinker);
> +	return NOTIFY_OK;
>  }
>  
>  static int virtballoon_probe(struct virtio_device *vdev)
> @@ -933,22 +898,35 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  			virtio_cwrite(vb->vdev, struct virtio_balloon_config,
>  				      poison_val, &poison_val);
>  		}
> -	}
> -	/*
> -	 * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if a
> -	 * shrinker needs to be registered to relieve memory pressure.
> -	 */
> -	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
> -		err = virtio_balloon_register_shrinker(vb);
> +
> +		/*
> +		 * We're allowed to reuse any free pages, even if they are
> +		 * still to be processed by the host.
> +		 */
> +		vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
> +		vb->shrinker.count_objects = virtio_balloon_shrinker_count;
> +		vb->shrinker.seeks = DEFAULT_SEEKS;
> +		err = register_shrinker(&vb->shrinker);
>  		if (err)
>  			goto out_del_balloon_wq;
>  	}
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
> +		vb->oom_nb.notifier_call = virtio_balloon_oom_notify;
> +		vb->oom_nb.priority = VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY;
> +		err = register_oom_notifier(&vb->oom_nb);
> +		if (err < 0)
> +			goto out_unregister_shrinker;
> +	}
> +
>  	virtio_device_ready(vdev);
>  
>  	if (towards_target(vb))
>  		virtballoon_changed(vdev);
>  	return 0;
>  
> +out_unregister_shrinker:
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> +		unregister_shrinker(&vb->shrinker);
>  out_del_balloon_wq:
>  	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
>  		destroy_workqueue(vb->balloon_wq);
> @@ -987,8 +965,11 @@ static void virtballoon_remove(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb = vdev->priv;
>  
> -	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> -		virtio_balloon_unregister_shrinker(vb);
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> +		unregister_oom_notifier(&vb->oom_nb);
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> +		unregister_shrinker(&vb->shrinker);
> +
>  	spin_lock_irq(&vb->stop_update_lock);
>  	vb->stop_update = true;
>  	spin_unlock_irq(&vb->stop_update_lock);
> -- 
> 2.24.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-06  9:12   ` Michael S. Tsirkin
@ 2020-02-06  9:21     ` David Hildenbrand
  0 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand @ 2020-02-06  9:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, linux-mm, virtualization, Tyler Sanderson,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On 06.02.20 10:12, Michael S. Tsirkin wrote:
> On Wed, Feb 05, 2020 at 05:34:02PM +0100, David Hildenbrand wrote:
>> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
>> changed the behavior when deflation happens automatically. Instead of
>> deflating when called by the OOM handler, the shrinker is used.
>>
>> However, the balloon is not simply some slab cache that should be
>> shrunk when under memory pressure. The shrinker does not have a concept of
>> priorities, so this behavior cannot be configured.
>>
>> There was a report that this results in undesired side effects when
>> inflating the balloon to shrink the page cache. [1]
>> 	"When inflating the balloon against page cache (i.e. no free memory
>> 	 remains) vmscan.c will both shrink page cache, but also invoke the
>> 	 shrinkers -- including the balloon's shrinker. So the balloon
>> 	 driver allocates memory which requires reclaim, vmscan gets this
>> 	 memory by shrinking the balloon, and then the driver adds the
>> 	 memory back to the balloon. Basically a busy no-op."
>>
>> The name "deflate on OOM" makes it pretty clear when deflation should
>> happen - after other approaches to reclaim memory failed, not while
>> reclaiming. This allows to minimize the footprint of a guest - memory
>> will only be taken out of the balloon when really needed.
>>
>> Especially, a drop_slab() will result in the whole balloon getting
>> deflated - undesired. While handling it via the OOM handler might not be
>> perfect, it keeps existing behavior. If we want a different behavior, then
>> we need a new feature bit and document it properly (although, there should
>> be a clear use case and the intended effects should be well described).
>>
>> Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
>> this has no such side effects. Always register the shrinker with
>> VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
>> pages that are still to be processed by the guest. The hypervisor takes
>> care of identifying and resolving possible races between processing a
>> hinting request and the guest reusing a page.
>>
>> In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
>> notifier with shrinker"), don't add a module parameter to configure the
>> number of pages to deflate on OOM. Can be re-added if really needed.
>> Also, pay attention that leak_balloon() returns the number of 4k pages -
>> convert it properly in virtio_balloon_oom_notify().
>>
>> Note1: using the OOM handler is frowned upon, but it really is what we
>>        need for this feature.
>>
>> Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
>>        could actually skip sending deflation requests to our hypervisor,
>>        making the OOM path *very* simple. Basically freeing pages and
>>        updating the balloon. If the communication with the host ever
>>        becomes a problem on this call path.
>>
>> [1] https://www.spinics.net/lists/linux-virtualization/msg40863.html
>>
>> Reported-by: Tyler Sanderson <tysand@google.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Cc: Wei Wang <wei.w.wang@intel.com>
>> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Nadav Amit <namit@vmware.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> 
> I guess we should add a Fixes tag to the patch it's reverting,
> this way it's backported and hypervisors will be able to rely on OOM
> behaviour.

Makes sense,

Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-05 16:34 ` [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM David Hildenbrand
                     ` (4 preceding siblings ...)
  2020-02-06  9:12   ` Michael S. Tsirkin
@ 2020-02-14  9:51   ` David Hildenbrand
  2020-02-14 13:31     ` Wang, Wei W
  2020-02-16  9:47     ` Michael S. Tsirkin
  2020-02-14 14:06   ` Michal Hocko
  6 siblings, 2 replies; 32+ messages in thread
From: David Hildenbrand @ 2020-02-14  9:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtualization, Tyler Sanderson, Michael S . Tsirkin,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On 05.02.20 17:34, David Hildenbrand wrote:
> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
> changed the behavior when deflation happens automatically. Instead of
> deflating when called by the OOM handler, the shrinker is used.
> 
> However, the balloon is not simply some slab cache that should be
> shrunk when under memory pressure. The shrinker does not have a concept of
> priorities, so this behavior cannot be configured.
> 
> There was a report that this results in undesired side effects when
> inflating the balloon to shrink the page cache. [1]
> 	"When inflating the balloon against page cache (i.e. no free memory
> 	 remains) vmscan.c will both shrink page cache, but also invoke the
> 	 shrinkers -- including the balloon's shrinker. So the balloon
> 	 driver allocates memory which requires reclaim, vmscan gets this
> 	 memory by shrinking the balloon, and then the driver adds the
> 	 memory back to the balloon. Basically a busy no-op."
> 
> The name "deflate on OOM" makes it pretty clear when deflation should
> happen - after other approaches to reclaim memory failed, not while
> reclaiming. This allows to minimize the footprint of a guest - memory
> will only be taken out of the balloon when really needed.
> 
> Especially, a drop_slab() will result in the whole balloon getting
> deflated - undesired. While handling it via the OOM handler might not be
> perfect, it keeps existing behavior. If we want a different behavior, then
> we need a new feature bit and document it properly (although, there should
> be a clear use case and the intended effects should be well described).
> 
> Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
> this has no such side effects. Always register the shrinker with
> VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
> pages that are still to be processed by the guest. The hypervisor takes
> care of identifying and resolving possible races between processing a
> hinting request and the guest reusing a page.
> 
> In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
> notifier with shrinker"), don't add a module parameter to configure the
> number of pages to deflate on OOM. Can be re-added if really needed.
> Also, pay attention that leak_balloon() returns the number of 4k pages -
> convert it properly in virtio_balloon_oom_notify().
> 
> Note1: using the OOM handler is frowned upon, but it really is what we
>        need for this feature.
> 
> Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
>        could actually skip sending deflation requests to our hypervisor,
>        making the OOM path *very* simple. Basically freeing pages and
>        updating the balloon. If the communication with the host ever
>        becomes a problem on this call path.
> 

@Michael, how to proceed with this?


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-14  9:51   ` David Hildenbrand
@ 2020-02-14 13:31     ` Wang, Wei W
  2020-02-16  9:47     ` Michael S. Tsirkin
  1 sibling, 0 replies; 32+ messages in thread
From: Wang, Wei W @ 2020-02-14 13:31 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, virtualization, Tyler Sanderson, Michael S . Tsirkin,
	Alexander Duyck, David Rientjes, Nadav Amit, Michal Hocko

On Friday, February 14, 2020 5:52 PM, David Hildenbrand wrote:
> > Commit 71994620bb25 ("virtio_balloon: replace oom notifier with
> > shrinker") changed the behavior when deflation happens automatically.
> > Instead of deflating when called by the OOM handler, the shrinker is used.
> >
> > However, the balloon is not simply some slab cache that should be
> > shrunk when under memory pressure. The shrinker does not have a
> > concept of priorities, so this behavior cannot be configured.
> >
> > There was a report that this results in undesired side effects when
> > inflating the balloon to shrink the page cache. [1]
> > 	"When inflating the balloon against page cache (i.e. no free memory
> > 	 remains) vmscan.c will both shrink page cache, but also invoke the
> > 	 shrinkers -- including the balloon's shrinker. So the balloon
> > 	 driver allocates memory which requires reclaim, vmscan gets this
> > 	 memory by shrinking the balloon, and then the driver adds the
> > 	 memory back to the balloon. Basically a busy no-op."
> >
> > The name "deflate on OOM" makes it pretty clear when deflation should
> > happen - after other approaches to reclaim memory failed, not while
> > reclaiming. This allows to minimize the footprint of a guest - memory
> > will only be taken out of the balloon when really needed.
> >
> > Especially, a drop_slab() will result in the whole balloon getting
> > deflated - undesired. While handling it via the OOM handler might not
> > be perfect, it keeps existing behavior. If we want a different
> > behavior, then we need a new feature bit and document it properly
> > (although, there should be a clear use case and the intended effects should
> be well described).
> >
> > Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT,
> because
> > this has no such side effects. Always register the shrinker with
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to
> reuse
> > free pages that are still to be processed by the guest. The hypervisor
> > takes care of identifying and resolving possible races between
> > processing a hinting request and the guest reusing a page.
> >
> > In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
> > notifier with shrinker"), don't add a module parameter to configure
> > the number of pages to deflate on OOM. Can be re-added if really needed.
> > Also, pay attention that leak_balloon() returns the number of 4k pages
> > - convert it properly in virtio_balloon_oom_notify().
> >
> > Note1: using the OOM handler is frowned upon, but it really is what we
> >        need for this feature.
> >
> > Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with
> QEMU) we
> >        could actually skip sending deflation requests to our hypervisor,
> >        making the OOM path *very* simple. Basically freeing pages and
> >        updating the balloon. If the communication with the host ever
> >        becomes a problem on this call path.
> >
> 
> @Michael, how to proceed with this?
> 

I vote for not going back. When there are solid request and strong reasons in the future, we could reopen this discussion.

Best,
Wei

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-05 16:34 ` [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM David Hildenbrand
                     ` (5 preceding siblings ...)
  2020-02-14  9:51   ` David Hildenbrand
@ 2020-02-14 14:06   ` Michal Hocko
  2020-02-14 14:18     ` David Hildenbrand
  6 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2020-02-14 14:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Tyler Sanderson,
	Michael S . Tsirkin, Wei Wang, Alexander Duyck, David Rientjes,
	Nadav Amit

On Wed 05-02-20 17:34:02, David Hildenbrand wrote:
> Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
> changed the behavior when deflation happens automatically. Instead of
> deflating when called by the OOM handler, the shrinker is used.
> 
> However, the balloon is not simply some slab cache that should be
> shrunk when under memory pressure. The shrinker does not have a concept of
> priorities, so this behavior cannot be configured.

Adding a priority to the shrinker doesn't sound like a big problem to
me. Shrinkers already get a shrink_control data structure, and a
priority could be added there.
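
Purely as an illustration of that idea (struct shrink_control exposes no
priority to shrinkers today, so the field, the threshold and the
shrink_balloon_pages() helper below are hypothetical stand-ins):

static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
						  struct shrink_control *sc)
{
	struct virtio_balloon *vb = container_of(shrinker,
						 struct virtio_balloon,
						 shrinker);

	/*
	 * Hypothetical: deflate only once reclaim is desperate. In vmscan
	 * a lower priority value means more pressure (DEF_PRIORITY is 12).
	 */
	if (sc->priority > 2)
		return SHRINK_STOP;

	return shrink_balloon_pages(vb, sc->nr_to_scan);
}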

> There was a report that this results in undesired side effects when
> inflating the balloon to shrink the page cache. [1]
> 	"When inflating the balloon against page cache (i.e. no free memory
> 	 remains) vmscan.c will both shrink page cache, but also invoke the
> 	 shrinkers -- including the balloon's shrinker. So the balloon
> 	 driver allocates memory which requires reclaim, vmscan gets this
> 	 memory by shrinking the balloon, and then the driver adds the
> 	 memory back to the balloon. Basically a busy no-op."
> 
> The name "deflate on OOM" makes it pretty clear when deflation should
> happen - after other approaches to reclaim memory failed, not while
> reclaiming. This allows to minimize the footprint of a guest - memory
> will only be taken out of the balloon when really needed.
> 
> Especially, a drop_slab() will result in the whole balloon getting
> deflated - undesired.

Could you explain why some more? drop_caches shouldn't be really used in
any production workloads and if somebody really wants all the cache to
be dropped then why is balloon any different?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-14 14:06   ` Michal Hocko
@ 2020-02-14 14:18     ` David Hildenbrand
  2020-02-14 20:48       ` Tyler Sanderson
  0 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand @ 2020-02-14 14:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, virtualization, Tyler Sanderson,
	Michael S . Tsirkin, Wei Wang, Alexander Duyck, David Rientjes,
	Nadav Amit

>> There was a report that this results in undesired side effects when
>> inflating the balloon to shrink the page cache. [1]
>> 	"When inflating the balloon against page cache (i.e. no free memory
>> 	 remains) vmscan.c will both shrink page cache, but also invoke the
>> 	 shrinkers -- including the balloon's shrinker. So the balloon
>> 	 driver allocates memory which requires reclaim, vmscan gets this
>> 	 memory by shrinking the balloon, and then the driver adds the
>> 	 memory back to the balloon. Basically a busy no-op."
>>
>> The name "deflate on OOM" makes it pretty clear when deflation should
>> happen - after other approaches to reclaim memory failed, not while
>> reclaiming. This allows to minimize the footprint of a guest - memory
>> will only be taken out of the balloon when really needed.
>>
>> Especially, a drop_slab() will result in the whole balloon getting
>> deflated - undesired.
> 
> Could you explain why some more? drop_caches shouldn't be really used in
> any production workloads and if somebody really wants all the cache to
> be dropped then why is balloon any different?
> 

Deflation should happen when the guest is out of memory, not when
somebody thinks it's time to reclaim some memory. That's what the
feature promised from the beginning: Only give the guest more memory in
case it *really* needs more memory.

Deflate on oom, not deflate on reclaim/memory pressure. (that's what the
report was all about)

A priority for shrinkers might be a step in the right direction.
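
To make the drop_slab() concern above concrete: drop_slab() keeps calling
every registered shrinker until a whole pass frees next to nothing, so a
shrinker that can always hand back pages (an inflated balloon) gets
drained completely. Roughly, simplified from mm/vmscan.c of that era
(details differ between kernel versions):

void drop_slab_node(int nid)
{
	unsigned long freed;

	do {
		struct mem_cgroup *memcg = NULL;

		freed = 0;
		memcg = mem_cgroup_iter(NULL, NULL, NULL);
		do {
			/* priority 0: maximum pressure for every shrinker */
			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
	} while (freed > 10);	/* repeat until the shrinkers run dry */
}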

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-14 14:18     ` David Hildenbrand
@ 2020-02-14 20:48       ` Tyler Sanderson
  2020-02-14 21:17         ` David Hildenbrand
  2020-02-16  9:46         ` Michael S. Tsirkin
  0 siblings, 2 replies; 32+ messages in thread
From: Tyler Sanderson @ 2020-02-14 20:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, linux-kernel, linux-mm, virtualization,
	Michael S . Tsirkin, Wei Wang, Alexander Duyck, David Rientjes,
	Nadav Amit

[-- Attachment #1: Type: text/plain, Size: 2706 bytes --]

Regarding Wei's patch that modifies the shrinker implementation, versus
this patch which reverts to OOM notifier:
I am in favor of both patches. But I do want to make sure a fix gets back
ported to 4.19 where the performance regression was first introduced.
My concern with reverting to the OOM notifier is, as mst@ put it (in the
other thread):
"when linux hits OOM all kind of error paths are being hit, latent bugs
start triggering, latency goes up drastically."
The guest could be in a lot of pain before the OOM notifier is invoked, and
it seems like the shrinker API might allow more fine grained control of
when we deflate.

On the other hand, I'm not totally convinced that Wei's patch is an
expected use of the shrinker/page-cache APIs, and maybe it is fragile.
Needs more testing and scrutiny.

It seems to me like the shrinker API is the right API in the long run,
perhaps with some fixes and modifications. But maybe reverting to OOM
notifier is the best patch to back port?
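
For context, the shrinker interface this thread keeps coming back to is
just a pair of callbacks, and the scan side is told how much reclaim
wants (sc->nr_to_scan) and under which constraints (sc->gfp_mask); that
is where any finer-grained deflation policy would have to live. A rough
sketch of the shape this series keeps for free-page hinting (field and
helper names follow the driver but are simplified here, so treat them as
assumptions):

static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
						   struct shrink_control *sc)
{
	struct virtio_balloon *vb = container_of(shrinker,
						 struct virtio_balloon,
						 shrinker);

	/* Only pages still queued for free-page hinting are reclaimable. */
	return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
}

static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
						  struct shrink_control *sc)
{
	struct virtio_balloon *vb = container_of(shrinker,
						 struct virtio_balloon,
						 shrinker);

	/* Hand pages that were only queued for hinting back to the MM. */
	return shrink_free_pages(vb, sc->nr_to_scan);
}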

On Fri, Feb 14, 2020 at 6:19 AM David Hildenbrand <david@redhat.com> wrote:

> >> There was a report that this results in undesired side effects when
> >> inflating the balloon to shrink the page cache. [1]
> >>      "When inflating the balloon against page cache (i.e. no free memory
> >>       remains) vmscan.c will both shrink page cache, but also invoke the
> >>       shrinkers -- including the balloon's shrinker. So the balloon
> >>       driver allocates memory which requires reclaim, vmscan gets this
> >>       memory by shrinking the balloon, and then the driver adds the
> >>       memory back to the balloon. Basically a busy no-op."
> >>
> >> The name "deflate on OOM" makes it pretty clear when deflation should
> >> happen - after other approaches to reclaim memory failed, not while
> >> reclaiming. This allows to minimize the footprint of a guest - memory
> >> will only be taken out of the balloon when really needed.
> >>
> >> Especially, a drop_slab() will result in the whole balloon getting
> >> deflated - undesired.
> >
> > Could you explain why some more? drop_caches shouldn't be really used in
> > any production workloads and if somebody really wants all the cache to
> > be dropped then why is balloon any different?
> >
>
> Deflation should happen when the guest is out of memory, not when
> somebody thinks it's time to reclaim some memory. That's what the
> feature promised from the beginning: Only give the guest more memory in
> case it *really* needs more memory.
>
> Deflate on oom, not deflate on reclaim/memory pressure. (that's what the
> report was all about)
>
> A priority for shrinkers might be a step in the right direction.
>
> --
> Thanks,
>
> David / dhildenb
>
>

[-- Attachment #2: Type: text/html, Size: 3531 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-14 20:48       ` Tyler Sanderson
@ 2020-02-14 21:17         ` David Hildenbrand
  2020-02-16  9:46         ` Michael S. Tsirkin
  1 sibling, 0 replies; 32+ messages in thread
From: David Hildenbrand @ 2020-02-14 21:17 UTC (permalink / raw)
  To: Tyler Sanderson
  Cc: David Hildenbrand, Michal Hocko, linux-kernel, linux-mm,
	virtualization, Michael S . Tsirkin, Wei Wang, Alexander Duyck,
	David Rientjes, Nadav Amit

[-- Attachment #1: Type: text/plain, Size: 1345 bytes --]



> On 14.02.2020 21:49, Tyler Sanderson <tysand@google.com> wrote:
> 
> 
> Regarding Wei's patch that modifies the shrinker implementation, versus this patch which reverts to OOM notifier:
> I am in favor of both patches. But I do want to make sure a fix gets back ported to 4.19 where the performance regression was first introduced.
> My concern with reverting to the OOM notifier is, as mst@ put it (in the other thread):
> "when linux hits OOM all kind of error paths are being hit, latent bugs start triggering, latency goes up drastically."

Yeah, and that was the default behavior for years, so it's not big news :)

> The guest could be in a lot of pain before the OOM notifier is invoked, and it seems like the shrinker API might allow more fine grained control of when we deflate.
> 
> On the other hand, I'm not totally convinced that Wei's patch is an expected use of the shrinker/page-cache APIs, and maybe it is fragile. Needs more testing and scrutiny.
> 
> It seems to me like the shrinker API is the right API in the long run, perhaps with some fixes and modifications. But maybe reverting to OOM notifier is the best patch to back

I think that's a good idea. Revert to the old state we had for years and then implement a proper, fully tested solution (e.g., shrinkers with priorities).

Cheers!

[-- Attachment #2: Type: text/html, Size: 1967 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-14 20:48       ` Tyler Sanderson
  2020-02-14 21:17         ` David Hildenbrand
@ 2020-02-16  9:46         ` Michael S. Tsirkin
  1 sibling, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2020-02-16  9:46 UTC (permalink / raw)
  To: Tyler Sanderson
  Cc: David Hildenbrand, Michal Hocko, linux-kernel, linux-mm,
	virtualization, Wei Wang, Alexander Duyck, David Rientjes,
	Nadav Amit

On Fri, Feb 14, 2020 at 12:48:42PM -0800, Tyler Sanderson wrote:
> Regarding Wei's patch that modifies the shrinker implementation, versus this
> patch which reverts to OOM notifier:
> I am in favor of both patches. But I do want to make sure a fix gets back
> ported to 4.19 where the performance regression was first introduced.
> My concern with reverting to the OOM notifier is, as mst@ put it (in the other
> thread):
> "when linux hits OOM all kind of error paths are being hit, latent bugs start
> triggering, latency goes up drastically."
> The guest could be in a lot of pain before the OOM notifier is invoked, and it
> seems like the shrinker API might allow more fine grained control of when we
> deflate.
> 
> On the other hand, I'm not totally convinced that Wei's patch is an expected
> use of the shrinker/page-cache APIs, and maybe it is fragile. Needs more
> testing and scrutiny.
> 
> It seems to me like the shrinker API is the right API in the long run, perhaps
> with some fixes and modifications. But maybe reverting to OOM notifier is the
> best patch to back port?

In that case can I see some Tested-by reports pls?


> On Fri, Feb 14, 2020 at 6:19 AM David Hildenbrand <david@redhat.com> wrote:
> 
>     >> There was a report that this results in undesired side effects when
>     >> inflating the balloon to shrink the page cache. [1]
>     >>      "When inflating the balloon against page cache (i.e. no free memory
>     >>       remains) vmscan.c will both shrink page cache, but also invoke the
>     >>       shrinkers -- including the balloon's shrinker. So the balloon
>     >>       driver allocates memory which requires reclaim, vmscan gets this
>     >>       memory by shrinking the balloon, and then the driver adds the
>     >>       memory back to the balloon. Basically a busy no-op."
>     >>
>     >> The name "deflate on OOM" makes it pretty clear when deflation should
>     >> happen - after other approaches to reclaim memory failed, not while
>     >> reclaiming. This allows to minimize the footprint of a guest - memory
>     >> will only be taken out of the balloon when really needed.
>     >>
>     >> Especially, a drop_slab() will result in the whole balloon getting
>     >> deflated - undesired.
>     >
>     > Could you explain why some more? drop_caches shouldn't be really used in
>     > any production workloads and if somebody really wants all the cache to
>     > be dropped then why is balloon any different?
>     >
> 
>     Deflation should happen when the guest is out of memory, not when
>     somebody thinks it's time to reclaim some memory. That's what the
>     feature promised from the beginning: Only give the guest more memory in
>     case it *really* needs more memory.
> 
>     Deflate on oom, not deflate on reclaim/memory pressure. (that's what the
>     report was all about)
> 
>     A priority for shrinkers might be a step in the right direction.
> 
>     --
>     Thanks,
> 
>     David / dhildenb
> 
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-14  9:51   ` David Hildenbrand
  2020-02-14 13:31     ` Wang, Wei W
@ 2020-02-16  9:47     ` Michael S. Tsirkin
  2020-02-21  3:29       ` Tyler Sanderson
  1 sibling, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2020-02-16  9:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtualization, Tyler Sanderson,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On Fri, Feb 14, 2020 at 10:51:43AM +0100, David Hildenbrand wrote:
> On 05.02.20 17:34, David Hildenbrand wrote:
> > Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
> > changed the behavior when deflation happens automatically. Instead of
> > deflating when called by the OOM handler, the shrinker is used.
> > 
> > However, the balloon is not simply some slab cache that should be
> > shrunk when under memory pressure. The shrinker does not have a concept of
> > priorities, so this behavior cannot be configured.
> > 
> > There was a report that this results in undesired side effects when
> > inflating the balloon to shrink the page cache. [1]
> > 	"When inflating the balloon against page cache (i.e. no free memory
> > 	 remains) vmscan.c will both shrink page cache, but also invoke the
> > 	 shrinkers -- including the balloon's shrinker. So the balloon
> > 	 driver allocates memory which requires reclaim, vmscan gets this
> > 	 memory by shrinking the balloon, and then the driver adds the
> > 	 memory back to the balloon. Basically a busy no-op."
> > 
> > The name "deflate on OOM" makes it pretty clear when deflation should
> > happen - after other approaches to reclaim memory failed, not while
> > reclaiming. This allows to minimize the footprint of a guest - memory
> > will only be taken out of the balloon when really needed.
> > 
> > Especially, a drop_slab() will result in the whole balloon getting
> > deflated - undesired. While handling it via the OOM handler might not be
> > perfect, it keeps existing behavior. If we want a different behavior, then
> > we need a new feature bit and document it properly (although, there should
> > be a clear use case and the intended effects should be well described).
> > 
> > Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
> > this has no such side effects. Always register the shrinker with
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
> > pages that are still to be processed by the guest. The hypervisor takes
> > care of identifying and resolving possible races between processing a
> > hinting request and the guest reusing a page.
> > 
> > In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
> > notifier with shrinker"), don't add a module parameter to configure the
> > number of pages to deflate on OOM. Can be re-added if really needed.
> > Also, pay attention that leak_balloon() returns the number of 4k pages -
> > convert it properly in virtio_balloon_oom_notify().
> > 
> > Note1: using the OOM handler is frowned upon, but it really is what we
> >        need for this feature.
> > 
> > Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
> >        could actually skip sending deflation requests to our hypervisor,
> >        making the OOM path *very* simple. Basically freeing pages and
> >        updating the balloon. If the communication with the host ever
> >        becomes a problem on this call path.
> > 
> 
> @Michael, how to proceed with this?
> 

I'd like to see some reports that this helps people.
e.g. a tested-by tag.

> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-16  9:47     ` Michael S. Tsirkin
@ 2020-02-21  3:29       ` Tyler Sanderson
  2020-03-08  4:47         ` Tyler Sanderson
  0 siblings, 1 reply; 32+ messages in thread
From: Tyler Sanderson @ 2020-02-21  3:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Hildenbrand, linux-kernel, linux-mm, virtualization,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

[-- Attachment #1: Type: text/plain, Size: 3842 bytes --]

Testing this patch is on my short-term TODO list, but I wasn't able to get
to it this week. It is prioritized.

In the meantime, I can anecdotally vouch that kernels before 4.19, the ones
using the OOM notifier callback, have roughly 10x faster balloon inflation
when pressuring the cache. So I anticipate this patch will return to that
state and help my use case.

I will try to post official measurements of this patch next week.

On Sun, Feb 16, 2020 at 1:47 AM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Fri, Feb 14, 2020 at 10:51:43AM +0100, David Hildenbrand wrote:
> > On 05.02.20 17:34, David Hildenbrand wrote:
> > > Commit 71994620bb25 ("virtio_balloon: replace oom notifier with
> shrinker")
> > > changed the behavior when deflation happens automatically. Instead of
> > > deflating when called by the OOM handler, the shrinker is used.
> > >
> > > However, the balloon is not simply some slab cache that should be
> > > shrunk when under memory pressure. The shrinker does not have a
> concept of
> > > priorities, so this behavior cannot be configured.
> > >
> > > There was a report that this results in undesired side effects when
> > > inflating the balloon to shrink the page cache. [1]
> > >     "When inflating the balloon against page cache (i.e. no free memory
> > >      remains) vmscan.c will both shrink page cache, but also invoke the
> > >      shrinkers -- including the balloon's shrinker. So the balloon
> > >      driver allocates memory which requires reclaim, vmscan gets this
> > >      memory by shrinking the balloon, and then the driver adds the
> > >      memory back to the balloon. Basically a busy no-op."
> > >
> > > The name "deflate on OOM" makes it pretty clear when deflation should
> > > happen - after other approaches to reclaim memory failed, not while
> > > reclaiming. This allows to minimize the footprint of a guest - memory
> > > will only be taken out of the balloon when really needed.
> > >
> > > Especially, a drop_slab() will result in the whole balloon getting
> > > deflated - undesired. While handling it via the OOM handler might not
> be
> > > perfect, it keeps existing behavior. If we want a different behavior,
> then
> > > we need a new feature bit and document it properly (although, there
> should
> > > be a clear use case and the intended effects should be well described).
> > >
> > > Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
> > > this has no such side effects. Always register the shrinker with
> > > VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse
> free
> > > pages that are still to be processed by the guest. The hypervisor takes
> > > care of identifying and resolving possible races between processing a
> > > hinting request and the guest reusing a page.
> > >
> > > In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
> > > notifier with shrinker"), don't add a module parameter to configure
> the
> > > number of pages to deflate on OOM. Can be re-added if really needed.
> > > Also, pay attention that leak_balloon() returns the number of 4k pages
> -
> > > convert it properly in virtio_balloon_oom_notify().
> > >
> > > Note1: using the OOM handler is frowned upon, but it really is what we
> > >        need for this feature.
> > >
> > > Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU)
> we
> > >        could actually skip sending deflation requests to our
> hypervisor,
> > >        making the OOM path *very* simple. Basically freeing pages and
> > >        updating the balloon. If the communication with the host ever
> > >        becomes a problem on this call path.
> > >
> >
> > @Michael, how to proceed with this?
> >
>
> I'd like to see some reports that this helps people.
> e.g. a tested-by tag.
>
> > --
> > Thanks,
> >
> > David / dhildenb
>
>

[-- Attachment #2: Type: text/html, Size: 4768 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-02-21  3:29       ` Tyler Sanderson
@ 2020-03-08  4:47         ` Tyler Sanderson
  2020-03-09  9:03           ` David Hildenbrand
  2020-03-09 10:24           ` Michael S. Tsirkin
  0 siblings, 2 replies; 32+ messages in thread
From: Tyler Sanderson @ 2020-03-08  4:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Hildenbrand, linux-kernel, linux-mm, virtualization,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

[-- Attachment #1: Type: text/plain, Size: 5583 bytes --]

Tested-by: Tyler Sanderson <tysand@google.com>

Test setup: VM with 16 CPU, 64GB RAM. Running Debian 10. We have a 42
GB file full of random bytes that we continually cat to /dev/null.
This fills the page cache as the file is read. Meanwhile we trigger
the balloon to inflate, with a target size of 53 GB. This setup causes
the balloon inflation to pressure the page cache as the page cache is
also trying to grow. Afterwards we shrink the balloon back to zero (so
total deflate = total inflate).

Without patch (kernel 4.19.0-5):
Inflation never reaches the target until we stop the "cat file >
/dev/null" process. Total inflation time was 542 seconds. The longest
period that made no net forward progress was 315 seconds (see attached
graph).
Result of "grep balloon /proc/vmstat" after the test:
balloon_inflate 154828377
balloon_deflate 154828377

With patch (kernel 5.6.0-rc4+):
Total inflation duration was 63 seconds. No deflate-queue activity
occurs when pressuring the page-cache.
Result of "grep balloon /proc/vmstat" after the test:
balloon_inflate 12968539
balloon_deflate 12968539

Conclusion: This patch fixes the issue. In the test it reduced
inflate/deflate activity by 12x, and reduced inflation time by 8.6x.
But more importantly, if we hadn't killed the "grep balloon
/proc/vmstat" process then, without the patch, the inflation process
would never reach the target.

Attached is a png of a graph showing the problematic behavior without
this patch. It shows deflate-queue activity increasing linearly while
balloon size stays constant over the course of more than 8 minutes of
the test.


On Thu, Feb 20, 2020 at 7:29 PM Tyler Sanderson <tysand@google.com> wrote:
>
> Testing this patch is on my short-term TODO list, but I wasn't able to get to it this week. It is prioritized.
>
> In the meantime, I can anecdotally vouch that kernels before 4.19, the ones using the OOM notifier callback, have roughly 10x faster balloon inflation when pressuring the cache. So I anticipate this patch will return to that state and help my use case.
>
> I will try to post official measurements of this patch next week.
>
> On Sun, Feb 16, 2020 at 1:47 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> On Fri, Feb 14, 2020 at 10:51:43AM +0100, David Hildenbrand wrote:
>> > On 05.02.20 17:34, David Hildenbrand wrote:
>> > > Commit 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
>> > > changed the behavior when deflation happens automatically. Instead of
>> > > deflating when called by the OOM handler, the shrinker is used.
>> > >
>> > > However, the balloon is not simply some slab cache that should be
>> > > shrunk when under memory pressure. The shrinker does not have a concept of
>> > > priorities, so this behavior cannot be configured.
>> > >
>> > > There was a report that this results in undesired side effects when
>> > > inflating the balloon to shrink the page cache. [1]
>> > >     "When inflating the balloon against page cache (i.e. no free memory
>> > >      remains) vmscan.c will both shrink page cache, but also invoke the
>> > >      shrinkers -- including the balloon's shrinker. So the balloon
>> > >      driver allocates memory which requires reclaim, vmscan gets this
>> > >      memory by shrinking the balloon, and then the driver adds the
>> > >      memory back to the balloon. Basically a busy no-op."
>> > >
>> > > The name "deflate on OOM" makes it pretty clear when deflation should
>> > > happen - after other approaches to reclaim memory failed, not while
>> > > reclaiming. This allows to minimize the footprint of a guest - memory
>> > > will only be taken out of the balloon when really needed.
>> > >
>> > > Especially, a drop_slab() will result in the whole balloon getting
>> > > deflated - undesired. While handling it via the OOM handler might not be
>> > > perfect, it keeps existing behavior. If we want a different behavior, then
>> > > we need a new feature bit and document it properly (although, there should
>> > > be a clear use case and the intended effects should be well described).
>> > >
>> > > Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
>> > > this has no such side effects. Always register the shrinker with
>> > > VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
>> > > pages that are still to be processed by the guest. The hypervisor takes
>> > > care of identifying and resolving possible races between processing a
>> > > hinting request and the guest reusing a page.
>> > >
>> > > In contrast to pre commit 71994620bb25 ("virtio_balloon: replace oom
>> > > notifier with shrinker"), don't add a module parameter to configure the
>> > > number of pages to deflate on OOM. Can be re-added if really needed.
>> > > Also, pay attention that leak_balloon() returns the number of 4k pages -
>> > > convert it properly in virtio_balloon_oom_notify().
>> > >
>> > > Note1: using the OOM handler is frowned upon, but it really is what we
>> > >        need for this feature.
>> > >
>> > > Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
>> > >        could actually skip sending deflation requests to our hypervisor,
>> > >        making the OOM path *very* simple. Basically freeing pages and
>> > >        updating the balloon. If the communication with the host ever
>> > >        becomes a problem on this call path.
>> > >
>> >
>> > @Michael, how to proceed with this?
>> >
>>
>> I'd like to see some reports that this helps people.
>> e.g. a tested-by tag.
>>
>> > --
>> > Thanks,
>> >
>> > David / dhildenb
>>

[-- Attachment #2: without_patch.png --]
[-- Type: image/png, Size: 13504 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-03-08  4:47         ` Tyler Sanderson
@ 2020-03-09  9:03           ` David Hildenbrand
  2020-03-09 10:14             ` Michael S. Tsirkin
  2020-03-09 10:24           ` Michael S. Tsirkin
  1 sibling, 1 reply; 32+ messages in thread
From: David Hildenbrand @ 2020-03-09  9:03 UTC (permalink / raw)
  To: Tyler Sanderson, Michael S. Tsirkin
  Cc: linux-kernel, linux-mm, virtualization, Wei Wang,
	Alexander Duyck, David Rientjes, Nadav Amit, Michal Hocko

On 08.03.20 05:47, Tyler Sanderson wrote:
> Tested-by: Tyler Sanderson <tysand@google.com>
> 
> Test setup: VM with 16 CPU, 64GB RAM. Running Debian 10. We have a 42
> GB file full of random bytes that we continually cat to /dev/null.
> This fills the page cache as the file is read. Meanwhile we trigger
> the balloon to inflate, with a target size of 53 GB. This setup causes
> the balloon inflation to pressure the page cache as the page cache is
> also trying to grow. Afterwards we shrink the balloon back to zero (so
> total deflate = total inflate).
> 
> Without patch (kernel 4.19.0-5):
> Inflation never reaches the target until we stop the "cat file >
> /dev/null" process. Total inflation time was 542 seconds. The longest
> period that made no net forward progress was 315 seconds (see attached
> graph).
> Result of "grep balloon /proc/vmstat" after the test:
> balloon_inflate 154828377
> balloon_deflate 154828377
> 
> With patch (kernel 5.6.0-rc4+):
> Total inflation duration was 63 seconds. No deflate-queue activity
> occurs when pressuring the page-cache.
> Result of "grep balloon /proc/vmstat" after the test:
> balloon_inflate 12968539
> balloon_deflate 12968539
> 
> Conclusion: This patch fixes the issue. In the test it reduced
> inflate/deflate activity by 12x, and reduced inflation time by 8.6x.
> But more importantly, if we hadn't killed the "cat file >
> /dev/null" process then, without the patch, the inflation process
> would never reach the target.
> 
> Attached is a png of a graph showing the problematic behavior without
> this patch. It shows deflate-queue activity increasing linearly while
> balloon size stays constant over the course of more than 8 minutes of
> the test.

Thanks a lot for the extended test!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-03-09  9:03           ` David Hildenbrand
@ 2020-03-09 10:14             ` Michael S. Tsirkin
  2020-03-09 10:59               ` David Hildenbrand
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2020-03-09 10:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Tyler Sanderson, linux-kernel, linux-mm, virtualization,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On Mon, Mar 09, 2020 at 10:03:14AM +0100, David Hildenbrand wrote:
> On 08.03.20 05:47, Tyler Sanderson wrote:
> > Tested-by: Tyler Sanderson <tysand@google.com>
> > 
> > Test setup: VM with 16 CPU, 64GB RAM. Running Debian 10. We have a 42
> > GB file full of random bytes that we continually cat to /dev/null.
> > This fills the page cache as the file is read. Meanwhile we trigger
> > the balloon to inflate, with a target size of 53 GB. This setup causes
> > the balloon inflation to pressure the page cache as the page cache is
> > also trying to grow. Afterwards we shrink the balloon back to zero (so
> > total deflate = total inflate).
> > 
> > Without patch (kernel 4.19.0-5):
> > Inflation never reaches the target until we stop the "cat file >
> > /dev/null" process. Total inflation time was 542 seconds. The longest
> > period that made no net forward progress was 315 seconds (see attached
> > graph).
> > Result of "grep balloon /proc/vmstat" after the test:
> > balloon_inflate 154828377
> > balloon_deflate 154828377
> > 
> > With patch (kernel 5.6.0-rc4+):
> > Total inflation duration was 63 seconds. No deflate-queue activity
> > occurs when pressuring the page-cache.
> > Result of "grep balloon /proc/vmstat" after the test:
> > balloon_inflate 12968539
> > balloon_deflate 12968539
> > 
> > Conclusion: This patch fixes the issue. In the test it reduced
> > inflate/deflate activity by 12x, and reduced inflation time by 8.6x.
> > But more importantly, if we hadn't killed the "grep balloon
> > /proc/vmstat" process then, without the patch, the inflation process
> > would never reach the target.
> > 
> > Attached is a png of a graph showing the problematic behavior without
> > this patch. It shows deflate-queue activity increasing linearly while
> > balloon size stays constant over the course of more than 8 minutes of
> > the test.
> 
> Thanks a lot for the extended test!


Given we shipped this for a long time, I think the best way
to make progress is to merge 1/3, 2/3 right now, and 3/3
in the next release.

> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-03-08  4:47         ` Tyler Sanderson
  2020-03-09  9:03           ` David Hildenbrand
@ 2020-03-09 10:24           ` Michael S. Tsirkin
  1 sibling, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2020-03-09 10:24 UTC (permalink / raw)
  To: Tyler Sanderson
  Cc: David Hildenbrand, linux-kernel, linux-mm, virtualization,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On Sat, Mar 07, 2020 at 08:47:25PM -0800, Tyler Sanderson wrote:
> Tested-by: Tyler Sanderson <tysand@google.com>
> 
> Test setup: VM with 16 CPU, 64GB RAM. Running Debian 10. We have a 42
> GB file full of random bytes that we continually cat to /dev/null.
> This fills the page cache as the file is read. Meanwhile we trigger
> the balloon to inflate, with a target size of 53 GB. This setup causes
> the balloon inflation to pressure the page cache as the page cache is
> also trying to grow. Afterwards we shrink the balloon back to zero (so
> total deflate = total inflate).
> 
> Without patch (kernel 4.19.0-5):
> Inflation never reaches the target until we stop the "cat file >
> /dev/null" process. Total inflation time was 542 seconds. The longest
> period that made no net forward progress was 315 seconds (see attached
> graph).
> Result of "grep balloon /proc/vmstat" after the test:
> balloon_inflate 154828377
> balloon_deflate 154828377
> 
> With patch (kernel 5.6.0-rc4+):
> Total inflation duration was 63 seconds. No deflate-queue activity
> occurs when pressuring the page-cache.
> Result of "grep balloon /proc/vmstat" after the test:
> balloon_inflate 12968539
> balloon_deflate 12968539
> 
> Conclusion: This patch fixes the issue. In the test it reduced
> inflate/deflate activity by 12x, and reduced inflation time by 8.6x.
> But more importantly, if we hadn't killed the "cat file >
> /dev/null" process then, without the patch, the inflation process
> would never reach the target.
> 
> Attached is a png of a graph showing the problematic behavior without
> this patch. It shows deflate-queue activity increasing linearly while
> balloon size stays constant over the course of more than 8 minutes of
> the test.

OK, this is now queued for -next. Tyler, thanks a lot for the detailed
test report - it's really awesome! I included it in the commit log in
full so that if we need to come back to this it's easy to reproduce the
testing.

-- 
MST



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  2020-03-09 10:14             ` Michael S. Tsirkin
@ 2020-03-09 10:59               ` David Hildenbrand
  0 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand @ 2020-03-09 10:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Tyler Sanderson, linux-kernel, linux-mm, virtualization,
	Wei Wang, Alexander Duyck, David Rientjes, Nadav Amit,
	Michal Hocko

On 09.03.20 11:14, Michael S. Tsirkin wrote:
> On Mon, Mar 09, 2020 at 10:03:14AM +0100, David Hildenbrand wrote:
>> On 08.03.20 05:47, Tyler Sanderson wrote:
>>> Tested-by: Tyler Sanderson <tysand@google.com>
>>>
>>> Test setup: VM with 16 CPU, 64GB RAM. Running Debian 10. We have a 42
>>> GB file full of random bytes that we continually cat to /dev/null.
>>> This fills the page cache as the file is read. Meanwhile we trigger
>>> the balloon to inflate, with a target size of 53 GB. This setup causes
>>> the balloon inflation to pressure the page cache as the page cache is
>>> also trying to grow. Afterwards we shrink the balloon back to zero (so
>>> total deflate = total inflate).
>>>
>>> Without patch (kernel 4.19.0-5):
>>> Inflation never reaches the target until we stop the "cat file >
>>> /dev/null" process. Total inflation time was 542 seconds. The longest
>>> period that made no net forward progress was 315 seconds (see attached
>>> graph).
>>> Result of "grep balloon /proc/vmstat" after the test:
>>> balloon_inflate 154828377
>>> balloon_deflate 154828377
>>>
>>> With patch (kernel 5.6.0-rc4+):
>>> Total inflation duration was 63 seconds. No deflate-queue activity
>>> occurs when pressuring the page-cache.
>>> Result of "grep balloon /proc/vmstat" after the test:
>>> balloon_inflate 12968539
>>> balloon_deflate 12968539
>>>
>>> Conclusion: This patch fixes the issue. In the test it reduced
>>> inflate/deflate activity by 12x, and reduced inflation time by 8.6x.
>>> But more importantly, if we hadn't killed the "cat file >
>>> /dev/null" process then, without the patch, the inflation process
>>> would never reach the target.
>>>
>>> Attached is a png of a graph showing the problematic behavior without
>>> this patch. It shows deflate-queue activity increasing linearly while
>>> balloon size stays constant over the course of more than 8 minutes of
>>> the test.
>>
>> Thanks a lot for the extended test!
> 
> 
> Given we shipped this for a long time, I think the best way
> to make progress is to merge 1/3, 2/3 right now, and 3/3
> in the next release.

Agreed.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-03-09 11:00 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-05 16:33 [PATCH v1 0/3] virtio-balloon: Fixes + switch back to OOM handler David Hildenbrand
2020-02-05 16:34 ` [PATCH v1 1/3] virtio-balloon: Fix memory leak when unloading while hinting is in progress David Hildenbrand
2020-02-06  8:36   ` Michael S. Tsirkin
2020-02-05 16:34 ` [PATCH v1 2/3] virtio_balloon: Fix memory leaks on errors in virtballoon_probe() David Hildenbrand
2020-02-06  8:36   ` Michael S. Tsirkin
2020-02-05 16:34 ` [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM David Hildenbrand
2020-02-05 22:37   ` Tyler Sanderson
2020-02-05 22:52     ` David Hildenbrand
2020-02-05 23:06       ` Tyler Sanderson
2020-02-06  7:40   ` Michael S. Tsirkin
2020-02-06  8:42     ` David Hildenbrand
2020-02-06  8:57       ` Michael S. Tsirkin
2020-02-06  9:05         ` David Hildenbrand
2020-02-06  9:09           ` Michael S. Tsirkin
2020-02-06  8:57   ` Wang, Wei W
2020-02-06  9:11   ` Michael S. Tsirkin
2020-02-06  9:12   ` Michael S. Tsirkin
2020-02-06  9:21     ` David Hildenbrand
2020-02-14  9:51   ` David Hildenbrand
2020-02-14 13:31     ` Wang, Wei W
2020-02-16  9:47     ` Michael S. Tsirkin
2020-02-21  3:29       ` Tyler Sanderson
2020-03-08  4:47         ` Tyler Sanderson
2020-03-09  9:03           ` David Hildenbrand
2020-03-09 10:14             ` Michael S. Tsirkin
2020-03-09 10:59               ` David Hildenbrand
2020-03-09 10:24           ` Michael S. Tsirkin
2020-02-14 14:06   ` Michal Hocko
2020-02-14 14:18     ` David Hildenbrand
2020-02-14 20:48       ` Tyler Sanderson
2020-02-14 21:17         ` David Hildenbrand
2020-02-16  9:46         ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).