All of lore.kernel.org
 help / color / mirror / Atom feed
* Tentative fix for "out of PoD memory" issue
@ 2021-10-21 11:53 Juergen Gross
  2021-10-21 13:54 ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 2+ messages in thread
From: Juergen Gross @ 2021-10-21 11:53 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 426 bytes --]

Marek,

could you please test whether the attached patch is fixing your
problem?

BTW, I don't think this couldn't happen before kernel 5.15. I guess
my modification to use a kernel thread instead of a workqueue just
made the issue more probable.

I couldn't reproduce the crash you are seeing, but the introduced
wait was 4.2 seconds on my test system (a PVH guest with 2 GB of
memory, maxmem 6 GB).


Juergen

[-- Attachment #1.1.2: 0001-xen-balloon-add-late_initcall_sync-for-initial-ballo.patch --]
[-- Type: text/x-patch, Size: 2156 bytes --]

From 3ee35f6f110e2258ec94f0d1397fac8c26b41761 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
To: linux-kernel@vger.kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: xen-devel@lists.xenproject.org
Date: Thu, 21 Oct 2021 12:51:06 +0200
Subject: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning
 done
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When running as PVH or HVM guest with actual memory < max memory the
hypervisor is using "populate on demand" in order to allow the guest
to balloon down from its maximum memory size. For this to work
correctly the guest must not touch more memory pages than its target
memory size as otherwise the PoD cache will be exhausted and the guest
is crashed as a result of that.

In extreme cases ballooning down might not be finished today before
the init process is started, which can consume lots of memory.

In order to avoid random boot crashes in such cases, add a late init
call to wait for ballooning down having finished for PVH/HVM guests.

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/balloon.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3a50f097ed3e..d19b851c3d3b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -765,3 +765,23 @@ static int __init balloon_init(void)
 	return 0;
 }
 subsys_initcall(balloon_init);
+
+static int __init balloon_wait_finish(void)
+{
+	if (!xen_domain())
+		return -ENODEV;
+
+	/* PV guests don't need to wait. */
+	if (xen_pv_domain() || !current_credit())
+		return 0;
+
+	pr_info("Waiting for initial ballooning down having finished.\n");
+
+	while (current_credit())
+		schedule_timeout_interruptible(HZ / 10);
+
+	pr_info("Initial ballooning down finished.\n");
+
+	return 0;
+}
+late_initcall_sync(balloon_wait_finish);
-- 
2.26.2


[-- Attachment #1.1.3: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-10-21 13:55 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-21 11:53 Tentative fix for "out of PoD memory" issue Juergen Gross
2021-10-21 13:54 ` Marek Marczykowski-Górecki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.