* Tentative fix for "out of PoD memory" issue
@ 2021-10-21 11:53 Juergen Gross
  2021-10-21 13:54 ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 2+ messages in thread
From: Juergen Gross @ 2021-10-21 11:53 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel



Marek,

could you please test whether the attached patch is fixing your
problem?

BTW, I believe this could happen before kernel 5.15, too. I guess
my modification to use a kernel thread instead of a workqueue just
made the issue more probable.

I couldn't reproduce the crash you are seeing, but the introduced
wait was 4.2 seconds on my test system (a PVH guest with 2 GB of
memory, maxmem 6 GB).


Juergen

[-- Attachment #1.1.2: 0001-xen-balloon-add-late_initcall_sync-for-initial-ballo.patch --]
[-- Type: text/x-patch, Size: 2156 bytes --]

From 3ee35f6f110e2258ec94f0d1397fac8c26b41761 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
To: linux-kernel@vger.kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: xen-devel@lists.xenproject.org
Date: Thu, 21 Oct 2021 12:51:06 +0200
Subject: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning
 done
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When running as a PVH or HVM guest with actual memory < max memory,
the hypervisor uses "populate on demand" in order to allow the guest
to balloon down from its maximum memory size. For this to work
correctly the guest must not touch more memory pages than its target
memory size, as otherwise the PoD cache will be exhausted and the
guest will be crashed as a result.

In extreme cases ballooning down might not have finished before the
init process is started, and init can consume lots of memory.

In order to avoid random boot crashes in such cases, add a late
initcall that waits for ballooning down to have finished for PVH/HVM
guests.

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/balloon.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3a50f097ed3e..d19b851c3d3b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -765,3 +765,23 @@ static int __init balloon_init(void)
 	return 0;
 }
 subsys_initcall(balloon_init);
+
+static int __init balloon_wait_finish(void)
+{
+	if (!xen_domain())
+		return -ENODEV;
+
+	/* PV guests don't need to wait. */
+	if (xen_pv_domain() || !current_credit())
+		return 0;
+
+	pr_info("Waiting for initial ballooning down to finish.\n");
+
+	while (current_credit())
+		schedule_timeout_interruptible(HZ / 10);
+
+	pr_info("Initial ballooning down finished.\n");
+
+	return 0;
+}
+late_initcall_sync(balloon_wait_finish);
-- 
2.26.2




* Re: Tentative fix for "out of PoD memory" issue
  2021-10-21 11:53 Tentative fix for "out of PoD memory" issue Juergen Gross
@ 2021-10-21 13:54 ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 2+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-10-21 13:54 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel



On Thu, Oct 21, 2021 at 01:53:06PM +0200, Juergen Gross wrote:
> Marek,
> 
> could you please test whether the attached patch is fixing your
> problem?

Sure. In fact, I made a similar patch in the meantime (attached) to
experiment with this a bit.

> BTW, I believe this could happen before kernel 5.15, too. I guess
> my modification to use a kernel thread instead of a workqueue just
> made the issue more probable.

I think you are right here. But this all looks still a bit weird.

1. Baseline: 5.10.61 (before using the kernel thread, which was
backported to stable branches).

Here the startup completes successfully (no "out of PoD memory"
issue) with memory=270MB.

2. 5.10.61 with the added boot delay patch:
The delay is about 18s and the guest boots successfully.

3. 5.10.71 with "xen/balloon: fix cancelled balloon action" but without
the delay patch:
The domain is killed during startup (in the middle of fsck, I think).

4. 5.10.74 with the delay patch:
The delay is about 19s and the guest boots successfully.

Now the weird part: with memory=270MB and the delay patch, the balloon
wait _fails_ - state=BP_ECANCELED, and the credit is -19712 at that
time. This happens with both the thread and workqueue balloon variants.
Yet the guest isn't killed (*). But with 5.10.61, even without the
delay patch, the guest starts successfully in the end.

Also, I think there used to be some implicit wait for the initial
balloon down. That was the main motivation for 197ecb3802c0 "xen/balloon:
add runtime control for scrubbing ballooned out pages" - because that
initial balloon down held up system startup for quite a long time. Sadly,
I can't find my notes from debugging that (in particular, whether I had
written down a stack trace of _where_ exactly it was waiting)...

> I couldn't reproduce the crash you are seeing, but the introduced
> wait was 4.2 seconds on my test system (a PVH guest with 2 GB of
> memory, maxmem 6 GB).

I'm testing it with much more aggressive settings:
 - memory: 270 MB (the minimum that is sufficient to boot the system)
 - maxmem: 4 GB

The default settings in Qubes are:
 - memory: 400 MB
 - maxmem: 4 GB

That should explain why this happens in Qubes far more often than
elsewhere.
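For reference, settings like the above correspond to an xl guest configuration along these lines (a sketch; the guest type is an assumption, the memory values are the Qubes defaults quoted above):

```
# Illustrative xl guest config fragment (values from this thread)
type = "pvh"      # assumption: a PVH guest, as in the tests above
memory = 400      # initial/target memory in MiB
maxmem = 4096     # maxmem > memory enables populate-on-demand
```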


(*) At some point during system boot, the Qubes memory manager kicks in
and the VM gets more memory. But that happens rather late, and definitely
after the point at which the domain is killed with "out of PoD memory"
in the other cases.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #1.2: balloon-wait-5.10.61.patch --]
[-- Type: text/plain, Size: 3701 bytes --]

From 947818c731094a952d4955e99a23ef336daf7ab9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
 <marmarek@invisiblethingslab.com>
Date: Thu, 21 Oct 2021 01:10:21 +0200
Subject: [PATCH] WIP: xen/balloon: wait for initial balloon down before
 starting userspace
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Organization: Invisible Things Lab
Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

When an HVM/PVH guest is started with maxmem > memory, the
populate-on-demand feature is used. This allows the guest to see up to
'maxmem' memory, but when it tries to use more than 'memory', it is
crashed. The balloon driver should prevent that by ballooning down the
guest before it tries to use too much memory. Unfortunately, this is
done asynchronously and isn't really guaranteed to be quick enough.
And indeed, with recent kernel versions, the initial balloon down
process is slower and guests with a small initial 'memory' are
frequently crashed by Xen.

Fix this by adding a late initcall that waits for the initial balloon
down to complete before allowing any userspace to run. If that initial
balloon down fails, it is very likely that the guest will be killed (as
soon as something really uses all the memory it has allocated) - print
a message about that to aid in diagnosing such issues.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
(cherry picked from commit 9a226e669c918c98ee603ee30a1798da6434a423)
---
 drivers/xen/balloon.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index b57b2067ecbf..c2a4e25a14dc 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -56,6 +56,7 @@
 #include <linux/percpu-defs.h>
 #include <linux/slab.h>
 #include <linux/sysctl.h>
+#include <linux/completion.h>
 
 #include <asm/page.h>
 #include <asm/tlb.h>
@@ -136,6 +137,8 @@ static DEFINE_MUTEX(balloon_mutex);
 struct balloon_stats balloon_stats;
 EXPORT_SYMBOL_GPL(balloon_stats);
 
+static DECLARE_COMPLETION(initial_balloon);
+
 /* We increase/decrease in batches which fit in a page */
 static xen_pfn_t frame_list[PAGE_SIZE / sizeof(xen_pfn_t)];
 
@@ -501,7 +504,6 @@ static void balloon_process(struct work_struct *work)
 	enum bp_state state = BP_DONE;
 	long credit;
 
-
 	do {
 		mutex_lock(&balloon_mutex);
 
@@ -526,6 +528,15 @@ static void balloon_process(struct work_struct *work)
 
 		state = update_schedule(state);
 
+		if (credit >= 0)
+			complete(&initial_balloon);
+		else if (state == BP_ECANCELED) {
+			if (!completion_done(&initial_balloon) && !xen_pv_domain())
+				pr_err("Initial balloon down failed, expect the domain to be killed with \"out of PoD memory\" error by Xen.\n");
+			complete(&initial_balloon);
+		}
+
+
 		mutex_unlock(&balloon_mutex);
 
 		cond_resched();
@@ -677,6 +688,20 @@ static void __init balloon_add_region(unsigned long start_pfn,
 }
 #endif
 
+static int __init wait_for_initial_balloon_down(void)
+{
+	mutex_lock(&balloon_mutex);
+	/* optionally re-init completion after retrieving balloon target */
+	if (current_credit() < 0)
+		reinit_completion(&initial_balloon);
+	mutex_unlock(&balloon_mutex);
+	printk(KERN_INFO "waiting for initial balloon down %ld\n", current_credit());
+	wait_for_completion(&initial_balloon);
+	printk(KERN_INFO "done waiting for initial balloon down %ld\n", current_credit());
+	return 0;
+}
+late_initcall(wait_for_initial_balloon_down);
+
 static int __init balloon_init(void)
 {
 	if (!xen_domain())
-- 
2.31.1


[-- Attachment #1.3: balloon-wait-5.10.74.patch --]
[-- Type: text/plain, Size: 3474 bytes --]

From 9a226e669c918c98ee603ee30a1798da6434a423 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
 <marmarek@invisiblethingslab.com>
Date: Thu, 21 Oct 2021 01:10:21 +0200
Subject: [PATCH] WIP: xen/balloon: wait for initial balloon down before
 starting userspace
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Organization: Invisible Things Lab
Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

When an HVM/PVH guest is started with maxmem > memory, the
populate-on-demand feature is used. This allows the guest to see up to
'maxmem' memory, but when it tries to use more than 'memory', it is
crashed. The balloon driver should prevent that by ballooning down the
guest before it tries to use too much memory. Unfortunately, this is
done asynchronously and isn't really guaranteed to be quick enough.
And indeed, with recent kernel versions, the initial balloon down
process is slower and guests with a small initial 'memory' are
frequently crashed by Xen.

Fix this by adding a late initcall that waits for the initial balloon
down to complete before allowing any userspace to run. If that initial
balloon down fails, it is very likely that the guest will be killed (as
soon as something really uses all the memory it has allocated) - print
a message about that to aid in diagnosing such issues.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
 drivers/xen/balloon.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 1911a62a6d9c..a91d90f91c81 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -58,6 +58,7 @@
 #include <linux/percpu-defs.h>
 #include <linux/slab.h>
 #include <linux/sysctl.h>
+#include <linux/completion.h>
 
 #include <asm/page.h>
 #include <asm/tlb.h>
@@ -140,6 +141,8 @@ static DEFINE_MUTEX(balloon_mutex);
 struct balloon_stats balloon_stats;
 EXPORT_SYMBOL_GPL(balloon_stats);
 
+static DECLARE_COMPLETION(initial_balloon);
+
 /* We increase/decrease in batches which fit in a page */
 static xen_pfn_t frame_list[PAGE_SIZE / sizeof(xen_pfn_t)];
 
@@ -531,6 +534,14 @@ static int balloon_thread(void *unused)
 
 		credit = current_credit();
 
+		if (credit >= 0)
+			complete(&initial_balloon);
+		else if (state == BP_ECANCELED) {
+			if (!completion_done(&initial_balloon) && !xen_pv_domain())
+				pr_err("Initial balloon down failed, expect the domain to be killed with \"out of PoD memory\" error by Xen.\n");
+			complete(&initial_balloon);
+		}
+
 		wait_event_freezable_timeout(balloon_thread_wq,
 			balloon_thread_cond(state, credit), timeout);
 
@@ -706,6 +717,20 @@ static void __init balloon_add_region(unsigned long start_pfn,
 }
 #endif
 
+static int __init wait_for_initial_balloon_down(void)
+{
+	mutex_lock(&balloon_mutex);
+	/* optionally re-init completion after retrieving balloon target */
+	if (current_credit() < 0)
+		reinit_completion(&initial_balloon);
+	mutex_unlock(&balloon_mutex);
+	printk(KERN_INFO "waiting for initial balloon down %ld\n", current_credit());
+	wait_for_completion(&initial_balloon);
+	printk(KERN_INFO "done waiting for initial balloon down %ld\n", current_credit());
+	return 0;
+}
+late_initcall(wait_for_initial_balloon_down);
+
 static int __init balloon_init(void)
 {
 	struct task_struct *task;
-- 
2.31.1



