linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done
@ 2021-11-02  9:19 Juergen Gross
  2021-11-04  1:55 ` Boris Ostrovsky
  2021-11-05 12:18 ` Boris Ostrovsky
  0 siblings, 2 replies; 9+ messages in thread
From: Juergen Gross @ 2021-11-02  9:19 UTC (permalink / raw)
  To: xen-devel, linux-doc, linux-kernel
  Cc: Juergen Gross, Jonathan Corbet, Boris Ostrovsky,
	Stefano Stabellini, stable, Marek Marczykowski-Górecki

When running as PVH or HVM guest with actual memory < max memory the
hypervisor is using "populate on demand" in order to allow the guest
to balloon down from its maximum memory size. For this to work
correctly the guest must not touch more memory pages than its target
memory size as otherwise the PoD cache will be exhausted and the guest
is crashed as a result of that.

In extreme cases ballooning down might not be finished today before
the init process is started, which can consume lots of memory.

In order to avoid random boot crashes in such cases, add a late init
call to wait for ballooning down having finished for PVH/HVM guests.

Warn on console if initial ballooning fails, panic() after stalling
for more than 3 minutes per default. Add a module parameter for
changing this timeout.

Cc: <stable@vger.kernel.org>
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- add warning and panic() when stalling (Marek Marczykowski-Górecki)
- don't wait if credit > 0
V3:
- issue warning only after ballooning failed (Marek Marczykowski-Górecki)
- make panic() timeout configurable via parameter
V4:
- fix boot parameter (Boris Ostrovsky)
- set new state directly in update_schedule() (Boris Ostrovsky)
---
 .../admin-guide/kernel-parameters.txt         |  7 ++
 drivers/xen/balloon.c                         | 86 ++++++++++++++-----
 2 files changed, 70 insertions(+), 23 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc0..1396fd2d9031 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6349,6 +6349,13 @@
 			improve timer resolution at the expense of processing
 			more timer interrupts.
 
+	xen.balloon_boot_timeout= [XEN]
+			The time (in seconds) to wait before giving up to boot
+			in case initial ballooning fails to free enough memory.
+			Applies only when running as HVM or PVH guest and
+			started with less memory configured than allowed at
+			max. Default is 180.
+
 	xen.event_eoi_delay=	[XEN]
 			How long to delay EOI handling in case of event
 			storms (jiffies). Default is 10.
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3a50f097ed3e..3a661b7697d4 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -58,6 +58,7 @@
 #include <linux/percpu-defs.h>
 #include <linux/slab.h>
 #include <linux/sysctl.h>
+#include <linux/moduleparam.h>
 
 #include <asm/page.h>
 #include <asm/tlb.h>
@@ -73,6 +74,12 @@
 #include <xen/page.h>
 #include <xen/mem-reservation.h>
 
+#undef MODULE_PARAM_PREFIX
+#define MODULE_PARAM_PREFIX "xen."
+
+static uint __read_mostly balloon_boot_timeout = 180;
+module_param(balloon_boot_timeout, uint, 0444);
+
 static int xen_hotplug_unpopulated;
 
 #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
@@ -125,12 +132,12 @@ static struct ctl_table xen_root[] = {
  * BP_ECANCELED: error, balloon operation canceled.
  */
 
-enum bp_state {
+static enum bp_state {
 	BP_DONE,
 	BP_WAIT,
 	BP_EAGAIN,
 	BP_ECANCELED
-};
+} balloon_state = BP_DONE;
 
 /* Main waiting point for xen-balloon thread. */
 static DECLARE_WAIT_QUEUE_HEAD(balloon_thread_wq);
@@ -199,18 +206,15 @@ static struct page *balloon_next_page(struct page *page)
 	return list_entry(next, struct page, lru);
 }
 
-static enum bp_state update_schedule(enum bp_state state)
+static void update_schedule(void)
 {
-	if (state == BP_WAIT)
-		return BP_WAIT;
-
-	if (state == BP_ECANCELED)
-		return BP_ECANCELED;
+	if (balloon_state == BP_WAIT || balloon_state == BP_ECANCELED)
+		return;
 
-	if (state == BP_DONE) {
+	if (balloon_state == BP_DONE) {
 		balloon_stats.schedule_delay = 1;
 		balloon_stats.retry_count = 1;
-		return BP_DONE;
+		return;
 	}
 
 	++balloon_stats.retry_count;
@@ -219,7 +223,8 @@ static enum bp_state update_schedule(enum bp_state state)
 			balloon_stats.retry_count > balloon_stats.max_retry_count) {
 		balloon_stats.schedule_delay = 1;
 		balloon_stats.retry_count = 1;
-		return BP_ECANCELED;
+		balloon_state = BP_ECANCELED;
+		return;
 	}
 
 	balloon_stats.schedule_delay <<= 1;
@@ -227,7 +232,7 @@ static enum bp_state update_schedule(enum bp_state state)
 	if (balloon_stats.schedule_delay > balloon_stats.max_schedule_delay)
 		balloon_stats.schedule_delay = balloon_stats.max_schedule_delay;
 
-	return BP_EAGAIN;
+	balloon_state = BP_EAGAIN;
 }
 
 #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
@@ -494,9 +499,9 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
  * Stop waiting if either state is BP_DONE and ballooning action is
  * needed, or if the credit has changed while state is not BP_DONE.
  */
-static bool balloon_thread_cond(enum bp_state state, long credit)
+static bool balloon_thread_cond(long credit)
 {
-	if (state == BP_DONE)
+	if (balloon_state == BP_DONE)
 		credit = 0;
 
 	return current_credit() != credit || kthread_should_stop();
@@ -510,13 +515,12 @@ static bool balloon_thread_cond(enum bp_state state, long credit)
  */
 static int balloon_thread(void *unused)
 {
-	enum bp_state state = BP_DONE;
 	long credit;
 	unsigned long timeout;
 
 	set_freezable();
 	for (;;) {
-		switch (state) {
+		switch (balloon_state) {
 		case BP_DONE:
 		case BP_ECANCELED:
 			timeout = 3600 * HZ;
@@ -532,7 +536,7 @@ static int balloon_thread(void *unused)
 		credit = current_credit();
 
 		wait_event_freezable_timeout(balloon_thread_wq,
-			balloon_thread_cond(state, credit), timeout);
+			balloon_thread_cond(credit), timeout);
 
 		if (kthread_should_stop())
 			return 0;
@@ -543,22 +547,23 @@ static int balloon_thread(void *unused)
 
 		if (credit > 0) {
 			if (balloon_is_inflated())
-				state = increase_reservation(credit);
+				balloon_state = increase_reservation(credit);
 			else
-				state = reserve_additional_memory();
+				balloon_state = reserve_additional_memory();
 		}
 
 		if (credit < 0) {
 			long n_pages;
 
 			n_pages = min(-credit, si_mem_available());
-			state = decrease_reservation(n_pages, GFP_BALLOON);
-			if (state == BP_DONE && n_pages != -credit &&
+			balloon_state = decrease_reservation(n_pages,
+							     GFP_BALLOON);
+			if (balloon_state == BP_DONE && n_pages != -credit &&
 			    n_pages < totalreserve_pages)
-				state = BP_EAGAIN;
+				balloon_state = BP_EAGAIN;
 		}
 
-		state = update_schedule(state);
+		update_schedule();
 
 		mutex_unlock(&balloon_mutex);
 
@@ -765,3 +770,38 @@ static int __init balloon_init(void)
 	return 0;
 }
 subsys_initcall(balloon_init);
+
+static int __init balloon_wait_finish(void)
+{
+	long credit, last_credit = 0;
+	unsigned long last_changed = 0;
+
+	if (!xen_domain())
+		return -ENODEV;
+
+	/* PV guests don't need to wait. */
+	if (xen_pv_domain() || !current_credit())
+		return 0;
+
+	pr_info("Waiting for initial ballooning down having finished.\n");
+
+	while ((credit = current_credit()) < 0) {
+		if (credit != last_credit) {
+			last_changed = jiffies;
+			last_credit = credit;
+		}
+		if (balloon_state == BP_ECANCELED) {
+			pr_warn_once("Initial ballooning failed, %ld pages need to be freed.\n",
+				     -credit);
+			if (jiffies - last_changed >= HZ * balloon_boot_timeout)
+				panic("Initial ballooning failed!\n");
+		}
+
+		schedule_timeout_interruptible(HZ / 10);
+	}
+
+	pr_info("Initial ballooning down finished.\n");
+
+	return 0;
+}
+late_initcall_sync(balloon_wait_finish);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-11-02  9:19 [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done Juergen Gross
@ 2021-11-04  1:55 ` Boris Ostrovsky
  2021-11-04 15:55   ` Boris Ostrovsky
  2021-11-05 12:18 ` Boris Ostrovsky
  1 sibling, 1 reply; 9+ messages in thread
From: Boris Ostrovsky @ 2021-11-04  1:55 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-doc, linux-kernel
  Cc: Jonathan Corbet, Stefano Stabellini, stable,
	Marek Marczykowski-Górecki


On 11/2/21 5:19 AM, Juergen Gross wrote:
> When running as PVH or HVM guest with actual memory < max memory the
> hypervisor is using "populate on demand" in order to allow the guest
> to balloon down from its maximum memory size. For this to work
> correctly the guest must not touch more memory pages than its target
> memory size as otherwise the PoD cache will be exhausted and the guest
> is crashed as a result of that.
>
> In extreme cases ballooning down might not be finished today before
> the init process is started, which can consume lots of memory.
>
> In order to avoid random boot crashes in such cases, add a late init
> call to wait for ballooning down having finished for PVH/HVM guests.
>
> Warn on console if initial ballooning fails, panic() after stalling
> for more than 3 minutes per default. Add a module parameter for
> changing this timeout.
>
> Cc: <stable@vger.kernel.org>
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>



Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-11-04  1:55 ` Boris Ostrovsky
@ 2021-11-04 15:55   ` Boris Ostrovsky
  2021-11-04 16:21     ` Juergen Gross
  2021-11-04 16:23     ` Marek Marczykowski-Górecki
  0 siblings, 2 replies; 9+ messages in thread
From: Boris Ostrovsky @ 2021-11-04 15:55 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-doc, linux-kernel
  Cc: Jonathan Corbet, Stefano Stabellini, stable,
	Marek Marczykowski-Górecki


On 11/3/21 9:55 PM, Boris Ostrovsky wrote:
>
> On 11/2/21 5:19 AM, Juergen Gross wrote:
>> When running as PVH or HVM guest with actual memory < max memory the
>> hypervisor is using "populate on demand" in order to allow the guest
>> to balloon down from its maximum memory size. For this to work
>> correctly the guest must not touch more memory pages than its target
>> memory size as otherwise the PoD cache will be exhausted and the guest
>> is crashed as a result of that.
>>
>> In extreme cases ballooning down might not be finished today before
>> the init process is started, which can consume lots of memory.
>>
>> In order to avoid random boot crashes in such cases, add a late init
>> call to wait for ballooning down having finished for PVH/HVM guests.
>>
>> Warn on console if initial ballooning fails, panic() after stalling
>> for more than 3 minutes per default. Add a module parameter for
>> changing this timeout.
>>
>> Cc: <stable@vger.kernel.org>
>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>
>
>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>


This appears to have noticeable effect on boot time (and boot experience in general).


I have


   memory=1024
   maxmem=8192


And my boot time (on an admittedly slow box) went from 33 to 45 seconds. And boot pauses in the middle while it is waiting for ballooning to complete.


[    5.062714] xen:balloon: Waiting for initial ballooning down having finished.
[    5.449696] random: crng init done
[   34.613050] xen:balloon: Initial ballooning down finished.


So at least I think we should consider bumping log level down from info.



-boris


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-11-04 15:55   ` Boris Ostrovsky
@ 2021-11-04 16:21     ` Juergen Gross
  2021-11-04 16:34       ` Boris Ostrovsky
  2021-11-04 16:23     ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 9+ messages in thread
From: Juergen Gross @ 2021-11-04 16:21 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel, linux-doc, linux-kernel
  Cc: Jonathan Corbet, Stefano Stabellini, stable,
	Marek Marczykowski-Górecki


[-- Attachment #1.1.1: Type: text/plain, Size: 2120 bytes --]

On 04.11.21 16:55, Boris Ostrovsky wrote:
> 
> On 11/3/21 9:55 PM, Boris Ostrovsky wrote:
>>
>> On 11/2/21 5:19 AM, Juergen Gross wrote:
>>> When running as PVH or HVM guest with actual memory < max memory the
>>> hypervisor is using "populate on demand" in order to allow the guest
>>> to balloon down from its maximum memory size. For this to work
>>> correctly the guest must not touch more memory pages than its target
>>> memory size as otherwise the PoD cache will be exhausted and the guest
>>> is crashed as a result of that.
>>>
>>> In extreme cases ballooning down might not be finished today before
>>> the init process is started, which can consume lots of memory.
>>>
>>> In order to avoid random boot crashes in such cases, add a late init
>>> call to wait for ballooning down having finished for PVH/HVM guests.
>>>
>>> Warn on console if initial ballooning fails, panic() after stalling
>>> for more than 3 minutes per default. Add a module parameter for
>>> changing this timeout.
>>>
>>> Cc: <stable@vger.kernel.org>
>>> Reported-by: Marek Marczykowski-Górecki 
>>> <marmarek@invisiblethingslab.com>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>>
>>
>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> 
> 
> This appears to have noticeable effect on boot time (and boot experience 
> in general).
> 
> 
> I have
> 
> 
>    memory=1024
>    maxmem=8192
> 
> 
> And my boot time (on an admittedly slow box) went from 33 to 45 seconds. 
> And boot pauses in the middle while it is waiting for ballooning to 
> complete.
> 
> 
> [    5.062714] xen:balloon: Waiting for initial ballooning down having 
> finished.
> [    5.449696] random: crng init done
> [   34.613050] xen:balloon: Initial ballooning down finished.

This shows that before it was just by chance that the PoD cache wasn't
exhausted.

> So at least I think we should consider bumping log level down from info.

Which level would you prefer? warn?

And if so, would you mind doing this while committing (I have one day
off tomorrow)?


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-11-04 15:55   ` Boris Ostrovsky
  2021-11-04 16:21     ` Juergen Gross
@ 2021-11-04 16:23     ` Marek Marczykowski-Górecki
  1 sibling, 0 replies; 9+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-11-04 16:23 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Juergen Gross, xen-devel, linux-doc, linux-kernel,
	Jonathan Corbet, Stefano Stabellini, stable

[-- Attachment #1: Type: text/plain, Size: 580 bytes --]

On Thu, Nov 04, 2021 at 11:55:34AM -0400, Boris Ostrovsky wrote:
> This appears to have noticeable effect on boot time (and boot experience in general).

Yes, this is kind of intentional. It avoids killing the domain ("out of
PoD memory") for the price of increased boot time. I still wonder why it
wasn't an issue before, although it could just be a race that was less
likely to loose before.

You can find some analysis of mine here:
https://lore.kernel.org/xen-devel/YXFxKC4shCATB913@mail-itl/

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-11-04 16:21     ` Juergen Gross
@ 2021-11-04 16:34       ` Boris Ostrovsky
  2021-11-04 16:44         ` Juergen Gross
  0 siblings, 1 reply; 9+ messages in thread
From: Boris Ostrovsky @ 2021-11-04 16:34 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-doc, linux-kernel
  Cc: Jonathan Corbet, Stefano Stabellini, stable,
	Marek Marczykowski-Górecki


On 11/4/21 12:21 PM, Juergen Gross wrote:
> On 04.11.21 16:55, Boris Ostrovsky wrote:
>>
>> On 11/3/21 9:55 PM, Boris Ostrovsky wrote:
>>>
>>> On 11/2/21 5:19 AM, Juergen Gross wrote:
>>>> When running as PVH or HVM guest with actual memory < max memory the
>>>> hypervisor is using "populate on demand" in order to allow the guest
>>>> to balloon down from its maximum memory size. For this to work
>>>> correctly the guest must not touch more memory pages than its target
>>>> memory size as otherwise the PoD cache will be exhausted and the guest
>>>> is crashed as a result of that.
>>>>
>>>> In extreme cases ballooning down might not be finished today before
>>>> the init process is started, which can consume lots of memory.
>>>>
>>>> In order to avoid random boot crashes in such cases, add a late init
>>>> call to wait for ballooning down having finished for PVH/HVM guests.
>>>>
>>>> Warn on console if initial ballooning fails, panic() after stalling
>>>> for more than 3 minutes per default. Add a module parameter for
>>>> changing this timeout.
>>>>
>>>> Cc: <stable@vger.kernel.org>
>>>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>>>
>>>
>>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>
>>
>> This appears to have noticeable effect on boot time (and boot experience in general).
>>
>>
>> I have
>>
>>
>>    memory=1024
>>    maxmem=8192
>>
>>
>> And my boot time (on an admittedly slow box) went from 33 to 45 seconds. And boot pauses in the middle while it is waiting for ballooning to complete.
>>
>>
>> [    5.062714] xen:balloon: Waiting for initial ballooning down having finished.
>> [    5.449696] random: crng init done
>> [   34.613050] xen:balloon: Initial ballooning down finished.
>
> This shows that before it was just by chance that the PoD cache wasn't
> exhausted.


True.


>
>> So at least I think we should consider bumping log level down from info.
>
> Which level would you prefer? warn?
>

Notice? Although that won't make much difference as WARN is the default level.


I suppose we can't turn scrubbing off at this point?


> And if so, would you mind doing this while committing (I have one day
> off tomorrow)?


Yes, of course.


-boris


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-11-04 16:34       ` Boris Ostrovsky
@ 2021-11-04 16:44         ` Juergen Gross
  2021-11-04 17:05           ` Boris Ostrovsky
  0 siblings, 1 reply; 9+ messages in thread
From: Juergen Gross @ 2021-11-04 16:44 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel, linux-doc, linux-kernel
  Cc: Jonathan Corbet, Stefano Stabellini, stable,
	Marek Marczykowski-Górecki


[-- Attachment #1.1.1: Type: text/plain, Size: 3020 bytes --]

On 04.11.21 17:34, Boris Ostrovsky wrote:
> 
> On 11/4/21 12:21 PM, Juergen Gross wrote:
>> On 04.11.21 16:55, Boris Ostrovsky wrote:
>>>
>>> On 11/3/21 9:55 PM, Boris Ostrovsky wrote:
>>>>
>>>> On 11/2/21 5:19 AM, Juergen Gross wrote:
>>>>> When running as PVH or HVM guest with actual memory < max memory the
>>>>> hypervisor is using "populate on demand" in order to allow the guest
>>>>> to balloon down from its maximum memory size. For this to work
>>>>> correctly the guest must not touch more memory pages than its target
>>>>> memory size as otherwise the PoD cache will be exhausted and the guest
>>>>> is crashed as a result of that.
>>>>>
>>>>> In extreme cases ballooning down might not be finished today before
>>>>> the init process is started, which can consume lots of memory.
>>>>>
>>>>> In order to avoid random boot crashes in such cases, add a late init
>>>>> call to wait for ballooning down having finished for PVH/HVM guests.
>>>>>
>>>>> Warn on console if initial ballooning fails, panic() after stalling
>>>>> for more than 3 minutes per default. Add a module parameter for
>>>>> changing this timeout.
>>>>>
>>>>> Cc: <stable@vger.kernel.org>
>>>>> Reported-by: Marek Marczykowski-Górecki 
>>>>> <marmarek@invisiblethingslab.com>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>
>>>>
>>>>
>>>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>
>>>
>>> This appears to have noticeable effect on boot time (and boot 
>>> experience in general).
>>>
>>>
>>> I have
>>>
>>>
>>>    memory=1024
>>>    maxmem=8192
>>>
>>>
>>> And my boot time (on an admittedly slow box) went from 33 to 45 
>>> seconds. And boot pauses in the middle while it is waiting for 
>>> ballooning to complete.
>>>
>>>
>>> [    5.062714] xen:balloon: Waiting for initial ballooning down 
>>> having finished.
>>> [    5.449696] random: crng init done
>>> [   34.613050] xen:balloon: Initial ballooning down finished.
>>
>> This shows that before it was just by chance that the PoD cache wasn't
>> exhausted.
> 
> 
> True.
> 
> 
>>
>>> So at least I think we should consider bumping log level down from info.
>>
>> Which level would you prefer? warn?
>>
> 
> Notice? Although that won't make much difference as WARN is the default 
> level.

Right. That was my thinking.

> I suppose we can't turn scrubbing off at this point?

I don't think we can be sure a ballooned page wasn't in use before. And
it could contain some data e.g. from the loaded initrd, maybe even put
there by the boot loader. So no, I wouldn't want to do that by default.

We could add another value to the xen_scrub_pages boot parameter, like
xen_scrub_pages=not-at-boot or some such. But this should be another
patch. And it should be documented that initrd or kernel data might
leak.

>> And if so, would you mind doing this while committing (I have one day
>> off tomorrow)?
> 
> 
> Yes, of course.

Thanks.


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-11-04 16:44         ` Juergen Gross
@ 2021-11-04 17:05           ` Boris Ostrovsky
  0 siblings, 0 replies; 9+ messages in thread
From: Boris Ostrovsky @ 2021-11-04 17:05 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-doc, linux-kernel
  Cc: Jonathan Corbet, Stefano Stabellini, stable,
	Marek Marczykowski-Górecki



>
>>> And if so, would you mind doing this while committing (I have one day
>>> off tomorrow)?
>>
>>
>> Yes, of course.
>
> Thanks.


So I'll change this to pr_notice, even though it's not the best solution.


-boris


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-11-02  9:19 [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done Juergen Gross
  2021-11-04  1:55 ` Boris Ostrovsky
@ 2021-11-05 12:18 ` Boris Ostrovsky
  1 sibling, 0 replies; 9+ messages in thread
From: Boris Ostrovsky @ 2021-11-05 12:18 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-doc, linux-kernel
  Cc: Jonathan Corbet, Stefano Stabellini, stable,
	Marek Marczykowski-Górecki


On 11/2/21 5:19 AM, Juergen Gross wrote:
> When running as PVH or HVM guest with actual memory < max memory the
> hypervisor is using "populate on demand" in order to allow the guest
> to balloon down from its maximum memory size. For this to work
> correctly the guest must not touch more memory pages than its target
> memory size as otherwise the PoD cache will be exhausted and the guest
> is crashed as a result of that.
>
> In extreme cases ballooning down might not be finished today before
> the init process is started, which can consume lots of memory.
>
> In order to avoid random boot crashes in such cases, add a late init
> call to wait for ballooning down having finished for PVH/HVM guests.
>
> Warn on console if initial ballooning fails, panic() after stalling
> for more than 3 minutes per default. Add a module parameter for
> changing this timeout.
>
> Cc: <stable@vger.kernel.org>
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>



Applied to for-linus-5-16b.


-boris


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-11-05 12:18 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-02  9:19 [PATCH v4] xen/balloon: add late_initcall_sync() for initial ballooning done Juergen Gross
2021-11-04  1:55 ` Boris Ostrovsky
2021-11-04 15:55   ` Boris Ostrovsky
2021-11-04 16:21     ` Juergen Gross
2021-11-04 16:34       ` Boris Ostrovsky
2021-11-04 16:44         ` Juergen Gross
2021-11-04 17:05           ` Boris Ostrovsky
2021-11-04 16:23     ` Marek Marczykowski-Górecki
2021-11-05 12:18 ` Boris Ostrovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).