linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done
@ 2021-10-28 10:59 Juergen Gross
  2021-10-28 18:13 ` Boris Ostrovsky
  2021-10-28 20:16 ` Marek Marczykowski-Górecki
  0 siblings, 2 replies; 7+ messages in thread
From: Juergen Gross @ 2021-10-28 10:59 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini, stable,
	Marek Marczykowski-Górecki

When running as PVH or HVM guest with actual memory < max memory the
hypervisor is using "populate on demand" in order to allow the guest
to balloon down from its maximum memory size. For this to work
correctly the guest must not touch more memory pages than its target
memory size as otherwise the PoD cache will be exhausted and the guest
is crashed as a result of that.

In extreme cases ballooning down might not be finished today before
the init process is started, which can consume lots of memory.

In order to avoid random boot crashes in such cases, add a late init
call to wait for ballooning down having finished for PVH/HVM guests.

Cc: <stable@vger.kernel.org>
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/balloon.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3a50f097ed3e..d19b851c3d3b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -765,3 +765,23 @@ static int __init balloon_init(void)
 	return 0;
 }
 subsys_initcall(balloon_init);
+
+static int __init balloon_wait_finish(void)
+{
+	if (!xen_domain())
+		return -ENODEV;
+
+	/* PV guests don't need to wait. */
+	if (xen_pv_domain() || !current_credit())
+		return 0;
+
+	pr_info("Waiting for initial ballooning down having finished.\n");
+
+	while (current_credit())
+		schedule_timeout_interruptible(HZ / 10);
+
+	pr_info("Initial ballooning down finished.\n");
+
+	return 0;
+}
+late_initcall_sync(balloon_wait_finish);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-10-28 10:59 [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done Juergen Gross
@ 2021-10-28 18:13 ` Boris Ostrovsky
  2021-10-28 20:16 ` Marek Marczykowski-Górecki
  1 sibling, 0 replies; 7+ messages in thread
From: Boris Ostrovsky @ 2021-10-28 18:13 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-kernel
  Cc: Stefano Stabellini, stable, Marek Marczykowski-Górecki


On 10/28/21 6:59 AM, Juergen Gross wrote:
> +
> +	while (current_credit())
> +		schedule_timeout_interruptible(HZ / 10);


Should we be concerned that we may get stuck here forever if for some reason we can't balloon down everything?


-boris



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-10-28 10:59 [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done Juergen Gross
  2021-10-28 18:13 ` Boris Ostrovsky
@ 2021-10-28 20:16 ` Marek Marczykowski-Górecki
  2021-10-29  4:48   ` Juergen Gross
  1 sibling, 1 reply; 7+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-10-28 20:16 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, linux-kernel, Boris Ostrovsky, Stefano Stabellini, stable

[-- Attachment #1: Type: text/plain, Size: 2186 bytes --]

On Thu, Oct 28, 2021 at 12:59:52PM +0200, Juergen Gross wrote:
> When running as PVH or HVM guest with actual memory < max memory the
> hypervisor is using "populate on demand" in order to allow the guest
> to balloon down from its maximum memory size. For this to work
> correctly the guest must not touch more memory pages than its target
> memory size as otherwise the PoD cache will be exhausted and the guest
> is crashed as a result of that.
> 
> In extreme cases ballooning down might not be finished today before
> the init process is started, which can consume lots of memory.
> 
> In order to avoid random boot crashes in such cases, add a late init
> call to wait for ballooning down having finished for PVH/HVM guests.
> 
> Cc: <stable@vger.kernel.org>
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>

It may happen that initial balloon down fails (state==BP_ECANCELED). In
that case, it waits indefinitely. I think it should rather report a
failure (and panic? it's similar to OOM before PID 1 starts, so rather
hard to recover), instead of hanging.

Anyway, it does fix the boot crashes.

> ---
>  drivers/xen/balloon.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 3a50f097ed3e..d19b851c3d3b 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -765,3 +765,23 @@ static int __init balloon_init(void)
>  	return 0;
>  }
>  subsys_initcall(balloon_init);
> +
> +static int __init balloon_wait_finish(void)
> +{
> +	if (!xen_domain())
> +		return -ENODEV;
> +
> +	/* PV guests don't need to wait. */
> +	if (xen_pv_domain() || !current_credit())
> +		return 0;
> +
> +	pr_info("Waiting for initial ballooning down having finished.\n");
> +
> +	while (current_credit())
> +		schedule_timeout_interruptible(HZ / 10);
> +
> +	pr_info("Initial ballooning down finished.\n");
> +
> +	return 0;
> +}
> +late_initcall_sync(balloon_wait_finish);
> -- 
> 2.26.2
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-10-28 20:16 ` Marek Marczykowski-Górecki
@ 2021-10-29  4:48   ` Juergen Gross
  2021-10-29  9:57     ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 7+ messages in thread
From: Juergen Gross @ 2021-10-29  4:48 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, linux-kernel, Boris Ostrovsky, Stefano Stabellini, stable


[-- Attachment #1.1.1: Type: text/plain, Size: 1440 bytes --]

On 28.10.21 22:16, Marek Marczykowski-Górecki wrote:
> On Thu, Oct 28, 2021 at 12:59:52PM +0200, Juergen Gross wrote:
>> When running as PVH or HVM guest with actual memory < max memory the
>> hypervisor is using "populate on demand" in order to allow the guest
>> to balloon down from its maximum memory size. For this to work
>> correctly the guest must not touch more memory pages than its target
>> memory size as otherwise the PoD cache will be exhausted and the guest
>> is crashed as a result of that.
>>
>> In extreme cases ballooning down might not be finished today before
>> the init process is started, which can consume lots of memory.
>>
>> In order to avoid random boot crashes in such cases, add a late init
>> call to wait for ballooning down having finished for PVH/HVM guests.
>>
>> Cc: <stable@vger.kernel.org>
>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> It may happen that initial balloon down fails (state==BP_ECANCELED). In
> that case, it waits indefinitely. I think it should rather report a
> failure (and panic? it's similar to OOM before PID 1 starts, so rather
> hard to recover), instead of hanging.

Okay, I can add something like that. I'm thinking of issuing a failure
message in case of credit not having changed for 1 minute and panic()
after two more minutes. Is this fine?


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-10-29  4:48   ` Juergen Gross
@ 2021-10-29  9:57     ` Marek Marczykowski-Górecki
  2021-10-29 10:22       ` Juergen Gross
  0 siblings, 1 reply; 7+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-10-29  9:57 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, linux-kernel, Boris Ostrovsky, Stefano Stabellini, stable

[-- Attachment #1: Type: text/plain, Size: 2146 bytes --]

On Fri, Oct 29, 2021 at 06:48:44AM +0200, Juergen Gross wrote:
> On 28.10.21 22:16, Marek Marczykowski-Górecki wrote:
> > On Thu, Oct 28, 2021 at 12:59:52PM +0200, Juergen Gross wrote:
> > > When running as PVH or HVM guest with actual memory < max memory the
> > > hypervisor is using "populate on demand" in order to allow the guest
> > > to balloon down from its maximum memory size. For this to work
> > > correctly the guest must not touch more memory pages than its target
> > > memory size as otherwise the PoD cache will be exhausted and the guest
> > > is crashed as a result of that.
> > > 
> > > In extreme cases ballooning down might not be finished today before
> > > the init process is started, which can consume lots of memory.
> > > 
> > > In order to avoid random boot crashes in such cases, add a late init
> > > call to wait for ballooning down having finished for PVH/HVM guests.
> > > 
> > > Cc: <stable@vger.kernel.org>
> > > Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > 
> > It may happen that initial balloon down fails (state==BP_ECANCELED). In
> > that case, it waits indefinitely. I think it should rather report a
> > failure (and panic? it's similar to OOM before PID 1 starts, so rather
> > hard to recover), instead of hanging.
> 
> Okay, I can add something like that. I'm thinking of issuing a failure
> message in case of credit not having changed for 1 minute and panic()
> after two more minutes. Is this fine?

Isn't it better to get a state from balloon_thread()? If the balloon
fails it won't really try anymore (until 3600s timeout), so waiting in
that state doesn't help. And reporting the failure earlier may be more
user friendly. Or maybe there is something that could wakeup the thread
earlier, that I don't see? Hot plugging more RAM is rather unlikely at
this stage...
See my patch at [1], although rather hacky (and likely - racy).

[1] https://lore.kernel.org/xen-devel/YXFxKC4shCATB913@mail-itl/

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-10-29  9:57     ` Marek Marczykowski-Górecki
@ 2021-10-29 10:22       ` Juergen Gross
  2021-10-29 10:32         ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 7+ messages in thread
From: Juergen Gross @ 2021-10-29 10:22 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, linux-kernel, Boris Ostrovsky, Stefano Stabellini, stable


[-- Attachment #1.1.1: Type: text/plain, Size: 2531 bytes --]

On 29.10.21 11:57, Marek Marczykowski-Górecki wrote:
> On Fri, Oct 29, 2021 at 06:48:44AM +0200, Juergen Gross wrote:
>> On 28.10.21 22:16, Marek Marczykowski-Górecki wrote:
>>> On Thu, Oct 28, 2021 at 12:59:52PM +0200, Juergen Gross wrote:
>>>> When running as PVH or HVM guest with actual memory < max memory the
>>>> hypervisor is using "populate on demand" in order to allow the guest
>>>> to balloon down from its maximum memory size. For this to work
>>>> correctly the guest must not touch more memory pages than its target
>>>> memory size as otherwise the PoD cache will be exhausted and the guest
>>>> is crashed as a result of that.
>>>>
>>>> In extreme cases ballooning down might not be finished today before
>>>> the init process is started, which can consume lots of memory.
>>>>
>>>> In order to avoid random boot crashes in such cases, add a late init
>>>> call to wait for ballooning down having finished for PVH/HVM guests.
>>>>
>>>> Cc: <stable@vger.kernel.org>
>>>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>>> It may happen that initial balloon down fails (state==BP_ECANCELED). In
>>> that case, it waits indefinitely. I think it should rather report a
>>> failure (and panic? it's similar to OOM before PID 1 starts, so rather
>>> hard to recover), instead of hanging.
>>
>> Okay, I can add something like that. I'm thinking of issuing a failure
>> message in case of credit not having changed for 1 minute and panic()
>> after two more minutes. Is this fine?
> 
> Isn't it better to get a state from balloon_thread()? If the balloon
> fails it won't really try anymore (until 3600s timeout), so waiting in
> that state doesn't help. And reporting the failure earlier may be more
> user friendly. Or maybe there is something that could wakeup the thread
> earlier, that I don't see? Hot plugging more RAM is rather unlikely at
> this stage...

Waking up the thread would be easy, but probably that wouldn't really
help.

The idea was that maybe a Xen admin would see the guest not booting up
further and then adding some more memory to the guest (this should wake
up the balloon thread again).

I agree that stopping to wait for ballooning to finish in case of it
having failed is probably a sensible thing to do. Additionally I could
add a boot parameter to control the timeout after the fail message and
the panic().

What do you think?


Juergen


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done
  2021-10-29 10:22       ` Juergen Gross
@ 2021-10-29 10:32         ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 7+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-10-29 10:32 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, linux-kernel, Boris Ostrovsky, Stefano Stabellini, stable

[-- Attachment #1: Type: text/plain, Size: 3057 bytes --]

On Fri, Oct 29, 2021 at 12:22:18PM +0200, Juergen Gross wrote:
> On 29.10.21 11:57, Marek Marczykowski-Górecki wrote:
> > On Fri, Oct 29, 2021 at 06:48:44AM +0200, Juergen Gross wrote:
> > > On 28.10.21 22:16, Marek Marczykowski-Górecki wrote:
> > > > On Thu, Oct 28, 2021 at 12:59:52PM +0200, Juergen Gross wrote:
> > > > > When running as PVH or HVM guest with actual memory < max memory the
> > > > > hypervisor is using "populate on demand" in order to allow the guest
> > > > > to balloon down from its maximum memory size. For this to work
> > > > > correctly the guest must not touch more memory pages than its target
> > > > > memory size as otherwise the PoD cache will be exhausted and the guest
> > > > > is crashed as a result of that.
> > > > > 
> > > > > In extreme cases ballooning down might not be finished today before
> > > > > the init process is started, which can consume lots of memory.
> > > > > 
> > > > > In order to avoid random boot crashes in such cases, add a late init
> > > > > call to wait for ballooning down having finished for PVH/HVM guests.
> > > > > 
> > > > > Cc: <stable@vger.kernel.org>
> > > > > Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > > > 
> > > > It may happen that initial balloon down fails (state==BP_ECANCELED). In
> > > > that case, it waits indefinitely. I think it should rather report a
> > > > failure (and panic? it's similar to OOM before PID 1 starts, so rather
> > > > hard to recover), instead of hanging.
> > > 
> > > Okay, I can add something like that. I'm thinking of issuing a failure
> > > message in case of credit not having changed for 1 minute and panic()
> > > after two more minutes. Is this fine?
> > 
> > Isn't it better to get a state from balloon_thread()? If the balloon
> > fails it won't really try anymore (until 3600s timeout), so waiting in
> > that state doesn't help. And reporting the failure earlier may be more
> > user friendly. Or maybe there is something that could wakeup the thread
> > earlier, that I don't see? Hot plugging more RAM is rather unlikely at
> > this stage...
> 
> Waking up the thread would be easy, but probably that wouldn't really
> help.

Waking it up alone no. I was thinking what could wake it up - if
nothing, then definitely waiting wouldn't help. You explained that just
below:

> The idea was that maybe a Xen admin would see the guest not booting up
> further and then adding some more memory to the guest (this should wake
> up the balloon thread again).
> 
> I agree that stopping to wait for ballooning to finish in case of it
> having failed is probably a sensible thing to do. Additionally I could
> add a boot parameter to control the timeout after the fail message and
> the panic().

Right, that would make sense: it's basically a time admin has to plug in
more memory to the VM.

> What do you think?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-10-29 10:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-28 10:59 [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done Juergen Gross
2021-10-28 18:13 ` Boris Ostrovsky
2021-10-28 20:16 ` Marek Marczykowski-Górecki
2021-10-29  4:48   ` Juergen Gross
2021-10-29  9:57     ` Marek Marczykowski-Górecki
2021-10-29 10:22       ` Juergen Gross
2021-10-29 10:32         ` Marek Marczykowski-Górecki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).