* xen-balloon thread using 100% of CPU, regression in 5.4.150
@ 2021-10-03  4:47 Marek Marczykowski-Górecki
  2021-10-04  5:31 ` Juergen Gross
  0 siblings, 1 reply; 6+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-10-03  4:47 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Jan Beulich

Hi,

After updating a PVH domU to 5.4.150, I see the xen-balloon thread using
100% CPU (one thread).
This is a domain started with memory=maxmem=716800KiB (via libvirt). Then,
inside, I see:

# cat /sys/devices/system/xen_memory/xen_memory0/target_kb
716924
# cat /sys/devices/system/xen_memory/xen_memory0/info/current_kb
716400

Doing `cat info/current_kb > target_kb` "fixes" the issue. But still,
something is wrong - on an earlier kernel (5.4.143 to be precise), it
wasn't spinning, with exactly the same values reported in sysfs. It
shouldn't run in circles if it can't get as much memory as it wants. I
strongly suspect that "xen/balloon: use a kernel thread instead a workqueue"
or a related commit is responsible, but I haven't verified it.

This specific test is from Xen 4.8.5 (+quite a lot of patches), but I've
got a report of the same issue on 4.14.3 too. Anyway, I don't think the Xen
version matters much here.

I have _not_ managed to reproduce the issue on 5.10.70 or 5.14.9. In
both cases, just after starting the domain, I see
current_kb=target_kb=716412. And writing 716924 to target_kb manually
does not cause the xen-balloon thread to spin.
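
To put a number on the mismatch between the two sysfs values, here is a
minimal sketch of the arithmetic. It assumes 4 KiB pages and that the
driver's credit is simply target minus current; credit_pages() and
check_reported_values() are hypothetical helper names, not driver code.

```c
#include <assert.h>

/* Unmet balloon credit in pages, from the two sysfs values (both in KiB). */
static long credit_pages(long target_kb, long current_kb, long page_kb)
{
	return (target_kb - current_kb) / page_kb;
}

/* Values taken from the report; the 4 KiB page size is an assumption. */
static void check_reported_values(void)
{
	/* 5.4.150 spinning case: target_kb=716924, current_kb=716400 */
	assert(credit_pages(716924, 716400, 4) == 131);
	/* after `cat info/current_kb > target_kb` the credit is zero */
	assert(credit_pages(716400, 716400, 4) == 0);
}
```

So the thread is left wanting 131 more pages than it can obtain, and that
credit stays positive until target_kb is rewritten.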

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: xen-balloon thread using 100% of CPU, regression in 5.4.150
  2021-10-03  4:47 xen-balloon thread using 100% of CPU, regression in 5.4.150 Marek Marczykowski-Górecki
@ 2021-10-04  5:31 ` Juergen Gross
  2021-10-04  9:14   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 6+ messages in thread
From: Juergen Gross @ 2021-10-04  5:31 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, xen-devel; +Cc: Jan Beulich


On 03.10.21 06:47, Marek Marczykowski-Górecki wrote:
> Hi,
> 
> After updating a PVH domU to 5.4.150, I see xen-balloon thread using
> 100% CPU (one thread).
> This is a domain started with memory=maxmem=716800KiB (via libvirt). Then,
> inside, I see:
> 
> # cat /sys/devices/system/xen_memory/xen_memory0/target_kb
> 716924
> # cat /sys/devices/system/xen_memory/xen_memory0/info/current_kb
> 716400
> 
> Doing `cat info/current_kb > target_kb` "fixes" the issue. But still,
> something is wrong - on earlier kernel (5.4.143 to be precise), it
> wasn't spinning, with exactly the same values reported in sysfs. It
> shouldn't run in circles if it can't get that much memory it wants. I
> strongly suspect "xen/balloon: use a kernel thread instead a workqueue"
> or related commit being responsible, but I haven't verified it.

I think you are right. I need to handle the BP_ECANCELED case similarly to
BP_EAGAIN in the kernel thread (wait until the target size changes again).

One further question: do you see any kernel message in the guest related
to the looping balloon thread?


Juergen


* Re: xen-balloon thread using 100% of CPU, regression in 5.4.150
  2021-10-04  5:31 ` Juergen Gross
@ 2021-10-04  9:14   ` Marek Marczykowski-Górecki
  2021-10-05  8:05     ` Juergen Gross
  0 siblings, 1 reply; 6+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-10-04  9:14 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Jan Beulich

On Mon, Oct 04, 2021 at 07:31:40AM +0200, Juergen Gross wrote:
> On 03.10.21 06:47, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > After updating a PVH domU to 5.4.150, I see xen-balloon thread using
> > 100% CPU (one thread).
> > This is a domain started with memory=maxmem=716800KiB (via libvirt). Then,
> > inside, I see:
> > 
> > # cat /sys/devices/system/xen_memory/xen_memory0/target_kb
> > 716924
> > # cat /sys/devices/system/xen_memory/xen_memory0/info/current_kb
> > 716400
> > 
> > Doing `cat info/current_kb > target_kb` "fixes" the issue. But still,
> > something is wrong - on earlier kernel (5.4.143 to be precise), it
> > wasn't spinning, with exactly the same values reported in sysfs. It
> > shouldn't run in circles if it can't get that much memory it wants. I
> > strongly suspect "xen/balloon: use a kernel thread instead a workqueue"
> > or related commit being responsible, but I haven't verified it.
> 
> I think you are right. I need to handle the BP_ECANCELED case similar to
> BP_EAGAIN in the kernel thread (wait until target size changes again).
> 
> One further question: do you see any kernel message in the guest related
> to the looping balloon thread?

Nothing, only the usual "xen:balloon: Initialising balloon driver", and
nothing related to balloon after that.


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: xen-balloon thread using 100% of CPU, regression in 5.4.150
  2021-10-04  9:14   ` Marek Marczykowski-Górecki
@ 2021-10-05  8:05     ` Juergen Gross
  2021-10-05 13:31       ` Marek Marczykowski-Górecki
  2021-10-05 13:33       ` Jason Andryuk
  0 siblings, 2 replies; 6+ messages in thread
From: Juergen Gross @ 2021-10-05  8:05 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel, Jan Beulich


On 04.10.21 11:14, Marek Marczykowski-Górecki wrote:
> On Mon, Oct 04, 2021 at 07:31:40AM +0200, Juergen Gross wrote:
>> On 03.10.21 06:47, Marek Marczykowski-Górecki wrote:
>>> Hi,
>>>
>>> After updating a PVH domU to 5.4.150, I see xen-balloon thread using
>>> 100% CPU (one thread).
>>> This is a domain started with memory=maxmem=716800KiB (via libvirt). Then,
>>> inside, I see:
>>>
>>> # cat /sys/devices/system/xen_memory/xen_memory0/target_kb
>>> 716924
>>> # cat /sys/devices/system/xen_memory/xen_memory0/info/current_kb
>>> 716400
>>>
>>> Doing `cat info/current_kb > target_kb` "fixes" the issue. But still,
>>> something is wrong - on earlier kernel (5.4.143 to be precise), it
>>> wasn't spinning, with exactly the same values reported in sysfs. It
>>> shouldn't run in circles if it can't get that much memory it wants. I
>>> strongly suspect "xen/balloon: use a kernel thread instead a workqueue"
>>> or related commit being responsible, but I haven't verified it.
>>
>> I think you are right. I need to handle the BP_ECANCELED case similar to
>> BP_EAGAIN in the kernel thread (wait until target size changes again).
>>
>> One further question: do you see any kernel message in the guest related
>> to the looping balloon thread?
> 
> Nothing, only the usual "xen:balloon: Initialising balloon driver", and
> nothing related to balloon after that.

Could you try the attached patch, please? I've tested it briefly with
PV and PVH guests.


Juergen


[-- Attachment #1.1.2: 0001-xen-balloon-fix-cancelled-balloon-action.patch --]
[-- Type: text/x-patch, Size: 2117 bytes --]

From c0901b425d5939b7f3ce6c3f4bb7a0161b819745 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Mon, 4 Oct 2021 17:05:48 +0200
Subject: [PATCH] xen/balloon: fix cancelled balloon action
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In case a ballooning action is cancelled the new kernel thread handling
the ballooning might end up in a busy loop.

Fix that by handling the cancelled action gracefully.

Cc: stable@vger.kernel.org
Fixes: 8480ed9c2bbd56 ("xen/balloon: use a kernel thread instead a workqueue")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/balloon.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 43ebfe36ac27..3a50f097ed3e 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -491,12 +491,12 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 }
 
 /*
- * Stop waiting if either state is not BP_EAGAIN and ballooning action is
- * needed, or if the credit has changed while state is BP_EAGAIN.
+ * Stop waiting if either state is BP_DONE and ballooning action is
+ * needed, or if the credit has changed while state is not BP_DONE.
  */
 static bool balloon_thread_cond(enum bp_state state, long credit)
 {
-	if (state != BP_EAGAIN)
+	if (state == BP_DONE)
 		credit = 0;
 
 	return current_credit() != credit || kthread_should_stop();
@@ -516,10 +516,19 @@ static int balloon_thread(void *unused)
 
 	set_freezable();
 	for (;;) {
-		if (state == BP_EAGAIN)
-			timeout = balloon_stats.schedule_delay * HZ;
-		else
+		switch (state) {
+		case BP_DONE:
+		case BP_ECANCELED:
 			timeout = 3600 * HZ;
+			break;
+		case BP_EAGAIN:
+			timeout = balloon_stats.schedule_delay * HZ;
+			break;
+		case BP_WAIT:
+			timeout = HZ;
+			break;
+		}
+
 		credit = current_credit();
 
 		wait_event_freezable_timeout(balloon_thread_wq,
-- 
2.26.2



* Re: xen-balloon thread using 100% of CPU, regression in 5.4.150
  2021-10-05  8:05     ` Juergen Gross
@ 2021-10-05 13:31       ` Marek Marczykowski-Górecki
  2021-10-05 13:33       ` Jason Andryuk
  1 sibling, 0 replies; 6+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-10-05 13:31 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Jan Beulich

On Tue, Oct 05, 2021 at 10:05:39AM +0200, Juergen Gross wrote:
> On 04.10.21 11:14, Marek Marczykowski-Górecki wrote:
> > On Mon, Oct 04, 2021 at 07:31:40AM +0200, Juergen Gross wrote:
> > > On 03.10.21 06:47, Marek Marczykowski-Górecki wrote:
> > > > Hi,
> > > > 
> > > > After updating a PVH domU to 5.4.150, I see xen-balloon thread using
> > > > 100% CPU (one thread).
> > > > This is a domain started with memory=maxmem=716800KiB (via libvirt). Then,
> > > > inside, I see:
> > > > 
> > > > # cat /sys/devices/system/xen_memory/xen_memory0/target_kb
> > > > 716924
> > > > # cat /sys/devices/system/xen_memory/xen_memory0/info/current_kb
> > > > 716400
> > > > 
> > > > Doing `cat info/current_kb > target_kb` "fixes" the issue. But still,
> > > > something is wrong - on earlier kernel (5.4.143 to be precise), it
> > > > wasn't spinning, with exactly the same values reported in sysfs. It
> > > > shouldn't run in circles if it can't get that much memory it wants. I
> > > > strongly suspect "xen/balloon: use a kernel thread instead a workqueue"
> > > > or related commit being responsible, but I haven't verified it.
> > > 
> > > I think you are right. I need to handle the BP_ECANCELED case similar to
> > > BP_EAGAIN in the kernel thread (wait until target size changes again).
> > > 
> > > One further question: do you see any kernel message in the guest related
> > > to the looping balloon thread?
> > 
> > Nothing, only the usual "xen:balloon: Initialising balloon driver", and
> > nothing related to balloon after that.
> 
> Could you try the attached patch, please? I've tested it briefly with
> PV and PVH guests.

Yes, it helps, thanks!

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: xen-balloon thread using 100% of CPU, regression in 5.4.150
  2021-10-05  8:05     ` Juergen Gross
  2021-10-05 13:31       ` Marek Marczykowski-Górecki
@ 2021-10-05 13:33       ` Jason Andryuk
  1 sibling, 0 replies; 6+ messages in thread
From: Jason Andryuk @ 2021-10-05 13:33 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Marek Marczykowski-Górecki, xen-devel, Jan Beulich

On Tue, Oct 5, 2021 at 4:05 AM Juergen Gross <jgross@suse.com> wrote:
>
> On 04.10.21 11:14, Marek Marczykowski-Górecki wrote:
> > On Mon, Oct 04, 2021 at 07:31:40AM +0200, Juergen Gross wrote:
> >> On 03.10.21 06:47, Marek Marczykowski-Górecki wrote:
> >>> Hi,
> >>>
> >>> After updating a PVH domU to 5.4.150, I see xen-balloon thread using
> >>> 100% CPU (one thread).
> >>> This is a domain started with memory=maxmem=716800KiB (via libvirt). Then,
> >>> inside, I see:
> >>>
> >>> # cat /sys/devices/system/xen_memory/xen_memory0/target_kb
> >>> 716924
> >>> # cat /sys/devices/system/xen_memory/xen_memory0/info/current_kb
> >>> 716400
> >>>
> >>> Doing `cat info/current_kb > target_kb` "fixes" the issue. But still,
> >>> something is wrong - on earlier kernel (5.4.143 to be precise), it
> >>> wasn't spinning, with exactly the same values reported in sysfs. It
> >>> shouldn't run in circles if it can't get that much memory it wants. I
> >>> strongly suspect "xen/balloon: use a kernel thread instead a workqueue"
> >>> or related commit being responsible, but I haven't verified it.
> >>
> >> I think you are right. I need to handle the BP_ECANCELED case similar to
> >> BP_EAGAIN in the kernel thread (wait until target size changes again).
> >>
> >> One further question: do you see any kernel message in the guest related
> >> to the looping balloon thread?
> >
> > Nothing, only the usual "xen:balloon: Initialising balloon driver", and
> > nothing related to balloon after that.
>
> Could you try the attached patch, please? I've tested it briefly with
> PV and PVH guests.

I was seeing the CPU spinning in dom0 with the Xen command line:
dom0_mem=min:420M,max:420M,420M

Your patch eliminated the CPU spinning.

Tested-by: Jason Andryuk <jandryuk@gmail.com>

Thanks, Juergen

-Jason

