From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Magenheimer
Subject: Linux balloon driver stops accepting target_kb for a long time
Date: Mon, 23 Aug 2010 15:45:53 -0700 (PDT)
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com
Cc: jeremy@goop.org, Keir Fraser, JBeulich@novell.com
List-Id: xen-devel@lists.xenproject.org

Balloon experts --

I'm seeing a strange problem in either the balloon driver or in the
Xen code that provides the support for it... still trying to narrow
down which.

The problem appears when I am running in-kernel selfballooning code,
and then only rarely. I'm not sure exactly what conditions are
required, but for a long period of time (>30 minutes), writing to
target_kb inside a PV guest has no effect at all on the memory size
of the VM (as viewed inside the guest with "free -k")! Under most
conditions, writing to target_kb "immediately" changes the memory
size, but once in this state, it has no effect at all. At the end of
this long period of time, suddenly everything is back to normal...
and there's no obvious trigger that signals the return to normalcy.

Note that though the problem is observed with selfballooning,
changing target_kb manually fails as well, so I suspect the problem
exists regardless of selfballooning, and selfballooning is simply the
only thing exercising the balloon sizing enough to encounter the bug.

Reviewing the code, one thing caught my attention. In
balloon_process(), the balloon_mutex is down'ed and then, under
certain conditions, schedule() is called with the balloon_mutex still
held and without another timer set. Any chance this could be a
problem, especially if another kernel thread invokes
balloon_set_new_target()? If so, what might finally kick the
scheduled-out thread after 30 minutes to reset the balloon_timer and
up the mutex?

If this is wrong, any other ideas what might be causing this weird
problem?

Thanks,
Dan

P.S. This is the Linux 2.6.18-based balloon driver (with latest
patches from xen-unstable), but I may see if I can reproduce it on an
upstream balloon driver as well.
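
For anyone trying to reproduce this, the stall described above could be
timed from inside the guest with something like the sketch below. It is
only an illustration, not part of the driver or of the selfballooning
code: it writes a new value to target_kb, then polls MemTotal in
/proc/meminfo (the figure "free -k" reports as total) until it moves.
The function names, the 1024 kB change threshold and the one-hour
timeout are all assumptions made for the example, and the sysfs path to
target_kb is passed in by the caller because it differs between
kernels.

```python
import re
import time


def parse_memtotal_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1))


def read_memtotal_kb(path="/proc/meminfo"):
    """Read the guest's current MemTotal in kB."""
    with open(path) as f:
        return parse_memtotal_kb(f.read())


def wait_for_change(read_kb, baseline_kb, min_delta_kb, timeout_s,
                    poll_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll read_kb() until it differs from baseline_kb by min_delta_kb.

    Returns the seconds elapsed when the change is first seen, or None
    if timeout_s passes without a change.
    """
    start = clock()
    while True:
        if abs(read_kb() - baseline_kb) >= min_delta_kb:
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_s)


def measure(target_kb_path, new_target_kb, min_delta_kb=1024, timeout_s=3600):
    """Write new_target_kb to the balloon target and time the reaction.

    target_kb_path is supplied by the caller (hypothetical helper; the
    threshold and timeout defaults are arbitrary choices for this sketch).
    """
    baseline = read_memtotal_kb()
    with open(target_kb_path, "w") as f:
        f.write(str(new_target_kb))
    return wait_for_change(read_memtotal_kb, baseline, min_delta_kb, timeout_s)
```

Calling measure() as root with the guest's target_kb path and a target
noticeably different from the current size returns the number of
seconds before the memory size moved, or None if nothing changed within
the timeout. A target closer to the current size than the threshold
will look like a stall, so the threshold should be chosen with that in
mind.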