* Xen balloon driver discuss
       [not found] <SNT0-MC3-F148nSuKiM000aac29@SNT0-MC3-F14.Snt0.hotmail.com>
@ 2010-11-21  6:26 ` tinnycloud
  2010-11-22  4:33   ` MaoXiaoyun
  2010-11-29  6:56   ` Chu Rui
  0 siblings, 2 replies; 32+ messages in thread
From: tinnycloud @ 2010-11-21  6:26 UTC (permalink / raw)
  To: xen-devel; +Cc: George.Dunlap, dan.magenheimer

Hi: 
	Greetings.

	I am trying to run about 24 HVM guests (currently only Linux; later
Windows will be involved too) on one physical server with 24GB memory and 16 CPUs.
	Each VM is configured with 2GB memory, and I reserved 8GB memory for
dom0. 
	For safety reasons, only domain U's memory is allowed to balloon.
	
	Inside domain U, I use the xenballoond provided by XenSource, which
periodically writes /proc/meminfo into xenstore so dom0 can read it
(/local/domain/<domid>/memory/meminfo).
	In domain 0, I wrote a python script that reads this meminfo and, like
the strategy Xen provides, uses Committed_AS to calculate each domain U's
balloon target.
	The polling interval is 1 second.
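
	For concreteness, the dom0 side boils down to something like the
following (shown here as an untested C sketch against libxenstore rather than
the actual python script; the xenstore paths follow the description above, but
the 64MB headroom and the min/max clamps are made-up illustrative values):

/* Untested sketch of the dom0 monitor using libxenstore (link with
 * -lxenstore; the header is <xs.h> in older trees, <xenstore.h> in newer).
 * It reads the meminfo the guest publishes, takes the Committed_AS line,
 * adds some headroom, clamps the result, and writes it to the guest's
 * memory/target key, which the balloon driver watches (xm mem-set could be
 * used instead). Headroom and clamp values below are made up. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <xs.h>

static long parse_committed_as(const char *meminfo)
{
    const char *p = strstr(meminfo, "Committed_AS:");
    return p ? strtol(p + strlen("Committed_AS:"), NULL, 10) : -1;  /* kB */
}

int main(int argc, char **argv)
{
    int domid = (argc > 1) ? atoi(argv[1]) : 1;
    long maxmem_kb = 2048 * 1024, minmem_kb = 256 * 1024;    /* illustrative */
    char meminfo_path[64], target_path[64], buf[32];
    struct xs_handle *xs = xs_daemon_open();    /* dom0: talk to xenstored */

    if (!xs)
        return 1;
    snprintf(meminfo_path, sizeof(meminfo_path),
             "/local/domain/%d/memory/meminfo", domid);
    snprintf(target_path, sizeof(target_path),
             "/local/domain/%d/memory/target", domid);

    for (;;) {
        unsigned int len;
        char *meminfo = xs_read(xs, XBT_NULL, meminfo_path, &len);
        if (meminfo) {
            long committed = parse_committed_as(meminfo);
            if (committed > 0) {
                long target = committed + 64 * 1024;         /* 64MB headroom */
                if (target < minmem_kb) target = minmem_kb;
                if (target > maxmem_kb) target = maxmem_kb;
                snprintf(buf, sizeof(buf), "%ld", target);
                xs_write(xs, XBT_NULL, target_path, buf, strlen(buf));
            }
            free(meminfo);
        }
        sleep(1);                                   /* 1 second interval */
    }
}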

	Inside each VM I set up an apache server for testing.	Well, I'd
say the results are not so good.
	There appears to be too much read/write traffic on xenstore: when I put
some stress (using ab) on the guest domains, 
	the CPU usage of xenstored goes up to 100%, so the monitor running in
dom0 also responds quite slowly.
	Also, during the ab test Committed_AS grows very fast and reaches maxmem
in a short time, although the guest really needs only a small amount
	of memory, so I guess something more needs to be taken into
consideration for ballooning.

	For the xenstore issue, I first plan to write a C program inside domain
U to replace xenballoond and see whether the situation
	improves. If not, how about setting up an event channel directly between
domU and dom0; would that be faster?
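
	The C replacement I have in mind is roughly the following (untested
sketch: to cut xenstore traffic it reports only the Committed_AS line instead
of the whole /proc/meminfo, and it keeps the same memory/meminfo key, which
xenstore resolves relative to the guest's own /local/domain/<domid> home):

/* Untested sketch of a small domU daemon replacing xenballoond's reporting:
 * once per second it publishes only the Committed_AS line of /proc/meminfo
 * under the relative xenstore path memory/meminfo. Build in the guest with
 * -lxenstore. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <xs.h>

int main(void)
{
    char line[128], value[128] = "";
    struct xs_handle *xs = xs_domain_open();  /* guest: via /proc/xen/xenbus */

    if (!xs)
        return 1;
    for (;;) {
        FILE *f = fopen("/proc/meminfo", "r");
        if (f) {
            while (fgets(line, sizeof(line), f)) {
                if (strncmp(line, "Committed_AS:", 13) == 0) {
                    snprintf(value, sizeof(value), "%s", line);
                    break;
                }
            }
            fclose(f);
            if (value[0])
                xs_write(xs, XBT_NULL, "memory/meminfo", value, strlen(value));
        }
        sleep(1);
    }
}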

	Regarding the balloon strategy, I would do it like this: when there is
enough memory, just fulfill the guests' balloon requests, and when memory is
	short, distribute the available memory evenly among the guests that
request inflation (roughly as sketched below).
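
	To make the policy concrete, the distribution rule would look roughly
like this (purely illustrative; the function name, its inputs and the
even-split choice are just my own sketch, not existing code):

/* Illustrative sketch of the proposed policy, in kB: if the host can satisfy
 * every inflation request, grant them in full; otherwise split the available
 * memory evenly across the guests that asked for more. */
static void distribute_inflation(long host_free_kb, const long *request_kb,
                                 long *grant_kb, int nguests)
{
    long total_requested = 0;
    int i, nrequesting = 0;

    for (i = 0; i < nguests; i++) {
        total_requested += request_kb[i];
        if (request_kb[i] > 0)
            nrequesting++;
    }
    for (i = 0; i < nguests; i++) {
        if (request_kb[i] <= 0)
            grant_kb[i] = 0;                          /* nothing requested */
        else if (total_requested <= host_free_kb)
            grant_kb[i] = request_kb[i];              /* enough memory: grant all */
        else
            grant_kb[i] = host_free_kb / nrequesting; /* shortage: even split */
    }
}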
	
	Does anyone have a better suggestion? Thanks in advance.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Xen balloon driver discuss
  2010-11-21  6:26 ` Xen balloon driver discuss tinnycloud
@ 2010-11-22  4:33   ` MaoXiaoyun
  2010-11-22 17:46     ` Dan Magenheimer
  2010-11-29  6:56   ` Chu Rui
  1 sibling, 1 reply; 32+ messages in thread
From: MaoXiaoyun @ 2010-11-22  4:33 UTC (permalink / raw)
  To: xen devel; +Cc: george.dunlap, dan.magenheimer


[-- Attachment #1.1: Type: text/plain, Size: 2635 bytes --]


 
Since /proc/meminfo is currently sent to domain 0 via xenstore, which in my opinion is slow,
what I want to do is this: have a shared page between domU and dom0, where domU periodically
updates the meminfo, while on the other side dom0 retrieves the updated data to
calculate the target, which the guest then uses for ballooning.
 
The problem I have is that I currently don't know how to implement a shared page between 
dom0 and domU. 
Would it work like this: dom0 allocates an unbound event channel and waits for the guest to
connect, and the data is transferred through the grant table?
Or does someone have a more efficient way?
Many thanks.
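 
For discussion, one possible shape of this (borrowed from how the PV ring
protocols do it) is the other way around: the guest allocates a page, grants
dom0 access to it, and advertises the grant reference (plus, optionally, an
event channel port) via xenstore; dom0 then maps the page through the
grant-mapping interface (e.g. gntdev / xc_gnttab_map_grant_ref in libxc, if
available in your tools). A rough, untested domU-side sketch, using mainline
pvops header names (the classic 2.6.18 tree spells some of these differently;
the meminfo-gref key name is made up), would be:

/* Untested domU-side sketch: share one page with dom0 and publish the grant
 * reference in xenstore so a dom0 tool can map and read it. Header names are
 * from the mainline pvops tree; the classic 2.6.18 XenLinux tree differs. */
#include <linux/gfp.h>
#include <linux/module.h>
#include <xen/grant_table.h>
#include <xen/xenbus.h>
#include <asm/xen/page.h>            /* virt_to_mfn() */

static void *shared_page;
static int gref;

static int __init meminfo_share_init(void)
{
    int err;

    shared_page = (void *)__get_free_page(GFP_KERNEL);
    if (!shared_page)
        return -ENOMEM;

    /* Grant dom0 (domid 0) read/write access to this frame. */
    gref = gnttab_grant_foreign_access(0, virt_to_mfn(shared_page), 0);
    if (gref < 0) {
        free_page((unsigned long)shared_page);
        return gref;
    }

    /* Advertise the reference under our xenstore home, i.e.
     * /local/domain/<domid>/memory/meminfo-gref (key name is made up). */
    err = xenbus_printf(XBT_NIL, "memory", "meminfo-gref", "%d", gref);
    if (err)
        printk(KERN_WARNING "meminfo-share: xenbus_printf failed: %d\n", err);

    /* The guest would then memcpy its meminfo snapshot into shared_page
     * periodically; dom0 maps the page via gntdev and simply reads it. */
    return 0;
}

static void __exit meminfo_share_exit(void)
{
    gnttab_end_foreign_access(gref, 0, (unsigned long)shared_page);
}

module_init(meminfo_share_init);
module_exit(meminfo_share_exit);
MODULE_LICENSE("GPL");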
 
> From: tinnycloud@hotmail.com
> To: xen-devel@lists.xensource.com
> CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
> Subject: Xen balloon driver discuss
> Date: Sun, 21 Nov 2010 14:26:01 +0800
> 
> Hi: 
> Greeting first.
> 
> I was trying to run about 24 HVMS (currently only Linux, later will
> involve Windows) on one physical server with 24GB memory, 16CPUs.
> Each VM is configured with 2GB memory, and I reserved 8GB memory for
> dom0. 
> For safety reason, only domain U's memory is allowed to balloon.
> 
> Inside domain U, I used xenballooned provide by xensource,
> periodically write /proc/meminfo into xenstore in dom
> 0(/local/domain/did/memory/meminfo).
> And in domain 0, I wrote a python script to read the meminfo, like
> xen provided strategy, use Committed_AS to calculate the domain U balloon
> target.
> The time interval is 1 seconds.
> 
> Inside each VM, I setup a apache server for test. Well, I'd
> like to say the result is not so good.
> It appears that too much read/write on xenstore, when I give some of
> the stress(by using ab) to guest domains, 
> the CPU usage of xenstore is up to 100%. Thus the monitor running in
> dom0 also response quite slowly.
> Also, in ab test, the Committed_AS grows very fast, reach to maxmem
> in short time, but in fact the only a small amount
> of memory guest really need, so I guess there should be some more to
> be taken into consideration for ballooning.
> 
> For xenstore issue, I first plan to wrote a C program inside domain
> U to replace xenballoond to see whether the situation
> will be refined. If not, how about set up event channel directly for
> domU and dom0, would it be faster?
> 
> Regards balloon strategy, I would do like this, when there are
> enough memory , just fulfill the guest balloon request, and when shortage 
> of memory, distribute memory evenly on the guests those request
> inflation.
> 
> Does anyone have better suggestion, thanks in advance.
> 
 		 	   		  

[-- Attachment #1.2: Type: text/html, Size: 3200 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Xen balloon driver discuss
  2010-11-22  4:33   ` MaoXiaoyun
@ 2010-11-22 17:46     ` Dan Magenheimer
  2010-11-23 14:58       ` tinnycloud
  2010-11-27  6:54       ` cloudroot
  0 siblings, 2 replies; 32+ messages in thread
From: Dan Magenheimer @ 2010-11-22 17:46 UTC (permalink / raw)
  To: MaoXiaoyun, xen devel; +Cc: george.dunlap


[-- Attachment #1.1: Type: text/plain, Size: 3916 bytes --]

Xenstore IS slow and you could improve xenballoond performance by only sending the single CommittedAS value from xenballoond in domU to dom0 instead of all of /proc/meminfo.   But you are making an assumption that getting memory utilization information from domU to dom0 FASTER (e.g. with a shared page) will provide better ballooning results.  I have not found this to be the case, which is what led to my investigation into self-ballooning, which led to Transcendent Memory.  See the 2010 Xen Summit for more information.

 

In your last paragraph below "Regards balloon strategy", the problem is it is not easy to define "enough memory" and "shortage of memory" within any guest and almost impossible to define it and effectively load balance across many guests.  See my Linux Plumber's Conference presentation (with complete speaker notes) here:

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-Final.pdf

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-SpkNotes.pdf

 

From: MaoXiaoyun [mailto:tinnycloud@hotmail.com] 
Sent: Sunday, November 21, 2010 9:33 PM
To: xen devel
Cc: Dan Magenheimer; george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

 
Since currently /cpu/meminfo is sent to domain 0 via xenstore, which in my opinoin is slow.
What I want to do is: there is a shared page between domU and dom0, and domU periodically
update the meminfo into the page, while on the other side dom0 retrive the updated data for
caculating the target, which is used by guest for balloning.
 
The problem I met is,  currently I don't know how to implement a shared page between 
dom0 and domU. 
Would it like dom 0 alloc a unbound event and wait guest to connect, and transfer date through
grant table?
Or someone has more efficient way?
many thanks.
 
> From: tinnycloud@hotmail.com
> To: xen-devel@lists.xensource.com
> CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
> Subject: Xen balloon driver discuss
> Date: Sun, 21 Nov 2010 14:26:01 +0800
> 
> Hi: 
> Greeting first.
> 
> I was trying to run about 24 HVMS (currently only Linux, later will
> involve Windows) on one physical server with 24GB memory, 16CPUs.
> Each VM is configured with 2GB memory, and I reserved 8GB memory for
> dom0. 
> For safety reason, only domain U's memory is allowed to balloon.
> 
> Inside domain U, I used xenballooned provide by xensource,
> periodically write /proc/meminfo into xenstore in dom
> 0(/local/domain/did/memory/meminfo).
> And in domain 0, I wrote a python script to read the meminfo, like
> xen provided strategy, use Committed_AS to calculate the domain U balloon
> target.
> The time interval is ! 1 seconds.
> 
> Inside each VM, I setup a apache server for test. Well, I'd
> like to say the result is not so good.
> It appears that too much read/write on xenstore, when I give some of
> the stress(by using ab) to guest domains, 
> the CPU usage of xenstore is up to 100%. Thus the monitor running in
> dom0 also response quite slowly.
> Also, in ab test, the Committed_AS grows very fast, reach to maxmem
> in short time, but in fact the only a small amount
> of memory guest really need, so I guess there should be some more to
> be taken into consideration for ballooning.
> 
> For xenstore issue, I first plan to wrote a C program inside domain
> U to replace xenballoond to see whether the situation
> will be refined. If not, how about set up event channel directly for
> domU and dom0, would it be faster?
> 
> Regards balloon strategy, I would do like this, when there ! are
> enough memory , just fulfill the guest balloon request, and when shortage 
> of memory, distribute memory evenly on the guests those request
> inflation.
> 
> Does anyone have better suggestion, thanks in advance.
> 

[-- Attachment #1.2: Type: text/html, Size: 10004 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* re: Xen balloon driver discuss
  2010-11-22 17:46     ` Dan Magenheimer
@ 2010-11-23 14:58       ` tinnycloud
  2010-11-27  6:54       ` cloudroot
  1 sibling, 0 replies; 32+ messages in thread
From: tinnycloud @ 2010-11-23 14:58 UTC (permalink / raw)
  To: 'Dan Magenheimer', 'xen devel'; +Cc: george.dunlap


[-- Attachment #1.1: Type: text/plain, Size: 5219 bytes --]

Hi Dan:

         Thanks for your presentation summarizing memory overcommit; it is
really vivid and a great help.

         I guess the strategy I have had in mind these days falls into
solution Set C in the pdf.

         The tmem solution you worked out for memory overcommit is both
efficient and effective. 

         I will give it a try on Linux guests.

         The real situation I have is that most of the VMs running on the host
are Windows, so I had to come up with those policies to balance the memory.

         Although such policies are all workload dependent, the good news is
that the host workload is configurable and not very heavy,
so I will try to figure out a favorable policy. The policies referred to in
the pdf are a good start for me.

         Today, instead of trying to implement the "/proc/meminfo" reporting
with shared pages, I hacked the balloon driver to add another
         workqueue that periodically writes meminfo into xenstore through
xenbus, which solves the problem of xenstored's high CPU
         utilization.
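
         The hack is roughly along the lines of the sketch below (simplified
and untested; it reports si_meminfo() totals rather than Committed_AS, since
the latter is not conveniently available to a module on every kernel, and it
uses the post-2.6.20 delayed_work API, which the 2.6.18 XenLinux tree spells
slightly differently):

/* Rough sketch of the balloon-driver hack: a work item that re-arms itself
 * once per second and pushes a small meminfo summary into xenstore via
 * xenbus, so no userspace daemon has to poll xenstore at all. */
#include <linux/mm.h>
#include <linux/workqueue.h>
#include <xen/xenbus.h>

static struct delayed_work meminfo_work;

static void meminfo_report(struct work_struct *work)
{
    struct sysinfo si;
    char buf[128];

    si_meminfo(&si);
    snprintf(buf, sizeof(buf), "MemTotal: %lu kB MemFree: %lu kB",
             si.totalram << (PAGE_SHIFT - 10),
             si.freeram  << (PAGE_SHIFT - 10));

    /* Relative path, resolved by xenstored against /local/domain/<domid>. */
    if (xenbus_write(XBT_NIL, "memory", "meminfo", buf))
        printk(KERN_WARNING "balloon: meminfo report failed\n");

    schedule_delayed_work(&meminfo_work, HZ);       /* re-arm after 1 second */
}

/* Called once from the driver's init path. */
static void meminfo_report_start(void)
{
    INIT_DELAYED_WORK(&meminfo_work, meminfo_report);
    schedule_delayed_work(&meminfo_work, HZ);
}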

 

         Later I will try to find out more about how Citrix does this.

         Thanks for your help. Or do you have any better ideas for Windows
guests?

         

 

Sent: Dan Magenheimer [mailto:dan.magenheimer@oracle.com] 
Date: 2010.11.23 1:47
To: MaoXiaoyun; xen devel
CC: george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

Xenstore IS slow and you could improve xenballoond performance by only
sending the single CommittedAS value from xenballoond in domU to dom0
instead of all of /proc/meminfo.   But you are making an assumption that
getting memory utilization information from domU to dom0 FASTER (e.g. with a
shared page) will provide better ballooning results.  I have not found this
to be the case, which is what led to my investigation into self-ballooning,
which led to Transcendent Memory.  See the 2010 Xen Summit for more
information.

 

In your last paragraph below "Regards balloon strategy", the problem is it
is not easy to define "enough memory" and "shortage of memory" within any
guest and almost impossible to define it and effectively load balance across
many guests.  See my Linux Plumber's Conference presentation (with complete
speaker notes) here:

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmt
VirtEnv-LPC2010-Final.pdf

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmt
VirtEnv-LPC2010-SpkNotes.pdf

 

From: MaoXiaoyun [mailto:tinnycloud@hotmail.com] 
Sent: Sunday, November 21, 2010 9:33 PM
To: xen devel
Cc: Dan Magenheimer; george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

 
Since currently /cpu/meminfo is sent to domain 0 via xenstore, which in my
opinoin is slow.
What I want to do is: there is a shared page between domU and dom0, and domU
periodically
update the meminfo into the page, while on the other side dom0 retrive the
updated data for
caculating the target, which is used by guest for balloning.
 
The problem I met is,  currently I don't know how to implement a shared page
between 
dom0 and domU. 
Would it like dom 0 alloc a unbound event and wait guest to connect, and
transfer date through
grant table?
Or someone has more efficient way?
many thanks.
 
> From: tinnycloud@hotmail.com
> To: xen-devel@lists.xensource.com
> CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
> Subject: Xen balloon driver discuss
> Date: Sun, 21 Nov 2010 14:26:01 +0800
> 
> Hi: 
> Greeting first.
> 
> I was trying to run about 24 HVMS (currently only Linux, later will
> involve Windows) on one physical server with 24GB memory, 16CPUs.
> Each VM is configured with 2GB memory, and I reserved 8GB memory for
> dom0. 
> For safety reason, only domain U's memory is allowed to balloon.
> 
> Inside domain U, I used xenballooned provide by xensource,
> periodically write /proc/meminfo into xenstore in dom
> 0(/local/domain/did/memory/meminfo).
> And in domain 0, I wrote a python script to read the meminfo, like
> xen provided strategy, use Committed_AS to calculate the domain U balloon
> target.
> The time interval is ! 1 seconds.
> 
> Inside each VM, I setup a apache server for test. Well, I'd
> like to say the result is not so good.
> It appears that too much read/write on xenstore, when I give some of
> the stress(by using ab) to guest domains, 
> the CPU usage of xenstore is up to 100%. Thus the monitor running in
> dom0 also response quite slowly.
> Also, in ab test, the Committed_AS grows very fast, reach to maxmem
> in short time, but in fact the only a small amount
> of memory guest really need, so I guess there should be some more to
> be taken into consideration for ballooning.
> 
> For xenstore issue, I first plan to wrote a C program inside domain
> U to replace xenballoond to see whether the situation
> will be refined. If not, how about set up event channel directly for
> domU and dom0, would it be faster?
> 
> Regards balloon strategy, I would do like this, when there ! are
> enough memory , just fulfill the guest balloon request, and when shortage 
> of memory, distribute memory evenly on the guests those request
> inflation.
> 
> Does anyone have better suggestion, thanks in advance.
> 


[-- Attachment #1.2: Type: text/html, Size: 16106 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* re: Xen balloon driver discuss
  2010-11-22 17:46     ` Dan Magenheimer
  2010-11-23 14:58       ` tinnycloud
@ 2010-11-27  6:54       ` cloudroot
  2010-11-28  2:36         ` Dan Magenheimer
  2010-11-28 13:00         ` Pasi Kärkkäinen
  1 sibling, 2 replies; 32+ messages in thread
From: cloudroot @ 2010-11-27  6:54 UTC (permalink / raw)
  To: 'tinnycloud', 'Dan Magenheimer', 'xen devel'
  Cc: george.dunlap


[-- Attachment #1.1: Type: text/plain, Size: 6534 bytes --]

Hi Dan:

 

         I have set up a benchmark to test the balloon driver, but
unfortunately Xen crashed with a memory panic.

         Before I attach the detailed output from the serial port (which will
take time, on the next run), I am afraid I might have missed something in the
test environment.

         My dom0 kernel is 2.6.31, pvops. 

Currently there is no drivers/xen/balloon.c in this kernel source tree,
so I built xen-balloon.ko and xen-platform-pci.ko from 
linux-2.6.18.x86_64 and installed them in dom U, which runs Red Hat 5.4.

         What I did is put a C program in each dom U (24 HVM guests in total);
the program repeatedly allocates memory and fills it with random strings.

         And in dom0, a python monitor collects the meminfo from
xenstore and calculates the balloon target from Committed_AS.

The panic happens while the program is running in just one domU.
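
(For reference, the in-guest stress program is essentially the following; the
chunk size and timing here are placeholders, not the exact values used in the
benchmark.)

/* Sketch of the in-guest stress program: repeatedly allocate a chunk, fill
 * it with pseudo-random bytes so the pages really get populated, then free
 * it. CHUNK_MB and the sleep intervals are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK_MB   64
#define CHUNK_SIZE (CHUNK_MB * 1024UL * 1024UL)

int main(void)
{
    srand(getpid());
    for (;;) {
        char *buf = malloc(CHUNK_SIZE);
        if (!buf) {
            fprintf(stderr, "allocation of %d MB failed\n", CHUNK_MB);
            sleep(5);
            continue;
        }
        /* Touch every page so the memory is actually backed. */
        for (unsigned long off = 0; off < CHUNK_SIZE; off += 4096)
            buf[off] = (char)rand();
        sleep(1);
        free(buf);
    }
}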

 

         I am writing to ask whether my balloon driver is out of date, and
where I can get the latest source code.

         I've googled a lot, but I am still rather confused by the various
source trees.

 

         Many thanks. 

         

         

From: tinnycloud [mailto:tinnycloud@hotmail.com] 
Date: 2010.11.23 22:58
TO: 'Dan Magenheimer'; 'xen devel'
CC: 'george.dunlap@eu.citrix.com'
Subject: re: Xen balloon driver discuss

 

HI Dan:

 

         Appreciate for your presentation in summarizing the memory
overcommit, really vivid and in great help.

         Well, I guess recently days the strategy in my mind will fall into
the solution Set C in pdf.

 

         The tmem solution your worked out for memory overcommit is both
efficient and effective. 

         I guess I will have a try on Linux Guest.

 

         The real situation I have is most of the running VMs on host are
windows. So I had to come up those policies to balance the memory.

         Although policies are all workload dependent. Good news is host
workload  is configurable, and not very heavy

So I will try to figure out some favorable policy. The policies referred in
pdf are good start for me.

 

         Today, instead of trying to implement "/proc/meminfo" with shared
pages, I hacked the balloon driver to have another 

         workqueue periodically write meminfo into xenstore through xenbus,
which solve the problem of xenstrore high CPU

         utilization  problem.

 

         Later I will try to google more on how Citrix does.

         Thanks for your help, or do you have any better idea for windows
guest?

         

 

Sent: Dan Magenheimer [mailto:dan.magenheimer@oracle.com] 
Date: 2010.11.23 1:47
To: MaoXiaoyun; xen devel
CC: george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

Xenstore IS slow and you could improve xenballoond performance by only
sending the single CommittedAS value from xenballoond in domU to dom0
instead of all of /proc/meminfo.   But you are making an assumption that
getting memory utilization information from domU to dom0 FASTER (e.g. with a
shared page) will provide better ballooning results.  I have not found this
to be the case, which is what led to my investigation into self-ballooning,
which led to Transcendent Memory.  See the 2010 Xen Summit for more
information.

 

In your last paragraph below "Regards balloon strategy", the problem is it
is not easy to define "enough memory" and "shortage of memory" within any
guest and almost impossible to define it and effectively load balance across
many guests.  See my Linux Plumber's Conference presentation (with complete
speaker notes) here:

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmt
VirtEnv-LPC2010-Final.pdf

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmt
VirtEnv-LPC2010-SpkNotes.pdf

 

From: MaoXiaoyun [mailto:tinnycloud@hotmail.com] 
Sent: Sunday, November 21, 2010 9:33 PM
To: xen devel
Cc: Dan Magenheimer; george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

 
Since currently /cpu/meminfo is sent to domain 0 via xenstore, which in my
opinoin is slow.
What I want to do is: there is a shared page between domU and dom0, and domU
periodically
update the meminfo into the page, while on the other side dom0 retrive the
updated data for
caculating the target, which is used by guest for balloning.
 
The problem I met is,  currently I don't know how to implement a shared page
between 
dom0 and domU. 
Would it like dom 0 alloc a unbound event and wait guest to connect, and
transfer date through
grant table?
Or someone has more efficient way?
many thanks.
 
> From: tinnycloud@hotmail.com
> To: xen-devel@lists.xensource.com
> CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
> Subject: Xen balloon driver discuss
> Date: Sun, 21 Nov 2010 14:26:01 +0800
> 
> Hi: 
> Greeting first.
> 
> I was trying to run about 24 HVMS (currently only Linux, later will
> involve Windows) on one physical server with 24GB memory, 16CPUs.
> Each VM is configured with 2GB memory, and I reserved 8GB memory for
> dom0. 
> For safety reason, only domain U's memory is allowed to balloon.
> 
> Inside domain U, I used xenballooned provide by xensource,
> periodically write /proc/meminfo into xenstore in dom
> 0(/local/domain/did/memory/meminfo).
> And in domain 0, I wrote a python script to read the meminfo, like
> xen provided strategy, use Committed_AS to calculate the domain U balloon
> target.
> The time interval is ! 1 seconds.
> 
> Inside each VM, I setup a apache server for test. Well, I'd
> like to say the result is not so good.
> It appears that too much read/write on xenstore, when I give some of
> the stress(by using ab) to guest domains, 
> the CPU usage of xenstore is up to 100%. Thus the monitor running in
> dom0 also response quite slowly.
> Also, in ab test, the Committed_AS grows very fast, reach to maxmem
> in short time, but in fact the only a small amount
> of memory guest really need, so I guess there should be some more to
> be taken into consideration for ballooning.
> 
> For xenstore issue, I first plan to wrote a C program inside domain
> U to replace xenballoond to see whether the situation
> will be refined. If not, how about set up event channel directly for
> domU and dom0, would it be faster?
> 
> Regards balloon strategy, I would do like this, when there ! are
> enough memory , just fulfill the guest balloon request, and when shortage 
> of memory, distribute memory evenly on the guests those request
> inflation.
> 
> Does anyone have better suggestion, thanks in advance.
> 


[-- Attachment #1.2: Type: text/html, Size: 20974 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Xen balloon driver discuss
  2010-11-27  6:54       ` cloudroot
@ 2010-11-28  2:36         ` Dan Magenheimer
  2010-11-29  4:20           ` tinnycloud
                             ` (3 more replies)
  2010-11-28 13:00         ` Pasi Kärkkäinen
  1 sibling, 4 replies; 32+ messages in thread
From: Dan Magenheimer @ 2010-11-28  2:36 UTC (permalink / raw)
  To: cloudroot, tinnycloud, xen devel; +Cc: george.dunlap


[-- Attachment #1.1: Type: text/plain, Size: 7412 bytes --]

Am I understanding correctly that you are running each linux-2.6.18 as HVM (not PV)?  I didn't think that the linux-2.6.18 balloon driver worked at all in an HVM guest.

 

You also didn't say what version of Xen you are using.  If you are running xen-unstable, you should also provide the changeset number.

 

In any case, any load of HVM guests should never crash Xen itself, but if you are running HVM guests, I probably can't help much as I almost never run HVM guests.

 

From: cloudroot [mailto:cloudroot@sina.com] 
Sent: Friday, November 26, 2010 11:55 PM
To: tinnycloud; Dan Magenheimer; xen devel
Cc: george.dunlap@eu.citrix.com
Subject: re: Xen balloon driver discuss

 

Hi Dan:

 

         I have set the benchmark to test balloon driver, but unfortunately the Xen crashed on memory Panic.

         Before I attach the details output from serial port(which takes time on next run), I am afraid of I might miss something on test environment.

 

         My dom0 kernel is 2.6.31, pvops. 

Well currently there is no  driver/xen/balloon.c on this kernel source tree, so I build the xen-balloon.ko, Xen-platform-pci.ko form 

linux-2.6.18.x86_64, and installed in Dom U, which is redhat 5.4.

 

         What I did is put a C program in the each Dom U(total 24 HVM), the program will allocate the memory and fill it with random string repeatly.

         And in dom0, a phthon monitor will collect the meminfo from xenstore and calculate the target to balloon from Committed_AS.

The panic happens when the program is running in just one Dom.

 

         I am writing to ask whether my balloon driver is out of date, or where can I get the latest source code, 

         I've googled a lot, but still have a lot of confusion on those source tree.

 

         Many thanks. 

         

         

From: tinnycloud [mailto:tinnycloud@hotmail.com] 
Date: 2010.11.23 22:58
TO: 'Dan Magenheimer'; 'xen devel'
CC: 'george.dunlap@eu.citrix.com'
Subject: re: Xen balloon driver discuss

 

HI Dan:

 

         Appreciate for your presentation in summarizing the memory overcommit, really vivid and in great help.

         Well, I guess recently days the strategy in my mind will fall into the solution Set C in pdf.

 

         The tmem solution your worked out for memory overcommit is both efficient and effective. 

         I guess I will have a try on Linux Guest.

 

         The real situation I have is most of the running VMs on host are windows. So I had to come up those policies to balance the memory.

         Although policies are all workload dependent. Good news is host workload  is configurable, and not very heavy

So I will try to figure out some favorable policy. The policies referred in pdf are good start for me.

 

         Today, instead of trying to implement "/proc/meminfo" with shared pages, I hacked the balloon driver to have another 

         workqueue periodically write meminfo into xenstore through xenbus, which solve the problem of xenstrore high CPU

         utilization  problem.

 

         Later I will try to google more on how Citrix does.

         Thanks for your help, or do you have any better idea for windows guest?

         

 

Sent: Dan Magenheimer [mailto:dan.magenheimer@oracle.com] 
Date: 2010.11.23 1:47
To: MaoXiaoyun; xen devel
CC: george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

Xenstore IS slow and you could improve xenballoond performance by only sending the single CommittedAS value from xenballoond in domU to dom0 instead of all of /proc/meminfo.   But you are making an assumption that getting memory utilization information from domU to dom0 FASTER (e.g. with a shared page) will provide better ballooning results.  I have not found this to be the case, which is what led to my investigation into self-ballooning, which led to Transcendent Memory.  See the 2010 Xen Summit for more information.

 

In your last paragraph below "Regards balloon strategy", the problem is it is not easy to define "enough memory" and "shortage of memory" within any guest and almost impossible to define it and effectively load balance across many guests.  See my Linux Plumber's Conference presentation (with complete speaker notes) here:

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-Final.pdf

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-SpkNotes.pdf

 

From: MaoXiaoyun [mailto:tinnycloud@hotmail.com] 
Sent: Sunday, November 21, 2010 9:33 PM
To: xen devel
Cc: Dan Magenheimer; george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

 
Since currently /cpu/meminfo is sent to domain 0 via xenstore, which in my opinoin is slow.
What I want to do is: there is a shared page between domU and dom0, and domU periodically
update the meminfo into the page, while on the other side dom0 retrive the updated data for
caculating the target, which is used by guest for balloning.
 
The problem I met is,  currently I don't know how to implement a shared page between 
dom0 and domU. 
Would it like dom 0 alloc a unbound event and wait guest to connect, and transfer date through
grant table?
Or someone has more efficient way?
many thanks.
 
> From: tinnycloud@hotmail.com
> To: xen-devel@lists.xensource.com
> CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
> Subject: Xen balloon driver discuss
> Date: Sun, 21 Nov 2010 14:26:01 +0800
> 
> Hi: 
> Greeting first.
> 
> I was trying to run about 24 HVMS (currently only Linux, later will
> involve Windows) on one physical server with 24GB memory, 16CPUs.
> Each VM is configured with 2GB memory, and I reserved 8GB memory for
> dom0. 
> For safety reason, only domain U's memory is allowed to balloon.
> 
> Inside domain U, I used xenballooned provide by xensource,
> periodically write /proc/meminfo into xenstore in dom
> 0(/local/domain/did/memory/meminfo).
> And in domain 0, I wrote a python script to read the meminfo, like
> xen provided strategy, use Committed_AS to calculate the domain U balloon
> target.
> The time interval is ! 1 seconds.
> 
> Inside each VM, I setup a apache server for test. Well, I'd
> like to say the result is not so good.
> It appears that too much read/write on xenstore, when I give some of
> the stress(by using ab) to guest domains, 
> the CPU usage of xenstore is up to 100%. Thus the monitor running in
> dom0 also response quite slowly.
> Also, in ab test, the Committed_AS grows very fast, reach to maxmem
> in short time, but in fact the only a small amount
> of memory guest really need, so I guess there should be some more to
> be taken into consideration for ballooning.
> 
> For xenstore issue, I first plan to wrote a C program inside domain
> U to replace xenballoond to see whether the situation
> will be refined. If not, how about set up event channel directly for
> domU and dom0, would it be faster?
> 
> Regards balloon strategy, I would do like this, when there ! are
> enough memory , just fulfill the guest balloon request, and when shortage 
> of memory, distribute memory evenly on the guests those request
> inflation.
> 
> Does anyone have better suggestion, thanks in advance.
> 

[-- Attachment #1.2: Type: text/html, Size: 21468 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: re: Xen balloon driver discuss
  2010-11-27  6:54       ` cloudroot
  2010-11-28  2:36         ` Dan Magenheimer
@ 2010-11-28 13:00         ` Pasi Kärkkäinen
  1 sibling, 0 replies; 32+ messages in thread
From: Pasi Kärkkäinen @ 2010-11-28 13:00 UTC (permalink / raw)
  To: cloudroot
  Cc: george.dunlap, 'tinnycloud', 'xen devel',
	'Dan Magenheimer'

On Sat, Nov 27, 2010 at 02:54:46PM +0800, cloudroot wrote:
>    Hi Dan:
> 
> 
> 
>             I have set the benchmark to test balloon driver, but
>    unfortunately the Xen crashed on memory Panic.
> 
>             Before I attach the details output from serial port(which takes
>    time on next run), I am afraid of I might miss something on test
>    environment.
> 
> 
> 
>             My dom0 kernel is 2.6.31, pvops.
> 

You should switch to a 2.6.32-based dom0 kernel.
The 2.6.31 tree is not supported or maintained anymore.

So switch to the xen/stable-2.6.32.x branch in Jeremy's git tree.

Other than that... I haven't tried ballooning with HVM guests,
so I'm not sure whether it should work with the EL5 kernel.

-- Pasi


>    Well currently there is no  driver/xen/balloon.c on this kernel source
>    tree, so I build the xen-balloon.ko, Xen-platform-pci.ko form
> 
>    linux-2.6.18.x86_64, and installed in Dom U, which is redhat 5.4.
> 
> 
> 
>             What I did is put a C program in the each Dom U(total 24 HVM),
>    the program will allocate the memory and fill it with random string
>    repeatly.
> 
>             And in dom0, a phthon monitor will collect the meminfo from
>    xenstore and calculate the target to balloon from Committed_AS.
> 
>    The panic happens when the program is running in just one Dom.
> 
> 
> 
>             I am writing to ask whether my balloon driver is out of date, or
>    where can I get the latest source code,
> 
>             I've googled a lot, but still have a lot of confusion on those
>    source tree.
> 
> 
> 
>             Many thanks.
> 
> 
> 
> 
> 
>    From: tinnycloud [mailto:tinnycloud@hotmail.com]
>    Date: 2010.11.23 22:58
>    TO: 'Dan Magenheimer'; 'xen devel'
>    CC: 'george.dunlap@eu.citrix.com'
>    Subject: re: Xen balloon driver discuss
> 
> 
> 
>    HI Dan:
> 
> 
> 
>             Appreciate for your presentation in summarizing the memory
>    overcommit, really vivid and in great help.
> 
>             Well, I guess recently days the strategy in my mind will fall
>    into the solution Set C in pdf.
> 
> 
> 
>             The tmem solution your worked out for memory overcommit is both
>    efficient and effective.
> 
>             I guess I will have a try on Linux Guest.
> 
> 
> 
>             The real situation I have is most of the running VMs on host are
>    windows. So I had to come up those policies to balance the memory.
> 
>             Although policies are all workload dependent. Good news is host
>    workload  is configurable, and not very heavy
> 
>    So I will try to figure out some favorable policy. The policies referred
>    in pdf are good start for me.
> 
> 
> 
>             Today, instead of trying to implement "/proc/meminfo" with shared
>    pages, I hacked the balloon driver to have another
> 
>             workqueue periodically write meminfo into xenstore through
>    xenbus, which solve the problem of xenstrore high CPU
> 
>             utilization  problem.
> 
> 
> 
>             Later I will try to google more on how Citrix does.
> 
>             Thanks for your help, or do you have any better idea for windows
>    guest?
> 
> 
> 
> 
> 
>    Sent: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>    Date: 2010.11.23 1:47
>    To: MaoXiaoyun; xen devel
>    CC: george.dunlap@eu.citrix.com
>    Subject: RE: Xen balloon driver discuss
> 
> 
> 
>    Xenstore IS slow and you could improve xenballoond performance by only
>    sending the single CommittedAS value from xenballoond in domU to dom0
>    instead of all of /proc/meminfo.   But you are making an assumption that
>    getting memory utilization information from domU to dom0 FASTER (e.g. with
>    a shared page) will provide better ballooning results.  I have not found
>    this to be the case, which is what led to my investigation into
>    self-ballooning, which led to Transcendent Memory.  See the 2010 Xen
>    Summit for more information.
> 
> 
> 
>    In your last paragraph below "Regards balloon strategy", the problem is it
>    is not easy to define "enough memory" and "shortage of memory" within any
>    guest and almost impossible to define it and effectively load balance
>    across many guests.  See my Linux Plumber's Conference presentation (with
>    complete speaker notes) here:
> 
> 
> 
>    [1]http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-Final.pdf
> 
> 
> 
>    [2]http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-SpkNotes.pdf
> 
> 
> 
>    From: MaoXiaoyun [mailto:tinnycloud@hotmail.com]
>    Sent: Sunday, November 21, 2010 9:33 PM
>    To: xen devel
>    Cc: Dan Magenheimer; george.dunlap@eu.citrix.com
>    Subject: RE: Xen balloon driver discuss
> 
> 
> 
> 
>    Since currently /cpu/meminfo is sent to domain 0 via xenstore, which in my
>    opinoin is slow.
>    What I want to do is: there is a shared page between domU and dom0, and
>    domU periodically
>    update the meminfo into the page, while on the other side dom0 retrive the
>    updated data for
>    caculating the target, which is used by guest for balloning.
> 
>    The problem I met is,  currently I don't know how to implement a shared
>    page between
>    dom0 and domU.
>    Would it like dom 0 alloc a unbound event and wait guest to connect, and
>    transfer date through
>    grant table?
>    Or someone has more efficient way?
>    many thanks.
> 
>    > From: tinnycloud@hotmail.com
>    > To: xen-devel@lists.xensource.com
>    > CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
>    > Subject: Xen balloon driver discuss
>    > Date: Sun, 21 Nov 2010 14:26:01 +0800
>    >
>    > Hi:
>    > Greeting first.
>    >
>    > I was trying to run about 24 HVMS (currently only Linux, later will
>    > involve Windows) on one physical server with 24GB memory, 16CPUs.
>    > Each VM is configured with 2GB memory, and I reserved 8GB memory for
>    > dom0.
>    > For safety reason, only domain U's memory is allowed to balloon.
>    >
>    > Inside domain U, I used xenballooned provide by xensource,
>    > periodically write /proc/meminfo into xenstore in dom
>    > 0(/local/domain/did/memory/meminfo).
>    > And in domain 0, I wrote a python script to read the meminfo, like
>    > xen provided strategy, use Committed_AS to calculate the domain U
>    balloon
>    > target.
>    > The time interval is ! 1 seconds.
>    >
>    > Inside each VM, I setup a apache server for test. Well, I'd
>    > like to say the result is not so good.
>    > It appears that too much read/write on xenstore, when I give some of
>    > the stress(by using ab) to guest domains,
>    > the CPU usage of xenstore is up to 100%. Thus the monitor running in
>    > dom0 also response quite slowly.
>    > Also, in ab test, the Committed_AS grows very fast, reach to maxmem
>    > in short time, but in fact the only a small amount
>    > of memory guest really need, so I guess there should be some more to
>    > be taken into consideration for ballooning.
>    >
>    > For xenstore issue, I first plan to wrote a C program inside domain
>    > U to replace xenballoond to see whether the situation
>    > will be refined. If not, how about set up event channel directly for
>    > domU and dom0, would it be faster?
>    >
>    > Regards balloon strategy, I would do like this, when there ! are
>    > enough memory , just fulfill the guest balloon request, and when
>    shortage
>    > of memory, distribute memory evenly on the guests those request
>    > inflation.
>    >
>    > Does anyone have better suggestion, thanks in advance.
>    >
> 
> References
> 
>    Visible links
>    1. http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-Final.pdf
>    2. http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-SpkNotes.pdf

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* re: Xen balloon driver discuss
  2010-11-28  2:36         ` Dan Magenheimer
@ 2010-11-29  4:20           ` tinnycloud
  2010-11-29  6:34           ` xiaoyun.maoxy
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: tinnycloud @ 2010-11-29  4:20 UTC (permalink / raw)
  To: 'Dan Magenheimer', 'xen devel'; +Cc: george.dunlap


[-- Attachment #1.1: Type: text/plain, Size: 13706 bytes --]

Hi Dan:

         You are right: the HVM guest kernel is kernel-2.6.18-164.el5.src.rpm,
coming from
ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/SRPMS/

         Currently the balloon driver is compiled from this kernel. (So I am
afraid the driver may be out of date, and I plan to take a newer balloon.c
from XenLinux and put it into this kernel to compile a new xen-balloon.ko.)

         My Xen is 4.0.0, again with the pvops 2.6.31 dom0 kernel.

         Actually, I have two problems: the first is the PoD
("populate-on-demand memory") issue, and the second is a Xen panic (I will run
more tests and report on it in another reply).

I have googled a bit and applied the patch from
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg01404.html,
but it doesn't work for me.

 

------------------------------------------------- Domain Crash Case ---------------------------------------------

The issue is easy to reproduce; I started one HVM with the command line:

xm cr hvm.linux.balloon maxmem=2048 memory=512

         The guest works well at first, but crashed as soon as I logged
into it through VNC.

         The serial output is:

 

         blktap_sysfs_create: adding attributes for dev ffff8801224df000

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! tot_pages
132088 pod_entries 9489

(XEN) domain_crash called from p2m.c:1127

(XEN) Domain 4 reported crashed by domain 0 on cpu#0:

(XEN) printk: 31 messages suppressed.

(XEN) grant_table.c:555:d0 Iomem mapping not permitted ffffffffffffffff
(domain 4)

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff88012259ca00

 

------------------------------------------------- Xen Crash Case ---------------------------------------------

In addition, if I start the guest like

xm cr hvm.linux.balloon maxmem=2048 memory=400

 

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff8801224df000

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! tot_pages
132088 pod_entries 9489

(XEN) domain_crash called from p2m.c:1127

(XEN) Domain 4 reported crashed by domain 0 on cpu#0:

(XEN) printk: 31 messages suppressed.

(XEN) grant_table.c:555:d0 Iomem mapping not permitted ffffffffffffffff
(domain 4)

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff88012259ca00

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff88012259c600

 (XEN) Error: p2m lock held by p2m_change_type

(XEN) Xen BUG at p2m-ept.c:38

(XEN) ----[ Xen-4.0.0  x86_64  debug=n  Not tainted ]----

(XEN) CPU:    6

(XEN) RIP:    e008:[<ffff82c4801df2aa>]
ept_pod_check_and_populate+0x13a/0x150

(XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor

(XEN) rax: 0000000000000000   rbx: ffff83063fdc0000   rcx: 0000000000000092

(XEN) rdx: 000000000000000a   rsi: 000000000000000a   rdi: ffff82c48021e844

(XEN) rbp: ffff83023fefff28   rsp: ffff83023feffc18   r8:  0000000000000001

(XEN) r9:  0000000000000001   r10: 0000000000000000   r11: ffff82c4801318d0

(XEN) r12: ffff8302f5914ef8   r13: 0000000000000001   r14: 0000000000000000

(XEN) r15: 0000000000003bdf   cr0: 0000000080050033   cr4: 00000000000026f0

(XEN) cr3: 000000063fc2e000   cr2: 00002ba99c046000

(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008

(XEN) Xen stack trace from rsp=ffff83023feffc18:

(XEN)    0000000000000002 0000000000000000 0000000000000000 ffff83063fdc0000

(XEN)    ffff8302f5914ef8 0000000000000001 ffff83023feffc70 ffff82c4801df46e

(XEN)    0000000000000000 ffff83023feffcc4 0000000000003bdf 00000000000001df

(XEN)    ffff8302f5914000 ffff83063fdc0000 ffff83023fefff28 0000000000003bdf

(XEN)    0000000000000002 0000000000000001 0000000000000030 ffff82c4801bafe4

(XEN)    ffff8302f89dc000 000000043fefff28 ffff83023fefff28 0000000000003bdf

(XEN)    00000000002f9223 0000000000000030 ffff83023fefff28 ffff82c48019bab1

(XEN)    0000000000000000 00000001bdc62000 0000000000000000 0000000000000182

(XEN)    ffff8300bdc62000 ffff82c4801b3824 ffff83063fdc0348 07008300bdc62000

(XEN)    ffff83023fe808d0 0000000000000040 000000063fc3601e 0000000000000000

(XEN)    ffff83023fefff28 ffff82c480167d17 ffff82c4802509c0 0000000000000000

(XEN)    0000000003bdf000 000000000001c000 ffff83023feffdc8 0000000000000080

(XEN)    ffff82c480250dd0 0000000000003bdf 00ff82c480250080 ffff82c480250dc0

(XEN)    ffff82c480250080 ffff82c480250dc0 0000000000004040 0000000000000000

(XEN)    0000000000004040 0000000000000040 ffff82c4801447da 0000000000000080

(XEN)    ffff83023fefff28 0000000000000092 ffff82c4801a7f6c 00000000000000fc

(XEN)    0000000000000092 0000000000000006 ffff8300bdc63760 0000000000000006

(XEN)    ffff82c48025c100 ffff82c480250100 ffff82c480250100 0000000000000292

(XEN)    ffff8300bdc637f0 00000249b30f6a00 0000000000000292 ffff82c4801a9383

(XEN)    00000000000000ef ffff8300bdc62000 ffff8300bdc62000 ffff8300bdc637e8

(XEN) Xen call trace:

(XEN)    [<ffff82c4801df2aa>] ept_pod_check_and_populate+0x13a/0x150

(XEN)    [<ffff82c4801df46e>] ept_get_entry+0x1ae/0x1c0

(XEN)    [<ffff82c4801bafe4>] p2m_change_type+0x144/0x1b0

(XEN)    [<ffff82c48019bab1>] hvm_hap_nested_page_fault+0x121/0x190

(XEN)    [<ffff82c4801b3824>] vmx_vmexit_handler+0x304/0x1a90

(XEN)    [<ffff82c480167d17>] __smp_call_function_interrupt+0x57/0x90

(XEN)    [<ffff82c4801447da>] __find_next_bit+0x6a/0x70

(XEN)    [<ffff82c4801a7f6c>] vpic_get_highest_priority_irq+0x2c/0xa0

(XEN)    [<ffff82c4801a9383>] pt_update_irq+0x33/0x1e0

(XEN)    [<ffff82c4801a6042>] vlapic_has_pending_irq+0x42/0x70

(XEN)    [<ffff82c4801a0c88>] hvm_vcpu_has_pending_irq+0x88/0xa0

(XEN)    [<ffff82c4801b263b>] vmx_vmenter_helper+0x5b/0x150

(XEN)    [<ffff82c4801ada63>] vmx_asm_do_vmentry+0x0/0xdd

(XEN)    

(XEN) 

(XEN) ****************************************

(XEN) Panic on CPU 6:

(XEN) Xen BUG at p2m-ept.c:38

(XEN) ****************************************

(XEN) 

(XEN) Manual reset required ('noreboot' specified)

 

--------------------------------------- Working configuration --------------------------------------------------

And if I start the guest like

xm cr hvm.linux.balloon maxmem=1024 memory=512

the guest can be logged into successfully through VNC.

         Any idea what is happening? 

         PoD is new to me; I will try to learn more, thanks.

 

From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com] 
Date: 2010.11.28 10:36
sent: tinnycloud; xen devel
cc: george.dunlap@eu.citrix.com
subject: RE: Xen balloon driver discuss

 

Am I understanding correctly that you are running each linux-2.6.18 as HVM
(not PV)?  I didn't think that the linux-2.6.18 balloon driver worked at all
in an HVM guest.

 

You also didn't say what version of Xen you are using.  If you are running
xen-unstable, you should also provide the changeset number.

 

In any case, any load of HVM guests should never crash Xen itself, but if
you are running HVM guests, I probably can't help much as I almost never run
HVM guests.

 

From: cloudroot [mailto:cloudroot@sina.com] 
Sent: Friday, November 26, 2010 11:55 PM
To: tinnycloud; Dan Magenheimer; xen devel
Cc: george.dunlap@eu.citrix.com
Subject: re: Xen balloon driver discuss

 

Hi Dan:

 

         I have set the benchmark to test balloon driver, but unfortunately
the Xen crashed on memory Panic.

         Before I attach the details output from serial port(which takes
time on next run), I am afraid of I might miss something on test
environment.

 

         My dom0 kernel is 2.6.31, pvops. 

Well currently there is no  driver/xen/balloon.c on this kernel source tree,
so I build the xen-balloon.ko, Xen-platform-pci.ko form 

linux-2.6.18.x86_64, and installed in Dom U, which is redhat 5.4.

 

         What I did is put a C program in the each Dom U(total 24 HVM), the
program will allocate the memory and fill it with random string repeatly.

         And in dom0, a phthon monitor will collect the meminfo from
xenstore and calculate the target to balloon from Committed_AS.

The panic happens when the program is running in just one Dom.

 

         I am writing to ask whether my balloon driver is out of date, or
where can I get the latest source code, 

         I've googled a lot, but still have a lot of confusion on those
source tree.

 

         Many thanks. 

         

         

From: tinnycloud [mailto:tinnycloud@hotmail.com] 
Date: 2010.11.23 22:58
TO: 'Dan Magenheimer'; 'xen devel'
CC: 'george.dunlap@eu.citrix.com'
Subject: re: Xen balloon driver discuss

 

HI Dan:

 

         Appreciate for your presentation in summarizing the memory
overcommit, really vivid and in great help.

         Well, I guess recently days the strategy in my mind will fall into
the solution Set C in pdf.

 

         The tmem solution your worked out for memory overcommit is both
efficient and effective. 

         I guess I will have a try on Linux Guest.

 

         The real situation I have is most of the running VMs on host are
windows. So I had to come up those policies to balance the memory.

         Although policies are all workload dependent. Good news is host
workload  is configurable, and not very heavy

So I will try to figure out some favorable policy. The policies referred in
pdf are good start for me.

 

         Today, instead of trying to implement "/proc/meminfo" with shared
pages, I hacked the balloon driver to have another 

         workqueue periodically write meminfo into xenstore through xenbus,
which solve the problem of xenstrore high CPU

         utilization  problem.

 

         Later I will try to google more on how Citrix does.

         Thanks for your help, or do you have any better idea for windows
guest?

         

 

Sent: Dan Magenheimer [mailto:dan.magenheimer@oracle.com] 
Date: 2010.11.23 1:47
To: MaoXiaoyun; xen devel
CC: george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

Xenstore IS slow and you could improve xenballoond performance by only
sending the single CommittedAS value from xenballoond in domU to dom0
instead of all of /proc/meminfo.   But you are making an assumption that
getting memory utilization information from domU to dom0 FASTER (e.g. with a
shared page) will provide better ballooning results.  I have not found this
to be the case, which is what led to my investigation into self-ballooning,
which led to Transcendent Memory.  See the 2010 Xen Summit for more
information.

 

In your last paragraph below "Regards balloon strategy", the problem is it
is not easy to define "enough memory" and "shortage of memory" within any
guest and almost impossible to define it and effectively load balance across
many guests.  See my Linux Plumber's Conference presentation (with complete
speaker notes) here:

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmt
VirtEnv-LPC2010-Final.pdf

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmt
VirtEnv-LPC2010-SpkNotes.pdf

 

From: MaoXiaoyun [mailto:tinnycloud@hotmail.com] 
Sent: Sunday, November 21, 2010 9:33 PM
To: xen devel
Cc: Dan Magenheimer; george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

 
Since currently /cpu/meminfo is sent to domain 0 via xenstore, which in my
opinoin is slow.
What I want to do is: there is a shared page between domU and dom0, and domU
periodically
update the meminfo into the page, while on the other side dom0 retrive the
updated data for
caculating the target, which is used by guest for balloning.
 
The problem I met is,  currently I don't know how to implement a shared page
between 
dom0 and domU. 
Would it like dom 0 alloc a unbound event and wait guest to connect, and
transfer date through
grant table?
Or someone has more efficient way?
many thanks.
 
> From: tinnycloud@hotmail.com
> To: xen-devel@lists.xensource.com
> CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
> Subject: Xen balloon driver discuss
> Date: Sun, 21 Nov 2010 14:26:01 +0800
> 
> Hi: 
> Greeting first.
> 
> I was trying to run about 24 HVMS (currently only Linux, later will
> involve Windows) on one physical server with 24GB memory, 16CPUs.
> Each VM is configured with 2GB memory, and I reserved 8GB memory for
> dom0. 
> For safety reason, only domain U's memory is allowed to balloon.
> 
> Inside domain U, I used xenballooned provide by xensource,
> periodically write /proc/meminfo into xenstore in dom
> 0(/local/domain/did/memory/meminfo).
> And in domain 0, I wrote a python script to read the meminfo, like
> xen provided strategy, use Committed_AS to calculate the domain U balloon
> target.
> The time interval is ! 1 seconds.
> 
> Inside each VM, I setup a apache server for test. Well, I'd
> like to say the result is not so good.
> It appears that too much read/write on xenstore, when I give some of
> the stress(by using ab) to guest domains, 
> the CPU usage of xenstore is up to 100%. Thus the monitor running in
> dom0 also response quite slowly.
> Also, in ab test, the Committed_AS grows very fast, reach to maxmem
> in short time, but in fact the only a small amount
> of memory guest really need, so I guess there should be some more to
> be taken into consideration for ballooning.
> 
> For xenstore issue, I first plan to wrote a C program inside domain
> U to replace xenballoond to see whether the situation
> will be refined. If not, how about set up event channel directly for
> domU and dom0, would it be faster?
> 
> Regards balloon strategy, I would do like this, when there ! are
> enough memory , just fulfill the guest balloon request, and when shortage 
> of memory, distribute memory evenly on the guests those request
> inflation.
> 
> Does anyone have better suggestion, thanks in advance.
> 


[-- Attachment #1.2: Type: text/html, Size: 51397 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Xen balloon driver discuss
  2010-11-28  2:36         ` Dan Magenheimer
  2010-11-29  4:20           ` tinnycloud
@ 2010-11-29  6:34           ` xiaoyun.maoxy
       [not found]           ` <002b01cb8f8f$852bda10$8f838e30$@maoxy@aliyun-inc.com>
  2010-11-29 10:12           ` George Dunlap
  3 siblings, 0 replies; 32+ messages in thread
From: xiaoyun.maoxy @ 2010-11-29  6:34 UTC (permalink / raw)
  To: 'xen devel', george.dunlap
  Cc: 'tinnycloud', 'Dan Magenheimer'


[-- Attachment #1.1: Type: text/plain, Size: 15069 bytes --]

Hi George:

 

         I read
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg01404.html
more carefully, and captured the printout of the
         first call of p2m_pod_demand_populate(), which is:
         first call of p2m_pod_demand_populate(), which is :

         

houyi-chunk2.dev.sd.aliyun.com login: blktap_sysfs_create: adding attributes
for dev ffff880122466400

(XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! tot_pages
132088 pod_entries 523776

 

And  memory/target under /local/domain/1/  is 524288.

 

So 523776 is less than 524288, I think the problem is similar, right? 

But the question is why the patch doesn’t work for me.

         Many thanks. 

 

From: tinnycloud [mailto:tinnycloud@hotmail.com] 
Date: 2010-11-29 12:21
To: 'Dan Magenheimer'; 'xen devel'
CC: 'george.dunlap@eu.citrix.com'
Subject: re: Xen balloon driver discuss

 

Hi Dan:

 

         You are right, the HVM guest is kernel-2.6.18-164.el5.src.rpm,
coming from
ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/SRPMS/

         Currently the balloon driver is compiled from this kernel. (So I am
afraid of if the driver may out of date, and I plan to get new balloon.c
from xenlinux and put it into this kernel to compile a new xen-balloon.ko)

         My xen is 4.0.0, again pvops kernel 2.6.31

         

         Actually, I have two problems, first is PoD “populate-on-demand
memory” issue,  and second is xen panic(I will get more test and report on
another reply)

I have googled some and apply the patch from
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg01404.html,
but it doesn’t work for me.

 

-------------------------------------------------Domain Crash
Case---------------------------------------------

The issue is easy to reproduce, I started one HVM with command line:

xm cr hvm.linux.balloon maxmem=2048 memory=512

 

         the guest works well at first, but crashed as long as I logined
into it throught VNC

 

         the serial output is:

 

         blktap_sysfs_create: adding attributes for dev ffff8801224df000

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! tot_pages
132088 pod_entries 9489

(XEN) domain_crash called from p2m.c:1127

(XEN) Domain 4 reported crashed by domain 0 on cpu#0:

(XEN) printk: 31 messages suppressed.

(XEN) grant_table.c:555:d0 Iomem mapping not permitted ffffffffffffffff
(domain 4)

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff88012259ca00

 

-------------------------------------------------Xen Crash
Case---------------------------------------------

In addition, if I start the guest like 

xm cr hvm.linux.balloon maxmem=2048 memory=400

 

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff8801224df000

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! tot_pages
132088 pod_entries 9489

(XEN) domain_crash called from p2m.c:1127

(XEN) Domain 4 reported crashed by domain 0 on cpu#0:

(XEN) printk: 31 messages suppressed.

(XEN) grant_table.c:555:d0 Iomem mapping not permitted ffffffffffffffff
(domain 4)

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff88012259ca00

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff88012259c600

 (XEN) Error: p2m lock held by p2m_change_type

(XEN) Xen BUG at p2m-ept.c:38

(XEN) ----[ Xen-4.0.0  x86_64  debug=n  Not tainted ]----

(XEN) CPU:    6

(XEN) RIP:    e008:[<ffff82c4801df2aa>]
ept_pod_check_and_populate+0x13a/0x150

(XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor

(XEN) rax: 0000000000000000   rbx: ffff83063fdc0000   rcx: 0000000000000092

(XEN) rdx: 000000000000000a   rsi: 000000000000000a   rdi: ffff82c48021e844

(XEN) rbp: ffff83023fefff28   rsp: ffff83023feffc18   r8:  0000000000000001

(XEN) r9:  0000000000000001   r10: 0000000000000000   r11: ffff82c4801318d0

(XEN) r12: ffff8302f5914ef8   r13: 0000000000000001   r14: 0000000000000000

(XEN) r15: 0000000000003bdf   cr0: 0000000080050033   cr4: 00000000000026f0

(XEN) cr3: 000000063fc2e000   cr2: 00002ba99c046000

(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008

(XEN) Xen stack trace from rsp=ffff83023feffc18:

(XEN)    0000000000000002 0000000000000000 0000000000000000 ffff83063fdc0000

(XEN)    ffff8302f5914ef8 0000000000000001 ffff83023feffc70 ffff82c4801df46e

(XEN)    0000000000000000 ffff83023feffcc4 0000000000003bdf 00000000000001df

(XEN)    ffff8302f5914000 ffff83063fdc0000 ffff83023fefff28 0000000000003bdf

(XEN)    0000000000000002 0000000000000001 0000000000000030 ffff82c4801bafe4

(XEN)    ffff8302f89dc000 000000043fefff28 ffff83023fefff28 0000000000003bdf

(XEN)    00000000002f9223 0000000000000030 ffff83023fefff28 ffff82c48019bab1

(XEN)    0000000000000000 00000001bdc62000 0000000000000000 0000000000000182

(XEN)    ffff8300bdc62000 ffff82c4801b3824 ffff83063fdc0348 07008300bdc62000

(XEN)    ffff83023fe808d0 0000000000000040 000000063fc3601e 0000000000000000

(XEN)    ffff83023fefff28 ffff82c480167d17 ffff82c4802509c0 0000000000000000

(XEN)    0000000003bdf000 000000000001c000 ffff83023feffdc8 0000000000000080

(XEN)    ffff82c480250dd0 0000000000003bdf 00ff82c480250080 ffff82c480250dc0

(XEN)    ffff82c480250080 ffff82c480250dc0 0000000000004040 0000000000000000

(XEN)    0000000000004040 0000000000000040 ffff82c4801447da 0000000000000080

(XEN)    ffff83023fefff28 0000000000000092 ffff82c4801a7f6c 00000000000000fc

(XEN)    0000000000000092 0000000000000006 ffff8300bdc63760 0000000000000006

(XEN)    ffff82c48025c100 ffff82c480250100 ffff82c480250100 0000000000000292

(XEN)    ffff8300bdc637f0 00000249b30f6a00 0000000000000292 ffff82c4801a9383

(XEN)    00000000000000ef ffff8300bdc62000 ffff8300bdc62000 ffff8300bdc637e8

(XEN) Xen call trace:

(XEN)    [<ffff82c4801df2aa>] ept_pod_check_and_populate+0x13a/0x150

(XEN)    [<ffff82c4801df46e>] ept_get_entry+0x1ae/0x1c0

(XEN)    [<ffff82c4801bafe4>] p2m_change_type+0x144/0x1b0

(XEN)    [<ffff82c48019bab1>] hvm_hap_nested_page_fault+0x121/0x190

(XEN)    [<ffff82c4801b3824>] vmx_vmexit_handler+0x304/0x1a90

(XEN)    [<ffff82c480167d17>] __smp_call_function_interrupt+0x57/0x90

(XEN)    [<ffff82c4801447da>] __find_next_bit+0x6a/0x70

(XEN)    [<ffff82c4801a7f6c>] vpic_get_highest_priority_irq+0x2c/0xa0

(XEN)    [<ffff82c4801a9383>] pt_update_irq+0x33/0x1e0

(XEN)    [<ffff82c4801a6042>] vlapic_has_pending_irq+0x42/0x70

(XEN)    [<ffff82c4801a0c88>] hvm_vcpu_has_pending_irq+0x88/0xa0

(XEN)    [<ffff82c4801b263b>] vmx_vmenter_helper+0x5b/0x150

(XEN)    [<ffff82c4801ada63>] vmx_asm_do_vmentry+0x0/0xdd

(XEN)    

(XEN) 

(XEN) ****************************************

(XEN) Panic on CPU 6:

(XEN) Xen BUG at p2m-ept.c:38

(XEN) ****************************************

(XEN) 

(XEN) Manual reset required ('noreboot' specified)

 

---------------------------------------Works
configuration--------------------------------------------------

And if I start the guest like 

xm cr hvm.linux.balloon maxmem=1024 memory=512

the guest can be successfully logged into through VNC.

 

         Any idea what is happening? 

         PoD is new to me; I will try to learn more, thanks.

 

From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com] 
Date: 2010.11.28 10:36
sent: tinnycloud; xen devel
cc: george.dunlap@eu.citrix.com
subject: RE: Xen balloon driver discuss

 

Am I understanding correctly that you are running each linux-2.6.18 as HVM
(not PV)?  I didn’t think that the linux-2.6.18 balloon driver worked at
all in an HVM guest.

 

You also didn’t say what version of Xen you are using.  If you are running
xen-unstable, you should also provide the changeset number.

 

In any case, any load of HVM guests should never crash Xen itself, but if
you are running HVM guests, I probably can’t help much as I almost never
run HVM guests.

 

From: cloudroot [mailto:cloudroot@sina.com] 
Sent: Friday, November 26, 2010 11:55 PM
To: tinnycloud; Dan Magenheimer; xen devel
Cc: george.dunlap@eu.citrix.com
Subject: re: Xen balloon driver discuss

 

Hi Dan:

 

         I have set up a benchmark to test the balloon driver, but unfortunately
Xen crashed with a memory panic.

         Before I attach the detailed output from the serial port (which takes
time on the next run), I am afraid I might have missed something in the test
environment.

 

         My dom0 kernel is 2.6.31, pvops. 

Well, currently there is no driver/xen/balloon.c in this kernel source tree,
so I built xen-balloon.ko and xen-platform-pci.ko from 

linux-2.6.18.x86_64, and installed them in Dom U, which is Red Hat 5.4.

 

         What I did is put a C program in each Dom U (24 HVMs in total); the
program repeatedly allocates memory and fills it with random strings.

         And in dom0, a Python monitor collects the meminfo from
xenstore and calculates the balloon target from Committed_AS.

The panic happens when the program is running in just one Dom.

 

         I am writing to ask whether my balloon driver is out of date, and
where I can get the latest source code. 

         I’ve googled a lot, but am still quite confused by the various
source trees.

 

         Many thanks. 

         

         

From: tinnycloud [mailto:tinnycloud@hotmail.com] 
Date: 2010.11.23 22:58
TO: 'Dan Magenheimer'; 'xen devel'
CC: 'george.dunlap@eu.citrix.com'
Subject: re: Xen balloon driver discuss

 

HI Dan:

 

         Thanks for your presentation summarizing memory
overcommit; it is really vivid and a great help.

         Well, I guess these days the strategy in my mind falls into
solution Set C in the pdf.

 

         The tmem solution you worked out for memory overcommit is both
efficient and effective. 

         I guess I will have a try on Linux Guest.

 

         The real situation I have is that most of the VMs running on the host
are Windows. So I had to come up with those policies to balance the memory.

         Policies are all workload dependent, though. The good news is that the
host workload is configurable, and not very heavy,

so I will try to figure out a favorable policy. The policies referred to in
the pdf are a good start for me.

 

         Today, instead of trying to implement “/proc/meminfo” with shared
pages, I hacked the balloon driver to have another 

         workqueue periodically write meminfo into xenstore through xenbus,
which solves the problem of xenstore's high CPU

         utilization.
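
Roughly what I mean is something like the sketch below (illustrative only;
it assumes a 2.6.32-style workqueue API and the classic xenbus_printf()
helper, and get_committed_as_kb() is just a placeholder for however the
driver reads Committed_AS):

    #include <linux/workqueue.h>
    #include <xen/xenbus.h>

    /* Hypothetical helper: however the driver obtains Committed_AS, in kB. */
    static unsigned long get_committed_as_kb(void)
    {
        return 0;   /* placeholder for the real lookup */
    }

    static void meminfo_report(struct work_struct *work);
    static DECLARE_DELAYED_WORK(meminfo_work, meminfo_report);

    static void meminfo_report(struct work_struct *work)
    {
        /* Relative xenstore paths land under this domain's own
         * /local/domain/<id>/ directory, next to memory/target. */
        xenbus_printf(XBT_NIL, "memory", "committed_as", "%lu",
                      get_committed_as_kb());
        schedule_delayed_work(&meminfo_work, HZ);   /* re-arm: once a second */
    }
    /* The driver's init would start this off with
     * schedule_delayed_work(&meminfo_work, HZ); */

Writing a single value once a second is far less xenstore traffic than
pushing the whole /proc/meminfo.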

 

         Later I will try to google more on how Citrix does it.

         Thanks for your help. Or do you have any better idea for Windows
guests?

         

 

Sent: Dan Magenheimer [mailto:dan.magenheimer@oracle.com] 
Date: 2010.11.23 1:47
To: MaoXiaoyun; xen devel
CC: george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

Xenstore IS slow and you could improve xenballoond performance by only
sending the single CommittedAS value from xenballoond in domU to dom0
instead of all of /proc/meminfo.   But you are making an assumption that
getting memory utilization information from domU to dom0 FASTER (e.g. with a
shared page) will provide better ballooning results.  I have not found this
to be the case, which is what led to my investigation into self-ballooning,
which led to Transcendent Memory.  See the 2010 Xen Summit for more
information.

 

In your last paragraph below “Regards balloon strategy”, the problem is it
is not easy to define “enough memory” and “shortage of memory” within
any guest and almost impossible to define it and effectively load balance
across many guests.  See my Linux Plumber’s Conference presentation (with
complete speaker notes) here:

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmt
VirtEnv-LPC2010-Final.pdf

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmt
VirtEnv-LPC2010-SpkNotes.pdf

 

From: MaoXiaoyun [mailto:tinnycloud@hotmail.com] 
Sent: Sunday, November 21, 2010 9:33 PM
To: xen devel
Cc: Dan Magenheimer; george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

 
Since currently /cpu/meminfo is sent to domain 0 via xenstore, which in my
opinoin is slow.
What I want to do is: there is a shared page between domU and dom0, and domU
periodically
update the meminfo into the page, while on the other side dom0 retrive the
updated data for
caculating the target, which is used by guest for balloning.
 
The problem I met is,  currently I don't know how to implement a shared page
between 
dom0 and domU. 
Would it like dom 0 alloc a unbound event and wait guest to connect, and
transfer date through
grant table?
Or someone has more efficient way?
many thanks.
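
(What I have in mind on the domU side is roughly the sketch below; it is
only illustrative and assumes the classic gnttab/xenbus kernel APIs:)

    #include <linux/gfp.h>
    #include <xen/grant_table.h>
    #include <xen/xenbus.h>
    #include <asm/xen/page.h>       /* virt_to_mfn() */

    static void *stats_page;

    static int share_stats_page(void)
    {
        int gref;

        stats_page = (void *)__get_free_page(GFP_KERNEL);
        if (!stats_page)
            return -ENOMEM;

        /* Grant dom0 (domid 0) read-only access to this page. */
        gref = gnttab_grant_foreign_access(0, virt_to_mfn(stats_page), 1);
        if (gref < 0) {
            free_page((unsigned long)stats_page);
            return gref;
        }

        /* Advertise the grant ref so a dom0 tool can find and map it,
         * e.g. with libxc's xc_gnttab_map_grant_ref() or via gntdev. */
        return xenbus_printf(XBT_NIL, "memory", "stats-gref", "%d", gref);
    }

dom0 would then map the page once and simply read the values, with no
xenstore traffic on the data path.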
 
> From: tinnycloud@hotmail.com
> To: xen-devel@lists.xensource.com
> CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
> Subject: Xen balloon driver discuss
> Date: Sun, 21 Nov 2010 14:26:01 +0800
> 
> Hi: 
> Greeting first.
> 
> I was trying to run about 24 HVMS (currently only Linux, later will
> involve Windows) on one physical server with 24GB memory, 16CPUs.
> Each VM is configured with 2GB memory, and I reserved 8GB memory for
> dom0. 
> For safety reason, only domain U's memory is allowed to balloon.
> 
> Inside domain U, I used xenballooned provide by xensource,
> periodically write /proc/meminfo into xenstore in dom
> 0(/local/domain/did/memory/meminfo).
> And in domain 0, I wrote a python script to read the meminfo, like
> xen provided strategy, use Committed_AS to calculate the domain U balloon
> target.
> The time interval is ! 1 seconds.
> 
> Inside each VM, I setup a apache server for test. Well, I'd
> like to say the result is not so good.
> It appears that too much read/write on xenstore, when I give some of
> the stress(by using ab) to guest domains, 
> the CPU usage of xenstore is up to 100%. Thus the monitor running in
> dom0 also response quite slowly.
> Also, in ab test, the Committed_AS grows very fast, reach to maxmem
> in short time, but in fact the only a small amount
> of memory guest really need, so I guess there should be some more to
> be taken into consideration for ballooning.
> 
> For xenstore issue, I first plan to wrote a C program inside domain
> U to replace xenballoond to see whether the situation
> will be refined. If not, how about set up event channel directly for
> domU and dom0, would it be faster?
> 
> Regards balloon strategy, I would do like this, when there ! are
> enough memory , just fulfill the guest balloon request, and when shortage 
> of memory, distribute memory evenly on the guests those request
> inflation.
> 
> Does anyone have better suggestion, thanks in advance.
> 


[-- Attachment #1.2: Type: text/html, Size: 57310 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Xen balloon driver discuss
  2010-11-21  6:26 ` Xen balloon driver discuss tinnycloud
  2010-11-22  4:33   ` MaoXiaoyun
@ 2010-11-29  6:56   ` Chu Rui
  2010-11-29 10:55     ` Reply: [Xen-devel] " tinnycloud
  1 sibling, 1 reply; 32+ messages in thread
From: Chu Rui @ 2010-11-29  6:56 UTC (permalink / raw)
  To: tinnycloud; +Cc: George.Dunlap, dan.magenheimer, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 329 bytes --]

I am also interested in tinnycloud's problem.
It looks like the PoD cache has been used up, like this:

    if ( p2md->pod.count == 0 )
        goto out_of_memory;

George, would you please take a look at this problem, and, if possible, tell
us a little more about what the PoD cache means? Is it a memory pool for PoD
allocation?

[-- Attachment #1.2: Type: text/html, Size: 413 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* re: Xen balloon driver discuss
       [not found]           ` <002b01cb8f8f$852bda10$8f838e30$@maoxy@aliyun-inc.com>
@ 2010-11-29  8:37             ` tinnycloud
  2010-11-29 10:09             ` George Dunlap
  1 sibling, 0 replies; 32+ messages in thread
From: tinnycloud @ 2010-11-29  8:37 UTC (permalink / raw)
  To: 'xen devel', george.dunlap, 'tinnycloud'
  Cc: 'Dan Magenheimer'


[-- Attachment #1.1: Type: text/plain, Size: 16757 bytes --]

Well, I forgot to print out the pod count; please see below. Thanks.  

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff88015456b200

(XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! tot_pages
132088 pod_entries 523776 pod_count 130560

(XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! tot_pages
132088 pod_entries 523264 pod_count 130048

(XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! tot_pages
132088 pod_entries 522752 pod_count 129536

(XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! tot_pages
132088 pod_entries 522240 pod_count 129024

(XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! tot_pages
132088 pod_entries 521728 pod_count 128512

(XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! tot_pages
132088 pod_entries 521216 pod_count 128000

(XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! tot_pages
132088 pod_entries 520704 pod_count 127488

(XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! tot_pages
132088 pod_entries 520192 pod_count 126976

(XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! tot_pages
132088 pod_entries 519680 pod_count 126464

 

 

It looks like a 512M PoD cache is too small, 

since if I "xm mem-set" the HVM to 1G before I log into it through VNC, the
guest won’t crash.

 

So the solution is to enlarge the PoD cache, like

xm cr hvm.linux.balloon maxmem=2048 memory=768

 

am I right? 
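
Putting numbers on the log above (assuming 4 KiB pages):

    130560 pages x 4 KiB = 510 MiB   -> the PoD cache (pod_count) at the first
                                        fault, i.e. roughly the memory=512 target
    512 pages    x 4 KiB = 2 MiB     -> what each demand-populate line consumes
                                        (presumably one 2 MiB superpage per fault)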

 

 

To: 'xen devel'; george.dunlap@eu.citrix.com
CC: 'tinnycloud'; 'Dan Magenheimer'
Subject: Re: Xen balloon driver discuss

 

Hi George:

 

         I read
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg01404.html
more carefully, and got my print out of

         first call of p2m_pod_demand_populate(), which is :

         

houyi-chunk2.dev.sd.aliyun.com login: blktap_sysfs_create: adding attributes
for dev ffff880122466400

(XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! tot_pages
132088 pod_entries 523776

 

And  memory/target under /local/domain/1/  is 524288.

 

So 523776 is less than 524288, I think the problem is similar, right? 

But the question is why the patch doesn’t work for me.

         Many thanks. 

 

From: tinnycloud [mailto:tinnycloud@hotmail.com] 
Date: 2010-11-29 12:21
To: 'Dan Magenheimer'; 'xen devel'
CC: 'george.dunlap@eu.citrix.com'
Subject: re: Xen balloon driver discuss

 

Hi Dan:

 

         You are right, the HVM guest is kernel-2.6.18-164.el5.src.rpm,
coming from
ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/SRPMS/

         Currently the balloon driver is compiled from this kernel. (So I am
afraid of if the driver may out of date, and I plan to get new balloon.c
from xenlinux and put it into this kernel to compile a new xen-balloon.ko)

         My xen is 4.0.0, again pvops kernel 2.6.31

         

         Actually, I have two problems, first is PoD “populate-on-demand
memory” issue,  and second is xen panic(I will get more test and report on
another reply)

I have googled some and apply the patch from
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg01404.html,
but it doesn’t work for me.

 

-------------------------------------------------Domain Crash
Case---------------------------------------------

The issue is easy to reproduce, I started one HVM with command line:

xm cr hvm.linux.balloon maxmem=2048 memory=512

 

         the guest works well at first, but crashed as long as I logined
into it throught VNC

 

         the serial output is:

 

         blktap_sysfs_create: adding attributes for dev ffff8801224df000

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! tot_pages
132088 pod_entries 9489

(XEN) domain_crash called from p2m.c:1127

(XEN) Domain 4 reported crashed by domain 0 on cpu#0:

(XEN) printk: 31 messages suppressed.

(XEN) grant_table.c:555:d0 Iomem mapping not permitted ffffffffffffffff
(domain 4)

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff88012259ca00

 

-------------------------------------------------Xen Crash
Case---------------------------------------------

In addition, if start guest like 

m cr hvm.linux.balloon maxmem=2048 memory=400

 

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff8801224df000

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! tot_pages
132088 pod_entries 9489

(XEN) domain_crash called from p2m.c:1127

(XEN) Domain 4 reported crashed by domain 0 on cpu#0:

(XEN) printk: 31 messages suppressed.

(XEN) grant_table.c:555:d0 Iomem mapping not permitted ffffffffffffffff
(domain 4)

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff88012259ca00

blktap_sysfs_destroy

blktap_sysfs_create: adding attributes for dev ffff88012259c600

 (XEN) Error: p2m lock held by p2m_change_type

(XEN) Xen BUG at p2m-ept.c:38

(XEN) ----[ Xen-4.0.0  x86_64  debug=n  Not tainted ]----

(XEN) CPU:    6

(XEN) RIP:    e008:[<ffff82c4801df2aa>]
ept_pod_check_and_populate+0x13a/0x150

(XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor

(XEN) rax: 0000000000000000   rbx: ffff83063fdc0000   rcx: 0000000000000092

(XEN) rdx: 000000000000000a   rsi: 000000000000000a   rdi: ffff82c48021e844

(XEN) rbp: ffff83023fefff28   rsp: ffff83023feffc18   r8:  0000000000000001

(XEN) r9:  0000000000000001   r10: 0000000000000000   r11: ffff82c4801318d0

(XEN) r12: ffff8302f5914ef8   r13: 0000000000000001   r14: 0000000000000000

(XEN) r15: 0000000000003bdf   cr0: 0000000080050033   cr4: 00000000000026f0

(XEN) cr3: 000000063fc2e000   cr2: 00002ba99c046000

(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008

(XEN) Xen stack trace from rsp=ffff83023feffc18:

(XEN)    0000000000000002 0000000000000000 0000000000000000 ffff83063fdc0000

(XEN)    ffff8302f5914ef8 0000000000000001 ffff83023feffc70 ffff82c4801df46e

(XEN)    0000000000000000 ffff83023feffcc4 0000000000003bdf 00000000000001df

(XEN)    ffff8302f5914000 ffff83063fdc0000 ffff83023fefff28 0000000000003bdf

(XEN)    0000000000000002 0000000000000001 0000000000000030 ffff82c4801bafe4

(XEN)    ffff8302f89dc000 000000043fefff28 ffff83023fefff28 0000000000003bdf

(XEN)    00000000002f9223 0000000000000030 ffff83023fefff28 ffff82c48019bab1

(XEN)    0000000000000000 00000001bdc62000 0000000000000000 0000000000000182

(XEN)    ffff8300bdc62000 ffff82c4801b3824 ffff83063fdc0348 07008300bdc62000

(XEN)    ffff83023fe808d0 0000000000000040 000000063fc3601e 0000000000000000

(XEN)    ffff83023fefff28 ffff82c480167d17 ffff82c4802509c0 0000000000000000

(XEN)    0000000003bdf000 000000000001c000 ffff83023feffdc8 0000000000000080

(XEN)    ffff82c480250dd0 0000000000003bdf 00ff82c480250080 ffff82c480250dc0

(XEN)    ffff82c480250080 ffff82c480250dc0 0000000000004040 0000000000000000

(XEN)    0000000000004040 0000000000000040 ffff82c4801447da 0000000000000080

(XEN)    ffff83023fefff28 0000000000000092 ffff82c4801a7f6c 00000000000000fc

(XEN)    0000000000000092 0000000000000006 ffff8300bdc63760 0000000000000006

(XEN)    ffff82c48025c100 ffff82c480250100 ffff82c480250100 0000000000000292

(XEN)    ffff8300bdc637f0 00000249b30f6a00 0000000000000292 ffff82c4801a9383

(XEN)    00000000000000ef ffff8300bdc62000 ffff8300bdc62000 ffff8300bdc637e8

(XEN) Xen call trace:

(XEN)    [<ffff82c4801df2aa>] ept_pod_check_and_populate+0x13a/0x150

(XEN)    [<ffff82c4801df46e>] ept_get_entry+0x1ae/0x1c0

(XEN)    [<ffff82c4801bafe4>] p2m_change_type+0x144/0x1b0

(XEN)    [<ffff82c48019bab1>] hvm_hap_nested_page_fault+0x121/0x190

(XEN)    [<ffff82c4801b3824>] vmx_vmexit_handler+0x304/0x1a90

(XEN)    [<ffff82c480167d17>] __smp_call_function_interrupt+0x57/0x90

(XEN)    [<ffff82c4801447da>] __find_next_bit+0x6a/0x70

(XEN)    [<ffff82c4801a7f6c>] vpic_get_highest_priority_irq+0x2c/0xa0

(XEN)    [<ffff82c4801a9383>] pt_update_irq+0x33/0x1e0

(XEN)    [<ffff82c4801a6042>] vlapic_has_pending_irq+0x42/0x70

(XEN)    [<ffff82c4801a0c88>] hvm_vcpu_has_pending_irq+0x88/0xa0

(XEN)    [<ffff82c4801b263b>] vmx_vmenter_helper+0x5b/0x150

(XEN)    [<ffff82c4801ada63>] vmx_asm_do_vmentry+0x0/0xdd

(XEN)    

(XEN) 

(XEN) ****************************************

(XEN) Panic on CPU 6:

(XEN) Xen BUG at p2m-ept.c:38

(XEN) ****************************************

(XEN) 

(XEN) Manual reset required ('noreboot' specified)

 

---------------------------------------Works
configuration--------------------------------------------------

And if starts guest like 

xm cr hvm.linux.balloon maxmem=1024 memory=512

the guest can be successfully logon through VNC

 

         Any idea on what happens? 

         PoD is new to me, I will try to know more, thanks.

 

From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com] 
Date: 2010.11.28 10:36
sent: tinnycloud; xen devel
cc: george.dunlap@eu.citrix.com
subject: RE: Xen balloon driver discuss

 

Am I understanding correctly that you are running each linux-2.6.18 as HVM
(not PV)?  I didn’t think that the linux-2.6.18 balloon driver worked at
all in an HVM guest.

 

You also didn’t say what version of Xen you are using.  If you are running
xen-unstable, you should also provide the changeset number.

 

In any case, any load of HVM guests should never crash Xen itself, but if
you are running HVM guests, I probably can’t help much as I almost never
run HVM guests.

 

From: cloudroot [mailto:cloudroot@sina.com] 
Sent: Friday, November 26, 2010 11:55 PM
To: tinnycloud; Dan Magenheimer; xen devel
Cc: george.dunlap@eu.citrix.com
Subject: re: Xen balloon driver discuss

 

Hi Dan:

 

         I have set the benchmark to test balloon driver, but unfortunately
the Xen crashed on memory Panic.

         Before I attach the details output from serial port(which takes
time on next run), I am afraid of I might miss something on test
environment.

 

         My dom0 kernel is 2.6.31, pvops. 

Well currently there is no  driver/xen/balloon.c on this kernel source tree,
so I build the xen-balloon.ko, Xen-platform-pci.ko form 

linux-2.6.18.x86_64, and installed in Dom U, which is redhat 5.4.

 

         What I did is put a C program in the each Dom U(total 24 HVM), the
program will allocate the memory and fill it with random string repeatly.

         And in dom0, a phthon monitor will collect the meminfo from
xenstore and calculate the target to balloon from Committed_AS.

The panic happens when the program is running in just one Dom.

 

         I am writing to ask whether my balloon driver is out of date, or
where can I get the latest source code, 

         I’ve googled a lot, but still have a lot of confusion on those
source tree.

 

         Many thanks. 

         

         

From: tinnycloud [mailto:tinnycloud@hotmail.com] 
Date: 2010.11.23 22:58
TO: 'Dan Magenheimer'; 'xen devel'
CC: 'george.dunlap@eu.citrix.com'
Subject: re: Xen balloon driver discuss

 

HI Dan:

 

         Appreciate for your presentation in summarizing the memory
overcommit, really vivid and in great help.

         Well, I guess recently days the strategy in my mind will fall into
the solution Set C in pdf.

 

         The tmem solution your worked out for memory overcommit is both
efficient and effective. 

         I guess I will have a try on Linux Guest.

 

         The real situation I have is most of the running VMs on host are
windows. So I had to come up those policies to balance the memory.

         Although policies are all workload dependent. Good news is host
workload  is configurable, and not very heavy

So I will try to figure out some favorable policy. The policies referred in
pdf are good start for me.

 

         Today, instead of trying to implement “/proc/meminfo” with shared
pages, I hacked the balloon driver to have another 

         workqueue periodically write meminfo into xenstore through xenbus,
which solve the problem of xenstrore high CPU

         utilization  problem.

 

         Later I will try to google more on how Citrix does.

         Thanks for your help, or do you have any better idea for windows
guest?

         

 

Sent: Dan Magenheimer [mailto:dan.magenheimer@oracle.com] 
Date: 2010.11.23 1:47
To: MaoXiaoyun; xen devel
CC: george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

Xenstore IS slow and you could improve xenballoond performance by only
sending the single CommittedAS value from xenballoond in domU to dom0
instead of all of /proc/meminfo.   But you are making an assumption that
getting memory utilization information from domU to dom0 FASTER (e.g. with a
shared page) will provide better ballooning results.  I have not found this
to be the case, which is what led to my investigation into self-ballooning,
which led to Transcendent Memory.  See the 2010 Xen Summit for more
information.

 

In your last paragraph below “Regards balloon strategy”, the problem is it
is not easy to define “enough memory” and “shortage of memory” within
any guest and almost impossible to define it and effectively load balance
across many guests.  See my Linux Plumber’s Conference presentation (with
complete speaker notes) here:

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmt
VirtEnv-LPC2010-Final.pdf

 

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmt
VirtEnv-LPC2010-SpkNotes.pdf

 

From: MaoXiaoyun [mailto:tinnycloud@hotmail.com] 
Sent: Sunday, November 21, 2010 9:33 PM
To: xen devel
Cc: Dan Magenheimer; george.dunlap@eu.citrix.com
Subject: RE: Xen balloon driver discuss

 

 
Since currently /cpu/meminfo is sent to domain 0 via xenstore, which in my
opinoin is slow.
What I want to do is: there is a shared page between domU and dom0, and domU
periodically
update the meminfo into the page, while on the other side dom0 retrive the
updated data for
caculating the target, which is used by guest for balloning.
 
The problem I met is,  currently I don't know how to implement a shared page
between 
dom0 and domU. 
Would it like dom 0 alloc a unbound event and wait guest to connect, and
transfer date through
grant table?
Or someone has more efficient way?
many thanks.
 
> From: tinnycloud@hotmail.com
> To: xen-devel@lists.xensource.com
> CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
> Subject: Xen balloon driver discuss
> Date: Sun, 21 Nov 2010 14:26:01 +0800
> 
> Hi: 
> Greeting first.
> 
> I was trying to run about 24 HVMS (currently only Linux, later will
> involve Windows) on one physical server with 24GB memory, 16CPUs.
> Each VM is configured with 2GB memory, and I reserved 8GB memory for
> dom0. 
> For safety reason, only domain U's memory is allowed to balloon.
> 
> Inside domain U, I used xenballooned provide by xensource,
> periodically write /proc/meminfo into xenstore in dom
> 0(/local/domain/did/memory/meminfo).
> And in domain 0, I wrote a python script to read the meminfo, like
> xen provided strategy, use Committed_AS to calculate the domain U balloon
> target.
> The time interval is ! 1 seconds.
> 
> Inside each VM, I setup a apache server for test. Well, I'd
> like to say the result is not so good.
> It appears that too much read/write on xenstore, when I give some of
> the stress(by using ab) to guest domains, 
> the CPU usage of xenstore is up to 100%. Thus the monitor running in
> dom0 also response quite slowly.
> Also, in ab test, the Committed_AS grows very fast, reach to maxmem
> in short time, but in fact the only a small amount
> of memory guest really need, so I guess there should be some more to
> be taken into consideration for ballooning.
> 
> For xenstore issue, I first plan to wrote a C program inside domain
> U to replace xenballoond to see whether the situation
> will be refined. If not, how about set up event channel directly for
> domU and dom0, would it be faster?
> 
> Regards balloon strategy, I would do like this, when there ! are
> enough memory , just fulfill the guest balloon request, and when shortage 
> of memory, distribute memory evenly on the guests those request
> inflation.
> 
> Does anyone have better suggestion, thanks in advance.
> 


[-- Attachment #1.2: Type: text/html, Size: 62522 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Xen balloon driver discuss
       [not found]           ` <002b01cb8f8f$852bda10$8f838e30$@maoxy@aliyun-inc.com>
  2010-11-29  8:37             ` tinnycloud
@ 2010-11-29 10:09             ` George Dunlap
  1 sibling, 0 replies; 32+ messages in thread
From: George Dunlap @ 2010-11-29 10:09 UTC (permalink / raw)
  To: xiaoyun.maoxy
  Cc: 'tinnycloud', 'xen devel', Dan Magenheimer, Chu Rui

No, you're confusing two things.  pod_entries is the number of entries
in the p2m table that have neither been populated with memory, nor been
reclaimed by the balloon driver.

Are you sure the balloon driver is actually working?

Chu: Yes, the PoD "cache" is the memory pool which is used to populate
PoD entries.  "Cache" is a bad name, I should have called it "pool" to
begin with.

 -George

On 29/11/10 06:34, xiaoyun.maoxy wrote:
> Hi George:
> 
> I read 
> http://lists.xensource.com/archives/html/xen-devel/2010-07/msg01404.html 
> more carefully, and got my print out of
> 
> first call of p2m_pod_demand_populate(), which is :
> 
> houyi-chunk2.dev.sd.aliyun.com login: blktap_sysfs_create: adding 
> attributes for dev ffff880122466400
> 
> (XEN) p2m_pod_demand_populate: =========pulate-on-demand memory! 
> tot_pages 132088 pod_entries 523776
> 
> And memory/target under /local/domain/1/ is 524288.
> 
> So 523776 is less than 524288, I think the problem is similar, right?
> 
> But the question is why the patch doesn’t work for me.
> 
> Many thanks.
> 
> *From:* tinnycloud [mailto:tinnycloud@hotmail.com]
> *Date:* 2010-11-29 12:21
> *To:* 'Dan Magenheimer'; 'xen devel'
> *CC:* 'george.dunlap@eu.citrix.com'
> *Subject:* re: Xen balloon driver discuss
> 
> Hi Dan:
> 
> You are right, the HVM guest is kernel-2.6.18-164.el5.src.rpm, coming 
> from ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/SRPMS/
> 
> Currently the balloon driver is compiled from this kernel. (So I am 
> afraid of if the driver may out of date, and I plan to get new balloon.c 
> from xenlinux and put it into this kernel to compile a new xen-balloon.ko)
> 
> My xen is 4.0.0, again pvops kernel 2.6.31
> 
> Actually, I have two problems, first is PoD “populate-on-demand memory” 
> issue, and second is xen panic(I will get more test and report on 
> another reply)
> 
> I have googled some and apply the patch from 
> http://lists.xensource.com/archives/html/xen-devel/2010-07/msg01404.html, but 
> it doesn’t work for me.
> 
> -------------------------------------------------Domain Crash 
> Case---------------------------------------------
> 
> The issue is easy to reproduce, I started one HVM with command line:
> 
> xm cr hvm.linux.balloon maxmem=2048 memory=512
> 
> the guest works well at first, but crashed as long as I logined into it 
> throught VNC
> 
> the serial output is:
> 
> blktap_sysfs_create: adding attributes for dev ffff8801224df000
> 
> (XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 
> tot_pages 132088 pod_entries 9489
> 
> (XEN) domain_crash called from p2m.c:1127
> 
> (XEN) Domain 4 reported crashed by domain 0 on cpu#0:
> 
> (XEN) printk: 31 messages suppressed.
> 
> (XEN) grant_table.c:555:d0 Iomem mapping not permitted ffffffffffffffff 
> (domain 4)
> 
> blktap_sysfs_destroy
> 
> blktap_sysfs_create: adding attributes for dev ffff88012259ca00
> 
> -------------------------------------------------Xen Crash 
> Case---------------------------------------------
> 
> In addition, if start guest like
> 
> m cr hvm.linux.balloon maxmem=2048 memory=400
> 
> blktap_sysfs_destroy
> 
> blktap_sysfs_create: adding attributes for dev ffff8801224df000
> 
> (XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! 
> tot_pages 132088 pod_entries 9489
> 
> (XEN) domain_crash called from p2m.c:1127
> 
> (XEN) Domain 4 reported crashed by domain 0 on cpu#0:
> 
> (XEN) printk: 31 messages suppressed.
> 
> (XEN) grant_table.c:555:d0 Iomem mapping not permitted ffffffffffffffff 
> (domain 4)
> 
> blktap_sysfs_destroy
> 
> blktap_sysfs_create: adding attributes for dev ffff88012259ca00
> 
> blktap_sysfs_destroy
> 
> blktap_sysfs_create: adding attributes for dev ffff88012259c600
> 
> (XEN) Error: p2m lock held by p2m_change_type
> 
> (XEN) Xen BUG at p2m-ept.c:38
> 
> (XEN) ----[ Xen-4.0.0 x86_64 debug=n Not tainted ]----
> 
> (XEN) CPU: 6
> 
> (XEN) RIP: e008:[<ffff82c4801df2aa>] ept_pod_check_and_populate+0x13a/0x150
> 
> (XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
> 
> (XEN) rax: 0000000000000000 rbx: ffff83063fdc0000 rcx: 0000000000000092
> 
> (XEN) rdx: 000000000000000a rsi: 000000000000000a rdi: ffff82c48021e844
> 
> (XEN) rbp: ffff83023fefff28 rsp: ffff83023feffc18 r8: 0000000000000001
> 
> (XEN) r9: 0000000000000001 r10: 0000000000000000 r11: ffff82c4801318d0
> 
> (XEN) r12: ffff8302f5914ef8 r13: 0000000000000001 r14: 0000000000000000
> 
> (XEN) r15: 0000000000003bdf cr0: 0000000080050033 cr4: 00000000000026f0
> 
> (XEN) cr3: 000000063fc2e000 cr2: 00002ba99c046000
> 
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> 
> (XEN) Xen stack trace from rsp=ffff83023feffc18:
> 
> (XEN) 0000000000000002 0000000000000000 0000000000000000 ffff83063fdc0000
> 
> (XEN) ffff8302f5914ef8 0000000000000001 ffff83023feffc70 ffff82c4801df46e
> 
> (XEN) 0000000000000000 ffff83023feffcc4 0000000000003bdf 00000000000001df
> 
> (XEN) ffff8302f5914000 ffff83063fdc0000 ffff83023fefff28 0000000000003bdf
> 
> (XEN) 0000000000000002 0000000000000001 0000000000000030 ffff82c4801bafe4
> 
> (XEN) ffff8302f89dc000 000000043fefff28 ffff83023fefff28 0000000000003bdf
> 
> (XEN) 00000000002f9223 0000000000000030 ffff83023fefff28 ffff82c48019bab1
> 
> (XEN) 0000000000000000 00000001bdc62000 0000000000000000 0000000000000182
> 
> (XEN) ffff8300bdc62000 ffff82c4801b3824 ffff83063fdc0348 07008300bdc62000
> 
> (XEN) ffff83023fe808d0 0000000000000040 000000063fc3601e 0000000000000000
> 
> (XEN) ffff83023fefff28 ffff82c480167d17 ffff82c4802509c0 0000000000000000
> 
> (XEN) 0000000003bdf000 000000000001c000 ffff83023feffdc8 0000000000000080
> 
> (XEN) ffff82c480250dd0 0000000000003bdf 00ff82c480250080 ffff82c480250dc0
> 
> (XEN) ffff82c480250080 ffff82c480250dc0 0000000000004040 0000000000000000
> 
> (XEN) 0000000000004040 0000000000000040 ffff82c4801447da 0000000000000080
> 
> (XEN) ffff83023fefff28 0000000000000092 ffff82c4801a7f6c 00000000000000fc
> 
> (XEN) 0000000000000092 0000000000000006 ffff8300bdc63760 0000000000000006
> 
> (XEN) ffff82c48025c100 ffff82c480250100 ffff82c480250100 0000000000000292
> 
> (XEN) ffff8300bdc637f0 00000249b30f6a00 0000000000000292 ffff82c4801a9383
> 
> (XEN) 00000000000000ef ffff8300bdc62000 ffff8300bdc62000 ffff8300bdc637e8
> 
> (XEN) Xen call trace:
> 
> (XEN) [<ffff82c4801df2aa>] ept_pod_check_and_populate+0x13a/0x150
> 
> (XEN) [<ffff82c4801df46e>] ept_get_entry+0x1ae/0x1c0
> 
> (XEN) [<ffff82c4801bafe4>] p2m_change_type+0x144/0x1b0
> 
> (XEN) [<ffff82c48019bab1>] hvm_hap_nested_page_fault+0x121/0x190
> 
> (XEN) [<ffff82c4801b3824>] vmx_vmexit_handler+0x304/0x1a90
> 
> (XEN) [<ffff82c480167d17>] __smp_call_function_interrupt+0x57/0x90
> 
> (XEN) [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> 
> (XEN) [<ffff82c4801a7f6c>] vpic_get_highest_priority_irq+0x2c/0xa0
> 
> (XEN) [<ffff82c4801a9383>] pt_update_irq+0x33/0x1e0
> 
> (XEN) [<ffff82c4801a6042>] vlapic_has_pending_irq+0x42/0x70
> 
> (XEN) [<ffff82c4801a0c88>] hvm_vcpu_has_pending_irq+0x88/0xa0
> 
> (XEN) [<ffff82c4801b263b>] vmx_vmenter_helper+0x5b/0x150
> 
> (XEN) [<ffff82c4801ada63>] vmx_asm_do_vmentry+0x0/0xdd
> 
> (XEN)
> 
> (XEN)
> 
> (XEN) ****************************************
> 
> (XEN) Panic on CPU 6:
> 
> (XEN) Xen BUG at p2m-ept.c:38
> 
> (XEN) ****************************************
> 
> (XEN)
> 
> (XEN) Manual reset required ('noreboot' specified)
> 
> ---------------------------------------Works 
> configuration--------------------------------------------------
> 
> And if starts guest like
> 
> xm cr hvm.linux.balloon maxmem=1024 memory=512
> 
> the guest can be successfully logon through VNC
> 
> Any idea on what happens?
> 
> PoD is new to me, I will try to know more, thanks.
> 
> *From:* Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> *Date:* 2010.11.28 10:36
> *sent:* tinnycloud; xen devel
> *cc:* george.dunlap@eu.citrix.com
> *subject:* RE: Xen balloon driver discuss
> 
> Am I understanding correctly that you are running each linux-2.6.18 as 
> HVM (not PV)? I didn’t think that the linux-2.6.18 balloon driver worked 
> at all in an HVM guest.
> 
> You also didn’t say what version of Xen you are using. If you are 
> running xen-unstable, you should also provide the changeset number.
> 
> In any case, any load of HVM guests should never crash Xen itself, but 
> if you are running HVM guests, I probably can’t help much as I almost 
> never run HVM guests.
> 
> *From:* cloudroot [mailto:cloudroot@sina.com]
> *Sent:* Friday, November 26, 2010 11:55 PM
> *To:* tinnycloud; Dan Magenheimer; xen devel
> *Cc:* george.dunlap@eu.citrix.com
> *Subject:* re: Xen balloon driver discuss
> 
> Hi Dan:
> 
> I have set the benchmark to test balloon driver, but unfortunately the 
> Xen crashed on memory Panic.
> 
> Before I attach the details output from serial port(which takes time on 
> next run), I am afraid of I might miss something on test environment.
> 
> My dom0 kernel is 2.6.31, pvops.
> 
> Well currently there is no driver/xen/balloon.c on this kernel source 
> tree, so I build the xen-balloon.ko, Xen-platform-pci.ko form
> 
> linux-2.6.18.x86_64, and installed in Dom U, which is redhat 5.4.
> 
> What I did is put a C program in the each Dom U(total 24 HVM), the 
> program will allocate the memory and fill it with random string repeatly.
> 
> And in dom0, a phthon monitor will collect the meminfo from xenstore and 
> calculate the target to balloon from Committed_AS.
> 
> The panic happens when the program is running in just one Dom.
> 
> I am writing to ask whether my balloon driver is out of date, or where 
> can I get the latest source code,
> 
> I’ve googled a lot, but still have a lot of confusion on those source tree.
> 
> Many thanks.
> 
> *From:* tinnycloud [mailto:tinnycloud@hotmail.com]
> *Date:* 2010.11.23 22:58
> *TO:* 'Dan Magenheimer'; 'xen devel'
> *CC:* 'george.dunlap@eu.citrix.com'
> *Subject:* re: Xen balloon driver discuss
> 
> HI Dan:
> 
> Appreciate for your presentation in summarizing the memory overcommit, 
> really vivid and in great help.
> 
> Well, I guess recently days the strategy in my mind will fall into the 
> solution Set C in pdf.
> 
> The tmem solution your worked out for memory overcommit is both 
> efficient and effective.
> 
> I guess I will have a try on Linux Guest.
> 
> The real situation I have is most of the running VMs on host are 
> windows. So I had to come up those policies to balance the memory.
> 
> Although policies are all workload dependent. Good news is host workload 
> is configurable, and not very heavy
> 
> So I will try to figure out some favorable policy. The policies referred 
> in pdf are good start for me.
> 
> Today, instead of trying to implement “/proc/meminfo” with shared pages, 
> I hacked the balloon driver to have another
> 
> workqueue periodically write meminfo into xenstore through xenbus, which 
> solve the problem of xenstrore high CPU
> 
> utilization problem.
> 
> Later I will try to google more on how Citrix does.
> 
> Thanks for your help, or do you have any better idea for windows guest?
> 
> *Sent:* Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> *Date:* 2010.11.23 1:47
> *To:* MaoXiaoyun; xen devel
> *CC:* george.dunlap@eu.citrix.com
> *Subject:* RE: Xen balloon driver discuss
> 
> Xenstore IS slow and you could improve xenballoond performance by only 
> sending the single CommittedAS value from xenballoond in domU to dom0 
> instead of all of /proc/meminfo. But you are making an assumption that 
> getting memory utilization information from domU to dom0 FASTER (e.g. 
> with a shared page) will provide better ballooning results. I have not 
> found this to be the case, which is what led to my investigation into 
> self-ballooning, which led to Transcendent Memory. See the 2010 Xen 
> Summit for more information.
> 
> In your last paragraph below “Regards balloon strategy”, the problem is 
> it is not easy to define “enough memory” and “shortage of memory” within 
> any guest and almost impossible to define it and effectively load 
> balance across many guests. See my Linux Plumber’s Conference 
> presentation (with complete speaker notes) here:
> 
> http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-Final.pdf
> 
> http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-SpkNotes.pdf
> 
> *From:* MaoXiaoyun [mailto:tinnycloud@hotmail.com]
> *Sent:* Sunday, November 21, 2010 9:33 PM
> *To:* xen devel
> *Cc:* Dan Magenheimer; george.dunlap@eu.citrix.com
> *Subject:* RE: Xen balloon driver discuss
> 
> 
> Since currently /cpu/meminfo is sent to domain 0 via xenstore, which in 
> my opinoin is slow.
> What I want to do is: there is a shared page between domU and dom0, and 
> domU periodically
> update the meminfo into the page, while on the other side dom0 retrive 
> the updated data for
> caculating the target, which is used by guest for balloning.
> 
> The problem I met is, currently I don't know how to implement a shared 
> page between
> dom0 and domU.
> Would it like dom 0 alloc a unbound event and wait guest to connect, and 
> transfer date through
> grant table?
> Or someone has more efficient way?
> many thanks.
> 
>>  From: tinnycloud@hotmail.com
>>  To: xen-devel@lists.xensource.com
>>  CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
>>  Subject: Xen balloon driver discuss
>>  Date: Sun, 21 Nov 2010 14:26:01 +0800
>>
>>  Hi:
>>  Greeting first.
>>
>>  I was trying to run about 24 HVMS (currently only Linux, later will
>>  involve Windows) on one physical server with 24GB memory, 16CPUs.
>>  Each VM is configured with 2GB memory, and I reserved 8GB memory for
>>  dom0.
>>  For safety reason, only domain U's memory is allowed to balloon.
>>
>>  Inside domain U, I used xenballooned provide by xensource,
>>  periodically write /proc/meminfo into xenstore in dom
>>  0(/local/domain/did/memory/meminfo).
>>  And in domain 0, I wrote a python script to read the meminfo, like
>>  xen provided strategy, use Committed_AS to calculate the domain U balloon
>>  target.
>>  The time interval is ! 1 seconds.
>>
>>  Inside each VM, I setup a apache server for test. Well, I'd
>>  like to say the result is not so good.
>>  It appears that too much read/write on xenstore, when I give some of
>>  the stress(by using ab) to guest domains,
>>  the CPU usage of xenstore is up to 100%. Thus the monitor running in
>>  dom0 also response quite slowly.
>>  Also, in ab test, the Committed_AS grows very fast, reach to maxmem
>>  in short time, but in fact the only a small amount
>>  of memory guest really need, so I guess there should be some more to
>>  be taken into consideration for ballooning.
>>
>>  For xenstore issue, I first plan to wrote a C program inside domain
>>  U to replace xenballoond to see whether the situation
>>  will be refined. If not, how about set up event channel directly for
>>  domU and dom0, would it be faster?
>>
>>  Regards balloon strategy, I would do like this, when there ! are
>>  enough memory , just fulfill the guest balloon request, and when shortage
>>  of memory, distribute memory evenly on the guests those request
>>  inflation.
>>
>>  Does anyone have better suggestion, thanks in advance.
>>
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Xen balloon driver discuss
  2010-11-28  2:36         ` Dan Magenheimer
                             ` (2 preceding siblings ...)
       [not found]           ` <002b01cb8f8f$852bda10$8f838e30$@maoxy@aliyun-inc.com>
@ 2010-11-29 10:12           ` George Dunlap
  2010-11-29 15:42             ` Dan Magenheimer
  3 siblings, 1 reply; 32+ messages in thread
From: George Dunlap @ 2010-11-29 10:12 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: cloudroot, tinnycloud, xen devel

FYI, the balloon driver in 2.6.18 was meant to be working at some point. 
  The xen tree has some drivers which will compile for 2.6.18 externally 
and will run in HVM mode.  More modern kernels need Stefano's pv-on-hvm 
patch series to be able to access xenstore (which is a requisite for a 
working balloon driver).

  -George

On 28/11/10 02:36, Dan Magenheimer wrote:
> Am I understanding correctly that you are running each linux-2.6.18 as
> HVM (not PV)? I didn’t think that the linux-2.6.18 balloon driver worked
> at all in an HVM guest.
>
> You also didn’t say what version of Xen you are using. If you are
> running xen-unstable, you should also provide the changeset number.
>
> In any case, any load of HVM guests should never crash Xen itself, but
> if you are running HVM guests, I probably can’t help much as I almost
> never run HVM guests.
>
> *From:* cloudroot [mailto:cloudroot@sina.com]
> *Sent:* Friday, November 26, 2010 11:55 PM
> *To:* tinnycloud; Dan Magenheimer; xen devel
> *Cc:* george.dunlap@eu.citrix.com
> *Subject:* re: Xen balloon driver discuss
>
> Hi Dan:
>
> I have set the benchmark to test balloon driver, but unfortunately the
> Xen crashed on memory Panic.
>
> Before I attach the details output from serial port(which takes time on
> next run), I am afraid of I might miss something on test environment.
>
> My dom0 kernel is 2.6.31, pvops.
>
> Well currently there is no driver/xen/balloon.c on this kernel source
> tree, so I build the xen-balloon.ko, Xen-platform-pci.ko form
>
> linux-2.6.18.x86_64, and installed in Dom U, which is redhat 5.4.
>
> What I did is put a C program in the each Dom U(total 24 HVM), the
> program will allocate the memory and fill it with random string repeatly.
>
> And in dom0, a phthon monitor will collect the meminfo from xenstore and
> calculate the target to balloon from Committed_AS.
>
> The panic happens when the program is running in just one Dom.
>
> I am writing to ask whether my balloon driver is out of date, or where
> can I get the latest source code,
>
> I’ve googled a lot, but still have a lot of confusion on those source tree.
>
> Many thanks.
>
> *From:* tinnycloud [mailto:tinnycloud@hotmail.com]
> *Date:* 2010.11.23 22:58
> *TO:* 'Dan Magenheimer'; 'xen devel'
> *CC:* 'george.dunlap@eu.citrix.com'
> *Subject:* re: Xen balloon driver discuss
>
> HI Dan:
>
> Appreciate for your presentation in summarizing the memory overcommit,
> really vivid and in great help.
>
> Well, I guess recently days the strategy in my mind will fall into the
> solution Set C in pdf.
>
> The tmem solution your worked out for memory overcommit is both
> efficient and effective.
>
> I guess I will have a try on Linux Guest.
>
> The real situation I have is most of the running VMs on host are
> windows. So I had to come up those policies to balance the memory.
>
> Although policies are all workload dependent. Good news is host workload
> is configurable, and not very heavy
>
> So I will try to figure out some favorable policy. The policies referred
> in pdf are good start for me.
>
> Today, instead of trying to implement “/proc/meminfo” with shared pages,
> I hacked the balloon driver to have another
>
> workqueue periodically write meminfo into xenstore through xenbus, which
> solve the problem of xenstrore high CPU
>
> utilization problem.
>
> Later I will try to google more on how Citrix does.
>
> Thanks for your help, or do you have any better idea for windows guest?
>
> *Sent:* Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> *Date:* 2010.11.23 1:47
> *To:* MaoXiaoyun; xen devel
> *CC:* george.dunlap@eu.citrix.com
> *Subject:* RE: Xen balloon driver discuss
>
> Xenstore IS slow and you could improve xenballoond performance by only
> sending the single CommittedAS value from xenballoond in domU to dom0
> instead of all of /proc/meminfo. But you are making an assumption that
> getting memory utilization information from domU to dom0 FASTER (e.g.
> with a shared page) will provide better ballooning results. I have not
> found this to be the case, which is what led to my investigation into
> self-ballooning, which led to Transcendent Memory. See the 2010 Xen
> Summit for more information.
>
> In your last paragraph below “Regards balloon strategy”, the problem is
> it is not easy to define “enough memory” and “shortage of memory” within
> any guest and almost impossible to define it and effectively load
> balance across many guests. See my Linux Plumber’s Conference
> presentation (with complete speaker notes) here:
>
> http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-Final.pdf
>
> http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-SpkNotes.pdf
>
> *From:* MaoXiaoyun [mailto:tinnycloud@hotmail.com]
> *Sent:* Sunday, November 21, 2010 9:33 PM
> *To:* xen devel
> *Cc:* Dan Magenheimer; george.dunlap@eu.citrix.com
> *Subject:* RE: Xen balloon driver discuss
>
>
> Since currently /cpu/meminfo is sent to domain 0 via xenstore, which in
> my opinoin is slow.
> What I want to do is: there is a shared page between domU and dom0, and
> domU periodically
> update the meminfo into the page, while on the other side dom0 retrive
> the updated data for
> caculating the target, which is used by guest for balloning.
>
> The problem I met is, currently I don't know how to implement a shared
> page between
> dom0 and domU.
> Would it like dom 0 alloc a unbound event and wait guest to connect, and
> transfer date through
> grant table?
> Or someone has more efficient way?
> many thanks.
>
>>  From: tinnycloud@hotmail.com
>>  To: xen-devel@lists.xensource.com
>>  CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
>>  Subject: Xen balloon driver discuss
>>  Date: Sun, 21 Nov 2010 14:26:01 +0800
>>
>>  Hi:
>>  Greeting first.
>>
>>  I was trying to run about 24 HVMS (currently only Linux, later will
>>  involve Windows) on one physical server with 24GB memory, 16CPUs.
>>  Each VM is configured with 2GB memory, and I reserved 8GB memory for
>>  dom0.
>>  For safety reason, only domain U's memory is allowed to balloon.
>>
>>  Inside domain U, I used xenballooned provide by xensource,
>>  periodically write /proc/meminfo into xenstore in dom
>>  0(/local/domain/did/memory/meminfo).
>>  And in domain 0, I wrote a python script to read the meminfo, like
>>  xen provided strategy, use Committed_AS to calculate the domain U balloon
>>  target.
>>  The time interval is ! 1 seconds.
>>
>>  Inside each VM, I setup a apache server for test. Well, I'd
>>  like to say the result is not so good.
>>  It appears that too much read/write on xenstore, when I give some of
>>  the stress(by using ab) to guest domains,
>>  the CPU usage of xenstore is up to 100%. Thus the monitor running in
>>  dom0 also response quite slowly.
>>  Also, in ab test, the Committed_AS grows very fast, reach to maxmem
>>  in short time, but in fact the only a small amount
>>  of memory guest really need, so I guess there should be some more to
>>  be taken into consideration for ballooning.
>>
>>  For xenstore issue, I first plan to wrote a C program inside domain
>>  U to replace xenballoond to see whether the situation
>>  will be refined. If not, how about set up event channel directly for
>>  domU and dom0, would it be faster?
>>
>>  Regards balloon strategy, I would do like this, when there ! are
>>  enough memory , just fulfill the guest balloon request, and when shortage
>>  of memory, distribute memory evenly on the guests those request
>>  inflation.
>>
>>  Does anyone have better suggestion, thanks in advance.
>>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Reply: [Xen-devel] Xen balloon driver discuss
  2010-11-29  6:56   ` Chu Rui
@ 2010-11-29 10:55     ` tinnycloud
  2010-11-29 11:19       ` George Dunlap
  0 siblings, 1 reply; 32+ messages in thread
From: tinnycloud @ 2010-11-29 10:55 UTC (permalink / raw)
  To: 'Chu Rui'; +Cc: George.Dunlap, dan.magenheimer, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1441 bytes --]

Well, I think I know the problem.

 

The PoD code expects the guest domain to balloon down from its max memory right
when it starts.

But in fact, I only have the balloon driver installed in domain U; no actual
ballooning is happening.

 

So that is, if we run out of PoD cache before the balloon driver works, Xen will
crash the domain (goto out_of_memory), 

and in this situation domain U swap is not available (dom U can’t use swap
memory), right?

 

And when ballooning actually works, the PoD cache will finally decrease to 0,
and no longer be used any more, right?

 

In my understanding, the PoD cache is much like a memory pool used for domain
initialization; this reminds me of tmem,

which is a pool of all host memory. But tmem needs guest OS
modification, while PoD supports HVM, 

so could we use this method to implement tmem-like memory overcommit?

 

 

From: Chu Rui [mailto:ruichu@gmail.com] 
TO: tinnycloud
CC: xen-devel@lists.xensource.com; George.Dunlap@eu.citrix.com;
dan.magenheimer@oracle.com
Subject: Re: [Xen-devel] Xen balloon driver discuss

 

I am also interested with tinnycloud's problem.

It looks that the pod cache has been used up like this:

 

    if ( p2md->pod.count == 0 )
        goto out_of_memory;

 

George, would you please take a look on this problem, and, if possbile, tell
a little more about what does PoD cache mean? Is it a memory pool for PoD
allocation?


[-- Attachment #1.2: Type: text/html, Size: 8730 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Reply: [Xen-devel] Xen balloon driver discuss
  2010-11-29 10:55     ` Reply: [Xen-devel] " tinnycloud
@ 2010-11-29 11:19       ` George Dunlap
  2010-11-29 15:41         ` hotmaim
  2010-11-30  3:51         ` RE: [Xen-devel] Xen balloon driver discuss Chu Rui
  0 siblings, 2 replies; 32+ messages in thread
From: George Dunlap @ 2010-11-29 11:19 UTC (permalink / raw)
  To: tinnycloud; +Cc: Dan Magenheimer, xen-devel, 'Chu Rui'

On 29/11/10 10:55, tinnycloud wrote:
> So that is, if we run out of PoD cache before balloon works, Xen will 
> crash domain(goto out_of_memory),

That's right; PoD is only meant to allow a guest to run from boot until
the balloon driver can load.  It's to allow a guest to "boot ballooned."

> and at this situation, domain U swap(dom U can’t use swap memory) is not 
> available , right?

I don't believe swap and PoD are integrated at the moment, no.

> And when balloon actually works, the pod cached will finally decrease to 
> 0, and no longer be used any more, right?

Conceptually, yes.  What actually happens is that ballooning will reduce
it so that pod_entries==cache_size.  Entries will stay PoD until the
guest touches them.  It's likely that eventually the guest will touch
all the pages, at which point the PoD cache will be 0.

> could we use this method to implement a tmem like memory overcommit?

PoD does require guest knowledge -- it requires the balloon driver to be
loaded soon after boot so that the guest will limit its memory usage.
 It also doesn't allow overcommit.  Memory in the PoD cache is already
allocated to the VM, and can't be used for something else.

You can't do overcommit without either:
* The guest knowing that it might not get the memory back, and being OK
with that (tmem), or
* Swapping, which doesn't require PoD at all.

If you're thinking about scanning for zero pages and automatically
reclaiming them, for instance, you have to be able to deal with a
situation where the guest decides to use a page you've reclaimed but
you've already given your last free page to someone else, and there are
no more zero pages anywhere on the system.  That would mean either just
pausing the VM indefinitely, or choosing another guest page to swap out.

 -George

> 
> *From:* Chu Rui [mailto:ruichu@gmail.com]
> *TO:* tinnycloud
> *CC:* xen-devel@lists.xensource.com; George.Dunlap@eu.citrix.com; 
> dan.magenheimer@oracle.com
> *Subject:* Re: [Xen-devel] Xen balloon driver discuss
> 
> I am also interested with tinnycloud's problem.
> 
> It looks that the pod cache has been used up like this:
> 
> if ( p2md->pod.count == 0 )
> goto out_of_memory;
> 
> George, would you please take a look on this problem, and, if possbile, 
> tell a little more about what does PoD cache mean? Is it a memory pool 
> for PoD allocation?
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: re: Xen balloon driver discuss
  2010-11-29 11:19       ` George Dunlap
@ 2010-11-29 15:41         ` hotmaim
  2010-11-30 10:50           ` George Dunlap
  2010-11-30  3:51         ` RE: [Xen-devel] Xen balloon driver discuss Chu Rui
  1 sibling, 1 reply; 32+ messages in thread
From: hotmaim @ 2010-11-29 15:41 UTC (permalink / raw)
  To: George Dunlap; +Cc: Dan Magenheimer, xen-devel, Chu Rui

Hi George:

        Thanks for the details; I understand more now, but still have some confusion.
   1. Is it necessary to balloon to max - target right at domU startup? (For xm cr xxx.hvm maxmem=2048 memory=1024, max - target is 2048 - 1024.) Say, is it safe to balloon so that the guest has only 512M of memory in total? Or 1536M (in this situation, I guess the PoD entry count will
also be reduced and an extra 512M of memory will be added to the PoD cache, right)?

2. Suppose we have a Xen-wide PoD memory pool that is accessible to every guest domain: when a guest
needs a page, it gets the page from the pool, and we can still use a
balloon strategy to have the guest free pages back to the pool.  So as long as the amount of memory
in use by all domains is less than host physical memory, we are safe. And when no memory is available
from the host, a domain that needs new memory may pause to wait for others to free some, or use swap; is that possible?

3. If item 2 is possible, it looks more like tmem. What does tmem do when the total memory requested is larger than host physical memory? I will take a detailed look tomorrow.

Thanks for your kind help.



From my iPad

2010-11-29,19:19,George Dunlap <George.Dunlap@eu.citrix.com> :

> On 29/11/10 10:55, tinnycloud wrote:
>> So that is, if we run out of PoD cache before balloon works, Xen will 
>> crash domain(goto out_of_memory),
> 
> That's right; PoD is only meant to allow a guest to run from boot until
> the balloon driver can load.  It's to allow a guest to "boot ballooned."
> 
>> and at this situation, domain U swap(dom U can’t use swap memory) is not 
>> available , right?
> 
> I don't believe swap and PoD are integrated at the moment, no.
> 
>> And when balloon actually works, the pod cached will finally decrease to 
>> 0, and no longer be used any more, right?
> 
> Conceptually, yes.  What actually happens is that ballooning will reduce
> it so that pod_entries==cache_size.  Entries will stay PoD until the
> guest touches them.  It's likely that eventually the guest will touch
> all the pages, at which point the PoD cache will be 0.
> 
>> could we use this method to implement a tmem like memory overcommit?
> 
> PoD does require guest knowledge -- it requires the balloon driver to be
> loaded soon after boot so the so the guest will limit its memory usage.
> It also doesn't allow overcommit.  Memory in the PoD cache is already
> allocated to the VM, and can't be used for something else.
> 
> You can't to overcommit without either:
> * The guest knowing that it might not get the memory back, and being OK
> with that (tmem), or
> * Swapping, which doesn't require PoD at all.
> 
> If you're thinking about scanning for zero pages and automatically
> reclaiming them, for instance, you have to be able to deal with a
> situation where the guest decides to use a page you've reclaimed but
> you've already given your last free page to someone else, and there are
> no more zero pages anywhere on the system.  That would mean either just
> pausing the VM indefinitely, or choosing another guest page to swap out.
> 
> -George
> 
>> 
>> *From:* Chu Rui [mailto:ruichu@gmail.com]
>> *TO:* tinnycloud
>> *CC:* xen-devel@lists.xensource.com; George.Dunlap@eu.citrix.com; 
>> dan.magenheimer@oracle.com
>> *Subject:* Re: [Xen-devel] Xen balloon driver discuss
>> 
>> I am also interested with tinnycloud's problem.
>> 
>> It looks that the pod cache has been used up like this:
>> 
>> if ( p2md->pod.count == 0 )
>> goto out_of_memory;
>> 
>> George, would you please take a look on this problem, and, if possbile, 
>> tell a little more about what does PoD cache mean? Is it a memory pool 
>> for PoD allocation?
>> 
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Re: Xen balloon driver discuss
  2010-11-29 10:12           ` George Dunlap
@ 2010-11-29 15:42             ` Dan Magenheimer
  0 siblings, 0 replies; 32+ messages in thread
From: Dan Magenheimer @ 2010-11-29 15:42 UTC (permalink / raw)
  To: George Dunlap; +Cc: cloudroot, tinnycloud, devel

Yes, sorry, I was thinking about the upstream balloon driver
which fails to init if (!xen_pv_domain()).

The only other problem I can think of in the RH5 balloon
driver is I think there is no minimum-size check... i.e. if
you try to balloon to a very small size (which can happen
accidentally if you use the wrong units to /proc/xen/balloon),
the guest kernel will crash.
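
A minimal sketch of the kind of floor check being described, as it might sit
in a balloon driver's target handling (the constants and names here are
invented for illustration, not taken from the RH5 driver):

    /* Hypothetical lower bound on the balloon target, so that a mis-scaled
     * write (e.g. pages where KiB were expected) cannot shrink the guest
     * below what its kernel needs to stay alive.  Numbers are arbitrary. */
    #define MIN_RESERVE_PAGES  (16UL * 1024 * 1024 / 4096)    /* keep >= 16 MiB */

    static unsigned long clamp_balloon_target(unsigned long requested_pages,
                                              unsigned long boot_pages)
    {
        unsigned long floor = boot_pages / 16;      /* 1/16 of boot memory...    */
        if ( floor < MIN_RESERVE_PAGES )
            floor = MIN_RESERVE_PAGES;              /* ...but never under 16 MiB */
        return requested_pages < floor ? floor : requested_pages;
    }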

> -----Original Message-----
> From: George Dunlap [mailto:George.Dunlap@eu.citrix.com]
> Sent: Monday, November 29, 2010 3:12 AM
> To: Dan Magenheimer
> Cc: cloudroot; tinnycloud; xen devel
> Subject: [Xen-devel] Re: Xen balloon driver discuss
> 
> FYI, the balloon driver in 2.6.18 was meant to be working at some
> point.
>   The xen tree has some drivers which will compile for 2.6.18
> externally
> and will run in HVM mode.  More modern kernels need Stefano's pv-on-hvm
> patch series to be able to access xenstore (which is a requisite for a
> working balloon driver).
> 
>   -George
> 
> On 28/11/10 02:36, Dan Magenheimer wrote:
> > Am I understanding correctly that you are running each linux-2.6.18
> as
> > HVM (not PV)? I didn't think that the linux-2.6.18 balloon driver
> worked
> > at all in an HVM guest.
> >
> > You also didn't say what version of Xen you are using. If you are
> > running xen-unstable, you should also provide the changeset number.
> >
> > In any case, any load of HVM guests should never crash Xen itself,
> but
> > if you are running HVM guests, I probably can't help much as I almost
> > never run HVM guests.
> >
> > *From:* cloudroot [mailto:cloudroot@sina.com]
> > *Sent:* Friday, November 26, 2010 11:55 PM
> > *To:* tinnycloud; Dan Magenheimer; xen devel
> > *Cc:* george.dunlap@eu.citrix.com
> > *Subject:* re: Xen balloon driver discuss
> >
> > Hi Dan:
> >
> > I have set the benchmark to test balloon driver, but unfortunately
> the
> > Xen crashed on memory Panic.
> >
> > Before I attach the details output from serial port(which takes time
> on
> > next run), I am afraid of I might miss something on test environment.
> >
> > My dom0 kernel is 2.6.31, pvops.
> >
> > Well currently there is no driver/xen/balloon.c on this kernel source
> > tree, so I build the xen-balloon.ko, Xen-platform-pci.ko form
> >
> > linux-2.6.18.x86_64, and installed in Dom U, which is redhat 5.4.
> >
> > What I did is put a C program in the each Dom U(total 24 HVM), the
> > program will allocate the memory and fill it with random string
> repeatly.
> >
> > And in dom0, a phthon monitor will collect the meminfo from xenstore
> and
> > calculate the target to balloon from Committed_AS.
> >
> > The panic happens when the program is running in just one Dom.
> >
> > I am writing to ask whether my balloon driver is out of date, or
> where
> > can I get the latest source code,
> >
> > I've googled a lot, but still have a lot of confusion on those source
> tree.
> >
> > Many thanks.
> >
> > *From:* tinnycloud [mailto:tinnycloud@hotmail.com]
> > *Date:* 2010.11.23 22:58
> > *TO:* 'Dan Magenheimer'; 'xen devel'
> > *CC:* 'george.dunlap@eu.citrix.com'
> > *Subject:* re: Xen balloon driver discuss
> >
> > HI Dan:
> >
> > Appreciate for your presentation in summarizing the memory
> overcommit,
> > really vivid and in great help.
> >
> > Well, I guess recently days the strategy in my mind will fall into
> the
> > solution Set C in pdf.
> >
> > The tmem solution your worked out for memory overcommit is both
> > efficient and effective.
> >
> > I guess I will have a try on Linux Guest.
> >
> > The real situation I have is most of the running VMs on host are
> > windows. So I had to come up those policies to balance the memory.
> >
> > Although policies are all workload dependent. Good news is host
> workload
> > is configurable, and not very heavy
> >
> > So I will try to figure out some favorable policy. The policies
> referred
> > in pdf are good start for me.
> >
> > Today, instead of trying to implement "/proc/meminfo" with shared
> pages,
> > I hacked the balloon driver to have another
> >
> > workqueue periodically write meminfo into xenstore through xenbus,
> which
> > solve the problem of xenstrore high CPU
> >
> > utilization problem.
> >
> > Later I will try to google more on how Citrix does.
> >
> > Thanks for your help, or do you have any better idea for windows
> guest?
> >
> > *Sent:* Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> > *Date:* 2010.11.23 1:47
> > *To:* MaoXiaoyun; xen devel
> > *CC:* george.dunlap@eu.citrix.com
> > *Subject:* RE: Xen balloon driver discuss
> >
> > Xenstore IS slow and you could improve xenballoond performance by
> only
> > sending the single CommittedAS value from xenballoond in domU to dom0
> > instead of all of /proc/meminfo. But you are making an assumption
> that
> > getting memory utilization information from domU to dom0 FASTER (e.g.
> > with a shared page) will provide better ballooning results. I have
> not
> > found this to be the case, which is what led to my investigation into
> > self-ballooning, which led to Transcendent Memory. See the 2010 Xen
> > Summit for more information.
> >
> > In your last paragraph below "Regards balloon strategy", the problem
> is
> > it is not easy to define "enough memory" and "shortage of memory"
> within
> > any guest and almost impossible to define it and effectively load
> > balance across many guests. See my Linux Plumber's Conference
> > presentation (with complete speaker notes) here:
> >
> >
> http://oss.oracle.com/projects/tmem/dist/documentation/presentations/Me
> mMgmtVirtEnv-LPC2010-Final.pdf
> >
> >
> http://oss.oracle.com/projects/tmem/dist/documentation/presentations/Me
> mMgmtVirtEnv-LPC2010-SpkNotes.pdf
> >
> > *From:* MaoXiaoyun [mailto:tinnycloud@hotmail.com]
> > *Sent:* Sunday, November 21, 2010 9:33 PM
> > *To:* xen devel
> > *Cc:* Dan Magenheimer; george.dunlap@eu.citrix.com
> > *Subject:* RE: Xen balloon driver discuss
> >
> >
> > Since currently /cpu/meminfo is sent to domain 0 via xenstore, which
> in
> > my opinoin is slow.
> > What I want to do is: there is a shared page between domU and dom0,
> and
> > domU periodically
> > update the meminfo into the page, while on the other side dom0
> retrive
> > the updated data for
> > caculating the target, which is used by guest for balloning.
> >
> > The problem I met is, currently I don't know how to implement a
> shared
> > page between
> > dom0 and domU.
> > Would it like dom 0 alloc a unbound event and wait guest to connect,
> and
> > transfer date through
> > grant table?
> > Or someone has more efficient way?
> > many thanks.
> >
> >>  From: tinnycloud@hotmail.com
> >>  To: xen-devel@lists.xensource.com
> >>  CC: dan.magenheimer@oracle.com; George.Dunlap@eu.citrix.com
> >>  Subject: Xen balloon driver discuss
> >>  Date: Sun, 21 Nov 2010 14:26:01 +0800
> >>
> >>  Hi:
> >>  Greeting first.
> >>
> >>  I was trying to run about 24 HVMS (currently only Linux, later will
> >>  involve Windows) on one physical server with 24GB memory, 16CPUs.
> >>  Each VM is configured with 2GB memory, and I reserved 8GB memory
> for
> >>  dom0.
> >>  For safety reason, only domain U's memory is allowed to balloon.
> >>
> >>  Inside domain U, I used xenballooned provide by xensource,
> >>  periodically write /proc/meminfo into xenstore in dom
> >>  0(/local/domain/did/memory/meminfo).
> >>  And in domain 0, I wrote a python script to read the meminfo, like
> >>  xen provided strategy, use Committed_AS to calculate the domain U
> balloon
> >>  target.
> >>  The time interval is 1 second.
> >>
> >>  Inside each VM, I setup a apache server for test. Well, I'd
> >>  like to say the result is not so good.
> >>  It appears that too much read/write on xenstore, when I give some
> of
> >>  the stress(by using ab) to guest domains,
> >>  the CPU usage of xenstore is up to 100%. Thus the monitor running
> in
> >>  dom0 also response quite slowly.
> >>  Also, in ab test, the Committed_AS grows very fast, reach to maxmem
> >>  in short time, but in fact the only a small amount
> >>  of memory guest really need, so I guess there should be some more
> to
> >>  be taken into consideration for ballooning.
> >>
> >>  For xenstore issue, I first plan to wrote a C program inside domain
> >>  U to replace xenballoond to see whether the situation
> >>  will be refined. If not, how about set up event channel directly
> for
> >>  domU and dom0, would it be faster?
> >>
> >>  Regards balloon strategy, I would do like this, when there are
> >>  enough memory , just fulfill the guest balloon request, and when
> shortage
> >>  of memory, distribute memory evenly on the guests those request
> >>  inflation.
> >>
> >>  Does anyone have better suggestion, thanks in advance.
> >>
> >
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: RE: [Xen-devel] Xen balloon driver discuss
  2010-11-29 11:19       ` George Dunlap
  2010-11-29 15:41         ` hotmaim
@ 2010-11-30  3:51         ` Chu Rui
  2010-11-30 11:08           ` George Dunlap
  1 sibling, 1 reply; 32+ messages in thread
From: Chu Rui @ 2010-11-30  3:51 UTC (permalink / raw)
  To: George Dunlap; +Cc: tinnycloud, xen-devel, Dan Magenheimer


[-- Attachment #1.1: Type: text/plain, Size: 3555 bytes --]

I also think the current PoD implementation is a little strange, different
from what I had imagined.
In my mind, as tinnycloud mentioned, the PoD cache should be a pool as large
as the idle memory in the VMM, shared by all guests. If a guest's usage is
always smaller than the predefined PoD cache size, the unused part could be
appropriated for others. Conversely, if a guest's balloon driver is delayed in
starting, the VMM should populate more memory to satisfy it (supposing the
VMM has enough memory).
With the current PoD, the balloon driver has to be started as soon as possible,
otherwise the guest will be crashed after the PoD cache is exhausted (supposing
the emergency sweep does not work). That's dangerous, although in most cases it
works well.
George, I wonder why you implemented it this way? It looks better to use a
resilient PoD cache instead of a fixed one. Your wonderful work is
appreciated; I just want to understand your thinking.

On 29 November 2010 at 19:19, George Dunlap <George.Dunlap@eu.citrix.com> wrote:

> On 29/11/10 10:55, tinnycloud wrote:
> > So that is, if we run out of PoD cache before balloon works, Xen will
> > crash domain(goto out_of_memory),
>
> That's right; PoD is only meant to allow a guest to run from boot until
> the balloon driver can load.  It's to allow a guest to "boot ballooned."
>
> > and at this situation, domain U swap(dom U can’t use swap memory) is not
> > available , right?
>
> I don't believe swap and PoD are integrated at the moment, no.
>
> > And when balloon actually works, the pod cached will finally decrease to
> > 0, and no longer be used any more, right?
>
> Conceptually, yes.  What actually happens is that ballooning will reduce
> it so that pod_entries==cache_size.  Entries will stay PoD until the
> guest touches them.  It's likely that eventually the guest will touch
> all the pages, at which point the PoD cache will be 0.
>
> > could we use this method to implement a tmem like memory overcommit?
>
> PoD does require guest knowledge -- it requires the balloon driver to be
> loaded soon after boot so the so the guest will limit its memory usage.
>  It also doesn't allow overcommit.  Memory in the PoD cache is already
> allocated to the VM, and can't be used for something else.
>
> You can't to overcommit without either:
> * The guest knowing that it might not get the memory back, and being OK
> with that (tmem), or
> * Swapping, which doesn't require PoD at all.
>
> If you're thinking about scanning for zero pages and automatically
> reclaiming them, for instance, you have to be able to deal with a
> situation where the guest decides to use a page you've reclaimed but
> you've already given your last free page to someone else, and there are
> no more zero pages anywhere on the system.  That would mean either just
> pausing the VM indefinitely, or choosing another guest page to swap out.
>
>  -George
>
> >
> > *From:* Chu Rui [mailto:ruichu@gmail.com]
>  > *TO:* tinnycloud
> > *CC:* xen-devel@lists.xensource.com; George.Dunlap@eu.citrix.com;
> > dan.magenheimer@oracle.com
> > *Subject:* Re: [Xen-devel] Xen balloon driver discuss
> >
> > I am also interested with tinnycloud's problem.
> >
> > It looks that the pod cache has been used up like this:
> >
> > if ( p2md->pod.count == 0 )
> > goto out_of_memory;
> >
> > George, would you please take a look on this problem, and, if possbile,
> > tell a little more about what does PoD cache mean? Is it a memory pool
> > for PoD allocation?
> >
>
>

[-- Attachment #1.2: Type: text/html, Size: 4412 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Xen balloon driver discuss
  2010-11-29 15:41         ` hotmaim
@ 2010-11-30 10:50           ` George Dunlap
  2010-11-30 13:58             ` tinnycloud
                               ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: George Dunlap @ 2010-11-30 10:50 UTC (permalink / raw)
  To: hotmaim; +Cc: Dan Magenheimer, xen-devel, Chu Rui

On 29/11/10 15:41, hotmaim wrote:
>          Appreciate for the details, I get more understandings but still have some confusions.
>     1.Is it necessery to balloon to max-target at dom U right startup?(for, xm cr xxx.hvm maxmem=2048 memory=1024; max-target is 2048-1024) Say is it safe to balloon to let the guest has only 512M memory in total? or 1536 M(in this situation, i guess the pod entry will
> also reduce and extra 512M memory will be added to Pod cache,right)?

I'm sorry, I can't figure out what you mean.  The tools will set
"target" to the value of "memory".  The balloon driver is supposed to
see how many total pages it has (2048M) and "inflate" the balloon until
the number of pages is at the target (1024M in your example above).
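
As a rough sketch of that inflate loop (not the actual balloon.c; the two
helpers below stand in for the guest-side page allocation and the
XENMEM_decrease_reservation-style hypercall a real driver would use):

    extern void *guest_alloc_page(void);          /* take a free page from the guest OS */
    extern void  return_frame_to_xen(void *page); /* hand its frame back to Xen         */

    static void balloon_inflate(unsigned long *current_pages,
                                unsigned long target_pages)
    {
        while ( *current_pages > target_pages )
        {
            void *page = guest_alloc_page();
            if ( page == NULL )
                break;                     /* guest has no spare pages right now   */
            return_frame_to_xen(page);     /* this frame no longer backs the guest */
            (*current_pages)--;            /* driver's view of guest size shrinks  */
        }
    }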

> 2. Suppose we have a xen wide PoD memorym pool, that is accessable for every guest domains, when the guest needs a page, it get the page from the pool, and we can still use
> balloon strategy to have the guest free pages back to the pool.  So if the amount of all domain
> memory inuse is less than host physial memory, it is safe. And when no memory available from
> host, domain need new memory may pause for for waiting for others to free, or use swap memory, is it possible?

We already have a pool of free memory accessible to all the guest
domains: It's called the Xen free page list. :-)

One of the explicit purposes of PoD is to set aside a fixed amount of
memory for a guest, so that no other domains / processes can claim it.
It's guaranteed that memory, and as long as it has a working balloon
driver, shouldn't have any issues using it properly.  Sharing it with
other VMs would undermine this, and make it pretty much the same as the
Xen free page list.

I'm not an expert in tmem, but as I understand it, the whole point of
tmem is to use knowledge of the guest OS to be able to throw away
certain data.  You can't get guest-specific knowledge without modifying
the guest OS to have it tell Xen somehow.

It sounds like what you're advocating is *allocate*-on-demand (as
opposed to PoD, which allocates all the memory at the beginning but
*populates* the p2m table on demand): tell all the guests they have more
memory than is available total, assuming that only some of them are
going to try to use all of it; and allocating the memory as it's used.
This works well for processes, but operating systems are typically built
with the assumption that memory not used is memory completely wasted.
They therefore keep disk cache pages and unused memory pages around
"just in case", and I predict that any guest which has an active
workload will eventually use all the memory it's been told it has, even
if it's only actively using a small portion of it.  At that point, Xen
will be forced to try to guess which page is the least important to have
around and swap it out.

Alternately, the tools could slowly balloon down all of the guests as
the memory starts to run out; but then you have a situation where the
guest that gets the most memory is the one that touched it first, not
the one which actually needs it.

At any rate, PoD is meant to solve exactly one problem: booting
"ballooned".  At the moment it doesn't lend itself to other solutions.

 -George

> 2010-11-29,19:19,George Dunlap<George.Dunlap@eu.citrix.com>  :
> 
>> On 29/11/10 10:55, tinnycloud wrote:
>>> So that is, if we run out of PoD cache before balloon works, Xen will
>>> crash domain(goto out_of_memory),
>>
>> That's right; PoD is only meant to allow a guest to run from boot until
>> the balloon driver can load.  It's to allow a guest to "boot ballooned."
>>
>>> and at this situation, domain U swap(dom U can’t use swap memory) is not
>>> available , right?
>>
>> I don't believe swap and PoD are integrated at the moment, no.
>>
>>> And when balloon actually works, the pod cached will finally decrease to
>>> 0, and no longer be used any more, right?
>>
>> Conceptually, yes.  What actually happens is that ballooning will reduce
>> it so that pod_entries==cache_size.  Entries will stay PoD until the
>> guest touches them.  It's likely that eventually the guest will touch
>> all the pages, at which point the PoD cache will be 0.
>>
>>> could we use this method to implement a tmem like memory overcommit?
>>
>> PoD does require guest knowledge -- it requires the balloon driver to be
>> loaded soon after boot so the so the guest will limit its memory usage.
>> It also doesn't allow overcommit.  Memory in the PoD cache is already
>> allocated to the VM, and can't be used for something else.
>>
>> You can't to overcommit without either:
>> * The guest knowing that it might not get the memory back, and being OK
>> with that (tmem), or
>> * Swapping, which doesn't require PoD at all.
>>
>> If you're thinking about scanning for zero pages and automatically
>> reclaiming them, for instance, you have to be able to deal with a
>> situation where the guest decides to use a page you've reclaimed but
>> you've already given your last free page to someone else, and there are
>> no more zero pages anywhere on the system.  That would mean either just
>> pausing the VM indefinitely, or choosing another guest page to swap out.
>>
>> -George
>>
>>>
>>> *From:* Chu Rui [mailto:ruichu@gmail.com]
>>> *TO:* tinnycloud
>>> *CC:* xen-devel@lists.xensource.com; George.Dunlap@eu.citrix.com;
>>> dan.magenheimer@oracle.com
>>> *Subject:* Re: [Xen-devel] Xen balloon driver discuss
>>>
>>> I am also interested with tinnycloud's problem.
>>>
>>> It looks that the pod cache has been used up like this:
>>>
>>> if ( p2md->pod.count == 0 )
>>> goto out_of_memory;
>>>
>>> George, would you please take a look on this problem, and, if possbile,
>>> tell a little more about what does PoD cache mean? Is it a memory pool
>>> for PoD allocation?
>>>
>>
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: RE: [Xen-devel] Xen balloon driver discuss
  2010-11-30  3:51         ` RE: [Xen-devel] Xen balloon driver discuss Chu Rui
@ 2010-11-30 11:08           ` George Dunlap
  0 siblings, 0 replies; 32+ messages in thread
From: George Dunlap @ 2010-11-30 11:08 UTC (permalink / raw)
  To: Chu Rui; +Cc: tinnycloud, xen-devel, Dan Magenheimer

On 30/11/10 03:51, Chu Rui wrote:
> George, I wonder why do you implement it as this? It looks better to 
> used a resilient PoD cache, intead of a fixed one. Your wonderful work 
> was appreciated, I just want to know your thought.

The main reason it was implemented this way was to make things
predictable for the toolstack.  The XenServer control stack recently
implmented an automatic dynamic memory control functionality that would
allow you to simply set some ranges for memory, and it would
automatically change the ballooning for you.  To do that effectively, it
needs to know how much memory is in use by every guest, and be able to
guarantee that a guest can get memory if it needs it.  Furthermore,
XenServer has a High Availability (HA) option, which will allow you to
guarantee that if a given number of hosts go down, certain VMs can be
guaranteed to restart on other hosts.  For both of these options, having
strong control of the memory is a necessity.

As I said in another e-mail, the Xen free page list is an
already-existing, system-wide pool from which VMs can (in theory)
allocate RAM.  We could make it such that when a VM runs too low on PoD
memory, it pauses and notifies the toolstack somehow.  A toolstack which
wanted to be more flexible with the balloon drivers could then increase
its PoD cache size from the free cache pool if available; if not
available, it could even create more free memory by ballooning down
other VMs.  Xen doesn't need to be involved.
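
A sketch of what such a toolstack-side policy might look like; every helper
name below is made up for illustration, and this is only the shape of the
logic, not working toolstack code:

    extern unsigned long host_free_pages(void);                  /* size of Xen's free page list    */
    extern void grow_pod_cache(int domid, unsigned long pages);  /* raise this domain's PoD target  */
    extern void balloon_down_other_guests(unsigned long pages);  /* free host memory from other VMs */
    extern void unpause_domain(int domid);

    static void on_pod_cache_low(int domid, unsigned long shortfall)
    {
        unsigned long free = host_free_pages();

        if ( free < shortfall )
            /* Not enough host memory: make some by shrinking other guests. */
            balloon_down_other_guests(shortfall - free);

        grow_pod_cache(domid, shortfall);   /* refill this guest's PoD cache */
        unpause_domain(domid);              /* let the paused guest continue */
    }
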

If you want to implement that functionality, I'm sure your patches would
be welcome. :-)

 -George

^ permalink raw reply	[flat|nested] 32+ messages in thread

* re: Xen balloon driver discuss
  2010-11-30 10:50           ` George Dunlap
@ 2010-11-30 13:58             ` tinnycloud
  2010-11-30 16:39             ` Dan Magenheimer
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: tinnycloud @ 2010-11-30 13:58 UTC (permalink / raw)
  To: 'George Dunlap'
  Cc: 'Dan Magenheimer', xen-devel, 'Chu Rui'

Thank you for your kind help.

Well, in your last mail you mentioned that ballooning will make pod_entries equal
to cache_size as soon as it starts to work when the guest starts up.
From my understanding, if we start a guest such as:

xm cr xxx.hvm maxmem=2048 memory=512 

then we should set /local/domain/did/memory/target to 522240
( (512M - 2M) * 1024; 2M for the VGA hole from your other patch? )
to tell the balloon driver in the guest to inflate, right? And when the balloon
driver has ballooned the guest's memory down to this target,
I think pod_entries will equal cache_size, right?

I did some experiment on this, the result shows different.

Step 1.
xm cr xxx.hvm maxmem=2048 memory=512

at the very beginning, I printed out the domain's tot_pages, 132088,
pod.entry_count 523776 (that is 2046M), and pod.count 130560 (that is 510M)

(XEN) tot_pages 132088 pod_entries 523776 pod_count 130560


currently, /local/domain/did/memory/target is by default written as
524288

after the guest starts up, the balloon driver inflates; when it finishes, I can see
pod.entry_count reduced to 23552 and pod.count 14063

(XEN)     DomPage list too long to display
(XEN) Tot pages 132088  PoD entries=23552 cachesize=14063

Step 2.

In my understanding, setting /local/domain/did/memory/target to 510
* 1024 (or lower) should be enough for pod_entries to equal cache_size.

I used 500, so I did: xm mem-set domain_id 500

then I can see pod.entry_count reduced to 22338 and pod.count 15921, still not
equal

(XEN) Memory pages belonging to domain 4:
(XEN)     DomPage list too long to display
(XEN) Tot pages 132088  PoD entries=22338 cachesize=15921

Step 3. 

Only after I did xm mem-set domain_id 470
did pod_entries become equal to pod.count:
(XEN)     DomPage list too long to display
(XEN) Tot pages 130825  PoD entries=14677 cachesize=14677

Later, from the code, I learnt that those two values are forced to be equal
here:

out_entry_check:
    /* If we've reduced our "liabilities" beyond our "assets", free some */
    if ( p2md->pod.entry_count < p2md->pod.count )
    {
        p2m_pod_set_cache_target(d, p2md->pod.entry_count);
    }


So, in conclusion, it looks like something goes wrong: the PoD entries should
equal the cache size (pod.count)
as soon as the balloon driver has inflated to max - target, right?
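
For reference, the page arithmetic behind the numbers above, assuming 4 KiB
pages and the 2M VGA hole mentioned earlier (a worked example, not code from
the experiment):

    #include <stdio.h>

    int main(void)
    {
        const unsigned long pages_per_mb = 1024 / 4;     /* 4 KiB pages: 256 per MiB */

        printf("maxmem 2048M = %lu pages\n", 2048 * pages_per_mb);      /* 524288 */
        printf("2046M (maxmem minus the 2M VGA hole) = %lu pages\n",
               2046 * pages_per_mb);                         /* 523776 = initial pod_entries */
        printf("pod.count 130560 pages = %luM\n",
               130560 / pages_per_mb);                       /* 510M, i.e. memory=512 minus 2M */
        printf("target (512M - 2M) * 1024 = %lu KiB\n",
               (512UL - 2) * 1024);                          /* 522240 */
        return 0;
    }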

Many thanks.

----------------------------------------------------------------------------
------------------
From: George Dunlap [mailto:George.Dunlap@eu.citrix.com] 
to: hotmaim
cc: Chu Rui; xen-devel@lists.xensource.com; Dan Magenheimer
Sub: Re: [Xen-devel] Xen balloon driver discuss

On 29/11/10 15:41, hotmaim wrote:
>          Appreciate for the details, I get more understandings but still
have some confusions.
>     1.Is it necessery to balloon to max-target at dom U right
startup?(for, xm cr xxx.hvm maxmem=2048 memory=1024; max-target is
2048-1024) Say is it safe to balloon to let the guest has only 512M memory
in total? or 1536 M(in this situation, i guess the pod entry will
> also reduce and extra 512M memory will be added to Pod cache,right)?

I'm sorry, I can't figure out what you mean.  The tools will set
"target" to the value of "memory".  The balloon driver is supposed to
see how many total pages it has (2048M) and "inflate" the balloon until
the number of pages is at the target (1024M in your example above).

> 2. Suppose we have a xen wide PoD memorym pool, that is accessable for
every guest domains, when the guest needs a page, it get the page from the
pool, and we can still use
> balloon strategy to have the guest free pages back to the pool.  So if the
amount of all domain
> memory inuse is less than host physial memory, it is safe. And when no
memory available from
> host, domain need new memory may pause for for waiting for others to free,
or use swap memory, is it possible?

We already have a pool of free memory accessible to all the guest
domains: It's called the Xen free page list. :-)

One of the explicit purposes of PoD is to set aside a fixed amount of
memory for a guest, so that no other domains / processes can claim it.
It's guaranteed that memory, and as long as it has a working balloon
driver, shouldn't have any issues using it properly.  Sharing it with
other VMs would undermine this, and make it pretty much the same as the
Xen free page list.

I'm not an expert in tmem, but as I understand it, the whole point of
tmem is to use knowledge of the guest OS to be able to throw away
certain data.  You can't get guest-specific knowledge without modifying
the guest OS to have it tell Xen somehow.

It sounds like what you're advocating is *allocate*-on-demand (as
opposed to PoD, which allocates all the memory at the beginning but
*populates* the p2m table on demand): tell all the guests they have more
memory than is available total, assuming that only some of them are
going to try to use all of it; and allocating the memory as it's used.
This works well for processes, but operating systems are typically built
with the assumption that memory not used is memory completely wasted.
They therefore keep disk cache pages and unused memory pages around
"just in case", and I predict that any guest which has an active
workload will eventually use all the memory it's been told it has, even
if it's only actively using a small portion of it.  At that point, Xen
will be forced to try to guess which page is the least important to have
around and swap it out.

Alternately, the tools could slowly balloon down all of the guests as
the memory starts to run out; but then you have a situation where the
guest that gets the most memory is the one that touched it first, not
the one which actually needs it.

At any rate, PoD is meant to solve exactly one problem: booting
"ballooned".  At the moment it doesn't lend itself to other solutions.

 -George

> 2010-11-29,19:19,George Dunlap<George.Dunlap@eu.citrix.com>  :
> 
>> On 29/11/10 10:55, tinnycloud wrote:
>>> So that is, if we run out of PoD cache before balloon works, Xen will
>>> crash domain(goto out_of_memory),
>>
>> That's right; PoD is only meant to allow a guest to run from boot until
>> the balloon driver can load.  It's to allow a guest to "boot ballooned."
>>
>>> and at this situation, domain U swap(dom U can’t use swap memory) is
not
>>> available , right?
>>
>> I don't believe swap and PoD are integrated at the moment, no.
>>
>>> And when balloon actually works, the pod cached will finally decrease to
>>> 0, and no longer be used any more, right?
>>
>> Conceptually, yes.  What actually happens is that ballooning will reduce
>> it so that pod_entries==cache_size.  Entries will stay PoD until the
>> guest touches them.  It's likely that eventually the guest will touch
>> all the pages, at which point the PoD cache will be 0.
>>
>>> could we use this method to implement a tmem like memory overcommit?
>>
>> PoD does require guest knowledge -- it requires the balloon driver to be
>> loaded soon after boot so the so the guest will limit its memory usage.
>> It also doesn't allow overcommit.  Memory in the PoD cache is already
>> allocated to the VM, and can't be used for something else.
>>
>> You can't to overcommit without either:
>> * The guest knowing that it might not get the memory back, and being OK
>> with that (tmem), or
>> * Swapping, which doesn't require PoD at all.
>>
>> If you're thinking about scanning for zero pages and automatically
>> reclaiming them, for instance, you have to be able to deal with a
>> situation where the guest decides to use a page you've reclaimed but
>> you've already given your last free page to someone else, and there are
>> no more zero pages anywhere on the system.  That would mean either just
>> pausing the VM indefinitely, or choosing another guest page to swap out.
>>
>> -George
>>
>>>
>>> *From:* Chu Rui [mailto:ruichu@gmail.com]
>>> *TO:* tinnycloud
>>> *CC:* xen-devel@lists.xensource.com; George.Dunlap@eu.citrix.com;
>>> dan.magenheimer@oracle.com
>>> *Subject:* Re: [Xen-devel] Xen balloon driver discuss
>>>
>>> I am also interested with tinnycloud's problem.
>>>
>>> It looks that the pod cache has been used up like this:
>>>
>>> if ( p2md->pod.count == 0 )
>>> goto out_of_memory;
>>>
>>> George, would you please take a look on this problem, and, if possbile,
>>> tell a little more about what does PoD cache mean? Is it a memory pool
>>> for PoD allocation?
>>>
>>
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Xen balloon driver discuss
  2010-11-30 10:50           ` George Dunlap
  2010-11-30 13:58             ` tinnycloud
@ 2010-11-30 16:39             ` Dan Magenheimer
  2010-12-01  5:07             ` xiaoyun.maoxy
       [not found]             ` <00fe01cb9115$98319c80$c894d580$@maoxy@aliyun-inc.com>
  3 siblings, 0 replies; 32+ messages in thread
From: Dan Magenheimer @ 2010-11-30 16:39 UTC (permalink / raw)
  To: George Dunlap, hotmaim; +Cc: xen-devel, Chu Rui

> One of the explicit purposes of PoD is to set aside a fixed amount of
> memory for a guest, so that no other domains / processes can claim it.
> It's guaranteed that memory, and as long as it has a working balloon
> driver, shouldn't have any issues using it properly.  Sharing it with
> other VMs would undermine this, and make it pretty much the same as the
> Xen free page list.
>   :
> It sounds like what you're advocating is *allocate*-on-demand (as
> opposed to PoD, which allocates all the memory at the beginning but
> *populates* the p2m table on demand): tell all the guests they have
> more
> memory than is available total, assuming that only some of them are
> going to try to use all of it; and allocating the memory as it's used.
> This works well for processes, but operating systems are typically
> built
> with the assumption that memory not used is memory completely wasted.
> They therefore keep disk cache pages and unused memory pages around
> "just in case", and I predict that any guest which has an active
> workload will eventually use all the memory it's been told it has, even
> if it's only actively using a small portion of it.  At that point, Xen
> will be forced to try to guess which page is the least important to
> have
> around and swap it out.

Maybe another key point about PoD is worth mentioning here (and
probably very obvious to George and possibly mentioned somewhere
else in this thread and I just missed it): The guest will *crash*
if it attempts to write to a PoD page and Xen has no real physical
page to back it.  Or alternately, the guest must be stopped
(perhaps for a long time) until Xen does have a real physical page
to back it.  Real Windows guest users won't like that, so the
memory should be pre-allocated and remain reserved for that guest.
Or the toolset/dom0 must implement host-swapping, which has all
sorts of nasty unpredictable performance issues.

Dan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* re: Xen balloon driver discuss
  2010-11-30 10:50           ` George Dunlap
  2010-11-30 13:58             ` tinnycloud
  2010-11-30 16:39             ` Dan Magenheimer
@ 2010-12-01  5:07             ` xiaoyun.maoxy
       [not found]             ` <00fe01cb9115$98319c80$c894d580$@maoxy@aliyun-inc.com>
  3 siblings, 0 replies; 32+ messages in thread
From: xiaoyun.maoxy @ 2010-12-01  5:07 UTC (permalink / raw)
  To: 'tinnycloud', 'George Dunlap'
  Cc: 'Dan Magenheimer', xen-devel, 'Chu Rui'

Hi George:

	I think I know the problem: it is due to the balloon driver I used
being out of date.
	My guest kernel is from
ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/SRPMS/
(kernel-2.6.18-164.el5.src.rpm), and so is the balloon driver.
	
	The problem is that, at the very beginning, the PoD entry total is different
from current_pages in the balloon driver.
	(At the beginning, both the PoD entry count and current_pages should hold
the same value, i.e. the total memory allocated for the guest.
	 But in fact the PoD entry count is 523776  <  current_pages, which is 514879.
	 So from the PoD point of view the balloon needs to inflate by 523776 - target,
but the balloon driver only inflates by 514879 - target.
	 This is the problem.
	)
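
In page terms, the mismatch described above works out roughly as follows (a
worked example using the quoted figures, assuming 4 KiB pages):

    #include <stdio.h>

    int main(void)
    {
        unsigned long pod_entries   = 523776;  /* what PoD expects to see ballooned against */
        unsigned long current_pages = 514879;  /* what the old driver thinks the guest owns */
        unsigned long shortfall     = pod_entries - current_pages;      /* 8897 pages */

        /* ~34.8 MiB that the old driver never hands back, so pod.entry_count
         * can never quite fall to pod.count on its own. */
        printf("shortfall = %lu pages = %.1f MiB\n",
               shortfall, shortfall * 4.0 / 1024);
        return 0;
    }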

	So later I will try to get balloon.c from xenlinux and build a
new driver, to see if that solves the problem.

	Thanks.

----- -----
From: tinnycloud [mailto:tinnycloud@hotmail.com] 
Sent: 2010.11.30 21:59
To: 'George Dunlap'
cc: 'Chu Rui'; 'xen-devel@lists.xensource.com'; 'Dan Magenheimer'
Subject: re: [Xen-devel] Xen balloon driver discuss

Thank you for your kindly help. 

Well, on last mail, you mentioned that balloon will make pod_entries equal
to cache_size as soon as it start to work when guest starts up.
>From my understanding, if we start guest such as:

xm cr xxx.hvm maxmem=2048 memory=512 

then, we should set the /local/domain/did/memory/target to 522240 ( (
512M-2M) * 1024, 2M for VGA in your another patch? )
to tell the balloon driver in guest to inflate, right? And when balloon
driver balloon to let guest memory has this target,
I think pod_entires will equal to cached_size, right?

I did some experiment on this, the result shows different.

Step 1.
xm cr xxx.hvm maxmem=2048 memory=512

at the very beginning, I printed out domain tot_pages, 1320288,
pod.entry_count 523776, that is 2046M, pod.count 130560, that is 512M

(XEN) tot_pages 132088 pod_entries 523776 pod_count 130560


currently, /local/domain/did/memory/target in default will be written to
524288

after guest start up, balloon driver will balloon, when finish, I can see
pod.entry_count reduce to 23552, pod,count 14063

(XEN)     DomPage list too long to display
(XEN) Tot pages 132088  PoD entries=23552 cachesize=14063

Step 2.

In my understanding, /local/domain/did/memory/target should be at least 510
* 1024 , and then pod_entries will equal to cache_size

I use  500, So I did: xm mem-set domain_id  500

then I can see pod.entry_count reduce to 22338, pod,count 15921, still not
equal

(XEN) Memory pages belonging to domain 4:
(XEN)     DomPage list too long to display
(XEN) Tot pages 132088  PoD entries=22338 cachesize=15921

Step 3. 

Only after I did : xm mem-set domain_id  470
Pod_entries is equal to pod.count
(XEN)     DomPage list too long to display
(XEN) Tot pages 130825  PoD entries=14677 cachesize=14677

Later from the code, I learnt that those two values are forced to be equal,
in 

700 out_entry_check:
701     /* If we've reduced our "liabilities" beyond our "assets", free some
*/
702     if ( p2md->pod.entry_count < p2md->pod.count )
703     {
704         p2m_pod_set_cache_target(d, p2md->pod.entry_count);
705     }   
706


So in conclude, it looks like something goes wrong, the PoD entries should
equal to cachesize(pod.count) 
as soon as the balloon driver inflate to max - target, right? 

Many thanks.

----------------------------------------------------------------------------
------------------
From: George Dunlap [mailto:George.Dunlap@eu.citrix.com] 
to: hotmaim
cc: Chu Rui; xen-devel@lists.xensource.com; Dan Magenheimer
Sub: Re: [Xen-devel] Xen balloon driver discuss

On 29/11/10 15:41, hotmaim wrote:
>          Appreciate for the details, I get more understandings but still
have some confusions.
>     1.Is it necessery to balloon to max-target at dom U right
startup?(for, xm cr xxx.hvm maxmem=2048 memory=1024; max-target is
2048-1024) Say is it safe to balloon to let the guest has only 512M memory
in total? or 1536 M(in this situation, i guess the pod entry will
> also reduce and extra 512M memory will be added to Pod cache,right)?

I'm sorry, I can't figure out what you mean.  The tools will set
"target" to the value of "memory".  The balloon driver is supposed to
see how many total pages it has (2048M) and "inflate" the balloon until
the number of pages is at the target (1024M in your example above).

> 2. Suppose we have a xen wide PoD memorym pool, that is accessable for
every guest domains, when the guest needs a page, it get the page from the
pool, and we can still use
> balloon strategy to have the guest free pages back to the pool.  So if the
amount of all domain
> memory inuse is less than host physial memory, it is safe. And when no
memory available from
> host, domain need new memory may pause for for waiting for others to free,
or use swap memory, is it possible?

We already have a pool of free memory accessible to all the guest
domains: It's called the Xen free page list. :-)

One of the explicit purposes of PoD is to set aside a fixed amount of
memory for a guest, so that no other domains / processes can claim it.
It's guaranteed that memory, and as long as it has a working balloon
driver, shouldn't have any issues using it properly.  Sharing it with
other VMs would undermine this, and make it pretty much the same as the
Xen free page list.

I'm not an expert in tmem, but as I understand it, the whole point of
tmem is to use knowledge of the guest OS to be able to throw away
certain data.  You can't get guest-specific knowledge without modifying
the guest OS to have it tell Xen somehow.

It sounds like what you're advocating is *allocate*-on-demand (as
opposed to PoD, which allocates all the memory at the beginning but
*populates* the p2m table on demand): tell all the guests they have more
memory than is available total, assuming that only some of them are
going to try to use all of it; and allocating the memory as it's used.
This works well for processes, but operating systems are typically built
with the assumption that memory not used is memory completely wasted.
They therefore keep disk cache pages and unused memory pages around
"just in case", and I predict that any guest which has an active
workload will eventually use all the memory it's been told it has, even
if it's only actively using a small portion of it.  At that point, Xen
will be forced to try to guess which page is the least important to have
around and swap it out.

Alternately, the tools could slowly balloon down all of the guests as
the memory starts to run out; but then you have a situation where the
guest that gets the most memory is the one that touched it first, not
the one which actually needs it.

At any rate, PoD is meant to solve exactly one problem: booting
"ballooned".  At the moment it doesn't lend itself to other solutions.

 -George

> 2010-11-29,19:19,George Dunlap<George.Dunlap@eu.citrix.com>  :
> 
>> On 29/11/10 10:55, tinnycloud wrote:
>>> So that is, if we run out of PoD cache before balloon works, Xen will
>>> crash domain(goto out_of_memory),
>>
>> That's right; PoD is only meant to allow a guest to run from boot until
>> the balloon driver can load.  It's to allow a guest to "boot ballooned."
>>
>>> and at this situation, domain U swap(dom U can’t use swap memory) is
not
>>> available , right?
>>
>> I don't believe swap and PoD are integrated at the moment, no.
>>
>>> And when balloon actually works, the pod cached will finally decrease to
>>> 0, and no longer be used any more, right?
>>
>> Conceptually, yes.  What actually happens is that ballooning will reduce
>> it so that pod_entries==cache_size.  Entries will stay PoD until the
>> guest touches them.  It's likely that eventually the guest will touch
>> all the pages, at which point the PoD cache will be 0.
>>
>>> could we use this method to implement a tmem like memory overcommit?
>>
>> PoD does require guest knowledge -- it requires the balloon driver to be
>> loaded soon after boot so the so the guest will limit its memory usage.
>> It also doesn't allow overcommit.  Memory in the PoD cache is already
>> allocated to the VM, and can't be used for something else.
>>
>> You can't to overcommit without either:
>> * The guest knowing that it might not get the memory back, and being OK
>> with that (tmem), or
>> * Swapping, which doesn't require PoD at all.
>>
>> If you're thinking about scanning for zero pages and automatically
>> reclaiming them, for instance, you have to be able to deal with a
>> situation where the guest decides to use a page you've reclaimed but
>> you've already given your last free page to someone else, and there are
>> no more zero pages anywhere on the system.  That would mean either just
>> pausing the VM indefinitely, or choosing another guest page to swap out.
>>
>> -George
>>
>>>
>>> *From:* Chu Rui [mailto:ruichu@gmail.com]
>>> *TO:* tinnycloud
>>> *CC:* xen-devel@lists.xensource.com; George.Dunlap@eu.citrix.com;
>>> dan.magenheimer@oracle.com
>>> *Subject:* Re: [Xen-devel] Xen balloon driver discuss
>>>
>>> I am also interested with tinnycloud's problem.
>>>
>>> It looks that the pod cache has been used up like this:
>>>
>>> if ( p2md->pod.count == 0 )
>>> goto out_of_memory;
>>>
>>> George, would you please take a look on this problem, and, if possbile,
>>> tell a little more about what does PoD cache mean? Is it a memory pool
>>> for PoD allocation?
>>>
>>
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* re: Xen balloon driver discuss
       [not found]             ` <00fe01cb9115$98319c80$c894d580$@maoxy@aliyun-inc.com>
@ 2010-12-01  6:29               ` tinnycloud
  2011-01-12 14:41                 ` strange CPU utilization, could related to credit schedule ? tinnycloud
  0 siblings, 1 reply; 32+ messages in thread
From: tinnycloud @ 2010-12-01  6:29 UTC (permalink / raw)
  To: 'George Dunlap'
  Cc: 'Dan Magenheimer', xen-devel, 'Chu Rui'

Building a new balloon driver is much easier than I thought, so I quickly
got more results.

Create domain like:

xm cr xxx.hvm maxmem=2048 memory=512

With the new driver, current_pages is 525312 (larger than the PoD entry count,
so after ballooning pod_entry == pod_cached will be satisfied),
that is 2052M; later I found that this number comes from domain->max_pages.

/local/domain/did/memory/target is 524288, that is 512M.
Inside the guest, /proc/meminfo reports a total memory of 482236 kB, that is
470.93M.

The strange thing is:
the balloon driver holds memory = 2052 - 512 = 1540M,
and the guest actually has 470.93M;
1540 + 470.93 = 2010.93 < 2048.

So I wonder where the remaining memory (2048 - 2010.93 = 37.07M) goes?
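
The same page arithmetic narrows the gap down a little (a worked example; the
breakdown suggested in the comments is a guess, not a measured answer):

    #include <stdio.h>

    int main(void)
    {
        double driver_base = 525312 * 4.0 / 1024;   /* 2052.00M: domain->max_pages, the driver's baseline */
        double ballooned   = driver_base - 512.0;   /* 1540.00M handed back to Xen                        */
        double mem_total   = 482236 / 1024.0;       /*  470.93M: MemTotal inside the guest                */

        /* Roughly 37M.  Most of this is presumably memory MemTotal never
         * reports in the first place (the guest kernel's own boot-time
         * reservations and the VGA hole), plus the 4M by which
         * domain->max_pages (2052M) exceeds maxmem (2048M). */
        printf("missing = %.2fM\n", 2048.0 - (ballooned + mem_total));
        return 0;
    }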

Thanks. 
----- -----
date: 2010,12,1 13:07
To: 'tinnycloud'; 'George Dunlap'
CC: 'Chu Rui'; xen-devel@lists.xensource.com; 'Dan Magenheimer'
Subject: re: [Xen-devel] Xen balloon driver discuss

Hi George:

	I think I know the problem, it is due to the balloon driver I used
it out of date. 
	My Guest kernel is
from(ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/SRPMS/,
kernel-2.6.18-164.el5.src.rpm, so as the balloon driver ) 
	
	The problem is at the very beginning, Pod Entry total is different
from the current_pages pages in balloon.
	(at the beginning, both Pod Entry and current_pages shall point to
the same value, that is total memory allocated for guest,
	 But in fact, Pod Entry is 523776  <  current_pages is  514879
	 So from Pod aspect, the balloon need to inflate to 523776 - target,
but the balloon driver only inflate 514879 -target
	 This is the problem. 
	)

	So later I will try to get the balloon.c from xenlinux to build a
new driver, to see if solve the problem.

	Thanks.

----- -----
From: tinnycloud [mailto:tinnycloud@hotmail.com] 
Sent: 2010.11.30 21:59
To: 'George Dunlap'
cc: 'Chu Rui'; 'xen-devel@lists.xensource.com'; 'Dan Magenheimer'
Subject: re: [Xen-devel] Xen balloon driver discuss

Thank you for your kindly help. 

Well, on last mail, you mentioned that balloon will make pod_entries equal
to cache_size as soon as it start to work when guest starts up.
>From my understanding, if we start guest such as:

xm cr xxx.hvm maxmem=2048 memory=512 

then, we should set the /local/domain/did/memory/target to 522240 ( (
512M-2M) * 1024, 2M for VGA in your another patch? )
to tell the balloon driver in guest to inflate, right? And when balloon
driver balloon to let guest memory has this target,
I think pod_entires will equal to cached_size, right?

I did some experiment on this, the result shows different.

Step 1.
xm cr xxx.hvm maxmem=2048 memory=512

at the very beginning, I printed out domain tot_pages, 1320288,
pod.entry_count 523776, that is 2046M, pod.count 130560, that is 512M

(XEN) tot_pages 132088 pod_entries 523776 pod_count 130560


currently, /local/domain/did/memory/target in default will be written to
524288

after guest start up, balloon driver will balloon, when finish, I can see
pod.entry_count reduce to 23552, pod,count 14063

(XEN)     DomPage list too long to display
(XEN) Tot pages 132088  PoD entries=23552 cachesize=14063

Step 2.

In my understanding, /local/domain/did/memory/target should be at least 510
* 1024 , and then pod_entries will equal to cache_size

I use  500, So I did: xm mem-set domain_id  500

then I can see pod.entry_count reduce to 22338, pod,count 15921, still not
equal

(XEN) Memory pages belonging to domain 4:
(XEN)     DomPage list too long to display
(XEN) Tot pages 132088  PoD entries=22338 cachesize=15921

Step 3. 

Only after I did : xm mem-set domain_id  470
Pod_entries is equal to pod.count
(XEN)     DomPage list too long to display
(XEN) Tot pages 130825  PoD entries=14677 cachesize=14677

Later from the code, I learnt that those two values are forced to be equal,
in 

700 out_entry_check:
701     /* If we've reduced our "liabilities" beyond our "assets", free some
*/
702     if ( p2md->pod.entry_count < p2md->pod.count )
703     {
704         p2m_pod_set_cache_target(d, p2md->pod.entry_count);
705     }   
706


So in conclude, it looks like something goes wrong, the PoD entries should
equal to cachesize(pod.count) 
as soon as the balloon driver inflate to max - target, right? 

Many thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* strange CPU utilization, could related to credit schedule ?
  2010-12-01  6:29               ` tinnycloud
@ 2011-01-12 14:41                 ` tinnycloud
  2011-01-12 16:41                   ` George Dunlap
  0 siblings, 1 reply; 32+ messages in thread
From: tinnycloud @ 2011-01-12 14:41 UTC (permalink / raw)
  To: 'xen devel'; +Cc: george.dunlap


[-- Attachment #1.1: Type: text/plain, Size: 882 bytes --]

 Hi George:
 
         We have quite strange CPU usage behavior in one of our DomUs (a 2008
HVM guest).
         In total, our host has 16 physical CPUs and 9 VMs.
 
         Most of the time all the VMs work fine; the CPU usage is low and
reasonable.
But at every high-workload time (say 9:00-11:00 AM; there are 8 VMs, each a
web server,
and customers access the pages at this time), we log into the 9th VM, which is
idle, and find that
its CPU usage is at 85%. That doesn't make any sense since we have no task
running, and the
usage is distributed evenly across most of the processes.
 
        I wonder if it relates to the CPU scheduling algorithm in Xen.
        After going through
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00414.html

        I can't come up with any assumption that explains our situation.
        So what do you think?
 
        Many thanks.
 
          

          


[-- Attachment #1.2: Type: text/html, Size: 3977 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strange CPU utilization, could related to credit schedule ?
  2011-01-12 14:41                 ` strange CPU utilization, could related to credit schedule ? tinnycloud
@ 2011-01-12 16:41                   ` George Dunlap
  2011-01-13  4:29                     ` MaoXiaoyun
  0 siblings, 1 reply; 32+ messages in thread
From: George Dunlap @ 2011-01-12 16:41 UTC (permalink / raw)
  To: tinnycloud; +Cc: xen devel

Where is that 85% number coming from -- is this from within the VM, or
from xentop?

If it's Windows reporting from within the VM, one hypothesis is that
it has to do with processing and running with virtual time.  It may
simply be a side effect of the VM only getting a small percentage of
the cpu.

If it's xentop, it's probably the vm reacting somehow to getting only
a small percentage of the CPU.  We saw something like this with early
versions of Windows 2k3, but that problem was addressed in later
service packs.  At any rate, to find out what Windows is doing would
require a bit more investigation. :-)

 -George

On Wed, Jan 12, 2011 at 2:41 PM, tinnycloud <tinnycloud@hotmail.com> wrote:
>  Hi Geogre:
>
>          We have quite strange CPU usage behaivor in one of our DomU(2008
> HVM)
>          Totally, our host has 16 physical CPU, and 9 VMS.
>
>          Most of time, the all VMs works fine, the CPU usage are low and
> resonable,
> But at every high workload time(say 9:00-11:00AM, there are 8 VMs, each is a
> web server,
> cutomers accesses the page at this time), we login into the 9th VM which
> is idle, find that
> its CPU usage is at 85%, doesn't make any sense since we have no task
> running, also the
> usage distrbutes evenly across most of the processes.
>
>         I wonder if it relates to CPU schedule algorithm in Xen.
>         After go
> through http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00414.html
>         I can't figure out any assumptiones explains our situation.
>         So what do u think of this?
>
>         Many thanks.
>
>
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: strange CPU utilization, could related to credit schedule ?
  2011-01-12 16:41                   ` George Dunlap
@ 2011-01-13  4:29                     ` MaoXiaoyun
  2011-01-17  3:52                       ` MaoXiaoyun
  0 siblings, 1 reply; 32+ messages in thread
From: MaoXiaoyun @ 2011-01-13  4:29 UTC (permalink / raw)
  To: george.dunlap, xen devel


[-- Attachment #1.1: Type: text/plain, Size: 2933 bytes --]


The 85% is from inside the VM.
I forgot to mention that each of the 8 VMs has 2 VCPUs, and the 9th VM,
the 2008 one, has 8 VCPUs. We are still trying to reproduce the scene.
 
I have questions about VM idle. How does Xen know a VM is idle? When a VM
is idle, what is its VCPU state in Xen, blocked or runnable, and how is the
CPU utilization calculated?
(I assume that the idle VM finishes its physical CPU use before the time
slice ends, its state becomes blocked, and it is then put into the
*inactive* queue, right?
But is it possible for the VM's VCPU to come back to the *active* queue
while the VM is still idle, so that we see the VCPU shifting between the
two queues?)
 
Also, when the VM's load comes up, will its priority be set to BOOST, thus
putting it at the head of the *active* queue to be scheduled earlier?

 
 
> Date: Wed, 12 Jan 2011 16:41:07 +0000
> Subject: Re: [Xen-devel] strange CPU utilization, could related to credit schedule ?
> From: George.Dunlap@eu.citrix.com
> To: tinnycloud@hotmail.com
> CC: xen-devel@lists.xensource.com
> 
> Where is that 85% number coming from -- is this from within the VM, or
> from xentop?
> 
> If it's Windows reporting from within the VM, one hypothesis is that
> it has to do with processing and running with virtual time. It may
> simply be a side effect of the VM only getting a small percentage of
> the cpu.
> 
> If it's xentop, it's probably the vm reacting somehow to getting only
> a small percentage of the CPU. We saw something like this with early
> versions of Windows 2k3, but that problem was addressed in later
> service packs. At any rate, to find out what Windows is doing would
> require a bit more investigation. :-)
> 
> -George
> 
> On Wed, Jan 12, 2011 at 2:41 PM, tinnycloud <tinnycloud@hotmail.com> wrote:
> >  Hi Geogre:
> >
> >          We have quite strange CPU usage behaivor in one of our DomU(2008
> > HVM)
> >          Totally, our host has 16 physical CPU, and 9 VMS.
> >
> >          Most of time, the all VMs works fine, the CPU usage are low and
> > resonable,
> > But at every high workload time(say 9:00-11:00AM, there are 8 VMs, each is a
> > web server,
> > cutomers accesses the page at this time), we login into the 9th VM which
> > is idle, find that
> > its CPU usage is at 85%, doesn't make any sense since we have no task
> > running, also the
> > usage distrbutes evenly across most of the processes.
> >
> >         I wonder if it relates to CPU schedule algorithm in Xen.
> >         After go
> > through http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00414.html
> >         I can't figure out any assumptiones explains our situation.
> >         So what do u think of this?
> >
> >         Many thanks.
> >
> >
> >
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> >
> >
 		 	   		  

[-- Attachment #1.2: Type: text/html, Size: 4092 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: strange CPU utilization, could related to credit schedule ?
  2011-01-13  4:29                     ` MaoXiaoyun
@ 2011-01-17  3:52                       ` MaoXiaoyun
  2011-01-17 10:41                         ` George Dunlap
  0 siblings, 1 reply; 32+ messages in thread
From: MaoXiaoyun @ 2011-01-17  3:52 UTC (permalink / raw)
  To: george.dunlap, xen devel


[-- Attachment #1.1: Type: text/plain, Size: 4477 bytes --]


Hi George:
 
       I've been looking into the credit scheduler again and again.
       Well, I'm not smart enough to get a full understanding.
       Could you help clarify the points below?
 
       1.  From the algorithm, since a domain's credits are directly
proportional to its weight, I think that if there are two CPU-bound domains
with the same weight, no matter how many VCPUs they have, they will
accumulate the same CPU time, right?
       2. If 1 is true, what is the difference between domains with the
same weight but different numbers of VCPUs (say one has 4 VCPUs, another
has 8)?
       3. I am trying to fully understand the problems of the "credit 1
scheduler" in your "Xenschedulerstatus" slides:
 
(1)Client hypervisors and audio/video  
    Audio VM: 5% CPU
 2x Kernel-build VMs: 97% cpu 
 30-40 audio skips over 5 minutes 
 
Do you mean the kernel-build VMs have a great impact on the audio VM, and
does the CSCHED_PRI_TS_BOOST priority solve this?

(2)Not fair to latency-sensitive workloads 
 Network scp: “Fair share” 50%, usage 20-30% 
(3) Load balancing 64 threads (4 x 8 x 2) 
 Unpredictable 
 Not scalable 
 Power management, Hyperthreads

Could you help explain a bit more?
 
Many, many thanks. These confusions really give me a headache; I feel a
bit silly.
    
 
 


From: tinnycloud@hotmail.com
To: george.dunlap@eu.citrix.com; xen-devel@lists.xensource.com
Subject: RE: [Xen-devel] strange CPU utilization, could related to credit schedule ?
Date: Thu, 13 Jan 2011 12:29:05 +0800




85% is from VM. 
I forget to tell that, 8VMS each of them has 2 VCPUS, and the 9th VM, which is 2008 
has 8VCPUs. We are still trying to reproduce the scence.
 
I have questiones on VM idle. How does Xen know VM is idle, or when VM is idle, 
what is VCPU state in Xen, blocked or runable, and how is the CPU utiliazation 
calcauted?
(I assume that the Idle VM finish physical CPU use before the time splice,
and its state come to blocked, then put it into *inactive* queue, right? 
But will it is possible VM's VCPU come back to *active* queue when VM still
in idle, then we may have the phenomenon of VCPU shift between twe queues?)
 
Also, when VM's load comes up, will its priority be set BOOST, thus put
the head of *active* queue to be sheduled earlier? 

 
 
> Date: Wed, 12 Jan 2011 16:41:07 +0000
> Subject: Re: [Xen-devel] strange CPU utilization, could related to credit schedule ?
> From: George.Dunlap@eu.citrix.com
> To: tinnycloud@hotmail.com
> CC: xen-devel@lists.xensource.com
> 
> Where is that 85% number coming from -- is this from within the VM, or
> from xentop?
> 
> If it's Windows reporting from within the VM, one hypothesis is that
> it has to do with processing and running with virtual time. It may
> simply be a side effect of the VM only getting a small percentage of
> the cpu.
> 
> If it's xentop, it's probably the vm reacting somehow to getting only
> a small percentage of the CPU. We saw something like this with early
> versions of Windows 2k3, but that problem was addressed in later
> service packs. At any rate, to find out what Windows is doing would
> require a bit more investigation. :-)
> 
> -George
> 
> On Wed, Jan 12, 2011 at 2:41 PM, tinnycloud <tinnycloud@hotmail.com> wrote:
> >  Hi Geogre:
> >
> >          We have quite strange CPU usage behaivor in one of our DomU(2008
> > HVM)
> >          Totally, our host has 16 physical CPU, and 9 VMS.
> >
> >          Most of time, the all VMs works fine, the CPU usage are low and
> > resonable,
> > But at every high workload time(say 9:00-11:00AM, there are 8 VMs, each is a
> > web server,
> > cutomers accesses the page at this time), we login into the 9th VM which
> > is idle, find that
> > its CPU usage is at 85%, doesn't make any sense since we have no task
> > running, also the
> > usage distrbutes evenly across most of the processes.
> >
> >         I wonder if it relates to CPU schedule algorithm in Xen.
> >         After go
> > through http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00414.html
> >         I can't figure out any assumptiones explains our situation.
> >         So what do u think of this?
> >
> >         Many thanks.
> >
> >
> >
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> >
> >
 		 	   		  

[-- Attachment #1.2: Type: text/html, Size: 6247 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strange CPU utilization, could related to credit schedule ?
  2011-01-17  3:52                       ` MaoXiaoyun
@ 2011-01-17 10:41                         ` George Dunlap
  2011-01-17 10:51                           ` Re: [Xen-devel] strange CPU utilization, could related to creditschedule ? kim.jin
  0 siblings, 1 reply; 32+ messages in thread
From: George Dunlap @ 2011-01-17 10:41 UTC (permalink / raw)
  To: MaoXiaoyun; +Cc: xen devel

On Mon, Jan 17, 2011 at 3:52 AM, MaoXiaoyun <tinnycloud@hotmail.com> wrote:
> Hi George:
>        1.  From the algorithm, since domains credits is  direct proportion
> to its weight,
> I think if there are two cpu-bound domains with same weight, no matter how
> many
> vcpus they have, they will have the same CPU times accmulated, right?

It used to be the case, yes.  But since that is very
counter-intuitive, some months ago I introduced a change such that the
weight is calculated on a per-vcpu basis.  If you look in
csched_acct(), when accounting credit, the weight of a domain is
multiplied by sdom->active_vcpu_count.

>        2. if 1 is true, what the different between domains with same
> weight but have
> different VCPUS(say one has 4 vcpus, another has 8)?

If two domains have the same number of "active" vcpus (4 each, for
example) they'll get the same amount of CPU time.  But if the 8-vcpu
domain has 8 vcpus in "active" mode, it will get twice as much time.

But this is a recent change; in earlier versions of Xen (before 3.4
for sure, and possibly 4.0, I can't remember), if two VMs are given
the same weight, they'll get the same cpu time.
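
To make the proportionality concrete, here is a greatly simplified sketch.
It is not the real csched_acct() (which also handles caps, credit clipping
and idle vcpus); the domain names, weight and per-period credit figure are
invented for illustration:

#include <stdio.h>

struct sdom {
    const char *name;
    int weight;
    int active_vcpu_count;
    long credit;
};

int main(void)
{
    /* Same weight, different numbers of active vcpus, as in question 2. */
    struct sdom doms[] = {
        { "dom-4vcpu", 256, 4, 0 },
        { "dom-8vcpu", 256, 8, 0 },
    };
    int i, n = 2;
    long credit_total = 3000;   /* credit handed out per accounting period */
    long weight_total = 0;

    for (i = 0; i < n; i++)
        weight_total += (long)doms[i].weight * doms[i].active_vcpu_count;

    for (i = 0; i < n; i++) {
        /* Share is proportional to weight * active_vcpu_count, so the
         * 8-vcpu domain ends up with twice the credit. */
        doms[i].credit += credit_total * doms[i].weight
                          * doms[i].active_vcpu_count / weight_total;
        printf("%s: %ld credits\n", doms[i].name, doms[i].credit);
    }
    return 0;
}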

>        3. I am fully understand the problems of "credit 1 schedule "in your
> ppt of "Xenschedulerstatus"
>
> (1)Client hypervisors and audio/video 
>     Audio VM: 5% CPU
>  2x Kernel-build VMs: 97% cpu
>  30-40 audio skips over 5 minutes
>
> Do you mean "kernel-build VMs" has great impact on "Audio VM", and does
> priority CSCHED_PRI_TS_BOOST
> solve this?

BOOST does not solve this problem.  I think I described the problem in
the paper: BOOST is an unstable place to be -- you can't stay there
very long.  The way BOOST works is this:
* You are put into BOOST if your credits reach a certain threshold
(30ms worth of credit)
* You are taken out of BOOST if you are interrupted by a scheduler "tick"

If you run at about 5% (or about 1/20 of the time), you can expect to
be running on average every 20 ticks.  Since timer ticks happen every
10ms, that means you can expect to stay in BOOST for an average of
200ms.

So no matter how little cpu you use, you'll flip back and forth
between BOOST and normal, often several times per second.
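
As a back-of-the-envelope sketch of that arithmetic (just the calculation
above written out in C, nothing more; the 5% figure is the audio-VM example
from the slides):

#include <stdio.h>

int main(void)
{
    double tick_ms = 10.0;   /* credit1 scheduler tick period */
    double util = 0.05;      /* vcpu running ~5% (1/20) of the time */

    /* On average the vcpu is caught by a tick once every
     * tick_ms / util milliseconds, so that is roughly how long it can
     * expect to stay in BOOST. */
    printf("expected time in BOOST: %.0f ms\n", tick_ms / util);   /* 200 ms */
    return 0;
}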

> many many thanks, those confusions really makes me headache, I am a bit of
> silly.

Not at all! Understanding scheduling is really hard.  It probably took me about six months to really
understand what was going on. :-)

 -George

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Re: [Xen-devel] strange CPU utilization, could related to creditschedule ?
  2011-01-17 10:41                         ` George Dunlap
@ 2011-01-17 10:51                           ` kim.jin
  2011-01-17 10:56                             ` George Dunlap
  2011-01-17 11:30                             ` Re: Re: strange CPU utilization, could related tocreditschedule ? kim.jin
  0 siblings, 2 replies; 32+ messages in thread
From: kim.jin @ 2011-01-17 10:51 UTC (permalink / raw)
  To: George Dunlap, MaoXiaoyun; +Cc: xen devel

[-- Attachment #1: Type: text/plain, Size: 3097 bytes --]

Then, what about the CPU frequency? E.g., one VM has 1GHz CPUs, but the other has 2GHz CPUs.

------------------				 
Best Regards!
 
Kim King
2011-01-17

-------------------------------------------------------------
George Dunlap
2011-01-17 18:41:35
MaoXiaoyun
xen devel
Re: [Xen-devel] strange CPU utilization, could related to creditschedule ?

>On Mon, Jan 17, 2011 at 3:52 AM, MaoXiaoyun <tinnycloud@hotmail.com> wrote:
>> Hi George:
>>        1.  From the algorithm, since domains credits is  direct proportion
>> to its weight,
>> I think if there are two cpu-bound domains with same weight, no matter how
>> many
>> vcpus they have, they will have the same CPU times accmulated, right?
>
>It used to be the case, yes.  But since that is very
>counter-intuitive, some months ago I introduced a change such that the
>weight is calculated on a per-vcpu basis.  If you look in
>csched_acct(), when accounting credit, weight of a domain is
>multiplied by sdom->active_vcpu_count.
>
>>        2. if 1 is true, what the different between domains with same
>> weight but have
>> different VCPUS(say one has 4 vcpus, another has 8)?
>
>If two domains have the same number of "active" vcpus (4 each, for
>example) they'll get the same amount of CPU time.  But if the 8-vcpu
>domain has 8 vcpus in "active" mode, it will get twice as much time.
>
>But this is a recent change; in earlier versions of Xen (before 3.4
>for sure, and possibly 4.0, I can't remember), if two VMs are given
>the same weight, they'll get the same cpu time.
>
>>        3. I am fully understand the problems of "credit 1 schedule "in your
>> ppt of "Xenschedulerstatus"
>>
>> (1)Client hypervisors and audio/video 
>>     Audio VM: 5% CPU
>>  2x Kernel-build VMs: 97% cpu
>>  30-40 audio skips over 5 minutes
>>
>> Do you mean "kernel-build VMs" has great impact on "Audio VM", and does
>> priority CSCHED_PRI_TS_BOOST
>> solve this?
>
>BOOST does not solve this problem.  I think I described the problem in
>the paper: BOOST is an unstable place to be -- you can't stay there
>very long.  The way BOOST works is this:
>* You are put into BOOST if your credits reach a certain threshold
>(30ms worth of credit)
>* You are taken out of BOOST if you are interrupted by a scheduler "tick"
>
>If you run at about 5% (or about 1/20 of the time), you can expect to
>be running on average every 20 ticks.  Since timer ticks happen every
>10ms, that means you can expect to stay in BOOST for an average of
>200ms.
>
>So no matter how little cpu you use, you'll flip back and forth
>between BOOST and normal, often several times per second.
>
>> many many thanks, those confusions really makes me headache, I am a bit of
>> silly.
>
>Not at all! Understanding scheduling is really hard.  It probably took me about six months to really
>understand what was going on. :-)
>
> -George
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@lists.xensource.com
>http://lists.xensource.com/xen-devel
>.

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Re: strange CPU utilization, could related to creditschedule ?
  2011-01-17 10:51                           ` Re: [Xen-devel] strange CPU utilization, could related to creditschedule ? kim.jin
@ 2011-01-17 10:56                             ` George Dunlap
  2011-01-17 11:30                             ` Re: Re: strange CPU utilization, could related tocreditschedule ? kim.jin
  1 sibling, 0 replies; 32+ messages in thread
From: George Dunlap @ 2011-01-17 10:56 UTC (permalink / raw)
  To: kim.jin; +Cc: MaoXiaoyun, xen devel

On Mon, Jan 17, 2011 at 10:51 AM, kim.jin <kim.jin@stromasys.com> wrote:
> Then, how about the frequency of CPU? e.g., one VM have 1GHz CPUs, but the other have 2GHz CPUs.

Do you mean, if someone is using CPU frequency scaling?

 -George

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Re: Re: strange CPU utilization, could related tocreditschedule ?
  2011-01-17 10:51                           ` Re: [Xen-devel] strange CPU utilization, could related to creditschedule ? kim.jin
  2011-01-17 10:56                             ` George Dunlap
@ 2011-01-17 11:30                             ` kim.jin
  1 sibling, 0 replies; 32+ messages in thread
From: kim.jin @ 2011-01-17 11:30 UTC (permalink / raw)
  To: George Dunlap; +Cc: MaoXiaoyun, xen devel

>On Mon, Jan 17, 2011 at 10:51 AM, kim.jin <kim.jin@stromasys.com> wrote:
>> Then, how about the frequency of CPU? e.g., one VM have 1GHz CPUs, but the other have 2GHz CPUs.
>
>Do you mean, if someone is using CPU frequency scaling?      
Something similar. Does the new algorithm take the frequency of the vCPUs into account?
> -George

Best Regards!
 
Kim King
2011-01-17

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2011-01-17 11:30 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <SNT0-MC3-F148nSuKiM000aac29@SNT0-MC3-F14.Snt0.hotmail.com>
2010-11-21  6:26 ` Xen balloon driver discuss tinnycloud
2010-11-22  4:33   ` MaoXiaoyun
2010-11-22 17:46     ` Dan Magenheimer
2010-11-23 14:58       ` tinnycloud
2010-11-27  6:54       ` cloudroot
2010-11-28  2:36         ` Dan Magenheimer
2010-11-29  4:20           ` tinnycloud
2010-11-29  6:34           ` xiaoyun.maoxy
     [not found]           ` <002b01cb8f8f$852bda10$8f838e30$@maoxy@aliyun-inc.com>
2010-11-29  8:37             ` tinnycloud
2010-11-29 10:09             ` George Dunlap
2010-11-29 10:12           ` George Dunlap
2010-11-29 15:42             ` Dan Magenheimer
2010-11-28 13:00         ` Pasi Kärkkäinen
2010-11-29  6:56   ` Chu Rui
2010-11-29 10:55     ` 答复: [Xen-devel] " tinnycloud
2010-11-29 11:19       ` George Dunlap
2010-11-29 15:41         ` hotmaim
2010-11-30 10:50           ` George Dunlap
2010-11-30 13:58             ` tinnycloud
2010-11-30 16:39             ` Dan Magenheimer
2010-12-01  5:07             ` xiaoyun.maoxy
     [not found]             ` <00fe01cb9115$98319c80$c894d580$@maoxy@aliyun-inc.com>
2010-12-01  6:29               ` tinnycloud
2011-01-12 14:41                 ` strange CPU utilization, could related to credit schedule ? tinnycloud
2011-01-12 16:41                   ` George Dunlap
2011-01-13  4:29                     ` MaoXiaoyun
2011-01-17  3:52                       ` MaoXiaoyun
2011-01-17 10:41                         ` George Dunlap
2011-01-17 10:51                           ` Re: [Xen-devel] strange CPU utilization, could related to creditschedule ? kim.jin
2011-01-17 10:56                             ` George Dunlap
2011-01-17 11:30                             ` Re: Re: strange CPU utilization, could related tocreditschedule ? kim.jin
2010-11-30  3:51         ` 答复: [Xen-devel] Xen balloon driver discuss Chu Rui
2010-11-30 11:08           ` George Dunlap
