* Claim mode and HVM PoD interact badly
@ 2014-01-10 11:56 Wei Liu
  2014-01-10 11:59 ` Ian Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Liu @ 2014-01-10 11:56 UTC (permalink / raw)
  To: xen-devel; +Cc: wei.liu2, Ian Campbell

When I have the following configuration in an HVM config file:
  memory=128
  maxmem=256
and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with

xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82

With claim_mode=0, I can successfully create the HVM guest.

PV guests are not affected.

Wei.


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 11:56 Claim mode and HVM PoD interact badly Wei Liu
@ 2014-01-10 11:59 ` Ian Campbell
  2014-01-10 12:15   ` Processed: " xen
  2014-01-10 14:58   ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 27+ messages in thread
From: Ian Campbell @ 2014-01-10 11:59 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel

create ^
owner Wei Liu <wei.liu2@citrix.com>
thanks

On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> When I have following configuration in HVM config file:
>   memory=128
>   maxmem=256
> and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> 
> xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> 
> With claim_mode=0, I can sucessfuly create HVM guest.

Is it trying to claim 256M instead of 128M? (although the likelihood
that you only have 128-255M free is quite low, or are you
autoballooning?)

Ian.


* Processed: Re: Claim mode and HVM PoD interact badly
  2014-01-10 11:59 ` Ian Campbell
@ 2014-01-10 12:15   ` xen
  2014-01-10 14:58   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 27+ messages in thread
From: xen @ 2014-01-10 12:15 UTC (permalink / raw)
  To: Ian Campbell, xen-devel

Processing commands for xen@bugs.xenproject.org:

> create ^
Created new bug #32 rooted at `<20140110115638.GG29180@zion.uk.xensource.com>'
Title: `Re: Claim mode and HVM PoD interact badly'
> owner Wei Liu <wei.liu2@citrix.com>
Command failed: Cannot parse arguments at /srv/xen-devel-bugs/lib/emesinae/control.pl line 301, <M> line 32.
Stop processing here.

Modified/created Bugs:
 - 32: http://bugs.xenproject.org/xen/bug/32 (new)

---
Xen Hypervisor Bug Tracker
See http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for information on reporting bugs
Contact xen-bugs-owner@bugs.xenproject.org with any infrastructure issues


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 11:59 ` Ian Campbell
  2014-01-10 12:15   ` Processed: " xen
@ 2014-01-10 14:58   ` Konrad Rzeszutek Wilk
  2014-01-10 15:10     ` Wei Liu
                       ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-10 14:58 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, xen-devel

On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> create ^
> owner Wei Liu <wei.liu2@citrix.com>
> thanks
> 
> On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > When I have following configuration in HVM config file:
> >   memory=128
> >   maxmem=256
> > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > 
> > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > 
> > With claim_mode=0, I can sucessfuly create HVM guest.
> 
> Is it trying to claim 256M instead of 128M? (although the likelyhood

No. 128MB actually.

> that you only have 128-255M free is quite low, or are you
> autoballooning?)

This patch fixes it for me. It basically sets the amount of pages
claimed to be 'maxmem' instead of 'memory' for PoD.

I don't know PoD very well, and this claim is only valid during the
allocation of the guest's memory - so the 'target_pages' value might be
the wrong one. However looking at the hypervisor's
'p2m_pod_set_mem_target' I see this comment:

 316  *     B <T': Set the PoD cache size equal to the number of outstanding PoD
 317  *   entries.  The balloon driver will deflate the balloon to give back
 318  *   the remainder of the ram to the guest OS.

Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
And then it is the responsibility of the balloon driver to give the memory
back (and this is where the 'static-max' et al come in play to tell the
balloon driver to balloon out).


diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 77bd365..65e9577 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
 
     /* try to claim pages for early warning of insufficient memory available */
     if ( claim_enabled ) {
-        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
+        unsigned long nr = nr_pages - cur_pages;
+
+        if ( pod_mode )
+            nr = target_pages - 0x20;
+
+        rc = xc_domain_claim_pages(xch, dom, nr);
         if ( rc != 0 )
         {
             PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 14:58   ` Konrad Rzeszutek Wilk
@ 2014-01-10 15:10     ` Wei Liu
  2014-01-10 15:41       ` Konrad Rzeszutek Wilk
  2014-01-10 15:16     ` Ian Campbell
  2014-01-20 16:29     ` Wei Liu
  2 siblings, 1 reply; 27+ messages in thread
From: Wei Liu @ 2014-01-10 15:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Wei Liu, Ian Campbell, xen-devel

On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > create ^
> > owner Wei Liu <wei.liu2@citrix.com>
> > thanks
> > 
> > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > When I have following configuration in HVM config file:
> > >   memory=128
> > >   maxmem=256
> > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > 
> > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > 
> > > With claim_mode=0, I can sucessfuly create HVM guest.
> > 
> > Is it trying to claim 256M instead of 128M? (although the likelyhood
> 
> No. 128MB actually.
> 

Huh? My debug message says otherwise. It tried to claim 248MB (256MB -
8MB video ram). Did I misread your message...

On the hypervisor side d->tot_pages = 30688, d->max_pages = 33024 (128MB
+ 1MB slack). So the claim failed.
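
Worked out with 4 KiB pages: the ~248 MiB claim is roughly 63,500 pages,
while d->max_pages = 33024 pages = 129 MiB (and d->tot_pages = 30688
pages ~= 120 MiB), so the claim far exceeds max_pages - presumably the
"22 = Invalid argument" (-EINVAL) in the log above.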

> > that you only have 128-255M free is quite low, or are you
> > autoballooning?)
> 
> This patch fixes it for me. It basically sets the amount of pages
> claimed to be 'maxmem' instead of 'memory' for PoD.
> 
> I don't know PoD very well, and this claim is only valid during the
> allocation of the guests memory - so the 'target_pages' value might be
> the wrong one. However looking at the hypervisor's
> 'p2m_pod_set_mem_target' I see this comment:
> 
>  316  *     B <T': Set the PoD cache size equal to the number of outstanding PoD
>  317  *   entries.  The balloon driver will deflate the balloon to give back
>  318  *   the remainder of the ram to the guest OS.
> 
> Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
> And then it is the responsibility of the balloon driver to give the memory
> back (and this is where the 'static-max' et al come in play to tell the
> balloon driver to balloon out).
> 
> 
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index 77bd365..65e9577 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
>  
>      /* try to claim pages for early warning of insufficient memory available */
>      if ( claim_enabled ) {
> -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> +        unsigned long nr = nr_pages - cur_pages;
> +
> +        if ( pod_mode )
> +            nr = target_pages - 0x20;
> +

Yes it should work because this makes nr smaller than d->tot_pages and
d->max_pages. But according to the comment you pasted above this looks
like the wrong fix...

Wei.

> +        rc = xc_domain_claim_pages(xch, dom, nr);
>          if ( rc != 0 )
>          {
>              PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 14:58   ` Konrad Rzeszutek Wilk
  2014-01-10 15:10     ` Wei Liu
@ 2014-01-10 15:16     ` Ian Campbell
  2014-01-10 15:19       ` Wei Liu
  2014-01-10 15:28       ` Konrad Rzeszutek Wilk
  2014-01-20 16:29     ` Wei Liu
  2 siblings, 2 replies; 27+ messages in thread
From: Ian Campbell @ 2014-01-10 15:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: George Dunlap, Ian Jackson, Wei Liu, xen-devel

On Fri, 2014-01-10 at 09:58 -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > create ^
> > owner Wei Liu <wei.liu2@citrix.com>
> > thanks
> > 
> > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > When I have following configuration in HVM config file:
> > >   memory=128
> > >   maxmem=256
> > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > 
> > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > 
> > > With claim_mode=0, I can sucessfuly create HVM guest.
> > 
> > Is it trying to claim 256M instead of 128M? (although the likelyhood
> 
> No. 128MB actually.
> 
> > that you only have 128-255M free is quite low, or are you
> > autoballooning?)
> 
> This patch fixes it for me. It basically sets the amount of pages
> claimed to be 'maxmem' instead of 'memory' for PoD.
> 
> I don't know PoD very well,

Me neither, this might have to wait for George to get back.

We should also consider flipping the default claim setting to off in xl
for 4.4, since that is likely to be a lower impact change than fixing
the issue (and one which we all understand!).

>  and this claim is only valid during the
> allocation of the guests memory - so the 'target_pages' value might be
> the wrong one. However looking at the hypervisor's
> 'p2m_pod_set_mem_target' I see this comment:
> 
>  316  *     B <T': Set the PoD cache size equal to the number of outstanding PoD
>  317  *   entries.  The balloon driver will deflate the balloon to give back
>  318  *   the remainder of the ram to the guest OS.
> 
> Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
> And then it is the responsibility of the balloon driver to give the memory
> back (and this is where the 'static-max' et al come in play to tell the
> balloon driver to balloon out).

PoD exists purely so that we don't need the 'maxmem' amount of memory at
boot time. It is basically there in order to let the guest get booted
far enough to load the balloon driver to give the memory back.

It's basically a boot time zero-page sharing mechanism AIUI.

> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index 77bd365..65e9577 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
>  
>      /* try to claim pages for early warning of insufficient memory available */
>      if ( claim_enabled ) {
> -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> +        unsigned long nr = nr_pages - cur_pages;
> +
> +        if ( pod_mode )
> +            nr = target_pages - 0x20;

0x20?

> +
> +        rc = xc_domain_claim_pages(xch, dom, nr);
>          if ( rc != 0 )
>          {
>              PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 15:16     ` Ian Campbell
@ 2014-01-10 15:19       ` Wei Liu
  2014-01-10 15:28       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 27+ messages in thread
From: Wei Liu @ 2014-01-10 15:19 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, Ian Jackson, xen-devel, Wei Liu

On Fri, Jan 10, 2014 at 03:16:25PM +0000, Ian Campbell wrote:
 [...]
 
> > diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> > index 77bd365..65e9577 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
> >  
> >      /* try to claim pages for early warning of insufficient memory available */
> >      if ( claim_enabled ) {
> > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> > +        unsigned long nr = nr_pages - cur_pages;
> > +
> > +        if ( pod_mode )
> > +            nr = target_pages - 0x20;
> 
> 0x20?
> 

128K VGA hole. :-)

Wei.

> > +
> > +        rc = xc_domain_claim_pages(xch, dom, nr);
> >          if ( rc != 0 )
> >          {
> >              PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");
> 


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 15:16     ` Ian Campbell
  2014-01-10 15:19       ` Wei Liu
@ 2014-01-10 15:28       ` Konrad Rzeszutek Wilk
  2014-01-10 15:56         ` Ian Campbell
  2014-01-10 19:07         ` Tim Deegan
  1 sibling, 2 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-10 15:28 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, Ian Jackson, Wei Liu, xen-devel

On Fri, Jan 10, 2014 at 03:16:25PM +0000, Ian Campbell wrote:
> On Fri, 2014-01-10 at 09:58 -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > > create ^
> > > owner Wei Liu <wei.liu2@citrix.com>
> > > thanks
> > > 
> > > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > > When I have following configuration in HVM config file:
> > > >   memory=128
> > > >   maxmem=256
> > > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > > 
> > > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > > 
> > > > With claim_mode=0, I can sucessfuly create HVM guest.
> > > 
> > > Is it trying to claim 256M instead of 128M? (although the likelyhood
> > 
> > No. 128MB actually.
> > 
> > > that you only have 128-255M free is quite low, or are you
> > > autoballooning?)
> > 
> > This patch fixes it for me. It basically sets the amount of pages
> > claimed to be 'maxmem' instead of 'memory' for PoD.
> > 
> > I don't know PoD very well,
> 
> Me neither, this might have to wait for George to get back.

<nods>
> 
> We should also consider flipping the default claim setting to off in xl
> for 4.4, since that is likely to be a lower impact change than fixing
> the issue (and one which we all understand!).

<unwraps the Xen 4.4 duct-tape roll>

> 
> >  and this claim is only valid during the
> > allocation of the guests memory - so the 'target_pages' value might be
> > the wrong one. However looking at the hypervisor's
> > 'p2m_pod_set_mem_target' I see this comment:
> > 
> >  316  *     B <T': Set the PoD cache size equal to the number of outstanding PoD
> >  317  *   entries.  The balloon driver will deflate the balloon to give back
> >  318  *   the remainder of the ram to the guest OS.
> > 
> > Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
> > And then it is the responsibility of the balloon driver to give the memory
> > back (and this is where the 'static-max' et al come in play to tell the
> > balloon driver to balloon out).
> 
> PoD exists purely so that we don't need the 'maxmem' amount of memory at
> boot time. It is basically there in order to let the guest get booted
> far enough to load the balloon driver to give the memory back.
> 
> It's basically a boot time zero-page sharing mechanism AIUI.

But it does look to gulp up hypervisor memory and return it during
allocation of memory for the guest.

Digging into the hypervisor, I see in 'p2m_pod_set_cache_target' (where
pod_target in this case is maxmem - memory):

And pod.count is zero, so for Wei's case it would be 128MB.

 216     /* Increasing the target */
 217     while ( pod_target > p2m->pod.count )
 218     {
 222         if ( (pod_target - p2m->pod.count) >= SUPERPAGE_PAGES )
 223             order = PAGE_ORDER_2M;
 224         else
 225             order = PAGE_ORDER_4K;
 226     retry:
 227         page = alloc_domheap_pages(d, order, PAGE_ORDER_4K);

So allocate 64 2MB pages

 243         p2m_pod_cache_add(p2m, page, order);

Add to a list

251 
 252     /* Decreasing the target */
 253     /* We hold the pod lock here, so we don't need to worry about
 254      * cache disappearing under our feet. */
 255     while ( pod_target < p2m->pod.count )
 256     {
..
 266         page = p2m_pod_cache_get(p2m, order);

Get the page (from the list)
..
 287             put_page(page+i);

And then free it.


From reading the code the patch seems correct - we will _need_ that
extra 128MB 'claim' to allocate/free the 128MB extra pages. They
are temporary as we do free them.

> 
> > diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> > index 77bd365..65e9577 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
> >  
> >      /* try to claim pages for early warning of insufficient memory available */
> >      if ( claim_enabled ) {
> > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> > +        unsigned long nr = nr_pages - cur_pages;
> > +
> > +        if ( pod_mode )
> > +            nr = target_pages - 0x20;
> 
> 0x20?

Yup. From earlier:

305     if ( pod_mode )
306     {
307         /*
308          * Subtract 0x20 from target_pages for the VGA "hole".  Xen will
309          * adjust the PoD cache size so that domain tot_pages will be
310          * target_pages - 0x20 after this call.
311          */
312         rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
313                                       NULL, NULL, NULL);
314         if ( rc != 0 )
315         {
316             PERROR("Could not set PoD target for HVM guest.\n");
317             goto error_out;
318         }
319     }

Maybe a nice little 'pod_delta' or 'pod_pages' should be used instead of copying
this around.
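
A rough sketch of what that could look like (the constant name here is
made up, nothing like it exists in the tree):

    /* Pages subtracted from target_pages for the VGA "hole". */
    #define VGA_HOLE_PAGES 0x20
    ...
        rc = xc_domain_set_pod_target(xch, dom,
                                      target_pages - VGA_HOLE_PAGES,
                                      NULL, NULL, NULL);
    ...
        if ( pod_mode )
            nr = target_pages - VGA_HOLE_PAGES;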

> 
> > +
> > +        rc = xc_domain_claim_pages(xch, dom, nr);
> >          if ( rc != 0 )
> >          {
> >              PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");
> 
> 


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 15:10     ` Wei Liu
@ 2014-01-10 15:41       ` Konrad Rzeszutek Wilk
  2014-01-10 15:48         ` Wei Liu
                           ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-10 15:41 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel

On Fri, Jan 10, 2014 at 03:10:48PM +0000, Wei Liu wrote:
> On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > > create ^
> > > owner Wei Liu <wei.liu2@citrix.com>
> > > thanks
> > > 
> > > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > > When I have following configuration in HVM config file:
> > > >   memory=128
> > > >   maxmem=256
> > > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > > 
> > > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > > 
> > > > With claim_mode=0, I can sucessfuly create HVM guest.
> > > 
> > > Is it trying to claim 256M instead of 128M? (although the likelyhood
> > 
> > No. 128MB actually.
> > 
> 
> Huh? My debug message says otherwise. It tried to claim 248MB (256MB -
> 8MB video ram). Did I misread your message...

The 'claim' here is the hypercall that sets the 'clamp' on how much memory
the guest can allocate. This is based on:

242     unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT;

  /* try to claim pages for early warning of insufficient memory available */
337     if ( claim_enabled ) {
343         rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);

Your 'mem_size' is 128MB, cur_pages is 0xc0, so it ends up 'claiming'
that the guest only needs 128MB - 768kB.
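(For reference: 0xc0 is 192 pages, and 192 * 4 KiB = 768 KiB, hence the
128MB - 768kB figure.)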
> 
> On the hypervisor side d->tot_pages = 30688, d->max_pages = 33024 (128MB
> + 1MB slack). So the claim failed.

Correct.
> 
> > > that you only have 128-255M free is quite low, or are you
> > > autoballooning?)
> > 
> > This patch fixes it for me. It basically sets the amount of pages
> > claimed to be 'maxmem' instead of 'memory' for PoD.
> > 
> > I don't know PoD very well, and this claim is only valid during the
> > allocation of the guests memory - so the 'target_pages' value might be
> > the wrong one. However looking at the hypervisor's
> > 'p2m_pod_set_mem_target' I see this comment:
> > 
> >  316  *     B <T': Set the PoD cache size equal to the number of outstanding PoD
> >  317  *   entries.  The balloon driver will deflate the balloon to give back
> >  318  *   the remainder of the ram to the guest OS.
> > 
> > Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
> > And then it is the responsibility of the balloon driver to give the memory
> > back (and this is where the 'static-max' et al come in play to tell the
> > balloon driver to balloon out).
> > 
> > 
> > diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> > index 77bd365..65e9577 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
> >  
> >      /* try to claim pages for early warning of insufficient memory available */
> >      if ( claim_enabled ) {
> > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> > +        unsigned long nr = nr_pages - cur_pages;
> > +
> > +        if ( pod_mode )
> > +            nr = target_pages - 0x20;
> > +
> 
> Yes it should work because this makes nr smaller than d->tot_pages and
> d->max_pages. But according to the comment you pasted above this looks
> like wrong fix...

It should be: 

tot_pages = 128MB
max_pages = 256MB
nr = 256MB - 0x20.

So tot_pages < nr < max_pages

If I got my variables right.
Which means that 'nr' is greater than tot_pages but less than max_pages.

> 
> Wei.
> 
> > +        rc = xc_domain_claim_pages(xch, dom, nr);
> >          if ( rc != 0 )
> >          {
> >              PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 15:41       ` Konrad Rzeszutek Wilk
@ 2014-01-10 15:48         ` Wei Liu
  2014-01-10 16:08           ` Konrad Rzeszutek Wilk
  2014-01-10 15:52         ` Jan Beulich
  2014-01-10 16:03         ` Wei Liu
  2 siblings, 1 reply; 27+ messages in thread
From: Wei Liu @ 2014-01-10 15:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Wei Liu, Ian Campbell, xen-devel

On Fri, Jan 10, 2014 at 10:41:05AM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 10, 2014 at 03:10:48PM +0000, Wei Liu wrote:
> > On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > > > create ^
> > > > owner Wei Liu <wei.liu2@citrix.com>
> > > > thanks
> > > > 
> > > > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > > > When I have following configuration in HVM config file:
> > > > >   memory=128
> > > > >   maxmem=256
> > > > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > > > 
> > > > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > > > 
> > > > > With claim_mode=0, I can sucessfuly create HVM guest.
> > > > 
> > > > Is it trying to claim 256M instead of 128M? (although the likelyhood
> > > 
> > > No. 128MB actually.
> > > 
> > 
> > Huh? My debug message says otherwise. It tried to claim 248MB (256MB -
> > 8MB video ram). Did I misread your message...
> 
> The 'claim' being the hypercall to set the 'clamp' on how much memory
> the guest can allocate. This is based on:
> 
> 242     unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT;
> 
>   /* try to claim pages for early warning of insufficient memory available */
> 337     if ( claim_enabled ) {
> 343         rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> 
> Your 'mem_size' is 128MB, cur_pages is 0xc0, so it ends up 'claiming'
> that the guest only needs 128MB - 768kB.

No, the nr_pages I saw was 63296 (256MB - 768KB) -- I printed it out.

Wei.


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 15:41       ` Konrad Rzeszutek Wilk
  2014-01-10 15:48         ` Wei Liu
@ 2014-01-10 15:52         ` Jan Beulich
  2014-01-10 16:07           ` Konrad Rzeszutek Wilk
  2014-01-10 16:03         ` Wei Liu
  2 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2014-01-10 15:52 UTC (permalink / raw)
  To: Wei Liu, Konrad Rzeszutek Wilk; +Cc: Ian Campbell, xen-devel

>>> On 10.01.14 at 16:41, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Fri, Jan 10, 2014 at 03:10:48PM +0000, Wei Liu wrote:
>> On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
>> > --- a/tools/libxc/xc_hvm_build_x86.c
>> > +++ b/tools/libxc/xc_hvm_build_x86.c
>> > @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
>> >  
>> >      /* try to claim pages for early warning of insufficient memory available */
>> >      if ( claim_enabled ) {
>> > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
>> > +        unsigned long nr = nr_pages - cur_pages;
>> > +
>> > +        if ( pod_mode )
>> > +            nr = target_pages - 0x20;
>> > +
>> 
>> Yes it should work because this makes nr smaller than d->tot_pages and
>> d->max_pages. But according to the comment you pasted above this looks
>> like wrong fix...
> 
> It should be: 
> 
> tot_pages = 128MB
> max_pages = 256MB
> nr = 256MB - 0x20.
> 
> So tot_pages < max_pages > nr && nr > tot_pages
> 
> If I got my variables right.
> Which means that 'nr' is greater than tot_pages but less than max_pages.

But that seems conceptually wrong: As was said before, the guest
should only get 128MB allocated, hence it would be wrong to claim
almost 256MB for it.

Jan


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 15:28       ` Konrad Rzeszutek Wilk
@ 2014-01-10 15:56         ` Ian Campbell
  2014-01-10 16:05           ` Konrad Rzeszutek Wilk
  2014-01-10 19:07         ` Tim Deegan
  1 sibling, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2014-01-10 15:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: George Dunlap, Ian Jackson, Wei Liu, xen-devel

On Fri, 2014-01-10 at 10:28 -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 10, 2014 at 03:16:25PM +0000, Ian Campbell wrote:
> > On Fri, 2014-01-10 at 09:58 -0500, Konrad Rzeszutek Wilk wrote:
> > > Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
> > > And then it is the responsibility of the balloon driver to give the memory
> > > back (and this is where the 'static-max' et al come in play to tell the
> > > balloon driver to balloon out).
> > 
> > PoD exists purely so that we don't need the 'maxmem' amount of memory at
> > boot time. It is basically there in order to let the guest get booted
> > far enough to load the balloon driver to give the memory back.
> > 
> > It's basically a boot time zero-page sharing mechanism AIUI.
> 
> But it does look to gulp up hypervisor memory and return it during
> allocation of memory for the guest.

It should be less than the maxmem-memory amount though. Perhaps because
Wei is using relatively small sizes the pod cache ends up being a
similar size to the saving?

Or maybe I just don't understand PoD, since the code you quote does seem
to contradict that.

Or maybe libxl's calculation of pod_target is wrong?

> From reading the code the patch seems correct - we will _need_ that
> extra 128MB 'claim' to allocate/free the 128MB extra pages. They
> are temporary as we do free them.

It does make sense that the PoD cache should be included in the claim,
I just don't get why the cache is so big...

> > > diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> > > index 77bd365..65e9577 100644
> > > --- a/tools/libxc/xc_hvm_build_x86.c
> > > +++ b/tools/libxc/xc_hvm_build_x86.c
> > > @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
> > >  
> > >      /* try to claim pages for early warning of insufficient memory available */
> > >      if ( claim_enabled ) {
> > > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> > > +        unsigned long nr = nr_pages - cur_pages;
> > > +
> > > +        if ( pod_mode )
> > > +            nr = target_pages - 0x20;
> > 
> > 0x20?
> 
> Yup. From earlier:
> 
> 305     if ( pod_mode )
> 306     {
> 307         /*
> 308          * Subtract 0x20 from target_pages for the VGA "hole".  Xen will
> 309          * adjust the PoD cache size so that domain tot_pages will be
> 310          * target_pages - 0x20 after this call.
> 311          */
> 312         rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
> 313                                       NULL, NULL, NULL);
> 314         if ( rc != 0 )
> 315         {
> 316             PERROR("Could not set PoD target for HVM guest.\n");
> 317             goto error_out;
> 318         }
> 319     }
> 
> Maybe a nice little 'pod_delta' or 'pod_pages' should be used instead of copying
> this around.

A #define or something might be nice, yes.

> 
> > 
> > > +
> > > +        rc = xc_domain_claim_pages(xch, dom, nr);
> > >          if ( rc != 0 )
> > >          {
> > >              PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");
> > 
> > 


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 15:41       ` Konrad Rzeszutek Wilk
  2014-01-10 15:48         ` Wei Liu
  2014-01-10 15:52         ` Jan Beulich
@ 2014-01-10 16:03         ` Wei Liu
  2014-01-10 16:50           ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 27+ messages in thread
From: Wei Liu @ 2014-01-10 16:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Wei Liu, Ian Campbell, xen-devel

On Fri, Jan 10, 2014 at 10:41:05AM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 10, 2014 at 03:10:48PM +0000, Wei Liu wrote:
> > On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > > > create ^
> > > > owner Wei Liu <wei.liu2@citrix.com>
> > > > thanks
> > > > 
> > > > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > > > When I have following configuration in HVM config file:
> > > > >   memory=128
> > > > >   maxmem=256
> > > > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > > > 
> > > > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > > > 
> > > > > With claim_mode=0, I can sucessfuly create HVM guest.
> > > > 
> > > > Is it trying to claim 256M instead of 128M? (although the likelyhood
> > > 
> > > No. 128MB actually.
> > > 
> > 
> > Huh? My debug message says otherwise. It tried to claim 248MB (256MB -
> > 8MB video ram). Did I misread your message...
> 
> The 'claim' being the hypercall to set the 'clamp' on how much memory
> the guest can allocate. This is based on:
> 
> 242     unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT;

This is in fact initialized from 'maxmem' in the guest's config file, and

243     unsigned long target_pages = args->mem_target >> PAGE_SHIFT;

This is in fact 'memory' in the guest's config file.

So when you try to claim "maxmem" while the current limit is "memory", it
will not work.

So the guest should only claim target_pages minus 0x20 pages if PoD is enabled.
Oh, this is what your initial patch did. I don't know whether this is
conceptually correct though. :-P

Furthermore, should the guest only be allowed to claim target_pages, regardless
of whether PoD is enabled? When only "memory" is specified, "maxmem"
equals "memory". So conceptually what we really care about is
"memory", not "maxmem".

Wei.


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 15:56         ` Ian Campbell
@ 2014-01-10 16:05           ` Konrad Rzeszutek Wilk
  2014-01-10 16:11             ` Ian Campbell
  2014-01-27 12:44             ` George Dunlap
  0 siblings, 2 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-10 16:05 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, Ian Jackson, Wei Liu, xen-devel

On Fri, Jan 10, 2014 at 03:56:13PM +0000, Ian Campbell wrote:
> On Fri, 2014-01-10 at 10:28 -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 10, 2014 at 03:16:25PM +0000, Ian Campbell wrote:
> > > On Fri, 2014-01-10 at 09:58 -0500, Konrad Rzeszutek Wilk wrote:
> > > > Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
> > > > And then it is the responsibility of the balloon driver to give the memory
> > > > back (and this is where the 'static-max' et al come in play to tell the
> > > > balloon driver to balloon out).
> > > 
> > > PoD exists purely so that we don't need the 'maxmem' amount of memory at
> > > boot time. It is basically there in order to let the guest get booted
> > > far enough to load the balloon driver to give the memory back.
> > > 
> > > It's basically a boot time zero-page sharing mechanism AIUI.
> > 
> > But it does look to gulp up hypervisor memory and return it during
> > allocation of memory for the guest.
> 
> It should be less than the maxmem-memory amount though. Perhaps because
> Wei is using relatively small sizes the pod cache ends up being a
> similar size to the saving?
> 
> Or maybe I just don't understand PoD, since the code you quote does seem
> to contradict that.
> 
> Or maybe libxl's calculation of pod_target is wrong?
> 
> > From reading the code the patch seems correct - we will _need_ that
> > extra 128MB 'claim' to allocate/free the 128MB extra pages. They
> > are temporary as we do free them.
> 
> It does makes sense that the PoD cache should be included in the claim,
> I just don't get why the cache is so big...

I think it expands and shrinks to make sure that the memory is present
in the hypervisor. If there is not enough memory it would -ENOMEM and
the toolstack would know immediately.

But that seems silly - as that memory might in the future be used
by other guests, and then you won't be able to use said cache. But since
it is a "cache" I guess that is OK.

> 
> > > > diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> > > > index 77bd365..65e9577 100644
> > > > --- a/tools/libxc/xc_hvm_build_x86.c
> > > > +++ b/tools/libxc/xc_hvm_build_x86.c
> > > > @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
> > > >  
> > > >      /* try to claim pages for early warning of insufficient memory available */
> > > >      if ( claim_enabled ) {
> > > > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> > > > +        unsigned long nr = nr_pages - cur_pages;
> > > > +
> > > > +        if ( pod_mode )
> > > > +            nr = target_pages - 0x20;
> > > 
> > > 0x20?
> > 
> > Yup. From earlier:
> > 
> > 305     if ( pod_mode )
> > 306     {
> > 307         /*
> > 308          * Subtract 0x20 from target_pages for the VGA "hole".  Xen will
> > 309          * adjust the PoD cache size so that domain tot_pages will be
> > 310          * target_pages - 0x20 after this call.
> > 311          */
> > 312         rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
> > 313                                       NULL, NULL, NULL);
> > 314         if ( rc != 0 )
> > 315         {
> > 316             PERROR("Could not set PoD target for HVM guest.\n");
> > 317             goto error_out;
> > 318         }
> > 319     }
> > 
> > Maybe a nice little 'pod_delta' or 'pod_pages' should be used instead of copying
> > this around.
> 
> A #define or something might be nice, yes.
> 
> > 
> > > 
> > > > +
> > > > +        rc = xc_domain_claim_pages(xch, dom, nr);
> > > >          if ( rc != 0 )
> > > >          {
> > > >              PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");
> > > 
> > > 
> 
> 


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 15:52         ` Jan Beulich
@ 2014-01-10 16:07           ` Konrad Rzeszutek Wilk
  2014-01-10 16:23             ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-10 16:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Wei Liu, Ian Campbell, xen-devel

On Fri, Jan 10, 2014 at 03:52:08PM +0000, Jan Beulich wrote:
> >>> On 10.01.14 at 16:41, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Fri, Jan 10, 2014 at 03:10:48PM +0000, Wei Liu wrote:
> >> On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
> >> > --- a/tools/libxc/xc_hvm_build_x86.c
> >> > +++ b/tools/libxc/xc_hvm_build_x86.c
> >> > @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
> >> >  
> >> >      /* try to claim pages for early warning of insufficient memory available */
> >> >      if ( claim_enabled ) {
> >> > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> >> > +        unsigned long nr = nr_pages - cur_pages;
> >> > +
> >> > +        if ( pod_mode )
> >> > +            nr = target_pages - 0x20;
> >> > +
> >> 
> >> Yes it should work because this makes nr smaller than d->tot_pages and
> >> d->max_pages. But according to the comment you pasted above this looks
> >> like wrong fix...
> > 
> > It should be: 
> > 
> > tot_pages = 128MB
> > max_pages = 256MB
> > nr = 256MB - 0x20.
> > 
> > So tot_pages < max_pages > nr && nr > tot_pages
> > 
> > If I got my variables right.
> > Which means that 'nr' is greater than tot_pages but less than max_pages.
> 
> But that seems conceptually wrong: As was said before, the guest
> should only get 128Mb allocated, hence it would be wrong to claim
> almost 256Mb for it.

At the start of the day (that is when the guest starts) it would only have
128MB allocated to it. But before then it needs 256MB (at least that is
based on looking at the code).

I do agree it seems odd that the cache allocates memory, then
frees it. But I might also be reading the code wrong.

> 
> Jan
> 


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 15:48         ` Wei Liu
@ 2014-01-10 16:08           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-10 16:08 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel

On Fri, Jan 10, 2014 at 03:48:31PM +0000, Wei Liu wrote:
> On Fri, Jan 10, 2014 at 10:41:05AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 10, 2014 at 03:10:48PM +0000, Wei Liu wrote:
> > > On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > > > > create ^
> > > > > owner Wei Liu <wei.liu2@citrix.com>
> > > > > thanks
> > > > > 
> > > > > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > > > > When I have following configuration in HVM config file:
> > > > > >   memory=128
> > > > > >   maxmem=256
> > > > > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > > > > 
> > > > > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > > > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > > > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > > > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > > > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > > > > 
> > > > > > With claim_mode=0, I can sucessfuly create HVM guest.
> > > > > 
> > > > > Is it trying to claim 256M instead of 128M? (although the likelyhood
> > > > 
> > > > No. 128MB actually.
> > > > 
> > > 
> > > Huh? My debug message says otherwise. It tried to claim 248MB (256MB -
> > > 8MB video ram). Did I misread your message...
> > 
> > The 'claim' being the hypercall to set the 'clamp' on how much memory
> > the guest can allocate. This is based on:
> > 
> > 242     unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT;
> > 
> >   /* try to claim pages for early warning of insufficient memory available */
> > 337     if ( claim_enabled ) {
> > 343         rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> > 
> > Your 'mem_size' is 128MB, cur_pages is 0xc0, so it ends up 'claiming'
> > that the guest only needs 128MB - 768kB.
> 
> No, the nr_pages I saw was 63296 (256MB - 768KB) -- I printed it out.

Then my patch should have made no difference in your case. Odd.

> 
> Wei.


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 16:05           ` Konrad Rzeszutek Wilk
@ 2014-01-10 16:11             ` Ian Campbell
  2014-01-10 16:39               ` Konrad Rzeszutek Wilk
  2014-01-27 12:44             ` George Dunlap
  1 sibling, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2014-01-10 16:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: George Dunlap, Ian Jackson, Wei Liu, xen-devel

On Fri, 2014-01-10 at 11:05 -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 10, 2014 at 03:56:13PM +0000, Ian Campbell wrote:
> > On Fri, 2014-01-10 at 10:28 -0500, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Jan 10, 2014 at 03:16:25PM +0000, Ian Campbell wrote:
> > > > On Fri, 2014-01-10 at 09:58 -0500, Konrad Rzeszutek Wilk wrote:
> > > > > Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
> > > > > And then it is the responsibility of the balloon driver to give the memory
> > > > > back (and this is where the 'static-max' et al come in play to tell the
> > > > > balloon driver to balloon out).
> > > > 
> > > > PoD exists purely so that we don't need the 'maxmem' amount of memory at
> > > > boot time. It is basically there in order to let the guest get booted
> > > > far enough to load the balloon driver to give the memory back.
> > > > 
> > > > It's basically a boot time zero-page sharing mechanism AIUI.
> > > 
> > > But it does look to gulp up hypervisor memory and return it during
> > > allocation of memory for the guest.
> > 
> > It should be less than the maxmem-memory amount though. Perhaps because
> > Wei is using relatively small sizes the pod cache ends up being a
> > similar size to the saving?
> > 
> > Or maybe I just don't understand PoD, since the code you quote does seem
> > to contradict that.
> > 
> > Or maybe libxl's calculation of pod_target is wrong?
> > 
> > > From reading the code the patch seems correct - we will _need_ that
> > > extra 128MB 'claim' to allocate/free the 128MB extra pages. They
> > > are temporary as we do free them.
> > 
> > It does makes sense that the PoD cache should be included in the claim,
> > I just don't get why the cache is so big...
> 
> I think it expands and shrinks to make sure that the memory is present
> in the hypervisor. If there is not enough memory it would -ENOMEM and
> the toolstack would know immediately.
> 
> But that seems silly - as that memory might be in the future used
> by other guests and then you won't be able to use said cache. But since
> it is a "cache" I guess that is OK.

Wait, isn't the "cache" here just the target memory size?

PoD uses up to that size to populate guest pages, and will try to
reclaim zeroed pages from the guest so that it never grows bigger than
the target size.

I think the confusion here is that Wei had target=128M and maxmem=256M
so the difference was 128M which served as a nice red-herring...

Ian


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 16:07           ` Konrad Rzeszutek Wilk
@ 2014-01-10 16:23             ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2014-01-10 16:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Wei Liu, Ian Campbell, xen-devel

>>> On 10.01.14 at 17:07, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Fri, Jan 10, 2014 at 03:52:08PM +0000, Jan Beulich wrote:
>> >>> On 10.01.14 at 16:41, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > On Fri, Jan 10, 2014 at 03:10:48PM +0000, Wei Liu wrote:
>> >> On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
>> >> > --- a/tools/libxc/xc_hvm_build_x86.c
>> >> > +++ b/tools/libxc/xc_hvm_build_x86.c
>> >> > @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
>> >> >  
>> >> >      /* try to claim pages for early warning of insufficient memory 
> available */
>> >> >      if ( claim_enabled ) {
>> >> > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
>> >> > +        unsigned long nr = nr_pages - cur_pages;
>> >> > +
>> >> > +        if ( pod_mode )
>> >> > +            nr = target_pages - 0x20;
>> >> > +
>> >> 
>> >> Yes it should work because this makes nr smaller than d->tot_pages and
>> >> d->max_pages. But according to the comment you pasted above this looks
>> >> like wrong fix...
>> > 
>> > It should be: 
>> > 
>> > tot_pages = 128MB
>> > max_pages = 256MB
>> > nr = 256MB - 0x20.
>> > 
>> > So tot_pages < max_pages > nr && nr > tot_pages
>> > 
>> > If I got my variables right.
>> > Which means that 'nr' is greater than tot_pages but less than max_pages.
>> 
>> But that seems conceptually wrong: As was said before, the guest
>> should only get 128Mb allocated, hence it would be wrong to claim
>> almost 256Mb for it.
> 
> At the start of the day (that is when the guest starts) it would only have
> 128MB allocated to it. But before then it needs 256MB (at least that is
> based on looking at the code).
> 
> I do agree it is seems odd that the cache allocates memory, then
> frees it. But I might also be reading the code wrong.

I'm afraid you do - Xen would refuse to allocate more than the
current memory target to a domain, i.e. in the example here
more than 128MB. All of the rest of the guest memory must be
fake (zeroed) pages.

Jan


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 16:11             ` Ian Campbell
@ 2014-01-10 16:39               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-10 16:39 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, Ian Jackson, Wei Liu, xen-devel

On Fri, Jan 10, 2014 at 04:11:15PM +0000, Ian Campbell wrote:
> On Fri, 2014-01-10 at 11:05 -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 10, 2014 at 03:56:13PM +0000, Ian Campbell wrote:
> > > On Fri, 2014-01-10 at 10:28 -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Fri, Jan 10, 2014 at 03:16:25PM +0000, Ian Campbell wrote:
> > > > > On Fri, 2014-01-10 at 09:58 -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
> > > > > > And then it is the responsibility of the balloon driver to give the memory
> > > > > > back (and this is where the 'static-max' et al come in play to tell the
> > > > > > balloon driver to balloon out).
> > > > > 
> > > > > PoD exists purely so that we don't need the 'maxmem' amount of memory at
> > > > > boot time. It is basically there in order to let the guest get booted
> > > > > far enough to load the balloon driver to give the memory back.
> > > > > 
> > > > > It's basically a boot time zero-page sharing mechanism AIUI.
> > > > 
> > > > But it does look to gulp up hypervisor memory and return it during
> > > > allocation of memory for the guest.
> > > 
> > > It should be less than the maxmem-memory amount though. Perhaps because
> > > Wei is using relatively small sizes the pod cache ends up being a
> > > similar size to the saving?
> > > 
> > > Or maybe I just don't understand PoD, since the code you quote does seem
> > > to contradict that.
> > > 
> > > Or maybe libxl's calculation of pod_target is wrong?
> > > 
> > > > From reading the code the patch seems correct - we will _need_ that
> > > > extra 128MB 'claim' to allocate/free the 128MB extra pages. They
> > > > are temporary as we do free them.
> > > 
> > > It does makes sense that the PoD cache should be included in the claim,
> > > I just don't get why the cache is so big...
> > 
> > I think it expands and shrinks to make sure that the memory is present
> > in the hypervisor. If there is not enough memory it would -ENOMEM and
> > the toolstack would know immediately.
> > 
> > But that seems silly - as that memory might be in the future used
> > by other guests and then you won't be able to use said cache. But since
> > it is a "cache" I guess that is OK.
> 
> Wait, isn't the "cache" here just the target memory size?

The delta of it: maxmem - memory.
> 
> PoD uses up to that size to populate guest pages, and will try to
> reclaim zeroed pages from the guest so that it never grows bigger than
> the target size.

The way I read the code it has a split personality:
 - up to 'memory' is allocated by the toolstack.
 - the delta of 'maxmem - memory' is allocated in the hypervisor and
   then freed.

> 
> I think the confusion here is that Wei had target=128M and maxmem=256M
> so the difference was 128M which served as a nice red-herring...

This is confusing indeed.
> 
> Ian
> 


* Re: Claim mode and HVM PoD interact badly
  2014-01-10 16:03         ` Wei Liu
@ 2014-01-10 16:50           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-10 16:50 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel

On Fri, Jan 10, 2014 at 04:03:51PM +0000, Wei Liu wrote:
> On Fri, Jan 10, 2014 at 10:41:05AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 10, 2014 at 03:10:48PM +0000, Wei Liu wrote:
> > > On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > > > > create ^
> > > > > owner Wei Liu <wei.liu2@citrix.com>
> > > > > thanks
> > > > > 
> > > > > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > > > > When I have following configuration in HVM config file:
> > > > > >   memory=128
> > > > > >   maxmem=256
> > > > > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > > > > 
> > > > > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > > > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > > > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > > > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > > > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > > > > 
> > > > > > With claim_mode=0, I can sucessfuly create HVM guest.
> > > > > 
> > > > > Is it trying to claim 256M instead of 128M? (although the likelyhood
> > > > 
> > > > No. 128MB actually.
> > > > 
> > > 
> > > Huh? My debug message says otherwise. It tried to claim 248MB (256MB -
> > > 8MB video ram). Did I misread your message...
> > 
> > The 'claim' being the hypercall to set the 'clamp' on how much memory
> > the guest can allocate. This is based on:
> > 
> > 242     unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT;
> 
> This is in fact initialized to 'maxmem' in guest's config file and
> 
> 243     unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
> 
> This is in fact 'memory' in guest's config file.
> 
> So when you try to claim "maxmem" and the current limit is "memory" it
> would not work.
> 
> So guest should only claim target_pages sans 0x20 pages if PoD enabled.
> Oh this is what your initial patch did. I don't know whether this is
> conceptually correct though. :-P

Heh.
> 
> Further more, should guest only allow to claim target_pages, regardless

No.
> whether PoD is enabled? When only "memory" is specify, "maxmem"

That is indeed happening at some point. When you modify the 'target_pages'
(so 'xl mem-set' or 'xl mem-max') you will move the ceiling and allow
the guest (via ballooning) to increase or decrease tot_pages.

You don't need the 'claim' at that point as the hypervisor is the
one that deals with many concurrent guests competing for memory.
And it has the proper locking mechanics to tell guests to buzz
off if there is not enough memory.

But keep in mind that the 'claim' (or outstanding pages) is more
of a reservation. Or a lock. Or a stick in the ground.
It says: "To allocate this guest I need X pages' - and if you cannot
guarantee that amount then -ENOMEM right away. Which it did.

And said 'X' pages is incorrect for PoD guests. The patch I posted
sets the ceiling to the 'maxmem'.

Please also note that the claim hypercall, or reservation, is cancelled
right after the guest's memory has been allocated:

530     /* ensure no unclaimed pages are left unused */                             
531     xc_domain_claim_pages(xch, dom, 0 /* cancels the claim */);            

It is a very short-lived 'lock' on the memory - all contained
within 'setup_guest' for HVM and 'arch_setup_meminit' for PV.
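
In other words the flow is roughly (just a sketch; the real code lives
in setup_guest):

    rc = xc_domain_claim_pages(xch, dom, nr);   /* stake the claim; -ENOMEM if it cannot be met */
    if ( rc != 0 )
        goto error_out;
    /* ... allocate the guest's memory ... */
    xc_domain_claim_pages(xch, dom, 0);         /* cancel the claim */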


> equals to "memory". So conceptually what we really care about is
> "memory" not "maxmem".

Uh, at the start of the guest's life - sure. During its
build-up, though, there seems to be a spike of memory that
PoD allocates and then frees.

The time-flow seems to be:

 memory ... maxmem ... memory.. [start of guest]


That actually seems a bit silly - we could just as well check
how much free memory the hypervisor has and return 'ENOMEM'
if it does not have enough. But I am very likely mis-reading the
early setup of the PoD code or misunderstanding the implications
of PoD allocating its cache and freeing it.
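
i.e. something along these lines (only a sketch using xc_physinfo;
without the reservation semantics of the claim this check would of
course be racy against other guests being built at the same time):

    xc_physinfo_t physinfo;

    if ( xc_physinfo(xch, &physinfo) != 0 )
        goto error_out;
    if ( physinfo.free_pages + physinfo.scrub_pages < nr_pages )
    {
        errno = ENOMEM;   /* not enough host memory - bail out early */
        goto error_out;
    }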

> 
> Wei.
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Claim mode and HVM PoD interact badly
  2014-01-10 15:28       ` Konrad Rzeszutek Wilk
  2014-01-10 15:56         ` Ian Campbell
@ 2014-01-10 19:07         ` Tim Deegan
  1 sibling, 0 replies; 27+ messages in thread
From: Tim Deegan @ 2014-01-10 19:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: George Dunlap, Wei Liu, Ian Jackson, Ian Campbell, xen-devel

At 10:28 -0500 on 10 Jan (1389346120), Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 10, 2014 at 03:16:25PM +0000, Ian Campbell wrote:
> > On Fri, 2014-01-10 at 09:58 -0500, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > > > create ^
> > > > owner Wei Liu <wei.liu2@citrix.com>
> > > > thanks
> > > > 
> > > > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > > > When I have following configuration in HVM config file:
> > > > >   memory=128
> > > > >   maxmem=256
> > > > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > > > 
> > > > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > > > 
> > > > > With claim_mode=0, I can sucessfuly create HVM guest.
> > > > 
> > > > Is it trying to claim 256M instead of 128M? (although the likelyhood
> > > 
> > > No. 128MB actually.
> > > 
> > > > that you only have 128-255M free is quite low, or are you
> > > > autoballooning?)
> > > 
> > > This patch fixes it for me. It basically sets the amount of pages
> > > claimed to be 'maxmem' instead of 'memory' for PoD.
> > > 
> > > I don't know PoD very well,
> > 
> > Me neither, this might have to wait for George to get back.
> 
> <nods>
> > 
> > We should also consider flipping the default claim setting to off in xl
> > for 4.4, since that is likely to be a lower impact change than fixing
> > the issue (and one which we all understand!).
> 
> <unwraps the Xen 4.4 duct-tape roll>
> 
> > 
> > >  and this claim is only valid during the
> > > allocation of the guests memory - so the 'target_pages' value might be
> > > the wrong one. However looking at the hypervisor's
> > > 'p2m_pod_set_mem_target' I see this comment:
> > > 
> > >  316  *     B <T': Set the PoD cache size equal to the number of outstanding PoD
> > >  317  *   entries.  The balloon driver will deflate the balloon to give back
> > >  318  *   the remainder of the ram to the guest OS.
> > > 
> > > Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
> > > And then it is the responsibility of the balloon driver to give the memory
> > > back (and this is where the 'static-max' et al come in play to tell the
> > > balloon driver to balloon out).
> > 
> > PoD exists purely so that we don't need the 'maxmem' amount of memory at
> > boot time. It is basically there in order to let the guest get booted
> > far enough to load the balloon driver to give the memory back.
> > 
> > It's basically a boot time zero-page sharing mechanism AIUI.
> 
> But it does look to gulp up hypervisor memory and return it during
> allocation of memory for the guest.
> 
> Digging in the hypervisor I see in 'p2m_pod_set_cache_target' (where
> pod_target is for this case maxmem - memory):
> 
> And pod.count is zero, so for Wei's case it would be 128MB.
> 
>  216     /* Increasing the target */
>  217     while ( pod_target > p2m->pod.count )
>  218     {
>  222         if ( (pod_target - p2m->pod.count) >= SUPERPAGE_PAGES )
>  223             order = PAGE_ORDER_2M;
>  224         else
>  225             order = PAGE_ORDER_4K;
>  226     retry:
>  227         page = alloc_domheap_pages(d, order, PAGE_ORDER_4K);
> 
> So allocate 64 2MB pages
> 
>  243         p2m_pod_cache_add(p2m, page, order);
> 
> Add to a list
> 
> 251 
>  252     /* Decreasing the target */
>  253     /* We hold the pod lock here, so we don't need to worry about
>  254      * cache disappearing under our feet. */
>  255     while ( pod_target < p2m->pod.count )
>  256     {
> ..
>  266         page = p2m_pod_cache_get(p2m, order);
> 
> Get the page (from the list)
> ..
>  287             put_page(page+i);
> 
> And then free it.

Er, it will do only one of those things, right?  If the requested
target is larger than the current pool, it allocates memory.  If the
requested target is smaller than the current pool, it frees memory. 
So starting from an empty pool, it will allocate pod_target pages and
free none of them.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Claim mode and HVM PoD interact badly
  2014-01-10 14:58   ` Konrad Rzeszutek Wilk
  2014-01-10 15:10     ` Wei Liu
  2014-01-10 15:16     ` Ian Campbell
@ 2014-01-20 16:29     ` Wei Liu
  2014-01-21 21:57       ` Konrad Rzeszutek Wilk
  2014-01-27 14:54       ` George Dunlap
  2 siblings, 2 replies; 27+ messages in thread
From: Wei Liu @ 2014-01-20 16:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Wei Liu, Ian Campbell, xen-devel

On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > create ^
> > owner Wei Liu <wei.liu2@citrix.com>
> > thanks
> > 
> > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > When I have following configuration in HVM config file:
> > >   memory=128
> > >   maxmem=256
> > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > 
> > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > 
> > > With claim_mode=0, I can sucessfuly create HVM guest.
> > 
> > Is it trying to claim 256M instead of 128M? (although the likelyhood
> 
> No. 128MB actually.
> 
> > that you only have 128-255M free is quite low, or are you
> > autoballooning?)
> 
> This patch fixes it for me. It basically sets the amount of pages
> claimed to be 'maxmem' instead of 'memory' for PoD.
> 
> I don't know PoD very well, and this claim is only valid during the
> allocation of the guests memory - so the 'target_pages' value might be
> the wrong one. However looking at the hypervisor's
> 'p2m_pod_set_mem_target' I see this comment:
> 
>  316  *     B <T': Set the PoD cache size equal to the number of outstanding PoD
>  317  *   entries.  The balloon driver will deflate the balloon to give back
>  318  *   the remainder of the ram to the guest OS.
> 
> Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
> And then it is the responsibility of the balloon driver to give the memory
> back (and this is where the 'static-max' et al come in play to tell the
> balloon driver to balloon out).
> 
> 
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index 77bd365..65e9577 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
>  
>      /* try to claim pages for early warning of insufficient memory available */
>      if ( claim_enabled ) {
> -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> +        unsigned long nr = nr_pages - cur_pages;
> +
> +        if ( pod_mode )
> +            nr = target_pages - 0x20;
> +

I'm a bit confused -- did this work for you? At this point d->tot_pages
should be (target_pages - 0x20). However, in the hypervisor logic, if you
try to claim exactly as many pages as d->tot_pages it should return
EINVAL.

Furthermore, the original logic doesn't look right. In PV guest
creation, xc tries to claim "memory=" pages, while in HVM guest creation
it tries to claim "maxmem=" pages. I think the HVM code is wrong.

George also shed some light on PoD for me this morning: the "cache" in
PoD should be the pool of pages that is used to populate guest physical
memory. In that sense it should be the size of mem_target ("memory=").

So I came up with a fix like this. Any ideas?

Wei.

---8<---
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 77bd365..472f1df 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -49,6 +49,8 @@
 #define NR_SPECIAL_PAGES     8
 #define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
 
+#define POD_VGA_HOLE_SIZE (0x20)
+
 static int modules_init(struct xc_hvm_build_args *args,
                         uint64_t vend, struct elf_binary *elf,
                         uint64_t *mstart_out, uint64_t *mend_out)
@@ -305,11 +307,13 @@ static int setup_guest(xc_interface *xch,
     if ( pod_mode )
     {
         /*
-         * Subtract 0x20 from target_pages for the VGA "hole".  Xen will
-         * adjust the PoD cache size so that domain tot_pages will be
-         * target_pages - 0x20 after this call.
+         * Subtract POD_VGA_HOLE_SIZE from target_pages for the VGA
+         * "hole".  Xen will adjust the PoD cache size so that domain
+         * tot_pages will be target_pages - POD_VGA_HOLE_SIZE after
+         * this call.
          */
-        rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
+        rc = xc_domain_set_pod_target(xch, dom,
+                                      target_pages - POD_VGA_HOLE_SIZE,
                                       NULL, NULL, NULL);
         if ( rc != 0 )
         {
@@ -335,7 +339,12 @@ static int setup_guest(xc_interface *xch,
 
     /* try to claim pages for early warning of insufficient memory available */
     if ( claim_enabled ) {
-        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
+        unsigned long nr = target_pages;
+
+        if ( pod_mode )
+            nr -= POD_VGA_HOLE_SIZE;
+
+        rc = xc_domain_claim_pages(xch, dom, nr);
         if ( rc != 0 )
         {
             PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 5f484a2..1e44ba3 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -339,8 +339,8 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
         goto out;
     }
 
-    /* disallow a claim not exceeding current tot_pages or above max_pages */
-    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
+    /* disallow a claim below current tot_pages or above max_pages */
+    if ( (pages < d->tot_pages) || (pages > d->max_pages) )
     {
         ret = -EINVAL;
         goto out;



> +        rc = xc_domain_claim_pages(xch, dom, nr);
>          if ( rc != 0 )
>          {
>              PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Claim mode and HVM PoD interact badly
  2014-01-20 16:29     ` Wei Liu
@ 2014-01-21 21:57       ` Konrad Rzeszutek Wilk
  2014-01-27 14:54       ` George Dunlap
  1 sibling, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-21 21:57 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel

On Mon, Jan 20, 2014 at 04:29:43PM +0000, Wei Liu wrote:
> On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > > create ^
> > > owner Wei Liu <wei.liu2@citrix.com>
> > > thanks
> > > 
> > > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > > When I have following configuration in HVM config file:
> > > >   memory=128
> > > >   maxmem=256
> > > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > > 
> > > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > > 
> > > > With claim_mode=0, I can sucessfuly create HVM guest.
> > > 
> > > Is it trying to claim 256M instead of 128M? (although the likelyhood
> > 
> > No. 128MB actually.
> > 
> > > that you only have 128-255M free is quite low, or are you
> > > autoballooning?)
> > 
> > This patch fixes it for me. It basically sets the amount of pages
> > claimed to be 'maxmem' instead of 'memory' for PoD.
> > 
> > I don't know PoD very well, and this claim is only valid during the
> > allocation of the guests memory - so the 'target_pages' value might be
> > the wrong one. However looking at the hypervisor's
> > 'p2m_pod_set_mem_target' I see this comment:
> > 
> >  316  *     B <T': Set the PoD cache size equal to the number of outstanding PoD
> >  317  *   entries.  The balloon driver will deflate the balloon to give back
> >  318  *   the remainder of the ram to the guest OS.
> > 
> > Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
> > And then it is the responsibility of the balloon driver to give the memory
> > back (and this is where the 'static-max' et al come in play to tell the
> > balloon driver to balloon out).
> > 
> > 
> > diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> > index 77bd365..65e9577 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
> >  
> >      /* try to claim pages for early warning of insufficient memory available */
> >      if ( claim_enabled ) {
> > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> > +        unsigned long nr = nr_pages - cur_pages;
> > +
> > +        if ( pod_mode )
> > +            nr = target_pages - 0x20;
> > +
> 
> I'm a bit confused, did this work for you? At this point d->tot_pages
> should be (target_pages - 0x20). However in the hypervisor logic if you
> try to claim the exact amount of pages as d->tot_pages it should return
> EINVAL.
> 
> Furthur more, the original logic doesn't look right. In PV guest
> creation, xc tries to claim "memory=" pages, while in HVM guest creation
> it tries to claim "maxmem=" pages. I think HVM code is wrong.
> 
> And George shed me some light on PoD this morning, "cache" in PoD should
> be the pool of pages that used to populate into guest physical memory.
> In that sense it should be the size of mem_target ("memory=").
> 
> So I come up with a fix like this. Any idea?

The patch I came up with didn't work. It did work the first time, but I
think that is because I had 'claim=0' in xl.conf and forgot about it (sigh).

After a bit of rebooting I realized that the patch didn't do the right
job. The one way it _did_ work was to claim the number of pages that
the guest had already allocated at that moment. That is, on the PoD
path, 'xc_domain_set_pod_target' (which is called before the claim call)
ends up allocating the memory.

So the claim 'clamp' was done past that moment (which is not good).
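
The ordering today for a PoD guest is, roughly (sketch only):

    /* allocates the whole PoD pool; afterwards d->tot_pages is about
     * target_pages - 0x20 */
    xc_domain_set_pod_target(xch, dom, target_pages - 0x20, NULL, NULL, NULL);

    /* too late: the memory is already allocated, and any nr <= d->tot_pages
     * is rejected with -EINVAL by domain_set_outstanding_pages() */
    xc_domain_claim_pages(xch, dom, nr);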

I will try out your patch later this week and report. Thanks!

> 
> Wei.
> 
> ---8<---
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index 77bd365..472f1df 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -49,6 +49,8 @@
>  #define NR_SPECIAL_PAGES     8
>  #define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
>  
> +#define POD_VGA_HOLE_SIZE (0x20)
> +
>  static int modules_init(struct xc_hvm_build_args *args,
>                          uint64_t vend, struct elf_binary *elf,
>                          uint64_t *mstart_out, uint64_t *mend_out)
> @@ -305,11 +307,13 @@ static int setup_guest(xc_interface *xch,
>      if ( pod_mode )
>      {
>          /*
> -         * Subtract 0x20 from target_pages for the VGA "hole".  Xen will
> -         * adjust the PoD cache size so that domain tot_pages will be
> -         * target_pages - 0x20 after this call.
> +         * Subtract POD_VGA_HOLE_SIZE from target_pages for the VGA
> +         * "hole".  Xen will adjust the PoD cache size so that domain
> +         * tot_pages will be target_pages - POD_VGA_HOLE_SIZE after
> +         * this call.
>           */
> -        rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
> +        rc = xc_domain_set_pod_target(xch, dom,
> +                                      target_pages - POD_VGA_HOLE_SIZE,
>                                        NULL, NULL, NULL);
>          if ( rc != 0 )
>          {
> @@ -335,7 +339,12 @@ static int setup_guest(xc_interface *xch,
>  
>      /* try to claim pages for early warning of insufficient memory available */
>      if ( claim_enabled ) {
> -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> +        unsigned long nr = target_pages;
> +
> +        if ( pod_mode )
> +            nr -= POD_VGA_HOLE_SIZE;
> +
> +        rc = xc_domain_claim_pages(xch, dom, nr);
>          if ( rc != 0 )
>          {
>              PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 5f484a2..1e44ba3 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -339,8 +339,8 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
>          goto out;
>      }
>  
> -    /* disallow a claim not exceeding current tot_pages or above max_pages */
> -    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
> +    /* disallow a claim below current tot_pages or above max_pages */
> +    if ( (pages < d->tot_pages) || (pages > d->max_pages) )
>      {
>          ret = -EINVAL;
>          goto out;
> 
> 
> 
> > +        rc = xc_domain_claim_pages(xch, dom, nr);
> >          if ( rc != 0 )
> >          {
> >              PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Claim mode and HVM PoD interact badly
  2014-01-10 16:05           ` Konrad Rzeszutek Wilk
  2014-01-10 16:11             ` Ian Campbell
@ 2014-01-27 12:44             ` George Dunlap
  1 sibling, 0 replies; 27+ messages in thread
From: George Dunlap @ 2014-01-27 12:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Ian Campbell; +Cc: Ian Jackson, Wei Liu, xen-devel

On 01/10/2014 04:05 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 10, 2014 at 03:56:13PM +0000, Ian Campbell wrote:
>> On Fri, 2014-01-10 at 10:28 -0500, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Jan 10, 2014 at 03:16:25PM +0000, Ian Campbell wrote:
>>>> On Fri, 2014-01-10 at 09:58 -0500, Konrad Rzeszutek Wilk wrote:
>>>>> Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
>>>>> And then it is the responsibility of the balloon driver to give the memory
>>>>> back (and this is where the 'static-max' et al come in play to tell the
>>>>> balloon driver to balloon out).
>>>> PoD exists purely so that we don't need the 'maxmem' amount of memory at
>>>> boot time. It is basically there in order to let the guest get booted
>>>> far enough to load the balloon driver to give the memory back.
>>>>
>>>> It's basically a boot time zero-page sharing mechanism AIUI.
>>> But it does look to gulp up hypervisor memory and return it during
>>> allocation of memory for the guest.
>> It should be less than the maxmem-memory amount though. Perhaps because
>> Wei is using relatively small sizes the pod cache ends up being a
>> similar size to the saving?
>>
>> Or maybe I just don't understand PoD, since the code you quote does seem
>> to contradict that.
>>
>> Or maybe libxl's calculation of pod_target is wrong?
>>
>>>  From reading the code the patch seems correct - we will _need_ that
>>> extra 128MB 'claim' to allocate/free the 128MB extra pages. They
>>> are temporary as we do free them.
>> It does makes sense that the PoD cache should be included in the claim,
>> I just don't get why the cache is so big...
> I think it expands and shrinks to make sure that the memory is present
> in the hypervisor. If there is not enough memory it would -ENOMEM and
> the toolstack would know immediately.
>
> But that seems silly - as that memory might be in the future used
> by other guests and then you won't be able to use said cache. But since
> it is a "cache" I guess that is OK.

Sorry, "cache" is a bit of a misnomer.  It really should be "pool".

The basic idea is to allocate memory to the guest *without assigning it 
to specific p2m entries*.  Then the PoD mechanism will shuffle the 
memory around behind the p2m entries as needed until the balloon driver 
comes up.

In the simple case, memory should only ever be allocated, not freed; for 
example:
* Admin sets target=1GiB, maxmem=2GiB
* Domain creation:
  - makes 2GiB of p2m, filling it with PoD entries rather than memory
  - allocates 1GiB of ram for the PoD "cache"
* PoD shuffles memory around to allow guest to boot
* Balloon driver comes up, and balloons down to target.
  - In theory, at this point #PoD entries in p2m == #pages in the "cache"

The basic complication comes in that there is no point at which we can 
be *certain* that all PoD entries have been eliminated.  If the guest 
just doesn't touch some of its memory, there may be PoD entries 
outstanding (with corresponding memory in the "cache") indefinitely.  
Also, the admin may want to be able to change the target before the 
balloon driver hits it.    So every time you change the target, you need 
to tell the PoD code that's what you're doing; it has carefully 
thought-out logic inside it to free or allocate more memory appropriately.

For example, in the example above, while the balloon driver has only 
ballooned down to 1.2 GiB, the admin may want to set the target to 
1.5GiB.  In that case, the PoD code would allocate an additional 0.2GiB 
(to cover the outstanding 0.2GiB of PoD entries in the p2m).

Anyway, if I understand correctly, the problem was that the memory 
allocated to the PoD "cache" wasn't being counted in the claim mode.  
That's the basic problem: memory in the PoD "cache" should be considered 
basically the same as memory in the p2m table for claim purposes.
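
To put rough numbers on Wei's case (illustrative arithmetic only, 4KiB
pages assumed):

    unsigned long target_pages = 128UL << (20 - 12);  /* memory=128M -> 0x8000 pages  */
    unsigned long max_pages    = 256UL << (20 - 12);  /* maxmem=256M -> 0x10000 pages */
    unsigned long pod_pool     = target_pages - 0x20; /* allocated by set_pod_target  */
    /* Nearly everything the domain will ever have allocated is pod_pool,
     * so a claim that does not cover it reserves essentially nothing. */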

  -George

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Claim mode and HVM PoD interact badly
  2014-01-20 16:29     ` Wei Liu
  2014-01-21 21:57       ` Konrad Rzeszutek Wilk
@ 2014-01-27 14:54       ` George Dunlap
  2014-01-27 16:14         ` Wei Liu
  1 sibling, 1 reply; 27+ messages in thread
From: George Dunlap @ 2014-01-27 14:54 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Ian Campbell

On Mon, Jan 20, 2014 at 4:29 PM, Wei Liu <wei.liu2@citrix.com> wrote:
> On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
>> On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
>> > create ^
>> > owner Wei Liu <wei.liu2@citrix.com>
>> > thanks
>> >
>> > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
>> > > When I have following configuration in HVM config file:
>> > >   memory=128
>> > >   maxmem=256
>> > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
>> > >
>> > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
>> > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
>> > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
>> > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
>> > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
>> > >
>> > > With claim_mode=0, I can sucessfuly create HVM guest.
>> >
>> > Is it trying to claim 256M instead of 128M? (although the likelyhood
>>
>> No. 128MB actually.
>>
>> > that you only have 128-255M free is quite low, or are you
>> > autoballooning?)
>>
>> This patch fixes it for me. It basically sets the amount of pages
>> claimed to be 'maxmem' instead of 'memory' for PoD.
>>
>> I don't know PoD very well, and this claim is only valid during the
>> allocation of the guests memory - so the 'target_pages' value might be
>> the wrong one. However looking at the hypervisor's
>> 'p2m_pod_set_mem_target' I see this comment:
>>
>>  316  *     B <T': Set the PoD cache size equal to the number of outstanding PoD
>>  317  *   entries.  The balloon driver will deflate the balloon to give back
>>  318  *   the remainder of the ram to the guest OS.
>>
>> Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
>> And then it is the responsibility of the balloon driver to give the memory
>> back (and this is where the 'static-max' et al come in play to tell the
>> balloon driver to balloon out).
>>
>>
>> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
>> index 77bd365..65e9577 100644
>> --- a/tools/libxc/xc_hvm_build_x86.c
>> +++ b/tools/libxc/xc_hvm_build_x86.c
>> @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
>>
>>      /* try to claim pages for early warning of insufficient memory available */
>>      if ( claim_enabled ) {
>> -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
>> +        unsigned long nr = nr_pages - cur_pages;
>> +
>> +        if ( pod_mode )
>> +            nr = target_pages - 0x20;
>> +
>
> I'm a bit confused, did this work for you? At this point d->tot_pages
> should be (target_pages - 0x20). However in the hypervisor logic if you
> try to claim the exact amount of pages as d->tot_pages it should return
> EINVAL.
>
> Furthur more, the original logic doesn't look right. In PV guest
> creation, xc tries to claim "memory=" pages, while in HVM guest creation
> it tries to claim "maxmem=" pages. I think HVM code is wrong.
>
> And George shed me some light on PoD this morning, "cache" in PoD should
> be the pool of pages that used to populate into guest physical memory.
> In that sense it should be the size of mem_target ("memory=").
>
> So I come up with a fix like this. Any idea?
>
> Wei.
>
> ---8<---
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index 77bd365..472f1df 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -49,6 +49,8 @@
>  #define NR_SPECIAL_PAGES     8
>  #define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
>
> +#define POD_VGA_HOLE_SIZE (0x20)
> +
>  static int modules_init(struct xc_hvm_build_args *args,
>                          uint64_t vend, struct elf_binary *elf,
>                          uint64_t *mstart_out, uint64_t *mend_out)
> @@ -305,11 +307,13 @@ static int setup_guest(xc_interface *xch,
>      if ( pod_mode )
>      {
>          /*
> -         * Subtract 0x20 from target_pages for the VGA "hole".  Xen will
> -         * adjust the PoD cache size so that domain tot_pages will be
> -         * target_pages - 0x20 after this call.
> +         * Subtract POD_VGA_HOLE_SIZE from target_pages for the VGA
> +         * "hole".  Xen will adjust the PoD cache size so that domain
> +         * tot_pages will be target_pages - POD_VGA_HOLE_SIZE after
> +         * this call.
>           */
> -        rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
> +        rc = xc_domain_set_pod_target(xch, dom,
> +                                      target_pages - POD_VGA_HOLE_SIZE,
>                                        NULL, NULL, NULL);
>          if ( rc != 0 )
>          {
> @@ -335,7 +339,12 @@ static int setup_guest(xc_interface *xch,
>
>      /* try to claim pages for early warning of insufficient memory available */
>      if ( claim_enabled ) {
> -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> +        unsigned long nr = target_pages;
> +
> +        if ( pod_mode )
> +            nr -= POD_VGA_HOLE_SIZE;
> +
> +        rc = xc_domain_claim_pages(xch, dom, nr);

Two things:

1. This is broken because it doesn't claim pages for the PoD "cache".
The PoD "cache" amounts to *all the pages that the domain will have
allocated* -- there will be basically no pages allocated after this
point.

Claim mode is trying to make creation of large guests fail early or be
guaranteed to succeed.  For large guests, it's set_pod_target() that
may take a long time, and it's there that things will fail if
there's not enough memory.  By the time you get to actually setting up
the p2m, you've already made it.

2. I think the VGA_HOLE doesn't have anything to do with PoD.

Actually, it looks like the original code was wrong: correct me if I'm
wrong, but xc_domain_claim_pages() wants the *total number of pages*,
whereas nr_pages-cur_pages would give you the *number of additional
pages*.

I think you need to claim nr_pages-VGA_HOLE_SIZE regardless of whether
you're in PoD mode or not.  The initial code would claim 0xa0 pages
too few; the new code will claim 0x20 pages too many in the non-PoD
case.

> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 5f484a2..1e44ba3 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -339,8 +339,8 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
>          goto out;
>      }
>
> -    /* disallow a claim not exceeding current tot_pages or above max_pages */
> -    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
> +    /* disallow a claim below current tot_pages or above max_pages */
> +    if ( (pages < d->tot_pages) || (pages > d->max_pages) )
>      {

This seems like a good interface change in any case -- there's no
particular reason not to allow multiple calls with the same claim
amount.  (Interface-wise, I don't see a good reason we couldn't allow
the claim to be reduced as well; but that's probably more than we want
to do at this point.)

So it seems like we should at least make the two fixes above:
* Use nr_pages-VGA_HOLE_SIZE for the claim, regardless of whether PoD is enabled
* Allow a claim equal to tot_pages

That will allow PoD guests to boot with claim mode enabled, although
it will effectively be a noop.

The next question is whether we should try to make claim mode actually
do the claim properly for PoD mode for 4.4.  It seems like just moving
the claim call up before the xc_domain_set_pod_target() call should work;
I'm inclined to say that if that works, we should just do it.
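
Roughly (a sketch against setup_guest, not a tested patch; 'nr' here
stands for whatever claim amount we settle on above):

    /* stake the claim before xc_domain_set_pod_target() allocates the pool */
    if ( claim_enabled )
    {
        rc = xc_domain_claim_pages(xch, dom, nr);
        if ( rc != 0 )
            goto error_out;
    }
    /* ... then the existing pod_mode block and the p2m population ... */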

Thoughts?

 -George

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Claim mode and HVM PoD interact badly
  2014-01-27 14:54       ` George Dunlap
@ 2014-01-27 16:14         ` Wei Liu
  2014-01-27 17:33           ` George Dunlap
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Liu @ 2014-01-27 16:14 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Wei Liu, Ian Campbell

On Mon, Jan 27, 2014 at 02:54:40PM +0000, George Dunlap wrote:
[...]
> > ---8<---
> > diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> > index 77bd365..472f1df 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -49,6 +49,8 @@
> >  #define NR_SPECIAL_PAGES     8
> >  #define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
> >
> > +#define POD_VGA_HOLE_SIZE (0x20)
> > +
> >  static int modules_init(struct xc_hvm_build_args *args,
> >                          uint64_t vend, struct elf_binary *elf,
> >                          uint64_t *mstart_out, uint64_t *mend_out)
> > @@ -305,11 +307,13 @@ static int setup_guest(xc_interface *xch,
> >      if ( pod_mode )
> >      {
> >          /*
> > -         * Subtract 0x20 from target_pages for the VGA "hole".  Xen will
> > -         * adjust the PoD cache size so that domain tot_pages will be
> > -         * target_pages - 0x20 after this call.
> > +         * Subtract POD_VGA_HOLE_SIZE from target_pages for the VGA
> > +         * "hole".  Xen will adjust the PoD cache size so that domain
> > +         * tot_pages will be target_pages - POD_VGA_HOLE_SIZE after
> > +         * this call.
> >           */
> > -        rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
> > +        rc = xc_domain_set_pod_target(xch, dom,
> > +                                      target_pages - POD_VGA_HOLE_SIZE,
> >                                        NULL, NULL, NULL);
> >          if ( rc != 0 )
> >          {
> > @@ -335,7 +339,12 @@ static int setup_guest(xc_interface *xch,
> >
> >      /* try to claim pages for early warning of insufficient memory available */
> >      if ( claim_enabled ) {
> > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> > +        unsigned long nr = target_pages;
> > +
> > +        if ( pod_mode )
> > +            nr -= POD_VGA_HOLE_SIZE;
> > +
> > +        rc = xc_domain_claim_pages(xch, dom, nr);
> 
> Two things:
> 
> 1. This is broken because it doesn't claim pages for the PoD "cache".
> The PoD "cache" amounts to *all the pages that the domain will have
> allocated* -- there will be basically no pages allocated after this
> point.
> 
> Claim mode is trying to make creation of large guests fail early or be
> guaranteed to succeed.  For large guests, it's set_pod_target() that
> may take the long time, and it's there that things will fail if
> there's not enough memory.  By the time you get to actually setting up
> the p2m, you've already made it.
> 
> 2. I think the VGA_HOLE doesn't have anything to do with PoD.
> 
> Actually, it looks like the original code was wrong: correct me if I'm
> wrong, but xc_domain_claim_pages() wants the *total number of pages*,
> whereas nr_pages-cur_pages would give you the *number of additional
> pages*.
> 
> I think you need to claim nr_pages-VGA_HOLE_SIZE regardless of whether
> you're in PoD mode or not.  The initial code would claim 0xa0 pages
> too few; the new code will claim 0x20 pages too many in the non-PoD
> case.
> 
> > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> > index 5f484a2..1e44ba3 100644
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -339,8 +339,8 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
> >          goto out;
> >      }
> >
> > -    /* disallow a claim not exceeding current tot_pages or above max_pages */
> > -    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
> > +    /* disallow a claim below current tot_pages or above max_pages */
> > +    if ( (pages < d->tot_pages) || (pages > d->max_pages) )
> >      {
> 
> This seems like a good interface change in any case -- there's no
> particular reason not to allow multiple calls with the same claim
> amount.  (Interface-wise, I don't see a good reason we couldn't allow
> the claim to be reduced as well; but that's probably more than we want
> to do at this point.)
> 
> So it seems like we should at least make the two fixes above:
> * Use nr_pages-VGA_HOLE_SIZE for the claim, regardless of whether PoD is enabled

You mean target_pages here, right? nr_pages is maxmem= while target_pages
is memory=. And from the face-to-face discussion we had this morning I
had the impression you meant target_pages.

In fact using nr_pages won't work, because at that point d->max_pages
has already been capped to the memory target in the hypervisor.

> * Allow a claim equal to tot_pages
> 
> That will allow PoD guests to boot with claim mode enabled, although
> it will effectively be a noop.
> 
> The next question is whether we should try to make claim mode actually
> do the claim properly for PoD mode for 4.4.  It seems like just moving
> the claim call up before the xc_domain_set_pod_target() call should work;
> I'm inclined to say that if that works, we should just do it.
> 

Agreed.

Wei.

> Thoughts?
> 
>  -George

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Claim mode and HVM PoD interact badly
  2014-01-27 16:14         ` Wei Liu
@ 2014-01-27 17:33           ` George Dunlap
  0 siblings, 0 replies; 27+ messages in thread
From: George Dunlap @ 2014-01-27 17:33 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Ian Campbell

On Mon, Jan 27, 2014 at 4:14 PM, Wei Liu <wei.liu2@citrix.com> wrote:
> On Mon, Jan 27, 2014 at 02:54:40PM +0000, George Dunlap wrote:
> [...]
>> > ---8<---
>> > diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
>> > index 77bd365..472f1df 100644
>> > --- a/tools/libxc/xc_hvm_build_x86.c
>> > +++ b/tools/libxc/xc_hvm_build_x86.c
>> > @@ -49,6 +49,8 @@
>> >  #define NR_SPECIAL_PAGES     8
>> >  #define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
>> >
>> > +#define POD_VGA_HOLE_SIZE (0x20)
>> > +
>> >  static int modules_init(struct xc_hvm_build_args *args,
>> >                          uint64_t vend, struct elf_binary *elf,
>> >                          uint64_t *mstart_out, uint64_t *mend_out)
>> > @@ -305,11 +307,13 @@ static int setup_guest(xc_interface *xch,
>> >      if ( pod_mode )
>> >      {
>> >          /*
>> > -         * Subtract 0x20 from target_pages for the VGA "hole".  Xen will
>> > -         * adjust the PoD cache size so that domain tot_pages will be
>> > -         * target_pages - 0x20 after this call.
>> > +         * Subtract POD_VGA_HOLE_SIZE from target_pages for the VGA
>> > +         * "hole".  Xen will adjust the PoD cache size so that domain
>> > +         * tot_pages will be target_pages - POD_VGA_HOLE_SIZE after
>> > +         * this call.
>> >           */
>> > -        rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
>> > +        rc = xc_domain_set_pod_target(xch, dom,
>> > +                                      target_pages - POD_VGA_HOLE_SIZE,
>> >                                        NULL, NULL, NULL);
>> >          if ( rc != 0 )
>> >          {
>> > @@ -335,7 +339,12 @@ static int setup_guest(xc_interface *xch,
>> >
>> >      /* try to claim pages for early warning of insufficient memory available */
>> >      if ( claim_enabled ) {
>> > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
>> > +        unsigned long nr = target_pages;
>> > +
>> > +        if ( pod_mode )
>> > +            nr -= POD_VGA_HOLE_SIZE;
>> > +
>> > +        rc = xc_domain_claim_pages(xch, dom, nr);
>>
>> Two things:
>>
>> 1. This is broken because it doesn't claim pages for the PoD "cache".
>> The PoD "cache" amounts to *all the pages that the domain will have
>> allocated* -- there will be basically no pages allocated after this
>> point.
>>
>> Claim mode is trying to make creation of large guests fail early or be
>> guaranteed to succeed.  For large guests, it's set_pod_target() that
>> may take the long time, and it's there that things will fail if
>> there's not enough memory.  By the time you get to actually setting up
>> the p2m, you've already made it.
>>
>> 2. I think the VGA_HOLE doesn't have anything to do with PoD.
>>
>> Actually, it looks like the original code was wrong: correct me if I'm
>> wrong, but xc_domain_claim_pages() wants the *total number of pages*,
>> whereas nr_pages-cur_pages would give you the *number of additional
>> pages*.
>>
>> I think you need to claim nr_pages-VGA_HOLE_SIZE regardless of whether
>> you're in PoD mode or not.  The initial code would claim 0xa0 pages
>> too few; the new code will claim 0x20 pages too many in the non-PoD
>> case.
>>
>> > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>> > index 5f484a2..1e44ba3 100644
>> > --- a/xen/common/page_alloc.c
>> > +++ b/xen/common/page_alloc.c
>> > @@ -339,8 +339,8 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
>> >          goto out;
>> >      }
>> >
>> > -    /* disallow a claim not exceeding current tot_pages or above max_pages */
>> > -    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
>> > +    /* disallow a claim below current tot_pages or above max_pages */
>> > +    if ( (pages < d->tot_pages) || (pages > d->max_pages) )
>> >      {
>>
>> This seems like a good interface change in any case -- there's no
>> particular reason not to allow multiple calls with the same claim
>> amount.  (Interface-wise, I don't see a good reason we couldn't allow
>> the claim to be reduced as well; but that's probably more than we want
>> to do at this point.)
>>
>> So it seems like we should at least make the two fixes above:
>> * Use nr_pages-VGA_HOLE_SIZE for the claim, regardless of whether PoD is enabled
>
> You mean target_pages here, right? nr_pages is maxmem= while target_pages
> is memory=. And from the face-to-face discussion we had this morning I
> had the impression you meant target_pages.
>
> In fact using nr_pages won't work because at that point d->max_pages is
> capped to target_memory in the hypervisor.

Dur, yes of course.

 -George

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2014-01-27 17:33 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-10 11:56 Claim mode and HVM PoD interact badly Wei Liu
2014-01-10 11:59 ` Ian Campbell
2014-01-10 12:15   ` Processed: " xen
2014-01-10 14:58   ` Konrad Rzeszutek Wilk
2014-01-10 15:10     ` Wei Liu
2014-01-10 15:41       ` Konrad Rzeszutek Wilk
2014-01-10 15:48         ` Wei Liu
2014-01-10 16:08           ` Konrad Rzeszutek Wilk
2014-01-10 15:52         ` Jan Beulich
2014-01-10 16:07           ` Konrad Rzeszutek Wilk
2014-01-10 16:23             ` Jan Beulich
2014-01-10 16:03         ` Wei Liu
2014-01-10 16:50           ` Konrad Rzeszutek Wilk
2014-01-10 15:16     ` Ian Campbell
2014-01-10 15:19       ` Wei Liu
2014-01-10 15:28       ` Konrad Rzeszutek Wilk
2014-01-10 15:56         ` Ian Campbell
2014-01-10 16:05           ` Konrad Rzeszutek Wilk
2014-01-10 16:11             ` Ian Campbell
2014-01-10 16:39               ` Konrad Rzeszutek Wilk
2014-01-27 12:44             ` George Dunlap
2014-01-10 19:07         ` Tim Deegan
2014-01-20 16:29     ` Wei Liu
2014-01-21 21:57       ` Konrad Rzeszutek Wilk
2014-01-27 14:54       ` George Dunlap
2014-01-27 16:14         ` Wei Liu
2014-01-27 17:33           ` George Dunlap
