All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] libxl: Wait for ballooning if free memory is increasing
@ 2015-01-30 21:01 Mike Latimer
  2015-02-02 14:35 ` Ian Campbell
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Latimer @ 2015-01-30 21:01 UTC (permalink / raw)
  To: xen-devel
  Cc: george.dunlap, wei.liu2, Ian.Jackson, Ian.Campbell, Stefano.Stabellini

During domain startup, all required memory ballooning must complete
within a maximum window of 33 seconds (3 retries, 11 seconds of delay).
If not, domain creation is aborted with a 'failed to free memory' error.

In order to accommodate large domains or slower hardware (which require
substantially longer to balloon memory) the free memory process should
continue retrying if the amount of free memory is increasing on each
iteration of the loop.

---
 tools/libxl/xl_cmdimpl.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 0b02a6c..9ff3c4f 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2194,8 +2194,9 @@ static int preserve_domain(uint32_t *r_domid, libxl_event *event,
 
 static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
 {
-    int rc, retries = 3;
-    uint32_t need_memkb, free_memkb;
+    int rc, retries;
+    const int MAX_RETRIES = 3;
+    uint32_t need_memkb, free_memkb, free_memkb_prev = 0;
 
     if (!autoballoon)
         return 0;
@@ -2204,6 +2205,7 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
     if (rc < 0)
         return rc;
 
+    retries = MAX_RETRIES;
     do {
         rc = libxl_get_free_memory(ctx, &free_memkb);
         if (rc < 0)
@@ -2228,7 +2230,16 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
         if (rc < 0)
             return rc;
 
-        retries--;
+        /*
+         * If the amount of free mem has increased on this iteration (i.e.
+         * some progress has been made) then reset the retry counter.
+         */
+        if (free_memkb > free_memkb_prev) {
+            retries = MAX_RETRIES;
+            free_memkb_prev = free_memkb;
+        } else {
+            retries--;
+        }
     } while (retries > 0);
 
     return ERROR_NOMEM;
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] libxl: Wait for ballooning if free memory is increasing
  2015-01-30 21:01 [PATCH v3] libxl: Wait for ballooning if free memory is increasing Mike Latimer
@ 2015-02-02 14:35 ` Ian Campbell
  2015-02-02 15:17   ` Mike Latimer
  0 siblings, 1 reply; 7+ messages in thread
From: Ian Campbell @ 2015-02-02 14:35 UTC (permalink / raw)
  To: Mike Latimer
  Cc: george.dunlap, wei.liu2, Stefano.Stabellini, Ian.Jackson, xen-devel

On Fri, 2015-01-30 at 14:01 -0700, Mike Latimer wrote:
> During domain startup, all required memory ballooning must complete
> within a maximum window of 33 seconds (3 retries, 11 seconds of delay).
> If not, domain creation is aborted with a 'failed to free memory' error.
> 
> In order to accommodate large domains or slower hardware (which require
> substantially longer to balloon memory) the free memory process should
> continue retrying if the amount of free memory is increasing on each
> iteration of the loop.

Sorry for not spotting this earlier, but to accept a patch into the Xen
code base we need a Signed-off-by:
http://wiki.xen.org/wiki/Submitting_Xen_Patches#Signing_off_a_patch

You can provide one in this thread rather than resending if that is more
convenient.

For my part: Acked-by: Ian Campbell <ian.campbell@citrix.com>

> 
> ---
>  tools/libxl/xl_cmdimpl.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 0b02a6c..9ff3c4f 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -2194,8 +2194,9 @@ static int preserve_domain(uint32_t *r_domid, libxl_event *event,
>  
>  static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
>  {
> -    int rc, retries = 3;
> -    uint32_t need_memkb, free_memkb;
> +    int rc, retries;
> +    const int MAX_RETRIES = 3;
> +    uint32_t need_memkb, free_memkb, free_memkb_prev = 0;
>  
>      if (!autoballoon)
>          return 0;
> @@ -2204,6 +2205,7 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
>      if (rc < 0)
>          return rc;
>  
> +    retries = MAX_RETRIES;
>      do {
>          rc = libxl_get_free_memory(ctx, &free_memkb);
>          if (rc < 0)
> @@ -2228,7 +2230,16 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
>          if (rc < 0)
>              return rc;
>  
> -        retries--;
> +        /*
> +         * If the amount of free mem has increased on this iteration (i.e.
> +         * some progress has been made) then reset the retry counter.
> +         */
> +        if (free_memkb > free_memkb_prev) {
> +            retries = MAX_RETRIES;
> +            free_memkb_prev = free_memkb;
> +        } else {
> +            retries--;
> +        }
>      } while (retries > 0);
>  
>      return ERROR_NOMEM;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] libxl: Wait for ballooning if free memory is increasing
  2015-02-02 14:35 ` Ian Campbell
@ 2015-02-02 15:17   ` Mike Latimer
  2015-02-05 12:45     ` Ian Campbell
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Latimer @ 2015-02-02 15:17 UTC (permalink / raw)
  To: Ian Campbell
  Cc: george.dunlap, wei.liu2, Stefano.Stabellini, Ian.Jackson, xen-devel

On Monday, February 02, 2015 02:35:39 PM Ian Campbell wrote:
> On Fri, 2015-01-30 at 14:01 -0700, Mike Latimer wrote:
> > During domain startup, all required memory ballooning must complete
> > within a maximum window of 33 seconds (3 retries, 11 seconds of delay).
> > If not, domain creation is aborted with a 'failed to free memory' error.
> > 
> > In order to accommodate large domains or slower hardware (which require
> > substantially longer to balloon memory) the free memory process should
> > continue retrying if the amount of free memory is increasing on each
> > iteration of the loop.
> 
> Sorry for not spotting this earlier, but to accept a patch into the Xen
> code base we need a Signed-off-by:
> http://wiki.xen.org/wiki/Submitting_Xen_Patches#Signing_off_a_patch
> 
> You can provide one in this thread rather than resending if that is more
> convenient.

Sorry, I should have caught that myself.  If's it's okay to provide it here:

Signed-off-by: Mike Latimer <mlatimer@suse.com>

> For my part: Acked-by: Ian Campbell <ian.campbell@citrix.com>

Thanks,
Mike

> > ---
> > 
> >  tools/libxl/xl_cmdimpl.c | 17 ++++++++++++++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 0b02a6c..9ff3c4f 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -2194,8 +2194,9 @@ static int preserve_domain(uint32_t *r_domid,
> > libxl_event *event,> 
> >  static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
> >  {
> > 
> > -    int rc, retries = 3;
> > -    uint32_t need_memkb, free_memkb;
> > +    int rc, retries;
> > +    const int MAX_RETRIES = 3;
> > +    uint32_t need_memkb, free_memkb, free_memkb_prev = 0;
> > 
> >      if (!autoballoon)
> >      
> >          return 0;
> > 
> > @@ -2204,6 +2205,7 @@ static int freemem(uint32_t domid,
> > libxl_domain_build_info *b_info)> 
> >      if (rc < 0)
> >      
> >          return rc;
> > 
> > +    retries = MAX_RETRIES;
> > 
> >      do {
> >      
> >          rc = libxl_get_free_memory(ctx, &free_memkb);
> >          if (rc < 0)
> > 
> > @@ -2228,7 +2230,16 @@ static int freemem(uint32_t domid,
> > libxl_domain_build_info *b_info)> 
> >          if (rc < 0)
> >          
> >              return rc;
> > 
> > -        retries--;
> > +        /*
> > +         * If the amount of free mem has increased on this iteration
> > (i.e.
> > +         * some progress has been made) then reset the retry counter.
> > +         */
> > +        if (free_memkb > free_memkb_prev) {
> > +            retries = MAX_RETRIES;
> > +            free_memkb_prev = free_memkb;
> > +        } else {
> > +            retries--;
> > +        }
> > 
> >      } while (retries > 0);
> >      
> >      return ERROR_NOMEM;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] libxl: Wait for ballooning if free memory is increasing
  2015-02-02 15:17   ` Mike Latimer
@ 2015-02-05 12:45     ` Ian Campbell
  2015-02-11  4:17       ` Mike Latimer
  0 siblings, 1 reply; 7+ messages in thread
From: Ian Campbell @ 2015-02-05 12:45 UTC (permalink / raw)
  To: Mike Latimer
  Cc: george.dunlap, wei.liu2, Stefano.Stabellini, Ian.Jackson, xen-devel

On Mon, 2015-02-02 at 08:17 -0700, Mike Latimer wrote:
> On Monday, February 02, 2015 02:35:39 PM Ian Campbell wrote:
> > On Fri, 2015-01-30 at 14:01 -0700, Mike Latimer wrote:
> > > During domain startup, all required memory ballooning must complete
> > > within a maximum window of 33 seconds (3 retries, 11 seconds of delay).
> > > If not, domain creation is aborted with a 'failed to free memory' error.
> > > 
> > > In order to accommodate large domains or slower hardware (which require
> > > substantially longer to balloon memory) the free memory process should
> > > continue retrying if the amount of free memory is increasing on each
> > > iteration of the loop.
> > 
> > Sorry for not spotting this earlier, but to accept a patch into the Xen
> > code base we need a Signed-off-by:
> > http://wiki.xen.org/wiki/Submitting_Xen_Patches#Signing_off_a_patch
> > 
> > You can provide one in this thread rather than resending if that is more
> > convenient.
> 
> Sorry, I should have caught that myself.  If's it's okay to provide it here:
> 
> Signed-off-by: Mike Latimer <mlatimer@suse.com>

Thanks, applied.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] libxl: Wait for ballooning if free memory is increasing
  2015-02-05 12:45     ` Ian Campbell
@ 2015-02-11  4:17       ` Mike Latimer
  2015-02-13 11:01         ` Wei Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Latimer @ 2015-02-11  4:17 UTC (permalink / raw)
  To: xen-devel
  Cc: george.dunlap, Ian.Jackson, wei.liu2, Ian Campbell, Stefano.Stabellini

On Thursday, February 05, 2015 12:45:53 PM Ian Campbell wrote:
> On Mon, 2015-02-02 at 08:17 -0700, Mike Latimer wrote:
> > On Monday, February 02, 2015 02:35:39 PM Ian Campbell wrote:
> > > On Fri, 2015-01-30 at 14:01 -0700, Mike Latimer wrote:
> > > > During domain startup, all required memory ballooning must complete
> > > > within a maximum window of 33 seconds (3 retries, 11 seconds of
> > > > delay).
> > > > If not, domain creation is aborted with a 'failed to free memory'
> > > > error.
...
> Thanks, applied.

Unfortunately, I just found an issue with this codepath...

In tools/libxl/xl_cmdimpl.c:freemem, the following call sets the memory target 
for dom0:

  rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);

This reduces the memory target of dom0 by the amount of memory still needed - 
relative to the amount of memory currently free. In other words, during each 
iteration of the loop, dom0's target memory is set lower, and lower, and 
lower...

Prior to my changes, this issue would only be noticed when starting very large 
domains - due to the loop being limited to 3 iterations. (For example, when 
ballooning 512G, dom0 memory could be reduced up to 1.5T.) With my changes, 
this loop can take several more iterations (10 or many more in testing). With 
each iteration lowering dom0's target, the end result can be dom0 ballooning 
down 100's of gigabytes, just to satisfy a much smaller request. (On one 
machine, a 64G guest can result in dom0 ballooning down 500G.)

It seems like a proper fix would be to use libxl_get_memory_target to first 
check dom0's target and see if:

  (free_memory + (dom0_current_mem - dom0_target_mem)) >= needed_memory

If so, there will be sufficient memory when dom0 reaches the target, so don't 
change it. Before going down that road, I'd like a sanity check on that 
theory, and any advice on the overall picture (including my thread on freemem-
slack).

I'll leave it up to you if my patch should be reverted until a new patch is 
created.

Thanks!
Mike

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] libxl: Wait for ballooning if free memory is increasing
  2015-02-11  4:17       ` Mike Latimer
@ 2015-02-13 11:01         ` Wei Liu
  2015-02-13 23:16           ` Mike Latimer
  0 siblings, 1 reply; 7+ messages in thread
From: Wei Liu @ 2015-02-13 11:01 UTC (permalink / raw)
  To: Mike Latimer
  Cc: wei.liu2, Ian Campbell, Stefano.Stabellini, george.dunlap,
	Ian.Jackson, xen-devel

On Tue, Feb 10, 2015 at 09:17:23PM -0700, Mike Latimer wrote:
> On Thursday, February 05, 2015 12:45:53 PM Ian Campbell wrote:
> > On Mon, 2015-02-02 at 08:17 -0700, Mike Latimer wrote:
> > > On Monday, February 02, 2015 02:35:39 PM Ian Campbell wrote:
> > > > On Fri, 2015-01-30 at 14:01 -0700, Mike Latimer wrote:
> > > > > During domain startup, all required memory ballooning must complete
> > > > > within a maximum window of 33 seconds (3 retries, 11 seconds of
> > > > > delay).
> > > > > If not, domain creation is aborted with a 'failed to free memory'
> > > > > error.
> ...
> > Thanks, applied.
> 
> Unfortunately, I just found an issue with this codepath...
> 
> In tools/libxl/xl_cmdimpl.c:freemem, the following call sets the memory target 
> for dom0:
> 
>   rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
> 
> This reduces the memory target of dom0 by the amount of memory still needed - 
> relative to the amount of memory currently free. In other words, during each 
> iteration of the loop, dom0's target memory is set lower, and lower, and 
> lower...
> 
> Prior to my changes, this issue would only be noticed when starting very large 
> domains - due to the loop being limited to 3 iterations. (For example, when 
> ballooning 512G, dom0 memory could be reduced up to 1.5T.) With my changes, 
> this loop can take several more iterations (10 or many more in testing). With 
> each iteration lowering dom0's target, the end result can be dom0 ballooning 
> down 100's of gigabytes, just to satisfy a much smaller request. (On one 
> machine, a 64G guest can result in dom0 ballooning down 500G.)
> 

I'm curious why freemem doesn't not return when dom0 is ballooned down 64G?
I.e. I think libxl_wait_for_free_memory should return 0 in that case and
freeme should just return. Does it suggest there are other domains
competing for free memory and dom0 is doing the right thing?

Wei.

> It seems like a proper fix would be to use libxl_get_memory_target to first 
> check dom0's target and see if:
> 
>   (free_memory + (dom0_current_mem - dom0_target_mem)) >= needed_memory
> 
> If so, there will be sufficient memory when dom0 reaches the target, so don't 
> change it. Before going down that road, I'd like a sanity check on that 
> theory, and any advice on the overall picture (including my thread on freemem-
> slack).
> 
> I'll leave it up to you if my patch should be reverted until a new patch is 
> created.
> 
> Thanks!
> Mike

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] libxl: Wait for ballooning if free memory is increasing
  2015-02-13 11:01         ` Wei Liu
@ 2015-02-13 23:16           ` Mike Latimer
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Latimer @ 2015-02-13 23:16 UTC (permalink / raw)
  To: xen-devel
  Cc: george.dunlap, Ian.Jackson, Wei Liu, Ian Campbell, Stefano.Stabellini

Hi Wei,

On Friday, February 13, 2015 11:01:41 AM Wei Liu wrote:
> On Tue, Feb 10, 2015 at 09:17:23PM -0700, Mike Latimer wrote:
> > Prior to my changes, this issue would only be noticed when starting very
> > large domains - due to the loop being limited to 3 iterations. (For
> > example, when ballooning 512G, dom0 memory could be reduced up to 1.5T.)
> > With my changes, this loop can take several more iterations (10 or many
> > more in testing). With each iteration lowering dom0's target, the end
> > result can be dom0 ballooning down 100's of gigabytes, just to satisfy a
> > much smaller request. (On one machine, a 64G guest can result in dom0
> > ballooning down 500G.)
> 
> I'm curious why freemem doesn't not return when dom0 is ballooned down 64G?
> I.e. I think libxl_wait_for_free_memory should return 0 in that case and
> freeme should just return. Does it suggest there are other domains
> competing for free memory and dom0 is doing the right thing?

I was only testing with a single domain. freemem returns 0 until freemem-slack 
is free, then the difference between freemem and freemem-slack is returned.

However, there are actually a couple of different problems here - and the whole 
situation is closely related to freemem_slack. As the patch submitted with 
this thread is only a victim here, maybe it's best to move the discussion over 
to the freemem-slack thread. I'll reply in detail there.

Thanks!
Mike

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-02-13 23:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-30 21:01 [PATCH v3] libxl: Wait for ballooning if free memory is increasing Mike Latimer
2015-02-02 14:35 ` Ian Campbell
2015-02-02 15:17   ` Mike Latimer
2015-02-05 12:45     ` Ian Campbell
2015-02-11  4:17       ` Mike Latimer
2015-02-13 11:01         ` Wei Liu
2015-02-13 23:16           ` Mike Latimer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.