* freemem-slack and large memory environments
@ 2015-02-10  1:27 Mike Latimer
  2015-02-10 21:34 ` Mike Latimer
  0 siblings, 1 reply; 41+ messages in thread
From: Mike Latimer @ 2015-02-10  1:27 UTC (permalink / raw)
  To: xen-devel

Hi,

While testing commit 2563bca1, I found that libxl_get_free_memory returns 0 
until there is more free memory than required for freemem-slack. This means 
that during the domain creation process, freed memory is first set aside for 
freemem-slack, then marked as truly free for consumption.
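
(Roughly - paraphrasing rather than quoting the actual libxl code, with ctx/gc
being the usual libxl context/gc and error handling omitted - the reported free
memory works out to something like:

    libxl_physinfo physinfo;
    uint32_t slack_kb, free_kb;

    libxl_get_physinfo(ctx, &physinfo);
    libxl__get_free_memory_slack(gc, &slack_kb);

    /* pages -> KiB, then hold back the freemem-slack reservation */
    free_kb = (physinfo.free_pages + physinfo.scrub_pages) * 4;
    free_kb = (free_kb > slack_kb) ? (free_kb - slack_kb) : 0;

so nothing is reported as free until the slack reservation is fully covered.)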

On machines with large amounts of memory, freemem-slack can be very high (26GB 
on a 2TB test machine). If freeing this memory takes more time than allowed 
during domain startup, domain creation fails with ERROR_NOMEM. (Commit 
2563bca1 doesn't help here, as free_memkb remains 0 until freemem-slack is 
satisfied.)

There is already a 15% limit on the size of freemem-slack (commit a39b5bc6), but 
this does not take very large memory environments into consideration - on the 
2TB test machine the limit is never hit, as 26GB is only 1.2% of 2TB.

It seems that there are two approaches to resolve this:

 - Introduce a hard limit on freemem-slack to avoid unnecessarily large 
reservations
 - Increase the retry count during domain creation to ensure enough time is 
set aside for any cycles spent freeing memory for freemem-slack (on the test 
machine, doubling the retry count to 6 is the minimum required)

Which is the best approach (or did I miss something)?

Thanks!
Mike


* Re: freemem-slack and large memory environments
  2015-02-10  1:27 freemem-slack and large memory environments Mike Latimer
@ 2015-02-10 21:34 ` Mike Latimer
  2015-02-13 11:13   ` Wei Liu
  2015-02-18 14:10   ` Ian Campbell
  0 siblings, 2 replies; 41+ messages in thread
From: Mike Latimer @ 2015-02-10 21:34 UTC (permalink / raw)
  To: xen-devel; +Cc: wei.liu2, ian.jackson, ian.campbell, stefano.stabellini

On Monday, February 09, 2015 06:27:54 PM Mike Latimer wrote:
> While testing commit 2563bca1, I found that libxl_get_free_memory returns 0
> until there is more free memory than required for freemem-slack. This means
> that during the domain creation process, freed memory is first set aside for
> freemem-slack, then marked as truly free for consumption.
> 
> On machines with large amounts of memory, freemem-slack can be very high
> (26GB on a 2TB test machine). If freeing this memory takes more time than
> allowed during domain startup, domain creation fails with ERROR_NOMEM.
> (Commit 2563bca1 doesn't help here, as free_memkb remains 0 until
> freemem-slack is satisfied.)
> 
> There is already a 15% limit on the size of freemem-slack (commit a39b5bc6),
> but this does not take into consideration very large memory environments.
> (26GB is only 1.2% of 2TB), where this limit is not hit.
> 
> It seems that there are two approaches to resolve this:
> 
>  - Introduce a hard limit on freemem-slack to avoid unnecessarily large
> reservations
>  - Increase the retry count during domain creation to ensure enough time is
> set aside for any cycles spent freeing memory for freemem-slack (on the test
> machine, doubling the retry count to 6 is the minimum required)
> 
> Which is the best approach (or did I miss something)?

Sorry - forgot to CC relevant maintainers.

-Mike


* Re: freemem-slack and large memory environments
  2015-02-10 21:34 ` Mike Latimer
@ 2015-02-13 11:13   ` Wei Liu
  2015-02-13 23:16     ` Mike Latimer
  2015-02-18 14:10   ` Ian Campbell
  1 sibling, 1 reply; 41+ messages in thread
From: Wei Liu @ 2015-02-13 11:13 UTC (permalink / raw)
  To: Mike Latimer
  Cc: wei.liu2, stefano.stabellini, ian.jackson, ian.campbell, xen-devel

On Tue, Feb 10, 2015 at 02:34:27PM -0700, Mike Latimer wrote:
> On Monday, February 09, 2015 06:27:54 PM Mike Latimer wrote:
> > While testing commit 2563bca1, I found that libxl_get_free_memory returns 0
> > until there is more free memory than required for freemem-slack. This means
> > that during the domain creation process, freed memory is first set aside for
> > freemem-slack, then marked as truly free for consumption.
> > 
> > On machines with large amounts of memory, freemem-slack can be very high
> > (26GB on a 2TB test machine). If freeing this memory takes more time than
> > allowed during domain startup, domain creation fails with ERROR_NOMEM.
> > (Commit 2563bca1 doesn't help here, as free_memkb remains 0 until
> > freemem-slack is satisfied.)
> > 
> > There is already a 15% limit on the size of freemem-slack (commit a39b5bc6),
> > but this does not take into consideration very large memory environments.
> > (26GB is only 1.2% of 2TB), where this limit is not hit.
> > 
> > It seems that there are two approaches to resolve this:
> > 
> >  - Introduce a hard limit on freemem-slack to avoid unnecessarily large
> > reservations
> >  - Increase the retry count during domain creation to ensure enough time is
> > set aside for any cycles spent freeing memory for freemem-slack (on the test
> > machine, doubling the retry count to 6 is the minimum required)
> > 
> > Which is the best approach (or did I miss something)?
> 
> Sorry - forgot to CC relevant maintainers.
> 

Oops, I replied to your other email before looking at this one. Sorry.
26GB out of 2TB is overkill IMHO. And the 15% limit dates back to 2010,
which a) is just empirical, and b) doesn't take large systems into account.

I think we might be able to do both, introducing a hard limit as well as
tweaking the retry count (which function are you referring to?).

Wei.

> -Mike


* Re: freemem-slack and large memory environments
  2015-02-13 11:13   ` Wei Liu
@ 2015-02-13 23:16     ` Mike Latimer
  0 siblings, 0 replies; 41+ messages in thread
From: Mike Latimer @ 2015-02-13 23:16 UTC (permalink / raw)
  To: xen-devel; +Cc: ian.jackson, jfehlig, Wei Liu, ian.campbell, stefano.stabellini

Hi Wei,

On Friday, February 13, 2015 11:13:50 AM Wei Liu wrote:
> On Tue, Feb 10, 2015 at 02:34:27PM -0700, Mike Latimer wrote:
> > On Monday, February 09, 2015 06:27:54 PM Mike Latimer wrote:
> > > It seems that there are two approaches to resolve this:
> > >  - Introduce a hard limit on freemem-slack to avoid unnecessarily large
> > > reservations
> > > 
> > >  - Increase the retry count during domain creation to ensure enough time
> > >  is set aside for any cycles spent freeing memory for freemem-slack (on
> > > the test machine, doubling the retry count to 6 is the minimum required)
> > > 
> > > Which is the best approach (or did I miss something)?
> > 
> > Sorry - forgot to CC relevant maintainers.
> 
> Oops, I replied to your other email before looking at this one. Sorry.
> 26GB out of 2TB is overkill IMHO. And the 15% limit dates back to 2010
> which a) is just empirical, b) doesn't take into account large system.

Agreed.  :)

> I think we might be able to do both, introducing a hard limit as well as
> tweaking the retry count (which function are you referring to?).

Well, as I mentioned in the other thread, the situation is a bit complicated. 
As I've continued debugging here, there are actually a few issues related to 
freemem-slack and ballooning.

- tools/libxl/libxl.c:libxl__fill_dom0_memory_info sets free_mem_slack_kb to an 
empirical maximum of 15% of memory. This is obviously far too large in large 
memory environments.
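
(For illustration only, a hard cap on top of the existing 15% calculation could
be as simple as the following - the 1GB figure is made up, not a proposed value:

    /* after the existing 15% free_mem_slack_kb calculation ... */
    #define FREE_MEM_SLACK_MAX_KB (1024 * 1024)   /* 1GB, illustrative only */
    if (free_mem_slack_kb > FREE_MEM_SLACK_MAX_KB)
        free_mem_slack_kb = FREE_MEM_SLACK_MAX_KB;
)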

Even if freemem-slack is lowered...

- tools/libxl/xl_cmdimpl.c:freemem sets dom0's memory target lower during each 
iteration of the freemem loop (despite the fact that the first iteration 
already lowered dom0's target by the amount required for domU). This causes 
dom0 to balloon down far too much, but has the interesting side effect that it 
balloons dom0 down far enough to get past the freemem-slack barrier. :)

- tools/libxl/xl_cmdimpl.c:freemem does not know about or compensate for the 
memory reservation requirement imposed by freemem-slack. As adding this 
awareness to the tools side would have ramifications for all tools, it makes 
more sense to compensate for this requirement in 
tools/libxl/libxl.c:libxl_set_memory_target - so any ballooning of dom0 will 
make sure freemem-slack space is preserved.

Even if xl sets dom0's memory target once, and libxl compensates for the 
amount of memory required by freemem-slack...

- tools/libxl/xl_cmdimpl.c:freemem does not account for the amount of time 
required to balloon dom0 past the freemem-slack barrier. Lowering the freemem-
slack amount will help, but it may still be worth increasing the retry count 
to something higher than 3.

The following patch addresses all but the first issue above. (I'm not sure what 
the best option for that one is - perhaps another empirical hard limit?) I'm 
open to other ideas, but this patch has worked in testing. Starting a 512G 
domain on a 2T host used to take 1.5 hours and ballooned dom0 down to ~20G. 
With this patch, the domain started in 30 minutes and dom0 ballooned down to 
the expected 1.5T (2T - (512G + 26G freemem-slack)).

---
 tools/libxl/libxl.c      | 29 +++++++++++++++++++++++++++++
 tools/libxl/xl_cmdimpl.c | 30 +++++++++++++++++++-----------
 2 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index cd6f42c..f336c9a 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -4710,10 +4710,12 @@ int libxl_set_memory_target(libxl_ctx *ctx, uint32_t domid,
     uint32_t memorykb = 0, videoram = 0;
     uint32_t current_target_memkb = 0, new_target_memkb = 0;
     uint32_t current_max_memkb = 0;
+    uint32_t freemem_slack = 0;
     char *memmax, *endptr, *videoram_s = NULL, *target = NULL;
     char *dompath = libxl__xs_get_dompath(gc, domid);
     xc_domaininfo_t info;
     libxl_dominfo ptr;
+    libxl_physinfo physinfo;
     char *uuid;
     xs_transaction_t t;

@@ -4783,6 +4785,33 @@ retry_transaction:
         goto out;
     }

+    /* When setting the target on dom0, compare free memory to freemem_slack
+       required reservation. If there is insufficient free memory to satisfy
+       freemem_slack reservation, dom0's target should be decreased by the
+       required freemem_slack space. */
+    if (!domid) {
+        rc = libxl_get_physinfo(ctx, &physinfo);
+        if (rc != 0) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
+                    "libxl_get_physinfo failed "
+                    "rc=%d\n", rc);
+            abort_transaction = 1;
+            goto out;
+        }
+        rc = libxl__get_free_memory_slack(gc, &freemem_slack);
+        if (rc != 0) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
+                    "libxl__get_free_memory_slack failed "
+                    "rc=%d\n", rc);
+            abort_transaction = 1;
+            goto out;
+        }
+        if ((physinfo.free_pages + physinfo.scrub_pages) * 4 < freemem_slack) {
+            new_target_memkb -= (freemem_slack - ((physinfo.free_pages
+                                              + physinfo.scrub_pages) * 4));
+        }
+    }
+
     if (!domid && new_target_memkb < LIBXL_MIN_DOM0_MEM) {
         LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
                 "new target %d for dom0 is below the minimum threshold\n",
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 440db78..630b427 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2199,7 +2199,7 @@ static int preserve_domain(uint32_t *r_domid, libxl_event *event,
 static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
 {
     int rc, retries;
-    const int MAX_RETRIES = 3;
+    const int MAX_RETRIES = 10;
     uint32_t need_memkb, free_memkb, free_memkb_prev = 0;

     if (!autoballoon)
@@ -2209,19 +2209,20 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
     if (rc < 0)
         return rc;

-    retries = MAX_RETRIES;
-    do {
-        rc = libxl_get_free_memory(ctx, &free_memkb);
-        if (rc < 0)
-            return rc;
+    rc = libxl_get_free_memory(ctx, &free_memkb);
+    if (rc < 0)
+        return rc;

-        if (free_memkb >= need_memkb)
-            return 0;
+    if (free_memkb >= need_memkb)
+        return 0;

-        rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
-        if (rc < 0)
-            return rc;
+    /* Set memory target for dom0 before entering wait loop */
+    rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
+    if (rc < 0)
+        return rc;

+    retries = MAX_RETRIES;
+    do {
         rc = libxl_wait_for_free_memory(ctx, domid, need_memkb, 10);
         if (!rc)
             return 0;
@@ -2234,6 +2235,13 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
         if (rc < 0)
             return rc;

+        rc = libxl_get_free_memory(ctx, &free_memkb);
+        if (rc < 0)
+            return rc;
+
+        if (free_memkb >= need_memkb)
+            return 0;
+
         /*
          * If the amount of free mem has increased on this iteration (i.e.
          * some progress has been made) then reset the retry counter.
--

Is there a better way to handle these issues?

Thanks,
Mike


* Re: freemem-slack and large memory environments
  2015-02-10 21:34 ` Mike Latimer
  2015-02-13 11:13   ` Wei Liu
@ 2015-02-18 14:10   ` Ian Campbell
  2015-02-24 16:06     ` Stefano Stabellini
  1 sibling, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2015-02-18 14:10 UTC (permalink / raw)
  To: Mike Latimer; +Cc: wei.liu2, stefano.stabellini, ian.jackson, xen-devel

On Tue, 2015-02-10 at 14:34 -0700, Mike Latimer wrote:
> On Monday, February 09, 2015 06:27:54 PM Mike Latimer wrote:
> > While testing commit 2563bca1, I found that libxl_get_free_memory returns 0
> > until there is more free memory than required for freemem-slack. This means
> > that during the domain creation process, freed memory is first set aside for
> > freemem-slack, then marked as truly free for consumption.
> > 
> > On machines with large amounts of memory, freemem-slack can be very high
> > (26GB on a 2TB test machine). If freeing this memory takes more time than
> > allowed during domain startup, domain creation fails with ERROR_NOMEM.
> > (Commit 2563bca1 doesn't help here, as free_memkb remains 0 until
> > freemem-slack is satisfied.)
> > 
> > There is already a 15% limit on the size of freemem-slack (commit a39b5bc6),
> > but this does not take into consideration very large memory environments.
> > (26GB is only 1.2% of 2TB), where this limit is not hit.

Stefano,

What is "freemem-slack" for? It seems to have been added in 7010e9b7 but
the commit log makes no mention of it whatsoever. Was it originally just
supposed to be the delta between the host memory and dom0 memory at
start of day?

This seems to then change in a39b5bc64, to add an arbitrary cap which
seems to be working around an invalid configuration (dom0_mem +
autoballooning on).

Now that we autodetect the use of dom0_mem and set autoballooning
correctly perhaps we should just revert a39b5bc64?

Ian.

> > 
> > It seems that there are two approaches to resolve this:
> > 
> >  - Introduce a hard limit on freemem-slack to avoid unnecessarily large
> > reservations
> >  - Increase the retry count during domain creation to ensure enough time is
> > set aside for any cycles spent freeing memory for freemem-slack (on the test
> > machine, doubling the retry count to 6 is the minimum required)
> > 
> > Which is the best approach (or did I miss something)?
> 
> Sorry - forgot to CC relevant maintainers.
> 
> -Mike


* Re: freemem-slack and large memory environments
  2015-02-18 14:10   ` Ian Campbell
@ 2015-02-24 16:06     ` Stefano Stabellini
  2015-02-24 16:54       ` Ian Campbell
  0 siblings, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2015-02-24 16:06 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, stefano.stabellini, ian.jackson, Mike Latimer, xen-devel

On Wed, 18 Feb 2015, Ian Campbell wrote:
> On Tue, 2015-02-10 at 14:34 -0700, Mike Latimer wrote:
> > On Monday, February 09, 2015 06:27:54 PM Mike Latimer wrote:
> > > While testing commit 2563bca1, I found that libxl_get_free_memory returns 0
> > > until there is more free memory than required for freemem-slack. This means
> > > that during the domain creation process, freed memory is first set aside for
> > > freemem-slack, then marked as truly free for consumption.
> > > 
> > > On machines with large amounts of memory, freemem-slack can be very high
> > > (26GB on a 2TB test machine). If freeing this memory takes more time than
> > > allowed during domain startup, domain creation fails with ERROR_NOMEM.
> > > (Commit 2563bca1 doesn't help here, as free_memkb remains 0 until
> > > freemem-slack is satisfied.)
> > > 
> > > There is already a 15% limit on the size of freemem-slack (commit a39b5bc6),
> > > but this does not take into consideration very large memory environments.
> > > (26GB is only 1.2% of 2TB), where this limit is not hit.
> 
> Stefano,
> 
> What is "freemem-slack" for?

I think it comes from xapi: they always keep a minimum amount of free
memory in the system as it seems to be empirically required by the
hypervisor.


> It seems to have been added in 7010e9b7 but
> the commit log makes no mention of it whatsoever. Was it originally just
> supposed to be the delta between the host memory and dom0 memory at
> start of day?

Yes, that is right.


> This seems to then change in a39b5bc64, to add an arbitrary caP which
> seems to be working around an invalid configuration (dom0_mem +
> autoballooning on).

Correct again.


> Now that we autodetect the use of dom0_mem and set autoballooning
> correctly perhaps we should just revert a39b5bc64?

We could do that and theoretically it makes perfect sense, but it would
result in an even bigger waste of memory.
I think we should either introduce a hard upper limit for
freemem-slack as Mike suggested, or remove freemem-slack altogether and
properly fix any issues caused by lack of memory in the system (properly
account for memory usage).
After all, we are just at the beginning of the release cycle, so it is the
right time to do this.


> Ian.
> 
> > > 
> > > It seems that there are two approaches to resolve this:
> > > 
> > >  - Introduce a hard limit on freemem-slack to avoid unnecessarily large
> > > reservations
> > >  - Increase the retry count during domain creation to ensure enough time is
> > > set aside for any cycles spent freeing memory for freemem-slack (on the test
> > > machine, doubling the retry count to 6 is the minimum required)
> > > 
> > > Which is the best approach (or did I miss something)?
> > 
> > Sorry - forgot to CC relevant maintainers.
> > 
> > -Mike
> 
> 


* Re: freemem-slack and large memory environments
  2015-02-24 16:06     ` Stefano Stabellini
@ 2015-02-24 16:54       ` Ian Campbell
  2015-02-25 11:39         ` Stefano Stabellini
  0 siblings, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2015-02-24 16:54 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: wei.liu2, ian.jackson, Mike Latimer, xen-devel

On Tue, 2015-02-24 at 16:06 +0000, Stefano Stabellini wrote:
> > Now that we autodetect the use of dom0_mem and set autoballooning
> > correctly perhaps we should just revert a39b5bc64?
> 
> We could do that and theoretically it makes perfect sense, but it would
> result in an even bigger waste of memory.

Would it, even though we now detect dom0_mem usage and do the right
thing? I thought a39b5bc64 was a workaround for autoballooning=1
in /etc/xen/xl.conf when dom0_mem was used.


> I think we should either introduce an hard upper limit for
> freemem-slack as Mike suggested, or remove freemem-slack altogether and
> properly fix any issues caused by lack of memory in the system (properly
> account memory usage).
> After all we are just at the beginning of the release cycle, it is the
> right time to do this.

I'm all in favour of someone doing this, similarly to
http://bugs.xenproject.org/xen/bug/23

Who is going to do that (either one)?

Ian.

> 
> 
> > Ian.
> > 
> > > > 
> > > > It seems that there are two approaches to resolve this:
> > > > 
> > > >  - Introduce a hard limit on freemem-slack to avoid unnecessarily large
> > > > reservations
> > > >  - Increase the retry count during domain creation to ensure enough time is
> > > > set aside for any cycles spent freeing memory for freemem-slack (on the test
> > > > machine, doubling the retry count to 6 is the minimum required)
> > > > 
> > > > Which is the best approach (or did I miss something)?
> > > 
> > > Sorry - forgot to CC relevant maintainers.
> > > 
> > > -Mike
> > 
> > 


* Re: freemem-slack and large memory environments
  2015-02-24 16:54       ` Ian Campbell
@ 2015-02-25 11:39         ` Stefano Stabellini
  2015-02-25 12:00           ` Ian Campbell
  0 siblings, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2015-02-25 11:39 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, xen-devel, ian.jackson, Mike Latimer, Stefano Stabellini

On Tue, 24 Feb 2015, Ian Campbell wrote:
> On Tue, 2015-02-24 at 16:06 +0000, Stefano Stabellini wrote:
> > > Now that we autodetect the use of dom0_mem and set autoballooning
> > > correctly perhaps we should just revert a39b5bc64?
> > 
> > We could do that and theoretically it makes perfect sense, but it would
> > result in an even bigger waste of memory.
> 
> Would it, even though we now detect dom0_mem usage and do the right
> thing? I thought a39b5bc64 was a workaround for autoballooning=1
> in /etc/xen/xl.conf when dom0 was used.
> 
> 
> > I think we should either introduce an hard upper limit for
> > freemem-slack as Mike suggested, or remove freemem-slack altogether and
> > properly fix any issues caused by lack of memory in the system (properly
> > account memory usage).
> > After all we are just at the beginning of the release cycle, it is the
> > right time to do this.
> 
> I'm all in favour of someone doing this, similarly to
> http://bugs.xenproject.org/xen/bug/23
> 
> Who is going to do that (either one)?

I am OK with sending the patch for both.

 
> > 
> > 
> > > Ian.
> > > 
> > > > > 
> > > > > It seems that there are two approaches to resolve this:
> > > > > 
> > > > >  - Introduce a hard limit on freemem-slack to avoid unnecessarily large
> > > > > reservations
> > > > >  - Increase the retry count during domain creation to ensure enough time is
> > > > > set aside for any cycles spent freeing memory for freemem-slack (on the test
> > > > > machine, doubling the retry count to 6 is the minimum required)
> > > > > 
> > > > > Which is the best approach (or did I miss something)?
> > > > 
> > > > Sorry - forgot to CC relevant maintainers.
> > > > 
> > > > -Mike
> > > 
> > > 
> 
> 


* Re: freemem-slack and large memory environments
  2015-02-25 11:39         ` Stefano Stabellini
@ 2015-02-25 12:00           ` Ian Campbell
  2015-02-25 14:03             ` Ian Campbell
  0 siblings, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2015-02-25 12:00 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: wei.liu2, ian.jackson, Mike Latimer, xen-devel

On Wed, 2015-02-25 at 11:39 +0000, Stefano Stabellini wrote:
> On Tue, 24 Feb 2015, Ian Campbell wrote:
> > On Tue, 2015-02-24 at 16:06 +0000, Stefano Stabellini wrote:
> > > > Now that we autodetect the use of dom0_mem and set autoballooning
> > > > correctly perhaps we should just revert a39b5bc64?
> > > 
> > > We could do that and theoretically it makes perfect sense, but it would
> > > result in an even bigger waste of memory.
> > 
> > Would it, even though we now detect dom0_mem usage and do the right
> > thing? I thought a39b5bc64 was a workaround for autoballooning=1
> > in /etc/xen/xl.conf when dom0 was used.
> > 
> > 
> > > I think we should either introduce an hard upper limit for
> > > freemem-slack as Mike suggested, or remove freemem-slack altogether and
> > > properly fix any issues caused by lack of memory in the system (properly
> > > account memory usage).
> > > After all we are just at the beginning of the release cycle, it is the
> > > right time to do this.
> > 
> > I'm all in favour of someone doing this, similarly to
> > http://bugs.xenproject.org/xen/bug/23
> > 
> > Who is going to do that (either one)?
> 
> I am OK with sending the patch for both

Super, thanks.


* Re: freemem-slack and large memory environments
  2015-02-25 12:00           ` Ian Campbell
@ 2015-02-25 14:03             ` Ian Campbell
  2015-02-25 14:09               ` Stefano Stabellini
  0 siblings, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2015-02-25 14:03 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: ian.jackson, wei.liu2, Mike Latimer, xen-devel

On Wed, 2015-02-25 at 12:00 +0000, Ian Campbell wrote:
> On Wed, 2015-02-25 at 11:39 +0000, Stefano Stabellini wrote:
> > On Tue, 24 Feb 2015, Ian Campbell wrote:
> > > On Tue, 2015-02-24 at 16:06 +0000, Stefano Stabellini wrote:
> > > > > Now that we autodetect the use of dom0_mem and set autoballooning
> > > > > correctly perhaps we should just revert a39b5bc64?
> > > > 
> > > > We could do that and theoretically it makes perfect sense, but it would
> > > > result in an even bigger waste of memory.
> > > 
> > > Would it, even though we now detect dom0_mem usage and do the right
> > > thing? I thought a39b5bc64 was a workaround for autoballooning=1
> > > in /etc/xen/xl.conf when dom0 was used.
> > > 
> > > 
> > > > I think we should either introduce an hard upper limit for
> > > > freemem-slack as Mike suggested, or remove freemem-slack altogether and
> > > > properly fix any issues caused by lack of memory in the system (properly
> > > > account memory usage).
> > > > After all we are just at the beginning of the release cycle, it is the
> > > > right time to do this.
> > > 
> > > I'm all in favour of someone doing this, similarly to
> > > http://bugs.xenproject.org/xen/bug/23
> > > 
> > > Who is going to do that (either one)?
> > 
> > I am OK with sending the patch for both
> 
> Super, thanks.

Is the upshot that Mike doesn't need to do anything further with his
patch (i.e. can drop it)? I think so?



* Re: freemem-slack and large memory environments
  2015-02-25 14:03             ` Ian Campbell
@ 2015-02-25 14:09               ` Stefano Stabellini
  2015-02-26 15:36                 ` Mike Latimer
  0 siblings, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2015-02-25 14:09 UTC (permalink / raw)
  To: Ian Campbell
  Cc: ian.jackson, xen-devel, wei.liu2, Mike Latimer, Stefano Stabellini

On Wed, 25 Feb 2015, Ian Campbell wrote:
> On Wed, 2015-02-25 at 12:00 +0000, Ian Campbell wrote:
> > On Wed, 2015-02-25 at 11:39 +0000, Stefano Stabellini wrote:
> > > On Tue, 24 Feb 2015, Ian Campbell wrote:
> > > > On Tue, 2015-02-24 at 16:06 +0000, Stefano Stabellini wrote:
> > > > > > Now that we autodetect the use of dom0_mem and set autoballooning
> > > > > > correctly perhaps we should just revert a39b5bc64?
> > > > > 
> > > > > We could do that and theoretically it makes perfect sense, but it would
> > > > > result in an even bigger waste of memory.
> > > > 
> > > > Would it, even though we now detect dom0_mem usage and do the right
> > > > thing? I thought a39b5bc64 was a workaround for autoballooning=1
> > > > in /etc/xen/xl.conf when dom0 was used.
> > > > 
> > > > 
> > > > > I think we should either introduce an hard upper limit for
> > > > > freemem-slack as Mike suggested, or remove freemem-slack altogether and
> > > > > properly fix any issues caused by lack of memory in the system (properly
> > > > > account memory usage).
> > > > > After all we are just at the beginning of the release cycle, it is the
> > > > > right time to do this.
> > > > 
> > > > I'm all in favour of someone doing this, similarly to
> > > > http://bugs.xenproject.org/xen/bug/23
> > > > 
> > > > Who is going to do that (either one)?
> > > 
> > > I am OK with sending the patch for both
> > 
> > Super, thanks.
> 
> Is the upshot that Mike doesn't need to do anything further with his
> patch (i.e. can drop it)? I think so?

Yes, I think so. Maybe he could help out testing the patches I am going
to write :-)


* Re: freemem-slack and large memory environments
  2015-02-25 14:09               ` Stefano Stabellini
@ 2015-02-26 15:36                 ` Mike Latimer
  2015-02-26 15:57                   ` Ian Campbell
  0 siblings, 1 reply; 41+ messages in thread
From: Mike Latimer @ 2015-02-26 15:36 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: ian.jackson, wei.liu2, Ian Campbell, xen-devel

On Wednesday, February 25, 2015 02:09:50 PM Stefano Stabellini wrote:
> > Is the upshot that Mike doesn't need to do anything further with his
> > patch (i.e. can drop it)? I think so?
> 
> Yes, I think so. Maybe he could help out testing the patches I am going
> to write :-)

Sorry for not responding to this yesterday.

There is still one aspect of my original patch that is important. As the code 
currently stands, the target for dom0 is set lower during each iteration of 
the loop. Unless only one iteration is required, dom0 will end up being set to 
a much lower target than is actually required.

There are two ways to fix this issue:

 - Set the memory target for dom0 once, before entering the loop
 - During each iteration of the loop, compare the amount of needed memory to 
the amount of memory which will be available once dom0 hits the target, and 
only lower the target if additional memory is needed.

My patch earlier in this thread does the former, but I think the second option 
is also possible. Is there a preference between those approaches (or a better 
idea)?

Thanks,
Mike


* Re: freemem-slack and large memory environments
  2015-02-26 15:36                 ` Mike Latimer
@ 2015-02-26 15:57                   ` Ian Campbell
  2015-02-26 17:38                     ` Mike Latimer
  0 siblings, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2015-02-26 15:57 UTC (permalink / raw)
  To: Mike Latimer; +Cc: wei.liu2, xen-devel, ian.jackson, Stefano Stabellini

On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> On Wednesday, February 25, 2015 02:09:50 PM Stefano Stabellini wrote:
> > > Is the upshot that Mike doesn't need to do anything further with his
> > > patch (i.e. can drop it)? I think so?
> > 
> > Yes, I think so. Maybe he could help out testing the patches I am going
> > to write :-)
> 
> Sorry for not responding to this yesterday.
> 
> There is still one aspect of my original patch that is important. As the code 
> currently stands, the target for dom0 is set lower during each iteration of 
> the loop. Unless only one iteration is required, dom0 will end up being set to 
> a much lower target than is actually required.

Is this because some sort of slack is applied once per iteration rather
than once at the start or is it something else?

> 
> There are two ways to fix this issue:
> 
>  - Set the memory target for dom0 once, before entering the loop
>  - During each iteration of the loop, compare the amount of needed memory to 
> the amount of memory which will be available once dom0 hits the target, and 
> only lower the target if additional memory is needed.
> 
> My patch earlier in this thread does the former, but I think the second option 
> is also possible. Is there a preference between those approaches (or a better 
> idea)?
> 
> Thanks,
> Mike


* Re: freemem-slack and large memory environments
  2015-02-26 15:57                   ` Ian Campbell
@ 2015-02-26 17:38                     ` Mike Latimer
  2015-02-26 17:47                       ` Ian Campbell
  2015-02-26 17:53                       ` Stefano Stabellini
  0 siblings, 2 replies; 41+ messages in thread
From: Mike Latimer @ 2015-02-26 17:38 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, xen-devel, ian.jackson, Stefano Stabellini

On Thursday, February 26, 2015 03:57:54 PM Ian Campbell wrote:
> On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> > There is still one aspect of my original patch that is important. As the
> > code currently stands, the target for dom0 is set lower during each
> > iteration of the loop. Unless only one iteration is required, dom0 will
> > end up being set to a much lower target than is actually required.
> 
> Is this because some sort of slack is applied once per iteration rather
> than once at the start or is it something else?

No - the slack reservation just complicated the request by (potentially) 
requiring more free memory than domU initially requested.

With or without slack, the central loop in tools/libxl/xl_cmdimpl.c:freemem, 
frees memory for domU by lowering the memory target for dom0. However, this is 
not a single request (e.g. free 64GB for domX), rather the memory target for 
dom0 is set lower during every iteration through:

   rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);

This causes dom0's memory target to be lowered by the needed amount during 
every iteration of the loop. In practice, the first request lowers dom0's 
target by the full amount (e.g. -64GB), and subsequent iterations lower dom0's 
target further by however much memory still appears to be required (e.g. three 
iterations of the loop might lower dom0's target by -64GB, then -25GB, then 
-25GB, for a total of dom0 ballooning down 114GB). The issue is that the loop 
ignores the fact that the original request already set dom0's target to the 
correct amount - the ballooning simply has not completed yet.
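
For reference, the current loop is roughly the following (simplified; error
handling and the "reset the retry counter if progress was made" logic are
omitted):

    retries = MAX_RETRIES;
    do {
        libxl_get_free_memory(ctx, &free_memkb);
        if (free_memkb >= need_memkb)
            return 0;                     /* enough memory is now free */

        /* relative adjustment - applied again on *every* pass */
        libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);

        if (!libxl_wait_for_free_memory(ctx, domid, need_memkb, 10))
            return 0;
        libxl_wait_for_memory_target(ctx, 0, 1);
    } while (--retries > 0);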

The problem itself is easier to see when domU memory sizes are increased. As 
mentioned before, starting a 512GB domain should guarantee that multiple 
iterations of the loop are required, and dom0 will balloon down much further 
than the required 512GB.

Does this clarify the situation?

-Mike


* Re: freemem-slack and large memory environments
  2015-02-26 17:38                     ` Mike Latimer
@ 2015-02-26 17:47                       ` Ian Campbell
  2015-02-26 20:38                         ` Mike Latimer
  2015-02-26 17:53                       ` Stefano Stabellini
  1 sibling, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2015-02-26 17:47 UTC (permalink / raw)
  To: Mike Latimer; +Cc: ian.jackson, Stefano Stabellini, wei.liu2, xen-devel

On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote:
> On Thursday, February 26, 2015 03:57:54 PM Ian Campbell wrote:
> > On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> > > There is still one aspect of my original patch that is important. As the
> > > code currently stands, the target for dom0 is set lower during each
> > > iteration of the loop. Unless only one iteration is required, dom0 will
> > > end up being set to a much lower target than is actually required.
> > 
> > Is this because some sort of slack is applied once per iteration rather
> > than once at the start or is it something else?
> 
> No - the slack reservation just complicated the request by (potentially) 
> requiring more free memory than domU initially requested.
> 
> With or without slack, the central loop in tools/libxl/xl_cmdimpl.c:freemem, 
> frees memory for domU by lowering the memory target for dom0. However, this is 
> not a single request (e.g. free 64GB for domX), rather the memory target for 
> dom0 is set lower during every iteration through:
> 
>    rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
> 
> This causes dom0's memory target to be lowered by the needed amount during 
> every iteration of the loop. In practice, this causes the first request to 
> lower dom0's target by the full amount (e.g. -64GB), and subsequent iterations 
> further lower dom0's target by however much memory that still appears to be 
> required (e.g. three iterations of the loop might lower dom0's target by 
> -25GB, then -25GB, for a total of dom0 ballooning down 114GB). The issue 
> itself is due to the loop ignoring the fact that the original request set 
> dom0's target to the correct amount, but the ballooning has not completed.
> 
> The problem itself is easier to see when domU memory sizes are increased. As 
> mentioned before, starting a 512GB domain should guarantee that multiple 
> iterations of the loop are required, and dom0 will balloon down much further 
> than the required 512GB.
> 
> Does this clarify the situation?

I think so. In essence we just need to update need_memkb on each
iteration, right?

Ian.


* Re: freemem-slack and large memory environments
  2015-02-26 17:38                     ` Mike Latimer
  2015-02-26 17:47                       ` Ian Campbell
@ 2015-02-26 17:53                       ` Stefano Stabellini
  2015-02-26 20:45                         ` Mike Latimer
  2015-02-27  8:24                         ` Jan Beulich
  1 sibling, 2 replies; 41+ messages in thread
From: Stefano Stabellini @ 2015-02-26 17:53 UTC (permalink / raw)
  To: Mike Latimer
  Cc: wei.liu2, xen-devel, ian.jackson, Ian Campbell, Stefano Stabellini

On Thu, 26 Feb 2015, Mike Latimer wrote:
> On Thursday, February 26, 2015 03:57:54 PM Ian Campbell wrote:
> > On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> > > There is still one aspect of my original patch that is important. As the
> > > code currently stands, the target for dom0 is set lower during each
> > > iteration of the loop. Unless only one iteration is required, dom0 will
> > > end up being set to a much lower target than is actually required.
> > 
> > Is this because some sort of slack is applied once per iteration rather
> > than once at the start or is it something else?
> 
> No - the slack reservation just complicated the request by (potentially) 
> requiring more free memory than domU initially requested.
> 
> With or without slack, the central loop in tools/libxl/xl_cmdimpl.c:freemem, 
> frees memory for domU by lowering the memory target for dom0. However, this is 
> not a single request (e.g. free 64GB for domX), rather the memory target for 
> dom0 is set lower during every iteration through:
> 
>    rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
> 
> This causes dom0's memory target to be lowered by the needed amount during 
> every iteration of the loop. In practice, this causes the first request to 
> lower dom0's target by the full amount (e.g. -64GB), and subsequent iterations 
> further lower dom0's target by however much memory that still appears to be 
> required (e.g. three iterations of the loop might lower dom0's target by 
> -25GB, then -25GB, for a total of dom0 ballooning down 114GB). The issue 
> itself is due to the loop ignoring the fact that the original request set 
> dom0's target to the correct amount, but the ballooning has not completed.

What is the return value of libxl_set_memory_target and
libxl_wait_for_free_memory in that case? Isn't it just a matter of
properly handling the return values?

Or maybe we just need to change the libxl_set_memory_target call to use
an absolute memory target to avoid restricting dom0 memory more than
necessary at each iteration. Also increasing the timeout argument passed
to the libxl_wait_for_free_memory call could help.


> The problem itself is easier to see when domU memory sizes are increased. As 
> mentioned before, starting a 512GB domain should guarantee that multiple 
> iterations of the loop are required, and dom0 will balloon down much further 
> than the required 512GB.
> 
> Does this clarify the situation?


* Re: freemem-slack and large memory environments
  2015-02-26 17:47                       ` Ian Campbell
@ 2015-02-26 20:38                         ` Mike Latimer
  2015-02-27 10:17                           ` Ian Campbell
  0 siblings, 1 reply; 41+ messages in thread
From: Mike Latimer @ 2015-02-26 20:38 UTC (permalink / raw)
  To: xen-devel; +Cc: wei.liu2, ian.jackson, Ian Campbell, Stefano Stabellini

(Sorry for the delayed response, dealing with ENOTIME.)

On Thursday, February 26, 2015 05:47:21 PM Ian Campbell wrote:
> On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote:
>
> >    rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
>
> I think so. In essence we just need to update need_memkb on each
> iteration, right?

Not quite...  need_memkb is used in the loop to determine if we have enough 
free memory for the new domain. So, need_memkb should always remain set to the 
total amount of memory requested - not just the amount of change still 
required.

The easiest thing to do is set the dom0's memory target before the loop, which 
is what my original patch did.

Another approach would be something like this:

    uint32_t dom0_memkb, dom0_targetkb, pending_memkb;

    libxl_get_memory(ctx, 0, &dom0_memkb);   <--doesn't actually exist

    libxl_get_memory_target(ctx, 0, &dom0_targetkb);

    pending_memkb = (free_memkb + (dom0_memkb - dom0_targetkb));

    if (pending_memkb < need_memkb) {
        libxl_set_memory_target(ctx, 0, pending_memkb - need_memkb, 1, 0);
    }

which essentially sets pending_memkb to the amount of free memory plus the 
amount of memory which will be freed once dom0 hits its target.

The final possibility I can think of is to ensure libxl_wait_for_memory_target 
does not return until the memory target has been reached. That raises some 
concern about what happens if the target cannot be reached though...

-Mike


* Re: freemem-slack and large memory environments
  2015-02-26 17:53                       ` Stefano Stabellini
@ 2015-02-26 20:45                         ` Mike Latimer
  2015-02-26 23:30                           ` Mike Latimer
  2015-02-27  8:24                         ` Jan Beulich
  1 sibling, 1 reply; 41+ messages in thread
From: Mike Latimer @ 2015-02-26 20:45 UTC (permalink / raw)
  To: xen-devel; +Cc: ian.jackson, wei.liu2, Ian Campbell, Stefano Stabellini

On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:
> What is the return value of libxl_set_memory_target and
> libxl_wait_for_free_memory in that case? Isn't it just a matter of
> properly handle the return values?

The return from libxl_set_memory_target is 0, as the assignment works just 
fine. I don't have the return from libxl_wait_for_free_memory in my notes, so 
I'll spin up another test and track that down.

> Or maybe we just need to change the libxl_set_memory_target call to use
> an absolute memory target to avoid restricting dom0 memory more than
> necessary at each iteration. Also increasing the timeout argument passed
> to the libxl_wait_for_free_memory call could help.

Using an absolute target would help, and would obviously only have to be set 
once - which is similar to what my patch did.

Increasing the timeout would help, but if the timeout were insufficient (say 
when dealing with very large guests), it wouldn't solve the problem.

-Mike


* Re: freemem-slack and large memory environments
  2015-02-26 20:45                         ` Mike Latimer
@ 2015-02-26 23:30                           ` Mike Latimer
  2015-02-27 10:21                             ` Ian Campbell
  2015-02-27 10:52                             ` Stefano Stabellini
  0 siblings, 2 replies; 41+ messages in thread
From: Mike Latimer @ 2015-02-26 23:30 UTC (permalink / raw)
  To: xen-devel; +Cc: wei.liu2, ian.jackson, Ian Campbell, Stefano Stabellini

On Thursday, February 26, 2015 01:45:16 PM Mike Latimer wrote:
> On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:
> > What is the return value of libxl_set_memory_target and
> > libxl_wait_for_free_memory in that case? Isn't it just a matter of
> > properly handle the return values?
> 
> The return from libxl_set_memory_target is 0, as the assignment works just
> fine. I don't have the return from libxl_wait_for_free_memory in my notes,
> so I'll spin up another test and track that down.

I slightly misspoke here... In my testing, the returns are actually:

   libxl_set_memory_target = 1
   libxl_wait_for_free_memory = -5
   libxl_wait_for_memory_target = 0
      Note - libxl_wait_for_memory_target is confusing, as rc can be set
      to ERROR_FAIL, but the function returns 0 anyway (unless an error
      is encountered earlier). I guess this just means we need to continue
      to wait...

I was testing spinning up a 64GB guest on a 2TB host. After the ballooning had 
completed, dom0 had ballooned down an extra ~320GB. On this particular 
machine, each iteration of the loop was showing only 5-7GB of memory being 
freed at a time. (The loop took 12 iterations.)

-Mike


* Re: freemem-slack and large memory environments
  2015-02-26 17:53                       ` Stefano Stabellini
  2015-02-26 20:45                         ` Mike Latimer
@ 2015-02-27  8:24                         ` Jan Beulich
  2015-02-27 10:52                           ` Stefano Stabellini
  1 sibling, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2015-02-27  8:24 UTC (permalink / raw)
  To: Stefano Stabellini, Mike Latimer
  Cc: ian.jackson, wei.liu2, Ian Campbell, xen-devel

>>> On 26.02.15 at 18:53, <stefano.stabellini@eu.citrix.com> wrote:
> Or maybe we just need to change the libxl_set_memory_target call to use
> an absolute memory target to avoid restricting dom0 memory more than
> necessary at each iteration. Also increasing the timeout argument passed
> to the libxl_wait_for_free_memory call could help.

Wouldn't that lead to problems with multiple racing invocations of
that function (perhaps from different processes)? (Note that I ask
this without knowledge on whether libxl already suitably serializes
things.)

Jan


* Re: freemem-slack and large memory environments
  2015-02-26 20:38                         ` Mike Latimer
@ 2015-02-27 10:17                           ` Ian Campbell
  2015-02-27 11:05                             ` Stefano Stabellini
  0 siblings, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2015-02-27 10:17 UTC (permalink / raw)
  To: Mike Latimer; +Cc: ian.jackson, Stefano Stabellini, wei.liu2, xen-devel

On Thu, 2015-02-26 at 13:38 -0700, Mike Latimer wrote:
> (Sorry for the delayed response, dealing with ENOTIME.)
> 
> On Thursday, February 26, 2015 05:47:21 PM Ian Campbell wrote:
> > On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote:
> >
> > >    rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
> >
> > I think so. In essence we just need to update need_memkb on each
> > iteration, right?
> 
> Not quite...

Indeed, looking again I see that the 1 there means "relative", so I'm
still confused about why free_memkb - need_memkb isn't the correct delta
on every iteration.

Is the issue that if you have a current target of, say, 15 and you wish
to go to ten you would say

	libxl_set_memory_target(, 15 - (-5), 1, 0)
i.e.
	libxl_set_memory_target(, -5, 1, 0)

then the target would be set to 10, but if during
libxl_wait_for_free_memory you only ballooned -2 and failed the target
gets left at 10 but the current free is actually now 13 so next time
around you say:

	libxl_set_memory_target(, 13 - (-3), 1, 0)
i.e.
	libxl_set_memory_target(, -3, 1, 0)

and the target now becomes 10-3 == 7, rather than 13-3=10 as one might
expect?

>   need_memkb is used in the loop to determine if we have enough 
> free memory for the new domain. So, need_memkb should always remain set to the 
> total amount of memory requested - not just the amount of change still 
> required.
> 
> The easiest thing to do is set the dom0's memory target before the loop, which 
> is what my original patch did.

It seems like there are two viable approaches here:

First is to just set the target before the loop and wait (perhaps much
longer) for it to be achieved.

The second is to decrement the target in smaller steps and wait to reach
it each time.
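
(Very roughly, and with an invented step size and invented variable names, that
second option might look like:

    #define BALLOON_STEP_KB (2 * 1024 * 1024)        /* 2GB per step, made up */
    uint32_t target_kb = dom0_current_target_kb;      /* assumed already known */

    while (target_kb > final_target_kb) {
        uint32_t step = target_kb - final_target_kb;
        if (step > BALLOON_STEP_KB)
            step = BALLOON_STEP_KB;
        target_kb -= step;
        libxl_set_memory_target(ctx, 0, target_kb, 0, 0);  /* absolute target */
        libxl_wait_for_memory_target(ctx, 0, 10);
    }
)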

I don't think it makes sense to have an approach which sets a target, waits
for that target to be achieved, and then on partial success tries to figure
out what the relative progress is and what is left to achieve, and factors
that into a new target request.

This is all confounded by the fact that the libxl_wait_for_free_*
functions have a barking interface. I've just seen this comment right
above:

/*
 * WARNING
 * This memory management API is unstable even in Xen 4.2.
 * It has a numer of deficiencies and we intend to replace it.
 *
 * The semantics of these functions should not be relied on to be very
 * coherent or stable.  We will however endeavour to keep working
 * existing programs which use them in roughly the same way as libxl.
 */

Given that I think that we should feel free, if necessary, to deprecate
the current interface and replace it with one which is actually usable.
Whatever that might mean.

Ian.


* Re: freemem-slack and large memory environments
  2015-02-26 23:30                           ` Mike Latimer
@ 2015-02-27 10:21                             ` Ian Campbell
  2015-02-27 10:52                             ` Stefano Stabellini
  1 sibling, 0 replies; 41+ messages in thread
From: Ian Campbell @ 2015-02-27 10:21 UTC (permalink / raw)
  To: Mike Latimer; +Cc: wei.liu2, Stefano Stabellini, ian.jackson, xen-devel

On Thu, 2015-02-26 at 16:30 -0700, Mike Latimer wrote:
> On Thursday, February 26, 2015 01:45:16 PM Mike Latimer wrote:
> > On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:
> > > What is the return value of libxl_set_memory_target and
> > > libxl_wait_for_free_memory in that case? Isn't it just a matter of
> > > properly handle the return values?
> > 
> > The return from libxl_set_memory_target is 0, as the assignment works just
> > fine. I don't have the return from libxl_wait_for_free_memory in my notes,
> > so I'll spin up another test and track that down.
> 
> I slightly misspoke here... In my testing, the returns are actually:
> 
>    libxl_set_memory_target = 1
>    libxl_wait_for_free_memory = -5
>    libxl_wait_for_memory_target = 0
>       Note - libxl_wait_for_memory_target is confusing,

Further to the comment I just made WRT this source comment:
        /*
         * WARNING
         * This memory management API is unstable even in Xen 4.2.
         * It has a numer of deficiencies and we intend to replace it.
         *
         * The semantics of these functions should not be relied on to be very
         * coherent or stable.  We will however endeavour to keep working
         * existing programs which use them in roughly the same way as libxl.
         */
        
I think we should feel free to introduce a new interface which has
semantics which we can actually work with. IOW

>  as rc can be set
>       to ERROR_FAIL, but the function returns 0 anyway (unless an error
>       is encountered earlier.) I guess this just means we need to continue
>       to wait...

Do something sensible so there is no more guessing.

I'm not sure yet what "sensible" would be.

One approach to fixing this might be that when the replacement for
libxl_wait_for_memory_target fails, it sets the target to whatever was
actually achieved, such that further calculations involving free_memkb
and the overall target remain valid.

Or we could move the "progress is being made" logic currently in xl's
freemem down into the wait_for_memory_target replacement so it hopefully
has more information available to it in order to make better decisions
about the timeouts.
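
(As a purely hypothetical illustration of what that could look like - the name,
signature and semantics below are invented, not an existing libxl API:

    /* Returns 0 once dom0 reaches its target; gives up and returns an error
       only after wait_secs pass with no forward progress at all. */
    int libxl_wait_for_memory_target_progress(libxl_ctx *ctx, uint32_t domid,
                                              int wait_secs);
)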

Ian.


* Re: freemem-slack and large memory environments
  2015-02-26 23:30                           ` Mike Latimer
  2015-02-27 10:21                             ` Ian Campbell
@ 2015-02-27 10:52                             ` Stefano Stabellini
  2015-02-27 15:28                               ` Mike Latimer
  1 sibling, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2015-02-27 10:52 UTC (permalink / raw)
  To: Mike Latimer
  Cc: wei.liu2, Stefano Stabellini, ian.jackson, Ian Campbell, xen-devel

On Thu, 26 Feb 2015, Mike Latimer wrote:
> On Thursday, February 26, 2015 01:45:16 PM Mike Latimer wrote:
> > On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:
> > > What is the return value of libxl_set_memory_target and
> > > libxl_wait_for_free_memory in that case? Isn't it just a matter of
> > > properly handle the return values?
> > 
> > The return from libxl_set_memory_target is 0, as the assignment works just
> > fine. I don't have the return from libxl_wait_for_free_memory in my notes,
> > so I'll spin up another test and track that down.
> 
> I slightly misspoke here... In my testing, the returns are actually:
> 
>    libxl_set_memory_target = 1

The new memory target is set for dom0 successfully.


>    libxl_wait_for_free_memory = -5

Still there isn't enough free memory in the system.


>    libxl_wait_for_memory_target = 0

However dom0 reached the new memory target already.
Who is stealing your memory?


>       Note - libxl_wait_for_memory_target is confusing, as rc can be set
>       to ERROR_FAIL, but the function returns 0 anyway (unless an error
>       is encountered earlier.) I guess this just means we need to continue
>       to wait...

Maybe I am misunderstanding what you meant, but as far as I can tell rc
is set to ERROR_FAIL only right before the out label in
libxl_wait_for_memory_target. In that case the function would return
ERROR_FAIL.

In any case in the context of libxl_wait_for_memory_target, ERROR_FAIL
means that the memory target has not been reached.

 
> I was testing spinning up a 64GB guest on a 2TB host. After the ballooning had 
> completed, dom0 had ballooned down an extra ~320GB. On this particular 
> machine, each iteration of the loop was showing only 5-7GB of memory being 
> freed at a time. (The loop took 12 iterations.)

I would investigate why, even though dom0 is ballooning down as much as you
asked it to, the free memory in the system is still not enough.


* Re: freemem-slack and large memory environments
  2015-02-27  8:24                         ` Jan Beulich
@ 2015-02-27 10:52                           ` Stefano Stabellini
  0 siblings, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2015-02-27 10:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, Ian Campbell, Stefano Stabellini, ian.jackson,
	xen-devel, Mike Latimer

On Fri, 27 Feb 2015, Jan Beulich wrote:
> >>> On 26.02.15 at 18:53, <stefano.stabellini@eu.citrix.com> wrote:
> > Or maybe we just need to change the libxl_set_memory_target call to use
> > an absolute memory target to avoid restricting dom0 memory more than
> > necessary at each iteration. Also increasing the timeout argument passed
> > to the libxl_wait_for_free_memory call could help.
> 
> Wouldn't that lead to problems with multiple racing invocations of
> that function (perhaps from different processes)? (Note that I ask
> this without knowledge on whether libxl already suitably serializes
> things.)

Calls to freemem are serialized using a lockfile.


* Re: freemem-slack and large memory environments
  2015-02-27 10:17                           ` Ian Campbell
@ 2015-02-27 11:05                             ` Stefano Stabellini
  0 siblings, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2015-02-27 11:05 UTC (permalink / raw)
  To: Ian Campbell
  Cc: ian.jackson, Stefano Stabellini, wei.liu2, Mike Latimer, xen-devel

On Fri, 27 Feb 2015, Ian Campbell wrote:
> On Thu, 2015-02-26 at 13:38 -0700, Mike Latimer wrote:
> > (Sorry for the delayed response, dealing with ENOTIME.)
> > 
> > On Thursday, February 26, 2015 05:47:21 PM Ian Campbell wrote:
> > > On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote:
> > >
> > > >    rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
> > >
> > > I think so. In essence we just need to update need_memkb on each
> > > iteration, right?
> > 
> > Not quite...
> 
> Indeed, looking again I see that the 1 there means "relative", so I'm
> still confused about why free_memkb - need_memkb isn't the correct delta
> on every iteration.
> 
> Is the issue that if you have a current target of, say, 15 and you wish
> to go to ten you would say
> 
> 	libxl_set_memory_target(, 15 - (-5), 1, 0)
> i.e.
> 	libxl_set_memory_target(, -5, 1, 0)
> 
> then the target would be set to 10, but if during
> libxl_wait_for_free_memory you only ballooned -2 and failed the target
> gets left at 10 but the current free is actually now 13 so next time
> around you say:
> 
> 	libxl_set_memory_target(, 13 - (-3), 1, 0)
> i.e.
> 	libxl_set_memory_target(, -3, 1, 0)
> 
> and the target now becomes 10-3 == 7, rather than 13-3=10 as one might
> expect?
> 
> >   need_memkb is used in the loop to determine if we have enough 
> > free memory for the new domain. So, need_memkb should always remain set to the 
> > total amount of memory requested - not just the amount of change still 
> > required.
> > 
> > The easiest thing to do is set the dom0's memory target before the loop, which 
> > is what my original patch did.
> 
> It seems like there are two viable approaches here:
> 
> First is to just set the target before the loop and wait (perhaps much
> longer) for it to be achieved.
>
> The second is to decrement the target in smaller steps and wait to reach
> it each time.
>
> I don't think an approach which sets a target, waits for that target to
> be achieved and then on partial success tries to figure out what the
> relative progress is and what is left to achieve  and factor that into a
> new target request makes sense.

The reason for the loop is not to make the memory decrease request more
digestible for dom0, nor to cope with errors. The loop tries to handle
scenarios where the freed memory is not available to us somehow.
This is a more wordy explanation of it:

  get free memory
  is it enough? if so, return, otherwise continue
  set dom0 memory target = current - need
  is there enough memory now? if so, return, otherwise continue
  has dom0 actually reached its target? If so, loop again (who stole the memory?), otherwise fail (dom0 is busy)

This is consistent with Mike's logs: the memory is freed by dom0 but it
is not available somehow. Maybe XenD is running? Another guest is
ballooning up at the same time?
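
To make that concrete, here is a rough C sketch of the xl-side flow
(illustrative only, error handling trimmed; the real code is freemem() in
xl_cmdimpl.c, so treat this as a paraphrase rather than the literal
implementation):

  do {
      rc = libxl_get_free_memory(ctx, &free_memkb);
      if (rc < 0) return rc;
      if (free_memkb >= need_memkb)
          return 0;                         /* already enough free memory */

      /* ask dom0 to give back the missing amount (relative target) */
      rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
      if (rc < 0) return rc;

      /* wait (bounded) for the freed memory to actually become free */
      rc = libxl_wait_for_free_memory(ctx, domid, need_memkb, 10);
      if (!rc) return 0;
      else if (rc != ERROR_NOMEM) return rc;

      /* dom0 reached its target but the memory is still not free:
       * somebody else took it, so loop and try again */
      rc = libxl_wait_for_memory_target(ctx, 0, 1);
      if (rc < 0) return rc;
  } while (retries-- > 0);
  return ERROR_NOMEM;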


> This is all confounded by the fact that the libxl_wait_for_free_*
> functions have a barking interface.

That is true


> I've just seen this comment right
> above:
>
> /*
>  * WARNING
>  * This memory management API is unstable even in Xen 4.2.
>  * It has a numer of deficiencies and we intend to replace it.
>  *
>  * The semantics of these functions should not be relied on to be very
>  * coherent or stable.  We will however endeavour to keep working
>  * existing programs which use them in roughly the same way as libxl.
>  */
> 
> Given that I think that we should feel free, if necessary, to deprecate
> the current interface and replace it with one which is actually usable.
> Whatever that might mean.
> 
> Ian.
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-02-27 10:52                             ` Stefano Stabellini
@ 2015-02-27 15:28                               ` Mike Latimer
  2015-02-27 18:29                                 ` Mike Latimer
  0 siblings, 1 reply; 41+ messages in thread
From: Mike Latimer @ 2015-02-27 15:28 UTC (permalink / raw)
  To: xen-devel; +Cc: ian.jackson, wei.liu2, Ian Campbell, Stefano Stabellini

On Friday, February 27, 2015 10:52:17 AM Stefano Stabellini wrote:
> On Thu, 26 Feb 2015, Mike Latimer wrote:
> >    libxl_set_memory_target = 1
> 
> The new memory target is set for dom0 successfully.
> 
> >    libxl_wait_for_free_memory = -5
> 
> Still there isn't enough free memory in the system.
> 
> >    libxl_wait_for_memory_target = 0
> 
> However dom0 reached the new memory target already.
> Who is stealing your memory?

I just realized I was missing commit 2048aeec, which corrects the hardcoded 
return value of libxl_wait_for_memory_target from 0 to rc. I'll retest with 
this change in place.

> In any case in the context of libxl_wait_for_memory_target, ERROR_FAIL
> means that the memory target has not been reached.

I'm expecting this commit to change what I'm seeing, but I'm not convinced 
it will be a good change...  There is zero chance dom0 will balloon down 64GB 
(or 512GB) in the 10 second window set by freemem. This will likely mean the 
entire process fails, even though it would have succeeded given a bit more time.

I'll add the missing commit, and send a complete set of debug logs later 
today.

-Mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-02-27 15:28                               ` Mike Latimer
@ 2015-02-27 18:29                                 ` Mike Latimer
  2015-02-28  0:31                                   ` Mike Latimer
  0 siblings, 1 reply; 41+ messages in thread
From: Mike Latimer @ 2015-02-27 18:29 UTC (permalink / raw)
  To: xen-devel; +Cc: wei.liu2, ian.jackson, Ian Campbell, Stefano Stabellini

On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
> On Friday, February 27, 2015 10:52:17 AM Stefano Stabellini wrote:
> > On Thu, 26 Feb 2015, Mike Latimer wrote:
> > >    libxl_set_memory_target = 1
> > 
> > The new memory target is set for dom0 successfully.
> > 
> > >    libxl_wait_for_free_memory = -5
> > 
> > Still there isn't enough free memory in the system.
> > 
> > >    libxl_wait_for_memory_target = 0
> > 
> > However dom0 reached the new memory target already.
> > Who is stealing your memory?
> 
> I just realized I was missing commit 2048aeec, which corrects the hardcoded
> return value of libxl_wait_for_memory_target from 0 to rc. I'll retest with
> this change in place.
> 
> > In any case in the context of libxl_wait_for_memory_target, ERROR_FAIL
> > means that the memory target has not been reached.
> 
> I'm expecting this commit to to change what I'm seeing, but I'm not
> convinced it will be a good change...  There is zero chance dom0 will
> balloon down 64GB (or 512GB) in the 10 second window set by freemem. This
> will likely mean the entire process will fail (when given a bit more time
> it would have succeeded).
> 
> I'll add the missing commit, and send a complete set of debug logs later
> today.

After adding 2048aeec, dom0's target is lowered by the required amount (e.g. 
64GB), but as dom0 cannot balloon down fast enough, 
libxl_wait_for_memory_target returns -5, and the domain create fails ("failed 
to free memory for the domain").

As dom0's target was lowered successfully, dom0 continues to balloon down in 
the background. So, after waiting a while, the domain creation will succeed. 
This is one of the problems I would like to solve. As the ballooning is 
working (just taking longer than expected) the code should monitor it and wait 
somehow.

I'll send in detailed logs (without 2048aeec) later today, to make sure I've 
explained this well enough.

-Mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-02-27 18:29                                 ` Mike Latimer
@ 2015-02-28  0:31                                   ` Mike Latimer
  2015-03-02 10:12                                     ` Stefano Stabellini
  2015-03-02 11:42                                     ` Ian Campbell
  0 siblings, 2 replies; 41+ messages in thread
From: Mike Latimer @ 2015-02-28  0:31 UTC (permalink / raw)
  To: xen-devel; +Cc: ian.jackson, wei.liu2, Ian Campbell, Stefano Stabellini

On Friday, February 27, 2015 11:29:12 AM Mike Latimer wrote:
> On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
> After adding 2048aeec, dom0's target is lowered by the required amount (e.g.
> 64GB), but as dom0 cannot balloon down fast enough,
> libxl_wait_for_memory_target returns -5, and the domain create fails
    (wrong return code - libxl_wait_for_memory_target actually returns -3)

With libxl_wait_for_memory_target return code corrected (2048aeec), debug 
messages look like this:

Parsing config from sles12pv
 DBG: start freemem loop
 DBG: free_memkb = 541976, need_memkb = 67651584 (rc=0)
 DBG: dom0_curr_target = 2118976472, set_memory_target = -67109608 (rc=1)
 DBG: wait_for_free_memory = 67651584 (rc=-5)
 DBG: wait_for_memory_target (rc=-3)
failed to free memory for the domain

After failing, dom0 continues to balloon down by the requested amount 
(-67109608), so a subsequent startup attempt would work.

My original fix (2563bca1) was intended to continue looping in freemem until dom0 
ballooned down the requested amount. However, this really only worked without 
2048aeec, as wait_for_memory_target was always returning 0. Now that Stefano has 
pointed out this problem, commit 2563bca1 can still be useful - but it seems less 
important, as ballooning down dom0 is where the major delays are seen.

The following messages show what was happening when wait_for_memory_target was 
always returning 0. I've narrowed it down to just the interesting messages:

DBG: free_memkb = 9794852, need_memkb = 67651584 (rc=0)
DBG: dom0_curr_target = 2118976464, set_memory_target = -67109596 (rc=1)
DBG: dom0_curr_target = 2051866868, set_memory_target = -57856732 (rc=1)
DBG: dom0_curr_target = 1994010136, set_memory_target = -50615004 (rc=1)
DBG: dom0_curr_target = 1943395132, set_memory_target = -43965148 (rc=1)
DBG: dom0_curr_target = 1899429984, set_memory_target = -37538524 (rc=1)
DBG: dom0_curr_target = 1861891460, set_memory_target = -31560412 (rc=1)
DBG: dom0_curr_target = 1830331048, set_memory_target = -25309916 (rc=1)
DBG: dom0_curr_target = 1805021132, set_memory_target = -19514076 (rc=1)
DBG: dom0_curr_target = 1785507056, set_memory_target = -13949660 (rc=1)
DBG: dom0_curr_target = 1771557396, set_memory_target = -8057564 (rc=1)
DBG: dom0_curr_target = 1763499832, set_memory_target = -1862364 (rc=1)

The above situation is no longer relevant, but the overall dom0 target problem 
is still an issue. It now seems rather obvious (hopefully) that the 10 second 
delay in wait_for_memory_target is not sufficient. Should that function be 
modified to monitor ongoing progress and continue waiting as long as progress 
is being made?
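
To make the idea concrete, I was picturing something along these lines
inside libxl_wait_for_memory_target - only a sketch built on the fields
libxl_domain_info already returns, not a tested patch (init/dispose of
info and the usual error paths omitted):

  uint32_t target_memkb;
  uint64_t curr_memkb, prev_memkb = UINT64_MAX;
  libxl_dominfo info;

  rc = libxl_get_memory_target(ctx, domid, &target_memkb);
  if (rc < 0) return rc;
  do {
      sleep(1);
      rc = libxl_domain_info(ctx, &info, domid);
      if (rc < 0) return rc;
      curr_memkb = info.current_memkb + info.outstanding_memkb;
      /* only consume the timeout when no progress was made this second */
      if (curr_memkb >= prev_memkb && --wait_secs <= 0)
          return ERROR_FAIL;
      prev_memkb = curr_memkb;
  } while (curr_memkb > target_memkb);
  return 0;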

Sorry for the long discussion to get to this point. :(

-Mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-02-28  0:31                                   ` Mike Latimer
@ 2015-03-02 10:12                                     ` Stefano Stabellini
  2015-03-02 10:44                                       ` Jan Beulich
  2015-03-02 11:42                                     ` Ian Campbell
  1 sibling, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2015-03-02 10:12 UTC (permalink / raw)
  To: Mike Latimer
  Cc: ian.jackson, Stefano Stabellini, wei.liu2, Ian Campbell, xen-devel

On Fri, 27 Feb 2015, Mike Latimer wrote:
> On Friday, February 27, 2015 11:29:12 AM Mike Latimer wrote:
> > On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
> > After adding 2048aeec, dom0's target is lowered by the required amount (e.g.
> > 64GB), but as dom0 cannot balloon down fast enough,
> > libxl_wait_for_memory_target returns -5, and the domain create fails
>     (wrong return code - libxl_wait_for_memory_target actually returns -3)
> 
> With libxl_wait_for_memory_target return code corrected (2048aeec), debug 
> messages look like this:
> 
> Parsing config from sles12pv
>  DBG: start freemem loop
>  DBG: free_memkb = 541976, need_memkb = 67651584 (rc=0)
>  DBG: dom0_curr_target = 2118976472, set_memory_target = -67109608 (rc=1)
>  DBG: wait_for_free_memory = 67651584 (rc=-5)
>  DBG: wait_for_memory_target (rc=-3)
> failed to free memory for the domain
> 
> After failing, dom0 continues to balloon down by the requested amount 
> (-67109608), so a subsequent startup attempt would work.
> 
> My original fix (2563bca1) was intended to continue looping in freem until dom0 
> ballooned down the requested amount. However, this really only worked without 
> 2048aeec, as wait_for_memory_target was always returning 0. After Stefano 
> pointed out this problem, commit 2563bca1 can still be useful - but seems less 
> important as ballooning down dom0 is where the major delays are seen.
> 
> The following messages show what was happening when wait_for_memory_target was 
> always returning 0. I've narrowed it down to just the interesting messages:
> 
> DBG: free_memkb = 9794852, need_memkb = 67651584 (rc=0)
> DBG: dom0_curr_target = 2118976464, set_memory_target = -67109596 (rc=1)
> DBG: dom0_curr_target = 2051866868, set_memory_target = -57856732 (rc=1)
> DBG: dom0_curr_target = 1994010136, set_memory_target = -50615004 (rc=1)
> DBG: dom0_curr_target = 1943395132, set_memory_target = -43965148 (rc=1)
> DBG: dom0_curr_target = 1899429984, set_memory_target = -37538524 (rc=1)
> DBG: dom0_curr_target = 1861891460, set_memory_target = -31560412 (rc=1)
> DBG: dom0_curr_target = 1830331048, set_memory_target = -25309916 (rc=1)
> DBG: dom0_curr_target = 1805021132, set_memory_target = -19514076 (rc=1)
> DBG: dom0_curr_target = 1785507056, set_memory_target = -13949660 (rc=1)
> DBG: dom0_curr_target = 1771557396, set_memory_target = -8057564 (rc=1)
> DBG: dom0_curr_target = 1763499832, set_memory_target = -1862364 (rc=1)
> 
> The above situation is no longer relevant, but the overall dom0 target problem 
> is still an issue. It now seems rather obvious (hopefully) that the 10 second 
> delay in wait_for_memory_target is not sufficient. Should that function be 
> modified to monitor ongoing progress and continue waiting as long as progress 
> is being made?
> 
> Sorry for the long discussion to get to this point. :(

I think we need to increase the timeout passed to
libxl_wait_for_free_memory. Would 30 sec be enough?

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 53c16eb..7779350 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2222,7 +2222,7 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
         if (rc < 0)
             return rc;
 
-        rc = libxl_wait_for_free_memory(ctx, domid, need_memkb, 10);
+        rc = libxl_wait_for_free_memory(ctx, domid, need_memkb, 30);
         if (!rc)
             return 0;
         else if (rc != ERROR_NOMEM)

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-03-02 10:12                                     ` Stefano Stabellini
@ 2015-03-02 10:44                                       ` Jan Beulich
  2015-03-02 12:13                                         ` Stefano Stabellini
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2015-03-02 10:44 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: ian.jackson, Mike Latimer, wei.liu2, Ian Campbell, xen-devel

>>> On 02.03.15 at 11:12, <stefano.stabellini@eu.citrix.com> wrote:
> On Fri, 27 Feb 2015, Mike Latimer wrote:
>> On Friday, February 27, 2015 11:29:12 AM Mike Latimer wrote:
>> > On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
>> > After adding 2048aeec, dom0's target is lowered by the required amount 
> (e.g.
>> > 64GB), but as dom0 cannot balloon down fast enough,
>> > libxl_wait_for_memory_target returns -5, and the domain create fails
>>     (wrong return code - libxl_wait_for_memory_target actually returns -3)
>> 
>> With libxl_wait_for_memory_target return code corrected (2048aeec), debug 
>> messages look like this:
>> 
>> Parsing config from sles12pv
>>  DBG: start freemem loop
>>  DBG: free_memkb = 541976, need_memkb = 67651584 (rc=0)
>>  DBG: dom0_curr_target = 2118976472, set_memory_target = -67109608 (rc=1)
>>  DBG: wait_for_free_memory = 67651584 (rc=-5)
>>  DBG: wait_for_memory_target (rc=-3)
>> failed to free memory for the domain
>> 
>> After failing, dom0 continues to balloon down by the requested amount 
>> (-67109608), so a subsequent startup attempt would work.
>> 
>> My original fix (2563bca1) was intended to continue looping in freem until 
> dom0 
>> ballooned down the requested amount. However, this really only worked 
> without 
>> 2048aeec, as wait_for_memory_target was always returning 0. After Stefano 
>> pointed out this problem, commit 2563bca1 can still be useful - but seems 
> less 
>> important as ballooning down dom0 is where the major delays are seen.
>> 
>> The following messages show what was happening when wait_for_memory_target 
> was 
>> always returning 0. I've narrowed it down to just the interesting messages:
>> 
>> DBG: free_memkb = 9794852, need_memkb = 67651584 (rc=0)
>> DBG: dom0_curr_target = 2118976464, set_memory_target = -67109596 (rc=1)
>> DBG: dom0_curr_target = 2051866868, set_memory_target = -57856732 (rc=1)
>> DBG: dom0_curr_target = 1994010136, set_memory_target = -50615004 (rc=1)
>> DBG: dom0_curr_target = 1943395132, set_memory_target = -43965148 (rc=1)
>> DBG: dom0_curr_target = 1899429984, set_memory_target = -37538524 (rc=1)
>> DBG: dom0_curr_target = 1861891460, set_memory_target = -31560412 (rc=1)
>> DBG: dom0_curr_target = 1830331048, set_memory_target = -25309916 (rc=1)
>> DBG: dom0_curr_target = 1805021132, set_memory_target = -19514076 (rc=1)
>> DBG: dom0_curr_target = 1785507056, set_memory_target = -13949660 (rc=1)
>> DBG: dom0_curr_target = 1771557396, set_memory_target = -8057564 (rc=1)
>> DBG: dom0_curr_target = 1763499832, set_memory_target = -1862364 (rc=1)
>> 
>> The above situation is no longer relevant, but the overall dom0 target 
> problem 
>> is still an issue. It now seems rather obvious (hopefully) that the 10 
> second 
>> delay in wait_for_memory_target is not sufficient. Should that function be 
>> modified to monitor ongoing progress and continue waiting as long as 
> progress 
>> is being made?
>> 
>> Sorry for the long discussion to get to this point. :(
> 
> I think we need to increase the timeout passed to
> libxl_wait_for_free_memory. Would 30 sec be enough?

No fixed timeout will ever be enough for arbitrarily large requests.

Jan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-02-28  0:31                                   ` Mike Latimer
  2015-03-02 10:12                                     ` Stefano Stabellini
@ 2015-03-02 11:42                                     ` Ian Campbell
  2015-03-02 15:19                                       ` Stefano Stabellini
  1 sibling, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2015-03-02 11:42 UTC (permalink / raw)
  To: Mike Latimer; +Cc: ian.jackson, Stefano Stabellini, wei.liu2, xen-devel

On Fri, 2015-02-27 at 17:31 -0700, Mike Latimer wrote:
> On Friday, February 27, 2015 11:29:12 AM Mike Latimer wrote:
> > On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
> > After adding 2048aeec, dom0's target is lowered by the required amount (e.g.
> > 64GB), but as dom0 cannot balloon down fast enough,
> > libxl_wait_for_memory_target returns -5, and the domain create fails
>     (wrong return code - libxl_wait_for_memory_target actually returns -3)
> 
> With libxl_wait_for_memory_target return code corrected (2048aeec), debug 
> messages look like this:
> 
> Parsing config from sles12pv
>  DBG: start freemem loop
>  DBG: free_memkb = 541976, need_memkb = 67651584 (rc=0)
>  DBG: dom0_curr_target = 2118976472, set_memory_target = -67109608 (rc=1)
>  DBG: wait_for_free_memory = 67651584 (rc=-5)
>  DBG: wait_for_memory_target (rc=-3)
> failed to free memory for the domain
> 
> After failing, dom0 continues to balloon down by the requested amount 
> (-67109608), so a subsequent startup attempt would work.
> 
> My original fix (2563bca1) was intended to continue looping in freem until dom0 
> ballooned down the requested amount. However, this really only worked without 
> 2048aeec, as wait_for_memory_target was always returning 0. After Stefano 
> pointed out this problem, commit 2563bca1 can still be useful - but seems less 
> important as ballooning down dom0 is where the major delays are seen.
> 
> The following messages show what was happening when wait_for_memory_target was 
> always returning 0. I've narrowed it down to just the interesting messages:
> 
> DBG: free_memkb = 9794852, need_memkb = 67651584 (rc=0)
> DBG: dom0_curr_target = 2118976464, set_memory_target = -67109596 (rc=1)
> DBG: dom0_curr_target = 2051866868, set_memory_target = -57856732 (rc=1)
> DBG: dom0_curr_target = 1994010136, set_memory_target = -50615004 (rc=1)
> DBG: dom0_curr_target = 1943395132, set_memory_target = -43965148 (rc=1)
> DBG: dom0_curr_target = 1899429984, set_memory_target = -37538524 (rc=1)
> DBG: dom0_curr_target = 1861891460, set_memory_target = -31560412 (rc=1)
> DBG: dom0_curr_target = 1830331048, set_memory_target = -25309916 (rc=1)
> DBG: dom0_curr_target = 1805021132, set_memory_target = -19514076 (rc=1)
> DBG: dom0_curr_target = 1785507056, set_memory_target = -13949660 (rc=1)
> DBG: dom0_curr_target = 1771557396, set_memory_target = -8057564 (rc=1)
> DBG: dom0_curr_target = 1763499832, set_memory_target = -1862364 (rc=1)
> 
> The above situation is no longer relevant, but the overall dom0 target problem 
> is still an issue. It now seems rather obvious (hopefully) that the 10 second 
> delay in wait_for_memory_target is not sufficient. Should that function be 
> modified to monitor ongoing progress and continue waiting as long as progress 
> is being made?

You mean to push the logic currently in freemem to keep going so long as
there is progress being made down into libxl_wait_for_memory_target?
That seems like a good idea.

The only problem would be API compatibility for older versions. Given the
comment in libxl.h above this function, perhaps we can sweep this under
the carpet and say the timeout is now the time to wait without progress
being made.

Ian.

> 
> Sorry for the long discussion to get to this point. :(
> 
> -Mike
> 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-03-02 10:44                                       ` Jan Beulich
@ 2015-03-02 12:13                                         ` Stefano Stabellini
  2015-03-02 13:04                                           ` Jan Beulich
       [not found]                                           ` <54F46DDB020000780006505B@suse.com>
  0 siblings, 2 replies; 41+ messages in thread
From: Stefano Stabellini @ 2015-03-02 12:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, Ian Campbell, Stefano Stabellini, ian.jackson,
	xen-devel, Mike Latimer

On Mon, 2 Mar 2015, Jan Beulich wrote:
> >>> On 02.03.15 at 11:12, <stefano.stabellini@eu.citrix.com> wrote:
> > On Fri, 27 Feb 2015, Mike Latimer wrote:
> >> On Friday, February 27, 2015 11:29:12 AM Mike Latimer wrote:
> >> > On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
> >> > After adding 2048aeec, dom0's target is lowered by the required amount 
> > (e.g.
> >> > 64GB), but as dom0 cannot balloon down fast enough,
> >> > libxl_wait_for_memory_target returns -5, and the domain create fails
> >>     (wrong return code - libxl_wait_for_memory_target actually returns -3)
> >> 
> >> With libxl_wait_for_memory_target return code corrected (2048aeec), debug 
> >> messages look like this:
> >> 
> >> Parsing config from sles12pv
> >>  DBG: start freemem loop
> >>  DBG: free_memkb = 541976, need_memkb = 67651584 (rc=0)
> >>  DBG: dom0_curr_target = 2118976472, set_memory_target = -67109608 (rc=1)
> >>  DBG: wait_for_free_memory = 67651584 (rc=-5)
> >>  DBG: wait_for_memory_target (rc=-3)
> >> failed to free memory for the domain
> >> 
> >> After failing, dom0 continues to balloon down by the requested amount 
> >> (-67109608), so a subsequent startup attempt would work.
> >> 
> >> My original fix (2563bca1) was intended to continue looping in freem until 
> > dom0 
> >> ballooned down the requested amount. However, this really only worked 
> > without 
> >> 2048aeec, as wait_for_memory_target was always returning 0. After Stefano 
> >> pointed out this problem, commit 2563bca1 can still be useful - but seems 
> > less 
> >> important as ballooning down dom0 is where the major delays are seen.
> >> 
> >> The following messages show what was happening when wait_for_memory_target 
> > was 
> >> always returning 0. I've narrowed it down to just the interesting messages:
> >> 
> >> DBG: free_memkb = 9794852, need_memkb = 67651584 (rc=0)
> >> DBG: dom0_curr_target = 2118976464, set_memory_target = -67109596 (rc=1)
> >> DBG: dom0_curr_target = 2051866868, set_memory_target = -57856732 (rc=1)
> >> DBG: dom0_curr_target = 1994010136, set_memory_target = -50615004 (rc=1)
> >> DBG: dom0_curr_target = 1943395132, set_memory_target = -43965148 (rc=1)
> >> DBG: dom0_curr_target = 1899429984, set_memory_target = -37538524 (rc=1)
> >> DBG: dom0_curr_target = 1861891460, set_memory_target = -31560412 (rc=1)
> >> DBG: dom0_curr_target = 1830331048, set_memory_target = -25309916 (rc=1)
> >> DBG: dom0_curr_target = 1805021132, set_memory_target = -19514076 (rc=1)
> >> DBG: dom0_curr_target = 1785507056, set_memory_target = -13949660 (rc=1)
> >> DBG: dom0_curr_target = 1771557396, set_memory_target = -8057564 (rc=1)
> >> DBG: dom0_curr_target = 1763499832, set_memory_target = -1862364 (rc=1)
> >> 
> >> The above situation is no longer relevant, but the overall dom0 target 
> > problem 
> >> is still an issue. It now seems rather obvious (hopefully) that the 10 
> > second 
> >> delay in wait_for_memory_target is not sufficient. Should that function be 
> >> modified to monitor ongoing progress and continue waiting as long as 
> > progress 
> >> is being made?
> >> 
> >> Sorry for the long discussion to get to this point. :(
> > 
> > I think we need to increase the timeout passed to
> > libxl_wait_for_free_memory. Would 30 sec be enough?
> 
> No fixed timeout will ever be enough for arbitrarily large requests.

There is no way for Dom0 to notify Xen and/or libxl that ballooning has
completed. There is no choice but to wait. We could make the wait time
unlimited (after all, arbitrarily large requests need arbitrarily large
wait times), but do we really want that?

Of course users could just use dom0_mem and get down with it.
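
For example, booting the hypervisor with something along the lines of

  dom0_mem=8192M,max:8192M

on the Xen command line (the value is purely illustrative and would have to
be sized for the host) means dom0 never has to be ballooned down at domain
creation time in the first place.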

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-03-02 12:13                                         ` Stefano Stabellini
@ 2015-03-02 13:04                                           ` Jan Beulich
       [not found]                                           ` <54F46DDB020000780006505B@suse.com>
  1 sibling, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2015-03-02 13:04 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: ian.jackson, Mike Latimer, wei.liu2, Ian Campbell, xen-devel

>>> On 02.03.15 at 13:13, <stefano.stabellini@eu.citrix.com> wrote:
> On Mon, 2 Mar 2015, Jan Beulich wrote:
>> >>> On 02.03.15 at 11:12, <stefano.stabellini@eu.citrix.com> wrote:
>> > On Fri, 27 Feb 2015, Mike Latimer wrote:
>> >> On Friday, February 27, 2015 11:29:12 AM Mike Latimer wrote:
>> >> > On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
>> >> > After adding 2048aeec, dom0's target is lowered by the required amount 
>> > (e.g.
>> >> > 64GB), but as dom0 cannot balloon down fast enough,
>> >> > libxl_wait_for_memory_target returns -5, and the domain create fails
>> >>     (wrong return code - libxl_wait_for_memory_target actually returns -3)
>> >> 
>> >> With libxl_wait_for_memory_target return code corrected (2048aeec), debug 
>> >> messages look like this:
>> >> 
>> >> Parsing config from sles12pv
>> >>  DBG: start freemem loop
>> >>  DBG: free_memkb = 541976, need_memkb = 67651584 (rc=0)
>> >>  DBG: dom0_curr_target = 2118976472, set_memory_target = -67109608 (rc=1)
>> >>  DBG: wait_for_free_memory = 67651584 (rc=-5)
>> >>  DBG: wait_for_memory_target (rc=-3)
>> >> failed to free memory for the domain
>> >> 
>> >> After failing, dom0 continues to balloon down by the requested amount 
>> >> (-67109608), so a subsequent startup attempt would work.
>> >> 
>> >> My original fix (2563bca1) was intended to continue looping in freem until 
>> > dom0 
>> >> ballooned down the requested amount. However, this really only worked 
>> > without 
>> >> 2048aeec, as wait_for_memory_target was always returning 0. After Stefano 
>> >> pointed out this problem, commit 2563bca1 can still be useful - but seems 
>> > less 
>> >> important as ballooning down dom0 is where the major delays are seen.
>> >> 
>> >> The following messages show what was happening when wait_for_memory_target 
>> > was 
>> >> always returning 0. I've narrowed it down to just the interesting messages:
>> >> 
>> >> DBG: free_memkb = 9794852, need_memkb = 67651584 (rc=0)
>> >> DBG: dom0_curr_target = 2118976464, set_memory_target = -67109596 (rc=1)
>> >> DBG: dom0_curr_target = 2051866868, set_memory_target = -57856732 (rc=1)
>> >> DBG: dom0_curr_target = 1994010136, set_memory_target = -50615004 (rc=1)
>> >> DBG: dom0_curr_target = 1943395132, set_memory_target = -43965148 (rc=1)
>> >> DBG: dom0_curr_target = 1899429984, set_memory_target = -37538524 (rc=1)
>> >> DBG: dom0_curr_target = 1861891460, set_memory_target = -31560412 (rc=1)
>> >> DBG: dom0_curr_target = 1830331048, set_memory_target = -25309916 (rc=1)
>> >> DBG: dom0_curr_target = 1805021132, set_memory_target = -19514076 (rc=1)
>> >> DBG: dom0_curr_target = 1785507056, set_memory_target = -13949660 (rc=1)
>> >> DBG: dom0_curr_target = 1771557396, set_memory_target = -8057564 (rc=1)
>> >> DBG: dom0_curr_target = 1763499832, set_memory_target = -1862364 (rc=1)
>> >> 
>> >> The above situation is no longer relevant, but the overall dom0 target 
>> > problem 
>> >> is still an issue. It now seems rather obvious (hopefully) that the 10 
>> > second 
>> >> delay in wait_for_memory_target is not sufficient. Should that function be 
>> >> modified to monitor ongoing progress and continue waiting as long as 
>> > progress 
>> >> is being made?
>> >> 
>> >> Sorry for the long discussion to get to this point. :(
>> > 
>> > I think we need to increase the timeout passed to
>> > libxl_wait_for_free_memory. Would 30 sec be enough?
>> 
>> No fixed timeout will ever be enough for arbitrarily large requests.
> 
> There is no way for Dom0 to notify Xen and/or libxl that ballooning is
> completed. There is no choice but to wait. We could make the wait time
> unlimited (after all arbitrarily large requests need arbitrarily large
> wait times), but do we really want that?

That's why almost everyone else seems to agree that waiting as
long as there is progress being made is the right approach.

> Of course users could just use dom0_mem and get down with it.

I don't think we should make this a requirement for correct
operation.

Jan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-03-02 11:42                                     ` Ian Campbell
@ 2015-03-02 15:19                                       ` Stefano Stabellini
  2015-03-02 16:04                                         ` Ian Campbell
  0 siblings, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2015-03-02 15:19 UTC (permalink / raw)
  To: Ian Campbell
  Cc: ian.jackson, Stefano Stabellini, wei.liu2, Mike Latimer, xen-devel

On Mon, 2 Mar 2015, Ian Campbell wrote:
> On Fri, 2015-02-27 at 17:31 -0700, Mike Latimer wrote:
> > On Friday, February 27, 2015 11:29:12 AM Mike Latimer wrote:
> > > On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
> > > After adding 2048aeec, dom0's target is lowered by the required amount (e.g.
> > > 64GB), but as dom0 cannot balloon down fast enough,
> > > libxl_wait_for_memory_target returns -5, and the domain create fails
> >     (wrong return code - libxl_wait_for_memory_target actually returns -3)
> > 
> > With libxl_wait_for_memory_target return code corrected (2048aeec), debug 
> > messages look like this:
> > 
> > Parsing config from sles12pv
> >  DBG: start freemem loop
> >  DBG: free_memkb = 541976, need_memkb = 67651584 (rc=0)
> >  DBG: dom0_curr_target = 2118976472, set_memory_target = -67109608 (rc=1)
> >  DBG: wait_for_free_memory = 67651584 (rc=-5)
> >  DBG: wait_for_memory_target (rc=-3)
> > failed to free memory for the domain
> > 
> > After failing, dom0 continues to balloon down by the requested amount 
> > (-67109608), so a subsequent startup attempt would work.
> > 
> > My original fix (2563bca1) was intended to continue looping in freem until dom0 
> > ballooned down the requested amount. However, this really only worked without 
> > 2048aeec, as wait_for_memory_target was always returning 0. After Stefano 
> > pointed out this problem, commit 2563bca1 can still be useful - but seems less 
> > important as ballooning down dom0 is where the major delays are seen.
> > 
> > The following messages show what was happening when wait_for_memory_target was 
> > always returning 0. I've narrowed it down to just the interesting messages:
> > 
> > DBG: free_memkb = 9794852, need_memkb = 67651584 (rc=0)
> > DBG: dom0_curr_target = 2118976464, set_memory_target = -67109596 (rc=1)
> > DBG: dom0_curr_target = 2051866868, set_memory_target = -57856732 (rc=1)
> > DBG: dom0_curr_target = 1994010136, set_memory_target = -50615004 (rc=1)
> > DBG: dom0_curr_target = 1943395132, set_memory_target = -43965148 (rc=1)
> > DBG: dom0_curr_target = 1899429984, set_memory_target = -37538524 (rc=1)
> > DBG: dom0_curr_target = 1861891460, set_memory_target = -31560412 (rc=1)
> > DBG: dom0_curr_target = 1830331048, set_memory_target = -25309916 (rc=1)
> > DBG: dom0_curr_target = 1805021132, set_memory_target = -19514076 (rc=1)
> > DBG: dom0_curr_target = 1785507056, set_memory_target = -13949660 (rc=1)
> > DBG: dom0_curr_target = 1771557396, set_memory_target = -8057564 (rc=1)
> > DBG: dom0_curr_target = 1763499832, set_memory_target = -1862364 (rc=1)
> > 
> > The above situation is no longer relevant, but the overall dom0 target problem 
> > is still an issue. It now seems rather obvious (hopefully) that the 10 second 
> > delay in wait_for_memory_target is not sufficient. Should that function be 
> > modified to monitor ongoing progress and continue waiting as long as progress 
> > is being made?
> 
> You mean to push the logic currently in freemem to keep going so long as
> there is progress being made down into libxl_wait_for_memory_target?
> That seems like a good idea.

"Continue as long as progress is being made" is not exactly the idea
behind the current implementation of freemem even if we do have a check
like 

  if (free_memkb > free_memkb_prev) {

at the end.  See:

http://marc.info/?l=xen-devel&m=142503529725529

Specifically, the current loop is not able to cope with Dom0 decreasing
its reservation but not reaching its target (because it is too slow, or
for other reasons).

The following changes to freemem implement the aforementioned logic by
improving libxl_wait_for_memory_target to keep going as long as dom0 is
able to make progress, and by removing the now-useless free_memkb >
free_memkb_prev check.


diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index b9d083a..ec98e00 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5002,26 +5002,41 @@ int libxl_wait_for_memory_target(libxl_ctx *ctx, uint32_t domid, int wait_secs)
 {
     int rc = 0;
     uint32_t target_memkb = 0;
+    uint64_t current_memkb, prev_memkb;
     libxl_dominfo info;
 
+    rc = libxl_get_memory_target(ctx, domid, &target_memkb);
+    if (rc < 0)
+        return rc;
+
     libxl_dominfo_init(&info);
+    prev_memkb = UINT64_MAX;
 
     do {
-        wait_secs--;
         sleep(1);
 
-        rc = libxl_get_memory_target(ctx, domid, &target_memkb);
-        if (rc < 0)
-            goto out;
-
         libxl_dominfo_dispose(&info);
         libxl_dominfo_init(&info);
         rc = libxl_domain_info(ctx, &info, domid);
         if (rc < 0)
             goto out;
-    } while (wait_secs > 0 && (info.current_memkb + info.outstanding_memkb) > target_memkb);
 
-    if ((info.current_memkb + info.outstanding_memkb) <= target_memkb)
+        current_memkb = info.current_memkb + info.outstanding_memkb;
+
+        if (current_memkb > prev_memkb)
+        {
+            rc = ERROR_FAIL;
+            goto out;
+        }
+        else if (current_memkb == prev_memkb)
+            wait_secs--;
+        /* if current_memkb < prev_memkb loop for free as progress has
+         * been made */
+
+        prev_memkb = current_memkb;
+    } while (wait_secs > 0 && current_memkb > target_memkb);
+
+    if (current_memkb <= target_memkb)
         rc = 0;
     else
         rc = ERROR_FAIL;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 53c16eb..17f61a8 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2200,7 +2200,7 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
 {
     int rc, retries;
     const int MAX_RETRIES = 3;
-    uint32_t need_memkb, free_memkb, free_memkb_prev = 0;
+    uint32_t need_memkb, free_memkb;
 
     if (!autoballoon)
         return 0;
@@ -2228,22 +2228,14 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
         else if (rc != ERROR_NOMEM)
             return rc;
 
-        /* the memory target has been reached but the free memory is still
-         * not enough: loop over again */
+        /* wait until dom0 reaches its target, as long as we are making
+         * progress */
         rc = libxl_wait_for_memory_target(ctx, 0, 1);
         if (rc < 0)
             return rc;
 
-        /*
-         * If the amount of free mem has increased on this iteration (i.e.
-         * some progress has been made) then reset the retry counter.
-         */
-        if (free_memkb > free_memkb_prev) {
-            retries = MAX_RETRIES;
-            free_memkb_prev = free_memkb;
-        } else {
-            retries--;
-        }
+        /* dom0 target has been reached: loop again */
+        retries--;
     } while (retries > 0);
 
     return ERROR_NOMEM;

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-03-02 15:19                                       ` Stefano Stabellini
@ 2015-03-02 16:04                                         ` Ian Campbell
  2015-03-02 16:15                                           ` Stefano Stabellini
  0 siblings, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2015-03-02 16:04 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: ian.jackson, wei.liu2, Mike Latimer, xen-devel

On Mon, 2015-03-02 at 15:19 +0000, Stefano Stabellini wrote:
> On Mon, 2 Mar 2015, Ian Campbell wrote:
> > On Fri, 2015-02-27 at 17:31 -0700, Mike Latimer wrote:
> > > On Friday, February 27, 2015 11:29:12 AM Mike Latimer wrote:
> > > > On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
> > > > After adding 2048aeec, dom0's target is lowered by the required amount (e.g.
> > > > 64GB), but as dom0 cannot balloon down fast enough,
> > > > libxl_wait_for_memory_target returns -5, and the domain create fails
> > >     (wrong return code - libxl_wait_for_memory_target actually returns -3)
> > > 
> > > With libxl_wait_for_memory_target return code corrected (2048aeec), debug 
> > > messages look like this:
> > > 
> > > Parsing config from sles12pv
> > >  DBG: start freemem loop
> > >  DBG: free_memkb = 541976, need_memkb = 67651584 (rc=0)
> > >  DBG: dom0_curr_target = 2118976472, set_memory_target = -67109608 (rc=1)
> > >  DBG: wait_for_free_memory = 67651584 (rc=-5)
> > >  DBG: wait_for_memory_target (rc=-3)
> > > failed to free memory for the domain
> > > 
> > > After failing, dom0 continues to balloon down by the requested amount 
> > > (-67109608), so a subsequent startup attempt would work.
> > > 
> > > My original fix (2563bca1) was intended to continue looping in freem until dom0 
> > > ballooned down the requested amount. However, this really only worked without 
> > > 2048aeec, as wait_for_memory_target was always returning 0. After Stefano 
> > > pointed out this problem, commit 2563bca1 can still be useful - but seems less 
> > > important as ballooning down dom0 is where the major delays are seen.
> > > 
> > > The following messages show what was happening when wait_for_memory_target was 
> > > always returning 0. I've narrowed it down to just the interesting messages:
> > > 
> > > DBG: free_memkb = 9794852, need_memkb = 67651584 (rc=0)
> > > DBG: dom0_curr_target = 2118976464, set_memory_target = -67109596 (rc=1)
> > > DBG: dom0_curr_target = 2051866868, set_memory_target = -57856732 (rc=1)
> > > DBG: dom0_curr_target = 1994010136, set_memory_target = -50615004 (rc=1)
> > > DBG: dom0_curr_target = 1943395132, set_memory_target = -43965148 (rc=1)
> > > DBG: dom0_curr_target = 1899429984, set_memory_target = -37538524 (rc=1)
> > > DBG: dom0_curr_target = 1861891460, set_memory_target = -31560412 (rc=1)
> > > DBG: dom0_curr_target = 1830331048, set_memory_target = -25309916 (rc=1)
> > > DBG: dom0_curr_target = 1805021132, set_memory_target = -19514076 (rc=1)
> > > DBG: dom0_curr_target = 1785507056, set_memory_target = -13949660 (rc=1)
> > > DBG: dom0_curr_target = 1771557396, set_memory_target = -8057564 (rc=1)
> > > DBG: dom0_curr_target = 1763499832, set_memory_target = -1862364 (rc=1)
> > > 
> > > The above situation is no longer relevant, but the overall dom0 target problem 
> > > is still an issue. It now seems rather obvious (hopefully) that the 10 second 
> > > delay in wait_for_memory_target is not sufficient. Should that function be 
> > > modified to monitor ongoing progress and continue waiting as long as progress 
> > > is being made?
> > 
> > You mean to push the logic currently in freemem to keep going so long as
> > there is progress being made down into libxl_wait_for_memory_target?
> > That seems like a good idea.
> 
> "Continue as long as progress is being made" is not exactly the idea
> behind the current implementation of freemem even if we do have a check
> like 
> 
>   if (free_memkb > free_memkb_prev) {
> 
> at the end.

? "Continue as long as progress is being made" is exactly what
2563bca1154 "libxl: Wait for ballooning if free memory is increasing"
was trying to implement, so it certainly was the idea behind the current
implementation (it may well not have managed to implement the idea it
was trying to).

>   See:
> 
> http://marc.info/?l=xen-devel&m=142503529725529
> 
> Specifically the current loop is not able to cope with Dom0 decreasing
> its reservation but not reaching its target (because it is too slow or
> because of other reasons).

I'm not sure what you are saying here.

Are you just agreeing with the suggestion in an oblique way?

> The following changes to freemem implements the aforementioned logic by
> improving libxl_wait_for_memory_target to keep going as long as dom0 is
> able to make progress and removing the useless free_memkb >
> free_memkb_prev check.
> 
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index b9d083a..ec98e00 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5002,26 +5002,41 @@ int libxl_wait_for_memory_target(libxl_ctx *ctx, uint32_t domid, int wait_secs)
>  {
>      int rc = 0;
>      uint32_t target_memkb = 0;
> +    uint64_t current_memkb, prev_memkb;
>      libxl_dominfo info;
>  
> +    rc = libxl_get_memory_target(ctx, domid, &target_memkb);
> +    if (rc < 0)
> +        return rc;
> +
>      libxl_dominfo_init(&info);
> +    prev_memkb = UINT64_MAX;
>  
>      do {
> -        wait_secs--;
>          sleep(1);
>  
> -        rc = libxl_get_memory_target(ctx, domid, &target_memkb);
> -        if (rc < 0)
> -            goto out;
> -
>          libxl_dominfo_dispose(&info);
>          libxl_dominfo_init(&info);
>          rc = libxl_domain_info(ctx, &info, domid);
>          if (rc < 0)
>              goto out;
> -    } while (wait_secs > 0 && (info.current_memkb + info.outstanding_memkb) > target_memkb);
>  
> -    if ((info.current_memkb + info.outstanding_memkb) <= target_memkb)
> +        current_memkb = info.current_memkb + info.outstanding_memkb;
> +
> +        if (current_memkb > prev_memkb)
> +        {
> +            rc = ERROR_FAIL;
> +            goto out;
> +        }
> +        else if (current_memkb == prev_memkb)
> +            wait_secs--;
> +        /* if current_memkb < prev_memkb loop for free as progress has
> +         * been made */
> +
> +        prev_memkb = current_memkb;
> +    } while (wait_secs > 0 && current_memkb > target_memkb);
> +
> +    if (current_memkb <= target_memkb)
>          rc = 0;
>      else
>          rc = ERROR_FAIL;
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 53c16eb..17f61a8 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -2200,7 +2200,7 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
>  {
>      int rc, retries;
>      const int MAX_RETRIES = 3;
> -    uint32_t need_memkb, free_memkb, free_memkb_prev = 0;
> +    uint32_t need_memkb, free_memkb;
>  
>      if (!autoballoon)
>          return 0;
> @@ -2228,22 +2228,14 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
>          else if (rc != ERROR_NOMEM)
>              return rc;
>  
> -        /* the memory target has been reached but the free memory is still
> -         * not enough: loop over again */
> +        /* wait until dom0 reaches its target, as long as we are making
> +         * progress */
>          rc = libxl_wait_for_memory_target(ctx, 0, 1);
>          if (rc < 0)
>              return rc;
>  
> -        /*
> -         * If the amount of free mem has increased on this iteration (i.e.
> -         * some progress has been made) then reset the retry counter.
> -         */
> -        if (free_memkb > free_memkb_prev) {
> -            retries = MAX_RETRIES;
> -            free_memkb_prev = free_memkb;
> -        } else {
> -            retries--;
> -        }
> +        /* dom0 target has been reached: loop again */
> +        retries--;
>      } while (retries > 0);
>  
>      return ERROR_NOMEM;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-03-02 16:04                                         ` Ian Campbell
@ 2015-03-02 16:15                                           ` Stefano Stabellini
  2015-03-02 22:49                                             ` Mike Latimer
  0 siblings, 1 reply; 41+ messages in thread
From: Stefano Stabellini @ 2015-03-02 16:15 UTC (permalink / raw)
  To: Ian Campbell
  Cc: ian.jackson, xen-devel, wei.liu2, Mike Latimer, Stefano Stabellini

On Mon, 2 Mar 2015, Ian Campbell wrote:
> On Mon, 2015-03-02 at 15:19 +0000, Stefano Stabellini wrote:
> > On Mon, 2 Mar 2015, Ian Campbell wrote:
> > > On Fri, 2015-02-27 at 17:31 -0700, Mike Latimer wrote:
> > > > On Friday, February 27, 2015 11:29:12 AM Mike Latimer wrote:
> > > > > On Friday, February 27, 2015 08:28:49 AM Mike Latimer wrote:
> > > > > After adding 2048aeec, dom0's target is lowered by the required amount (e.g.
> > > > > 64GB), but as dom0 cannot balloon down fast enough,
> > > > > libxl_wait_for_memory_target returns -5, and the domain create fails
> > > >     (wrong return code - libxl_wait_for_memory_target actually returns -3)
> > > > 
> > > > With libxl_wait_for_memory_target return code corrected (2048aeec), debug 
> > > > messages look like this:
> > > > 
> > > > Parsing config from sles12pv
> > > >  DBG: start freemem loop
> > > >  DBG: free_memkb = 541976, need_memkb = 67651584 (rc=0)
> > > >  DBG: dom0_curr_target = 2118976472, set_memory_target = -67109608 (rc=1)
> > > >  DBG: wait_for_free_memory = 67651584 (rc=-5)
> > > >  DBG: wait_for_memory_target (rc=-3)
> > > > failed to free memory for the domain
> > > > 
> > > > After failing, dom0 continues to balloon down by the requested amount 
> > > > (-67109608), so a subsequent startup attempt would work.
> > > > 
> > > > My original fix (2563bca1) was intended to continue looping in freem until dom0 
> > > > ballooned down the requested amount. However, this really only worked without 
> > > > 2048aeec, as wait_for_memory_target was always returning 0. After Stefano 
> > > > pointed out this problem, commit 2563bca1 can still be useful - but seems less 
> > > > important as ballooning down dom0 is where the major delays are seen.
> > > > 
> > > > The following messages show what was happening when wait_for_memory_target was 
> > > > always returning 0. I've narrowed it down to just the interesting messages:
> > > > 
> > > > DBG: free_memkb = 9794852, need_memkb = 67651584 (rc=0)
> > > > DBG: dom0_curr_target = 2118976464, set_memory_target = -67109596 (rc=1)
> > > > DBG: dom0_curr_target = 2051866868, set_memory_target = -57856732 (rc=1)
> > > > DBG: dom0_curr_target = 1994010136, set_memory_target = -50615004 (rc=1)
> > > > DBG: dom0_curr_target = 1943395132, set_memory_target = -43965148 (rc=1)
> > > > DBG: dom0_curr_target = 1899429984, set_memory_target = -37538524 (rc=1)
> > > > DBG: dom0_curr_target = 1861891460, set_memory_target = -31560412 (rc=1)
> > > > DBG: dom0_curr_target = 1830331048, set_memory_target = -25309916 (rc=1)
> > > > DBG: dom0_curr_target = 1805021132, set_memory_target = -19514076 (rc=1)
> > > > DBG: dom0_curr_target = 1785507056, set_memory_target = -13949660 (rc=1)
> > > > DBG: dom0_curr_target = 1771557396, set_memory_target = -8057564 (rc=1)
> > > > DBG: dom0_curr_target = 1763499832, set_memory_target = -1862364 (rc=1)
> > > > 
> > > > The above situation is no longer relevant, but the overall dom0 target problem 
> > > > is still an issue. It now seems rather obvious (hopefully) that the 10 second 
> > > > delay in wait_for_memory_target is not sufficient. Should that function be 
> > > > modified to monitor ongoing progress and continue waiting as long as progress 
> > > > is being made?
> > > 
> > > You mean to push the logic currently in freemem to keep going so long as
> > > there is progress being made down into libxl_wait_for_memory_target?
> > > That seems like a good idea.
> > 
> > "Continue as long as progress is being made" is not exactly the idea
> > behind the current implementation of freemem even if we do have a check
> > like 
> > 
> >   if (free_memkb > free_memkb_prev) {
> > 
> > at the end.
> 
> ? "Continue as long as progress is being made" is exactly what
> 2563bca1154 "libxl: Wait for ballooning if free memory is increasing"
> was trying to implement, so it certainly was the idea behind the current
> implementation (it may well not have managed to implement the idea it
> was trying to).

I don't think so: pre-2563bca1154 it wasn't and 2563bca1154 doesn't
implement it properly.  I think we should revert it.


> >   See:
> > 
> > http://marc.info/?l=xen-devel&m=142503529725529
> > 
> > Specifically the current loop is not able to cope with Dom0 decreasing
> > its reservation but not reaching its target (because it is too slow or
> > because of other reasons).
> 
> I'm not sure what you are saying here.
> 
> Are you just agreeing with the suggestion in an oblique way?

I agree with the suggestion "continue as long as progress is being
made". I disagree that "continue as long as progress is being made" is
the current logic of freemem.

I don't have an opinion on whether that logic should be implemented in
freemem or libxl_wait_for_memory_target. The code below implements it in
libxl_wait_for_memory_target.



> > The following changes to freemem implements the aforementioned logic by
> > improving libxl_wait_for_memory_target to keep going as long as dom0 is
> > able to make progress and removing the useless free_memkb >
> > free_memkb_prev check.
> > 
> > 
> > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > index b9d083a..ec98e00 100644
> > --- a/tools/libxl/libxl.c
> > +++ b/tools/libxl/libxl.c
> > @@ -5002,26 +5002,41 @@ int libxl_wait_for_memory_target(libxl_ctx *ctx, uint32_t domid, int wait_secs)
> >  {
> >      int rc = 0;
> >      uint32_t target_memkb = 0;
> > +    uint64_t current_memkb, prev_memkb;
> >      libxl_dominfo info;
> >  
> > +    rc = libxl_get_memory_target(ctx, domid, &target_memkb);
> > +    if (rc < 0)
> > +        return rc;
> > +
> >      libxl_dominfo_init(&info);
> > +    prev_memkb = UINT64_MAX;
> >  
> >      do {
> > -        wait_secs--;
> >          sleep(1);
> >  
> > -        rc = libxl_get_memory_target(ctx, domid, &target_memkb);
> > -        if (rc < 0)
> > -            goto out;
> > -
> >          libxl_dominfo_dispose(&info);
> >          libxl_dominfo_init(&info);
> >          rc = libxl_domain_info(ctx, &info, domid);
> >          if (rc < 0)
> >              goto out;
> > -    } while (wait_secs > 0 && (info.current_memkb + info.outstanding_memkb) > target_memkb);
> >  
> > -    if ((info.current_memkb + info.outstanding_memkb) <= target_memkb)
> > +        current_memkb = info.current_memkb + info.outstanding_memkb;
> > +
> > +        if (current_memkb > prev_memkb)
> > +        {
> > +            rc = ERROR_FAIL;
> > +            goto out;
> > +        }
> > +        else if (current_memkb == prev_memkb)
> > +            wait_secs--;
> > +        /* if current_memkb < prev_memkb loop for free as progress has
> > +         * been made */
> > +
> > +        prev_memkb = current_memkb;
> > +    } while (wait_secs > 0 && current_memkb > target_memkb);
> > +
> > +    if (current_memkb <= target_memkb)
> >          rc = 0;
> >      else
> >          rc = ERROR_FAIL;
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 53c16eb..17f61a8 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -2200,7 +2200,7 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
> >  {
> >      int rc, retries;
> >      const int MAX_RETRIES = 3;
> > -    uint32_t need_memkb, free_memkb, free_memkb_prev = 0;
> > +    uint32_t need_memkb, free_memkb;
> >  
> >      if (!autoballoon)
> >          return 0;
> > @@ -2228,22 +2228,14 @@ static int freemem(uint32_t domid, libxl_domain_build_info *b_info)
> >          else if (rc != ERROR_NOMEM)
> >              return rc;
> >  
> > -        /* the memory target has been reached but the free memory is still
> > -         * not enough: loop over again */
> > +        /* wait until dom0 reaches its target, as long as we are making
> > +         * progress */
> >          rc = libxl_wait_for_memory_target(ctx, 0, 1);
> >          if (rc < 0)
> >              return rc;
> >  
> > -        /*
> > -         * If the amount of free mem has increased on this iteration (i.e.
> > -         * some progress has been made) then reset the retry counter.
> > -         */
> > -        if (free_memkb > free_memkb_prev) {
> > -            retries = MAX_RETRIES;
> > -            free_memkb_prev = free_memkb;
> > -        } else {
> > -            retries--;
> > -        }
> > +        /* dom0 target has been reached: loop again */
> > +        retries--;
> >      } while (retries > 0);
> >  
> >      return ERROR_NOMEM;
> 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-03-02 16:15                                           ` Stefano Stabellini
@ 2015-03-02 22:49                                             ` Mike Latimer
  2015-03-03 10:02                                               ` Ian Campbell
  2015-03-03 10:40                                               ` Stefano Stabellini
  0 siblings, 2 replies; 41+ messages in thread
From: Mike Latimer @ 2015-03-02 22:49 UTC (permalink / raw)
  To: xen-devel; +Cc: wei.liu2, ian.jackson, Ian Campbell, Stefano Stabellini

On Monday, March 02, 2015 04:15:41 PM Stefano Stabellini wrote:
> On Mon, 2 Mar 2015, Ian Campbell wrote:
> > ? "Continue as long as progress is being made" is exactly what
> > 2563bca1154 "libxl: Wait for ballooning if free memory is increasing"
> > was trying to implement, so it certainly was the idea behind the current
> > implementation (it may well not have managed to implement the idea it
> > was trying to).
> 
> I don't think so: pre-2563bca1154 it wasn't and 2563bca1154 doesn't
> implement it properly.  I think we should revert it.

I agree. The intent was there, but we are not aware of any known problem cases 
that the changes will affect. Reverting seems like the right thing to do.

> I agree with the suggestion "continue as long as progress is being
> made". I disagree that "continue as long as progress is being made" is
> the current logic of freemem.
> 
> I don't have an opinion on whether that logic should be implemented in
> freemem or libxl_wait_for_memory_target. The code below implements it in
> libxl_wait_for_memory_target.

Seems like it should wait in libxl_wait_for_memory_target, as you have 
implemented.

> > > The following changes to freemem implement the aforementioned logic by
> > > improving libxl_wait_for_memory_target to keep going as long as dom0 is
> > > able to make progress and removing the useless free_memkb >
> > > free_memkb_prev check.

I just finished testing your changes and they work correctly in my environment.
One thing worth noting: within the 1-second delay in the loop, my test machine
was ballooning down about 512M per iteration, so the loop executed 127 times to
balloon down the required 64G. (I'm not sure if there is any reason to worry
about that, though.)

One small question on freemem...  The loop essentially looks like:

  libxl_get_free_memory:  Get free memory
  If free_memkb >= need_memkb, return
  libxl_set_memory_target:  Balloon dom0 to free memory for domU
  libxl_wait_for_free_memory:  Wait for free memory for domU (max 10 seconds)
  libxl_wait_for_memory_target:  Wait for dom0 to finish ballooning
  Decrement retry and try again

Shouldn't libxl_wait_for_memory_target be before libxl_wait_for_free_memory?
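
To make the ordering concrete, here is a stripped-down sketch of that loop
(reconstructed from how I read tools/libxl/xl_cmdimpl.c rather than copied
from it, so the sketch name, the exact prototypes and the omitted
autoballoon/need_memkb setup are approximations):

#include <libxl.h>   /* libxl_ctx and the libxl_* calls used below */

/* Sketch of xl's freemem() retry loop -- call order only, setup trimmed;
 * not the literal code from the tree. */
static int freemem_sketch(libxl_ctx *ctx, uint32_t domid, uint32_t need_memkb)
{
    int rc, retries = 3;
    uint32_t free_memkb;

    do {
        /* 1. how much memory is currently free? */
        rc = libxl_get_free_memory(ctx, &free_memkb);
        if (rc < 0)
            return rc;
        if (free_memkb >= need_memkb)
            return 0;                       /* 2. already enough, done */

        /* 3. ask dom0 to balloon down by the missing amount */
        rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb,
                                     1 /* relative */, 0 /* enforce */);
        if (rc < 0)
            return rc;

        /* 4. wait up to 10s for the freed pages to show up as free memory */
        rc = libxl_wait_for_free_memory(ctx, domid, need_memkb, 10);
        if (!rc)
            return 0;
        else if (rc != ERROR_NOMEM)
            return rc;

        /* 5. only then wait for dom0 to actually reach its target */
        rc = libxl_wait_for_memory_target(ctx, 0, 1);
        if (rc < 0)
            return rc;

        retries--;                          /* 6. and go round again */
    } while (retries > 0);

    return ERROR_NOMEM;
}

In other words, step 5 only runs after step 4 has already spent its 10
seconds waiting, which is what prompted the question.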

Thanks,
Mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
       [not found]                                           ` <54F46DDB020000780006505B@suse.com>
@ 2015-03-02 22:49                                             ` Mike Latimer
  0 siblings, 0 replies; 41+ messages in thread
From: Mike Latimer @ 2015-03-02 22:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: ian.jackson, xen-devel, wei.liu2, Ian Campbell, Stefano Stabellini

On Monday, March 02, 2015 06:04:11 AM Jan Beulich wrote:
> > Of course users could just use dom0_mem and get down with it.
> 
> I don't think we should make this a requirement for correct
> operation.

Exactly. I think from a best practices perspective, dom0_mem is still the 
recommended approach. It just does not seem right to require that (or to 
expect the first domU startup to fail without it).
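
(For completeness, the sort of thing I mean by the dom0_mem approach is
along these lines -- the grub file location and the value are only an
illustration and would need sizing per host:

  # /etc/default/grub (grub2 hosts); value is illustrative only
  GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=8192M,max:8192M"

With dom0 pinned to a fixed allocation at boot, xl's autoballooning can be
disabled in xl.conf and none of this freemem/ballooning logic is exercised
at domU start.)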

Thanks,
Mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-03-02 22:49                                             ` Mike Latimer
@ 2015-03-03 10:02                                               ` Ian Campbell
  2015-03-03 10:32                                                 ` Stefano Stabellini
  2015-03-03 10:40                                               ` Stefano Stabellini
  1 sibling, 1 reply; 41+ messages in thread
From: Ian Campbell @ 2015-03-03 10:02 UTC (permalink / raw)
  To: Mike Latimer; +Cc: wei.liu2, Stefano Stabellini, ian.jackson, xen-devel

On Mon, 2015-03-02 at 15:49 -0700, Mike Latimer wrote:
> On Monday, March 02, 2015 04:15:41 PM Stefano Stabellini wrote:
> > On Mon, 2 Mar 2015, Ian Campbell wrote:
> > > "Continue as long as progress is being made" is exactly what
> > > 2563bca1154 "libxl: Wait for ballooning if free memory is increasing"
> > > was trying to implement, so it certainly was the idea behind the current
> > > implementation (it may well not have managed to implement the idea it
> > > was trying to).
> > 
> > I don't think so: pre-2563bca1154 it wasn't and 2563bca1154 doesn't
> > implement it properly.  I think we should revert it.
> 
> I agree. The intent was there, but we are not aware of any known problem cases 
> that the changes will affect. Reverting seems like the right thing to do.
> 
> > I agree with the suggestion "continue as long as progress is being
> > made". I disagree that "continue as long as progress is being made" is
> > the current logic of freemem.
> > 
> > I don't have an opinion on whether that logic should be implemented in
> > freemem or libxl_wait_for_memory_target. The code below implements it in
> > libxl_wait_for_memory_target.
> 
> Seems like it should wait in libxl_wait_for_memory_target, as you have 
> implemented.

Please can someone submit a series with the revert and the change to
libxl_wait_for_memory_target in whichever order makes sense.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-03-03 10:02                                               ` Ian Campbell
@ 2015-03-03 10:32                                                 ` Stefano Stabellini
  0 siblings, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2015-03-03 10:32 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, Stefano Stabellini, ian.jackson, Mike Latimer, xen-devel

On Tue, 3 Mar 2015, Ian Campbell wrote:
> On Mon, 2015-03-02 at 15:49 -0700, Mike Latimer wrote:
> > On Monday, March 02, 2015 04:15:41 PM Stefano Stabellini wrote:
> > > On Mon, 2 Mar 2015, Ian Campbell wrote:
> > > > "Continue as long as progress is being made" is exactly what
> > > > 2563bca1154 "libxl: Wait for ballooning if free memory is increasing"
> > > > was trying to implement, so it certainly was the idea behind the current
> > > > implementation (it may well not have managed to implement the idea it
> > > > was trying to).
> > > 
> > > I don't think so: pre-2563bca1154 it wasn't and 2563bca1154 doesn't
> > > implement it properly.  I think we should revert it.
> > 
> > I agree. The intent was there, but we are not aware of any known problem cases 
> > that the changes will affect. Reverting seems like the right thing to do.
> > 
> > > I agree with the suggestion "continue as long as progress is being
> > > made". I disagree that "continue as long as progress is being made" is
> > > the current logic of freemem.
> > > 
> > > I don't have an opinion on whether that logic should be implemented in
> > > freemem or libxl_wait_for_memory_target. The code below implements it in
> > > libxl_wait_for_memory_target.
> > 
> > Seems like it should wait in libxl_wait_for_memory_target, as you have 
> > implemented.
> 
> Please can someone submit a series with the revert and the change to
> libxl_wait_for_memory_target in whichever order makes sense.

I'll do that.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: freemem-slack and large memory environments
  2015-03-02 22:49                                             ` Mike Latimer
  2015-03-03 10:02                                               ` Ian Campbell
@ 2015-03-03 10:40                                               ` Stefano Stabellini
  1 sibling, 0 replies; 41+ messages in thread
From: Stefano Stabellini @ 2015-03-03 10:40 UTC (permalink / raw)
  To: Mike Latimer
  Cc: wei.liu2, Stefano Stabellini, ian.jackson, Ian Campbell, xen-devel

On Mon, 2 Mar 2015, Mike Latimer wrote:
> On Monday, March 02, 2015 04:15:41 PM Stefano Stabellini wrote:
> > On Mon, 2 Mar 2015, Ian Campbell wrote:
> > > "Continue as long as progress is being made" is exactly what
> > > 2563bca1154 "libxl: Wait for ballooning if free memory is increasing"
> > > was trying to implement, so it certainly was the idea behind the current
> > > implementation (it may well not have managed to implement the idea it
> > > was trying to).
> > 
> > I don't think so: pre-2563bca1154 it wasn't and 2563bca1154 doesn't
> > implement it properly.  I think we should revert it.
> 
> I agree. The intent was there, but we are not aware of any known problem cases 
> that the changes will affect. Reverting seems like the right thing to do.
> 
> > I agree with the suggestion "continue as long as progress is being
> > made". I disagree that "continue as long as progress is being made" is
> > the current logic of freemem.
> > 
> > I don't have an opinion on whether that logic should be implemented in
> > freemem or libxl_wait_for_memory_target. The code below implements it in
> > libxl_wait_for_memory_target.
> 
> Seems like it should wait in libxl_wait_for_memory_target, as you have 
> implemented.
> 
> > > > The following changes to freemem implement the aforementioned logic by
> > > > improving libxl_wait_for_memory_target to keep going as long as dom0 is
> > > > able to make progress and removing the useless free_memkb >
> > > > free_memkb_prev check.
> 
> I just finished testing your changes and they work correctly in my environment. 
> An interesting thing to note is that within the 1 second delay in the loop, my 
> test machine was ballooning down about 512M. This resulted in the loop being 
> executed 127 times in order to balloon down the required 64G. (I'm not sure if 
> there is any reason to worry about that though.)
>
> One small question on freemem...  The loop essentially looks like:
> 
>   libxl_get_free_memory:  Get free memory
>   If free_memkb >= need_memkb, return
>   libxl_set_memory_target:  Balloon dom0 to free memory for domU
>   libxl_wait_for_free_memory:  Wait for free memory for domU (max 10 seconds)
>   libxl_wait_for_memory_target:  Wait for dom0 to finish ballooning
>   Decrement retry and try again
> 
> Shouldn't libxl_wait_for_memory_target be before libxl_wait_for_free_memory?

The code works exactly as you described. The idea behind it is that
libxl_wait_for_free_memory should be enough, but if it isn't we should
figure out whether to try again or whether it failed because of an error,
by checking if dom0 managed to balloon down (libxl_wait_for_memory_target).

With my changes to libxl_wait_for_memory_target, it doesn't make too
much sense to have both libxl_wait_for_free_memory and
libxl_wait_for_memory_target. We should probably remove the call to
libxl_wait_for_free_memory entirely.

Also it could make sense to increase the wait interval to at least 2 sec
(which, going by your stats, would be roughly 1G of memory freed per
iteration) to avoid burning wait cycles uselessly. I'll submit a series
for it.
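
Roughly what I have in mind (a sketch of the direction only, reusing the
locals of the current freemem() loop; the 3 second wait and the exact
structure are illustrative, not the patch I will actually send):

    do {
        rc = libxl_get_free_memory(ctx, &free_memkb);
        if (rc < 0)
            return rc;
        if (free_memkb >= need_memkb)
            return 0;

        rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
        if (rc < 0)
            return rc;

        /* no separate libxl_wait_for_free_memory() step any more: with the
         * change above, this only fails once dom0 stops making progress
         * towards its target */
        rc = libxl_wait_for_memory_target(ctx, 0, 3);
        if (rc < 0)
            return rc;

        retries--;
    } while (retries > 0);

    return ERROR_NOMEM;

That would also answer the question about the ordering of the two waits,
since only one of them would remain.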

^ permalink raw reply	[flat|nested] 41+ messages in thread

