All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v11] claim and its friends for allocating multiple self-ballooning guests.
@ 2013-03-11 14:20 Konrad Rzeszutek Wilk
  2013-03-11 14:20 ` [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops) Konrad Rzeszutek Wilk
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-11 14:20 UTC (permalink / raw)
  To: xen-devel, keir, tim, konrad

[This is mostly a copy-n-paste from the cover letter from v10 of the patches]

Two of these patches (Dan's) have been posted in the past, but I took it upon
myself to expand them and address some of the concerns and also continue the
discussion around them. I believe most people have been roped in the email
conversation about this and know what these patches are for. The first patch
(mmu: Introduce XENMEM_claim_pages (subop of memory ops) has a good blurb
about the problem/solution/alternative solutions.

The patches follow the normal code-flow - the patch to implement the two hypercalls:
XENMEM_claim_pages and XENMEM_get_outstanding_pages.

Then the patches to utilize them in the libxc. The hypercall's are only utilized
if the toolstack (libxl) sets the claim_mode to anything but zero.

Then the toolstack (libxl + xl) patches. They revolve around two different changes:
 1). Add 'claim_mode=[off|on]' global configuration value that determines
     whether the claim hypercall should be used as part of guest creation.
 2). Add an extra parameter -c to the 'xl info' to provide the output of how many
     pages are claimed by different guests. This is more of a diagnostic patch.

That is it.

 docs/man/xl.conf.pod.5         |  25 +++++++++
 tools/examples/xl.conf         |   5 ++
 tools/libxc/xc_dom.h           |   1 +
 tools/libxc/xc_dom_x86.c       |  12 +++++
 tools/libxc/xc_domain.c        |  25 +++++++++
 tools/libxc/xc_hvm_build_x86.c |  17 ++++++
 tools/libxc/xenctrl.h          |   7 +++
 tools/libxc/xenguest.h         |   2 +
 tools/libxl/libxl.c            |  18 +++++++
 tools/libxl/libxl.h            |   3 ++
 tools/libxl/libxl_dom.c        |   3 +-
 tools/libxl/libxl_types.idl    |  12 ++++-
 tools/libxl/xl.c               |   5 ++
 tools/libxl/xl.h               |   1 +
 tools/libxl/xl_cmdimpl.c       |  34 ++++++++++--
 tools/libxl/xl_cmdtable.c      |   4 +-
 tools/libxl/xl_sxp.c           |   1 +
 xen/common/domain.c            |   1 +
 xen/common/domctl.c            |   1 +
 xen/common/memory.c            |  33 ++++++++++++
 xen/common/page_alloc.c        | 116 ++++++++++++++++++++++++++++++++++++++++-
 xen/include/public/domctl.h    |   3 +-
 xen/include/public/memory.h    |  35 ++++++++++++-
 xen/include/xen/mm.h           |   3 ++
 xen/include/xen/sched.h        |   1 +
 25 files changed, 356 insertions(+), 12 deletions(-)


Dan Magenheimer (3):
      mmu: Introduce XENMEM_claim_pages (subop of memory ops).
      xc: use XENMEM_claim_pages during guest creation.
      xc: XENMEM_claim_pages outstanding claims value

Konrad Rzeszutek Wilk (3):
      xl: Implement XENMEM_claim_pages support via 'claim_mode' global config
      xl: export 'outstanding_pages' value from xcinfo
      xl: 'xl list' supports '-c' for global claim information.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
  2013-03-11 14:20 [PATCH v11] claim and its friends for allocating multiple self-ballooning guests Konrad Rzeszutek Wilk
@ 2013-03-11 14:20 ` Konrad Rzeszutek Wilk
  2013-03-11 14:20 ` [PATCH 2/6] xc: use XENMEM_claim_pages during guest creation Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-11 14:20 UTC (permalink / raw)
  To: xen-devel, keir, tim, konrad; +Cc: Dan Magenheimer, Konrad Rzeszutek Wilk

From: Dan Magenheimer <dan.magenheimer@oracle.com>

When guests memory consumption is volatile (multiple guests
ballooning up/down) we are presented with the problem of
being able to determine exactly how much memory there is
for allocation of new guests without negatively impacting
existing guests. Note that the existing models (xapi, xend)
drive the memory consumption from the tool-stack and assume
that the guest will eventually hit the memory target. Other
models, such as the dynamic memory utilized by tmem, do this
differently - the guest drivers the memory consumption (up
to the d->max_pages ceiling). With dynamic memory model, the
guest frequently can balloon up and down as it sees fit.
This presents the problem to the toolstack that it does not
know atomically how much free memory there is (as the information
gets stale the moment the d->tot_pages information is provided
to the tool-stack), and hence when starting a guest can fail
during the memory creation process. Especially if the process
is done in parallel. In a nutshell what we need is a atomic
value of all domains tot_pages during the allocation of guests.
Naturally holding a lock for such a long time is unacceptable.
Hence the goal of this hypercall is to attempt to atomically and very
quickly determine if there are sufficient pages available in the
system and, if so, "set aside" that quantity of pages for future
allocations by that domain.  Unlike an existing hypercall such as
increase_reservation or populate_physmap, specific physical
pageframes are not assigned to the domain because this
cannot be done sufficiently quickly (especially for very large
allocations in an arbitrarily fragmented system) and so the
existing mechanisms result in classic time-of-check-time-of-use
(TOCTOU) races.  One can think of claiming as similar to a
"lazy" allocation, but subsequent hypercalls are required
to do the actual physical pageframe allocation.

Note that one of effects of this hypercall is that from the
perspective of other running guests -  suddenly there is
a new guest occupying X amount of pages. This means that when
we try to balloon up they will hit the system-wide ceiling of
available free memory (if the total sum of the existing d->max_pages
>= host memory). This is OK - as that is part of the overcommit.
What we DO NOT want to do is dictate their ceiling should be
(d->max_pages) as that is risky and can lead to guests OOM-ing.
It is something the guest needs to figure out.

In order for a toolstack to "get" information about whether
a domain has a claim and, if so, how large, and also for
the toolstack to measure the total system-wide claim, a
second subop has been added and exposed through domctl
and libxl (see "xen: XENMEM_claim_pages: xc").

== Alternative solutions ==
There has been a variety of discussion whether the problem
hypercall is solving can be done in user-space, such as:
 - For all the existing guest, set their d->max_pages temporarily
   to d->tot_pages and create the domain. This forces those
   domains to stay at their current consumption level (fyi, this is what
   the tmem freeze call is doing). The disadvantage of this is
   that needlessly forces the guests to stay at the memory usage
   instead of allowing it to decide the optimal target.
 - Account only using d->max_pages of how much free memory there is.
   This ignores ballooning changes and any over-commit scenario. This
   is similar to the scenario where the sum of all d->max_pages (and
   the one to be allocated now) on the host is smaller than the available
   free memory. As such it ignores the over-commit problem.
 - Provide a ring/FIFO along with event channel to notify an userspace
   daemon of guests memory consumption. This daemon can then provide
   up-to-date information to the toolstack of how much free memory
   there is. This duplicates what the hypervisor is already doing and
   introduced latency issues and catching breath for the toolstack as there
   might be millions of these updates on heavily used machine. There might
   not be any quiescent state ever and the toolstack will heavily consume
   CPU cycles and not ever provide up-to-date information.

It has been noted that this claim mechanism solves the
underlying problem (slow failure of domain creation) for
a large class of domains but not all, specifically not
handling (but also not making the problem worse for) PV
domains that specify the "superpages" flag, and 32-bit PV
domains on large RAM systems.  These will be addressed at a
later time.

Code overview:

Though the hypercall simply does arithmetic within locks,
some of the semantics in the code may be a bit subtle.

The key variables (d->unclaimed_pages and total_unclaimed_pages)
starts at zero if no claim has yet been staked for any domain.
(Perhaps a better name is "claimed_but_not_yet_possessed" but that's
a bit unwieldy.)  If no claim hypercalls are executed, there
should be no impact on existing usage.

When a claim is successfully staked by a domain, it is like a
watermark but there is no record kept of the size of the claim.
Instead, d->unclaimed_pages is set to the difference between
d->tot_pages and the claim.  When d->tot_pages increases or decreases,
d->unclaimed_pages atomically decreases or increases.  Once
d->unclaimed_pages reaches zero, the claim is satisfied and
d->unclaimed pages stays at zero -- unless a new claim is
subsequently staked.

The systemwide variable total_unclaimed_pages is always the sum
of d->unclaimed_pages, across all domains.  A non-domain-
specific heap allocation will fail if total_unclaimed_pages
exceeds free (plus, on tmem enabled systems, freeable) pages.

Claim semantics could be modified by flags.  The initial
implementation had three flag, which discerns whether the
caller would like tmem freeable pages to be considered
in determining whether or not the claim can be successfully
staked. This in later patches was removed and there are no
flags.

A claim can be cancelled by requesting a claim with the
number of pages being zero.

A second subop returns the total outstanding claimed pages
systemwide.

Note: Save/restore/migrate may need to be modified,
else it can be documented that all claims are cancelled.

==
This patch of the proposed XENMEM_claim_pages hypercall/subop, takes
into account review feedback from Jan and Keir and IanC and Matthew Daley,
plus some fixes found via runtime debugging.

v12:
 - Dropped flags per Time suggestion.

v11:
 - Altered names and some of the code assumptions per Tim
 - Added Acked-by Keir
v9->v10:
 - Altered the API a bit (new flags, etc).

v8->v9:
- Drop snippets already merged by Keir
  (26269:21a5b181f8ad and 26270:03cb71bc32f9)

v7->v8:
- Replace missing tot_pages adjustment [JBeulich]
- Combine domain_inc/dec_tot_pages into one function [JBeulich]
- Added a few more clarifying comments [djm]

v6->v7:
- Need co-existence with low_mem_virq [JBeulich]

v5->v6:
- Fix post-inc problem [mattjd]

v4->v5:
- Split tools part into separate patch [JBeulich]
- Minor coding style fixups [JBeulich]
- Use rcu_lock_domain_by_id, not _target version [JBeulich]

v3->v4: (please ignore v3)
- Process error, sent stale patch, sorry [djm]

v2->v3:
- Add per-domain and global "get" for unclaimed info [JBeulich]
- New hypercall(s) should fail for unpriv callers [IanC]
- Non-zero extent_order disallowed [JBeulich]
- Remove bonehead ASSERTs [JBeulich]
- Need not hold heaplock for decrease case too [JBeulich]
- More descriptive failure return values [JBeulich]
- Don't allow a claim to exceed max_pages [IanC]
- Subops must be in correct ifdef block in memory.c [keir]

v1->v2:
- Add reset-to-zero page claim in domain_kill [JBeulich]
- Proper handling of struct passed to hypercall [JBeulich]
- Fix alloc_heap_pages when a domain has a claim [JBeulich]
- Need not hold heap_lock if !d->unclaimed_pages [keir]
- Fix missed tot_pages call in donate_page [djm]
- Remove domain_reset_unclaimed_pages; use set with zero [djm]
- Bugfixes found through testing in set_unclaimed [djm]
- More comments in code [djm]
- Code formatting fixes [djm]
===

Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/common/domain.c         |   1 +
 xen/common/domctl.c         |   1 +
 xen/common/memory.c         |  33 +++++++++++++
 xen/common/page_alloc.c     | 116 +++++++++++++++++++++++++++++++++++++++++++-
 xen/include/public/domctl.h |   3 +-
 xen/include/public/memory.h |  35 ++++++++++++-
 xen/include/xen/mm.h        |   3 ++
 xen/include/xen/sched.h     |   1 +
 8 files changed, 188 insertions(+), 5 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index b360de1..64ee29d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -507,6 +507,7 @@ int domain_kill(struct domain *d)
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem);
+        domain_set_outstanding_pages(d, 0);
         d->tmem = NULL;
         /* fallthrough */
     case DOMDYING_dying:
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index b7f6619..c98e99c 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -154,6 +154,7 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
 
     info->tot_pages         = d->tot_pages;
     info->max_pages         = d->max_pages;
+    info->outstanding_pages = d->outstanding_pages;
     info->shr_pages         = atomic_read(&d->shr_pages);
     info->paged_pages       = atomic_read(&d->paged_pages);
     info->shared_info_frame = mfn_to_gmfn(d, virt_to_mfn(d->shared_info));
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 08550ef..3cfc1e3 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -712,6 +712,39 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case XENMEM_claim_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&reservation, arg, 1) )
+            return -EFAULT;
+
+        if ( !guest_handle_is_null(reservation.extent_start) )
+            return -EINVAL;
+
+        if ( reservation.extent_order != 0 )
+            return -EINVAL;
+
+        if ( reservation.mem_flags != 0 )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_id(reservation.domid);
+        if ( d == NULL )
+            return -EINVAL;
+
+        rc = domain_set_outstanding_pages(d, reservation.nr_extents);
+
+        rcu_unlock_domain(d);
+
+        break;
+
+    case XENMEM_get_outstanding_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        rc = get_outstanding_claims();
+        break;
+
     default:
         rc = arch_memory_op(op, arg);
         break;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e9fb15..aefef29 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -252,11 +252,114 @@ static long midsize_alloc_zone_pages;
 #define MIDSIZE_ALLOC_FRAC 128
 
 static DEFINE_SPINLOCK(heap_lock);
+static long outstanding_claims; /* total outstanding claims by all domains */
 
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
 {
+    long dom_before, dom_after, dom_claimed, sys_before, sys_after;
+
     ASSERT(spin_is_locked(&d->page_alloc_lock));
-    return d->tot_pages += pages;
+    d->tot_pages += pages;
+
+    /*
+     * can test d->claimed_pages race-free because it can only change
+     * if d->page_alloc_lock and heap_lock are both held, see also
+     * domain_set_outstanding_pages below
+     */
+    if ( !d->outstanding_pages )
+        goto out;
+
+    spin_lock(&heap_lock);
+    /* adjust domain outstanding pages; may not go negative */
+    dom_before = d->outstanding_pages;
+    dom_after = dom_before - pages;
+    BUG_ON(dom_before < 0);
+    dom_claimed = dom_after < 0 ? 0 : dom_after;
+    d->outstanding_pages = dom_claimed;
+    /* flag accounting bug if system outstanding_claims would go negative */
+    sys_before = outstanding_claims;
+    sys_after = sys_before - (dom_before - dom_claimed);
+    BUG_ON(sys_after < 0);
+    outstanding_claims = sys_after;
+    spin_unlock(&heap_lock);
+
+out:
+    return d->tot_pages;
+}
+
+int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
+{
+    int ret = -ENOMEM;
+    unsigned long claim, avail_pages;
+
+    /*
+     * take the domain's page_alloc_lock, else all d->tot_page adjustments
+     * must always take the global heap_lock rather than only in the much
+     * rarer case that d->outstanding_pages is non-zero
+     */
+    spin_lock(&d->page_alloc_lock);
+    spin_lock(&heap_lock);
+
+    /* pages==0 means "unset" the claim. */
+    if ( pages == 0 )
+    {
+        outstanding_claims -= d->outstanding_pages;
+        d->outstanding_pages = 0;
+        ret = 0;
+        goto out;
+    }
+
+    /* only one active claim per domain please */
+    if ( d->outstanding_pages )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* disallow a claim not exceeding current tot_pages or above max_pages */
+    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* how much memory is available? */
+    avail_pages = total_avail_pages;
+
+    /* Note: The usage of claim means that allocation from a guest *might*
+     * have to come from freeable memory. Using free memory is always better, if
+     * it is available, than using freeable memory.
+     *
+     * But that is OK as once the claim has been made, it still can take minutes
+     * before the claim is fully satisfied. Tmem can make use of the unclaimed
+     * pages during this time (to store ephemeral/freeable pages only,
+     * not persistent pages).
+     */
+    avail_pages += tmem_freeable_pages();
+    avail_pages -= outstanding_claims;
+
+    /*
+     * Note, if domain has already allocated memory before making a claim
+     * then the claim must take tot_pages into account
+     */
+    claim = pages - d->tot_pages;
+    if ( claim > avail_pages )
+        goto out;
+
+    /* yay, claim fits in available memory, stake the claim, success! */
+    d->outstanding_pages = claim;
+    outstanding_claims += d->outstanding_pages;
+    ret = 0;
+
+out:
+    spin_unlock(&heap_lock);
+    spin_unlock(&d->page_alloc_lock);
+    return ret;
+}
+
+long get_outstanding_claims(void)
+{
+    return outstanding_claims;
 }
 
 static unsigned long init_node_heap(int node, unsigned long mfn,
@@ -397,7 +500,7 @@ static void __init setup_low_mem_virq(void)
 static void check_low_mem_virq(void)
 {
     unsigned long avail_pages = total_avail_pages +
-        (opt_tmem ? tmem_freeable_pages() : 0);
+        (opt_tmem ? tmem_freeable_pages() : 0) - outstanding_claims;
 
     if ( unlikely(avail_pages <= low_mem_virq_th) )
     {
@@ -466,6 +569,15 @@ static struct page_info *alloc_heap_pages(
     spin_lock(&heap_lock);
 
     /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (outstanding_claims + request >
+          total_avail_pages + tmem_freeable_pages()) &&
+          (d == NULL || d->outstanding_pages < request) )
+        goto not_found;
+
+    /*
      * TMEM: When available memory is scarce due to tmem absorbing it, allow
      * only mid-size allocations to avoid worst of fragmentation issues.
      * Others try tmem pools then fail.  This is a workaround until all
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index deb19db..113b8dc 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -36,7 +36,7 @@
 #include "grant_table.h"
 #include "hvm/save.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000008
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000009
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -95,6 +95,7 @@ struct xen_domctl_getdomaininfo {
     uint32_t flags;              /* XEN_DOMINF_* */
     uint64_aligned_t tot_pages;
     uint64_aligned_t max_pages;
+    uint64_aligned_t outstanding_pages;
     uint64_aligned_t shr_pages;
     uint64_aligned_t paged_pages;
     uint64_aligned_t shared_info_frame; /* GMFN of shared_info struct */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 1c5ca19..51d5254 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -68,6 +68,8 @@ struct xen_memory_reservation {
      *   IN:  GPFN bases of extents to populate with memory
      *   OUT: GMFN bases of extents that were allocated
      *   (NB. This command also updates the mach_to_phys translation table)
+     * XENMEM_claim_pages:
+     *   IN: must be zero
      */
     XEN_GUEST_HANDLE(xen_pfn_t) extent_start;
 
@@ -430,10 +432,39 @@ typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
 
 /*
- * Reserve ops for future/out-of-tree "claim" patches (Oracle)
+ * Attempt to stake a claim for a domain on a quantity of pages
+ * of system RAM, but _not_ assign specific pageframes.  Only
+ * arithmetic is performed so the hypercall is very fast and need
+ * not be preemptible, thus sidestepping time-of-check-time-of-use
+ * races for memory allocation.  Returns 0 if the hypervisor page
+ * allocator has atomically and successfully claimed the requested
+ * number of pages, else non-zero.
+ *
+ * Any domain may have only one active claim.  When sufficient memory
+ * has been allocated to resolve the claim, the claim silently expires.
+ * Claiming zero pages effectively resets any outstanding claim and
+ * is always successful.
+ *
+ * Note that a valid claim may be staked even after memory has been
+ * allocated for a domain.  In this case, the claim is not incremental,
+ * i.e. if the domain's tot_pages is 3, and a claim is staked for 10,
+ * only 7 additional pages are claimed.
+ *
+ * Caller must be privileged or the hypercall fails.
  */
 #define XENMEM_claim_pages                  24
-#define XENMEM_get_unclaimed_pages          25
+
+/*
+ * XENMEM_claim_pages flags - the are no flags at this time.
+ * The zero value is appropiate.
+ */
+
+/*
+ * Get the number of pages currently claimed (but not yet "possessed")
+ * across all domains.  The caller must be privileged but otherwise
+ * the call never fails.
+ */
+#define XENMEM_get_outstanding_pages        25
 
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 2f701f5..28512fb 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -49,7 +49,10 @@ void free_xenheap_pages(void *v, unsigned int order);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
 
+/* Claim handling */
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages);
+int domain_set_outstanding_pages(struct domain *d, unsigned long pages);
+long get_outstanding_claims(void);
 
 /* Domain suballocator. These functions are *not* interrupt-safe.*/
 void init_domheap_pages(paddr_t ps, paddr_t pe);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e108436..569e76e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -242,6 +242,7 @@ struct domain
     struct page_list_head page_list;  /* linked list */
     struct page_list_head xenpage_list; /* linked list (size xenheap_pages) */
     unsigned int     tot_pages;       /* number of pages currently possesed */
+    unsigned int     outstanding_pages; /* pages claimed but not possessed  */
     unsigned int     max_pages;       /* maximum value for tot_pages        */
     atomic_t         shr_pages;       /* number of shared pages             */
     atomic_t         paged_pages;     /* number of paged-out pages          */
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/6] xc: use XENMEM_claim_pages during guest creation.
  2013-03-11 14:20 [PATCH v11] claim and its friends for allocating multiple self-ballooning guests Konrad Rzeszutek Wilk
  2013-03-11 14:20 ` [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops) Konrad Rzeszutek Wilk
@ 2013-03-11 14:20 ` Konrad Rzeszutek Wilk
  2013-03-13 10:23   ` Ian Campbell
  2013-03-11 14:20 ` [PATCH 3/6] xl: Implement XENMEM_claim_pages support via 'claim_mode' global config Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-11 14:20 UTC (permalink / raw)
  To: xen-devel, keir, tim, konrad; +Cc: Dan Magenheimer, Konrad Rzeszutek Wilk

From: Dan Magenheimer <dan.magenheimer@oracle.com>

We add an extra parameter to the structures passed to the
PV routine (arch_setup_meminit) and HVM routine (setup_guest)
that determines whether the claim hypercall is to be done.

The contents of the 'claim_mode' is defined as an 'uint8_t'
in case the hypercall expands in the future with extra
flags (for example for per-NUMA allocation). For right now
the proper values are: 0 to disable it or 1 to enable
it.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxc/xc_dom.h           |  1 +
 tools/libxc/xc_dom_x86.c       | 12 ++++++++++++
 tools/libxc/xc_domain.c        | 24 ++++++++++++++++++++++++
 tools/libxc/xc_hvm_build_x86.c | 17 +++++++++++++++++
 tools/libxc/xenctrl.h          |  6 ++++++
 tools/libxc/xenguest.h         |  2 ++
 6 files changed, 62 insertions(+)

diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h
index 779b9d4..34a6e38 100644
--- a/tools/libxc/xc_dom.h
+++ b/tools/libxc/xc_dom.h
@@ -135,6 +135,7 @@ struct xc_dom_image {
     domid_t guest_domid;
     int8_t vhpt_size_log2; /* for IA64 */
     int8_t superpages;
+    uint8_t claim_mode; /* 0 by default disables it, 1 enables it */
     int shadow_enabled;
 
     int xen_version;
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index eb9ac07..c1b7905 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -706,6 +706,13 @@ int arch_setup_meminit(struct xc_dom_image *dom)
     }
     else
     {
+        /* try to claim pages for early warning of insufficient memory avail */
+        if ( dom->claim_mode ) {
+            rc = xc_domain_claim_pages(dom->xch, dom->guest_domid,
+                                       dom->total_pages);
+            if ( rc )
+                return rc;
+        }
         /* setup initial p2m */
         for ( pfn = 0; pfn < dom->total_pages; pfn++ )
             dom->p2m_host[pfn] = pfn;
@@ -722,6 +729,11 @@ int arch_setup_meminit(struct xc_dom_image *dom)
                 dom->xch, dom->guest_domid, allocsz,
                 0, 0, &dom->p2m_host[i]);
         }
+
+        /* Ensure no unclaimed pages are left unused.
+         * OK to call if hadn't done the earlier claim call. */
+        xc_domain_claim_pages(dom->xch, dom->guest_domid,
+                              0 /* cancels the claim */);
     }
 
     return rc;
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 480ce91..b8a345c 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -775,6 +775,30 @@ int xc_domain_add_to_physmap(xc_interface *xch,
     return do_memory_op(xch, XENMEM_add_to_physmap, &xatp, sizeof(xatp));
 }
 
+int xc_domain_claim_pages(xc_interface *xch,
+                               uint32_t domid,
+                               unsigned long nr_pages)
+{
+    int err;
+    xen_pfn_t *extent_start = NULL;
+    DECLARE_HYPERCALL_BOUNCE(extent_start, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+    struct xen_memory_reservation reservation = {
+        .nr_extents   = nr_pages,
+        .extent_order = 0,
+        .mem_flags    = 0, /* no flags */
+        .domid        = domid
+    };
+
+    set_xen_guest_handle(reservation.extent_start, extent_start);
+
+    err = do_memory_op(xch, XENMEM_claim_pages, &reservation, sizeof(reservation));
+    return err;
+}
+unsigned long xc_domain_get_outstanding_pages(xc_interface *xch)
+{
+    return do_memory_op(xch, XENMEM_get_outstanding_pages, NULL, 0);
+}
+
 int xc_domain_populate_physmap(xc_interface *xch,
                                uint32_t domid,
                                unsigned long nr_extents,
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 3b5d777..b6b56b3 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -252,6 +252,7 @@ static int setup_guest(xc_interface *xch,
     unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
         stat_1gb_pages = 0;
     int pod_mode = 0;
+    unsigned int claim_mode = args->claim_mode;
 
     if ( nr_pages > target_pages )
         pod_mode = XENMEMF_populate_on_demand;
@@ -329,6 +330,16 @@ static int setup_guest(xc_interface *xch,
         xch, dom, 0xa0, 0, pod_mode, &page_array[0x00]);
     cur_pages = 0xc0;
     stat_normal_pages = 0xc0;
+
+    /* try to claim pages for early warning of insufficient memory available */
+    if ( claim_mode ) {
+        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
+        if ( rc != 0 )
+        {
+            PERROR("Could not allocate memory for HVM guest.");
+            goto error_out;
+        }
+    }
     while ( (rc == 0) && (nr_pages > cur_pages) )
     {
         /* Clip count to maximum 1GB extent. */
@@ -506,10 +517,16 @@ static int setup_guest(xc_interface *xch,
         munmap(page0, PAGE_SIZE);
     }
 
+    /* ensure no unclaimed pages are left unused */
+    xc_domain_claim_pages(xch, dom, 0 /* cancels the claim */);
+
     free(page_array);
     return 0;
 
  error_out:
+    /* ensure no unclaimed pages are left unused */
+    xc_domain_claim_pages(xch, dom, 0 /* cancels the claim */);
+
     free(page_array);
     return -1;
 }
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 32122fd..e695456 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1129,6 +1129,12 @@ int xc_domain_populate_physmap_exact(xc_interface *xch,
                                      unsigned int mem_flags,
                                      xen_pfn_t *extent_start);
 
+int xc_domain_claim_pages(xc_interface *xch,
+                               uint32_t domid,
+                               unsigned long nr_pages);
+
+unsigned long xc_domain_get_outstanding_pages(xc_interface *xch);
+
 int xc_domain_memory_exchange_pages(xc_interface *xch,
                                     int domid,
                                     unsigned long nr_in_extents,
diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index 7d4ac33..e3c879b 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -231,6 +231,8 @@ struct xc_hvm_build_args {
 
     /* Extra SMBIOS structures passed to HVMLOADER */
     struct xc_hvm_firmware_module smbios_module;
+    /* Whether to use claim hypercall (1 - enable, 0 - disable). */
+    uint8_t claim_mode;
 };
 
 /**
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/6] xl: Implement XENMEM_claim_pages support via 'claim_mode' global config
  2013-03-11 14:20 [PATCH v11] claim and its friends for allocating multiple self-ballooning guests Konrad Rzeszutek Wilk
  2013-03-11 14:20 ` [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops) Konrad Rzeszutek Wilk
  2013-03-11 14:20 ` [PATCH 2/6] xc: use XENMEM_claim_pages during guest creation Konrad Rzeszutek Wilk
@ 2013-03-11 14:20 ` Konrad Rzeszutek Wilk
  2013-03-13 10:41   ` Ian Campbell
  2013-03-11 14:20 ` [PATCH 4/6] xc: XENMEM_claim_pages outstanding claims value Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-11 14:20 UTC (permalink / raw)
  To: xen-devel, keir, tim, konrad; +Cc: Konrad Rzeszutek Wilk

The XENMEM_claim_pages operates per domain and its usage is suppose to
be done system wide. As such this patch introduces a global
configuration option 'claim_mode' that by default is disabled.

The problem is that when a guest is launched the process of allocating
memory from the hypervisor is not atomic and for large guests can take a while.
During which time other guests (especially self-ballooning) can take
balloon up / down making the free memory information available to the
toolstack stale.  With the "claim_mode" we can "stake a claim" for
exact number of pages we need to allocate the guest and we are guaranteed that
they will be there when we start allocating the guest.

There are two options:
 "on" - Use the default memory and also include the ephereal tmem pool in this
      calculation. This can affect temporarily negatively 'tmem' enabled guests
      but provides more freeable memory.
 "off" - (default) No claim attempt is made.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 docs/man/xl.conf.pod.5      | 25 +++++++++++++++++++++++++
 tools/examples/xl.conf      |  5 +++++
 tools/libxl/libxl.c         |  5 +++++
 tools/libxl/libxl.h         |  2 ++
 tools/libxl/libxl_dom.c     |  3 ++-
 tools/libxl/libxl_types.idl |  7 ++++++-
 tools/libxl/xl.c            |  5 +++++
 tools/libxl/xl.h            |  1 +
 tools/libxl/xl_cmdimpl.c    |  5 +++++
 9 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 82c6b20..fded0ab 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -98,6 +98,31 @@ Configures the name of the first block device to be used for temporary
 block device allocations by the toolstack.
 The default choice is "xvda".
 
+=item B<claim_mode="off|on">
+
+If enabled when a guest is created it will provide an guarantee whether the
+guest can be launched or not due to memory exhaustion. This is needed when
+using tmem type guests that can balloon up and down (self-balloon) extremely
+quickly and the list of free memory information is stale the moment it is
+displayed. When this mode is used there will be initially a claim set for the
+amount of pages needed which will be fulfilled as the guest is created.
+If the claim fails the guest creation process will also fail.
+
+Default: C<off>
+
+=over 4
+
+=item B<"off">
+
+No action is taken and there is no stake put for the memory.
+
+=item B<"on">
+
+Normal memory and freeable pool of ephereal pages (tmem) is used when
+calculating whether there is enough memory free to launch a guest.
+
+=back
+
 =back
 
 =head1 SEE ALSO
diff --git a/tools/examples/xl.conf b/tools/examples/xl.conf
index 28ab796..2b64f7e 100644
--- a/tools/examples/xl.conf
+++ b/tools/examples/xl.conf
@@ -20,3 +20,8 @@
 # if disabled the old behaviour will be used, and hotplug scripts will be
 # launched by udev.
 #run_hotplug_scripts=1
+#
+# stake a claim of memory when launching a guest. This guarantees immediate
+# feedback whether the guest can be launched due to memory exhaustion
+# (which can take a long time to find out if launching huge guests).
+#claim_mode="on"
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 73e0dc3..8c6e1a1 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -4788,6 +4788,11 @@ int libxl_tmem_freeable(libxl_ctx *ctx)
     return rc;
 }
 
+int libxl_parse_claim_mode(const char *s, unsigned int *flag)
+{
+    return libxl_claim_mode_from_string(s, (libxl_claim_mode *)flag);
+}
+
 int libxl_get_freecpus(libxl_ctx *ctx, libxl_bitmap *cpumap)
 {
     int ncpus;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 030aa86..538bf93 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -579,6 +579,8 @@ int libxl_wait_for_free_memory(libxl_ctx *ctx, uint32_t domid, uint32_t memory_k
 /* wait for the memory target of a domain to be reached */
 int libxl_wait_for_memory_target(libxl_ctx *ctx, uint32_t domid, int wait_secs);
 
+/* Parse the claim_mode options */
+int libxl_parse_claim_mode(const char *s, unsigned int *flag);
 
 int libxl_vncviewer_exec(libxl_ctx *ctx, uint32_t domid, int autopass);
 int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num, libxl_console_type type);
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index de555ee..4715378 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -367,6 +367,7 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
     dom->console_domid = state->console_domid;
     dom->xenstore_evtchn = state->store_port;
     dom->xenstore_domid = state->store_domid;
+    dom->claim_mode = info->claim_mode;
 
     if ( (ret = xc_dom_boot_xen_init(dom, ctx->xch, domid)) != 0 ) {
         LOGE(ERROR, "xc_dom_boot_xen_init failed");
@@ -601,7 +602,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
      */
     args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
     args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 10;
-
+    args.claim_mode = info->claim_mode;
     if (libxl__domain_firmware(gc, info, &args)) {
         LOG(ERROR, "initializing domain firmware failed");
         goto out;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 5b080ed..e417851 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -94,6 +94,11 @@ libxl_tsc_mode = Enumeration("tsc_mode", [
     (3, "native_paravirt"),
     ])
 
+libxl_claim_mode = Enumeration("claim_mode", [
+    (0, "off"),
+    (1, "on"),
+    ]);
+
 # Consistent with the values defined for HVM_PARAM_TIMER_MODE.
 libxl_timer_mode = Enumeration("timer_mode", [
     (0, "delay_for_missed_ticks"),
@@ -293,7 +298,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("ioports",          Array(libxl_ioport_range, "num_ioports")),
     ("irqs",             Array(uint32, "num_irqs")),
     ("iomem",            Array(libxl_iomem_range, "num_iomem")),
-
+    ("claim_mode",	 libxl_claim_mode),
     ("u", KeyedUnion(None, libxl_domain_type, "type",
                 [("hvm", Struct(None, [("firmware",         string),
                                        ("bios",             libxl_bios_type),
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index ecbcd3b..37d56c9 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -44,6 +44,7 @@ char *lockfile;
 char *default_vifscript = NULL;
 char *default_bridge = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
+unsigned int global_claim_mode = 0;
 
 static xentoollog_level minmsglevel = XTL_PROGRESS;
 
@@ -102,6 +103,10 @@ static void parse_global_config(const char *configfile,
     }
     if (!xlu_cfg_get_string (config, "blkdev_start", &buf, 0))
         blkdev_start = strdup(buf);
+
+    if (!xlu_cfg_get_string(config, "claim_mode", &buf, 0))
+        libxl_parse_claim_mode(buf, &global_claim_mode);
+
     xlu_cfg_destroy(config);
 }
 
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index be6f38b..b1d434e 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -145,6 +145,7 @@ int xl_child_pid(xlchildnum); /* returns 0 if child struct is not in use */
 extern int autoballoon;
 extern int run_hotplug_scripts;
 extern int dryrun_only;
+extern unsigned int global_claim_mode;
 extern char *lockfile;
 extern char *default_vifscript;
 extern char *default_bridge;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index a98705e..27298ea 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -757,6 +757,11 @@ static void parse_config_data(const char *config_source,
     if (!xlu_cfg_get_long (config, "maxmem", &l, 0))
         b_info->max_memkb = l * 1024;
 
+    if (global_claim_mode)
+        b_info->claim_mode = (libxl_claim_mode)global_claim_mode;
+    else
+        b_info->claim_mode = 0;
+
     if (xlu_cfg_get_string (config, "on_poweroff", &buf, 0))
         buf = "destroy";
     if (!parse_action_on_shutdown(buf, &d_config->on_poweroff)) {
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/6] xc: XENMEM_claim_pages outstanding claims value
  2013-03-11 14:20 [PATCH v11] claim and its friends for allocating multiple self-ballooning guests Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2013-03-11 14:20 ` [PATCH 3/6] xl: Implement XENMEM_claim_pages support via 'claim_mode' global config Konrad Rzeszutek Wilk
@ 2013-03-11 14:20 ` Konrad Rzeszutek Wilk
  2013-03-13 10:43   ` Ian Campbell
  2013-03-11 14:20 ` [PATCH 5/6] xl: export 'outstanding_pages' value from xcinfo Konrad Rzeszutek Wilk
  2013-03-11 14:20 ` [PATCH 6/6] xl: 'xl list' supports '-c' for global claim information Konrad Rzeszutek Wilk
  5 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-11 14:20 UTC (permalink / raw)
  To: xen-devel, keir, tim, konrad; +Cc: Dan Magenheimer, Konrad Rzeszutek Wilk

From: Dan Magenheimer <dan.magenheimer@oracle.com>

This is patch provides the value of claimed pages but
not yet accounted for in a domain (outstanding). This is a
total global value that influences the hypervisors' MM system.
This value - when the guest has finally been created and is
running - ends up having the value zero. But during the
memory populate calls the contents of this value will
decrease.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
[v2: s/unclaimed_pages/outstanding_pages/ per Tim's suggestion]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxc/xc_domain.c | 1 +
 tools/libxc/xenctrl.h   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index b8a345c..ab6dde5 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -234,6 +234,7 @@ int xc_domain_getinfo(xc_interface *xch,
 
         info->ssidref  = domctl.u.getdomaininfo.ssidref;
         info->nr_pages = domctl.u.getdomaininfo.tot_pages;
+        info->nr_outstanding_pages = domctl.u.getdomaininfo.outstanding_pages;
         info->nr_shared_pages = domctl.u.getdomaininfo.shr_pages;
         info->nr_paged_pages = domctl.u.getdomaininfo.paged_pages;
         info->max_memkb = domctl.u.getdomaininfo.max_pages << (PAGE_SHIFT-10);
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index e695456..2a4d4df 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -364,6 +364,7 @@ typedef struct xc_dominfo {
                   hvm:1, debugged:1;
     unsigned int  shutdown_reason; /* only meaningful if shutdown==1 */
     unsigned long nr_pages; /* current number, not maximum */
+    unsigned long nr_outstanding_pages;
     unsigned long nr_shared_pages;
     unsigned long nr_paged_pages;
     unsigned long shared_info_frame;
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/6] xl: export 'outstanding_pages' value from xcinfo
  2013-03-11 14:20 [PATCH v11] claim and its friends for allocating multiple self-ballooning guests Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2013-03-11 14:20 ` [PATCH 4/6] xc: XENMEM_claim_pages outstanding claims value Konrad Rzeszutek Wilk
@ 2013-03-11 14:20 ` Konrad Rzeszutek Wilk
  2013-03-13 10:44   ` Ian Campbell
  2013-03-11 14:20 ` [PATCH 6/6] xl: 'xl list' supports '-c' for global claim information Konrad Rzeszutek Wilk
  5 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-11 14:20 UTC (permalink / raw)
  To: xen-devel, keir, tim, konrad; +Cc: Konrad Rzeszutek Wilk

This patch provides the value of claimed pages but not yet
accounted for in a domain (outstanding). This value is initially set
by the XENMEM_claim_pages hypercall and as the guest is being
created its values decreases to zero.

With this patch it is possible to see the value of this
field.

[v2: s/unclaimed/outstanding/ per Tim's suggestion]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl.c         | 1 +
 tools/libxl/libxl_types.idl | 1 +
 tools/libxl/xl_sxp.c        | 1 +
 3 files changed, 3 insertions(+)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 8c6e1a1..0745888 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -528,6 +528,7 @@ static void xcinfo2xlinfo(const xc_domaininfo_t *xcinfo,
     else
         xlinfo->shutdown_reason  = ~0;
 
+    xlinfo->outstanding_memkb = PAGE_TO_MEMKB(xcinfo->outstanding_pages);
     xlinfo->current_memkb = PAGE_TO_MEMKB(xcinfo->tot_pages);
     xlinfo->shared_memkb = PAGE_TO_MEMKB(xcinfo->shr_pages);
     xlinfo->paged_memkb = PAGE_TO_MEMKB(xcinfo->paged_pages);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e417851..0a8b99a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -201,6 +201,7 @@ libxl_dominfo = Struct("dominfo",[
     # Otherwise set to a value guaranteed not to clash with any valid
     # LIBXL_SHUTDOWN_REASON_* constant.
     ("shutdown_reason", libxl_shutdown_reason),
+    ("outstanding_memkb",  MemKB),
     ("current_memkb",   MemKB),
     ("shared_memkb", MemKB),
     ("paged_memkb", MemKB),
diff --git a/tools/libxl/xl_sxp.c b/tools/libxl/xl_sxp.c
index a16a025..798f99d 100644
--- a/tools/libxl/xl_sxp.c
+++ b/tools/libxl/xl_sxp.c
@@ -72,6 +72,7 @@ void printf_info_sexp(int domid, libxl_domain_config *d_config)
     printf("\t(build_info)\n");
     printf("\t(max_vcpus %d)\n", b_info->max_vcpus);
     printf("\t(tsc_mode %s)\n", libxl_tsc_mode_to_string(b_info->tsc_mode));
+    printf("\t(outstanding_memkb %"PRId64")\n", info.outstanding_memkb);
     printf("\t(max_memkb %"PRId64")\n", b_info->max_memkb);
     printf("\t(target_memkb %"PRId64")\n", b_info->target_memkb);
     printf("\t(nomigrate %s)\n",
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 6/6] xl: 'xl list' supports '-c' for global claim information.
  2013-03-11 14:20 [PATCH v11] claim and its friends for allocating multiple self-ballooning guests Konrad Rzeszutek Wilk
                   ` (4 preceding siblings ...)
  2013-03-11 14:20 ` [PATCH 5/6] xl: export 'outstanding_pages' value from xcinfo Konrad Rzeszutek Wilk
@ 2013-03-11 14:20 ` Konrad Rzeszutek Wilk
  2013-03-13 10:51   ` Ian Campbell
  5 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-11 14:20 UTC (permalink / raw)
  To: xen-devel, keir, tim, konrad; +Cc: Konrad Rzeszutek Wilk

When guests have XENMEM_claim_pages called, they influence a global
counter value - which has the cumulative count of the number of
pages requested by all domains. This value is provided via the
XENMEM_get_unclaimed_pages hypercall. The value fluctuates quite
often so the value is stale once it is provided to the user-space.
However it is useful for diagnostic purposes.

[v1: s/unclaimed/outstanding/]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl.c         | 12 ++++++++++++
 tools/libxl/libxl.h         |  1 +
 tools/libxl/libxl_types.idl |  4 ++++
 tools/libxl/xl_cmdimpl.c    | 29 +++++++++++++++++++++++++----
 tools/libxl/xl_cmdtable.c   |  4 +++-
 5 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0745888..fd5d725 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -4051,6 +4051,18 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
     return ret;
 }
 
+int libxl_get_claiminfo(libxl_ctx *ctx, libxl_claiminfo *claiminfo)
+{
+    long l;
+
+    l = xc_domain_get_outstanding_pages(ctx->xch);
+    if (l == -ENOSYS)
+        return l;
+    claiminfo->claimed = l;
+
+    return 0;
+}
+
 const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
 {
     union {
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 538bf93..8d0ab23 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -581,6 +581,7 @@ int libxl_wait_for_memory_target(libxl_ctx *ctx, uint32_t domid, int wait_secs);
 
 /* Parse the claim_mode options */
 int libxl_parse_claim_mode(const char *s, unsigned int *flag);
+int libxl_get_claiminfo(libxl_ctx *ctx, libxl_claiminfo *claiminfo);
 
 int libxl_vncviewer_exec(libxl_ctx *ctx, uint32_t domid, int autopass);
 int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num, libxl_console_type type);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 0a8b99a..aa71ed8 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -509,6 +509,10 @@ libxl_cputopology = Struct("cputopology", [
     ("node", uint32),
     ], dir=DIR_OUT)
 
+libxl_claiminfo = Struct("claiminfo", [
+    ("claimed", uint64),
+    ])
+
 libxl_sched_credit_params = Struct("sched_credit_params", [
     ("tslice_ms", integer),
     ("ratelimit_us", integer),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 27298ea..6687c01 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4640,7 +4640,23 @@ static void output_topologyinfo(void)
     return;
 }
 
-static void print_info(int numa)
+static void output_claim(void)
+{
+    libxl_claiminfo info;
+
+    if (libxl_get_claiminfo(ctx, &info)) {
+        fprintf(stderr, "libxl_get_claiminfo failed.\n");
+        return;
+    }
+
+    printf("claim mode             : %s (%ld)\n",
+           libxl_claim_mode_to_string(global_claim_mode), info.claimed);
+
+    libxl_claiminfo_dispose(&info);
+    return;
+}
+
+static void print_info(int numa, int claim)
 {
     output_nodeinfo();
 
@@ -4650,6 +4666,8 @@ static void print_info(int numa)
         output_topologyinfo();
         output_numainfo();
     }
+    if (claim)
+        output_claim();
 
     output_xeninfo();
 
@@ -4663,18 +4681,21 @@ int main_info(int argc, char **argv)
     int opt;
     static struct option opts[] = {
         {"numa", 0, 0, 'n'},
+        {"claim", 0, 0, 'c'},
         COMMON_LONG_OPTS,
         {0, 0, 0, 0}
     };
-    int numa = 0;
+    int numa = 0, claim = 0;
 
-    SWITCH_FOREACH_OPT(opt, "hn", opts, "info", 0) {
+    SWITCH_FOREACH_OPT(opt, "hnc", opts, "info", 0) {
     case 'n':
         numa = 1;
         break;
+    case 'c':
+        claim = 1;
     }
 
-    print_info(numa);
+    print_info(numa, claim);
     return 0;
 }
 
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index b4a87ca..a48dbb0 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -226,7 +226,9 @@ struct cmd_spec cmd_table[] = {
     { "info",
       &main_info, 0, 0,
       "Get information about Xen host",
-      "-n, --numa         List host NUMA topology information",
+      "[-n|-c]",
+      "-n, --numa         List host NUMA topology information\n"
+      "-c, --claim        List claim information",
     },
     { "sharing",
       &main_sharing, 0, 0,
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/6] xc: use XENMEM_claim_pages during guest creation.
  2013-03-11 14:20 ` [PATCH 2/6] xc: use XENMEM_claim_pages during guest creation Konrad Rzeszutek Wilk
@ 2013-03-13 10:23   ` Ian Campbell
  2013-03-13 14:42     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-03-13 10:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tim (Xen.org), Dan Magenheimer, xen-devel, Keir (Xen.org), konrad

On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> From: Dan Magenheimer <dan.magenheimer@oracle.com>
> 
> We add an extra parameter to the structures passed to the
> PV routine (arch_setup_meminit) and HVM routine (setup_guest)
> that determines whether the claim hypercall is to be done.
> 
> The contents of the 'claim_mode' is defined as an 'uint8_t'
> in case the hypercall expands in the future with extra
> flags (for example for per-NUMA allocation). For right now
> the proper values are: 0 to disable it or 1 to enable
> it.

At the hypercall layer mem_flags is 32 bits, while this is only 8? But
also it seems that claim_mode at the libxc layer doesn't really
correspond directly to mem_flags anyway, which this commentary seems to
suggest it somehow does.

I think it would be less confusing to just have "int claim_enabled" for
now and add a claim_flags if and when they come into being.

> 
> Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  tools/libxc/xc_dom.h           |  1 +
>  tools/libxc/xc_dom_x86.c       | 12 ++++++++++++
>  tools/libxc/xc_domain.c        | 24 ++++++++++++++++++++++++
>  tools/libxc/xc_hvm_build_x86.c | 17 +++++++++++++++++
>  tools/libxc/xenctrl.h          |  6 ++++++
>  tools/libxc/xenguest.h         |  2 ++
>  6 files changed, 62 insertions(+)
> 
> diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h
> index 779b9d4..34a6e38 100644
> --- a/tools/libxc/xc_dom.h
> +++ b/tools/libxc/xc_dom.h
> @@ -135,6 +135,7 @@ struct xc_dom_image {
>      domid_t guest_domid;
>      int8_t vhpt_size_log2; /* for IA64 */
>      int8_t superpages;
> +    uint8_t claim_mode; /* 0 by default disables it, 1 enables it */
>      int shadow_enabled;
>  
>      int xen_version;
> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index eb9ac07..c1b7905 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -706,6 +706,13 @@ int arch_setup_meminit(struct xc_dom_image *dom)
>      }
>      else
>      {
> +        /* try to claim pages for early warning of insufficient memory avail */
> +        if ( dom->claim_mode ) {
> +            rc = xc_domain_claim_pages(dom->xch, dom->guest_domid,
> +                                       dom->total_pages);
> +            if ( rc )
> +                return rc;
> +        }
>          /* setup initial p2m */
>          for ( pfn = 0; pfn < dom->total_pages; pfn++ )
>              dom->p2m_host[pfn] = pfn;
> @@ -722,6 +729,11 @@ int arch_setup_meminit(struct xc_dom_image *dom)
>                  dom->xch, dom->guest_domid, allocsz,
>                  0, 0, &dom->p2m_host[i]);
>          }
> +
> +        /* Ensure no unclaimed pages are left unused.
> +         * OK to call if hadn't done the earlier claim call. */
> +        xc_domain_claim_pages(dom->xch, dom->guest_domid,
> +                              0 /* cancels the claim */);
>      }
>  
>      return rc;
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 480ce91..b8a345c 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -775,6 +775,30 @@ int xc_domain_add_to_physmap(xc_interface *xch,
>      return do_memory_op(xch, XENMEM_add_to_physmap, &xatp, sizeof(xatp));
>  }
>  
> +int xc_domain_claim_pages(xc_interface *xch,
> +                               uint32_t domid,
> +                               unsigned long nr_pages)
> +{
> +    int err;
> +    xen_pfn_t *extent_start = NULL;
> +    DECLARE_HYPERCALL_BOUNCE(extent_start, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> +    struct xen_memory_reservation reservation = {
> +        .nr_extents   = nr_pages,
> +        .extent_order = 0,
> +        .mem_flags    = 0, /* no flags */
> +        .domid        = domid
> +    };
> +
> +    set_xen_guest_handle(reservation.extent_start, extent_start);

This is unused? I think you just need:
	set_xen_guest_handle(reservation.extent_start, HYPERCALL_BUFFER_NULL);
and drop the declaration of the bounce above.

(personally I think a new arg struct for this subop would have been more
obvious than forcing it into the reservation struct, but what's done is
done)

> +    err = do_memory_op(xch, XENMEM_claim_pages, &reservation, sizeof(reservation));
> +    return err;
> +}
> +unsigned long xc_domain_get_outstanding_pages(xc_interface *xch)
> +{
> +    return do_memory_op(xch, XENMEM_get_outstanding_pages, NULL, 0);
> +}
> +
>  int xc_domain_populate_physmap(xc_interface *xch,
>                                 uint32_t domid,
>                                 unsigned long nr_extents,
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index 3b5d777..b6b56b3 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -252,6 +252,7 @@ static int setup_guest(xc_interface *xch,
>      unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
>          stat_1gb_pages = 0;
>      int pod_mode = 0;
> +    unsigned int claim_mode = args->claim_mode;
>  
>      if ( nr_pages > target_pages )
>          pod_mode = XENMEMF_populate_on_demand;
> @@ -329,6 +330,16 @@ static int setup_guest(xc_interface *xch,
>          xch, dom, 0xa0, 0, pod_mode, &page_array[0x00]);
>      cur_pages = 0xc0;
>      stat_normal_pages = 0xc0;
> +
> +    /* try to claim pages for early warning of insufficient memory available */
> +    if ( claim_mode ) {
> +        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> +        if ( rc != 0 )
> +        {
> +            PERROR("Could not allocate memory for HVM guest.");
> +            goto error_out;
> +        }
> +    }
>      while ( (rc == 0) && (nr_pages > cur_pages) )
>      {
>          /* Clip count to maximum 1GB extent. */
> @@ -506,10 +517,16 @@ static int setup_guest(xc_interface *xch,
>          munmap(page0, PAGE_SIZE);
>      }
>  
> +    /* ensure no unclaimed pages are left unused */
> +    xc_domain_claim_pages(xch, dom, 0 /* cancels the claim */);

This cannot fail?

> +
>      free(page_array);
>      return 0;
>  
>   error_out:
> +    /* ensure no unclaimed pages are left unused */
> +    xc_domain_claim_pages(xch, dom, 0 /* cancels the claim */);
> +
>      free(page_array);
>      return -1;

Could we consolidate the error/success paths by returning rc?

>  }
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index 32122fd..e695456 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1129,6 +1129,12 @@ int xc_domain_populate_physmap_exact(xc_interface *xch,
>                                       unsigned int mem_flags,
>                                       xen_pfn_t *extent_start);
>  
> +int xc_domain_claim_pages(xc_interface *xch,
> +                               uint32_t domid,
> +                               unsigned long nr_pages);
> +
> +unsigned long xc_domain_get_outstanding_pages(xc_interface *xch);
> +
>  int xc_domain_memory_exchange_pages(xc_interface *xch,
>                                      int domid,
>                                      unsigned long nr_in_extents,
> diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
> index 7d4ac33..e3c879b 100644
> --- a/tools/libxc/xenguest.h
> +++ b/tools/libxc/xenguest.h
> @@ -231,6 +231,8 @@ struct xc_hvm_build_args {
>  
>      /* Extra SMBIOS structures passed to HVMLOADER */
>      struct xc_hvm_firmware_module smbios_module;
> +    /* Whether to use claim hypercall (1 - enable, 0 - disable). */
> +    uint8_t claim_mode;
>  };
>  
>  /**

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] xl: Implement XENMEM_claim_pages support via 'claim_mode' global config
  2013-03-11 14:20 ` [PATCH 3/6] xl: Implement XENMEM_claim_pages support via 'claim_mode' global config Konrad Rzeszutek Wilk
@ 2013-03-13 10:41   ` Ian Campbell
  2013-03-13 14:57     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-03-13 10:41 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Tim (Xen.org), xen-devel, Keir (Xen.org), konrad

On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> The XENMEM_claim_pages operates per domain and its usage is suppose to
> be done system wide. As such this patch introduces a global
> configuration option 'claim_mode' that by default is disabled.
> 
> The problem is that when a guest is launched the process of allocating
> memory from the hypervisor is not atomic and for large guests can take a while.
> During which time other guests (especially self-ballooning) can take
> balloon up / down making the free memory information available to the
> toolstack stale.  With the "claim_mode" we can "stake a claim" for
> exact number of pages we need to allocate the guest and we are guaranteed that
> they will be there when we start allocating the guest.

Is it strictly "guaranteed"? AIUI there are still cases where the claim
may succeed but domain building will not succeed (32 bit PV guests being
the first to spring to mind). It is more correct to say it increases the
chances of success or something like that.

> There are two options:
>  "on" - Use the default memory and also include the ephereal tmem pool in this

                                                     ephemeral ?

Or maybe "ethereal" ? 

>       calculation. This can affect temporarily negatively 'tmem' enabled guests
>       but provides more freeable memory.

The last sentence here doesn't parse for me. "This can have a temporary
negative effect on 'tmem'..." ? What are those temporary negative
effects?

>  "off" - (default) No claim attempt is made.

Is there a middle ground between off and on which stakes a claim but
doesn't negatively effect tmem guests?

> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  docs/man/xl.conf.pod.5      | 25 +++++++++++++++++++++++++
>  tools/examples/xl.conf      |  5 +++++
>  tools/libxl/libxl.c         |  5 +++++
>  tools/libxl/libxl.h         |  2 ++
>  tools/libxl/libxl_dom.c     |  3 ++-
>  tools/libxl/libxl_types.idl |  7 ++++++-
>  tools/libxl/xl.c            |  5 +++++
>  tools/libxl/xl.h            |  1 +
>  tools/libxl/xl_cmdimpl.c    |  5 +++++
>  9 files changed, 56 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
> index 82c6b20..fded0ab 100644
> --- a/docs/man/xl.conf.pod.5
> +++ b/docs/man/xl.conf.pod.5
> @@ -98,6 +98,31 @@ Configures the name of the first block device to be used for temporary
>  block device allocations by the toolstack.
>  The default choice is "xvda".
>  
> +=item B<claim_mode="off|on">

For consistency with other option this should be 
	=item B<claim_mode=BOOLEAN>
(which may require code changes)

> +If enabled when a guest is created it will provide an guarantee whether the
> +guest can be launched or not due to memory exhaustion.

"guarantee" again. But more importantly this is missing the "when". Even
without this option you get an eventual guarantee (i.e. after the guest
actually starts). I think you want to say something about a quick
upfront check or something.

I would also start with "If this option is enabled then when a guest is
created..." since it took me a couple of attempts to understand the
current wording.

>  This is needed when
> +using tmem type guests

A reference to how to enable (or tell if it is enabled) this might be
useful? Perhaps "(i.e. those with XXXX=FOO in their domain
configuration, see xl.cfg(5))" ?

>  that can balloon up and down (self-balloon) extremely
> +quickly and the list of free memory information is stale the moment it is
> +displayed. When this mode is used there will be initially a claim set for the
> +amount of pages needed which will be fulfilled as the guest is created.

"there will be initially a claim set..." -> "an initial claim will be
set...".

The last half (which will be fulfilled) doesn't read right to me either.

        When this mode is used an initial claim will be set covering the
        number of pages which will be required as the guest is created
        
?

> +If the claim fails the guest creation process will also fail.
> +
> +Default: C<off>
> +
> +=over 4
> +
> +=item B<"off">
> +
> +No action is taken and there is no stake put for the memory.

"no stake put" is a bit clumsy.

"No claim is made. Domain creation will be attempted as normal and may
fail due to out of memory errors"  ? Not great either.

> +=item B<"on">
> +
> +Normal memory and freeable pool of ephereal pages (tmem) is used when

Ephereal again?

> +calculating whether there is enough memory free to launch a guest.
> +
> +=back
> +
>  =back
>  
>  =head1 SEE ALSO
> diff --git a/tools/examples/xl.conf b/tools/examples/xl.conf
> index 28ab796..2b64f7e 100644
> --- a/tools/examples/xl.conf
> +++ b/tools/examples/xl.conf
> @@ -20,3 +20,8 @@
>  # if disabled the old behaviour will be used, and hotplug scripts will be
>  # launched by udev.
>  #run_hotplug_scripts=1
> +#
> +# stake a claim of memory when launching a guest. This guarantees immediate
> +# feedback whether the guest can be launched due to memory exhaustion
> +# (which can take a long time to find out if launching huge guests).

These mentions of long time and huge guests would be good in the docs
too I think.

> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 5b080ed..e417851 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -94,6 +94,11 @@ libxl_tsc_mode = Enumeration("tsc_mode", [
>      (3, "native_paravirt"),
>      ])
>  
> +libxl_claim_mode = Enumeration("claim_mode", [
> +    (0, "off"),
> +    (1, "on"),
> +    ]);

Eeew, we have a perfectly serviceable bool (and even defbool if you
prefer) type. Please use one or the other instead of making up a third.

> @@ -102,6 +103,10 @@ static void parse_global_config(const char *configfile,
>      }
>      if (!xlu_cfg_get_string (config, "blkdev_start", &buf, 0))
>          blkdev_start = strdup(buf);
> +
> +    if (!xlu_cfg_get_string(config, "claim_mode", &buf, 0))
> +        libxl_parse_claim_mode(buf, &global_claim_mode);

This goes away if you don't invent your own type but FYI the IDL
provides you with a libxl_claim_mode_from_string() function
automatically.

> +
>      xlu_cfg_destroy(config);
>  }
>  
> diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
> index be6f38b..b1d434e 100644
> --- a/tools/libxl/xl.h
> +++ b/tools/libxl/xl.h
> @@ -145,6 +145,7 @@ int xl_child_pid(xlchildnum); /* returns 0 if child struct is not in use */
>  extern int autoballoon;
>  extern int run_hotplug_scripts;
>  extern int dryrun_only;
> +extern unsigned int global_claim_mode;
>  extern char *lockfile;
>  extern char *default_vifscript;
>  extern char *default_bridge;
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index a98705e..27298ea 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -757,6 +757,11 @@ static void parse_config_data(const char *config_source,
>      if (!xlu_cfg_get_long (config, "maxmem", &l, 0))
>          b_info->max_memkb = l * 1024;
>  
> +    if (global_claim_mode)
> +        b_info->claim_mode = (libxl_claim_mode)global_claim_mode;
> +    else
> +        b_info->claim_mode = 0;

This is equivalent to
	b_info->claim_mode = (libxl_claim_mode)global_claim_mode;
and AFAICT the cast is also redundant (and/or the type of the global is
wrong)

>      if (xlu_cfg_get_string (config, "on_poweroff", &buf, 0))
>          buf = "destroy";
>      if (!parse_action_on_shutdown(buf, &d_config->on_poweroff)) {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] xc: XENMEM_claim_pages outstanding claims value
  2013-03-11 14:20 ` [PATCH 4/6] xc: XENMEM_claim_pages outstanding claims value Konrad Rzeszutek Wilk
@ 2013-03-13 10:43   ` Ian Campbell
  2013-03-13 14:57     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-03-13 10:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tim (Xen.org), Dan Magenheimer, xen-devel, Keir (Xen.org), konrad

On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> From: Dan Magenheimer <dan.magenheimer@oracle.com>
> 
> This is patch provides the value of claimed pages but
> not yet accounted for in a domain (outstanding). This is a
> total global value that influences the hypervisors' MM system.
> This value - when the guest has finally been created and is
> running - ends up having the value zero. But during the
> memory populate calls the contents of this value will
> decrease.

I think you are trying to say that it jumps to some large value when a
claim is made, and then reduces as the domain's memory is populated and
eventually reaches zero (either through allocations or the claim being
released)?

> 
> Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
> [v2: s/unclaimed_pages/outstanding_pages/ per Tim's suggestion]
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  tools/libxc/xc_domain.c | 1 +
>  tools/libxc/xenctrl.h   | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index b8a345c..ab6dde5 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -234,6 +234,7 @@ int xc_domain_getinfo(xc_interface *xch,
>  
>          info->ssidref  = domctl.u.getdomaininfo.ssidref;
>          info->nr_pages = domctl.u.getdomaininfo.tot_pages;
> +        info->nr_outstanding_pages = domctl.u.getdomaininfo.outstanding_pages;
>          info->nr_shared_pages = domctl.u.getdomaininfo.shr_pages;
>          info->nr_paged_pages = domctl.u.getdomaininfo.paged_pages;
>          info->max_memkb = domctl.u.getdomaininfo.max_pages << (PAGE_SHIFT-10);
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index e695456..2a4d4df 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -364,6 +364,7 @@ typedef struct xc_dominfo {
>                    hvm:1, debugged:1;
>      unsigned int  shutdown_reason; /* only meaningful if shutdown==1 */
>      unsigned long nr_pages; /* current number, not maximum */
> +    unsigned long nr_outstanding_pages;
>      unsigned long nr_shared_pages;
>      unsigned long nr_paged_pages;
>      unsigned long shared_info_frame;

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] xl: export 'outstanding_pages' value from xcinfo
  2013-03-11 14:20 ` [PATCH 5/6] xl: export 'outstanding_pages' value from xcinfo Konrad Rzeszutek Wilk
@ 2013-03-13 10:44   ` Ian Campbell
  0 siblings, 0 replies; 32+ messages in thread
From: Ian Campbell @ 2013-03-13 10:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Tim (Xen.org), xen-devel, Keir (Xen.org), konrad

On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> This patch provides the value of claimed pages but not yet
> accounted for in a domain (outstanding). This value is initially set
> by the XENMEM_claim_pages hypercall and as the guest is being
> created its values decreases to zero.
> 
> With this patch it is possible to see the value of this
> field.
> 
> [v2: s/unclaimed/outstanding/ per Tim's suggestion]
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  tools/libxl/libxl.c         | 1 +
>  tools/libxl/libxl_types.idl | 1 +
>  tools/libxl/xl_sxp.c        | 1 +
>  3 files changed, 3 insertions(+)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 8c6e1a1..0745888 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -528,6 +528,7 @@ static void xcinfo2xlinfo(const xc_domaininfo_t *xcinfo,
>      else
>          xlinfo->shutdown_reason  = ~0;
>  
> +    xlinfo->outstanding_memkb = PAGE_TO_MEMKB(xcinfo->outstanding_pages);
>      xlinfo->current_memkb = PAGE_TO_MEMKB(xcinfo->tot_pages);
>      xlinfo->shared_memkb = PAGE_TO_MEMKB(xcinfo->shr_pages);
>      xlinfo->paged_memkb = PAGE_TO_MEMKB(xcinfo->paged_pages);
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index e417851..0a8b99a 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -201,6 +201,7 @@ libxl_dominfo = Struct("dominfo",[
>      # Otherwise set to a value guaranteed not to clash with any valid
>      # LIBXL_SHUTDOWN_REASON_* constant.
>      ("shutdown_reason", libxl_shutdown_reason),
> +    ("outstanding_memkb",  MemKB),
>      ("current_memkb",   MemKB),
>      ("shared_memkb", MemKB),
>      ("paged_memkb", MemKB),
> diff --git a/tools/libxl/xl_sxp.c b/tools/libxl/xl_sxp.c
> index a16a025..798f99d 100644
> --- a/tools/libxl/xl_sxp.c
> +++ b/tools/libxl/xl_sxp.c

Please don't patch this file, it is purely for legacy compatibility and
doesn't want or need to reflect new features.

Otherwise: Acked-by: Ian Campbell <ian.campbell@citrix.com>

> @@ -72,6 +72,7 @@ void printf_info_sexp(int domid, libxl_domain_config *d_config)
>      printf("\t(build_info)\n");
>      printf("\t(max_vcpus %d)\n", b_info->max_vcpus);
>      printf("\t(tsc_mode %s)\n", libxl_tsc_mode_to_string(b_info->tsc_mode));
> +    printf("\t(outstanding_memkb %"PRId64")\n", info.outstanding_memkb);
>      printf("\t(max_memkb %"PRId64")\n", b_info->max_memkb);
>      printf("\t(target_memkb %"PRId64")\n", b_info->target_memkb);
>      printf("\t(nomigrate %s)\n",

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xl: 'xl list' supports '-c' for global claim information.
  2013-03-11 14:20 ` [PATCH 6/6] xl: 'xl list' supports '-c' for global claim information Konrad Rzeszutek Wilk
@ 2013-03-13 10:51   ` Ian Campbell
  2013-03-13 15:02     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-03-13 10:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Tim (Xen.org), xen-devel, Keir (Xen.org), konrad

On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> When guests have XENMEM_claim_pages called, they influence a global
> counter value - which has the cumulative count of the number of
> pages requested by all domains. This value is provided via the
> XENMEM_get_unclaimed_pages hypercall. The value fluctuates quite
> often so the value is stale once it is provided to the user-space.
> However it is useful for diagnostic purposes.
> 
> [v1: s/unclaimed/outstanding/]
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  tools/libxl/libxl.c         | 12 ++++++++++++
>  tools/libxl/libxl.h         |  1 +
>  tools/libxl/libxl_types.idl |  4 ++++
>  tools/libxl/xl_cmdimpl.c    | 29 +++++++++++++++++++++++++----
>  tools/libxl/xl_cmdtable.c   |  4 +++-
>  5 files changed, 45 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 0745888..fd5d725 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -4051,6 +4051,18 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
>      return ret;
>  }
>  
> +int libxl_get_claiminfo(libxl_ctx *ctx, libxl_claiminfo *claiminfo)
> +{
> +    long l;
> +
> +    l = xc_domain_get_outstanding_pages(ctx->xch);
> +    if (l == -ENOSYS)

Does this function really return -errno and not -1 + set errno on error?

libxc is a bit crap^Wconfused in this respect but I don't see the
frobbing in the do_memory_op call path to turn from the -1+errno you get
from ioctl() into -errno which some call chains in libxc have.

> +        return l;
> +    claiminfo->claimed = l;
> +
> +    return 0;
> +}
> +
>  const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
>  {
>      union {
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 538bf93..8d0ab23 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -581,6 +581,7 @@ int libxl_wait_for_memory_target(libxl_ctx *ctx, uint32_t domid, int wait_secs);
>  
>  /* Parse the claim_mode options */
>  int libxl_parse_claim_mode(const char *s, unsigned int *flag);
> +int libxl_get_claiminfo(libxl_ctx *ctx, libxl_claiminfo *claiminfo);

Most similar interfaces in libxl seem to either return
libxl_claiminfo->claimed directly or return the struct.

Is there likely to be other info in this struct in the future?

>  
>  int libxl_vncviewer_exec(libxl_ctx *ctx, uint32_t domid, int autopass);
>  int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num, libxl_console_type type);
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 0a8b99a..aa71ed8 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -509,6 +509,10 @@ libxl_cputopology = Struct("cputopology", [
>      ("node", uint32),
>      ], dir=DIR_OUT)
>  
> +libxl_claiminfo = Struct("claiminfo", [
> +    ("claimed", uint64),

What are the units? Should this be a MemKB or ....

> +    ])
> +
>  libxl_sched_credit_params = Struct("sched_credit_params", [
>      ("tslice_ms", integer),
>      ("ratelimit_us", integer),
[...]
> @@ -4663,18 +4681,21 @@ int main_info(int argc, char **argv)
>      int opt;
>      static struct option opts[] = {
>          {"numa", 0, 0, 'n'},
> +        {"claim", 0, 0, 'c'},

Unlike numa this is just one line, I reckon it could be printed
unconditionally.

>          COMMON_LONG_OPTS,
>          {0, 0, 0, 0}
>      };
> -    int numa = 0;
> +    int numa = 0, claim = 0;
>  
> -    SWITCH_FOREACH_OPT(opt, "hn", opts, "info", 0) {
> +    SWITCH_FOREACH_OPT(opt, "hnc", opts, "info", 0) {
>      case 'n':
>          numa = 1;
>          break;
> +    case 'c':
> +        claim = 1;
>      }
>  
> -    print_info(numa);
> +    print_info(numa, claim);
>      return 0;
>  }
>  
> diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
> index b4a87ca..a48dbb0 100644
> --- a/tools/libxl/xl_cmdtable.c
> +++ b/tools/libxl/xl_cmdtable.c
> @@ -226,7 +226,9 @@ struct cmd_spec cmd_table[] = {
>      { "info",
>        &main_info, 0, 0,
>        "Get information about Xen host",
> -      "-n, --numa         List host NUMA topology information",
> +      "[-n|-c]",
> +      "-n, --numa         List host NUMA topology information\n"
> +      "-c, --claim        List claim information",
>      },
>      { "sharing",
>        &main_sharing, 0, 0,

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/6] xc: use XENMEM_claim_pages during guest creation.
  2013-03-13 10:23   ` Ian Campbell
@ 2013-03-13 14:42     ` Konrad Rzeszutek Wilk
  2013-03-13 14:46       ` Ian Campbell
  0 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-13 14:42 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Tim (Xen.org), Dan Magenheimer, xen-devel, Keir (Xen.org), konrad

On Wed, Mar 13, 2013 at 10:23:15AM +0000, Ian Campbell wrote:
> On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> > From: Dan Magenheimer <dan.magenheimer@oracle.com>
> > 
> > We add an extra parameter to the structures passed to the
> > PV routine (arch_setup_meminit) and HVM routine (setup_guest)
> > that determines whether the claim hypercall is to be done.
> > 
> > The contents of the 'claim_mode' is defined as an 'uint8_t'
> > in case the hypercall expands in the future with extra
> > flags (for example for per-NUMA allocation). For right now
> > the proper values are: 0 to disable it or 1 to enable
> > it.
> 
> At the hypercall layer mem_flags is 32 bits, while this is only 8? But
> also it seems that claim_mode at the libxc layer doesn't really
> correspond directly to mem_flags anyway, which this commentary seems to
> suggest it somehow does.
> 
> I think it would be less confusing to just have "int claim_enabled" for
> now and add a claim_flags if and when they come into being.

Yes! Thanks for spotting that.
> 
> > 
> > Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  tools/libxc/xc_dom.h           |  1 +
> >  tools/libxc/xc_dom_x86.c       | 12 ++++++++++++
> >  tools/libxc/xc_domain.c        | 24 ++++++++++++++++++++++++
> >  tools/libxc/xc_hvm_build_x86.c | 17 +++++++++++++++++
> >  tools/libxc/xenctrl.h          |  6 ++++++
> >  tools/libxc/xenguest.h         |  2 ++
> >  6 files changed, 62 insertions(+)
> > 
> > diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h
> > index 779b9d4..34a6e38 100644
> > --- a/tools/libxc/xc_dom.h
> > +++ b/tools/libxc/xc_dom.h
> > @@ -135,6 +135,7 @@ struct xc_dom_image {
> >      domid_t guest_domid;
> >      int8_t vhpt_size_log2; /* for IA64 */
> >      int8_t superpages;
> > +    uint8_t claim_mode; /* 0 by default disables it, 1 enables it */
> >      int shadow_enabled;
> >  
> >      int xen_version;
> > diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> > index eb9ac07..c1b7905 100644
> > --- a/tools/libxc/xc_dom_x86.c
> > +++ b/tools/libxc/xc_dom_x86.c
> > @@ -706,6 +706,13 @@ int arch_setup_meminit(struct xc_dom_image *dom)
> >      }
> >      else
> >      {
> > +        /* try to claim pages for early warning of insufficient memory avail */
> > +        if ( dom->claim_mode ) {
> > +            rc = xc_domain_claim_pages(dom->xch, dom->guest_domid,
> > +                                       dom->total_pages);
> > +            if ( rc )
> > +                return rc;
> > +        }
> >          /* setup initial p2m */
> >          for ( pfn = 0; pfn < dom->total_pages; pfn++ )
> >              dom->p2m_host[pfn] = pfn;
> > @@ -722,6 +729,11 @@ int arch_setup_meminit(struct xc_dom_image *dom)
> >                  dom->xch, dom->guest_domid, allocsz,
> >                  0, 0, &dom->p2m_host[i]);
> >          }
> > +
> > +        /* Ensure no unclaimed pages are left unused.
> > +         * OK to call if hadn't done the earlier claim call. */
> > +        xc_domain_claim_pages(dom->xch, dom->guest_domid,
> > +                              0 /* cancels the claim */);
> >      }
> >  
> >      return rc;
> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > index 480ce91..b8a345c 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -775,6 +775,30 @@ int xc_domain_add_to_physmap(xc_interface *xch,
> >      return do_memory_op(xch, XENMEM_add_to_physmap, &xatp, sizeof(xatp));
> >  }
> >  
> > +int xc_domain_claim_pages(xc_interface *xch,
> > +                               uint32_t domid,
> > +                               unsigned long nr_pages)
> > +{
> > +    int err;
> > +    xen_pfn_t *extent_start = NULL;
> > +    DECLARE_HYPERCALL_BOUNCE(extent_start, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> > +    struct xen_memory_reservation reservation = {
> > +        .nr_extents   = nr_pages,
> > +        .extent_order = 0,
> > +        .mem_flags    = 0, /* no flags */
> > +        .domid        = domid
> > +    };
> > +
> > +    set_xen_guest_handle(reservation.extent_start, extent_start);
> 
> This is unused? I think you just need:
> 	set_xen_guest_handle(reservation.extent_start, HYPERCALL_BUFFER_NULL);
> and drop the declaration of the bounce above.
> 
> (personally I think a new arg struct for this subop would have been more
> obvious than forcing it into the reservation struct, but what's done is
> done)
> 
> > +    err = do_memory_op(xch, XENMEM_claim_pages, &reservation, sizeof(reservation));
> > +    return err;
> > +}
> > +unsigned long xc_domain_get_outstanding_pages(xc_interface *xch)
> > +{
> > +    return do_memory_op(xch, XENMEM_get_outstanding_pages, NULL, 0);
> > +}
> > +
> >  int xc_domain_populate_physmap(xc_interface *xch,
> >                                 uint32_t domid,
> >                                 unsigned long nr_extents,
> > diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> > index 3b5d777..b6b56b3 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -252,6 +252,7 @@ static int setup_guest(xc_interface *xch,
> >      unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
> >          stat_1gb_pages = 0;
> >      int pod_mode = 0;
> > +    unsigned int claim_mode = args->claim_mode;
> >  
> >      if ( nr_pages > target_pages )
> >          pod_mode = XENMEMF_populate_on_demand;
> > @@ -329,6 +330,16 @@ static int setup_guest(xc_interface *xch,
> >          xch, dom, 0xa0, 0, pod_mode, &page_array[0x00]);
> >      cur_pages = 0xc0;
> >      stat_normal_pages = 0xc0;
> > +
> > +    /* try to claim pages for early warning of insufficient memory available */
> > +    if ( claim_mode ) {
> > +        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> > +        if ( rc != 0 )
> > +        {
> > +            PERROR("Could not allocate memory for HVM guest.");
> > +            goto error_out;
> > +        }
> > +    }
> >      while ( (rc == 0) && (nr_pages > cur_pages) )
> >      {
> >          /* Clip count to maximum 1GB extent. */
> > @@ -506,10 +517,16 @@ static int setup_guest(xc_interface *xch,
> >          munmap(page0, PAGE_SIZE);
> >      }
> >  
> > +    /* ensure no unclaimed pages are left unused */
> > +    xc_domain_claim_pages(xch, dom, 0 /* cancels the claim */);
> 
> This cannot fail?

It can (say we use an older hypervisor that does not have this subop), in
which case any of the operations would return -ENOSYS - which is OK.

But in case of the hypervisor having this implemented - yes - this call
should not fail.

Thought I should probably redo the first call to be more like:

 if (claim_mode) {

	rc =..
	if ( rc == -ENOSYS)
		rc = 0; // Whatever, hypervisor is out sync.
	if ( rc != 0 )
> 
> > +
> >      free(page_array);
> >      return 0;
> >  
> >   error_out:
> > +    /* ensure no unclaimed pages are left unused */
> > +    xc_domain_claim_pages(xch, dom, 0 /* cancels the claim */);
> > +
> >      free(page_array);
> >      return -1;
> 
> Could we consolidate the error/success paths by returning rc?

Of course.
> 
> >  }
> > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> > index 32122fd..e695456 100644
> > --- a/tools/libxc/xenctrl.h
> > +++ b/tools/libxc/xenctrl.h
> > @@ -1129,6 +1129,12 @@ int xc_domain_populate_physmap_exact(xc_interface *xch,
> >                                       unsigned int mem_flags,
> >                                       xen_pfn_t *extent_start);
> >  
> > +int xc_domain_claim_pages(xc_interface *xch,
> > +                               uint32_t domid,
> > +                               unsigned long nr_pages);
> > +
> > +unsigned long xc_domain_get_outstanding_pages(xc_interface *xch);
> > +
> >  int xc_domain_memory_exchange_pages(xc_interface *xch,
> >                                      int domid,
> >                                      unsigned long nr_in_extents,
> > diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
> > index 7d4ac33..e3c879b 100644
> > --- a/tools/libxc/xenguest.h
> > +++ b/tools/libxc/xenguest.h
> > @@ -231,6 +231,8 @@ struct xc_hvm_build_args {
> >  
> >      /* Extra SMBIOS structures passed to HVMLOADER */
> >      struct xc_hvm_firmware_module smbios_module;
> > +    /* Whether to use claim hypercall (1 - enable, 0 - disable). */
> > +    uint8_t claim_mode;
> >  };
> >  
> >  /**
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/6] xc: use XENMEM_claim_pages during guest creation.
  2013-03-13 14:42     ` Konrad Rzeszutek Wilk
@ 2013-03-13 14:46       ` Ian Campbell
  0 siblings, 0 replies; 32+ messages in thread
From: Ian Campbell @ 2013-03-13 14:46 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tim (Xen.org), Dan Magenheimer, xen-devel, Keir (Xen.org), konrad

On Wed, 2013-03-13 at 14:42 +0000, Konrad Rzeszutek Wilk wrote:
> > > @@ -506,10 +517,16 @@ static int setup_guest(xc_interface *xch,
> > >          munmap(page0, PAGE_SIZE);
> > >      }
> > >  
> > > +    /* ensure no unclaimed pages are left unused */
> > > +    xc_domain_claim_pages(xch, dom, 0 /* cancels the claim */);
> > 
> > This cannot fail?
> 
> It can (say we use an older hypervisor that does not have this subop), in
> which case any of the operations would return -ENOSYS - which is OK.

Right, that's seems ok

> But in case of the hypervisor having this implemented - yes - this call
> should not fail.
> 
> Thought I should probably redo the first call to be more like:
> 
>  if (claim_mode) {
> 
> 	rc =..
> 	if ( rc == -ENOSYS)
> 		rc = 0; // Whatever, hypervisor is out sync.
> 	if ( rc != 0 )

If I follow correctly then yes, I think that would be better.

I wonder if
+ PERROR("Could not allocate memory for HVM guest.");
and friends ought to say "claim" to distinguish from a later failure to
allocate memory?

Ian.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] xl: Implement XENMEM_claim_pages support via 'claim_mode' global config
  2013-03-13 10:41   ` Ian Campbell
@ 2013-03-13 14:57     ` Konrad Rzeszutek Wilk
  2013-03-13 15:05       ` Ian Campbell
  0 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-13 14:57 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Keir (Xen.org), xen-devel, Tim (Xen.org), konrad

On Wed, Mar 13, 2013 at 10:41:32AM +0000, Ian Campbell wrote:
> On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> > The XENMEM_claim_pages operates per domain and its usage is suppose to
> > be done system wide. As such this patch introduces a global
> > configuration option 'claim_mode' that by default is disabled.


Thanks for taking a look. I will update the patch right away. The one
particular question I have is in regards to the xl.c and using
libxl_claim_mode_from_string - as it will require adding in
#include <libxl_types.h>.

Which means that we have now:

	xl.c -> uses libxl.h
		-> libxl -> uses libxl_types.

But will now be:

	xl.c -> uses libxl.h and libxl_types
		-> libxl > uses libxl_types.


Which I am not sure if that is OK as it brings another layer of
'API' code in the xl code?

> > 
> > The problem is that when a guest is launched the process of allocating
> > memory from the hypervisor is not atomic and for large guests can take a while.
> > During which time other guests (especially self-ballooning) can take
> > balloon up / down making the free memory information available to the
> > toolstack stale.  With the "claim_mode" we can "stake a claim" for
> > exact number of pages we need to allocate the guest and we are guaranteed that
> > they will be there when we start allocating the guest.
> 
> Is it strictly "guaranteed"? AIUI there are still cases where the claim
> may succeed but domain building will not succeed (32 bit PV guests being
> the first to spring to mind). It is more correct to say it increases the
> chances of success or something like that.

Let me reword it. It will guarantee that the memory is available. However,
you pointed out to some of the edge cases (that are on the list of things
that need to be implemented, one of them being superpages)- that I should
enumerate in the document as right now for those cases (say superpages)
the claim call is not even done.

> 
> > There are two options:
> >  "on" - Use the default memory and also include the ephereal tmem pool in this
> 
>                                                      ephemeral ?
> 
> Or maybe "ethereal" ? 

It is "ephemeral/freeable"
> 
> >       calculation. This can affect temporarily negatively 'tmem' enabled guests
> >       but provides more freeable memory.
> 
> The last sentence here doesn't parse for me. "This can have a temporary
> negative effect on 'tmem'..." ? What are those temporary negative
> effects?

Let me update that with a paragraph or two.
> 
> >  "off" - (default) No claim attempt is made.
> 
> Is there a middle ground between off and on which stakes a claim but
> doesn't negatively effect tmem guests?

There was in the earlier version (v11), with a 'normal' case. But
Tim suggested tossing it out so out it went.


> 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  docs/man/xl.conf.pod.5      | 25 +++++++++++++++++++++++++
> >  tools/examples/xl.conf      |  5 +++++
> >  tools/libxl/libxl.c         |  5 +++++
> >  tools/libxl/libxl.h         |  2 ++
> >  tools/libxl/libxl_dom.c     |  3 ++-
> >  tools/libxl/libxl_types.idl |  7 ++++++-
> >  tools/libxl/xl.c            |  5 +++++
> >  tools/libxl/xl.h            |  1 +
> >  tools/libxl/xl_cmdimpl.c    |  5 +++++
> >  9 files changed, 56 insertions(+), 2 deletions(-)
> > 
> > diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
> > index 82c6b20..fded0ab 100644
> > --- a/docs/man/xl.conf.pod.5
> > +++ b/docs/man/xl.conf.pod.5
> > @@ -98,6 +98,31 @@ Configures the name of the first block device to be used for temporary
> >  block device allocations by the toolstack.
> >  The default choice is "xvda".
> >  
> > +=item B<claim_mode="off|on">
> 
> For consistency with other option this should be 
> 	=item B<claim_mode=BOOLEAN>
> (which may require code changes)
> 
> > +If enabled when a guest is created it will provide an guarantee whether the
> > +guest can be launched or not due to memory exhaustion.
> 
> "guarantee" again. But more importantly this is missing the "when". Even
> without this option you get an eventual guarantee (i.e. after the guest
> actually starts). I think you want to say something about a quick
> upfront check or something.

OK.
> 
> I would also start with "If this option is enabled then when a guest is
> created..." since it took me a couple of attempts to understand the
> current wording.
> 
> >  This is needed when
> > +using tmem type guests
> 
> A reference to how to enable (or tell if it is enabled) this might be
> useful? Perhaps "(i.e. those with XXXX=FOO in their domain
> configuration, see xl.cfg(5))" ?

OK.
> 
> >  that can balloon up and down (self-balloon) extremely
> > +quickly and the list of free memory information is stale the moment it is
> > +displayed. When this mode is used there will be initially a claim set for the
> > +amount of pages needed which will be fulfilled as the guest is created.
> 
> "there will be initially a claim set..." -> "an initial claim will be
> set...".
> 
> The last half (which will be fulfilled) doesn't read right to me either.
> 
>         When this mode is used an initial claim will be set covering the
>         number of pages which will be required as the guest is created
>         
> ?

Wow. I am impressed by how badly I wrote that.

> 
> > +If the claim fails the guest creation process will also fail.
> > +
> > +Default: C<off>
> > +
> > +=over 4
> > +
> > +=item B<"off">
> > +
> > +No action is taken and there is no stake put for the memory.
> 
> "no stake put" is a bit clumsy.
> 
> "No claim is made. Domain creation will be attempted as normal and may
> fail due to out of memory errors"  ? Not great either.
> 
> > +=item B<"on">
> > +
> > +Normal memory and freeable pool of ephereal pages (tmem) is used when
> 
> Ephereal again?
> 
> > +calculating whether there is enough memory free to launch a guest.
> > +
> > +=back
> > +
> >  =back
> >  
> >  =head1 SEE ALSO
> > diff --git a/tools/examples/xl.conf b/tools/examples/xl.conf
> > index 28ab796..2b64f7e 100644
> > --- a/tools/examples/xl.conf
> > +++ b/tools/examples/xl.conf
> > @@ -20,3 +20,8 @@
> >  # if disabled the old behaviour will be used, and hotplug scripts will be
> >  # launched by udev.
> >  #run_hotplug_scripts=1
> > +#
> > +# stake a claim of memory when launching a guest. This guarantees immediate
> > +# feedback whether the guest can be launched due to memory exhaustion
> > +# (which can take a long time to find out if launching huge guests).
> 
> These mentions of long time and huge guests would be good in the docs
> too I think.

As in other docs beside the man page for xl.conf?

> 
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 5b080ed..e417851 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -94,6 +94,11 @@ libxl_tsc_mode = Enumeration("tsc_mode", [
> >      (3, "native_paravirt"),
> >      ])
> >  
> > +libxl_claim_mode = Enumeration("claim_mode", [
> > +    (0, "off"),
> > +    (1, "on"),
> > +    ]);
> 
> Eeew, we have a perfectly serviceable bool (and even defbool if you
> prefer) type. Please use one or the other instead of making up a third.

Sure.
> 
> > @@ -102,6 +103,10 @@ static void parse_global_config(const char *configfile,
> >      }
> >      if (!xlu_cfg_get_string (config, "blkdev_start", &buf, 0))
> >          blkdev_start = strdup(buf);
> > +
> > +    if (!xlu_cfg_get_string(config, "claim_mode", &buf, 0))
> > +        libxl_parse_claim_mode(buf, &global_claim_mode);
> 
> This goes away if you don't invent your own type but FYI the IDL
> provides you with a libxl_claim_mode_from_string() function
> automatically.

It means I have to add #include <libxl_types.h> in this file.
And I was not sure whether that was OK or whether this code 'xl.c'
should only have '#include <libxl.h>'

> 
> > +
> >      xlu_cfg_destroy(config);
> >  }
> >  
> > diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
> > index be6f38b..b1d434e 100644
> > --- a/tools/libxl/xl.h
> > +++ b/tools/libxl/xl.h
> > @@ -145,6 +145,7 @@ int xl_child_pid(xlchildnum); /* returns 0 if child struct is not in use */
> >  extern int autoballoon;
> >  extern int run_hotplug_scripts;
> >  extern int dryrun_only;
> > +extern unsigned int global_claim_mode;
> >  extern char *lockfile;
> >  extern char *default_vifscript;
> >  extern char *default_bridge;
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index a98705e..27298ea 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -757,6 +757,11 @@ static void parse_config_data(const char *config_source,
> >      if (!xlu_cfg_get_long (config, "maxmem", &l, 0))
> >          b_info->max_memkb = l * 1024;
> >  
> > +    if (global_claim_mode)
> > +        b_info->claim_mode = (libxl_claim_mode)global_claim_mode;
> > +    else
> > +        b_info->claim_mode = 0;
> 
> This is equivalent to
> 	b_info->claim_mode = (libxl_claim_mode)global_claim_mode;
> and AFAICT the cast is also redundant (and/or the type of the global is
> wrong)

Yup. Will collapse it in one.
> 
> >      if (xlu_cfg_get_string (config, "on_poweroff", &buf, 0))
> >          buf = "destroy";
> >      if (!parse_action_on_shutdown(buf, &d_config->on_poweroff)) {
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] xc: XENMEM_claim_pages outstanding claims value
  2013-03-13 10:43   ` Ian Campbell
@ 2013-03-13 14:57     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-13 14:57 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Keir (Xen.org), Dan Magenheimer, xen-devel, Tim (Xen.org), konrad

On Wed, Mar 13, 2013 at 10:43:28AM +0000, Ian Campbell wrote:
> On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> > From: Dan Magenheimer <dan.magenheimer@oracle.com>
> > 
> > This is patch provides the value of claimed pages but
> > not yet accounted for in a domain (outstanding). This is a
> > total global value that influences the hypervisors' MM system.
> > This value - when the guest has finally been created and is
> > running - ends up having the value zero. But during the
> > memory populate calls the contents of this value will
> > decrease.
> 
> I think you are trying to say that it jumps to some large value when a
> claim is made, and then reduces as the domain's memory is populated and
> eventually reaches zero (either through allocations or the claim being
> released)?

<laughs>
Yes. That is what I was trying to say :-)
> 
> > 
> > Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
> > [v2: s/unclaimed_pages/outstanding_pages/ per Tim's suggestion]
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  tools/libxc/xc_domain.c | 1 +
> >  tools/libxc/xenctrl.h   | 1 +
> >  2 files changed, 2 insertions(+)
> > 
> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > index b8a345c..ab6dde5 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -234,6 +234,7 @@ int xc_domain_getinfo(xc_interface *xch,
> >  
> >          info->ssidref  = domctl.u.getdomaininfo.ssidref;
> >          info->nr_pages = domctl.u.getdomaininfo.tot_pages;
> > +        info->nr_outstanding_pages = domctl.u.getdomaininfo.outstanding_pages;
> >          info->nr_shared_pages = domctl.u.getdomaininfo.shr_pages;
> >          info->nr_paged_pages = domctl.u.getdomaininfo.paged_pages;
> >          info->max_memkb = domctl.u.getdomaininfo.max_pages << (PAGE_SHIFT-10);
> > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> > index e695456..2a4d4df 100644
> > --- a/tools/libxc/xenctrl.h
> > +++ b/tools/libxc/xenctrl.h
> > @@ -364,6 +364,7 @@ typedef struct xc_dominfo {
> >                    hvm:1, debugged:1;
> >      unsigned int  shutdown_reason; /* only meaningful if shutdown==1 */
> >      unsigned long nr_pages; /* current number, not maximum */
> > +    unsigned long nr_outstanding_pages;
> >      unsigned long nr_shared_pages;
> >      unsigned long nr_paged_pages;
> >      unsigned long shared_info_frame;
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xl: 'xl list' supports '-c' for global claim information.
  2013-03-13 10:51   ` Ian Campbell
@ 2013-03-13 15:02     ` Konrad Rzeszutek Wilk
  2013-03-13 16:03       ` Ian Campbell
  0 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-13 15:02 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Keir (Xen.org), xen-devel, Tim (Xen.org), konrad

On Wed, Mar 13, 2013 at 10:51:18AM +0000, Ian Campbell wrote:
> On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> > When guests have XENMEM_claim_pages called, they influence a global
> > counter value - which has the cumulative count of the number of
> > pages requested by all domains. This value is provided via the
> > XENMEM_get_unclaimed_pages hypercall. The value fluctuates quite
> > often so the value is stale once it is provided to the user-space.
> > However it is useful for diagnostic purposes.
> > 
> > [v1: s/unclaimed/outstanding/]
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  tools/libxl/libxl.c         | 12 ++++++++++++
> >  tools/libxl/libxl.h         |  1 +
> >  tools/libxl/libxl_types.idl |  4 ++++
> >  tools/libxl/xl_cmdimpl.c    | 29 +++++++++++++++++++++++++----
> >  tools/libxl/xl_cmdtable.c   |  4 +++-
> >  5 files changed, 45 insertions(+), 5 deletions(-)
> > 
> > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > index 0745888..fd5d725 100644
> > --- a/tools/libxl/libxl.c
> > +++ b/tools/libxl/libxl.c
> > @@ -4051,6 +4051,18 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
> >      return ret;
> >  }
> >  
> > +int libxl_get_claiminfo(libxl_ctx *ctx, libxl_claiminfo *claiminfo)
> > +{
> > +    long l;
> > +
> > +    l = xc_domain_get_outstanding_pages(ctx->xch);
> > +    if (l == -ENOSYS)
> 
> Does this function really return -errno and not -1 + set errno on error?

It should be -Exxxx. I need to double check with a hypervisor that does not
have this hypercall implemented to make sure it actually does return
a proper -E value.
> 
> libxc is a bit crap^Wconfused in this respect but I don't see the
> frobbing in the do_memory_op call path to turn from the -1+errno you get
> from ioctl() into -errno which some call chains in libxc have.

I can add it via the errno (in libxc) if you think it would be better?

> 
> > +        return l;
> > +    claiminfo->claimed = l;
> > +
> > +    return 0;
> > +}
> > +
> >  const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
> >  {
> >      union {
> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > index 538bf93..8d0ab23 100644
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -581,6 +581,7 @@ int libxl_wait_for_memory_target(libxl_ctx *ctx, uint32_t domid, int wait_secs);
> >  
> >  /* Parse the claim_mode options */
> >  int libxl_parse_claim_mode(const char *s, unsigned int *flag);
> > +int libxl_get_claiminfo(libxl_ctx *ctx, libxl_claiminfo *claiminfo);
> 
> Most similar interfaces in libxl seem to either return
> libxl_claiminfo->claimed directly or return the struct.
> 
> Is there likely to be other info in this struct in the future?

Yes and no. Originally there were the 'flags' and there were three of them:
on, off, tmem.

There was also suggestion to add NUMA capability to it. But right now the
hypercall expects _no_ flags. This could change in the future.

Do you think it would be better to just slim down this structure and if the
flags get added back, then expand this structure?

I am happy with either solution.

> 
> >  
> >  int libxl_vncviewer_exec(libxl_ctx *ctx, uint32_t domid, int autopass);
> >  int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num, libxl_console_type type);
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 0a8b99a..aa71ed8 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -509,6 +509,10 @@ libxl_cputopology = Struct("cputopology", [
> >      ("node", uint32),
> >      ], dir=DIR_OUT)
> >  
> > +libxl_claiminfo = Struct("claiminfo", [
> > +    ("claimed", uint64),
> 
> What are the units? Should this be a MemKB or ....

pages. Thought that is a bit confusing to users. So MemKB is a better
choice. Let me update that.
> 
> > +    ])
> > +
> >  libxl_sched_credit_params = Struct("sched_credit_params", [
> >      ("tslice_ms", integer),
> >      ("ratelimit_us", integer),
> [...]
> > @@ -4663,18 +4681,21 @@ int main_info(int argc, char **argv)
> >      int opt;
> >      static struct option opts[] = {
> >          {"numa", 0, 0, 'n'},
> > +        {"claim", 0, 0, 'c'},
> 
> Unlike numa this is just one line, I reckon it could be printed
> unconditionally.

OK.
> 
> >          COMMON_LONG_OPTS,
> >          {0, 0, 0, 0}
> >      };
> > -    int numa = 0;
> > +    int numa = 0, claim = 0;
> >  
> > -    SWITCH_FOREACH_OPT(opt, "hn", opts, "info", 0) {
> > +    SWITCH_FOREACH_OPT(opt, "hnc", opts, "info", 0) {
> >      case 'n':
> >          numa = 1;
> >          break;
> > +    case 'c':
> > +        claim = 1;
> >      }
> >  
> > -    print_info(numa);
> > +    print_info(numa, claim);
> >      return 0;
> >  }
> >  
> > diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
> > index b4a87ca..a48dbb0 100644
> > --- a/tools/libxl/xl_cmdtable.c
> > +++ b/tools/libxl/xl_cmdtable.c
> > @@ -226,7 +226,9 @@ struct cmd_spec cmd_table[] = {
> >      { "info",
> >        &main_info, 0, 0,
> >        "Get information about Xen host",
> > -      "-n, --numa         List host NUMA topology information",
> > +      "[-n|-c]",
> > +      "-n, --numa         List host NUMA topology information\n"
> > +      "-c, --claim        List claim information",
> >      },
> >      { "sharing",
> >        &main_sharing, 0, 0,
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] xl: Implement XENMEM_claim_pages support via 'claim_mode' global config
  2013-03-13 14:57     ` Konrad Rzeszutek Wilk
@ 2013-03-13 15:05       ` Ian Campbell
  0 siblings, 0 replies; 32+ messages in thread
From: Ian Campbell @ 2013-03-13 15:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Keir (Xen.org), xen-devel, Tim (Xen.org), konrad

On Wed, 2013-03-13 at 14:57 +0000, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 13, 2013 at 10:41:32AM +0000, Ian Campbell wrote:
> > On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> > > The XENMEM_claim_pages operates per domain and its usage is suppose to
> > > be done system wide. As such this patch introduces a global
> > > configuration option 'claim_mode' that by default is disabled.
> 
> 
> Thanks for taking a look. I will update the patch right away. The one
> particular question I have is in regards to the xl.c and using
> libxl_claim_mode_from_string - as it will require adding in
> #include <libxl_types.h>.

I don't think it should, libxl_types.h is included by libxl.h. (it's
hidden halfway through, due to the ordering constraints of the
autogenerated header.

> Which I am not sure if that is OK as it brings another layer of
> 'API' code in the xl code?

You should be good to go with just libxl.h, no need for an explicit
libxl_types include.

> > > The problem is that when a guest is launched the process of allocating
> > > memory from the hypervisor is not atomic and for large guests can take a while.
> > > During which time other guests (especially self-ballooning) can take
> > > balloon up / down making the free memory information available to the
> > > toolstack stale.  With the "claim_mode" we can "stake a claim" for
> > > exact number of pages we need to allocate the guest and we are guaranteed that
> > > they will be there when we start allocating the guest.
> > 
> > Is it strictly "guaranteed"? AIUI there are still cases where the claim
> > may succeed but domain building will not succeed (32 bit PV guests being
> > the first to spring to mind). It is more correct to say it increases the
> > chances of success or something like that.
> 
> Let me reword it. It will guarantee that the memory is available. However,
> you pointed out to some of the edge cases (that are on the list of things
> that need to be implemented, one of them being superpages)- that I should
> enumerate in the document as right now for those cases (say superpages)
> the claim call is not even done.

That might be a good idea.

> > 
> > > +If the claim fails the guest creation process will also fail.
> > > +
> > > +Default: C<off>
> > > +
> > > +=over 4
> > > +
> > > +=item B<"off">
> > > +
> > > +No action is taken and there is no stake put for the memory.
> > 
> > "no stake put" is a bit clumsy.
> > 
> > "No claim is made. Domain creation will be attempted as normal and may
> > fail due to out of memory errors"  ? Not great either.
> > 
> > > +=item B<"on">
> > > +
> > > +Normal memory and freeable pool of ephereal pages (tmem) is used when
> > 
> > Ephereal again?
> > 
> > > +calculating whether there is enough memory free to launch a guest.
> > > +
> > > +=back
> > > +
> > >  =back
> > >  
> > >  =head1 SEE ALSO
> > > diff --git a/tools/examples/xl.conf b/tools/examples/xl.conf
> > > index 28ab796..2b64f7e 100644
> > > --- a/tools/examples/xl.conf
> > > +++ b/tools/examples/xl.conf
> > > @@ -20,3 +20,8 @@
> > >  # if disabled the old behaviour will be used, and hotplug scripts will be
> > >  # launched by udev.
> > >  #run_hotplug_scripts=1
> > > +#
> > > +# stake a claim of memory when launching a guest. This guarantees immediate
> > > +# feedback whether the guest can be launched due to memory exhaustion
> > > +# (which can take a long time to find out if launching huge guests).
> > 
> > These mentions of long time and huge guests would be good in the docs
> > too I think.
> 
> As in other docs beside the man page for xl.conf?

I meant xl.conf, which isn't really explicit about this particular
aspect of the failure mode.

> > 
> > > @@ -102,6 +103,10 @@ static void parse_global_config(const char *configfile,
> > >      }
> > >      if (!xlu_cfg_get_string (config, "blkdev_start", &buf, 0))
> > >          blkdev_start = strdup(buf);
> > > +
> > > +    if (!xlu_cfg_get_string(config, "claim_mode", &buf, 0))
> > > +        libxl_parse_claim_mode(buf, &global_claim_mode);
> > 
> > This goes away if you don't invent your own type but FYI the IDL
> > provides you with a libxl_claim_mode_from_string() function
> > automatically.
> 
> It means I have to add #include <libxl_types.h> in this file.

Nope ;-)

> And I was not sure whether that was OK or whether this code 'xl.c'
> should only have '#include <libxl.h>'

Ian.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xl: 'xl list' supports '-c' for global claim information.
  2013-03-13 15:02     ` Konrad Rzeszutek Wilk
@ 2013-03-13 16:03       ` Ian Campbell
  2013-03-15 18:05         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-03-13 16:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Keir (Xen.org), xen-devel, Tim (Xen.org), konrad

On Wed, 2013-03-13 at 15:02 +0000, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 13, 2013 at 10:51:18AM +0000, Ian Campbell wrote:
> > On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> > > When guests have XENMEM_claim_pages called, they influence a global
> > > counter value - which has the cumulative count of the number of
> > > pages requested by all domains. This value is provided via the
> > > XENMEM_get_unclaimed_pages hypercall. The value fluctuates quite
> > > often so the value is stale once it is provided to the user-space.
> > > However it is useful for diagnostic purposes.
> > > 
> > > [v1: s/unclaimed/outstanding/]
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > ---
> > >  tools/libxl/libxl.c         | 12 ++++++++++++
> > >  tools/libxl/libxl.h         |  1 +
> > >  tools/libxl/libxl_types.idl |  4 ++++
> > >  tools/libxl/xl_cmdimpl.c    | 29 +++++++++++++++++++++++++----
> > >  tools/libxl/xl_cmdtable.c   |  4 +++-
> > >  5 files changed, 45 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > > index 0745888..fd5d725 100644
> > > --- a/tools/libxl/libxl.c
> > > +++ b/tools/libxl/libxl.c
> > > @@ -4051,6 +4051,18 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
> > >      return ret;
> > >  }
> > >  
> > > +int libxl_get_claiminfo(libxl_ctx *ctx, libxl_claiminfo *claiminfo)
> > > +{
> > > +    long l;
> > > +
> > > +    l = xc_domain_get_outstanding_pages(ctx->xch);
> > > +    if (l == -ENOSYS)
> > 
> > Does this function really return -errno and not -1 + set errno on error?
> 
> It should be -Exxxx. I need to double check with a hypervisor that does not
> have this hypercall implemented to make sure it actually does return
> a proper -E value.

You should probably double check the call chain to make sure it
propagates the way you think. Remember that libc (not libxc) translates
the return value of the underlying ioctl used to make the hypercall from
-Efoo into -1 + errno=Efoo. Some libxc call paths then jump through
hoops to turn this back into -Efoo :-( Some libxc calls do either
depending on the error path they take :-((

> > 
> > libxc is a bit crap^Wconfused in this respect but I don't see the
> > frobbing in the do_memory_op call path to turn from the -1+errno you get
> > from ioctl() into -errno which some call chains in libxc have.
> 
> I can add it via the errno (in libxc) if you think it would be better?

I prefer -1+errno to -errno myself but libxc is such a mess that the
important thing is that this call really does behave the way you
think...

> 
> > 
> > > +        return l;
> > > +    claiminfo->claimed = l;
> > > +
> > > +    return 0;
> > > +}
> > > +
> > >  const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
> > >  {
> > >      union {
> > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > > index 538bf93..8d0ab23 100644
> > > --- a/tools/libxl/libxl.h
> > > +++ b/tools/libxl/libxl.h
> > > @@ -581,6 +581,7 @@ int libxl_wait_for_memory_target(libxl_ctx *ctx, uint32_t domid, int wait_secs);
> > >  
> > >  /* Parse the claim_mode options */
> > >  int libxl_parse_claim_mode(const char *s, unsigned int *flag);
> > > +int libxl_get_claiminfo(libxl_ctx *ctx, libxl_claiminfo *claiminfo);
> > 
> > Most similar interfaces in libxl seem to either return
> > libxl_claiminfo->claimed directly or return the struct.
> > 
> > Is there likely to be other info in this struct in the future?
> 
> Yes and no. Originally there were the 'flags' and there were three of them:
> on, off, tmem.

Wouldn't those be per-claim properties? Whereas this call is returning
global claim state?

> There was also suggestion to add NUMA capability to it. But right now the
> hypercall expects _no_ flags. This could change in the future.
> 
> Do you think it would be better to just slim down this structure and if the
> flags get added back, then expand this structure?
> 
> I am happy with either solution.

Looking again it seems there is precedent for both ways, so how you have
it should be fine.

BTW libxl_get_claiminfo should probably call the init function before
filling in the values.

Ian.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xl: 'xl list' supports '-c' for global claim information.
  2013-03-13 16:03       ` Ian Campbell
@ 2013-03-15 18:05         ` Konrad Rzeszutek Wilk
  2013-03-18  9:42           ` Ian Campbell
  0 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-15 18:05 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Tim (Xen.org), xen-devel, Keir (Xen.org), konrad

On Wed, Mar 13, 2013 at 04:03:14PM +0000, Ian Campbell wrote:
> On Wed, 2013-03-13 at 15:02 +0000, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 13, 2013 at 10:51:18AM +0000, Ian Campbell wrote:
> > > On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> > > > When guests have XENMEM_claim_pages called, they influence a global
> > > > counter value - which has the cumulative count of the number of
> > > > pages requested by all domains. This value is provided via the
> > > > XENMEM_get_unclaimed_pages hypercall. The value fluctuates quite
> > > > often so the value is stale once it is provided to the user-space.
> > > > However it is useful for diagnostic purposes.
> > > > 
> > > > [v1: s/unclaimed/outstanding/]
> > > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > > ---
> > > >  tools/libxl/libxl.c         | 12 ++++++++++++
> > > >  tools/libxl/libxl.h         |  1 +
> > > >  tools/libxl/libxl_types.idl |  4 ++++
> > > >  tools/libxl/xl_cmdimpl.c    | 29 +++++++++++++++++++++++++----
> > > >  tools/libxl/xl_cmdtable.c   |  4 +++-
> > > >  5 files changed, 45 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > > > index 0745888..fd5d725 100644
> > > > --- a/tools/libxl/libxl.c
> > > > +++ b/tools/libxl/libxl.c
> > > > @@ -4051,6 +4051,18 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
> > > >      return ret;
> > > >  }
> > > >  
> > > > +int libxl_get_claiminfo(libxl_ctx *ctx, libxl_claiminfo *claiminfo)
> > > > +{
> > > > +    long l;
> > > > +
> > > > +    l = xc_domain_get_outstanding_pages(ctx->xch);
> > > > +    if (l == -ENOSYS)
> > > 
> > > Does this function really return -errno and not -1 + set errno on error?
> > 
> > It should be -Exxxx. I need to double check with a hypervisor that does not
> > have this hypercall implemented to make sure it actually does return
> > a proper -E value.
> 
> You should probably double check the call chain to make sure it
> propagates the way you think. Remember that libc (not libxc) translates
> the return value of the underlying ioctl used to make the hypercall from
> -Efoo into -1 + errno=Efoo. Some libxc call paths then jump through
> hoops to turn this back into -Efoo :-( Some libxc calls do either
> depending on the error path they take :-((

It looks as it gets the righ value:
> xl create vm-hvm.cfg 
Parsing config from vm-hvm.cfg
WARNING: ignoring "kernel" directive for HVM guest. Use "firmware_override" instead if you really want a non-default firmware
xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->000000000019c844
  Modules:       0000000000000000->0000000000000000
  TOTAL:         0000000000000000->00000000ff800000
  ENTRY ADDRESS: 0000000000100000
xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (38 = Function not implemented): Internal error
libxl: error: libxl_dom.c:613:libxl__build_hvm: hvm building failed
libxl: error: libxl_create.c:907:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl_dm.c:1250:libxl__destroy_device_model: could not find device-model's pid for dom 1
libxl: error: libxl.c:1415:libxl__destroy_domid: libxl__destroy_device_model failed for 1

18:02:45 # 22 :/OVS/seed_pool/konrad/ 
> echo $?
3

and when running 'xl info' through strace:

ioctl(5, SNDCTL_DSP_RESET, 0x7fffb7a4ebe0) = -1 ENOSYS (Function not implemented)
write(2, "libxl_get_claiminfo failed.\n", 28libxl_get_claiminfo failed.
) = 28

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xl: 'xl list' supports '-c' for global claim information.
  2013-03-15 18:05         ` Konrad Rzeszutek Wilk
@ 2013-03-18  9:42           ` Ian Campbell
  2013-03-18 13:10             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-03-18  9:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Tim (Xen.org), xen-devel, Keir (Xen.org), konrad

On Fri, 2013-03-15 at 18:05 +0000, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 13, 2013 at 04:03:14PM +0000, Ian Campbell wrote:
> > On Wed, 2013-03-13 at 15:02 +0000, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Mar 13, 2013 at 10:51:18AM +0000, Ian Campbell wrote:
> > > > On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> > > > > When guests have XENMEM_claim_pages called, they influence a global
> > > > > counter value - which has the cumulative count of the number of
> > > > > pages requested by all domains. This value is provided via the
> > > > > XENMEM_get_unclaimed_pages hypercall. The value fluctuates quite
> > > > > often so the value is stale once it is provided to the user-space.
> > > > > However it is useful for diagnostic purposes.
> > > > > 
> > > > > [v1: s/unclaimed/outstanding/]
> > > > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > > > ---
> > > > >  tools/libxl/libxl.c         | 12 ++++++++++++
> > > > >  tools/libxl/libxl.h         |  1 +
> > > > >  tools/libxl/libxl_types.idl |  4 ++++
> > > > >  tools/libxl/xl_cmdimpl.c    | 29 +++++++++++++++++++++++++----
> > > > >  tools/libxl/xl_cmdtable.c   |  4 +++-
> > > > >  5 files changed, 45 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > > > > index 0745888..fd5d725 100644
> > > > > --- a/tools/libxl/libxl.c
> > > > > +++ b/tools/libxl/libxl.c
> > > > > @@ -4051,6 +4051,18 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
> > > > >      return ret;
> > > > >  }
> > > > >  
> > > > > +int libxl_get_claiminfo(libxl_ctx *ctx, libxl_claiminfo *claiminfo)
> > > > > +{
> > > > > +    long l;
> > > > > +
> > > > > +    l = xc_domain_get_outstanding_pages(ctx->xch);
> > > > > +    if (l == -ENOSYS)
> > > > 
> > > > Does this function really return -errno and not -1 + set errno on error?
> > > 
> > > It should be -Exxxx. I need to double check with a hypervisor that does not
> > > have this hypercall implemented to make sure it actually does return
> > > a proper -E value.
> > 
> > You should probably double check the call chain to make sure it
> > propagates the way you think. Remember that libc (not libxc) translates
> > the return value of the underlying ioctl used to make the hypercall from
> > -Efoo into -1 + errno=Efoo. Some libxc call paths then jump through
> > hoops to turn this back into -Efoo :-( Some libxc calls do either
> > depending on the error path they take :-((
> 
> It looks as it gets the righ value:
> > xl create vm-hvm.cfg 
> Parsing config from vm-hvm.cfg
> WARNING: ignoring "kernel" directive for HVM guest. Use "firmware_override" instead if you really want a non-default firmware
> xc: info: VIRTUAL MEMORY ARRANGEMENT:
>   Loader:        0000000000100000->000000000019c844
>   Modules:       0000000000000000->0000000000000000
>   TOTAL:         0000000000000000->00000000ff800000
>   ENTRY ADDRESS: 0000000000100000
> xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (38 = Function not implemented): Internal error

I don't think this path uses xc_domain_get_outstanding_pages does it? I
expect it is calling xc_domain_claim_pages(). This message calls PERROR
which uses errno not -err, which in a round about way shows that
xc_domain_get_outstanding_pages does set errno and return -1 (since both
use do_memory_op) and doesn't return -errno (i.e. it "proves" the
opposite to what you were trying to prove...)

> libxl: error: libxl_dom.c:613:libxl__build_hvm: hvm building failed
> libxl: error: libxl_create.c:907:domcreate_rebuild_done: cannot (re-)build domain: -3
> libxl: error: libxl_dm.c:1250:libxl__destroy_device_model: could not find device-model's pid for dom 1
> libxl: error: libxl.c:1415:libxl__destroy_domid: libxl__destroy_device_model failed for 1
> 
> 18:02:45 # 22 :/OVS/seed_pool/konrad/ 
> > echo $?
> 3
> 
> and when running 'xl info' through strace:
> 
> ioctl(5, SNDCTL_DSP_RESET, 0x7fffb7a4ebe0) = -1 ENOSYS (Function not implemented)
> write(2, "libxl_get_claiminfo failed.\n", 28libxl_get_claiminfo failed.
> ) = 28

Does this print leak into the xl info output? Would be preferable to
keep it silent I think, or to integrate cleanly with the output (e..g y
outputting "0" or "-" or...)

Ian.


> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] xl: 'xl list' supports '-c' for global claim information.
  2013-03-18  9:42           ` Ian Campbell
@ 2013-03-18 13:10             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-18 13:10 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Tim (Xen.org), xen-devel, Keir (Xen.org), konrad

On Mon, Mar 18, 2013 at 09:42:14AM +0000, Ian Campbell wrote:
> On Fri, 2013-03-15 at 18:05 +0000, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 13, 2013 at 04:03:14PM +0000, Ian Campbell wrote:
> > > On Wed, 2013-03-13 at 15:02 +0000, Konrad Rzeszutek Wilk wrote:
> > > > On Wed, Mar 13, 2013 at 10:51:18AM +0000, Ian Campbell wrote:
> > > > > On Mon, 2013-03-11 at 14:20 +0000, Konrad Rzeszutek Wilk wrote:
> > > > > > When guests have XENMEM_claim_pages called, they influence a global
> > > > > > counter value - which has the cumulative count of the number of
> > > > > > pages requested by all domains. This value is provided via the
> > > > > > XENMEM_get_unclaimed_pages hypercall. The value fluctuates quite
> > > > > > often so the value is stale once it is provided to the user-space.
> > > > > > However it is useful for diagnostic purposes.
> > > > > > 
> > > > > > [v1: s/unclaimed/outstanding/]
> > > > > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > > > > ---
> > > > > >  tools/libxl/libxl.c         | 12 ++++++++++++
> > > > > >  tools/libxl/libxl.h         |  1 +
> > > > > >  tools/libxl/libxl_types.idl |  4 ++++
> > > > > >  tools/libxl/xl_cmdimpl.c    | 29 +++++++++++++++++++++++++----
> > > > > >  tools/libxl/xl_cmdtable.c   |  4 +++-
> > > > > >  5 files changed, 45 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > > > > > index 0745888..fd5d725 100644
> > > > > > --- a/tools/libxl/libxl.c
> > > > > > +++ b/tools/libxl/libxl.c
> > > > > > @@ -4051,6 +4051,18 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
> > > > > >      return ret;
> > > > > >  }
> > > > > >  
> > > > > > +int libxl_get_claiminfo(libxl_ctx *ctx, libxl_claiminfo *claiminfo)
> > > > > > +{
> > > > > > +    long l;
> > > > > > +
> > > > > > +    l = xc_domain_get_outstanding_pages(ctx->xch);
> > > > > > +    if (l == -ENOSYS)
> > > > > 
> > > > > Does this function really return -errno and not -1 + set errno on error?
> > > > 
> > > > It should be -Exxxx. I need to double check with a hypervisor that does not
> > > > have this hypercall implemented to make sure it actually does return
> > > > a proper -E value.
> > > 
> > > You should probably double check the call chain to make sure it
> > > propagates the way you think. Remember that libc (not libxc) translates
> > > the return value of the underlying ioctl used to make the hypercall from
> > > -Efoo into -1 + errno=Efoo. Some libxc call paths then jump through
> > > hoops to turn this back into -Efoo :-( Some libxc calls do either
> > > depending on the error path they take :-((
> > 
> > It looks as it gets the righ value:
> > > xl create vm-hvm.cfg 
> > Parsing config from vm-hvm.cfg
> > WARNING: ignoring "kernel" directive for HVM guest. Use "firmware_override" instead if you really want a non-default firmware
> > xc: info: VIRTUAL MEMORY ARRANGEMENT:
> >   Loader:        0000000000100000->000000000019c844
> >   Modules:       0000000000000000->0000000000000000
> >   TOTAL:         0000000000000000->00000000ff800000
> >   ENTRY ADDRESS: 0000000000100000
> > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (38 = Function not implemented): Internal error
> 
> I don't think this path uses xc_domain_get_outstanding_pages does it? I
> expect it is calling xc_domain_claim_pages(). This message calls PERROR

Right, xc_domain_claim_pges.
> which uses errno not -err, which in a round about way shows that
> xc_domain_get_outstanding_pages does set errno and return -1 (since both
> use do_memory_op) and doesn't return -errno (i.e. it "proves" the
> opposite to what you were trying to prove...)

Yes. I realized that after I sent the email - and then redid the code to
take care of that.
> 
> > libxl: error: libxl_dom.c:613:libxl__build_hvm: hvm building failed
> > libxl: error: libxl_create.c:907:domcreate_rebuild_done: cannot (re-)build domain: -3
> > libxl: error: libxl_dm.c:1250:libxl__destroy_device_model: could not find device-model's pid for dom 1
> > libxl: error: libxl.c:1415:libxl__destroy_domid: libxl__destroy_device_model failed for 1
> > 
> > 18:02:45 # 22 :/OVS/seed_pool/konrad/ 
> > > echo $?
> > 3
> > 
> > and when running 'xl info' through strace:
> > 
> > ioctl(5, SNDCTL_DSP_RESET, 0x7fffb7a4ebe0) = -1 ENOSYS (Function not implemented)
> > write(2, "libxl_get_claiminfo failed.\n", 28libxl_get_claiminfo failed.
> > ) = 28
> 
> Does this print leak into the xl info output? Would be preferable to
> keep it silent I think, or to integrate cleanly with the output (e..g y
> outputting "0" or "-" or...)


Yup. Cleaned it up.
> 
> Ian.
> 
> 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
  2013-03-08 20:07             ` Konrad Rzeszutek Wilk
@ 2013-03-11  9:25               ` Tim Deegan
  0 siblings, 0 replies; 32+ messages in thread
From: Tim Deegan @ 2013-03-11  9:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: dan.magenheimer, xen-devel, keir, keir.xen

At 15:07 -0500 on 08 Mar (1362755233), Konrad Rzeszutek Wilk wrote:
> Right so if we eliminate the flags and the 'need_free_memory' this
> becomes a lot easier.
> 
> Here is the patch (there is of course more of the toolstack changes - those
> I can post later).

This looks good to me.  Thanks!

Tim.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
  2013-03-07 15:56           ` Tim Deegan
@ 2013-03-08 20:07             ` Konrad Rzeszutek Wilk
  2013-03-11  9:25               ` Tim Deegan
  0 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-08 20:07 UTC (permalink / raw)
  To: Tim Deegan; +Cc: dan.magenheimer, xen-devel, keir, keir.xen

On Thu, Mar 07, 2013 at 03:56:35PM +0000, Tim Deegan wrote:
> At 10:36 -0500 on 06 Mar (1362566187), Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 06, 2013 at 09:07:46AM +0000, Tim Deegan wrote:
> > > Oh I see.  That's pretty strange semantics for a 'claim', though.
> > > Wouldn't it make sense for the toolstack just to query free and claimed
> > > memory on the first pass and fail if there's not enough space?
> > 
> > So do something like this:
> > 
> > 	if ( dom->claim_enabled ) {
> > 		unsigned long outstanding = xc_domain_get_outstanding_pages(dom->xch);
> > 		xc_physinfo_t xcphysinfo = { 0 };
> > 		int flag = XENMEMF_claim_normal;
> > 	
> > 		rc = xc_physinfo(dom->xch, &xcphysinfo);
> > 
> > 		if (xcphysinfo.total_pages + outstanding > dom->total_pages)
> > 			flag = XENMEMF_claim_tmem;
> > 
> > 		rc = xc_domain_claim_pages(dom->xch, dom->guest_domid, dom->total_pages,
> > 					   flag);
> > 	}
> > 
> > (Ignorning the checks for 'rc' and bailing out as neccessary)
> 
> No, I meant to get rid of the XENMEMF_claim_* flags altogether (so all
> claims are 'tmem' claims) and, at a suitable level in the toolstack, do
> something like this:
> 
>   LOCK
>   free := [XEN_SYSCTL_physinfo]->free
>   claimed := [XENMEM_get_outstanding_claims]
>   IF need_free_memory AND free - claimed < needed THEN
>     DESPAIR (claim would need tmem freeable pages)
>   ELSE
>     IF [XENMEM_claim](needed) THEN
>       REJOICE
>     ELSE
>       DESPAIR (could not claim enough memory)
>     ENDIF
>   ENDIF
>   UNLOCK
> 
> (using whatever wrappers around the hypercalls are appropriate)

Right so if we eliminate the flags and the 'need_free_memory' this
becomes a lot easier.

Here is the patch (there is of course more of the toolstack changes - those
I can post later).

diff --git a/xen/common/domain.c b/xen/common/domain.c
index b360de1..64ee29d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -507,6 +507,7 @@ int domain_kill(struct domain *d)
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem);
+        domain_set_outstanding_pages(d, 0);
         d->tmem = NULL;
         /* fallthrough */
     case DOMDYING_dying:
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index b7f6619..c98e99c 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -154,6 +154,7 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
 
     info->tot_pages         = d->tot_pages;
     info->max_pages         = d->max_pages;
+    info->outstanding_pages = d->outstanding_pages;
     info->shr_pages         = atomic_read(&d->shr_pages);
     info->paged_pages       = atomic_read(&d->paged_pages);
     info->shared_info_frame = mfn_to_gmfn(d, virt_to_mfn(d->shared_info));
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 08550ef..3cfc1e3 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -712,6 +712,39 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case XENMEM_claim_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&reservation, arg, 1) )
+            return -EFAULT;
+
+        if ( !guest_handle_is_null(reservation.extent_start) )
+            return -EINVAL;
+
+        if ( reservation.extent_order != 0 )
+            return -EINVAL;
+
+        if ( reservation.mem_flags != 0 )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_id(reservation.domid);
+        if ( d == NULL )
+            return -EINVAL;
+
+        rc = domain_set_outstanding_pages(d, reservation.nr_extents);
+
+        rcu_unlock_domain(d);
+
+        break;
+
+    case XENMEM_get_outstanding_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        rc = get_outstanding_claims();
+        break;
+
     default:
         rc = arch_memory_op(op, arg);
         break;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e9fb15..aefef29 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -252,11 +252,114 @@ static long midsize_alloc_zone_pages;
 #define MIDSIZE_ALLOC_FRAC 128
 
 static DEFINE_SPINLOCK(heap_lock);
+static long outstanding_claims; /* total outstanding claims by all domains */
 
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
 {
+    long dom_before, dom_after, dom_claimed, sys_before, sys_after;
+
     ASSERT(spin_is_locked(&d->page_alloc_lock));
-    return d->tot_pages += pages;
+    d->tot_pages += pages;
+
+    /*
+     * can test d->claimed_pages race-free because it can only change
+     * if d->page_alloc_lock and heap_lock are both held, see also
+     * domain_set_outstanding_pages below
+     */
+    if ( !d->outstanding_pages )
+        goto out;
+
+    spin_lock(&heap_lock);
+    /* adjust domain outstanding pages; may not go negative */
+    dom_before = d->outstanding_pages;
+    dom_after = dom_before - pages;
+    BUG_ON(dom_before < 0);
+    dom_claimed = dom_after < 0 ? 0 : dom_after;
+    d->outstanding_pages = dom_claimed;
+    /* flag accounting bug if system outstanding_claims would go negative */
+    sys_before = outstanding_claims;
+    sys_after = sys_before - (dom_before - dom_claimed);
+    BUG_ON(sys_after < 0);
+    outstanding_claims = sys_after;
+    spin_unlock(&heap_lock);
+
+out:
+    return d->tot_pages;
+}
+
+int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
+{
+    int ret = -ENOMEM;
+    unsigned long claim, avail_pages;
+
+    /*
+     * take the domain's page_alloc_lock, else all d->tot_page adjustments
+     * must always take the global heap_lock rather than only in the much
+     * rarer case that d->outstanding_pages is non-zero
+     */
+    spin_lock(&d->page_alloc_lock);
+    spin_lock(&heap_lock);
+
+    /* pages==0 means "unset" the claim. */
+    if ( pages == 0 )
+    {
+        outstanding_claims -= d->outstanding_pages;
+        d->outstanding_pages = 0;
+        ret = 0;
+        goto out;
+    }
+
+    /* only one active claim per domain please */
+    if ( d->outstanding_pages )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* disallow a claim not exceeding current tot_pages or above max_pages */
+    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* how much memory is available? */
+    avail_pages = total_avail_pages;
+
+    /* Note: The usage of claim means that allocation from a guest *might*
+     * have to come from freeable memory. Using free memory is always better, if
+     * it is available, than using freeable memory.
+     *
+     * But that is OK as once the claim has been made, it still can take minutes
+     * before the claim is fully satisfied. Tmem can make use of the unclaimed
+     * pages during this time (to store ephemeral/freeable pages only,
+     * not persistent pages).
+     */
+    avail_pages += tmem_freeable_pages();
+    avail_pages -= outstanding_claims;
+
+    /*
+     * Note, if domain has already allocated memory before making a claim
+     * then the claim must take tot_pages into account
+     */
+    claim = pages - d->tot_pages;
+    if ( claim > avail_pages )
+        goto out;
+
+    /* yay, claim fits in available memory, stake the claim, success! */
+    d->outstanding_pages = claim;
+    outstanding_claims += d->outstanding_pages;
+    ret = 0;
+
+out:
+    spin_unlock(&heap_lock);
+    spin_unlock(&d->page_alloc_lock);
+    return ret;
+}
+
+long get_outstanding_claims(void)
+{
+    return outstanding_claims;
 }
 
 static unsigned long init_node_heap(int node, unsigned long mfn,
@@ -397,7 +500,7 @@ static void __init setup_low_mem_virq(void)
 static void check_low_mem_virq(void)
 {
     unsigned long avail_pages = total_avail_pages +
-        (opt_tmem ? tmem_freeable_pages() : 0);
+        (opt_tmem ? tmem_freeable_pages() : 0) - outstanding_claims;
 
     if ( unlikely(avail_pages <= low_mem_virq_th) )
     {
@@ -466,6 +569,15 @@ static struct page_info *alloc_heap_pages(
     spin_lock(&heap_lock);
 
     /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (outstanding_claims + request >
+          total_avail_pages + tmem_freeable_pages()) &&
+          (d == NULL || d->outstanding_pages < request) )
+        goto not_found;
+
+    /*
      * TMEM: When available memory is scarce due to tmem absorbing it, allow
      * only mid-size allocations to avoid worst of fragmentation issues.
      * Others try tmem pools then fail.  This is a workaround until all
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index deb19db..113b8dc 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -36,7 +36,7 @@
 #include "grant_table.h"
 #include "hvm/save.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000008
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000009
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -95,6 +95,7 @@ struct xen_domctl_getdomaininfo {
     uint32_t flags;              /* XEN_DOMINF_* */
     uint64_aligned_t tot_pages;
     uint64_aligned_t max_pages;
+    uint64_aligned_t outstanding_pages;
     uint64_aligned_t shr_pages;
     uint64_aligned_t paged_pages;
     uint64_aligned_t shared_info_frame; /* GMFN of shared_info struct */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 1c5ca19..51d5254 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -68,6 +68,8 @@ struct xen_memory_reservation {
      *   IN:  GPFN bases of extents to populate with memory
      *   OUT: GMFN bases of extents that were allocated
      *   (NB. This command also updates the mach_to_phys translation table)
+     * XENMEM_claim_pages:
+     *   IN: must be zero
      */
     XEN_GUEST_HANDLE(xen_pfn_t) extent_start;
 
@@ -430,10 +432,39 @@ typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
 
 /*
- * Reserve ops for future/out-of-tree "claim" patches (Oracle)
+ * Attempt to stake a claim for a domain on a quantity of pages
+ * of system RAM, but _not_ assign specific pageframes.  Only
+ * arithmetic is performed so the hypercall is very fast and need
+ * not be preemptible, thus sidestepping time-of-check-time-of-use
+ * races for memory allocation.  Returns 0 if the hypervisor page
+ * allocator has atomically and successfully claimed the requested
+ * number of pages, else non-zero.
+ *
+ * Any domain may have only one active claim.  When sufficient memory
+ * has been allocated to resolve the claim, the claim silently expires.
+ * Claiming zero pages effectively resets any outstanding claim and
+ * is always successful.
+ *
+ * Note that a valid claim may be staked even after memory has been
+ * allocated for a domain.  In this case, the claim is not incremental,
+ * i.e. if the domain's tot_pages is 3, and a claim is staked for 10,
+ * only 7 additional pages are claimed.
+ *
+ * Caller must be privileged or the hypercall fails.
  */
 #define XENMEM_claim_pages                  24
-#define XENMEM_get_unclaimed_pages          25
+
+/*
+ * XENMEM_claim_pages flags - the are no flags at this time.
+ * The zero value is appropiate.
+ */
+
+/*
+ * Get the number of pages currently claimed (but not yet "possessed")
+ * across all domains.  The caller must be privileged but otherwise
+ * the call never fails.
+ */
+#define XENMEM_get_outstanding_pages        25
 
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 2f701f5..28512fb 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -49,7 +49,10 @@ void free_xenheap_pages(void *v, unsigned int order);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
 
+/* Claim handling */
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages);
+int domain_set_outstanding_pages(struct domain *d, unsigned long pages);
+long get_outstanding_claims(void);
 
 /* Domain suballocator. These functions are *not* interrupt-safe.*/
 void init_domheap_pages(paddr_t ps, paddr_t pe);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e108436..569e76e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -242,6 +242,7 @@ struct domain
     struct page_list_head page_list;  /* linked list */
     struct page_list_head xenpage_list; /* linked list (size xenheap_pages) */
     unsigned int     tot_pages;       /* number of pages currently possesed */
+    unsigned int     outstanding_pages; /* pages claimed but not possessed  */
     unsigned int     max_pages;       /* maximum value for tot_pages        */
     atomic_t         shr_pages;       /* number of shared pages             */
     atomic_t         paged_pages;     /* number of paged-out pages          */
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
  2013-03-06 15:36         ` Konrad Rzeszutek Wilk
  2013-03-07 14:59           ` Konrad Rzeszutek Wilk
@ 2013-03-07 15:56           ` Tim Deegan
  2013-03-08 20:07             ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 32+ messages in thread
From: Tim Deegan @ 2013-03-07 15:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: dan.magenheimer, xen-devel, keir, keir.xen

At 10:36 -0500 on 06 Mar (1362566187), Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 06, 2013 at 09:07:46AM +0000, Tim Deegan wrote:
> > Oh I see.  That's pretty strange semantics for a 'claim', though.
> > Wouldn't it make sense for the toolstack just to query free and claimed
> > memory on the first pass and fail if there's not enough space?
> 
> So do something like this:
> 
> 	if ( dom->claim_enabled ) {
> 		unsigned long outstanding = xc_domain_get_outstanding_pages(dom->xch);
> 		xc_physinfo_t xcphysinfo = { 0 };
> 		int flag = XENMEMF_claim_normal;
> 	
> 		rc = xc_physinfo(dom->xch, &xcphysinfo);
> 
> 		if (xcphysinfo.total_pages + outstanding > dom->total_pages)
> 			flag = XENMEMF_claim_tmem;
> 
> 		rc = xc_domain_claim_pages(dom->xch, dom->guest_domid, dom->total_pages,
> 					   flag);
> 	}
> 
> (Ignorning the checks for 'rc' and bailing out as neccessary)

No, I meant to get rid of the XENMEMF_claim_* flags altogether (so all
claims are 'tmem' claims) and, at a suitable level in the toolstack, do
something like this:

  LOCK
  free := [XEN_SYSCTL_physinfo]->free
  claimed := [XENMEM_get_outstanding_claims]
  IF need_free_memory AND free - claimed < needed THEN
    DESPAIR (claim would need tmem freeable pages)
  ELSE
    IF [XENMEM_claim](needed) THEN
      REJOICE
    ELSE
      DESPAIR (could not claim enough memory)
    ENDIF
  ENDIF
  UNLOCK

(using whatever wrappers around the hypercalls are appropriate)

If tmem consumes free memory after the test but before the claim, that's
OK: it's equivalent to using a XENMEMF_claim_normal claim and
tmem allocating the same memory immediately _after_ the claim.

The lock is only there to stop two parallel instances claiming the same
free memory.  Or you could invert it, for the optimistic approach:

  IF NOT [XENMEM_claim](needed) THEN
    DESPAIR (could not claim enough memory)
  ENDIF
  free := [XEN_SYSCTL_physinfo]->free
  claimed := [XENMEM_get_outstanding_claims]
  IF need_free_memory AND free < claimed THEN
    [XENMEM_claim](0)
    DESPAIR (claim would need tmem freeable pages)
  ELSE

Not _quite_ the same semantics you had before, but good enough, no?
Especially in the presence of the widely advertised high-speed
allocations.

> The location where the claim call is in the libxc toolstack (it seemed
> the most appropiate as it deals with the populate calls). There is no locking
> semantics at all in that library - that is something the other libraries - 
> such as libxl, have. The libxl has the lock, but it would be a coarse lock
> as it would be around:
> 
> 	xc_hvm_build and xc_dom_mem_init
> 

AIUI, xl already has a lock around all of domain creation, so that's not
making anything worse.  But if libxc exposes the claim operations to
libxl, then libxl could do the locking just around the claim.

In any case, with my grumpy kernel hacker's hat on, odd architecture in
the tools is no excuse for bizarre hypercall semantics. :)  

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
  2013-03-06 15:36         ` Konrad Rzeszutek Wilk
@ 2013-03-07 14:59           ` Konrad Rzeszutek Wilk
  2013-03-07 15:56           ` Tim Deegan
  1 sibling, 0 replies; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-07 14:59 UTC (permalink / raw)
  To: Tim Deegan; +Cc: dan.magenheimer, xen-devel, keir, keir.xen

> > > The big thing is *might*. I put this in the code path to explain better:
> > > 
> > >   /* note; The usage of tmem claim means that allocation from a guest *might*
> > > +     * have to come from freeable memory. Using free memory is always better, if
> > > +     * it is available, than using freeable memory. This flag is for the use
> > > +     * case where the toolstack cannot be constantly aware of the exact current
> > > +     * value of free and/or freeable on each machine and is multi-machine
> > > +     * capable. It can try/fail a "normal" claim on all machines first then,
> > > +     * and if the normal claim on all machines fail, then "fallback" to a
> > > +     * tmem-flag type claim.
> > 
> > Oh I see.  That's pretty strange semantics for a 'claim', though.
> > Wouldn't it make sense for the toolstack just to query free and claimed
> > memory on the first pass and fail if there's not enough space?
> 
> So do something like this:
> 
> 	if ( dom->claim_enabled ) {
> 		unsigned long outstanding = xc_domain_get_outstanding_pages(dom->xch);
> 		xc_physinfo_t xcphysinfo = { 0 };
> 		int flag = XENMEMF_claim_normal;
> 	
> 		rc = xc_physinfo(dom->xch, &xcphysinfo);
> 
> 		if (xcphysinfo.total_pages + outstanding > dom->total_pages)
> 			flag = XENMEMF_claim_tmem;
> 
> 		rc = xc_domain_claim_pages(dom->xch, dom->guest_domid, dom->total_pages,
> 					   flag);
> 	}
> 
> (Ignorning the checks for 'rc' and bailing out as neccessary)
> 
> > The race between that query and the claim call (i.e. enough
> > actually-free space at query time but only freeable memory at claim
> > time) isn't materially different from the claim succeeding and then tmem
> > consuming the memory.  And a race between multiple claims can be
> > trivially avoided with a mutex in toolstack.
> 
> The location where the claim call is in the libxc toolstack (it seemed
> the most appropiate as it deals with the populate calls). There is no locking
> semantics at all in that library - that is something the other libraries - 
> such as libxl, have. The libxl has the lock, but it would be a coarse lock
> as it would be around:

..bla bla..

The other thought is just to drop the XENMEMF_claim_normal flag
and just have XENMEMF_claim_tmem flag and use that by default.

I think that will simplify this a lot more. Let me prep a patch
and have it ready by tomorrow.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
  2013-03-06  9:07       ` Tim Deegan
@ 2013-03-06 15:36         ` Konrad Rzeszutek Wilk
  2013-03-07 14:59           ` Konrad Rzeszutek Wilk
  2013-03-07 15:56           ` Tim Deegan
  0 siblings, 2 replies; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-06 15:36 UTC (permalink / raw)
  To: Tim Deegan; +Cc: dan.magenheimer, xen-devel, keir, keir.xen

On Wed, Mar 06, 2013 at 09:07:46AM +0000, Tim Deegan wrote:
> Hi, 
> 
> At 16:43 -0500 on 05 Mar (1362501800), Konrad Rzeszutek Wilk wrote:
> > > The 'normal' restriction isn't actually enforced except at claim time.
> > > Since the gate in alloc_heap_pages is effectively:
> > 
> > The big thing is *might*. I put this in the code path to explain better:
> > 
> >   /* note; The usage of tmem claim means that allocation from a guest *might*
> > +     * have to come from freeable memory. Using free memory is always better, if
> > +     * it is available, than using freeable memory. This flag is for the use
> > +     * case where the toolstack cannot be constantly aware of the exact current
> > +     * value of free and/or freeable on each machine and is multi-machine
> > +     * capable. It can try/fail a "normal" claim on all machines first then,
> > +     * and if the normal claim on all machines fail, then "fallback" to a
> > +     * tmem-flag type claim.
> 
> Oh I see.  That's pretty strange semantics for a 'claim', though.
> Wouldn't it make sense for the toolstack just to query free and claimed
> memory on the first pass and fail if there's not enough space?

So do something like this:

	if ( dom->claim_enabled ) {
		unsigned long outstanding = xc_domain_get_outstanding_pages(dom->xch);
		xc_physinfo_t xcphysinfo = { 0 };
		int flag = XENMEMF_claim_normal;
	
		rc = xc_physinfo(dom->xch, &xcphysinfo);

		if (xcphysinfo.total_pages + outstanding > dom->total_pages)
			flag = XENMEMF_claim_tmem;

		rc = xc_domain_claim_pages(dom->xch, dom->guest_domid, dom->total_pages,
					   flag);
	}

(Ignorning the checks for 'rc' and bailing out as neccessary)

> The race between that query and the claim call (i.e. enough
> actually-free space at query time but only freeable memory at claim
> time) isn't materially different from the claim succeeding and then tmem
> consuming the memory.  And a race between multiple claims can be
> trivially avoided with a mutex in toolstack.

The location where the claim call is in the libxc toolstack (it seemed
the most appropiate as it deals with the populate calls). There is no locking
semantics at all in that library - that is something the other libraries - 
such as libxl, have. The libxl has the lock, but it would be a coarse lock
as it would be around:

	xc_hvm_build and xc_dom_mem_init

which are the libxc calls that also call populate physmap. So in essence
we would be back to part of the original problem - trying to launch
multiple guests in parallel and hitting a stall for minutes when one
guest is being populated (as both of these would do the claim and also
in a loop call populate_physmap).

The other idea I had was to stick the claim call right outside those two
calls:

	take lock
	xc_domain_claim_pages(...)
	release lock
	xc_dom_mem_init() (or xc_hvm_build)
	xc_domain_claim_pages(..., XENMEM_claim_cancel)

which works great for PV, but not so well for HVM as there is a bunch
of the PoD code in there that subtracts how many pages we need.
(referring to setup_Guest in xc_hvm_build_x86.c)

Unfortunatly it looks a bit odd - the PV code (libxl__build_pv) is all
about:
	xc_dom_something
	xc_dom_something_other
	--> xc_domain_claim_pages <--
	xc_dom_something_other
	--> xc_domain_claim_pages <--
	xc_dom_something_other

and then this oddity of xc_domain_claim_pages stuck in between.
At which point it looks more natural to have it inside the
xc_dom_something calls. Especially as the API of libxc looks
to be structured around - xc_dom_* are the top level which
then calls a variety of xc_domain_* as needed.


> 
> > +     * The memory claimed by a normal claim isn't enforced against "freeable
> > +     * pages" once the claim is successful. That's by design. Once the claim
> > +     * has been made, it still can take minutes before the claim is fully
> > +     * satisfied. Tmem can make use of the unclaimed pages during this time
> > +     * (to store ephemeral/freeable pages only, not persistent pages).
> > +     */
> > 
> > Here is the updated patch (with the long git commit description removed).
> 
> Thanks.  Could you also replace the other uses of "unclaimed" (in
> interfaces & comments)?

Naturally. Let me get that out to you tomorrow.
> 
> Cheers,
> 
> Tim.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
  2013-03-05 21:43     ` Konrad Rzeszutek Wilk
@ 2013-03-06  9:07       ` Tim Deegan
  2013-03-06 15:36         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 32+ messages in thread
From: Tim Deegan @ 2013-03-06  9:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: dan.magenheimer, xen-devel, keir, keir.xen

Hi, 

At 16:43 -0500 on 05 Mar (1362501800), Konrad Rzeszutek Wilk wrote:
> > The 'normal' restriction isn't actually enforced except at claim time.
> > Since the gate in alloc_heap_pages is effectively:
> 
> The big thing is *might*. I put this in the code path to explain better:
> 
>   /* note; The usage of tmem claim means that allocation from a guest *might*
> +     * have to come from freeable memory. Using free memory is always better, if
> +     * it is available, than using freeable memory. This flag is for the use
> +     * case where the toolstack cannot be constantly aware of the exact current
> +     * value of free and/or freeable on each machine and is multi-machine
> +     * capable. It can try/fail a "normal" claim on all machines first then,
> +     * and if the normal claim on all machines fail, then "fallback" to a
> +     * tmem-flag type claim.

Oh I see.  That's pretty strange semantics for a 'claim', though.
Wouldn't it make sense for the toolstack just to query free and claimed
memory on the first pass and fail if there's not enough space?

The race between that query and the claim call (i.e. enough
actually-free space at query time but only freeable memory at claim
time) isn't materially different from the claim succeeding and then tmem
consuming the memory.  And a race between multiple claims can be
trivially avoided with a mutex in toolstack.

> +     * The memory claimed by a normal claim isn't enforced against "freeable
> +     * pages" once the claim is successful. That's by design. Once the claim
> +     * has been made, it still can take minutes before the claim is fully
> +     * satisfied. Tmem can make use of the unclaimed pages during this time
> +     * (to store ephemeral/freeable pages only, not persistent pages).
> +     */
> 
> Here is the updated patch (with the long git commit description removed).

Thanks.  Could you also replace the other uses of "unclaimed" (in
interfaces & comments)?

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
  2013-03-05 12:01   ` Tim Deegan
@ 2013-03-05 21:43     ` Konrad Rzeszutek Wilk
  2013-03-06  9:07       ` Tim Deegan
  0 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-05 21:43 UTC (permalink / raw)
  To: Tim Deegan; +Cc: dan.magenheimer, xen-devel, keir, keir.xen

On Tue, Mar 05, 2013 at 12:01:56PM +0000, Tim Deegan wrote:
> Hi,
> 
> At 12:47 -0500 on 04 Mar (1362401229), Konrad Rzeszutek Wilk wrote:
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -252,11 +252,124 @@ static long midsize_alloc_zone_pages;
> >  #define MIDSIZE_ALLOC_FRAC 128
> >  
> >  static DEFINE_SPINLOCK(heap_lock);
> > +static long total_unclaimed_pages; /* total outstanding claims by all domains */
> 
> Could this field (& associated per-domain stuff) have a better name?

Yes. d->outstanding_pages ?

> AFAICT from the code, this is a count of _claimed_ pages (specifically,
> claimed but not allocated).  It caused me to double-take almost every
> time I saw it used in this patch.

The word 'unclaimed' does not help either.

> 
> How about outstanding_claimed_pages, or just outstanding_claims?

outstanding_claims is a great name.

> 
> >  unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
> >  {
> > +    long dom_before, dom_after, dom_claimed, sys_before, sys_after;
> > +
> >      ASSERT(spin_is_locked(&d->page_alloc_lock));
> > -    return d->tot_pages += pages;
> > +    d->tot_pages += pages;
> > +
> > +    /*
> > +     * can test d->claimed_pages race-free because it can only change
> > +     * if d->page_alloc_lock and heap_lock are both held, see also
> > +     * domain_set_unclaimed_pages below
> > +     */
> > +    if ( !d->unclaimed_pages )
> > +        goto out;
> > +
> > +    spin_lock(&heap_lock);
> > +    /* adjust domain unclaimed_pages; may not go negative */
> > +    dom_before = d->unclaimed_pages;
> > +    dom_after = dom_before - pages;
> > +    if ( (dom_before > 0) && (dom_after < 0) )
> > +        dom_claimed = 0;
> 
> Why the test for dom_before > 0 ?  I think it might be better to
> BUG_ON() any of these counts ever being negative, rather than to carry
> on regardless.

I put in a BUG_ON( dom_before <= 0 ) and simplified that logic to be

    BUG_ON(dom_before < 0);
    dom_claimed = dom_after < 0 ? 0 : dom_after;


> 
> Or is this meant to be a cunning way of handling the case where 
> sizeof (long) == 4 and unclaimed_pages > 2^31 ?  I suspect that
> will fall foul of the undefinedness of signed overflow.
> 
> > +    else
> > +        dom_claimed = dom_after;
> > +    d->unclaimed_pages = dom_claimed;
> > +    /* flag accounting bug if system unclaimed_pages would go negative */
> > +    sys_before = total_unclaimed_pages;
> > +    sys_after = sys_before - (dom_before - dom_claimed);
> > +    BUG_ON((sys_before > 0) && (sys_after < 0));
> 
> Same question here.

That can certainly be simplified to be:
	+    BUG_ON(sys_after < 0);

> 
> > +/*
> > + * XENMEM_claim_pages flags:
> > + *  normal: claim is successful only if sufficient free pages
> > + *    are available.
> > + *  tmem: claim is successful only if sufficient free pages
> > + *    are available and (if tmem is enabled) hypervisor
> > + *    may also consider tmem "freeable" pages to satisfy the claim.
> 
> The 'normal' restriction isn't actually enforced except at claim time.
> Since the gate in alloc_heap_pages is effectively:
> 
>   (request > free + freeable - total_reserved)
> 
> later allocations (from any source) can take up all the free memory as
> long as freeable >= total_reserved).
> 
> Is there a use for it?  Presumably by turning on tmem you're happy with
> the idea that allocations might have to come from freeable memory?


The big thing is *might*. I put this in the code path to explain better:

  /* note; The usage of tmem claim means that allocation from a guest *might*
+     * have to come from freeable memory. Using free memory is always better, if
+     * it is available, than using freeable memory. This flag is for the use
+     * case where the toolstack cannot be constantly aware of the exact current
+     * value of free and/or freeable on each machine and is multi-machine
+     * capable. It can try/fail a "normal" claim on all machines first then,
+     * and if the normal claim on all machines fail, then "fallback" to a
+     * tmem-flag type claim.
+     *
+     * The memory claimed by a normal claim isn't enforced against "freeable
+     * pages" once the claim is successful. That's by design. Once the claim
+     * has been made, it still can take minutes before the claim is fully
+     * satisfied. Tmem can make use of the unclaimed pages during this time
+     * (to store ephemeral/freeable pages only, not persistent pages).
+     */

Here is the updated patch (with the long git commit description removed).

diff --git a/xen/common/domain.c b/xen/common/domain.c
index b360de1..90ce40b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -507,6 +507,7 @@ int domain_kill(struct domain *d)
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem);
+        domain_set_unclaimed_pages(d, 0, 0);
         d->tmem = NULL;
         /* fallthrough */
     case DOMDYING_dying:
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index b7f6619..c98e99c 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -154,6 +154,7 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
 
     info->tot_pages         = d->tot_pages;
     info->max_pages         = d->max_pages;
+    info->outstanding_pages = d->outstanding_pages;
     info->shr_pages         = atomic_read(&d->shr_pages);
     info->paged_pages       = atomic_read(&d->paged_pages);
     info->shared_info_frame = mfn_to_gmfn(d, virt_to_mfn(d->shared_info));
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 08550ef..aa698b1 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -712,6 +712,37 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case XENMEM_claim_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&reservation, arg, 1) )
+            return -EFAULT;
+
+        if ( !guest_handle_is_null(reservation.extent_start) )
+            return -EINVAL;
+
+        if ( reservation.extent_order != 0 )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_id(reservation.domid);
+        if ( d == NULL )
+            return -EINVAL;
+
+        rc = domain_set_unclaimed_pages(d, reservation.nr_extents,
+                                        reservation.mem_flags);
+
+        rcu_unlock_domain(d);
+
+        break;
+
+    case XENMEM_get_unclaimed_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        rc = get_outstanding_claims();
+        break;
+
     default:
         rc = arch_memory_op(op, arg);
         break;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e9fb15..73e2392 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -252,11 +252,137 @@ static long midsize_alloc_zone_pages;
 #define MIDSIZE_ALLOC_FRAC 128
 
 static DEFINE_SPINLOCK(heap_lock);
+static long outstanding_claims; /* total outstanding claims by all domains */
 
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
 {
+    long dom_before, dom_after, dom_claimed, sys_before, sys_after;
+
     ASSERT(spin_is_locked(&d->page_alloc_lock));
-    return d->tot_pages += pages;
+    d->tot_pages += pages;
+
+    /*
+     * can test d->claimed_pages race-free because it can only change
+     * if d->page_alloc_lock and heap_lock are both held, see also
+     * domain_set_unclaimed_pages below
+     */
+    if ( !d->outstanding_pages )
+        goto out;
+
+    spin_lock(&heap_lock);
+    /* adjust domain outstanding pages; may not go negative */
+    dom_before = d->outstanding_pages;
+    dom_after = dom_before - pages;
+    BUG_ON(dom_before < 0);
+    dom_claimed = dom_after < 0 ? 0 : dom_after;
+    d->outstanding_pages = dom_claimed;
+    /* flag accounting bug if system outstanding_claims would go negative */
+    sys_before = outstanding_claims;
+    sys_after = sys_before - (dom_before - dom_claimed);
+    BUG_ON(sys_after < 0);
+    outstanding_claims = sys_after;
+    spin_unlock(&heap_lock);
+
+out:
+    return d->tot_pages;
+}
+
+int domain_set_unclaimed_pages(struct domain *d, unsigned long pages,
+                                unsigned long flags)
+{
+    int ret = -ENOMEM;
+    unsigned long claim, avail_pages;
+
+    /*
+     * Sanity check.
+     */
+    switch ( flags ) {
+    case XENMEM_CLAIMF_normal:
+    case XENMEM_CLAIMF_tmem:
+        if ( pages == 0 )
+            return -EINVAL;
+        break;
+    case XENMEM_CLAIMF_cancel:
+        pages = 0;
+        break;
+    default:
+    return -ENOSYS;
+    }
+
+    /*
+     * take the domain's page_alloc_lock, else all d->tot_page adjustments
+     * must always take the global heap_lock rather than only in the much
+     * rarer case that d->outstanding_pages is non-zero
+     */
+    spin_lock(&d->page_alloc_lock);
+    spin_lock(&heap_lock);
+
+    /* pages==0 means "unset" the claim. */
+    if ( pages == 0 )
+    {
+        outstanding_claims -= d->outstanding_pages;
+        d->outstanding_pages = 0;
+        ret = 0;
+        goto out;
+    }
+
+    /* only one active claim per domain please */
+    if ( d->outstanding_pages )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* disallow a claim not exceeding current tot_pages or above max_pages */
+    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* how much memory is available? */
+    avail_pages = total_avail_pages;
+    /* note; The usage of tmem claim means that allocation from a guest *might*
+     * have to come from freeable memory. Using free memory is always better, if
+     * it is available, than using freeable memory. This flag is for the use
+     * case where the toolstack cannot be constantly aware of the exact current
+     * value of free and/or freeable on each machine and is multi-machine
+     * capable. It can try/fail a "normal" claim on all machines first then,
+     * and if the normal claim on all machines fail, then "fallback" to a
+     * tmem-flag type claim.
+     *
+     * The memory claimed by a normal claim isn't enforced against "freeable
+     * pages" once the claim is successful. That's by design. Once the claim
+     * has been made, it still can take minutes before the claim is fully
+     * satisfied. Tmem can make use of the unclaimed pages during this time
+     * (to store ephemeral/freeable pages only, not persistent pages).
+     */
+    if ( flags & XENMEM_CLAIMF_tmem )
+        avail_pages += tmem_freeable_pages();
+    avail_pages -= outstanding_claims;
+
+    /*
+     * note, if domain has already allocated memory before making a claim
+     * then the claim must take tot_pages into account
+     */
+    claim = pages - d->tot_pages;
+    if ( claim > avail_pages )
+        goto out;
+
+    /* yay, claim fits in available memory, stake the claim, success! */
+    d->outstanding_pages = claim;
+    outstanding_claims += d->outstanding_pages;
+    ret = 0;
+
+out:
+    spin_unlock(&heap_lock);
+    spin_unlock(&d->page_alloc_lock);
+    return ret;
+}
+
+long get_outstanding_claims(void)
+{
+    return outstanding_claims;
 }
 
 static unsigned long init_node_heap(int node, unsigned long mfn,
@@ -397,7 +523,7 @@ static void __init setup_low_mem_virq(void)
 static void check_low_mem_virq(void)
 {
     unsigned long avail_pages = total_avail_pages +
-        (opt_tmem ? tmem_freeable_pages() : 0);
+        (opt_tmem ? tmem_freeable_pages() : 0) - outstanding_claims;
 
     if ( unlikely(avail_pages <= low_mem_virq_th) )
     {
@@ -466,6 +592,15 @@ static struct page_info *alloc_heap_pages(
     spin_lock(&heap_lock);
 
     /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (outstanding_claims + request >
+          total_avail_pages + tmem_freeable_pages()) &&
+          (d == NULL || d->outstanding_pages < request) )
+        goto not_found;
+
+    /*
      * TMEM: When available memory is scarce due to tmem absorbing it, allow
      * only mid-size allocations to avoid worst of fragmentation issues.
      * Others try tmem pools then fail.  This is a workaround until all
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index deb19db..113b8dc 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -36,7 +36,7 @@
 #include "grant_table.h"
 #include "hvm/save.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000008
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000009
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -95,6 +95,7 @@ struct xen_domctl_getdomaininfo {
     uint32_t flags;              /* XEN_DOMINF_* */
     uint64_aligned_t tot_pages;
     uint64_aligned_t max_pages;
+    uint64_aligned_t outstanding_pages;
     uint64_aligned_t shr_pages;
     uint64_aligned_t paged_pages;
     uint64_aligned_t shared_info_frame; /* GMFN of shared_info struct */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 1c5ca19..5c264ca 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -68,6 +68,8 @@ struct xen_memory_reservation {
      *   IN:  GPFN bases of extents to populate with memory
      *   OUT: GMFN bases of extents that were allocated
      *   (NB. This command also updates the mach_to_phys translation table)
+     * XENMEM_claim_pages:
+     *   IN: must be zero
      */
     XEN_GUEST_HANDLE(xen_pfn_t) extent_start;
 
@@ -430,10 +432,51 @@ typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
 
 /*
- * Reserve ops for future/out-of-tree "claim" patches (Oracle)
+ * Attempt to stake a claim for a domain on a quantity of pages
+ * of system RAM, but _not_ assign specific pageframes.  Only
+ * arithmetic is performed so the hypercall is very fast and need
+ * not be preemptible, thus sidestepping time-of-check-time-of-use
+ * races for memory allocation.  Returns 0 if the hypervisor page
+ * allocator has atomically and successfully claimed the requested
+ * number of pages, else non-zero.
+ *
+ * Any domain may have only one active claim.  When sufficient memory
+ * has been allocated to resolve the claim, the claim silently expires.
+ * Claiming zero pages effectively resets any outstanding claim and
+ * is always successful.
+ *
+ * Note that a valid claim may be staked even after memory has been
+ * allocated for a domain.  In this case, the claim is not incremental,
+ * i.e. if the domain's tot_pages is 3, and a claim is staked for 10,
+ * only 7 additional pages are claimed.
+ *
+ * Caller must be privileged or the hypercall fails.
  */
 #define XENMEM_claim_pages                  24
-#define XENMEM_get_unclaimed_pages          25
+
+/*
+ * XENMEM_claim_pages flags:
+ *  normal: claim is successful only if sufficient free pages
+ *    are available.
+ *  tmem: claim is successful only if sufficient free pages
+ *    are available and (if tmem is enabled) hypervisor
+ *    may also consider tmem "freeable" pages to satisfy the claim.
+ *  cancel: cancel the outstanding claim for the domain.
+ */
+#define XENMEM_CLAIMF_ignored                0
+
+#define _XENMEM_CLAIMF_normal                1
+#define XENMEM_CLAIMF_normal                 (1U<<_XENMEM_CLAIMF_normal)
+#define _XENMEM_CLAIMF_tmem                  2
+#define XENMEM_CLAIMF_tmem                   (1U<<_XENMEM_CLAIMF_tmem)
+#define _XENMEM_CLAIMF_cancel                3
+#define XENMEM_CLAIMF_cancel                 (1U<<_XENMEM_CLAIMF_cancel)
+/*
+ * Get the number of pages currently claimed (but not yet "possessed")
+ * across all domains.  The caller must be privileged but otherwise
+ * the call never fails.
+ */
+#define XENMEM_get_unclaimed_pages            25
 
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 2f701f5..8a355b6 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -49,7 +49,11 @@ void free_xenheap_pages(void *v, unsigned int order);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
 
+/* Claim handling */
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages);
+int domain_set_unclaimed_pages(
+    struct domain *d, unsigned long pages, unsigned long flags);
+long get_outstanding_claims(void);
 
 /* Domain suballocator. These functions are *not* interrupt-safe.*/
 void init_domheap_pages(paddr_t ps, paddr_t pe);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e108436..569e76e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -242,6 +242,7 @@ struct domain
     struct page_list_head page_list;  /* linked list */
     struct page_list_head xenpage_list; /* linked list (size xenheap_pages) */
     unsigned int     tot_pages;       /* number of pages currently possesed */
+    unsigned int     outstanding_pages; /* pages claimed but not possessed  */
     unsigned int     max_pages;       /* maximum value for tot_pages        */
     atomic_t         shr_pages;       /* number of shared pages             */
     atomic_t         paged_pages;     /* number of paged-out pages          */

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
  2013-03-04 17:47 ` [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops) Konrad Rzeszutek Wilk
  2013-03-04 21:50   ` Keir Fraser
@ 2013-03-05 12:01   ` Tim Deegan
  2013-03-05 21:43     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 32+ messages in thread
From: Tim Deegan @ 2013-03-05 12:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: dan.magenheimer, xen-devel, keir, keir.xen

Hi,

At 12:47 -0500 on 04 Mar (1362401229), Konrad Rzeszutek Wilk wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -252,11 +252,124 @@ static long midsize_alloc_zone_pages;
>  #define MIDSIZE_ALLOC_FRAC 128
>  
>  static DEFINE_SPINLOCK(heap_lock);
> +static long total_unclaimed_pages; /* total outstanding claims by all domains */

Could this field (& associated per-domain stuff) have a better name?
AFAICT from the code, this is a count of _claimed_ pages (specifically,
claimed but not allocated).  It caused me to double-take almost every
time I saw it used in this patch.

How about outstanding_claimed_pages, or just outstanding_claims?

>  unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
>  {
> +    long dom_before, dom_after, dom_claimed, sys_before, sys_after;
> +
>      ASSERT(spin_is_locked(&d->page_alloc_lock));
> -    return d->tot_pages += pages;
> +    d->tot_pages += pages;
> +
> +    /*
> +     * can test d->claimed_pages race-free because it can only change
> +     * if d->page_alloc_lock and heap_lock are both held, see also
> +     * domain_set_unclaimed_pages below
> +     */
> +    if ( !d->unclaimed_pages )
> +        goto out;
> +
> +    spin_lock(&heap_lock);
> +    /* adjust domain unclaimed_pages; may not go negative */
> +    dom_before = d->unclaimed_pages;
> +    dom_after = dom_before - pages;
> +    if ( (dom_before > 0) && (dom_after < 0) )
> +        dom_claimed = 0;

Why the test for dom_before > 0 ?  I think it might be better to
BUG_ON() any of these counts ever being negative, rather than to carry
on regardless.

Or is this meant to be a cunning way of handling the case where 
sizeof (long) == 4 and unclaimed_pages > 2^31 ?  I suspect that
will fall foul of the undefinedness of signed overflow.

> +    else
> +        dom_claimed = dom_after;
> +    d->unclaimed_pages = dom_claimed;
> +    /* flag accounting bug if system unclaimed_pages would go negative */
> +    sys_before = total_unclaimed_pages;
> +    sys_after = sys_before - (dom_before - dom_claimed);
> +    BUG_ON((sys_before > 0) && (sys_after < 0));

Same question here.

> +/*
> + * XENMEM_claim_pages flags:
> + *  normal: claim is successful only if sufficient free pages
> + *    are available.
> + *  tmem: claim is successful only if sufficient free pages
> + *    are available and (if tmem is enabled) hypervisor
> + *    may also consider tmem "freeable" pages to satisfy the claim.

The 'normal' restriction isn't actually enforced except at claim time.
Since the gate in alloc_heap_pages is effectively:

  (request > free + freeable - total_reserved)

later allocations (from any source) can take up all the free memory as
long as freeable >= total_reserved).

Is there a use for it?  Presumably by turning on tmem you're happy with
the idea that allocations might have to come from freeable memory?

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
  2013-03-04 17:47 ` [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops) Konrad Rzeszutek Wilk
@ 2013-03-04 21:50   ` Keir Fraser
  2013-03-05 12:01   ` Tim Deegan
  1 sibling, 0 replies; 32+ messages in thread
From: Keir Fraser @ 2013-03-04 21:50 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel; +Cc: dan.magenheimer

This patch is fine by me at this point.

Acked-by: Keir Fraser <keir@xen.org>

On 04/03/2013 17:47, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:

> From: Dan Magenheimer <dan.magenheimer@oracle.com>
> 
> When guests memory consumption is volatile (multiple guests
> ballooning up/down) we are presented with the problem of
> being able to determine exactly how much memory there is
> for allocation of new guests without negatively impacting
> existing guests. Note that the existing models (xapi, xend)
> drive the memory consumption from the tool-stack and assume
> that the guest will eventually hit the memory target. Other
> models, such as the dynamic memory utilized by tmem, do this
> differently - the guest drivers the memory consumption (up
> to the d->max_pages ceiling). With dynamic memory model, the
> guest frequently can balloon up and down as it sees fit.
> This presents the problem to the toolstack that it does not
> know atomically how much free memory there is (as the information
> gets stale the moment the d->tot_pages information is provided
> to the tool-stack), and hence when starting a guest can fail
> during the memory creation process. Especially if the process
> is done in parallel. In a nutshell what we need is a atomic
> value of all domains tot_pages during the allocation of guests.
> Naturally holding a lock for such a long time is unacceptable.
> Hence the goal of this hypercall is to attempt to atomically and very
> quickly determine if there are sufficient pages available in the
> system and, if so, "set aside" that quantity of pages for future
> allocations by that domain.  Unlike an existing hypercall such as
> increase_reservation or populate_physmap, specific physical
> pageframes are not assigned to the domain because this
> cannot be done sufficiently quickly (especially for very large
> allocations in an arbitrarily fragmented system) and so the
> existing mechanisms result in classic time-of-check-time-of-use
> (TOCTOU) races.  One can think of claiming as similar to a
> "lazy" allocation, but subsequent hypercalls are required
> to do the actual physical pageframe allocation.
> 
> Note that one of effects of this hypercall is that from the
> perspective of other running guests -  suddenly there is
> a new guest occupying X amount of pages. This means that when
> we try to balloon up they will hit the system-wide ceiling of
> available free memory (if the total sum of the existing d->max_pages
>> = host memory). This is OK - as that is part of the overcommit.
> What we DO NOT want to do is dictate their ceiling should be
> (d->max_pages) as that is risky and can lead to guests OOM-ing.
> It is something the guest needs to figure out.
> 
> In order for a toolstack to "get" information about whether
> a domain has a claim and, if so, how large, and also for
> the toolstack to measure the total system-wide claim, a
> second subop has been added and exposed through domctl
> and libxl (see "xen: XENMEM_claim_pages: xc").
> 
> == Alternative solutions ==
> There has been a variety of discussion whether the problem
> hypercall is solving can be done in user-space, such as:
>  - For all the existing guest, set their d->max_pages temporarily
>    to d->tot_pages and create the domain. This forces those
>    domains to stay at their current consumption level (fyi, this is what
>    the tmem freeze call is doing). The disadvantage of this is
>    that needlessly forces the guests to stay at the memory usage
>    instead of allowing it to decide the optimal target.
>  - Account only using d->max_pages of how much free memory there is.
>    This ignores ballooning changes and any over-commit scenario. This
>    is similar to the scenario where the sum of all d->max_pages (and
>    the one to be allocated now) on the host is smaller than the available
>    free memory. As such it ignores the over-commit problem.
>  - Provide a ring/FIFO along with event channel to notify an userspace
>    daemon of guests memory consumption. This daemon can then provide
>    up-to-date information to the toolstack of how much free memory
>    there is. This duplicates what the hypervisor is already doing and
>    introduced latency issues and catching breath for the toolstack as there
>    might be millions of these updates on heavily used machine. There might
>    not be any quiescent state ever and the toolstack will heavily consume
>    CPU cycles and not ever provide up-to-date information.
> 
> It has been noted that this claim mechanism solves the
> underlying problem (slow failure of domain creation) for
> a large class of domains but not all, specifically not
> handling (but also not making the problem worse for) PV
> domains that specify the "superpages" flag, and 32-bit PV
> domains on large RAM systems.  These will be addressed at a
> later time.
> 
> Code overview:
> 
> Though the hypercall simply does arithmetic within locks,
> some of the semantics in the code may be a bit subtle.
> 
> The key variables (d->unclaimed_pages and total_unclaimed_pages)
> starts at zero if no claim has yet been staked for any domain.
> (Perhaps a better name is "claimed_but_not_yet_possessed" but that's
> a bit unwieldy.)  If no claim hypercalls are executed, there
> should be no impact on existing usage.
> 
> When a claim is successfully staked by a domain, it is like a
> watermark but there is no record kept of the size of the claim.
> Instead, d->unclaimed_pages is set to the difference between
> d->tot_pages and the claim.  When d->tot_pages increases or decreases,
> d->unclaimed_pages atomically decreases or increases.  Once
> d->unclaimed_pages reaches zero, the claim is satisfied and
> d->unclaimed pages stays at zero -- unless a new claim is
> subsequently staked.
> 
> The systemwide variable total_unclaimed_pages is always the sum
> of d->unclaimed_pages, across all domains.  A non-domain-
> specific heap allocation will fail if total_unclaimed_pages
> exceeds free (plus, on tmem enabled systems, freeable) pages.
> 
> Claim semantics may be modified by flags.  The initial
> implementation has three flag, which discerns whether the
> caller would like tmem freeable pages to be considered
> in determining whether or not the claim can be successfully
> staked. Future flags may, for example, specify that the
> implementation has one flag, which discerns whether the
> caller would like tmem freeable pages to be considered
> in determining whether or not the claim can be successfully
> staked. Future flags may, for example, specify that the
> caller would like the claim to be successful only if there
> are sufficient pages available on a single node (per Dario's
> suggestion).
> 
> A claim can be cancelled by requesting a claim with the
> flag being zero (XENMEM_CLAIMF_ignored).
> 
> A second subop returns the total outstanding claimed pages
> systemwide.
> 
> Note: Save/restore/migrate may need to be modified,
> else it can be documented that all claims are cancelled.
> 
> ==
> This patch of the proposed XENMEM_claim_pages hypercall/subop, takes
> into account review feedback from Jan and Keir and IanC and Matthew Daley,
> plus some fixes found via runtime debugging.
> 
> v9->v10:
>  - Altered the API a bit (new flags, etc).
> 
> v8->v9:
> - Drop snippets already merged by Keir
>   (26269:21a5b181f8ad and 26270:03cb71bc32f9)
> 
> v7->v8:
> - Replace missing tot_pages adjustment [JBeulich]
> - Combine domain_inc/dec_tot_pages into one function [JBeulich]
> - Added a few more clarifying comments [djm]
> 
> v6->v7:
> - Need co-existence with low_mem_virq [JBeulich]
> 
> v5->v6:
> - Fix post-inc problem [mattjd]
> 
> v4->v5:
> - Split tools part into separate patch [JBeulich]
> - Minor coding style fixups [JBeulich]
> - Use rcu_lock_domain_by_id, not _target version [JBeulich]
> 
> v3->v4: (please ignore v3)
> - Process error, sent stale patch, sorry [djm]
> 
> v2->v3:
> - Add per-domain and global "get" for unclaimed info [JBeulich]
> - New hypercall(s) should fail for unpriv callers [IanC]
> - Non-zero extent_order disallowed [JBeulich]
> - Remove bonehead ASSERTs [JBeulich]
> - Need not hold heaplock for decrease case too [JBeulich]
> - More descriptive failure return values [JBeulich]
> - Don't allow a claim to exceed max_pages [IanC]
> - Subops must be in correct ifdef block in memory.c [keir]
> 
> v1->v2:
> - Add reset-to-zero page claim in domain_kill [JBeulich]
> - Proper handling of struct passed to hypercall [JBeulich]
> - Fix alloc_heap_pages when a domain has a claim [JBeulich]
> - Need not hold heap_lock if !d->unclaimed_pages [keir]
> - Fix missed tot_pages call in donate_page [djm]
> - Remove domain_reset_unclaimed_pages; use set with zero [djm]
> - Bugfixes found through testing in set_unclaimed [djm]
> - More comments in code [djm]
> - Code formatting fixes [djm]
> ===
> 
> Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  xen/common/domain.c         |   1 +
>  xen/common/domctl.c         |   1 +
>  xen/common/memory.c         |  31 +++++++++++
>  xen/common/page_alloc.c     | 126
> +++++++++++++++++++++++++++++++++++++++++++-
>  xen/include/public/domctl.h |   3 +-
>  xen/include/public/memory.h |  47 ++++++++++++++++-
>  xen/include/xen/mm.h        |   4 ++
>  xen/include/xen/sched.h     |   1 +
>  8 files changed, 209 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index b360de1..90ce40b 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -507,6 +507,7 @@ int domain_kill(struct domain *d)
>          evtchn_destroy(d);
>          gnttab_release_mappings(d);
>          tmem_destroy(d->tmem);
> +        domain_set_unclaimed_pages(d, 0, 0);
>          d->tmem = NULL;
>          /* fallthrough */
>      case DOMDYING_dying:
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index b7f6619..98d6f50 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -154,6 +154,7 @@ void getdomaininfo(struct domain *d, struct
> xen_domctl_getdomaininfo *info)
>  
>      info->tot_pages         = d->tot_pages;
>      info->max_pages         = d->max_pages;
> +    info->unclaimed_pages   = d->unclaimed_pages;
>      info->shr_pages         = atomic_read(&d->shr_pages);
>      info->paged_pages       = atomic_read(&d->paged_pages);
>      info->shared_info_frame = mfn_to_gmfn(d, virt_to_mfn(d->shared_info));
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 08550ef..c27bba5 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -712,6 +712,37 @@ long do_memory_op(unsigned long cmd,
> XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case XENMEM_claim_pages:
> +        if ( !IS_PRIV(current->domain) )
> +            return -EPERM;
> +
> +        if ( copy_from_guest(&reservation, arg, 1) )
> +            return -EFAULT;
> +
> +        if ( !guest_handle_is_null(reservation.extent_start) )
> +            return -EINVAL;
> +
> +        if ( reservation.extent_order != 0 )
> +            return -EINVAL;
> +
> +        d = rcu_lock_domain_by_id(reservation.domid);
> +        if ( d == NULL )
> +            return -EINVAL;
> +
> +        rc = domain_set_unclaimed_pages(d, reservation.nr_extents,
> +                                        reservation.mem_flags);
> +
> +        rcu_unlock_domain(d);
> +
> +        break;
> +
> +    case XENMEM_get_unclaimed_pages:
> +        if ( !IS_PRIV(current->domain) )
> +            return -EPERM;
> +
> +        rc = get_total_unclaimed_pages();
> +        break;
> +
>      default:
>          rc = arch_memory_op(op, arg);
>          break;
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 9e9fb15..2442aba 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -252,11 +252,124 @@ static long midsize_alloc_zone_pages;
>  #define MIDSIZE_ALLOC_FRAC 128
>  
>  static DEFINE_SPINLOCK(heap_lock);
> +static long total_unclaimed_pages; /* total outstanding claims by all domains
> */
>  
>  unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
>  {
> +    long dom_before, dom_after, dom_claimed, sys_before, sys_after;
> +
>      ASSERT(spin_is_locked(&d->page_alloc_lock));
> -    return d->tot_pages += pages;
> +    d->tot_pages += pages;
> +
> +    /*
> +     * can test d->claimed_pages race-free because it can only change
> +     * if d->page_alloc_lock and heap_lock are both held, see also
> +     * domain_set_unclaimed_pages below
> +     */
> +    if ( !d->unclaimed_pages )
> +        goto out;
> +
> +    spin_lock(&heap_lock);
> +    /* adjust domain unclaimed_pages; may not go negative */
> +    dom_before = d->unclaimed_pages;
> +    dom_after = dom_before - pages;
> +    if ( (dom_before > 0) && (dom_after < 0) )
> +        dom_claimed = 0;
> +    else
> +        dom_claimed = dom_after;
> +    d->unclaimed_pages = dom_claimed;
> +    /* flag accounting bug if system unclaimed_pages would go negative */
> +    sys_before = total_unclaimed_pages;
> +    sys_after = sys_before - (dom_before - dom_claimed);
> +    BUG_ON((sys_before > 0) && (sys_after < 0));
> +    total_unclaimed_pages = sys_after;
> +    spin_unlock(&heap_lock);
> +
> +out:
> +    return d->tot_pages;
> +}
> +
> +int domain_set_unclaimed_pages(struct domain *d, unsigned long pages,
> +                                unsigned long flags)
> +{
> +    int ret = -ENOMEM;
> +    unsigned long claim, avail_pages;
> +
> +    /*
> +     * Sanity check.
> +     */
> +    switch ( flags ) {
> +    case XENMEM_CLAIMF_normal:
> +    case XENMEM_CLAIMF_tmem:
> +        if ( pages == 0 )
> +            return -EINVAL;
> +        break;
> +    case XENMEM_CLAIMF_cancel:
> +        pages = 0;
> +        break;
> +    default:
> +    return -ENOSYS;
> +    }
> +
> +    /*
> +     * take the domain's page_alloc_lock, else all d->tot_page adjustments
> +     * must always take the global heap_lock rather than only in the much
> +     * rarer case that d->unclaimed_pages is non-zero
> +     */
> +    spin_lock(&d->page_alloc_lock);
> +    spin_lock(&heap_lock);
> +
> +    /* pages==0 means "unset" the claim. */
> +    if ( pages == 0 )
> +    {
> +        total_unclaimed_pages -= d->unclaimed_pages;
> +        d->unclaimed_pages = 0;
> +        ret = 0;
> +        goto out;
> +    }
> +
> +    /* only one active claim per domain please */
> +    if ( d->unclaimed_pages )
> +    {
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    /* disallow a claim not exceeding current tot_pages or above max_pages */
> +    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
> +    {
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    /* how much memory is available? */
> +    avail_pages = total_avail_pages;
> +    if ( flags & XENMEM_CLAIMF_tmem )
> +        avail_pages += tmem_freeable_pages();
> +    avail_pages -= total_unclaimed_pages;
> +
> +    /*
> +     * note, if domain has already allocated memory before making a claim
> +     * then the claim must take tot_pages into account
> +     */
> +    claim = pages - d->tot_pages;
> +    if ( claim > avail_pages )
> +        goto out;
> +
> +    /* yay, claim fits in available memory, stake the claim, success! */
> +    d->unclaimed_pages = claim;
> +    total_unclaimed_pages += d->unclaimed_pages;
> +    ret = 0;
> +
> +out:
> +    spin_unlock(&heap_lock);
> +    spin_unlock(&d->page_alloc_lock);
> +    return ret;
> +}
> +
> +long get_total_unclaimed_pages(void)
> +{
> +    return total_unclaimed_pages;
>  }
>  
>  static unsigned long init_node_heap(int node, unsigned long mfn,
> @@ -397,7 +510,7 @@ static void __init setup_low_mem_virq(void)
>  static void check_low_mem_virq(void)
>  {
>      unsigned long avail_pages = total_avail_pages +
> -        (opt_tmem ? tmem_freeable_pages() : 0);
> +        (opt_tmem ? tmem_freeable_pages() : 0) - total_unclaimed_pages;
>  
>      if ( unlikely(avail_pages <= low_mem_virq_th) )
>      {
> @@ -466,6 +579,15 @@ static struct page_info *alloc_heap_pages(
>      spin_lock(&heap_lock);
>  
>      /*
> +     * Claimed memory is considered unavailable unless the request
> +     * is made by a domain with sufficient unclaimed pages.
> +     */
> +    if ( (total_unclaimed_pages + request >
> +          total_avail_pages + tmem_freeable_pages()) &&
> +          (d == NULL || d->unclaimed_pages < request) )
> +        goto not_found;
> +
> +    /*
>       * TMEM: When available memory is scarce due to tmem absorbing it, allow
>       * only mid-size allocations to avoid worst of fragmentation issues.
>       * Others try tmem pools then fail.  This is a workaround until all
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index deb19db..efae421 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -36,7 +36,7 @@
>  #include "grant_table.h"
>  #include "hvm/save.h"
>  
> -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000008
> +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000009
>  
>  /*
>   * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
> @@ -95,6 +95,7 @@ struct xen_domctl_getdomaininfo {
>      uint32_t flags;              /* XEN_DOMINF_* */
>      uint64_aligned_t tot_pages;
>      uint64_aligned_t max_pages;
> +    uint64_aligned_t unclaimed_pages;
>      uint64_aligned_t shr_pages;
>      uint64_aligned_t paged_pages;
>      uint64_aligned_t shared_info_frame; /* GMFN of shared_info struct */
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index 1c5ca19..5c264ca 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -68,6 +68,8 @@ struct xen_memory_reservation {
>       *   IN:  GPFN bases of extents to populate with memory
>       *   OUT: GMFN bases of extents that were allocated
>       *   (NB. This command also updates the mach_to_phys translation table)
> +     * XENMEM_claim_pages:
> +     *   IN: must be zero
>       */
>      XEN_GUEST_HANDLE(xen_pfn_t) extent_start;
>  
> @@ -430,10 +432,51 @@ typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
>  
>  /*
> - * Reserve ops for future/out-of-tree "claim" patches (Oracle)
> + * Attempt to stake a claim for a domain on a quantity of pages
> + * of system RAM, but _not_ assign specific pageframes.  Only
> + * arithmetic is performed so the hypercall is very fast and need
> + * not be preemptible, thus sidestepping time-of-check-time-of-use
> + * races for memory allocation.  Returns 0 if the hypervisor page
> + * allocator has atomically and successfully claimed the requested
> + * number of pages, else non-zero.
> + *
> + * Any domain may have only one active claim.  When sufficient memory
> + * has been allocated to resolve the claim, the claim silently expires.
> + * Claiming zero pages effectively resets any outstanding claim and
> + * is always successful.
> + *
> + * Note that a valid claim may be staked even after memory has been
> + * allocated for a domain.  In this case, the claim is not incremental,
> + * i.e. if the domain's tot_pages is 3, and a claim is staked for 10,
> + * only 7 additional pages are claimed.
> + *
> + * Caller must be privileged or the hypercall fails.
>   */
>  #define XENMEM_claim_pages                  24
> -#define XENMEM_get_unclaimed_pages          25
> +
> +/*
> + * XENMEM_claim_pages flags:
> + *  normal: claim is successful only if sufficient free pages
> + *    are available.
> + *  tmem: claim is successful only if sufficient free pages
> + *    are available and (if tmem is enabled) hypervisor
> + *    may also consider tmem "freeable" pages to satisfy the claim.
> + *  cancel: cancel the outstanding claim for the domain.
> + */
> +#define XENMEM_CLAIMF_ignored                0
> +
> +#define _XENMEM_CLAIMF_normal                1
> +#define XENMEM_CLAIMF_normal                 (1U<<_XENMEM_CLAIMF_normal)
> +#define _XENMEM_CLAIMF_tmem                  2
> +#define XENMEM_CLAIMF_tmem                   (1U<<_XENMEM_CLAIMF_tmem)
> +#define _XENMEM_CLAIMF_cancel                3
> +#define XENMEM_CLAIMF_cancel                 (1U<<_XENMEM_CLAIMF_cancel)
> +/*
> + * Get the number of pages currently claimed (but not yet "possessed")
> + * across all domains.  The caller must be privileged but otherwise
> + * the call never fails.
> + */
> +#define XENMEM_get_unclaimed_pages            25
>  
>  #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
>  
> diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
> index 2f701f5..f4ab991 100644
> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -49,7 +49,11 @@ void free_xenheap_pages(void *v, unsigned int order);
>  #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
>  #define free_xenheap_page(v) (free_xenheap_pages(v,0))
>  
> +/* Claim handling */
>  unsigned long domain_adjust_tot_pages(struct domain *d, long pages);
> +int domain_set_unclaimed_pages(
> +    struct domain *d, unsigned long pages, unsigned long flags);
> +long get_total_unclaimed_pages(void);
>  
>  /* Domain suballocator. These functions are *not* interrupt-safe.*/
>  void init_domheap_pages(paddr_t ps, paddr_t pe);
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index e108436..9c5263f 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -242,6 +242,7 @@ struct domain
>      struct page_list_head page_list;  /* linked list */
>      struct page_list_head xenpage_list; /* linked list (size xenheap_pages)
> */
>      unsigned int     tot_pages;       /* number of pages currently possesed
> */
> +    unsigned int     unclaimed_pages; /* pages claimed but not possessed
> */
>      unsigned int     max_pages;       /* maximum value for tot_pages
> */
>      atomic_t         shr_pages;       /* number of shared pages
> */
>      atomic_t         paged_pages;     /* number of paged-out pages
> */

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
  2013-03-04 17:47 [PATCH v10] claim and its friends for allocating multiple self-ballooning guests Konrad Rzeszutek Wilk
@ 2013-03-04 17:47 ` Konrad Rzeszutek Wilk
  2013-03-04 21:50   ` Keir Fraser
  2013-03-05 12:01   ` Tim Deegan
  0 siblings, 2 replies; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-04 17:47 UTC (permalink / raw)
  To: xen-devel, keir, keir.xen; +Cc: dan.magenheimer, Konrad Rzeszutek Wilk

From: Dan Magenheimer <dan.magenheimer@oracle.com>

When guests memory consumption is volatile (multiple guests
ballooning up/down) we are presented with the problem of
being able to determine exactly how much memory there is
for allocation of new guests without negatively impacting
existing guests. Note that the existing models (xapi, xend)
drive the memory consumption from the tool-stack and assume
that the guest will eventually hit the memory target. Other
models, such as the dynamic memory utilized by tmem, do this
differently - the guest drivers the memory consumption (up
to the d->max_pages ceiling). With dynamic memory model, the
guest frequently can balloon up and down as it sees fit.
This presents the problem to the toolstack that it does not
know atomically how much free memory there is (as the information
gets stale the moment the d->tot_pages information is provided
to the tool-stack), and hence when starting a guest can fail
during the memory creation process. Especially if the process
is done in parallel. In a nutshell what we need is a atomic
value of all domains tot_pages during the allocation of guests.
Naturally holding a lock for such a long time is unacceptable.
Hence the goal of this hypercall is to attempt to atomically and very
quickly determine if there are sufficient pages available in the
system and, if so, "set aside" that quantity of pages for future
allocations by that domain.  Unlike an existing hypercall such as
increase_reservation or populate_physmap, specific physical
pageframes are not assigned to the domain because this
cannot be done sufficiently quickly (especially for very large
allocations in an arbitrarily fragmented system) and so the
existing mechanisms result in classic time-of-check-time-of-use
(TOCTOU) races.  One can think of claiming as similar to a
"lazy" allocation, but subsequent hypercalls are required
to do the actual physical pageframe allocation.

Note that one of effects of this hypercall is that from the
perspective of other running guests -  suddenly there is
a new guest occupying X amount of pages. This means that when
we try to balloon up they will hit the system-wide ceiling of
available free memory (if the total sum of the existing d->max_pages
>= host memory). This is OK - as that is part of the overcommit.
What we DO NOT want to do is dictate their ceiling should be
(d->max_pages) as that is risky and can lead to guests OOM-ing.
It is something the guest needs to figure out.

In order for a toolstack to "get" information about whether
a domain has a claim and, if so, how large, and also for
the toolstack to measure the total system-wide claim, a
second subop has been added and exposed through domctl
and libxl (see "xen: XENMEM_claim_pages: xc").

== Alternative solutions ==
There has been a variety of discussion whether the problem
hypercall is solving can be done in user-space, such as:
 - For all the existing guest, set their d->max_pages temporarily
   to d->tot_pages and create the domain. This forces those
   domains to stay at their current consumption level (fyi, this is what
   the tmem freeze call is doing). The disadvantage of this is
   that needlessly forces the guests to stay at the memory usage
   instead of allowing it to decide the optimal target.
 - Account only using d->max_pages of how much free memory there is.
   This ignores ballooning changes and any over-commit scenario. This
   is similar to the scenario where the sum of all d->max_pages (and
   the one to be allocated now) on the host is smaller than the available
   free memory. As such it ignores the over-commit problem.
 - Provide a ring/FIFO along with event channel to notify an userspace
   daemon of guests memory consumption. This daemon can then provide
   up-to-date information to the toolstack of how much free memory
   there is. This duplicates what the hypervisor is already doing and
   introduced latency issues and catching breath for the toolstack as there
   might be millions of these updates on heavily used machine. There might
   not be any quiescent state ever and the toolstack will heavily consume
   CPU cycles and not ever provide up-to-date information.

It has been noted that this claim mechanism solves the
underlying problem (slow failure of domain creation) for
a large class of domains but not all, specifically not
handling (but also not making the problem worse for) PV
domains that specify the "superpages" flag, and 32-bit PV
domains on large RAM systems.  These will be addressed at a
later time.

Code overview:

Though the hypercall simply does arithmetic within locks,
some of the semantics in the code may be a bit subtle.

The key variables (d->unclaimed_pages and total_unclaimed_pages)
starts at zero if no claim has yet been staked for any domain.
(Perhaps a better name is "claimed_but_not_yet_possessed" but that's
a bit unwieldy.)  If no claim hypercalls are executed, there
should be no impact on existing usage.

When a claim is successfully staked by a domain, it is like a
watermark but there is no record kept of the size of the claim.
Instead, d->unclaimed_pages is set to the difference between
d->tot_pages and the claim.  When d->tot_pages increases or decreases,
d->unclaimed_pages atomically decreases or increases.  Once
d->unclaimed_pages reaches zero, the claim is satisfied and
d->unclaimed pages stays at zero -- unless a new claim is
subsequently staked.

The systemwide variable total_unclaimed_pages is always the sum
of d->unclaimed_pages, across all domains.  A non-domain-
specific heap allocation will fail if total_unclaimed_pages
exceeds free (plus, on tmem enabled systems, freeable) pages.

Claim semantics may be modified by flags.  The initial
implementation has three flag, which discerns whether the
caller would like tmem freeable pages to be considered
in determining whether or not the claim can be successfully
staked. Future flags may, for example, specify that the
implementation has one flag, which discerns whether the
caller would like tmem freeable pages to be considered
in determining whether or not the claim can be successfully
staked. Future flags may, for example, specify that the
caller would like the claim to be successful only if there
are sufficient pages available on a single node (per Dario's
suggestion).

A claim can be cancelled by requesting a claim with the
flag being zero (XENMEM_CLAIMF_ignored).

A second subop returns the total outstanding claimed pages
systemwide.

Note: Save/restore/migrate may need to be modified,
else it can be documented that all claims are cancelled.

==
This patch of the proposed XENMEM_claim_pages hypercall/subop, takes
into account review feedback from Jan and Keir and IanC and Matthew Daley,
plus some fixes found via runtime debugging.

v9->v10:
 - Altered the API a bit (new flags, etc).

v8->v9:
- Drop snippets already merged by Keir
  (26269:21a5b181f8ad and 26270:03cb71bc32f9)

v7->v8:
- Replace missing tot_pages adjustment [JBeulich]
- Combine domain_inc/dec_tot_pages into one function [JBeulich]
- Added a few more clarifying comments [djm]

v6->v7:
- Need co-existence with low_mem_virq [JBeulich]

v5->v6:
- Fix post-inc problem [mattjd]

v4->v5:
- Split tools part into separate patch [JBeulich]
- Minor coding style fixups [JBeulich]
- Use rcu_lock_domain_by_id, not _target version [JBeulich]

v3->v4: (please ignore v3)
- Process error, sent stale patch, sorry [djm]

v2->v3:
- Add per-domain and global "get" for unclaimed info [JBeulich]
- New hypercall(s) should fail for unpriv callers [IanC]
- Non-zero extent_order disallowed [JBeulich]
- Remove bonehead ASSERTs [JBeulich]
- Need not hold heaplock for decrease case too [JBeulich]
- More descriptive failure return values [JBeulich]
- Don't allow a claim to exceed max_pages [IanC]
- Subops must be in correct ifdef block in memory.c [keir]

v1->v2:
- Add reset-to-zero page claim in domain_kill [JBeulich]
- Proper handling of struct passed to hypercall [JBeulich]
- Fix alloc_heap_pages when a domain has a claim [JBeulich]
- Need not hold heap_lock if !d->unclaimed_pages [keir]
- Fix missed tot_pages call in donate_page [djm]
- Remove domain_reset_unclaimed_pages; use set with zero [djm]
- Bugfixes found through testing in set_unclaimed [djm]
- More comments in code [djm]
- Code formatting fixes [djm]
===

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/common/domain.c         |   1 +
 xen/common/domctl.c         |   1 +
 xen/common/memory.c         |  31 +++++++++++
 xen/common/page_alloc.c     | 126 +++++++++++++++++++++++++++++++++++++++++++-
 xen/include/public/domctl.h |   3 +-
 xen/include/public/memory.h |  47 ++++++++++++++++-
 xen/include/xen/mm.h        |   4 ++
 xen/include/xen/sched.h     |   1 +
 8 files changed, 209 insertions(+), 5 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index b360de1..90ce40b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -507,6 +507,7 @@ int domain_kill(struct domain *d)
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem);
+        domain_set_unclaimed_pages(d, 0, 0);
         d->tmem = NULL;
         /* fallthrough */
     case DOMDYING_dying:
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index b7f6619..98d6f50 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -154,6 +154,7 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
 
     info->tot_pages         = d->tot_pages;
     info->max_pages         = d->max_pages;
+    info->unclaimed_pages   = d->unclaimed_pages;
     info->shr_pages         = atomic_read(&d->shr_pages);
     info->paged_pages       = atomic_read(&d->paged_pages);
     info->shared_info_frame = mfn_to_gmfn(d, virt_to_mfn(d->shared_info));
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 08550ef..c27bba5 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -712,6 +712,37 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case XENMEM_claim_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&reservation, arg, 1) )
+            return -EFAULT;
+
+        if ( !guest_handle_is_null(reservation.extent_start) )
+            return -EINVAL;
+
+        if ( reservation.extent_order != 0 )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_id(reservation.domid);
+        if ( d == NULL )
+            return -EINVAL;
+
+        rc = domain_set_unclaimed_pages(d, reservation.nr_extents,
+                                        reservation.mem_flags);
+
+        rcu_unlock_domain(d);
+
+        break;
+
+    case XENMEM_get_unclaimed_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        rc = get_total_unclaimed_pages();
+        break;
+
     default:
         rc = arch_memory_op(op, arg);
         break;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e9fb15..2442aba 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -252,11 +252,124 @@ static long midsize_alloc_zone_pages;
 #define MIDSIZE_ALLOC_FRAC 128
 
 static DEFINE_SPINLOCK(heap_lock);
+static long total_unclaimed_pages; /* total outstanding claims by all domains */
 
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
 {
+    long dom_before, dom_after, dom_claimed, sys_before, sys_after;
+
     ASSERT(spin_is_locked(&d->page_alloc_lock));
-    return d->tot_pages += pages;
+    d->tot_pages += pages;
+
+    /*
+     * can test d->claimed_pages race-free because it can only change
+     * if d->page_alloc_lock and heap_lock are both held, see also
+     * domain_set_unclaimed_pages below
+     */
+    if ( !d->unclaimed_pages )
+        goto out;
+
+    spin_lock(&heap_lock);
+    /* adjust domain unclaimed_pages; may not go negative */
+    dom_before = d->unclaimed_pages;
+    dom_after = dom_before - pages;
+    if ( (dom_before > 0) && (dom_after < 0) )
+        dom_claimed = 0;
+    else
+        dom_claimed = dom_after;
+    d->unclaimed_pages = dom_claimed;
+    /* flag accounting bug if system unclaimed_pages would go negative */
+    sys_before = total_unclaimed_pages;
+    sys_after = sys_before - (dom_before - dom_claimed);
+    BUG_ON((sys_before > 0) && (sys_after < 0));
+    total_unclaimed_pages = sys_after;
+    spin_unlock(&heap_lock);
+
+out:
+    return d->tot_pages;
+}
+
+int domain_set_unclaimed_pages(struct domain *d, unsigned long pages,
+                                unsigned long flags)
+{
+    int ret = -ENOMEM;
+    unsigned long claim, avail_pages;
+
+    /*
+     * Sanity check.
+     */
+    switch ( flags ) {
+    case XENMEM_CLAIMF_normal:
+    case XENMEM_CLAIMF_tmem:
+        if ( pages == 0 )
+            return -EINVAL;
+        break;
+    case XENMEM_CLAIMF_cancel:
+        pages = 0;
+        break;
+    default:
+    return -ENOSYS;
+    }
+
+    /*
+     * take the domain's page_alloc_lock, else all d->tot_page adjustments
+     * must always take the global heap_lock rather than only in the much
+     * rarer case that d->unclaimed_pages is non-zero
+     */
+    spin_lock(&d->page_alloc_lock);
+    spin_lock(&heap_lock);
+
+    /* pages==0 means "unset" the claim. */
+    if ( pages == 0 )
+    {
+        total_unclaimed_pages -= d->unclaimed_pages;
+        d->unclaimed_pages = 0;
+        ret = 0;
+        goto out;
+    }
+
+    /* only one active claim per domain please */
+    if ( d->unclaimed_pages )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* disallow a claim not exceeding current tot_pages or above max_pages */
+    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* how much memory is available? */
+    avail_pages = total_avail_pages;
+    if ( flags & XENMEM_CLAIMF_tmem )
+        avail_pages += tmem_freeable_pages();
+    avail_pages -= total_unclaimed_pages;
+
+    /*
+     * note, if domain has already allocated memory before making a claim 
+     * then the claim must take tot_pages into account
+     */
+    claim = pages - d->tot_pages;
+    if ( claim > avail_pages )
+        goto out;
+
+    /* yay, claim fits in available memory, stake the claim, success! */
+    d->unclaimed_pages = claim;
+    total_unclaimed_pages += d->unclaimed_pages;
+    ret = 0;
+
+out:
+    spin_unlock(&heap_lock);
+    spin_unlock(&d->page_alloc_lock);
+    return ret;
+}
+
+long get_total_unclaimed_pages(void)
+{
+    return total_unclaimed_pages;
 }
 
 static unsigned long init_node_heap(int node, unsigned long mfn,
@@ -397,7 +510,7 @@ static void __init setup_low_mem_virq(void)
 static void check_low_mem_virq(void)
 {
     unsigned long avail_pages = total_avail_pages +
-        (opt_tmem ? tmem_freeable_pages() : 0);
+        (opt_tmem ? tmem_freeable_pages() : 0) - total_unclaimed_pages;
 
     if ( unlikely(avail_pages <= low_mem_virq_th) )
     {
@@ -466,6 +579,15 @@ static struct page_info *alloc_heap_pages(
     spin_lock(&heap_lock);
 
     /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (total_unclaimed_pages + request >
+          total_avail_pages + tmem_freeable_pages()) &&
+          (d == NULL || d->unclaimed_pages < request) )
+        goto not_found;
+
+    /*
      * TMEM: When available memory is scarce due to tmem absorbing it, allow
      * only mid-size allocations to avoid worst of fragmentation issues.
      * Others try tmem pools then fail.  This is a workaround until all
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index deb19db..efae421 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -36,7 +36,7 @@
 #include "grant_table.h"
 #include "hvm/save.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000008
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000009
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -95,6 +95,7 @@ struct xen_domctl_getdomaininfo {
     uint32_t flags;              /* XEN_DOMINF_* */
     uint64_aligned_t tot_pages;
     uint64_aligned_t max_pages;
+    uint64_aligned_t unclaimed_pages;
     uint64_aligned_t shr_pages;
     uint64_aligned_t paged_pages;
     uint64_aligned_t shared_info_frame; /* GMFN of shared_info struct */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 1c5ca19..5c264ca 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -68,6 +68,8 @@ struct xen_memory_reservation {
      *   IN:  GPFN bases of extents to populate with memory
      *   OUT: GMFN bases of extents that were allocated
      *   (NB. This command also updates the mach_to_phys translation table)
+     * XENMEM_claim_pages:
+     *   IN: must be zero
      */
     XEN_GUEST_HANDLE(xen_pfn_t) extent_start;
 
@@ -430,10 +432,51 @@ typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
 
 /*
- * Reserve ops for future/out-of-tree "claim" patches (Oracle)
+ * Attempt to stake a claim for a domain on a quantity of pages
+ * of system RAM, but _not_ assign specific pageframes.  Only
+ * arithmetic is performed so the hypercall is very fast and need
+ * not be preemptible, thus sidestepping time-of-check-time-of-use
+ * races for memory allocation.  Returns 0 if the hypervisor page
+ * allocator has atomically and successfully claimed the requested
+ * number of pages, else non-zero.
+ *
+ * Any domain may have only one active claim.  When sufficient memory
+ * has been allocated to resolve the claim, the claim silently expires.
+ * Claiming zero pages effectively resets any outstanding claim and
+ * is always successful.
+ *
+ * Note that a valid claim may be staked even after memory has been
+ * allocated for a domain.  In this case, the claim is not incremental,
+ * i.e. if the domain's tot_pages is 3, and a claim is staked for 10,
+ * only 7 additional pages are claimed.
+ *
+ * Caller must be privileged or the hypercall fails.
  */
 #define XENMEM_claim_pages                  24
-#define XENMEM_get_unclaimed_pages          25
+
+/*
+ * XENMEM_claim_pages flags:
+ *  normal: claim is successful only if sufficient free pages
+ *    are available.
+ *  tmem: claim is successful only if sufficient free pages
+ *    are available and (if tmem is enabled) hypervisor
+ *    may also consider tmem "freeable" pages to satisfy the claim.
+ *  cancel: cancel the outstanding claim for the domain.
+ */
+#define XENMEM_CLAIMF_ignored                0
+
+#define _XENMEM_CLAIMF_normal                1
+#define XENMEM_CLAIMF_normal                 (1U<<_XENMEM_CLAIMF_normal)
+#define _XENMEM_CLAIMF_tmem                  2
+#define XENMEM_CLAIMF_tmem                   (1U<<_XENMEM_CLAIMF_tmem)
+#define _XENMEM_CLAIMF_cancel                3
+#define XENMEM_CLAIMF_cancel                 (1U<<_XENMEM_CLAIMF_cancel)
+/*
+ * Get the number of pages currently claimed (but not yet "possessed")
+ * across all domains.  The caller must be privileged but otherwise
+ * the call never fails.
+ */
+#define XENMEM_get_unclaimed_pages            25
 
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 2f701f5..f4ab991 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -49,7 +49,11 @@ void free_xenheap_pages(void *v, unsigned int order);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
 
+/* Claim handling */
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages);
+int domain_set_unclaimed_pages(
+    struct domain *d, unsigned long pages, unsigned long flags);
+long get_total_unclaimed_pages(void);
 
 /* Domain suballocator. These functions are *not* interrupt-safe.*/
 void init_domheap_pages(paddr_t ps, paddr_t pe);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e108436..9c5263f 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -242,6 +242,7 @@ struct domain
     struct page_list_head page_list;  /* linked list */
     struct page_list_head xenpage_list; /* linked list (size xenheap_pages) */
     unsigned int     tot_pages;       /* number of pages currently possesed */
+    unsigned int     unclaimed_pages; /* pages claimed but not possessed    */
     unsigned int     max_pages;       /* maximum value for tot_pages        */
     atomic_t         shr_pages;       /* number of shared pages             */
     atomic_t         paged_pages;     /* number of paged-out pages          */
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2013-03-18 13:10 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-11 14:20 [PATCH v11] claim and its friends for allocating multiple self-ballooning guests Konrad Rzeszutek Wilk
2013-03-11 14:20 ` [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops) Konrad Rzeszutek Wilk
2013-03-11 14:20 ` [PATCH 2/6] xc: use XENMEM_claim_pages during guest creation Konrad Rzeszutek Wilk
2013-03-13 10:23   ` Ian Campbell
2013-03-13 14:42     ` Konrad Rzeszutek Wilk
2013-03-13 14:46       ` Ian Campbell
2013-03-11 14:20 ` [PATCH 3/6] xl: Implement XENMEM_claim_pages support via 'claim_mode' global config Konrad Rzeszutek Wilk
2013-03-13 10:41   ` Ian Campbell
2013-03-13 14:57     ` Konrad Rzeszutek Wilk
2013-03-13 15:05       ` Ian Campbell
2013-03-11 14:20 ` [PATCH 4/6] xc: XENMEM_claim_pages outstanding claims value Konrad Rzeszutek Wilk
2013-03-13 10:43   ` Ian Campbell
2013-03-13 14:57     ` Konrad Rzeszutek Wilk
2013-03-11 14:20 ` [PATCH 5/6] xl: export 'outstanding_pages' value from xcinfo Konrad Rzeszutek Wilk
2013-03-13 10:44   ` Ian Campbell
2013-03-11 14:20 ` [PATCH 6/6] xl: 'xl list' supports '-c' for global claim information Konrad Rzeszutek Wilk
2013-03-13 10:51   ` Ian Campbell
2013-03-13 15:02     ` Konrad Rzeszutek Wilk
2013-03-13 16:03       ` Ian Campbell
2013-03-15 18:05         ` Konrad Rzeszutek Wilk
2013-03-18  9:42           ` Ian Campbell
2013-03-18 13:10             ` Konrad Rzeszutek Wilk
  -- strict thread matches above, loose matches on Subject: below --
2013-03-04 17:47 [PATCH v10] claim and its friends for allocating multiple self-ballooning guests Konrad Rzeszutek Wilk
2013-03-04 17:47 ` [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops) Konrad Rzeszutek Wilk
2013-03-04 21:50   ` Keir Fraser
2013-03-05 12:01   ` Tim Deegan
2013-03-05 21:43     ` Konrad Rzeszutek Wilk
2013-03-06  9:07       ` Tim Deegan
2013-03-06 15:36         ` Konrad Rzeszutek Wilk
2013-03-07 14:59           ` Konrad Rzeszutek Wilk
2013-03-07 15:56           ` Tim Deegan
2013-03-08 20:07             ` Konrad Rzeszutek Wilk
2013-03-11  9:25               ` Tim Deegan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.