* [PATCH 0/6] xen: arm: Use super pages in p2m
@ 2014-06-10  9:55 Ian Campbell
  2014-06-10  9:57 ` [PATCH 1/6] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Ian Campbell @ 2014-06-10  9:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, Stefano Stabellini, Tim Deegan

The following series adds support for superpage mappings (1G and 2M) in
the p2m for dom0 and domU.

At the moment it is not possible to allocate 1G regions to guests,
because this is gated on multipage_allocation_permitted() which
currently (incorrectly) returns false for dom0 on ARM. This is fixed by
Arianna's "arch/arm: domain build: let dom0 access I/O memory of mapped
devices" (which initialises iomem_caps for dom0).

On a 32-bit platform this causes the overhead of kernbench vs. native to
drop from 8-9% to 1.5-3% (it's basically the same for both dom0 and
domU).

I couldn't get my 64-bit platform to boot natively for some reason but
in absolute terms kernbench improved by ~5% (domU) to ~15% (dom0).

Ian.


* [PATCH 1/6] xen: arm: dump vcpu gic info in arch_dump_vcpu_info
  2014-06-10  9:55 [PATCH 0/6] xen: arm: Use super pages in p2m Ian Campbell
@ 2014-06-10  9:57 ` Ian Campbell
  2014-06-10 11:08   ` Julien Grall
  2014-06-10  9:57 ` [PATCH 2/6] tools: arm: allocate superpages to guests Ian Campbell
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-06-10  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Instead of looping over vcpus in arch_dump_domain_info

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/domain.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 04d0cd0..e494112 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -742,12 +742,6 @@ int domain_relinquish_resources(struct domain *d)
 
 void arch_dump_domain_info(struct domain *d)
 {
-    struct vcpu *v;
-
-    for_each_vcpu ( d, v )
-    {
-        gic_dump_info(v);
-    }
 }
 
 
@@ -770,6 +764,7 @@ long arch_do_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 void arch_dump_vcpu_info(struct vcpu *v)
 {
+    gic_dump_info(v);
 }
 
 void vcpu_mark_events_pending(struct vcpu *v)
-- 
1.7.10.4


* [PATCH 2/6] tools: arm: allocate superpages to guests.
  2014-06-10  9:55 [PATCH 0/6] xen: arm: Use super pages in p2m Ian Campbell
  2014-06-10  9:57 ` [PATCH 1/6] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
@ 2014-06-10  9:57 ` Ian Campbell
  2014-06-10 11:23   ` Julien Grall
  2014-06-10  9:57 ` [PATCH 3/6] xen: arm: only put_page for p2m operations which require it Ian Campbell
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-06-10  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Tries to allocate as many large (1G or 2M) pages as it can to the guest, and
tries to align to the next larger size on each attempt so that we can attempt
to allocate even larger pages on the next iteration (this is currently only
exercised via a compile time debug option, which is left in place under
#if 0).

Since ARM page tables are consistent at each level there is a common helper
function which tries to allocate a level's worth of pages. The exception to
this consistency is level 0, which does not support block (superpage) mappings
(0.5TB superpages!).
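
As a rough sketch (not part of the patch, and glossing over misaligned bases
and the final partial chunk), the fallback through the page sizes defined by
the PFN_*_SHIFT constants below looks like this; each LPAE_SHIFT (9 bit) step
multiplies the extent size by 512 (4K -> 2M -> 1G):

    /* Illustrative sketch only: try the largest size first, fall back to
     * smaller ones; populate_one_size() returns 0 when a size was not
     * attempted (e.g. base_pfn + pfn is not aligned to it). */
    for ( pfn = 0; pfn < nr_pfns; pfn += allocsz )
    {
        allocsz = nr_pfns - pfn;
        if ( populate_one_size(dom, PFN_1G_SHIFT, base_pfn + pfn, &allocsz) > 0 )
            continue;                       /* got one or more 1G extents */
        if ( populate_one_size(dom, PFN_2M_SHIFT, base_pfn + pfn, &allocsz) > 0 )
            continue;                       /* got one or more 2M extents */
        if ( populate_one_size(dom, PFN_4K_SHIFT, base_pfn + pfn, &allocsz) <= 0 )
            break;                          /* even 4K failed: give up */
    }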

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxc/xc_dom_arm.c |  103 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 92 insertions(+), 11 deletions(-)

diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index 75f8363..da68ec3 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -30,6 +30,13 @@
 #define CONSOLE_PFN_OFFSET 0
 #define XENSTORE_PFN_OFFSET 1
 
+#define LPAE_SHIFT 9
+
+#define PFN_4K_SHIFT  (0)
+#define PFN_2M_SHIFT  (PFN_4K_SHIFT+LPAE_SHIFT)
+#define PFN_1G_SHIFT  (PFN_2M_SHIFT+LPAE_SHIFT)
+#define PFN_512G_SHIFT (PFN_1G_SHIFT+LPAE_SHIFT)
+
 /* get guest IO ABI protocol */
 const char *xc_domain_get_native_protocol(xc_interface *xch,
                                           uint32_t domid)
@@ -249,11 +256,57 @@ static int set_mode(xc_interface *xch, domid_t domid, char *guest_type)
     return rc;
 }
 
+#define min_t(type,x,y) \
+        ({ type __x = (x); type __y = (y); __x < __y ? __x: __y; })
+
+/*  >0: success, *nr_pfns set to number actually populated
+ *   0: didn't try with this pfn shift (e.g. misaligned base etc)
+ *  <0: ERROR
+ */
+static int populate_one_size(struct xc_dom_image *dom, int pfn_shift,
+                             xen_pfn_t base_pfn, xen_pfn_t *nr_pfns)
+{
+    uint64_t mask = ((uint64_t)1<<(pfn_shift))-1;
+    uint64_t next_mask = ((uint64_t)1<<(LPAE_SHIFT))-1;
+    int nr, i, count = min_t(int, 1024*1024, *nr_pfns >> pfn_shift);
+    xen_pfn_t extents[count];
+
+    /* No level zero super pages with current hardware */
+    if ( pfn_shift == PFN_512G_SHIFT )
+        return 0;
+
+    /* Nothing to allocate */
+    if ( !count )
+        return 0;
+
+    /* base is misaligned for this level */
+    if ( mask & base_pfn )
+        return 0;
+
+    /* align to the end of a super page at this level */
+    if ( count & next_mask )
+        count &= next_mask;
+
+    for ( i = 0 ; i < count ; i ++ )
+        extents[i] = base_pfn + (i<<pfn_shift);
+
+    nr = xc_domain_populate_physmap(dom->xch, dom->guest_domid, count,
+                                    pfn_shift, 0, extents);
+    if ( nr <= 0 ) return nr;
+    DOMPRINTF("%s: populated %#x/%#x entries with shift %d",
+              __FUNCTION__, nr, count, pfn_shift);
+
+    *nr_pfns = nr << pfn_shift;
+
+    return 1;
+}
+
 static int populate_guest_memory(struct xc_dom_image *dom,
                                  xen_pfn_t base_pfn, xen_pfn_t nr_pfns)
 {
-    int rc;
+    int rc = 0;
     xen_pfn_t allocsz, pfn;
+    int debug_iters = 0;
 
     DOMPRINTF("%s: populating RAM @ %016"PRIx64"-%016"PRIx64" (%"PRId64"MB)",
               __FUNCTION__,
@@ -261,21 +314,49 @@ static int populate_guest_memory(struct xc_dom_image *dom,
               (uint64_t)(base_pfn + nr_pfns) << XC_PAGE_SHIFT,
               (uint64_t)nr_pfns >> (20-XC_PAGE_SHIFT));
 
-    for ( pfn = 0; pfn < nr_pfns; pfn++ )
-        dom->p2m_host[pfn] = base_pfn + pfn;
-
-    for ( pfn = rc = allocsz = 0; (pfn < nr_pfns) && !rc; pfn += allocsz )
+    for ( pfn = 0; pfn < nr_pfns; pfn += allocsz )
     {
         allocsz = nr_pfns - pfn;
-        if ( allocsz > 1024*1024 )
-            allocsz = 1024*1024;
 
-        rc = xc_domain_populate_physmap_exact(
-            dom->xch, dom->guest_domid, allocsz,
-            0, 0, &dom->p2m_host[pfn]);
+        if ( debug_iters++ > 4 )
+            abort();
+#if 0 /* Enable this to exercise/debug the code which tries to realign
+       * to a superpage boundary, by misaligning at the start. */
+        if ( pfn == 0 )
+        {
+            allocsz = 1;
+            rc = populate_one_size(dom, PFN_4K_SHIFT, base_pfn + pfn, &allocsz);
+            if (rc < 0) break;
+            if (rc > 0) continue;
+            /* Failed to allocate a single page? */
+            break;
+        }
+#endif
+
+        rc = populate_one_size(dom, PFN_512G_SHIFT, base_pfn + pfn, &allocsz);
+        if (rc < 0) break;
+        if (rc > 0) continue;
+
+        rc = populate_one_size(dom, PFN_1G_SHIFT, base_pfn + pfn, &allocsz);
+        if (rc < 0) break;
+        if (rc > 0) continue;
+
+        rc = populate_one_size(dom, PFN_2M_SHIFT, base_pfn + pfn, &allocsz);
+        if (rc < 0) break;
+        if (rc > 0) continue;
+
+        rc = populate_one_size(dom, PFN_4K_SHIFT, base_pfn + pfn, &allocsz);
+        if (rc < 0) break;
+
+        assert(rc > 0); /* Must have tried to allocate some 4k pages! */
     }
 
-    return rc;
+    DOMPRINTF("%s: rc=%d", __FUNCTION__, rc);
+
+    for ( pfn = 0; pfn < nr_pfns; pfn++ )
+        dom->p2m_host[pfn] = base_pfn + pfn;
+
+    return rc < 0 ? rc : 0;
 }
 
 int arch_setup_meminit(struct xc_dom_image *dom)
-- 
1.7.10.4


* [PATCH 3/6] xen: arm: only put_page for p2m operations which require it.
  2014-06-10  9:55 [PATCH 0/6] xen: arm: Use super pages in p2m Ian Campbell
  2014-06-10  9:57 ` [PATCH 1/6] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
  2014-06-10  9:57 ` [PATCH 2/6] tools: arm: allocate superpages to guests Ian Campbell
@ 2014-06-10  9:57 ` Ian Campbell
  2014-06-10 11:28   ` Julien Grall
  2014-06-10  9:57 ` [PATCH 4/6] xen: arm: handle superpage mappings in p2m_lookup Ian Campbell
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-06-10  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

In particular do not do it for CACHEFLUSH.

INSERT, RELINQUISH and REMOVE should all put the page (if the current pte is
valid). ALLOCATE doesn't need to since it asserts the current PTE must be
invalid.
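
Schematically (an illustrative sketch only, not code from the patch), the
intended behaviour per operation is:

    switch ( op )
    {
    case INSERT:
    case RELINQUISH:
    case REMOVE:
        if ( pte.p2m.valid )
            p2m_put_page(pte);  /* drop the ref on the page being replaced/removed */
        break;
    case ALLOCATE:
        /* asserts the current PTE is invalid, so there is nothing to put */
        break;
    case CACHEFLUSH:
        /* must not put the page: the mapping remains in place */
        break;
    }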

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/p2m.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 9960e17..830a9f9 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -299,6 +299,23 @@ enum p2m_operation {
     CACHEFLUSH,
 };
 
+static void p2m_put_page(const lpae_t pte)
+{
+    /* TODO: Handle other p2m types
+     *
+     * It's safe to do the put_page here because page_alloc will
+     * flush the TLBs if the page is reallocated before the end of
+     * this loop.
+     */
+    if ( p2m_is_foreign(pte.p2m.type) )
+    {
+        unsigned long mfn = pte.p2m.base;
+
+        ASSERT(mfn_valid(mfn));
+        put_page(mfn_to_page(mfn));
+    }
+}
+
 static int apply_p2m_changes(struct domain *d,
                      enum p2m_operation op,
                      paddr_t start_gpaddr,
@@ -436,6 +453,8 @@ static int apply_p2m_changes(struct domain *d,
                 break;
             case INSERT:
                 {
+                    if ( pte.p2m.valid )
+                        p2m_put_page(pte);
                     pte = mfn_to_p2m_entry(maddr >> PAGE_SHIFT, mattr, t);
                     p2m_write_pte(&third[third_table_offset(addr)],
                                   pte, flush_pt);
@@ -451,6 +470,8 @@ static int apply_p2m_changes(struct domain *d,
                         break;
                     }
 
+                    p2m_put_page(pte);
+
                     count += 0x10;
 
                     memset(&pte, 0x00, sizeof(pte));
-- 
1.7.10.4


* [PATCH 4/6] xen: arm: handle superpage mappings in p2m_lookup
  2014-06-10  9:55 [PATCH 0/6] xen: arm: Use super pages in p2m Ian Campbell
                   ` (2 preceding siblings ...)
  2014-06-10  9:57 ` [PATCH 3/6] xen: arm: only put_page for p2m operations which require it Ian Campbell
@ 2014-06-10  9:57 ` Ian Campbell
  2014-06-10 11:35   ` Julien Grall
  2014-06-10  9:57 ` [PATCH 5/6] xen: arm: add some helpers for assessing p2m pte Ian Campbell
  2014-06-10  9:57 ` [PATCH 6/6] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
  5 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-06-10  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Currently none are actually created, but they will be shortly.
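
For example (illustrative arithmetic only, using the existing level masks), a
lookup which stops at a valid level 2 (2M) block entry composes the machine
address from the superpage base in the PTE plus the low bits of the guest
address:

    /* Sketch: maddr for a 2M (level 2) superpage mapping. */
    mask  = SECOND_MASK;                    /* zeroes the low 21 bits */
    maddr = (pte.bits & PADDR_MASK & mask)  /* 2M-aligned machine base */
          | (paddr & ~mask);                /* offset within the 2M region */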

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/p2m.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 830a9f9..5d654ce 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -123,6 +123,7 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
     struct p2m_domain *p2m = &d->arch.p2m;
     lpae_t pte, *first = NULL, *second = NULL, *third = NULL;
     paddr_t maddr = INVALID_PADDR;
+    paddr_t mask;
     p2m_type_t _t;
 
     /* Allow t to be NULL */
@@ -136,15 +137,21 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
     if ( !first )
         goto err;
 
+    mask = FIRST_MASK;
     pte = first[first_table_offset(paddr)];
     if ( !pte.p2m.valid || !pte.p2m.table )
         goto done;
 
+    mask = SECOND_MASK;
     second = map_domain_page(pte.p2m.base);
     pte = second[second_table_offset(paddr)];
     if ( !pte.p2m.valid || !pte.p2m.table )
         goto done;
 
+    mask = THIRD_MASK;
+
+    BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);
+
     third = map_domain_page(pte.p2m.base);
     pte = third[third_table_offset(paddr)];
 
@@ -156,7 +163,7 @@ done:
     if ( pte.p2m.valid )
     {
         ASSERT(pte.p2m.type != p2m_invalid);
-        maddr = (pte.bits & PADDR_MASK & PAGE_MASK) | (paddr & ~PAGE_MASK);
+        maddr = (pte.bits & PADDR_MASK & mask) | (paddr & ~mask);
         *t = pte.p2m.type;
     }
 
-- 
1.7.10.4


* [PATCH 5/6] xen: arm: add some helpers for assessing p2m pte
  2014-06-10  9:55 [PATCH 0/6] xen: arm: Use super pages in p2m Ian Campbell
                   ` (3 preceding siblings ...)
  2014-06-10  9:57 ` [PATCH 4/6] xen: arm: handle superpage mappings in p2m_lookup Ian Campbell
@ 2014-06-10  9:57 ` Ian Campbell
  2014-06-10 11:37   ` Julien Grall
  2014-06-10  9:57 ` [PATCH 6/6] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
  5 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-06-10  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Not terribly helpful right now, since they aren't widely used, but they make
future patches easier to read.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/p2m.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 5d654ce..233df72 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -14,6 +14,11 @@
 #define P2M_FIRST_ORDER 1
 #define P2M_FIRST_ENTRIES (LPAE_ENTRIES<<P2M_FIRST_ORDER)
 
+#define p2m_valid(pte) ((pte).p2m.valid)
+/* Remember: L3 entries set the table bit! */
+#define p2m_table(pte) (p2m_valid(pte) && (pte).p2m.table)
+#define p2m_entry(pte) (p2m_valid(pte) && !(pte).p2m.table)
+
 void dump_p2m_lookup(struct domain *d, paddr_t addr)
 {
     struct p2m_domain *p2m = &d->arch.p2m;
@@ -139,13 +144,13 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
 
     mask = FIRST_MASK;
     pte = first[first_table_offset(paddr)];
-    if ( !pte.p2m.valid || !pte.p2m.table )
+    if ( !p2m_table(pte) )
         goto done;
 
     mask = SECOND_MASK;
     second = map_domain_page(pte.p2m.base);
     pte = second[second_table_offset(paddr)];
-    if ( !pte.p2m.valid || !pte.p2m.table )
+    if ( !p2m_table(pte) )
         goto done;
 
     mask = THIRD_MASK;
@@ -156,11 +161,11 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
     pte = third[third_table_offset(paddr)];
 
     /* This bit must be one in the level 3 entry */
-    if ( !pte.p2m.table )
+    if ( !p2m_table(pte) )
         pte.bits = 0;
 
 done:
-    if ( pte.p2m.valid )
+    if ( p2m_valid(pte) )
     {
         ASSERT(pte.p2m.type != p2m_invalid);
         maddr = (pte.bits & PADDR_MASK & mask) | (paddr & ~mask);
@@ -367,7 +372,7 @@ static int apply_p2m_changes(struct domain *d,
             cur_first_page = p2m_first_level_index(addr);
         }
 
-        if ( !first[first_table_offset(addr)].p2m.valid )
+        if ( !p2m_valid(first[first_table_offset(addr)]) )
         {
             if ( !populate )
             {
@@ -384,7 +389,7 @@ static int apply_p2m_changes(struct domain *d,
             }
         }
 
-        BUG_ON(!first[first_table_offset(addr)].p2m.valid);
+        BUG_ON(!p2m_valid(first[first_table_offset(addr)]));
 
         if ( cur_first_offset != first_table_offset(addr) )
         {
@@ -394,7 +399,7 @@ static int apply_p2m_changes(struct domain *d,
         }
         /* else: second already valid */
 
-        if ( !second[second_table_offset(addr)].p2m.valid )
+        if ( !p2m_valid(second[second_table_offset(addr)]) )
         {
             if ( !populate )
             {
-- 
1.7.10.4


* [PATCH 6/6] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-10  9:55 [PATCH 0/6] xen: arm: Use super pages in p2m Ian Campbell
                   ` (4 preceding siblings ...)
  2014-06-10  9:57 ` [PATCH 5/6] xen: arm: add some helpers for assessing p2m pte Ian Campbell
@ 2014-06-10  9:57 ` Ian Campbell
  2014-06-10 12:10   ` Julien Grall
  5 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-06-10  9:57 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

This creates superpage (1G and 2M) mappings in the p2m for both domU and dom0
when the guest and machine addresses are suitably aligned. This relies on the
domain builder allocating suitably sized regions; this is trivially true for
1:1 mapped dom0s and was arranged for guests in a previous patch. A non-1:1
dom0's memory is not currently deliberately aligned.

Since ARM pagetables are (mostly) consistent at each level this is implemented
by refactoring the handling of a single level of pagetable into a common
function. The two inconsistencies are that there are no superpage mappings at
level zero and that level three entry mappings must set the table bit.

When inserting new mappings the code shatters superpage mappings as necessary,
but currently makes no attempt to coalesce anything again. In particular when
inserting a mapping which could be a superpage over an existing table mapping
we do not attempt to free that lower page table and instead descend into it.

Some p2m statistics are added to keep track of the number of each level of
mapping and the number of times we've had to shatter an existing mapping.
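
As a sketch of the shattering step (illustrative only; the real logic is in
p2m_create_table() in the patch below, which also updates the statistics, and
"new_table" here just stands for the freshly allocated table page), a 1G
level 1 entry is replaced by a table of 512 level 2 entries covering the same
machine range:

    /* Illustrative: split one 1G block into 512 2M blocks in a new table. */
    for ( i = 0; i < LPAE_ENTRIES; i++ )    /* LPAE_ENTRIES == 512 */
    {
        pte = mfn_to_p2m_entry(base_pfn + (i << LPAE_SHIFT), /* 512 pfns = 2M apart */
                               MATTR_MEM, t);
        pte.p2m.table = 0;                  /* still a block mapping at level 2 */
        write_pte(&new_table[i], pte);
    }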

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/domain.c     |    1 +
 xen/arch/arm/p2m.c        |  497 +++++++++++++++++++++++++++++++++------------
 xen/include/asm-arm/p2m.h |    9 +
 3 files changed, 373 insertions(+), 134 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index e494112..45b7cbd 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -742,6 +742,7 @@ int domain_relinquish_resources(struct domain *d)
 
 void arch_dump_domain_info(struct domain *d)
 {
+    p2m_dump_info(d);
 }
 
 
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 233df72..1c2c724 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,6 +1,7 @@
 #include <xen/config.h>
 #include <xen/sched.h>
 #include <xen/lib.h>
+#include <xen/stdbool.h>
 #include <xen/errno.h>
 #include <xen/domain_page.h>
 #include <xen/bitops.h>
@@ -19,6 +20,22 @@
 #define p2m_table(pte) (p2m_valid(pte) && (pte).p2m.table)
 #define p2m_entry(pte) (p2m_valid(pte) && !(pte).p2m.table)
 
+void p2m_dump_info(struct domain *d)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+
+    spin_lock(&p2m->lock);
+    printk("p2m mappings for domain %d (vmid %d):\n",
+           d->domain_id, p2m->vmid);
+    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
+    printk("  1G mappings: %d (shattered %d)\n",
+           p2m->stats.mappings[1], p2m->stats.shattered[1]);
+    printk("  2M mappings: %d (shattered %d)\n",
+           p2m->stats.mappings[2], p2m->stats.shattered[2]);
+    printk("  4K mappings: %d\n", p2m->stats.mappings[3]);
+    spin_unlock(&p2m->lock);
+}
+
 void dump_p2m_lookup(struct domain *d, paddr_t addr)
 {
     struct p2m_domain *p2m = &d->arch.p2m;
@@ -274,15 +291,26 @@ static inline void p2m_write_pte(lpae_t *p, lpae_t pte, bool_t flush_cache)
         clean_xen_dcache(*p);
 }
 
-/* Allocate a new page table page and hook it in via the given entry */
-static int p2m_create_table(struct domain *d, lpae_t *entry, bool_t flush_cache)
+/*
+ * Allocate a new page table page and hook it in via the given entry.
+ * apply_one_level relies on this returning 0 on success
+ * and -ve on failure.
+ *
+ * If the existing entry is present then it must be a mapping and not a
+ * table and it will be shattered into the next level down.
+ *
+ * level_shift is the number of bits at the level we want to create.
+ */
+static int p2m_create_table(struct domain *d, lpae_t *entry,
+                            int level_shift, bool_t flush_cache)
 {
     struct p2m_domain *p2m = &d->arch.p2m;
     struct page_info *page;
-    void *p;
+    lpae_t *p;
     lpae_t pte;
+    int splitting = entry->p2m.valid;
 
-    BUG_ON(entry->p2m.valid);
+    BUG_ON(entry->p2m.table);
 
     page = alloc_domheap_page(NULL, 0);
     if ( page == NULL )
@@ -291,9 +319,39 @@ static int p2m_create_table(struct domain *d, lpae_t *entry, bool_t flush_cache)
     page_list_add(page, &p2m->pages);
 
     p = __map_domain_page(page);
-    clear_page(p);
+    if ( splitting )
+    {
+        p2m_type_t t = entry->p2m.type;
+        unsigned long base_pfn = entry->p2m.base;
+        int i;
+
+        /*
+         * We are either splitting a first level 1G page into 512 second level
+         * 2M pages, or a second level 2M page into 512 third level 4K pages.
+         */
+         for ( i=0 ; i < LPAE_ENTRIES; i++ )
+         {
+             pte = mfn_to_p2m_entry(base_pfn + (i<<(level_shift-LPAE_SHIFT)),
+                                    MATTR_MEM, t);
+
+             /*
+              * First and second level super pages set p2m.table = 0, but
+              * third level entries set table = 1.
+              */
+             if ( level_shift - LPAE_SHIFT )
+                 pte.p2m.table = 0;
+
+             write_pte(&p[i], pte);
+         }
+    }
+    else
+    {
+        clear_page(p);
+    }
+
     if ( flush_cache )
         clean_xen_dcache_va_range(p, PAGE_SIZE);
+
     unmap_domain_page(p);
 
     pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
@@ -311,8 +369,14 @@ enum p2m_operation {
     CACHEFLUSH,
 };
 
-static void p2m_put_page(const lpae_t pte)
+/* Put any references on the single 4K page referenced by pte.  TODO:
+ * Handle superpages, for now we only take special references for leaf
+ * pages (specifically foreign ones, which can't be super mapped today).
+ */
+static void p2m_put_l3_page(const lpae_t pte)
 {
+    ASSERT(p2m_valid(pte));
+
     /* TODO: Handle other p2m types
      *
      * It's safe to do the put_page here because page_alloc will
@@ -328,6 +392,248 @@ static void p2m_put_page(const lpae_t pte)
     }
 }
 
+/*
+ * Returns true if start_gpaddr..end_gpaddr contains at least one
+ * suitably aligned level_size mappping of maddr.
+ *
+ * So long as the range is large enough the end_gpaddr need not be
+ * aligned (callers should create one superpage mapping based on this
+ * result and then call this again on the new range, eventually the
+ * slop at the end will cause this function to return false).
+ */
+static bool_t is_mapping_aligned(const paddr_t start_gpaddr,
+                                 const paddr_t end_gpaddr,
+                                 const paddr_t maddr,
+                                 const paddr_t level_size)
+{
+    const paddr_t level_mask = level_size - 1;
+
+    /* No hardware superpages at level 0 */
+    if ( level_size == ZEROETH_SIZE )
+        return false;
+
+    /*
+     * A range smaller than the size of a superpage at this level
+     * cannot be superpage aligned.
+     */
+    if ( ( end_gpaddr - start_gpaddr ) < level_size - 1 )
+        return false;
+
+    /* Both the gpaddr and maddr must be aligned */
+    if ( start_gpaddr & level_mask )
+        return false;
+    if ( maddr & level_mask )
+        return false;
+    return true;
+}
+
+#define P2M_ONE_DESCEND        0
+#define P2M_ONE_PROGRESS_NOP   0x1
+#define P2M_ONE_PROGRESS       0x10
+
+/*
+ * 0   == (P2M_ONE_DESCEND) continue to decend the tree
+ * +ve == (P2M_ONE_PROGRESS_*) handled at this level, continue, flush,
+ *        entry, addr and maddr updated.  Return value is an
+ *        indication of the amount of work done (for preemption).
+ * -ve == (-Exxx) error.
+ */
+static int apply_one_level(struct domain *d,
+                           lpae_t *entry,
+                           unsigned int level,
+                           bool_t flush_cache,
+                           enum p2m_operation op,
+                           paddr_t start_gpaddr,
+                           paddr_t end_gpaddr,
+                           paddr_t *addr,
+                           paddr_t *maddr,
+                           bool_t *flush,
+                           int mattr,
+                           p2m_type_t t)
+{
+    /* Helpers to lookup the properties of each level */
+    const paddr_t level_sizes[] =
+        { ZEROETH_SIZE, FIRST_SIZE, SECOND_SIZE, THIRD_SIZE };
+    const paddr_t level_masks[] =
+        { ZEROETH_MASK, FIRST_MASK, SECOND_MASK, THIRD_MASK };
+    const paddr_t level_shifts[] =
+        { ZEROETH_SHIFT, FIRST_SHIFT, SECOND_SHIFT, THIRD_SHIFT };
+    const paddr_t level_size = level_sizes[level];
+    const paddr_t level_mask = level_masks[level];
+    const paddr_t level_shift = level_shifts[level];
+
+    struct p2m_domain *p2m = &d->arch.p2m;
+    lpae_t pte;
+    const lpae_t orig_pte = *entry;
+    int rc;
+
+    BUG_ON(level > 3);
+
+    switch ( op )
+    {
+    case ALLOCATE:
+        ASSERT(level < 3 || !p2m_valid(orig_pte));
+        ASSERT(*maddr == 0);
+
+        if ( p2m_valid(orig_pte) )
+            return P2M_ONE_DESCEND;
+
+        if ( is_mapping_aligned(*addr, end_gpaddr, 0, level_size) )
+        {
+            struct page_info *page;
+
+            page = alloc_domheap_pages(d, level_shift - PAGE_SHIFT, 0);
+            if ( page )
+            {
+                pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
+                if ( level != 3 )
+                    pte.p2m.table = 0;
+                p2m_write_pte(entry, pte, flush_cache);
+                p2m->stats.mappings[level]++;
+                return P2M_ONE_PROGRESS;
+            }
+            else if ( level == 3 )
+                return -ENOMEM;
+        }
+
+        BUG_ON(level == 3); /* L3 is always superpage aligned */
+
+        /*
+         * If we get here then we failed to allocate a sufficiently
+         * large contiguous region for this level (which can't be
+         * L3). Create a page table and continue to descend so we try
+         * smaller allocations.
+         */
+        rc = p2m_create_table(d, entry, 0, flush_cache);
+        if ( rc < 0 )
+            return rc;
+
+        return P2M_ONE_DESCEND;
+
+    case INSERT:
+        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
+           /* We do not handle replacing an existing table with a superpage */
+             (level == 3 || !p2m_table(orig_pte)) )
+        {
+            /* New mapping is superpage aligned, make it */
+            pte = mfn_to_p2m_entry(*maddr >> PAGE_SHIFT, mattr, t);
+            if ( level < 3 )
+                pte.p2m.table = 0; /* Superpage entry */
+
+            p2m_write_pte(entry, pte, flush_cache);
+
+            *flush |= p2m_valid(orig_pte);
+
+            *addr += level_size;
+            *maddr += level_size;
+
+            if ( p2m_valid(orig_pte) )
+            {
+                /*
+                 * We can't currently get here for an existing table
+                 * mapping, since we don't handle replacing an
+                 * existing table with a superpage. If we did we would
+                 * need to handle freeing (and accounting) for the bit
+                 * of the p2m tree which we would be about to lop off.
+                 */
+                BUG_ON(level < 3 && p2m_table(orig_pte));
+                if ( level == 3 )
+                    p2m_put_l3_page(orig_pte);
+            }
+            else /* New mapping */
+                p2m->stats.mappings[level]++;
+
+            return P2M_ONE_PROGRESS;
+        }
+        else
+        {
+            /* New mapping is not superpage aligned, create a new table entry */
+            BUG_ON(level == 3); /* L3 is always superpage aligned */
+
+            /* Not present -> create table entry and descend */
+            if ( !p2m_valid(orig_pte) )
+            {
+                rc = p2m_create_table(d, entry, 0, flush_cache);
+                if ( rc < 0 )
+                    return rc;
+                return P2M_ONE_DESCEND;
+            }
+
+            /* Existing superpage mapping -> shatter and descend */
+            if ( p2m_entry(orig_pte) )
+            {
+                *flush = true;
+                rc = p2m_create_table(d, entry,
+                                      level_shift - PAGE_SHIFT, flush_cache);
+                if ( rc < 0 )
+                    return rc;
+
+                p2m->stats.shattered[level]++;
+                p2m->stats.mappings[level]--;
+                p2m->stats.mappings[level+1] += LPAE_ENTRIES;
+            } /* else: an existing table mapping -> descend */
+
+            BUG_ON(!entry->p2m.table);
+
+            return P2M_ONE_DESCEND;
+        }
+
+        break;
+
+    case RELINQUISH:
+    case REMOVE:
+        if ( !p2m_valid(orig_pte) )
+        {
+            /* Progress up to next boundary */
+            *addr = (*addr + level_size) & level_mask;
+            return P2M_ONE_PROGRESS_NOP;
+        }
+
+        if ( level < 3 && p2m_table(orig_pte) )
+            return P2M_ONE_DESCEND;
+
+        *flush = true;
+
+        memset(&pte, 0x00, sizeof(pte));
+        p2m_write_pte(entry, pte, flush_cache);
+
+        *addr += level_size;
+
+        p2m->stats.mappings[level]--;
+
+        if ( level == 3 )
+            p2m_put_l3_page(orig_pte);
+
+        /*
+         * This is still a single pte write, no matter the level, so no need to
+         * scale.
+         */
+        return P2M_ONE_PROGRESS;
+
+    case CACHEFLUSH:
+        if ( !p2m_valid(orig_pte) )
+        {
+            *addr = (*addr + level_size) & level_mask;
+            return P2M_ONE_PROGRESS_NOP;
+        }
+
+        /*
+         * could flush up to the next boundary, but would need to be
+         * careful about preemption, so just do one page now and loop.
+         */
+        *addr += PAGE_SIZE;
+        if ( p2m_is_ram(orig_pte.p2m.type) )
+        {
+            flush_page_to_ram(orig_pte.p2m.base + third_table_offset(*addr));
+            return P2M_ONE_PROGRESS;
+        }
+        else
+            return P2M_ONE_PROGRESS_NOP;
+    }
+
+    BUG(); /* Should never get here */
+}
+
 static int apply_p2m_changes(struct domain *d,
                      enum p2m_operation op,
                      paddr_t start_gpaddr,
@@ -344,9 +650,7 @@ static int apply_p2m_changes(struct domain *d,
                   cur_first_offset = ~0,
                   cur_second_offset = ~0;
     unsigned long count = 0;
-    unsigned int flush = 0;
-    bool_t populate = (op == INSERT || op == ALLOCATE);
-    lpae_t pte;
+    bool_t flush = false;
     bool_t flush_pt;
 
     /* Some IOMMU don't support coherent PT walk. When the p2m is
@@ -360,6 +664,25 @@ static int apply_p2m_changes(struct domain *d,
     addr = start_gpaddr;
     while ( addr < end_gpaddr )
     {
+        /*
+         * Arbitrarily, preempt every 512 operations or 8192 nops.
+         * 512*P2M_ONE_PROGRESS == 8192*P2M_ONE_PROGRESS_NOP == 0x2000
+         *
+         * count is initialised to 0 above, so we are guaranteed to
+         * always make at least one pass.
+         */
+
+        if ( op == RELINQUISH && count >= 0x2000 )
+        {
+            if ( hypercall_preempt_check() )
+            {
+                p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
+                rc = -ERESTART;
+                goto out;
+            }
+            count = 0;
+        }
+
         if ( cur_first_page != p2m_first_level_index(addr) )
         {
             if ( first ) unmap_domain_page(first);
@@ -372,22 +695,18 @@ static int apply_p2m_changes(struct domain *d,
             cur_first_page = p2m_first_level_index(addr);
         }
 
-        if ( !p2m_valid(first[first_table_offset(addr)]) )
-        {
-            if ( !populate )
-            {
-                addr = (addr + FIRST_SIZE) & FIRST_MASK;
-                continue;
-            }
+        /* We only use a 3 level p2m at the moment, so no level 0,
+         * current hardware doesn't support super page mappings at
+         * level 0 anyway */
 
-            rc = p2m_create_table(d, &first[first_table_offset(addr)],
-                                  flush_pt);
-            if ( rc < 0 )
-            {
-                printk("p2m_populate_ram: L1 failed\n");
-                goto out;
-            }
-        }
+        rc = apply_one_level(d, &first[first_table_offset(addr)],
+                             1, flush_pt, op,
+                             start_gpaddr, end_gpaddr,
+                             &addr, &maddr, &flush,
+                             mattr, t);
+        if ( rc < 0 ) goto out;
+        count += rc;
+        if ( rc != P2M_ONE_DESCEND ) continue;
 
         BUG_ON(!p2m_valid(first[first_table_offset(addr)]));
 
@@ -399,23 +718,16 @@ static int apply_p2m_changes(struct domain *d,
         }
         /* else: second already valid */
 
-        if ( !p2m_valid(second[second_table_offset(addr)]) )
-        {
-            if ( !populate )
-            {
-                addr = (addr + SECOND_SIZE) & SECOND_MASK;
-                continue;
-            }
+        rc = apply_one_level(d,&second[second_table_offset(addr)],
+                             2, flush_pt, op,
+                             start_gpaddr, end_gpaddr,
+                             &addr, &maddr, &flush,
+                             mattr, t);
+        if ( rc < 0 ) goto out;
+        count += rc;
+        if ( rc != P2M_ONE_DESCEND ) continue;
 
-            rc = p2m_create_table(d, &second[second_table_offset(addr)],
-                                  flush_pt);
-            if ( rc < 0 ) {
-                printk("p2m_populate_ram: L2 failed\n");
-                goto out;
-            }
-        }
-
-        BUG_ON(!second[second_table_offset(addr)].p2m.valid);
+        BUG_ON(!p2m_valid(second[second_table_offset(addr)]));
 
         if ( cur_second_offset != second_table_offset(addr) )
         {
@@ -425,98 +737,15 @@ static int apply_p2m_changes(struct domain *d,
             cur_second_offset = second_table_offset(addr);
         }
 
-        pte = third[third_table_offset(addr)];
-
-        flush |= pte.p2m.valid;
-
-        /* TODO: Handle other p2m type
-         *
-         * It's safe to do the put_page here because page_alloc will
-         * flush the TLBs if the page is reallocated before the end of
-         * this loop.
-         */
-        if ( pte.p2m.valid && p2m_is_foreign(pte.p2m.type) )
-        {
-            unsigned long mfn = pte.p2m.base;
-
-            ASSERT(mfn_valid(mfn));
-            put_page(mfn_to_page(mfn));
-        }
-
-        switch (op) {
-            case ALLOCATE:
-                {
-                    /* Allocate a new RAM page and attach */
-                    struct page_info *page;
-
-                    ASSERT(!pte.p2m.valid);
-                    rc = -ENOMEM;
-                    page = alloc_domheap_page(d, 0);
-                    if ( page == NULL ) {
-                        printk("p2m_populate_ram: failed to allocate page\n");
-                        goto out;
-                    }
-
-                    pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
-
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                }
-                break;
-            case INSERT:
-                {
-                    if ( pte.p2m.valid )
-                        p2m_put_page(pte);
-                    pte = mfn_to_p2m_entry(maddr >> PAGE_SHIFT, mattr, t);
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                    maddr += PAGE_SIZE;
-                }
-                break;
-            case RELINQUISH:
-            case REMOVE:
-                {
-                    if ( !pte.p2m.valid )
-                    {
-                        count++;
-                        break;
-                    }
-
-                    p2m_put_page(pte);
-
-                    count += 0x10;
-
-                    memset(&pte, 0x00, sizeof(pte));
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                    count++;
-                }
-                break;
-
-            case CACHEFLUSH:
-                {
-                    if ( !pte.p2m.valid || !p2m_is_ram(pte.p2m.type) )
-                        break;
-
-                    flush_page_to_ram(pte.p2m.base);
-                }
-                break;
-        }
-
-        /* Preempt every 2MiB (mapped) or 32 MiB (unmapped) - arbitrary */
-        if ( op == RELINQUISH && count >= 0x2000 )
-        {
-            if ( hypercall_preempt_check() )
-            {
-                p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
-                rc = -ERESTART;
-                goto out;
-            }
-            count = 0;
-        }
-
-        /* Got the next page */
-        addr += PAGE_SIZE;
+        rc = apply_one_level(d, &third[third_table_offset(addr)],
+                             3, flush_pt, op,
+                             start_gpaddr, end_gpaddr,
+                             &addr, &maddr, &flush,
+                             mattr, t);
+        if ( rc < 0 ) goto out;
+        /* L3 had better have done something! We cannot descend any further */
+        BUG_ON(rc == P2M_ONE_DESCEND);
+        count += rc;
     }
 
     if ( flush )
@@ -574,7 +803,7 @@ int guest_physmap_add_entry(struct domain *d,
 {
     return apply_p2m_changes(d, INSERT,
                              pfn_to_paddr(gpfn),
-                             pfn_to_paddr(gpfn + (1 << page_order)),
+                             pfn_to_paddr(gpfn + (1 << page_order)) - 1,
                              pfn_to_paddr(mfn), MATTR_MEM, t);
 }
 
@@ -584,7 +813,7 @@ void guest_physmap_remove_page(struct domain *d,
 {
     apply_p2m_changes(d, REMOVE,
                       pfn_to_paddr(gpfn),
-                      pfn_to_paddr(gpfn + (1<<page_order)),
+                      pfn_to_paddr(gpfn + (1<<page_order)) - 1,
                       pfn_to_paddr(mfn), MATTR_MEM, p2m_invalid);
 }
 
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 911d32d..0cfbb09 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -29,6 +29,12 @@ struct p2m_domain {
      * resume the search. Apart from during teardown this can only
      * decrease. */
     unsigned long lowest_mapped_gfn;
+
+    struct {
+        uint32_t mappings[4]; /* Number of mappings at each p2m tree level */
+        uint32_t shattered[4]; /* Number of times we have shattered a mapping
+                                * at each p2m tree level. */
+    } stats;
 };
 
 /* List of possible type for each page in the p2m entry.
@@ -79,6 +85,9 @@ int p2m_alloc_table(struct domain *d);
 void p2m_save_state(struct vcpu *p);
 void p2m_restore_state(struct vcpu *n);
 
+/* */
+void p2m_dump_info(struct domain *d);
+
 /* Look up the MFN corresponding to a domain's PFN. */
 paddr_t p2m_lookup(struct domain *d, paddr_t gpfn, p2m_type_t *t);
 
-- 
1.7.10.4


* Re: [PATCH 1/6] xen: arm: dump vcpu gic info in arch_dump_vcpu_info
  2014-06-10  9:57 ` [PATCH 1/6] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
@ 2014-06-10 11:08   ` Julien Grall
  0 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2014-06-10 11:08 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 06/10/2014 10:57 AM, Ian Campbell wrote:
> Instead of looping over vcpus in arch_dump_domain_info
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

Regards,

-- 
Julien Grall


* Re: [PATCH 2/6] tools: arm: allocate superpages to guests.
  2014-06-10  9:57 ` [PATCH 2/6] tools: arm: allocate superpages to guests Ian Campbell
@ 2014-06-10 11:23   ` Julien Grall
  2014-06-10 15:16     ` Ian Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2014-06-10 11:23 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 06/10/2014 10:57 AM, Ian Campbell wrote:
> Tries to allocate as many large (1G or 2M) pages as it can to the guest, and
> tries to align to the next larger size on each attempt so that we can attempt
> to allocate even larger pages on the next iteration (this is currently only
> exercised via a compile time debug option, which is left in place under
> #if 0).
> 
> Since ARM page tables are consistent at each level there is a common helper
> function which tries to allocate a level's worth of pages. The exception to
> this consistency is level 0, which does not support block (superpage) mappings
> (0.5TB superpages!).
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  tools/libxc/xc_dom_arm.c |  103 +++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 92 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
> index 75f8363..da68ec3 100644
> --- a/tools/libxc/xc_dom_arm.c
> +++ b/tools/libxc/xc_dom_arm.c
> @@ -30,6 +30,13 @@
>  #define CONSOLE_PFN_OFFSET 0
>  #define XENSTORE_PFN_OFFSET 1
>  
> +#define LPAE_SHIFT 9
> +
> +#define PFN_4K_SHIFT  (0)
> +#define PFN_2M_SHIFT  (PFN_4K_SHIFT+LPAE_SHIFT)
> +#define PFN_1G_SHIFT  (PFN_2M_SHIFT+LPAE_SHIFT)
> +#define PFN_512G_SHIFT (PFN_1G_SHIFT+LPAE_SHIFT)
> +
>  /* get guest IO ABI protocol */
>  const char *xc_domain_get_native_protocol(xc_interface *xch,
>                                            uint32_t domid)
> @@ -249,11 +256,57 @@ static int set_mode(xc_interface *xch, domid_t domid, char *guest_type)
>      return rc;
>  }
>  
> +#define min_t(type,x,y) \
> +        ({ type __x = (x); type __y = (y); __x < __y ? __x: __y; })

min_t is also defined in xc_dom_decompress_unsafe_xz.c. It might be
interesting to define it in a common header (such as xc_private.h).

> +/*  >0: success, *nr_pfns set to number actually populated
> + *   0: didn't try with this pfn shift (e.g. misaligned base etc)
> + *  <0: ERROR
> + */
> +static int populate_one_size(struct xc_dom_image *dom, int pfn_shift,
> +                             xen_pfn_t base_pfn, xen_pfn_t *nr_pfns)
> +{
> +    uint64_t mask = ((uint64_t)1<<(pfn_shift))-1;
> +    uint64_t next_mask = ((uint64_t)1<<(LPAE_SHIFT))-1;
> +    int nr, i, count = min_t(int, 1024*1024, *nr_pfns >> pfn_shift);

Stupid question: where does the 1024*1024 come from?

> +    xen_pfn_t extents[count];

If I follow the count definition, it's possible to allocate 1024*1024
xen_pfn_t (about 8MB) on the stack.

[..]

>  static int populate_guest_memory(struct xc_dom_image *dom,
>                                   xen_pfn_t base_pfn, xen_pfn_t nr_pfns)
>  {
> -    int rc;
> +    int rc = 0;
>      xen_pfn_t allocsz, pfn;
> +    int debug_iters = 0;
>  
>      DOMPRINTF("%s: populating RAM @ %016"PRIx64"-%016"PRIx64" (%"PRId64"MB)",
>                __FUNCTION__,
> @@ -261,21 +314,49 @@ static int populate_guest_memory(struct xc_dom_image *dom,
>                (uint64_t)(base_pfn + nr_pfns) << XC_PAGE_SHIFT,
>                (uint64_t)nr_pfns >> (20-XC_PAGE_SHIFT));
>  
> -    for ( pfn = 0; pfn < nr_pfns; pfn++ )
> -        dom->p2m_host[pfn] = base_pfn + pfn;
> -
> -    for ( pfn = rc = allocsz = 0; (pfn < nr_pfns) && !rc; pfn += allocsz )
> +    for ( pfn = 0; pfn < nr_pfns; pfn += allocsz )
>      {
>          allocsz = nr_pfns - pfn;
> -        if ( allocsz > 1024*1024 )
> -            allocsz = 1024*1024;
>  
> -        rc = xc_domain_populate_physmap_exact(
> -            dom->xch, dom->guest_domid, allocsz,
> -            0, 0, &dom->p2m_host[pfn]);
> +        if ( debug_iters++ > 4 )
> +            abort();

IIRC, abort doesn't give any stack trace. I'm not sure if you intended
to keep it in your patch. If so, I would add an error message.

Regards,

-- 
Julien Grall


* Re: [PATCH 3/6] xen: arm: only put_page for p2m operations which require it.
  2014-06-10  9:57 ` [PATCH 3/6] xen: arm: only put_page for p2m operations which require it Ian Campbell
@ 2014-06-10 11:28   ` Julien Grall
  2014-06-10 15:18     ` Ian Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2014-06-10 11:28 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 06/10/2014 10:57 AM, Ian Campbell wrote:
> In particular do not do it for CACHEFLUSH.
> 
> INSERT, RELINQUISH and REMOVE should all put the page (if the current pte is
> valid). ALLOCATE doesn't need to since it asserts the current PTE must be
> invalid.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  xen/arch/arm/p2m.c |   21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 9960e17..830a9f9 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -299,6 +299,23 @@ enum p2m_operation {
>      CACHEFLUSH,
>  };
>  
> +static void p2m_put_page(const lpae_t pte)
> +{
> +    /* TODO: Handle other p2m types
> +     *
> +     * It's safe to do the put_page here because page_alloc will
> +     * flush the TLBs if the page is reallocated before the end of
> +     * this loop.
> +     */
> +    if ( p2m_is_foreign(pte.p2m.type) )
> +    {
> +        unsigned long mfn = pte.p2m.base;
> +
> +        ASSERT(mfn_valid(mfn));
> +        put_page(mfn_to_page(mfn));
> +    }
> +}
> +

You forgot to drop the put_page in apply_p2m_changes, so now we have two
put_page calls rather than one.

Regards,

-- 
Julien Grall


* Re: [PATCH 4/6] xen: arm: handle superpage mappings in p2m_lookup
  2014-06-10  9:57 ` [PATCH 4/6] xen: arm: handle superpage mappings in p2m_lookup Ian Campbell
@ 2014-06-10 11:35   ` Julien Grall
  0 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2014-06-10 11:35 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 06/10/2014 10:57 AM, Ian Campbell wrote:
> Currently none are actually created, but they will be shortly.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Julien Grall <julien.grall@linaro.org>

Regards,

-- 
Julien Grall


* Re: [PATCH 5/6] xen: arm: add some helpers for assessing p2m pte
  2014-06-10  9:57 ` [PATCH 5/6] xen: arm: add some helpers for assessing p2m pte Ian Campbell
@ 2014-06-10 11:37   ` Julien Grall
  2014-06-10 11:46     ` Ian Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2014-06-10 11:37 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

On 06/10/2014 10:57 AM, Ian Campbell wrote:
>      mask = SECOND_MASK;
>      second = map_domain_page(pte.p2m.base);
>      pte = second[second_table_offset(paddr)];
> -    if ( !pte.p2m.valid || !pte.p2m.table )
> +    if ( !p2m_table(pte) )
>          goto done;
>  
>      mask = THIRD_MASK;
> @@ -156,11 +161,11 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
>      pte = third[third_table_offset(paddr)];
>  
>      /* This bit must be one in the level 3 entry */
> -    if ( !pte.p2m.table )
> +    if ( !p2m_table(pte) )
>          pte.bits = 0;
>  
>  done:
> -    if ( pte.p2m.valid )
> +    if ( p2m_valid(pte) )

Regardless of the current check, I think this should be p2m_entry(pte) to
help code comprehension.

Indeed, we can only get the address if the pte points to a memory
block.

Regards,

-- 
Julien Grall


* Re: [PATCH 5/6] xen: arm: add some helpers for assessing p2m pte
  2014-06-10 11:37   ` Julien Grall
@ 2014-06-10 11:46     ` Ian Campbell
  2014-06-10 11:54       ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-06-10 11:46 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Tue, 2014-06-10 at 12:37 +0100, Julien Grall wrote:
> On 06/10/2014 10:57 AM, Ian Campbell wrote:
> >      mask = SECOND_MASK;
> >      second = map_domain_page(pte.p2m.base);
> >      pte = second[second_table_offset(paddr)];
> > -    if ( !pte.p2m.valid || !pte.p2m.table )
> > +    if ( !p2m_table(pte) )
> >          goto done;
> >  
> >      mask = THIRD_MASK;
> > @@ -156,11 +161,11 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
> >      pte = third[third_table_offset(paddr)];
> >  
> >      /* This bit must be one in the level 3 entry */
> > -    if ( !pte.p2m.table )
> > +    if ( !p2m_table(pte) )
> >          pte.bits = 0;
> >  
> >  done:
> > -    if ( pte.p2m.valid )
> > +    if ( p2m_valid(pte) )
> 
> Regardless of the current check, I think this should be p2m_entry(pte) to
> help code comprehension.
> 
> Indeed, we can only get the address if the pte points to a memory
> block.

Yes, but an L3 PTE has the table bit set, which would make
p2m_entry(pte) false...


* Re: [PATCH 5/6] xen: arm: add some helpers for assessing p2m pte
  2014-06-10 11:46     ` Ian Campbell
@ 2014-06-10 11:54       ` Julien Grall
  2014-06-10 12:29         ` Ian Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2014-06-10 11:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 06/10/2014 12:46 PM, Ian Campbell wrote:
> On Tue, 2014-06-10 at 12:37 +0100, Julien Grall wrote:
>> On 06/10/2014 10:57 AM, Ian Campbell wrote:
>>>      mask = SECOND_MASK;
>>>      second = map_domain_page(pte.p2m.base);
>>>      pte = second[second_table_offset(paddr)];
>>> -    if ( !pte.p2m.valid || !pte.p2m.table )
>>> +    if ( !p2m_table(pte) )
>>>          goto done;
>>>  
>>>      mask = THIRD_MASK;
>>> @@ -156,11 +161,11 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
>>>      pte = third[third_table_offset(paddr)];
>>>  
>>>      /* This bit must be one in the level 3 entry */
>>> -    if ( !pte.p2m.table )
>>> +    if ( !p2m_table(pte) )
>>>          pte.bits = 0;
>>>  
>>>  done:
>>> -    if ( pte.p2m.valid )
>>> +    if ( p2m_valid(pte) )
>>
>> Regardless of the current check, I think this should be p2m_entry(pte) to
>> help code comprehension.
>>
>> Indeed, we can only get the address if the pte points to a memory
>> block.
> 
> Yes, but an L3 PTE has the table bit set, which would make
> p2m_entry(pte) false...

Hmmm... right. But this bit (ie table bit) doesn't have the same meaning
on L3. Your comment on p2m_table is confusing.

Anyway:

Acked-by: Julien Grall <julien.grall@linaro.org>

-- 
Julien Grall


* Re: [PATCH 6/6] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-10  9:57 ` [PATCH 6/6] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
@ 2014-06-10 12:10   ` Julien Grall
  2014-06-10 15:23     ` Ian Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2014-06-10 12:10 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

This patch looks good in general. I have only a few comments; see them below.

On 06/10/2014 10:57 AM, Ian Campbell wrote:
> +void p2m_dump_info(struct domain *d)
> +{
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +
> +    spin_lock(&p2m->lock);
> +    printk("p2m mappings for domain %d (vmid %d):\n",
> +           d->domain_id, p2m->vmid);
> +    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
> +    printk("  1G mappings: %d (shattered %d)\n",
> +           p2m->stats.mappings[1], p2m->stats.shattered[1]);
> +    printk("  2M mappings: %d (shattered %d)\n",
> +           p2m->stats.mappings[2], p2m->stats.shattered[2]);
> +    printk("  4K mappings: %d\n", p2m->stats.mappings[3]);

I'm wondering if we can have more than 2^32-1 4K mappings sometimes...

[..]

> +    else
> +    {
> +        clear_page(p);
> +    }
> +

Spurious {}.

[..]

> +/*
> + * 0   == (P2M_ONE_DESCEND) continue to decend the tree

s/decend/descend/

[..]

> @@ -574,7 +803,7 @@ int guest_physmap_add_entry(struct domain *d,
>  {
>      return apply_p2m_changes(d, INSERT,
>                               pfn_to_paddr(gpfn),
> -                             pfn_to_paddr(gpfn + (1 << page_order)),
> +                             pfn_to_paddr(gpfn + (1 << page_order)) - 1,

> @@ -584,7 +813,7 @@ void guest_physmap_remove_page(struct domain *d,
>  {
>      apply_p2m_changes(d, REMOVE,
>                        pfn_to_paddr(gpfn),
> -                      pfn_to_paddr(gpfn + (1<<page_order)),
> +                      pfn_to_paddr(gpfn + (1<<page_order)) - 1,

Why did you add -1 in both functions? This is inconsistent with
p2m_populate_ram, which uses the range [start, end[.

I think we should document somewhere how the arguments to apply_p2m_changes
are passed. Even better, use gmfn and nr_gfn, as x86 does, for
apply_p2m_changes and handle the pfn_to_paddr stuff internally. This
would avoid this kind of problem.
	
Regards,

-- 
Julien Grall


* Re: [PATCH 5/6] xen: arm: add some helpers for assessing p2m pte
  2014-06-10 11:54       ` Julien Grall
@ 2014-06-10 12:29         ` Ian Campbell
  2014-06-10 12:52           ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-06-10 12:29 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Tue, 2014-06-10 at 12:54 +0100, Julien Grall wrote:
> On 06/10/2014 12:46 PM, Ian Campbell wrote:
> > On Tue, 2014-06-10 at 12:37 +0100, Julien Grall wrote:
> >> On 06/10/2014 10:57 AM, Ian Campbell wrote:
> >>>      mask = SECOND_MASK;
> >>>      second = map_domain_page(pte.p2m.base);
> >>>      pte = second[second_table_offset(paddr)];
> >>> -    if ( !pte.p2m.valid || !pte.p2m.table )
> >>> +    if ( !p2m_table(pte) )
> >>>          goto done;
> >>>  
> >>>      mask = THIRD_MASK;
> >>> @@ -156,11 +161,11 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
> >>>      pte = third[third_table_offset(paddr)];
> >>>  
> >>>      /* This bit must be one in the level 3 entry */
> >>> -    if ( !pte.p2m.table )
> >>> +    if ( !p2m_table(pte) )
> >>>          pte.bits = 0;
> >>>  
> >>>  done:
> >>> -    if ( pte.p2m.valid )
> >>> +    if ( p2m_valid(pte) )
> >>
> >> Regardless of the current check, I think this should be p2m_entry(pte) to
> >> help code comprehension.
> >>
> >> Indeed, we can only get the address if the pte points to a memory
> >> block.
> > 
> > Yes, but an L3 PTE has the table bit set, which would make
> > p2m_entry(pte) false...
> 
> Hmmm... right. But this bit (ie table bit) doesn't have the same meaning
> on L3. Your comment on p2m_table is confusing.

You mean: /* Remember: L3 entries set the table bit! */ ?

That was intended to be a comment on the two helpers which follow. Do
you have an idea how I could make it clearer? Perhaps appending "So
these two functions will return the opposite to what you expect for L3
ptes"?

> 
> Anyway:
> 
> Acked-by: Julien Grall <julien.grall@linaro.org>
> 

Thanks.


* Re: [PATCH 5/6] xen: arm: add some helpers for assessing p2m pte
  2014-06-10 12:29         ` Ian Campbell
@ 2014-06-10 12:52           ` Julien Grall
  2014-06-10 15:19             ` Ian Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2014-06-10 12:52 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 06/10/2014 01:29 PM, Ian Campbell wrote:
> On Tue, 2014-06-10 at 12:54 +0100, Julien Grall wrote:
>> On 06/10/2014 12:46 PM, Ian Campbell wrote:
>>> On Tue, 2014-06-10 at 12:37 +0100, Julien Grall wrote:
>>>> On 06/10/2014 10:57 AM, Ian Campbell wrote:
>>>>>      mask = SECOND_MASK;
>>>>>      second = map_domain_page(pte.p2m.base);
>>>>>      pte = second[second_table_offset(paddr)];
>>>>> -    if ( !pte.p2m.valid || !pte.p2m.table )
>>>>> +    if ( !p2m_table(pte) )
>>>>>          goto done;
>>>>>  
>>>>>      mask = THIRD_MASK;
>>>>> @@ -156,11 +161,11 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
>>>>>      pte = third[third_table_offset(paddr)];
>>>>>  
>>>>>      /* This bit must be one in the level 3 entry */
>>>>> -    if ( !pte.p2m.table )
>>>>> +    if ( !p2m_table(pte) )
>>>>>          pte.bits = 0;
>>>>>  
>>>>>  done:
>>>>> -    if ( pte.p2m.valid )
>>>>> +    if ( p2m_valid(pte) )
>>>>
>>>> Regardless the current check, I think this should be p2m_entry(pte) to
>>>> help code comprehension.
>>>>
>>>> Indeed, we can only get the address if the pte points to a memory
>>>> block.
>>>
>>> Yes, but an L3 PTE has the table bit set, which would make
>>> p2m_entry(pte) false...
>>
>> Hmmm... right. But this bit (ie table bit) doesn't have the same meaning
>> on L3. Your comment on p2m_table is confusing.
> 
> You mean: /* Remember: L3 entries set the table bit! */ ?
> 
> That was intended to be a comment on the two helpers which follow. Do
> you have an idea how I could make it clearer? Perhaps appending "So
> these two functions will return the opposite to what you expect for L3
> ptes"?

I was thinking something like "These two functions are only valid for L1
and L2 ptes".

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] tools: arm: allocate superpages to guests.
  2014-06-10 11:23   ` Julien Grall
@ 2014-06-10 15:16     ` Ian Campbell
  2014-06-11 11:50       ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-06-10 15:16 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Tue, 2014-06-10 at 12:23 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 06/10/2014 10:57 AM, Ian Campbell wrote:
> > Tries to allocate as many large (1G or 2M) pages as it can to the guest and tries
> > to align to the next larger size on each attempt so that we can attempt to
> > allocate even larger pages on the next iteration (this is currently only
> > exercised via a compile time debug option, which is left in place under a #if
> > 0).
> > 
> > Since ARM page tables are consistent at each level there is a common helper
> > function which tries to allocate a levels worth of pages. The exception to this
> > consistency is level 0 which does not support table mappings (0.5TB
> > superpages!).
> > 
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > ---
> >  tools/libxc/xc_dom_arm.c |  103 +++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 92 insertions(+), 11 deletions(-)
> > 
> > diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
> > index 75f8363..da68ec3 100644
> > --- a/tools/libxc/xc_dom_arm.c
> > +++ b/tools/libxc/xc_dom_arm.c
> > @@ -30,6 +30,13 @@
> >  #define CONSOLE_PFN_OFFSET 0
> >  #define XENSTORE_PFN_OFFSET 1
> >  
> > +#define LPAE_SHIFT 9
> > +
> > +#define PFN_4K_SHIFT  (0)
> > +#define PFN_2M_SHIFT  (PFN_4K_SHIFT+LPAE_SHIFT)
> > +#define PFN_1G_SHIFT  (PFN_2M_SHIFT+LPAE_SHIFT)
> > +#define PFN_512G_SHIFT (PFN_1G_SHIFT+LPAE_SHIFT)
> > +
> >  /* get guest IO ABI protocol */
> >  const char *xc_domain_get_native_protocol(xc_interface *xch,
> >                                            uint32_t domid)
> > @@ -249,11 +256,57 @@ static int set_mode(xc_interface *xch, domid_t domid, char *guest_type)
> >      return rc;
> >  }
> >  
> > +#define min_t(type,x,y) \
> > +        ({ type __x = (x); type __y = (y); __x < __y ? __x: __y; })
> 
> min_t is also defined in xc_dom_decompress_unsafe_xz.c. It might be
> interesting to define it in a common header (such as xc_private.h).

Yes, good idea. I'll add a precursor patch.
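
Something along these lines in the shared header (a sketch; xc_private.h
is the suggested location, the precursor patch decides the final home):

    /* One common copy, so xc_dom_arm.c and xc_dom_decompress_unsafe_xz.c
     * no longer need to define their own. */
    #ifndef min_t
    #define min_t(type, x, y) \
            ({ type __x = (x); type __y = (y); __x < __y ? __x : __y; })
    #endif
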
> 
> > +/*  >0: success, *nr_pfns set to number actually populated
> > + *   0: didn't try with this pfn shift (e.g. misaligned base etc)
> > + *  <0: ERROR
> > + */
> > +static int populate_one_size(struct xc_dom_image *dom, int pfn_shift,
> > +                             xen_pfn_t base_pfn, xen_pfn_t *nr_pfns)
> > +{
> > +    uint64_t mask = ((uint64_t)1<<(pfn_shift))-1;
> > +    uint64_t next_mask = ((uint64_t)1<<(LPAE_SHIFT))-1;
> > +    int nr, i, count = min_t(int, 1024*1024, *nr_pfns >> pfn_shift);
> 
> Stupid question, where does come from the 1024*1024?

It was in the original as a backstop on allocsz. It would correspond to
4GB worth of 4K page I suppose

> > +    xen_pfn_t extents[count];
> 
> If I follow the count definition, it's possible to allocate 1024*1024
> xen_pfn_t (about 8MB) on the stack.

userspace stack isn't all that precious but 8MB does seem a little
excessive. Previously this was using the already allocated host p2m so
it wasn't an issue, but that doesn't work for allocations of page >4K.

The tradeoff is that smaller batches mean it will take longer.

I don't think it would be unreasonable to reduce this to be e.g. 1GB
worth of entries (256*1024) or 2MB of stack.
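
For illustration, that cap would just change the backstop in
populate_one_size (a sketch, using the figures mentioned above):

    /* 1024*1024 entries x 4KB = 4GB of guest RAM per batch and
     * 1024*1024 x sizeof(xen_pfn_t) = 8MB of stack for extents[];
     * capping at 256*1024 keeps a batch to 1GB of RAM / 2MB of stack. */
    int nr, i, count = min_t(int, 256*1024, *nr_pfns >> pfn_shift);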

> 
> [..]
> 
> >  static int populate_guest_memory(struct xc_dom_image *dom,
> >                                   xen_pfn_t base_pfn, xen_pfn_t nr_pfns)
> >  {
> > -    int rc;
> > +    int rc = 0;
> >      xen_pfn_t allocsz, pfn;
> > +    int debug_iters = 0;
> >  
> >      DOMPRINTF("%s: populating RAM @ %016"PRIx64"-%016"PRIx64" (%"PRId64"MB)",
> >                __FUNCTION__,
> > @@ -261,21 +314,49 @@ static int populate_guest_memory(struct xc_dom_image *dom,
> >                (uint64_t)(base_pfn + nr_pfns) << XC_PAGE_SHIFT,
> >                (uint64_t)nr_pfns >> (20-XC_PAGE_SHIFT));
> >  
> > -    for ( pfn = 0; pfn < nr_pfns; pfn++ )
> > -        dom->p2m_host[pfn] = base_pfn + pfn;
> > -
> > -    for ( pfn = rc = allocsz = 0; (pfn < nr_pfns) && !rc; pfn += allocsz )
> > +    for ( pfn = 0; pfn < nr_pfns; pfn += allocsz )
> >      {
> >          allocsz = nr_pfns - pfn;
> > -        if ( allocsz > 1024*1024 )
> > -            allocsz = 1024*1024;
> >  
> > -        rc = xc_domain_populate_physmap_exact(
> > -            dom->xch, dom->guest_domid, allocsz,
> > -            0, 0, &dom->p2m_host[pfn]);
> > +        if ( debug_iters++ > 4 )
> > +            abort();
> 
> IIRC, abort doesn't give any stack trace. I'm not sure if you intended
> to keep it in your patch. If so, I would add an error message.

I can't remember what I was doing here, but it is just debug so I'll
nuke it.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] xen: arm: only put_page for p2m operations which require it.
  2014-06-10 11:28   ` Julien Grall
@ 2014-06-10 15:18     ` Ian Campbell
  0 siblings, 0 replies; 26+ messages in thread
From: Ian Campbell @ 2014-06-10 15:18 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Tue, 2014-06-10 at 12:28 +0100, Julien Grall wrote:
> You forgot to drop the put_page in apply_p2m_changes. So, now we have 2
> put_page rather than one.

Rebasing snafu meant it got removed in the final patch instead of here.
I'll fix.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] xen: arm: add some helpers for assessing p2m pte
  2014-06-10 12:52           ` Julien Grall
@ 2014-06-10 15:19             ` Ian Campbell
  0 siblings, 0 replies; 26+ messages in thread
From: Ian Campbell @ 2014-06-10 15:19 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Tue, 2014-06-10 at 13:52 +0100, Julien Grall wrote:
> >>>> Indeed, we can only get the address if the pte points to a memory
> >>>> block.
> >>>
> >>> Yes, but an L3 PTE has the table bit set, which would make
> >>> p2m_entry(pte) false...
> >>
> >> Hmmm... right. But this bit (ie table bit) doesn't have the same meaning
> >> on L3. Your comment on p2m_table is confusing.
> > 
> > You mean: /* Remember: L3 entries set the table bit! */ ?
> > 
> > That was intended to be a comment on the two helpers which follow. Do
> > you have an idea how I could make it clearer? Perhaps appending "So
> > these two functions will return the opposite to what you expect for L3
> > ptes"?
> 
> I was thinking something like "These two functions are only valid for L1
> and L2 ptes".

I think I'll do some combination of both.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-10 12:10   ` Julien Grall
@ 2014-06-10 15:23     ` Ian Campbell
  2014-06-10 19:11       ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-06-10 15:23 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Tue, 2014-06-10 at 13:10 +0100, Julien Grall wrote:
> Hi Ian,
> 
> This patch looks good in general. I've only few comments, see them below.
> 
> On 06/10/2014 10:57 AM, Ian Campbell wrote:
> > +void p2m_dump_info(struct domain *d)
> > +{
> > +    struct p2m_domain *p2m = &d->arch.p2m;
> > +
> > +    spin_lock(&p2m->lock);
> > +    printk("p2m mappings for domain %d (vmid %d):\n",
> > +           d->domain_id, p2m->vmid);
> > +    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
> > +    printk("  1G mappings: %d (shattered %d)\n",
> > +           p2m->stats.mappings[1], p2m->stats.shattered[1]);
> > +    printk("  2M mappings: %d (shattered %d)\n",
> > +           p2m->stats.mappings[2], p2m->stats.shattered[2]);
> > +    printk("  4K mappings: %d\n", p2m->stats.mappings[3]);
> 
> I'm wondering if we can have more than 2^32-1 4K mappings sometimes...

That would be 16TB of memory, which isn't out of the question but we
don't currently support it. Still, not much harm in increasing it to a
64-bit field.

> [..]
> 
> > +    else
> > +    {
> > +        clear_page(p);
> > +    }
> > +
> 
> Spurious {}.

I prefer to have them on both halves of an if/else when one of them
needs them, but I guess I'm in a minority so I'll remove them.

> > @@ -574,7 +803,7 @@ int guest_physmap_add_entry(struct domain *d,
> >  {
> >      return apply_p2m_changes(d, INSERT,
> >                               pfn_to_paddr(gpfn),
> > -                             pfn_to_paddr(gpfn + (1 << page_order)),
> > +                             pfn_to_paddr(gpfn + (1 << page_order)) - 1,
> 
> > @@ -584,7 +813,7 @@ void guest_physmap_remove_page(struct domain *d,
> >  {
> >      apply_p2m_changes(d, REMOVE,
> >                        pfn_to_paddr(gpfn),
> > -                      pfn_to_paddr(gpfn + (1<<page_order)),
> > +                      pfn_to_paddr(gpfn + (1<<page_order)) - 1,
> 
> Why did you add -1 in both functions? This is inconsistent with
> p2m_populate_ram, which uses the range [start, end[.

I think I thought that the callers of apply_p2m_changes were
inconsistent about this, but then I changed my mind and somehow failed
to remove this change.

> I think we should document how we pass the argument to apply_p2m_changes
> somewhere. Even better, use gmfn and nr_gfn, as x86 does, for
> apply_p2m_changes and handle internally the pfn_to_paddr stuff. This
> would avoid this kind of problem.

Arianna is sorting that all out in her mmio mapping series.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-10 15:23     ` Ian Campbell
@ 2014-06-10 19:11       ` Julien Grall
  2014-06-11  8:29         ` Ian Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2014-06-10 19:11 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel



On 10/06/14 16:23, Ian Campbell wrote:
> On Tue, 2014-06-10 at 13:10 +0100, Julien Grall wrote:
>> Hi Ian,
>>
>> This patch looks good in general. I've only few comments, see them below.
>>
>> On 06/10/2014 10:57 AM, Ian Campbell wrote:
>>> +void p2m_dump_info(struct domain *d)
>>> +{
>>> +    struct p2m_domain *p2m = &d->arch.p2m;
>>> +
>>> +    spin_lock(&p2m->lock);
>>> +    printk("p2m mappings for domain %d (vmid %d):\n",
>>> +           d->domain_id, p2m->vmid);
>>> +    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
>>> +    printk("  1G mappings: %d (shattered %d)\n",
>>> +           p2m->stats.mappings[1], p2m->stats.shattered[1]);
>>> +    printk("  2M mappings: %d (shattered %d)\n",
>>> +           p2m->stats.mappings[2], p2m->stats.shattered[2]);
>>> +    printk("  4K mappings: %d\n", p2m->stats.mappings[3]);
>>
>> I'm wondering if we can have more than 2^32-1 4K mappings sometimes...
>
> That would be 16TB of memory, which isn't out of the question but we
> don't currently support it. Still, not much harm in increasing it to a
> 64-bit field.

Sounds good.

>> I think we should document how we pass the argument to apply_p2m_changes
>> somewhere. Even better, use gmfn and nr_gfn, as x86 does, for
>> apply_p2m_changes and handle internally the pfn_to_paddr stuff. This
>> would avoid this kind of problem.
>
> Arianna is sorting that all out in her mmio mapping series.

Oh right. Some of you will have to rebase his/her series on top of the 
other one :).

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-10 19:11       ` Julien Grall
@ 2014-06-11  8:29         ` Ian Campbell
  0 siblings, 0 replies; 26+ messages in thread
From: Ian Campbell @ 2014-06-11  8:29 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Tue, 2014-06-10 at 20:11 +0100, Julien Grall wrote:
> 
> On 10/06/14 16:23, Ian Campbell wrote:
> > On Tue, 2014-06-10 at 13:10 +0100, Julien Grall wrote:
> >> Hi Ian,
> >>
> >> This patch looks good in general. I've only few comments, see them below.
> >>
> >> On 06/10/2014 10:57 AM, Ian Campbell wrote:
> >>> +void p2m_dump_info(struct domain *d)
> >>> +{
> >>> +    struct p2m_domain *p2m = &d->arch.p2m;
> >>> +
> >>> +    spin_lock(&p2m->lock);
> >>> +    printk("p2m mappings for domain %d (vmid %d):\n",
> >>> +           d->domain_id, p2m->vmid);
> >>> +    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
> >>> +    printk("  1G mappings: %d (shattered %d)\n",
> >>> +           p2m->stats.mappings[1], p2m->stats.shattered[1]);
> >>> +    printk("  2M mappings: %d (shattered %d)\n",
> >>> +           p2m->stats.mappings[2], p2m->stats.shattered[2]);
> >>> +    printk("  4K mappings: %d\n", p2m->stats.mappings[3]);
> >>
> >> I'm wondering if we can have more than 2^32-1 4K mappings sometimes...
> >
> > That would be 16TB of memory, which isn't out of the question but we
> > don't currently support it. Still, not much harm in increasing it to a
> > 64-bit field.
> 
> Sounds good.

In the end I went with unsigned long which is the size of a pfn for
arm32 and arm64 and therefore not wasteful on arm32.
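
For reference, the stats fields would then be along these lines (field
names taken from the dump code quoted above; the exact struct layout is
an assumption):

    struct {
        /* Per level (0-3): number of mappings at that level and number
         * of times a superpage at that level has been shattered.
         * unsigned long matches the pfn size on arm32 and arm64. */
        unsigned long mappings[4];
        unsigned long shattered[4];
    } stats;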

> >> I think we should document how we pass the argument to apply_p2m_changes
> >> somewhere. Even better, use gmfn and nr_gfn, as x86 does, for
> >> apply_p2m_changes and handle internally the pfn_to_paddr stuff. This
> >> would avoid this kind of problem.
> >
> > Arianna is sorting that all out in her mmio mapping series.
> 
> Oh right. Some of you will have to rebase his/her series on top of the 
> other one :).

Yes.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] tools: arm: allocate superpages to guests.
  2014-06-10 15:16     ` Ian Campbell
@ 2014-06-11 11:50       ` Julien Grall
  2014-06-11 11:55         ` Ian Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2014-06-11 11:50 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 06/10/2014 04:16 PM, Ian Campbell wrote:
>>> +/*  >0: success, *nr_pfns set to number actually populated
>>> + *   0: didn't try with this pfn shift (e.g. misaligned base etc)
>>> + *  <0: ERROR
>>> + */
>>> +static int populate_one_size(struct xc_dom_image *dom, int pfn_shift,
>>> +                             xen_pfn_t base_pfn, xen_pfn_t *nr_pfns)
>>> +{
>>> +    uint64_t mask = ((uint64_t)1<<(pfn_shift))-1;
>>> +    uint64_t next_mask = ((uint64_t)1<<(LPAE_SHIFT))-1;
>>> +    int nr, i, count = min_t(int, 1024*1024, *nr_pfns >> pfn_shift);
>>
>> Stupid question, where does come from the 1024*1024?
> 
> It was in the original as a backstop on allocsz. It would correspond to
> 4GB worth of 4K page I suppose

Ah ok. I didn't pay attention about it.

> 
>>> +    xen_pfn_t extents[count];
>>
>> If I follow the count definition, it's possible to allocate 1024*1024
>> xen_pfn_t (about 8MB) on the stack.
> 
> userspace stack isn't all that precious but 8MB does seem a little
> excessive. Previously this was using the already allocated host p2m so
> it wasn't an issue, but that doesn't work for allocations of page >4K.
> 
> The tradeoff is that smaller batches mean it will take longer.
> 
> I don't think it would be unreasonable to reduce this to be e.g. 1GB
> worth of entries (256*1024) or 2MB of stack.

It seems the default stack size is 8MB, I'm wondering if we can have
some use case where this limit is smaller.

Is there any issue to allocate this variable with malloc?

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] tools: arm: allocate superpages to guests.
  2014-06-11 11:50       ` Julien Grall
@ 2014-06-11 11:55         ` Ian Campbell
  2014-06-11 12:16           ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-06-11 11:55 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Wed, 2014-06-11 at 12:50 +0100, Julien Grall wrote:
> On 06/10/2014 04:16 PM, Ian Campbell wrote:
> >>> +/*  >0: success, *nr_pfns set to number actually populated
> >>> + *   0: didn't try with this pfn shift (e.g. misaligned base etc)
> >>> + *  <0: ERROR
> >>> + */
> >>> +static int populate_one_size(struct xc_dom_image *dom, int pfn_shift,
> >>> +                             xen_pfn_t base_pfn, xen_pfn_t *nr_pfns)
> >>> +{
> >>> +    uint64_t mask = ((uint64_t)1<<(pfn_shift))-1;
> >>> +    uint64_t next_mask = ((uint64_t)1<<(LPAE_SHIFT))-1;
> >>> +    int nr, i, count = min_t(int, 1024*1024, *nr_pfns >> pfn_shift);
> >>
> >> Stupid question, where does come from the 1024*1024?
> > 
> > It was in the original as a backstop on allocsz. It would correspond to
> > 4GB worth of 4K page I suppose
> 
> Ah ok. I didn't pay attention about it.
> 
> > 
> >>> +    xen_pfn_t extents[count];
> >>
> >> If I follow the count definition, it's possible to allocate 1024*1024
> >> xen_pfn_t (about 8MB) on the stack.
> > 
> > userspace stack isn't all that precious but 8MB does seem a little
> > excessive. Previously this was using the already allocated host p2m so
> > it wasn't an issue, but that doesn't work for allocations of page >4K.
> > 
> > The tradeoff is that smaller batches mean it will take longer.
> > 
> > I don't think it would be unreasonable to reduce this to be e.g. 1GB
> > worth of entries (256*1024) or 2MB of stack.
> 
> It seems the default stack size is 8MB, I'm wondering if we can have
> some use case where this limit is smaller.

I think we effectively assume it is larger than any amount we would
practically use.

> Is there any issue to allocate this variable with malloc?

Just more book keeping code really.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] tools: arm: allocate superpages to guests.
  2014-06-11 11:55         ` Ian Campbell
@ 2014-06-11 12:16           ` Julien Grall
  0 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2014-06-11 12:16 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 06/11/2014 12:55 PM, Ian Campbell wrote:
> On Wed, 2014-06-11 at 12:50 +0100, Julien Grall wrote:
>> On 06/10/2014 04:16 PM, Ian Campbell wrote:
>>>>> +/*  >0: success, *nr_pfns set to number actually populated
>>>>> + *   0: didn't try with this pfn shift (e.g. misaligned base etc)
>>>>> + *  <0: ERROR
>>>>> + */
>>>>> +static int populate_one_size(struct xc_dom_image *dom, int pfn_shift,
>>>>> +                             xen_pfn_t base_pfn, xen_pfn_t *nr_pfns)
>>>>> +{
>>>>> +    uint64_t mask = ((uint64_t)1<<(pfn_shift))-1;
>>>>> +    uint64_t next_mask = ((uint64_t)1<<(LPAE_SHIFT))-1;
>>>>> +    int nr, i, count = min_t(int, 1024*1024, *nr_pfns >> pfn_shift);
>>>>
>>>> Stupid question, where does come from the 1024*1024?
>>>
>>> It was in the original as a backstop on allocsz. It would correspond to
>>> 4GB worth of 4K page I suppose
>>
>> Ah ok. I didn't pay attention about it.
>>
>>>
>>>>> +    xen_pfn_t extents[count];
>>>>
>>>> If I follow the count definition, it's possible to allocate 1024*1024
>>>> xen_pfn_t (about 8MB) on the stack.
>>>
>>> userspace stack isn't all that precious but 8MB does seem a little
>>> excessive. Previously this was using the already allocated host p2m so
>>> it wasn't an issue, but that doesn't work for allocations of page >4K.
>>>
>>> The tradeoff is that smaller batches mean it will take longer.
>>>
>>> I don't think it would be unreasonable to reduce this to be e.g. 1GB
>>> worth of entries (256*1024) or 2MB of stack.
>>
>> It seems the default stack size is 8MB, I'm wondering if we can have
>> some use case where this limit is smaller.
> 
> I think we effectively assume it is larger than any amount we would
> practically use.
> 
>> Is there any issue to allocate this variable with malloc?
> 
> Just more book keeping code really.

Looking at the code, it doesn't seem too difficult to add malloc and
handle the error.

I would prefer to use malloc rather than the stack, instead of assuming
that the stack is quite big, even though this variable should never be
too big. It wouldn't be mad to have an OS using a 1MB stack (or even
less).

Anyway it seems that x86 libxc is using the same trick for superpage...
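
A minimal sketch of the malloc variant (error handling abbreviated; the
rest of populate_one_size stays as in the patch):

    /* Sketch: heap-allocate extents[] instead of a VLA on the stack. */
    xen_pfn_t *extents = calloc(count, sizeof(*extents));
    if ( extents == NULL )
        return -1;

    /* ... fill extents[] and do the populate hypercall as before ... */

    free(extents);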

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread

Thread overview: 26+ messages
2014-06-10  9:55 [PATCH 0/6] xen: arm: Use super pages in p2m Ian Campbell
2014-06-10  9:57 ` [PATCH 1/6] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
2014-06-10 11:08   ` Julien Grall
2014-06-10  9:57 ` [PATCH 2/6] tools: arm: allocate superpages to guests Ian Campbell
2014-06-10 11:23   ` Julien Grall
2014-06-10 15:16     ` Ian Campbell
2014-06-11 11:50       ` Julien Grall
2014-06-11 11:55         ` Ian Campbell
2014-06-11 12:16           ` Julien Grall
2014-06-10  9:57 ` [PATCH 3/6] xen: arm: only put_page for p2m operations which require it Ian Campbell
2014-06-10 11:28   ` Julien Grall
2014-06-10 15:18     ` Ian Campbell
2014-06-10  9:57 ` [PATCH 4/6] xen: arm: handle superpage mappings in p2m_lookup Ian Campbell
2014-06-10 11:35   ` Julien Grall
2014-06-10  9:57 ` [PATCH 5/6] xen: arm: add some helpers for assessing p2m pte Ian Campbell
2014-06-10 11:37   ` Julien Grall
2014-06-10 11:46     ` Ian Campbell
2014-06-10 11:54       ` Julien Grall
2014-06-10 12:29         ` Ian Campbell
2014-06-10 12:52           ` Julien Grall
2014-06-10 15:19             ` Ian Campbell
2014-06-10  9:57 ` [PATCH 6/6] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
2014-06-10 12:10   ` Julien Grall
2014-06-10 15:23     ` Ian Campbell
2014-06-10 19:11       ` Julien Grall
2014-06-11  8:29         ` Ian Campbell
