* [PATCH v2 0/8] xen: arm: Use super pages in p2m
@ 2014-06-11 16:37 Ian Campbell
  2014-06-11 16:39 ` [PATCH v2 1/8] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
                   ` (8 more replies)
  0 siblings, 9 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-11 16:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, Tim Deegan, Stefano Stabellini

The following series adds support for superpage mappings (1G and 2M) in
the p2m for dom0 and domU.

At the moment it is not possible to allocate 1G regions to guests,
because this is gated on multipage_allocation_permitted() which
currently (incorrectly) returns false for dom0 on ARM. This is fixed by
Arianna's "arch/arm: domain build: let dom0 access I/O memory of mapped
devices" (which initialises iomem_caps for dom0).

On a 32-bit platform this causes the overhead of kernbench vs. native to
drop from 8-9% to 1.5-3% (it's basically the same for both dom0 and
domU).

I couldn't get my 64-bit platform to boot natively for some reason but
in absolute terms kernbench improved by ~5% (domU) to ~15% (dom0).

Changes in v2:
 - Julien's review comments
 - Added "xen: arm: allocate more than one bank for 1:1 domain 0 if 
   needed" (not strictly related...)

Ian.

* [PATCH v2 1/8] xen: arm: dump vcpu gic info in arch_dump_vcpu_info
  2014-06-11 16:37 [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
@ 2014-06-11 16:39 ` Ian Campbell
  2014-06-11 16:39 ` [PATCH v2 2/8] tools/libxc: pull min/max_t into xc_private.h Ian Campbell
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-11 16:39 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Dump the GIC info from arch_dump_vcpu_info instead of looping over all the
vcpus in arch_dump_domain_info.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
---
 xen/arch/arm/domain.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 04d0cd0..e494112 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -742,12 +742,6 @@ int domain_relinquish_resources(struct domain *d)
 
 void arch_dump_domain_info(struct domain *d)
 {
-    struct vcpu *v;
-
-    for_each_vcpu ( d, v )
-    {
-        gic_dump_info(v);
-    }
 }
 
 
@@ -770,6 +764,7 @@ long arch_do_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 void arch_dump_vcpu_info(struct vcpu *v)
 {
+    gic_dump_info(v);
 }
 
 void vcpu_mark_events_pending(struct vcpu *v)
-- 
1.7.10.4

* [PATCH v2 2/8] tools/libxc: pull min/max_t into xc_private.h
  2014-06-11 16:37 [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
  2014-06-11 16:39 ` [PATCH v2 1/8] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
@ 2014-06-11 16:39 ` Ian Campbell
  2014-06-11 21:14   ` Julien Grall
  2014-06-11 16:39 ` [PATCH v2 3/8] tools: arm: allocate large pages to guests Ian Campbell
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Ian Campbell @ 2014-06-11 16:39 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

They are generally useful.
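
For example, the next patch uses them as:

    allocsz = min_t(int, 1024*1024, nr_pfns - pfn);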

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: New patch.
---
 tools/libxc/xc_dom_decompress_unsafe_xz.c |    5 -----
 tools/libxc/xc_private.h                  |    5 +++++
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_decompress_unsafe_xz.c b/tools/libxc/xc_dom_decompress_unsafe_xz.c
index 2a32d40..6be1f89 100644
--- a/tools/libxc/xc_dom_decompress_unsafe_xz.c
+++ b/tools/libxc/xc_dom_decompress_unsafe_xz.c
@@ -40,11 +40,6 @@ static inline u32 le32_to_cpup(const u32 *p)
         (void) (&_x == &_y);            \
         _x < _y ? _x : _y; })
 
-#define min_t(type,x,y) \
-        ({ type __x = (x); type __y = (y); __x < __y ? __x: __y; })
-#define max_t(type,x,y) \
-        ({ type __x = (x); type __y = (y); __x > __y ? __x: __y; })
-
 #define __force
 #define always_inline
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 670a82d..9a503ef 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -332,6 +332,11 @@ int xc_ffs16(uint16_t x);
 int xc_ffs32(uint32_t x);
 int xc_ffs64(uint64_t x);
 
+#define min_t(type,x,y) \
+        ({ type __x = (x); type __y = (y); __x < __y ? __x: __y; })
+#define max_t(type,x,y) \
+        ({ type __x = (x); type __y = (y); __x > __y ? __x: __y; })
+
 #define DOMPRINTF(fmt, args...) xc_dom_printf(dom->xch, fmt, ## args)
 #define DOMPRINTF_CALLED(xch) xc_dom_printf((xch), "%s: called", __FUNCTION__)
 
-- 
1.7.10.4

* [PATCH v2 3/8] tools: arm: allocate large pages to guests.
  2014-06-11 16:37 [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
  2014-06-11 16:39 ` [PATCH v2 1/8] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
  2014-06-11 16:39 ` [PATCH v2 2/8] tools/libxc: pull min/max_t into xc_private.h Ian Campbell
@ 2014-06-11 16:39 ` Ian Campbell
  2014-06-11 21:26   ` Julien Grall
  2014-06-11 16:40 ` [PATCH v2 4/8] xen: arm: only put_page for p2m operations which require it Ian Campbell
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Ian Campbell @ 2014-06-11 16:39 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Tries to allocate as many large (1G or 2M) pages as it can to the guest, and
aligns to the next larger size on each attempt so that we can try to allocate
even larger pages on the next iteration (the realignment path is currently
only exercised via a compile time debug option, which is left in place under
a #if 0).

Since ARM page tables are consistent at each level there is a common helper
function which tries to allocate a level's worth of pages. The exception to
this consistency is level 0, which does not support superpage mappings (those
would be 0.5TB superpages!).

Previously we would allocate in batches of up to 4GB worth of pages (allocsz
clamped at 1024*1024 pages), however this would now require 8MB worth of stack
for the extents array in populate_one_size. Reduce to just 256*1024 pages (1GB
worth at level 3), or 2MB of stack.
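
(The arithmetic, assuming the usual 8-byte xen_pfn_t on a 64-bit build:
1024*1024 entries * 8 bytes = 8MB, versus 256*1024 entries * 8 bytes = 2MB.)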

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: min_t definition moved into earlier patch
    drop debug_iters
    allocate extents array explicitly instead of putting 8M on the stack.
    Handle OOM, xc_populate_physmap returns 0, which needs handling for L3.
---
 tools/libxc/xc_dom_arm.c |  119 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 105 insertions(+), 14 deletions(-)

diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index 75f8363..6351ead 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -30,6 +30,13 @@
 #define CONSOLE_PFN_OFFSET 0
 #define XENSTORE_PFN_OFFSET 1
 
+#define LPAE_SHIFT 9
+
+#define PFN_4K_SHIFT  (0)
+#define PFN_2M_SHIFT  (PFN_4K_SHIFT+LPAE_SHIFT)
+#define PFN_1G_SHIFT  (PFN_2M_SHIFT+LPAE_SHIFT)
+#define PFN_512G_SHIFT (PFN_1G_SHIFT+LPAE_SHIFT)
+
 /* get guest IO ABI protocol */
 const char *xc_domain_get_native_protocol(xc_interface *xch,
                                           uint32_t domid)
@@ -249,11 +256,60 @@ static int set_mode(xc_interface *xch, domid_t domid, char *guest_type)
     return rc;
 }
 
+/*  >0: success, *nr_pfns set to number actually populated
+ *   0: didn't try with this pfn shift (e.g. misaligned base etc)
+ *  <0: ERROR
+ */
+static int populate_one_size(struct xc_dom_image *dom, int pfn_shift,
+                             xen_pfn_t base_pfn, xen_pfn_t *nr_pfns,
+                             xen_pfn_t *extents)
+{
+    uint64_t mask = ((uint64_t)1<<(pfn_shift))-1;
+    uint64_t next_mask = ((uint64_t)1<<(LPAE_SHIFT))-1;
+    int nr, i, count = *nr_pfns >> pfn_shift;
+
+    /* No level zero super pages with current hardware */
+    if ( pfn_shift == PFN_512G_SHIFT )
+        return 0;
+
+    /* Nothing to allocate */
+    if ( !count )
+        return 0;
+
+    /* base is misaligned for this level */
+    if ( mask & base_pfn )
+        return 0;
+
+    /* align to the end of a super page at this level */
+    if ( count & next_mask )
+        count &= next_mask;
+
+    for ( i = 0 ; i < count ; i ++ )
+        extents[i] = base_pfn + (i<<pfn_shift);
+
+    nr = xc_domain_populate_physmap(dom->xch, dom->guest_domid, count,
+                                    pfn_shift, 0, extents);
+    if ( nr <= 0 ) return nr;
+    DOMPRINTF("%s: populated %#x/%#x entries with shift %d",
+              __FUNCTION__, nr, count, pfn_shift);
+
+    *nr_pfns = nr << pfn_shift;
+
+    return 1;
+}
+
 static int populate_guest_memory(struct xc_dom_image *dom,
                                  xen_pfn_t base_pfn, xen_pfn_t nr_pfns)
 {
-    int rc;
-    xen_pfn_t allocsz, pfn;
+    int rc = 0;
+    xen_pfn_t allocsz, pfn, *extents;
+
+    extents = calloc(1024*1024,sizeof(xen_pfn_t));
+    if ( extents == NULL )
+    {
+        DOMPRINTF("%s: Unable to allocate extent array", __FUNCTION__);
+        return -1;
+    }
 
     DOMPRINTF("%s: populating RAM @ %016"PRIx64"-%016"PRIx64" (%"PRId64"MB)",
               __FUNCTION__,
@@ -261,21 +317,56 @@ static int populate_guest_memory(struct xc_dom_image *dom,
               (uint64_t)(base_pfn + nr_pfns) << XC_PAGE_SHIFT,
               (uint64_t)nr_pfns >> (20-XC_PAGE_SHIFT));
 
-    for ( pfn = 0; pfn < nr_pfns; pfn++ )
-        dom->p2m_host[pfn] = base_pfn + pfn;
-
-    for ( pfn = rc = allocsz = 0; (pfn < nr_pfns) && !rc; pfn += allocsz )
+    for ( pfn = 0; pfn < nr_pfns; pfn += allocsz )
     {
-        allocsz = nr_pfns - pfn;
-        if ( allocsz > 1024*1024 )
-            allocsz = 1024*1024;
-
-        rc = xc_domain_populate_physmap_exact(
-            dom->xch, dom->guest_domid, allocsz,
-            0, 0, &dom->p2m_host[pfn]);
+        allocsz = min_t(int, 1024*1024, nr_pfns - pfn);
+#if 0 /* Enable this to exercise/debug the code which tries to realign
+       * to a superpage boundary, by misaligning at the start. */
+        if ( pfn == 0 )
+        {
+            allocsz = 1;
+            rc = populate_one_size(dom, PFN_4K_SHIFT,
+                                   base_pfn + pfn, &allocsz, extents);
+            if (rc < 0) break;
+            if (rc > 0) continue;
+            /* Failed to allocate a single page? */
+            break;
+        }
+#endif
+
+        rc = populate_one_size(dom, PFN_512G_SHIFT,
+                               base_pfn + pfn, &allocsz, extents);
+        if ( rc < 0 ) break;
+        if ( rc > 0 ) continue;
+
+        rc = populate_one_size(dom, PFN_1G_SHIFT,
+                               base_pfn + pfn, &allocsz, extents);
+        if ( rc < 0 ) break;
+        if ( rc > 0 ) continue;
+
+        rc = populate_one_size(dom, PFN_2M_SHIFT,
+                               base_pfn + pfn, &allocsz, extents);
+        if ( rc < 0 ) break;
+        if ( rc > 0 ) continue;
+
+        rc = populate_one_size(dom, PFN_4K_SHIFT,
+                               base_pfn + pfn, &allocsz, extents);
+        if ( rc < 0 ) break;
+        if ( rc == 0 )
+        {
+            DOMPRINTF("%s: Not enough RAM", __FUNCTION__);
+            errno = ENOMEM;
+            rc = -1;
+            goto out;
+        }
     }
 
-    return rc;
+    for ( pfn = 0; pfn < nr_pfns; pfn++ )
+        dom->p2m_host[pfn] = base_pfn + pfn;
+
+out:
+    free(extents);
+    return rc < 0 ? rc : 0;
 }
 
 int arch_setup_meminit(struct xc_dom_image *dom)
-- 
1.7.10.4

* [PATCH v2 4/8] xen: arm: only put_page for p2m operations which require it.
  2014-06-11 16:37 [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (2 preceding siblings ...)
  2014-06-11 16:39 ` [PATCH v2 3/8] tools: arm: allocate large pages to guests Ian Campbell
@ 2014-06-11 16:40 ` Ian Campbell
  2014-06-11 21:31   ` Julien Grall
  2014-06-11 16:40 ` [PATCH v2 5/8] xen: arm: handle superpage mappings in p2m_lookup Ian Campbell
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Ian Campbell @ 2014-06-11 16:40 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

In particular do not do it for CACHEFLUSH.

INSERT, RELINQUISH and REMOVE should all put the page (if the current pte is
valid). ALLOCATE doesn't need to since it asserts the current PTE must be
invalid.
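
In other words the intended behaviour is, as a sketch:

    switch ( op )
    {
    case INSERT:
    case RELINQUISH:
    case REMOVE:
        if ( pte.p2m.valid )   /* put the page being replaced/removed */
            p2m_put_page(pte);
        break;
    case ALLOCATE:             /* asserts !pte.p2m.valid, nothing to put */
    case CACHEFLUSH:           /* doesn't change the mapping, nothing to put */
        break;
    }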

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: actually move the put_page instead of adding a second one.
---
 xen/arch/arm/p2m.c |   35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 9960e17..ea05b6d 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -299,6 +299,23 @@ enum p2m_operation {
     CACHEFLUSH,
 };
 
+static void p2m_put_page(const lpae_t pte)
+{
+    /* TODO: Handle other p2m types
+     *
+     * It's safe to do the put_page here because page_alloc will
+     * flush the TLBs if the page is reallocated before the end of
+     * this loop.
+     */
+    if ( p2m_is_foreign(pte.p2m.type) )
+    {
+        unsigned long mfn = pte.p2m.base;
+
+        ASSERT(mfn_valid(mfn));
+        put_page(mfn_to_page(mfn));
+    }
+}
+
 static int apply_p2m_changes(struct domain *d,
                      enum p2m_operation op,
                      paddr_t start_gpaddr,
@@ -400,20 +417,6 @@ static int apply_p2m_changes(struct domain *d,
 
         flush |= pte.p2m.valid;
 
-        /* TODO: Handle other p2m type
-         *
-         * It's safe to do the put_page here because page_alloc will
-         * flush the TLBs if the page is reallocated before the end of
-         * this loop.
-         */
-        if ( pte.p2m.valid && p2m_is_foreign(pte.p2m.type) )
-        {
-            unsigned long mfn = pte.p2m.base;
-
-            ASSERT(mfn_valid(mfn));
-            put_page(mfn_to_page(mfn));
-        }
-
         switch (op) {
             case ALLOCATE:
                 {
@@ -436,6 +439,8 @@ static int apply_p2m_changes(struct domain *d,
                 break;
             case INSERT:
                 {
+                    if ( pte.p2m.valid )
+                        p2m_put_page(pte);
                     pte = mfn_to_p2m_entry(maddr >> PAGE_SHIFT, mattr, t);
                     p2m_write_pte(&third[third_table_offset(addr)],
                                   pte, flush_pt);
@@ -451,6 +456,8 @@ static int apply_p2m_changes(struct domain *d,
                         break;
                     }
 
+                    p2m_put_page(pte);
+
                     count += 0x10;
 
                     memset(&pte, 0x00, sizeof(pte));
-- 
1.7.10.4

* [PATCH v2 5/8] xen: arm: handle superpage mappings in p2m_lookup
  2014-06-11 16:37 [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (3 preceding siblings ...)
  2014-06-11 16:40 ` [PATCH v2 4/8] xen: arm: only put_page for p2m operations which require it Ian Campbell
@ 2014-06-11 16:40 ` Ian Campbell
  2014-06-11 16:40 ` [PATCH v2 6/8] xen: arm: add some helpers for assessing p2m pte Ian Campbell
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-11 16:40 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Currently none are actually created, but they will be shortly.
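
For example, once a 2M mapping is found at the second level the walk stops
with mask = SECOND_MASK, so the returned address becomes:

    maddr = (pte.bits & PADDR_MASK & SECOND_MASK) | (paddr & ~SECOND_MASK);

i.e. the low 21 bits come from the guest address rather than just the low
12 as for a 4K page.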

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
---
 xen/arch/arm/p2m.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index ea05b6d..8a6d295 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -123,6 +123,7 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
     struct p2m_domain *p2m = &d->arch.p2m;
     lpae_t pte, *first = NULL, *second = NULL, *third = NULL;
     paddr_t maddr = INVALID_PADDR;
+    paddr_t mask;
     p2m_type_t _t;
 
     /* Allow t to be NULL */
@@ -136,15 +137,21 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
     if ( !first )
         goto err;
 
+    mask = FIRST_MASK;
     pte = first[first_table_offset(paddr)];
     if ( !pte.p2m.valid || !pte.p2m.table )
         goto done;
 
+    mask = SECOND_MASK;
     second = map_domain_page(pte.p2m.base);
     pte = second[second_table_offset(paddr)];
     if ( !pte.p2m.valid || !pte.p2m.table )
         goto done;
 
+    mask = THIRD_MASK;
+
+    BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);
+
     third = map_domain_page(pte.p2m.base);
     pte = third[third_table_offset(paddr)];
 
@@ -156,7 +163,7 @@ done:
     if ( pte.p2m.valid )
     {
         ASSERT(pte.p2m.type != p2m_invalid);
-        maddr = (pte.bits & PADDR_MASK & PAGE_MASK) | (paddr & ~PAGE_MASK);
+        maddr = (pte.bits & PADDR_MASK & mask) | (paddr & ~mask);
         *t = pte.p2m.type;
     }
 
-- 
1.7.10.4

* [PATCH v2 6/8] xen: arm: add some helpers for assessing p2m pte
  2014-06-11 16:37 [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (4 preceding siblings ...)
  2014-06-11 16:40 ` [PATCH v2 5/8] xen: arm: handle superpage mappings in p2m_lookup Ian Campbell
@ 2014-06-11 16:40 ` Ian Campbell
  2014-06-11 21:39   ` Julien Grall
  2014-06-11 16:40 ` [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Ian Campbell @ 2014-06-11 16:40 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Not terribly helpful right now, since they aren't widely used, but makes future
patches easier to read.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: clarify comment on p2m_{table,entry}
---
 xen/arch/arm/p2m.c |   21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 8a6d295..2a93ff9 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -14,6 +14,13 @@
 #define P2M_FIRST_ORDER 1
 #define P2M_FIRST_ENTRIES (LPAE_ENTRIES<<P2M_FIRST_ORDER)
 
+#define p2m_valid(pte) ((pte).p2m.valid)
+/* These two can only be used on L0..L2 ptes because L3 mappings set
+ * the table bit and therefore these would return the opposite to what
+ * you would expect. */
+#define p2m_table(pte) (p2m_valid(pte) && (pte).p2m.table)
+#define p2m_entry(pte) (p2m_valid(pte) && !(pte).p2m.table)
+
 void dump_p2m_lookup(struct domain *d, paddr_t addr)
 {
     struct p2m_domain *p2m = &d->arch.p2m;
@@ -139,13 +146,13 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
 
     mask = FIRST_MASK;
     pte = first[first_table_offset(paddr)];
-    if ( !pte.p2m.valid || !pte.p2m.table )
+    if ( !p2m_table(pte) )
         goto done;
 
     mask = SECOND_MASK;
     second = map_domain_page(pte.p2m.base);
     pte = second[second_table_offset(paddr)];
-    if ( !pte.p2m.valid || !pte.p2m.table )
+    if ( !p2m_table(pte) )
         goto done;
 
     mask = THIRD_MASK;
@@ -156,11 +163,11 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
     pte = third[third_table_offset(paddr)];
 
     /* This bit must be one in the level 3 entry */
-    if ( !pte.p2m.table )
+    if ( !p2m_table(pte) )
         pte.bits = 0;
 
 done:
-    if ( pte.p2m.valid )
+    if ( p2m_valid(pte) )
     {
         ASSERT(pte.p2m.type != p2m_invalid);
         maddr = (pte.bits & PADDR_MASK & mask) | (paddr & ~mask);
@@ -367,7 +374,7 @@ static int apply_p2m_changes(struct domain *d,
             cur_first_page = p2m_first_level_index(addr);
         }
 
-        if ( !first[first_table_offset(addr)].p2m.valid )
+        if ( !p2m_valid(first[first_table_offset(addr)]) )
         {
             if ( !populate )
             {
@@ -384,7 +391,7 @@ static int apply_p2m_changes(struct domain *d,
             }
         }
 
-        BUG_ON(!first[first_table_offset(addr)].p2m.valid);
+        BUG_ON(!p2m_valid(first[first_table_offset(addr)]));
 
         if ( cur_first_offset != first_table_offset(addr) )
         {
@@ -394,7 +401,7 @@ static int apply_p2m_changes(struct domain *d,
         }
         /* else: second already valid */
 
-        if ( !second[second_table_offset(addr)].p2m.valid )
+        if ( !p2m_valid(second[second_table_offset(addr)]) )
         {
             if ( !populate )
             {
-- 
1.7.10.4

* [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-11 16:37 [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (5 preceding siblings ...)
  2014-06-11 16:40 ` [PATCH v2 6/8] xen: arm: add some helpers for assessing p2m pte Ian Campbell
@ 2014-06-11 16:40 ` Ian Campbell
  2014-06-11 22:19   ` Julien Grall
  2014-06-12 13:57   ` Stefano Stabellini
  2014-06-11 16:40 ` [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed Ian Campbell
  2014-06-12  7:20 ` [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
  8 siblings, 2 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-11 16:40 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

This creates superpage (1G and 2M) mappings in the p2m for both domU and dom0
when the guest and machine addresses are suitably aligned. This relies on the
domain builder allocating suitably sized regions; this is trivially true for
1:1 mapped dom0s and was arranged for guests in a previous patch. A non-1:1
dom0's memory is not currently deliberately aligned.

Since ARM pagetables are (mostly) consistent at each level this is implemented
by refactoring the handling of a single level of pagetable into a common
function. The two inconsistencies are that there are no superpage mappings at
level zero and that level three entry mappings must set the table bit.

When inserting new mappings the code shatters superpage mappings as necessary,
but currently makes no attempt to coalesce anything again. In particular when
inserting a mapping which could be a superpage over an existing table mapping
we do not attempt to free that lower page table and instead descend into it.
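
For instance, shattering a 2M entry amounts to replacing it with a table of
512 level 3 entries covering the same contiguous MFN range, roughly
(simplified from p2m_create_table below):

    for ( i = 0; i < LPAE_ENTRIES; i++ )   /* 512 entries */
    {
        pte = mfn_to_p2m_entry(base_pfn + i, MATTR_MEM, t);
        /* level 3 entries keep the table bit set */
        write_pte(&p[i], pte);
    }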

Some p2m statistics are added to keep track of the number of each level of
mapping and the number of times we've had to shatter an existing mapping.
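
The dump then looks something like this (hypothetical numbers):

    p2m mappings for domain 1 (vmid 1):
      1G mappings: 0 (shattered 0)
      2M mappings: 511 (shattered 1)
      4K mappings: 512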

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: p2m stats as unsigned long, to cope with numbers of 4K mappings which are
    possible on both arm32 and arm64
    Fix "decend"
    Remove spurious {} around single line else clause.
    Drop spurious changes to apply_p2m_changes callers.
---
 xen/arch/arm/domain.c     |    1 +
 xen/arch/arm/p2m.c        |  477 ++++++++++++++++++++++++++++++++++-----------
 xen/include/asm-arm/p2m.h |   11 ++
 3 files changed, 371 insertions(+), 118 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index e494112..45b7cbd 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -742,6 +742,7 @@ int domain_relinquish_resources(struct domain *d)
 
 void arch_dump_domain_info(struct domain *d)
 {
+    p2m_dump_info(d);
 }
 
 
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 2a93ff9..af5bcde 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,6 +1,7 @@
 #include <xen/config.h>
 #include <xen/sched.h>
 #include <xen/lib.h>
+#include <xen/stdbool.h>
 #include <xen/errno.h>
 #include <xen/domain_page.h>
 #include <xen/bitops.h>
@@ -21,6 +22,22 @@
 #define p2m_table(pte) (p2m_valid(pte) && (pte).p2m.table)
 #define p2m_entry(pte) (p2m_valid(pte) && !(pte).p2m.table)
 
+void p2m_dump_info(struct domain *d)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+
+    spin_lock(&p2m->lock);
+    printk("p2m mappings for domain %d (vmid %d):\n",
+           d->domain_id, p2m->vmid);
+    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
+    printk("  1G mappings: %ld (shattered %ld)\n",
+           p2m->stats.mappings[1], p2m->stats.shattered[1]);
+    printk("  2M mappings: %ld (shattered %ld)\n",
+           p2m->stats.mappings[2], p2m->stats.shattered[2]);
+    printk("  4K mappings: %ld\n", p2m->stats.mappings[3]);
+    spin_unlock(&p2m->lock);
+}
+
 void dump_p2m_lookup(struct domain *d, paddr_t addr)
 {
     struct p2m_domain *p2m = &d->arch.p2m;
@@ -276,15 +293,26 @@ static inline void p2m_write_pte(lpae_t *p, lpae_t pte, bool_t flush_cache)
         clean_xen_dcache(*p);
 }
 
-/* Allocate a new page table page and hook it in via the given entry */
-static int p2m_create_table(struct domain *d, lpae_t *entry, bool_t flush_cache)
+/*
+ * Allocate a new page table page and hook it in via the given entry.
+ * apply_one_level relies on this returning 0 on success
+ * and -ve on failure.
+ *
+ * If the existing entry is present then it must be a mapping and not a
+ * table and it will be shattered into the next level down.
+ *
+ * level_shift is the number of bits at the level we want to create.
+ */
+static int p2m_create_table(struct domain *d, lpae_t *entry,
+                            int level_shift, bool_t flush_cache)
 {
     struct p2m_domain *p2m = &d->arch.p2m;
     struct page_info *page;
-    void *p;
+    lpae_t *p;
     lpae_t pte;
+    int splitting = entry->p2m.valid;
 
-    BUG_ON(entry->p2m.valid);
+    BUG_ON(entry->p2m.table);
 
     page = alloc_domheap_page(NULL, 0);
     if ( page == NULL )
@@ -293,9 +321,37 @@ static int p2m_create_table(struct domain *d, lpae_t *entry, bool_t flush_cache)
     page_list_add(page, &p2m->pages);
 
     p = __map_domain_page(page);
-    clear_page(p);
+    if ( splitting )
+    {
+        p2m_type_t t = entry->p2m.type;
+        unsigned long base_pfn = entry->p2m.base;
+        int i;
+
+        /*
+         * We are either splitting a first level 1G page into 512 second level
+         * 2M pages, or a second level 2M page into 512 third level 4K pages.
+         */
+         for ( i=0 ; i < LPAE_ENTRIES; i++ )
+         {
+             pte = mfn_to_p2m_entry(base_pfn + (i<<(level_shift-LPAE_SHIFT)),
+                                    MATTR_MEM, t);
+
+             /*
+              * First and second level super pages set p2m.table = 0, but
+              * third level entries set table = 1.
+              */
+             if ( level_shift - LPAE_SHIFT )
+                 pte.p2m.table = 0;
+
+             write_pte(&p[i], pte);
+         }
+    }
+    else
+        clear_page(p);
+
     if ( flush_cache )
         clean_xen_dcache_va_range(p, PAGE_SIZE);
+
     unmap_domain_page(p);
 
     pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
@@ -313,8 +369,14 @@ enum p2m_operation {
     CACHEFLUSH,
 };
 
-static void p2m_put_page(const lpae_t pte)
+/* Put any references on the single 4K page referenced by pte.  TODO:
+ * Handle superpages, for now we only take special references for leaf
+ * pages (specifically foreign ones, which can't be super mapped today).
+ */
+static void p2m_put_l3_page(const lpae_t pte)
 {
+    ASSERT(p2m_valid(pte));
+
     /* TODO: Handle other p2m types
      *
      * It's safe to do the put_page here because page_alloc will
@@ -330,6 +392,248 @@ static void p2m_put_page(const lpae_t pte)
     }
 }
 
+/*
+ * Returns true if start_gpaddr..end_gpaddr contains at least one
+ * suitably aligned level_size mapping of maddr.
+ *
+ * So long as the range is large enough the end_gpaddr need not be
+ * aligned (callers should create one superpage mapping based on this
+ * result and then call this again on the new range, eventually the
+ * slop at the end will cause this function to return false).
+ */
+static bool_t is_mapping_aligned(const paddr_t start_gpaddr,
+                                 const paddr_t end_gpaddr,
+                                 const paddr_t maddr,
+                                 const paddr_t level_size)
+{
+    const paddr_t level_mask = level_size - 1;
+
+    /* No hardware superpages at level 0 */
+    if ( level_size == ZEROETH_SIZE )
+        return false;
+
+    /*
+     * A range smaller than the size of a superpage at this level
+     * cannot be superpage aligned.
+     */
+    if ( ( end_gpaddr - start_gpaddr ) < level_size - 1 )
+        return false;
+
+    /* Both the gpaddr and maddr must be aligned */
+    if ( start_gpaddr & level_mask )
+        return false;
+    if ( maddr & level_mask )
+        return false;
+    return true;
+}
+
+#define P2M_ONE_DESCEND        0
+#define P2M_ONE_PROGRESS_NOP   0x1
+#define P2M_ONE_PROGRESS       0x10
+
+/*
+ * 0   == (P2M_ONE_DESCEND) continue to descend the tree
+ * +ve == (P2M_ONE_PROGRESS_*) handled at this level, continue, flush,
+ *        entry, addr and maddr updated.  Return value is an
+ *        indication of the amount of work done (for preemption).
+ * -ve == (-Exxx) error.
+ */
+static int apply_one_level(struct domain *d,
+                           lpae_t *entry,
+                           unsigned int level,
+                           bool_t flush_cache,
+                           enum p2m_operation op,
+                           paddr_t start_gpaddr,
+                           paddr_t end_gpaddr,
+                           paddr_t *addr,
+                           paddr_t *maddr,
+                           bool_t *flush,
+                           int mattr,
+                           p2m_type_t t)
+{
+    /* Helpers to lookup the properties of each level */
+    const paddr_t level_sizes[] =
+        { ZEROETH_SIZE, FIRST_SIZE, SECOND_SIZE, THIRD_SIZE };
+    const paddr_t level_masks[] =
+        { ZEROETH_MASK, FIRST_MASK, SECOND_MASK, THIRD_MASK };
+    const paddr_t level_shifts[] =
+        { ZEROETH_SHIFT, FIRST_SHIFT, SECOND_SHIFT, THIRD_SHIFT };
+    const paddr_t level_size = level_sizes[level];
+    const paddr_t level_mask = level_masks[level];
+    const paddr_t level_shift = level_shifts[level];
+
+    struct p2m_domain *p2m = &d->arch.p2m;
+    lpae_t pte;
+    const lpae_t orig_pte = *entry;
+    int rc;
+
+    BUG_ON(level > 3);
+
+    switch ( op )
+    {
+    case ALLOCATE:
+        ASSERT(level < 3 || !p2m_valid(orig_pte));
+        ASSERT(*maddr == 0);
+
+        if ( p2m_valid(orig_pte) )
+            return P2M_ONE_DESCEND;
+
+        if ( is_mapping_aligned(*addr, end_gpaddr, 0, level_size) )
+        {
+            struct page_info *page;
+
+            page = alloc_domheap_pages(d, level_shift - PAGE_SHIFT, 0);
+            if ( page )
+            {
+                pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
+                if ( level != 3 )
+                    pte.p2m.table = 0;
+                p2m_write_pte(entry, pte, flush_cache);
+                p2m->stats.mappings[level]++;
+                return P2M_ONE_PROGRESS;
+            }
+            else if ( level == 3 )
+                return -ENOMEM;
+        }
+
+        BUG_ON(level == 3); /* L3 is always superpage aligned */
+
+        /*
+         * If we get here then we failed to allocate a sufficiently
+         * large contiguous region for this level (which can't be
+         * L3). Create a page table and continue to descend so we try
+         * smaller allocations.
+         */
+        rc = p2m_create_table(d, entry, 0, flush_cache);
+        if ( rc < 0 )
+            return rc;
+
+        return P2M_ONE_DESCEND;
+
+    case INSERT:
+        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
+           /* We do not handle replacing an existing table with a superpage */
+             (level == 3 || !p2m_table(orig_pte)) )
+        {
+            /* New mapping is superpage aligned, make it */
+            pte = mfn_to_p2m_entry(*maddr >> PAGE_SHIFT, mattr, t);
+            if ( level < 3 )
+                pte.p2m.table = 0; /* Superpage entry */
+
+            p2m_write_pte(entry, pte, flush_cache);
+
+            *flush |= p2m_valid(orig_pte);
+
+            *addr += level_size;
+            *maddr += level_size;
+
+            if ( p2m_valid(orig_pte) )
+            {
+                /*
+                 * We can't currently get here for an existing table
+                 * mapping, since we don't handle replacing an
+                 * existing table with a superpage. If we did we would
+                 * need to handle freeing (and accounting) for the bit
+                 * of the p2m tree which we would be about to lop off.
+                 */
+                BUG_ON(level < 3 && p2m_table(orig_pte));
+                if ( level == 3 )
+                    p2m_put_l3_page(orig_pte);
+            }
+            else /* New mapping */
+                p2m->stats.mappings[level]++;
+
+            return P2M_ONE_PROGRESS;
+        }
+        else
+        {
+            /* New mapping is not superpage aligned, create a new table entry */
+            BUG_ON(level == 3); /* L3 is always superpage aligned */
+
+            /* Not present -> create table entry and descend */
+            if ( !p2m_valid(orig_pte) )
+            {
+                rc = p2m_create_table(d, entry, 0, flush_cache);
+                if ( rc < 0 )
+                    return rc;
+                return P2M_ONE_DESCEND;
+            }
+
+            /* Existing superpage mapping -> shatter and descend */
+            if ( p2m_entry(orig_pte) )
+            {
+                *flush = true;
+                rc = p2m_create_table(d, entry,
+                                      level_shift - PAGE_SHIFT, flush_cache);
+                if ( rc < 0 )
+                    return rc;
+
+                p2m->stats.shattered[level]++;
+                p2m->stats.mappings[level]--;
+                p2m->stats.mappings[level+1] += LPAE_ENTRIES;
+            } /* else: an existing table mapping -> descend */
+
+            BUG_ON(!entry->p2m.table);
+
+            return P2M_ONE_DESCEND;
+        }
+
+        break;
+
+    case RELINQUISH:
+    case REMOVE:
+        if ( !p2m_valid(orig_pte) )
+        {
+            /* Progress up to next boundary */
+            *addr = (*addr + level_size) & level_mask;
+            return P2M_ONE_PROGRESS_NOP;
+        }
+
+        if ( level < 3 && p2m_table(orig_pte) )
+            return P2M_ONE_DESCEND;
+
+        *flush = true;
+
+        memset(&pte, 0x00, sizeof(pte));
+        p2m_write_pte(entry, pte, flush_cache);
+
+        *addr += level_size;
+
+        p2m->stats.mappings[level]--;
+
+        if ( level == 3 )
+            p2m_put_l3_page(orig_pte);
+
+        /*
+         * This is still a single pte write, no matter the level, so no need to
+         * scale.
+         */
+        return P2M_ONE_PROGRESS;
+
+    case CACHEFLUSH:
+        if ( !p2m_valid(orig_pte) )
+        {
+            *addr = (*addr + level_size) & level_mask;
+            return P2M_ONE_PROGRESS_NOP;
+        }
+
+        /*
+         * could flush up to the next boundary, but would need to be
+         * careful about preemption, so just do one page now and loop.
+         */
+        *addr += PAGE_SIZE;
+        if ( p2m_is_ram(orig_pte.p2m.type) )
+        {
+            flush_page_to_ram(orig_pte.p2m.base + third_table_offset(*addr));
+            return P2M_ONE_PROGRESS;
+        }
+        else
+            return P2M_ONE_PROGRESS_NOP;
+    }
+
+    BUG(); /* Should never get here */
+}
+
 static int apply_p2m_changes(struct domain *d,
                      enum p2m_operation op,
                      paddr_t start_gpaddr,
@@ -346,9 +650,7 @@ static int apply_p2m_changes(struct domain *d,
                   cur_first_offset = ~0,
                   cur_second_offset = ~0;
     unsigned long count = 0;
-    unsigned int flush = 0;
-    bool_t populate = (op == INSERT || op == ALLOCATE);
-    lpae_t pte;
+    bool_t flush = false;
     bool_t flush_pt;
 
     /* Some IOMMU don't support coherent PT walk. When the p2m is
@@ -362,6 +664,25 @@ static int apply_p2m_changes(struct domain *d,
     addr = start_gpaddr;
     while ( addr < end_gpaddr )
     {
+        /*
+         * Arbitrarily, preempt every 512 operations or 8192 nops.
+         * 512*P2M_ONE_PROGRESS == 8192*P2M_ONE_PROGRESS_NOP == 0x2000
+         *
+         * count is initialised to 0 above, so we are guaranteed to
+         * always make at least one pass.
+         */
+
+        if ( op == RELINQUISH && count >= 0x2000 )
+        {
+            if ( hypercall_preempt_check() )
+            {
+                p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
+                rc = -ERESTART;
+                goto out;
+            }
+            count = 0;
+        }
+
         if ( cur_first_page != p2m_first_level_index(addr) )
         {
             if ( first ) unmap_domain_page(first);
@@ -374,22 +695,18 @@ static int apply_p2m_changes(struct domain *d,
             cur_first_page = p2m_first_level_index(addr);
         }
 
-        if ( !p2m_valid(first[first_table_offset(addr)]) )
-        {
-            if ( !populate )
-            {
-                addr = (addr + FIRST_SIZE) & FIRST_MASK;
-                continue;
-            }
+        /* We only use a 3 level p2m at the moment, so no level 0;
+         * current hardware doesn't support superpage mappings at
+         * level 0 anyway. */
 
-            rc = p2m_create_table(d, &first[first_table_offset(addr)],
-                                  flush_pt);
-            if ( rc < 0 )
-            {
-                printk("p2m_populate_ram: L1 failed\n");
-                goto out;
-            }
-        }
+        rc = apply_one_level(d, &first[first_table_offset(addr)],
+                             1, flush_pt, op,
+                             start_gpaddr, end_gpaddr,
+                             &addr, &maddr, &flush,
+                             mattr, t);
+        if ( rc < 0 ) goto out;
+        count += rc;
+        if ( rc != P2M_ONE_DESCEND ) continue;
 
         BUG_ON(!p2m_valid(first[first_table_offset(addr)]));
 
@@ -401,23 +718,16 @@ static int apply_p2m_changes(struct domain *d,
         }
         /* else: second already valid */
 
-        if ( !p2m_valid(second[second_table_offset(addr)]) )
-        {
-            if ( !populate )
-            {
-                addr = (addr + SECOND_SIZE) & SECOND_MASK;
-                continue;
-            }
-
-            rc = p2m_create_table(d, &second[second_table_offset(addr)],
-                                  flush_pt);
-            if ( rc < 0 ) {
-                printk("p2m_populate_ram: L2 failed\n");
-                goto out;
-            }
-        }
+        rc = apply_one_level(d,&second[second_table_offset(addr)],
+                             2, flush_pt, op,
+                             start_gpaddr, end_gpaddr,
+                             &addr, &maddr, &flush,
+                             mattr, t);
+        if ( rc < 0 ) goto out;
+        count += rc;
+        if ( rc != P2M_ONE_DESCEND ) continue;
 
-        BUG_ON(!second[second_table_offset(addr)].p2m.valid);
+        BUG_ON(!p2m_valid(second[second_table_offset(addr)]));
 
         if ( cur_second_offset != second_table_offset(addr) )
         {
@@ -427,84 +737,15 @@ static int apply_p2m_changes(struct domain *d,
             cur_second_offset = second_table_offset(addr);
         }
 
-        pte = third[third_table_offset(addr)];
-
-        flush |= pte.p2m.valid;
-
-        switch (op) {
-            case ALLOCATE:
-                {
-                    /* Allocate a new RAM page and attach */
-                    struct page_info *page;
-
-                    ASSERT(!pte.p2m.valid);
-                    rc = -ENOMEM;
-                    page = alloc_domheap_page(d, 0);
-                    if ( page == NULL ) {
-                        printk("p2m_populate_ram: failed to allocate page\n");
-                        goto out;
-                    }
-
-                    pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
-
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                }
-                break;
-            case INSERT:
-                {
-                    if ( pte.p2m.valid )
-                        p2m_put_page(pte);
-                    pte = mfn_to_p2m_entry(maddr >> PAGE_SHIFT, mattr, t);
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                    maddr += PAGE_SIZE;
-                }
-                break;
-            case RELINQUISH:
-            case REMOVE:
-                {
-                    if ( !pte.p2m.valid )
-                    {
-                        count++;
-                        break;
-                    }
-
-                    p2m_put_page(pte);
-
-                    count += 0x10;
-
-                    memset(&pte, 0x00, sizeof(pte));
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                    count++;
-                }
-                break;
-
-            case CACHEFLUSH:
-                {
-                    if ( !pte.p2m.valid || !p2m_is_ram(pte.p2m.type) )
-                        break;
-
-                    flush_page_to_ram(pte.p2m.base);
-                }
-                break;
-        }
-
-        /* Preempt every 2MiB (mapped) or 32 MiB (unmapped) - arbitrary */
-        if ( op == RELINQUISH && count >= 0x2000 )
-        {
-            if ( hypercall_preempt_check() )
-            {
-                p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
-                rc = -ERESTART;
-                goto out;
-            }
-            count = 0;
-        }
-
-        /* Got the next page */
-        addr += PAGE_SIZE;
+        rc = apply_one_level(d, &third[third_table_offset(addr)],
+                             3, flush_pt, op,
+                             start_gpaddr, end_gpaddr,
+                             &addr, &maddr, &flush,
+                             mattr, t);
+        if ( rc < 0 ) goto out;
+        /* L3 had better have done something! We cannot descend any further */
+        BUG_ON(rc == P2M_ONE_DESCEND);
+        count += rc;
     }
 
     if ( flush )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 911d32d..821d9ef 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -29,6 +29,14 @@ struct p2m_domain {
      * resume the search. Apart from during teardown this can only
      * decrease. */
     unsigned long lowest_mapped_gfn;
+
+    struct {
+        /* Number of mappings at each p2m tree level */
+        unsigned long mappings[4];
+        /* Number of times we have shattered a mapping
+         * at each p2m tree level. */
+        unsigned long shattered[4];
+    } stats;
 };
 
 /* List of possible type for each page in the p2m entry.
@@ -79,6 +87,9 @@ int p2m_alloc_table(struct domain *d);
 void p2m_save_state(struct vcpu *p);
 void p2m_restore_state(struct vcpu *n);
 
+/* Dump statistics about a domain's p2m mappings. */
+void p2m_dump_info(struct domain *d);
+
 /* Look up the MFN corresponding to a domain's PFN. */
 paddr_t p2m_lookup(struct domain *d, paddr_t gpfn, p2m_type_t *t);
 
-- 
1.7.10.4

* [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-11 16:37 [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (6 preceding siblings ...)
  2014-06-11 16:40 ` [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
@ 2014-06-11 16:40 ` Ian Campbell
  2014-06-11 22:47   ` Julien Grall
  2014-06-17 17:58   ` Julien Grall
  2014-06-12  7:20 ` [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
  8 siblings, 2 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-11 16:40 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Depending on where Xen and the initial modules were loaded into RAM we may not
be able to allocate a suitably contiguous block of memory for dom0, especially
since the allocations made by alloc_domheap_pages are naturally aligned (e.g. a
1GB allocation must be at a 1GB boundary).

The alignment requirement in particular is a huge limitation, meaning that even
dom0_mem=1G on a 2GB system is pretty likely to fail unless you are very
careful with your load addresses. People were also having trouble with dom0 >
128MB on the 1GB cubieboard2 when using fairly standard load addresses for
things.

This patch tries to allocate multiple banks of memory in order to try and
satisfy the entire requested domain 0 allocation. Sadly this turns out to be
pretty tricky to arrange (see the large comment in the code).
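
As a worked example of the allocation-size logic (see get_11_allocation_size
below): with 768MB left to allocate, get_order_from_bytes(768MB) rounds up to
order 18 (i.e. 1GB), which is more than remains, whereas

    order = get_order_from_bytes(size + 1) - 1;   /* = 17, i.e. 512MB */

gives the largest order less than or equal to the remainder, and the
resulting 512MB allocation will itself sit on a 512MB boundary.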

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Karim Raslan <karim.allah.ahmed@gmail.com>
---
v2: New patch
---
 xen/arch/arm/domain_build.c |  278 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 256 insertions(+), 22 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 46a3619..f37127f 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -44,6 +44,13 @@ custom_param("dom0_mem", parse_dom0_mem);
 # define DPRINT(fmt, args...) do {} while ( 0 )
 #endif
 
+//#define DEBUG_11_ALLOCATION
+#ifdef DEBUG_11_ALLOCATION
+# define D11PRINT(fmt, args...) printk(XENLOG_DEBUG fmt, ##args)
+#else
+# define D11PRINT(fmt, args...) do {} while ( 0 )
+#endif
+
 /*
  * Amount of extra space required to dom0's device tree.  No new nodes
  * are added (yet) but one terminating reserve map entry (16 bytes) is
@@ -66,39 +73,266 @@ struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
     return alloc_vcpu(dom0, 0, 0);
 }
 
-static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
+static unsigned int get_11_allocation_size(paddr_t size)
 {
-    paddr_t start;
-    paddr_t size;
-    struct page_info *pg;
-    unsigned int order = get_order_from_bytes(dom0_mem);
-    int res;
-    paddr_t spfn;
+    /*
+     * get_order_from_bytes returns the order greater than or equal to
+     * the given size, but we need less than or equal. Adding one to
+     * the size pushes an evenly aligned size into the next order, so
+     * we can then unconditionally subtract 1 from the order which is
+     * returned.
+     */
+    return get_order_from_bytes(size + 1) - 1;
+}
 
-    if ( is_32bit_domain(d) )
-        pg = alloc_domheap_pages(d, order, MEMF_bits(32));
-    else
-        pg = alloc_domheap_pages(d, order, 0);
-    if ( !pg )
-        panic("Failed to allocate contiguous memory for dom0");
+/*
+ * Insert the given pages into a memory bank, banks are ordered by address.
+ *
+ * Returns false if the memory would be below bank 0.
+ */
+static bool_t insert_11_bank(struct domain *d,
+                             struct kernel_info *kinfo,
+                             struct page_info *pg,
+                             unsigned int order)
+{
+    int res, i;
+    paddr_t spfn;
+    paddr_t start, size;
 
     spfn = page_to_mfn(pg);
     start = pfn_to_paddr(spfn);
     size = pfn_to_paddr((1 << order));
 
-    // 1:1 mapping
-    printk("Populate P2M %#"PRIx64"->%#"PRIx64" (1:1 mapping for dom0)\n",
-           start, start + size);
-    res = guest_physmap_add_page(d, spfn, spfn, order);
+    D11PRINT("Allocated %#"PRIpaddr"-%#"PRIpaddr" (%ldMB/%ldMB, order %d)\n",
+             start, start + size,
+             1UL << (order+PAGE_SHIFT-20),
+             /* Don't want to format this as PRIpaddr (16 digit hex) */
+             (unsigned long)(kinfo->unassigned_mem >> 20),
+             order);
 
-    if ( res )
-        panic("Unable to add pages in DOM0: %d", res);
+    if ( kinfo->mem.nr_banks > 0 && start + size < kinfo->mem.bank[0].start )
+    {
+        D11PRINT("Allocation is below bank 0, not using\n");
+        free_domheap_pages(pg, order);
+        return false;
+    }
 
-    kinfo->mem.bank[0].start = start;
-    kinfo->mem.bank[0].size = size;
-    kinfo->mem.nr_banks = 1;
+    res = guest_physmap_add_page(d, spfn, spfn, order);
+    if ( res )
+        panic("Failed to map pages to DOM0: %d", res);
 
     kinfo->unassigned_mem -= size;
+
+    if ( kinfo->mem.nr_banks == 0 )
+    {
+        kinfo->mem.bank[0].start = start;
+        kinfo->mem.bank[0].size = size;
+        kinfo->mem.nr_banks = 1;
+        return true;
+    }
+
+    for( i = 0; i < kinfo->mem.nr_banks; i++ )
+    {
+        struct membank *bank = &kinfo->mem.bank[i];
+
+        /* If possible merge new memory into the start of the bank */
+        if ( bank->start == start+size )
+        {
+            bank->start = start;
+            bank->size += size;
+            return true;
+        }
+
+        /* If possible merge new memory onto the end of the bank */
+        if ( start == bank->start + bank->size )
+        {
+            bank->size += size;
+            return true;
+        }
+
+        /*
+         * Otherwise if it is below this bank insert new memory in a
+         * new bank before this one. If there was a lower bank we
+         * could have inserted the memory into/before we would already
+         * have done so, so this must be the right place.
+         */
+        if ( start + size < bank->start && kinfo->mem.nr_banks < NR_MEM_BANKS )
+        {
+            memmove(bank + 1, bank, sizeof(*bank)*(kinfo->mem.nr_banks - i));
+            kinfo->mem.nr_banks++;
+            bank->start = start;
+            bank->size = size;
+            return true;
+        }
+    }
+
+    if ( i == kinfo->mem.nr_banks && kinfo->mem.nr_banks < NR_MEM_BANKS )
+    {
+        struct membank *bank = &kinfo->mem.bank[kinfo->mem.nr_banks];
+
+        bank->start = start;
+        bank->size = size;
+        kinfo->mem.nr_banks++;
+        return true;
+    }
+
+    panic("Failed to add pages to DOM0 memory bank");
+}
+
+/*
+ * This is all pretty horrible.
+ *
+ * Requirements:
+ *
+ * 1. The dom0 kernel should be loaded within the first 128MB of RAM. This
+ *    is necessary at least for Linux zImage kernels, which are all we
+ *    support today.
+ * 2. We want to put the dom0 kernel, ramdisk and DTB in the same
+ *    bank. Partly this is just easier for us to deal with, but also
+ *    the ramdisk and DTB must be placed within a certain proximity of
+ *    the kernel within RAM.
+ * 3. For 32-bit dom0 we want to place as much of the RAM as we
+ *    reasonably can below 4GB, so that it can be used by non-LPAE
+ *    enabled kernels.
+ * 4. For 32-bit dom0 the kernel must be located below 4GB.
+ * 5. We want to have a few larger banks rather than many smaller ones.
+ *
+ * For the first two requirements we need to make sure that the lowest
+ * bank is sufficiently large.
+ *
+ * For convenience we also sort the banks by physical address.
+ *
+ * The memory allocator does not really give us the flexibility to
+ * meet these requirements directly. So instead we proceed as follows:
+ *
+ * We first allocate the largest allocation we can as low as we
+ * can. This then becomes the first bank. This bank must be at least
+ * 128MB (or dom0_mem if that is smaller).
+ *
+ * Then we start allocating more memory, trying to allocate the
+ * largest possible size and trying smaller sizes until we
+ * successfully allocate something.
+ *
+ * We then try and insert this memory in to the list of banks. If it
+ * can be merged into an existing bank then this is trivial.
+ *
+ * If the new memory is before the first bank (and cannot be merged
+ * into it) then we give up. Since the allocator prefers to allocate
+ * high addresses first and the first bank has already been allocated
+ * to be as low as possible this likely means we wouldn't have been
+ * able to allocate much more memory anyway.
+ *
+ * Otherwise we insert a new bank. If we've reached NR_MEM_BANKS then
+ * we give up.
+ *
+ * For 32-bit domain we require that the initial allocation for the
+ * first bank is under 4G. Then for the subsequent allocations we
+ * initially allocate memory only from below 4GB. Once that runs out
+ * (as described above) we allow higher allocations and continue until
+ * that runs out (or we have allocated sufficient dom0 memory).
+ */
+static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
+{
+    const unsigned int min_low_order =
+        get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
+    const unsigned int min_order = get_order_from_bytes(MB(4));
+    struct page_info *pg;
+    unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
+    int i;
+
+    bool_t lowmem = is_32bit_domain(d);
+    unsigned int bits;
+
+    printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
+           /* Don't want to format this as PRIpaddr (16 digit hex) */
+           (unsigned long)(kinfo->unassigned_mem >> 20));
+
+    kinfo->mem.nr_banks = 0;
+
+    /*
+     * First try and allocate the largest thing we can as low as
+     * possible to be bank 0.
+     */
+    while ( order > min_low_order )
+    {
+        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
+        {
+            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
+            if ( pg != NULL )
+                goto got_bank0;
+        }
+        order--;
+    }
+
+    panic("Unable to allocate first memory bank");
+
+ got_bank0:
+
+    if ( !insert_11_bank(d, kinfo, pg, order) )
+        BUG(); /* Cannot fail for first bank */
+
+    /* Now allocate more memory and fill in additional banks */
+
+    order = get_11_allocation_size(kinfo->unassigned_mem);
+    while ( kinfo->unassigned_mem && kinfo->mem.nr_banks < NR_MEM_BANKS )
+    {
+        pg = alloc_domheap_pages(d, order, lowmem ? MEMF_bits(32) : 0);
+        if ( !pg )
+        {
+            order --;
+
+            if ( lowmem && order < min_low_order)
+            {
+                D11PRINT("Failed at min_low_order, allow high allocations\n");
+                order = get_11_allocation_size(kinfo->unassigned_mem);
+                lowmem = false;
+                continue;
+            }
+            if ( order >= min_order )
+                continue;
+
+            /* No more we can do */
+            break;
+        }
+
+        if ( !insert_11_bank(d, kinfo, pg, order) )
+        {
+            if ( lowmem )
+            {
+                D11PRINT("Allocation below bank 0, allow high allocations\n");
+                order = get_11_allocation_size(kinfo->unassigned_mem);
+                lowmem = false;
+                continue;
+            }
+            else
+            {
+                D11PRINT("Allocation below bank 0\n");
+                break;
+            }
+        }
+
+        /*
+         * Success, next time around try again to get the largest order
+         * allocation possible.
+         */
+        order = get_11_allocation_size(kinfo->unassigned_mem);
+    }
+
+    if ( kinfo->unassigned_mem )
+        printk("WARNING: Failed to allocate requested dom0 memory."
+               /* Don't want to format this as PRIpaddr (16 digit hex) */
+               " %ldMB unallocated\n",
+               (unsigned long)kinfo->unassigned_mem >> 20);
+
+    for( i = 0; i < kinfo->mem.nr_banks; i++ )
+    {
+        printk("BANK[%d] %#"PRIpaddr"-%#"PRIpaddr" (%ldMB)\n",
+               i,
+               kinfo->mem.bank[i].start,
+               kinfo->mem.bank[i].start + kinfo->mem.bank[i].size,
+               /* Don't want to format this as PRIpaddr (16 digit hex) */
+               (unsigned long)(kinfo->mem.bank[i].size >> 20));
+    }
 }
 
 static void allocate_memory(struct domain *d, struct kernel_info *kinfo)
-- 
1.7.10.4

* Re: [PATCH v2 2/8] tools/libxc: pull min/max_t into xc_private.h
  2014-06-11 16:39 ` [PATCH v2 2/8] tools/libxc: pull min/max_t into xc_private.h Ian Campbell
@ 2014-06-11 21:14   ` Julien Grall
  0 siblings, 0 replies; 33+ messages in thread
From: Julien Grall @ 2014-06-11 21:14 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 11/06/14 17:39, Ian Campbell wrote:
> They are generally useful.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Julien Grall <julien.grall@linaro.org>

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/8] tools: arm: allocate large pages to guests.
  2014-06-11 16:39 ` [PATCH v2 3/8] tools: arm: allocate large pages to guests Ian Campbell
@ 2014-06-11 21:26   ` Julien Grall
  2014-06-12  7:19     ` Ian Campbell
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2014-06-11 21:26 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 11/06/14 17:39, Ian Campbell wrote:
> Previously we would allocate in batches of up to 4GB worth of pages (allocsz
> clamped at 1024*1024 pages) however this would now require 8MB worth of stack
> for the extents array in populate_one_size. Reduce to just 256*1024 or 1GB
> worth of pages (at level 3) or 2MB of stack.

I think you can drop this paragraph. You are using calloc rather than 
the stack in your patch.

> -    return rc;
> +    for ( pfn = 0; pfn < nr_pfns; pfn++ )
> +        dom->p2m_host[pfn] = base_pfn + pfn;
> +
> +out:
> +    free(extents);

Does free preserve errno? I can't find anything saying it does... If not, we 
may lose it when there is not enough memory.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/8] xen: arm: only put_page for p2m operations which require it.
  2014-06-11 16:40 ` [PATCH v2 4/8] xen: arm: only put_page for p2m operations which require it Ian Campbell
@ 2014-06-11 21:31   ` Julien Grall
  2014-06-12  7:27     ` Ian Campbell
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2014-06-11 21:31 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 11/06/14 17:40, Ian Campbell wrote:
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 9960e17..ea05b6d 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -299,6 +299,23 @@ enum p2m_operation {
>       CACHEFLUSH,
>   };
>
> +static void p2m_put_page(const lpae_t pte)

NIT: I'm wondering whether static inline would be better here.

Acked-by: Julien Grall <julien.grall@linaro.org>

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 6/8] xen: arm: add some helpers for assessing p2m pte
  2014-06-11 16:40 ` [PATCH v2 6/8] xen: arm: add some helpers for assessing p2m pte Ian Campbell
@ 2014-06-11 21:39   ` Julien Grall
  2014-06-12  7:27     ` Ian Campbell
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2014-06-11 21:39 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 11/06/14 17:40, Ian Campbell wrote:
> Not terribly helpful right now, since they aren't widely used, but makes future
> patches easier to read.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
> v2: clarify comment on p2m_{table,entry}
> ---
>   xen/arch/arm/p2m.c |   21 ++++++++++++++-------
>   1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 8a6d295..2a93ff9 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -14,6 +14,13 @@
>   #define P2M_FIRST_ORDER 1
>   #define P2M_FIRST_ENTRIES (LPAE_ENTRIES<<P2M_FIRST_ORDER)
>
> +#define p2m_valid(pte) ((pte).p2m.valid)
> +/* These two can only be used on L0..L2 ptes because L3 mappings set
> + * the table bit and therefore these would return the opposite to what
> + * you would expect. */
> +#define p2m_table(pte) (p2m_valid(pte) && (pte).p2m.table)
> +#define p2m_entry(pte) (p2m_valid(pte) && !(pte).p2m.table)

Sorry, I didn't spot it on the previous version. You are using pte twice 
here. Depending on how complex pte is, we may duplicate the operation 
(masking the address + dereferencing the table). I'm wondering if we need 
a temporary variable in both p2m_table and p2m_entry.

It seems that in your patch #7 you always use these 2 macros with 
non-complex variables. So I'm fine either way:

Acked-by: Julien Grall <julien.grall@linaro.org>

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-11 16:40 ` [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
@ 2014-06-11 22:19   ` Julien Grall
  2014-06-12  7:30     ` Ian Campbell
  2014-06-12 13:57   ` Stefano Stabellini
  1 sibling, 1 reply; 33+ messages in thread
From: Julien Grall @ 2014-06-11 22:19 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

While I was looking more closely at this patch I found something strange. Why 
do all the callers of guest_physmap_add_page in the directory common not 
check that the function succeeds in creating the mapping?

On 11/06/14 17:40, Ian Campbell wrote:
> +            page = alloc_domheap_pages(d, level_shift - PAGE_SHIFT, 0);
> +            if ( page )
> +            {
> +                pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
> +                if ( level != 3 )
> +                    pte.p2m.table = 0;
> +                p2m_write_pte(entry, pte, flush_cache);
> +                p2m->stats.mappings[level]++;

It looks like you forgot to update the *addr here.

Did you try this code path? With the 1:1 workaround this part is never used.

> +                return P2M_ONE_PROGRESS;
> +            }
> +            else if ( level == 3 )
> +                return -ENOMEM;
> +        }
> +
> +        BUG_ON(level == 3); /* L3 is always superpage aligned */

Did you mean page-aligned?

[..]

> +    case INSERT:
> +        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
> +           /* We do not handle replacing an existing table with a superpage */
> +             (level == 3 || !p2m_table(orig_pte)) )
> +        {
> +            /* New mapping is superpage aligned, make it */
> +            pte = mfn_to_p2m_entry(*maddr >> PAGE_SHIFT, mattr, t);
> +            if ( level < 3 )

It's funny, sometimes you use level < 3 and sometimes level != 3 (see 
ALLOCATE).

I think this pte.p2m.table set can be handled directly in 
mfn_to_p2m_entry. This will avoid duplicating code.

[..]

> +        }
> +        else
> +        {
> +            /* New mapping is not superpage aligned, create a new table entry */
> +            BUG_ON(level == 3); /* L3 is always superpage aligned */

Did you mean page-aligned?

[..]

> -        /* Got the next page */
> -        addr += PAGE_SIZE;
> +        rc = apply_one_level(d, &third[third_table_offset(addr)],
> +                             3, flush_pt, op,
> +                             start_gpaddr, end_gpaddr,
> +                             &addr, &maddr, &flush,
> +                             mattr, t);
> +        if ( rc < 0 ) goto out;

Shall we redo the whole range if the mapping has failed here?

[..]

> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 911d32d..821d9ef 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h

[..]

> +/* */

Did you intend to add a comment here?

> +void p2m_dump_info(struct domain *d);
> +
>   /* Look up the MFN corresponding to a domain's PFN. */
>   paddr_t p2m_lookup(struct domain *d, paddr_t gpfn, p2m_type_t *t);
>
>

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-11 16:40 ` [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed Ian Campbell
@ 2014-06-11 22:47   ` Julien Grall
  2014-06-12  7:31     ` Ian Campbell
  2014-06-17 17:58   ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: Julien Grall @ 2014-06-11 22:47 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 11/06/14 17:40, Ian Campbell wrote:
> +        bank->start = start;
> +        bank->size = size;
> +        kinfo->mem.nr_banks++;
> +        return true;
> +    }
> +
> +    panic("Failed add pages to DOM0 memory bank");

This can happen when there are no more banks, right? How likely is Xen 
to fill the 8 banks?

Shall we just continue with only the current number of banks?

> +}
> +
> +/*
> + * This is all pretty horrible.
> + *
> + * Requirements:
> + *
> + * 1. The dom0 kernel should be loaded within the first 128MB of RAM. This
> + *    is necessary at least for Linux zImage kernels, which are all we
> + *    support today.
> + * 2. We want to put the dom0 kernel, ramdisk and DTB in the same
> + *    bank. Partly this is just easier for us to deal with, but also
> + *    the ramdisk and DTB must be placed within a certain proximity of
> + *    the kernel within RAM.
> + * 3. For 32-bit dom0 we want to place as much of the RAM as we
> + *    reasonably can below 4GB, so that it can be used by non-LPAE
> + *    enabled kernels.
> + * 4. For 32-bit dom0 the kernel must be located below 4GB.
> + * 5. We want to have a few larger banks rather than many smaller ones.
> + *
> + * For the first two requirements we need to make sure that the lowest
> + * bank is sufficiently large.
> + *
> + * For convenience we also sort the banks by physical address.

IIRC, some kernel versions (< 3.15) assume that the memory banks are 
ordered.
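
For reference, a minimal sketch of a sorted insert (a hypothetical
helper, ignoring the adjacent-bank merging the real code would want, and
assuming the caller has already checked there is a free slot):

static void insert_bank_sorted(struct kernel_info *kinfo,
                               paddr_t start, paddr_t size)
{
    unsigned int i = kinfo->mem.nr_banks;

    /* Shift higher banks up until the slot for "start" is found. */
    while ( i > 0 && kinfo->mem.bank[i - 1].start > start )
    {
        kinfo->mem.bank[i] = kinfo->mem.bank[i - 1];
        i--;
    }

    kinfo->mem.bank[i].start = start;
    kinfo->mem.bank[i].size = size;
    kinfo->mem.nr_banks++;
}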

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/8] tools: arm: allocate large pages to guests.
  2014-06-11 21:26   ` Julien Grall
@ 2014-06-12  7:19     ` Ian Campbell
  2014-06-12  8:47       ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Campbell @ 2014-06-12  7:19 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Wed, 2014-06-11 at 22:26 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 11/06/14 17:39, Ian Campbell wrote:
> > Previously we would allocate in batches of up to 4GB worth of pages (allocsz
> > clamped at 1024*1024 pages) however this would now require 8MB worth of stack
> > for the extents array in populate_one_size. Reduce to just 256*1024 or 1GB
> > worth of pages (at level 3) or 2MB of stack.
> 
> I think you can drop this paragraph. You are using calloc rather than 
> the stack in your patch.

Oops, yes. I'll do that on commit unless I have another reason to
resend.
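
For illustration, a sketch of the calloc-based batching in question (the
helper name and batch size are mine, not the patch's;
xc_domain_populate_physmap_exact is the underlying libxc call):

#include <stdlib.h>
#include <xenctrl.h>

#define PFN_BATCH (256*1024) /* 1GB worth of 4K pages per hypercall batch */

static int populate_batched(xc_interface *xch, uint32_t domid,
                            xen_pfn_t base_pfn, unsigned long nr_pfns)
{
    xen_pfn_t *extents;
    unsigned long i, j, count;
    int rc = 0;

    extents = calloc(PFN_BATCH, sizeof(*extents));
    if ( extents == NULL )
        return -1;

    for ( i = 0; i < nr_pfns && rc == 0; i += count )
    {
        count = nr_pfns - i;
        if ( count > PFN_BATCH )
            count = PFN_BATCH;

        for ( j = 0; j < count; j++ )
            extents[j] = base_pfn + i + j;

        /* extent_order 0 here; superpage batches would pass 9 or 18 */
        rc = xc_domain_populate_physmap_exact(xch, domid, count, 0, 0,
                                              extents);
    }

    free(extents);
    return rc;
}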

> > -    return rc;
> > +    for ( pfn = 0; pfn < nr_pfns; pfn++ )
> > +        dom->p2m_host[pfn] = base_pfn + pfn;
> > +
> > +out:
> > +    free(extents);
> 
> Does free preserve errno? I can't find anything saying it does... If not, we 
> may lose it when there is not enough memory.

I'm not 100% sure but I'm almost certain it is fine. free() returns void
and http://pubs.opengroup.org/onlinepubs/007908799/xsh/free.html says no
error codes are defined for it.

We don't preserve errno in numerous error paths.
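
FWIW, if we ever wanted to be belt-and-braces about it, the usual
pattern is cheap enough. A minimal sketch (hypothetical helper, not the
code in the patch; the explicit save/restore is illustrative, since
POSIX defines no errors for free()):

#include <errno.h>
#include <stdlib.h>

static int do_alloc_work(size_t nr)
{
    int rc = -1, saved_errno;
    unsigned long *extents = calloc(nr, sizeof(*extents));

    if ( extents != NULL )
    {
        /* ... use the array, set rc = 0 on success ... */
        rc = 0;
    }

    saved_errno = errno;  /* a calloc failure leaves ENOMEM here */
    free(extents);        /* free(NULL) is a no-op */
    errno = saved_errno;

    return rc;
}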

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/8] xen: arm: Use super pages in p2m
  2014-06-11 16:37 [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (7 preceding siblings ...)
  2014-06-11 16:40 ` [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed Ian Campbell
@ 2014-06-12  7:20 ` Ian Campbell
  8 siblings, 0 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-12  7:20 UTC (permalink / raw)
  To: xen-devel, Ian Jackson; +Cc: Julien Grall, Stefano Stabellini, Tim Deegan

Ian,

I should have CCd you on the tools elements of this series, sorry.

Patches 2 and 3 are the ones of interest.

Cheers,
Ian.

On Wed, 2014-06-11 at 17:37 +0100, Ian Campbell wrote:
> The following series adds support for superpage mappings (1G and 2M) in
> the p2m for dom0 and domU.
> 
> At the moment it is not possible to allocate 1G regions to guests,
> because this is gated on multipage_allocation_permitted() which
> currently (incorrectly) returns false for dom0 on ARM. This is fixed by
> Arianna's "arch/arm: domain build: let dom0 access I/O memory of mapped
> devices" (which initialises iomem_caps for dom0).
> 
> On a 32-bit platform this causes the overhead of kernbench vs. native to
> drop from 8-9% to 1.5-3% (it's basically the same for both dom0 and
> domU).
> 
> I couldn't get my 64-bit platform to boot natively for some reason but
> in absolute terms kernbench improved by ~5% (domU) to ~15% (dom0).
> 
> Changes in v2:
>  - Juliens review comments
>  - Added "xen: arm: allocate more than one bank for 1:1 domain 0 if 
>    needed" (not strictly related...)
> 
> Ian.
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/8] xen: arm: only put_page for p2m operations which require it.
  2014-06-11 21:31   ` Julien Grall
@ 2014-06-12  7:27     ` Ian Campbell
  0 siblings, 0 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-12  7:27 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Wed, 2014-06-11 at 22:31 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 11/06/14 17:40, Ian Campbell wrote:
> > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> > index 9960e17..ea05b6d 100644
> > --- a/xen/arch/arm/p2m.c
> > +++ b/xen/arch/arm/p2m.c
> > @@ -299,6 +299,23 @@ enum p2m_operation {
> >       CACHEFLUSH,
> >   };
> >
> > +static void p2m_put_page(const lpae_t pte)
> 
> NIT: I'm wondering whether static inline would be better here.

I trust the compiler to do what it thinks is best here.

> 
> Acked-by: Julien Grall <julien.grall@linaro.org>
> 
> Regards,
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 6/8] xen: arm: add some helpers for assessing p2m pte
  2014-06-11 21:39   ` Julien Grall
@ 2014-06-12  7:27     ` Ian Campbell
  0 siblings, 0 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-12  7:27 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Wed, 2014-06-11 at 22:39 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 11/06/14 17:40, Ian Campbell wrote:
> > Not terribly helpful right now, since they aren't widely used, but makes future
> > patches easier to read.
> >
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > ---
> > v2: clarify comment on p2m_{table,entry}
> > ---
> >   xen/arch/arm/p2m.c |   21 ++++++++++++++-------
> >   1 file changed, 14 insertions(+), 7 deletions(-)
> >
> > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> > index 8a6d295..2a93ff9 100644
> > --- a/xen/arch/arm/p2m.c
> > +++ b/xen/arch/arm/p2m.c
> > @@ -14,6 +14,13 @@
> >   #define P2M_FIRST_ORDER 1
> >   #define P2M_FIRST_ENTRIES (LPAE_ENTRIES<<P2M_FIRST_ORDER)
> >
> > +#define p2m_valid(pte) ((pte).p2m.valid)
> > +/* These two can only be used on L0..L2 ptes because L3 mappings set
> > + * the table bit and therefore these would return the opposite to what
> > + * you would expect. */
> > +#define p2m_table(pte) (p2m_valid(pte) && (pte).p2m.table)
> > +#define p2m_entry(pte) (p2m_valid(pte) && !(pte).p2m.table)
> 
> Sorry, I didn't spot it on the previous version. You are using pte twice 
> here. Depending on how complex pte is, we may duplicate the operation 
> (masking the address + dereferencing the table). I'm wondering if we need 
> a temporary variable in both p2m_table and p2m_entry.

A static function would be preferable to that I think.
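
For illustration, roughly what the static function variant would look
like (a sketch, not something in the series):

static inline bool_t p2m_valid(lpae_t pte)
{
    return pte.p2m.valid;
}

/* As with the macros, only meaningful for L0..L2 ptes. */
static inline bool_t p2m_table(lpae_t pte)
{
    return p2m_valid(pte) && pte.p2m.table;
}

static inline bool_t p2m_entry(lpae_t pte)
{
    return p2m_valid(pte) && !pte.p2m.table;
}

Since the pte argument is passed by value it is evaluated exactly once
at the call site, which sidesteps the double-evaluation concern.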

> It seems that in your patch #7 you always use these 2 macros with 
> non-complex variables.

Yes, this was deliberate.

>  So I'm fine either way:
> 
> Acked-by: Julien Grall <julien.grall@linaro.org>
> 
> Regards,
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-11 22:19   ` Julien Grall
@ 2014-06-12  7:30     ` Ian Campbell
  2014-06-12  9:00       ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Campbell @ 2014-06-12  7:30 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Wed, 2014-06-11 at 23:19 +0100, Julien Grall wrote:
> Hi Ian,
> 
> While I was looking more closely at this patch I found something strange. Why 
> do all the callers of guest_physmap_add_page in the directory common not 
> check that the function succeeds in creating the mapping?

"directory common"? I don't get your meaning.

> On 11/06/14 17:40, Ian Campbell wrote:
> > +            page = alloc_domheap_pages(d, level_shift - PAGE_SHIFT, 0);
> > +            if ( page )
> > +            {
> > +                pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
> > +                if ( level != 3 )
> > +                    pte.p2m.table = 0;
> > +                p2m_write_pte(entry, pte, flush_cache);
> > +                p2m->stats.mappings[level]++;
> 
> It looks like you forgot to update the *addr here.
> 
> Did you try this code path? With the 1:1 workaround this part is never used.

Yes, this probably hasn't been executed. I'll see about forcing it.

> > +                return P2M_ONE_PROGRESS;
> > +            }
> > +            else if ( level == 3 )
> > +                return -ENOMEM;
> > +        }
> > +
> > +        BUG_ON(level == 3); /* L3 is always superpage aligned */
> 
> Did you mean page-aligned?

What I meant was that an L3 entry is always "superpage aligned" because
it is the smallest possible element. Since I wrote that I renamed my
is_superpage_aligned function to is_mapping_aligned. I should perhaps
update this comment to reflect that, which would make it clearer.

> [..]
> 
> > +    case INSERT:
> > +        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
> > +           /* We do not handle replacing an existing table with a superpage */
> > +             (level == 3 || !p2m_table(orig_pte)) )
> > +        {
> > +            /* New mapping is superpage aligned, make it */
> > +            pte = mfn_to_p2m_entry(*maddr >> PAGE_SHIFT, mattr, t);
> > +            if ( level < 3 )
> 
> It's funny, sometimes you use level < 3 and sometimes level != 3 (see 
> ALLOCATE).

True.

> I think this pte.p2m.table set can be handled directly in 
> mfn_to_p2m_entry. This will avoid duplicating code.

It can't because mfn_to_p2m_entry is used to create both table and
mapping style entries.
> 
> [..]
> 
> > +        }
> > +        else
> > +        {
> > +            /* New mapping is not superpage aligned, create a new table entry */
> > +            BUG_ON(level == 3); /* L3 is always superpage aligned */
> 
> Did you mean page-aligned?
> 
> [..]
> 
> > -        /* Got the next page */
> > -        addr += PAGE_SIZE;
> > +        rc = apply_one_level(d, &third[third_table_offset(addr)],
> > +                             3, flush_pt, op,
> > +                             start_gpaddr, end_gpaddr,
> > +                             &addr, &maddr, &flush,
> > +                             mattr, t);
> > +        if ( rc < 0 ) goto out;
> 
> Shall we redo the whole range if the mapping has failed here?

s/redo/undo/?

> > diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> > index 911d32d..821d9ef 100644
> > --- a/xen/include/asm-arm/p2m.h
> > +++ b/xen/include/asm-arm/p2m.h
> 
> [..]
> 
> > +/* */
> 
> Did you intend to add a comment here?

I guess one isn't really needed.

> > +void p2m_dump_info(struct domain *d);
> > +
> >   /* Look up the MFN corresponding to a domain's PFN. */
> >   paddr_t p2m_lookup(struct domain *d, paddr_t gpfn, p2m_type_t *t);
> >
> >
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-11 22:47   ` Julien Grall
@ 2014-06-12  7:31     ` Ian Campbell
  0 siblings, 0 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-12  7:31 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Wed, 2014-06-11 at 23:47 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 11/06/14 17:40, Ian Campbell wrote:
> > +        bank->start = start;
> > +        bank->size = size;
> > +        kinfo->mem.nr_banks++;
> > +        return true;
> > +    }
> > +
> > +    panic("Failed add pages to DOM0 memory bank");
> 
> This can happen when there are no more banks, right? How likely is Xen 
> to fill the 8 banks?
> 
> Shall we just continue with only the current number of banks?

That was my intention; I think I just got confused about the
circumstances which could lead to this particular failure.
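
Something like this would express that intent (a sketch of the
non-fatal variant, ignoring the bank merging/ordering the real
insert_11_bank does):

static bool_t insert_11_bank_may_fail(struct kernel_info *kinfo,
                                      paddr_t start, paddr_t size)
{
    struct membank *bank;

    /* Out of bank slots: tell the caller to stop allocating and
     * carry on with the banks it already has. */
    if ( kinfo->mem.nr_banks >= NR_MEM_BANKS )
        return false;

    bank = &kinfo->mem.bank[kinfo->mem.nr_banks];
    bank->start = start;
    bank->size = size;
    kinfo->mem.nr_banks++;

    return true;
}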

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/8] tools: arm: allocate large pages to guests.
  2014-06-12  7:19     ` Ian Campbell
@ 2014-06-12  8:47       ` Julien Grall
  0 siblings, 0 replies; 33+ messages in thread
From: Julien Grall @ 2014-06-12  8:47 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel



On 12/06/14 08:19, Ian Campbell wrote:
> On Wed, 2014-06-11 at 22:26 +0100, Julien Grall wrote:
>> Hi Ian,
>>
>> On 11/06/14 17:39, Ian Campbell wrote:
>>> Previously we would allocate in batches of up to 4GB worth of pages (allocsz
>>> clamped at 1024*1024 pages) however this would now require 8MB worth of stack
>>> for the extents array in populate_one_size. Reduce to just 256*1024 or 1GB
>>> worth of pages (at level 3) or 2MB of stack.
>>
>> I think you can drop this paragraph. You are using calloc rather than
>> the stack in your patch.
>
> Oops, yes. I'll do that on commit unless I have another reason to
> resend.
>
>>> -    return rc;
>>> +    for ( pfn = 0; pfn < nr_pfns; pfn++ )
>>> +        dom->p2m_host[pfn] = base_pfn + pfn;
>>> +
>>> +out:
>>> +    free(extents);
>>
>> Does free preserve errno? I can't find anything saying it does... If not, we
>> may lose it when there is not enough memory.
>
> I'm not 100% sure but I'm almost certain it is fine. free() returns void
> and http://pubs.opengroup.org/onlinepubs/007908799/xsh/free.html says no
> error codes are defined for it.

Thanks for the answer. Assuming that's the case:

Acked-by: Julien Grall <julien.grall@linaro.org>

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-12  7:30     ` Ian Campbell
@ 2014-06-12  9:00       ` Julien Grall
  2014-06-13 14:08         ` Ian Campbell
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2014-06-12  9:00 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel



On 12/06/14 08:30, Ian Campbell wrote:
> On Wed, 2014-06-11 at 23:19 +0100, Julien Grall wrote:
>> Hi Ian,
>>
>> While I was looking more closely at this patch I found something strange. Why
>> do all the callers of guest_physmap_add_page in the directory common not
>> check that the function succeeds in creating the mapping?
>
> "directory common"? I don't get your meaning.

Sorry, I meant xen/common/

>>> +                return P2M_ONE_PROGRESS;
>>> +            }
>>> +            else if ( level == 3 )
>>> +                return -ENOMEM;
>>> +        }
>>> +
>>> +        BUG_ON(level == 3); /* L3 is always superpage aligned */
>>
>> Did you mean page-aligned?
>
> What I meant was that an L3 entry is always "superpage aligned" because
> it is the smallest possible element. Since I wrote that I renamed my
> is_superpage_aligned function to is_mapping_aligned. I should perhaps
> update this comment to reflect that, which would make it clearer.

Oh, OK. In my mind, L3 uses page alignment, not superpage alignment. I 
think I was confused because in your cover letter you define a superpage 
as a 2M or 1G mapping.

>> [..]
>>
>>> +    case INSERT:
>>> +        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
>>> +           /* We do not handle replacing an existing table with a superpage */
>>> +             (level == 3 || !p2m_table(orig_pte)) )
>>> +        {
>>> +            /* New mapping is superpage aligned, make it */
>>> +            pte = mfn_to_p2m_entry(*maddr >> PAGE_SHIFT, mattr, t);
>>> +            if ( level < 3 )
>>
>> It's funny, sometimes you use level < 3 and sometimes level != 3 (see
>> ALLOCATE).
>
> True.
>
>> I think this pte.p2m.table set can be handled directly in
>> mfn_to_p2m_entry. This will avoid duplicating code.
>
> It can't because mfn_to_p2m_entry is used to create both table and
> mapping style entries.

There is only one call where we don't override the pte.p2m.table bit (the 
one at the end of p2m_create_table).

I would move this extra test into mfn_to_p2m_entry and override it only 
for this specific case.

>> [..]
>>
>>> +        }
>>> +        else
>>> +        {
>>> +            /* New mapping is not superpage aligned, create a new table entry */
>>> +            BUG_ON(level == 3); /* L3 is always superpage aligned */
>>
>> Did you mean page-aligned?
>>
>> [..]
>>
>>> -        /* Got the next page */
>>> -        addr += PAGE_SIZE;
>>> +        rc = apply_one_level(d, &third[third_table_offset(addr)],
>>> +                             3, flush_pt, op,
>>> +                             start_gpaddr, end_gpaddr,
>>> +                             &addr, &maddr, &flush,
>>> +                             mattr, t);
>>> +        if ( rc < 0 ) goto out;
>>
>> Shall we redo the whole range if the mapping has failed here?
>
> s/redo/undo/?

Undo, sorry.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-11 16:40 ` [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
  2014-06-11 22:19   ` Julien Grall
@ 2014-06-12 13:57   ` Stefano Stabellini
  2014-06-13 14:15     ` Ian Campbell
  1 sibling, 1 reply; 33+ messages in thread
From: Stefano Stabellini @ 2014-06-12 13:57 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, julien.grall, xen-devel

On Wed, 11 Jun 2014, Ian Campbell wrote:
> +static bool_t is_mapping_aligned(const paddr_t start_gpaddr,
> +                                 const paddr_t end_gpaddr,
> +                                 const paddr_t maddr,
> +                                 const paddr_t level_size)
> +{
> +    const paddr_t level_mask = level_size - 1;
> +
> +    /* No hardware superpages at level 0 */
> +    if ( level_size == ZEROETH_SIZE )
> +        return false;
> +
> +    /*
> +     * A range smaller than the size of a superpage at this level
> +     * cannot be superpage aligned.
> +     */
> +    if ( ( end_gpaddr - start_gpaddr ) < level_size - 1 )
> +        return false;

Shouldn't this be
       if ( ( end_gpaddr - start_gpaddr ) < level_size )
?

> +    /* Both the gpaddr and maddr must be aligned */
> +    if ( start_gpaddr & level_mask )
> +        return false;
> +    if ( maddr & level_mask )
> +        return false;
> +    return true;
> +}
> +
> +#define P2M_ONE_DESCEND        0
> +#define P2M_ONE_PROGRESS_NOP   0x1
> +#define P2M_ONE_PROGRESS       0x10
> +
> +/*
> + * 0   == (P2M_ONE_DESCEND) continue to descend the tree
> + * +ve == (P2M_ONE_PROGRESS_*) handled at this level, continue, flush,
> + *        entry, addr and maddr updated.  Return value is an
> + *        indication of the amount of work done (for preemption).
> + * -ve == (-Exxx) error.
> + */
> +static int apply_one_level(struct domain *d,
> +                           lpae_t *entry,
> +                           unsigned int level,
> +                           bool_t flush_cache,
> +                           enum p2m_operation op,
> +                           paddr_t start_gpaddr,
> +                           paddr_t end_gpaddr,
> +                           paddr_t *addr,
> +                           paddr_t *maddr,
> +                           bool_t *flush,
> +                           int mattr,
> +                           p2m_type_t t)
> +{
> +    /* Helpers to lookup the properties of each level */
> +    const paddr_t level_sizes[] =
> +        { ZEROETH_SIZE, FIRST_SIZE, SECOND_SIZE, THIRD_SIZE };
> +    const paddr_t level_masks[] =
> +        { ZEROETH_MASK, FIRST_MASK, SECOND_MASK, THIRD_MASK };
> +    const paddr_t level_shifts[] =
> +        { ZEROETH_SHIFT, FIRST_SHIFT, SECOND_SHIFT, THIRD_SHIFT };
> +    const paddr_t level_size = level_sizes[level];
> +    const paddr_t level_mask = level_masks[level];
> +    const paddr_t level_shift = level_shifts[level];
> +
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    lpae_t pte;
> +    const lpae_t orig_pte = *entry;
> +    int rc;
> +
> +    BUG_ON(level > 3);
> +
> +    switch ( op )
> +    {
> +    case ALLOCATE:
> +        ASSERT(level < 3 || !p2m_valid(orig_pte));
> +        ASSERT(*maddr == 0);
> +
> +        if ( p2m_valid(orig_pte) )
> +            return P2M_ONE_DESCEND;
> +
> +        if ( is_mapping_aligned(*addr, end_gpaddr, 0, level_size) )
> +        {
> +            struct page_info *page;
> +
> +            page = alloc_domheap_pages(d, level_shift - PAGE_SHIFT, 0);
> +            if ( page )
> +            {
> +                pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
> +                if ( level != 3 )
> +                    pte.p2m.table = 0;
> +                p2m_write_pte(entry, pte, flush_cache);
> +                p2m->stats.mappings[level]++;
> +                return P2M_ONE_PROGRESS;
> +            }
> +            else if ( level == 3 )
> +                return -ENOMEM;
> +        }
> +
> +        BUG_ON(level == 3); /* L3 is always superpage aligned */
> +
> +        /*
> +         * If we get here then we failed to allocate a sufficiently
> +         * large contiguous region for this level (which can't be
> +         * L3). Create a page table and continue to descend so we try
> +         * smaller allocations.
> +         */
> +        rc = p2m_create_table(d, entry, 0, flush_cache);
> +        if ( rc < 0 )
> +            return rc;
> +
> +        return P2M_ONE_DESCEND;
> +
> +    case INSERT:

Shouldn't you add a

        if ( p2m_valid(orig_pte) )
            return P2M_ONE_DESCEND;

test here too?


> +        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
> +           /* We do not handle replacing an existing table with a superpage */
> +             (level == 3 || !p2m_table(orig_pte)) )

Why the difference with ALLOCATE?


> +        {
> +            /* New mapping is superpage aligned, make it */
> +            pte = mfn_to_p2m_entry(*maddr >> PAGE_SHIFT, mattr, t);
> +            if ( level < 3 )
> +                pte.p2m.table = 0; /* Superpage entry */
> +
> +            p2m_write_pte(entry, pte, flush_cache);
> +
> +            *flush |= p2m_valid(orig_pte);
> +
> +            *addr += level_size;
> +            *maddr += level_size;
> +
> +            if ( p2m_valid(orig_pte) )
> +            {
> +                /*
> +                 * We can't currently get here for an existing table
> +                 * mapping, since we don't handle replacing an
> +                 * existing table with a superpage. If we did we would
> +                 * need to handle freeing (and accounting) for the bit
> +                 * of the p2m tree which we would be about to lop off.
> +                 */
> +                BUG_ON(level < 3 && p2m_table(orig_pte));
> +                if ( level == 3 )
> +                    p2m_put_l3_page(orig_pte);
> +            }
> +            else /* New mapping */
> +                p2m->stats.mappings[level]++;
> +
> +            return P2M_ONE_PROGRESS;
> +        }
> +        else
> +        {
> +            /* New mapping is not superpage aligned, create a new table entry */
> +            BUG_ON(level == 3); /* L3 is always superpage aligned */
> +
> +            /* Not present -> create table entry and descend */
> +            if ( !p2m_valid(orig_pte) )
> +            {
> +                rc = p2m_create_table(d, entry, 0, flush_cache);
> +                if ( rc < 0 )
> +                    return rc;
> +                return P2M_ONE_DESCEND;
> +            }
> +
> +            /* Existing superpage mapping -> shatter and descend */
> +            if ( p2m_entry(orig_pte) )
> +            {
> +                *flush = true;
> +                rc = p2m_create_table(d, entry,
> +                                      level_shift - PAGE_SHIFT, flush_cache);
> +                if ( rc < 0 )
> +                    return rc;
> +
> +                p2m->stats.shattered[level]++;
> +                p2m->stats.mappings[level]--;
> +                p2m->stats.mappings[level+1] += LPAE_ENTRIES;
> +            } /* else: an existing table mapping -> descend */
> +
> +            BUG_ON(!entry->p2m.table);
> +
> +            return P2M_ONE_DESCEND;
> +        }
> +
> +        break;
> +
> +    case RELINQUISH:
> +    case REMOVE:
> +        if ( !p2m_valid(orig_pte) )
> +        {
> +            /* Progress up to next boundary */
> +            *addr = (*addr + level_size) & level_mask;
> +            return P2M_ONE_PROGRESS_NOP;
> +        }

You might as well have a single !p2m_valid check before the switch


> +        if ( level < 3 && p2m_table(orig_pte) )
> +            return P2M_ONE_DESCEND;
> +
> +        *flush = true;
> +
> +        memset(&pte, 0x00, sizeof(pte));
> +        p2m_write_pte(entry, pte, flush_cache);
> +
> +        *addr += level_size;
> +
> +        p2m->stats.mappings[level]--;
> +
> +        if ( level == 3 )
> +            p2m_put_l3_page(orig_pte);
> +
> +        /*
> +         * This is still a single pte write, no matter the level, so no need to
> +         * scale.
> +         */
> +        return P2M_ONE_PROGRESS;
> +
> +    case CACHEFLUSH:
> +        if ( !p2m_valid(orig_pte) )
> +        {
> +            *addr = (*addr + level_size) & level_mask;
> +            return P2M_ONE_PROGRESS_NOP;
> +        }
> +
> +        /*
> +         * could flush up to the next boundary, but would need to be
> +         * careful about preemption, so just do one page now and loop.
> +         */
> +        *addr += PAGE_SIZE;
> +        if ( p2m_is_ram(orig_pte.p2m.type) )
> +        {
> +            flush_page_to_ram(orig_pte.p2m.base + third_table_offset(*addr));
> +            return P2M_ONE_PROGRESS;
> +        }
> +        else
> +            return P2M_ONE_PROGRESS_NOP;
> +    }
> +
> +    BUG(); /* Should never get here */
> +}
> +
>  static int apply_p2m_changes(struct domain *d,
>                       enum p2m_operation op,
>                       paddr_t start_gpaddr,
> @@ -346,9 +650,7 @@ static int apply_p2m_changes(struct domain *d,
>                    cur_first_offset = ~0,
>                    cur_second_offset = ~0;
>      unsigned long count = 0;
> -    unsigned int flush = 0;
> -    bool_t populate = (op == INSERT || op == ALLOCATE);
> -    lpae_t pte;
> +    bool_t flush = false;
>      bool_t flush_pt;
>  
>      /* Some IOMMU don't support coherent PT walk. When the p2m is
> @@ -362,6 +664,25 @@ static int apply_p2m_changes(struct domain *d,
>      addr = start_gpaddr;
>      while ( addr < end_gpaddr )
>      {
> +        /*
> +         * Arbitrarily, preempt every 512 operations or 8192 nops.
> +         * 512*P2M_ONE_PROGRESS == 8192*P2M_ONE_PROGRESS_NOP == 0x2000
> +         *
> +         * count is initialised to 0 above, so we are guaranteed to
> +         * always make at least one pass.
> +         */
> +
> +        if ( op == RELINQUISH && count >= 0x2000 )
> +        {
> +            if ( hypercall_preempt_check() )
> +            {
> +                p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
> +                rc = -ERESTART;
> +                goto out;
> +            }
> +            count = 0;
> +        }
> +
>          if ( cur_first_page != p2m_first_level_index(addr) )
>          {
>              if ( first ) unmap_domain_page(first);
> @@ -374,22 +695,18 @@ static int apply_p2m_changes(struct domain *d,
>              cur_first_page = p2m_first_level_index(addr);
>          }
>  
> -        if ( !p2m_valid(first[first_table_offset(addr)]) )
> -        {
> -            if ( !populate )
> -            {
> -                addr = (addr + FIRST_SIZE) & FIRST_MASK;
> -                continue;
> -            }
> +        /* We only use a 3 level p2m at the moment, so no level 0,
> +         * current hardware doesn't support super page mappings at
> +         * level 0 anyway */
>  
> -            rc = p2m_create_table(d, &first[first_table_offset(addr)],
> -                                  flush_pt);
> -            if ( rc < 0 )
> -            {
> -                printk("p2m_populate_ram: L1 failed\n");
> -                goto out;
> -            }
> -        }
> +        rc = apply_one_level(d, &first[first_table_offset(addr)],
> +                             1, flush_pt, op,
> +                             start_gpaddr, end_gpaddr,
> +                             &addr, &maddr, &flush,
> +                             mattr, t);
> +        if ( rc < 0 ) goto out;
> +        count += rc;

Adding an rc to a counter looks a bit strange.


> +        if ( rc != P2M_ONE_DESCEND ) continue;
>  
>          BUG_ON(!p2m_valid(first[first_table_offset(addr)]));
>  
> @@ -401,23 +718,16 @@ static int apply_p2m_changes(struct domain *d,
>          }
>          /* else: second already valid */
>  
> -        if ( !p2m_valid(second[second_table_offset(addr)]) )
> -        {
> -            if ( !populate )
> -            {
> -                addr = (addr + SECOND_SIZE) & SECOND_MASK;
> -                continue;
> -            }
> -
> -            rc = p2m_create_table(d, &second[second_table_offset(addr)],
> -                                  flush_pt);
> -            if ( rc < 0 ) {
> -                printk("p2m_populate_ram: L2 failed\n");
> -                goto out;
> -            }
> -        }
> +        rc = apply_one_level(d,&second[second_table_offset(addr)],
> +                             2, flush_pt, op,
> +                             start_gpaddr, end_gpaddr,
> +                             &addr, &maddr, &flush,
> +                             mattr, t);
> +        if ( rc < 0 ) goto out;
> +        count += rc;
> +        if ( rc != P2M_ONE_DESCEND ) continue;
>  
> -        BUG_ON(!second[second_table_offset(addr)].p2m.valid);
> +        BUG_ON(!p2m_valid(second[second_table_offset(addr)]));
>  
>          if ( cur_second_offset != second_table_offset(addr) )
>          {
> @@ -427,84 +737,15 @@ static int apply_p2m_changes(struct domain *d,
>              cur_second_offset = second_table_offset(addr);
>          }
>  
> -        pte = third[third_table_offset(addr)];
> -
> -        flush |= pte.p2m.valid;
> -
> -        switch (op) {
> -            case ALLOCATE:
> -                {
> -                    /* Allocate a new RAM page and attach */
> -                    struct page_info *page;
> -
> -                    ASSERT(!pte.p2m.valid);
> -                    rc = -ENOMEM;
> -                    page = alloc_domheap_page(d, 0);
> -                    if ( page == NULL ) {
> -                        printk("p2m_populate_ram: failed to allocate page\n");
> -                        goto out;
> -                    }
> -
> -                    pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
> -
> -                    p2m_write_pte(&third[third_table_offset(addr)],
> -                                  pte, flush_pt);
> -                }
> -                break;
> -            case INSERT:
> -                {
> -                    if ( pte.p2m.valid )
> -                        p2m_put_page(pte);
> -                    pte = mfn_to_p2m_entry(maddr >> PAGE_SHIFT, mattr, t);
> -                    p2m_write_pte(&third[third_table_offset(addr)],
> -                                  pte, flush_pt);
> -                    maddr += PAGE_SIZE;
> -                }
> -                break;
> -            case RELINQUISH:
> -            case REMOVE:
> -                {
> -                    if ( !pte.p2m.valid )
> -                    {
> -                        count++;
> -                        break;
> -                    }
> -
> -                    p2m_put_page(pte);
> -
> -                    count += 0x10;
> -
> -                    memset(&pte, 0x00, sizeof(pte));
> -                    p2m_write_pte(&third[third_table_offset(addr)],
> -                                  pte, flush_pt);
> -                    count++;
> -                }
> -                break;
> -
> -            case CACHEFLUSH:
> -                {
> -                    if ( !pte.p2m.valid || !p2m_is_ram(pte.p2m.type) )
> -                        break;
> -
> -                    flush_page_to_ram(pte.p2m.base);
> -                }
> -                break;
> -        }
> -
> -        /* Preempt every 2MiB (mapped) or 32 MiB (unmapped) - arbitrary */
> -        if ( op == RELINQUISH && count >= 0x2000 )
> -        {
> -            if ( hypercall_preempt_check() )
> -            {
> -                p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
> -                rc = -ERESTART;
> -                goto out;
> -            }
> -            count = 0;
> -        }
> -
> -        /* Got the next page */
> -        addr += PAGE_SIZE;
> +        rc = apply_one_level(d, &third[third_table_offset(addr)],
> +                             3, flush_pt, op,
> +                             start_gpaddr, end_gpaddr,
> +                             &addr, &maddr, &flush,
> +                             mattr, t);
> +        if ( rc < 0 ) goto out;
> +        /* L3 had better have done something! We cannot descend any further */
> +        BUG_ON(rc == P2M_ONE_DESCEND);
> +        count += rc;
>      }
>  
>      if ( flush )
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 911d32d..821d9ef 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -29,6 +29,14 @@ struct p2m_domain {
>       * resume the search. Apart from during teardown this can only
>       * decrease. */
>      unsigned long lowest_mapped_gfn;
> +
> +    struct {
> +        /* Number of mappings at each p2m tree level */
> +        unsigned long mappings[4];
> +        /* Number of times we have shattered a mapping
> +         * at each p2m tree level. */
> +        unsigned long shattered[4];
> +    } stats;
>  };

Are you introducing stats.mappings just for statistics gathering?
If so, you should write it in the comment.


>  /* List of possible type for each page in the p2m entry.
> @@ -79,6 +87,9 @@ int p2m_alloc_table(struct domain *d);
>  void p2m_save_state(struct vcpu *p);
>  void p2m_restore_state(struct vcpu *n);
>  
> +/* */
> +void p2m_dump_info(struct domain *d);
> +
>  /* Look up the MFN corresponding to a domain's PFN. */
>  paddr_t p2m_lookup(struct domain *d, paddr_t gpfn, p2m_type_t *t);
>  
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-12  9:00       ` Julien Grall
@ 2014-06-13 14:08         ` Ian Campbell
  2014-06-13 17:45           ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Campbell @ 2014-06-13 14:08 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Thu, 2014-06-12 at 10:00 +0100, Julien Grall wrote:
> 
> On 12/06/14 08:30, Ian Campbell wrote:
> > On Wed, 2014-06-11 at 23:19 +0100, Julien Grall wrote:
> >> Hi Ian,
> >>
> >> While I was looking more closely at this patch I found something strange. Why
> >> do all the callers of guest_physmap_add_page in the directory common not
> >> check that the function succeeds in creating the mapping?
> >
> > "directory common"? I don't get your meaning.
> 
> Sorry, I meant xen/common/

I don't know the answer then.


> >> [..]
> >>
> >>> +    case INSERT:
> >>> +        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
> >>> +           /* We do not handle replacing an existing table with a superpage */
> >>> +             (level == 3 || !p2m_table(orig_pte)) )
> >>> +        {
> >>> +            /* New mapping is superpage aligned, make it */
> >>> +            pte = mfn_to_p2m_entry(*maddr >> PAGE_SHIFT, mattr, t);
> >>> +            if ( level < 3 )
> >>
> >> It's funny, sometimes you use level < 3 and sometimes level != 3 (see
> >> ALLOCATE).
> >
> > True.
> >
> >> I think this pte.p2m.table set can be handled directly in
> >> mfn_to_p2m_entry. This will avoid duplicating code.
> >
> > It can't because mfn_to_p2m_entry is used to create both table and
> > mapping style entries.
> 
> There is only on call where we don't override the pte.p2m.table bit (the 
> one at the end of p2m_create table).

All of those other places *conditionally* set table.

> I would move this extra test in the mfn_to_p2m_entry and override only 
> for this specific case.

mfn_to_p2m_entry doesn't know either the level or whether the intention
of the caller is to create a table or a mapping style entry.

> >>> -        /* Got the next page */
> >>> -        addr += PAGE_SIZE;
> >>> +        rc = apply_one_level(d, &third[third_table_offset(addr)],
> >>> +                             3, flush_pt, op,
> >>> +                             start_gpaddr, end_gpaddr,
> >>> +                             &addr, &maddr, &flush,
> >>> +                             mattr, t);
> >>> +        if ( rc < 0 ) goto out;
> >>
> >> Shall we redo the whole range if the mapping has failed here?
> >
> > s/redo/undo/?
> 
> Undo sorry.

I'm not sure, you could argue it either way I think. Is Arianna's series
dealing with this in some way?

Anyway, I think changing that behaviour would be a separate patch.

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-12 13:57   ` Stefano Stabellini
@ 2014-06-13 14:15     ` Ian Campbell
  0 siblings, 0 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-13 14:15 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: julien.grall, tim, xen-devel

On Thu, 2014-06-12 at 14:57 +0100, Stefano Stabellini wrote:
> On Wed, 11 Jun 2014, Ian Campbell wrote:
> > +static bool_t is_mapping_aligned(const paddr_t start_gpaddr,
> > +                                 const paddr_t end_gpaddr,
> > +                                 const paddr_t maddr,
> > +                                 const paddr_t level_size)
> > +{
> > +    const paddr_t level_mask = level_size - 1;
> > +
> > +    /* No hardware superpages at level 0 */
> > +    if ( level_size == ZEROETH_SIZE )
> > +        return false;
> > +
> > +    /*
> > +     * A range smaller than the size of a superpage at this level
> > +     * cannot be superpage aligned.
> > +     */
> > +    if ( ( end_gpaddr - start_gpaddr ) < level_size - 1 )
> > +        return false;
> 
> Shouldn't this be
>        if ( ( end_gpaddr - start_gpaddr ) < level_size )
> ?

Perhaps. 

Some of the callers of the parent function use inclusive and some
exclusive end addresses; this helps handle that until it is fixed (I
think Arianna's series does?).
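
To make the inclusive/exclusive distinction concrete, a worked example
at level 2 (level_size = 2MB = 0x200000), purely illustrative:

  start = 0x40000000
  inclusive end = 0x401fffff  ->  end - start = 0x1fffff = level_size - 1
  exclusive end = 0x40200000  ->  end - start = 0x200000 = level_size

With "< level_size - 1" the inclusive-end form of a single superpage
passes the check; with "< level_size" it would be rejected and only the
exclusive-end form accepted.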

> > +        return P2M_ONE_DESCEND;
> > +
> > +    case INSERT:
> 
> Shouldn't you add a
> 
>         if ( p2m_valid(orig_pte) )
>             return P2M_ONE_DESCEND;
> 
> test here too?

No, because we might be able to replace the table with a mapping. (We
don't actually handle that case now, but I was trying to avoid assuming
that we wouldn't.)
> 
> 
> > +        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
> > +           /* We do not handle replacing an existing table with a superpage */
> > +             (level == 3 || !p2m_table(orig_pte)) )
> 
> Why the difference with ALLOCATE?

On ALLOCATE everything is guaranteed to either be a table mapping or not
present, which isn't true for INSERT.

> > +    case RELINQUISH:
> > +    case REMOVE:
> > +        if ( !p2m_valid(orig_pte) )
> > +        {
> > +            /* Progress up to next boundary */
> > +            *addr = (*addr + level_size) & level_mask;
> > +            return P2M_ONE_PROGRESS_NOP;
> > +        }
> 
> You might as well have a single !p2m_valid check before the switch

That wouldn't make any sense in the INSERT/ALLOCATE cases, where we do
actually want to do something for the !valid case (i.e. make it valid).

> > @@ -374,22 +695,18 @@ static int apply_p2m_changes(struct domain *d,
> >              cur_first_page = p2m_first_level_index(addr);
> >          }
> >  
> > -        if ( !p2m_valid(first[first_table_offset(addr)]) )
> > -        {
> > -            if ( !populate )
> > -            {
> > -                addr = (addr + FIRST_SIZE) & FIRST_MASK;
> > -                continue;
> > -            }
> > +        /* We only use a 3 level p2m at the moment, so no level 0,
> > +         * current hardware doesn't support super page mappings at
> > +         * level 0 anyway */
> >  
> > -            rc = p2m_create_table(d, &first[first_table_offset(addr)],
> > -                                  flush_pt);
> > -            if ( rc < 0 )
> > -            {
> > -                printk("p2m_populate_ram: L1 failed\n");
> > -                goto out;
> > -            }
> > -        }
> > +        rc = apply_one_level(d, &first[first_table_offset(addr)],
> > +                             1, flush_pt, op,
> > +                             start_gpaddr, end_gpaddr,
> > +                             &addr, &maddr, &flush,
> > +                             mattr, t);
> > +        if ( rc < 0 ) goto out;
> > +        count += rc;
> 
> Adding an rc to a counter looks a bit strange.

The return semantics of apply_one_level are pretty clearly documented.
Would you like me to rename the variable to something else? If so, then
what?
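
FWIW, a sketch of how the top of the loop might read with the error
check and the progress accounting separated (an illustrative rename
only, same semantics as the patch):

int work;
unsigned long progress = 0;

while ( addr < end_gpaddr )
{
    work = apply_one_level(d, &first[first_table_offset(addr)],
                           1, flush_pt, op, start_gpaddr, end_gpaddr,
                           &addr, &maddr, &flush, mattr, t);
    if ( work < 0 )
        goto out;        /* -Exxx */

    progress += work;    /* P2M_ONE_PROGRESS{,_NOP} weights */
    if ( op == RELINQUISH && progress >= 0x2000 )
    {
        /* hypercall_preempt_check()/-ERESTART as in the patch */
        progress = 0;
    }

    if ( work != P2M_ONE_DESCEND )
        continue;

    /* ... descend to levels 2 and 3 as in the patch ... */
}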

> > diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> > index 911d32d..821d9ef 100644
> > --- a/xen/include/asm-arm/p2m.h
> > +++ b/xen/include/asm-arm/p2m.h
> > @@ -29,6 +29,14 @@ struct p2m_domain {
> >       * resume the search. Apart from during teardown this can only
> >       * decrease. */
> >      unsigned long lowest_mapped_gfn;
> > +
> > +    struct {
> > +        /* Number of mappings at each p2m tree level */
> > +        unsigned long mappings[4];
> > +        /* Number of times we have shattered a mapping
> > +         * at each p2m tree level. */
> > +        unsigned long shattered[4];
> > +    } stats;
> >  };
> 
> Are you introducing stats.mappings just for statistics gathering?
> If so, you should write it in the comment.

I thought the clue would be in the name, but, yes, the field named stats
is for gathering stats. Does it really need a comment?

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-13 14:08         ` Ian Campbell
@ 2014-06-13 17:45           ` Julien Grall
  2014-06-16  8:26             ` Jan Beulich
  2014-06-16  9:53             ` Ian Campbell
  0 siblings, 2 replies; 33+ messages in thread
From: Julien Grall @ 2014-06-13 17:45 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Keir Fraser, stefano.stabellini, tim, Jan Beulich, xen-devel

Hi Ian,

On 13/06/14 15:08, Ian Campbell wrote:
> On Thu, 2014-06-12 at 10:00 +0100, Julien Grall wrote:
>>
>> On 12/06/14 08:30, Ian Campbell wrote:
>>> On Wed, 2014-06-11 at 23:19 +0100, Julien Grall wrote:
>>>> Hi Ian,
>>>>
>>>> While I was looking more closely at this patch I found something strange. Why
>>>> do all the callers of guest_physmap_add_page in the directory common not
>>>> check that the function succeeds in creating the mapping?
>>>
>>> "directory common"? I don't get your meaning.
>>
>> Sorry, I meant xen/common/
>
> I don't know the answer then.

CC Jan and Keir. I hope one of them has a clue about this.

>
>>>> [..]
>>>>
>>>>> +    case INSERT:
>>>>> +        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
>>>>> +           /* We do not handle replacing an existing table with a superpage */
>>>>> +             (level == 3 || !p2m_table(orig_pte)) )
>>>>> +        {
>>>>> +            /* New mapping is superpage aligned, make it */
>>>>> +            pte = mfn_to_p2m_entry(*maddr >> PAGE_SHIFT, mattr, t);
>>>>> +            if ( level < 3 )
>>>>
>>>> It's funny, sometimes you use level < 3 and sometimes level != 3 (see
>>>> ALLOCATE).
>>>
>>> True.
>>>
>>>> I think this pte.p2m.table set can be handled directly in
>>>> mfn_to_p2m_entry. This will avoid duplicating code.
>>>
>>> It can't because mfn_to_p2m_entry is used to create both table and
>>> mapping style entries.
>>
>> There is only one call where we don't override the pte.p2m.table bit (the
>> one at the end of p2m_create_table).
>
> All of those other places *conditionally* set table.
>
>> I would move this extra test into mfn_to_p2m_entry and override it only
>> for this specific case.
>
> mfn_to_p2m_entry doesn't know either the level or whether the intention
> of the caller is to create a table or a mapping style entry.

AFAIU, you add/remove every caller of this function in this patch. 
Extending this function (or adding a new helper) would avoid copying the 
same check several times.

>>>>> -        /* Got the next page */
>>>>> -        addr += PAGE_SIZE;
>>>>> +        rc = apply_one_level(d, &third[third_table_offset(addr)],
>>>>> +                             3, flush_pt, op,
>>>>> +                             start_gpaddr, end_gpaddr,
>>>>> +                             &addr, &maddr, &flush,
>>>>> +                             mattr, t);
>>>>> +        if ( rc < 0 ) goto out;
>>>>
>>>> Shall we redo the whole range if the mapping has failed here?
>>>
>>> s/redo/undo/?
>>
>> Undo sorry.
>
> I'm not sure, you could argue it either way I think. Is Arianna's series
> dealing with this in some way?

I think she handled it only for MMIO. I will have to look into her 
series again.

> Anyway, I think changing that behaviour would be a separate patch.

I didn't intend to ask you to undo the mapping in this patch, nor in 
this series.

I was just wondering what could happen if we fail to map the whole 
range: some callers of guest_physmap_add_page are not checking the 
return value of this function, so it seems we may end up partially 
mapping the range in the guest and crashing it.

Anyway, I don't think it's too bad, as the guest will likely fail later 
if we fail to map the region. The only issue is that it won't be 
trivial to find the source, as we don't have any error message.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-13 17:45           ` Julien Grall
@ 2014-06-16  8:26             ` Jan Beulich
  2014-06-16  9:53             ` Ian Campbell
  1 sibling, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2014-06-16  8:26 UTC (permalink / raw)
  To: Julien Grall
  Cc: tim, xen-devel, Keir Fraser, Ian Campbell, stefano.stabellini

>>> On 13.06.14 at 19:45, <julien.grall@linaro.org> wrote:
> On 13/06/14 15:08, Ian Campbell wrote:
>> On Thu, 2014-06-12 at 10:00 +0100, Julien Grall wrote:
>>>
>>> On 12/06/14 08:30, Ian Campbell wrote:
>>>> On Wed, 2014-06-11 at 23:19 +0100, Julien Grall wrote:
>>>>> Hi Ian,
>>>>>
>>>>> While I was looking more closely at this patch I found something strange. Why
>>>>> do all the callers of guest_physmap_add_page in the directory common not
>>>>> check that the function succeeds in creating the mapping?
>>>>
>>>> "directory common"? I don't get your meaning.
>>>
>>> Sorry, I meant xen/common/
>>
>> I don't know the answer then.
> 
> CC Jan and Keir. I hope one of them have a clue on this.

The answer is likely the usual one: When this code got added, not
enough care was taken to deal with possible errors. Adding error
handling where it's missing is always a welcome thing.
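
As a sketch of what such error handling could look like at a call site
(the names here are placeholders, not any particular caller):

static int map_one_gfn(struct domain *d, unsigned long gfn,
                       unsigned long mfn)
{
    int rc = guest_physmap_add_page(d, gfn, mfn, 0);

    if ( rc )
        printk(XENLOG_G_WARNING
               "d%d: mapping mfn %#lx at gfn %#lx failed: %d\n",
               d->domain_id, mfn, gfn, rc);

    return rc;
}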

Jan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-13 17:45           ` Julien Grall
  2014-06-16  8:26             ` Jan Beulich
@ 2014-06-16  9:53             ` Ian Campbell
  1 sibling, 0 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-16  9:53 UTC (permalink / raw)
  To: Julien Grall; +Cc: Keir Fraser, stefano.stabellini, tim, Jan Beulich, xen-devel

On Fri, 2014-06-13 at 18:45 +0100, Julien Grall wrote:

> >
> >>>> [..]
> >>>>
> >>>>> +    case INSERT:
> >>>>> +        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
> >>>>> +           /* We do not handle replacing an existing table with a superpage */
> >>>>> +             (level == 3 || !p2m_table(orig_pte)) )
> >>>>> +        {
> >>>>> +            /* New mapping is superpage aligned, make it */
> >>>>> +            pte = mfn_to_p2m_entry(*maddr >> PAGE_SHIFT, mattr, t);
> >>>>> +            if ( level < 3 )
> >>>>
> >>>> It's funny: sometimes you use level < 3 and sometimes
> >>>> level != 3 (see ALLOCATE).
> >>>
> >>> True.
> >>>
> >>>> I think this pte.p2m.table set can be handled directly in
> >>>> mfn_to_p2m_entry. This will avoid duplicating code.
> >>>
> >>> It can't because mfn_to_p2m_entry is used to create both table and
> >>> mapping style entries.
> >>
> >> There is only one call where we don't override the pte.p2m.table
> >> bit (the one at the end of p2m_create_table).
> >
> > All of those other places *conditionally* set table.
> >
> >> I would move this extra test into mfn_to_p2m_entry and override it
> >> only for this specific case.
> >
> > mfn_to_p2m_entry doesn't know either the level or whether the intention
> > of the caller is to create a table or a mapping style entry.
> 
> AFAIU, you add/remove every caller of this function in this patch.
> Extending this function (or adding a new helper) would avoid copying
> the same check several times.

I prefer it the way it is now.


> 
> >>>>> -        /* Got the next page */
> >>>>> -        addr += PAGE_SIZE;
> >>>>> +        rc = apply_one_level(d, &third[third_table_offset(addr)],
> >>>>> +                             3, flush_pt, op,
> >>>>> +                             start_gpaddr, end_gpaddr,
> >>>>> +                             &addr, &maddr, &flush,
> >>>>> +                             mattr, t);
> >>>>> +        if ( rc < 0 ) goto out;
> >>>>
> >>>> Shall we redo the whole range if the mapping has failed here?
> >>>
> >>> s/redo/undo/?
> >>
> >> Undo sorry.
> >
> > I'm not sure; you could argue it either way, I think. Is Arianna's
> > series dealing with this in some way?
> 
> I think she handled it only for MMIO. I will have to look into her
> series again.
> 
> > Anyway, I think changing that behaviour would be a separate patch.
> 
> I didn't intend to ask you to undo the mapping in this patch, nor in 
> this series.

Please make it unambiguously clear when you are asking a more general
question rather than suggesting a change to a patch, e.g. by using
phrases like "in a future patch".

> I was just wondering what could happen if we fail to map the whole
> range. Some callers of guest_physmap_add_page do not check the return
> value of this function, so it seems we may end up partially mapping
> the range in the guest and crashing it.
> 
> Anyway, I don't think it's too bad, as the guest will likely fail
> later if we fail to map the region. The only issue is that it won't
> be trivial to find the source, as we don't print any error message.

Right.

Ian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-11 16:40 ` [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed Ian Campbell
  2014-06-11 22:47   ` Julien Grall
@ 2014-06-17 17:58   ` Julien Grall
  2014-06-18  8:27     ` Ian Campbell
  1 sibling, 1 reply; 33+ messages in thread
From: Julien Grall @ 2014-06-17 17:58 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 06/11/2014 05:40 PM, Ian Campbell wrote:
> +    /*
> +     * First try and allocate the largest thing we can as low as
> +     * possible to be bank 0.
> +     */
> +    while ( order > min_low_order )
> +    {
> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
> +        {
> +            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
> +            if ( pg != NULL )
> +                goto got_bank0;
> +        }
> +        order--;
> +    }
> +
> +    panic("Unable to allocate first memory bank");

I gave this patch a try stand-alone on the Versatile Express and hit
this panic.

Xen is trying to allocate 128MB for the first bank. It was working
without this patch.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-17 17:58   ` Julien Grall
@ 2014-06-18  8:27     ` Ian Campbell
  2014-06-18  9:10       ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Campbell @ 2014-06-18  8:27 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Tue, 2014-06-17 at 18:58 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 06/11/2014 05:40 PM, Ian Campbell wrote:
> > +    /*
> > +     * First try and allocate the largest thing we can as low as
> > +     * possible to be bank 0.
> > +     */
> > +    while ( order > min_low_order )
> > +    {
> > +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
> > +        {
> > +            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
> > +            if ( pg != NULL )
> > +                goto got_bank0;
> > +        }
> > +        order--;
> > +    }
> > +
> > +    panic("Unable to allocate first memory bank");
> 
> I gave this patch a try stand-alone on the Versatile Express and hit
> this panic.
> 
> Xen is trying to allocate 128MB for the first bank. It was working
> without this patch.

What is your dom0_mem and how much ram does the system have?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-18  8:27     ` Ian Campbell
@ 2014-06-18  9:10       ` Julien Grall
  2014-06-18  9:36         ` Ian Campbell
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2014-06-18  9:10 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel



On 18/06/14 09:27, Ian Campbell wrote:
> On Tue, 2014-06-17 at 18:58 +0100, Julien Grall wrote:
>> Hi Ian,
>>
>> On 06/11/2014 05:40 PM, Ian Campbell wrote:
>>> +    /*
>>> +     * First try and allocate the largest thing we can as low as
>>> +     * possible to be bank 0.
>>> +     */
>>> +    while ( order > min_low_order )
>>> +    {
>>> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
>>> +        {
>>> +            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
>>> +            if ( pg != NULL )
>>> +                goto got_bank0;
>>> +        }
>>> +        order--;
>>> +    }
>>> +
>>> +    panic("Unable to allocate first memory bank");
>>
>> I gave this patch a try stand-alone on the Versatile Express and hit
>> this panic.
>>
>> Xen is trying to allocate 128MB for the first bank. It was working
>> without this patch.
> 
> What is your dom0_mem and how much ram does the system have?

I use the default value from Xen, i.e. 128MB. The platform has 1GB of
RAM (see the Xen log below, with early printk enabled).

- UART enabled -
- CPU 00000000 booting -
- Xen starting in Hyp mode -
- Zero BSS -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000080000000 - 000000009fffffff
(XEN) RAM: 00000000a0000000 - 00000000bfffffff
(XEN)
(XEN) MODULE[1]: 000000009fee6000 - 000000009feea000
(XEN) MODULE[2]: 00000000a0008000 - 00000000a033f458
(XEN)  RESVD[0]: 0000000081f00000 - 0000000081f04000
(XEN)  RESVD[1]: 000000009fee6000 - 000000009feea000
(XEN)
(XEN) Command line: noreboot sync_console console=dtuart dtuart=serial0
(XEN) Placing Xen at 0x00000000bfe00000-0x00000000c0000000
(XEN) Xen heap: 00000000b6000000-00000000be000000 (32768 pages)
(XEN) Dom heap: 229376 pages
(XEN) Domain heap initialised
(XEN) Looking for UART console serial0
Xen 4.5-unstable
(XEN) Xen version 4.5-unstable (julien@cam.xci-test.com) (arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2013.11) 4.8.3 20140106 (prerelease)) debug=y Wed Jun 18 09:52:52 BST 2014
(XEN) Latest ChangeSet: Wed Jun 11 17:40:04 2014 +0100 git:bdee6c6
(XEN) Console output is synchronous.
(XEN) Processor: 412fc0f1: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x1
(XEN) 32-bit Execution:
(XEN)   Processor Features: 00001131:00011011
(XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
(XEN)     Extensions: GenericTimer Security
(XEN)   Debug Features: 02010555
(XEN)   Auxiliary Features: 00000000
(XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
(XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
(XEN) Platform: VERSATILE EXPRESS
(XEN) Set SYS_FLAGS to 00000000bfe0004c (0020004c)
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
(XEN) Using generic timer at 24000 KHz
(XEN) GIC initialization:
(XEN)         gic_dist_addr=000000002c001000
(XEN)         gic_cpu_addr=000000002c002000
(XEN)         gic_hyp_addr=000000002c004000
(XEN)         gic_vcpu_addr=000000002c006000
(XEN)         gic_maintenance_irq=25
(XEN) GIC: 192 lines, 5 cpus, secure (IID 0200043b).
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) I/O virtualisation disabled
(XEN) Allocated console ring of 16 KiB.
(XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
(XEN) Bringing up CPU1
- CPU 00000001 booting -
- Xen starting in Hyp mode -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) CPU 1 booted.
(XEN) Brought up 2 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading kernel from boot module 2
(XEN) Allocating 1:1 mappings totalling 128MB for dom0:
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Unable to allocate first memory bank
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-18  9:10       ` Julien Grall
@ 2014-06-18  9:36         ` Ian Campbell
  0 siblings, 0 replies; 33+ messages in thread
From: Ian Campbell @ 2014-06-18  9:36 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Wed, 2014-06-18 at 10:10 +0100, Julien Grall wrote:
> 
> On 18/06/14 09:27, Ian Campbell wrote:
> > On Tue, 2014-06-17 at 18:58 +0100, Julien Grall wrote:
> >> Hi Ian,
> >>
> >> On 06/11/2014 05:40 PM, Ian Campbell wrote:
> >>> +    /*
> >>> +     * First try and allocate the largest thing we can as low as
> >>> +     * possible to be bank 0.
> >>> +     */
> >>> +    while ( order > min_low_order )
> >>> +    {
> >>> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
> >>> +        {
> >>> +            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
> >>> +            if ( pg != NULL )
> >>> +                goto got_bank0;
> >>> +        }
> >>> +        order--;
> >>> +    }
> >>> +
> >>> +    panic("Unable to allocate first memory bank");
> >>
> >> I gave this patch a try stand-alone on the Versatile Express and
> >> hit this panic.
> >>
> >> Xen is trying to allocate 128MB for the first bank. It was working
> >> without this patch.
> > 
> > What is your dom0_mem and how much ram does the system have?
> 
> I use the default value from Xen, i.e. 128MB. The platform has 1GB of
> RAM (see the Xen log below, with early printk enabled).

Thanks, how strange. Perhaps I got min_low_order wrong, or that loop is
knackered in some way. I'll check.
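
One thing I notice just from staring at it: if min_low_order comes out
equal to order for the default 128MB dom0 (an assumption -- I haven't
re-checked how min_low_order is derived), the while loop body is never
entered at all and we fall straight through to the panic. A toy
stand-alone rendering of just the loop structure:

#include <stdio.h>

int main(void)
{
    unsigned int order = 15;         /* 128MB in 4K pages: 2^15 */
    unsigned int min_low_order = 15; /* if the floor is also 128MB */
    int allocated = 0;

    while ( order > min_low_order )  /* false on entry when equal */
    {
        allocated = 1;               /* stands in for the allocation */
        order--;
    }

    if ( !allocated )
        printf("Unable to allocate first memory bank\n");
    return 0;
}

Separately, if MEMF_bits(b) means "below 2^b", then under lowmem the
inner loop tops out at bits = 31, while your RAM starts exactly at
0x80000000 = 2^31, so even when entered the loop could never succeed.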

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-06-18  9:36 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-11 16:37 [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
2014-06-11 16:39 ` [PATCH v2 1/8] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
2014-06-11 16:39 ` [PATCH v2 2/8] tools/libxc: pull min/max_t into xc_private.h Ian Campbell
2014-06-11 21:14   ` Julien Grall
2014-06-11 16:39 ` [PATCH v2 3/8] tools: arm: allocate large pages to guests Ian Campbell
2014-06-11 21:26   ` Julien Grall
2014-06-12  7:19     ` Ian Campbell
2014-06-12  8:47       ` Julien Grall
2014-06-11 16:40 ` [PATCH v2 4/8] xen: arm: only put_page for p2m operations which require it Ian Campbell
2014-06-11 21:31   ` Julien Grall
2014-06-12  7:27     ` Ian Campbell
2014-06-11 16:40 ` [PATCH v2 5/8] xen: arm: handle superpage mappings in p2m_lookup Ian Campbell
2014-06-11 16:40 ` [PATCH v2 6/8] xen: arm: add some helpers for assessing p2m pte Ian Campbell
2014-06-11 21:39   ` Julien Grall
2014-06-12  7:27     ` Ian Campbell
2014-06-11 16:40 ` [PATCH v2 7/8] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
2014-06-11 22:19   ` Julien Grall
2014-06-12  7:30     ` Ian Campbell
2014-06-12  9:00       ` Julien Grall
2014-06-13 14:08         ` Ian Campbell
2014-06-13 17:45           ` Julien Grall
2014-06-16  8:26             ` Jan Beulich
2014-06-16  9:53             ` Ian Campbell
2014-06-12 13:57   ` Stefano Stabellini
2014-06-13 14:15     ` Ian Campbell
2014-06-11 16:40 ` [PATCH v2 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed Ian Campbell
2014-06-11 22:47   ` Julien Grall
2014-06-12  7:31     ` Ian Campbell
2014-06-17 17:58   ` Julien Grall
2014-06-18  8:27     ` Ian Campbell
2014-06-18  9:10       ` Julien Grall
2014-06-18  9:36         ` Ian Campbell
2014-06-12  7:20 ` [PATCH v2 0/8] xen: arm: Use super pages in p2m Ian Campbell
