All of lore.kernel.org
* [PATCH v3 0/8] xen: arm: Use super pages in p2m
@ 2014-06-26 10:16 Ian Campbell
  2014-06-26 10:17 ` [PATCH v3 1/8] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
                   ` (8 more replies)
  0 siblings, 9 replies; 29+ messages in thread
From: Ian Campbell @ 2014-06-26 10:16 UTC (permalink / raw)
  To: xen-devel, Ian Jackson; +Cc: Julien Grall, Tim Deegan, Stefano Stabellini

The following series adds support for superpage mappings (1G and 2M) in
the p2m for dom0 and domU.

At the moment it is not possible to allocate 1G regions to guests,
because this is gated on multipage_allocation_permitted() which
currently (incorrectly) returns false for dom0 on ARM. This is fixed by
Arianna's "arch/arm: domain build: let dom0 access I/O memory of mapped
devices" (which initialises iomem_caps for dom0).

On a 32-bit platform this causes the overhead of kernbench vs. native to
drop from 8-9% to 1.5-3% (it's basically the same for both dom0 and
domU).

I couldn't get my 64-bit platform to boot natively for some reason but
in absolute terms kernbench improved by ~5% (domU) to ~15% (dom0).

A   xen: arm: dump vcpu gic info in arch_dump_vcpu_info
At  tools/libxc: pull min/max_t into xc_private.h
At  tools: arm: allocate large pages to guests.
A   xen: arm: only put_page for p2m operations which require it.
A   xen: arm: handle superpage mappings in p2m_lookup
m   xen: arm: add some helpers for assessing p2m pte
m   xen: arm: use superpages in p2m when pages are suitably aligned
m   xen: arm: allocate more than one bank for 1:1 domain 0 if needed

A == ARM ack
t == Tools ACK still needed
m == non-trivial modifications this round

Ian.

The following changes since commit 1c9d2acad014e997771c09d75bc071db754d2f4b:

  QEMU_TAG update (2014-06-25 15:58:02 +0100)

are available in the git repository at:

  git://xenbits.xen.org/people/ianc/xen.git arm-p2m-superpages-v3

for you to fetch changes up to b9efd403d4107eeb40a5b9950204d68b2fd20563:

  xen: arm: allocate more than one bank for 1:1 domain 0 if needed (2014-06-26 11:13:26 +0100)

----------------------------------------------------------------
Ian Campbell (8):
      xen: arm: dump vcpu gic info in arch_dump_vcpu_info
      tools/libxc: pull min/max_t into xc_private.h
      tools: arm: allocate large pages to guests.
      xen: arm: only put_page for p2m operations which require it.
      xen: arm: handle superpage mappings in p2m_lookup
      xen: arm: add some helpers for assessing p2m pte
      xen: arm: use superpages in p2m when pages are suitably aligned
      xen: arm: allocate more than one bank for 1:1 domain 0 if needed

 tools/libxc/xc_dom_arm.c                  |  119 ++++++-
 tools/libxc/xc_dom_decompress_unsafe_xz.c |    5 -
 tools/libxc/xc_private.h                  |    5 +
 xen/arch/arm/domain.c                     |    8 +-
 xen/arch/arm/domain_build.c               |  286 +++++++++++++--
 xen/arch/arm/p2m.c                        |  544 ++++++++++++++++++++++-------
 xen/include/asm-arm/p2m.h                 |   12 +
 7 files changed, 798 insertions(+), 181 deletions(-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 1/8] xen: arm: dump vcpu gic info in arch_dump_vcpu_info
  2014-06-26 10:16 [PATCH v3 0/8] xen: arm: Use super pages in p2m Ian Campbell
@ 2014-06-26 10:17 ` Ian Campbell
  2014-06-26 10:17 ` [PATCH v3 2/8] tools/libxc: pull min/max_t into xc_private.h Ian Campbell
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2014-06-26 10:17 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Instead of looping over vcpus in arch_dump_domain_info

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
---
 xen/arch/arm/domain.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 437885d..702ddfb 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -742,12 +742,6 @@ int domain_relinquish_resources(struct domain *d)
 
 void arch_dump_domain_info(struct domain *d)
 {
-    struct vcpu *v;
-
-    for_each_vcpu ( d, v )
-    {
-        gic_dump_info(v);
-    }
 }
 
 
@@ -770,6 +764,7 @@ long arch_do_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 void arch_dump_vcpu_info(struct vcpu *v)
 {
+    gic_dump_info(v);
 }
 
 void vcpu_mark_events_pending(struct vcpu *v)
-- 
1.7.10.4


* [PATCH v3 2/8] tools/libxc: pull min/max_t into xc_private.h
  2014-06-26 10:16 [PATCH v3 0/8] xen: arm: Use super pages in p2m Ian Campbell
  2014-06-26 10:17 ` [PATCH v3 1/8] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
@ 2014-06-26 10:17 ` Ian Campbell
  2014-06-26 10:34   ` Ian Jackson
  2014-06-26 10:17 ` [PATCH v3 3/8] tools: arm: allocate large pages to guests Ian Campbell
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-06-26 10:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Jackson, julien.grall, tim, Ian Campbell, stefano.stabellini

They are generally useful.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: New patch.
---
 tools/libxc/xc_dom_decompress_unsafe_xz.c |    5 -----
 tools/libxc/xc_private.h                  |    5 +++++
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_decompress_unsafe_xz.c b/tools/libxc/xc_dom_decompress_unsafe_xz.c
index 2a32d40..6be1f89 100644
--- a/tools/libxc/xc_dom_decompress_unsafe_xz.c
+++ b/tools/libxc/xc_dom_decompress_unsafe_xz.c
@@ -40,11 +40,6 @@ static inline u32 le32_to_cpup(const u32 *p)
         (void) (&_x == &_y);            \
         _x < _y ? _x : _y; })
 
-#define min_t(type,x,y) \
-        ({ type __x = (x); type __y = (y); __x < __y ? __x: __y; })
-#define max_t(type,x,y) \
-        ({ type __x = (x); type __y = (y); __x > __y ? __x: __y; })
-
 #define __force
 #define always_inline
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index c7730f2..47a2801 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -349,6 +349,11 @@ int xc_ffs16(uint16_t x);
 int xc_ffs32(uint32_t x);
 int xc_ffs64(uint64_t x);
 
+#define min_t(type,x,y) \
+        ({ type __x = (x); type __y = (y); __x < __y ? __x: __y; })
+#define max_t(type,x,y) \
+        ({ type __x = (x); type __y = (y); __x > __y ? __x: __y; })
+
 #define DOMPRINTF(fmt, args...) xc_dom_printf(dom->xch, fmt, ## args)
 #define DOMPRINTF_CALLED(xch) xc_dom_printf((xch), "%s: called", __FUNCTION__)
 
-- 
1.7.10.4


* [PATCH v3 3/8] tools: arm: allocate large pages to guests.
  2014-06-26 10:16 [PATCH v3 0/8] xen: arm: Use super pages in p2m Ian Campbell
  2014-06-26 10:17 ` [PATCH v3 1/8] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
  2014-06-26 10:17 ` [PATCH v3 2/8] tools/libxc: pull min/max_t into xc_private.h Ian Campbell
@ 2014-06-26 10:17 ` Ian Campbell
  2014-06-26 10:17 ` [PATCH v3 4/8] xen: arm: only put_page for p2m operations which require it Ian Campbell
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2014-06-26 10:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Jackson, julien.grall, tim, Ian Campbell, stefano.stabellini

Try to allocate as many large (1G or 2M) pages to the guest as possible, and
align to the next larger size on each attempt so that even larger pages can be
allocated on the next iteration (this is currently only exercised via a
compile-time debug option, which is left in place under an #if 0).

Since ARM page tables are consistent at each level there is a common helper
function which tries to allocate a level's worth of pages. The exception to
this consistency is level 0, which does not support superpage mappings (0.5TB
superpages!).
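The alignment-escalation strategy above can be sketched as a small standalone
helper (a hypothetical illustration, not part of the patch; `largest_usable_shift`
is a made-up name, while the `PFN_*_SHIFT` values mirror the definitions the
patch adds to xc_dom_arm.c):

```c
#include <stdint.h>

#define LPAE_SHIFT   9
#define PFN_4K_SHIFT (0)
#define PFN_2M_SHIFT (PFN_4K_SHIFT + LPAE_SHIFT)
#define PFN_1G_SHIFT (PFN_2M_SHIFT + LPAE_SHIFT)

/* Hypothetical helper: return the largest pfn shift such that base_pfn
 * is aligned to that superpage size and at least one whole superpage
 * fits in nr_pfns; -1 if there is nothing to allocate. */
static int largest_usable_shift(uint64_t base_pfn, uint64_t nr_pfns)
{
    const int shifts[] = { PFN_1G_SHIFT, PFN_2M_SHIFT, PFN_4K_SHIFT };
    unsigned int i;

    for ( i = 0; i < sizeof(shifts)/sizeof(shifts[0]); i++ )
    {
        uint64_t mask = ((uint64_t)1 << shifts[i]) - 1;

        if ( (base_pfn & mask) == 0 && (nr_pfns >> shifts[i]) > 0 )
            return shifts[i];
    }
    return -1;
}
```

A misaligned base falls through to a smaller size; once the smaller allocations
reach a 2M or 1G boundary the next iteration can use the larger size again,
which is the realignment behaviour the #if 0 debug option exercises.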

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
v3: Drop outdated paragraph from commit message
v2: min_t definition moved into earlier patch
    drop debug_iters
    allocate extents array explicitly instead of putting 8M on the stack.
    Handle OOM, xc_populate_physmap returns 0, which needs handling for L3.
---
 tools/libxc/xc_dom_arm.c |  119 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 105 insertions(+), 14 deletions(-)

diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index cc64363..c4db19b 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -30,6 +30,13 @@
 #define CONSOLE_PFN_OFFSET 0
 #define XENSTORE_PFN_OFFSET 1
 
+#define LPAE_SHIFT 9
+
+#define PFN_4K_SHIFT  (0)
+#define PFN_2M_SHIFT  (PFN_4K_SHIFT+LPAE_SHIFT)
+#define PFN_1G_SHIFT  (PFN_2M_SHIFT+LPAE_SHIFT)
+#define PFN_512G_SHIFT (PFN_1G_SHIFT+LPAE_SHIFT)
+
 /* get guest IO ABI protocol */
 const char *xc_domain_get_native_protocol(xc_interface *xch,
                                           uint32_t domid)
@@ -249,11 +256,60 @@ static int set_mode(xc_interface *xch, domid_t domid, char *guest_type)
     return rc;
 }
 
+/*  >0: success, *nr_pfns set to number actually populated
+ *   0: didn't try with this pfn shift (e.g. misaligned base etc)
+ *  <0: ERROR
+ */
+static int populate_one_size(struct xc_dom_image *dom, int pfn_shift,
+                             xen_pfn_t base_pfn, xen_pfn_t *nr_pfns,
+                             xen_pfn_t *extents)
+{
+    uint64_t mask = ((uint64_t)1<<(pfn_shift))-1;
+    uint64_t next_mask = ((uint64_t)1<<(LPAE_SHIFT))-1;
+    int nr, i, count = *nr_pfns >> pfn_shift;
+
+    /* No level zero super pages with current hardware */
+    if ( pfn_shift == PFN_512G_SHIFT )
+        return 0;
+
+    /* Nothing to allocate */
+    if ( !count )
+        return 0;
+
+    /* base is misaligned for this level */
+    if ( mask & base_pfn )
+        return 0;
+
+    /* align to the end of a super page at this level */
+    if ( count & next_mask )
+        count &= next_mask;
+
+    for ( i = 0 ; i < count ; i ++ )
+        extents[i] = base_pfn + (i<<pfn_shift);
+
+    nr = xc_domain_populate_physmap(dom->xch, dom->guest_domid, count,
+                                    pfn_shift, 0, extents);
+    if ( nr <= 0 ) return nr;
+    DOMPRINTF("%s: populated %#x/%#x entries with shift %d",
+              __FUNCTION__, nr, count, pfn_shift);
+
+    *nr_pfns = nr << pfn_shift;
+
+    return 1;
+}
+
 static int populate_guest_memory(struct xc_dom_image *dom,
                                  xen_pfn_t base_pfn, xen_pfn_t nr_pfns)
 {
-    int rc;
-    xen_pfn_t allocsz, pfn;
+    int rc = 0;
+    xen_pfn_t allocsz, pfn, *extents;
+
+    extents = calloc(1024*1024,sizeof(xen_pfn_t));
+    if ( extents == NULL )
+    {
+        DOMPRINTF("%s: Unable to allocate extent array", __FUNCTION__);
+        return -1;
+    }
 
     DOMPRINTF("%s: populating RAM @ %016"PRIx64"-%016"PRIx64" (%"PRId64"MB)",
               __FUNCTION__,
@@ -261,21 +317,56 @@ static int populate_guest_memory(struct xc_dom_image *dom,
               (uint64_t)(base_pfn + nr_pfns) << XC_PAGE_SHIFT,
               (uint64_t)nr_pfns >> (20-XC_PAGE_SHIFT));
 
-    for ( pfn = 0; pfn < nr_pfns; pfn++ )
-        dom->p2m_host[pfn] = base_pfn + pfn;
-
-    for ( pfn = rc = allocsz = 0; (pfn < nr_pfns) && !rc; pfn += allocsz )
+    for ( pfn = 0; pfn < nr_pfns; pfn += allocsz )
     {
-        allocsz = nr_pfns - pfn;
-        if ( allocsz > 1024*1024 )
-            allocsz = 1024*1024;
-
-        rc = xc_domain_populate_physmap_exact(
-            dom->xch, dom->guest_domid, allocsz,
-            0, 0, &dom->p2m_host[pfn]);
+        allocsz = min_t(int, 1024*1024, nr_pfns - pfn);
+#if 0 /* Enable this to exercise/debug the code which tries to realign
+       * to a superpage boundary, by misaligning at the start. */
+        if ( pfn == 0 )
+        {
+            allocsz = 1;
+            rc = populate_one_size(dom, PFN_4K_SHIFT,
+                                   base_pfn + pfn, &allocsz, extents);
+            if (rc < 0) break;
+            if (rc > 0) continue;
+            /* Failed to allocate a single page? */
+            break;
+        }
+#endif
+
+        rc = populate_one_size(dom, PFN_512G_SHIFT,
+                               base_pfn + pfn, &allocsz, extents);
+        if ( rc < 0 ) break;
+        if ( rc > 0 ) continue;
+
+        rc = populate_one_size(dom, PFN_1G_SHIFT,
+                               base_pfn + pfn, &allocsz, extents);
+        if ( rc < 0 ) break;
+        if ( rc > 0 ) continue;
+
+        rc = populate_one_size(dom, PFN_2M_SHIFT,
+                               base_pfn + pfn, &allocsz, extents);
+        if ( rc < 0 ) break;
+        if ( rc > 0 ) continue;
+
+        rc = populate_one_size(dom, PFN_4K_SHIFT,
+                               base_pfn + pfn, &allocsz, extents);
+        if ( rc < 0 ) break;
+        if ( rc == 0 )
+        {
+            DOMPRINTF("%s: Not enough RAM", __FUNCTION__);
+            errno = ENOMEM;
+            rc = -1;
+            goto out;
+        }
     }
 
-    return rc;
+    for ( pfn = 0; pfn < nr_pfns; pfn++ )
+        dom->p2m_host[pfn] = base_pfn + pfn;
+
+out:
+    free(extents);
+    return rc < 0 ? rc : 0;
 }
 
 int arch_setup_meminit(struct xc_dom_image *dom)
-- 
1.7.10.4


* [PATCH v3 4/8] xen: arm: only put_page for p2m operations which require it.
  2014-06-26 10:16 [PATCH v3 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (2 preceding siblings ...)
  2014-06-26 10:17 ` [PATCH v3 3/8] tools: arm: allocate large pages to guests Ian Campbell
@ 2014-06-26 10:17 ` Ian Campbell
  2014-06-26 10:17 ` [PATCH v3 5/8] xen: arm: handle superpage mappings in p2m_lookup Ian Campbell
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2014-06-26 10:17 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

In particular do not do it for CACHEFLUSH.

INSERT, RELINQUISH and REMOVE should all put the page (if the current pte is
valid). ALLOCATE doesn't need to since it asserts the current PTE must be
invalid.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

---
v2: actually move the put_page instead of adding a second one.
---
 xen/arch/arm/p2m.c |   35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 9960e17..ea05b6d 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -299,6 +299,23 @@ enum p2m_operation {
     CACHEFLUSH,
 };
 
+static void p2m_put_page(const lpae_t pte)
+{
+    /* TODO: Handle other p2m types
+     *
+     * It's safe to do the put_page here because page_alloc will
+     * flush the TLBs if the page is reallocated before the end of
+     * this loop.
+     */
+    if ( p2m_is_foreign(pte.p2m.type) )
+    {
+        unsigned long mfn = pte.p2m.base;
+
+        ASSERT(mfn_valid(mfn));
+        put_page(mfn_to_page(mfn));
+    }
+}
+
 static int apply_p2m_changes(struct domain *d,
                      enum p2m_operation op,
                      paddr_t start_gpaddr,
@@ -400,20 +417,6 @@ static int apply_p2m_changes(struct domain *d,
 
         flush |= pte.p2m.valid;
 
-        /* TODO: Handle other p2m type
-         *
-         * It's safe to do the put_page here because page_alloc will
-         * flush the TLBs if the page is reallocated before the end of
-         * this loop.
-         */
-        if ( pte.p2m.valid && p2m_is_foreign(pte.p2m.type) )
-        {
-            unsigned long mfn = pte.p2m.base;
-
-            ASSERT(mfn_valid(mfn));
-            put_page(mfn_to_page(mfn));
-        }
-
         switch (op) {
             case ALLOCATE:
                 {
@@ -436,6 +439,8 @@ static int apply_p2m_changes(struct domain *d,
                 break;
             case INSERT:
                 {
+                    if ( pte.p2m.valid )
+                        p2m_put_page(pte);
                     pte = mfn_to_p2m_entry(maddr >> PAGE_SHIFT, mattr, t);
                     p2m_write_pte(&third[third_table_offset(addr)],
                                   pte, flush_pt);
@@ -451,6 +456,8 @@ static int apply_p2m_changes(struct domain *d,
                         break;
                     }
 
+                    p2m_put_page(pte);
+
                     count += 0x10;
 
                     memset(&pte, 0x00, sizeof(pte));
-- 
1.7.10.4


* [PATCH v3 5/8] xen: arm: handle superpage mappings in p2m_lookup
  2014-06-26 10:16 [PATCH v3 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (3 preceding siblings ...)
  2014-06-26 10:17 ` [PATCH v3 4/8] xen: arm: only put_page for p2m operations which require it Ian Campbell
@ 2014-06-26 10:17 ` Ian Campbell
  2014-06-26 10:17 ` [PATCH v3 6/8] xen: arm: add some helpers for assessing p2m pte Ian Campbell
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2014-06-26 10:17 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Currently none are actually created, but they will be shortly.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
---
 xen/arch/arm/p2m.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index ea05b6d..8a6d295 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -123,6 +123,7 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
     struct p2m_domain *p2m = &d->arch.p2m;
     lpae_t pte, *first = NULL, *second = NULL, *third = NULL;
     paddr_t maddr = INVALID_PADDR;
+    paddr_t mask;
     p2m_type_t _t;
 
     /* Allow t to be NULL */
@@ -136,15 +137,21 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
     if ( !first )
         goto err;
 
+    mask = FIRST_MASK;
     pte = first[first_table_offset(paddr)];
     if ( !pte.p2m.valid || !pte.p2m.table )
         goto done;
 
+    mask = SECOND_MASK;
     second = map_domain_page(pte.p2m.base);
     pte = second[second_table_offset(paddr)];
     if ( !pte.p2m.valid || !pte.p2m.table )
         goto done;
 
+    mask = THIRD_MASK;
+
+    BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);
+
     third = map_domain_page(pte.p2m.base);
     pte = third[third_table_offset(paddr)];
 
@@ -156,7 +163,7 @@ done:
     if ( pte.p2m.valid )
     {
         ASSERT(pte.p2m.type != p2m_invalid);
-        maddr = (pte.bits & PADDR_MASK & PAGE_MASK) | (paddr & ~PAGE_MASK);
+        maddr = (pte.bits & PADDR_MASK & mask) | (paddr & ~mask);
         *t = pte.p2m.type;
     }
 
-- 
1.7.10.4


* [PATCH v3 6/8] xen: arm: add some helpers for assessing p2m pte
  2014-06-26 10:16 [PATCH v3 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (4 preceding siblings ...)
  2014-06-26 10:17 ` [PATCH v3 5/8] xen: arm: handle superpage mappings in p2m_lookup Ian Campbell
@ 2014-06-26 10:17 ` Ian Campbell
  2014-06-26 15:07   ` Julien Grall
  2014-06-26 10:17 ` [PATCH v3 7/8] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-06-26 10:17 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Not terribly helpful right now, since they aren't widely used, but they make
future patches easier to read.

p2m_mapping is unused for the time being and is therefore commented out
(otherwise the compiler complains about an unused function).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
v3: Use static functions. s/p2m_entry/p2m_mapping/ as it's a better name.
v2: clarify comment on p2m_{table,entry}
---
 xen/arch/arm/p2m.c |   30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 8a6d295..b51dc1b 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -14,6 +14,22 @@
 #define P2M_FIRST_ORDER 1
 #define P2M_FIRST_ENTRIES (LPAE_ENTRIES<<P2M_FIRST_ORDER)
 
+static bool_t p2m_valid(lpae_t pte) {
+    return pte.p2m.valid;
+}
+/* These two can only be used on L0..L2 ptes because L3 mappings set
+ * the table bit and therefore these would return the opposite to what
+ * you would expect. */
+static bool_t p2m_table(lpae_t pte)
+{
+    return p2m_valid(pte) && pte.p2m.table;
+}
+#if 0
+static bool_t p2m_mapping(lpae_t pte) {
+    return p2m_valid(pte) && !pte.p2m.table;
+}
+#endif
+
 void dump_p2m_lookup(struct domain *d, paddr_t addr)
 {
     struct p2m_domain *p2m = &d->arch.p2m;
@@ -139,13 +155,13 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
 
     mask = FIRST_MASK;
     pte = first[first_table_offset(paddr)];
-    if ( !pte.p2m.valid || !pte.p2m.table )
+    if ( !p2m_table(pte) )
         goto done;
 
     mask = SECOND_MASK;
     second = map_domain_page(pte.p2m.base);
     pte = second[second_table_offset(paddr)];
-    if ( !pte.p2m.valid || !pte.p2m.table )
+    if ( !p2m_table(pte) )
         goto done;
 
     mask = THIRD_MASK;
@@ -156,11 +172,11 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr, p2m_type_t *t)
     pte = third[third_table_offset(paddr)];
 
     /* This bit must be one in the level 3 entry */
-    if ( !pte.p2m.table )
+    if ( !p2m_table(pte) )
         pte.bits = 0;
 
 done:
-    if ( pte.p2m.valid )
+    if ( p2m_valid(pte) )
     {
         ASSERT(pte.p2m.type != p2m_invalid);
         maddr = (pte.bits & PADDR_MASK & mask) | (paddr & ~mask);
@@ -367,7 +383,7 @@ static int apply_p2m_changes(struct domain *d,
             cur_first_page = p2m_first_level_index(addr);
         }
 
-        if ( !first[first_table_offset(addr)].p2m.valid )
+        if ( !p2m_valid(first[first_table_offset(addr)]) )
         {
             if ( !populate )
             {
@@ -384,7 +400,7 @@ static int apply_p2m_changes(struct domain *d,
             }
         }
 
-        BUG_ON(!first[first_table_offset(addr)].p2m.valid);
+        BUG_ON(!p2m_valid(first[first_table_offset(addr)]));
 
         if ( cur_first_offset != first_table_offset(addr) )
         {
@@ -394,7 +410,7 @@ static int apply_p2m_changes(struct domain *d,
         }
         /* else: second already valid */
 
-        if ( !second[second_table_offset(addr)].p2m.valid )
+        if ( !p2m_valid(second[second_table_offset(addr)]) )
         {
             if ( !populate )
             {
-- 
1.7.10.4


* [PATCH v3 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-26 10:16 [PATCH v3 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (5 preceding siblings ...)
  2014-06-26 10:17 ` [PATCH v3 6/8] xen: arm: add some helpers for assessing p2m pte Ian Campbell
@ 2014-06-26 10:17 ` Ian Campbell
  2014-06-26 15:16   ` Julien Grall
  2014-06-26 10:17 ` [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed Ian Campbell
  2014-06-26 15:03 ` [PATCH v3 0/8] xen: arm: Use super pages in p2m Julien Grall
  8 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-06-26 10:17 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

This creates superpage (1G and 2M) mappings in the p2m for both domU and dom0
when the guest and machine addresses are suitably aligned. This relies on the
domain builder to allocate suitably sized regions; this is trivially true for
1:1 mapped dom0s and was arranged for guests in a previous patch. A non-1:1
dom0's memory is not currently deliberately aligned.

Since ARM pagetables are (mostly) consistent at each level this is implemented
by refactoring the handling of a single level of pagetable into a common
function. The two inconsistencies are that there are no superpage mappings at
level zero and that level three entry mappings must set the table bit.

When inserting new mappings the code shatters superpage mappings as necessary,
but currently makes no attempt to coalesce anything again. In particular when
inserting a mapping which could be a superpage over an existing table mapping
we do not attempt to free that lower page table and instead descend into it.
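The shattering step can be sketched in isolation (hypothetical standalone code,
not the patch itself; `fake_pte` and `shatter` are made-up names, and
`level_shift` is read here as the pfn shift of the entry being split, matching
the `i << (level_shift - LPAE_SHIFT)` arithmetic in the patch's
p2m_create_table()):

```c
#include <stdint.h>

#define LPAE_SHIFT   9
#define LPAE_ENTRIES (1u << LPAE_SHIFT)

/* Simplified stand-in for lpae_t: just the base pfn and the table bit. */
struct fake_pte {
    uint64_t base;  /* pfn the entry maps */
    int      table; /* 0 = superpage mapping, 1 = table/L3 entry */
};

/* Expand one superpage entry into 512 next-level entries covering the
 * same machine frames.  level_shift is the pfn shift of the entry being
 * split: 18 for a 1G entry (-> 2M entries), 9 for a 2M entry (-> 4K). */
static void shatter(uint64_t base_pfn, int level_shift,
                    struct fake_pte out[LPAE_ENTRIES])
{
    unsigned int i;

    for ( i = 0; i < LPAE_ENTRIES; i++ )
    {
        out[i].base = base_pfn + ((uint64_t)i << (level_shift - LPAE_SHIFT));
        /* A 1G->2M split still produces superpage mappings (table = 0);
         * a 2M->4K split produces L3 entries, which set the table bit. */
        out[i].table = (level_shift - LPAE_SHIFT) ? 0 : 1;
    }
}
```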

Some p2m statistics are added to keep track of the number of each level of
mapping and the number of times we've had to shatter an existing mapping.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
v3: s/p2m_entry/p2m_mapping/ after similar change in previous patch
    Increment addr/maddr for ALLOCATE case, tested with dom0_11_mapping = 0 (as
    far as possible, which is to say it crashes mounting rootfs which involves
    DMA)
    Don't describe L3 as superpages, just say "aligned for mapping"
    Consistently check level < 3, never level != 3
    Comment for p2m_dump_info
    Use ret for return value of apply_one_level, to avoid "count += rc".
    Add comment on p2m->stats.
v2: p2m stats as unsigned long, to cope with numbers of 4K mappings which are
    possible on both arm32 and arm64
    Fix "decend"
    Remove spurious {} around single line else clause.
    Drop spurious changes to apply_p2m_changes callers.
---
 xen/arch/arm/domain.c     |    1 +
 xen/arch/arm/p2m.c        |  488 ++++++++++++++++++++++++++++++++++-----------
 xen/include/asm-arm/p2m.h |   12 ++
 3 files changed, 380 insertions(+), 121 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 702ddfb..5dcef96 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -742,6 +742,7 @@ int domain_relinquish_resources(struct domain *d)
 
 void arch_dump_domain_info(struct domain *d)
 {
+    p2m_dump_info(d);
 }
 
 
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index b51dc1b..41d306a 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,6 +1,7 @@
 #include <xen/config.h>
 #include <xen/sched.h>
 #include <xen/lib.h>
+#include <xen/stdbool.h>
 #include <xen/errno.h>
 #include <xen/domain_page.h>
 #include <xen/bitops.h>
@@ -24,11 +25,25 @@ static bool_t p2m_table(lpae_t pte)
 {
     return p2m_valid(pte) && pte.p2m.table;
 }
-#if 0
 static bool_t p2m_mapping(lpae_t pte) {
     return p2m_valid(pte) && !pte.p2m.table;
 }
-#endif
+
+void p2m_dump_info(struct domain *d)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+
+    spin_lock(&p2m->lock);
+    printk("p2m mappings for domain %d (vmid %d):\n",
+           d->domain_id, p2m->vmid);
+    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
+    printk("  1G mappings: %ld (shattered %ld)\n",
+           p2m->stats.mappings[1], p2m->stats.shattered[1]);
+    printk("  2M mappings: %ld (shattered %ld)\n",
+           p2m->stats.mappings[2], p2m->stats.shattered[2]);
+    printk("  4K mappings: %ld\n", p2m->stats.mappings[3]);
+    spin_unlock(&p2m->lock);
+}
 
 void dump_p2m_lookup(struct domain *d, paddr_t addr)
 {
@@ -285,15 +300,26 @@ static inline void p2m_write_pte(lpae_t *p, lpae_t pte, bool_t flush_cache)
         clean_xen_dcache(*p);
 }
 
-/* Allocate a new page table page and hook it in via the given entry */
-static int p2m_create_table(struct domain *d, lpae_t *entry, bool_t flush_cache)
+/*
+ * Allocate a new page table page and hook it in via the given entry.
+ * apply_one_level relies on this returning 0 on success
+ * and -ve on failure.
+ *
+ * If the exist entry is present then it must be a mapping and not a
+ * table and it will be shattered into the next level down.
+ *
+ * level_shift is the number of bits at the level we want to create.
+ */
+static int p2m_create_table(struct domain *d, lpae_t *entry,
+                            int level_shift, bool_t flush_cache)
 {
     struct p2m_domain *p2m = &d->arch.p2m;
     struct page_info *page;
-    void *p;
+    lpae_t *p;
     lpae_t pte;
+    int splitting = entry->p2m.valid;
 
-    BUG_ON(entry->p2m.valid);
+    BUG_ON(entry->p2m.table);
 
     page = alloc_domheap_page(NULL, 0);
     if ( page == NULL )
@@ -302,9 +328,37 @@ static int p2m_create_table(struct domain *d, lpae_t *entry, bool_t flush_cache)
     page_list_add(page, &p2m->pages);
 
     p = __map_domain_page(page);
-    clear_page(p);
+    if ( splitting )
+    {
+        p2m_type_t t = entry->p2m.type;
+        unsigned long base_pfn = entry->p2m.base;
+        int i;
+
+        /*
+         * We are either splitting a first level 1G page into 512 second level
+         * 2M pages, or a second level 2M page into 512 third level 4K pages.
+         */
+         for ( i=0 ; i < LPAE_ENTRIES; i++ )
+         {
+             pte = mfn_to_p2m_entry(base_pfn + (i<<(level_shift-LPAE_SHIFT)),
+                                    MATTR_MEM, t);
+
+             /*
+              * First and second level super pages set p2m.table = 0, but
+              * third level entries set table = 1.
+              */
+             if ( level_shift - LPAE_SHIFT )
+                 pte.p2m.table = 0;
+
+             write_pte(&p[i], pte);
+         }
+    }
+    else
+        clear_page(p);
+
     if ( flush_cache )
         clean_xen_dcache_va_range(p, PAGE_SIZE);
+
     unmap_domain_page(p);
 
     pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
@@ -322,8 +376,14 @@ enum p2m_operation {
     CACHEFLUSH,
 };
 
-static void p2m_put_page(const lpae_t pte)
+/* Put any references on the single 4K page referenced by pte.  TODO:
+ * Handle superpages, for now we only take special references for leaf
+ * pages (specifically foreign ones, which can't be super mapped today).
+ */
+static void p2m_put_l3_page(const lpae_t pte)
 {
+    ASSERT(p2m_valid(pte));
+
     /* TODO: Handle other p2m types
      *
      * It's safe to do the put_page here because page_alloc will
@@ -339,6 +399,255 @@ static void p2m_put_page(const lpae_t pte)
     }
 }
 
+/*
+ * Returns true if start_gpaddr..end_gpaddr contains at least one
+ * suitably aligned level_size mappping of maddr.
+ *
+ * So long as the range is large enough the end_gpaddr need not be
+ * aligned (callers should create one superpage mapping based on this
+ * result and then call this again on the new range, eventually the
+ * slop at the end will cause this function to return false).
+ */
+static bool_t is_mapping_aligned(const paddr_t start_gpaddr,
+                                 const paddr_t end_gpaddr,
+                                 const paddr_t maddr,
+                                 const paddr_t level_size)
+{
+    const paddr_t level_mask = level_size - 1;
+
+    /* No hardware superpages at level 0 */
+    if ( level_size == ZEROETH_SIZE )
+        return false;
+
+    /*
+     * A range smaller than the size of a superpage at this level
+     * cannot be superpage aligned.
+     */
+    if ( ( end_gpaddr - start_gpaddr ) < level_size - 1 )
+        return false;
+
+    /* Both the gpaddr and maddr must be aligned */
+    if ( start_gpaddr & level_mask )
+        return false;
+    if ( maddr & level_mask )
+        return false;
+    return true;
+}
+
+#define P2M_ONE_DESCEND        0
+#define P2M_ONE_PROGRESS_NOP   0x1
+#define P2M_ONE_PROGRESS       0x10
+
+/*
+ * 0   == (P2M_ONE_DESCEND) continue to descend the tree
+ * +ve == (P2M_ONE_PROGRESS_*) handled at this level, continue, flush,
+ *        entry, addr and maddr updated.  Return value is an
+ *        indication of the amount of work done (for preemption).
+ * -ve == (-Exxx) error.
+ */
+static int apply_one_level(struct domain *d,
+                           lpae_t *entry,
+                           unsigned int level,
+                           bool_t flush_cache,
+                           enum p2m_operation op,
+                           paddr_t start_gpaddr,
+                           paddr_t end_gpaddr,
+                           paddr_t *addr,
+                           paddr_t *maddr,
+                           bool_t *flush,
+                           int mattr,
+                           p2m_type_t t)
+{
+    /* Helpers to lookup the properties of each level */
+    const paddr_t level_sizes[] =
+        { ZEROETH_SIZE, FIRST_SIZE, SECOND_SIZE, THIRD_SIZE };
+    const paddr_t level_masks[] =
+        { ZEROETH_MASK, FIRST_MASK, SECOND_MASK, THIRD_MASK };
+    const paddr_t level_shifts[] =
+        { ZEROETH_SHIFT, FIRST_SHIFT, SECOND_SHIFT, THIRD_SHIFT };
+    const paddr_t level_size = level_sizes[level];
+    const paddr_t level_mask = level_masks[level];
+    const paddr_t level_shift = level_shifts[level];
+
+    struct p2m_domain *p2m = &d->arch.p2m;
+    lpae_t pte;
+    const lpae_t orig_pte = *entry;
+    int rc;
+
+    BUG_ON(level > 3);
+
+    switch ( op )
+    {
+    case ALLOCATE:
+        ASSERT(level < 3 || !p2m_valid(orig_pte));
+        ASSERT(*maddr == 0);
+
+        if ( p2m_valid(orig_pte) )
+            return P2M_ONE_DESCEND;
+
+        if ( is_mapping_aligned(*addr, end_gpaddr, 0, level_size) )
+        {
+            struct page_info *page;
+
+            page = alloc_domheap_pages(d, level_shift - PAGE_SHIFT, 0);
+            if ( page )
+            {
+                pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
+                if ( level < 3 )
+                    pte.p2m.table = 0;
+                p2m_write_pte(entry, pte, flush_cache);
+                p2m->stats.mappings[level]++;
+
+                *addr += level_size;
+                *maddr += level_size;
+
+                return P2M_ONE_PROGRESS;
+            }
+            else if ( level == 3 )
+                return -ENOMEM;
+        }
+
+        /* L3 is always suitably aligned for mapping (handled, above) */
+        BUG_ON(level == 3);
+
+        /*
+         * If we get here then we failed to allocate a sufficiently
+         * large contiguous region for this level (which can't be
+         * L3). Create a page table and continue to descend so we try
+         * smaller allocations.
+         */
+        rc = p2m_create_table(d, entry, 0, flush_cache);
+        if ( rc < 0 )
+            return rc;
+
+        return P2M_ONE_DESCEND;
+
+    case INSERT:
+        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
+           /* We do not handle replacing an existing table with a superpage */
+             (level == 3 || !p2m_table(orig_pte)) )
+        {
+            /* New mapping is superpage aligned, make it */
+            pte = mfn_to_p2m_entry(*maddr >> PAGE_SHIFT, mattr, t);
+            if ( level < 3 )
+                pte.p2m.table = 0; /* Superpage entry */
+
+            p2m_write_pte(entry, pte, flush_cache);
+
+            *flush |= p2m_valid(orig_pte);
+
+            *addr += level_size;
+            *maddr += level_size;
+
+            if ( p2m_valid(orig_pte) )
+            {
+                /*
+                 * We can't currently get here for an existing table
+                 * mapping, since we don't handle replacing an
+                 * existing table with a superpage. If we did we would
+                 * need to handle freeing (and accounting) for the bit
+                 * of the p2m tree which we would be about to lop off.
+                 */
+                BUG_ON(level < 3 && p2m_table(orig_pte));
+                if ( level == 3 )
+                    p2m_put_l3_page(orig_pte);
+            }
+            else /* New mapping */
+                p2m->stats.mappings[level]++;
+
+            return P2M_ONE_PROGRESS;
+        }
+        else
+        {
+            /* New mapping is not superpage aligned, create a new table entry */
+
+            /* L3 is always suitably aligned for mapping (handled, above) */
+            BUG_ON(level == 3);
+
+            /* Not present -> create table entry and descend */
+            if ( !p2m_valid(orig_pte) )
+            {
+                rc = p2m_create_table(d, entry, 0, flush_cache);
+                if ( rc < 0 )
+                    return rc;
+                return P2M_ONE_DESCEND;
+            }
+
+            /* Existing superpage mapping -> shatter and descend */
+            if ( p2m_mapping(orig_pte) )
+            {
+                *flush = true;
+                rc = p2m_create_table(d, entry,
+                                      level_shift - PAGE_SHIFT, flush_cache);
+                if ( rc < 0 )
+                    return rc;
+
+                p2m->stats.shattered[level]++;
+                p2m->stats.mappings[level]--;
+                p2m->stats.mappings[level+1] += LPAE_ENTRIES;
+            } /* else: an existing table mapping -> descend */
+
+            BUG_ON(!entry->p2m.table);
+
+            return P2M_ONE_DESCEND;
+        }
+
+        break;
+
+    case RELINQUISH:
+    case REMOVE:
+        if ( !p2m_valid(orig_pte) )
+        {
+            /* Progress up to next boundary */
+            *addr = (*addr + level_size) & level_mask;
+            return P2M_ONE_PROGRESS_NOP;
+        }
+
+        if ( level < 3 && p2m_table(orig_pte) )
+            return P2M_ONE_DESCEND;
+
+        *flush = true;
+
+        memset(&pte, 0x00, sizeof(pte));
+        p2m_write_pte(entry, pte, flush_cache);
+
+        *addr += level_size;
+
+        p2m->stats.mappings[level]--;
+
+        if ( level == 3 )
+            p2m_put_l3_page(orig_pte);
+
+        /*
+         * This is still a single pte write, no matter the level, so no need to
+         * scale.
+         */
+        return P2M_ONE_PROGRESS;
+
+    case CACHEFLUSH:
+        if ( !p2m_valid(orig_pte) )
+        {
+            *addr = (*addr + level_size) & level_mask;
+            return P2M_ONE_PROGRESS_NOP;
+        }
+
+        /*
+         * could flush up to the next boundary, but would need to be
+         * careful about preemption, so just do one page now and loop.
+         */
+        *addr += PAGE_SIZE;
+        if ( p2m_is_ram(orig_pte.p2m.type) )
+        {
+            flush_page_to_ram(orig_pte.p2m.base + third_table_offset(*addr));
+            return P2M_ONE_PROGRESS;
+        }
+        else
+            return P2M_ONE_PROGRESS_NOP;
+    }
+
+    BUG(); /* Should never get here */
+}
+
 static int apply_p2m_changes(struct domain *d,
                      enum p2m_operation op,
                      paddr_t start_gpaddr,
@@ -347,7 +656,7 @@ static int apply_p2m_changes(struct domain *d,
                      int mattr,
                      p2m_type_t t)
 {
-    int rc;
+    int rc, ret;
     struct p2m_domain *p2m = &d->arch.p2m;
     lpae_t *first = NULL, *second = NULL, *third = NULL;
     paddr_t addr;
@@ -355,9 +664,7 @@ static int apply_p2m_changes(struct domain *d,
                   cur_first_offset = ~0,
                   cur_second_offset = ~0;
     unsigned long count = 0;
-    unsigned int flush = 0;
-    bool_t populate = (op == INSERT || op == ALLOCATE);
-    lpae_t pte;
+    bool_t flush = false;
     bool_t flush_pt;
 
     /* Some IOMMU don't support coherent PT walk. When the p2m is
@@ -371,6 +678,25 @@ static int apply_p2m_changes(struct domain *d,
     addr = start_gpaddr;
     while ( addr < end_gpaddr )
     {
+        /*
+         * Arbitrarily, preempt every 512 operations or 8192 nops.
+         * 512*P2M_ONE_PROGRESS == 8192*P2M_ONE_PROGRESS_NOP == 0x2000
+         *
+         * count is initialised to 0 above, so we are guaranteed to
+         * always make at least one pass.
+         */
+
+        if ( op == RELINQUISH && count >= 0x2000 )
+        {
+            if ( hypercall_preempt_check() )
+            {
+                p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
+                rc = -ERESTART;
+                goto out;
+            }
+            count = 0;
+        }
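(Aside, not part of the patch: the arithmetic behind the single 0x2000 threshold. Each real pte write returns P2M_ONE_PROGRESS, each skipped invalid entry only P2M_ONE_PROGRESS_NOP, so the same threshold corresponds to 512 real operations or 8192 no-ops.)

```c
#include <assert.h>

#define P2M_ONE_PROGRESS_NOP 0x1
#define P2M_ONE_PROGRESS     0x10
#define PREEMPT_THRESHOLD    0x2000

/* Number of operations of a given cost before count reaches the
 * preemption threshold. */
static unsigned long ops_until_preempt(unsigned long per_op_cost)
{
    return PREEMPT_THRESHOLD / per_op_cost;
}
```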
+
         if ( cur_first_page != p2m_first_level_index(addr) )
         {
             if ( first ) unmap_domain_page(first);
@@ -383,22 +709,18 @@ static int apply_p2m_changes(struct domain *d,
             cur_first_page = p2m_first_level_index(addr);
         }
 
-        if ( !p2m_valid(first[first_table_offset(addr)]) )
-        {
-            if ( !populate )
-            {
-                addr = (addr + FIRST_SIZE) & FIRST_MASK;
-                continue;
-            }
+        /* We only use a 3 level p2m at the moment, so no level 0,
+         * current hardware doesn't support super page mappings at
+         * level 0 anyway */
 
-            rc = p2m_create_table(d, &first[first_table_offset(addr)],
-                                  flush_pt);
-            if ( rc < 0 )
-            {
-                printk("p2m_populate_ram: L1 failed\n");
-                goto out;
-            }
-        }
+        ret = apply_one_level(d, &first[first_table_offset(addr)],
+                              1, flush_pt, op,
+                              start_gpaddr, end_gpaddr,
+                              &addr, &maddr, &flush,
+                              mattr, t);
+        if ( ret < 0 ) { rc = ret ; goto out; }
+        count += ret;
+        if ( ret != P2M_ONE_DESCEND ) continue;
 
         BUG_ON(!p2m_valid(first[first_table_offset(addr)]));
 
@@ -410,23 +732,16 @@ static int apply_p2m_changes(struct domain *d,
         }
         /* else: second already valid */
 
-        if ( !p2m_valid(second[second_table_offset(addr)]) )
-        {
-            if ( !populate )
-            {
-                addr = (addr + SECOND_SIZE) & SECOND_MASK;
-                continue;
-            }
+        ret = apply_one_level(d,&second[second_table_offset(addr)],
+                              2, flush_pt, op,
+                              start_gpaddr, end_gpaddr,
+                              &addr, &maddr, &flush,
+                              mattr, t);
+        if ( ret < 0 ) { rc = ret ; goto out; }
+        count += ret;
+        if ( ret != P2M_ONE_DESCEND ) continue;
 
-            rc = p2m_create_table(d, &second[second_table_offset(addr)],
-                                  flush_pt);
-            if ( rc < 0 ) {
-                printk("p2m_populate_ram: L2 failed\n");
-                goto out;
-            }
-        }
-
-        BUG_ON(!second[second_table_offset(addr)].p2m.valid);
+        BUG_ON(!p2m_valid(second[second_table_offset(addr)]));
 
         if ( cur_second_offset != second_table_offset(addr) )
         {
@@ -436,84 +751,15 @@ static int apply_p2m_changes(struct domain *d,
             cur_second_offset = second_table_offset(addr);
         }
 
-        pte = third[third_table_offset(addr)];
-
-        flush |= pte.p2m.valid;
-
-        switch (op) {
-            case ALLOCATE:
-                {
-                    /* Allocate a new RAM page and attach */
-                    struct page_info *page;
-
-                    ASSERT(!pte.p2m.valid);
-                    rc = -ENOMEM;
-                    page = alloc_domheap_page(d, 0);
-                    if ( page == NULL ) {
-                        printk("p2m_populate_ram: failed to allocate page\n");
-                        goto out;
-                    }
-
-                    pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
-
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                }
-                break;
-            case INSERT:
-                {
-                    if ( pte.p2m.valid )
-                        p2m_put_page(pte);
-                    pte = mfn_to_p2m_entry(maddr >> PAGE_SHIFT, mattr, t);
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                    maddr += PAGE_SIZE;
-                }
-                break;
-            case RELINQUISH:
-            case REMOVE:
-                {
-                    if ( !pte.p2m.valid )
-                    {
-                        count++;
-                        break;
-                    }
-
-                    p2m_put_page(pte);
-
-                    count += 0x10;
-
-                    memset(&pte, 0x00, sizeof(pte));
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                    count++;
-                }
-                break;
-
-            case CACHEFLUSH:
-                {
-                    if ( !pte.p2m.valid || !p2m_is_ram(pte.p2m.type) )
-                        break;
-
-                    flush_page_to_ram(pte.p2m.base);
-                }
-                break;
-        }
-
-        /* Preempt every 2MiB (mapped) or 32 MiB (unmapped) - arbitrary */
-        if ( op == RELINQUISH && count >= 0x2000 )
-        {
-            if ( hypercall_preempt_check() )
-            {
-                p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
-                rc = -ERESTART;
-                goto out;
-            }
-            count = 0;
-        }
-
-        /* Got the next page */
-        addr += PAGE_SIZE;
+        ret = apply_one_level(d, &third[third_table_offset(addr)],
+                              3, flush_pt, op,
+                              start_gpaddr, end_gpaddr,
+                              &addr, &maddr, &flush,
+                              mattr, t);
+        if ( ret < 0 ) { rc = ret ; goto out; }
+        /* L3 had better have done something! We cannot descend any further */
+        BUG_ON(ret == P2M_ONE_DESCEND);
+        count += ret;
     }
 
     if ( flush )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 911d32d..327a79d 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -29,6 +29,15 @@ struct p2m_domain {
      * resume the search. Apart from during teardown this can only
      * decrease. */
     unsigned long lowest_mapped_gfn;
+
+    /* Gather some statistics for information purposes only */
+    struct {
+        /* Number of mappings at each p2m tree level */
+        unsigned long mappings[4];
+        /* Number of times we have shattered a mapping
+         * at each p2m tree level. */
+        unsigned long shattered[4];
+    } stats;
 };
 
 /* List of possible type for each page in the p2m entry.
@@ -79,6 +88,9 @@ int p2m_alloc_table(struct domain *d);
 void p2m_save_state(struct vcpu *p);
 void p2m_restore_state(struct vcpu *n);
 
+/* Print debugging/statistical info about a domain's p2m */
+void p2m_dump_info(struct domain *d);
+
 /* Look up the MFN corresponding to a domain's PFN. */
 paddr_t p2m_lookup(struct domain *d, paddr_t gpfn, p2m_type_t *t);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-26 10:16 [PATCH v3 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (6 preceding siblings ...)
  2014-06-26 10:17 ` [PATCH v3 7/8] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
@ 2014-06-26 10:17 ` Ian Campbell
  2014-06-26 18:20   ` Julien Grall
  2014-06-26 15:03 ` [PATCH v3 0/8] xen: arm: Use super pages in p2m Julien Grall
  8 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-06-26 10:17 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Depending on where Xen and the initial modules were loaded into RAM we may not
be able to allocate a suitably contiguous block of memory for dom0. Especially
since the allocations made by alloc_domheap_pages are naturally aligned (e.g. a
1GB allocation must be at a 1GB boundary).

The alignment requirement in particular is a huge limitation, meaning that even
dom0_mem=1G on a 2GB system is pretty likely to fail unless you are very
careful with your load addresses. People were also having trouble with dom0 >
128MB on the 1GB cubieboard2 when using fairly standard load addresses for
things.

This patch tries to allocate multiple banks of memory in order to try and
satisfy the entire requested domain 0 allocation. Sadly this turns out to be
pretty tricky to arrange (see the large comment in the code).
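(As an aside, the natural-alignment constraint described above can be checked with a tiny standalone model, not part of the patch: a buddy allocation of order N starts on a (PAGE_SIZE << N)-aligned boundary, so with 4K pages a 1GB block can only start at 0, 1GB, 2GB, and so on — which is why dom0_mem=1G on a 2GB system collides so easily with the load addresses.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* A naturally aligned order-N allocation must start at a multiple of
 * its own size, i.e. the low (PAGE_SHIFT + order) address bits are 0. */
static bool naturally_aligned(uint64_t paddr, unsigned int order)
{
    return (paddr & (((uint64_t)1 << (PAGE_SHIFT + order)) - 1)) == 0;
}
```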

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Karim Raslan <karim.allah.ahmed@gmail.com>
---
v3:
  Handle running out of banks more gracefully
  Allow order >= min_low_order, not strictly greater. Otherwise
  dom0_mem=128M doesn't work.
v2: New patch
---
 xen/arch/arm/domain_build.c |  286 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 264 insertions(+), 22 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 9d9cba9..3fdb20e 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -45,6 +45,13 @@ custom_param("dom0_mem", parse_dom0_mem);
 # define DPRINT(fmt, args...) do {} while ( 0 )
 #endif
 
+//#define DEBUG_11_ALLOCATION
+#ifdef DEBUG_11_ALLOCATION
+# define D11PRINT(fmt, args...) printk(XENLOG_DEBUG fmt, ##args)
+#else
+# define D11PRINT(fmt, args...) do {} while ( 0 )
+#endif
+
 /*
  * Amount of extra space required to dom0's device tree.  No new nodes
  * are added (yet) but one terminating reserve map entry (16 bytes) is
@@ -67,39 +74,274 @@ struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
     return alloc_vcpu(dom0, 0, 0);
 }
 
-static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
+static unsigned int get_11_allocation_size(paddr_t size)
 {
-    paddr_t start;
-    paddr_t size;
-    struct page_info *pg;
-    unsigned int order = get_order_from_bytes(dom0_mem);
-    int res;
-    paddr_t spfn;
+    /*
+     * get_order_from_bytes returns the order greater than or equal to
+     * the given size, but we need less than or equal. Adding one to
+     * the size pushes an evenly aligned size into the next order, so
+     * we can then unconditionally subtract 1 from the order which is
+     * returned.
+     */
+    return get_order_from_bytes(size + 1) - 1;
+}
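(Aside, not part of the patch: the size+1 trick above can be checked with a standalone model of get_order_from_bytes. Rounding size+1 up to an order and subtracting one yields the largest order whose allocation still fits within size; this assumes size >= PAGE_SIZE, since smaller sizes would underflow.)

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL

/* Model of Xen's get_order_from_bytes: the smallest order whose
 * allocation (PAGE_SIZE << order) covers the given size. */
static unsigned int model_get_order_from_bytes(uint64_t size)
{
    unsigned int order = 0;
    while ( (MODEL_PAGE_SIZE << order) < size )
        order++;
    return order;
}

/* Largest order whose allocation is <= size. */
static unsigned int model_get_11_allocation_size(uint64_t size)
{
    return model_get_order_from_bytes(size + 1) - 1;
}
```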
 
-    if ( is_32bit_domain(d) )
-        pg = alloc_domheap_pages(d, order, MEMF_bits(32));
-    else
-        pg = alloc_domheap_pages(d, order, 0);
-    if ( !pg )
-        panic("Failed to allocate contiguous memory for dom0");
+/*
+ * Insert the given pages into a memory bank, banks are ordered by address.
+ *
+ * Returns false if the memory would be below bank 0 or we have run
+ * out of banks. In this case it will free the pages.
+ */
+static bool_t insert_11_bank(struct domain *d,
+                             struct kernel_info *kinfo,
+                             struct page_info *pg,
+                             unsigned int order)
+{
+    int res, i;
+    paddr_t spfn;
+    paddr_t start, size;
 
     spfn = page_to_mfn(pg);
     start = pfn_to_paddr(spfn);
     size = pfn_to_paddr((1 << order));
 
-    // 1:1 mapping
-    printk("Populate P2M %#"PRIx64"->%#"PRIx64" (1:1 mapping for dom0)\n",
-           start, start + size);
-    res = guest_physmap_add_page(d, spfn, spfn, order);
+    D11PRINT("Allocated %#"PRIpaddr"-%#"PRIpaddr" (%ldMB/%ldMB, order %d)\n",
+             start, start + size,
+             1UL << (order+PAGE_SHIFT-20),
+             /* Don't want to format this as PRIpaddr (16 digit hex) */
+             (unsigned long)(kinfo->unassigned_mem >> 20),
+             order);
 
-    if ( res )
-        panic("Unable to add pages in DOM0: %d", res);
+    if ( kinfo->mem.nr_banks > 0 && start + size < kinfo->mem.bank[0].start )
+    {
+        D11PRINT("Allocation is below bank 0, not using\n");
+        goto fail;
+    }
 
-    kinfo->mem.bank[0].start = start;
-    kinfo->mem.bank[0].size = size;
-    kinfo->mem.nr_banks = 1;
+    res = guest_physmap_add_page(d, spfn, spfn, order);
+    if ( res )
+        panic("Failed to map pages to DOM0: %d", res);
 
     kinfo->unassigned_mem -= size;
+
+    if ( kinfo->mem.nr_banks == 0 )
+    {
+        kinfo->mem.bank[0].start = start;
+        kinfo->mem.bank[0].size = size;
+        kinfo->mem.nr_banks = 1;
+        return true;
+    }
+
+    for( i = 0; i < kinfo->mem.nr_banks; i++ )
+    {
+        struct membank *bank = &kinfo->mem.bank[i];
+
+        /* If possible merge new memory into the start of the bank */
+        if ( bank->start == start+size )
+        {
+            bank->start = start;
+            bank->size += size;
+            return true;
+        }
+
+        /* If possible merge new memory onto the end of the bank */
+        if ( start == bank->start + bank->size )
+        {
+            bank->size += size;
+            return true;
+        }
+
+        /*
+         * Otherwise if it is below this bank insert new memory in a
+         * new bank before this one. If there was a lower bank we
+         * could have inserted the memory into/before we would already
+         * have done so, so this must be the right place.
+         */
+        if ( start + size < bank->start && kinfo->mem.nr_banks < NR_MEM_BANKS )
+        {
+            memmove(bank + 1, bank, sizeof(*bank)*(kinfo->mem.nr_banks - i));
+            kinfo->mem.nr_banks++;
+            bank->start = start;
+            bank->size = size;
+            return true;
+        }
+    }
+
+    if ( i == kinfo->mem.nr_banks && kinfo->mem.nr_banks < NR_MEM_BANKS )
+    {
+        struct membank *bank = &kinfo->mem.bank[kinfo->mem.nr_banks];
+
+        bank->start = start;
+        bank->size = size;
+        kinfo->mem.nr_banks++;
+        return true;
+    }
+
+    /* If we get here then there are no more banks to fill. */
+
+fail:
+    free_domheap_pages(pg, order);
+    return false;
+}
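(Aside, not part of the patch: the sorted-insert/merge policy of insert_11_bank can be exercised with a standalone model. The model keeps disjoint banks ordered by address, merges a new region onto an adjacent bank front or back, rejects regions below bank 0, and otherwise inserts or appends while slots remain; the Xen bookkeeping and p2m mapping are omitted.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NR_BANKS 8

struct model_bank { uint64_t start, size; };
struct model_mem { int nr_banks; struct model_bank bank[MODEL_NR_BANKS]; };

static bool model_insert_bank(struct model_mem *mem, uint64_t start, uint64_t size)
{
    int i;

    if ( mem->nr_banks == 0 )
    {
        mem->bank[0].start = start;
        mem->bank[0].size = size;
        mem->nr_banks = 1;
        return true;
    }

    /* Below bank 0 and not adjacent to it: give up on this region. */
    if ( start + size < mem->bank[0].start )
        return false;

    for ( i = 0; i < mem->nr_banks; i++ )
    {
        struct model_bank *bank = &mem->bank[i];

        if ( bank->start == start + size )       /* merge at the front */
        {
            bank->start = start;
            bank->size += size;
            return true;
        }
        if ( start == bank->start + bank->size ) /* merge at the back */
        {
            bank->size += size;
            return true;
        }
        if ( start + size < bank->start && mem->nr_banks < MODEL_NR_BANKS )
        {
            /* Insert a new bank before this one, keeping order. */
            memmove(bank + 1, bank, sizeof(*bank) * (mem->nr_banks - i));
            mem->nr_banks++;
            bank->start = start;
            bank->size = size;
            return true;
        }
    }

    if ( mem->nr_banks < MODEL_NR_BANKS )        /* append above all banks */
    {
        mem->bank[mem->nr_banks].start = start;
        mem->bank[mem->nr_banks].size = size;
        mem->nr_banks++;
        return true;
    }
    return false;                                /* out of bank slots */
}
```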
+
+/*
+ * This is all pretty horrible.
+ *
+ * Requirements:
+ *
+ * 1. The dom0 kernel should be loaded within the first 128MB of RAM. This
+ *    is necessary at least for Linux zImage kernels, which are all we
+ *    support today.
+ * 2. We want to put the dom0 kernel, ramdisk and DTB in the same
+ *    bank. Partly this is just easier for us to deal with, but also
+ *    the ramdisk and DTB must be placed within a certain proximity of
+ *    the kernel within RAM.
+ * 3. For 32-bit dom0 we want to place as much of the RAM as we
+ *    reasonably can below 4GB, so that it can be used by non-LPAE
+ *    enabled kernels.
+ * 4. For 32-bit dom0 the kernel must be located below 4GB.
+ * 5. We want to have a few larger banks rather than many smaller ones.
+ *
+ * For the first two requirements we need to make sure that the lowest
+ * bank is sufficiently large.
+ *
+ * For convenience we also sort the banks by physical address.
+ *
+ * The memory allocator does not really give us the flexibility to
+ * meet these requirements directly. So instead we proceed as follows:
+ *
+ * We first allocate the largest allocation we can as low as we
+ * can. This then becomes the first bank. This bank must be at least
+ * 128MB (or dom0_mem if that is smaller).
+ *
+ * Then we start allocating more memory, trying to allocate the
+ * largest possible size and trying smaller sizes until we
+ * successfully allocate something.
+ *
+ * We then try and insert this memory in to the list of banks. If it
+ * can be merged into an existing bank then this is trivial.
+ *
+ * If the new memory is before the first bank (and cannot be merged
+ * into it) then we give up. Since the allocator prefers to allocate
+ * high addresses first and the first bank has already been allocated
+ * to be as low as possible this likely means we wouldn't have been
+ * able to allocate much more memory anyway.
+ *
+ * Otherwise we insert a new bank. If we've reached MAX_NR_BANKS then
+ * we give up.
+ *
+ * For 32-bit domain we require that the initial allocation for the
+ * first bank is under 4G. Then for the subsequent allocations we
+ * initially allocate memory only from below 4GB. Once that runs out
+ * (as described above) we allow higher allocations and continue until
+ * that runs out (or we have allocated sufficient dom0 memory).
+ */
+static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
+{
+    const unsigned int min_low_order =
+        get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
+    const unsigned int min_order = get_order_from_bytes(MB(4));
+    struct page_info *pg;
+    unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
+    int i;
+
+    bool_t lowmem = is_32bit_domain(d);
+    unsigned int bits;
+
+    printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
+           /* Don't want to format this as PRIpaddr (16 digit hex) */
+           (unsigned long)(kinfo->unassigned_mem >> 20));
+
+    kinfo->mem.nr_banks = 0;
+
+    /*
+     * First try and allocate the largest thing we can as low as
+     * possible to be bank 0.
+     */
+    while ( order >= min_low_order )
+    {
+        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
+        {
+            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
+            if ( pg != NULL )
+                goto got_bank0;
+        }
+        order--;
+    }
+
+    panic("Unable to allocate first memory bank");
+
+ got_bank0:
+
+    if ( !insert_11_bank(d, kinfo, pg, order) )
+        BUG(); /* Cannot fail for first bank */
+
+    /* Now allocate more memory and fill in additional banks */
+
+    order = get_11_allocation_size(kinfo->unassigned_mem);
+    while ( kinfo->unassigned_mem && kinfo->mem.nr_banks < NR_MEM_BANKS )
+    {
+        pg = alloc_domheap_pages(d, order, lowmem ? MEMF_bits(32) : 0);
+        if ( !pg )
+        {
+            order --;
+
+            if ( lowmem && order < min_low_order)
+            {
+                D11PRINT("Failed at min_low_order, allow high allocations\n");
+                order = get_11_allocation_size(kinfo->unassigned_mem);
+                lowmem = false;
+                continue;
+            }
+            if ( order >= min_order )
+                continue;
+
+            /* No more we can do */
+            break;
+        }
+
+        if ( !insert_11_bank(d, kinfo, pg, order) )
+        {
+            if ( kinfo->mem.nr_banks == NR_MEM_BANKS )
+                /* Nothing more we can do. */
+                break;
+
+            if ( lowmem )
+            {
+                D11PRINT("Allocation below bank 0, allow high allocations\n");
+                order = get_11_allocation_size(kinfo->unassigned_mem);
+                lowmem = false;
+                continue;
+            }
+            else
+            {
+                D11PRINT("Allocation below bank 0\n");
+                break;
+            }
+        }
+
+        /*
+         * Success, next time around try again to get the largest order
+         * allocation possible.
+         */
+        order = get_11_allocation_size(kinfo->unassigned_mem);
+    }
+
+    if ( kinfo->unassigned_mem )
+        printk("WARNING: Failed to allocate requested dom0 memory."
+               /* Don't want to format this as PRIpaddr (16 digit hex) */
+               " %ldMB unallocated\n",
+               (unsigned long)kinfo->unassigned_mem >> 20);
+
+    for( i = 0; i < kinfo->mem.nr_banks; i++ )
+    {
+        printk("BANK[%d] %#"PRIpaddr"-%#"PRIpaddr" (%ldMB)\n",
+               i,
+               kinfo->mem.bank[i].start,
+               kinfo->mem.bank[i].start + kinfo->mem.bank[i].size,
+               /* Don't want to format this as PRIpaddr (16 digit hex) */
+               (unsigned long)(kinfo->mem.bank[i].size >> 20));
+    }
 }
 
 static void allocate_memory(struct domain *d, struct kernel_info *kinfo)
-- 
1.7.10.4


* Re: [PATCH v3 2/8] tools/libxc: pull min/max_t into xc_private.h
  2014-06-26 10:17 ` [PATCH v3 2/8] tools/libxc: pull min/max_t into xc_private.h Ian Campbell
@ 2014-06-26 10:34   ` Ian Jackson
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Jackson @ 2014-06-26 10:34 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, julien.grall, xen-devel

Ian Campbell writes ("[PATCH v3 2/8] tools/libxc: pull min/max_t into xc_private.h"):
> There are generally useful.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Acked-by: Julien Grall <julien.grall@linaro.org>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>


* Re: [PATCH v3 0/8] xen: arm: Use super pages in p2m
  2014-06-26 10:16 [PATCH v3 0/8] xen: arm: Use super pages in p2m Ian Campbell
                   ` (7 preceding siblings ...)
  2014-06-26 10:17 ` [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed Ian Campbell
@ 2014-06-26 15:03 ` Julien Grall
  2014-06-30 15:44   ` Ian Campbell
  8 siblings, 1 reply; 29+ messages in thread
From: Julien Grall @ 2014-06-26 15:03 UTC (permalink / raw)
  To: Ian Campbell, xen-devel, Ian Jackson; +Cc: Tim Deegan, Stefano Stabellini

Hi Ian,

On 06/26/2014 11:16 AM, Ian Campbell wrote:
> The following series adds support for superpage mappings (1G and 2M) in
> the p2m for dom0 and domU.
> 
> At the moment it is not possible to allocate 1G regions to guests,
> because this is gated on multipage_allocation_permitted() which
> currently (incorrectly) returns false for dom0 on ARM.

OOI, why multipage_allocation_permitted is gated with iomem_caps?

It looks wrong to me, as mapping an MMIO region in the guest will allow
him to use 1G mapping.

> This is fixed by
> Arianna's "arch/arm: domain build: let dom0 access I/O memory of mapped
> devices" (which initialises iomem_caps for dom0).

IIRC, this has been pushed a month ago on upstream.

Regards,

-- 
Julien Grall


* Re: [PATCH v3 6/8] xen: arm: add some helpers for assessing p2m pte
  2014-06-26 10:17 ` [PATCH v3 6/8] xen: arm: add some helpers for assessing p2m pte Ian Campbell
@ 2014-06-26 15:07   ` Julien Grall
  0 siblings, 0 replies; 29+ messages in thread
From: Julien Grall @ 2014-06-26 15:07 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

On 06/26/2014 11:17 AM, Ian Campbell wrote:
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 8a6d295..b51dc1b 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -14,6 +14,22 @@
>  #define P2M_FIRST_ORDER 1
>  #define P2M_FIRST_ENTRIES (LPAE_ENTRIES<<P2M_FIRST_ORDER)
>  
> +static bool_t p2m_valid(lpae_t pte) {

NIT: The { has to be on another line.

> +    return pte.p2m.valid;
> +}
> +/* These two can only be used on L0..L2 ptes because L3 mappings set
> + * the table bit and therefore these would return the opposite to what
> + * you would expect. */
> +static bool_t p2m_table(lpae_t pte)
> +{
> +    return p2m_valid(pte) && pte.p2m.table;
> +}
> +#if 0
> +static bool_t p2m_mapping(lpae_t pte) {

Same here.

> +    return p2m_valid(pte) && !pte.p2m.table;
> +}
> +#endif
> +

With these small changes:

Acked-by: Julien Grall <julien.grall@linaro.org>

-- 
Julien Grall


* Re: [PATCH v3 7/8] xen: arm: use superpages in p2m when pages are suitably aligned
  2014-06-26 10:17 ` [PATCH v3 7/8] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
@ 2014-06-26 15:16   ` Julien Grall
  0 siblings, 0 replies; 29+ messages in thread
From: Julien Grall @ 2014-06-26 15:16 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 06/26/2014 11:17 AM, Ian Campbell wrote:
> +static int p2m_create_table(struct domain *d, lpae_t *entry,
> +                            int level_shift, bool_t flush_cache)
>  {
>      struct p2m_domain *p2m = &d->arch.p2m;
>      struct page_info *page;
> -    void *p;
> +    lpae_t *p;
>      lpae_t pte;
> +    int splitting = entry->p2m.valid;

NIT: You've introduced p2m_valid earlier... I think you can use it here.

> +    BUG_ON(entry->p2m.table);

Shouldn't you also check the valid bit, i.e. use p2m_table() here?

Regards,

-- 
Julien Grall


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-26 10:17 ` [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed Ian Campbell
@ 2014-06-26 18:20   ` Julien Grall
  2014-06-27  9:24     ` Ian Campbell
  0 siblings, 1 reply; 29+ messages in thread
From: Julien Grall @ 2014-06-26 18:20 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: tim, stefano.stabellini

Hi Ian,

On 06/26/2014 11:17 AM, Ian Campbell wrote:
> Depending on where Xen and the initial modules were loaded into RAM we may not
> be able to allocate a suitably contiguous block of memory for dom0. Especially
> since the allocations made by alloc_domheap_pages are naturally aligned (e.g. a
> 1GB allocation must be at a 1GB boundary).
> 
> The alignment requirement in particular is a huge limitation, meaning that even
> dom0_mem0=1G on a 2GB system is pretty likely to fail unless you are very
> careful with your load addresses. People were also having trouble with dom0 >
> 128MB on the 1GB cubieboard2 when using fairly standard load addresses for
> things.
> 
> This patch tries to allocate multiple banks of memory in order to try and
> satisfy the entire requested domain 0 allocation. Sadly this turns out to be
> pretty tricky to arrange (see the large comment in the code).
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Karim Raslan <karim.allah.ahmed@gmail.com>
> ---
> v3:
>   Handle running out of banks more gracefully
>   Allow order >= min_low_order, not strictly greater. Otherwise
>   dom0_mem=128M doesn't work.

dom0_mem is still not working on the versatile express (see my
explanation below).

> +static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
> +{
> +    const unsigned int min_low_order =
> +        get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
> +    const unsigned int min_order = get_order_from_bytes(MB(4));
> +    struct page_info *pg;
> +    unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
> +    int i;
> +
> +    bool_t lowmem = is_32bit_domain(d);
> +    unsigned int bits;
> +
> +    printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
> +           /* Don't want format this as PRIpaddr (16 digit hex) */
> +           (unsigned long)(kinfo->unassigned_mem >> 20));
> +
> +    kinfo->mem.nr_banks = 0;
> +
> +    /*
> +     * First try and allocate the largest thing we can as low as
> +     * possible to be bank 0.
> +     */
> +    while ( order >= min_low_order )
> +    {
> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )

This has to be: bits <= (...). Indeed, we want to try all possible zones.

Which is annoying on the versatile express, because the first bank will
be allocated in "high memory" from Linux's POV, which makes DOM0 fail
to boot under certain circumstances (see
http://lists.xen.org/archives/html/xen-users/2014-06/msg00142.html).

I'm looking into why Xen doesn't try to allocate the first bank in
lower memory.

regards,

-- 
Julien Grall


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-26 18:20   ` Julien Grall
@ 2014-06-27  9:24     ` Ian Campbell
  2014-06-27 10:31       ` Julien Grall
  2014-06-27 12:52       ` Julien Grall
  0 siblings, 2 replies; 29+ messages in thread
From: Ian Campbell @ 2014-06-27  9:24 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Thu, 2014-06-26 at 19:20 +0100, Julien Grall wrote:
> On 06/26/2014 11:17 AM, Ian Campbell wrote:
> > Depending on where Xen and the initial modules were loaded into RAM we may not
> > be able to allocate a suitably contiguous block of memory for dom0. Especially
> > since the allocations made by alloc_domheap_pages are naturally aligned (e.g. a
> > 1GB allocation must be at a 1GB boundary).
> > 
> > The alignment requirement in particular is a huge limitation, meaning that even
> > dom0_mem0=1G on a 2GB system is pretty likely to fail unless you are very
> > careful with your load addresses. People were also having trouble with dom0 >
> > 128MB on the 1GB cubieboard2 when using fairly standard load addresses for
> > things.
> > 
> > This patch tries to allocate multiple banks of memory in order to try and
> > satisfy the entire requested domain 0 allocation. Sadly this turns out to be
> > pretty tricky to arrange (see the large comment in the code).
> > 
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Karim Raslan <karim.allah.ahmed@gmail.com>
> > ---
> > v3:
> >   Handle running out of banks more gracefully
> >   Allow order >= min_low_order, not strictly greater. Otherwise
> >   dom0_mem=128M doesn't work.
> 
> dom0_mem still not working on the versatile express (see my explanation
> a below).

> 
> > +static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
> > +{
> > +    const unsigned int min_low_order =
> > +        get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
> > +    const unsigned int min_order = get_order_from_bytes(MB(4));
> > +    struct page_info *pg;
> > +    unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
> > +    int i;
> > +
> > +    bool_t lowmem = is_32bit_domain(d);
> > +    unsigned int bits;
> > +
> > +    printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
> > +           /* Don't want format this as PRIpaddr (16 digit hex) */
> > +           (unsigned long)(kinfo->unassigned_mem >> 20));
> > +
> > +    kinfo->mem.nr_banks = 0;
> > +
> > +    /*
> > +     * First try and allocate the largest thing we can as low as
> > +     * possible to be bank 0.
> > +     */
> > +    while ( order >= min_low_order )
> > +    {
> > +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
> 
> This has to be: bits <= (...). Indeed, we want to try all possible zone.

Damn. I tested this on midway with "dom0_mem=128M mem=1G" and saw a
failure (which is fixed in this iteration). But obviously midway and
vexpress differ in their physical RAM addresses which caused me to miss
this second failure.

Have you tried that change? bits < 32 gives you 32-bits (i.e. 0..31)
which should cover everything up to 4G. Or else I'm misremembering some
quirk of the allocation interface.

Does vexpress really have memory high enough up that excluding bits==32
excludes it? I thought it was down in the 1-3GB range.

> Which is annoying on the versatile express because the first bank will
> be allocated in "high memory" from Linux POV which make DOM0 doesn't
> boot under certain circumstance (see the
> http://lists.xen.org/archives/html/xen-users/2014-06/msg00142.html).
> 
> I'm looking why Xen doesn't try to allocate in lower memory the first bank.

Usually it means that at least a single page is allocated in each
128MB block lower down, which can happen depending on how your boot
modules are laid out.

I did consider making 
min_low_order = get_order_from_bytes(min_t(paddr_t, dom0_mem/4, MB(128)));

i.e. for dom0_mem <= 128MB, allowing it to be split into 4x32MB banks
if necessary. Or perhaps /2 would be better, since 32MB is small enough
that we might have trouble placing the kernel and modules etc. in a
single bank.
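For concreteness, the order arithmetic here can be sketched as follows (an illustrative reimplementation of the get_order_from_bytes() semantics assuming 4K pages, not Xen's actual code):

```c
#include <assert.h>

#define MB(x)      ((unsigned long long)(x) << 20)
#define PAGE_SHIFT 12

/* Smallest order such that (1 << order) pages cover size bytes
 * (illustrative stand-in for Xen's get_order_from_bytes()). */
static unsigned int get_order_from_bytes(unsigned long long size)
{
    unsigned long long pages = (size + (1ULL << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
    unsigned int order = 0;

    while ( (1ULL << order) < pages )
        order++;

    return order;
}
```

Under these assumptions dom0_mem=128M gives min_low_order = 15 (32768 pages), while the /4 variant mooted above would allow order-13 (32MB) banks.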

Ian.


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-27  9:24     ` Ian Campbell
@ 2014-06-27 10:31       ` Julien Grall
  2014-06-27 11:54         ` Julien Grall
  2014-06-27 12:52       ` Julien Grall
  1 sibling, 1 reply; 29+ messages in thread
From: Julien Grall @ 2014-06-27 10:31 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

Hi Ian,

On 27/06/14 10:24, Ian Campbell wrote:
>> This has to be: bits <= (...). Indeed, we want to try all possible zone.
>
> Damn. I tested this on midway with "dom0_mem=128M mem=1G" and saw a
> failure (which is fixed in this iteration). But obviously midway and
> vexpress differ in their physical RAM addresses which caused me to miss
> this second failure.
>
> Have you tried that change? bits < 32 gives you 32-bits (i.e. 0..31)
> which should cover everything up to 4G. Or else I'm misremembering some
> quirk of the allocation interface.

Yes, without the "<=" change I was unable to allocate the first bank.

The memory layout is
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000080000000 - 000000009fffffff
(XEN) RAM: 00000000a0000000 - 00000000bfffffff
(XEN)
(XEN) MODULE[1]: 000000009fee6000 - 000000009feea000
(XEN) MODULE[2]: 00000000a0008000 - 00000000a033f458
(XEN)  RESVD[0]: 0000000081f00000 - 0000000081f04000
(XEN)  RESVD[1]: 000000009fee6000 - 000000009feea000

> Does vexpress really have memory high enough up that excluding bits==32
> excludes it? I thought it was down in the 1-3GB range.

The first bank allocated is situated just above 2GB:
BANK[0] 0x000000a8000000-0x000000b0000000

But I'm unable to find enough memory unless I pass bits=32 to the
allocator.

>> I'm looking why Xen doesn't try to allocate in lower memory the first bank.
>
> Usually it means that at least single page is allocated in each 128MB
> block lower down, which can happen depending on how your boot modules
> are laid out.

With my current layout (see above), I should have space in the first 
bank (around 0x80000000). I don't really understand why Xen is not 
trying to use it.

> I did consider making
> min_low_order = get_order_from_bytes(min_t(paddr_t, dom0_mem/4, MB(128)));
>
> i.e. for a dom0_mem <= 128MB allowing to be split into 4x32MB banks if
> necessary. Or perhaps /2 would be better since 32MB is small enough that
> we might have trouble with placing the kernel and modules etc in a
> single bank.

Requiring a 128MB hole in memory doesn't seem too mad. I would keep
this minimum value.

Regards,

-- 
Julien Grall


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-27 10:31       ` Julien Grall
@ 2014-06-27 11:54         ` Julien Grall
  0 siblings, 0 replies; 29+ messages in thread
From: Julien Grall @ 2014-06-27 11:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel



On 27/06/14 11:31, Julien Grall wrote:
> Hi Ian,
>
> On 27/06/14 10:24, Ian Campbell wrote:
>>> This has to be: bits <= (...). Indeed, we want to try all possible zone.
>>
>> Damn. I tested this on midway with "dom0_mem=128M mem=1G" and saw a
>> failure (which is fixed in this iteration). But obviously midway and
>> vexpress differ in their physical RAM addresses which caused me to miss
>> this second failure.
>>
>> Have you tried that change? bits < 32 gives you 32-bits (i.e. 0..31)
>> which should cover everything up to 4G. Or else I'm misremembering some
>> quirk of the allocation interface.
>
> Yes, without the change "<=" I was unable to allocate the first bank.
>
> The memory layout is
> (XEN) Checking for initrd in /chosen
> (XEN) RAM: 0000000080000000 - 000000009fffffff
> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
> (XEN)
> (XEN) MODULE[1]: 000000009fee6000 - 000000009feea000
> (XEN) MODULE[2]: 00000000a0008000 - 00000000a033f458
> (XEN)  RESVD[0]: 0000000081f00000 - 0000000081f04000
> (XEN)  RESVD[1]: 000000009fee6000 - 000000009feea000
>
>> Does vexpress really have memory high enough up that excluding bits==32
>> excludes it? I thought it was down in the 1-3GB range.
>
> The first bank allocating is situated just above 2GB
> BANK[0] 0x000000a8000000-0x000000b0000000

I've dumped my heap just after end_bootallocator, and it doesn't look
right:

(XEN) heap[node=0][zone=0] -> 32763 pages
(XEN) heap[node=0][zone=1] -> 0 pages
(XEN) heap[node=0][zone=2] -> 0 pages
(XEN) heap[node=0][zone=3] -> 0 pages
(XEN) heap[node=0][zone=4] -> 0 pages
(XEN) heap[node=0][zone=5] -> 0 pages
(XEN) heap[node=0][zone=6] -> 0 pages
(XEN) heap[node=0][zone=7] -> 0 pages
(XEN) heap[node=0][zone=8] -> 0 pages
(XEN) heap[node=0][zone=9] -> 0 pages
(XEN) heap[node=0][zone=10] -> 0 pages
(XEN) heap[node=0][zone=11] -> 0 pages
(XEN) heap[node=0][zone=12] -> 0 pages
(XEN) heap[node=0][zone=13] -> 0 pages
(XEN) heap[node=0][zone=14] -> 0 pages
(XEN) heap[node=0][zone=15] -> 0 pages
(XEN) heap[node=0][zone=16] -> 0 pages
(XEN) heap[node=0][zone=17] -> 0 pages
(XEN) heap[node=0][zone=18] -> 0 pages
(XEN) heap[node=0][zone=19] -> 0 pages
(XEN) heap[node=0][zone=20] -> 219840 pages
(XEN) heap[node=0][zone=21] -> 0 pages
(XEN) heap[node=0][zone=22] -> 0 pages
(XEN) heap[node=0][zone=23] -> 0 pages
(XEN) heap[node=0][zone=24] -> 0 pages
(XEN) heap[node=0][zone=25] -> 0 pages
(XEN) heap[node=0][zone=26] -> 0 pages
(XEN) heap[node=0][zone=27] -> 0 pages
(XEN) heap[node=0][zone=28] -> 0 pages

For comparison midway:
(XEN) RAM: 0000000000000000 - 00000000ff7fffff
(XEN) RAM: 0000000200000000 - 00000002ffffffff
(XEN)
(XEN) MODULE[1]: 0000000000001000 - 0000000000006000
(XEN) MODULE[2]: 0000000001000000 - 00000000012c31a8
(XEN)  RESVD[0]: 0000000000000000 - 0000000000001000
(XEN)  RESVD[1]: 0000000000001000 - 0000000000003000

(XEN) heap[node=0][zone=0] -> 262138 pages
(XEN) heap[node=0][zone=1] -> 0 pages
(XEN) heap[node=0][zone=2] -> 0 pages
(XEN) heap[node=0][zone=3] -> 2 pages
(XEN) heap[node=0][zone=4] -> 8 pages
(XEN) heap[node=0][zone=5] -> 16 pages
(XEN) heap[node=0][zone=6] -> 32 pages
(XEN) heap[node=0][zone=7] -> 64 pages
(XEN) heap[node=0][zone=8] -> 128 pages
(XEN) heap[node=0][zone=9] -> 256 pages
(XEN) heap[node=0][zone=10] -> 512 pages
(XEN) heap[node=0][zone=11] -> 1024 pages
(XEN) heap[node=0][zone=12] -> 2048 pages
(XEN) heap[node=0][zone=13] -> 3388 pages
(XEN) heap[node=0][zone=14] -> 8192 pages
(XEN) heap[node=0][zone=15] -> 16384 pages
(XEN) heap[node=0][zone=16] -> 32768 pages
(XEN) heap[node=0][zone=17] -> 65536 pages
(XEN) heap[node=0][zone=18] -> 131072 pages
(XEN) heap[node=0][zone=19] -> 262144 pages
(XEN) heap[node=0][zone=20] -> 259584 pages
(XEN) heap[node=0][zone=21] -> 0 pages
(XEN) heap[node=0][zone=22] -> 1024000 pages
(XEN) heap[node=0][zone=23] -> 0 pages
(XEN) heap[node=0][zone=24] -> 0 pages
(XEN) heap[node=0][zone=25] -> 0 pages
(XEN) heap[node=0][zone=26] -> 0 pages
(XEN) heap[node=0][zone=27] -> 0 pages
(XEN) heap[node=0][zone=28] -> 0 pages

Arndale:
(XEN) RAM: 0000000040000000 - 000000004fffffff
(XEN) RAM: 0000000050000000 - 000000005fffffff
(XEN) RAM: 0000000060000000 - 000000006fffffff
(XEN) RAM: 0000000070000000 - 000000007fffffff
(XEN) RAM: 0000000080000000 - 000000008fffffff
(XEN) RAM: 0000000090000000 - 000000009fffffff
(XEN) RAM: 00000000a0000000 - 00000000afffffff
(XEN) RAM: 00000000b0000000 - 00000000bfffffff
(XEN)
(XEN) MODULE[1]: 000000004fff3000 - 0000000050000000
(XEN) MODULE[2]: 0000000040f00000 - 0000000041900000
(XEN)  RESVD[0]: 0000000042000000 - 000000004200a000

(XEN) heap[node=0][zone=0] -> 65522 pages
(XEN) heap[node=0][zone=1] -> 0 pages
(XEN) heap[node=0][zone=2] -> 0 pages
(XEN) heap[node=0][zone=3] -> 0 pages
(XEN) heap[node=0][zone=4] -> 0 pages
(XEN) heap[node=0][zone=5] -> 0 pages
(XEN) heap[node=0][zone=6] -> 0 pages
(XEN) heap[node=0][zone=7] -> 0 pages
(XEN) heap[node=0][zone=8] -> 0 pages
(XEN) heap[node=0][zone=9] -> 0 pages
(XEN) heap[node=0][zone=10] -> 0 pages
(XEN) heap[node=0][zone=11] -> 0 pages
(XEN) heap[node=0][zone=12] -> 0 pages
(XEN) heap[node=0][zone=13] -> 0 pages
(XEN) heap[node=0][zone=14] -> 0 pages
(XEN) heap[node=0][zone=15] -> 0 pages
(XEN) heap[node=0][zone=16] -> 0 pages
(XEN) heap[node=0][zone=17] -> 0 pages
(XEN) heap[node=0][zone=18] -> 0 pages
(XEN) heap[node=0][zone=19] -> 259561 pages
(XEN) heap[node=0][zone=20] -> 187904 pages
(XEN) heap[node=0][zone=21] -> 0 pages
(XEN) heap[node=0][zone=22] -> 0 pages
(XEN) heap[node=0][zone=23] -> 0 pages
(XEN) heap[node=0][zone=24] -> 0 pages
(XEN) heap[node=0][zone=25] -> 0 pages
(XEN) heap[node=0][zone=26] -> 0 pages
(XEN) heap[node=0][zone=27] -> 0 pages
(XEN) heap[node=0][zone=28] -> 0 pages

-- 
Julien Grall


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-27  9:24     ` Ian Campbell
  2014-06-27 10:31       ` Julien Grall
@ 2014-06-27 12:52       ` Julien Grall
  2014-06-27 13:01         ` Ian Campbell
  1 sibling, 1 reply; 29+ messages in thread
From: Julien Grall @ 2014-06-27 12:52 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel



On 27/06/14 10:24, Ian Campbell wrote:
> On Thu, 2014-06-26 at 19:20 +0100, Julien Grall wrote:
>> On 06/26/2014 11:17 AM, Ian Campbell wrote:
>>> Depending on where Xen and the initial modules were loaded into RAM we may not
>>> be able to allocate a suitably contiguous block of memory for dom0. Especially
>>> since the allocations made by alloc_domheap_pages are naturally aligned (e.g. a
>>> 1GB allocation must be at a 1GB boundary).
>>>
>>> The alignment requirement in particular is a huge limitation, meaning that even
>>> dom0_mem0=1G on a 2GB system is pretty likely to fail unless you are very
>>> careful with your load addresses. People were also having trouble with dom0 >
>>> 128MB on the 1GB cubieboard2 when using fairly standard load addresses for
>>> things.
>>>
>>> This patch tries to allocate multiple banks of memory in order to try and
>>> satisfy the entire requested domain 0 allocation. Sadly this turns out to be
>>> pretty tricky to arrange (see the large comment in the code).
>>>
>>> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>>> Cc: Karim Raslan <karim.allah.ahmed@gmail.com>
>>> ---
>>> v3:
>>>    Handle running out of banks more gracefully
>>>    Allow order >= min_low_order, not strictly greater. Otherwise
>>>    dom0_mem=128M doesn't work.
>>
>> dom0_mem still not working on the versatile express (see my explanation
>> a below).
>
>>
>>> +static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
>>> +{
>>> +    const unsigned int min_low_order =
>>> +        get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
>>> +    const unsigned int min_order = get_order_from_bytes(MB(4));
>>> +    struct page_info *pg;
>>> +    unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
>>> +    int i;
>>> +
>>> +    bool_t lowmem = is_32bit_domain(d);
>>> +    unsigned int bits;
>>> +
>>> +    printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
>>> +           /* Don't want format this as PRIpaddr (16 digit hex) */
>>> +           (unsigned long)(kinfo->unassigned_mem >> 20));
>>> +
>>> +    kinfo->mem.nr_banks = 0;
>>> +
>>> +    /*
>>> +     * First try and allocate the largest thing we can as low as
>>> +     * possible to be bank 0.
>>> +     */
>>> +    while ( order >= min_low_order )
>>> +    {
>>> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
>>
>> This has to be: bits <= (...). Indeed, we want to try all possible zone.

After digging into the code, I confirm that we need the "<=".

The zone is retrieved with the formula fls(page_to_mfn(pg)); in the
case of the versatile express the bank's start mfn is 0x80000. So every
page ends up in zone 20, and zone 20 corresponds to bits=32.
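The zone/bits relationship can be sketched like this (illustrative stand-ins for fls() and the zone computation, not the actual allocator code):

```c
#include <assert.h>

#define PAGE_SHIFT 12

/* 1-based index of the highest set bit (stand-in for Xen's fls()). */
static int fls(unsigned long x)
{
    int bit = 0;

    while ( x )
    {
        bit++;
        x >>= 1;
    }

    return bit;
}

/* The buddy allocator files a page under zone fls(mfn); an address
 * width limit of `bits` therefore corresponds to zone
 * (bits - PAGE_SHIFT). */
static int zone_of_mfn(unsigned long mfn)
{
    return fls(mfn);
}
```

On vexpress, RAM starts at 0x80000000, i.e. mfn 0x80000, so every page lands in zone 20, which the loop only reaches when bits = 32 (32 - PAGE_SHIFT = 20).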

Furthermore, it looks like we always allocate memory from the top of
this zone. Is that normal?

Regards,

-- 
Julien Grall


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-27 12:52       ` Julien Grall
@ 2014-06-27 13:01         ` Ian Campbell
  2014-06-27 13:04           ` Julien Grall
  0 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-06-27 13:01 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Fri, 2014-06-27 at 13:52 +0100, Julien Grall wrote:
> 
> On 27/06/14 10:24, Ian Campbell wrote:
> > On Thu, 2014-06-26 at 19:20 +0100, Julien Grall wrote:
> >> On 06/26/2014 11:17 AM, Ian Campbell wrote:
> >>> Depending on where Xen and the initial modules were loaded into RAM we may not
> >>> be able to allocate a suitably contiguous block of memory for dom0. Especially
> >>> since the allocations made by alloc_domheap_pages are naturally aligned (e.g. a
> >>> 1GB allocation must be at a 1GB boundary).
> >>>
> >>> The alignment requirement in particular is a huge limitation, meaning that even
> >>> dom0_mem0=1G on a 2GB system is pretty likely to fail unless you are very
> >>> careful with your load addresses. People were also having trouble with dom0 >
> >>> 128MB on the 1GB cubieboard2 when using fairly standard load addresses for
> >>> things.
> >>>
> >>> This patch tries to allocate multiple banks of memory in order to try and
> >>> satisfy the entire requested domain 0 allocation. Sadly this turns out to be
> >>> pretty tricky to arrange (see the large comment in the code).
> >>>
> >>> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> >>> Cc: Karim Raslan <karim.allah.ahmed@gmail.com>
> >>> ---
> >>> v3:
> >>>    Handle running out of banks more gracefully
> >>>    Allow order >= min_low_order, not strictly greater. Otherwise
> >>>    dom0_mem=128M doesn't work.
> >>
> >> dom0_mem still not working on the versatile express (see my explanation
> >> a below).
> >
> >>
> >>> +static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
> >>> +{
> >>> +    const unsigned int min_low_order =
> >>> +        get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
> >>> +    const unsigned int min_order = get_order_from_bytes(MB(4));
> >>> +    struct page_info *pg;
> >>> +    unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
> >>> +    int i;
> >>> +
> >>> +    bool_t lowmem = is_32bit_domain(d);
> >>> +    unsigned int bits;
> >>> +
> >>> +    printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
> >>> +           /* Don't want format this as PRIpaddr (16 digit hex) */
> >>> +           (unsigned long)(kinfo->unassigned_mem >> 20));
> >>> +
> >>> +    kinfo->mem.nr_banks = 0;
> >>> +
> >>> +    /*
> >>> +     * First try and allocate the largest thing we can as low as
> >>> +     * possible to be bank 0.
> >>> +     */
> >>> +    while ( order >= min_low_order )
> >>> +    {
> >>> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
> >>
> >> This has to be: bits <= (...). Indeed, we want to try all possible zone.
> 
> After digging into the code, I confirm that we need the "<=".
> 
> the zone is retrieved with this formula: fls(page_to_mfn(pg)), in the 
> case of the versatile express the bank start mfn is 0x8000.
> So every page will reach the zone 20. At the same time zone 20 is equals 
> to bits=32.


OK, thanks for digging.

> Futhermore, it looks like we always allocate memory from the top of this 
> zone. Is it normal?

Normal behaviour is to allocate from as high as possible so I'm not
surprised that applies with a zone as well.

Ian.


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-27 13:01         ` Ian Campbell
@ 2014-06-27 13:04           ` Julien Grall
  2014-06-27 13:09             ` Ian Campbell
  0 siblings, 1 reply; 29+ messages in thread
From: Julien Grall @ 2014-06-27 13:04 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel



On 27/06/14 14:01, Ian Campbell wrote:
>> Futhermore, it looks like we always allocate memory from the top of this
>> zone. Is it normal?
>
> Normal behaviour is to allocate from as high as possible so I'm not
> surprised that applies with a zone as well.

For DOM0 bank allocation, shouldn't we try to allocate as low as
possible? If not, we may not be able to create a second bank for dom0.

-- 
Julien Grall


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-27 13:04           ` Julien Grall
@ 2014-06-27 13:09             ` Ian Campbell
  2014-06-30 15:58               ` Ian Campbell
  0 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-06-27 13:09 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Fri, 2014-06-27 at 14:04 +0100, Julien Grall wrote:
> 
> On 27/06/14 14:01, Ian Campbell wrote:
> >> Futhermore, it looks like we always allocate memory from the top of this
> >> zone. Is it normal?
> >
> > Normal behaviour is to allocate from as high as possible so I'm not
> > surprised that applies with a zone as well.
> 
> For DOM0 bank allocation, shouldn't we try to allocate as lower as 
> possible? If not, we may not be able to create a second bank for dom0.

The loop over bits is supposed to help there, but it's not as effective
with higher addresses I suppose.

We'd need a new flag for the allocator, which is doable I guess. I'll
give it a try.

Ian.


* Re: [PATCH v3 0/8] xen: arm: Use super pages in p2m
  2014-06-26 15:03 ` [PATCH v3 0/8] xen: arm: Use super pages in p2m Julien Grall
@ 2014-06-30 15:44   ` Ian Campbell
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2014-06-30 15:44 UTC (permalink / raw)
  To: Julien Grall; +Cc: Tim Deegan, Stefano Stabellini, Ian Jackson, xen-devel

On Thu, 2014-06-26 at 16:03 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 06/26/2014 11:16 AM, Ian Campbell wrote:
> > The following series adds support for superpage mappings (1G and 2M) in
> > the p2m for dom0 and domU.
> > 
> > At the moment it is not possible to allocate 1G regions to guests,
> > because this is gated on multipage_allocation_permitted() which
> > currently (incorrectly) returns false for dom0 on ARM.
> 
> OOI, why multipage_allocation_permitted is gated with iomem_caps?

I don't really know.

I think it relates to the old PV x86 world, where, if you had a piece
of hardware passed through, it might also be considered reasonable to
allow you to allocate larger contiguous buffers for the purposes of
DMA (or swiotlb) etc.

That probably doesn't make as much sense on ARM (or HVM/PVH x86 for
that matter).

> It looks wrong to me, as mapping an MMIO region in the guest will allow
> him to use 1G mapping.
> 
> > This is fixed by
> > Arianna's "arch/arm: domain build: let dom0 access I/O memory of mapped
> > devices" (which initialises iomem_caps for dom0).
> 
> IIRC, this has been pushed a month ago on upstream.

So it has. Good.

Ian.


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-27 13:09             ` Ian Campbell
@ 2014-06-30 15:58               ` Ian Campbell
  2014-06-30 16:10                 ` Julien Grall
  0 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-06-30 15:58 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Fri, 2014-06-27 at 14:09 +0100, Ian Campbell wrote:
> On Fri, 2014-06-27 at 14:04 +0100, Julien Grall wrote:
> > 
> > On 27/06/14 14:01, Ian Campbell wrote:
> > >> Futhermore, it looks like we always allocate memory from the top of this
> > >> zone. Is it normal?
> > >
> > > Normal behaviour is to allocate from as high as possible so I'm not
> > > surprised that applies with a zone as well.
> > 
> > For DOM0 bank allocation, shouldn't we try to allocate as lower as 
> > possible? If not, we may not be able to create a second bank for dom0.
> 
> The loop over bits is supposed to help there, but it's not as effective
> with higher addresses I suppose.
> 
> We'd need a new flag fore the allocate. Which is doable I guess. I'll
> give it a try.

It's a bit tricky and involves messing with the guts of the page
allocator, which I'd rather avoid...

We do allow allocations to be merged onto the start of bank 0, which in
the normal case is what I would expect to happen, although it isn't
guaranteed. In that case we would try higher addresses and might find
enough RAM there.

I suppose I could relax the constraint a bit and allow new banks in
front of bank zero if they were >= 128MB (say).

Ian.


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-30 15:58               ` Ian Campbell
@ 2014-06-30 16:10                 ` Julien Grall
  2014-06-30 16:14                   ` Ian Campbell
  0 siblings, 1 reply; 29+ messages in thread
From: Julien Grall @ 2014-06-30 16:10 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, tim, stefano.stabellini

Hi Ian,

On 06/30/2014 04:58 PM, Ian Campbell wrote:
> On Fri, 2014-06-27 at 14:09 +0100, Ian Campbell wrote:
>> On Fri, 2014-06-27 at 14:04 +0100, Julien Grall wrote:
>>>
>>> On 27/06/14 14:01, Ian Campbell wrote:
>>>>> Futhermore, it looks like we always allocate memory from the top of this
>>>>> zone. Is it normal?
>>>>
>>>> Normal behaviour is to allocate from as high as possible so I'm not
>>>> surprised that applies with a zone as well.
>>>
>>> For DOM0 bank allocation, shouldn't we try to allocate as lower as 
>>> possible? If not, we may not be able to create a second bank for dom0.
>>
>> The loop over bits is supposed to help there, but it's not as effective
>> with higher addresses I suppose.
>>
>> We'd need a new flag fore the allocate. Which is doable I guess. I'll
>> give it a try.
> 
> It's a bit tricky and involves messing with the guts of the page
> allocator, which I'd rather avoid...
> 
> We do allow allocations to be merged onto the start of bank 0, which in
> the normal case is what I would expect to happen, although it isn't
> guaranteed. In that case we would try higher addresses and might find
> enough RAM there.
> 
> I suppose I could relax the constraint a bit and allow new banks in
> front of bank zero if they were >= 128MB (say).

So, replacing bank zero with the new bank, right? IIRC, some Linux
versions require the banks to be ordered.

Regards,

-- 
Julien Grall


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-30 16:10                 ` Julien Grall
@ 2014-06-30 16:14                   ` Ian Campbell
  2014-06-30 16:20                     ` Julien Grall
  0 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-06-30 16:14 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Mon, 2014-06-30 at 17:10 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 06/30/2014 04:58 PM, Ian Campbell wrote:
> > On Fri, 2014-06-27 at 14:09 +0100, Ian Campbell wrote:
> >> On Fri, 2014-06-27 at 14:04 +0100, Julien Grall wrote:
> >>>
> >>> On 27/06/14 14:01, Ian Campbell wrote:
> >>>>> Futhermore, it looks like we always allocate memory from the top of this
> >>>>> zone. Is it normal?
> >>>>
> >>>> Normal behaviour is to allocate from as high as possible so I'm not
> >>>> surprised that applies with a zone as well.
> >>>
> >>> For DOM0 bank allocation, shouldn't we try to allocate as lower as 
> >>> possible? If not, we may not be able to create a second bank for dom0.
> >>
> >> The loop over bits is supposed to help there, but it's not as effective
> >> with higher addresses I suppose.
> >>
> >> We'd need a new flag fore the allocate. Which is doable I guess. I'll
> >> give it a try.
> > 
> > It's a bit tricky and involves messing with the guts of the page
> > allocator, which I'd rather avoid...
> > 
> > We do allow allocations to be merged onto the start of bank 0, which in
> > the normal case is what I would expect to happen, although it isn't
> > guaranteed. In that case we would try higher addresses and might find
> > enough RAM there.
> > 
> > I suppose I could relax the constraint a bit and allow new banks in
> > front of bank zero if they were >= 128MB (say).
> 
> So, replacing bank zero with the new bank, right? IIRC, some Linux
> versions require the banks to be ordered.

Inserting the new bank before, retaining the ordering.

Ian.
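The insertion described above — a new bank placed in front of bank zero while keeping the list sorted by start address — can be sketched as follows. This is an illustrative standalone model, not the actual Xen code: `struct meminfo`, `struct membank`, `MAX_BANKS` and `insert_bank()` are simplified stand-ins for the real definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_BANKS 8  /* illustrative limit, not Xen's actual value */

struct membank {
    uint64_t start;
    uint64_t size;
};

struct meminfo {
    unsigned int nr_banks;
    struct membank bank[MAX_BANKS];
};

/* Insert a new bank so the array stays sorted by start address.
 * Returns 0 on success, -1 if the array is already full. */
static int insert_bank(struct meminfo *mem, uint64_t start, uint64_t size)
{
    unsigned int i;

    if ( mem->nr_banks >= MAX_BANKS )
        return -1;

    /* Find the first existing bank that starts after the new one. */
    for ( i = 0; i < mem->nr_banks; i++ )
        if ( mem->bank[i].start > start )
            break;

    /* Shift later banks up one slot to make room; a no-op when the
     * new bank goes at the end. */
    memmove(&mem->bank[i + 1], &mem->bank[i],
            (mem->nr_banks - i) * sizeof(mem->bank[0]));

    mem->bank[i].start = start;
    mem->bank[i].size  = size;
    mem->nr_banks++;

    return 0;
}
```

Inserting at index 0 rather than replacing bank zero keeps the bank list monotonically increasing, which matters for Linux versions that expect ordered banks.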


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-30 16:14                   ` Ian Campbell
@ 2014-06-30 16:20                     ` Julien Grall
  2014-06-30 16:27                       ` Ian Campbell
  0 siblings, 1 reply; 29+ messages in thread
From: Julien Grall @ 2014-06-30 16:20 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, tim, stefano.stabellini

On 06/30/2014 05:14 PM, Ian Campbell wrote:
> On Mon, 2014-06-30 at 17:10 +0100, Julien Grall wrote:
>> Hi Ian,
>>
>> On 06/30/2014 04:58 PM, Ian Campbell wrote:
>>> On Fri, 2014-06-27 at 14:09 +0100, Ian Campbell wrote:
>>>> On Fri, 2014-06-27 at 14:04 +0100, Julien Grall wrote:
>>>>>
>>>>> On 27/06/14 14:01, Ian Campbell wrote:
>>>>>>> Furthermore, it looks like we always allocate memory from the top of this
>>>>>>> zone. Is that normal?
>>>>>>
>>>>>> Normal behaviour is to allocate from as high as possible so I'm not
>>>>>> surprised that applies with a zone as well.
>>>>>
>>>>> For DOM0 bank allocation, shouldn't we try to allocate as low as
>>>>> possible? If not, we may not be able to create a second bank for dom0.
>>>>
>>>> The loop over bits is supposed to help there, but it's not as effective
>>>> with higher addresses I suppose.
>>>>
>>>> We'd need a new flag for the allocation. Which is doable, I guess. I'll
>>>> give it a try.
>>>
>>> It's a bit tricky and involves messing with the guts of the page
>>> allocator, which I'd rather avoid...
>>>
>>> We do allow allocations to be merged onto the start of bank 0, which in
>>> the normal case is what I would expect to happen, although it isn't
>>> guaranteed. In that case we would try higher addresses and might find
>>> enough RAM there.
>>>
>>> I suppose I could relax the constraint a bit and allow new banks in
>>> front of bank zero if they were >= 128MB (say).
>>
>> So, replacing bank zero with the new bank, right? IIRC, some Linux
>> versions require the banks to be ordered.
> 
> Inserting the new bank before, retaining the ordering.

Thanks, this plan sounds good!

-- 
Julien Grall


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-30 16:20                     ` Julien Grall
@ 2014-06-30 16:27                       ` Ian Campbell
  2014-06-30 16:29                         ` Julien Grall
  0 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-06-30 16:27 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Mon, 2014-06-30 at 17:20 +0100, Julien Grall wrote:
> On 06/30/2014 05:14 PM, Ian Campbell wrote:
> > On Mon, 2014-06-30 at 17:10 +0100, Julien Grall wrote:
> >> Hi Ian,
> >>
> >> On 06/30/2014 04:58 PM, Ian Campbell wrote:
> >>> On Fri, 2014-06-27 at 14:09 +0100, Ian Campbell wrote:
> >>>> On Fri, 2014-06-27 at 14:04 +0100, Julien Grall wrote:
> >>>>>
> >>>>> On 27/06/14 14:01, Ian Campbell wrote:
> >>>>>>> Furthermore, it looks like we always allocate memory from the top of this
> >>>>>>> zone. Is that normal?
> >>>>>>
> >>>>>> Normal behaviour is to allocate from as high as possible so I'm not
> >>>>>> surprised that applies with a zone as well.
> >>>>>
> >>>>> For DOM0 bank allocation, shouldn't we try to allocate as low as
> >>>>> possible? If not, we may not be able to create a second bank for dom0.
> >>>>
> >>>> The loop over bits is supposed to help there, but it's not as effective
> >>>> with higher addresses I suppose.
> >>>>
> >>>> We'd need a new flag for the allocation. Which is doable, I guess. I'll
> >>>> give it a try.
> >>>
> >>> It's a bit tricky and involves messing with the guts of the page
> >>> allocator, which I'd rather avoid...
> >>>
> >>> We do allow allocations to be merged onto the start of bank 0, which in
> >>> the normal case is what I would expect to happen, although it isn't
> >>> guaranteed. In that case we would try higher addresses and might find
> >>> enough RAM there.
> >>>
> >>> I suppose I could relax the constraint a bit and allow new banks in
> >>> front of bank zero if they were >= 128MB (say).
> >>
> >> So, replacing bank zero with the new bank, right? IIRC, some Linux
> >> versions require the banks to be ordered.
> > 
> > Inserting the new bank before, retaining the ordering.
> 
> Thanks, this plan sounds good!

I've implemented it in the obvious way, but I can't for the life of me
get the scenario to actually occur despite playing with module load
location, fdt rsvmem, mem= and dom0_mem=.

I suppose that gives us some confidence that this is not going to happen
very often...

Ian.


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-30 16:27                       ` Ian Campbell
@ 2014-06-30 16:29                         ` Julien Grall
  2014-06-30 16:31                           ` Ian Campbell
  0 siblings, 1 reply; 29+ messages in thread
From: Julien Grall @ 2014-06-30 16:29 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 06/30/2014 05:27 PM, Ian Campbell wrote:
> On Mon, 2014-06-30 at 17:20 +0100, Julien Grall wrote:
>> On 06/30/2014 05:14 PM, Ian Campbell wrote:
>>> On Mon, 2014-06-30 at 17:10 +0100, Julien Grall wrote:
>>>> Hi Ian,
>>>>
>>>> On 06/30/2014 04:58 PM, Ian Campbell wrote:
>>>>> On Fri, 2014-06-27 at 14:09 +0100, Ian Campbell wrote:
>>>>>> On Fri, 2014-06-27 at 14:04 +0100, Julien Grall wrote:
>>>>>>>
>>>>>>> On 27/06/14 14:01, Ian Campbell wrote:
>>>>>>>>> Furthermore, it looks like we always allocate memory from the top of this
>>>>>>>>> zone. Is that normal?
>>>>>>>>
>>>>>>>> Normal behaviour is to allocate from as high as possible so I'm not
>>>>>>>> surprised that applies with a zone as well.
>>>>>>>
>>>>>>> For DOM0 bank allocation, shouldn't we try to allocate as low as
>>>>>>> possible? If not, we may not be able to create a second bank for dom0.
>>>>>>
>>>>>> The loop over bits is supposed to help there, but it's not as effective
>>>>>> with higher addresses I suppose.
>>>>>>
>>>>>> We'd need a new flag for the allocation. Which is doable, I guess. I'll
>>>>>> give it a try.
>>>>>
>>>>> It's a bit tricky and involves messing with the guts of the page
>>>>> allocator, which I'd rather avoid...
>>>>>
>>>>> We do allow allocations to be merged onto the start of bank 0, which in
>>>>> the normal case is what I would expect to happen, although it isn't
>>>>> guaranteed. In that case we would try higher addresses and might find
>>>>> enough RAM there.
>>>>>
>>>>> I suppose I could relax the constraint a bit and allow new banks in
>>>>> front of bank zero if they were >= 128MB (say).
>>>>
>>>> So, replacing bank zero with the new bank, right? IIRC, some Linux
>>>> versions require the banks to be ordered.
>>>
>>> Inserting the new bank before, retaining the ordering.
>>
>> Thanks, this plan sounds good!
> 
> I've implemented it in the obvious way, but I can't for the life of me
> get the scenario to actually occur despite playing with module load
> location, fdt rsvmem, mem= and dom0_mem=.
> 
> I suppose that gives us some confidence that this is not going to happen
> very often...

If you send me the patch, I can give it a try on the Versatile Express to
see what happens.

-- 
Julien Grall


* Re: [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed
  2014-06-30 16:29                         ` Julien Grall
@ 2014-06-30 16:31                           ` Ian Campbell
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2014-06-30 16:31 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Mon, 2014-06-30 at 17:29 +0100, Julien Grall wrote:
> If you send me the patch, I can give a try on the versatile express to
> see what happen.

Thanks, I'm heading home now but I'll post v4 in the morning...


end of thread, other threads:[~2014-06-30 16:31 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-26 10:16 [PATCH v3 0/8] xen: arm: Use super pages in p2m Ian Campbell
2014-06-26 10:17 ` [PATCH v3 1/8] xen: arm: dump vcpu gic info in arch_dump_vcpu_info Ian Campbell
2014-06-26 10:17 ` [PATCH v3 2/8] tools/libxc: pull min/max_t into xc_private.h Ian Campbell
2014-06-26 10:34   ` Ian Jackson
2014-06-26 10:17 ` [PATCH v3 3/8] tools: arm: allocate large pages to guests Ian Campbell
2014-06-26 10:17 ` [PATCH v3 4/8] xen: arm: only put_page for p2m operations which require it Ian Campbell
2014-06-26 10:17 ` [PATCH v3 5/8] xen: arm: handle superpage mappings in p2m_lookup Ian Campbell
2014-06-26 10:17 ` [PATCH v3 6/8] xen: arm: add some helpers for assessing p2m pte Ian Campbell
2014-06-26 15:07   ` Julien Grall
2014-06-26 10:17 ` [PATCH v3 7/8] xen: arm: use superpages in p2m when pages are suitably aligned Ian Campbell
2014-06-26 15:16   ` Julien Grall
2014-06-26 10:17 ` [PATCH v3 8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed Ian Campbell
2014-06-26 18:20   ` Julien Grall
2014-06-27  9:24     ` Ian Campbell
2014-06-27 10:31       ` Julien Grall
2014-06-27 11:54         ` Julien Grall
2014-06-27 12:52       ` Julien Grall
2014-06-27 13:01         ` Ian Campbell
2014-06-27 13:04           ` Julien Grall
2014-06-27 13:09             ` Ian Campbell
2014-06-30 15:58               ` Ian Campbell
2014-06-30 16:10                 ` Julien Grall
2014-06-30 16:14                   ` Ian Campbell
2014-06-30 16:20                     ` Julien Grall
2014-06-30 16:27                       ` Ian Campbell
2014-06-30 16:29                         ` Julien Grall
2014-06-30 16:31                           ` Ian Campbell
2014-06-26 15:03 ` [PATCH v3 0/8] xen: arm: Use super pages in p2m Julien Grall
2014-06-30 15:44   ` Ian Campbell
