* [PATCH v4 0/9] libxc: support building large pv-domains
@ 2015-11-05 14:36 Juergen Gross
  2015-11-05 14:36 ` [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator Juergen Gross
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Juergen Gross @ 2015-11-05 14:36 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

The Xen hypervisor supports starting a dom0 with large memory (up to
the TB range) by not including the initrd and p2m list in the initial
kernel mapping. The p2m list in particular can grow larger than the
virtual address space available in the initial mapping.

The kernel being started indicates support of each of these features
via ELF notes.

This series enables the domain builder in libxc to do the same as the
hypervisor, making it possible to start huge pv-domUs via xl.

An unmapped initrd is supported for 64- and 32-bit domains, while
omitting the p2m from the initial kernel mapping is possible for
64-bit domains only.

Tested with:
- 32 bit domU (kernel not supporting unmapped initrd)
- 32 bit domU (kernel supporting unmapped initrd)
- 1 GB 64 bit domU (kernel supporting unmapped initrd, not p2m)
- 1 GB 64 bit domU (kernel supporting unmapped initrd and p2m)
- 900GB 64 bit domU (kernel supporting unmapped initrd and p2m)
- HVM domU

Changes in v4:
- updated patch 1 as suggested by Wei Liu (comment and variable name)
- modify comment in patch 6 as suggested by Wei Liu
- rework of patch 8 reducing line count by nearly 100
- added some additional plausibility checks to patch 8 as suggested by
  Wei Liu
- renamed round_pg() to round_pg_up() in patch 9 as suggested by Wei Liu

Changes in v3:
- Rebased the complete series to new staging (hvm builder patches by
  Roger Pau Monne)
- Removed old patch 1 as it broke stubdom build
- Introduced new Patch 1 to make allocation of guest memory more clear
  regarding virtual/physical memory allocation (requested by Ian Campbell)
- Change name of flag to indicate support of unmapped initrd in patch 2
  (requested by Ian Campbell)
- Introduce new patches 3, 4, 5 ("rename domain builder count_pgtables to
  alloc_pgtables", "introduce domain builder architecture specific data",
  "use domain builder architecture private data for x86 pv domains") to
  assist later page table work
- don't fiddle with initrd virtual address in patch 6 (was patch 3 in v2),
  add explicit initrd parameters for start_info in struct xc_dom_image
  instead (requested by Ian Campbell)
- Introduce new patch 8 ("rework of domain builder's page table handler")
  to be able to use common helpers for unmapped p2m list (requested by
  Ian Campbell)
- use now known segment size in pages for p2m list in patch 9 (was patch
  5 in v2) instead of fiddling with segment end address (requested by
  Ian Campbell)
- split alloc_p2m_list() in patch 9 (was patch 5 in v2) to 32/64 bit
  variants (requested by Ian Campbell)

Changes in v2:
- patch 2 has been removed as it has been applied already
- introduced new patch 2 as suggested by Ian Campbell: add a flag
  indicating support of an unmapped initrd to the parsed elf data of
  the elf_dom_parms structure
- updated patch description of patch 3 as requested by Ian Campbell


Juergen Gross (9):
  libxc: reorganize domain builder guest memory allocator
  xen: add generic flag to elf_dom_parms indicating support of unmapped
    initrd
  libxc: rename domain builder count_pgtables to alloc_pgtables
  libxc: introduce domain builder architecture specific data
  libxc: use domain builder architecture private data for x86 pv domains
  libxc: create unmapped initrd in domain builder if supported
  libxc: split p2m allocation in domain builder from other magic pages
  libxc: rework of domain builder's page table handler
  libxc: create p2m list outside of kernel mapping if supported

 stubdom/grub/kexec.c               |  12 +-
 tools/libxc/include/xc_dom.h       |  34 +--
 tools/libxc/xc_dom_arm.c           |   6 +-
 tools/libxc/xc_dom_core.c          | 180 ++++++++----
 tools/libxc/xc_dom_x86.c           | 563 +++++++++++++++++++++----------------
 tools/libxc/xg_private.h           |  39 +--
 xen/arch/x86/domain_build.c        |   4 +-
 xen/common/libelf/libelf-dominfo.c |   3 +
 xen/include/xen/libelf.h           |   1 +
 9 files changed, 490 insertions(+), 352 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator
  2015-11-05 14:36 [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
@ 2015-11-05 14:36 ` Juergen Gross
  2015-11-12 11:14   ` Wei Liu
  2015-11-05 14:36 ` [PATCH v4 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-11-05 14:36 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

Guest memory allocation in the domain builder of libxc is done via
virtual addresses only. In order to support preallocated areas which
are not virtually mapped, reorganize the memory allocator to keep
track of allocated pages globally and in allocated segments.

This requires an interface change of the allocate callback of the
domain builder, which currently takes the last mapped virtual address
as a parameter. This is no problem, as the only user of this callback
is stubdom/grub/kexec.c, which uses the virtual address only to
calculate the last used pfn.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 stubdom/grub/kexec.c         |   6 +--
 tools/libxc/include/xc_dom.h |  13 +++---
 tools/libxc/xc_dom_core.c    | 107 ++++++++++++++++++++++++++++---------------
 3 files changed, 79 insertions(+), 47 deletions(-)

diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c
index 0b2f4f3..2300318 100644
--- a/stubdom/grub/kexec.c
+++ b/stubdom/grub/kexec.c
@@ -100,9 +100,9 @@ static void do_exchange(struct xc_dom_image *dom, xen_pfn_t target_pfn, xen_pfn_
     dom->p2m_host[target_pfn] = source_mfn;
 }
 
-int kexec_allocate(struct xc_dom_image *dom, xen_vaddr_t up_to)
+int kexec_allocate(struct xc_dom_image *dom)
 {
-    unsigned long new_allocated = (up_to - dom->parms.virt_base) / PAGE_SIZE;
+    unsigned long new_allocated = dom->pfn_alloc_end - dom->rambase_pfn;
     unsigned long i;
 
     pages = realloc(pages, new_allocated * sizeof(*pages));
@@ -319,8 +319,6 @@ void kexec(void *kernel, long kernel_size, void *module, long module_size, char
 
     /* Make sure the bootstrap page table does not RW-map any of our current
      * page table frames */
-    kexec_allocate(dom, dom->virt_pgtab_end);
-
     if ( (rc = xc_dom_update_guest_p2m(dom))) {
         grub_printf("xc_dom_update_guest_p2m returned %d\n", rc);
         errnum = ERR_BOOT_FAILURE;
diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index ccc5926..68d6848 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -29,6 +29,7 @@ struct xc_dom_seg {
     xen_vaddr_t vstart;
     xen_vaddr_t vend;
     xen_pfn_t pfn;
+    xen_pfn_t pages;
 };
 
 struct xc_dom_mem {
@@ -90,6 +91,7 @@ struct xc_dom_image {
     xen_pfn_t xenstore_pfn;
     xen_pfn_t shared_info_pfn;
     xen_pfn_t bootstack_pfn;
+    xen_pfn_t pfn_alloc_end;
     xen_vaddr_t virt_alloc_end;
     xen_vaddr_t bsd_symtab_start;
 
@@ -175,8 +177,8 @@ struct xc_dom_image {
 
     /* kernel loader */
     struct xc_dom_arch *arch_hooks;
-    /* allocate up to virt_alloc_end */
-    int (*allocate) (struct xc_dom_image * dom, xen_vaddr_t up_to);
+    /* allocate up to pfn_alloc_end */
+    int (*allocate) (struct xc_dom_image * dom);
 
     /* Container type (HVM or PV). */
     enum {
@@ -360,14 +362,11 @@ static inline void *xc_dom_seg_to_ptr_pages(struct xc_dom_image *dom,
                                       struct xc_dom_seg *seg,
                                       xen_pfn_t *pages_out)
 {
-    xen_vaddr_t segsize = seg->vend - seg->vstart;
-    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
-    xen_pfn_t pages = (segsize + page_size - 1) / page_size;
     void *retval;
 
-    retval = xc_dom_pfn_to_ptr(dom, seg->pfn, pages);
+    retval = xc_dom_pfn_to_ptr(dom, seg->pfn, seg->pages);
 
-    *pages_out = retval ? pages : 0;
+    *pages_out = retval ? seg->pages : 0;
     return retval;
 }
 
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index fbe4464..a14d477 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -535,56 +535,75 @@ void *xc_dom_pfn_to_ptr_retcount(struct xc_dom_image *dom, xen_pfn_t pfn,
     return phys->ptr;
 }
 
-int xc_dom_alloc_segment(struct xc_dom_image *dom,
-                         struct xc_dom_seg *seg, char *name,
-                         xen_vaddr_t start, xen_vaddr_t size)
+static int xc_dom_chk_alloc_pages(struct xc_dom_image *dom, char *name,
+                                  xen_pfn_t pages)
 {
     unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
-    xen_pfn_t pages = (size + page_size - 1) / page_size;
-    xen_pfn_t pfn;
-    void *ptr;
 
-    if ( start == 0 )
-        start = dom->virt_alloc_end;
+    if ( pages > dom->total_pages || /* multiple test avoids overflow probs */
+         dom->pfn_alloc_end - dom->rambase_pfn > dom->total_pages ||
+         pages > dom->total_pages - dom->pfn_alloc_end + dom->rambase_pfn )
+    {
+        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
+                     "%s: segment %s too large (0x%"PRIpfn" > "
+                     "0x%"PRIpfn" - 0x%"PRIpfn" pages)", __FUNCTION__, name,
+                     pages, dom->total_pages,
+                     dom->pfn_alloc_end - dom->rambase_pfn);
+        return -1;
+    }
+
+    dom->pfn_alloc_end += pages;
+    dom->virt_alloc_end += pages * page_size;
+
+    return 0;
+}
 
-    if ( start & (page_size - 1) )
+static int xc_dom_alloc_pad(struct xc_dom_image *dom, xen_vaddr_t boundary)
+{
+    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
+    xen_pfn_t pages;
+
+    if ( boundary & (page_size - 1) )
     {
         xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
-                     "%s: segment start isn't page aligned (0x%" PRIx64 ")",
-                     __FUNCTION__, start);
+                     "%s: segment boundary isn't page aligned (0x%" PRIx64 ")",
+                     __FUNCTION__, boundary);
         return -1;
     }
-    if ( start < dom->virt_alloc_end )
+    if ( boundary < dom->virt_alloc_end )
     {
         xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
-                     "%s: segment start too low (0x%" PRIx64 " < 0x%" PRIx64
-                     ")", __FUNCTION__, start, dom->virt_alloc_end);
+                     "%s: segment boundary too low (0x%" PRIx64 " < 0x%" PRIx64
+                     ")", __FUNCTION__, boundary, dom->virt_alloc_end);
         return -1;
     }
+    pages = (boundary - dom->virt_alloc_end) / page_size;
 
-    seg->vstart = start;
-    pfn = (seg->vstart - dom->parms.virt_base) / page_size;
-    seg->pfn = pfn + dom->rambase_pfn;
+    return xc_dom_chk_alloc_pages(dom, "padding", pages);
+}
 
-    if ( pages > dom->total_pages || /* multiple test avoids overflow probs */
-         pfn > dom->total_pages ||
-         pages > dom->total_pages - pfn)
-    {
-        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
-                     "%s: segment %s too large (0x%"PRIpfn" > "
-                     "0x%"PRIpfn" - 0x%"PRIpfn" pages)",
-                     __FUNCTION__, name, pages, dom->total_pages, pfn);
+int xc_dom_alloc_segment(struct xc_dom_image *dom,
+                         struct xc_dom_seg *seg, char *name,
+                         xen_vaddr_t start, xen_vaddr_t size)
+{
+    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
+    xen_pfn_t pages;
+    void *ptr;
+
+    if ( start && xc_dom_alloc_pad(dom, start) )
         return -1;
-    }
 
-    seg->vend = start + pages * page_size;
-    dom->virt_alloc_end = seg->vend;
-    if (dom->allocate)
-        dom->allocate(dom, dom->virt_alloc_end);
+    pages = (size + page_size - 1) / page_size;
+    start = dom->virt_alloc_end;
 
-    DOMPRINTF("%-20s:   %-12s : 0x%" PRIx64 " -> 0x%" PRIx64
-              "  (pfn 0x%" PRIpfn " + 0x%" PRIpfn " pages)",
-              __FUNCTION__, name, seg->vstart, seg->vend, seg->pfn, pages);
+    seg->pfn = dom->pfn_alloc_end;
+    seg->pages = pages;
+
+    if ( xc_dom_chk_alloc_pages(dom, name, pages) )
+        return -1;
+
+    if (dom->allocate)
+        dom->allocate(dom);
 
     /* map and clear pages */
     ptr = xc_dom_seg_to_ptr(dom, seg);
@@ -592,6 +611,13 @@ int xc_dom_alloc_segment(struct xc_dom_image *dom,
         return -1;
     memset(ptr, 0, pages * page_size);
 
+    seg->vstart = start;
+    seg->vend = dom->virt_alloc_end;
+
+    DOMPRINTF("%-20s:   %-12s : 0x%" PRIx64 " -> 0x%" PRIx64
+              "  (pfn 0x%" PRIpfn " + 0x%" PRIpfn " pages)",
+              __FUNCTION__, name, seg->vstart, seg->vend, seg->pfn, pages);
+
     return 0;
 }
 
@@ -602,10 +628,11 @@ int xc_dom_alloc_page(struct xc_dom_image *dom, char *name)
     xen_pfn_t pfn;
 
     start = dom->virt_alloc_end;
+    pfn = dom->pfn_alloc_end - dom->rambase_pfn;
     dom->virt_alloc_end += page_size;
-    if (dom->allocate)
-        dom->allocate(dom, dom->virt_alloc_end);
-    pfn = (start - dom->parms.virt_base) / page_size;
+    dom->pfn_alloc_end++;
+    if ( dom->allocate )
+        dom->allocate(dom);
 
     DOMPRINTF("%-20s:   %-12s : 0x%" PRIx64 " (pfn 0x%" PRIpfn ")",
               __FUNCTION__, name, start, pfn);
@@ -886,6 +913,7 @@ int xc_dom_parse_image(struct xc_dom_image *dom)
 int xc_dom_rambase_init(struct xc_dom_image *dom, uint64_t rambase)
 {
     dom->rambase_pfn = rambase >> XC_PAGE_SHIFT;
+    dom->pfn_alloc_end = dom->rambase_pfn;
     DOMPRINTF("%s: RAM starts at %"PRI_xen_pfn,
               __FUNCTION__, dom->rambase_pfn);
     return 0;
@@ -1013,6 +1041,8 @@ int xc_dom_build_image(struct xc_dom_image *dom)
         goto err;
     }
     page_size = XC_DOM_PAGE_SIZE(dom);
+    if ( dom->parms.virt_base != UNSET_ADDR )
+        dom->virt_alloc_end = dom->parms.virt_base;
 
     /* load kernel */
     if ( xc_dom_alloc_segment(dom, &dom->kernel_seg, "kernel",
@@ -1067,6 +1097,11 @@ int xc_dom_build_image(struct xc_dom_image *dom)
               __FUNCTION__, dom->virt_alloc_end);
     DOMPRINTF("%-20s: virt_pgtab_end : 0x%" PRIx64 "",
               __FUNCTION__, dom->virt_pgtab_end);
+
+    /* Make sure all memory mapped by initial page tables is available */
+    if ( dom->virt_pgtab_end && xc_dom_alloc_pad(dom, dom->virt_pgtab_end) )
+        return -1;
+
     return 0;
 
  err:
-- 
2.1.4


* [PATCH v4 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-11-05 14:36 [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
  2015-11-05 14:36 ` [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator Juergen Gross
@ 2015-11-05 14:36 ` Juergen Gross
  2015-11-05 17:38   ` Andrew Cooper
  2015-11-05 14:36 ` [PATCH v4 3/9] libxc: rename domain builder count_pgtables to alloc_pgtables Juergen Gross
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-11-05 14:36 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross, andrew.cooper3, keir, jbeulich

Support of an unmapped initrd is indicated by the kernel of the domain
via an ELF note. In order to avoid having to use raw ELF data in the
tools to support an unmapped initrd, add a flag to the parsed data
area indicating that the kernel supports this feature.

Switch the hypervisor domain builder to using this flag.

Cc: andrew.cooper3@citrix.com
Cc: jbeulich@suse.com
Cc: keir@xen.org
Suggested-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/domain_build.c        | 4 ++--
 xen/common/libelf/libelf-dominfo.c | 3 +++
 xen/include/xen/libelf.h           | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index c2ef87a..d02dc4b 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -353,7 +353,7 @@ static unsigned long __init compute_dom0_nr_pages(
 
         vstart = parms->virt_base;
         vend = round_pgup(parms->virt_kend);
-        if ( !parms->elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
+        if ( !parms->unmapped_initrd )
             vend += round_pgup(initrd_len);
         end = vend + nr_pages * sizeof_long;
 
@@ -1037,7 +1037,7 @@ int __init construct_dom0(
     v_start          = parms.virt_base;
     vkern_start      = parms.virt_kstart;
     vkern_end        = parms.virt_kend;
-    if ( parms.elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
+    if ( parms.unmapped_initrd )
     {
         vinitrd_start  = vinitrd_end = 0;
         vphysmap_start = round_pgup(vkern_end);
diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index 3de1c23..c9243e4 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -190,6 +190,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
     case XEN_ELFNOTE_INIT_P2M:
         parms->p2m_base = val;
         break;
+    case XEN_ELFNOTE_MOD_START_PFN:
+        parms->unmapped_initrd = !!val;
+        break;
     case XEN_ELFNOTE_PADDR_OFFSET:
         parms->elf_paddr_offset = val;
         break;
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index de788c7..6da4cc0 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -423,6 +423,7 @@ struct elf_dom_parms {
     char loader[16];
     enum xen_pae_type pae;
     bool bsd_symtab;
+    bool unmapped_initrd;
     uint64_t virt_base;
     uint64_t virt_entry;
     uint64_t virt_hypercall;
-- 
2.1.4


* [PATCH v4 3/9] libxc: rename domain builder count_pgtables to alloc_pgtables
  2015-11-05 14:36 [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
  2015-11-05 14:36 ` [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator Juergen Gross
  2015-11-05 14:36 ` [PATCH v4 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
@ 2015-11-05 14:36 ` Juergen Gross
  2015-11-05 14:36 ` [PATCH v4 4/9] libxc: introduce domain builder architecture specific data Juergen Gross
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2015-11-05 14:36 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

Rename the count_pgtables hook of the domain builder to alloc_pgtables
and do the allocation of the guest memory for page tables inside this
hook. This removes the need to access the x86-specific pgtables
member of struct xc_dom_image in the generic domain builder code.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xc_dom.h |  2 +-
 tools/libxc/xc_dom_arm.c     |  6 +++---
 tools/libxc/xc_dom_core.c    | 11 ++---------
 tools/libxc/xc_dom_x86.c     | 26 +++++++++++++++++---------
 4 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 68d6848..19d45f4 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -220,7 +220,7 @@ void xc_dom_register_loader(struct xc_dom_loader *loader);
 struct xc_dom_arch {
     /* pagetable setup */
     int (*alloc_magic_pages) (struct xc_dom_image * dom);
-    int (*count_pgtables) (struct xc_dom_image * dom);
+    int (*alloc_pgtables) (struct xc_dom_image * dom);
     int (*setup_pgtables) (struct xc_dom_image * dom);
 
     /* arch-specific data structs setup */
diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index 397eef0..d9a6371 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -49,7 +49,7 @@ const char *xc_domain_get_native_protocol(xc_interface *xch,
  * arm guests are hybrid and start off with paging disabled, therefore no
  * pagetables and nothing to do here.
  */
-static int count_pgtables_arm(struct xc_dom_image *dom)
+static int alloc_pgtables_arm(struct xc_dom_image *dom)
 {
     DOMPRINTF_CALLED(dom->xch);
     return 0;
@@ -534,7 +534,7 @@ static struct xc_dom_arch xc_dom_32 = {
     .page_shift = PAGE_SHIFT_ARM,
     .sizeof_pfn = 8,
     .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_arm,
+    .alloc_pgtables = alloc_pgtables_arm,
     .setup_pgtables = setup_pgtables_arm,
     .start_info = start_info_arm,
     .shared_info = shared_info_arm,
@@ -550,7 +550,7 @@ static struct xc_dom_arch xc_dom_64 = {
     .page_shift = PAGE_SHIFT_ARM,
     .sizeof_pfn = 8,
     .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_arm,
+    .alloc_pgtables = alloc_pgtables_arm,
     .setup_pgtables = setup_pgtables_arm,
     .start_info = start_info_arm,
     .shared_info = shared_info_arm,
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index a14d477..74de3c3 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -1082,15 +1082,8 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     /* allocate other pages */
     if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 )
         goto err;
-    if ( dom->arch_hooks->count_pgtables )
-    {
-        if ( dom->arch_hooks->count_pgtables(dom) != 0 )
-            goto err;
-        if ( (dom->pgtables > 0) &&
-             (xc_dom_alloc_segment(dom, &dom->pgtables_seg, "page tables", 0,
-                                   dom->pgtables * page_size) != 0) )
-                goto err;
-    }
+    if ( dom->arch_hooks->alloc_pgtables(dom) != 0 )
+        goto err;
     if ( dom->alloc_bootstack )
         dom->bootstack_pfn = xc_dom_alloc_page(dom, "boot stack");
     DOMPRINTF("%-20s: virt_alloc_end : 0x%" PRIx64 "",
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index ed43c28..ea32b00 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -126,7 +126,7 @@ nr_page_tables(struct xc_dom_image *dom,
     return tables;
 }
 
-static int count_pgtables(struct xc_dom_image *dom, int pae,
+static int alloc_pgtables(struct xc_dom_image *dom, int pae,
                           int l4_bits, int l3_bits, int l2_bits, int l1_bits)
 {
     int pages, extra_pages;
@@ -172,7 +172,9 @@ static int count_pgtables(struct xc_dom_image *dom, int pae,
             break;
     }
     dom->virt_pgtab_end = try_virt_end + 1;
-    return 0;
+
+    return xc_dom_alloc_segment(dom, &dom->pgtables_seg, "page tables", 0,
+                                dom->pgtables * PAGE_SIZE_X86);
 }
 
 /* ------------------------------------------------------------------------ */
@@ -182,9 +184,9 @@ static int count_pgtables(struct xc_dom_image *dom, int pae,
 #define L2_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER)
 #define L3_PROT (_PAGE_PRESENT)
 
-static int count_pgtables_x86_32_pae(struct xc_dom_image *dom)
+static int alloc_pgtables_x86_32_pae(struct xc_dom_image *dom)
 {
-    return count_pgtables(dom, 1, 0, 32,
+    return alloc_pgtables(dom, 1, 0, 32,
                           L3_PAGETABLE_SHIFT_PAE, L2_PAGETABLE_SHIFT_PAE);
 }
 
@@ -355,9 +357,9 @@ pfn_error:
 /* ------------------------------------------------------------------------ */
 /* x86_64 pagetables                                                        */
 
-static int count_pgtables_x86_64(struct xc_dom_image *dom)
+static int alloc_pgtables_x86_64(struct xc_dom_image *dom)
 {
-    return count_pgtables(dom, 0,
+    return alloc_pgtables(dom, 0,
                           L4_PAGETABLE_SHIFT_X86_64 + 9,
                           L4_PAGETABLE_SHIFT_X86_64,
                           L3_PAGETABLE_SHIFT_X86_64,
@@ -1620,6 +1622,12 @@ static int bootlate_pv(struct xc_dom_image *dom)
     return 0;
 }
 
+static int alloc_pgtables_hvm(struct xc_dom_image *dom)
+{
+    DOMPRINTF("%s: doing nothing", __func__);
+    return 0;
+}
+
 static int bootlate_hvm(struct xc_dom_image *dom)
 {
     DOMPRINTF("%s: doing nothing", __func__);
@@ -1643,7 +1651,7 @@ static struct xc_dom_arch xc_dom_32_pae = {
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 4,
     .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_x86_32_pae,
+    .alloc_pgtables = alloc_pgtables_x86_32_pae,
     .setup_pgtables = setup_pgtables_x86_32_pae,
     .start_info = start_info_x86_32,
     .shared_info = shared_info_x86_32,
@@ -1659,7 +1667,7 @@ static struct xc_dom_arch xc_dom_64 = {
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 8,
     .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_x86_64,
+    .alloc_pgtables = alloc_pgtables_x86_64,
     .setup_pgtables = setup_pgtables_x86_64,
     .start_info = start_info_x86_64,
     .shared_info = shared_info_x86_64,
@@ -1675,7 +1683,7 @@ static struct xc_dom_arch xc_hvm_32 = {
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 4,
     .alloc_magic_pages = alloc_magic_pages_hvm,
-    .count_pgtables = NULL,
+    .alloc_pgtables = alloc_pgtables_hvm,
     .setup_pgtables = NULL,
     .start_info = NULL,
     .shared_info = NULL,
-- 
2.1.4


* [PATCH v4 4/9] libxc: introduce domain builder architecture specific data
  2015-11-05 14:36 [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (2 preceding siblings ...)
  2015-11-05 14:36 ` [PATCH v4 3/9] libxc: rename domain builder count_pgtables to alloc_pgtables Juergen Gross
@ 2015-11-05 14:36 ` Juergen Gross
  2015-11-05 14:36 ` [PATCH v4 5/9] libxc: use domain builder architecture private data for x86 pv domains Juergen Gross
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2015-11-05 14:36 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

Reorganize struct xc_dom_image to contain a pointer to domain-builder
architecture-specific private data. This abstracts the architecture-
or domain-type-specific data from the generally used data.

The new area is allocated as soon as the domain type is known.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 stubdom/grub/kexec.c         |  6 +++++-
 tools/libxc/include/xc_dom.h |  6 +++++-
 tools/libxc/xc_dom_core.c    | 27 +++++++++++++++++++--------
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c
index 2300318..8fd9ff9 100644
--- a/stubdom/grub/kexec.c
+++ b/stubdom/grub/kexec.c
@@ -272,7 +272,11 @@ void kexec(void *kernel, long kernel_size, void *module, long module_size, char
 #endif
 
     /* equivalent of xc_dom_mem_init */
-    dom->arch_hooks = xc_dom_find_arch_hooks(xc_handle, dom->guest_type);
+    if (xc_dom_set_arch_hooks(dom)) {
+        grub_printf("xc_dom_set_arch_hooks failed\n");
+        errnum = ERR_EXEC_FORMAT;
+        goto out;
+    }
     dom->total_pages = start_info.nr_pages;
 
     /* equivalent of arch_setup_meminit */
diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 19d45f4..09f73cd 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -175,6 +175,9 @@ struct xc_dom_image {
     unsigned int *vnode_to_pnode;
     unsigned int nr_vnodes;
 
+    /* domain type/architecture specific data */
+    void *arch_private;
+
     /* kernel loader */
     struct xc_dom_arch *arch_hooks;
     /* allocate up to pfn_alloc_end */
@@ -237,6 +240,7 @@ struct xc_dom_arch {
     char *native_protocol;
     int page_shift;
     int sizeof_pfn;
+    int arch_private_size;
 
     struct xc_dom_arch *next;
 };
@@ -290,7 +294,7 @@ int xc_dom_devicetree_mem(struct xc_dom_image *dom, const void *mem,
                           size_t memsize);
 
 int xc_dom_parse_image(struct xc_dom_image *dom);
-struct xc_dom_arch *xc_dom_find_arch_hooks(xc_interface *xch, char *guest_type);
+int xc_dom_set_arch_hooks(struct xc_dom_image *dom);
 int xc_dom_build_image(struct xc_dom_image *dom);
 int xc_dom_update_guest_p2m(struct xc_dom_image *dom);
 
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index 74de3c3..3a31222 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -710,19 +710,30 @@ void xc_dom_register_arch_hooks(struct xc_dom_arch *hooks)
     first_hook = hooks;
 }
 
-struct xc_dom_arch *xc_dom_find_arch_hooks(xc_interface *xch, char *guest_type)
+int xc_dom_set_arch_hooks(struct xc_dom_image *dom)
 {
     struct xc_dom_arch *hooks = first_hook;
 
     while (  hooks != NULL )
     {
-        if ( !strcmp(hooks->guest_type, guest_type))
-            return hooks;
+        if ( !strcmp(hooks->guest_type, dom->guest_type) )
+        {
+            if ( hooks->arch_private_size )
+            {
+                dom->arch_private = malloc(hooks->arch_private_size);
+                if ( dom->arch_private == NULL )
+                    return -1;
+                memset(dom->arch_private, 0, hooks->arch_private_size);
+                dom->alloc_malloc += hooks->arch_private_size;
+            }
+            dom->arch_hooks = hooks;
+            return 0;
+        }
         hooks = hooks->next;
     }
-    xc_dom_panic(xch, XC_INVALID_KERNEL,
-                 "%s: not found (type %s)", __FUNCTION__, guest_type);
-    return NULL;
+    xc_dom_panic(dom->xch, XC_INVALID_KERNEL,
+                 "%s: not found (type %s)", __FUNCTION__, dom->guest_type);
+    return -1;
 }
 
 /* ------------------------------------------------------------------------ */
@@ -734,6 +745,7 @@ void xc_dom_release(struct xc_dom_image *dom)
     if ( dom->phys_pages )
         xc_dom_unmap_all(dom);
     xc_dom_free_all(dom);
+    free(dom->arch_private);
     free(dom);
 }
 
@@ -924,8 +936,7 @@ int xc_dom_mem_init(struct xc_dom_image *dom, unsigned int mem_mb)
     unsigned int page_shift;
     xen_pfn_t nr_pages;
 
-    dom->arch_hooks = xc_dom_find_arch_hooks(dom->xch, dom->guest_type);
-    if ( dom->arch_hooks == NULL )
+    if ( xc_dom_set_arch_hooks(dom) )
     {
         xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, "%s: arch hooks not set",
                      __FUNCTION__);
-- 
2.1.4


* [PATCH v4 5/9] libxc: use domain builder architecture private data for x86 pv domains
  2015-11-05 14:36 [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (3 preceding siblings ...)
  2015-11-05 14:36 ` [PATCH v4 4/9] libxc: introduce domain builder architecture specific data Juergen Gross
@ 2015-11-05 14:36 ` Juergen Gross
  2015-11-05 14:36 ` [PATCH v4 6/9] libxc: create unmapped initrd in domain builder if supported Juergen Gross
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2015-11-05 14:36 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

Move some data private to the x86 domain builder into the private
data section. Remove extra_pages, as it is used nowhere.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xc_dom.h |  8 --------
 tools/libxc/xc_dom_x86.c     | 48 +++++++++++++++++++++++++++++---------------
 2 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 09f73cd..0ba9821 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -94,15 +94,7 @@ struct xc_dom_image {
     xen_pfn_t pfn_alloc_end;
     xen_vaddr_t virt_alloc_end;
     xen_vaddr_t bsd_symtab_start;
-
-    /* initial page tables */
-    unsigned int pgtables;
-    unsigned int pg_l4;
-    unsigned int pg_l3;
-    unsigned int pg_l2;
-    unsigned int pg_l1;
     unsigned int alloc_bootstack;
-    unsigned int extra_pages;
     xen_vaddr_t virt_pgtab_end;
 
     /* other state info */
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index ea32b00..aba50df 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -69,6 +69,15 @@
 #define round_down(addr, mask)   ((addr) & ~(mask))
 #define round_up(addr, mask)     ((addr) | (mask))
 
+struct xc_dom_image_x86 {
+    /* initial page tables */
+    unsigned int pgtables;
+    unsigned int pg_l4;
+    unsigned int pg_l3;
+    unsigned int pg_l2;
+    unsigned int pg_l1;
+};
+
 /* get guest IO ABI protocol */
 const char *xc_domain_get_native_protocol(xc_interface *xch,
                                           uint32_t domid)
@@ -132,9 +141,9 @@ static int alloc_pgtables(struct xc_dom_image *dom, int pae,
     int pages, extra_pages;
     xen_vaddr_t try_virt_end;
     xen_pfn_t try_pfn_end;
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
 
     extra_pages = dom->alloc_bootstack ? 1 : 0;
-    extra_pages += dom->extra_pages;
     extra_pages += 128; /* 512kB padding */
     pages = extra_pages;
     for ( ; ; )
@@ -152,29 +161,30 @@ static int alloc_pgtables(struct xc_dom_image *dom, int pae,
             return -ENOMEM;
         }
 
-        dom->pg_l4 =
+        domx86->pg_l4 =
             nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l4_bits);
-        dom->pg_l3 =
+        domx86->pg_l3 =
             nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l3_bits);
-        dom->pg_l2 =
+        domx86->pg_l2 =
             nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l2_bits);
-        dom->pg_l1 =
+        domx86->pg_l1 =
             nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l1_bits);
         if (pae && try_virt_end < 0xc0000000)
         {
             DOMPRINTF("%s: PAE: extra l2 page table for l3#3",
                       __FUNCTION__);
-            dom->pg_l2++;
+            domx86->pg_l2++;
         }
-        dom->pgtables = dom->pg_l4 + dom->pg_l3 + dom->pg_l2 + dom->pg_l1;
-        pages = dom->pgtables + extra_pages;
+        domx86->pgtables = domx86->pg_l4 + domx86->pg_l3 +
+                           domx86->pg_l2 + domx86->pg_l1;
+        pages = domx86->pgtables + extra_pages;
         if ( dom->virt_alloc_end + pages * PAGE_SIZE_X86 <= try_virt_end + 1 )
             break;
     }
     dom->virt_pgtab_end = try_virt_end + 1;
 
     return xc_dom_alloc_segment(dom, &dom->pgtables_seg, "page tables", 0,
-                                dom->pgtables * PAGE_SIZE_X86);
+                                domx86->pgtables * PAGE_SIZE_X86);
 }
 
 /* ------------------------------------------------------------------------ */
@@ -262,9 +272,10 @@ static xen_pfn_t move_l3_below_4G(struct xc_dom_image *dom,
 
 static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom)
 {
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
     xen_pfn_t l3pfn = dom->pgtables_seg.pfn;
-    xen_pfn_t l2pfn = l3pfn + dom->pg_l3;
-    xen_pfn_t l1pfn = l2pfn + dom->pg_l2;
+    xen_pfn_t l2pfn = l3pfn + domx86->pg_l3;
+    xen_pfn_t l1pfn = l2pfn + domx86->pg_l2;
     l3_pgentry_64_t *l3tab;
     l2_pgentry_64_t *l2tab = NULL;
     l1_pgentry_64_t *l1tab = NULL;
@@ -373,10 +384,11 @@ static int alloc_pgtables_x86_64(struct xc_dom_image *dom)
 
 static int setup_pgtables_x86_64(struct xc_dom_image *dom)
 {
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
     xen_pfn_t l4pfn = dom->pgtables_seg.pfn;
-    xen_pfn_t l3pfn = l4pfn + dom->pg_l4;
-    xen_pfn_t l2pfn = l3pfn + dom->pg_l3;
-    xen_pfn_t l1pfn = l2pfn + dom->pg_l2;
+    xen_pfn_t l3pfn = l4pfn + domx86->pg_l4;
+    xen_pfn_t l2pfn = l3pfn + domx86->pg_l3;
+    xen_pfn_t l1pfn = l2pfn + domx86->pg_l2;
     l4_pgentry_64_t *l4tab = xc_dom_pfn_to_ptr(dom, l4pfn, 1);
     l3_pgentry_64_t *l3tab = NULL;
     l2_pgentry_64_t *l2tab = NULL;
@@ -619,6 +631,7 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
 
 static int start_info_x86_32(struct xc_dom_image *dom)
 {
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
     start_info_x86_32_t *start_info =
         xc_dom_pfn_to_ptr(dom, dom->start_info_pfn, 1);
     xen_pfn_t shinfo =
@@ -639,7 +652,7 @@ static int start_info_x86_32(struct xc_dom_image *dom)
     start_info->nr_pages = dom->total_pages;
     start_info->shared_info = shinfo << PAGE_SHIFT_X86;
     start_info->pt_base = dom->pgtables_seg.vstart;
-    start_info->nr_pt_frames = dom->pgtables;
+    start_info->nr_pt_frames = domx86->pgtables;
     start_info->mfn_list = dom->p2m_seg.vstart;
 
     start_info->flags = dom->flags;
@@ -665,6 +678,7 @@ static int start_info_x86_32(struct xc_dom_image *dom)
 
 static int start_info_x86_64(struct xc_dom_image *dom)
 {
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
     start_info_x86_64_t *start_info =
         xc_dom_pfn_to_ptr(dom, dom->start_info_pfn, 1);
     xen_pfn_t shinfo =
@@ -685,7 +699,7 @@ static int start_info_x86_64(struct xc_dom_image *dom)
     start_info->nr_pages = dom->total_pages;
     start_info->shared_info = shinfo << PAGE_SHIFT_X86;
     start_info->pt_base = dom->pgtables_seg.vstart;
-    start_info->nr_pt_frames = dom->pgtables;
+    start_info->nr_pt_frames = domx86->pgtables;
     start_info->mfn_list = dom->p2m_seg.vstart;
 
     start_info->flags = dom->flags;
@@ -1650,6 +1664,7 @@ static struct xc_dom_arch xc_dom_32_pae = {
     .native_protocol = XEN_IO_PROTO_ABI_X86_32,
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 4,
+    .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_32_pae,
     .setup_pgtables = setup_pgtables_x86_32_pae,
@@ -1666,6 +1681,7 @@ static struct xc_dom_arch xc_dom_64 = {
     .native_protocol = XEN_IO_PROTO_ABI_X86_64,
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 8,
+    .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_64,
     .setup_pgtables = setup_pgtables_x86_64,
-- 
2.1.4
* [PATCH v4 6/9] libxc: create unmapped initrd in domain builder if supported
  2015-11-05 14:36 [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (4 preceding siblings ...)
  2015-11-05 14:36 ` [PATCH v4 5/9] libxc: use domain builder architecture private data for x86 pv domains Juergen Gross
@ 2015-11-05 14:36 ` Juergen Gross
  2015-11-05 14:36 ` [PATCH v4 7/9] libxc: split p2m allocation in domain builder from other magic pages Juergen Gross
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2015-11-05 14:36 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

In case the kernel of a new pv-domU indicates that it supports an
unmapped initrd, don't waste precious virtual address space on the
initrd, but allocate only guest physical memory for it.
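
The resulting start_info reporting can be sketched as follows. This is
a condensed, hypothetical view of the decision the patch makes (the
real logic is spread across xc_dom_build_image() and the start_info
hooks): when the kernel supports an unmapped initrd and no fixed
virtual start was requested, the ramdisk is reported by its first pfn
and page-aligned length (with SIF_MOD_START_PFN set), otherwise by its
mapped virtual range.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Hypothetical condensed view of the start_info fields being filled. */
struct initrd_info { uint64_t start; uint64_t len; bool start_is_pfn; };

static struct initrd_info place_initrd(bool unmapped_ok, uint64_t vstart,
                                       uint64_t vend, uint64_t pfn,
                                       uint64_t pages)
{
    struct initrd_info info;
    bool unmapped = unmapped_ok && vstart == 0;

    if ( unmapped )
    {
        info.start = pfn;                 /* SIF_MOD_START_PFN case */
        info.len = pages * PAGE_SIZE;
        info.start_is_pfn = true;
    }
    else
    {
        info.start = vstart;              /* classic mapped initrd */
        info.len = vend - vstart;
        info.start_is_pfn = false;
    }
    return info;
}
```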

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xc_dom.h |  5 +++++
 tools/libxc/xc_dom_core.c    | 19 +++++++++++++++++--
 tools/libxc/xc_dom_x86.c     |  8 ++++----
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 0ba9821..2358012 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -94,6 +94,11 @@ struct xc_dom_image {
     xen_pfn_t pfn_alloc_end;
     xen_vaddr_t virt_alloc_end;
     xen_vaddr_t bsd_symtab_start;
+
+    /* initrd parameters as specified in start_info page */
+    unsigned long initrd_start;
+    unsigned long initrd_len;
+
     unsigned int alloc_bootstack;
     xen_vaddr_t virt_pgtab_end;
 
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index 3a31222..7b48b1f 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -1041,6 +1041,7 @@ static int xc_dom_build_ramdisk(struct xc_dom_image *dom)
 int xc_dom_build_image(struct xc_dom_image *dom)
 {
     unsigned int page_size;
+    bool unmapped_initrd;
 
     DOMPRINTF_CALLED(dom->xch);
 
@@ -1064,11 +1065,15 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     if ( dom->kernel_loader->loader(dom) != 0 )
         goto err;
 
-    /* load ramdisk */
-    if ( dom->ramdisk_blob )
+    /* Don't load ramdisk now if no initial mapping required. */
+    unmapped_initrd = dom->parms.unmapped_initrd && !dom->ramdisk_seg.vstart;
+
+    if ( dom->ramdisk_blob && !unmapped_initrd )
     {
         if ( xc_dom_build_ramdisk(dom) != 0 )
             goto err;
+        dom->initrd_start = dom->ramdisk_seg.vstart;
+        dom->initrd_len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
     }
 
     /* load devicetree */
@@ -1106,6 +1111,16 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     if ( dom->virt_pgtab_end && xc_dom_alloc_pad(dom, dom->virt_pgtab_end) )
         return -1;
 
+    /* Load ramdisk if no initial mapping required. */
+    if ( dom->ramdisk_blob && unmapped_initrd )
+    {
+        if ( xc_dom_build_ramdisk(dom) != 0 )
+            goto err;
+        dom->flags |= SIF_MOD_START_PFN;
+        dom->initrd_start = dom->ramdisk_seg.pfn;
+        dom->initrd_len = page_size * dom->ramdisk_seg.pages;
+    }
+
     return 0;
 
  err:
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index aba50df..3c6bb9c 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -663,8 +663,8 @@ static int start_info_x86_32(struct xc_dom_image *dom)
 
     if ( dom->ramdisk_blob )
     {
-        start_info->mod_start = dom->ramdisk_seg.vstart;
-        start_info->mod_len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
+        start_info->mod_start = dom->initrd_start;
+        start_info->mod_len = dom->initrd_len;
     }
 
     if ( dom->cmdline )
@@ -710,8 +710,8 @@ static int start_info_x86_64(struct xc_dom_image *dom)
 
     if ( dom->ramdisk_blob )
     {
-        start_info->mod_start = dom->ramdisk_seg.vstart;
-        start_info->mod_len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
+        start_info->mod_start = dom->initrd_start;
+        start_info->mod_len = dom->initrd_len;
     }
 
     if ( dom->cmdline )
-- 
2.1.4

* [PATCH v4 7/9] libxc: split p2m allocation in domain builder from other magic pages
  2015-11-05 14:36 [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (5 preceding siblings ...)
  2015-11-05 14:36 ` [PATCH v4 6/9] libxc: create unmapped initrd in domain builder if supported Juergen Gross
@ 2015-11-05 14:36 ` Juergen Gross
  2015-11-05 14:36 ` [PATCH v4 8/9] libxc: rework of domain builder's page table handler Juergen Gross
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2015-11-05 14:36 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

Carve out the p2m list allocation from the .alloc_magic_pages hook of
the domain builder in order to prepare for allocating the p2m list
outside of the initial kernel mapping. This will be needed to support
loading domains with huge memory (>512 GB).
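
The calling convention this introduces can be sketched in isolation:
the new hook is optional (NULL for architectures without a p2m list,
such as HVM or ARM) and is invoked before the mandatory allocators.
The structures below are hypothetical stand-ins for the xc_dom_arch
hook table, not the actual libxc declarations.

```c
#include <stddef.h>

/* Hypothetical reduced hook table mirroring the patch's calling order. */
struct hooks {
    int (*alloc_p2m_list)(void *dom);   /* optional: may be NULL */
    int (*alloc_magic_pages)(void *dom);/* mandatory */
};

/* Only call the optional hook when the arch provides one, and stop on
 * the first failure, as xc_dom_build_image() does. */
static int build_other_pages(struct hooks *h, void *dom)
{
    if ( h->alloc_p2m_list && h->alloc_p2m_list(dom) != 0 )
        return -1;
    return h->alloc_magic_pages(dom);
}

/* Trivial hook implementations for demonstration. */
static int hook_ok(void *dom)   { (void)dom; return 0; }
static int hook_fail(void *dom) { (void)dom; return -1; }
```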

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xc_dom.h |  1 +
 tools/libxc/xc_dom_core.c    |  3 +++
 tools/libxc/xc_dom_x86.c     | 11 ++++++++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 2358012..7c157c3 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -221,6 +221,7 @@ struct xc_dom_arch {
     /* pagetable setup */
     int (*alloc_magic_pages) (struct xc_dom_image * dom);
     int (*alloc_pgtables) (struct xc_dom_image * dom);
+    int (*alloc_p2m_list) (struct xc_dom_image * dom);
     int (*setup_pgtables) (struct xc_dom_image * dom);
 
     /* arch-specific data structs setup */
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index 7b48b1f..ad91b35 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -1096,6 +1096,9 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     }
 
     /* allocate other pages */
+    if ( dom->arch_hooks->alloc_p2m_list &&
+         dom->arch_hooks->alloc_p2m_list(dom) != 0 )
+        goto err;
     if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 )
         goto err;
     if ( dom->arch_hooks->alloc_pgtables(dom) != 0 )
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 3c6bb9c..dd448cb 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -475,7 +475,7 @@ pfn_error:
 
 /* ------------------------------------------------------------------------ */
 
-static int alloc_magic_pages(struct xc_dom_image *dom)
+static int alloc_p2m_list(struct xc_dom_image *dom)
 {
     size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
 
@@ -487,6 +487,13 @@ static int alloc_magic_pages(struct xc_dom_image *dom)
     if ( dom->p2m_guest == NULL )
         return -1;
 
+    return 0;
+}
+
+/* ------------------------------------------------------------------------ */
+
+static int alloc_magic_pages(struct xc_dom_image *dom)
+{
     /* allocate special pages */
     dom->start_info_pfn = xc_dom_alloc_page(dom, "start info");
     dom->xenstore_pfn = xc_dom_alloc_page(dom, "xenstore");
@@ -1667,6 +1674,7 @@ static struct xc_dom_arch xc_dom_32_pae = {
     .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_32_pae,
+    .alloc_p2m_list = alloc_p2m_list,
     .setup_pgtables = setup_pgtables_x86_32_pae,
     .start_info = start_info_x86_32,
     .shared_info = shared_info_x86_32,
@@ -1684,6 +1692,7 @@ static struct xc_dom_arch xc_dom_64 = {
     .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_64,
+    .alloc_p2m_list = alloc_p2m_list,
     .setup_pgtables = setup_pgtables_x86_64,
     .start_info = start_info_x86_64,
     .shared_info = shared_info_x86_64,
-- 
2.1.4

* [PATCH v4 8/9] libxc: rework of domain builder's page table handler
  2015-11-05 14:36 [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (6 preceding siblings ...)
  2015-11-05 14:36 ` [PATCH v4 7/9] libxc: split p2m allocation in domain builder from other magic pages Juergen Gross
@ 2015-11-05 14:36 ` Juergen Gross
  2015-11-12 12:39   ` Wei Liu
  2015-11-05 14:36 ` [PATCH v4 9/9] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
  2015-11-12  5:09 ` [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
  9 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-11-05 14:36 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

In order to prepare for a p2m list outside of the initial kernel
mapping, rework the domain builder's page table handler. The goal is to
be able to use common helpers for page table allocation and setup, both
for the initial kernel page tables and for the page tables mapping the
p2m list. This is achieved by supporting multiple mapping areas. The
mapped virtual addresses of the individual areas must not overlap,
while the page tables of a newly added area might already be partially
present. In particular, the top-level page table exists only once, of
course.

Currently the number of mappings is restricted to 1, because the only
mapping for now is the initial one created by the toolstack. No change
in behaviour or guest-visible state is introduced.
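
The per-level counting at the heart of the rework can be illustrated
with a standalone sketch. This is a simplification of what
count_pgtables() computes for a single mapping, assuming x86-64
constants (4 KiB pages, 512 entries per table); it ignores the
overlap-trimming against already-present mappings that the real helper
also performs.

```c
#include <stdint.h>

#define PAGE_SHIFT  12
#define LEVEL_SHIFT 9   /* 512 entries per x86-64 page table */

/* Number of level-`level` page tables needed to cover the inclusive
 * virtual range [from, to]: round `from` down and `to` up to the
 * coverage boundary of one table at that level, then count tables.
 * level 0 = L1 (each table covers 2 MiB), level 1 = L2 (1 GiB), ... */
static unsigned nr_tables(uint64_t from, uint64_t to, int level)
{
    unsigned bits = PAGE_SHIFT + (level + 1) * LEVEL_SHIFT;
    uint64_t mask = (1ULL << bits) - 1;

    from &= ~mask;       /* round down */
    to |= mask;          /* round up */
    return (unsigned)(((to - from) >> bits) + 1);
}
```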

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/libxc/xc_dom_x86.c | 478 ++++++++++++++++++++++++-----------------------
 tools/libxc/xg_private.h |  39 +---
 2 files changed, 251 insertions(+), 266 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index dd448cb..497aa55 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -26,6 +26,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <inttypes.h>
+#include <assert.h>
 
 #include <xen/xen.h>
 #include <xen/foreign/x86_32.h>
@@ -69,13 +70,29 @@
 #define round_down(addr, mask)   ((addr) & ~(mask))
 #define round_up(addr, mask)     ((addr) | (mask))
 
-struct xc_dom_image_x86 {
-    /* initial page tables */
+struct xc_dom_params {
+    unsigned levels;
+    xen_vaddr_t vaddr_mask;
+    x86_pgentry_t lvl_prot[4];
+};
+
+struct xc_dom_x86_mapping_lvl {
+    xen_vaddr_t from;
+    xen_vaddr_t to;
+    xen_pfn_t pfn;
     unsigned int pgtables;
-    unsigned int pg_l4;
-    unsigned int pg_l3;
-    unsigned int pg_l2;
-    unsigned int pg_l1;
+};
+
+struct xc_dom_x86_mapping {
+    struct xc_dom_x86_mapping_lvl area;
+    struct xc_dom_x86_mapping_lvl lvls[4];
+};
+
+struct xc_dom_image_x86 {
+    unsigned n_mappings;
+#define MAPPING_MAX 1
+    struct xc_dom_x86_mapping maps[MAPPING_MAX];
+    struct xc_dom_params *params;
 };
 
 /* get guest IO ABI protocol */
@@ -105,102 +122,159 @@ const char *xc_domain_get_native_protocol(xc_interface *xch,
     return protocol;
 }
 
-static unsigned long
-nr_page_tables(struct xc_dom_image *dom,
-               xen_vaddr_t start, xen_vaddr_t end, unsigned long bits)
+static int count_pgtables(struct xc_dom_image *dom, xen_vaddr_t from,
+                          xen_vaddr_t to, xen_pfn_t pfn)
 {
-    xen_vaddr_t mask = bits_to_mask(bits);
-    int tables;
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
+    struct xc_dom_x86_mapping *map, *map_cmp;
+    xen_pfn_t pfn_end;
+    xen_vaddr_t mask;
+    unsigned bits;
+    int l, m;
 
-    if ( bits == 0 )
-        return 0;  /* unused */
+    if ( domx86->n_mappings == MAPPING_MAX )
+    {
+        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
+                     "%s: too many mappings\n", __FUNCTION__);
+        return -ENOMEM;
+    }
+    map = domx86->maps + domx86->n_mappings;
 
-    if ( bits == (8 * sizeof(unsigned long)) )
+    pfn_end = pfn + ((to - from) >> PAGE_SHIFT_X86);
+    if ( pfn_end >= dom->p2m_size )
     {
-        /* must be pgd, need one */
-        start = 0;
-        end = -1;
-        tables = 1;
+        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
+                     "%s: not enough memory for initial mapping (%#"PRIpfn" > %#"PRIpfn")",
+                     __FUNCTION__, pfn_end, dom->p2m_size);
+        return -ENOMEM;
     }
-    else
+    for ( m = 0; m < domx86->n_mappings; m++ )
     {
-        start = round_down(start, mask);
-        end = round_up(end, mask);
-        tables = ((end - start) >> bits) + 1;
+        map_cmp = domx86->maps + m;
+        if ( from < map_cmp->area.to && to > map_cmp->area.from )
+        {
+            xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                         "%s: overlapping mappings\n", __FUNCTION__);
+            return -1;
+        }
     }
 
-    DOMPRINTF("%s: 0x%016" PRIx64 "/%ld: 0x%016" PRIx64
-              " -> 0x%016" PRIx64 ", %d table(s)",
-              __FUNCTION__, mask, bits, start, end, tables);
-    return tables;
+    memset(map, 0, sizeof(*map));
+    map->area.from = from & domx86->params->vaddr_mask;
+    map->area.to = to & domx86->params->vaddr_mask;
+
+    for ( l = domx86->params->levels - 1; l >= 0; l-- )
+    {
+        map->lvls[l].pfn = pfn + map->area.pgtables;
+        if ( l == domx86->params->levels - 1 )
+        {
+            if ( domx86->n_mappings == 0 )
+            {
+                map->lvls[l].from = 0;
+                map->lvls[l].to = domx86->params->vaddr_mask;
+                map->lvls[l].pgtables = 1;
+                map->area.pgtables++;
+            }
+            continue;
+        }
+
+        bits = PAGE_SHIFT_X86 + (l + 1) * PGTBL_LEVEL_SHIFT_X86;
+        mask = bits_to_mask(bits);
+        map->lvls[l].from = map->area.from & ~mask;
+        map->lvls[l].to = map->area.to | mask;
+
+        if ( domx86->params->levels == 3 && domx86->n_mappings == 0 &&
+             to < 0xc0000000 && l == 1 )
+        {
+            DOMPRINTF("%s: PAE: extra l2 page table for l3#3", __FUNCTION__);
+            map->lvls[l].to = domx86->params->vaddr_mask;
+        }
+
+        for ( m = 0; m < domx86->n_mappings; m++ )
+        {
+            map_cmp = domx86->maps + m;
+            if ( map_cmp->lvls[l].from == map_cmp->lvls[l].to )
+                continue;
+            if ( map->lvls[l].from >= map_cmp->lvls[l].from &&
+                 map->lvls[l].to <= map_cmp->lvls[l].to )
+            {
+                map->lvls[l].from = 0;
+                map->lvls[l].to = 0;
+                break;
+            }
+            assert(map->lvls[l].from >= map_cmp->lvls[l].from ||
+                   map->lvls[l].to <= map_cmp->lvls[l].to);
+            if ( map->lvls[l].from >= map_cmp->lvls[l].from &&
+                 map->lvls[l].from <= map_cmp->lvls[l].to )
+                map->lvls[l].from = map_cmp->lvls[l].to + 1;
+            if ( map->lvls[l].to >= map_cmp->lvls[l].from &&
+                 map->lvls[l].to <= map_cmp->lvls[l].to )
+                map->lvls[l].to = map_cmp->lvls[l].from - 1;
+        }
+        if ( map->lvls[l].from < map->lvls[l].to )
+            map->lvls[l].pgtables =
+                ((map->lvls[l].to - map->lvls[l].from) >> bits) + 1;
+        DOMPRINTF("%s: 0x%016" PRIx64 "/%d: 0x%016" PRIx64 " -> 0x%016" PRIx64
+                  ", %d table(s)", __FUNCTION__, mask, bits,
+                  map->lvls[l].from, map->lvls[l].to, map->lvls[l].pgtables);
+        map->area.pgtables += map->lvls[l].pgtables;
+    }
+
+    return 0;
 }
 
-static int alloc_pgtables(struct xc_dom_image *dom, int pae,
-                          int l4_bits, int l3_bits, int l2_bits, int l1_bits)
+static int alloc_pgtables(struct xc_dom_image *dom)
 {
     int pages, extra_pages;
     xen_vaddr_t try_virt_end;
-    xen_pfn_t try_pfn_end;
     struct xc_dom_image_x86 *domx86 = dom->arch_private;
+    struct xc_dom_x86_mapping *map = domx86->maps + domx86->n_mappings;
 
     extra_pages = dom->alloc_bootstack ? 1 : 0;
-    extra_pages += 128; /* 512kB padding */
+    extra_pages += (512 * 1024) / PAGE_SIZE_X86; /* 512kB padding */
     pages = extra_pages;
     for ( ; ; )
     {
         try_virt_end = round_up(dom->virt_alloc_end + pages * PAGE_SIZE_X86,
                                 bits_to_mask(22)); /* 4MB alignment */
 
-        try_pfn_end = (try_virt_end - dom->parms.virt_base) >> PAGE_SHIFT_X86;
+        if ( count_pgtables(dom, dom->parms.virt_base, try_virt_end,
+                            dom->pfn_alloc_end) )
+            return -1;
 
-        if ( try_pfn_end > dom->p2m_size )
-        {
-            xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
-                         "%s: not enough memory for initial mapping (%#"PRIpfn" > %#"PRIpfn")",
-                         __FUNCTION__, try_pfn_end, dom->p2m_size);
-            return -ENOMEM;
-        }
-
-        domx86->pg_l4 =
-            nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l4_bits);
-        domx86->pg_l3 =
-            nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l3_bits);
-        domx86->pg_l2 =
-            nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l2_bits);
-        domx86->pg_l1 =
-            nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l1_bits);
-        if (pae && try_virt_end < 0xc0000000)
-        {
-            DOMPRINTF("%s: PAE: extra l2 page table for l3#3",
-                      __FUNCTION__);
-            domx86->pg_l2++;
-        }
-        domx86->pgtables = domx86->pg_l4 + domx86->pg_l3 +
-                           domx86->pg_l2 + domx86->pg_l1;
-        pages = domx86->pgtables + extra_pages;
+        pages = map->area.pgtables + extra_pages;
         if ( dom->virt_alloc_end + pages * PAGE_SIZE_X86 <= try_virt_end + 1 )
             break;
     }
+    map->area.pfn = 0;
+    domx86->n_mappings++;
     dom->virt_pgtab_end = try_virt_end + 1;
 
     return xc_dom_alloc_segment(dom, &dom->pgtables_seg, "page tables", 0,
-                                domx86->pgtables * PAGE_SIZE_X86);
+                                map->area.pgtables * PAGE_SIZE_X86);
 }
 
 /* ------------------------------------------------------------------------ */
 /* i386 pagetables                                                          */
 
-#define L1_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED)
-#define L2_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER)
-#define L3_PROT (_PAGE_PRESENT)
+static struct xc_dom_params x86_32_params = {
+    .levels = PGTBL_LEVELS_I386,
+    .vaddr_mask = bits_to_mask(VIRT_BITS_I386),
+    .lvl_prot[0] = _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED,
+    .lvl_prot[1] = _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER,
+    .lvl_prot[2] = _PAGE_PRESENT,
+};
 
 static int alloc_pgtables_x86_32_pae(struct xc_dom_image *dom)
 {
-    return alloc_pgtables(dom, 1, 0, 32,
-                          L3_PAGETABLE_SHIFT_PAE, L2_PAGETABLE_SHIFT_PAE);
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
+
+    domx86->params = &x86_32_params;
+    return alloc_pgtables(dom);
 }
 
 #define pfn_to_paddr(pfn) ((xen_paddr_t)(pfn) << PAGE_SHIFT_X86)
+#define pgentry_to_pfn(entry) ((xen_pfn_t)((entry) >> PAGE_SHIFT_X86))
 
 /*
  * Move the l3 page table page below 4G for guests which do not
@@ -270,20 +344,100 @@ static xen_pfn_t move_l3_below_4G(struct xc_dom_image *dom,
     return l3mfn;
 }
 
+static x86_pgentry_t *get_pg_table_x86(struct xc_dom_image *dom, int m, int l)
+{
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
+    struct xc_dom_x86_mapping *map;
+    x86_pgentry_t *pg;
+
+    map = domx86->maps + m;
+    pg = xc_dom_pfn_to_ptr(dom, map->lvls[l].pfn, 0);
+    if ( pg )
+        return pg;
+
+    xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                 "%s: xc_dom_pfn_to_ptr failed", __FUNCTION__);
+    return NULL;
+}
+
+static x86_pgentry_t get_pg_prot_x86(struct xc_dom_image *dom, int l,
+                                     xen_pfn_t pfn)
+{
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
+    struct xc_dom_x86_mapping *map;
+    xen_pfn_t pfn_s, pfn_e;
+    x86_pgentry_t prot;
+    unsigned m;
+
+    prot = domx86->params->lvl_prot[l];
+    if ( l > 0 )
+        return prot;
+
+    for ( m = 0; m < domx86->n_mappings; m++ )
+    {
+        map = domx86->maps + m;
+        pfn_s = map->lvls[domx86->params->levels - 1].pfn;
+        pfn_e = map->area.pgtables + pfn_s;
+        if ( pfn >= pfn_s && pfn < pfn_e )
+            return prot & ~_PAGE_RW;
+    }
+
+    return prot;
+}
+
+static int setup_pgtables_x86(struct xc_dom_image *dom)
+{
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
+    struct xc_dom_x86_mapping *map1, *map2;
+    struct xc_dom_x86_mapping_lvl *lvl;
+    xen_vaddr_t from, to;
+    xen_pfn_t pfn, p, p_s, p_e;
+    x86_pgentry_t *pg;
+    unsigned m1, m2;
+    int l;
+
+    for ( l = domx86->params->levels - 1; l >= 0; l-- )
+        for ( m1 = 0; m1 < domx86->n_mappings; m1++ )
+        {
+            map1 = domx86->maps + m1;
+            from = map1->lvls[l].from;
+            to = map1->lvls[l].to;
+            pg = get_pg_table_x86(dom, m1, l);
+            if ( !pg )
+                return -1;
+            for ( m2 = 0; m2 < domx86->n_mappings; m2++ )
+            {
+                map2 = domx86->maps + m2;
+                lvl = (l > 0) ? map2->lvls + l - 1 : &map2->area;
+                if ( l > 0 && lvl->pgtables == 0 )
+                    continue;
+                if ( lvl->from >= to || lvl->to <= from )
+                    continue;
+                p_s = (max(from, lvl->from) - from) >>
+                      (PAGE_SHIFT_X86 + l * PGTBL_LEVEL_SHIFT_X86);
+                p_e = (min(to, lvl->to) - from) >>
+                      (PAGE_SHIFT_X86 + l * PGTBL_LEVEL_SHIFT_X86);
+                pfn = ((max(from, lvl->from) - lvl->from) >>
+                      (PAGE_SHIFT_X86 + l * PGTBL_LEVEL_SHIFT_X86)) + lvl->pfn;
+                for ( p = p_s; p <= p_e; p++ )
+                {
+                    pg[p] = pfn_to_paddr(xc_dom_p2m(dom, pfn)) |
+                            get_pg_prot_x86(dom, l, pfn);
+                    pfn++;
+                }
+            }
+        }
+
+    return 0;
+}
+
 static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom)
 {
     struct xc_dom_image_x86 *domx86 = dom->arch_private;
-    xen_pfn_t l3pfn = dom->pgtables_seg.pfn;
-    xen_pfn_t l2pfn = l3pfn + domx86->pg_l3;
-    xen_pfn_t l1pfn = l2pfn + domx86->pg_l2;
-    l3_pgentry_64_t *l3tab;
-    l2_pgentry_64_t *l2tab = NULL;
-    l1_pgentry_64_t *l1tab = NULL;
-    unsigned long l3off, l2off = 0, l1off;
-    xen_vaddr_t addr;
-    xen_pfn_t pgpfn;
-    xen_pfn_t l3mfn = xc_dom_p2m(dom, l3pfn);
+    xen_pfn_t l3mfn, l3pfn;
 
+    l3pfn = domx86->maps[0].lvls[2].pfn;
+    l3mfn = xc_dom_p2m(dom, l3pfn);
     if ( dom->parms.pae == XEN_PAE_YES )
     {
         if ( l3mfn >= 0x100000 )
@@ -299,180 +453,34 @@ static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom)
         }
     }
 
-    l3tab = xc_dom_pfn_to_ptr(dom, l3pfn, 1);
-    if ( l3tab == NULL )
-        goto pfn_error;
-
-    for ( addr = dom->parms.virt_base; addr < dom->virt_pgtab_end;
-          addr += PAGE_SIZE_X86 )
-    {
-        if ( l2tab == NULL )
-        {
-            /* get L2 tab, make L3 entry */
-            l2tab = xc_dom_pfn_to_ptr(dom, l2pfn, 1);
-            if ( l2tab == NULL )
-                goto pfn_error;
-            l3off = l3_table_offset_pae(addr);
-            l3tab[l3off] =
-                pfn_to_paddr(xc_dom_p2m(dom, l2pfn)) | L3_PROT;
-            l2pfn++;
-        }
-
-        if ( l1tab == NULL )
-        {
-            /* get L1 tab, make L2 entry */
-            l1tab = xc_dom_pfn_to_ptr(dom, l1pfn, 1);
-            if ( l1tab == NULL )
-                goto pfn_error;
-            l2off = l2_table_offset_pae(addr);
-            l2tab[l2off] =
-                pfn_to_paddr(xc_dom_p2m(dom, l1pfn)) | L2_PROT;
-            l1pfn++;
-        }
-
-        /* make L1 entry */
-        l1off = l1_table_offset_pae(addr);
-        pgpfn = (addr - dom->parms.virt_base) >> PAGE_SHIFT_X86;
-        l1tab[l1off] =
-            pfn_to_paddr(xc_dom_p2m(dom, pgpfn)) | L1_PROT;
-        if ( (!dom->pvh_enabled)                &&
-             (addr >= dom->pgtables_seg.vstart) &&
-             (addr < dom->pgtables_seg.vend) )
-            l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
-
-        if ( l1off == (L1_PAGETABLE_ENTRIES_PAE - 1) )
-        {
-            l1tab = NULL;
-            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
-                l2tab = NULL;
-        }
-    }
-
-    if ( dom->virt_pgtab_end <= 0xc0000000 )
-    {
-        DOMPRINTF("%s: PAE: extra l2 page table for l3#3", __FUNCTION__);
-        l3tab[3] = pfn_to_paddr(xc_dom_p2m(dom, l2pfn)) | L3_PROT;
-    }
-    return 0;
-
-pfn_error:
-    xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
-                 "%s: xc_dom_pfn_to_ptr failed", __FUNCTION__);
-    return -EINVAL;
+    return setup_pgtables_x86(dom);
 }
 
-#undef L1_PROT
-#undef L2_PROT
-#undef L3_PROT
-
 /* ------------------------------------------------------------------------ */
 /* x86_64 pagetables                                                        */
 
+static struct xc_dom_params x86_64_params = {
+    .levels = PGTBL_LEVELS_X86_64,
+    .vaddr_mask = bits_to_mask(VIRT_BITS_X86_64),
+    .lvl_prot[0] = _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED,
+    .lvl_prot[1] = _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER,
+    .lvl_prot[2] = _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER,
+    .lvl_prot[3] = _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER,
+};
+
 static int alloc_pgtables_x86_64(struct xc_dom_image *dom)
 {
-    return alloc_pgtables(dom, 0,
-                          L4_PAGETABLE_SHIFT_X86_64 + 9,
-                          L4_PAGETABLE_SHIFT_X86_64,
-                          L3_PAGETABLE_SHIFT_X86_64,
-                          L2_PAGETABLE_SHIFT_X86_64);
-}
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
 
-#define L1_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED)
-#define L2_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER)
-#define L3_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER)
-#define L4_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER)
+    domx86->params = &x86_64_params;
+    return alloc_pgtables(dom);
+}
 
 static int setup_pgtables_x86_64(struct xc_dom_image *dom)
 {
-    struct xc_dom_image_x86 *domx86 = dom->arch_private;
-    xen_pfn_t l4pfn = dom->pgtables_seg.pfn;
-    xen_pfn_t l3pfn = l4pfn + domx86->pg_l4;
-    xen_pfn_t l2pfn = l3pfn + domx86->pg_l3;
-    xen_pfn_t l1pfn = l2pfn + domx86->pg_l2;
-    l4_pgentry_64_t *l4tab = xc_dom_pfn_to_ptr(dom, l4pfn, 1);
-    l3_pgentry_64_t *l3tab = NULL;
-    l2_pgentry_64_t *l2tab = NULL;
-    l1_pgentry_64_t *l1tab = NULL;
-    uint64_t l4off, l3off = 0, l2off = 0, l1off;
-    uint64_t addr;
-    xen_pfn_t pgpfn;
-
-    if ( l4tab == NULL )
-        goto pfn_error;
-
-    for ( addr = dom->parms.virt_base; addr < dom->virt_pgtab_end;
-          addr += PAGE_SIZE_X86 )
-    {
-        if ( l3tab == NULL )
-        {
-            /* get L3 tab, make L4 entry */
-            l3tab = xc_dom_pfn_to_ptr(dom, l3pfn, 1);
-            if ( l3tab == NULL )
-                goto pfn_error;
-            l4off = l4_table_offset_x86_64(addr);
-            l4tab[l4off] =
-                pfn_to_paddr(xc_dom_p2m(dom, l3pfn)) | L4_PROT;
-            l3pfn++;
-        }
-
-        if ( l2tab == NULL )
-        {
-            /* get L2 tab, make L3 entry */
-            l2tab = xc_dom_pfn_to_ptr(dom, l2pfn, 1);
-            if ( l2tab == NULL )
-                goto pfn_error;
-            l3off = l3_table_offset_x86_64(addr);
-            l3tab[l3off] =
-                pfn_to_paddr(xc_dom_p2m(dom, l2pfn)) | L3_PROT;
-            l2pfn++;
-        }
-
-        if ( l1tab == NULL )
-        {
-            /* get L1 tab, make L2 entry */
-            l1tab = xc_dom_pfn_to_ptr(dom, l1pfn, 1);
-            if ( l1tab == NULL )
-                goto pfn_error;
-            l2off = l2_table_offset_x86_64(addr);
-            l2tab[l2off] =
-                pfn_to_paddr(xc_dom_p2m(dom, l1pfn)) | L2_PROT;
-            l1pfn++;
-        }
-
-        /* make L1 entry */
-        l1off = l1_table_offset_x86_64(addr);
-        pgpfn = (addr - dom->parms.virt_base) >> PAGE_SHIFT_X86;
-        l1tab[l1off] =
-            pfn_to_paddr(xc_dom_p2m(dom, pgpfn)) | L1_PROT;
-        if ( (!dom->pvh_enabled)                &&
-             (addr >= dom->pgtables_seg.vstart) &&
-             (addr < dom->pgtables_seg.vend) )
-            l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
-
-        if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
-        {
-            l1tab = NULL;
-            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
-            {
-                l2tab = NULL;
-                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
-                    l3tab = NULL;
-            }
-        }
-    }
-    return 0;
-
-pfn_error:
-    xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
-                 "%s: xc_dom_pfn_to_ptr failed", __FUNCTION__);
-    return -EINVAL;
+    return setup_pgtables_x86(dom);
 }
 
-#undef L1_PROT
-#undef L2_PROT
-#undef L3_PROT
-#undef L4_PROT
-
 /* ------------------------------------------------------------------------ */
 
 static int alloc_p2m_list(struct xc_dom_image *dom)
@@ -659,7 +667,7 @@ static int start_info_x86_32(struct xc_dom_image *dom)
     start_info->nr_pages = dom->total_pages;
     start_info->shared_info = shinfo << PAGE_SHIFT_X86;
     start_info->pt_base = dom->pgtables_seg.vstart;
-    start_info->nr_pt_frames = domx86->pgtables;
+    start_info->nr_pt_frames = domx86->maps[0].area.pgtables;
     start_info->mfn_list = dom->p2m_seg.vstart;
 
     start_info->flags = dom->flags;
@@ -706,7 +714,7 @@ static int start_info_x86_64(struct xc_dom_image *dom)
     start_info->nr_pages = dom->total_pages;
     start_info->shared_info = shinfo << PAGE_SHIFT_X86;
     start_info->pt_base = dom->pgtables_seg.vstart;
-    start_info->nr_pt_frames = domx86->pgtables;
+    start_info->nr_pt_frames = domx86->maps[0].area.pgtables;
     start_info->mfn_list = dom->p2m_seg.vstart;
 
     start_info->flags = dom->flags;
diff --git a/tools/libxc/xg_private.h b/tools/libxc/xg_private.h
index 5544897..a76d6b3 100644
--- a/tools/libxc/xg_private.h
+++ b/tools/libxc/xg_private.h
@@ -62,10 +62,13 @@ unsigned long csum_page (void * page);
 #define _PAGE_PSE       0x080
 #define _PAGE_GLOBAL    0x100
 
-#define L1_PAGETABLE_SHIFT_I386       12
-#define L2_PAGETABLE_SHIFT_I386       22
-#define L1_PAGETABLE_ENTRIES_I386   1024
-#define L2_PAGETABLE_ENTRIES_I386   1024
+#define VIRT_BITS_I386     32
+#define VIRT_BITS_X86_64   48
+
+#define PGTBL_LEVELS_I386       3
+#define PGTBL_LEVELS_X86_64     4
+
+#define PGTBL_LEVEL_SHIFT_X86   9
 
 #define L1_PAGETABLE_SHIFT_PAE        12
 #define L2_PAGETABLE_SHIFT_PAE        21
@@ -83,33 +86,7 @@ unsigned long csum_page (void * page);
 #define L3_PAGETABLE_ENTRIES_X86_64  512
 #define L4_PAGETABLE_ENTRIES_X86_64  512
 
-typedef uint32_t l1_pgentry_32_t;
-typedef uint32_t l2_pgentry_32_t;
-typedef uint64_t l1_pgentry_64_t;
-typedef uint64_t l2_pgentry_64_t;
-typedef uint64_t l3_pgentry_64_t;
-typedef uint64_t l4_pgentry_64_t;
-
-#define l1_table_offset_i386(_a) \
-  (((_a) >> L1_PAGETABLE_SHIFT_I386) & (L1_PAGETABLE_ENTRIES_I386 - 1))
-#define l2_table_offset_i386(_a) \
-  (((_a) >> L2_PAGETABLE_SHIFT_I386) & (L2_PAGETABLE_ENTRIES_I386 - 1))
-
-#define l1_table_offset_pae(_a) \
-  (((_a) >> L1_PAGETABLE_SHIFT_PAE) & (L1_PAGETABLE_ENTRIES_PAE - 1))
-#define l2_table_offset_pae(_a) \
-  (((_a) >> L2_PAGETABLE_SHIFT_PAE) & (L2_PAGETABLE_ENTRIES_PAE - 1))
-#define l3_table_offset_pae(_a) \
-  (((_a) >> L3_PAGETABLE_SHIFT_PAE) & (L3_PAGETABLE_ENTRIES_PAE - 1))
-
-#define l1_table_offset_x86_64(_a) \
-  (((_a) >> L1_PAGETABLE_SHIFT_X86_64) & (L1_PAGETABLE_ENTRIES_X86_64 - 1))
-#define l2_table_offset_x86_64(_a) \
-  (((_a) >> L2_PAGETABLE_SHIFT_X86_64) & (L2_PAGETABLE_ENTRIES_X86_64 - 1))
-#define l3_table_offset_x86_64(_a) \
-  (((_a) >> L3_PAGETABLE_SHIFT_X86_64) & (L3_PAGETABLE_ENTRIES_X86_64 - 1))
-#define l4_table_offset_x86_64(_a) \
-  (((_a) >> L4_PAGETABLE_SHIFT_X86_64) & (L4_PAGETABLE_ENTRIES_X86_64 - 1))
+typedef uint64_t x86_pgentry_t;
 
 #define PAGE_SHIFT_ARM          12
 #define PAGE_SIZE_ARM           (1UL << PAGE_SHIFT_ARM)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 9/9] libxc: create p2m list outside of kernel mapping if supported
  2015-11-05 14:36 [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (7 preceding siblings ...)
  2015-11-05 14:36 ` [PATCH v4 8/9] libxc: rework of domain builder's page table handler Juergen Gross
@ 2015-11-05 14:36 ` Juergen Gross
  2015-11-12 12:42   ` Wei Liu
  2015-11-12  5:09 ` [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
  9 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-11-05 14:36 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

In case the kernel of a new pv-domU indicates that it supports a p2m
list outside the initial kernel mapping by specifying INIT_P2M, let
the domain builder allocate the memory for the p2m list from guest
physical memory only and map it at the address the kernel expects.

This will enable loading pv-domUs larger than 512 GB.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/libxc/include/xc_dom.h |  1 +
 tools/libxc/xc_dom_core.c    | 15 +++++++++++-
 tools/libxc/xc_dom_x86.c     | 56 ++++++++++++++++++++++++++++++++++++++------
 3 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 7c157c3..ad8e47e 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -238,6 +238,7 @@ struct xc_dom_arch {
     char *native_protocol;
     int page_shift;
     int sizeof_pfn;
+    int p2m_base_supported;
     int arch_private_size;
 
     struct xc_dom_arch *next;
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index ad91b35..5d6c3ba 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -777,6 +777,7 @@ struct xc_dom_image *xc_dom_allocate(xc_interface *xch,
     dom->parms.virt_hypercall = UNSET_ADDR;
     dom->parms.virt_hv_start_low = UNSET_ADDR;
     dom->parms.elf_paddr_offset = UNSET_ADDR;
+    dom->parms.p2m_base = UNSET_ADDR;
 
     dom->alloc_malloc += sizeof(*dom);
     return dom;
@@ -1096,7 +1097,11 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     }
 
     /* allocate other pages */
-    if ( dom->arch_hooks->alloc_p2m_list &&
+    if ( !dom->arch_hooks->p2m_base_supported ||
+         dom->parms.p2m_base >= dom->parms.virt_base ||
+         (dom->parms.p2m_base & (XC_DOM_PAGE_SIZE(dom) - 1)) )
+        dom->parms.p2m_base = UNSET_ADDR;
+    if ( dom->arch_hooks->alloc_p2m_list && dom->parms.p2m_base == UNSET_ADDR &&
          dom->arch_hooks->alloc_p2m_list(dom) != 0 )
         goto err;
     if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 )
@@ -1124,6 +1129,14 @@ int xc_dom_build_image(struct xc_dom_image *dom)
         dom->initrd_len = page_size * dom->ramdisk_seg.pages;
     }
 
+    /* Allocate p2m list if outside of initial kernel mapping. */
+    if ( dom->arch_hooks->alloc_p2m_list && dom->parms.p2m_base != UNSET_ADDR )
+    {
+        if ( dom->arch_hooks->alloc_p2m_list(dom) != 0 )
+            goto err;
+        dom->p2m_seg.vstart = dom->parms.p2m_base;
+    }
+
     return 0;
 
  err:
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 497aa55..147468c 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -69,6 +69,7 @@
 #define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits))-1)
 #define round_down(addr, mask)   ((addr) & ~(mask))
 #define round_up(addr, mask)     ((addr) | (mask))
+#define round_pg_up(addr)  (((addr) + PAGE_SIZE_X86 - 1) & ~(PAGE_SIZE_X86 - 1))
 
 struct xc_dom_params {
     unsigned levels;
@@ -90,7 +91,7 @@ struct xc_dom_x86_mapping {
 
 struct xc_dom_image_x86 {
     unsigned n_mappings;
-#define MAPPING_MAX 1
+#define MAPPING_MAX 2
     struct xc_dom_x86_mapping maps[MAPPING_MAX];
     struct xc_dom_params *params;
 };
@@ -483,11 +484,8 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
 
 /* ------------------------------------------------------------------------ */
 
-static int alloc_p2m_list(struct xc_dom_image *dom)
+static int alloc_p2m_list(struct xc_dom_image *dom, size_t p2m_alloc_size)
 {
-    size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
-
-    /* allocate phys2mach table */
     if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach",
                               0, p2m_alloc_size) )
         return -1;
@@ -498,6 +496,40 @@ static int alloc_p2m_list(struct xc_dom_image *dom)
     return 0;
 }
 
+static int alloc_p2m_list_x86_32(struct xc_dom_image *dom)
+{
+    size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
+
+    p2m_alloc_size = round_pg_up(p2m_alloc_size);
+    return alloc_p2m_list(dom, p2m_alloc_size);
+}
+
+static int alloc_p2m_list_x86_64(struct xc_dom_image *dom)
+{
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
+    struct xc_dom_x86_mapping *map = domx86->maps + domx86->n_mappings;
+    size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
+    xen_vaddr_t from, to;
+    unsigned lvl;
+
+    p2m_alloc_size = round_pg_up(p2m_alloc_size);
+    if ( dom->parms.p2m_base != UNSET_ADDR )
+    {
+        from = dom->parms.p2m_base;
+        to = from + p2m_alloc_size - 1;
+        if ( count_pgtables(dom, from, to, dom->pfn_alloc_end) )
+            return -1;
+
+        map->area.pfn = dom->pfn_alloc_end;
+        for ( lvl = 0; lvl < 4; lvl++ )
+            map->lvls[lvl].pfn += p2m_alloc_size >> PAGE_SHIFT_X86;
+        domx86->n_mappings++;
+        p2m_alloc_size += map->area.pgtables << PAGE_SHIFT_X86;
+    }
+
+    return alloc_p2m_list(dom, p2m_alloc_size);
+}
+
 /* ------------------------------------------------------------------------ */
 
 static int alloc_magic_pages(struct xc_dom_image *dom)
@@ -716,6 +748,11 @@ static int start_info_x86_64(struct xc_dom_image *dom)
     start_info->pt_base = dom->pgtables_seg.vstart;
     start_info->nr_pt_frames = domx86->maps[0].area.pgtables;
     start_info->mfn_list = dom->p2m_seg.vstart;
+    if ( dom->parms.p2m_base != UNSET_ADDR )
+    {
+        start_info->first_p2m_pfn = dom->p2m_seg.pfn;
+        start_info->nr_p2m_frames = dom->p2m_seg.pages;
+    }
 
     start_info->flags = dom->flags;
     start_info->store_mfn = xc_dom_p2m(dom, dom->xenstore_pfn);
@@ -1600,7 +1637,10 @@ static int bootlate_pv(struct xc_dom_image *dom)
     if ( !xc_dom_feature_translated(dom) )
     {
         /* paravirtualized guest */
+
+        /* Drop references to all initial page tables before pinning. */
         xc_dom_unmap_one(dom, dom->pgtables_seg.pfn);
+        xc_dom_unmap_one(dom, dom->p2m_seg.pfn);
         rc = pin_table(dom->xch, pgd_type,
                        xc_dom_p2m(dom, dom->pgtables_seg.pfn),
                        dom->guest_domid);
@@ -1679,10 +1719,11 @@ static struct xc_dom_arch xc_dom_32_pae = {
     .native_protocol = XEN_IO_PROTO_ABI_X86_32,
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 4,
+    .p2m_base_supported = 0,
     .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_32_pae,
-    .alloc_p2m_list = alloc_p2m_list,
+    .alloc_p2m_list = alloc_p2m_list_x86_32,
     .setup_pgtables = setup_pgtables_x86_32_pae,
     .start_info = start_info_x86_32,
     .shared_info = shared_info_x86_32,
@@ -1697,10 +1738,11 @@ static struct xc_dom_arch xc_dom_64 = {
     .native_protocol = XEN_IO_PROTO_ABI_X86_64,
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 8,
+    .p2m_base_supported = 1,
     .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_64,
-    .alloc_p2m_list = alloc_p2m_list,
+    .alloc_p2m_list = alloc_p2m_list_x86_64,
     .setup_pgtables = setup_pgtables_x86_64,
     .start_info = start_info_x86_64,
     .shared_info = shared_info_x86_64,
-- 
2.1.4


* Re: [PATCH v4 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-11-05 14:36 ` [PATCH v4 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
@ 2015-11-05 17:38   ` Andrew Cooper
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2015-11-05 17:38 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, Ian.Campbell, ian.jackson,
	stefano.stabellini, wei.liu2, roger.pau
  Cc: keir, jbeulich

On 05/11/15 14:36, Juergen Gross wrote:
> Support of an unmapped initrd is indicated by the kernel of the domain
> via elf notes. To avoid having to use raw elf data in the tools to
> check for unmapped initrd support, add a flag to the parsed data area
> indicating that the kernel supports this feature.
>
> Switch using this flag in the hypervisor domain builder.
>
> Cc: andrew.cooper3@citrix.com
> Cc: jbeulich@suse.com
> Cc: keir@xen.org
> Suggested-by: Ian Campbell <ian.campbell@citrix.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v4 0/9] libxc: support building large pv-domains
  2015-11-05 14:36 [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (8 preceding siblings ...)
  2015-11-05 14:36 ` [PATCH v4 9/9] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
@ 2015-11-12  5:09 ` Juergen Gross
  2015-11-12  9:39   ` Wei Liu
  9 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-11-12  5:09 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau

On 05/11/15 15:36, Juergen Gross wrote:
> The Xen hypervisor supports starting a dom0 with large memory (up to
> the TB range) by not including the initrd and p2m list in the initial
> kernel mapping. Especially the p2m list can grow larger than the
> available virtual space in the initial mapping.
> 
> The started kernel is indicating the support of each feature via
> elf notes.
> 
> This series enables the domain builder in libxc to do the same as the
> hypervisor. This enables starting of huge pv-domUs via xl.
> 
> Unmapped initrd is supported for 64 and 32 bit domains, omitting the
> p2m from initial kernel mapping is possible for 64 bit domains only.
> 
> Tested with:
> - 32 bit domU (kernel not supporting unmapped initrd)
> - 32 bit domU (kernel supporting unmapped initrd)
> - 1 GB 64 bit domU (kernel supporting unmapped initrd, not p2m)
> - 1 GB 64 bit domU (kernel supporting unmapped initrd and p2m)
> - 900GB 64 bit domU (kernel supporting unmapped initrd and p2m)
> - HVM domU
> 
> Changes in v4:
> - updated patch 1 as suggested by Wei Liu (comment and variable name)
> - modify comment in patch 6 as suggested by Wei Liu
> - rework of patch 8 reducing line count by nearly 100
> - added some additional plausibility checks to patch 8 as suggested by
>   Wei Liu
> - renamed round_pg() to round_pg_up() in patch 9 as suggested by Wei Liu
> 
> Changes in v3:
> - Rebased the complete series to new staging (hvm builder patches by
>   Roger Pau Monne)
> - Removed old patch 1 as it broke stubdom build
> - Introduced new Patch 1 to make allocation of guest memory more clear
>   regarding virtual/physical memory allocation (requested by Ian Campbell)
> - Change name of flag to indicate support of unmapped initrd in patch 2
>   (requested by Ian Campbell)
> - Introduce new patches 3, 4, 5 ("rename domain builder count_pgtables to
>   alloc_pgtables", "introduce domain builder architecture specific data",
>   "use domain builder architecture private data for x86 pv domains") to
>   assist later page table work
> - don't fiddle with initrd virtual address in patch 6 (was patch 3 in v2),
>   add explicit initrd parameters for start_info in struct xc_dom_image
>   instead (requested by Ian Campbell)
> - Introduce new patch 8 ("rework of domain builder's page table handler")
>   to be able to use common helpers for unmapped p2m list (requested by
>   Ian Campbell)
> - use now known segment size in pages for p2m list in patch 9 (was patch
>   5 in v2) instead of fiddling with segment end address (requested by
>   Ian Campbell)
> - split alloc_p2m_list() in patch 9 (was patch 5 in v2) to 32/64 bit
>   variants (requested by Ian Campbell)
> 
> Changes in v2:
> - patch 2 has been removed as it has been applied already
> - introduced new patch 2 as suggested by Ian Campbell: add a flag
>   indicating support of an unmapped initrd to the parsed elf data of
>   the elf_dom_parms structure
> - updated patch description of patch 3 as requested by Ian Campbell
> 
> 
> Juergen Gross (9):
>   libxc: reorganize domain builder guest memory allocator
>   xen: add generic flag to elf_dom_parms indicating support of unmapped
>     initrd
>   libxc: rename domain builder count_pgtables to alloc_pgtables
>   libxc: introduce domain builder architecture specific data
>   libxc: use domain builder architecture private data for x86 pv domains
>   libxc: create unmapped initrd in domain builder if supported
>   libxc: split p2m allocation in domain builder from other magic pages
>   libxc: rework of domain builder's page table handler
>   libxc: create p2m list outside of kernel mapping if supported
> 
>  stubdom/grub/kexec.c               |  12 +-
>  tools/libxc/include/xc_dom.h       |  34 +--
>  tools/libxc/xc_dom_arm.c           |   6 +-
>  tools/libxc/xc_dom_core.c          | 180 ++++++++----
>  tools/libxc/xc_dom_x86.c           | 563 +++++++++++++++++++++----------------
>  tools/libxc/xg_private.h           |  39 +--
>  xen/arch/x86/domain_build.c        |   4 +-
>  xen/common/libelf/libelf-dominfo.c |   3 +
>  xen/include/xen/libelf.h           |   1 +
>  9 files changed, 490 insertions(+), 352 deletions(-)

Could someone please comment on patches 1, 8 and 9?


Juergen


* Re: [PATCH v4 0/9] libxc: support building large pv-domains
  2015-11-12  5:09 ` [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
@ 2015-11-12  9:39   ` Wei Liu
  2015-11-12  9:41     ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2015-11-12  9:39 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Thu, Nov 12, 2015 at 06:09:54AM +0100, Juergen Gross wrote:
> On 05/11/15 15:36, Juergen Gross wrote:
> > The Xen hypervisor supports starting a dom0 with large memory (up to
> > the TB range) by not including the initrd and p2m list in the initial
> > kernel mapping. Especially the p2m list can grow larger than the
> > available virtual space in the initial mapping.
> > 
> > [...]
> 
> Could someone please comment on patches 1, 8 and 9?
> 

It's on my radar. I haven't got to this yet. I will review this series
at some point.

Wei.

> 
> Juergen
> 


* Re: [PATCH v4 0/9] libxc: support building large pv-domains
  2015-11-12  9:39   ` Wei Liu
@ 2015-11-12  9:41     ` Juergen Gross
  0 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2015-11-12  9:41 UTC (permalink / raw)
  To: Wei Liu
  Cc: roger.pau, stefano.stabellini, ian.jackson, Ian.Campbell, xen-devel

On 12/11/15 10:39, Wei Liu wrote:
> On Thu, Nov 12, 2015 at 06:09:54AM +0100, Juergen Gross wrote:
>> On 05/11/15 15:36, Juergen Gross wrote:
>>> The Xen hypervisor supports starting a dom0 with large memory (up to
>>> the TB range) by not including the initrd and p2m list in the initial
>>> kernel mapping. Especially the p2m list can grow larger than the
>>> available virtual space in the initial mapping.
>>>
>>> [...]
>>
>> Could someone please comment on patches 1, 8 and 9?
>>
> 
> It's on my radar. I haven't got to this yet. I will review this series
> at some point.

Thanks.


Juergen


* Re: [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator
  2015-11-05 14:36 ` [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator Juergen Gross
@ 2015-11-12 11:14   ` Wei Liu
  2015-11-12 11:20     ` Ian Campbell
  2015-11-12 11:21     ` Juergen Gross
  0 siblings, 2 replies; 25+ messages in thread
From: Wei Liu @ 2015-11-12 11:14 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Thu, Nov 05, 2015 at 03:36:27PM +0100, Juergen Gross wrote:
> Guest memory allocation in the domain builder of libxc is done via
> virtual addresses only. In order to be able to support preallocated
> areas not virtually mapped, reorganize the memory allocator to keep
> track of allocated pages globally and in allocated segments.
> 
> This requires an interface change of the allocate callback of the
> domain builder which currently is using the last mapped virtual
> address as a parameter. This is no problem as the only user of this
> callback is stubdom/grub/kexec.c using this virtual address to
> calculate the last used pfn.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

One question: did you test building with pygrub after this change? I
think the code is correct but we'd better be sure it doesn't break
pygrub.

Wei.


* Re: [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator
  2015-11-12 11:14   ` Wei Liu
@ 2015-11-12 11:20     ` Ian Campbell
  2015-11-12 11:22       ` Wei Liu
  2015-11-12 11:29       ` Juergen Gross
  2015-11-12 11:21     ` Juergen Gross
  1 sibling, 2 replies; 25+ messages in thread
From: Ian Campbell @ 2015-11-12 11:20 UTC (permalink / raw)
  To: Wei Liu, Juergen Gross
  Cc: stefano.stabellini, ian.jackson, roger.pau, xen-devel

On Thu, 2015-11-12 at 11:14 +0000, Wei Liu wrote:
> On Thu, Nov 05, 2015 at 03:36:27PM +0100, Juergen Gross wrote:
> > Guest memory allocation in the domain builder of libxc is done via
> > virtual addresses only. In order to be able to support preallocated
> > areas not virtually mapped, reorganize the memory allocator to keep
> > track of allocated pages globally and in allocated segments.
> > 
> > This requires an interface change of the allocate callback of the
> > domain builder which currently is using the last mapped virtual
> > address as a parameter. This is no problem as the only user of this
> > callback is stubdom/grub/kexec.c using this virtual address to
> > calculate the last used pfn.
> > 
> > Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> One question: did you test building with pygrub after this change? I
> think the code is correct but we'd better be sure it doesn't break
> pygrub.

Did you mean pvgrub? (with a Vee)

pygrub (with a Why) runs way before the libxc domain builder gets in on the
act, doesn't it.

Ian.


* Re: [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator
  2015-11-12 11:14   ` Wei Liu
  2015-11-12 11:20     ` Ian Campbell
@ 2015-11-12 11:21     ` Juergen Gross
  2015-11-12 11:24       ` Wei Liu
  1 sibling, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-11-12 11:21 UTC (permalink / raw)
  To: Wei Liu
  Cc: roger.pau, stefano.stabellini, ian.jackson, Ian.Campbell, xen-devel

On 12/11/15 12:14, Wei Liu wrote:
> On Thu, Nov 05, 2015 at 03:36:27PM +0100, Juergen Gross wrote:
>> Guest memory allocation in the domain builder of libxc is done via
>> virtual addresses only. In order to be able to support preallocated
>> areas not virtually mapped, reorganize the memory allocator to keep
>> track of allocated pages globally and in allocated segments.
>>
>> This requires an interface change of the allocate callback of the
>> domain builder which currently is using the last mapped virtual
>> address as a parameter. This is no problem as the only user of this
>> callback is stubdom/grub/kexec.c using this virtual address to
>> calculate the last used pfn.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> One question: did you test building with pygrub after this change? I
> think the code is correct but we'd better be sure it doesn't break
> pygrub.

Yes, pygrub is being built in my environment.


Juergen


* Re: [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator
  2015-11-12 11:20     ` Ian Campbell
@ 2015-11-12 11:22       ` Wei Liu
  2015-11-12 11:29       ` Juergen Gross
  1 sibling, 0 replies; 25+ messages in thread
From: Wei Liu @ 2015-11-12 11:22 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Juergen Gross, Wei Liu, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Thu, Nov 12, 2015 at 11:20:47AM +0000, Ian Campbell wrote:
> On Thu, 2015-11-12 at 11:14 +0000, Wei Liu wrote:
> > On Thu, Nov 05, 2015 at 03:36:27PM +0100, Juergen Gross wrote:
> > > Guest memory allocation in the domain builder of libxc is done via
> > > virtual addresses only. In order to be able to support preallocated
> > > areas not virtually mapped, reorganize the memory allocator to keep
> > > track of allocated pages globally and in allocated segments.
> > > 
> > > This requires an interface change of the allocate callback of the
> > > domain builder which currently is using the last mapped virtual
> > > address as a parameter. This is no problem as the only user of this
> > > callback is stubdom/grub/kexec.c using this virtual address to
> > > calculate the last used pfn.
> > > 
> > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > 
> > One question: did you test building with pygrub after this change? I
> > think the code is correct but we'd better be sure it doesn't break
> > pygrub.
> 
> Did you mean pvgrub? (with a Vee)
> 

Yes. p*V*grub. Sorry for the typo. :-)

> pygrub (with a Why) runs way before the libxc domain builder gets in on the
> act, doesn't it.
> 
> Ian.


* Re: [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator
  2015-11-12 11:21     ` Juergen Gross
@ 2015-11-12 11:24       ` Wei Liu
  0 siblings, 0 replies; 25+ messages in thread
From: Wei Liu @ 2015-11-12 11:24 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Wei Liu, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Thu, Nov 12, 2015 at 12:21:24PM +0100, Juergen Gross wrote:
> On 12/11/15 12:14, Wei Liu wrote:
> > On Thu, Nov 05, 2015 at 03:36:27PM +0100, Juergen Gross wrote:
> >> Guest memory allocation in the domain builder of libxc is done via
> >> virtual addresses only. In order to be able to support preallocated
> >> areas not virtually mapped, reorganize the memory allocator to keep
> >> track of allocated pages globally and in allocated segments.
> >>
> >> This requires an interface change of the allocate callback of the
> >> domain builder which currently is using the last mapped virtual
> >> address as a parameter. This is no problem as the only user of this
> >> callback is stubdom/grub/kexec.c using this virtual address to
> >> calculate the last used pfn.
> >>
> >> Signed-off-by: Juergen Gross <jgross@suse.com>
> > 
> > One question: did you test building with pygrub after this change? I
> > think the code is correct but we'd better be sure it doesn't break
> > pygrub.
> 
> Yes, pygrub is being built in my environment.
> 

Sorry. I meant pvgrub.

You should be able to test it according to

  http://wiki.xenproject.org/wiki/PvGrub

The test is to make sure the change to stubdom/grub/* doesn't break
things.

Wei.

> 
> Juergen
> 


* Re: [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator
  2015-11-12 11:20     ` Ian Campbell
  2015-11-12 11:22       ` Wei Liu
@ 2015-11-12 11:29       ` Juergen Gross
  2015-11-12 11:32         ` Wei Liu
  1 sibling, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-11-12 11:29 UTC (permalink / raw)
  To: Ian Campbell, Wei Liu
  Cc: stefano.stabellini, ian.jackson, roger.pau, xen-devel

On 12/11/15 12:20, Ian Campbell wrote:
> On Thu, 2015-11-12 at 11:14 +0000, Wei Liu wrote:
>> On Thu, Nov 05, 2015 at 03:36:27PM +0100, Juergen Gross wrote:
>>> Guest memory allocation in the domain builder of libxc is done via
>>> virtual addresses only. In order to be able to support preallocated
>>> areas not virtually mapped, reorganize the memory allocator to keep
>>> track of allocated pages globally and in allocated segments.
>>>
>>> This requires an interface change of the allocate callback of the
>>> domain builder which currently is using the last mapped virtual
>>> address as a parameter. This is no problem as the only user of this
>>> callback is stubdom/grub/kexec.c using this virtual address to
>>> calculate the last used pfn.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> One question: did you test building with pygrub after this change? I
>> think the code is correct but we'd better be sure it doesn't break
>> pygrub.
> 
> Did you mean pvgrub? (with a Vee)
> 
> pygrub (with a Why) runs way before the libxc domain builder gets in on the
> act, doesn't it.

Yep.

To be clear: we have:

- pygrub: runs in dom0, mounts the root disk and analyzes the grub
  configuration found there, selects kernel and initrd and loads those
  via libxl.

- pvgrub: based on grub2, is started via libxl as a paravirtualized
  kernel in the new domain. pvgrub reads the grub configuration and
  starts the appropriate kernel via its own mechanisms (patches for
  supporting huge domains have been sent, no comments so far).

- stubdom-based grub: subject to the patch we are discussing here. It
  is being built in my environment, but not tested so far.


Juergen


* Re: [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator
  2015-11-12 11:29       ` Juergen Gross
@ 2015-11-12 11:32         ` Wei Liu
  2015-11-12 12:28           ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2015-11-12 11:32 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Wei Liu, Ian Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Thu, Nov 12, 2015 at 12:29:06PM +0100, Juergen Gross wrote:
> On 12/11/15 12:20, Ian Campbell wrote:
> > On Thu, 2015-11-12 at 11:14 +0000, Wei Liu wrote:
> >> On Thu, Nov 05, 2015 at 03:36:27PM +0100, Juergen Gross wrote:
> >>> Guest memory allocation in the domain builder of libxc is done via
> >>> virtual addresses only. In order to be able to support preallocated
> >>> areas not virtually mapped, reorganize the memory allocator to keep
> >>> track of allocated pages globally and in allocated segments.
> >>>
> >>> This requires an interface change of the allocate callback of the
> >>> domain builder which currently is using the last mapped virtual
> >>> address as a parameter. This is no problem as the only user of this
> >>> callback is stubdom/grub/kexec.c using this virtual address to
> >>> calculate the last used pfn.
> >>>
> >>> Signed-off-by: Juergen Gross <jgross@suse.com>
> >>
> >> One question: did you test building with pygrub after this change? I
> >> think the code is correct but we'd better be sure it doesn't break
> >> pygrub.
> > 
> > Did you mean pvgrub? (with a Vee)
> > 
> > pygrub (with a Why) runs way before the libxc domain builder gets in on the
> > act, doesn't it.
> 
> Yep.
> 
> To be clear: we have:
> 
> - pygrub: runs in dom0, mounts the root disk and analyzes the grub
>   configuration found there, selects kernel and initrd and loads those
>   via libxl.
> 
> - pvgrub: based on grub2, is started via libxl as a paravirtualized
>   kernel in the new domain. pvgrub reads the grub configuration and
>   starts the appropriate kernel via its own mechanisms (patches for
>   supporting huge domains have been sent, no comments so far).
> 
> - stubdom-based grub: subject to the patch we are discussing here. It
>   is being built in my environment, but not tested so far.
> 
> 

I think we use pvgrub to denote the third case (according to our wiki).
> The second case is just ... upstream grub2 with PV capability?

(So many grubs...)

> Juergen
> 


* Re: [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator
  2015-11-12 11:32         ` Wei Liu
@ 2015-11-12 12:28           ` Juergen Gross
  0 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2015-11-12 12:28 UTC (permalink / raw)
  To: Wei Liu
  Cc: roger.pau, xen-devel, ian.jackson, Ian Campbell, stefano.stabellini

On 12/11/15 12:32, Wei Liu wrote:
> On Thu, Nov 12, 2015 at 12:29:06PM +0100, Juergen Gross wrote:
>> On 12/11/15 12:20, Ian Campbell wrote:
>>> On Thu, 2015-11-12 at 11:14 +0000, Wei Liu wrote:
>>>> On Thu, Nov 05, 2015 at 03:36:27PM +0100, Juergen Gross wrote:
>>>>> Guest memory allocation in the domain builder of libxc is done via
>>>>> virtual addresses only. In order to be able to support preallocated
> >>>>> areas not virtually mapped, reorganize the memory allocator to keep
>>>>> track of allocated pages globally and in allocated segments.
>>>>>
>>>>> This requires an interface change of the allocate callback of the
>>>>> domain builder which currently is using the last mapped virtual
>>>>> address as a parameter. This is no problem as the only user of this
>>>>> callback is stubdom/grub/kexec.c using this virtual address to
>>>>> calculate the last used pfn.
>>>>>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>
>>>> One question: did you test building with pygrub after this change? I
>>>> think the code is correct but we'd better be sure it doesn't break
>>>> pygrub.
>>>
>>> Did you mean pvgrub? (with a Vee)
>>>
>>> pygrub (with a Why) runs way before the libxc domain builder gets in on the
>>> act, doesn't it.
>>
>> Yep.
>>
>> To be clear: we have:
>>
>> - pygrub: runs in dom0, mounts the root disk and analyzes the grub
>>   configuration found there, selects kernel and initrd and loads those
>>   via libxl.
>>
>> - pvgrub: based on grub2, is started via libxl as a paravirtualized
>>   kernel in the new domain. pvgrub reads the grub configuration and
>>   starts the appropriate kernel via its own mechanisms (patches for
>>   supporting huge domains have been sent, no comments so far).
>>
>> - stubdom-based grub: subject to the patch we are discussing here. It
>>   is being built in my environment, but not tested so far.
>>
>>
> 
> I think we use pvgrub to denote the third case (according to our wiki).
> The second case is just ... upstream grub2 with pv capability ?

Urgs. Okay. Nevertheless: I've built it. :-)

> (So many grubs...)

I'm particularly happy you didn't come up with a variant I wasn't aware
of. :-)


Juergen


* Re: [PATCH v4 8/9] libxc: rework of domain builder's page table handler
  2015-11-05 14:36 ` [PATCH v4 8/9] libxc: rework of domain builder's page table handler Juergen Gross
@ 2015-11-12 12:39   ` Wei Liu
  2015-11-12 12:45     ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2015-11-12 12:39 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Thu, Nov 05, 2015 at 03:36:34PM +0100, Juergen Gross wrote:
> In order to prepare for a p2m list outside of the initial kernel
> mapping, rework the domain builder's page table handler. The goal is
> to be able to use common helpers for page table allocation and setup
> both for the initial kernel page tables and for the page tables
> mapping the p2m list. This is achieved by supporting multiple mapping
> areas. The mapped virtual addresses of the individual areas must not
> overlap, while the page tables of a newly added area might already be
> partially present. In particular, the top-level page table exists
> only once, of course.
> 
> Currently the number of mappings is restricted to 1 because the only
> mapping for now is the initial mapping created by the toolstack. No
> behaviour change or guest-visible change should be introduced.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

Some comments below.

[...]
>  
> -static unsigned long
> -nr_page_tables(struct xc_dom_image *dom,
> -               xen_vaddr_t start, xen_vaddr_t end, unsigned long bits)
> +static int count_pgtables(struct xc_dom_image *dom, xen_vaddr_t from,
> +                          xen_vaddr_t to, xen_pfn_t pfn)
>  {
> -    xen_vaddr_t mask = bits_to_mask(bits);
> -    int tables;
> +    struct xc_dom_image_x86 *domx86 = dom->arch_private;
> +    struct xc_dom_x86_mapping *map, *map_cmp;
> +    xen_pfn_t pfn_end;
> +    xen_vaddr_t mask;
> +    unsigned bits;
> +    int l, m;
>  
> -    if ( bits == 0 )
> -        return 0;  /* unused */
> +    if ( domx86->n_mappings == MAPPING_MAX )
> +    {
> +        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
> +                     "%s: too many mappings\n", __FUNCTION__);
> +        return -ENOMEM;
> +    }
> +    map = domx86->maps + domx86->n_mappings;
>  
> -    if ( bits == (8 * sizeof(unsigned long)) )
> +    pfn_end = pfn + ((to - from) >> PAGE_SHIFT_X86);
> +    if ( pfn_end >= dom->p2m_size )
>      {
> -        /* must be pgd, need one */
> -        start = 0;
> -        end = -1;
> -        tables = 1;
> +        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
> +                     "%s: not enough memory for initial mapping (%#"PRIpfn" > %#"PRIpfn")",
> +                     __FUNCTION__, pfn_end, dom->p2m_size);
> +        return -ENOMEM;
>      }
> -    else
> +    for ( m = 0; m < domx86->n_mappings; m++ )
>      {
> -        start = round_down(start, mask);
> -        end = round_up(end, mask);
> -        tables = ((end - start) >> bits) + 1;
> +        map_cmp = domx86->maps + m;
> +        if ( from < map_cmp->area.to && to > map_cmp->area.from )
> +        {
> +            xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
> +                         "%s: overlapping mappings\n", __FUNCTION__);
> +            return -1;

Please use -EINVAL.

> +        }
>      }
>  
> -    DOMPRINTF("%s: 0x%016" PRIx64 "/%ld: 0x%016" PRIx64
> -              " -> 0x%016" PRIx64 ", %d table(s)",
> -              __FUNCTION__, mask, bits, start, end, tables);
> -    return tables;
> +    memset(map, 0, sizeof(*map));
> +    map->area.from = from & domx86->params->vaddr_mask;
> +    map->area.to = to & domx86->params->vaddr_mask;
> +
> +    for ( l = domx86->params->levels - 1; l >= 0; l-- )
> +    {
> +        map->lvls[l].pfn = pfn + map->area.pgtables;
> +        if ( l == domx86->params->levels - 1 )
> +        {
> +            if ( domx86->n_mappings == 0 )
> +            {
> +                map->lvls[l].from = 0;
> +                map->lvls[l].to = domx86->params->vaddr_mask;
> +                map->lvls[l].pgtables = 1;
> +                map->area.pgtables++;
> +            }
> +            continue;
> +        }

Could use some comments before this hunk. Like "only one top-level
mapping is required".

> +
> +        bits = PAGE_SHIFT_X86 + (l + 1) * PGTBL_LEVEL_SHIFT_X86;
> +        mask = bits_to_mask(bits);
> +        map->lvls[l].from = map->area.from & ~mask;
> +        map->lvls[l].to = map->area.to | mask;
> +
> +        if ( domx86->params->levels == 3 && domx86->n_mappings == 0 &&

Please use PGTBL_LEVELS_I386.

> +             to < 0xc0000000 && l == 1 )
> +        {
> +            DOMPRINTF("%s: PAE: extra l2 page table for l3#3", __FUNCTION__);
> +            map->lvls[l].to = domx86->params->vaddr_mask;
> +        }
> +
> +        for ( m = 0; m < domx86->n_mappings; m++ )
> +        {
> +            map_cmp = domx86->maps + m;
> +            if ( map_cmp->lvls[l].from == map_cmp->lvls[l].to )
> +                continue;
> +            if ( map->lvls[l].from >= map_cmp->lvls[l].from &&
> +                 map->lvls[l].to <= map_cmp->lvls[l].to )
> +            {
> +                map->lvls[l].from = 0;
> +                map->lvls[l].to = 0;
> +                break;
> +            }
> +            assert(map->lvls[l].from >= map_cmp->lvls[l].from ||
> +                   map->lvls[l].to <= map_cmp->lvls[l].to);
> +            if ( map->lvls[l].from >= map_cmp->lvls[l].from &&
> +                 map->lvls[l].from <= map_cmp->lvls[l].to )
> +                map->lvls[l].from = map_cmp->lvls[l].to + 1;
> +            if ( map->lvls[l].to >= map_cmp->lvls[l].from &&
> +                 map->lvls[l].to <= map_cmp->lvls[l].to )
> +                map->lvls[l].to = map_cmp->lvls[l].from - 1;
> +        }
> +        if ( map->lvls[l].from < map->lvls[l].to )
> +            map->lvls[l].pgtables =
> +                ((map->lvls[l].to - map->lvls[l].from) >> bits) + 1;
> +        DOMPRINTF("%s: 0x%016" PRIx64 "/%d: 0x%016" PRIx64 " -> 0x%016" PRIx64
> +                  ", %d table(s)", __FUNCTION__, mask, bits,
> +                  map->lvls[l].from, map->lvls[l].to, map->lvls[l].pgtables);
> +        map->area.pgtables += map->lvls[l].pgtables;
> +    }
> +
> +    return 0;
>  }


* Re: [PATCH v4 9/9] libxc: create p2m list outside of kernel mapping if supported
  2015-11-05 14:36 ` [PATCH v4 9/9] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
@ 2015-11-12 12:42   ` Wei Liu
  0 siblings, 0 replies; 25+ messages in thread
From: Wei Liu @ 2015-11-12 12:42 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Thu, Nov 05, 2015 at 03:36:35PM +0100, Juergen Gross wrote:
> In case the kernel of a new pv-domU indicates support for a p2m list
> outside the initial kernel mapping by specifying INIT_P2M, let the
> domain builder allocate the memory for the p2m list from physical
> guest memory only and map it at the address the kernel expects.
> 
> This will enable loading pv-domUs larger than 512 GB.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH v4 8/9] libxc: rework of domain builder's page table handler
  2015-11-12 12:39   ` Wei Liu
@ 2015-11-12 12:45     ` Juergen Gross
  0 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2015-11-12 12:45 UTC (permalink / raw)
  To: Wei Liu
  Cc: roger.pau, stefano.stabellini, ian.jackson, Ian.Campbell, xen-devel

On 12/11/15 13:39, Wei Liu wrote:
> On Thu, Nov 05, 2015 at 03:36:34PM +0100, Juergen Gross wrote:
>> In order to prepare for a p2m list outside of the initial kernel
>> mapping, rework the domain builder's page table handler. The goal is
>> to be able to use common helpers for page table allocation and setup
>> both for the initial kernel page tables and for the page tables
>> mapping the p2m list. This is achieved by supporting multiple mapping
>> areas. The mapped virtual addresses of the individual areas must not
>> overlap, while the page tables of a newly added area might already be
>> partially present. In particular, the top-level page table exists
>> only once, of course.
>>
>> Currently the number of mappings is restricted to 1 because the only
>> mapping for now is the initial mapping created by the toolstack. No
>> behaviour change or guest-visible change should be introduced.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
> 
> Some comments below.

Okay, will fix them all.


Thanks,

Juergen


end of thread, other threads:[~2015-11-12 12:45 UTC | newest]

Thread overview: 25+ messages
2015-11-05 14:36 [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
2015-11-05 14:36 ` [PATCH v4 1/9] libxc: reorganize domain builder guest memory allocator Juergen Gross
2015-11-12 11:14   ` Wei Liu
2015-11-12 11:20     ` Ian Campbell
2015-11-12 11:22       ` Wei Liu
2015-11-12 11:29       ` Juergen Gross
2015-11-12 11:32         ` Wei Liu
2015-11-12 12:28           ` Juergen Gross
2015-11-12 11:21     ` Juergen Gross
2015-11-12 11:24       ` Wei Liu
2015-11-05 14:36 ` [PATCH v4 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
2015-11-05 17:38   ` Andrew Cooper
2015-11-05 14:36 ` [PATCH v4 3/9] libxc: rename domain builder count_pgtables to alloc_pgtables Juergen Gross
2015-11-05 14:36 ` [PATCH v4 4/9] libxc: introduce domain builder architecture specific data Juergen Gross
2015-11-05 14:36 ` [PATCH v4 5/9] libxc: use domain builder architecture private data for x86 pv domains Juergen Gross
2015-11-05 14:36 ` [PATCH v4 6/9] libxc: create unmapped initrd in domain builder if supported Juergen Gross
2015-11-05 14:36 ` [PATCH v4 7/9] libxc: split p2m allocation in domain builder from other magic pages Juergen Gross
2015-11-05 14:36 ` [PATCH v4 8/9] libxc: rework of domain builder's page table handler Juergen Gross
2015-11-12 12:39   ` Wei Liu
2015-11-12 12:45     ` Juergen Gross
2015-11-05 14:36 ` [PATCH v4 9/9] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
2015-11-12 12:42   ` Wei Liu
2015-11-12  5:09 ` [PATCH v4 0/9] libxc: support building large pv-domains Juergen Gross
2015-11-12  9:39   ` Wei Liu
2015-11-12  9:41     ` Juergen Gross
