* [PATCH v7 0/7] Initial PVHv2 Dom0 support
From: Roger Pau Monne @ 2017-02-22 14:24 UTC
  To: xen-devel; +Cc: boris.ostrovsky

Hello,

The main differences from previous versions are the split of the libelf and
bzimage changes into separate patches and the rebase on top of staging, which
already contains Jan's patch for the VM86 TSS issue.

The full series can also be found on a git branch in my personal git repo:

git://xenbits.xen.org/people/royger/xen.git dom0_hvm_v7

Thanks, Roger.



* [PATCH v7 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests
From: Roger Pau Monne @ 2017-02-22 14:24 UTC
  To: xen-devel; +Cc: Andrew Cooper, boris.ostrovsky, Roger Pau Monne, Jan Beulich

PVHv2 guests, unlike HVM guests, won't have the option to route interrupts
from physical or emulated devices over event channels using PIRQs. This
applies to both DomU and Dom0 PVHv2 guests.

Introduce a new XEN_X86_EMU_USE_PIRQ flag to notify Xen of whether an HVM
guest can route physical interrupts (even from emulated devices) over event
channels, and is thus allowed to use some of the PHYSDEV ops.
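
For illustration only: a guest can check whether Xen advertises PIRQ routing
by querying the feature bitmap that this patch filters. A minimal sketch,
assuming a Linux-style HYPERVISOR_xen_version hypercall wrapper:

    #include <xen/interface/version.h>   /* XENVER_get_features */
    #include <xen/interface/features.h>  /* XENFEAT_hvm_pirqs */

    /* Sketch: true iff Xen advertises PIRQ routing to this guest. */
    static bool xen_has_hvm_pirqs(void)
    {
        struct xen_feature_info fi = {
            .submap_idx = XENFEAT_hvm_pirqs / 32,
        };

        if ( HYPERVISOR_xen_version(XENVER_get_features, &fi) )
            return false;

        return fi.submap & (1U << (XENFEAT_hvm_pirqs % 32));
    }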

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v6:
 - Rebase on top of the HVM hypercall changes (and drop Andrew's RB).

Changes since v5:
 - Introduce a has_pirq macro to match other XEN_X86_EMU_ options, and simplify
   some of the code.

Changes since v3:
 - Update docs.

Changes since v2:
 - Change local variable name to currd instead of d.
 - Use currd where it makes sense.
---
 docs/misc/hvmlite.markdown        | 20 ++++++++++++++++++++
 xen/arch/x86/hvm/hypercall.c      |  2 ++
 xen/arch/x86/physdev.c            |  4 ++--
 xen/common/kernel.c               |  2 +-
 xen/include/asm-x86/domain.h      |  2 ++
 xen/include/public/arch-x86/xen.h |  4 +++-
 6 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/docs/misc/hvmlite.markdown b/docs/misc/hvmlite.markdown
index 898b8ee..b2557f7 100644
--- a/docs/misc/hvmlite.markdown
+++ b/docs/misc/hvmlite.markdown
@@ -75,3 +75,23 @@ info structure that's passed at boot time (field rsdp_paddr).
 
 Description of paravirtualized devices will come from XenStore, just as it's
 done for HVM guests.
+
+## Interrupts ##
+
+### Interrupts from physical devices ###
+
+Interrupts from physical devices are delivered using native methods; this is
+done in order to take advantage of new hardware-assisted virtualization
+features, like posted interrupts. This implies that PVHv2 guests with physical
+devices will also have the necessary interrupt controllers in order to manage
+the delivery of interrupts from those devices, using the same interfaces that
+are available on native hardware.
+
+### Interrupts from paravirtualized devices ###
+
+Interrupts from paravirtualized devices are delivered using event channels; see
+[Event Channel Internals][event_channels] for more detailed information about
+event channels. Delivery of those interrupts can be configured in the same way
+as HVM guests, check xen/include/public/hvm/params.h and
+xen/include/public/hvm/hvm_op.h for more information about available delivery
+methods.
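
As an example of the last paragraph of the doc text above: delivery via a
LAPIC vector is configured by setting HVM_PARAM_CALLBACK_IRQ. A minimal
guest-side sketch, assuming a Linux-style HYPERVISOR_hvm_op wrapper (the type
encoding is the one described in params.h):

    #include <xen/interface/hvm/hvm_op.h>  /* HVMOP_set_param */
    #include <xen/interface/hvm/params.h>  /* HVM_PARAM_CALLBACK_IRQ */

    /* Sketch: deliver event channel upcalls through a LAPIC vector. */
    static int xen_set_callback_vector(uint8_t vector)
    {
        struct xen_hvm_param p = {
            .domid = DOMID_SELF,
            .index = HVM_PARAM_CALLBACK_IRQ,
            /* Type 2 in bits 63:56 selects "vector" delivery. */
            .value = (2ULL << 56) | vector,
        };

        return HYPERVISOR_hvm_op(HVMOP_set_param, &p);
    }
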
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0f7c310..6499caa 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,8 @@ static long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
+        if ( !has_pirq(curr->domain) && !is_pvh_vcpu(curr) )
+            return -ENOSYS;
         break;
     }
 
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 018f8b5..fc45bfb 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -93,7 +93,7 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
     int pirq, irq, ret = 0;
     void *map_data = NULL;
 
-    if ( domid == DOMID_SELF && is_hvm_domain(d) )
+    if ( domid == DOMID_SELF && is_hvm_domain(d) && has_pirq(d) )
     {
         /*
          * Only makes sense for vector-based callback, else HVM-IRQ logic
@@ -264,7 +264,7 @@ int physdev_unmap_pirq(domid_t domid, int pirq)
     if ( ret )
         goto free_domain;
 
-    if ( is_hvm_domain(d) )
+    if ( is_hvm_domain(d) && has_pirq(d) )
     {
         spin_lock(&d->event_lock);
         if ( domain_pirq_to_emuirq(d, pirq) != IRQ_UNBOUND )
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index d0edb13..4b87c60 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -332,7 +332,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             case guest_type_hvm:
                 fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
                              (1U << XENFEAT_hvm_callback_vector) |
-                             (1U << XENFEAT_hvm_pirqs);
+                             (has_pirq(d) ? (1U << XENFEAT_hvm_pirqs) : 0);
                 break;
             }
 #endif
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 09232bf..d8749df 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -422,6 +422,8 @@ struct arch_domain
 #define has_vvga(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_VGA))
 #define has_viommu(d)      (!!((d)->arch.emulation_flags & XEN_X86_EMU_IOMMU))
 #define has_vpit(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_PIT))
+#define has_pirq(d)        (!!((d)->arch.emulation_flags & \
+                            XEN_X86_EMU_USE_PIRQ))
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
 
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index 363c8cc..73c829a 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -293,12 +293,14 @@ struct xen_arch_domainconfig {
 #define XEN_X86_EMU_IOMMU           (1U<<_XEN_X86_EMU_IOMMU)
 #define _XEN_X86_EMU_PIT            8
 #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
+#define _XEN_X86_EMU_USE_PIRQ       9
+#define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
 
 #define XEN_X86_EMU_ALL             (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |  \
                                      XEN_X86_EMU_PM | XEN_X86_EMU_RTC |      \
                                      XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |  \
                                      XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU |   \
-                                     XEN_X86_EMU_PIT)
+                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ)
     uint32_t emulation_flags;
 };
 
-- 
2.10.1 (Apple Git-78)



* [PATCH v7 2/7] xen/x86: populate PVHv2 Dom0 physical memory map
From: Roger Pau Monne @ 2017-02-22 14:24 UTC
  To: xen-devel; +Cc: Andrew Cooper, boris.ostrovsky, Roger Pau Monne, Jan Beulich

Craft the Dom0 e820 memory map and populate it. Introduce a helper to remove
memory pages that are shared between Xen and a domain, and use it to remove
the low 1MB RAM regions from dom_io so that they can be assigned to a PVHv2
Dom0.

On hardware lacking support for unrestricted mode, also craft the identity
page tables and the TSS used for virtual 8086 mode.
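
One detail of the map crafting deserves a note: RAM regions must be clipped
to page granularity before being handed to the p2m code. A standalone sketch
of the clipping performed by pvh_setup_e820 in the diff (helper name
hypothetical):

    /*
     * Sketch: clip a host RAM region to page granularity. E.g. the region
     * [0x1000200, 0x1fff000) becomes [0x1001000, 0x1fff000); a region
     * smaller than one page disappears and must be skipped.
     */
    static bool clip_region_to_pages(uint64_t *start, uint64_t *end)
    {
        uint64_t align = PAGE_SIZE << PAGE_ORDER_4K;

        *start = ROUNDUP(*start, align);
        *end &= ~(align - 1);

        return *start < *end;
    }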

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v6:
 - Rebase on top of Jan's VM86 TSS fix.
 - Use hvm_copy_to_guest_phys to zero the TSS area.
 - Request the TSS memory area to be aligned to 128 bytes.
 - Move write_32bit_pse_identmap to arch-specific mm.c file.

Changes since v5:
 - Adjust the logic to set need_paging.
 - Remove the usage of the _AC macro.
 - Subtract memory from the end of regions instead of the start.
 - Create the VM86_TSS before the identity page table, so that the page table
   is aligned to a page boundary.
 - Use MB1_PAGES in modify_identity_mmio.
 - Move and simplify the ASSERT in pvh_setup_p2m.
 - Move the creation of the PSE page tables to a separate function, and use it
   in shadow_enable as well.
 - Make the map parameter of modify_identity_mmio a constant.
 - Add a comment to HVM_VM86_TSS_SIZE, although it seems this might need
   further fixing.
 - Introduce pvh_add_mem_range in order to mark the regions used by the VM86
   TSS and the identity page tables as reserved in the memory map.
 - Add a parameter to request aligned memory from pvh_steal_ram.

Changes since v4:
 - Move process_pending_softirqs to previous patch.
 - Fix off-by-one errors in some checks.
 - Make unshare_xen_page_with_guest __init.
 - Improve unshare_xen_page_with_guest by making use of already existing
   is_xen_heap_page and put_page.
 - s/hvm/pvh/.
 - Use PAGE_ORDER_4K in pvh_setup_e820 in order to keep consistency with the
   p2m code.

Changes since v3:
 - Drop get_order_from_bytes_floor, it was only used by
   hvm_populate_memory_range.
 - Switch hvm_populate_memory_range to use frame numbers instead of full memory
   addresses.
 - Add a helper to steal the low 1MB RAM areas from dom_io and add them to Dom0
   as normal RAM.
 - Introduce unshare_xen_page_with_guest in order to remove pages from dom_io,
   so they can be assigned to other domains. This is needed in order to remove
   the low 1MB RAM regions from dom_io and assign them to the hardware_domain.
 - Simplify the loop in hvm_steal_ram.
 - Move definition of map_identity_mmio into this patch.

Changes since v2:
 - Introduce get_order_from_bytes_floor as a local function to
   domain_build.c.
 - Remove extra asserts.
 - Make hvm_populate_memory_range return an error code instead of panicking.
 - Fix comments and printks.
 - Use ULL suffix instead of casting to uint64_t.
 - Rename hvm_setup_vmx_unrestricted_guest to
   hvm_setup_vmx_realmode_helpers.
 - Only subtract two pages from the memory calculation; those will be used
   by the MADT replacement.
 - Remove some comments.
 - Remove printing allocation information.
 - Don't stash any pages for the MADT, TSS or ident PT, those will be
   subtracted directly from RAM regions of the memory map.
 - Count the number of iterations before calling process_pending_softirqs
   when populating the memory map.
 - Move the initial call to process_pending_softirqs into construct_dom0,
   and remove the ones from construct_dom0_hvm and construct_dom0_pv.
 - Make memflags global so it can be shared between alloc_chunk and
   hvm_populate_memory_range.

Changes since RFC:
 - Use IS_ALIGNED instead of checking with PAGE_MASK.
 - Use the new %pB specifier in order to print sizes in human readable form.
 - Create a VM86 TSS for hardware that doesn't support unrestricted mode.
 - Subtract guest RAM for the identity page table and the VM86 TSS.
 - Split the creation of the unrestricted mode helper structures to a
   separate function.
 - Use preemption with paging_set_allocation.
 - Use get_order_from_bytes_floor.
---
 xen/arch/x86/domain_build.c     | 365 +++++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/mm.c               |  26 +++
 xen/arch/x86/mm/shadow/common.c |   7 +-
 xen/include/asm-x86/mm.h        |   5 +
 4 files changed, 393 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 0c8a269..adc4c00 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -21,6 +21,7 @@
 #include <xen/compat.h>
 #include <xen/libelf.h>
 #include <xen/pfn.h>
+#include <xen/guest_access.h>
 #include <asm/regs.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -44,6 +45,16 @@ static long __initdata dom0_min_nrpages;
 static long __initdata dom0_max_nrpages = LONG_MAX;
 
 /*
+ * Have the TSS cover the ISA port range, which makes it
+ * - 104 bytes base structure
+ * - 32 bytes interrupt redirection bitmap
+ * - 128 bytes I/O bitmap
+ * - one trailing byte
+ * or a total of 265 bytes.
+ */
+#define HVM_VM86_TSS_SIZE 265
+
+/*
  * dom0_mem=[min:<min_amt>,][max:<max_amt>,][<amt>]
  * 
  * <min_amt>: The minimum amount of memory which should be allocated for dom0.
@@ -243,11 +254,12 @@ boolean_param("ro-hpet", ro_hpet);
 #define round_pgup(_p)    (((_p)+(PAGE_SIZE-1))&PAGE_MASK)
 #define round_pgdown(_p)  ((_p)&PAGE_MASK)
 
+static unsigned int __initdata memflags = MEMF_no_dma|MEMF_exact_node;
+
 static struct page_info * __init alloc_chunk(
     struct domain *d, unsigned long max_pages)
 {
     static unsigned int __initdata last_order = MAX_ORDER;
-    static unsigned int __initdata memflags = MEMF_no_dma|MEMF_exact_node;
     struct page_info *page;
     unsigned int order = get_order_from_pages(max_pages), free_order;
 
@@ -332,7 +344,9 @@ static unsigned long __init compute_dom0_nr_pages(
             avail -= max_pdx >> s;
     }
 
-    need_paging = opt_dom0_shadow || (is_pvh_domain(d) && !iommu_hap_pt_share);
+    need_paging = has_hvm_container_domain(d)
+                  ? !iommu_hap_pt_share || !paging_mode_hap(d)
+                  : opt_dom0_shadow;
     for ( ; ; need_paging = 0 )
     {
         nr_pages = dom0_nrpages;
@@ -364,7 +378,8 @@ static unsigned long __init compute_dom0_nr_pages(
         avail -= dom0_paging_pages(d, nr_pages);
     }
 
-    if ( (parms->p2m_base == UNSET_ADDR) && (dom0_nrpages <= 0) &&
+    if ( is_pv_domain(d) &&
+         (parms->p2m_base == UNSET_ADDR) && (dom0_nrpages <= 0) &&
          ((dom0_min_nrpages <= 0) || (nr_pages > min_pages)) )
     {
         /*
@@ -580,6 +595,7 @@ static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
     struct e820entry *entry, *entry_guest;
     unsigned int i;
     unsigned long pages, cur_pages = 0;
+    uint64_t start, end;
 
     /*
      * Craft the e820 memory map for Dom0 based on the hardware e820 map.
@@ -607,8 +623,22 @@ static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
             continue;
         }
 
-        *entry_guest = *entry;
-        pages = PFN_UP(entry_guest->size);
+        /*
+         * Make sure the start and length are aligned to PAGE_SIZE, because
+         * that's the minimum granularity of the 2nd stage translation. Since
+         * the p2m code uses PAGE_ORDER_4K internally, also use it here in
+         * order to prevent this code from getting out of sync.
+         */
+        start = ROUNDUP(entry->addr, PAGE_SIZE << PAGE_ORDER_4K);
+        end = (entry->addr + entry->size) &
+              ~((PAGE_SIZE << PAGE_ORDER_4K) - 1);
+        if ( start >= end )
+            continue;
+
+        entry_guest->type = E820_RAM;
+        entry_guest->addr = start;
+        entry_guest->size = end - start;
+        pages = PFN_DOWN(entry_guest->size);
         if ( (cur_pages + pages) > nr_pages )
         {
             /* Truncate region */
@@ -1676,15 +1706,340 @@ out:
     return rc;
 }
 
+static int __init modify_identity_mmio(struct domain *d, unsigned long pfn,
+                                       unsigned long nr_pages, const bool map)
+{
+    int rc;
+
+    for ( ; ; )
+    {
+        rc = (map ? map_mmio_regions : unmap_mmio_regions)
+             (d, _gfn(pfn), nr_pages, _mfn(pfn));
+        if ( rc == 0 )
+            break;
+        if ( rc < 0 )
+        {
+            printk(XENLOG_WARNING
+                   "Failed to identity %smap [%#lx,%#lx) for d%d: %d\n",
+                   map ? "" : "un", pfn, pfn + nr_pages, d->domain_id, rc);
+            break;
+        }
+        nr_pages -= rc;
+        pfn += rc;
+        process_pending_softirqs();
+    }
+
+    return rc;
+}
+
+/* Populate a HVM memory range using the biggest possible order. */
+static int __init pvh_populate_memory_range(struct domain *d,
+                                            unsigned long start,
+                                            unsigned long nr_pages)
+{
+    unsigned int order, i = 0;
+    struct page_info *page;
+    int rc;
+#define MAP_MAX_ITER 64
+
+    order = MAX_ORDER;
+    while ( nr_pages != 0 )
+    {
+        unsigned int range_order = get_order_from_pages(nr_pages + 1);
+
+        order = min(range_order ? range_order - 1 : 0, order);
+        page = alloc_domheap_pages(d, order, memflags);
+        if ( page == NULL )
+        {
+            if ( order == 0 && memflags )
+            {
+                /* Try again without any memflags. */
+                memflags = 0;
+                order = MAX_ORDER;
+                continue;
+            }
+            if ( order == 0 )
+            {
+                printk("Unable to allocate memory with order 0!\n");
+                return -ENOMEM;
+            }
+            order--;
+            continue;
+        }
+
+        rc = guest_physmap_add_page(d, _gfn(start), _mfn(page_to_mfn(page)),
+                                    order);
+        if ( rc != 0 )
+        {
+            printk("Failed to populate memory: [%#lx,%lx): %d\n",
+                   start, start + (1UL << order), rc);
+            return -ENOMEM;
+        }
+        start += 1UL << order;
+        nr_pages -= 1UL << order;
+        if ( (++i % MAP_MAX_ITER) == 0 )
+            process_pending_softirqs();
+    }
+
+    return 0;
+#undef MAP_MAX_ITER
+}
+
+/* Steal RAM from the end of a memory region. */
+static int __init pvh_steal_ram(struct domain *d, unsigned long size,
+                                unsigned long align, paddr_t limit,
+                                paddr_t *addr)
+{
+    unsigned int i = d->arch.nr_e820;
+
+    /*
+     * Alignment 0 should be set to 1, so it doesn't wrap around in the
+     * calculations below.
+     */
+    align = align ? : 1;
+    while ( i-- )
+    {
+        struct e820entry *entry = &d->arch.e820[i];
+
+        if ( entry->type != E820_RAM || entry->addr + entry->size > limit ||
+             entry->addr < MB(1) )
+            continue;
+
+        *addr = (entry->addr + entry->size - size) & ~(align - 1);
+        if ( *addr < entry->addr )
+            continue;
+
+        entry->size = *addr - entry->addr;
+        return 0;
+    }
+
+    return -ENOMEM;
+}
+
+/* NB: memory map must be sorted at all times for this to work correctly. */
+static int __init pvh_add_mem_range(struct domain *d, uint64_t s, uint64_t e,
+                                    unsigned int type)
+{
+    struct e820entry *map;
+    unsigned int i;
+
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        uint64_t rs = d->arch.e820[i].addr;
+        uint64_t re = rs + d->arch.e820[i].size;
+
+        if ( rs == e && d->arch.e820[i].type == type )
+        {
+            d->arch.e820[i].addr = s;
+            return 0;
+        }
+
+        if ( re == s && d->arch.e820[i].type == type &&
+             (i + 1 == d->arch.nr_e820 || d->arch.e820[i + 1].addr >= e) )
+        {
+            d->arch.e820[i].size += e - s;
+            return 0;
+        }
+
+        if ( rs >= e )
+            break;
+
+        if ( re > s )
+            return -EEXIST;
+    }
+
+    map = xzalloc_array(struct e820entry, d->arch.nr_e820 + 1);
+    if ( !map )
+    {
+        printk(XENLOG_WARNING "E820: out of memory to add region\n");
+        return -ENOMEM;
+    }
+
+    memcpy(map, d->arch.e820, i * sizeof(*d->arch.e820));
+    memcpy(map + i + 1, d->arch.e820 + i,
+           (d->arch.nr_e820 - i) * sizeof(*d->arch.e820));
+    map[i].addr = s;
+    map[i].size = e - s;
+    map[i].type = type;
+    xfree(d->arch.e820);
+    d->arch.e820 = map;
+    d->arch.nr_e820++;
+
+    return 0;
+}
+
+static int __init pvh_setup_vmx_realmode_helpers(struct domain *d)
+{
+    p2m_type_t p2mt;
+    uint32_t rc, *ident_pt;
+    mfn_t mfn;
+    paddr_t gaddr;
+    struct vcpu *v = d->vcpu[0];
+
+    /*
+     * Steal some space from the last RAM region below 4GB and use it to
+     * store the real-mode TSS. It needs to be aligned to 128 so that the
+     * TSS structure (which accounts for the first 104b) doesn't cross
+     * a page boundary.
+     */
+    if ( !pvh_steal_ram(d, HVM_VM86_TSS_SIZE, 128, GB(4), &gaddr) )
+    {
+        if ( hvm_copy_to_guest_phys(gaddr, NULL, HVM_VM86_TSS_SIZE, v) !=
+             HVMCOPY_okay )
+            printk("Unable to zero VM86 TSS area\n");
+        d->arch.hvm_domain.params[HVM_PARAM_VM86_TSS_SIZED] =
+            VM86_TSS_UPDATED | ((uint64_t)HVM_VM86_TSS_SIZE << 32) | gaddr;
+        if ( pvh_add_mem_range(d, gaddr, gaddr + HVM_VM86_TSS_SIZE,
+                               E820_RESERVED) )
+            printk("Unable to set VM86 TSS as reserved in the memory map\n");
+    }
+    else
+        printk("Unable to allocate VM86 TSS area\n");
+
+    /* Steal some more RAM for the identity page tables. */
+    if ( pvh_steal_ram(d, PAGE_SIZE, PAGE_SIZE, GB(4), &gaddr) )
+    {
+        printk("Unable to find memory to stash the identity page tables\n");
+        return -ENOMEM;
+    }
+
+    /*
+     * Identity-map page table is required for running with CR0.PG=0
+     * when using Intel EPT. Create a 32-bit non-PAE page directory of
+     * superpages.
+     */
+    ident_pt = map_domain_gfn(p2m_get_hostp2m(d), _gfn(PFN_DOWN(gaddr)),
+                              &mfn, &p2mt, 0, &rc);
+    if ( ident_pt == NULL )
+    {
+        printk("Unable to map identity page tables\n");
+        return -ENOMEM;
+    }
+    write_32bit_pse_identmap(ident_pt);
+    unmap_domain_page(ident_pt);
+    put_page(mfn_to_page(mfn_x(mfn)));
+    d->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] = gaddr;
+    if ( pvh_add_mem_range(d, gaddr, gaddr + PAGE_SIZE, E820_RESERVED) )
+        printk("Unable to set identity page tables as reserved in the memory map\n");
+
+    return 0;
+}
+
+/* Assign the low 1MB to Dom0. */
+static void __init pvh_steal_low_ram(struct domain *d, unsigned long start,
+                                     unsigned long nr_pages)
+{
+    unsigned long mfn;
+
+    ASSERT(start + nr_pages <= PFN_DOWN(MB(1)));
+
+    for ( mfn = start; mfn < start + nr_pages; mfn++ )
+    {
+        struct page_info *pg = mfn_to_page(mfn);
+        int rc;
+
+        rc = unshare_xen_page_with_guest(pg, dom_io);
+        if ( rc )
+        {
+            printk("Unable to unshare Xen mfn %#lx: %d\n", mfn, rc);
+            continue;
+        }
+
+        share_xen_page_with_guest(pg, d, XENSHARE_writable);
+        rc = guest_physmap_add_entry(d, _gfn(mfn), _mfn(mfn), 0, p2m_ram_rw);
+        if ( rc )
+            printk("Unable to add mfn %#lx to p2m: %d\n", mfn, rc);
+    }
+}
+
+static int __init pvh_setup_p2m(struct domain *d)
+{
+    struct vcpu *v = d->vcpu[0];
+    unsigned long nr_pages;
+    unsigned int i;
+    int rc;
+    bool preempted;
+#define MB1_PAGES PFN_DOWN(MB(1))
+
+    nr_pages = compute_dom0_nr_pages(d, NULL, 0);
+
+    pvh_setup_e820(d, nr_pages);
+    do {
+        preempted = false;
+        paging_set_allocation(d, dom0_paging_pages(d, nr_pages),
+                              &preempted);
+        process_pending_softirqs();
+    } while ( preempted );
+
+    /*
+     * Memory below 1MB is identity mapped.
+     * NB: this only makes sense when booted from legacy BIOS.
+     */
+    rc = modify_identity_mmio(d, 0, MB1_PAGES, true);
+    if ( rc )
+    {
+        printk("Failed to identity map low 1MB: %d\n", rc);
+        return rc;
+    }
+
+    /* Populate memory map. */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        unsigned long addr, size;
+
+        if ( d->arch.e820[i].type != E820_RAM )
+            continue;
+
+        addr = PFN_DOWN(d->arch.e820[i].addr);
+        size = PFN_DOWN(d->arch.e820[i].size);
+
+        if ( addr >= MB1_PAGES )
+            rc = pvh_populate_memory_range(d, addr, size);
+        else
+        {
+            ASSERT(addr + size < MB1_PAGES);
+            pvh_steal_low_ram(d, addr, size);
+        }
+
+        if ( rc )
+            return rc;
+    }
+
+    if ( cpu_has_vmx && paging_mode_hap(d) && !vmx_unrestricted_guest(v) )
+    {
+        /*
+         * Since Dom0 cannot be migrated, we will only setup the
+         * unrestricted guest helpers if they are needed by the current
+         * hardware we are running on.
+         */
+        rc = pvh_setup_vmx_realmode_helpers(d);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+#undef MB1_PAGES
+}
+
 static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
                                      void *(*bootstrap_map)(const module_t *),
                                      char *cmdline)
 {
+    int rc;
 
     printk("** Building a PVH Dom0 **\n");
 
+    iommu_hwdom_init(d);
+
+    rc = pvh_setup_p2m(d);
+    if ( rc )
+    {
+        printk("Failed to setup Dom0 physical memory map\n");
+        return rc;
+    }
+
     panic("Building a PVHv2 Dom0 is not yet supported.");
     return 0;
 }
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 75bdbc3..14cf652 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -474,6 +474,22 @@ void share_xen_page_with_guest(
     spin_unlock(&d->page_alloc_lock);
 }
 
+int __init unshare_xen_page_with_guest(struct page_info *page,
+                                       struct domain *d)
+{
+    if ( page_get_owner(page) != d || !is_xen_heap_page(page) )
+        return -EINVAL;
+
+    if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
+        put_page(page);
+
+    /* Remove the owner and clear the flags. */
+    page->u.inuse.type_info = 0;
+    page_set_owner(page, NULL);
+
+    return 0;
+}
+
 void share_xen_page_with_privileged_guests(
     struct page_info *page, int readonly)
 {
@@ -6595,6 +6611,16 @@ void paging_invlpg(struct vcpu *v, unsigned long va)
         hvm_funcs.invlpg(v, va);
 }
 
+/* Build a 32bit PSE page table using 4MB pages. */
+void write_32bit_pse_identmap(uint32_t *l2)
+{
+    unsigned int i;
+
+    for ( i = 0; i < PAGE_SIZE / sizeof(*l2); i++ )
+        l2[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
+                 _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 51d6bdf..560a7fd 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -3053,7 +3053,7 @@ int shadow_enable(struct domain *d, u32 mode)
     unsigned int old_pages;
     struct page_info *pg = NULL;
     uint32_t *e;
-    int i, rv = 0;
+    int rv = 0;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
     mode |= PG_SH_enable;
@@ -3109,10 +3109,7 @@ int shadow_enable(struct domain *d, u32 mode)
         /* Fill it with 32-bit, non-PAE superpage entries, each mapping 4MB
          * of virtual address space onto the same physical address range */
         e = __map_domain_page(pg);
-        for ( i = 0; i < PAGE_SIZE / sizeof(*e); i++ )
-            e[i] = ((0x400000U * i)
-                    | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER
-                    | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
+        write_32bit_pse_identmap(e);
         unmap_domain_page(e);
         pg->u.inuse.type_info = PGT_l2_page_table | 1 | PGT_validated;
     }
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index a66d5b1..d4a074a 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -275,6 +275,8 @@ struct spage_info
 #define XENSHARE_readonly 1
 extern void share_xen_page_with_guest(
     struct page_info *page, struct domain *d, int readonly);
+extern int unshare_xen_page_with_guest(struct page_info *page,
+                                       struct domain *d);
 extern void share_xen_page_with_privileged_guests(
     struct page_info *page, int readonly);
 extern void free_shared_domheap_page(struct page_info *page);
@@ -597,4 +599,7 @@ typedef struct mm_rwlock {
 
 extern const char zero_page[];
 
+/* Build a 32bit PSE page table using 4MB pages. */
+void write_32bit_pse_identmap(uint32_t *l2);
+
 #endif /* __ASM_X86_MM_H__ */
-- 
2.10.1 (Apple Git-78)



* [PATCH v7 3/7] x86/bzimage: change the types from char * to void *
From: Roger Pau Monne @ 2017-02-22 14:24 UTC
  To: xen-devel; +Cc: Andrew Cooper, boris.ostrovsky, Roger Pau Monne, Jan Beulich

This also allows changing the types of image_base and image_start in the Dom0
builder from char * to void *.
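
Note that pointer arithmetic on void * (as done in output_length below)
relies on the GCC extension treating sizeof(void) as 1, which Xen already
assumes. A strictly standard-C equivalent, shown for comparison only:

    /* Sketch: read the trailing 32-bit decompressed-size field. */
    static unsigned long output_length(const void *image,
                                       unsigned long image_len)
    {
        uint32_t len;

        memcpy(&len, (const char *)image + image_len - 4, sizeof(len));

        return len;
    }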

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v6:
 - New in this version.
---
 xen/arch/x86/bzimage.c        | 9 +++++----
 xen/arch/x86/domain_build.c   | 4 ++--
 xen/include/asm-x86/bzimage.h | 4 ++--
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
index 50ebb84..ac4fd42 100644
--- a/xen/arch/x86/bzimage.c
+++ b/xen/arch/x86/bzimage.c
@@ -9,9 +9,9 @@
 #include <xen/libelf.h>
 #include <asm/bzimage.h>
 
-static __init unsigned long output_length(char *image, unsigned long image_len)
+static __init unsigned long output_length(void *image, unsigned long image_len)
 {
-    return *(uint32_t *)&image[image_len - 4];
+    return *(uint32_t *)(image + image_len - 4);
 }
 
 struct __packed setup_header {
@@ -71,7 +71,7 @@ static __init int bzimage_check(struct setup_header *hdr, unsigned long len)
 
 static unsigned long __initdata orig_image_len;
 
-unsigned long __init bzimage_headroom(char *image_start,
+unsigned long __init bzimage_headroom(void *image_start,
                                       unsigned long image_length)
 {
     struct setup_header *hdr = (struct setup_header *)image_start;
@@ -104,7 +104,8 @@ unsigned long __init bzimage_headroom(char *image_start,
     return headroom;
 }
 
-int __init bzimage_parse(char *image_base, char **image_start, unsigned long *image_len)
+int __init bzimage_parse(void *image_base, void **image_start,
+                         unsigned long *image_len)
 {
     struct setup_header *hdr = (struct setup_header *)(*image_start);
     int err = bzimage_check(hdr, *image_len);
diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index adc4c00..aa1625a 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -1029,9 +1029,9 @@ static int __init construct_dom0_pv(
     start_info_t *si;
     struct vcpu *v = d->vcpu[0];
     unsigned long long value;
-    char *image_base = bootstrap_map(image);
+    void *image_base = bootstrap_map(image);
     unsigned long image_len = image->mod_end;
-    char *image_start = image_base + image_headroom;
+    void *image_start = image_base + image_headroom;
     unsigned long initrd_len = initrd ? initrd->mod_end : 0;
     l4_pgentry_t *l4tab = NULL, *l4start = NULL;
     l3_pgentry_t *l3tab = NULL, *l3start = NULL;
diff --git a/xen/include/asm-x86/bzimage.h b/xen/include/asm-x86/bzimage.h
index 0bf5bca..7ed69d3 100644
--- a/xen/include/asm-x86/bzimage.h
+++ b/xen/include/asm-x86/bzimage.h
@@ -3,9 +3,9 @@
 
 #include <xen/init.h>
 
-unsigned long bzimage_headroom(char *image_start, unsigned long image_length);
+unsigned long bzimage_headroom(void *image_start, unsigned long image_length);
 
-int bzimage_parse(char *image_base, char **image_start,
+int bzimage_parse(void *image_base, void **image_start,
                   unsigned long *image_len);
 
 #endif /* __X86_BZIMAGE_H__ */
-- 
2.10.1 (Apple Git-78)



* [PATCH v7 4/7] x86/libelf: pass the destination vCPU to libelf for Dom0 build
From: Roger Pau Monne @ 2017-02-22 14:24 UTC
  To: xen-devel
  Cc: Andrew Cooper, Ian Jackson, Jan Beulich, boris.ostrovsky,
	Roger Pau Monne

Allow setting the destination vCPU for libelf, so that elf_load_image can take
it into account when loading the kernel for Dom0. This is needed for the PVHv2
Dom0 build, so that hvm_copy_to_guest_phys can be called with a Dom0 vCPU
instead of current (which points to the idle vCPU at this point).
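
The resulting call sequence in the Dom0 builders looks roughly as follows
(error handling elided; the PVHv2 user of this interface is introduced in the
next patch):

    struct elf_binary elf;

    rc = elf_init(&elf, image_start, image_len);
    /* ... elf_parse_binary(&elf); elf_xen_parse(&elf, &parms); ... */

    elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
    elf.dest_size = parms.virt_kend - parms.virt_kstart;

    /* Tell libelf which vCPU to target when copying into guest memory. */
    elf_set_vcpu(&elf, d->vcpu[0]);
    rc = elf_load_binary(&elf);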

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v6:
 - New in this version.
---
 xen/arch/x86/domain_build.c       |  1 +
 xen/common/libelf/libelf-loader.c | 25 +++++++++++++++++++++++--
 xen/include/xen/libelf.h          |  6 ++++++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index aa1625a..d2a1105 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -1462,6 +1462,7 @@ static int __init construct_dom0_pv(
     /* Copy the OS image and free temporary buffer. */
     elf.dest_base = (void*)vkern_start;
     elf.dest_size = vkern_end - vkern_start;
+    elf_set_vcpu(&elf, v);
     rc = elf_load_binary(&elf);
     if ( rc < 0 )
     {
diff --git a/xen/common/libelf/libelf-loader.c b/xen/common/libelf/libelf-loader.c
index 1644f16..371061c 100644
--- a/xen/common/libelf/libelf-loader.c
+++ b/xen/common/libelf/libelf-loader.c
@@ -146,6 +146,25 @@ void elf_set_verbose(struct elf_binary *elf)
     elf->verbose = 1;
 }
 
+static elf_errorstatus elf_memcpy(struct vcpu *v, void *dst, void *src,
+                                  uint64_t size)
+{
+    int rc = 0;
+
+#ifdef CONFIG_X86
+    if ( is_hvm_vcpu(v) )
+    {
+        rc = hvm_copy_to_guest_phys((paddr_t)dst, src, size, v);
+        rc = rc != HVMCOPY_okay ? -1 : 0;
+    }
+    else
+#endif
+        rc = src == NULL ? raw_clear_guest(dst, size) :
+                           raw_copy_to_guest(dst, src, size);
+
+    return rc;
+}
+
 static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, elf_ptrval src, uint64_t filesz, uint64_t memsz)
 {
     elf_errorstatus rc;
@@ -153,10 +172,12 @@ static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, el
         return -1;
     /* We trust the dom0 kernel image completely, so we don't care
      * about overruns etc. here. */
-    rc = raw_copy_to_guest(ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src), filesz);
+    rc = elf_memcpy(elf->vcpu, ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src),
+                    filesz);
     if ( rc != 0 )
         return -1;
-    rc = raw_clear_guest(ELF_UNSAFE_PTR(dst + filesz), memsz - filesz);
+    rc = elf_memcpy(elf->vcpu, ELF_UNSAFE_PTR(dst + filesz), NULL,
+                    memsz - filesz);
     if ( rc != 0 )
         return -1;
     return 0;
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index 1b763f3..b739981 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -212,6 +212,8 @@ struct elf_binary {
     /* misc */
     elf_log_callback *log_callback;
     void *log_caller_data;
+#else
+    struct vcpu *vcpu;
 #endif
     bool verbose;
     const char *broken;
@@ -351,6 +353,10 @@ elf_errorstatus elf_init(struct elf_binary *elf, const char *image, size_t size)
    */
 #ifdef __XEN__
 void elf_set_verbose(struct elf_binary *elf);
+static inline void elf_set_vcpu(struct elf_binary *elf, struct vcpu *v)
+{
+    elf->vcpu = v;
+}
 #else
 void elf_set_log(struct elf_binary *elf, elf_log_callback*,
                  void *log_caller_pointer, bool verbose);
-- 
2.10.1 (Apple Git-78)



* [PATCH v7 5/7] xen/x86: parse Dom0 kernel for PVHv2
From: Roger Pau Monne @ 2017-02-22 14:24 UTC
  To: xen-devel; +Cc: Andrew Cooper, boris.ostrovsky, Roger Pau Monne, Jan Beulich

Introduce a helper to parse the Dom0 kernel.

A new helper is also introduced to libelf; it is used to store the destination
vCPU of the domain. This parameter is needed when loading the kernel of an HVM
domain (PVHv2), since hvm_copy_to_guest_phys requires passing the destination
vCPU.

While there, also fix image_base and image_start to be of type "void *", and
do the necessary fixup of related functions.
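
For reference, the hvm_start_info structure written by this patch is what the
guest kernel finds at the address passed in %ebx on entry (see
xen/include/public/arch-x86/hvm/start_info.h). A rough guest-side sketch,
with phys_to_virt, parse_cmdline and load_initrd as hypothetical
placeholders:

    void pvh_start(const struct hvm_start_info *si)  /* address from %ebx */
    {
        if ( si->magic != XEN_HVM_START_MAGIC_VALUE )
            return;                      /* not started via this protocol */

        if ( si->cmdline_paddr )         /* 0 when no command line is set */
            parse_cmdline(phys_to_virt(si->cmdline_paddr));

        if ( si->nr_modules )
        {
            const struct hvm_modlist_entry *mod =
                phys_to_virt(si->modlist_paddr);

            load_initrd(phys_to_virt(mod->paddr), mod->size);
        }
    }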

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v5:
 - s/hvm_copy_to_guest_phys_vcpu/hvm_copy_to_guest_phys/.
 - Use void * for image_base and image_start, make the necessary changes.
 - Introduce elf_set_vcpu in order to store the destination vcpu in
   elf_binary, and use it in elf_load_image. This avoids having to override
   current.
 - Style fixes.
 - Round up the position of the modlist/start_info to an aligned address
   depending on the kernel bitness.

Changes since v4:
 - s/hvm/pvh.
 - Use hvm_copy_to_guest_phys_vcpu.

Changes since v3:
 - Change one error message.
 - Indent "out" label by one space.
 - Introduce hvm_copy_to_phys and slightly simplify the code in hvm_load_kernel.

Changes since v2:
 - Remove debug messages.
 - Don't hardcode the number of modules to 1.
---
 xen/arch/x86/domain_build.c | 133 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 133 insertions(+)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index d2a1105..a47b8d2 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -39,6 +39,7 @@
 
 #include <public/version.h>
 #include <public/hvm/hvm_info_table.h>
+#include <public/arch-x86/hvm/start_info.h>
 
 static long __initdata dom0_nrpages;
 static long __initdata dom0_min_nrpages;
@@ -2022,12 +2023,136 @@ static int __init pvh_setup_p2m(struct domain *d)
 #undef MB1_PAGES
 }
 
+static int __init pvh_load_kernel(struct domain *d, const module_t *image,
+                                  unsigned long image_headroom,
+                                  module_t *initrd, void *image_base,
+                                  char *cmdline, paddr_t *entry,
+                                  paddr_t *start_info_addr)
+{
+    void *image_start = image_base + image_headroom;
+    unsigned long image_len = image->mod_end;
+    struct elf_binary elf;
+    struct elf_dom_parms parms;
+    paddr_t last_addr;
+    struct hvm_start_info start_info = { 0 };
+    struct hvm_modlist_entry mod = { 0 };
+    struct vcpu *v = d->vcpu[0];
+    int rc;
+
+    if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
+    {
+        printk("Error trying to detect bz compressed kernel\n");
+        return rc;
+    }
+
+    if ( (rc = elf_init(&elf, image_start, image_len)) != 0 )
+    {
+        printk("Unable to init ELF\n");
+        return rc;
+    }
+#ifdef VERBOSE
+    elf_set_verbose(&elf);
+#endif
+    elf_parse_binary(&elf);
+    if ( (rc = elf_xen_parse(&elf, &parms)) != 0 )
+    {
+        printk("Unable to parse kernel for ELFNOTES\n");
+        return rc;
+    }
+
+    if ( parms.phys_entry == UNSET_ADDR32 )
+    {
+        printk("Unable to find XEN_ELFNOTE_PHYS32_ENTRY address\n");
+        return -EINVAL;
+    }
+
+    printk("OS: %s version: %s loader: %s bitness: %s\n", parms.guest_os,
+           parms.guest_ver, parms.loader,
+           elf_64bit(&elf) ? "64-bit" : "32-bit");
+
+    /* Copy the OS image and free temporary buffer. */
+    elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
+    elf.dest_size = parms.virt_kend - parms.virt_kstart;
+
+    elf_set_vcpu(&elf, v);
+    rc = elf_load_binary(&elf);
+    if ( rc < 0 )
+    {
+        printk("Failed to load kernel: %d\n", rc);
+        printk("Xen dom0 kernel broken ELF: %s\n", elf_check_broken(&elf));
+        return rc;
+    }
+
+    last_addr = ROUNDUP(parms.virt_kend - parms.virt_base, PAGE_SIZE);
+
+    if ( initrd != NULL )
+    {
+        rc = hvm_copy_to_guest_phys(last_addr, mfn_to_virt(initrd->mod_start),
+                                    initrd->mod_end, v);
+        if ( rc )
+        {
+            printk("Unable to copy initrd to guest\n");
+            return rc;
+        }
+
+        mod.paddr = last_addr;
+        mod.size = initrd->mod_end;
+        last_addr += ROUNDUP(initrd->mod_end, PAGE_SIZE);
+    }
+
+    /* Free temporary buffers. */
+    discard_initial_images();
+
+    if ( cmdline != NULL )
+    {
+        rc = hvm_copy_to_guest_phys(last_addr, cmdline, strlen(cmdline) + 1, v);
+        if ( rc )
+        {
+            printk("Unable to copy guest command line\n");
+            return rc;
+        }
+        start_info.cmdline_paddr = last_addr;
+        /*
+         * Round up to 32/64 bits (depending on the guest kernel bitness) so
+         * the modlist/start_info is aligned.
+         */
+        last_addr += ROUNDUP(strlen(cmdline) + 1, elf_64bit(&elf) ? 8 : 4);
+    }
+    if ( initrd != NULL )
+    {
+        rc = hvm_copy_to_guest_phys(last_addr, &mod, sizeof(mod), v);
+        if ( rc )
+        {
+            printk("Unable to copy guest modules\n");
+            return rc;
+        }
+        start_info.modlist_paddr = last_addr;
+        start_info.nr_modules = 1;
+        last_addr += sizeof(mod);
+    }
+
+    start_info.magic = XEN_HVM_START_MAGIC_VALUE;
+    start_info.flags = SIF_PRIVILEGED | SIF_INITDOMAIN;
+    rc = hvm_copy_to_guest_phys(last_addr, &start_info, sizeof(start_info), v);
+    if ( rc )
+    {
+        printk("Unable to copy start info to guest\n");
+        return rc;
+    }
+
+    *entry = parms.phys_entry;
+    *start_info_addr = last_addr;
+
+    return 0;
+}
+
 static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
                                      void *(*bootstrap_map)(const module_t *),
                                      char *cmdline)
 {
+    paddr_t entry, start_info;
     int rc;
 
     printk("** Building a PVH Dom0 **\n");
@@ -2041,6 +2166,14 @@ static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
         return rc;
     }
 
+    rc = pvh_load_kernel(d, image, image_headroom, initrd, bootstrap_map(image),
+                         cmdline, &entry, &start_info);
+    if ( rc )
+    {
+        printk("Failed to load Dom0 kernel\n");
+        return rc;
+    }
+
     panic("Building a PVHv2 Dom0 is not yet supported.");
     return 0;
 }
-- 
2.10.1 (Apple Git-78)



* [PATCH v7 6/7] xen/x86: Setup PVHv2 Dom0 CPUs
From: Roger Pau Monne @ 2017-02-22 14:24 UTC
  To: xen-devel; +Cc: Andrew Cooper, boris.ostrovsky, Roger Pau Monne, Jan Beulich

Initialize the Dom0 BSP/APs and set up the memory and I/O permissions. This
also sets the initial BSP state in order to match the protocol specified in
docs/misc/hvmlite.markdown.
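
The _ar values used in the diff pack the segment attributes: the low byte
matches a GDT descriptor's attribute byte, with the D/B (32-bit) and G
(4 KiB granularity) flags above it. Decoded for reference:

    #define AR_CODE32  0xc9b  /* present, DPL0, code, exec/read, accessed,
                                 32-bit, 4 KiB granularity */
    #define AR_DATA32  0xc93  /* present, DPL0, data, read/write, accessed,
                                 32-bit, 4 KiB granularity */
    #define AR_TSS32   0x08b  /* present, 32-bit busy TSS */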

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v5:
 - Make cpus and i unsigned ints.
 - Use an initializer for cpu_ctx (and remove the memset).
 - Move the clear_bit of vcpu 0 the end of pvh_setup_cpus.
---
 xen/arch/x86/domain_build.c | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index a47b8d2..d5abc9c 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -40,6 +40,7 @@
 #include <public/version.h>
 #include <public/hvm/hvm_info_table.h>
 #include <public/arch-x86/hvm/start_info.h>
+#include <public/hvm/hvm_vcpu.h>
 
 static long __initdata dom0_nrpages;
 static long __initdata dom0_min_nrpages;
@@ -2146,6 +2147,59 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
     return 0;
 }
 
+static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
+                                 paddr_t start_info)
+{
+    struct vcpu *v = d->vcpu[0];
+    unsigned int cpu, i;
+    int rc;
+    /* 
+     * This sets the vCPU state according to the state described in
+     * docs/misc/hvmlite.markdown.
+     */
+    vcpu_hvm_context_t cpu_ctx = {
+        .mode = VCPU_HVM_MODE_32B,
+        .cpu_regs.x86_32.ebx = start_info,
+        .cpu_regs.x86_32.eip = entry,
+        .cpu_regs.x86_32.cr0 = X86_CR0_PE | X86_CR0_ET,
+        .cpu_regs.x86_32.cs_limit = ~0u,
+        .cpu_regs.x86_32.ds_limit = ~0u,
+        .cpu_regs.x86_32.ss_limit = ~0u,
+        .cpu_regs.x86_32.tr_limit = 0x67,
+        .cpu_regs.x86_32.cs_ar = 0xc9b,
+        .cpu_regs.x86_32.ds_ar = 0xc93,
+        .cpu_regs.x86_32.ss_ar = 0xc93,
+        .cpu_regs.x86_32.tr_ar = 0x8b,
+    };
+
+    cpu = v->processor;
+    for ( i = 1; i < d->max_vcpus; i++ )
+    {
+        cpu = cpumask_cycle(cpu, &dom0_cpus);
+        setup_dom0_vcpu(d, i, cpu);
+    }
+
+    rc = arch_set_info_hvm_guest(v, &cpu_ctx);
+    if ( rc )
+    {
+        printk("Unable to setup Dom0 BSP context: %d\n", rc);
+        return rc;
+    }
+
+    rc = setup_permissions(d);
+    if ( rc )
+    {
+        panic("Unable to setup Dom0 permissions: %d\n", rc);
+        return rc;
+    }
+
+    update_domain_wallclock_time(d);
+
+    clear_bit(_VPF_down, &v->pause_flags);
+
+    return 0;
+}
+
 static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
@@ -2174,6 +2228,13 @@ static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
         return rc;
     }
 
+    rc = pvh_setup_cpus(d, entry, start_info);
+    if ( rc )
+    {
+        printk("Failed to setup Dom0 CPUs: %d\n", rc);
+        return rc;
+    }
+
     panic("Building a PVHv2 Dom0 is not yet supported.");
     return 0;
 }
-- 
2.10.1 (Apple Git-78)



* [PATCH v7 7/7] xen/x86: setup PVHv2 Dom0 ACPI tables
From: Roger Pau Monne @ 2017-02-22 14:24 UTC
  To: xen-devel; +Cc: Andrew Cooper, boris.ostrovsky, Roger Pau Monne, Jan Beulich

Create a new MADT table that contains the topology exposed to the guest. A
new XSDT table is also created in order to filter the tables that we want
to expose to the guest, plus the Xen-crafted MADT. This in turn requires Xen
to also create a new RSDP in order to make it point to the custom XSDT.

Also, regions marked as E820_ACPI or E820_NVS are identity-mapped into the
Dom0 p2m, plus any top-level ACPI tables that should be accessible to Dom0
and that reside in reserved regions. This is needed because some memory maps
don't properly account for all the memory used by ACPI, so it's common to
find ACPI tables in reserved regions.
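
ACPI checksums are defined so that all bytes of a table, including the
checksum field itself, sum to zero modulo 256; the acpi_tb_checksum
adjustment done in the patch is equivalent to recomputing that from scratch.
A self-contained sketch:

    /* Sketch: set t->checksum so the whole table sums to 0 (mod 256). */
    static void acpi_set_checksum(struct acpi_table_header *t)
    {
        const uint8_t *p = (const void *)t;
        uint8_t sum = 0;
        unsigned int i;

        t->checksum = 0;
        for ( i = 0; i < t->length; i++ )
            sum += p[i];

        t->checksum = -sum;
    }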

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v5:
 - s/hvm_copy_to_guest_phys_vcpu/hvm_copy_to_guest_phys/.
 - Move pvh_add_mem_range to previous patch.
 - Add a comment regarding the current limitation to only 1 emulated IO APIC.
 - s/dom0_max_vcpus()/max_vcpus/ in pvh_setup_acpi_madt.
 - Cast structures to void when assigning.
 - Declare banned_tables with the initconst annotation.
 - Expand some comments messages.
 - Initialize the RSDP local variable.
 - Only provide x2APIC entries in the MADT.

Changes since v4:
 - s/hvm/pvh.
 - Use hvm_copy_to_guest_phys_vcpu.
 - Don't allocate up to E820MAX entries for the Dom0 memory map and instead
   allow pvh_add_mem_range to dynamically grow the memory map.
 - Add a comment about the lack of x2APIC MADT entries.
 - Change acpi_intr_overrides to unsigned int and the max iterator bound to
   UINT_MAX.
 - Set the MADT version as the minimum version between the hardware value and
   our supported version (4).
 - Set the MADT IO APIC ID to the current value of the domain vioapic->id.
 - Use void * when subtracting two pointers.
 - Fix indentation of nr_pages and use PFN_UP instead of DIV_ROUND_UP.
 - Change wording of the pvh_acpi_table_allowed error message.
 - Make j unsigned in pvh_setup_acpi_xsdt.
 - Move initialization of local variables with declarations in
   pvh_setup_acpi_xsdt.
 - Reword the comment about the allocated size of the xsdt custom table.
 - Fix line splitting.
 - Add a comment regarding the layering violation caused by the usage of
   acpi_tb_checksum.
 - Pass IO APIC NMI sources found in the MADT to Dom0.
 - Create x2APIC entries if the native MADT also contains them.
 - s/acpi_intr_overrrides/acpi_intr_overrides/.
 - Make sure the MADT is properly mapped into Dom0, or else Dom0 might not be
   able to access the output of the _MAT method depending on the
   implementation.
 - Get the first ACPI processor ID and use that as the base processor ID of the
   crafted MADT. This is done so that local/x2 APIC NMI entries match with the
   local/x2 APIC objects.

Changes since v3:
 - Use hvm_copy_to_phys in order to copy the tables to Dom0 memory.
 - Return EEXIST for overlapping ranges in hvm_add_mem_range.
 - s/ov/ovr/ for interrupt override parsing functions.
 - Constify intr local variable in acpi_set_intr_ovr.
 - Use structure assignment for type safety.
 - Perform sizeof using local variables in hvm_setup_acpi_madt.
 - Manually set revision of crafted/modified tables.
 - Only map tables to guest that reside in reserved or ACPI memory regions.
 - Copy the RSDP OEM signature to the crafted RSDP.
 - Pair calls to acpi_os_map_memory/acpi_os_unmap_memory.
 - Add memory regions for allowed ACPI tables to the memory map and then
   perform the identity mappings. This avoids having to call modify_identity_mmio
   multiple times.
 - Add a FIXME comment regarding the lack of multiple vIO-APICs.

Changes since v2:
 - Completely reworked.
---
 xen/arch/x86/domain_build.c | 434 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 434 insertions(+)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index d5abc9c..932fa4e 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -22,6 +22,7 @@
 #include <xen/libelf.h>
 #include <xen/pfn.h>
 #include <xen/guest_access.h>
+#include <xen/acpi.h>
 #include <asm/regs.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -37,6 +38,8 @@
 #include <asm/io_apic.h>
 #include <asm/hpet.h>
 
+#include <acpi/actables.h>
+
 #include <public/version.h>
 #include <public/hvm/hvm_info_table.h>
 #include <public/arch-x86/hvm/start_info.h>
@@ -56,6 +59,12 @@ static long __initdata dom0_max_nrpages = LONG_MAX;
  */
 #define HVM_VM86_TSS_SIZE 265
 
+static unsigned int __initdata acpi_intr_overrides;
+static struct acpi_madt_interrupt_override __initdata *intsrcovr;
+
+static unsigned int __initdata acpi_nmi_sources;
+static struct acpi_madt_nmi_source __initdata *nmisrc;
+
 /*
  * dom0_mem=[min:<min_amt>,][max:<max_amt>,][<amt>]
  * 
@@ -2200,6 +2209,424 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
     return 0;
 }
 
+static int __init acpi_count_intr_ovr(struct acpi_subtable_header *header,
+                                     const unsigned long end)
+{
+
+    acpi_intr_overrides++;
+    return 0;
+}
+
+static int __init acpi_set_intr_ovr(struct acpi_subtable_header *header,
+                                    const unsigned long end)
+{
+    const struct acpi_madt_interrupt_override *intr =
+        container_of(header, struct acpi_madt_interrupt_override, header);
+
+    *intsrcovr = *intr;
+    intsrcovr++;
+
+    return 0;
+}
+
+static int __init acpi_count_nmi_src(struct acpi_subtable_header *header,
+                                     const unsigned long end)
+{
+
+    acpi_nmi_sources++;
+    return 0;
+}
+
+static int __init acpi_set_nmi_src(struct acpi_subtable_header *header,
+                                   const unsigned long end)
+{
+    const struct acpi_madt_nmi_source *src =
+        container_of(header, struct acpi_madt_nmi_source, header);
+
+    *nmisrc = *src;
+    nmisrc++;
+
+    return 0;
+}
+
+static int __init pvh_setup_acpi_madt(struct domain *d, paddr_t *addr)
+{
+    struct acpi_table_madt *madt;
+    struct acpi_table_header *table;
+    struct acpi_madt_io_apic *io_apic;
+    struct acpi_madt_local_x2apic *x2apic;
+    acpi_status status;
+    unsigned long size;
+    unsigned int i, max_vcpus;
+    int rc;
+
+    /* Count number of interrupt overrides in the MADT. */
+    acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE,
+                          acpi_count_intr_ovr, UINT_MAX);
+
+    /* Count number of NMI sources in the MADT. */
+    acpi_table_parse_madt(ACPI_MADT_TYPE_NMI_SOURCE, acpi_count_nmi_src,
+                          UINT_MAX);
+
+    max_vcpus = dom0_max_vcpus();
+    /* Calculate the size of the crafted MADT. */
+    size = sizeof(*madt);
+    /*
+     * FIXME: the current vIO-APIC code just supports one IO-APIC instance
+     * per domain. This must be fixed in order to provide the same amount of
+     * IO APICs as available on bare metal.
+     */
+    size += sizeof(*io_apic);
+    size += sizeof(*intsrcovr) * acpi_intr_overrides;
+    size += sizeof(*nmisrc) * acpi_nmi_sources;
+    size += sizeof(*x2apic) * max_vcpus;
+
+    madt = xzalloc_bytes(size);
+    if ( !madt )
+    {
+        printk("Unable to allocate memory for MADT table\n");
+        return -ENOMEM;
+    }
+
+    /* Copy the native MADT table header. */
+    status = acpi_get_table(ACPI_SIG_MADT, 0, &table);
+    if ( !ACPI_SUCCESS(status) )
+    {
+        printk("Failed to get MADT ACPI table, aborting.\n");
+        return -EINVAL;
+    }
+    madt->header = *table;
+    madt->address = APIC_DEFAULT_PHYS_BASE;
+    /*
+     * NB: this is currently capped at 4, which is the MADT revision in the
+     * ACPI 6.1 spec. Sadly ACPICA doesn't provide revision numbers for the
+     * tables described in the headers.
+     */
+    madt->header.revision = min_t(unsigned char, table->revision, 4);
+
+    /*
+     * Setup the IO APIC entry.
+     * FIXME: the current vIO-APIC code just supports one IO-APIC instance
+     * per domain. This must be fixed in order to provide the same amount of
+     * IO APICs as available on bare metal, and with the same IDs as found in
+     * the native IO APIC MADT entries.
+     */
+    if ( nr_ioapics > 1 )
+        printk("WARNING: found %d IO APICs, Dom0 will only have access to 1 emulated IO APIC\n",
+               nr_ioapics);
+    io_apic = (void *)(madt + 1);
+    io_apic->header.type = ACPI_MADT_TYPE_IO_APIC;
+    io_apic->header.length = sizeof(*io_apic);
+    io_apic->id = domain_vioapic(d)->id;
+    io_apic->address = VIOAPIC_DEFAULT_BASE_ADDRESS;
+
+    x2apic = (void *)(io_apic + 1);
+    for ( i = 0; i < max_vcpus; i++ )
+    {
+        x2apic->header.type = ACPI_MADT_TYPE_LOCAL_X2APIC;
+        x2apic->header.length = sizeof(*x2apic);
+        x2apic->uid = i;
+        x2apic->local_apic_id = i * 2;
+        x2apic->lapic_flags = ACPI_MADT_ENABLED;
+        x2apic++;
+    }
+
+    /* Setup interrupt overrides. */
+    intsrcovr = (void *)x2apic;
+    acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_set_intr_ovr,
+                          acpi_intr_overrides);
+
+    /* Setup NMI sources. */
+    nmisrc = (void *)intsrcovr;
+    acpi_table_parse_madt(ACPI_MADT_TYPE_NMI_SOURCE, acpi_set_nmi_src,
+                          acpi_nmi_sources);
+
+    ASSERT(((void *)nmisrc - (void *)madt) == size);
+    madt->header.length = size;
+    /*
+     * Calling acpi_tb_checksum here is a layering violation, but
+     * introducing a wrapper for such simple usage seems overkill.
+     */
+    madt->header.checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, madt), size);
+
+    /* Place the new MADT in guest memory space. */
+    if ( pvh_steal_ram(d, size, 0, GB(4), addr) )
+    {
+        printk("Unable to find allocate guest RAM for MADT\n");
+        return -ENOMEM;
+    }
+
+    /* Mark this region as E820_ACPI. */
+    if ( pvh_add_mem_range(d, *addr, *addr + size, E820_ACPI) )
+        printk("Unable to add MADT region to memory map\n");
+
+    rc = hvm_copy_to_guest_phys(*addr, madt, size, d->vcpu[0]);
+    if ( rc )
+    {
+        printk("Unable to copy MADT into guest memory\n");
+        return rc;
+    }
+    xfree(madt);
+
+    return 0;
+}
+
+static bool __init acpi_memory_banned(unsigned long address,
+                                      unsigned long size)
+{
+    unsigned long mfn, nr_pages, i;
+
+    mfn = PFN_DOWN(address);
+    nr_pages = PFN_UP((address & ~PAGE_MASK) + size);
+    for ( i = 0; i < nr_pages; i++ )
+        if ( !page_is_ram_type(mfn + i, RAM_TYPE_RESERVED) &&
+             !page_is_ram_type(mfn + i, RAM_TYPE_ACPI) )
+            return true;
+
+    return false;
+}
+
+static bool __init pvh_acpi_table_allowed(const char *sig,
+                                          unsigned long address,
+                                          unsigned long size)
+{
+    static const char __initconst banned_tables[][ACPI_NAME_SIZE] = {
+        ACPI_SIG_HPET, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_MPST,
+        ACPI_SIG_PMTT, ACPI_SIG_MADT, ACPI_SIG_DMAR};
+    unsigned int i;
+
+    for ( i = 0; i < ARRAY_SIZE(banned_tables); i++ )
+        if ( strncmp(sig, banned_tables[i], ACPI_NAME_SIZE) == 0 )
+            return false;
+
+    /* Make sure table doesn't reside in a RAM region. */
+    if ( acpi_memory_banned(address, size) )
+    {
+        printk("Skipping table %.4s because resides in a non-ACPI, non-reserved region\n",
+               sig);
+        return false;
+    }
+
+    return true;
+}
+
+static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
+                                      paddr_t *addr)
+{
+    struct acpi_table_xsdt *xsdt;
+    struct acpi_table_header *table;
+    struct acpi_table_rsdp *rsdp;
+    unsigned long size = sizeof(*xsdt);
+    unsigned int i, j, num_tables = 0;
+    paddr_t xsdt_paddr;
+    int rc;
+
+    /*
+     * Restore the original DMAR table signature; we are going to filter it
+     * out of the new XSDT presented to the guest, so its signature no longer
+     * needs to be zapped.
+     */
+    acpi_dmar_reinstate();
+
+    /* Count the number of tables that will be added to the XSDT. */
+    for ( i = 0; i < acpi_gbl_root_table_list.count; i++ )
+    {
+        const struct acpi_table_desc *t = &acpi_gbl_root_table_list.tables[i];
+
+        if ( pvh_acpi_table_allowed(t->signature.ascii, t->address,
+                                    t->length) )
+            num_tables++;
+    }
+
+    /*
+     * No need to add or subtract anything: struct acpi_table_xsdt already
+     * includes one array slot, which is taken by the custom-built MADT that
+     * replaces the filtered-out original one.
+     */
+    size += num_tables * sizeof(xsdt->table_offset_entry[0]);
+
+    xsdt = xzalloc_bytes(size);
+    if ( !xsdt )
+    {
+        printk("Unable to allocate memory for XSDT table\n");
+        return -ENOMEM;
+    }
+
+    /* Copy the native XSDT table header. */
+    rsdp = acpi_os_map_memory(acpi_os_get_root_pointer(), sizeof(*rsdp));
+    if ( !rsdp )
+    {
+        printk("Unable to map RSDP\n");
+        return -EINVAL;
+    }
+    xsdt_paddr = rsdp->xsdt_physical_address;
+    acpi_os_unmap_memory(rsdp, sizeof(*rsdp));
+    table = acpi_os_map_memory(xsdt_paddr, sizeof(*table));
+    if ( !table )
+    {
+        printk("Unable to map XSDT\n");
+        return -EINVAL;
+    }
+    xsdt->header = *table;
+    acpi_os_unmap_memory(table, sizeof(*table));
+
+    /* Add the custom MADT. */
+    xsdt->table_offset_entry[0] = madt_addr;
+
+    /* Copy the addresses of the rest of the allowed tables. */
+    for ( i = 0, j = 1; i < acpi_gbl_root_table_list.count; i++ )
+    {
+        const struct acpi_table_desc *t = &acpi_gbl_root_table_list.tables[i];
+
+        if ( pvh_acpi_table_allowed(t->signature.ascii, t->address,
+                                    t->length) )
+            xsdt->table_offset_entry[j++] = t->address;
+    }
+
+    xsdt->header.revision = 1;
+    xsdt->header.length = size;
+    /*
+     * Calling acpi_tb_checksum here is a layering violation, but
+     * introducing a wrapper for such simple usage seems overkill.
+     */
+    xsdt->header.checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, xsdt), size);
+
+    /* Place the new XSDT in guest memory space. */
+    if ( pvh_steal_ram(d, size, 0, GB(4), addr) )
+    {
+        printk("Unable to find guest RAM for XSDT\n");
+        return -ENOMEM;
+    }
+
+    /* Mark this region as E820_ACPI. */
+    if ( pvh_add_mem_range(d, *addr, *addr + size, E820_ACPI) )
+        printk("Unable to add XSDT region to memory map\n");
+
+    rc = hvm_copy_to_guest_phys(*addr, xsdt, size, d->vcpu[0]);
+    if ( rc )
+    {
+        printk("Unable to copy XSDT into guest memory\n");
+        return rc;
+    }
+    xfree(xsdt);
+
+    return 0;
+}
+
+static int __init pvh_setup_acpi(struct domain *d, paddr_t start_info)
+{
+    unsigned long pfn, nr_pages;
+    paddr_t madt_paddr, xsdt_paddr, rsdp_paddr;
+    unsigned int i;
+    int rc;
+    struct acpi_table_rsdp *native_rsdp, rsdp = {
+        .signature = ACPI_SIG_RSDP,
+        .revision = 2,
+        .length = sizeof(rsdp),
+    };
+
+    /* Scan top-level tables and add their regions to the guest memory map. */
+    for ( i = 0; i < acpi_gbl_root_table_list.count; i++ )
+    {
+        const char *sig = acpi_gbl_root_table_list.tables[i].signature.ascii;
+        unsigned long addr = acpi_gbl_root_table_list.tables[i].address;
+        unsigned long size = acpi_gbl_root_table_list.tables[i].length;
+
+        /*
+         * Make sure the original MADT is also mapped, so that Dom0 can
+         * properly access the data returned by _MAT methods in case it's
+         * re-using MADT memory.
+         */
+        if ( strncmp(sig, ACPI_SIG_MADT, ACPI_NAME_SIZE)
+             ? pvh_acpi_table_allowed(sig, addr, size)
+             : !acpi_memory_banned(addr, size) )
+            pvh_add_mem_range(d, addr, addr + size, E820_ACPI);
+    }
+
+    /* Identity map ACPI e820 regions. */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        if ( d->arch.e820[i].type != E820_ACPI &&
+             d->arch.e820[i].type != E820_NVS )
+            continue;
+
+        pfn = PFN_DOWN(d->arch.e820[i].addr);
+        nr_pages = PFN_UP((d->arch.e820[i].addr & ~PAGE_MASK) +
+                          d->arch.e820[i].size);
+
+        rc = modify_identity_mmio(d, pfn, nr_pages, true);
+        if ( rc )
+        {
+            printk("Failed to map ACPI region [%#lx, %#lx) into Dom0 memory map\n",
+                   pfn, pfn + nr_pages);
+            return rc;
+        }
+    }
+
+    rc = pvh_setup_acpi_madt(d, &madt_paddr);
+    if ( rc )
+        return rc;
+
+    rc = pvh_setup_acpi_xsdt(d, madt_paddr, &xsdt_paddr);
+    if ( rc )
+        return rc;
+
+    /* Craft a custom RSDP. */
+    native_rsdp = acpi_os_map_memory(acpi_os_get_root_pointer(), sizeof(rsdp));
+    memcpy(rsdp.oem_id, native_rsdp->oem_id, sizeof(rsdp.oem_id));
+    acpi_os_unmap_memory(native_rsdp, sizeof(rsdp));
+    rsdp.xsdt_physical_address = xsdt_paddr;
+    /*
+     * Calling acpi_tb_checksum here is a layering violation, but
+     * introducing a wrapper for such simple usage seems overkill.
+     */
+    rsdp.checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, &rsdp),
+                                      ACPI_RSDP_REV0_SIZE);
+    rsdp.extended_checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, &rsdp),
+                                               sizeof(rsdp));
+
+    /*
+     * Place the new RSDP in guest memory space.
+     *
+     * NB: this RSDP is not going to replace the original RSDP, which should
+     * still be accessible to the guest. However, that RSDP points to the
+     * native RSDT and should not be used by the Dom0 kernel for boot
+     * purposes (we keep it visible for post-boot access).
+     */
+    if ( pvh_steal_ram(d, sizeof(rsdp), 0, GB(4), &rsdp_paddr) )
+    {
+        printk("Unable to allocate guest RAM for RSDP\n");
+        return -ENOMEM;
+    }
+
+    /* Mark this region as E820_ACPI. */
+    if ( pvh_add_mem_range(d, rsdp_paddr, rsdp_paddr + sizeof(rsdp),
+                           E820_ACPI) )
+        printk("Unable to add RSDP region to memory map\n");
+
+    /* Copy RSDP into guest memory. */
+    rc = hvm_copy_to_guest_phys(rsdp_paddr, &rsdp, sizeof(rsdp), d->vcpu[0]);
+    if ( rc )
+    {
+        printk("Unable to copy RSDP into guest memory\n");
+        return rc;
+    }
+
+    /* Copy RSDP address to start_info. */
+    rc = hvm_copy_to_guest_phys(start_info +
+                                offsetof(struct hvm_start_info, rsdp_paddr),
+                                &rsdp_paddr,
+                                sizeof(((struct hvm_start_info *)
+                                        0)->rsdp_paddr),
+                                d->vcpu[0]);
+    if ( rc )
+    {
+        printk("Unable to copy RSDP into guest memory\n");
+        return rc;
+    }
+
+    return 0;
+}
+
 static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
@@ -2235,6 +2662,13 @@ static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
         return rc;
     }
 
+    rc = pvh_setup_acpi(d, start_info);
+    if ( rc )
+    {
+        printk("Failed to setup Dom0 ACPI tables: %d\n", rc);
+        return rc;
+    }
+
     panic("Building a PVHv2 Dom0 is not yet supported.");
     return 0;
 }
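
A note on the "checksum -= acpi_tb_checksum(...)" idiom used for the MADT,
XSDT and RSDP above: ACPI defines checksums so that all bytes of a table sum
to zero mod 256, so subtracting the byte-sum of the whole table (stale
checksum included) from the checksum field rebalances it. A minimal
standalone sketch of the idea, using made-up helper names rather than
ACPICA's:

#include <stdint.h>
#include <stdio.h>

/* Sum all bytes of a buffer mod 256. */
static uint8_t byte_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;

    while ( len-- )
        sum += *p++;
    return sum;
}

/* If the old byte total is S, "csum -= S" makes the new total 0 mod 256. */
static void checksum_fixup(uint8_t *table, size_t len, size_t csum_off)
{
    table[csum_off] -= byte_sum(table, len);
}

int main(void)
{
    uint8_t table[16] = { 'T', 'E', 'S', 'T', 1, 2, 3 }; /* [9] = checksum */

    checksum_fixup(table, sizeof(table), 9);
    printf("total mod 256 = %u\n", byte_sum(table, sizeof(table))); /* 0 */
    return 0;
}
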
-- 
2.10.1 (Apple Git-78)



* Re: [PATCH v7 2/7] xen/x86: populate PVHv2 Dom0 physical memory map
  2017-02-22 14:24 ` [PATCH v7 2/7] xen/x86: populate PVHv2 Dom0 physical memory map Roger Pau Monne
@ 2017-02-23 13:39   ` Jan Beulich
  2017-02-23 15:27     ` Roger Pau Monne
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2017-02-23 13:39 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 22.02.17 at 15:24, <roger.pau@citrix.com> wrote:
> Craft the Dom0 e820 memory map and populate it. Introduce a helper to remove
> memory pages that are shared between Xen and a domain, and use it in order to
> remove low 1MB RAM regions from dom_io in order to assign them to a PVHv2 Dom0.
> 
> On hardware lacking support for unrestricted mode also craft the identity page
> tables and the TSS used for virtual 8086 mode.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit ...

> @@ -44,6 +45,16 @@ static long __initdata dom0_min_nrpages;
>  static long __initdata dom0_max_nrpages = LONG_MAX;
>  
>  /*
> + * Have the TSS cover the ISA port range, which makes it
> + * - 104 bytes base structure
> + * - 32 bytes interrupt redirection bitmap
> + * - 128 bytes I/O bitmap
> + * - one trailing byte
> + * or a total of 265 bytes.
> + */
> +#define HVM_VM86_TSS_SIZE 265

... I'm not convinced the same rationale as used in hvmloader
applies here. Namely, especially without legacy devices, there
should be very little reason for such a Dom0 to do port I/O
to any ports (including the ISA ones) from real mode, nor can
I see the usefulness of invoking INT $n instructions without
there being any firmware.
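
For reference, the arithmetic behind the 265-byte figure can be checked with
a quick standalone sketch (the macro names below are illustrative, not taken
from the Xen tree):

#include <stdio.h>

/* Component sizes of a 32-bit TSS covering the ISA port range 0-0x3ff. */
#define TSS_BASE_SIZE        104 /* base structure, per the Intel SDM */
#define TSS_INTR_REDIR_SIZE   32 /* interrupt redirection bitmap: 256 bits */
#define TSS_IO_BITMAP_SIZE   128 /* I/O bitmap: 1024 ports, one bit each */
#define TSS_TRAILING_BYTE      1 /* mandatory trailing 0xff byte */

int main(void)
{
    /* Prints 265, matching HVM_VM86_TSS_SIZE. */
    printf("%d\n", TSS_BASE_SIZE + TSS_INTR_REDIR_SIZE +
                   TSS_IO_BITMAP_SIZE + TSS_TRAILING_BYTE);
    return 0;
}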

Jan


* Re: [PATCH v7 3/7] x86/bzimage: change the types from char * to void *
  2017-02-22 14:24 ` [PATCH v7 3/7] x86/bzimage: change the types from char * to void * Roger Pau Monne
@ 2017-02-23 13:41   ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2017-02-23 13:41 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 22.02.17 at 15:24, <roger.pau@citrix.com> wrote:
> This also allows changing the types of image_base and image_start in the Dom0
> builder from char * to void *.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



* Re: [PATCH v7 4/7] x86/libelf: pass the destination vCPU to libelf for Dom0 build
  2017-02-22 14:24 ` [PATCH v7 4/7] x86/libelf: pass the destination vCPU to libelf for Dom0 build Roger Pau Monne
@ 2017-02-23 13:47   ` Jan Beulich
  2017-02-23 16:01     ` Roger Pau Monne
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2017-02-23 13:47 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, Ian Jackson, xen-devel

>>> On 22.02.17 at 15:24, <roger.pau@citrix.com> wrote:
> Allow setting the destination vCPU for libelf, so that elf_load_image can take
> it into account when loading the kernel for Dom0. This is needed for PVHv2 Dom0
> build, so that hvm_copy_to_guest_phys can be called with a Dom0 vCPU instead of
> current (that contains the idle vCPU at this point).
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit ...

> --- a/xen/common/libelf/libelf-loader.c
> +++ b/xen/common/libelf/libelf-loader.c
> @@ -146,6 +146,25 @@ void elf_set_verbose(struct elf_binary *elf)
>      elf->verbose = 1;
>  }
>  
> +static elf_errorstatus elf_memcpy(struct vcpu *v, void *dst, void *src,
> +                                  uint64_t size)
> +{
> +    int rc = 0;
> +
> +#ifdef CONFIG_X86
> +    if ( is_hvm_vcpu(v) )
> +    {
> +        rc = hvm_copy_to_guest_phys((paddr_t)dst, src, size, v);
> +        rc = rc != HVMCOPY_okay ? -1 : 0;
> +    }
> +    else
> +#endif
> +        rc = src == NULL ? raw_clear_guest(dst, size) :
> +                           raw_copy_to_guest(dst, src, size);
> +
> +    return rc;
> +}

... elf_errorstatus is not a correct type for the return values of
raw_{copy_to,clear}_guest(). Nevertheless that's in line with ...

> static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, elf_ptrval src, uint64_t filesz, uint64_t memsz)
> {
>     elf_errorstatus rc;

... the variable declared here having been used ...

> @@ -153,10 +172,12 @@ static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, el
>          return -1;
>      /* We trust the dom0 kernel image completely, so we don't care
>       * about overruns etc. here. */
> -    rc = raw_copy_to_guest(ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src), filesz);
> +    rc = elf_memcpy(elf->vcpu, ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src),
> +                    filesz);
>      if ( rc != 0 )
>          return -1;
> -    rc = raw_clear_guest(ELF_UNSAFE_PTR(dst + filesz), memsz - filesz);
> +    rc = elf_memcpy(elf->vcpu, ELF_UNSAFE_PTR(dst + filesz), NULL,
> +                    memsz - filesz);

... the same (wrong) way, so should be good enough for now.
Ideally the setting of rc in elf_memcpy() would be corrected, though.

Jan


* Re: [PATCH v7 5/7] xen/x86: parse Dom0 kernel for PVHv2
  2017-02-22 14:24 ` [PATCH v7 5/7] xen/x86: parse Dom0 kernel for PVHv2 Roger Pau Monne
@ 2017-02-23 13:50   ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2017-02-23 13:50 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 22.02.17 at 15:24, <roger.pau@citrix.com> wrote:
> Introduce a helper to parse the Dom0 kernel.
> 
> A new helper is also introduced to libelf, that's used to store the destination
> vcpu of the domain. This parameter is needed when loading the kernel on an HVM
> domain (PVHv2), since hvm_copy_to_guest_phys requires passing the destination
> vcpu.
> 
> While there also fix image_base and image_start to be of type "void *", and do
> the necessary fixup of related functions.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



* Re: [PATCH v7 2/7] xen/x86: populate PVHv2 Dom0 physical memory map
  2017-02-23 13:39   ` Jan Beulich
@ 2017-02-23 15:27     ` Roger Pau Monne
  2017-02-23 16:09       ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monne @ 2017-02-23 15:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

On Thu, Feb 23, 2017 at 06:39:53AM -0700, Jan Beulich wrote:
> >>> On 22.02.17 at 15:24, <roger.pau@citrix.com> wrote:
> > Craft the Dom0 e820 memory map and populate it. Introduce a helper to remove
> > memory pages that are shared between Xen and a domain, and use it in order to
> > remove low 1MB RAM regions from dom_io in order to assign them to a PVHv2 Dom0.
> > 
> > On hardware lacking support for unrestricted mode also craft the identity page
> > tables and the TSS used for virtual 8086 mode.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> albeit ...
> 
> > @@ -44,6 +45,16 @@ static long __initdata dom0_min_nrpages;
> >  static long __initdata dom0_max_nrpages = LONG_MAX;
> >  
> >  /*
> > + * Have the TSS cover the ISA port range, which makes it
> > + * - 104 bytes base structure
> > + * - 32 bytes interrupt redirection bitmap
> > + * - 128 bytes I/O bitmap
> > + * - one trailing byte
> > + * or a total of 265 bytes.
> > + */
> > +#define HVM_VM86_TSS_SIZE 265
> 
> ... I'm not convinced the same rationale as used in hvmloader
> applies here. Namely, especially without legacy devices, there
> should be very little reason for such a Dom0 to do port I/O
> to any ports (including the ISA ones) from real mode, nor can
> I see the usefulness of invoking INT $n instructions without
> there being any firmware.

Right, without firmware there isn't much point in any of this. This is just
going to be used to boot the APs, and that's probably all; the AP trampoline
shouldn't attempt to write to any IO port or execute any INT instruction.

I also don't see much benefit from deviating from what HVM does, so I would
just leave it as is.

Thanks, Roger.


* Re: [PATCH v7 4/7] x86/libelf: pass the destination vCPU to libelf for Dom0 build
  2017-02-23 13:47   ` Jan Beulich
@ 2017-02-23 16:01     ` Roger Pau Monne
  2017-02-23 16:17       ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monne @ 2017-02-23 16:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, boris.ostrovsky, Ian Jackson, xen-devel

On Thu, Feb 23, 2017 at 06:47:24AM -0700, Jan Beulich wrote:
> >>> On 22.02.17 at 15:24, <roger.pau@citrix.com> wrote:
> > Allow setting the destination vCPU for libelf, so that elf_load_image can take
> > it into account when loading the kernel for Dom0. This is needed for PVHv2 Dom0
> > build, so that hvm_copy_to_guest_phys can be called with a Dom0 vCPU instead of
> > current (that contains the idle vCPU at this point).
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> albeit ...
> 
> > --- a/xen/common/libelf/libelf-loader.c
> > +++ b/xen/common/libelf/libelf-loader.c
> > @@ -146,6 +146,25 @@ void elf_set_verbose(struct elf_binary *elf)
> >      elf->verbose = 1;
> >  }
> >  
> > +static elf_errorstatus elf_memcpy(struct vcpu *v, void *dst, void *src,
> > +                                  uint64_t size)
> > +{
> > +    int rc = 0;
> > +
> > +#ifdef CONFIG_X86
> > +    if ( is_hvm_vcpu(v) )
> > +    {
> > +        rc = hvm_copy_to_guest_phys((paddr_t)dst, src, size, v);
> > +        rc = rc != HVMCOPY_okay ? -1 : 0;
> > +    }
> > +    else
> > +#endif
> > +        rc = src == NULL ? raw_clear_guest(dst, size) :
> > +                           raw_copy_to_guest(dst, src, size);
> > +
> > +    return rc;
> > +}
> 
> ... elf_errorstatus is not a correct type for the return values of
> raw_{copy_to,clear}_guest(). Nevertheless that's in line with ...
> 
> > static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, elf_ptrval src, uint64_t filesz, uint64_t memsz)
> > {
> >     elf_errorstatus rc;
> 
> ... the variable declared here having been used ...
> 
> > @@ -153,10 +172,12 @@ static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, el
> >          return -1;
> >      /* We trust the dom0 kernel image completely, so we don't care
> >       * about overruns etc. here. */
> > -    rc = raw_copy_to_guest(ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src), filesz);
> > +    rc = elf_memcpy(elf->vcpu, ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src),
> > +                    filesz);
> >      if ( rc != 0 )
> >          return -1;
> > -    rc = raw_clear_guest(ELF_UNSAFE_PTR(dst + filesz), memsz - filesz);
> > +    rc = elf_memcpy(elf->vcpu, ELF_UNSAFE_PTR(dst + filesz), NULL,
> > +                    memsz - filesz);
> 
> ... the same (wrong) way, so should be good enough for now.
> Ideally the setting of rc in elf_memcpy() would be corrected, though.

Would you like to squash the following chunk on top of this patch?

---8<---
diff --git a/xen/common/libelf/libelf-loader.c b/xen/common/libelf/libelf-loader.c
index 371061c..1acecab 100644
--- a/xen/common/libelf/libelf-loader.c
+++ b/xen/common/libelf/libelf-loader.c
@@ -149,20 +149,22 @@ void elf_set_verbose(struct elf_binary *elf)
 static elf_errorstatus elf_memcpy(struct vcpu *v, void *dst, void *src,
                                   uint64_t size)
 {
-    int rc = 0;
+    unsigned int res;
 
 #ifdef CONFIG_X86
     if ( is_hvm_vcpu(v) )
     {
+        enum hvm_copy_result rc;
+
         rc = hvm_copy_to_guest_phys((paddr_t)dst, src, size, v);
-        rc = rc != HVMCOPY_okay ? -1 : 0;
+        return rc != HVMCOPY_okay ? -1 : 0;
     }
     else
 #endif
-        rc = src == NULL ? raw_clear_guest(dst, size) :
-                           raw_copy_to_guest(dst, src, size);
+        res = src == NULL ? raw_clear_guest(dst, size) :
+                            raw_copy_to_guest(dst, src, size);
 
-    return rc;
+    return res != 0 ? -1 : 0;
 }
 
 static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, elf_ptrval src, uint64_t filesz, uint64_t memsz)



* Re: [PATCH v7 2/7] xen/x86: populate PVHv2 Dom0 physical memory map
  2017-02-23 15:27     ` Roger Pau Monne
@ 2017-02-23 16:09       ` Jan Beulich
  2017-02-23 16:16         ` Andrew Cooper
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2017-02-23 16:09 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 23.02.17 at 16:27, <roger.pau@citrix.com> wrote:
> I also don't see much benefit from deviating from what HVM does, so I would
> just leave it as is.

Understood. The risk is that in a couple of years' time, understanding
why it is the way it is (and whether that's necessary) may take as
much effort as it took to decipher the (buggy) hvmloader setup
code for this.

Jan



* Re: [PATCH v7 2/7] xen/x86: populate PVHv2 Dom0 physical memory map
  2017-02-23 16:09       ` Jan Beulich
@ 2017-02-23 16:16         ` Andrew Cooper
  2017-02-23 16:31           ` Roger Pau Monne
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2017-02-23 16:16 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monne; +Cc: xen-devel, boris.ostrovsky

On 23/02/17 16:09, Jan Beulich wrote:
>>>> On 23.02.17 at 16:27, <roger.pau@citrix.com> wrote:
>> I also don't see much benefit from deviating from what HVM does, so I would
>> just leave it as is.
> Understood. The risk is that in a couple of years' time, understanding
> why it is the way it is (and whether that's necessary) may take as
> much effort as it took to decipher the (buggy) hvmloader setup
> code for this.

In which case, just leave a short comment saying something like "Copy
HVMLoader for consistency; not that we expect a PVH domain to use
this for anything other than its AP trampoline".

~Andrew


* Re: [PATCH v7 4/7] x86/libelf: pass the destination vCPU to libelf for Dom0 build
  2017-02-23 16:01     ` Roger Pau Monne
@ 2017-02-23 16:17       ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2017-02-23 16:17 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, Ian Jackson, xen-devel

>>> On 23.02.17 at 17:01, <roger.pau@citrix.com> wrote:
> On Thu, Feb 23, 2017 at 06:47:24AM -0700, Jan Beulich wrote:
>> >>> On 22.02.17 at 15:24, <roger.pau@citrix.com> wrote:
>> > --- a/xen/common/libelf/libelf-loader.c
>> > +++ b/xen/common/libelf/libelf-loader.c
>> > @@ -146,6 +146,25 @@ void elf_set_verbose(struct elf_binary *elf)
>> >      elf->verbose = 1;
>> >  }
>> >  
>> > +static elf_errorstatus elf_memcpy(struct vcpu *v, void *dst, void *src,
>> > +                                  uint64_t size)
>> > +{
>> > +    int rc = 0;
>> > +
>> > +#ifdef CONFIG_X86
>> > +    if ( is_hvm_vcpu(v) )
>> > +    {
>> > +        rc = hvm_copy_to_guest_phys((paddr_t)dst, src, size, v);
>> > +        rc = rc != HVMCOPY_okay ? -1 : 0;
>> > +    }
>> > +    else
>> > +#endif
>> > +        rc = src == NULL ? raw_clear_guest(dst, size) :
>> > +                           raw_copy_to_guest(dst, src, size);
>> > +
>> > +    return rc;
>> > +}
>> 
>> ... elf_errorstatus is not a correct type for the return values of
>> raw_{copy_to,clear}_guest(). Nevertheless that's in line with ...
>> 
>> > static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, elf_ptrval src, uint64_t filesz, uint64_t memsz)
>> > {
>> >     elf_errorstatus rc;
>> 
>> ... the variable declared here having been used ...
>> 
>> > @@ -153,10 +172,12 @@ static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, el
>> >          return -1;
>> >      /* We trust the dom0 kernel image completely, so we don't care
>> >       * about overruns etc. here. */
>> > -    rc = raw_copy_to_guest(ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src), filesz);
>> > +    rc = elf_memcpy(elf->vcpu, ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src),
>> > +                    filesz);
>> >      if ( rc != 0 )
>> >          return -1;
>> > -    rc = raw_clear_guest(ELF_UNSAFE_PTR(dst + filesz), memsz - filesz);
>> > +    rc = elf_memcpy(elf->vcpu, ELF_UNSAFE_PTR(dst + filesz), NULL,
>> > +                    memsz - filesz);
>> 
>> ... the same (wrong) way, so should be good enough for now.
>> Ideally the setting of rc in elf_memcpy() would be corrected, though.
> 
> Would you like to squash the following chunk on top of this patch?

Oh, thanks. Yes, I'll try to remember to do so in case I end up
committing the series.

Jan



* Re: [PATCH v7 2/7] xen/x86: populate PVHv2 Dom0 physical memory map
  2017-02-23 16:16         ` Andrew Cooper
@ 2017-02-23 16:31           ` Roger Pau Monne
  0 siblings, 0 replies; 18+ messages in thread
From: Roger Pau Monne @ 2017-02-23 16:31 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, boris.ostrovsky, Jan Beulich

On Thu, Feb 23, 2017 at 04:16:15PM +0000, Andrew Cooper wrote:
> On 23/02/17 16:09, Jan Beulich wrote:
> >>>> On 23.02.17 at 16:27, <roger.pau@citrix.com> wrote:
> >> I also don't see much benefit from deviating from what HVM does, so I would
> >> just leave it as is.
> > Understood. The risk is that in a couple of years time understanding
> > why it is the way it is and (whether that's necessary) may take as
> > much effort as did the deciphering of the (buggy) hvmloader
> > setup code for this.
> 
> In which case, just leave a short comment saying something like "Copy
> HVMLoader for consistency; not that we expect a PVH domain to use
> this for anything other than its AP trampoline".

What about adding the chunk below? If you agree, I can prepare a git branch with
this and the libelf changes merged in (and the RBs), so you can pull from it.

Thanks, Roger.
---8<---
diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index adc4c00..4d7f4ae 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -51,6 +51,11 @@ static long __initdata dom0_max_nrpages = LONG_MAX;
  * - 128 bytes I/O bitmap
  * - one trailing byte
  * or a total of 265 bytes.
+ *
+ * NB: as a PVHv2 Dom0 doesn't have legacy (ISA) devices, it has no business
+ * accessing the ISA port range, much less in real mode, and due to the lack
+ * of firmware it shouldn't execute any INT instruction either. This is done
+ * for consistency with what hvmloader does.
  */
 #define HVM_VM86_TSS_SIZE 265
 



end of thread

Thread overview: 18+ messages
2017-02-22 14:24 [PATCH v7 0/7] Initial PVHv2 Dom0 support Roger Pau Monne
2017-02-22 14:24 ` [PATCH v7 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests Roger Pau Monne
2017-02-22 14:24 ` [PATCH v7 2/7] xen/x86: populate PVHv2 Dom0 physical memory map Roger Pau Monne
2017-02-23 13:39   ` Jan Beulich
2017-02-23 15:27     ` Roger Pau Monne
2017-02-23 16:09       ` Jan Beulich
2017-02-23 16:16         ` Andrew Cooper
2017-02-23 16:31           ` Roger Pau Monne
2017-02-22 14:24 ` [PATCH v7 3/7] x86/bzimage: change the types from char * to void * Roger Pau Monne
2017-02-23 13:41   ` Jan Beulich
2017-02-22 14:24 ` [PATCH v7 4/7] x86/libelf: pass the destination vCPU to libelf for Dom0 build Roger Pau Monne
2017-02-23 13:47   ` Jan Beulich
2017-02-23 16:01     ` Roger Pau Monne
2017-02-23 16:17       ` Jan Beulich
2017-02-22 14:24 ` [PATCH v7 5/7] xen/x86: parse Dom0 kernel for PVHv2 Roger Pau Monne
2017-02-23 13:50   ` Jan Beulich
2017-02-22 14:24 ` [PATCH v7 6/7] xen/x86: Setup PVHv2 Dom0 CPUs Roger Pau Monne
2017-02-22 14:24 ` [PATCH v7 7/7] xen/x86: setup PVHv2 Dom0 ACPI tables Roger Pau Monne
