All of lore.kernel.org
* [PATCH v6 0/7] Initial PVHv2 Dom0 support
@ 2017-02-10 12:33 Roger Pau Monne
  2017-02-10 12:33 ` [PATCH v6 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests Roger Pau Monne
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-10 12:33 UTC (permalink / raw)
  To: xen-devel, boris.ostrovsky, konrad.wilk

Hello,

This is the first batch of the PVHv2 Dom0 support series, which includes
everything up to the point where the ACPI tables for Dom0 are crafted. I've
decided to leave out the last part of the series (the one that contains the
PCI config space handlers and other emulation/trapping related code) in order
to focus and ease the review. This batch is of course not functional on its
own; one might be able to partially boot a Dom0 kernel if it doesn't try to
access any physical devices and the panic in setup.c is removed.

The full series can also be found on a git branch in my personal git repo:

git://xenbits.xen.org/people/royger/xen.git dom0_hvm_v6

Each patch contains the changelog between versions.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v6 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests
  2017-02-10 12:33 [PATCH v6 0/7] Initial PVHv2 Dom0 support Roger Pau Monne
@ 2017-02-10 12:33 ` Roger Pau Monne
  2017-02-13 13:09   ` Jan Beulich
  2017-02-10 12:33 ` [PATCH v6 2/7] xen/x86: split Dom0 build into PV and PVHv2 Roger Pau Monne
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-10 12:33 UTC (permalink / raw)
  To: xen-devel, boris.ostrovsky, konrad.wilk
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

PVHv2 guests, unlike HVM guests, won't have the option to route interrupts
from physical or emulated devices over event channels using PIRQs. This
applies to both DomU and Dom0 PVHv2 guests.

Introduce a new XEN_X86_EMU_USE_PIRQ flag to notify Xen whether an HVM guest
can route physical interrupts (even from emulated devices) over event
channels, and is thus allowed to use some of the PHYSDEV ops.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v5:
 - Introduce a has_pirq macro to match other XEN_X86_EMU_ options, and simplify
   some of the code.

Changes since v3:
 - Update docs.

Changes since v2:
 - Change local variable name to currd instead of d.
 - Use currd where it makes sense.
---
 docs/misc/hvmlite.markdown        | 20 ++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c            | 23 ++++++++++++++---------
 xen/arch/x86/physdev.c            |  4 ++--
 xen/common/kernel.c               |  2 +-
 xen/include/asm-x86/domain.h      |  2 ++
 xen/include/public/arch-x86/xen.h |  4 +++-
 6 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/docs/misc/hvmlite.markdown b/docs/misc/hvmlite.markdown
index 898b8ee..b2557f7 100644
--- a/docs/misc/hvmlite.markdown
+++ b/docs/misc/hvmlite.markdown
@@ -75,3 +75,23 @@ info structure that's passed at boot time (field rsdp_paddr).
 
 Description of paravirtualized devices will come from XenStore, just as it's
 done for HVM guests.
+
+## Interrupts ##
+
+### Interrupts from physical devices ###
+
+Interrupts from physical devices are delivered using native methods; this is
+done in order to take advantage of new hardware assisted virtualization
+functions, like posted interrupts. This implies that PVHv2 guests with physical
+devices will also have the necessary interrupt controllers in order to manage
+the delivery of interrupts from those devices, using the same interfaces that
+are available on native hardware.
+
+### Interrupts from paravirtualized devices ###
+
+Interrupts from paravirtualized devices are delivered using event channels, see
+[Event Channel Internals][event_channels] for more detailed information about
+event channels. Delivery of those interrupts can be configured in the same way
+as for HVM guests; check xen/include/public/hvm/params.h and
+xen/include/public/hvm/hvm_op.h for more information about available delivery
+methods.
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5f72758..9e40865 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3764,10 +3764,12 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 static long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
+    struct domain *currd = current->domain;
+
     switch ( cmd )
     {
     default:
-        if ( !is_pvh_vcpu(current) || !is_hardware_domain(current->domain) )
+        if ( !is_pvh_domain(currd) || !is_hardware_domain(currd) )
             return -ENOSYS;
         /* fall through */
     case PHYSDEVOP_map_pirq:
@@ -3775,7 +3777,8 @@ static long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
-        return do_physdev_op(cmd, arg);
+        return (has_pirq(currd) || is_pvh_domain(currd)) ?
+                do_physdev_op(cmd, arg) : -ENOSYS;
     }
 }
 
@@ -3808,17 +3811,19 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 static long hvm_physdev_op_compat32(
     int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
+    struct domain *d = current->domain;
+
     switch ( cmd )
     {
-        case PHYSDEVOP_map_pirq:
-        case PHYSDEVOP_unmap_pirq:
-        case PHYSDEVOP_eoi:
-        case PHYSDEVOP_irq_status_query:
-        case PHYSDEVOP_get_free_pirq:
-            return compat_physdev_op(cmd, arg);
+    case PHYSDEVOP_map_pirq:
+    case PHYSDEVOP_unmap_pirq:
+    case PHYSDEVOP_eoi:
+    case PHYSDEVOP_irq_status_query:
+    case PHYSDEVOP_get_free_pirq:
+        return has_pirq(d) ? compat_physdev_op(cmd, arg) : -ENOSYS;
         break;
     default:
-            return -ENOSYS;
+        return -ENOSYS;
         break;
     }
 }
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 5a49796..b4cc6a8 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -94,7 +94,7 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
     int pirq, irq, ret = 0;
     void *map_data = NULL;
 
-    if ( domid == DOMID_SELF && is_hvm_domain(d) )
+    if ( domid == DOMID_SELF && is_hvm_domain(d) && has_pirq(d) )
     {
         /*
          * Only makes sense for vector-based callback, else HVM-IRQ logic
@@ -265,7 +265,7 @@ int physdev_unmap_pirq(domid_t domid, int pirq)
     if ( ret )
         goto free_domain;
 
-    if ( is_hvm_domain(d) )
+    if ( is_hvm_domain(d) && has_pirq(d) )
     {
         spin_lock(&d->event_lock);
         if ( domain_pirq_to_emuirq(d, pirq) != IRQ_UNBOUND )
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index d0edb13..4b87c60 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -332,7 +332,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             case guest_type_hvm:
                 fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
                              (1U << XENFEAT_hvm_callback_vector) |
-                             (1U << XENFEAT_hvm_pirqs);
+                             (has_pirq(d) ? (1U << XENFEAT_hvm_pirqs) : 0);
                 break;
             }
 #endif
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index e6c7e13..7226ab3 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -419,6 +419,8 @@ struct arch_domain
 #define has_vvga(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_VGA))
 #define has_viommu(d)      (!!((d)->arch.emulation_flags & XEN_X86_EMU_IOMMU))
 #define has_vpit(d)        (!!((d)->arch.emulation_flags & XEN_X86_EMU_PIT))
+#define has_pirq(d)        (!!((d)->arch.emulation_flags & \
+                            XEN_X86_EMU_USE_PIRQ))
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
 
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index 363c8cc..73c829a 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -293,12 +293,14 @@ struct xen_arch_domainconfig {
 #define XEN_X86_EMU_IOMMU           (1U<<_XEN_X86_EMU_IOMMU)
 #define _XEN_X86_EMU_PIT            8
 #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
+#define _XEN_X86_EMU_USE_PIRQ       9
+#define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
 
 #define XEN_X86_EMU_ALL             (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |  \
                                      XEN_X86_EMU_PM | XEN_X86_EMU_RTC |      \
                                      XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |  \
                                      XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU |   \
-                                     XEN_X86_EMU_PIT)
+                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ)
     uint32_t emulation_flags;
 };
 
-- 
2.10.1 (Apple Git-78)




* [PATCH v6 2/7] xen/x86: split Dom0 build into PV and PVHv2
  2017-02-10 12:33 [PATCH v6 0/7] Initial PVHv2 Dom0 support Roger Pau Monne
  2017-02-10 12:33 ` [PATCH v6 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests Roger Pau Monne
@ 2017-02-10 12:33 ` Roger Pau Monne
  2017-02-10 13:32   ` Andrew Cooper
  2017-02-13 13:12   ` Jan Beulich
  2017-02-10 12:33 ` [PATCH v6 3/7] xen/x86: populate PVHv2 Dom0 physical memory map Roger Pau Monne
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-10 12:33 UTC (permalink / raw)
  To: xen-devel, boris.ostrovsky, konrad.wilk
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Split the Dom0 builder into two different functions, one for PV (and classic
PVH), and another one for PVHv2. Introduce a new command line parameter called
'dom0' that can be used to request the creation of a PVHv2 Dom0 by setting the
'pvh' sub-option. A panic has also been added if a user tries to select a
PVHv2 Dom0; it will be removed once all the code is in place.

While there, mark the dom0_shadow option that was used by PV Dom0 as
deprecated; it was lacking documentation and was not functional. Point users
towards dom0=shadow instead.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v5:
 - Remove duplicate define.
 - Also move the sanity check for d->vcpu[0]->is_initialised.
 - Mark dom0_shadow as deprecated, point users to switch to dom0=shadow.
 - Move the temporary panic from setup.c to the end of construct_dom0_pvh.

Changes since v4:
 - Move common sanity BUG_ONs and process_pending_softirqs to construct_dom0.
 - Remove the non-existent documentation about the dom0_shadow option.
 - Fix the define of dom0_shadow to be 'false' instead of 0.
 - Move the parsing of the dom0 command line option to domain_build.c.
 - s/hvm/pvh.

Changes since v3:
 - Correctly declare the parameter list.
 - Add a panic if dom0=hvm is used. This will be removed once all the code is in
   place.

Changes since v2:
 - Fix coding style.
 - Introduce a new dom0 option that allows passing several parameters.
   Currently supported ones are hvm and shadow.

Changes since RFC:
 - Add documentation for the new command line option.
 - Simplify the logic in construct_dom0.
---
 docs/misc/xen-command-line.markdown | 19 ++++++++++
 xen/arch/x86/domain_build.c         | 71 +++++++++++++++++++++++++++++++------
 xen/arch/x86/setup.c                |  8 +++++
 xen/include/asm-x86/setup.h         |  7 ++++
 4 files changed, 94 insertions(+), 11 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index a11fdf9..3acbb33 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -649,6 +649,8 @@ affinities to prefer but be not limited to the specified node(s).
 ### dom0\_shadow
 > `= <boolean>`
 
+This option is deprecated, please use `dom0=shadow` instead.
+
 ### dom0\_vcpus\_pin
 > `= <boolean>`
 
@@ -656,6 +658,23 @@ affinities to prefer but be not limited to the specified node(s).
 
 Pin dom0 vcpus to their respective pcpus
 
+### dom0
+> `= List of [ pvh | shadow ]`
+
+> Sub-options:
+
+> `pvh`
+
+> Default: `false`
+
+Flag that makes a dom0 boot in PVHv2 mode.
+
+> `shadow`
+
+> Default: `false`
+
+Flag that makes a dom0 use shadow paging.
+
 ### dom0pvh
 > `= <boolean>`
 
diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 243df96..7123931 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -191,11 +191,38 @@ struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
 }
 
 #ifdef CONFIG_SHADOW_PAGING
-static bool_t __initdata opt_dom0_shadow;
+bool __initdata opt_dom0_shadow;
 boolean_param("dom0_shadow", opt_dom0_shadow);
-#else
-#define opt_dom0_shadow 0
 #endif
+bool __initdata dom0_pvh;
+
+/*
+ * List of parameters that affect Dom0 creation:
+ *
+ *  - pvh               Create a PVHv2 Dom0.
+ *  - shadow            Use shadow paging for Dom0.
+ */
+static void __init parse_dom0_param(char *s)
+{
+    char *ss;
+
+    do {
+
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strcmp(s, "pvh") )
+            dom0_pvh = true;
+#ifdef CONFIG_SHADOW_PAGING
+        else if ( !strcmp(s, "shadow") )
+            opt_dom0_shadow = true;
+#endif
+
+        s = ss + 1;
+    } while ( ss );
+}
+custom_param("dom0", parse_dom0_param);
 
 static char __initdata opt_dom0_ioports_disable[200] = "";
 string_param("dom0_ioports_disable", opt_dom0_ioports_disable);
@@ -951,7 +978,7 @@ static int __init setup_permissions(struct domain *d)
     return rc;
 }
 
-int __init construct_dom0(
+static int __init construct_dom0_pv(
     struct domain *d,
     const module_t *image, unsigned long image_headroom,
     module_t *initrd,
@@ -1007,13 +1034,6 @@ int __init construct_dom0(
     /* Machine address of next candidate page-table page. */
     paddr_t mpt_alloc;
 
-    /* Sanity! */
-    BUG_ON(d->domain_id != 0);
-    BUG_ON(d->vcpu[0] == NULL);
-    BUG_ON(v->is_initialised);
-
-    process_pending_softirqs();
-
     printk("*** LOADING DOMAIN 0 ***\n");
 
     d->max_pages = ~0U;
@@ -1655,6 +1675,35 @@ out:
     return rc;
 }
 
+static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
+                                     unsigned long image_headroom,
+                                     module_t *initrd,
+                                     void *(*bootstrap_map)(const module_t *),
+                                     char *cmdline)
+{
+
+    printk("** Building a PVH Dom0 **\n");
+
+    panic("Building a PVHv2 Dom0 is not yet supported.");
+    return 0;
+}
+
+int __init construct_dom0(struct domain *d, const module_t *image,
+                          unsigned long image_headroom, module_t *initrd,
+                          void *(*bootstrap_map)(const module_t *),
+                          char *cmdline)
+{
+    /* Sanity! */
+    BUG_ON(d->domain_id != 0);
+    BUG_ON(d->vcpu[0] == NULL);
+    BUG_ON(d->vcpu[0]->is_initialised);
+
+    process_pending_softirqs();
+
+    return (is_hvm_domain(d) ? construct_dom0_pvh : construct_dom0_pv)
+           (d, image, image_headroom, initrd, bootstrap_map, cmdline);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 176ee74..48fd955 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1550,6 +1550,14 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( opt_dom0pvh )
         domcr_flags |= DOMCRF_pvh | DOMCRF_hap;
 
+    if ( dom0_pvh )
+    {
+        domcr_flags |= DOMCRF_hvm |
+                       ((hvm_funcs.hap_supported && !opt_dom0_shadow) ?
+                         DOMCRF_hap : 0);
+        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
+    }
+
     /* Create initial domain 0. */
     dom0 = domain_create(0, domcr_flags, 0, &config);
     if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
diff --git a/xen/include/asm-x86/setup.h b/xen/include/asm-x86/setup.h
index c65b79c..47b9442 100644
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -57,4 +57,11 @@ extern uint8_t kbd_shift_flags;
 extern unsigned long highmem_start;
 #endif
 
+#ifdef CONFIG_SHADOW_PAGING
+extern bool opt_dom0_shadow;
+#else
+#define opt_dom0_shadow false
+#endif
+extern bool dom0_pvh;
+
 #endif
-- 
2.10.1 (Apple Git-78)




* [PATCH v6 3/7] xen/x86: populate PVHv2 Dom0 physical memory map
  2017-02-10 12:33 [PATCH v6 0/7] Initial PVHv2 Dom0 support Roger Pau Monne
  2017-02-10 12:33 ` [PATCH v6 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests Roger Pau Monne
  2017-02-10 12:33 ` [PATCH v6 2/7] xen/x86: split Dom0 build into PV and PVHv2 Roger Pau Monne
@ 2017-02-10 12:33 ` Roger Pau Monne
  2017-02-13 13:53   ` Jan Beulich
  2017-02-10 12:33 ` [PATCH v6 4/7] xen/x86: parse Dom0 kernel for PVHv2 Roger Pau Monne
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-10 12:33 UTC (permalink / raw)
  To: xen-devel, boris.ostrovsky, konrad.wilk
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Craft the Dom0 e820 memory map and populate it. Introduce a helper to remove
memory pages that are shared between Xen and a domain, and use it to remove
the low 1MB RAM regions from dom_io so that they can be assigned to a PVHv2
Dom0.

On hardware lacking support for unrestricted mode, also craft the identity
page tables and the TSS used for virtual 8086 mode.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v5:
 - Adjust the logic to set need_paging.
 - Remove the usage of the _AC macro.
 - Subtract memory from the end of regions instead of the start.
 - Create the VM86_TSS before the identity page table, so that the page table
   is aligned to a page boundary.
 - Use MB1_PAGES in modify_identity_mmio.
 - Move and simplify the ASSERT in pvh_setup_p2m.
 - Move the creation of the PSE page tables to a separate function, and use it
   in shadow_enable also.
 - Make the map parameter of modify_identity_mmio a constant.
 - Add a comment to HVM_VM86_TSS_SIZE, although it seems this might need
   further fixing.
 - Introduce pvh_add_mem_range in order to mark the regions used by the VM86
   TSS and the identity page tables as reserved in the memory map.
 - Add a parameter to request aligned memory from pvh_steal_ram.

Changes since v4:
 - Move process_pending_softirqs to previous patch.
 - Fix off-by-one errors in some checks.
 - Make unshare_xen_page_with_guest __init.
 - Improve unshare_xen_page_with_guest by making use of already existing
   is_xen_heap_page and put_page.
 - s/hvm/pvh/.
 - Use PAGE_ORDER_4K in pvh_setup_e820 in order to keep consistency with the
   p2m code.

Changes since v3:
 - Drop get_order_from_bytes_floor; it was only used by
   hvm_populate_memory_range.
 - Switch hvm_populate_memory_range to use frame numbers instead of full memory
   addresses.
 - Add a helper to steal the low 1MB RAM areas from dom_io and add them to Dom0
   as normal RAM.
 - Introduce unshare_xen_page_with_guest in order to remove pages from dom_io,
   so they can be assigned to other domains. This is needed in order to remove
   the low 1MB RAM regions from dom_io and assign them to the hardware_domain.
 - Simplify the loop in hvm_steal_ram.
 - Move definition of map_identity_mmio into this patch.

Changes since v2:
 - Introduce get_order_from_bytes_floor as a local function to
   domain_build.c.
 - Remove extra asserts.
 - Make hvm_populate_memory_range return an error code instead of panicking.
 - Fix comments and printks.
 - Use the ULL suffix instead of casting to uint64_t.
 - Rename hvm_setup_vmx_unrestricted_guest to
   hvm_setup_vmx_realmode_helpers.
 - Only subtract two pages from the memory calculation, which will be used
   by the MADT replacement.
 - Remove some comments.
 - Remove printing allocation information.
 - Don't stash any pages for the MADT, TSS or ident PT, those will be
   subtracted directly from RAM regions of the memory map.
 - Count the number of iterations before calling process_pending_softirqs
   when populating the memory map.
 - Move the initial call to process_pending_softirqs into construct_dom0,
   and remove the ones from construct_dom0_hvm and construct_dom0_pv.
 - Make memflags global so it can be shared between alloc_chunk and
   hvm_populate_memory_range.

Changes since RFC:
 - Use IS_ALIGNED instead of checking with PAGE_MASK.
 - Use the new %pB specifier in order to print sizes in human readable form.
 - Create a VM86 TSS for hardware that doesn't support unrestricted mode.
 - Subtract guest RAM for the identity page table and the VM86 TSS.
 - Split the creation of the unrestricted mode helper structures to a
   separate function.
 - Use preemption with paging_set_allocation.
 - Use get_order_from_bytes_floor.
---
 xen/arch/x86/domain_build.c     | 360 +++++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/mm.c               |  16 ++
 xen/arch/x86/mm/shadow/common.c |   7 +-
 xen/include/asm-x86/mm.h        |   2 +
 xen/include/asm-x86/page.h      |  12 ++
 5 files changed, 387 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 7123931..be50b65 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -22,6 +22,7 @@
 #include <xen/compat.h>
 #include <xen/libelf.h>
 #include <xen/pfn.h>
+#include <xen/guest_access.h>
 #include <asm/regs.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -44,6 +45,12 @@ static long __initdata dom0_min_nrpages;
 static long __initdata dom0_max_nrpages = LONG_MAX;
 
 /*
+ * Size of the VM86 TSS for virtual 8086 mode to use. This value has been
+ * taken from what hvmloader does.
+ */
+#define HVM_VM86_TSS_SIZE   128
+
+/*
  * dom0_mem=[min:<min_amt>,][max:<max_amt>,][<amt>]
  * 
  * <min_amt>: The minimum amount of memory which should be allocated for dom0.
@@ -242,11 +249,12 @@ boolean_param("ro-hpet", ro_hpet);
 #define round_pgup(_p)    (((_p)+(PAGE_SIZE-1))&PAGE_MASK)
 #define round_pgdown(_p)  ((_p)&PAGE_MASK)
 
+static unsigned int __initdata memflags = MEMF_no_dma|MEMF_exact_node;
+
 static struct page_info * __init alloc_chunk(
     struct domain *d, unsigned long max_pages)
 {
     static unsigned int __initdata last_order = MAX_ORDER;
-    static unsigned int __initdata memflags = MEMF_no_dma|MEMF_exact_node;
     struct page_info *page;
     unsigned int order = get_order_from_pages(max_pages), free_order;
 
@@ -331,7 +339,9 @@ static unsigned long __init compute_dom0_nr_pages(
             avail -= max_pdx >> s;
     }
 
-    need_paging = opt_dom0_shadow || (is_pvh_domain(d) && !iommu_hap_pt_share);
+    need_paging = has_hvm_container_domain(d)
+                  ? !iommu_hap_pt_share || !paging_mode_hap(d)
+                  : opt_dom0_shadow;
     for ( ; ; need_paging = 0 )
     {
         nr_pages = dom0_nrpages;
@@ -363,7 +373,8 @@ static unsigned long __init compute_dom0_nr_pages(
         avail -= dom0_paging_pages(d, nr_pages);
     }
 
-    if ( (parms->p2m_base == UNSET_ADDR) && (dom0_nrpages <= 0) &&
+    if ( is_pv_domain(d) &&
+         (parms->p2m_base == UNSET_ADDR) && (dom0_nrpages <= 0) &&
          ((dom0_min_nrpages <= 0) || (nr_pages > min_pages)) )
     {
         /*
@@ -579,6 +590,7 @@ static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
     struct e820entry *entry, *entry_guest;
     unsigned int i;
     unsigned long pages, cur_pages = 0;
+    uint64_t start, end;
 
     /*
      * Craft the e820 memory map for Dom0 based on the hardware e820 map.
@@ -606,8 +618,22 @@ static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
             continue;
         }
 
-        *entry_guest = *entry;
-        pages = PFN_UP(entry_guest->size);
+        /*
+         * Make sure the start and length are aligned to PAGE_SIZE, because
+         * that's the minimum granularity of the 2nd stage translation. Since
+         * the p2m code uses PAGE_ORDER_4K internally, also use it here in
+         * order to prevent this code from getting out of sync.
+         */
+        start = ROUNDUP(entry->addr, PAGE_SIZE << PAGE_ORDER_4K);
+        end = (entry->addr + entry->size) &
+              ~((PAGE_SIZE << PAGE_ORDER_4K) - 1);
+        if ( start >= end )
+            continue;
+
+        entry_guest->type = E820_RAM;
+        entry_guest->addr = start;
+        entry_guest->size = end - start;
+        pages = PFN_DOWN(entry_guest->size);
         if ( (cur_pages + pages) > nr_pages )
         {
             /* Truncate region */
@@ -1675,15 +1701,339 @@ out:
     return rc;
 }
 
+static int __init modify_identity_mmio(struct domain *d, unsigned long pfn,
+                                       unsigned long nr_pages, const bool map)
+{
+    int rc;
+
+    for ( ; ; )
+    {
+        rc = (map ? map_mmio_regions : unmap_mmio_regions)
+             (d, _gfn(pfn), nr_pages, _mfn(pfn));
+        if ( rc == 0 )
+            break;
+        if ( rc < 0 )
+        {
+            printk(XENLOG_WARNING
+                   "Failed to identity %smap [%#lx,%#lx) for d%d: %d\n",
+                   map ? "" : "un", pfn, pfn + nr_pages, d->domain_id, rc);
+            break;
+        }
+        nr_pages -= rc;
+        pfn += rc;
+        process_pending_softirqs();
+    }
+
+    return rc;
+}
+
+/* Populate a HVM memory range using the biggest possible order. */
+static int __init pvh_populate_memory_range(struct domain *d,
+                                            unsigned long start,
+                                            unsigned long nr_pages)
+{
+    unsigned int order, i = 0;
+    struct page_info *page;
+    int rc;
+#define MAP_MAX_ITER 64
+
+    order = MAX_ORDER;
+    while ( nr_pages != 0 )
+    {
+        unsigned int range_order = get_order_from_pages(nr_pages + 1);
+
+        order = min(range_order ? range_order - 1 : 0, order);
+        page = alloc_domheap_pages(d, order, memflags);
+        if ( page == NULL )
+        {
+            if ( order == 0 && memflags )
+            {
+                /* Try again without any memflags. */
+                memflags = 0;
+                order = MAX_ORDER;
+                continue;
+            }
+            if ( order == 0 )
+            {
+                printk("Unable to allocate memory with order 0!\n");
+                return -ENOMEM;
+            }
+            order--;
+            continue;
+        }
+
+        rc = guest_physmap_add_page(d, _gfn(start), _mfn(page_to_mfn(page)),
+                                    order);
+        if ( rc != 0 )
+        {
+            printk("Failed to populate memory: [%#lx,%lx): %d\n",
+                   start, start + (1UL << order), rc);
+            return -ENOMEM;
+        }
+        start += 1UL << order;
+        nr_pages -= 1UL << order;
+        if ( (++i % MAP_MAX_ITER) == 0 )
+            process_pending_softirqs();
+    }
+
+    return 0;
+#undef MAP_MAX_ITER
+}
+
+/* Steal RAM from the end of a memory region. */
+static int __init pvh_steal_ram(struct domain *d, unsigned long size,
+                                unsigned long align, paddr_t limit,
+                                paddr_t *addr)
+{
+    unsigned int i = d->arch.nr_e820;
+
+    /*
+     * Alignment 0 should be set to 1, so it doesn't wrap around in the
+     * calculations below.
+     */
+    align = align ? : 1;
+    while ( i-- )
+    {
+        struct e820entry *entry = &d->arch.e820[i];
+
+        if ( entry->type != E820_RAM || entry->addr + entry->size > limit ||
+             entry->addr < MB(1) )
+            continue;
+
+        *addr = (entry->addr + entry->size - size) & ~(align - 1);
+        if ( *addr < entry->addr )
+            continue;
+
+        entry->size = *addr - entry->addr;
+        return 0;
+    }
+
+    return -ENOMEM;
+}
+
+/* NB: memory map must be sorted at all times for this to work correctly. */
+static int __init pvh_add_mem_range(struct domain *d, uint64_t s, uint64_t e,
+                                    unsigned int type)
+{
+    struct e820entry *map;
+    unsigned int i;
+
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        uint64_t rs = d->arch.e820[i].addr;
+        uint64_t re = rs + d->arch.e820[i].size;
+
+        if ( rs == e && d->arch.e820[i].type == type )
+        {
+            d->arch.e820[i].addr = s;
+            return 0;
+        }
+
+        if ( re == s && d->arch.e820[i].type == type &&
+             (i + 1 == d->arch.nr_e820 || d->arch.e820[i + 1].addr >= e) )
+        {
+            d->arch.e820[i].size += e - s;
+            return 0;
+        }
+
+        if ( rs >= e )
+            break;
+
+        if ( re > s )
+            return -EEXIST;
+    }
+
+    map = xzalloc_array(struct e820entry, d->arch.nr_e820 + 1);
+    if ( !map )
+    {
+        printk(XENLOG_WARNING "E820: out of memory to add region\n");
+        return -ENOMEM;
+    }
+
+    memcpy(map, d->arch.e820, i * sizeof(*d->arch.e820));
+    memcpy(map + i + 1, d->arch.e820 + i,
+           (d->arch.nr_e820 - i) * sizeof(*d->arch.e820));
+    map[i].addr = s;
+    map[i].size = e - s;
+    map[i].type = type;
+    xfree(d->arch.e820);
+    d->arch.e820 = map;
+    d->arch.nr_e820++;
+
+    return 0;
+}
+
+static int __init pvh_setup_vmx_realmode_helpers(struct domain *d)
+{
+    p2m_type_t p2mt;
+    uint32_t rc, *ident_pt;
+    uint8_t *tss;
+    mfn_t mfn;
+    paddr_t gaddr;
+
+    /*
+     * Steal some space from the last RAM region below 4GB and use it to
+     * store the real-mode TSS.
+     */
+    if ( !pvh_steal_ram(d, HVM_VM86_TSS_SIZE, 0, GB(4), &gaddr) &&
+         (tss = map_domain_gfn(p2m_get_hostp2m(d), _gfn(PFN_DOWN(gaddr)),
+                               &mfn, &p2mt, 0, &rc)) )
+    {
+        memset(tss, 0, HVM_VM86_TSS_SIZE);
+        unmap_domain_page(tss);
+        put_page(mfn_to_page(mfn_x(mfn)));
+        d->arch.hvm_domain.params[HVM_PARAM_VM86_TSS] = gaddr;
+        if ( pvh_add_mem_range(d, gaddr, gaddr + HVM_VM86_TSS_SIZE,
+                               E820_RESERVED) )
+            printk("Unable to set VM86 TSS as reserved in the memory map\n");
+    }
+    else
+        printk("Unable to allocate or map VM86 TSS area\n");
+
+    /* Steal some more RAM for the identity page tables. */
+    if ( pvh_steal_ram(d, PAGE_SIZE, PAGE_SIZE, GB(4), &gaddr) )
+    {
+        printk("Unable to find memory to stash the identity page tables\n");
+        return -ENOMEM;
+    }
+
+    /*
+     * Identity-map page table is required for running with CR0.PG=0
+     * when using Intel EPT. Create a 32-bit non-PAE page directory of
+     * superpages.
+     */
+    ident_pt = map_domain_gfn(p2m_get_hostp2m(d), _gfn(PFN_DOWN(gaddr)),
+                              &mfn, &p2mt, 0, &rc);
+    if ( ident_pt == NULL )
+    {
+        printk("Unable to map identity page tables\n");
+        return -ENOMEM;
+    }
+    write_32bit_pse_identmap(ident_pt);
+    unmap_domain_page(ident_pt);
+    put_page(mfn_to_page(mfn_x(mfn)));
+    d->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] = gaddr;
+    if ( pvh_add_mem_range(d, gaddr, gaddr + PAGE_SIZE, E820_RESERVED) )
+        printk("Unable to set identity page tables as reserved in the memory map\n");
+
+    return 0;
+}
+
+/* Assign the low 1MB to Dom0. */
+static void __init pvh_steal_low_ram(struct domain *d, unsigned long start,
+                                     unsigned long nr_pages)
+{
+    unsigned long mfn;
+
+    ASSERT(start + nr_pages <= PFN_DOWN(MB(1)));
+
+    for ( mfn = start; mfn < start + nr_pages; mfn++ )
+    {
+        struct page_info *pg = mfn_to_page(mfn);
+        int rc;
+
+        rc = unshare_xen_page_with_guest(pg, dom_io);
+        if ( rc )
+        {
+            printk("Unable to unshare Xen mfn %#lx: %d\n", mfn, rc);
+            continue;
+        }
+
+        share_xen_page_with_guest(pg, d, XENSHARE_writable);
+        rc = guest_physmap_add_entry(d, _gfn(mfn), _mfn(mfn), 0, p2m_ram_rw);
+        if ( rc )
+            printk("Unable to add mfn %#lx to p2m: %d\n", mfn, rc);
+    }
+}
+
+static int __init pvh_setup_p2m(struct domain *d)
+{
+    struct vcpu *v = d->vcpu[0];
+    unsigned long nr_pages;
+    unsigned int i;
+    int rc;
+    bool preempted;
+#define MB1_PAGES PFN_DOWN(MB(1))
+
+    nr_pages = compute_dom0_nr_pages(d, NULL, 0);
+
+    pvh_setup_e820(d, nr_pages);
+    do {
+        preempted = false;
+        paging_set_allocation(d, dom0_paging_pages(d, nr_pages),
+                              &preempted);
+        process_pending_softirqs();
+    } while ( preempted );
+
+    /*
+     * Memory below 1MB is identity mapped.
+     * NB: this only makes sense when booted from legacy BIOS.
+     */
+    rc = modify_identity_mmio(d, 0, MB1_PAGES, true);
+    if ( rc )
+    {
+        printk("Failed to identity map low 1MB: %d\n", rc);
+        return rc;
+    }
+
+    /* Populate memory map. */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        unsigned long addr, size;
+
+        if ( d->arch.e820[i].type != E820_RAM )
+            continue;
+
+        addr = PFN_DOWN(d->arch.e820[i].addr);
+        size = PFN_DOWN(d->arch.e820[i].size);
+
+        if ( addr >= MB1_PAGES )
+            rc = pvh_populate_memory_range(d, addr, size);
+        else
+        {
+            ASSERT(addr + size <= MB1_PAGES);
+            pvh_steal_low_ram(d, addr, size);
+        }
+
+        if ( rc )
+            return rc;
+    }
+
+    if ( cpu_has_vmx && paging_mode_hap(d) && !vmx_unrestricted_guest(v) )
+    {
+        /*
+         * Since Dom0 cannot be migrated, we will only setup the
+         * unrestricted guest helpers if they are needed by the current
+         * hardware we are running on.
+         */
+        rc = pvh_setup_vmx_realmode_helpers(d);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+#undef MB1_PAGES
+}
+
 static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
                                      void *(*bootstrap_map)(const module_t *),
                                      char *cmdline)
 {
+    int rc;
 
     printk("** Building a PVH Dom0 **\n");
 
+    iommu_hwdom_init(d);
+
+    rc = pvh_setup_p2m(d);
+    if ( rc )
+    {
+        printk("Failed to setup Dom0 physical memory map\n");
+        return rc;
+    }
+
     panic("Building a PVHv2 Dom0 is not yet supported.");
     return 0;
 }
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index f87c08f..d8f4983 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -475,6 +475,22 @@ void share_xen_page_with_guest(
     spin_unlock(&d->page_alloc_lock);
 }
 
+int __init unshare_xen_page_with_guest(struct page_info *page,
+                                       struct domain *d)
+{
+    if ( page_get_owner(page) != d || !is_xen_heap_page(page) )
+        return -EINVAL;
+
+    if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
+        put_page(page);
+
+    /* Remove the owner and clear the flags. */
+    page->u.inuse.type_info = 0;
+    page_set_owner(page, NULL);
+
+    return 0;
+}
+
 void share_xen_page_with_privileged_guests(
     struct page_info *page, int readonly)
 {
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index a619d65..013f78c 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -3064,7 +3064,7 @@ int shadow_enable(struct domain *d, u32 mode)
     unsigned int old_pages;
     struct page_info *pg = NULL;
     uint32_t *e;
-    int i, rv = 0;
+    int rv = 0;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
     mode |= PG_SH_enable;
@@ -3120,10 +3120,7 @@ int shadow_enable(struct domain *d, u32 mode)
         /* Fill it with 32-bit, non-PAE superpage entries, each mapping 4MB
          * of virtual address space onto the same physical address range */
         e = __map_domain_page(pg);
-        for ( i = 0; i < PAGE_SIZE / sizeof(*e); i++ )
-            e[i] = ((0x400000U * i)
-                    | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER
-                    | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
+        write_32bit_pse_identmap(e);
         unmap_domain_page(e);
         pg->u.inuse.type_info = PGT_l2_page_table | 1 | PGT_validated;
     }
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index f0efacb..8b53bad 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -276,6 +276,8 @@ struct spage_info
 #define XENSHARE_readonly 1
 extern void share_xen_page_with_guest(
     struct page_info *page, struct domain *d, int readonly);
+extern int unshare_xen_page_with_guest(struct page_info *page,
+                                       struct domain *d);
 extern void share_xen_page_with_privileged_guests(
     struct page_info *page, int readonly);
 extern void free_shared_domheap_page(struct page_info *page);
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index af7d3e8..861f480 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -374,6 +374,18 @@ perms_strictly_increased(uint32_t old_flags, uint32_t new_flags)
     return ((of | (of ^ nf)) == nf);
 }
 
+/* Build a 32bit PSE page table using 4MB pages. */
+static inline void
+write_32bit_pse_identmap(uint32_t *l2)
+{
+    unsigned int i;
+
+    for ( i = 0; i < PAGE_SIZE / sizeof(*l2); i++ )
+        l2[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
+                 _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
+}
+
+
 #endif /* !__ASSEMBLY__ */
 
 #define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)
-- 
2.10.1 (Apple Git-78)



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 4/7] xen/x86: parse Dom0 kernel for PVHv2
  2017-02-10 12:33 [PATCH v6 0/7] Initial PVHv2 Dom0 support Roger Pau Monne
                   ` (2 preceding siblings ...)
  2017-02-10 12:33 ` [PATCH v6 3/7] xen/x86: populate PVHv2 Dom0 physical memory map Roger Pau Monne
@ 2017-02-10 12:33 ` Roger Pau Monne
  2017-02-10 14:34   ` Ian Jackson
  2017-02-10 12:33 ` [PATCH v6 5/7] x86/PVHv2: fix dom0_max_vcpus so it's capped to HVM_MAX_VCPUS for PVHv2 Dom0 Roger Pau Monne
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-10 12:33 UTC (permalink / raw)
  To: xen-devel, boris.ostrovsky, konrad.wilk
  Cc: Andrew Cooper, Ian Jackson, Jan Beulich, Roger Pau Monne

Introduce a helper to parse the Dom0 kernel.

A new helper is also introduced to libelf that stores the destination vcpu
of the domain. This parameter is needed when loading the kernel into an HVM
domain (PVHv2), since hvm_copy_to_guest_phys requires passing the destination
vcpu.

While there, also fix image_base and image_start to be of type "void *", and
make the necessary adjustments to the related functions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v5:
 - s/hvm_copy_to_guest_phys_vcpu/hvm_copy_to_guest_phys/.
 - Use void * for image_base and image_start, make the necessary changes.
 - Introduce elf_set_vcpu in order to store the destination vcpu in
   elf_binary, and use it in elf_load_image. This avoids having to override
   current.
 - Style fixes.
 - Round up the position of the modlist/start_info to an aligned address
   depending on the kernel bitness.

Changes since v4:
 - s/hvm/pvh.
 - Use hvm_copy_to_guest_phys_vcpu.

Changes since v3:
 - Change one error message.
 - Indent "out" label by one space.
 - Introduce hvm_copy_to_phys and slightly simplify the code in hvm_load_kernel.

Changes since v2:
 - Remove debug messages.
 - Don't hardcode the number of modules to 1.
---
 xen/arch/x86/bzimage.c            |   3 +-
 xen/arch/x86/domain_build.c       | 136 +++++++++++++++++++++++++++++++++++++-
 xen/common/libelf/libelf-loader.c |  13 +++-
 xen/include/asm-x86/bzimage.h     |   2 +-
 xen/include/xen/libelf.h          |   6 ++
 5 files changed, 155 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
index 50ebb84..124c386 100644
--- a/xen/arch/x86/bzimage.c
+++ b/xen/arch/x86/bzimage.c
@@ -104,7 +104,8 @@ unsigned long __init bzimage_headroom(char *image_start,
     return headroom;
 }
 
-int __init bzimage_parse(char *image_base, char **image_start, unsigned long *image_len)
+int __init bzimage_parse(void *image_base, void **image_start,
+                         unsigned long *image_len)
 {
     struct setup_header *hdr = (struct setup_header *)(*image_start);
     int err = bzimage_check(hdr, *image_len);
diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index be50b65..01c9348 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -39,6 +39,7 @@
 #include <asm/hpet.h>
 
 #include <public/version.h>
+#include <public/arch-x86/hvm/start_info.h>
 
 static long __initdata dom0_nrpages;
 static long __initdata dom0_min_nrpages;
@@ -1026,7 +1027,7 @@ static int __init construct_dom0_pv(
     unsigned long long value;
     char *image_base = bootstrap_map(image);
     unsigned long image_len = image->mod_end;
-    char *image_start = image_base + image_headroom;
+    void *image_start = image_base + image_headroom;
     unsigned long initrd_len = initrd ? initrd->mod_end : 0;
     l4_pgentry_t *l4tab = NULL, *l4start = NULL;
     l3_pgentry_t *l3tab = NULL, *l3start = NULL;
@@ -1457,6 +1458,7 @@ static int __init construct_dom0_pv(
     /* Copy the OS image and free temporary buffer. */
     elf.dest_base = (void*)vkern_start;
     elf.dest_size = vkern_end - vkern_start;
+    elf_set_vcpu(&elf, v);
     rc = elf_load_binary(&elf);
     if ( rc < 0 )
     {
@@ -2015,12 +2017,136 @@ static int __init pvh_setup_p2m(struct domain *d)
 #undef MB1_PAGES
 }
 
+static int __init pvh_load_kernel(struct domain *d, const module_t *image,
+                                  unsigned long image_headroom,
+                                  module_t *initrd, void *image_base,
+                                  char *cmdline, paddr_t *entry,
+                                  paddr_t *start_info_addr)
+{
+    void *image_start = image_base + image_headroom;
+    unsigned long image_len = image->mod_end;
+    struct elf_binary elf;
+    struct elf_dom_parms parms;
+    paddr_t last_addr;
+    struct hvm_start_info start_info = { 0 };
+    struct hvm_modlist_entry mod = { 0 };
+    struct vcpu *v = d->vcpu[0];
+    int rc;
+
+    if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
+    {
+        printk("Error trying to detect bz compressed kernel\n");
+        return rc;
+    }
+
+    if ( (rc = elf_init(&elf, image_start, image_len)) != 0 )
+    {
+        printk("Unable to init ELF\n");
+        return rc;
+    }
+#ifdef VERBOSE
+    elf_set_verbose(&elf);
+#endif
+    elf_parse_binary(&elf);
+    if ( (rc = elf_xen_parse(&elf, &parms)) != 0 )
+    {
+        printk("Unable to parse kernel for ELFNOTES\n");
+        return rc;
+    }
+
+    if ( parms.phys_entry == UNSET_ADDR32 )
+    {
+        printk("Unable to find XEN_ELFNOTE_PHYS32_ENTRY address\n");
+        return -EINVAL;
+    }
+
+    printk("OS: %s version: %s loader: %s bitness: %s\n", parms.guest_os,
+           parms.guest_ver, parms.loader,
+           elf_64bit(&elf) ? "64-bit" : "32-bit");
+
+    /* Copy the OS image and free temporary buffer. */
+    elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
+    elf.dest_size = parms.virt_kend - parms.virt_kstart;
+
+    elf_set_vcpu(&elf, v);
+    rc = elf_load_binary(&elf);
+    if ( rc < 0 )
+    {
+        printk("Failed to load kernel: %d\n", rc);
+        printk("Xen dom0 kernel broken ELF: %s\n", elf_check_broken(&elf));
+        return rc;
+    }
+
+    last_addr = ROUNDUP(parms.virt_kend - parms.virt_base, PAGE_SIZE);
+
+    if ( initrd != NULL )
+    {
+        rc = hvm_copy_to_guest_phys(last_addr, mfn_to_virt(initrd->mod_start),
+                                    initrd->mod_end, v);
+        if ( rc )
+        {
+            printk("Unable to copy initrd to guest\n");
+            return rc;
+        }
+
+        mod.paddr = last_addr;
+        mod.size = initrd->mod_end;
+        last_addr += ROUNDUP(initrd->mod_end, PAGE_SIZE);
+    }
+
+    /* Free temporary buffers. */
+    discard_initial_images();
+
+    if ( cmdline != NULL )
+    {
+        rc = hvm_copy_to_guest_phys(last_addr, cmdline, strlen(cmdline) + 1, v);
+        if ( rc )
+        {
+            printk("Unable to copy guest command line\n");
+            return rc;
+        }
+        start_info.cmdline_paddr = last_addr;
+        /*
+         * Round up to 32/64 bits (depending on the guest kernel bitness) so
+         * the modlist/start_info is aligned.
+         */
+        last_addr += ROUNDUP(strlen(cmdline) + 1, elf_64bit(&elf) ? 8 : 4);
+    }
+    if ( initrd != NULL )
+    {
+        rc = hvm_copy_to_guest_phys(last_addr, &mod, sizeof(mod), v);
+        if ( rc )
+        {
+            printk("Unable to copy guest modules\n");
+            return rc;
+        }
+        start_info.modlist_paddr = last_addr;
+        start_info.nr_modules = 1;
+        last_addr += sizeof(mod);
+    }
+
+    start_info.magic = XEN_HVM_START_MAGIC_VALUE;
+    start_info.flags = SIF_PRIVILEGED | SIF_INITDOMAIN;
+    rc = hvm_copy_to_guest_phys(last_addr, &start_info, sizeof(start_info), v);
+    if ( rc )
+    {
+        printk("Unable to copy start info to guest\n");
+        return rc;
+    }
+
+    *entry = parms.phys_entry;
+    *start_info_addr = last_addr;
+
+    return 0;
+}
+
 static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
                                      void *(*bootstrap_map)(const module_t *),
                                      char *cmdline)
 {
+    paddr_t entry, start_info;
     int rc;
 
     printk("** Building a PVH Dom0 **\n");
@@ -2034,6 +2160,14 @@ static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
         return rc;
     }
 
+    rc = pvh_load_kernel(d, image, image_headroom, initrd, bootstrap_map(image),
+                         cmdline, &entry, &start_info);
+    if ( rc )
+    {
+        printk("Failed to load Dom0 kernel\n");
+        return rc;
+    }
+
     panic("Building a PVHv2 Dom0 is not yet supported.");
     return 0;
 }
diff --git a/xen/common/libelf/libelf-loader.c b/xen/common/libelf/libelf-loader.c
index 1644f16..de140ed 100644
--- a/xen/common/libelf/libelf-loader.c
+++ b/xen/common/libelf/libelf-loader.c
@@ -153,10 +153,19 @@ static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, el
         return -1;
     /* We trust the dom0 kernel image completely, so we don't care
      * about overruns etc. here. */
-    rc = raw_copy_to_guest(ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src), filesz);
+    if ( is_hvm_vcpu(elf->vcpu) )
+        rc = hvm_copy_to_guest_phys((paddr_t)ELF_UNSAFE_PTR(dst),
+                                    ELF_UNSAFE_PTR(src), filesz, elf->vcpu);
+    else
+        rc = raw_copy_to_guest(ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src),
+                               filesz);
     if ( rc != 0 )
         return -1;
-    rc = raw_clear_guest(ELF_UNSAFE_PTR(dst + filesz), memsz - filesz);
+    if ( is_hvm_vcpu(elf->vcpu) )
+        rc = hvm_copy_to_guest_phys((paddr_t)ELF_UNSAFE_PTR(dst + filesz),
+                                    NULL, memsz - filesz, elf->vcpu);
+    else
+        rc = raw_clear_guest(ELF_UNSAFE_PTR(dst + filesz), memsz - filesz);
     if ( rc != 0 )
         return -1;
     return 0;
diff --git a/xen/include/asm-x86/bzimage.h b/xen/include/asm-x86/bzimage.h
index b48a256..a51f583 100644
--- a/xen/include/asm-x86/bzimage.h
+++ b/xen/include/asm-x86/bzimage.h
@@ -6,7 +6,7 @@
 
 unsigned long bzimage_headroom(char *image_start, unsigned long image_length);
 
-int bzimage_parse(char *image_base, char **image_start,
+int bzimage_parse(void *image_base, void **image_start,
                   unsigned long *image_len);
 
 #endif /* __X86_BZIMAGE_H__ */
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index 1b763f3..b739981 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -212,6 +212,8 @@ struct elf_binary {
     /* misc */
     elf_log_callback *log_callback;
     void *log_caller_data;
+#else
+    struct vcpu *vcpu;
 #endif
     bool verbose;
     const char *broken;
@@ -351,6 +353,10 @@ elf_errorstatus elf_init(struct elf_binary *elf, const char *image, size_t size)
    */
 #ifdef __XEN__
 void elf_set_verbose(struct elf_binary *elf);
+static inline void elf_set_vcpu(struct elf_binary *elf, struct vcpu *v)
+{
+    elf->vcpu = v;
+}
 #else
 void elf_set_log(struct elf_binary *elf, elf_log_callback*,
                  void *log_caller_pointer, bool verbose);
-- 
2.10.1 (Apple Git-78)



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 5/7] x86/PVHv2: fix dom0_max_vcpus so it's capped to HVM_MAX_VCPUS for PVHv2 Dom0
  2017-02-10 12:33 [PATCH v6 0/7] Initial PVHv2 Dom0 support Roger Pau Monne
                   ` (3 preceding siblings ...)
  2017-02-10 12:33 ` [PATCH v6 4/7] xen/x86: parse Dom0 kernel for PVHv2 Roger Pau Monne
@ 2017-02-10 12:33 ` Roger Pau Monne
  2017-02-13 13:57   ` Jan Beulich
  2017-02-10 12:33 ` [PATCH v6 6/7] xen/x86: Setup PVHv2 Dom0 CPUs Roger Pau Monne
  2017-02-10 12:33 ` [PATCH v6 7/7] xen/x86: setup PVHv2 Dom0 ACPI tables Roger Pau Monne
  6 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-10 12:33 UTC (permalink / raw)
  To: xen-devel, boris.ostrovsky, konrad.wilk
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

PVHv2 Dom0 is limited to 128 vCPUs, as are all HVM guests at the moment. Fix
dom0_max_vcpus so it takes this limitation into account.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v5:
 - Introduce a new limit local variable and use it to store the guest's max
   number of vCPUs; this allows having a single check suitable for both PVH
   and PV.

Changes since v4:
 - Fix coding style to match the rest of the function.

Changes since v3:
 - New in the series.
---
 xen/arch/x86/domain_build.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 01c9348..407e479 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -40,6 +40,7 @@
 
 #include <public/version.h>
 #include <public/arch-x86/hvm/start_info.h>
+#include <public/hvm/hvm_info_table.h>
 
 static long __initdata dom0_nrpages;
 static long __initdata dom0_min_nrpages;
@@ -157,7 +158,7 @@ static nodemask_t __initdata dom0_nodes;
 
 unsigned int __init dom0_max_vcpus(void)
 {
-    unsigned int i, max_vcpus;
+    unsigned int i, max_vcpus, limit;
     nodeid_t node;
 
     for ( i = 0; i < dom0_nr_pxms; ++i )
@@ -177,8 +178,9 @@ unsigned int __init dom0_max_vcpus(void)
         max_vcpus = opt_dom0_max_vcpus_min;
     if ( opt_dom0_max_vcpus_max < max_vcpus )
         max_vcpus = opt_dom0_max_vcpus_max;
-    if ( max_vcpus > MAX_VIRT_CPUS )
-        max_vcpus = MAX_VIRT_CPUS;
+    limit = dom0_pvh ? HVM_MAX_VCPUS : MAX_VIRT_CPUS;
+    if ( max_vcpus > limit )
+        max_vcpus = limit;
 
     return max_vcpus;
 }
-- 
2.10.1 (Apple Git-78)



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 6/7] xen/x86: Setup PVHv2 Dom0 CPUs
  2017-02-10 12:33 [PATCH v6 0/7] Initial PVHv2 Dom0 support Roger Pau Monne
                   ` (4 preceding siblings ...)
  2017-02-10 12:33 ` [PATCH v6 5/7] x86/PVHv2: fix dom0_max_vcpus so it's capped to HVM_MAX_VCPUS for PVHv2 Dom0 Roger Pau Monne
@ 2017-02-10 12:33 ` Roger Pau Monne
  2017-02-13 13:59   ` Jan Beulich
  2017-02-10 12:33 ` [PATCH v6 7/7] xen/x86: setup PVHv2 Dom0 ACPI tables Roger Pau Monne
  6 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-10 12:33 UTC (permalink / raw)
  To: xen-devel, boris.ostrovsky, konrad.wilk
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Initialize the Dom0 BSP/APs and set up the memory and I/O permissions. This
also sets the initial BSP state in order to match the protocol specified in
docs/misc/hvmlite.markdown.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v5:
 - Make cpus and i unsigned ints.
 - Use an initializer for cpu_ctx (and remove the memset).
 - Move the clear_bit of vcpu 0 the end of pvh_setup_cpus.
---
 xen/arch/x86/domain_build.c | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 407e479..1ff2ddb 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -41,6 +41,7 @@
 #include <public/version.h>
 #include <public/arch-x86/hvm/start_info.h>
 #include <public/hvm/hvm_info_table.h>
+#include <public/hvm/hvm_vcpu.h>
 
 static long __initdata dom0_nrpages;
 static long __initdata dom0_min_nrpages;
@@ -2142,6 +2143,59 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
     return 0;
 }
 
+static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
+                                 paddr_t start_info)
+{
+    struct vcpu *v = d->vcpu[0];
+    unsigned int cpu, i;
+    int rc;
+    /*
+     * This sets the vCPU state according to the state described in
+     * docs/misc/hvmlite.markdown.
+     */
+    vcpu_hvm_context_t cpu_ctx = {
+        .mode = VCPU_HVM_MODE_32B,
+        .cpu_regs.x86_32.ebx = start_info,
+        .cpu_regs.x86_32.eip = entry,
+        .cpu_regs.x86_32.cr0 = X86_CR0_PE | X86_CR0_ET,
+        .cpu_regs.x86_32.cs_limit = ~0u,
+        .cpu_regs.x86_32.ds_limit = ~0u,
+        .cpu_regs.x86_32.ss_limit = ~0u,
+        .cpu_regs.x86_32.tr_limit = 0x67,
+        .cpu_regs.x86_32.cs_ar = 0xc9b,
+        .cpu_regs.x86_32.ds_ar = 0xc93,
+        .cpu_regs.x86_32.ss_ar = 0xc93,
+        .cpu_regs.x86_32.tr_ar = 0x8b,
+    };
+
+    cpu = v->processor;
+    for ( i = 1; i < d->max_vcpus; i++ )
+    {
+        cpu = cpumask_cycle(cpu, &dom0_cpus);
+        setup_dom0_vcpu(d, i, cpu);
+    }
+
+    rc = arch_set_info_hvm_guest(v, &cpu_ctx);
+    if ( rc )
+    {
+        printk("Unable to setup Dom0 BSP context: %d\n", rc);
+        return rc;
+    }
+
+    rc = setup_permissions(d);
+    if ( rc )
+    {
+        panic("Unable to setup Dom0 permissions: %d\n", rc);
+        return rc;
+    }
+
+    update_domain_wallclock_time(d);
+
+    clear_bit(_VPF_down, &v->pause_flags);
+
+    return 0;
+}
+
 static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
@@ -2170,6 +2224,13 @@ static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
         return rc;
     }
 
+    rc = pvh_setup_cpus(d, entry, start_info);
+    if ( rc )
+    {
+        printk("Failed to setup Dom0 CPUs: %d\n", rc);
+        return rc;
+    }
+
     panic("Building a PVHv2 Dom0 is not yet supported.");
     return 0;
 }
-- 
2.10.1 (Apple Git-78)



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 7/7] xen/x86: setup PVHv2 Dom0 ACPI tables
  2017-02-10 12:33 [PATCH v6 0/7] Initial PVHv2 Dom0 support Roger Pau Monne
                   ` (5 preceding siblings ...)
  2017-02-10 12:33 ` [PATCH v6 6/7] xen/x86: Setup PVHv2 Dom0 CPUs Roger Pau Monne
@ 2017-02-10 12:33 ` Roger Pau Monne
  2017-02-22 10:08   ` Jan Beulich
  6 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-10 12:33 UTC (permalink / raw)
  To: xen-devel, boris.ostrovsky, konrad.wilk
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Create a new MADT table that contains the topology exposed to the guest. A
new XSDT table is also created in order to filter the tables that we want
to expose to the guest, plus the Xen-crafted MADT. This in turn requires Xen
to also create a new RSDP that points to the custom XSDT.

Also, regions marked as E820_ACPI or E820_NVS are identity mapped into the
Dom0 p2m, as are any top-level ACPI tables that should be accessible to Dom0
and reside in reserved regions. This is needed because some memory maps don't
properly account for all the memory used by ACPI, so it's common to find ACPI
tables inside reserved regions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v5:
 - s/hvm_copy_to_guest_phys_vcpu/hvm_copy_to_guest_phys/.
 - Move pvh_add_mem_range to previous patch.
 - Add a comment regarding the current limitation to only 1 emulated IO APIC.
 - s/dom0_max_vcpus()/max_vcpus/ in pvh_setup_acpi_madt.
 - Cast structures to void when assigning.
 - Declare banned_tables with the initconst annotation.
 - Expand some comments messages.
 - Initialize the RSDP local variable.
 - Only provide x2APIC entries in the MADT.

Changes since v4:
 - s/hvm/pvh.
 - Use hvm_copy_to_guest_phys_vcpu.
 - Don't allocate up to E820MAX entries for the Dom0 memory map and instead
   allow pvh_add_mem_range to dynamically grow the memory map.
 - Add a comment about the lack of x2APIC MADT entries.
 - Change acpi_intr_overrides to unsigned int and the max iterator bound to
   UINT_MAX.
 - Set the MADT version as the minimum version between the hardware value and
   our supported version (4).
 - Set the MADT IO APIC ID to the current value of the domain vioapic->id.
 - Use void * when subtracting two pointers.
 - Fix indentation of nr_pages and use PFN_UP instead of DIV_ROUND_UP.
 - Change wording of the pvh_acpi_table_allowed error message.
 - Make j unsigned in pvh_setup_acpi_xsdt.
 - Move initialization of local variables with declarations in
   pvh_setup_acpi_xsdt.
 - Reword the comment about the allocated size of the xsdt custom table.
 - Fix line splitting.
 - Add a comment regarding the layering violation caused by the usage of
   acpi_tb_checksum.
 - Pass IO APIC NMI sources found in the MADT to Dom0.
 - Create x2APIC entries if the native MADT also contains them.
 - s/acpi_intr_overrrides/acpi_intr_overrides/.
 - Make sure the MADT is properly mapped into Dom0, or else Dom0 might not be
   able to access the output of the _MAT method depending on the
   implementation.
 - Get the first ACPI processor ID and use that as the base processor ID of the
   crafted MADT. This is done so that local/x2 APIC NMI entries match with the
   local/x2 APIC objects.

Changes since v3:
 - Use hvm_copy_to_phys in order to copy the tables to Dom0 memory.
 - Return EEXIST for overlapping ranges in hvm_add_mem_range.
 - s/ov/ovr/ for interrupt override parsing functions.
 - Constify intr local variable in acpi_set_intr_ovr.
 - Use structure assignment for type safety.
 - Perform sizeof using local variables in hvm_setup_acpi_madt.
 - Manually set revision of crafted/modified tables.
 - Only map tables to guest that reside in reserved or ACPI memory regions.
 - Copy the RSDP OEM signature to the crafted RSDP.
 - Pair calls to acpi_os_map_memory/acpi_os_unmap_memory.
 - Add memory regions for allowed ACPI tables to the memory map and then
   perform the identity mappings. This avoids having to call modify_identity_mmio
   multiple times.
 - Add a FIXME comment regarding the lack of multiple vIO-APICs.

Changes since v2:
 - Completely reworked.
---
 xen/arch/x86/domain_build.c | 434 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 434 insertions(+)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 1ff2ddb..85de84f 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -23,6 +23,7 @@
 #include <xen/libelf.h>
 #include <xen/pfn.h>
 #include <xen/guest_access.h>
+#include <xen/acpi.h>
 #include <asm/regs.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -38,6 +39,8 @@
 #include <asm/io_apic.h>
 #include <asm/hpet.h>
 
+#include <acpi/actables.h>
+
 #include <public/version.h>
 #include <public/arch-x86/hvm/start_info.h>
 #include <public/hvm/hvm_info_table.h>
@@ -53,6 +56,12 @@ static long __initdata dom0_max_nrpages = LONG_MAX;
  * */
 #define HVM_VM86_TSS_SIZE   128
 
+static unsigned int __initdata acpi_intr_overrides;
+static struct acpi_madt_interrupt_override __initdata *intsrcovr;
+
+static unsigned int __initdata acpi_nmi_sources;
+static struct acpi_madt_nmi_source __initdata *nmisrc;
+
 /*
  * dom0_mem=[min:<min_amt>,][max:<max_amt>,][<amt>]
  * 
@@ -2196,6 +2205,424 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
     return 0;
 }
 
+static int __init acpi_count_intr_ovr(struct acpi_subtable_header *header,
+                                     const unsigned long end)
+{
+
+    acpi_intr_overrides++;
+    return 0;
+}
+
+static int __init acpi_set_intr_ovr(struct acpi_subtable_header *header,
+                                    const unsigned long end)
+{
+    const struct acpi_madt_interrupt_override *intr =
+        container_of(header, struct acpi_madt_interrupt_override, header);
+
+    *intsrcovr = *intr;
+    intsrcovr++;
+
+    return 0;
+}
+
+static int __init acpi_count_nmi_src(struct acpi_subtable_header *header,
+                                     const unsigned long end)
+{
+
+    acpi_nmi_sources++;
+    return 0;
+}
+
+static int __init acpi_set_nmi_src(struct acpi_subtable_header *header,
+                                   const unsigned long end)
+{
+    const struct acpi_madt_nmi_source *src =
+        container_of(header, struct acpi_madt_nmi_source, header);
+
+    *nmisrc = *src;
+    nmisrc++;
+
+    return 0;
+}
+
+static int __init pvh_setup_acpi_madt(struct domain *d, paddr_t *addr)
+{
+    struct acpi_table_madt *madt;
+    struct acpi_table_header *table;
+    struct acpi_madt_io_apic *io_apic;
+    struct acpi_madt_local_x2apic *x2apic;
+    acpi_status status;
+    unsigned long size;
+    unsigned int i, max_vcpus;
+    int rc;
+
+    /* Count number of interrupt overrides in the MADT. */
+    acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE,
+                          acpi_count_intr_ovr, UINT_MAX);
+
+    /* Count number of NMI sources in the MADT. */
+    acpi_table_parse_madt(ACPI_MADT_TYPE_NMI_SOURCE, acpi_count_nmi_src,
+                          UINT_MAX);
+
+    max_vcpus = dom0_max_vcpus();
+    /* Calculate the size of the crafted MADT. */
+    size = sizeof(*madt);
+    /*
+     * FIXME: the current vIO-APIC code just supports one IO-APIC instance
+     * per domain. This must be fixed in order to provide the same number of
+     * IO APICs as available on bare metal.
+     */
+    size += sizeof(*io_apic);
+    size += sizeof(*intsrcovr) * acpi_intr_overrides;
+    size += sizeof(*nmisrc) * acpi_nmi_sources;
+    size += sizeof(*x2apic) * max_vcpus;
+
+    madt = xzalloc_bytes(size);
+    if ( !madt )
+    {
+        printk("Unable to allocate memory for MADT table\n");
+        return -ENOMEM;
+    }
+
+    /* Copy the native MADT table header. */
+    status = acpi_get_table(ACPI_SIG_MADT, 0, &table);
+    if ( !ACPI_SUCCESS(status) )
+    {
+        printk("Failed to get MADT ACPI table, aborting.\n");
+        return -EINVAL;
+    }
+    madt->header = *table;
+    madt->address = APIC_DEFAULT_PHYS_BASE;
+    /*
+     * NB: this is currently set to 4, which is the revision in the ACPI
+     * spec 6.1. Sadly ACPICA doesn't provide revision numbers for the
+     * tables described in the headers.
+     */
+    madt->header.revision = min_t(unsigned char, table->revision, 4);
+
+    /*
+     * Setup the IO APIC entry.
+     * FIXME: the current vIO-APIC code just supports one IO-APIC instance
+     * per domain. This must be fixed in order to provide the same number of
+     * IO APICs as available on bare metal, and with the same IDs as found in
+     * the native IO APIC MADT entries.
+     */
+    if ( nr_ioapics > 1 )
+        printk("WARNING: found %d IO APICs, Dom0 will only have access to 1 emulated IO APIC\n",
+               nr_ioapics);
+    io_apic = (void *)(madt + 1);
+    io_apic->header.type = ACPI_MADT_TYPE_IO_APIC;
+    io_apic->header.length = sizeof(*io_apic);
+    io_apic->id = domain_vioapic(d)->id;
+    io_apic->address = VIOAPIC_DEFAULT_BASE_ADDRESS;
+
+    x2apic = (void *)(io_apic + 1);
+    for ( i = 0; i < max_vcpus; i++ )
+    {
+        x2apic->header.type = ACPI_MADT_TYPE_LOCAL_X2APIC;
+        x2apic->header.length = sizeof(*x2apic);
+        x2apic->uid = i;
+        x2apic->local_apic_id = i * 2;
+        x2apic->lapic_flags = ACPI_MADT_ENABLED;
+        x2apic++;
+    }
+
+    /* Setup interrupt overrides. */
+    intsrcovr = (void *)x2apic;
+    acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_set_intr_ovr,
+                          acpi_intr_overrides);
+
+    /* Setup NMI sources. */
+    nmisrc = (void *)intsrcovr;
+    acpi_table_parse_madt(ACPI_MADT_TYPE_NMI_SOURCE, acpi_set_nmi_src,
+                          acpi_nmi_sources);
+
+    ASSERT(((void *)nmisrc - (void *)madt) == size);
+    madt->header.length = size;
+    /*
+     * Calling acpi_tb_checksum here is a layering violation, but
+     * introducing a wrapper for such simple usage seems overkill.
+     */
+    madt->header.checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, madt), size);
+
+    /* Place the new MADT in guest memory space. */
+    if ( pvh_steal_ram(d, size, 0, GB(4), addr) )
+    {
+        printk("Unable to find guest RAM for MADT\n");
+        return -ENOMEM;
+    }
+
+    /* Mark this region as E820_ACPI. */
+    if ( pvh_add_mem_range(d, *addr, *addr + size, E820_ACPI) )
+        printk("Unable to add MADT region to memory map\n");
+
+    rc = hvm_copy_to_guest_phys(*addr, madt, size, d->vcpu[0]);
+    if ( rc )
+    {
+        printk("Unable to copy MADT into guest memory\n");
+        return rc;
+    }
+    xfree(madt);
+
+    return 0;
+}
+
+static bool __init acpi_memory_banned(unsigned long address,
+                                      unsigned long size)
+{
+    unsigned long mfn, nr_pages, i;
+
+    mfn = PFN_DOWN(address);
+    nr_pages = PFN_UP((address & ~PAGE_MASK) + size);
+    for ( i = 0 ; i < nr_pages; i++ )
+        if ( !page_is_ram_type(mfn + i, RAM_TYPE_RESERVED) &&
+             !page_is_ram_type(mfn + i, RAM_TYPE_ACPI) )
+            return true;
+
+    return false;
+}
+
+static bool __init pvh_acpi_table_allowed(const char *sig)
+{
+    static const char __initconst banned_tables[][ACPI_NAME_SIZE] = {
+        ACPI_SIG_HPET, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_MPST,
+        ACPI_SIG_PMTT, ACPI_SIG_MADT, ACPI_SIG_DMAR};
+    unsigned int i;
+
+    for ( i = 0 ; i < ARRAY_SIZE(banned_tables); i++ )
+        if ( strncmp(sig, banned_tables[i], ACPI_NAME_SIZE) == 0 )
+            return false;
+
+    /*
+     * Find the matching root table entry. NB: callers pass a pointer into
+     * acpi_gbl_root_table_list, so the entry can be recovered by address
+     * (several tables, e.g. SSDTs, may share the same signature).
+     */
+    for ( i = 0; i < acpi_gbl_root_table_list.count; i++ )
+        if ( acpi_gbl_root_table_list.tables[i].signature.ascii == sig )
+            break;
+
+    /* Make sure the table doesn't reside in a RAM region. */
+    if ( i == acpi_gbl_root_table_list.count ||
+         acpi_memory_banned(acpi_gbl_root_table_list.tables[i].address,
+                            acpi_gbl_root_table_list.tables[i].length) )
+    {
+        printk("Skipping table %.4s because it resides in a non-ACPI, non-reserved region\n",
+               sig);
+        return false;
+    }
+
+    return true;
+}
+
+static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
+                                      paddr_t *addr)
+{
+    struct acpi_table_xsdt *xsdt;
+    struct acpi_table_header *table;
+    struct acpi_table_rsdp *rsdp;
+    unsigned long size = sizeof(*xsdt);
+    unsigned int i, j, num_tables = 0;
+    paddr_t xsdt_paddr;
+    int rc;
+
+    /*
+     * Restore the original DMAR table signature; we are going to filter it
+     * from the new XSDT that is presented to the guest, so it no longer
+     * needs to have its signature zapped.
+     */
+    acpi_dmar_reinstate();
+
+    /* Count the number of tables that will be added to the XSDT. */
+    for ( i = 0; i < acpi_gbl_root_table_list.count; i++ )
+    {
+        const char *sig = acpi_gbl_root_table_list.tables[i].signature.ascii;
+
+        if ( pvh_acpi_table_allowed(sig) )
+            num_tables++;
+    }
+
+    /*
+     * No need to add or subtract anything: struct acpi_table_xsdt already
+     * includes one array slot, and while the original MADT is filtered out,
+     * a custom-built MADT is added in its place.
+     */
+    size += num_tables * sizeof(xsdt->table_offset_entry[0]);
+
+    xsdt = xzalloc_bytes(size);
+    if ( !xsdt )
+    {
+        printk("Unable to allocate memory for XSDT table\n");
+        return -ENOMEM;
+    }
+
+    /* Copy the native XSDT table header. */
+    rsdp = acpi_os_map_memory(acpi_os_get_root_pointer(), sizeof(*rsdp));
+    if ( !rsdp )
+    {
+        printk("Unable to map RSDP\n");
+        return -EINVAL;
+    }
+    xsdt_paddr = rsdp->xsdt_physical_address;
+    acpi_os_unmap_memory(rsdp, sizeof(*rsdp));
+    table = acpi_os_map_memory(xsdt_paddr, sizeof(*table));
+    if ( !table )
+    {
+        printk("Unable to map XSDT\n");
+        return -EINVAL;
+    }
+    xsdt->header = *table;
+    acpi_os_unmap_memory(table, sizeof(*table));
+
+    /* Add the custom MADT. */
+    xsdt->table_offset_entry[0] = madt_addr;
+
+    /* Copy the addresses of the rest of the allowed tables. */
+    for ( i = 0, j = 1; i < acpi_gbl_root_table_list.count; i++ )
+    {
+        const char *sig = acpi_gbl_root_table_list.tables[i].signature.ascii;
+
+        if ( pvh_acpi_table_allowed(sig) )
+            xsdt->table_offset_entry[j++] =
+                                acpi_gbl_root_table_list.tables[i].address;
+    }
+
+    xsdt->header.revision = 1;
+    xsdt->header.length = size;
+    /*
+     * Calling acpi_tb_checksum here is a layering violation, but
+     * introducing a wrapper for such simple usage seems overkill.
+     */
+    xsdt->header.checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, xsdt), size);
+
+    /* Place the new XSDT in guest memory space. */
+    if ( pvh_steal_ram(d, size, 0, GB(4), addr) )
+    {
+        printk("Unable to find guest RAM for XSDT\n");
+        return -ENOMEM;
+    }
+
+    /* Mark this region as E820_ACPI. */
+    if ( pvh_add_mem_range(d, *addr, *addr + size, E820_ACPI) )
+        printk("Unable to add XSDT region to memory map\n");
+
+    rc = hvm_copy_to_guest_phys(*addr, xsdt, size, d->vcpu[0]);
+    if ( rc )
+    {
+        printk("Unable to copy XSDT into guest memory\n");
+        return rc;
+    }
+    xfree(xsdt);
+
+    return 0;
+}
+
+static int __init pvh_setup_acpi(struct domain *d, paddr_t start_info)
+{
+    unsigned long pfn, nr_pages;
+    paddr_t madt_paddr, xsdt_paddr, rsdp_paddr;
+    unsigned int i;
+    int rc;
+    struct acpi_table_rsdp *native_rsdp, rsdp = {
+        .signature = ACPI_SIG_RSDP,
+        .revision = 2,
+        .length = sizeof(rsdp),
+    };
+
+    /* Scan top-level tables and add their regions to the guest memory map. */
+    for ( i = 0; i < acpi_gbl_root_table_list.count; i++ )
+    {
+        const char *sig = acpi_gbl_root_table_list.tables[i].signature.ascii;
+        unsigned long addr = acpi_gbl_root_table_list.tables[i].address;
+        unsigned long size = acpi_gbl_root_table_list.tables[i].length;
+
+        /*
+         * Make sure the original MADT is also mapped, so that Dom0 can
+         * properly access the data returned by _MAT methods in case it's
+         * re-using MADT memory.
+         */
+        if ( strncmp(sig, ACPI_SIG_MADT, ACPI_NAME_SIZE)
+             ? pvh_acpi_table_allowed(sig)
+             : !acpi_memory_banned(addr, size) )
+             pvh_add_mem_range(d, addr, addr + size, E820_ACPI);
+    }
+
+    /* Identity map ACPI e820 regions. */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        if ( d->arch.e820[i].type != E820_ACPI &&
+             d->arch.e820[i].type != E820_NVS )
+            continue;
+
+        pfn = PFN_DOWN(d->arch.e820[i].addr);
+        nr_pages = PFN_UP((d->arch.e820[i].addr & ~PAGE_MASK) +
+                          d->arch.e820[i].size);
+
+        rc = modify_identity_mmio(d, pfn, nr_pages, true);
+        if ( rc )
+        {
+            printk("Failed to map ACPI region [%#lx, %#lx) into Dom0 memory map\n",
+                   pfn, pfn + nr_pages);
+            return rc;
+        }
+    }
+
+    rc = pvh_setup_acpi_madt(d, &madt_paddr);
+    if ( rc )
+        return rc;
+
+    rc = pvh_setup_acpi_xsdt(d, madt_paddr, &xsdt_paddr);
+    if ( rc )
+        return rc;
+
+    /* Craft a custom RSDP. */
+    native_rsdp = acpi_os_map_memory(acpi_os_get_root_pointer(), sizeof(rsdp));
+    memcpy(rsdp.oem_id, native_rsdp->oem_id, sizeof(rsdp.oem_id));
+    acpi_os_unmap_memory(native_rsdp, sizeof(rsdp));
+    rsdp.xsdt_physical_address = xsdt_paddr;
+    /*
+     * Calling acpi_tb_checksum here is a layering violation, but
+     * introducing a wrapper for such simple usage seems overkill.
+     */
+    rsdp.checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, &rsdp),
+                                      ACPI_RSDP_REV0_SIZE);
+    rsdp.extended_checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, &rsdp),
+                                               sizeof(rsdp));
+
+    /*
+     * Place the new RSDP in guest memory space.
+     *
+     * NB: this RSDP is not going to replace the original RSDP, which should
+     * still be accessible to the guest. However that RSDP is going to point to
+     * the native RSDT, and should not be used for the Dom0 kernel's boot
+     * purposes (we keep it visible for post boot access).
+     */
+    if ( pvh_steal_ram(d, sizeof(rsdp), 0, GB(4), &rsdp_paddr) )
+    {
+        printk("Unable to allocate guest RAM for RSDP\n");
+        return -ENOMEM;
+    }
+
+    /* Mark this region as E820_ACPI. */
+    if ( pvh_add_mem_range(d, rsdp_paddr, rsdp_paddr + sizeof(rsdp),
+                           E820_ACPI) )
+        printk("Unable to add RSDP region to memory map\n");
+
+    /* Copy RSDP into guest memory. */
+    rc = hvm_copy_to_guest_phys(rsdp_paddr, &rsdp, sizeof(rsdp), d->vcpu[0]);
+    if ( rc )
+    {
+        printk("Unable to copy RSDP into guest memory\n");
+        return rc;
+    }
+
+    /* Copy RSDP address to start_info. */
+    rc = hvm_copy_to_guest_phys(start_info +
+                                offsetof(struct hvm_start_info, rsdp_paddr),
+                                &rsdp_paddr,
+                                sizeof(((struct hvm_start_info *)
+                                        0)->rsdp_paddr),
+                                d->vcpu[0]);
+    if ( rc )
+    {
+        printk("Unable to copy RSDP address into guest memory\n");
+        return rc;
+    }
+
+    return 0;
+}
+
 static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
@@ -2231,6 +2658,13 @@ static int __init construct_dom0_pvh(struct domain *d, const module_t *image,
         return rc;
     }
 
+    rc = pvh_setup_acpi(d, start_info);
+    if ( rc )
+    {
+        printk("Failed to setup Dom0 ACPI tables: %d\n", rc);
+        return rc;
+    }
+
     panic("Building a PVHv2 Dom0 is not yet supported.");
     return 0;
 }
-- 
2.10.1 (Apple Git-78)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 2/7] xen/x86: split Dom0 build into PV and PVHv2
  2017-02-10 12:33 ` [PATCH v6 2/7] xen/x86: split Dom0 build into PV and PVHv2 Roger Pau Monne
@ 2017-02-10 13:32   ` Andrew Cooper
  2017-02-13 13:12   ` Jan Beulich
  1 sibling, 0 replies; 24+ messages in thread
From: Andrew Cooper @ 2017-02-10 13:32 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel, boris.ostrovsky, konrad.wilk; +Cc: Jan Beulich

On 10/02/17 12:33, Roger Pau Monne wrote:
> Split the Dom0 builder into two different functions, one for PV (and classic
> PVH), and another one for PVHv2. Introduce a new command line parameter called
> 'dom0' that can be used to request the creation of a PVHv2 Dom0 by setting the
> 'hvm' sub-option. A panic has also been added if a user tries to use dom0=hvm
> until all the code is in place, then the panic will be removed.
>
> While there mark the dom0_shadow option that was used by PV Dom0 as deprecated,
> it was lacking documentation and was not functional. Point users towards
> dom0=shadow instead.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 4/7] xen/x86: parse Dom0 kernel for PVHv2
  2017-02-10 12:33 ` [PATCH v6 4/7] xen/x86: parse Dom0 kernel for PVHv2 Roger Pau Monne
@ 2017-02-10 14:34   ` Ian Jackson
  2017-02-13 12:07     ` Roger Pau Monne
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Jackson @ 2017-02-10 14:34 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, boris.ostrovsky, Andrew Cooper, Jan Beulich

Roger Pau Monne writes ("[PATCH v6 4/7] xen/x86: parse Dom0 kernel for PVHv2"):
> Introduce a helper to parse the Dom0 kernel.
> 
> A new helper is also introduced to libelf, that's used to store the
> destination vcpu of the domain. This parameter is needed when
> loading the kernel on a HVM domain (PVHv2), since
> hvm_copy_to_guest_phys requires passing the destination vcpu.

The new helper and variable seems fine to me.

> While there also fix image_base and image_start to be of type "void
> *", and do the necessary fixup of related functions.

IMO this should be separate patch(es).

> +static int __init pvh_load_kernel(struct domain *d, const module_t *image,
> +                                  unsigned long image_headroom,
> +                                  module_t *initrd, void *image_base,
> +                                  char *cmdline, paddr_t *entry,
> +                                  paddr_t *start_info_addr)
> +{

FAOD this is used for dom0 only, right ?  In which case I don't feel
the need to review it.

> diff --git a/xen/common/libelf/libelf-loader.c b/xen/common/libelf/libelf-loader.c
> index 1644f16..de140ed 100644
> --- a/xen/common/libelf/libelf-loader.c
> +++ b/xen/common/libelf/libelf-loader.c
> @@ -153,10 +153,19 @@ static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, el
>          return -1;
>      /* We trust the dom0 kernel image completely, so we don't care
>       * about overruns etc. here. */
> -    rc = raw_copy_to_guest(ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src), filesz);
> +    if ( is_hvm_vcpu(elf->vcpu) )
> +        rc = hvm_copy_to_guest_phys((paddr_t)ELF_UNSAFE_PTR(dst),
> +                                    ELF_UNSAFE_PTR(src), filesz, elf->vcpu);
> +    else
> +        rc = raw_copy_to_guest(ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src),
> +                               filesz);
>      if ( rc != 0 )
>          return -1;
> -    rc = raw_clear_guest(ELF_UNSAFE_PTR(dst + filesz), memsz - filesz);
> +    if ( is_hvm_vcpu(elf->vcpu) )
> +        rc = hvm_copy_to_guest_phys((paddr_t)ELF_UNSAFE_PTR(dst + filesz),
> +                                    NULL, filesz, elf->vcpu);
> +    else
> +        rc = raw_clear_guest(ELF_UNSAFE_PTR(dst + filesz), memsz - filesz);

This seems to involve open coding all four elements of a 2x2 matrix.
Couldn't you provide a helper function that:
 * Checks is_hvm_vcpu
 * Has the "NULL means clear" behaviour which I infer
   hvm_copy_to_guest_phys has
 * Calls hvm_copy_to_guest_phys or raw_{copy_to,clear}_guest
(Does raw_copy_to_guest have the "NULL means clear" feature ?  Maybe
that feature should be added, further lifting that into more general
code.)

Then the source and destination calculations would be done once for
each part, rather than twice, and the is_hvm_vcpu condition would be
done once rather than twice.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 4/7] xen/x86: parse Dom0 kernel for PVHv2
  2017-02-10 14:34   ` Ian Jackson
@ 2017-02-13 12:07     ` Roger Pau Monne
  0 siblings, 0 replies; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-13 12:07 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, boris.ostrovsky, Andrew Cooper, Jan Beulich

On Fri, Feb 10, 2017 at 02:34:16PM +0000, Ian Jackson wrote:
> Roger Pau Monne writes ("[PATCH v6 4/7] xen/x86: parse Dom0 kernel for PVHv2"):
> > Introduce a helper to parse the Dom0 kernel.
> > 
> > A new helper is also introduced to libelf, that's used to store the
> > destination vcpu of the domain. This parameter is needed when
> > loading the kernel on a HVM domain (PVHv2), since
> > hvm_copy_to_guest_phys requires passing the destination vcpu.
> 
> The new helper and variable seems fine to me.
> 
> > While there also fix image_base and image_start to be of type "void
> > *", and do the necessary fixup of related functions.
> 
> IMO this should be separate patch(es).

Thanks, I've now split it.

> > +static int __init pvh_load_kernel(struct domain *d, const module_t *image,
> > +                                  unsigned long image_headroom,
> > +                                  module_t *initrd, void *image_base,
> > +                                  char *cmdline, paddr_t *entry,
> > +                                  paddr_t *start_info_addr)
> > +{
> 
> FAOD this is used for dom0 only, right ?  In which case I don't feel
> the need to review it.

Right, this is Dom0 only.

> > diff --git a/xen/common/libelf/libelf-loader.c b/xen/common/libelf/libelf-loader.c
> > index 1644f16..de140ed 100644
> > --- a/xen/common/libelf/libelf-loader.c
> > +++ b/xen/common/libelf/libelf-loader.c
> > @@ -153,10 +153,19 @@ static elf_errorstatus elf_load_image(struct elf_binary *elf, elf_ptrval dst, el
> >          return -1;
> >      /* We trust the dom0 kernel image completely, so we don't care
> >       * about overruns etc. here. */
> > -    rc = raw_copy_to_guest(ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src), filesz);
> > +    if ( is_hvm_vcpu(elf->vcpu) )
> > +        rc = hvm_copy_to_guest_phys((paddr_t)ELF_UNSAFE_PTR(dst),
> > +                                    ELF_UNSAFE_PTR(src), filesz, elf->vcpu);
> > +    else
> > +        rc = raw_copy_to_guest(ELF_UNSAFE_PTR(dst), ELF_UNSAFE_PTR(src),
> > +                               filesz);
> >      if ( rc != 0 )
> >          return -1;
> > -    rc = raw_clear_guest(ELF_UNSAFE_PTR(dst + filesz), memsz - filesz);
> > +    if ( is_hvm_vcpu(elf->vcpu) )
> > +        rc = hvm_copy_to_guest_phys((paddr_t)ELF_UNSAFE_PTR(dst + filesz),
> > +                                    NULL, filesz, elf->vcpu);
> > +    else
> > +        rc = raw_clear_guest(ELF_UNSAFE_PTR(dst + filesz), memsz - filesz);
> 
> This seems to involve open coding all four elements of a 2x2 matrix.
> Couldn't you provide a helper function that:
>  * Checks is_hvm_vcpu
>  * Has the "NULL means clear" behaviour which I infer
>    hvm_copy_to_guest_phys has
>  * Calls hvm_copy_to_guest_phys or raw_{copy_to,clear}_guest
> (Does raw_copy_to_guest have the "NULL means clear" feature ?  Maybe
> that feature should be added, further lifting that into more general
> code.)

In the PV case raw_clear_guest calls clear_user, which is a specific function
to zero a memory area, and AFAICT raw_copy_to_guest doesn't have the
NULL-means-clear behaviour. I've added the following helper; let me know if it
looks fine:

static elf_errorstatus elf_memcpy(struct vcpu *v, void *dst, void *src,
                                  uint64_t size)
{
    int rc = 0;

#ifdef CONFIG_X86
    if ( is_hvm_vcpu(v) &&
         hvm_copy_to_guest_phys((paddr_t)dst, src, size, v) != HVMCOPY_okay )
        rc = -1;
    else
#endif
        rc = src == NULL ? raw_clear_guest(dst, size) :
                           raw_copy_to_guest(dst, src, size);

    return rc;
}

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests
  2017-02-10 12:33 ` [PATCH v6 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests Roger Pau Monne
@ 2017-02-13 13:09   ` Jan Beulich
  2017-02-13 14:22     ` Roger Pau Monne
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2017-02-13 13:09 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
> --- a/docs/misc/hvmlite.markdown
> +++ b/docs/misc/hvmlite.markdown
> @@ -75,3 +75,23 @@ info structure that's passed at boot time (field rsdp_paddr).
>  
>  Description of paravirtualized devices will come from XenStore, just as it's
>  done for HVM guests.
> +
> +## Interrupts ##
> +
> +### Interrupts from physical devices ###
> +
> +Interrupts from physical devices are delivered using native methods, this is
> +done in order to take advantage of new hardware assisted virtualization
> +functions, like posted interrupts. This implies that PVHv2 guests with physical
> +devices will also have the necessary interrupt controllers in order to manage
> +the delivery of interrupts from those devices, using the same interfaces that
> +are available on native hardware.
> +
> +### Interrupts from paravirtualized devices ###
> +
> +Interrupts from paravirtualized devices are delivered using event channels, see
> +[Event Channel Internals][event_channels] for more detailed information about
> +event channels. Delivery of those interrupts can be configured in the same way
> +as HVM guests, check xen/include/public/hvm/params.h and
> +xen/include/public/hvm/hvm_op.h for more information about available delivery
> +methods.

Just FTR - my comments on v4 still stand; I especially don't buy
your argument [1] of posted interrupts becoming more widely
available, since - as pointed out before - we continue to not fully
enable their use by default, due to unresolved problems with
the implementation. Furthermore emulation_flags_ok() allows
EMU_LAPIC to be clear (together with all other bits being clear),
in which case posted interrupts can't be used at all afaict.

Jan

[1] https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg02092.html


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 2/7] xen/x86: split Dom0 build into PV and PVHv2
  2017-02-10 12:33 ` [PATCH v6 2/7] xen/x86: split Dom0 build into PV and PVHv2 Roger Pau Monne
  2017-02-10 13:32   ` Andrew Cooper
@ 2017-02-13 13:12   ` Jan Beulich
  1 sibling, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2017-02-13 13:12 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
> Split the Dom0 builder into two different functions, one for PV (and classic
> PVH), and another one for PVHv2. Introduce a new command line parameter called
> 'dom0' that can be used to request the creation of a PVHv2 Dom0 by setting the
> 'hvm' sub-option. A panic has also been added if a user tries to use dom0=hvm
> until all the code is in place, then the panic will be removed.
> 
> While there mark the dom0_shadow option that was used by PV Dom0 as deprecated,
> it was lacking documentation and was not functional. Point users towards
> dom0=shadow instead.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 3/7] xen/x86: populate PVHv2 Dom0 physical memory map
  2017-02-10 12:33 ` [PATCH v6 3/7] xen/x86: populate PVHv2 Dom0 physical memory map Roger Pau Monne
@ 2017-02-13 13:53   ` Jan Beulich
  2017-02-14 10:10     ` Roger Pau Monne
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2017-02-13 13:53 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
> ---
> Changes since v5:
>  - Adjust the logic to set need_paging.
>  - Remove the usage of the _AC macro.
>  - Subtract memory from the end of regions instead of the start.
>  - Create the VM86_TSS before the identity page table, so that the page table
>    is aligned to a page boundary.
>  - Use MB1_PAGES in modify_identity_mmio.
>  - Move and simplify the ASSERT in pvh_setup_p2m.
>  - Move the creation of the PSE page tables to a separate function, and use it
>    in shadow_enable also.
>  - Make the modify_identity_mmio map parameter a constant.
>  - Add a comment to HVM_VM86_TSS_SIZE, although it seems this might need
>    further fixing.

Indeed, I think this wants to be re-based on top of the patch I've
just sent (with you Cc-ed).

> +static int __init pvh_steal_ram(struct domain *d, unsigned long size,
> +                                unsigned long align, paddr_t limit,
> +                                paddr_t *addr)
> +{
> +    unsigned int i = d->arch.nr_e820;
> +
> +    /*
> +     * Alignment 0 should be set to 1, so it doesn't wrap around in the
> +     * calculations below.
> +     */
> +    align = align ? : 1;
> +    while ( i-- )
> +    {
> +        struct e820entry *entry = &d->arch.e820[i];
> +
> +        if ( entry->type != E820_RAM || entry->addr + entry->size > limit ||
> +             entry->addr < MB(1) )
> +            continue;
> +
> +        *addr = (entry->addr + entry->size - size) & ~(align - 1);

Without taking the present callers into account (who don't request
huge alignment) this (theoretically) risks running into the low 1Mb.
I see two options - either add a comment clarifying that an entry
will never cross the 1Mb boundary (and hence the issue can't
occur), or limit the alignment (by way of ASSERT()) to PAGE_SIZE
(in which case no significant harm would result from crossing the
boundary).

> +static int __init pvh_setup_vmx_realmode_helpers(struct domain *d)
> +{
> +    p2m_type_t p2mt;
> +    uint32_t rc, *ident_pt;
> +    uint8_t *tss;
> +    mfn_t mfn;
> +    paddr_t gaddr;
> +
> +    /*
> +     * Steal some space from the last RAM region below 4GB and use it to
> +     * store the real-mode TSS.
> +     */
> +    if ( !pvh_steal_ram(d, HVM_VM86_TSS_SIZE, 0, GB(4), &gaddr) &&

Please request at least 128-byte alignment here, to avoid the
base TSS structure crossing a page boundary.

> +         (tss = map_domain_gfn(p2m_get_hostp2m(d), _gfn(PFN_DOWN(gaddr)),
> +                               &mfn, &p2mt, 0, &rc)) )
> +    {
> +        memset(tss, 0, HVM_VM86_TSS_SIZE);

What makes you assume that you've mapped all the space you've
allocated?

> --- a/xen/include/asm-x86/page.h
> +++ b/xen/include/asm-x86/page.h
> @@ -374,6 +374,18 @@ perms_strictly_increased(uint32_t old_flags, uint32_t new_flags)
>      return ((of | (of ^ nf)) == nf);
>  }
>  
> +/* Build a 32bit PSE page table using 4MB pages. */
> +static inline void
> +write_32bit_pse_identmap(uint32_t *l2)

Why does this need to be an inline function?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 5/7] x86/PVHv2: fix dom0_max_vcpus so it's capped to HVM_MAX_VCPUS for PVHv2 Dom0
  2017-02-10 12:33 ` [PATCH v6 5/7] x86/PVHv2: fix dom0_max_vcpus so it's capped to HVM_MAX_VCPUS for PVHv2 Dom0 Roger Pau Monne
@ 2017-02-13 13:57   ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2017-02-13 13:57 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
> PVHv2 Dom0 is limited to 128 vCPUs, as are all HVM guests at the moment. Fix
> dom0_max_vcpus so it takes this limitation into account.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 6/7] xen/x86: Setup PVHv2 Dom0 CPUs
  2017-02-10 12:33 ` [PATCH v6 6/7] xen/x86: Setup PVHv2 Dom0 CPUs Roger Pau Monne
@ 2017-02-13 13:59   ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2017-02-13 13:59 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
> Initialize Dom0 BSP/APs and setup the memory and IO permissions. This also sets
> the initial BSP state in order to match the protocol specified in
> docs/misc/hvmlite.markdown.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests
  2017-02-13 13:09   ` Jan Beulich
@ 2017-02-13 14:22     ` Roger Pau Monne
  2017-02-13 14:33       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-13 14:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

On Mon, Feb 13, 2017 at 06:09:27AM -0700, Jan Beulich wrote:
> >>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
> > --- a/docs/misc/hvmlite.markdown
> > +++ b/docs/misc/hvmlite.markdown
> > @@ -75,3 +75,23 @@ info structure that's passed at boot time (field rsdp_paddr).
> >  
> >  Description of paravirtualized devices will come from XenStore, just as it's
> >  done for HVM guests.
> > +
> > +## Interrupts ##
> > +
> > +### Interrupts from physical devices ###
> > +
> > +Interrupts from physical devices are delivered using native methods, this is
> > +done in order to take advantage of new hardware assisted virtualization
> > +functions, like posted interrupts. This implies that PVHv2 guests with physical
> > +devices will also have the necessary interrupt controllers in order to manage
> > +the delivery of interrupts from those devices, using the same interfaces that
> > +are available on native hardware.
> > +
> > +### Interrupts from paravirtualized devices ###
> > +
> > +Interrupts from paravirtualized devices are delivered using event channels, see
> > +[Event Channel Internals][event_channels] for more detailed information about
> > +event channels. Delivery of those interrupts can be configured in the same way
> > +as HVM guests, check xen/include/public/hvm/params.h and
> > +xen/include/public/hvm/hvm_op.h for more information about available delivery
> > +methods.
> 
> Just FTR - my comments on v4 still stand; I especially don't buy
> your argument [1] of posted interrupts becoming more widely
> available, since - as pointed out before - we continue to not fully
> enable their use by default, due to unresolved problems with
> the implementation. Furthermore emulation_flags_ok() allows
> EMU_LAPIC to be clear (together with all other bits being clear),
> in which case posted interrupts can't be used at all afaict.

It only allows it for DomUs, which don't support passthrough at all. If a
device is passed through, a LAPIC will be required.

Also note that the current set of physdev ops available to PVHv2 is completely
wrong. Since PVHv2 is considered a HVM guest from Xen PoV, it will take the
is_hvm_domain branch at the top of physdev_map_pirq, which will certainly lead
to all kinds of fun, because that's the weird path used in (PV)HVM guests that
allows them to remap interrupts from emulated or physical devices into event
channels. So at the moment, I think this should even be considered a bug-fix.

As I think I've said before, I don't think we should initially follow this
route for PVHv2, but in any case this can always be added afterwards by
re-enabling the XENFEAT_hvm_pirqs flag and fixing the physdev ops so they can
work properly in this context.

Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests
  2017-02-13 14:22     ` Roger Pau Monne
@ 2017-02-13 14:33       ` Jan Beulich
  2017-02-13 14:48         ` Roger Pau Monne
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2017-02-13 14:33 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 13.02.17 at 15:22, <roger.pau@citrix.com> wrote:
> On Mon, Feb 13, 2017 at 06:09:27AM -0700, Jan Beulich wrote:
>> >>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
>> > --- a/docs/misc/hvmlite.markdown
>> > +++ b/docs/misc/hvmlite.markdown
>> > @@ -75,3 +75,23 @@ info structure that's passed at boot time (field rsdp_paddr).
>> >  
>> >  Description of paravirtualized devices will come from XenStore, just as it's
>> >  done for HVM guests.
>> > +
>> > +## Interrupts ##
>> > +
>> > +### Interrupts from physical devices ###
>> > +
>> > +Interrupts from physical devices are delivered using native methods, this is
>> > +done in order to take advantage of new hardware assisted virtualization
>> > +functions, like posted interrupts. This implies that PVHv2 guests with physical
>> > +devices will also have the necessary interrupt controllers in order to manage
>> > +the delivery of interrupts from those devices, using the same interfaces that
>> > +are available on native hardware.
>> > +
>> > +### Interrupts from paravirtualized devices ###
>> > +
>> > +Interrupts from paravirtualized devices are delivered using event channels, see
>> > +[Event Channel Internals][event_channels] for more detailed information about
>> > +event channels. Delivery of those interrupts can be configured in the same way
>> > +as HVM guests, check xen/include/public/hvm/params.h and
>> > +xen/include/public/hvm/hvm_op.h for more information about available delivery
>> > +methods.
>> 
>> Just FTR - my comments on v4 still stand; I especially don't buy
>> your argument [1] of posted interrupts becoming more widely
>> available, since - as pointed out before - we continue to not fully
>> enable their use by default, due to unresolved problems with
>> the implementation. Furthermore emulation_flags_ok() allows
>> EMU_LAPIC to be clear (together with all other bits being clear),
>> in which case posted interrupts can't be used at all afaict.
> 
> It only allows it for DomUs, which don't support passthrough at all. If a
> device is passed through, a LAPIC will be required.

Is that spelled out anywhere? And I don't recall any fixme
comments regarding pass-through not working.

> Also note that the current set of physdev ops available to PVHv2 is completely
> wrong. Since PVHv2 is considered a HVM guest from Xen PoV, it will take the
> is_hvm_domain branch at the top of physdev_map_pirq, which will certainly lead
> to all kinds of fun, because that's the weird path used in (PV)HVM guests that
> allows them to remap interrupts from emulated or physical devices into event
> channels. So at the moment, I think this should even be considered a bug-fix.

I think I had indicated that I'm okay with the physdev ops change
for now; i.e. ...

> As I think I've said before, I don't think we should initially follow this
> route for PVHv2, but in any case this can always be added afterwards by
> re-enabling the XENFEAT_hvm_pirqs flag and fixing the physdev ops so they can
> work properly in this context.

... please note that my comments are specifically against the
document, not the code (anymore). And the document should
spell out options, even if their implementation may be deferred.

Jan



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests
  2017-02-13 14:33       ` Jan Beulich
@ 2017-02-13 14:48         ` Roger Pau Monne
  0 siblings, 0 replies; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-13 14:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

On Mon, Feb 13, 2017 at 07:33:17AM -0700, Jan Beulich wrote:
> >>> On 13.02.17 at 15:22, <roger.pau@citrix.com> wrote:
> > On Mon, Feb 13, 2017 at 06:09:27AM -0700, Jan Beulich wrote:
> >> >>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
> >> > --- a/docs/misc/hvmlite.markdown
> >> > +++ b/docs/misc/hvmlite.markdown
> >> > @@ -75,3 +75,23 @@ info structure that's passed at boot time (field rsdp_paddr).
> >> >  
> >> >  Description of paravirtualized devices will come from XenStore, just as it's
> >> >  done for HVM guests.
> >> > +
> >> > +## Interrupts ##
> >> > +
> >> > +### Interrupts from physical devices ###
> >> > +
> >> > +Interrupts from physical devices are delivered using native methods, this is
> >> > +done in order to take advantage of new hardware assisted virtualization
> >> > +functions, like posted interrupts. This implies that PVHv2 guests with physical
> >> > +devices will also have the necessary interrupt controllers in order to manage
> >> > +the delivery of interrupts from those devices, using the same interfaces that
> >> > +are available on native hardware.
> >> > +
> >> > +### Interrupts from paravirtualized devices ###
> >> > +
> >> > +Interrupts from paravirtualized devices are delivered using event channels, see
> >> > +[Event Channel Internals][event_channels] for more detailed information about
> >> > +event channels. Delivery of those interrupts can be configured in the same way
> >> > +as HVM guests, check xen/include/public/hvm/params.h and
> >> > +xen/include/public/hvm/hvm_op.h for more information about available delivery
> >> > +methods.
> >> 
> >> Just FTR - my comments on v4 still stand; I especially don't buy
> >> your argument [1] of posted interrupts becoming more widely
> >> available, since - as pointed out before - we continue to not fully
> >> enable their use by default, due to unresolved problems with
> >> the implementation. Furthermore emulation_flags_ok() allows
> >> EMU_LAPIC to be clear (together with all other bits being clear),
> >> in which case posted interrupts can't be used at all afaict.
> > 
> > It only allows it for DomUs, which don't support passthrough at all. If a
> > device is passed through, a LAPIC will be required.
> 
> Is that spelled out anywhere? And I don't recall any fixme
> comments regarding pass-through not working.

I'm not sure there's anywhere where a FIXME would be appropriate; this is not
something that needs fixing, it's something that simply doesn't yet exist.

Maybe libxl should be slightly adjusted to print a nice error message if
someone attempts adding a PCI device to a PVHv2 guest.

It is currently being discussed with the ARM guys also [0], in order to try to
come up with a similar model for both ARM and PVHv2 (except of course for the
arch-specific bits).

> > Also note that the current set of physdev ops available to PVHv2 is completely
> > wrong. Since PVHv2 is considered a HVM guest from Xen PoV, it will take the
> > is_hvm_domain branch at the top of physdev_map_pirq, which will certainly lead
> > to all kinds of fun, because that's the weird path used in (PV)HVM guests that
> > allows them to remap interrupts from emulated or physical devices into event
> > channels. So at the moment, I think this should even be considered a bug-fix.
> 
> I think I had indicated that I'm okay with the physdev ops change
> for now; i.e. ...
> 
> > As I think I've said before, I don't think we should initially follow this
> > route for PVHv2, but in any case this can always be added afterwards by
> > re-enabling the XENFEAT_hvm_pirqs flag and fixing the physdev ops so they can
> > work properly in this context.
> 
> ... please note that my comments are specifically against the
> document, not the code (anymore). And the document should
> spell out options, even if their implementation may be deferred.

The document simply describes what I plan to implement myself, so if someone
wants to work on those bits (physdev ops), he/she is free to do it and expand
the document appropriately. I simply don't plan to implement them myself,
because I don't see much value in it and it's an interface that I would rather
get rid of.

Roger.

[0] https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg03032.html


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 3/7] xen/x86: populate PVHv2 Dom0 physical memory map
  2017-02-13 13:53   ` Jan Beulich
@ 2017-02-14 10:10     ` Roger Pau Monne
  2017-02-14 10:19       ` Roger Pau Monne
  2017-02-14 10:20       ` Jan Beulich
  0 siblings, 2 replies; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-14 10:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

On Mon, Feb 13, 2017 at 06:53:49AM -0700, Jan Beulich wrote:
> >>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
> > ---
> > Changes since v5:
> >  - Adjust the logic to set need_paging.
> >  - Remove the usage of the _AC macro.
> >  - Subtract memory from the end of regions instead of the start.
> >  - Create the VM86_TSS before the identity page table, so that the page table
> >    is aligned to a page boundary.
> >  - Use MB1_PAGES in modify_identity_mmio.
> >  - Move and simplify the ASSERT in pvh_setup_p2m.
> >  - Move the creation of the PSE page tables to a separate function, and use it
> >    in shadow_enable also.
> >  - Make the map modify_identity_mmio parameter a constant.
> >  - Add a comment to HVM_VM86_TSS_SIZE, although it seems this might need
> >    further fixing.
> 
> Indeed, I think this wants to be re-based on top of the patch I've
> just sent (with you Cc-ed).

Sure. Just done that.

> > +static int __init pvh_steal_ram(struct domain *d, unsigned long size,
> > +                                unsigned long align, paddr_t limit,
> > +                                paddr_t *addr)
> > +{
> > +    unsigned int i = d->arch.nr_e820;
> > +
> > +    /*
> > +     * Alignment 0 should be set to 1, so it doesn't wrap around in the
> > +     * calculations below.
> > +     */
> > +    align = align ? : 1;
> > +    while ( i-- )
> > +    {
> > +        struct e820entry *entry = &d->arch.e820[i];
> > +
> > +        if ( entry->type != E820_RAM || entry->addr + entry->size > limit ||
> > +             entry->addr < MB(1) )
> > +            continue;
> > +
> > +        *addr = (entry->addr + entry->size - size) & ~(align - 1);
> 
> Without taking the present callers into account (who don't request
> huge alignment) this (theoretically) risks running into the low 1Mb.
> I see two options - either add a comment clarifying that an entry
> will never cross the 1Mb boundary (and hence the issue can't
> occur), or limit the alignment (by way of ASSERT()) to PAGE_SIZE
> (in which case no significant harm would result from crossing the
> boundary).

I don't mind adding the ASSERT, but I don't see how this can risk running into
the low 1MB. If entry->addr < MB(1) the entry is skipped. If an entry crosses
the 1Mb boundary it will certainly have entry->addr < 1Mb.

> > +static int __init pvh_setup_vmx_realmode_helpers(struct domain *d)
> > +{
> > +    p2m_type_t p2mt;
> > +    uint32_t rc, *ident_pt;
> > +    uint8_t *tss;
> > +    mfn_t mfn;
> > +    paddr_t gaddr;
> > +
> > +    /*
> > +     * Steal some space from the last RAM region below 4GB and use it to
> > +     * store the real-mode TSS.
> > +     */
> > +    if ( !pvh_steal_ram(d, HVM_VM86_TSS_SIZE, 0, GB(4), &gaddr) &&
> 
> Please request at least 128-byte alignment here, to avoid the
> base TSS structure crossing a page boundary.

Right, this TSS is loaded while using the identity PT, so with paging enabled.

> > +         (tss = map_domain_gfn(p2m_get_hostp2m(d), _gfn(PFN_DOWN(gaddr)),
> > +                               &mfn, &p2mt, 0, &rc)) )
> > +    {
> > +        memset(tss, 0, HVM_VM86_TSS_SIZE);
> 
> What makes you assume that you've mapped all the space you've
> allocated?

Hm, right, I've just realized we don't really need to map anything here; a
hvm_copy_to_guest_phys with NULL should be enough, and then I don't need to
worry about boundaries.

> > --- a/xen/include/asm-x86/page.h
> > +++ b/xen/include/asm-x86/page.h
> > @@ -374,6 +374,18 @@ perms_strictly_increased(uint32_t old_flags, uint32_t new_flags)
> >      return ((of | (of ^ nf)) == nf);
> >  }
> >  
> > +/* Build a 32bit PSE page table using 4MB pages. */
> > +static inline void
> > +write_32bit_pse_identmap(uint32_t *l2)
> 
> Why does this need to be an inline function?

Given its size and the low number of callers I thought it would be better to
make it inline, but since this is not in any performance critical path I'm
going to remove the inlining, although I think the compiler is probably going
to do it anyway.

Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 3/7] xen/x86: populate PVHv2 Dom0 physical memory map
  2017-02-14 10:10     ` Roger Pau Monne
@ 2017-02-14 10:19       ` Roger Pau Monne
  2017-02-14 10:22         ` Jan Beulich
  2017-02-14 10:20       ` Jan Beulich
  1 sibling, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2017-02-14 10:19 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, boris.ostrovsky, xen-devel

On Tue, Feb 14, 2017 at 10:10:16AM +0000, Roger Pau Monne wrote:
> On Mon, Feb 13, 2017 at 06:53:49AM -0700, Jan Beulich wrote:
> > >>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
> > > --- a/xen/include/asm-x86/page.h
> > > +++ b/xen/include/asm-x86/page.h
> > > @@ -374,6 +374,18 @@ perms_strictly_increased(uint32_t old_flags, uint32_t new_flags)
> > >      return ((of | (of ^ nf)) == nf);
> > >  }
> > >  
> > > +/* Build a 32bit PSE page table using 4MB pages. */
> > > +static inline void
> > > +write_32bit_pse_identmap(uint32_t *l2)
> > 
> > Why does this need to be an inline function?
> 
> Given its size and the low number of callers I thought it would be better to
> make it inline, but since this is not in any performance critical path I'm
> going to remove the inlining, although I think the compiler is probably going
> to do it anyway.

Right, now I remember why it needs to be inline:

xen/include/asm/page.h:379:1: error: unused function 'write_32bit_pse_identmap'
      [-Werror,-Wunused-function]
write_32bit_pse_identmap(uint32_t *l2)

Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 3/7] xen/x86: populate PVHv2 Dom0 physical memory map
  2017-02-14 10:10     ` Roger Pau Monne
  2017-02-14 10:19       ` Roger Pau Monne
@ 2017-02-14 10:20       ` Jan Beulich
  1 sibling, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2017-02-14 10:20 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 14.02.17 at 11:10, <roger.pau@citrix.com> wrote:
> On Mon, Feb 13, 2017 at 06:53:49AM -0700, Jan Beulich wrote:
>> >>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
>> > +static int __init pvh_steal_ram(struct domain *d, unsigned long size,
>> > +                                unsigned long align, paddr_t limit,
>> > +                                paddr_t *addr)
>> > +{
>> > +    unsigned int i = d->arch.nr_e820;
>> > +
>> > +    /*
>> > +     * Alignment 0 should be set to 1, so it doesn't wrap around in the
>> > +     * calculations below.
>> > +     */
>> > +    align = align ? : 1;
>> > +    while ( i-- )
>> > +    {
>> > +        struct e820entry *entry = &d->arch.e820[i];
>> > +
>> > +        if ( entry->type != E820_RAM || entry->addr + entry->size > limit ||
>> > +             entry->addr < MB(1) )
>> > +            continue;
>> > +
>> > +        *addr = (entry->addr + entry->size - size) & ~(align - 1);
>> 
>> Without taking the present callers into account (who don't request
>> huge alignment) this (theoretically) risks running into the low 1Mb.
>> I see two options - either add a comment clarifying that an entry
>> will never cross the 1Mb boundary (and hence the issue can't
>> occur), or limit the alignment (by way of ASSERT()) to PAGE_SIZE
>> (in which case no significant harm would result from crossing the
>> boundary).
> 
> I don't mind adding the ASSERT, but I don't see how this can risk running into
> the low 1MB. If entry->addr < MB(1) the entry is skipped. If an entry crosses
> the 1Mb boundary it will certainly have entry->addr < 1Mb.

Oh, of course. I'm sorry for the noise (and the code here is fine
then as is).

>> > --- a/xen/include/asm-x86/page.h
>> > +++ b/xen/include/asm-x86/page.h
>> > @@ -374,6 +374,18 @@ perms_strictly_increased(uint32_t old_flags, uint32_t new_flags)
>> >      return ((of | (of ^ nf)) == nf);
>> >  }
>> >  
>> > +/* Build a 32bit PSE page table using 4MB pages. */
>> > +static inline void
>> > +write_32bit_pse_identmap(uint32_t *l2)
>> 
>> Why does this need to be an inline function?
> 
> > Given its size and the low number of callers I thought it would be better to
> make it inline, but since this is not in any performance critical path I'm
> going to remove the inlining, although I think the compiler is probably going
> to do it anyway.

I don't think so, unless you use LTO, as the function body can be
visible in at most one of the two CUs containing a call to it.

Jan



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 3/7] xen/x86: populate PVHv2 Dom0 physical memory map
  2017-02-14 10:19       ` Roger Pau Monne
@ 2017-02-14 10:22         ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2017-02-14 10:22 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 14.02.17 at 11:19, <roger.pau@citrix.com> wrote:
> On Tue, Feb 14, 2017 at 10:10:16AM +0000, Roger Pau Monne wrote:
>> On Mon, Feb 13, 2017 at 06:53:49AM -0700, Jan Beulich wrote:
>> > >>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
>> > > --- a/xen/include/asm-x86/page.h
>> > > +++ b/xen/include/asm-x86/page.h
>> > > @@ -374,6 +374,18 @@ perms_strictly_increased(uint32_t old_flags, uint32_t new_flags)
>> > >      return ((of | (of ^ nf)) == nf);
>> > >  }
>> > >  
>> > > +/* Build a 32bit PSE page table using 4MB pages. */
>> > > +static inline void
>> > > +write_32bit_pse_identmap(uint32_t *l2)
>> > 
>> > Why does this need to be an inline function?
>> 
>> Given its size and the low number of callers I thought it would be better to
>> make it inline, but since this is not in any performance critical path I'm
>> going to remove the inlining, although I think the compiler is probably going
>> to do it anyway.
> 
> Right, now I remember why it needs to be inline:
> 
> xen/include/asm/page.h:379:1: error: unused function 
> 'write_32bit_pse_identmap'
>       [-Werror,-Wunused-function]
> write_32bit_pse_identmap(uint32_t *l2)

Funny - when sending the other reply a second ago I did think about
whether this needs clarification, and I decided it was obvious enough.
But it looks like I was wrong: Not inlining the function of course means
moving its definition to a C file, keeping just the declaration here.

Jan



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 7/7] xen/x86: setup PVHv2 Dom0 ACPI tables
  2017-02-10 12:33 ` [PATCH v6 7/7] xen/x86: setup PVHv2 Dom0 ACPI tables Roger Pau Monne
@ 2017-02-22 10:08   ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2017-02-22 10:08 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, boris.ostrovsky, xen-devel

>>> On 10.02.17 at 13:33, <roger.pau@citrix.com> wrote:
> Changes since v5:
>  - s/hvm_copy_to_guest_phys_vcpu/hvm_copy_to_guest_phys/.
>  - Move pvh_add_mem_range to previous patch.
>  - Add a comment regarding the current limitation to only 1 emulated IO APIC.
>  - s/dom0_max_vcpus()/max_vcpus/ in pvh_setup_acpi_madt.
>  - Cast structures to void when assigning.
>  - Declare banned_tables with the initconst annotation.
>  - Expand some comments messages.
>  - Initialize the RSDP local variable.
>  - Only provide x2APIC entries in the MADT.

I've just checked the system I had mentioned earlier, and indeed
there's a mixture of x2APIC and LAPIC entries there, but no two
entries for the same CPU / APIC ID. So I guess as long as we're
convinced PVH guests are fine with finding just x2APIC entries we
can keep it that way, and if later we find the need to add LAPIC
entries, we may still do so. Of course this implies that PVH guests
must not assume to only find x2APIC entries. Hence
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-02-22 10:08 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
2017-02-10 12:33 [PATCH v6 0/7] Initial PVHv2 Dom0 support Roger Pau Monne
2017-02-10 12:33 ` [PATCH v6 1/7] xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests Roger Pau Monne
2017-02-13 13:09   ` Jan Beulich
2017-02-13 14:22     ` Roger Pau Monne
2017-02-13 14:33       ` Jan Beulich
2017-02-13 14:48         ` Roger Pau Monne
2017-02-10 12:33 ` [PATCH v6 2/7] xen/x86: split Dom0 build into PV and PVHv2 Roger Pau Monne
2017-02-10 13:32   ` Andrew Cooper
2017-02-13 13:12   ` Jan Beulich
2017-02-10 12:33 ` [PATCH v6 3/7] xen/x86: populate PVHv2 Dom0 physical memory map Roger Pau Monne
2017-02-13 13:53   ` Jan Beulich
2017-02-14 10:10     ` Roger Pau Monne
2017-02-14 10:19       ` Roger Pau Monne
2017-02-14 10:22         ` Jan Beulich
2017-02-14 10:20       ` Jan Beulich
2017-02-10 12:33 ` [PATCH v6 4/7] xen/x86: parse Dom0 kernel for PVHv2 Roger Pau Monne
2017-02-10 14:34   ` Ian Jackson
2017-02-13 12:07     ` Roger Pau Monne
2017-02-10 12:33 ` [PATCH v6 5/7] x86/PVHv2: fix dom0_max_vcpus so it's capped to HVM_MAX_VCPUS for PVHv2 Dom0 Roger Pau Monne
2017-02-13 13:57   ` Jan Beulich
2017-02-10 12:33 ` [PATCH v6 6/7] xen/x86: Setup PVHv2 Dom0 CPUs Roger Pau Monne
2017-02-13 13:59   ` Jan Beulich
2017-02-10 12:33 ` [PATCH v6 7/7] xen/x86: setup PVHv2 Dom0 ACPI tables Roger Pau Monne
2017-02-22 10:08   ` Jan Beulich
